Exam Code/Number: | NCA-GENM |
Exam Name: | NVIDIA Generative AI Multimodal |
Certification: | NVIDIA |
Question Number: | 403 |
Publish Date: | Oct 16, 2025 |
Rating: | 100% |
You're developing a multimodal model that takes both image and audio inputs to predict a relevant text description. You observe that the model is heavily biased towards the image data, effectively ignoring the audio input. Which of the following techniques could you employ to address this modality imbalance and ensure the model effectively utilizes both input modalities?
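One technique often discussed for this kind of imbalance is modality dropout: during training, the dominant modality's features are randomly masked so the model cannot rely on them alone. The following is a minimal sketch under assumptions, not the exam's stated answer. The function name `modality_dropout`, the flat list feature representation, and the default drop probability are all hypothetical choices made for illustration.

```python
import random

def modality_dropout(image_feat, audio_feat, p_drop_image=0.3, rng=random):
    """Randomly zero the image features for one training example.

    Hypothetical helper: with probability p_drop_image the image features
    are replaced by zeros, so the downstream model only sees audio for
    that example. The audio features are always passed through unchanged.
    """
    if rng.random() < p_drop_image:
        image_feat = [0.0] * len(image_feat)
    return image_feat, audio_feat
```

In a real training loop this would be applied per example (or per batch) before the fusion layer, and only in training mode. Other options, such as reweighting per-modality losses, are equally plausible and are not shown here.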
Consider the following code snippet intended to generate an image embedding using CLIP. What is the most likely reason for the 'RuntimeError'?
You're tasked with building a generative AI model for music composition. You have a large dataset of MIDI files, but the data is inconsistent in terms of tempo, key, and instrumentation. What are the crucial data transformation steps needed before training the model?
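As an illustration of what such transformations can look like, the sketch below normalizes three things: timing (seconds converted to beats so tempo no longer matters), key (transposed so the tonic lands on C), and instrumentation (every note mapped to one program number). It assumes the notes have already been parsed out of the MIDI files and that each piece's tempo and tonic are known. The `Note` class and the `normalize` function are hypothetical names, and this is one plausible preprocessing recipe rather than the exam's stated answer.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Note:
    start: float     # onset time (seconds on input, beats on output)
    duration: float  # length (seconds on input, beats on output)
    pitch: int       # MIDI note number
    program: int     # instrument program number

def seconds_to_beats(seconds, bpm):
    """Convert a time in seconds to beats at the given tempo."""
    return seconds * bpm / 60.0

def normalize(notes, bpm, tonic_pc, program=0):
    """Return tempo-independent, C-transposed, single-instrument notes.

    tonic_pc is the tonic's pitch class (0 = C, 2 = D, ..., 11 = B).
    The shift is the smallest move (at most 6 semitones) that puts the
    tonic on C. Pitches are not clamped, so extreme notes could leave
    the 0-127 MIDI range and would need separate handling.
    """
    shift = -tonic_pc if tonic_pc <= 6 else 12 - tonic_pc
    return [
        Note(
            start=seconds_to_beats(n.start, bpm),
            duration=seconds_to_beats(n.duration, bpm),
            pitch=n.pitch + shift,
            program=program,
        )
        for n in notes
    ]
```

For example, a piece in D at 120 BPM would be called as `normalize(notes, bpm=120, tonic_pc=2)`. Quantizing onsets to a fixed grid and tokenizing the result are common follow-up steps that are left out here.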
You are tasked with building a Generative AI model to generate realistic images of outdoor scenes. The training dataset contains a large number of images with varying lighting conditions, weather conditions, and object compositions. Which data augmentation techniques would be MOST effective in improving the model's robustness and generalization ability?
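Two augmentations that are commonly considered for outdoor imagery are horizontal flips and brightness jitter, since both produce images that still look like plausible outdoor scenes. The sketch below is a library-free illustration under assumptions: images are represented as nested lists of grayscale values in [0, 1], and the function names, the 0.5 flip probability, and the 0.7 to 1.3 brightness range are hypothetical choices, not values taken from the exam.

```python
import random

def hflip(img):
    """Mirror the image left to right by reversing each row."""
    return [row[::-1] for row in img]

def adjust_brightness(img, factor):
    """Scale every pixel by factor and clamp the result to [0, 1]."""
    return [[min(1.0, max(0.0, px * factor)) for px in row] for row in img]

def augment(img, rng):
    """Apply a random horizontal flip, then a random brightness change."""
    if rng.random() < 0.5:
        img = hflip(img)
    return adjust_brightness(img, rng.uniform(0.7, 1.3))
```

A call such as `augment(img, random.Random(0))` returns a new image of the same shape. Vertical flips are deliberately left out, because an upside-down sky is not a realistic outdoor scene.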
You're building a system that takes a medical image (e.g., X-ray) and a patient's medical history (text) as input, predicting the likelihood of a specific disease. You want to use SHAP (SHapley Additive exPlanations) values to explain the model's predictions. How would you adapt SHAP to handle both image and text inputs effectively?
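One way to reason about this is at the modality level: treat the image and the text as two "players" and compute exact Shapley values by replacing each input with a baseline (for example, a blank image or an empty history). The sketch below shows only that two-player computation in plain Python. It does not use the SHAP library, `modality_shapley` and the baseline arguments are hypothetical names, and finer-grained per-pixel or per-token attributions would need a dedicated explainer for each modality.

```python
def modality_shapley(predict, image, text, image_baseline, text_baseline):
    """Exact Shapley values for two players: the image and the text.

    predict(image, text) must return a scalar score. Each modality's
    value is its marginal contribution averaged over both orderings
    in which the two inputs can be revealed.
    """
    f_none = predict(image_baseline, text_baseline)
    f_img = predict(image, text_baseline)
    f_txt = predict(image_baseline, text)
    f_both = predict(image, text)
    phi_image = 0.5 * ((f_img - f_none) + (f_both - f_txt))
    phi_text = 0.5 * ((f_txt - f_none) + (f_both - f_img))
    return phi_image, phi_text
```

By construction the two values sum to `predict(image, text) - predict(image_baseline, text_baseline)`, which gives a quick sanity check on any model plugged in as `predict`.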