How can Retrieval Augmented Generation (RAG) help developers to build a trustworthy AI system?
Correct Answer: D
Retrieval-Augmented Generation (RAG) supports trustworthy AI by generating responses that cite reference material from an external knowledge base, which makes outputs more transparent and verifiable, as discussed in NVIDIA's Generative AI and LLMs course. RAG pairs a retriever, which fetches relevant documents, with a generator, which produces the response. Because the output is grounded in sources a reader can check, hallucinations are reduced and trust improves. Option A is incorrect because RAG does not provide security features such as confidential computing. Option B is wrong because RAG is unrelated to energy efficiency. Option C is inaccurate because RAG does not align models; it integrates retrieved knowledge into the response. The course notes: "RAG enhances trustworthy AI by generating responses with citations from external knowledge bases, improving transparency and verifiability of outputs."
References: NVIDIA Building Transformer-Based Natural Language Processing Applications course; NVIDIA Introduction to Transformer-Based Natural Language Processing.
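The retrieve-then-generate flow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the knowledge base, the document ids, and the helper names (`retrieve`, `build_prompt`) are hypothetical, word overlap stands in for a real retriever such as an embedding search, and the final call to a language model is left out.

```python
import re

# Hypothetical in-memory knowledge base: document id -> text.
KNOWLEDGE_BASE = {
    "doc-1": "RAG combines a retriever with a generator.",
    "doc-2": "Tokenization splits text into words or subwords.",
    "doc-3": "The retriever fetches documents relevant to the query.",
}


def words(text):
    """Lowercase a string and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve(query, k=2):
    """Rank documents by word overlap with the query.

    Word overlap is an assumption made for brevity; it is a stand-in
    for a real retriever, not how production RAG systems score documents.
    """
    q = words(query)
    ranked = sorted(
        KNOWLEDGE_BASE.items(),
        key=lambda item: len(q & words(item[1])),
        reverse=True,
    )
    # Keep the top k documents, dropping any with no overlap at all.
    return [(doc_id, text) for doc_id, text in ranked[:k] if q & words(text)]


def build_prompt(query, passages):
    """Build a generator prompt that carries the sources and asks for citations."""
    sources = "\n".join(f"[{doc_id}] {text}" for doc_id, text in passages)
    return (
        "Answer using only the sources below and cite them by id.\n"
        f"Sources:\n{sources}\n"
        f"Question: {query}\n"
    )


query = "What does the retriever do in RAG?"
passages = retrieve(query)
prompt = build_prompt(query, passages)
print(prompt)
```

Because each retrieved passage keeps its id in the prompt, the generated answer can point back to `[doc-1]` or `[doc-3]`, which is what lets a reader verify the claim.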
Question 37
In the context of data preprocessing for Large Language Models (LLMs), what does tokenization refer to?
Correct Answer: A
Tokenization is the process of splitting text into smaller units, such as words, subwords, or characters, which serve as the basic units an LLM processes. NVIDIA's NeMo documentation on NLP preprocessing describes tokenization as a critical step in preparing text data. Popular tokenizers (e.g., WordPiece, BPE) break text into subword units so that out-of-vocabulary words can still be represented and the vocabulary stays compact. For example, the sentence "I love AI" might be tokenized into the words ["I", "love", "AI"] or into subword units such as ["I", "lov", "##e", "AI"]. Option B (numerical representations) refers to embedding, not tokenization. Option C (removing stop words) is a separate preprocessing step. Option D (data augmentation) is unrelated to tokenization.
References: NVIDIA NeMo Documentation: https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/stable/nlp/intro.html
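The "I love AI" example above can be reproduced with a small greedy longest-match splitter in the style of WordPiece. This is a sketch, not the NeMo or any library implementation: the toy vocabularies and the function names (`wordpiece`, `tokenize`) are assumptions made for illustration.

```python
def wordpiece(word, vocab, unk="[UNK]"):
    """Split one word into the longest vocabulary pieces, left to right.

    Pieces that continue a word carry the "##" prefix, following the
    WordPiece convention.
    """
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        match = None
        # Shrink the candidate from the right until it is in the vocabulary.
        while end > start:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            return [unk]  # no piece fits, so the whole word is unknown
        pieces.append(match)
        start = end
    return pieces


def tokenize(text, vocab):
    """Whitespace-split the text, then apply subword splitting to each word."""
    tokens = []
    for word in text.split():
        tokens.extend(wordpiece(word, vocab))
    return tokens


# Hypothetical toy vocabularies: the first lacks "love", the second has it.
SUBWORD_VOCAB = {"I", "AI", "lov", "##e", "[UNK]"}
WORD_VOCAB = SUBWORD_VOCAB | {"love"}

subword_tokens = tokenize("I love AI", SUBWORD_VOCAB)
word_tokens = tokenize("I love AI", WORD_VOCAB)
print(subword_tokens)  # ['I', 'lov', '##e', 'AI']
print(word_tokens)     # ['I', 'love', 'AI']
```

The same sentence yields different tokens depending on the vocabulary, which is why a word missing from the vocabulary can still be represented as a sequence of known subwords.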
Question 38
When designing prompts for a large language model to perform a complex reasoning task, such as solving a multi-step mathematical problem, which advanced prompt engineering technique is most effective in ensuring robust performance across diverse inputs?
Correct Answer: C
Chain-of-thought (CoT) prompting is an advanced prompt engineering technique that significantly improves a large language model's (LLM) performance on complex reasoning tasks, such as multi-step mathematical problems. By including examples in the prompt that explicitly demonstrate step-by-step reasoning, CoT guides the model to break the problem into intermediate steps, which improves accuracy and robustness. NVIDIA's NeMo documentation on prompt engineering highlights CoT as a powerful method for tasks requiring logical or sequential reasoning, as it leverages the model's ability to mimic structured problem-solving. Research by Wei et al. (2022) shows that CoT outperforms standard prompting on arithmetic reasoning benchmarks. Option A (zero-shot) is less effective for complex tasks because the prompt gives no guidance. Option B (few-shot with random examples) is suboptimal because the examples lack structured reasoning. Option D (RAG) is useful for factual queries but less relevant for pure reasoning tasks.
References: NVIDIA NeMo Documentation: https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/stable/nlp/intro.html; Wei, J., et al. (2022). "Chain-of-Thought Prompting Elicits Reasoning in Large Language Models."
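To make the technique concrete, the sketch below assembles a few-shot CoT prompt in Python. The worked example, the `EXAMPLES` structure, and the `build_cot_prompt` helper are all hypothetical and chosen for illustration; sending the prompt to a model is out of scope here.

```python
# Hypothetical worked example. The explicit intermediate steps are what
# turn an ordinary few-shot example into a chain-of-thought example.
EXAMPLES = [
    {
        "question": (
            "A shop sells pens at 3 dollars each. "
            "How much do 4 pens and a 5 dollar notebook cost?"
        ),
        "steps": [
            "4 pens cost 4 * 3 = 12 dollars.",
            "Adding the notebook gives 12 + 5 = 17 dollars.",
        ],
        "answer": "17",
    },
]


def build_cot_prompt(question, examples=EXAMPLES):
    """Prepend worked examples, reasoning included, to the new question."""
    parts = []
    for ex in examples:
        parts.append(f"Q: {ex['question']}")
        reasoning = " ".join(ex["steps"])
        parts.append(f"A: {reasoning} The answer is {ex['answer']}.")
        parts.append("")  # blank line between examples
    parts.append(f"Q: {question}")
    parts.append("A:")  # left open so the model continues with its own steps
    return "\n".join(parts)


cot_prompt = build_cot_prompt(
    "A train travels 60 km per hour for 2 hours, then 30 km more. How far does it go?"
)
print(cot_prompt)
```

A plain few-shot prompt would show only "The answer is 17." for the example; keeping the intermediate arithmetic in the demonstration is the design choice that encourages the model to reason step by step on the new question.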