Free Access to Cloudera.CDP-3002.v2025-09-26.q117 with Valid Practice Test (Page 7)

Question 26

What impact does the Spark configuration parameter spark.network.timeout have on Spark streaming applications?

A. It specifies the timeout for network operations such as data shuffle and broadcast.
B. It controls the maximum time a Spark streaming application will wait for data before timing out.
C. It sets the timeout for Akka-based network operations, crucial for Spark streaming's stability.
D. It determines the timeout for connections between executors and the Spark driver, impacting task retries and executor loss detection.

Question 27

You need to design an Airflow DAG that waits for a specific file to become available before proceeding with the downstream tasks. How can you achieve this dependency?

A. Use the FileSensor operator to check for the file's existence and trigger downstream tasks once it arrives.
B. Implement a custom loop within a Python operator to continuously check for the file until it appears.
C. Configure the source system to notify Airflow when the file is ready for processing.
D. Schedule the DAG to run periodically, hoping the file eventually becomes available.
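
The sensor pattern named in option A amounts to a periodic existence check bounded by a timeout, which the scheduler runs on your behalf. The sketch below models that behavior in plain Python for illustration only. It is not Airflow's implementation, and `wait_for_file` and its parameters are hypothetical names chosen to mirror the `poke_interval` and `timeout` settings that Airflow sensors expose.

```python
import os
import time


def wait_for_file(path, poke_interval=0.1, timeout=1.0):
    """Hypothetical helper: poll until `path` exists or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        # One "poke": a cheap check of whether the condition is met yet.
        if os.path.exists(path):
            return True
        # Give up once the deadline has passed so downstream work is not blocked forever.
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{path} did not appear within {timeout} seconds")
        # Sleep between pokes instead of spinning on the file system.
        time.sleep(poke_interval)
```

In a real DAG this check would normally be delegated to a sensor task rather than hand-written, so that retries, timeouts, and downstream triggering are handled by Airflow itself.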

Question 28

In the context of Hive, what mechanism ensures that data is evenly distributed across buckets?

A. Natural key distribution
B. Manual data insertion scripts
C. A hash function applied to the bucketing column
D. External data balancing tools
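
Hash-based bucketing, as described in option C, can be sketched as "hash the column value, then take it modulo the number of buckets". The example below is a simplified model, not Hive's code: it uses CRC32 as a stand-in hash (Hive's actual hash depends on the column type and bucketing version), and `bucket_for` is a hypothetical helper name.

```python
import zlib
from collections import Counter


def bucket_for(value, num_buckets):
    """Hypothetical helper: map a column value to a bucket index."""
    # Stand-in hash: CRC32 of the value's string form (an assumption, not Hive's hash).
    h = zlib.crc32(str(value).encode("utf-8"))
    # Mask to a non-negative 31-bit integer, then wrap into the bucket range.
    return (h & 0x7FFFFFFF) % num_buckets


# Count how many of 10,000 synthetic keys land in each of 4 buckets.
counts = Counter(bucket_for(f"user_{i}", 4) for i in range(10_000))
for bucket in sorted(counts):
    print(bucket, counts[bucket])
```

Because the bucket depends only on the value and the bucket count, the same key always lands in the same bucket, which is what makes bucketed joins and sampling possible.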

Question 29

You're building a Spark application that involves complex iterative data processing. Which option allows you to efficiently access and update intermediate results between iterations?

A. Store intermediate results in temporary tables using Spark SQL
B. Leverage Spark's in-memory caching capabilities with rdd.cache()
C. Implement custom data structures for managing intermediate data
D. Use Spark's broadcast variables for frequently accessed data across iterations

Question 30

Which statement best describes lineage tracking in Spark and its benefits for fault tolerance and debugging?

A. Lineage tracks the dependencies between data transformations, enabling efficient re-execution on failures.
B. It creates a log of all operations performed on data, aiding in debugging issues.
C. Lineage allows for caching intermediate results, improving performance.
D. Both A and B.
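
The idea of lineage can be illustrated with a toy model in which each dataset remembers its parent and the transformation that produced it, so lost results can be rebuilt by replaying the chain. This is a conceptual sketch only, not Spark's API: the `Dataset` class and its `lose` method are hypothetical. In Spark itself, an RDD's lineage can be inspected with `rdd.toDebugString()`.

```python
class Dataset:
    """Toy dataset that records its parent and transformation (hypothetical, not Spark)."""

    def __init__(self, compute, parent=None, name="source"):
        self.compute = compute      # function: parent rows -> this dataset's rows
        self.parent = parent        # upstream dataset, or None for a source
        self.name = name            # label for this step in the lineage
        self._materialized = None   # computed rows, if currently held in memory

    def map(self, fn, name):
        return Dataset(lambda rows: [fn(r) for r in rows], parent=self, name=name)

    def filter(self, pred, name):
        return Dataset(lambda rows: [r for r in rows if pred(r)], parent=self, name=name)

    def lineage(self):
        # Walk back to the source and list the steps in execution order.
        steps, node = [], self
        while node is not None:
            steps.append(node.name)
            node = node.parent
        return list(reversed(steps))

    def collect(self):
        # Recompute from the parent chain whenever the rows are not in memory.
        if self._materialized is None:
            upstream = self.parent.collect() if self.parent else None
            self._materialized = self.compute(upstream)
        return self._materialized

    def lose(self):
        # Simulate losing the in-memory result, for example after a node failure.
        self._materialized = None


source = Dataset(lambda _: [1, 2, 3, 4])
doubled = source.map(lambda x: x * 2, name="map(x * 2)")
large = doubled.filter(lambda x: x > 4, name="filter(x > 4)")

first = large.collect()
large.lose()
doubled.lose()
second = large.collect()  # rebuilt by replaying the recorded transformations
```

The recorded chain serves both purposes mentioned in the options: it tells the engine what to re-run after a failure, and it gives a readable trace of how a result was derived.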
