Free Access to Cloudera.CDP-3002.v2025-09-26.q117 with Valid Practice Test (Page 20)

Question 91

In a PySpark application running on Kubernetes, you want to enable dynamic allocation of Executors. Which configuration setting is essential to turn on this feature?

A.'spark.dynamicAllocation.enabled'
B.'spark.kubernetes.dynamicAllocation.enabled'
C.'spark.kubernetes.executor.dynamicAllocation'
D.'spark.executor.instances'
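
For reference, here is a minimal sketch of how the dynamic allocation properties might be assembled for a `spark-submit` call. The helper names `dynamic_allocation_conf` and `to_submit_args` and the executor bounds are illustrative assumptions, not part of the question. Kubernetes deployments usually have no external shuffle service, so shuffle tracking is typically enabled alongside the main switch.

```python
def dynamic_allocation_conf(min_executors=1, max_executors=10):
    """Build Spark properties for dynamic allocation (hypothetical helper)."""
    return {
        # The generic switch; it is not a Kubernetes-specific property.
        "spark.dynamicAllocation.enabled": "true",
        # Lets Spark track shuffle files without an external shuffle service.
        "spark.dynamicAllocation.shuffleTracking.enabled": "true",
        # Illustrative bounds; tune these for the actual workload.
        "spark.dynamicAllocation.minExecutors": str(min_executors),
        "spark.dynamicAllocation.maxExecutors": str(max_executors),
    }


def to_submit_args(conf):
    """Render a properties dict as a list of `--conf key=value` arguments."""
    args = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


conf = dynamic_allocation_conf(min_executors=2, max_executors=20)
submit_args = to_submit_args(conf)
print(" ".join(submit_args))
```

The same key/value pairs could instead be passed to `SparkSession.builder.config(key, value)` when the session is created in application code.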

Question 92

You encounter an error message stating "Schema mismatch" when joining two DataFrames in Spark. What could be the potential causes and how can you resolve them?

A.The DataFrames have different column names but the same data types
B.The DataFrames have the same column names but different data types
C.The join condition references non-existent columns in one of the DataFrames
D.All of the above

Question 93

How does Airflow handle task dependencies?

A.By using the ExternalTaskSensor to pause execution until an external condition is met
B.By manually triggering tasks in the correct order
C.By specifying dependencies through the depends_on_past parameter in task definitions
D.By using the set_upstream() or set_downstream() methods, or the bitshift operators (>> and <<)
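
To illustrate how the bitshift operators relate to `set_upstream()` and `set_downstream()`, here is a toy sketch in plain Python. The `Task` class is a hypothetical stand-in written for this example, not Airflow's operator class. It only mimics how `a >> b` delegates to `a.set_downstream(b)` and `a << b` to `a.set_upstream(b)`.

```python
class Task:
    """Hypothetical stand-in for an Airflow task (not Airflow's real class)."""

    def __init__(self, task_id):
        self.task_id = task_id
        self.upstream = set()    # task_ids that must finish before this one
        self.downstream = set()  # task_ids that run after this one

    def set_downstream(self, other):
        self.downstream.add(other.task_id)
        other.upstream.add(self.task_id)

    def set_upstream(self, other):
        other.set_downstream(self)

    def __rshift__(self, other):
        # a >> b: b runs after a. Returning `other` allows chaining.
        self.set_downstream(other)
        return other

    def __lshift__(self, other):
        # a << b: a runs after b.
        self.set_upstream(other)
        return other


extract = Task("extract")
transform = Task("transform")
load = Task("load")
report = Task("report")

extract >> transform >> load  # extract, then transform, then load
report << load                # report runs after load
```

In a real DAG the same expressions would be written with operator instances defined inside a `DAG` context.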

Question 94

You need to filter data from a Hive table based on a specific date range. Which approach would be most efficient and maintainable?

A.Use Spark SQL functions like filter with a date comparison expression
B.Convert the Hive table to a temporary table and then use Spark SQL filtering
C.Leverage HiveQL's built-in filtering capabilities with a WHERE clause
D.Implement a custom filter function in Spark to process each row individually

Question 95

You are working on a project that involves processing large datasets stored in HDFS. You need to read a CSV file into a DataFrame using PySpark. Which of the following code snippets correctly achieves this?

A.
B.
C.
D.
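
The answer snippets for this question did not survive on this page. As general background only, here is a minimal sketch of reading a CSV file with PySpark's `DataFrameReader.csv`. The function name `read_csv_from_hdfs` and the example path are assumptions for illustration, and the `header` and `inferSchema` settings are one common choice rather than the options from the original question.

```python
def read_csv_from_hdfs(spark, path):
    """Read a CSV file into a DataFrame.

    `spark` is an existing SparkSession. `path` is a location such as
    "hdfs:///data/example.csv" (a hypothetical path).
    """
    return spark.read.csv(
        path,
        header=True,       # treat the first line as column names
        inferSchema=True,  # sample the data to guess column types
    )
```

With a live session this could be called as `df = read_csv_from_hdfs(spark, "hdfs:///data/example.csv")`. For very large files, supplying an explicit schema instead of `inferSchema=True` avoids an extra pass over the data.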
