Free Access to Cloudera.CDP-3002.v2025-11-21.q109 with Valid Practice Test (Page 15)

Question 66

Consider the following code snippet:

# Sample DataFrame (assuming it exists)
df = spark.createDataFrame(...)
# Attempt to add a new column with a case-when expression (fix the error)
df = df.withColumn("category", F.when(df["price"] > 100, "Expensive").otherwise("Cheap"))
df.show()

What is the error in this code, and how can it be fixed?

A. The error is missing parentheses around the conditions in the when function. Fix: F.when((df["price"] > 100), "Expensive").otherwise("Cheap")
B. The error is using the wrong syntax for case-when expressions. Fix: Use SQL-like syntax with CASE WHEN and END.
C. The error is attempting to modify the original DataFrame in-place. Fix: Use df.withColumn to create a new DataFrame with the added column.
D. There is no error in the code snippet.

Question 67

What are the potential trade-offs to consider when using checkpointing in Spark applications?

A. Checkpointing always improves performance and has no drawbacks
B. Checkpointing introduces overhead for storing and recovering data, impacting performance
C. Checkpointing requires manual configuration and can be error-prone
D. All of the above

Question 68

You notice degraded read performance on an Iceberg table after many updates and deletes. What maintenance task should you perform to improve this?

A. Rewrite manifest files
B. Compact data files
C. Delete old snapshots
D. Rebuild the Iceberg metadata table
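
For readers unfamiliar with the term, compacting data files generally means rewriting many small files into fewer, larger ones so that a scan opens fewer files. The following is a minimal sketch of that idea in plain Python, not Iceberg's own implementation. The function name plan_compaction, the 512 MB target, and the file sizes are all hypothetical values chosen for illustration, and the sketch covers only the small-file case.

```python
def plan_compaction(file_sizes_mb, target_mb=512):
    """Greedily group files so each group's total size stays at or under target_mb."""
    groups, current, current_size = [], [], 0
    # Visit larger files first so the leftover space is filled by smaller ones.
    for size in sorted(file_sizes_mb, reverse=True):
        if current and current_size + size > target_mb:
            groups.append(current)
            current, current_size = [], 0
        current.append(size)
        current_size += size
    if current:
        groups.append(current)
    return groups


# Hypothetical table state: 250 small files totalling 1600 MB.
small_files = [4] * 200 + [16] * 50
groups = plan_compaction(small_files, target_mb=512)
print(len(small_files), "files before,", len(groups), "files after rewrite")
```

Each group stands for one rewritten output file. The total data volume is unchanged; only the number of files a reader has to open goes down.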

Question 69

Which of the following strategies would NOT be recommended for managing skewed data during join operations in Spark?

A. Salting the keys to distribute the data more evenly across partitions.
B. Using a broadcast join assuming one dataset is small enough to fit into memory.
C. Applying a filter to remove outliers that cause data skewness before joining.
D. Increasing the number of partitions to distribute the skewed data more evenly.
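
As background for the salting option, here is a small simulation in plain Python (no Spark required) of how appending a random suffix to a hot key spreads its rows over several partitions. It is a sketch under stated assumptions: the partition count, salt count, key names, and the CRC32-based partitioner are all hypothetical stand-ins, not Spark's actual hash partitioner.

```python
import random
import zlib
from collections import Counter

NUM_PARTITIONS = 8  # hypothetical partition count
NUM_SALTS = 8       # hypothetical number of salt values


def partition_of(key: str) -> int:
    # Deterministic stand-in for a hash partitioner.
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS


def salted(key: str, rng: random.Random) -> str:
    # Append a random suffix so one logical key becomes NUM_SALTS physical keys.
    return f"{key}_{rng.randrange(NUM_SALTS)}"


rng = random.Random(42)
# Skewed input: one "hot" key carries 90% of the rows.
keys = ["hot"] * 9000 + [f"k{i}" for i in range(1000)]

plain = Counter(partition_of(k) for k in keys)
with_salt = Counter(partition_of(salted(k, rng)) for k in keys)

print("largest partition without salt:", max(plain.values()))
print("largest partition with salt:   ", max(with_salt.values()))
```

Without salting, every "hot" row lands in the same partition. In an actual join, the other side would also need to be replicated once per salt value so that every salted key still finds its match.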

Question 70

You need to create a new Hive table from a Spark DataFrame. What are the different approaches you can consider?

A. Directly write the DataFrame to a directory in HDFS and define a corresponding Hive table schema
B. Use the DataFrame.write.saveAsTable("table_name") method with appropriate options
C. Convert the DataFrame to a temporary table and then use HiveQL commands to create a permanent table
D. All of the above
