Free Access to Cloudera.CDP-3002.v2025-09-26.q117 with Valid Practice Test (Page 24)

Question 111

When creating a partitioned table in Hive, what does the clause PARTITIONED BY specify?

A. The compression algorithm used for data storage
B. The column(s) used to divide the table into partitions
C. The default file format for data storage
D. The replication factor for the HDFS data blocks
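
As a rough illustration of what dividing a table by a column looks like on disk, the sketch below mimics Hive's `column=value` directory layout in plain Python. The table root, row data, and the helper `partition_rows` are hypothetical and used only for illustration; this is not Hive's own code. The comment shows the kind of HiveQL DDL it corresponds to.

```python
from collections import defaultdict

# Roughly corresponds to a HiveQL table declared like:
#   CREATE TABLE sales (id INT, amount DOUBLE)
#   PARTITIONED BY (country STRING);
# The table name, columns, and warehouse path are assumptions for this sketch.

rows = [
    {"id": 1, "amount": 10.0, "country": "US"},
    {"id": 2, "amount": 7.5, "country": "DE"},
    {"id": 3, "amount": 3.2, "country": "US"},
]


def partition_rows(rows, partition_cols, table_root="/warehouse/sales"):
    """Group rows under Hive-style `col=value` directory paths."""
    partitions = defaultdict(list)
    for row in rows:
        # One path segment per partition column, in declaration order.
        segments = [f"{col}={row[col]}" for col in partition_cols]
        path = "/".join([table_root] + segments)
        # Partition column values live in the path, so drop them from the data.
        data = {k: v for k, v in row.items() if k not in partition_cols}
        partitions[path].append(data)
    return dict(partitions)


layout = partition_rows(rows, ["country"])
for path, data in sorted(layout.items()):
    print(path, data)
```

Because each distinct value of the partition column gets its own directory, a query that filters on that column can skip the directories that do not match.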

Question 112

You're working with a large DAG that contains numerous tasks and complex dependencies. How can you improve the DAG's readability and maintainability?

A. Utilize comments sparingly within the DAG code, as the logic should be self-explanatory.
B. Break down the DAG into smaller sub-DAGs with well-defined functionalities and clear naming conventions.
C. Use cryptic and abbreviated names for tasks and variables, assuming everyone understands the context.
D. Implement extensive logging within each task, regardless of its purpose, to capture detailed execution information.

Question 113

You want to use Spark to perform aggregations on data stored in Hive tables. How can you achieve this efficiently and seamlessly?

A. Write custom aggregation logic using Spark functions and loop through the entire DataFrame
B. Leverage Spark SQL's built-in aggregation functions like SUM and COUNT
C. Use HiveQL's aggregation capabilities and then convert the results back to a Spark DataFrame
D. Implement custom UDFs (User-Defined Functions) in Spark for complex aggregations

Question 114

In a PySpark application, you're writing a function that reads a CSV file and shows the first few rows. Which of the following code snippets correctly accomplishes this task?

A. Option A
B. Option B
C. Option C
D. Option D

Question 115

You're given a DataFrame containing information about flights, including columns "origin", "destination", and "delay_minutes". How can you find the top 5 origin airports with the most delayed flights on average?

A. Use groupBy and avg on "delay_minutes", then sort by the average in descending order and limit to top 5
B. Implement a custom function to calculate average delays for each origin and then sort and filter
C. Leverage Spark SQL's RANK function along with windowing to identify top 5 origins
D. Use Spark's machine learning library (MLlib) for ranking and classification
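
The group, average, sort, and limit pipeline described in option A can be sketched in plain Python so that it runs without a Spark cluster. The sample flights and the helper `top_delayed_origins` are hypothetical. The comment shows the roughly equivalent PySpark call chain, assuming a DataFrame named `df` with the columns from the question.

```python
from collections import defaultdict

# Roughly equivalent PySpark, assuming `df` holds the flights DataFrame:
#   from pyspark.sql import functions as F
#   (df.groupBy("origin")
#      .agg(F.avg("delay_minutes").alias("avg_delay"))
#      .orderBy(F.desc("avg_delay"))
#      .limit(5))

# Hypothetical sample data, for illustration only.
flights = [
    {"origin": "JFK", "destination": "LAX", "delay_minutes": 30},
    {"origin": "JFK", "destination": "ORD", "delay_minutes": 50},
    {"origin": "ORD", "destination": "SFO", "delay_minutes": 60},
    {"origin": "SFO", "destination": "SEA", "delay_minutes": 10},
    {"origin": "SFO", "destination": "JFK", "delay_minutes": 20},
    {"origin": "ATL", "destination": "DEN", "delay_minutes": 5},
    {"origin": "DEN", "destination": "ATL", "delay_minutes": 45},
    {"origin": "SEA", "destination": "SFO", "delay_minutes": 0},
]


def top_delayed_origins(flights, n=5):
    """Return the n origins with the highest average delay, highest first."""
    # Accumulate [sum of delays, flight count] per origin (the groupBy step).
    totals = defaultdict(lambda: [0.0, 0])
    for flight in flights:
        acc = totals[flight["origin"]]
        acc[0] += flight["delay_minutes"]
        acc[1] += 1
    # Compute the per-origin average (the avg step).
    averages = [(origin, total / count) for origin, (total, count) in totals.items()]
    # Sort by average delay in descending order, then keep the first n.
    averages.sort(key=lambda pair: pair[1], reverse=True)
    return averages[:n]


top = top_delayed_origins(flights)
for origin, avg_delay in top:
    print(origin, avg_delay)
```

In Spark the same steps run as a single distributed aggregation, so no explicit loop over rows is needed.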
