Free Access to Databricks.Associate-Developer-Apache-Spark-3.5.v2025-11-20.q72 with Valid Practice Test (Page 16)

Question 71

A developer needs to write the output of a complex chain of Spark transformations to a Parquet table called events.liveLatest.
Consumers of this table query it frequently with filters on both year and month of the event_ts column (a timestamp).
The current code:
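from pyspark.sql import functions as F
# Derive the year and month columns that consumers filter on
final = df.withColumn("event_year", F.year("event_ts")) \
    .withColumn("event_month", F.month("event_ts"))
# bucketBy() is a DataFrameWriter method, so it has to follow .write
final.write.bucketBy(42, ["event_year", "event_month"]) \
    .saveAsTable("events.liveLatest")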
However, consumers report poor query performance.
Which change will enable efficient querying by year and month?

A. Replace .bucketBy() with .partitionBy("event_year", "event_month")
B. Change the bucket count (42) to a lower number
C. Add .sortBy() after .bucketBy()
D. Replace .bucketBy() with .partitionBy("event_year") only
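
To make the layout question concrete, here is a minimal pure-Python sketch (no Spark required) of the Hive-style `column=value` directory naming that a partitioned write produces, and of how a filter on year and month can select directories by name alone. The helper names `partition_dir` and `prune` and the root path `/tables/liveLatest` are hypothetical and chosen only for illustration. The sketch models only the path layout, not Spark's actual query planner.

```python
from datetime import datetime


def partition_dir(table_root: str, event_ts: datetime) -> str:
    """Build the Hive-style directory a row would land in (hypothetical helper)."""
    return f"{table_root}/event_year={event_ts.year}/event_month={event_ts.month}"


def prune(dirs, year, month):
    """Keep only directories matching the filter, without opening any data file."""
    wanted = f"/event_year={year}/event_month={month}"
    return [d for d in dirs if d.endswith(wanted)]


# Three sample event timestamps falling in three different (year, month) pairs.
events = [
    datetime(2024, 12, 31, 23, 59),
    datetime(2025, 1, 1, 0, 0),
    datetime(2025, 11, 20, 8, 30),
]

# Each distinct (year, month) pair maps to its own directory.
dirs = sorted({partition_dir("/tables/liveLatest", ts) for ts in events})

# A filter on both columns narrows the scan to a single directory.
matching = prune(dirs, 2025, 11)
print(matching)
```

With bucketing, by contrast, rows are assigned to a fixed number of files by hashing the bucket columns, so the file names carry no year or month value that a reader could match against.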

Question 72

A data engineer needs to persist a file-based data source to a specific location. However, by default, Spark writes to the warehouse directory (e.g., /user/hive/warehouse). To override this, the engineer must explicitly define the file path.
Which line of code ensures the data is saved to a specific location?
Options:

A. users.write(path="/some/path").saveAsTable("default_table")
B. users.write.saveAsTable("default_table").option("path", "/some/path")
C. users.write.option("path", "/some/path").saveAsTable("default_table")
D. users.write.saveAsTable("default_table", path="/some/path")
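
The options differ mainly in where the path is supplied relative to the save call, so the chaining order is worth seeing in isolation. The sketch below uses a hypothetical `SketchWriter` class, not PySpark itself, to mimic two behaviours of PySpark's `DataFrameWriter`: `option()` returns the writer so calls can be chained, and `saveAsTable()` is a terminal call that returns `None`. The default location is a simplification based on the warehouse directory mentioned in the question.

```python
class SketchWriter:
    """Hypothetical stand-in for a writer-style builder; this is not PySpark."""

    def __init__(self, catalog):
        self._catalog = catalog  # maps table name -> storage location
        self._options = {}

    def option(self, key, value):
        # Configuration calls return the writer so they can be chained.
        self._options[key] = value
        return self

    def saveAsTable(self, name):
        # Terminal call: records the table and implicitly returns None,
        # so nothing can be chained after it.
        default_location = f"/user/hive/warehouse/{name}"  # simplified default
        self._catalog[name] = self._options.get("path", default_location)


catalog = {}

# Path set before the terminal call: the table lands at the given location.
SketchWriter(catalog).option("path", "/some/path").saveAsTable("default_table")

# No path option: the table falls back to the warehouse directory.
SketchWriter(catalog).saveAsTable("other_table")

print(catalog)
```

Because the terminal call returns `None`, any `.option(...)` written after it would raise an `AttributeError` rather than affect the write.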
