Databricks Certification: Associate-Developer-Apache-Spark-3.5 Exam
Databricks.Associate-Developer-Apache-Spark-3.5.v2025-11-20.q72 Dumps

Question 36

A Spark engineer must select an appropriate deployment mode for the Spark jobs.
What is the benefit of using cluster mode in Apache Spark™?

Correct Answer: D
Explanation:
In Apache Spark's cluster mode, "the driver program runs on the cluster's worker node instead of the client's local machine. This allows the driver to be close to the data and other executors, reducing network overhead and improving fault tolerance for production jobs." (Source: Apache Spark documentation - Cluster Mode Overview)
This deployment mode is ideal for production environments where the job is submitted from a gateway node and Spark manages the driver lifecycle on the cluster itself.
Option A is partially true but less specific than D.
Option B is incorrect: the driver never executes all tasks; executors handle distributed tasks.
Option C describes client mode, not cluster mode.
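For illustration, a job can be submitted in cluster mode with spark-submit so that the driver is launched on a worker node inside the cluster (the script name, master, and resource settings below are assumptions, not part of the question):
spark-submit --master yarn --deploy-mode cluster --num-executors 4 my_spark_job.py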

Question 37

Which command overwrites an existing JSON file when writing a DataFrame?

Correct Answer: D
When writing DataFrames to files using the Spark DataFrameWriter API, Spark by default raises an error if the target path already exists. To explicitly overwrite existing data, you must specify the write mode as "overwrite".
Correct Syntax:
df.write.mode("overwrite").json("path/to/file")
This command removes the existing file or directory at the specified path and writes the new output in JSON format.
Other supported save modes include:
"append" - Adds new data to existing files.
"ignore" - Skips writing if the path already exists.
"error" or "errorifexists" - Fails the job if the output path exists (default).
Why other options are incorrect:
A: Defaults to "error" mode, which fails if the path exists.
B: "append" only adds data; it does not overwrite existing data.
C: .option("overwrite") is invalid - mode("overwrite") must be used instead.
Reference (Databricks Apache Spark 3.5 - Python / Study Guide):
PySpark API Reference: DataFrameWriter.mode() - describes valid write modes including "overwrite".
PySpark API Reference: DataFrameWriter.json() - method to write DataFrames in JSON format.
Databricks Certified Associate Developer for Apache Spark Exam Guide (June 2025): Section "Using Spark DataFrame APIs" - Reading and writing DataFrames using save modes, schema management, and partitioning.
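A minimal, self-contained sketch of the save modes in action (the SparkSession setup, sample rows, and the /tmp/demo_output path are illustrative assumptions):
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("write-modes-demo").getOrCreate()
df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])

# "overwrite" replaces any existing output at the target path
df.write.mode("overwrite").json("/tmp/demo_output")

# "append" adds the new files alongside whatever is already there
df.write.mode("append").json("/tmp/demo_output")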

Question 38

A data scientist at an e-commerce company is working with user data obtained from its subscriber database and has stored it in a DataFrame df_user. Before processing the data further, the data scientist wants to create another DataFrame, df_user_non_pii, containing only the non-PII columns. The PII columns in df_user are first_name, last_name, email, and birthdate.
Which code snippet can be used to meet this requirement?

Correct Answer: A
To remove specific columns from a PySpark DataFrame, the drop() method is used. This method returns a new DataFrame without the specified columns. The correct syntax for dropping multiple columns is to pass each column name as a separate argument to the drop() method.
Correct Usage:
df_user_non_pii = df_user.drop("first_name", "last_name", "email", "birthdate")
This line of code returns a new DataFrame df_user_non_pii that excludes the specified PII columns.
Explanation of Options:
A. Correct. Uses the drop() method with multiple column names passed as separate arguments, which is the standard and correct usage in PySpark.
B. Although it appears similar to Option A, a syntax error (e.g., missing quotes or an incorrect variable name) would cause it to fail; as written here, it is identical to Option A and would also work.
C. Incorrect. dropFields() is not a method of the DataFrame class in PySpark; it is a Column method used to drop fields from nested StructType columns, not top-level DataFrame columns.
D. Incorrect. Passing a single comma-separated string of column names to dropFields() is not valid syntax in PySpark.
Reference:
PySpark Documentation: DataFrame.drop
Stack Overflow Discussion: How to delete columns in PySpark DataFrame
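A small runnable sketch of this usage (the sample row and the extra user_id/tier columns are made-up assumptions):
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("drop-pii-demo").getOrCreate()

df_user = spark.createDataFrame(
    [(1, "Ada", "Lovelace", "ada@example.com", "1815-12-10", "gold")],
    ["user_id", "first_name", "last_name", "email", "birthdate", "tier"],
)

# drop() takes each column name as a separate argument and returns a new DataFrame
df_user_non_pii = df_user.drop("first_name", "last_name", "email", "birthdate")
df_user_non_pii.show()  # only user_id and tier remain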

Question 39

An engineer has a large ORC file located at /file/test_data.orc and wants to read only specific columns to reduce memory usage.
Which code fragment will select the columns, i.e., col1, col2, during the reading process?

Correct Answer: D
Explanation:
The correct way to load specific columns from an ORC file is to first load the file using .load() and then apply .select() on the resulting DataFrame. This is valid with .read.format("orc") or the shortcut .read.orc().
df = spark.read.format("orc").load("/file/test_data.orc").select("col1", "col2")
Why the others are incorrect:
A performs the selection after filtering, so it does not match the intent to minimize memory at load time.
B incorrectly tries to use .select() before .load(), which is invalid.
C uses a non-existent .selected() method.
D correctly loads and then selects.
Reference: Apache Spark SQL API - ORC Format
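A short sketch of the pattern, assuming the path and column names from the question; Spark's column pruning means only col1 and col2 are actually read from the ORC file:
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("orc-column-pruning").getOrCreate()

# Load the ORC file, then project only the needed columns
df = (
    spark.read.format("orc")
    .load("/file/test_data.orc")
    .select("col1", "col2")
)

# Equivalent shortcut using the orc() reader
df2 = spark.read.orc("/file/test_data.orc").select("col1", "col2")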

Question 40

Given this code:

.withWatermark("event_time","10 minutes")
.groupBy(window("event_time","15 minutes"))
.count()
What happens to data that arrives after the watermark threshold?

Correct Answer: B
According to Spark's watermarking rules:
"Records that are older than the watermark (event time < current watermark) are considered too late and are dropped." So, if a record'sevent_timeis earlier than (max event_time seen so far - 10 minutes), it is discarded.
Reference:Structured Streaming - Handling Late Data
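A minimal sketch showing where this fragment sits in a complete streaming query (the rate source, the renaming to event_time, and the console sink are illustrative assumptions):
from pyspark.sql import SparkSession
from pyspark.sql.functions import window

spark = SparkSession.builder.appName("watermark-demo").getOrCreate()

# Assumed source: the built-in "rate" stream, renamed to provide an event_time column
events = (
    spark.readStream.format("rate").load()
    .withColumnRenamed("timestamp", "event_time")
)

# Records whose event_time is older than max(event_time) - 10 minutes are dropped
counts = (
    events
    .withWatermark("event_time", "10 minutes")
    .groupBy(window("event_time", "15 minutes"))
    .count()
)

query = counts.writeStream.outputMode("update").format("console").start()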