Databricks.Associate-Developer-Apache-Spark-3.5.v2025-11-20.q72 Dumps

Question 66

In the code block below, aggDF contains aggregations on a streaming DataFrame:

Which output mode at line 3 ensures that the entire result table is written to the console during each trigger execution?

Correct Answer: A
The correct output mode for streaming aggregations that need to output the full updated results at each trigger is "complete".
From the official documentation:
"complete: The entire updated result table will be output to the sink every time there is a trigger." This is ideal for aggregations, such as counts or averages grouped by a key, where the result table changes incrementally over time.
append: outputs only rows newly appended to the result table, so updated aggregates are not re-emitted
replace and aggregate: invalid values for output mode
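To make the pattern concrete, here is a minimal, self-contained sketch: the built-in rate source stands in for the question's unnamed streaming input, and the grouping column is invented for illustration.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("complete-mode-demo").getOrCreate()

# The built-in rate source emits (timestamp, value) rows and is used here
# only as a stand-in for the question's streaming input.
stream_df = spark.readStream.format("rate").option("rowsPerSecond", 5).load()

# An aggregation on a streaming DataFrame, analogous to aggDF.
agg_df = stream_df.groupBy((F.col("value") % 3).alias("bucket")).count()

# outputMode("complete") rewrites the entire result table to the console
# on every trigger, which is what the question asks for.
query = agg_df.writeStream.outputMode("complete").format("console").start()
query.awaitTermination()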

Question 67

A data scientist has identified that some records in the user profile table contain null values in one or more of the fields, and such records should be removed from the dataset before processing. The schema includes fields such as user_id, username, date_of_birth, created_ts, etc.
The schema of the user profile table looks like this:

Which block of Spark code can be used to achieve this requirement?
Options:

Correct Answer: C
na.drop(how='any') drops any row that has at least one null value.
This is exactly what's needed when the goal is to retain only fully complete records.
Usage:
filtered_df = users_raw_df.na.drop(how='any')
Explanation of incorrect options:
A: thresh=0 is invalid; thresh must be ≥ 1.
B: how='all' drops only rows where all columns are null (too lenient).
D: spark.na.drop doesn't support mixing how and thresh in that way; it's incorrect syntax.
Reference: PySpark DataFrameNaFunctions.drop()
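As a self-contained illustration, the snippet below builds a tiny stand-in for the user profile table (the sample rows and the reduced three-column schema are assumptions) and shows which rows na.drop(how='any') removes.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("na-drop-demo").getOrCreate()

# Hypothetical stand-in for users_raw_df with a reduced schema.
users_raw_df = spark.createDataFrame(
    [
        (1, "alice", "1990-01-01"),
        (2, None, "1985-06-15"),   # null username -> dropped
        (3, "carol", None),        # null date_of_birth -> dropped
    ],
    ["user_id", "username", "date_of_birth"],
)

# how='any' drops a row if at least one column is null,
# keeping only fully complete records.
filtered_df = users_raw_df.na.drop(how="any")
filtered_df.show()  # only user_id 1 remains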

Question 68

A developer wants to refactor older Spark code to take advantage of built-in functions introduced in Spark 3.5.
The original code:
from pyspark.sql import functions as F
min_price = 110.50
result_df = prices_df.filter(F.col("price") > min_price).agg(F.count("*")) Which code block should the developer use to refactor the code?

Correct Answer: A
To compare a column value with a Python literal constant in a DataFrame expression, use F.lit() to convert it into a Spark literal.
Correct refactor:
from pyspark.sql import functions as F
min_price = 110.50
result_df = prices_df.filter(F.col("price") > F.lit(min_price)).agg(F.count("*")) This avoids type mismatches and ensures Spark executes the filter expression on the cluster.
Why the other options are incorrect:
B: the where() syntax is valid, but F.lit("price") wraps the string literal "price" rather than referencing the price column.
C: withColumn adds a new column, which is not needed for this aggregation.
D: the comparison logic is reversed.
Reference:
PySpark SQL Functions - lit(), col(), and DataFrame filters.
Databricks Exam Guide (June 2025): Section "Developing Apache Spark DataFrame/DataSet API Applications" - filtering, literals, and aggregations.
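Assembled as a runnable sketch (prices_df is constructed inline here as an assumption, since the exam snippet presumes it already exists):

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("lit-filter-demo").getOrCreate()

# Hypothetical stand-in for the exam's prices_df.
prices_df = spark.createDataFrame([(100.0,), (120.0,), (150.0,)], ["price"])

min_price = 110.50

# F.lit() lifts the Python float into a Spark literal Column, so the whole
# predicate is a Spark expression evaluated on the executors.
result_df = prices_df.filter(F.col("price") > F.lit(min_price)).agg(F.count("*"))
result_df.show()  # count = 2 (120.0 and 150.0)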

Question 69

A Spark application is experiencing performance issues in client mode because the driver is resource-constrained.
How should this issue be resolved?

Correct Answer: C
In Spark's client mode, the driver runs on the local machine that submitted the job. If that machine is resource-constrained (e.g., low memory), performance degrades.
From the Spark documentation:
"In cluster mode, the driver runs inside the cluster, benefiting from cluster resources and scalability."
Option A is incorrect - executors do not help the driver directly.
Option B might help short-term but does not scale.
Option C is correct - switching to cluster mode moves the driver to the cluster.
Option D (local mode) is for development/testing, not production.
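The switch to cluster mode is made at submission time rather than in the application code. A sketch of the spark-submit invocation follows; the YARN master, resource sizes, and application file name are placeholders, not values from the question.

spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --driver-memory 4g \
  --driver-cores 2 \
  my_app.py

With --deploy-mode cluster, the driver is launched on a node inside the cluster, so it is no longer limited by the resources of the submitting machine.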

Question 70

What is the benefit of using Pandas API on Spark for data transformations?

Correct Answer: A
Pandas API on Spark provides a distributed implementation of the Pandas DataFrame API on top of Apache Spark.
Advantages:
Executes transformations in parallel across all nodes and cores in the cluster.
Maintains Pandas-like syntax, making it easy for Python users to transition.
Enables scaling of existing Pandas code to datasets that exceed a single machine's memory.
Therefore, it combines Pandas usability with Spark's distributed power, offering both speed and scalability.
Why the other options are incorrect:
B: While it uses Python, that's not its main advantage.
C: It runs distributed across the cluster, not on a single node.
D: Pandas API on Spark uses lazy evaluation, not eager computation.
Reference:
PySpark Pandas API Overview - advantages of distributed execution.
Databricks Exam Guide (June 2025): Section "Using Pandas API on Apache Spark" - explains the benefits of Pandas API integration for scalable transformations.
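A minimal sketch of the syntax (the data is invented for illustration):

import pyspark.pandas as ps

# Looks and behaves like a Pandas DataFrame, but the plan behind it is
# executed by Spark, distributed across the cluster.
psdf = ps.DataFrame({"category": ["a", "b", "a", "b"], "amount": [10, 20, 30, 40]})

# Familiar Pandas-style groupby; evaluated lazily and in parallel by Spark.
totals = psdf.groupby("category")["amount"].sum()
print(totals)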