Free Access to Databricks.Associate-Developer-Apache-Spark-3.5.v2025-11-20.q72 with Valid Practice Test (Page 7)

Question 26

A Data Analyst needs to retrieve employees with 5 or more years of tenure.
Which code snippet filters and shows the list?

A. employees_df.filter(employees_df.tenure >= 5).show()
B. employees_df.where(employees_df.tenure >= 5)
C. filter(employees_df.tenure >= 5)
D. employees_df.filter(employees_df.tenure >= 5).collect()

Question 27

A data engineer needs to persist a file-based data source to a specific location. However, by default, Spark writes to the warehouse directory (e.g., /user/hive/warehouse). To override this, the engineer must explicitly define the file path.
Which line of code ensures the data is saved to a specific location?
Options:

A. users.write(path="/some/path").saveAsTable("default_table")
B. users.write.saveAsTable("default_table").option("path", "/some/path")
C. users.write.option("path", "/some/path").saveAsTable("default_table")
D. users.write.saveAsTable("default_table", path="/some/path")

Question 28

A data engineer observes that an upstream streaming source sends duplicate records, where duplicates share the same key and have at most a 30-minute difference in event_timestamp. The engineer adds:
dropDuplicatesWithinWatermark("event_timestamp", "30 minutes")
What is the result?

A. It is not able to handle deduplication in this scenario
B. It removes duplicates that arrive within the 30-minute window specified by the watermark
C. It removes all duplicates regardless of when they arrive
D. It accepts watermarks in seconds and the code results in an error

Question 29

An MLOps engineer is building a Pandas UDF that applies a language model that translates English strings into Spanish. The initial code is loading the model on every call to the UDF, which is hurting the performance of the data pipeline.
The initial code is:

def in_spanish_inner(df: pd.Series) -> pd.Series:
    model = get_translation_model(target_lang='es')
    return df.apply(model)

in_spanish = sf.pandas_udf(in_spanish_inner, StringType())
How can the MLOps engineer change this code to reduce how many times the language model is loaded?

A. Convert the Pandas UDF to a PySpark UDF
B. Convert the Pandas UDF from a Series → Series UDF to a Series → Scalar UDF
C. Run the in_spanish_inner() function in a mapInPandas() function call
D. Convert the Pandas UDF from a Series → Series UDF to an Iterator[Series] → Iterator[Series] UDF
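
To illustrate the iterator-style signature mentioned in the options, here is a minimal sketch in plain pandas, runnable without a Spark cluster. It assumes a stub get_translation_model() that only counts how often it is called (a hypothetical stand-in for the expensive loader in the question) and simulates the batches a single task would receive. The point is the shape of the function: setup happens once before the loop, then each batch reuses the loaded model.

```python
from typing import Callable, Iterator

import pandas as pd

load_count = 0  # tracks how often the (stub) model is loaded


def get_translation_model(target_lang: str) -> Callable[[str], str]:
    """Hypothetical stand-in for the expensive model loader in the question."""
    global load_count
    load_count += 1
    lookup = {"hello": "hola", "cat": "gato"}  # toy dictionary, not a real model
    return lambda text: lookup.get(text, text)


def in_spanish_inner(batches: Iterator[pd.Series]) -> Iterator[pd.Series]:
    # Load once, before iterating, so every batch in this iterator reuses it.
    model = get_translation_model(target_lang="es")
    for batch in batches:
        yield batch.apply(model)


# Simulate two batches arriving at one task.
batches = [pd.Series(["hello", "cat"]), pd.Series(["cat", "dog"])]
results = [series.tolist() for series in in_spanish_inner(iter(batches))]
```

In PySpark, a function with these type hints can be passed to pandas_udf together with a return type such as StringType(); the Iterator[pd.Series] -> Iterator[pd.Series] hints select the iterator variant of the UDF.
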

Question 30

A Spark application developer wants to identify which operations cause shuffling, leading to a new stage in the Spark execution plan.
Which operation results in a shuffle and a new stage?

A. DataFrame.groupBy().agg()
B. DataFrame.filter()
C. DataFrame.withColumn()
D. DataFrame.select()
