Databricks.Associate-Developer-Apache-Spark-3.5.v2025-08-30.q31 Dumps

Question 1

A data engineer is working on a streaming DataFrame streaming_df with the given streaming data:

Which operation is supported with streaming_df?
A. streaming_df.select(countDistinct("Name"))
B. streaming_df.groupby("Id").count()
C. streaming_df.orderBy("timestamp").limit(4)
D. streaming_df.filter(col("count") < 30).show()

Correct Answer: B
Comprehensive and Detailed Explanation:
In Structured Streaming, only a limited subset of operations is supported because the data is unbounded. Operations such as sorting (orderBy) and global aggregation (countDistinct) require a full view of the dataset, which is not possible with streaming data unless watermarks or windows are defined.
Review of each option:
A. select(countDistinct("Name"))
Not allowed. A global aggregation like countDistinct() requires the full dataset and is not supported directly in streaming without watermark and windowing logic.
Reference: Databricks Structured Streaming Guide - Unsupported Operations.
B. groupby("Id").count()
Supported. Streaming aggregations over a key (such as groupBy("Id")) are supported; Spark maintains intermediate state for each key.
Reference: Databricks Docs - Aggregations in Structured Streaming (https://docs.databricks.com/structured-streaming/aggregation.html)
C. orderBy("timestamp").limit(4)
Not allowed. Sorting and limiting require a full view of the stream, which is unbounded, so this is unsupported on streaming DataFrames.
Reference: Spark Structured Streaming - Unsupported Operations (ordering without watermark/window is not allowed).
D. filter(col("count") < 30).show()
Not allowed. show() is a blocking operation used for debugging batch DataFrames; it is not allowed on streaming DataFrames.
Reference: Structured Streaming Programming Guide - output operations like show() are not supported.
Extract from the official guide:
"Operations like orderBy, limit, show, and countDistinct are not supported in Structured Streaming because they require the full dataset to compute a result. Use groupBy(...).agg(...) instead for incremental aggregations." (Databricks Structured Streaming Programming Guide)

Question 2

A Spark engineer must select an appropriate deployment mode for the Spark jobs.
What is the benefit of using cluster mode in Apache Spark™?

Correct Answer: D
Comprehensive and Detailed Explanation From Exact Extract:
In Apache Spark's cluster mode:
"The driver program runs on the cluster's worker node instead of the client's local machine. This allows the driver to be close to the data and other executors, reducing network overhead and improving fault tolerance for production jobs." (Source: Apache Spark documentation -Cluster Mode Overview) This deployment is ideal for production environments where the job is submitted from a gateway node, and Spark manages the driver lifecycle on the cluster itself.
Option A is partially true but less specific than D.
Option B is incorrect: the driver never executes all tasks; executors handle distributed tasks.
Option C describes client mode, not cluster mode.
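For illustration, a typical cluster-mode submission looks like the following; the application file, name, and executor count are placeholders, while the flags themselves are standard spark-submit options:

    spark-submit \
      --master yarn \
      --deploy-mode cluster \
      --name nightly-etl \
      --num-executors 4 \
      my_job.py

In client mode (--deploy-mode client, the default), the driver would instead run inside the spark-submit process on the machine issuing the command.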

Question 3

A developer notices that all the post-shuffle partitions in a dataset are smaller than the value set for spark.sql.adaptive.maxShuffledHashJoinLocalMapThreshold.
Which type of join will Adaptive Query Execution (AQE) choose in this case?

Correct Answer: B
Comprehensive and Detailed Explanation From Exact Extract:
Adaptive Query Execution (AQE) dynamically selects join strategies based on actual data sizes at runtime. If the size of post-shuffle partitions is below the threshold set by:
spark.sql.adaptive.maxShuffledHashJoinLocalMapThreshold
then Spark prefers to use a shuffled hash join.
From the Spark documentation:
"AQE selects a shuffled hash join when the size of post-shuffle data is small enough to fit within the configured threshold, avoiding more expensive sort-merge joins." Therefore:
A is wrong - Cartesian joins are only used with no join condition.
B is correct - this is the optimized join for small partitioned shuffle data under AQE.
C and D are used under other scenarios but not for this case.
Final Answer: B
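As a sketch of how this threshold is exercised in practice (the threshold value, table sizes, and key name below are illustrative assumptions, not values from the exam):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("aqe-shj-demo").getOrCreate()

    # Enable AQE and set the shuffled-hash-join threshold (illustrative value).
    spark.conf.set("spark.sql.adaptive.enabled", "true")
    spark.conf.set("spark.sql.adaptive.maxShuffledHashJoinLocalMapThreshold", "64MB")

    # Disable broadcast joins so the runtime choice is between
    # sort-merge join and shuffled hash join.
    spark.conf.set("spark.sql.autoBroadcastJoinThreshold", "-1")
    spark.conf.set("spark.sql.adaptive.autoBroadcastJoinThreshold", "-1")

    left = spark.range(1_000_000).withColumnRenamed("id", "k")
    right = spark.range(100_000).withColumnRenamed("id", "k")

    # If every post-shuffle partition stays under the threshold, AQE
    # replaces the planned sort-merge join with a shuffled hash join.
    left.join(right, "k").explain()  # look for ShuffledHashJoin in the plan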

Question 4

A data scientist is working on a project that requires processing large amounts of structured data, performing SQL queries, and applying machine learning algorithms. The data scientist is considering using Apache Spark for this task.
Which combination of Apache Spark modules should the data scientist use in this scenario?

Correct Answer: D
Comprehensive Explanation:
To cover structured data processing, SQL querying, and machine learning in Apache Spark, the correct combination of components is:
Spark DataFrames: for structured data processing
Spark SQL: to execute SQL queries over structured data
MLlib: Spark's scalable machine learning library
This trio is designed for exactly this type of use case.
Why other options are incorrect:
A: GraphX is for graph processing - not needed here.
B: Pandas API on Spark is useful, but MLlib is essential for ML, which this option omits.
C: Spark Streaming is legacy; GraphX is irrelevant here.
Reference: Apache Spark Modules Overview
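A small end-to-end sketch of the three modules working together follows; the toy data, column names, and model choice are assumptions for illustration:

    from pyspark.sql import SparkSession
    from pyspark.ml.feature import VectorAssembler
    from pyspark.ml.regression import LinearRegression

    spark = SparkSession.builder.appName("modules-demo").getOrCreate()

    # Spark DataFrames: structured data (toy records)
    df = spark.createDataFrame(
        [(1, 2.0, 10.0), (2, 3.0, 14.0), (3, 4.0, 18.0)],
        ["id", "x", "y"],
    )

    # Spark SQL: query the same data declaratively
    df.createOrReplaceTempView("points")
    filtered = spark.sql("SELECT x, y FROM points WHERE x > 1")

    # MLlib: fit a simple model on the query result
    train = VectorAssembler(inputCols=["x"], outputCol="features").transform(filtered)
    model = LinearRegression(featuresCol="features", labelCol="y").fit(train)
    print(model.coefficients)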

Question 5

A data scientist is analyzing a large dataset and has written a PySpark script that includes several transformations and actions on a DataFrame. The script ends with a collect() action to retrieve the results.
How does Apache Spark™'s execution hierarchy process the operations when the data scientist runs this script?

Correct Answer: C
Comprehensive and Detailed Explanation From Exact Extract:
In Apache Spark, the execution hierarchy is structured as follows:
Application: The highest-level unit, representing the user program built on Spark.
Job: Triggered by an action (e.g., collect(), count()). Each action corresponds to a job.
Stage: A job is divided into stages based on shuffle boundaries. Each stage contains tasks that can be executed in parallel.
Task: The smallest unit of work, representing a single operation applied to a partition of the data.
When the collect() action is invoked, Spark initiates a job. This job is then divided into stages at points where data shuffling is required (i.e., wide transformations). Each stage comprises tasks that are distributed across the cluster's executors, operating on individual data partitions.
This hierarchical execution model allows Spark to efficiently process large-scale data by parallelizing tasks and optimizing resource utilization.
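A compact sketch of this hierarchy in action (the data size, column expressions, and group count are arbitrary assumptions):

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col

    spark = SparkSession.builder.appName("hierarchy-demo").getOrCreate()

    df = spark.range(1_000_000)                         # lazy: nothing runs yet
    doubled = df.withColumn("v", col("id") * 2)         # narrow transformation
    grouped = doubled.groupBy(col("id") % 10).count()   # wide: shuffle boundary

    # collect() is the action: it triggers one job, which Spark splits
    # into two stages at the shuffle; each stage runs one task per partition.
    result = grouped.collect()
    print(len(result))  # 10 groups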