Free Access to Databricks.Associate-Developer-Apache-Spark-3.5.v2025-11-20.q72 with Valid Practice Test (Page 5)

Question 16

Given the code fragment:

import pyspark.pandas as ps
psdf = ps.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
Which method is used to convert a Pandas API on Spark DataFrame (pyspark.pandas.DataFrame) into a standard PySpark DataFrame (pyspark.sql.DataFrame)?

A. psdf.to_spark()
B. psdf.to_pyspark()
C. psdf.to_pandas()
D. psdf.to_dataframe()

Question 17

A data scientist is analyzing a large dataset and has written a PySpark script that includes several transformations and actions on a DataFrame. The script ends with a collect() action to retrieve the results.
How does Apache Spark™'s execution hierarchy process the operations when the data scientist runs this script?

A. The script is first divided into multiple applications, then each application is split into jobs, stages, and finally tasks.
B. The entire script is treated as a single job, which is then divided into multiple stages, and each stage is further divided into tasks based on data partitions.
C. The collect() action triggers a job, which is divided into stages at shuffle boundaries, and each stage is split into tasks that operate on individual data partitions.
D. Spark creates a single task for each transformation and action in the script, and these tasks are grouped into stages and jobs based on their dependencies.
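
As a study aid for the job, stage, and task vocabulary used in these options, here is a toy model in plain Python. It is not Spark's scheduler: the set of "wide" operations, the `Stage` class, and the `plan_job` helper are all hypothetical names chosen for illustration, and the sketch assumes a linear chain of operations with the same number of partitions in every stage (real Spark may change the partition count after a shuffle).

```python
from dataclasses import dataclass

# Assumption: these operation names stand in for transformations that
# require a shuffle. The list is illustrative, not exhaustive.
WIDE_OPS = {"groupBy", "join", "repartition", "orderBy"}


@dataclass
class Stage:
    ops: list        # operations pipelined together inside this stage
    num_tasks: int   # one task per partition (simplifying assumption)


def plan_job(ops, num_partitions):
    """Split a linear chain of operations into stages at shuffle boundaries."""
    groups = [[]]
    for op in ops:
        if op in WIDE_OPS:
            # A wide operation needs shuffled input, so it opens a new stage.
            groups.append([op])
        else:
            # Narrow operations are pipelined into the current stage.
            groups[-1].append(op)
    # Drop an empty leading group (happens when the chain starts with a wide op).
    return [Stage(ops=g, num_tasks=num_partitions) for g in groups if g]


# One action (for example collect()) would trigger one job over this chain.
job = plan_job(
    ["filter", "select", "groupBy", "withColumn", "orderBy"],
    num_partitions=4,
)
for index, stage in enumerate(job):
    print(index, stage.ops, stage.num_tasks)
```

In this model the chain with two wide operations yields three stages, each with four tasks. In a real application, the Spark UI shows the actual jobs, stages, and task counts.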

Question 18

A data engineer is streaming data from Kafka and requires:
Minimal latency
Exactly-once processing guarantees
Which trigger mode should be used?

A. .trigger(processingTime='1 second')
B. .trigger(continuous=True)
C. .trigger(continuous='1 second')
D. .trigger(availableNow=True)

Question 19

Which feature of Spark Connect should be considered when designing an application that plans to enable remote interaction with a Spark cluster?

A. It is primarily used for data ingestion into Spark from external sources.
B. It provides a way to run Spark applications remotely in any programming language.
C. It can be used to interact with any remote cluster using the REST API.
D. It allows for remote execution of Spark jobs.

Question 20

A developer initializes a SparkSession:

spark = SparkSession.builder \
    .appName("Analytics Application") \
    .getOrCreate()
Which statement describes the spark SparkSession?

A. The getOrCreate() method explicitly destroys any existing SparkSession and creates a new one.
B. A SparkSession is unique for each appName, and calling getOrCreate() with the same name will return an existing SparkSession once it has been created.
C. If a SparkSession already exists, this code will return the existing session instead of creating a new one.
D. A new SparkSession is created every time the getOrCreate() method is invoked.
