Free Access to Databricks.Associate-Developer-Apache-Spark-3.5.v2025-08-30.q31 with Valid Practice Test (Page 7)

Question 26

Given this code:

.withWatermark("event_time","10 minutes")
.groupBy(window("event_time","15 minutes"))
.count()
What happens to data that arrives after the watermark threshold?
Options:

A. Records that arrive later than the watermark threshold (10 minutes) will automatically be included in the aggregation if they fall within the 15-minute window.
B. Any data arriving more than 10 minutes after the watermark threshold will be ignored and not included in the aggregation.
C. Data arriving more than 10 minutes after the latest watermark will still be included in the aggregation but will be placed into the next window.
D. The watermark ensures that late data arriving within 10 minutes of the latest event_time will be processed and included in the windowed aggregation.
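
The watermark behavior asked about in Question 26 can be illustrated without a Spark cluster. The following is a minimal pure-Python sketch, not Spark's implementation: it assumes each arriving record is its own micro-batch, that the watermark is the maximum event time seen so far minus the 10-minute delay, and that a record is dropped once the end of its 15-minute window is at or before the watermark. The names `window_start` and `simulate` and the sample timestamps are hypothetical.

```python
from datetime import datetime, timedelta

WATERMARK_DELAY = timedelta(minutes=10)  # from withWatermark("event_time", "10 minutes")
WINDOW = timedelta(minutes=15)           # from window("event_time", "15 minutes")
EPOCH = datetime(1970, 1, 1)


def window_start(ts):
    """Start of the 15-minute tumbling window that contains ts."""
    return ts - ((ts - EPOCH) % WINDOW)


def simulate(arrivals):
    """Process event times in arrival order (simplified model, one record per batch)."""
    counts = {}          # window start -> number of records counted
    dropped = []         # records discarded as too late
    max_event_time = None
    watermark = None
    for ts in arrivals:
        start = window_start(ts)
        end = start + WINDOW
        if watermark is not None and end <= watermark:
            # The window was already finalized, so the record is ignored.
            dropped.append(ts)
        else:
            counts[start] = counts.get(start, 0) + 1
        # The watermark trails the latest event time seen by the delay.
        if max_event_time is None or ts > max_event_time:
            max_event_time = ts
        watermark = max_event_time - WATERMARK_DELAY
    return counts, dropped


def at(hour, minute):
    """Hypothetical helper: a timestamp on an arbitrary sample day."""
    return datetime(2025, 1, 1, hour, minute)


# Arrival order: on time, on time, late but tolerated, on time, too late.
arrivals = [at(12, 5), at(12, 20), at(12, 12), at(12, 40), at(12, 3)]
counts, dropped = simulate(arrivals)
```

In this run the 12:12 record arrives out of order but is still counted in the 12:00 window, because the watermark at that point is 12:10. The 12:03 record arrives after the watermark has advanced to 12:30 and is discarded rather than moved to a later window.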

Question 27

A developer initializes a SparkSession:

spark = SparkSession.builder \
.appName("Analytics Application") \
.getOrCreate()
Which statement describes the spark SparkSession?

A. The getOrCreate() method explicitly destroys any existing SparkSession and creates a new one.
B. A SparkSession is unique for each appName, and calling getOrCreate() with the same name will return an existing SparkSession once it has been created.
C. If a SparkSession already exists, this code will return the existing session instead of creating a new one.
D. A new SparkSession is created every time the getOrCreate() method is invoked.

Question 28

Given the code fragment:

import pyspark.pandas as ps
psdf = ps.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
Which method is used to convert a Pandas API on Spark DataFrame (pyspark.pandas.DataFrame) into a standard PySpark DataFrame (pyspark.sql.DataFrame)?

A. psdf.to_spark()
B. psdf.to_pyspark()
C. psdf.to_pandas()
D. psdf.to_dataframe()

Question 29

A data scientist at an e-commerce company is working with user data obtained from its subscriber database and has stored the data in a DataFrame df_user. Before processing the data further, the data scientist wants to create another DataFrame, df_user_non_pii, that contains only the non-PII columns. The PII columns in df_user are first_name, last_name, email, and birthdate.
Which code snippet can be used to meet this requirement?

A. df_user_non_pii = df_user.drop("first_name", "last_name", "email", "birthdate")
B. df_user_non_pii = df_user.drop("first_name", "last_name", "email", "birthdate")
C. df_user_non_pii = df_user.dropfields("first_name", "last_name", "email", "birthdate")
D. df_user_non_pii = df_user.dropfields("first_name, last_name, email, birthdate")

Question 30

Which command overwrites an existing JSON file when writing a DataFrame?

A. df.write.mode("overwrite").json("path/to/file")
B. df.write.overwrite.json("path/to/file")
C. df.write.json("path/to/file", overwrite=True)
D. df.write.format("json").save("path/to/file", mode="overwrite")
