Databricks Associate-Developer-Apache-Spark-3.5 Exam Dumps (v2025-11-20, q72)
  • «
  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
  • 7
  • 8
  • 9
  • 10
  • …
  • »
  • »»
Download Now

Question 6

15 of 55.
A data engineer is working on a Streaming DataFrame (streaming_df) with the following streaming data:
id  name        count  timestamp
1   Delhi       20     2024-09-19T10:11
1   Delhi       50     2024-09-19T10:12
2   London      50     2024-09-19T10:15
3   Paris       30     2024-09-19T10:18
3   Paris       20     2024-09-19T10:20
4   Washington  10     2024-09-19T10:22
Which operation is supported with streaming_df?

Correct Answer: B
In Structured Streaming, only transformation operations are allowed on streaming DataFrames. These include select(), filter(), where(), groupBy(), withColumn(), etc.
Example of supported transformation:
filtered_df = streaming_df.filter("count < 30")
However, actions such as count(), show(), and collect() are not supported directly on streaming DataFrames because streaming queries are unbounded and never finish until stopped.
To perform aggregations, the query must be executed through writeStream and an output sink.
Why the other options are incorrect:
A: count() is an action, not allowed directly on streaming DataFrames.
C: countDistinct() is a stateful aggregation, not supported outside of a proper streaming query.
D: show() is also an action, unsupported on streaming queries.
Reference:
PySpark Structured Streaming Programming Guide - supported transformations and actions.
Databricks Exam Guide (June 2025): Section "Structured Streaming" - performing operations on streaming DataFrames and understanding supported transformations.
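As an illustration, here is a minimal sketch of the same idea. It uses Spark's built-in "rate" source as a hypothetical stand-in for streaming_df's real source (so the column names differ from the table above), but the pattern is identical: transformations are applied lazily, and results are produced only by starting the query through writeStream and a sink.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("StreamingTransformations").getOrCreate()

# Hypothetical stand-in source: the built-in "rate" source emits (timestamp, value) rows.
streaming_df = spark.readStream.format("rate").option("rowsPerSecond", 5).load()

# Transformations such as filter()/where()/withColumn() are supported on streaming DataFrames.
filtered_df = streaming_df.filter(col("value") < 30)

# Actions like count(), show(), or collect() would raise an AnalysisException here.
# The query must be executed through writeStream and an output sink instead.
query = (
    filtered_df.writeStream
    .outputMode("append")
    .format("console")
    .start()
)

query.awaitTermination(10)  # let the sketch run briefly
query.stop()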

Question 7

25 of 55.
A Data Analyst is working on employees_df and needs to add a new column where a 10% tax is calculated on the salary.
Additionally, the DataFrame contains the column age, which is not needed.
Which code fragment adds the tax column and removes the age column?

Correct Answer: A
To create a new calculated column in Spark, use the .withColumn() method.
To remove an unwanted column, use the .drop() method.
Correct syntax:
from pyspark.sql.functions import col
employees_df = employees_df.withColumn("tax", col("salary") * 0.1).drop("age")
.withColumn("tax", col("salary") * 0.1) → adds a new column where tax = 10% of salary.
.drop("age") → removes the age column from the DataFrame.
Why the other options are incorrect:
B: lit(0.1) creates a constant value, not a calculated tax.
C: .dropField() is not a DataFrame API method (used only in struct field manipulations).
D: Adds 0.1 to salary instead of calculating 10%.
Reference:
PySpark DataFrame API - withColumn(), drop(), and col().
Databricks Exam Guide (June 2025): Section "Developing Apache Spark DataFrame/DataSet API Applications" - manipulating, renaming, and dropping columns.
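A minimal, self-contained sketch of the same pattern; the sample rows below are made up and stand in for the real employees_df.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("TaxColumnExample").getOrCreate()

# Hypothetical sample data mirroring the relevant columns of employees_df.
employees_df = spark.createDataFrame(
    [(1, "Alice", 30, 50000.0), (2, "Bob", 45, 72000.0)],
    ["id", "name", "age", "salary"],
)

# Add the tax column (10% of salary) and drop the unneeded age column.
employees_df = employees_df.withColumn("tax", col("salary") * 0.1).drop("age")

employees_df.show()
# +---+-----+-------+------+
# | id| name| salary|   tax|
# +---+-----+-------+------+
# |  1|Alice|50000.0|5000.0|
# |  2|  Bob|72000.0|7200.0|
# +---+-----+-------+------+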

Question 8

Which command overwrites an existing JSON file when writing a DataFrame?

Correct Answer: A
The correct way to overwrite an existing file using the DataFrameWriter is:
df.write.mode("overwrite").json("path/to/file")
Option D is also technically valid, but Option A is the most concise and idiomatic PySpark syntax.
Reference: PySpark DataFrameWriter API
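A short sketch of the overwrite behavior; the path and sample rows are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("OverwriteJson").getOrCreate()

df = spark.createDataFrame([(1, "Delhi"), (2, "London")], ["id", "name"])

# mode("overwrite") replaces any existing data at the target path;
# the default mode ("errorifexists") would fail if the path already exists.
df.write.mode("overwrite").json("path/to/file")

# Equivalent long form via the generic writer API:
df.write.format("json").mode("overwrite").save("path/to/file")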

Question 9

A developer initializes a SparkSession:

from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .appName("Analytics Application") \
    .getOrCreate()
Which statement describes the spark SparkSession?

Correct Answer: C
Comprehensive and Detailed Explanation From Exact Extract:
According to the PySpark API documentation:
"getOrCreate(): Gets an existing SparkSession or, if there is no existing one, creates a new one based on the options set in this builder." This means Spark maintains a global singleton session within a JVM process. Repeated calls togetOrCreate() return the same session, unless explicitly stopped.
Option A is incorrect: the method does not destroy any session.
Option B incorrectly ties uniqueness toappName, which does not influence session reusability.
Option D is incorrect: it contradicts the fundamental behavior ofgetOrCreate().
(Source:PySpark SparkSession API Docs)
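A small sketch (not from the exam) of the singleton behavior of getOrCreate():
from pyspark.sql import SparkSession

spark1 = SparkSession.builder.appName("Analytics Application").getOrCreate()

# A different appName does not create a new session; the existing one is returned.
spark2 = SparkSession.builder.appName("Some Other Name").getOrCreate()
print(spark1 is spark2)  # True: both refer to the same global session

# Only after the session is stopped does getOrCreate() build a fresh one.
spark1.stop()
spark3 = SparkSession.builder.getOrCreate()
print(spark1 is spark3)  # False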

Question 10

What is the risk associated with this operation when converting a large Pandas API on Spark DataFrame back to a Pandas DataFrame?

Correct Answer: D
Comprehensive and Detailed Explanation From Exact Extract:
When you convert a large pyspark.pandas (aka Pandas API on Spark) DataFrame to a local pandas DataFrame using .toPandas(), Spark collects all partitions to the driver.
From the Spark documentation:
"Be careful when converting large datasets to Pandas. The entire dataset will be pulled into the driver's memory." Thus, for large datasets, this can cause memory overflow or out-of-memory errors on the driver.
Final Answer: D
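A brief sketch of the risk, using a hypothetical pandas-on-Spark DataFrame. Note that pyspark.pandas exposes to_pandas(); toPandas() is the equivalent on a Spark SQL DataFrame.
import pyspark.pandas as ps

# Hypothetical large pandas-on-Spark DataFrame.
psdf = ps.range(100_000_000)

# Converting the whole DataFrame collects every partition to the driver;
# for large data this can exhaust driver memory (out-of-memory on the driver):
# pdf = psdf.to_pandas()

# A safer pattern for local inspection is to reduce the data first.
sample_pdf = psdf.head(1000).to_pandas()
print(len(sample_pdf))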