Free Access to Databricks.Databricks-Certified-Professional-Data-Engineer.v2025-10-27.q109 with Valid Practice Test (Page 5)

Question 16

A production cluster has 3 executor nodes and uses the same virtual machine type for the driver and the executors.
When evaluating the Ganglia Metrics for this cluster, which indicator would signal a bottleneck caused by code executing on the driver?

A.The Five Minute Load Average remains consistent/flat
B.Bytes Received never exceeds 80 million bytes per second
C.Total Disk Space remains constant
D.Network I/O never spikes
E.Overall cluster CPU utilization is around 25%
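
As a rough arithmetic aid for reading cluster-wide CPU figures on this cluster, the sketch below averages per-node utilization over 1 driver and 3 executors of the same VM type. The scenario in which the driver is fully busy while the executors sit idle is an assumption made purely for illustration, and the variable names are made up for this example.

```python
# Cluster from the question: 1 driver plus 3 executors, all on the same VM type.
driver_nodes = 1
executor_nodes = 3
total_nodes = driver_nodes + executor_nodes

# Assumption for illustration: the driver is fully busy and the executors are idle.
per_node_utilization = [1.0] * driver_nodes + [0.0] * executor_nodes

# With identical VMs, the cluster-wide figure is the plain average across nodes.
cluster_utilization = sum(per_node_utilization) / total_nodes
print(f"{cluster_utilization:.0%}")
```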

Question 17

Which of the following is a correct statement on how the data is organized in storage when managing a Delta table?

A.All of the data is broken down into one or many parquet files, log files are broken down into one or many JSON files, and each transaction creates a new data file(s) and log file.
(Correct)
B.All of the data and log are stored in a single parquet file
C.All of the data is broken down into one or many parquet files, but the log file is stored as a single JSON file, and every transaction creates a new data file(s) and the log file gets appended.
D.All of the data is broken down into one or many parquet files, and the log file is removed once the transaction is committed.
E.All of the data is stored in one parquet file, and log files are broken down into one or many JSON files.

Question 18

The data governance team is reviewing code used for deleting records for compliance with GDPR. The following logic has been implemented to propagate delete requests from the user_lookup table to the user_aggregates table.

Assuming that user_id is a unique identifying key and that all users who have requested deletion have been removed from the user_lookup table, which statement describes whether successfully executing the above logic guarantees that the records to be deleted from the user_aggregates table are no longer accessible, and why?

A.No: files containing deleted records may still be accessible with time travel until a VACUUM command is used to remove invalidated data files.
B.Yes: Delta Lake ACID guarantees provide assurance that the DELETE command succeeded fully and permanently purged these records.
C.No: the change data feed only tracks inserts and updates, not deleted records.
D.No: the Delta Lake DELETE command only provides ACID guarantees when combined with the MERGE INTO command.

Question 19

Each configuration below is identical to the extent that each cluster has 400 GB total of RAM, 160 total cores and only one Executor per VM.
Given a job with at least one wide transformation, which of the following cluster configurations will result in maximum performance?

A.* Total VMs: 1
* 400 GB per Executor
* 160 Cores / Executor
B.* Total VMs: 8
* 50 GB per Executor
* 20 Cores / Executor
C.* Total VMs: 4
* 100 GB per Executor
* 40 Cores / Executor
D.* Total VMs: 2
* 200 GB per Executor
* 80 Cores / Executor
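
The premise that every option offers the same total resources can be checked with simple arithmetic. The sketch below is illustrative only: the `configs` and `totals` names are made up for this example, and it verifies the totals without saying anything about which option performs best.

```python
# Per-option layout taken from the question:
# (VM count, GB of RAM per executor, cores per executor).
# There is one executor per VM, so the executor count equals the VM count.
configs = {
    "A": (1, 400, 160),
    "B": (8, 50, 20),
    "C": (4, 100, 40),
    "D": (2, 200, 80),
}

# Multiply the per-executor figures by the number of executors.
totals = {
    option: {"ram_gb": vms * gb, "cores": vms * cores}
    for option, (vms, gb, cores) in configs.items()
}

for option, total in totals.items():
    print(f"{option}: {total['ram_gb']} GB RAM, {total['cores']} cores")
```

Every option comes out to 400 GB and 160 cores, so the options differ only in how those resources are split across VMs.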

Question 20

Which statement characterizes the general programming model used by Spark Structured Streaming?

A.Structured Streaming leverages the parallel processing of GPUs to achieve highly parallel data throughput.
B.Structured Streaming is implemented as a messaging bus and is derived from Apache Kafka.
C.Structured Streaming uses specialized hardware and I/O streams to achieve sub-second latency for data transfer.
D.Structured Streaming models new data arriving in a data stream as new rows appended to an unbounded table.
E.Structured Streaming relies on a distributed network of nodes that hold incremental state values for cached stages.
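
The "unbounded table" idea mentioned in option D can be imitated in plain Python. The sketch below does not use Spark at all: `UnboundedTable`, `append_batch`, and `word_counts` are hypothetical names invented for this illustration, and the toy query rescans all rows on each call, which is a simplification rather than a description of how any real engine computes its results.

```python
from collections import Counter


class UnboundedTable:
    """Toy stand-in for an input table that only ever grows."""

    def __init__(self):
        self.rows = []

    def append_batch(self, new_rows):
        # Newly arrived records become new rows; existing rows are never modified.
        self.rows.extend(new_rows)

    def word_counts(self):
        # A query over the table, evaluated over every row seen so far.
        return Counter(word for row in self.rows for word in row.split())


table = UnboundedTable()

table.append_batch(["cat dog", "dog"])
first_result = table.word_counts()

table.append_batch(["cat owl"])
second_result = table.word_counts()

print(first_result)
print(second_result)
```

The same query is written once, and its result changes only because more rows have been appended between the two calls.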
