Free Access to Databricks.Databricks-Certified-Professional-Data-Engineer.v2024-05-28.q108 with Valid Practice Test (Page 17)

Question 76

Where in the Spark UI can one diagnose a performance problem induced by not leveraging predicate push-down?

A.In the Executor's log file, by grepping for "predicate push-down"
B.In the Stage's Detail screen, in the Completed Stages table, by noting the size of data read from the Input column
C.In the Storage Detail screen, by noting which RDDs are not stored on disk
D.In the Delta Lake transaction log, by noting the column statistics
E.In the Query Detail screen, by interpreting the Physical Plan

Question 77

Which statement describes Delta Lake Auto Compaction?

A.An asynchronous job runs after the write completes to detect if files could be further compacted; if yes, an optimize job is executed toward a default of 1 GB.
B.Before a Jobs cluster terminates, optimize is executed on all tables modified during the most recent job.
C.Optimized writes use logical partitions instead of directory partitions; because partition boundaries are only represented in metadata, fewer small files are written.
D.Data is queued in a messaging bus instead of committing data directly to memory; all data is committed from the messaging bus in one batch once the job is complete.
E.An asynchronous job runs after the write completes to detect if files could be further compacted; if yes, an optimize job is executed toward a default of 128 MB.

Question 78

A table is registered with the following code:

Both users and orders are Delta Lake tables. Which statement describes the results of querying recent_orders?

A.Results will be computed and cached when the table is defined; these cached results will incrementally update as new records are inserted into source tables.
B.All logic will execute at query time and return the result of joining the valid versions of the source tables at the time the query began.
C.All logic will execute at query time and return the result of joining the valid versions of the source tables at the time the query finishes.
D.The versions of each source table will be stored in the table transaction log; query results will be saved to DBFS with each query.
E.All logic will execute when the table is defined and store the result of joining tables to the DBFS; this stored data will be returned when the table is queried.

Question 79

The operations team uses a centralized data quality monitoring system, to which users can publish data quality metrics through a webhook. You were asked to develop a process that sends a message through a webhook if there is at least one duplicate record. Which of the following approaches can be taken to integrate an alert with the current data quality monitoring system?

A.Use a notebook and Jobs to publish DQ metrics with Python
B.Set up an alert to send an email, use Python to parse the email, and publish a webhook message
C.Set up an alert with a custom template
D.Set up an alert with a custom Webhook destination
E.Set up an alert with a dynamic template

Question 80

While investigating a data issue, you realize that a process accidentally updated the table. You want to query the same table as of yesterday's version so you can review what the prior data looked like. What is the best way to query the historical data for your analysis?

A.SELECT * FROM TIME_TRAVEL(table_name) WHERE time_stamp = 'timestamp'
B.TIME_TRAVEL FROM table_name WHERE time_stamp = date_sub(current_date(), 1)
C.SELECT * FROM table_name TIMESTAMP AS OF date_sub(current_date(), 1)
D.DESCRIBE HISTORY table_name AS OF date_sub(current_date(), 1)
E.SHOW HISTORY table_name AS OF date_sub(current_date(), 1)
