Free Access to GAQM.Databricks-Certified-Data-Engineer-Associate.v2024-09-16.q91 with Valid Practice Test (Page 10)

Question 41

A data engineer has configured a Structured Streaming job to read from a table, manipulate the data, and then perform a streaming write into a new table.
The code block used by the data engineer is below:

If the data engineer only wants the query to execute a micro-batch to process data every 5 seconds, which of the following lines of code should the data engineer use to fill in the blank?

A. trigger(once="5 seconds")
B. trigger(continuous="5 seconds")
C. trigger("5 seconds")
D. trigger(processingTime="5 seconds")
E. trigger()

Question 42

A data engineer is maintaining a data pipeline. Upon data ingestion, the data engineer notices that the source data is starting to have a lower level of quality. The data engineer would like to automate the process of monitoring the quality level.
Which of the following tools can the data engineer use to solve this problem?

A. Unity Catalog
B. Data Explorer
C. Delta Lake
D. Delta Live Tables
E. Auto Loader

Question 43

A data engineer needs to apply custom logic to identify employees with more than 5 years of experience in the array column employees of the table stores. The custom logic should create a new column, exp_employees, that holds, for each row, an array of all employees with more than 5 years of experience. To apply this custom logic at scale, the data engineer wants to use the FILTER higher-order function.
Which of the following code blocks successfully completes this task?

A. Option C
B. Option B
C. Option A
D. Option D
E. Option E
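
As a rough illustration of what the question asks FILTER to do, the per-row logic can be sketched in plain Python. This is not Spark code and does not reproduce any of the options. The sample rows, the field name years_exp, and the helper with_exp_employees are all hypothetical, chosen only to show the "keep the elements that satisfy a predicate" behaviour.

```python
# Plain-Python sketch of per-row array filtering (not Spark code).
# The data and the field name "years_exp" are hypothetical.
stores = [
    {"store_id": 1, "employees": [{"name": "Ana", "years_exp": 7},
                                  {"name": "Ben", "years_exp": 3}]},
    {"store_id": 2, "employees": [{"name": "Chloe", "years_exp": 5},
                                  {"name": "Dev", "years_exp": 12}]},
]


def with_exp_employees(row, min_years=5):
    """Return a copy of the row with a new exp_employees array column."""
    # Keep only the array elements whose experience is strictly above the threshold.
    experienced = [e for e in row["employees"] if e["years_exp"] > min_years]
    return {**row, "exp_employees": experienced}


result = [with_exp_employees(row) for row in stores]

for row in result:
    print(row["store_id"], [e["name"] for e in row["exp_employees"]])
```

The original employees array is left untouched and the filtered array is added alongside it, which mirrors the requirement that exp_employees be a new column. An employee with exactly 5 years is excluded because the condition is "more than 5".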

Question 44

A data engineer has a Python variable table_name that they would like to use in a SQL query. They want to construct a Python code block that will run the query using table_name.
They have the following incomplete code block:
____(f"SELECT customer_id, spend FROM {table_name}")
Which of the following can be used to fill in the blank to successfully complete the task?

A. spark.delta.sql
B. spark.delta.table
C. spark.table
D. dbutils.sql
E. spark.sql
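
The f-string in the incomplete code block is resolved by Python before any query engine sees it, so whatever fills the blank receives an ordinary SQL string. The sketch below shows that interpolation step in isolation. It uses the standard-library sqlite3 module purely as a stand-in engine so the example runs without a Spark session; the table name customers and the sample rows are hypothetical.

```python
import sqlite3

# Hypothetical table name held in a Python variable.
table_name = "customers"

# The f-string is evaluated by Python first and yields a plain SQL string.
query = f"SELECT customer_id, spend FROM {table_name}"
print(query)

# Stand-in engine (sqlite3, not Spark) used only to show the string is runnable SQL.
conn = sqlite3.connect(":memory:")
conn.execute(f"CREATE TABLE {table_name} (customer_id INTEGER, spend REAL)")
conn.executemany(
    f"INSERT INTO {table_name} VALUES (?, ?)",
    [(1, 19.99), (2, 5.0)],
)
rows = conn.execute(query).fetchall()
conn.close()
```

Interpolating identifiers this way is only reasonable when table_name comes from trusted code, since the value is pasted into the SQL text verbatim.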

Question 45

A data engineer wants to schedule their Databricks SQL dashboard to refresh once per day, but they only want the associated SQL endpoint to be running when it is necessary.
Which of the following approaches can the data engineer use to minimize the total running time of the SQL endpoint used in the refresh schedule of their dashboard?

A. They can ensure the dashboard's SQL endpoint matches each of the queries' SQL endpoints.
B. They can set up the dashboard's SQL endpoint to be serverless.
C. They can turn on the Auto Stop feature for the SQL endpoint.
D. They can reduce the cluster size of the SQL endpoint.
E. They can ensure the dashboard's SQL endpoint is not one of the included queries' SQL endpoints.
