Free Access to Databricks.Databricks-Certified-Professional-Data-Engineer.v2024-05-28.q108 with Valid Practice Test (Page 15)

Question 66

The data science team has created and logged a production model using MLflow. The following code correctly imports and applies the production model to output the predictions as a new DataFrame named preds with the schema "customer_id LONG, predictions DOUBLE, date DATE".

The data science team would like predictions saved to a Delta Lake table with the ability to compare all predictions across time. Churn predictions will be made at most once per day.
Which code block accomplishes this task while minimizing potential compute costs?

A.
B.
C. preds.write.format("delta").save("/preds/churn_preds")
D.
E. preds.write.mode("append").saveAsTable("churn_preds")
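
The requirement to "compare all predictions across time" depends on how a write treats rows that are already in the table. The following is a minimal pure-Python sketch of that difference, not Spark or Delta Lake code: the `write` helper and the list-of-dicts "table" are hypothetical stand-ins, and the sample values are invented for illustration. Only the column names come from the schema given in the question.

```python
from datetime import date


def write(table, batch, mode):
    """Toy stand-in for a table write; returns the table's rows after the write."""
    if mode == "append":
        return table + batch   # existing rows are kept, new rows are added
    if mode == "overwrite":
        return list(batch)     # existing rows are replaced by the new batch
    raise ValueError(f"unsupported mode: {mode}")


# Two daily batches with the schema from the question (values are made up).
day1 = [{"customer_id": 1, "predictions": 0.20, "date": date(2024, 5, 27)}]
day2 = [{"customer_id": 1, "predictions": 0.35, "date": date(2024, 5, 28)}]

appended = write(write([], day1, "append"), day2, "append")
overwritten = write(write([], day1, "overwrite"), day2, "overwrite")

# With append, both days remain queryable; with overwrite, only the latest does.
history_dates = sorted(row["date"] for row in appended)
print(len(appended), len(overwritten))
```

In this toy model, the `date` column is what lets the appended rows be compared day over day.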

Question 67

The view updates represents an incremental batch of all newly ingested data to be inserted or updated in the customers table.
The following logic is used to process these records.

Which statement describes this implementation?

A. The customers table is implemented as a Type 3 table; old values are maintained as a new column alongside the current value.
B. The customers table is implemented as a Type 2 table; old values are maintained but marked as no longer current and new values are inserted.
C. The customers table is implemented as a Type 0 table; all writes are append only with no changes to existing values.
D. The customers table is implemented as a Type 1 table; old values are overwritten by new values and no history is maintained.
E. The customers table is implemented as a Type 2 table; old values are overwritten and new customers are appended.
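
The options above turn on how Type 1 and Type 2 tables handle a changed record. The following is a minimal pure-Python sketch of the two behaviors, not the logic from the question (which is not reproduced here). The helpers `apply_type1` and `apply_type2` and the columns `address`, `current`, `effective_date`, and `end_date` are hypothetical names chosen for illustration.

```python
def apply_type1(table, updates):
    """Type 1: overwrite the matching row in place; no history is kept."""
    current = {row["customer_id"]: dict(row) for row in table}
    for upd in updates:
        current[upd["customer_id"]] = dict(upd)
    return list(current.values())


def apply_type2(table, updates, effective_date):
    """Type 2: mark the old row as no longer current, then insert a new current row."""
    result = [dict(row) for row in table]
    for upd in updates:
        changed = True
        for row in result:
            if row["customer_id"] == upd["customer_id"] and row["current"]:
                if row["address"] == upd["address"]:
                    changed = False            # identical values: nothing to do
                else:
                    row["current"] = False     # close out the old version
                    row["end_date"] = effective_date
        if changed:
            result.append({**upd, "current": True,
                           "effective_date": effective_date, "end_date": None})
    return result


updates = [{"customer_id": 1, "address": "New Ave"},
           {"customer_id": 2, "address": "Elm Rd"}]

type1 = apply_type1([{"customer_id": 1, "address": "Old St"}], updates)
type2 = apply_type2(
    [{"customer_id": 1, "address": "Old St", "current": True,
      "effective_date": "2024-01-01", "end_date": None}],
    updates, "2024-05-28")

print(len(type1), len(type2))
```

In the Type 1 result the old address is gone. In the Type 2 result it survives as a row flagged as not current, so the table grows by one row per change.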

Question 68

A data engineering team is in the process of converting their existing data pipeline to utilize Auto Loader for
incremental processing in the ingestion of JSON files. One data engineer comes across the following code
block in the Auto Loader documentation:
streaming_df = (spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaLocation", schemaLocation)
    .load(sourcePath))
Assuming that schemaLocation and sourcePath have been set correctly, which of the following changes does
the data engineer need to make to convert this code block to use Auto Loader to ingest the data?

A. The data engineer needs to change the format("cloudFiles") line to format("autoLoader")
B. There is no change required. Databricks automatically uses Auto Loader for streaming reads
C. There is no change required. The inclusion of format("cloudFiles") enables the use of Auto Loader
D. The data engineer needs to add the .autoLoader line before the .load(sourcePath) line
E. There is no change required. The data engineer needs to ask their administrator to turn on Auto Loader

Question 69

Which of the following describes a scenario in which a data engineer will want to use a Job cluster instead of
an all-purpose cluster?

A. An ad-hoc analytics report needs to be developed while minimizing compute costs
B. A Databricks SQL query needs to be scheduled for upward reporting
C. A data team needs to collaborate on the development of a machine learning model
D. An automated workflow needs to be run every 30 minutes
E. A data engineer needs to manually investigate a production error

Question 70

Which of the following statements is true about Databricks Repos?

A. You can approve the pull request if you are the owner of Databricks Repos
B. A workspace can only have one instance of Git integration
C. Databricks Repos and Notebook versioning are the same features
D. You cannot create a new branch in Databricks Repos
E. Databricks Repos allow you to comment and commit code changes and push them to a remote branch
