Free Access to Databricks.Databricks-Certified-Professional-Data-Engineer.v2025-10-27.q109 with Valid Practice Test (Page 7)

Question 26

The data engineering team is configuring environments for development, testing, and production before beginning migration of a new data pipeline. The team requires extensive testing on both the code and the data resulting from code execution, and the team wants to develop and test against data that is as similar to production data as possible.
A junior data engineer suggests that production data can be mounted to the development and testing environments, allowing pre-production code to execute against production data. Because all users have Admin privileges in the development environment, the junior data engineer has offered to configure permissions and mount this data for the team.
Which statement captures best practices for this situation?

A. Because access to production data will always be verified using passthrough credentials, it is safe to mount data to any Databricks development environment.
B. All development, testing, and production code and data should exist in a single unified workspace; creating separate environments for testing and development further reduces risks.
C. In environments where interactive code will be executed, production data should only be accessible with read permissions; creating isolated databases for each environment further reduces risks.
D. Because Delta Lake versions all data and supports time travel, it is not possible for user error or malicious actors to permanently delete production data; as such, it is generally safe to mount production data anywhere.

Question 27

You are working on a dashboard that takes a long time to load in the browser because each visualization contains a lot of data to populate. Which of the following approaches can be taken to address this issue?

A. Increase the size of the SQL endpoint cluster
B. Increase the maximum of the scaling range of the SQL endpoint cluster
C. Use a Databricks SQL query filter to limit the amount of data in each visualization
D. Remove data from Delta Lake
E. Use Delta cache to store the intermediate results

Question 28

A data engineer wants to create a cluster using the Databricks CLI for a big ETL pipeline. The cluster should have five workers, one driver of type i3.xlarge, and should use the '14.3.x-scala2.12' runtime.
Which command should the data engineer use?

A. databricks clusters create 14.3.x-scala2.12 --num-workers 5 --node-type-id i3.xlarge --cluster-name DataEngineer_cluster
B. databricks clusters add 14.3.x-scala2.12 --num-workers 5 --node-type-id i3.xlarge --cluster-name DataEngineer_cluster
C. databricks compute add 14.3.x-scala2.12 --num-workers 5 --node-type-id i3.xlarge --cluster-name DataEngineer_cluster
D. databricks compute create 14.3.x-scala2.12 --num-workers 5 --node-type-id i3.xlarge --cluster-name DataEngineer_cluster
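
For reference, the cluster described in this question can also be written down as a JSON cluster specification. The sketch below is a minimal illustration, assuming the field names used by the Databricks Clusters REST API create request (cluster_name, spark_version, node_type_id, num_workers). The helper build_cluster_spec is hypothetical, and nothing here contacts a workspace; it only builds and prints the payload.

```python
import json


def build_cluster_spec(name, spark_version, node_type_id, num_workers):
    """Return a dict shaped like a cluster-create request body.

    Hypothetical helper for illustration only; it does not call any API.
    """
    if num_workers < 0:
        raise ValueError("num_workers must be non-negative")
    return {
        "cluster_name": name,
        "spark_version": spark_version,
        # No separate driver type is set here; the assumption is that the
        # driver then uses the same node type as the workers.
        "node_type_id": node_type_id,
        "num_workers": num_workers,
    }


# Values taken from the question text.
spec = build_cluster_spec(
    name="DataEngineer_cluster",
    spark_version="14.3.x-scala2.12",
    node_type_id="i3.xlarge",
    num_workers=5,
)
print(json.dumps(spec, indent=2))
```

Keeping the specification as data makes it easy to review the worker count, node type, and runtime before any cluster is actually created.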

Question 29

The following table consists of items found in user carts within an e-commerce website.

The following MERGE statement is used to update this table using an updates view, with schema evolution enabled on this table.

How would the following update be handled?

A. The update is moved to a separate "restored" column because it is missing a column expected in the target schema.
B. The new restored field is added to the target schema, and dynamically read as NULL for existing unmatched records.
C. The update throws an error because changes to existing columns in the target schema are not supported.
D. The new nested field is added to the target schema, and files underlying existing records are updated to include NULL values for the new field.

Question 30

A junior data engineer is working to implement logic for a Lakehouse table named silver_device_recordings.
The source data contains 100 unique fields in a highly nested JSON structure.
The silver_device_recordings table will be used downstream to power several production monitoring dashboards and a production model. At present, 45 of the 100 fields are being used in at least one of these applications.
The data engineer is trying to determine the best approach for dealing with schema declaration given the highly-nested structure of the data and the numerous fields.
Which of the following accurately presents information about Delta Lake and Databricks that may impact their decision-making process?

A. The Tungsten encoding used by Databricks is optimized for storing string data; newly-added native support for querying JSON strings means that string types are always most efficient.
B. Because Delta Lake uses Parquet for data storage, data types can be easily evolved by just modifying file footer information in place.
C. Human labor in writing code is the largest cost associated with data engineering workloads; as such, automating table declaration logic should be a priority in all migration workloads.
D. Because Databricks will infer schema using types that allow all observed data to be processed, setting types manually provides greater assurance of data quality enforcement.
E. Schema inference and evolution on Databricks ensure that inferred types will always accurately match the data types used by downstream systems.
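
The trade-off between inferred and manually declared types can be illustrated with a toy model. The sketch below is not Spark's or Databricks' actual inference logic; it is a simplified assumption in which an inferred type is widened until it can hold every observed value, while a declared type flags values that do not fit. The function names and the sample readings are hypothetical.

```python
# Toy widening order: each type can represent everything to its left.
RANK = {"long": 0, "double": 1, "string": 2}


def observed_type(value):
    """Classify a single Python value in the toy type system."""
    if type(value) is int:
        return "long"
    if type(value) is float:
        return "double"
    return "string"


def infer_field_type(values):
    """Return the narrowest toy type that can hold every non-null value."""
    widest = None
    for v in values:
        if v is None:
            continue
        t = observed_type(v)
        if widest is None or RANK[t] > RANK[widest]:
            widest = t
    # With no observed values, fall back to the most permissive type.
    return widest or "string"


def violations(values, declared):
    """Return the non-null values that a declared type would reject."""
    return [
        v for v in values
        if v is not None and RANK[observed_type(v)] > RANK[declared]
    ]


readings = [21, 22.5, None, "N/A"]  # hypothetical sensor values
print(infer_field_type(readings))      # inference widens to fit "N/A"
print(violations(readings, "double"))  # a declared type surfaces the bad value
```

In this model, inference silently accepts the malformed reading by widening the type, whereas a declared type makes the mismatch visible so it can be handled explicitly.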
