Free Access to Databricks.Databricks-Certified-Professional-Data-Engineer.v2024-05-28.q108 with Valid Practice Test (Page 20)

Question 91

At the end of the inventory process, a file is uploaded to cloud object storage. You are asked to build a process that ingests the data incrementally. The schema of the file is expected to change over time, and the ingestion process should handle these changes automatically.
Below is the Auto Loader command to load the data. Fill in the blanks so that the code executes successfully.
(spark.readStream
    .format("cloudFiles")
    .option("_______", "csv")
    .option("_______", "dbfs:/location/checkpoint/")
    .load(data_source)
    .writeStream
    .option("_______", "dbfs:/location/checkpoint/")
    .option("_______", "true")
    .table(table_name))

A.format, checkpointLocation, schemaLocation, overwrite
B.cloudFiles.format, checkpointLocation, cloudFiles.schemaLocation, overwrite
C.cloudFiles.format, cloudFiles.schemaLocation, checkpointLocation, mergeSchema
D.cloudFiles.format, cloudFiles.schemaLocation, checkpointLocation, overwrite
E.cloudFiles.format, cloudFiles.schemaLocation, checkpointLocation, append
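
For reference, an Auto Loader pipeline of this shape can be sketched as follows. This is a minimal sketch, not the exam's official solution: it assumes a Databricks runtime that provides a `spark` session, and the helper names `reader_options`, `writer_options`, and `start_ingest` are hypothetical. The reader takes the file format and a schema-tracking location through `cloudFiles.*` options, while the writer takes the stream checkpoint location and `mergeSchema` so that new columns can be added to the target table. It uses `toTable`, the PySpark writer method that starts the stream into a table.

```python
def reader_options(file_format, schema_location):
    """Options for the Auto Loader (cloudFiles) reader."""
    return {
        # Format of the files landing in cloud storage.
        "cloudFiles.format": file_format,
        # Where Auto Loader stores the inferred schema and its changes.
        "cloudFiles.schemaLocation": schema_location,
    }


def writer_options(checkpoint_location):
    """Options for the streaming writer."""
    return {
        # Tracks which files have already been processed.
        "checkpointLocation": checkpoint_location,
        # Lets new columns in the source evolve the target table schema.
        "mergeSchema": "true",
    }


def start_ingest(spark, data_source, table_name, schema_location, checkpoint_location):
    """Start an incremental ingest; `spark` is assumed to be a Databricks session."""
    return (
        spark.readStream
        .format("cloudFiles")
        .options(**reader_options("csv", schema_location))
        .load(data_source)
        .writeStream
        .options(**writer_options(checkpoint_location))
        .toTable(table_name)
    )
```

The option dictionaries are kept as plain functions so they can be inspected without a running cluster; only `start_ingest` needs Spark.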

Question 92

The data engineering team is required to share data with the data science team, and the two teams use different workspaces in the same organization. Which of the following techniques can be used to simplify sharing data across workspaces?
*Please note the question is asking how data is shared within an organization across multiple workspaces.

A.Data Sharing
B.Unity Catalog
C.Delta Lake
D.Use a single storage location
E.Delta Live Pipelines

Question 93

A data architect has designed a system in which two Structured Streaming jobs will concurrently write to a single bronze Delta table. Each job is subscribing to a different topic from an Apache Kafka source, but they will write data with the same schema. To keep the directory structure simple, a data engineer has decided to nest a checkpoint directory to be shared by both streams.
The proposed directory structure is displayed below:
Which statement describes whether this checkpoint directory structure is valid for the given scenario and why?

A.No; Delta Lake manages streaming checkpoints in the transaction log.
B.Yes; both of the streams can share a single checkpoint directory.
C.No; only one stream can write to a Delta Lake table.
D.Yes; Delta Lake supports infinite concurrent writers.
E.No; each of the streams needs to have its own checkpoint directory.
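
The checkpoint layout in this scenario can be sketched as follows. This is a minimal sketch under stated assumptions: the helper names `checkpoint_path` and `start_kafka_to_bronze`, the `_checkpoints` folder name, and the use of the topic name as the sub-directory are all hypothetical. The point it illustrates is that two streams writing to the same Delta table each get a distinct `checkpointLocation`, because a checkpoint records the offsets and state of exactly one streaming query.

```python
import posixpath


def checkpoint_path(base_dir, stream_name):
    """Build a checkpoint directory that is unique per stream (hypothetical layout)."""
    return posixpath.join(base_dir, "_checkpoints", stream_name)


def start_kafka_to_bronze(spark, topic, bootstrap_servers, bronze_table, checkpoint_base):
    """Start one Kafka-to-bronze stream; `spark` is assumed to be an active session."""
    return (
        spark.readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", bootstrap_servers)
        .option("subscribe", topic)
        .load()
        .writeStream
        # Each query gets its own checkpoint directory, never a shared one.
        .option("checkpointLocation", checkpoint_path(checkpoint_base, topic))
        .toTable(bronze_table)
    )
```

Calling `start_kafka_to_bronze` once per topic with the same `bronze_table` yields two concurrent writers to one table, each recovering independently from its own checkpoint.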

Question 94

Which statement characterizes the general programming model used by Spark Structured Streaming?

A.Structured Streaming leverages the parallel processing of GPUs to achieve highly parallel data throughput.
B.Structured Streaming is implemented as a messaging bus and is derived from Apache Kafka.
C.Structured Streaming uses specialized hardware and I/O streams to achieve sub-second latency for data transfer.
D.Structured Streaming models new data arriving in a data stream as new rows appended to an unbounded table.
E.Structured Streaming relies on a distributed network of nodes that hold incremental state values for cached stages.
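
The unbounded-table model can be illustrated with a toy example. This is plain Python, not Spark code, and the class name `UnboundedTable` is hypothetical: each arriving micro-batch is appended as new rows to an ever-growing input table, and the query result (here, a running count per value) is updated incrementally rather than recomputed from scratch.

```python
from collections import Counter


class UnboundedTable:
    """Toy model of a stream treated as an ever-growing input table."""

    def __init__(self):
        self.rows = []           # the conceptual unbounded input table
        self.counts = Counter()  # incrementally maintained result table

    def append_batch(self, batch):
        """Append newly arrived rows and return the updated result."""
        self.rows.extend(batch)
        self.counts.update(batch)
        return dict(self.counts)


table = UnboundedTable()
first = table.append_batch(["cat", "dog"])
second = table.append_batch(["dog", "owl"])
print(first)   # {'cat': 1, 'dog': 1}
print(second)  # {'cat': 1, 'dog': 2, 'owl': 1}
```

The same query logic applies whether the table holds two rows or two billion, which is why a streaming query in this model reads like a batch query over a table that keeps growing.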

Question 95

What is the best way to describe a data lakehouse compared to a data warehouse?

A.A data lakehouse provides a relational system of data management.
B.A data lakehouse captures snapshots of data for version control purposes.
C.A data lakehouse couples storage and compute for complete control.
D.A data lakehouse utilizes proprietary storage formats for data.
E.A data lakehouse enables both batch and streaming analytics.

Correct Answer: E

Explanation
The answer is: a data lakehouse enables both batch and streaming analytics.
A lakehouse has the following key features:
*Transaction support: In an enterprise lakehouse many data pipelines will often be reading and writing data concurrently. Support for ACID transactions ensures consistency as multiple parties concurrently read or write data, typically using SQL.
*Schema enforcement and governance: The Lakehouse should have a way to support schema enforcement and evolution, supporting DW schema architectures such as star/snowflake-schemas. The system should be able to reason about data integrity, and it should have robust governance and auditing mechanisms.
*BI support: Lakehouses enable using BI tools directly on the source data. This reduces staleness and improves recency, reduces latency, and lowers the cost of having to operationalize two copies of the data in both a data lake and a warehouse.
*Storage is decoupled from compute: In practice this means storage and compute use separate clusters, thus these systems are able to scale to many more concurrent users and larger data sizes. Some modern data warehouses also have this property.
*Openness: The storage formats they use are open and standardized, such as Parquet, and they provide an API so a variety of tools and engines, including machine learning and Python/R libraries, can efficiently access the data directly.
*Support for diverse data types ranging from unstructured to structured data: The lakehouse can be used to store, refine, analyze, and access data types needed for many new data applications, including images, video, audio, semi-structured data, and text.
*Support for diverse workloads: including data science, machine learning, SQL, and analytics. Multiple tools might be needed to support all these workloads but they all rely on the same data repository.
*End-to-end streaming: Real-time reports are the norm in many enterprises. Support for streaming eliminates the need for separate systems dedicated to serving real-time data applications.
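
The batch-plus-streaming point can be sketched in a few lines. This is a minimal sketch, assuming an active `spark` session and an existing Delta table; the helper names `read_batch` and `read_stream` are hypothetical. It shows the same table being consumed once as a static DataFrame and once as a streaming source, with no second copy of the data.

```python
def read_batch(spark, table_name):
    """Read the table once, as a static (batch) DataFrame."""
    return spark.read.table(table_name)


def read_stream(spark, table_name):
    """Read the same table incrementally, as a streaming DataFrame."""
    return spark.readStream.table(table_name)
```

Both functions point at the same table name, so a nightly report and a real-time dashboard can share one data repository.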
