Free Access to Databricks.Databricks-Certified-Professional-Data-Engineer.v2024-05-28.q108 with Valid Practice Test (Page 13)

Question 56

A distributed team of data analysts share computing resources on an interactive cluster with autoscaling configured. In order to better manage costs and query throughput, the workspace administrator is hoping to evaluate whether cluster upscaling is caused by many concurrent users or resource-intensive queries.
In which location can one review the timeline for cluster resizing events?

A.Driver's log file
B.Executor's log file
C.Cluster Event Log
D.Ganglia
E.Workspace audit logs

Question 57

At the end of the inventory process, a file is uploaded to cloud object storage. You are asked to build a process that ingests this data incrementally. The schema of the file is expected to change over time, and the ingestion process should handle these changes automatically. Below is the Auto Loader command to load the data; fill in the blanks so that the code executes successfully.
(spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "csv")
    .option("_______", "dbfs:/location/checkpoint/")
    .load(data_source)
    .writeStream
    .option("_______", "dbfs:/location/checkpoint/")
    .option("mergeSchema", "true")
    .table(table_name))

A.checkpointLocation, schemaLocation
B.checkpointLocation, cloudFiles.schemaLocation
C.schemaLocation, checkpointLocation
D.cloudFiles.schemaLocation, checkpointLocation
E.cloudFiles.schemaLocation, cloudFiles.checkpointLocation
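
For reference, an Auto Loader pipeline of this shape is usually written along the lines of the sketch below. It is only an illustration under assumptions: the function name and its parameters are hypothetical, the `cloudFiles` source is available only on a Databricks runtime, and `toTable` (PySpark 3.1+) is used here to start the stream. The schema tracking location is set on the reader with the `cloudFiles.` prefix, while the streaming checkpoint is set on the writer.

```python
def start_inventory_stream(spark, data_source, schema_location,
                           checkpoint_location, table_name):
    """Hypothetical helper: incrementally ingest CSV files with Auto Loader."""
    return (
        spark.readStream
        .format("cloudFiles")                       # Auto Loader source
        .option("cloudFiles.format", "csv")         # format of the incoming files
        # Reader side: where Auto Loader tracks the inferred schema over time
        .option("cloudFiles.schemaLocation", schema_location)
        .load(data_source)
        .writeStream
        # Writer side: where the stream stores its progress
        .option("checkpointLocation", checkpoint_location)
        # Allow newly discovered columns to be added to the target table
        .option("mergeSchema", "true")
        .toTable(table_name)
    )
```

On a cluster this would be called with an active `spark` session, for example `start_inventory_stream(spark, data_source, "dbfs:/location/checkpoint/", "dbfs:/location/checkpoint/", table_name)`, reusing the path shown in the question.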

Question 58

A data architect has heard about Delta Lake's built-in versioning and time travel capabilities. For auditing purposes they have a requirement to maintain a full record of all valid street addresses as they appear in the customers table.
The architect is interested in implementing a Type 1 table, overwriting existing records with new values and relying on Delta Lake time travel to support long-term auditing. A data engineer on the project feels that a Type 2 table will provide better performance and scalability.
Which piece of information is critical to this decision?

A.Delta Lake time travel does not scale well in cost or latency to provide a long-term versioning solution.
B.Delta Lake time travel cannot be used to query previous versions of these tables because Type 1 changes modify data files in place.
C.Shallow clones can be combined with Type 1 tables to accelerate historic queries for long-term versioning.
D.Data corruption can occur if a query fails in a partially completed state because Type 2 tables require setting multiple fields in a single update.

Question 59

A data architect has designed a system in which two Structured Streaming jobs will concurrently write to a single bronze Delta table. Each job is subscribing to a different topic from an Apache Kafka source, but they will write data with the same schema. To keep the directory structure simple, a data engineer has decided to nest a checkpoint directory to be shared by both streams.
The proposed directory structure is displayed below:

Which statement describes whether this checkpoint directory structure is valid for the given scenario and why?

A.No; Delta Lake manages streaming checkpoints in the transaction log.
B.Yes; both of the streams can share a single checkpoint directory.
C.No; only one stream can write to a Delta Lake table.
D.Yes; Delta Lake supports infinite concurrent writers.
E.No; each of the streams needs to have its own checkpoint directory.

Question 60

You are trying to calculate the total sales made by all employees by parsing a complex struct data type that stores employee and sales data. How would you approach this in SQL?
Table definition: batchId INT, performance ARRAY<STRUCT<employeeId: BIGINT, sales: INT>>, insertDate TIMESTAMP
Sample data of the performance column:
[
  { "employeeId": 1234, "sales": 10000 },
  { "employeeId": 3232, "sales": 30000 }
]
Calculate the total sales made by all employees.
Sample data with create table syntax for the data:
create or replace table sales as
select 1 as batchId,
  from_json('[{ "employeeId":1234,"sales" : 10000 },{ "employeeId":3232,"sales" : 30000 }]',
    'ARRAY<STRUCT<employeeId: BIGINT, sales: INT>>') as performance,
  current_timestamp() as insertDate
union all
select 2 as batchId,
  from_json('[{ "employeeId":1235,"sales" : 10500 },{ "employeeId":3233,"sales" : 32000 }]',
    'ARRAY<STRUCT<employeeId: BIGINT, sales: INT>>') as performance,
  current_timestamp() as insertDate

A.WITH CTE as (SELECT EXPLODE (performance) FROM table_name)
SELECT SUM (performance.sales) FROM CTE
B.WITH CTE as (SELECT FLATTEN (performance) FROM table_name)
SELECT SUM (sales) FROM CTE
C.select aggregate(flatten(collect_list(performance.sales)), 0, (x, y) -> x + y)
as total_sales from sales
D.SELECT SUM(SLICE (performance, sales)) FROM employee
E.select reduce(flatten(collect_list(performance:sales)), 0, (x, y) -> x + y)
as total_sales from sales
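
As a sanity check on the figure a correct query should return for the sample data, the same computation can be sketched in plain Python. This is only an illustration and not Spark code: the `rows` list and the `total_sales` function are hypothetical stand-ins for the two rows of the `sales` table.

```python
# Hypothetical in-memory copy of the two sample rows in the `sales` table
rows = [
    {"batchId": 1, "performance": [
        {"employeeId": 1234, "sales": 10000},
        {"employeeId": 3232, "sales": 30000},
    ]},
    {"batchId": 2, "performance": [
        {"employeeId": 1235, "sales": 10500},
        {"employeeId": 3233, "sales": 32000},
    ]},
]

def total_sales(rows):
    """Sum the `sales` field of every struct in every row's performance array."""
    return sum(entry["sales"] for row in rows for entry in row["performance"])

print(total_sales(rows))  # 82500
```

The nested iteration visits every struct in every array, so the result is a single total across both batches rather than one value per row.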
