Databricks-Certified-Data-Engineer-Associate Exam
GAQM.Databricks-Certified-Data-Engineer-Associate.v2024-09-16.q91 Dumps

Question 66

Which of the following describes a scenario in which a data engineer will want to use a single-node cluster?

Correct Answer: A
The scenario in which a data engineer will want to use a single-node cluster is when they are working interactively with a small amount of data. A single-node cluster consists of an Apache Spark driver and no Spark workers [1]. It supports Spark jobs and all Spark data sources, including Delta Lake, and is helpful for single-node machine learning workloads that use Spark to load and save data, as well as for lightweight exploratory data analysis [1]. A single-node cluster runs Spark locally, spawns one executor thread per logical core, and saves all log output in the driver log [1]. It can be created by selecting the Single Node option when configuring a cluster [1].
The other options are not suitable for a single-node cluster. When running automated reports that should refresh as quickly as possible, a data engineer will want a multi-node cluster that can scale up and down automatically based on workload demand [2]. When working with SQL within Databricks SQL, a data engineer will want a SQL endpoint that can execute SQL queries on a serverless pool or an existing cluster [3]. When concerned about the ability to automatically scale with larger data, a data engineer will want a multi-node cluster that can leverage the Databricks Lakehouse Platform and the Delta Engine to handle large-scale data processing efficiently and reliably [4]. When manually running reports with a large amount of data, a data engineer will want a multi-node cluster that can distribute the computation across multiple workers and use the Spark UI to monitor performance and troubleshoot issues [5].
References:
* [1] Single Node clusters | Databricks on AWS
* [2] Autoscaling | Databricks on AWS
* [3] SQL Endpoints | Databricks on AWS
* [4] Databricks Lakehouse Platform | Databricks on AWS
* [5] Spark UI | Databricks on AWS
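As an illustration only, the sketch below shows roughly what an equivalent single-node configuration could look like as a Databricks Clusters API payload; the cluster name, runtime version, and node type are placeholders rather than values taken from the question.

    # Hypothetical sketch: a single-node cluster definition for the Databricks Clusters API.
    single_node_cluster = {
        "cluster_name": "single-node-eda",          # placeholder name
        "spark_version": "13.3.x-scala2.12",        # placeholder runtime
        "node_type_id": "i3.xlarge",                # placeholder instance type
        "num_workers": 0,                           # driver only, no Spark workers
        "spark_conf": {
            # Settings along these lines are what the Single Node option configures:
            "spark.databricks.cluster.profile": "singleNode",
            "spark.master": "local[*]",             # run Spark locally on the driver
        },
        "custom_tags": {"ResourceClass": "SingleNode"},
    }

Setting num_workers to 0 together with the single-node profile is what distinguishes this configuration from a standard multi-node cluster.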

Question 67

A data engineer has configured a Structured Streaming job to read from a table, manipulate the data, and then perform a streaming write into a new table.
The code block used by the data engineer is below:

If the data engineer only wants the query to execute a micro-batch to process data every 5 seconds, which of the following lines of code should the data engineer use to fill in the blank?

Correct Answer: D
The processingTime option specifies a time-based trigger interval for fixed-interval micro-batches. This means that the query will execute a micro-batch to process data every 5 seconds, regardless of how much data is available. This option is suitable for near-real-time processing workloads that require low latency and a consistent processing frequency. The other options are either invalid syntax (A, C), the default behavior (B), or an experimental feature (E). References: Databricks Documentation - Configure Structured Streaming trigger intervals; Databricks Documentation - Trigger.
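Since the question's code block is not reproduced above, the following is only a minimal PySpark sketch of how a fixed-interval trigger is typically written; the table names and checkpoint path are hypothetical.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Read the source table as a stream (table names here are made up).
    source_df = spark.readStream.table("source_table")

    # ... the data engineer's transformations would go here ...

    # Write the stream, executing a micro-batch every 5 seconds.
    query = (
        source_df.writeStream
        .trigger(processingTime="5 seconds")                         # fixed-interval micro-batches
        .outputMode("append")
        .option("checkpointLocation", "/tmp/checkpoints/new_table")  # placeholder path
        .toTable("new_table")
    )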

Question 68

A data engineer has a single-task Job that runs each morning before they begin working. After identifying an upstream data issue, they need to set up another task to run a new notebook prior to the original task.
Which of the following approaches can the data engineer use to set up the new task?

Correct Answer: B
Explanation
To run the new notebook before the original task, the data engineer can add a second task to the existing Job: create a new task that points at the new notebook, configure it with any required parameters and compute settings, and then make the original task depend on the new task in the Job configuration. Because the original task lists the new task as a dependency, the new notebook runs first each morning.
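As a rough illustration of that approach, the sketch below uses a hypothetical Jobs API style payload in which the original task depends on the new task; every name, notebook path, and cluster ID here is a placeholder.

    # Hypothetical job definition: the original task now depends on a new task
    # that runs the upstream-fix notebook first.
    job_settings = {
        "name": "morning-report",                                             # placeholder job name
        "tasks": [
            {
                "task_key": "fix_upstream_data",                              # new task, runs first
                "notebook_task": {"notebook_path": "/Repos/etl/fix_upstream"},       # placeholder
                "existing_cluster_id": "1234-567890-abcd123",                 # placeholder cluster
            },
            {
                "task_key": "original_task",
                "depends_on": [{"task_key": "fix_upstream_data"}],            # enforces the ordering
                "notebook_task": {"notebook_path": "/Repos/etl/original_notebook"},  # placeholder
                "existing_cluster_id": "1234-567890-abcd123",
            },
        ],
    }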

Question 69

Which of the following benefits is provided by the array functions from Spark SQL?

Correct Answer: D
Explanation
Array functions in Spark SQL are primarily used for working with arrays and complex, nested data structures, such as those often encountered when ingesting JSON files. These functions allow you to manipulate and query nested arrays and structures within your data, making it easier to extract and work with specific elements or values within complex data formats. While some of the other options (such as option A for working with different data types) are features of Spark SQL or SQL in general, array functions specifically excel at handling complex, nested data structures like those found in JSON files.
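For illustration, here is a small, self-contained PySpark example with made-up data showing the kind of nested-array queries the explanation refers to (size, array_contains, and explode are standard Spark SQL array and collection functions).

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Made-up nested data, similar in shape to what a JSON ingest might produce.
    orders = spark.createDataFrame(
        [("order_1", ["laptop", "mouse"]), ("order_2", ["monitor"])],
        ["order_id", "items"],
    )
    orders.createOrReplaceTempView("orders")

    # Array functions let SQL work with the nested column directly.
    spark.sql("""
        SELECT order_id,
               size(items)                     AS item_count,   -- number of elements
               array_contains(items, 'mouse')  AS has_mouse,    -- membership test
               explode(items)                  AS item          -- one row per element
        FROM orders
    """).show()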

Question 70

A data engineer is working with two tables. Each of these tables is displayed below in its entirety.

The data engineer runs the following query to join these tables together:

Which of the following will be returned by the above query?

Correct Answer: A
Option A is the correct answer because it shows the result of an INNER JOIN between the two tables. An INNER JOIN returns only the rows that have matching values in both tables based on the join condition. In this case, the join condition is ON a.customer_id = c.customer_id, which means that only the rows that have the same customer ID in both tables will be included in the output. The output will have four columns: customer_id, name, account_id, and overdraft_amt. The output will have four rows, corresponding to the four customers who have accounts in the account table.
References: The use of INNER JOIN is covered in the Databricks documentation on SQL JOIN and in general SQL references such as W3Schools or GeeksforGeeks.
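Because the two tables from the question are not reproduced above, the following is a hypothetical reconstruction with made-up rows that demonstrates the INNER JOIN behavior described; which columns belong to which table is an assumption.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Made-up stand-ins for the question's tables.
    customer = spark.createDataFrame(
        [(1, "Ana"), (2, "Ben"), (3, "Cal")], ["customer_id", "name"]
    )
    account = spark.createDataFrame(
        [(1, 101, 0.0), (2, 102, 25.5)], ["customer_id", "account_id", "overdraft_amt"]
    )
    customer.createOrReplaceTempView("customer")
    account.createOrReplaceTempView("account")

    # The INNER JOIN keeps only customer_ids present in both tables,
    # so customer 3 (who has no account row) is dropped from the result.
    spark.sql("""
        SELECT a.customer_id, c.name, a.account_id, a.overdraft_amt
        FROM customer AS c
        INNER JOIN account AS a
          ON a.customer_id = c.customer_id
    """).show()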