Free Access to Google.Professional-Data-Engineer.v2024-01-19.q177 with Valid Practice Test (Page 14)

Question 61

You have several Spark jobs that run on a Cloud Dataproc cluster on a schedule. Some of the jobs run in sequence, and some of the jobs run concurrently. You need to automate this process. What should you do?

A.Create a Cloud Dataproc Workflow Template
B.Create an initialization action to execute the jobs
C.Create a Directed Acyclic Graph in Cloud Composer
D.Create a Bash script that uses the Cloud SDK to create a cluster, execute jobs, and then tear down the cluster
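
A schedule in which some jobs run in sequence and others run concurrently can be pictured as a dependency graph. The sketch below is only an illustration using Python's standard library, not the API of any Google Cloud service; the job names and their dependencies are hypothetical. It groups jobs into "waves": jobs in the same wave could run concurrently, and each wave must finish before the next one starts.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical jobs: each key depends on the jobs in its set.
JOB_DEPENDENCIES = {
    "job_b": {"job_a"},
    "job_c": {"job_a"},
    "job_d": {"job_b", "job_c"},
}


def execution_waves(dependencies):
    """Group jobs into waves; jobs within one wave have no unmet dependencies."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises CycleError if the graph is not acyclic
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything runnable right now
        waves.append(ready)
        sorter.done(*ready)  # mark the whole wave as finished
    return waves


waves = execution_waves(JOB_DEPENDENCIES)
print(waves)
```

Here job_b and job_c land in the same wave because both depend only on job_a, while job_d waits for both of them.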

Question 62

You work for an advertising company, and you've developed a Spark ML model to predict click-through rates at advertisement blocks. You've been developing everything at your on-premises data center, and now your company is migrating to Google Cloud. Your data will be migrated to BigQuery. You periodically retrain your Spark ML models, so you need to migrate existing training pipelines to Google Cloud. What should you do?

A.Use Cloud Dataproc for training existing Spark ML models, but start reading data directly from BigQuery
B.Rewrite your models on TensorFlow, and start using Cloud ML Engine
C.Spin up a Spark cluster on Compute Engine, and train Spark ML models on the data exported from BigQuery
D.Use Cloud ML Engine for training existing Spark ML models

Question 63

Flowlogistic wants to use Google BigQuery as their primary analysis system, but they still have Apache Hadoop and Spark workloads that they cannot move to BigQuery. Flowlogistic does not know how to store the data that is common to both workloads. What should they do?

A.Store the common data encoded as Avro in Google Cloud Storage.
B.Store the common data in BigQuery as partitioned tables.
C.Store the common data in the HDFS storage for a Google Cloud Dataproc cluster.
D.Store the common data in BigQuery and expose authorized views.

Question 64

You want to automate execution of a multi-step data pipeline running on Google Cloud. The pipeline includes Cloud Dataproc and Cloud Dataflow jobs that have multiple dependencies on each other. You want to use managed services where possible, and the pipeline will run every day. Which tool should you use?

A.Cloud Scheduler
B.Cloud Composer
C.cron
D.Workflow Templates on Cloud Dataproc

Question 65

You are building a new real-time data warehouse for your company and will use Google BigQuery streaming inserts. There is no guarantee that data will only be sent in once, but you do have a unique ID for each row of data and an event timestamp. You want to ensure that duplicates are not included while interactively querying data. Which query type should you use?

A.Include ORDER BY DESC on timestamp column and LIMIT to 1.
B.Use GROUP BY on the unique ID column and timestamp column and SUM on the values.
C.Use the LAG window function with PARTITION by unique ID along with WHERE LAG IS NOT NULL.
D.Use the ROW_NUMBER window function with PARTITION by unique ID along with WHERE row equals 1.
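
To make the ROW_NUMBER pattern named in option D concrete, here is a minimal sketch. The table name `my_dataset.events` and the column names `unique_id`, `event_ts` and `value` are assumptions made for illustration; they do not come from the question. The query string shows the general shape of such a query, and the small function emulates its effect in plain Python so the behavior can be checked without a BigQuery project.

```python
# Hypothetical table and column names, shown only to illustrate the pattern.
DEDUP_SQL = """
SELECT * EXCEPT (row_num)
FROM (
  SELECT
    *,
    ROW_NUMBER() OVER (PARTITION BY unique_id ORDER BY event_ts DESC) AS row_num
  FROM `my_dataset.events`
)
WHERE row_num = 1
"""


def latest_per_id(rows):
    """Keep one row per unique_id: the one with the greatest event_ts.

    This mirrors numbering rows within each unique_id partition by
    descending timestamp and keeping only row number 1.
    """
    latest = {}
    for row in rows:
        key = row["unique_id"]
        if key not in latest or row["event_ts"] > latest[key]["event_ts"]:
            latest[key] = row
    return list(latest.values())


rows = [
    {"unique_id": "a", "event_ts": 1, "value": 10},
    {"unique_id": "a", "event_ts": 1, "value": 10},  # duplicate delivery
    {"unique_id": "b", "event_ts": 2, "value": 20},
]
deduped = latest_per_id(rows)
print(deduped)
```

The duplicate delivery of row "a" collapses to a single row, while row "b" passes through unchanged.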
