Free Access to Google.Professional-Data-Engineer.v2022-10-14.q166 with Valid Practice Test (Page 22)

Question 101

You work for an advertising company, and you've developed a Spark ML model to predict click-through rates for advertisement blocks. You've been developing everything at your on-premises data center, and now your company is migrating to Google Cloud. Your data will be migrated to BigQuery. You periodically retrain your Spark ML models, so you need to migrate existing training pipelines to Google Cloud. What should you do?

A. Use Cloud Dataproc for training existing Spark ML models, but start reading data directly from BigQuery
B. Use Cloud ML Engine for training existing Spark ML models
C. Rewrite your models on TensorFlow, and start using Cloud ML Engine
D. Spin up a Spark cluster on Compute Engine, and train Spark ML models on the data exported from BigQuery

Question 102

You have a requirement to insert minute-resolution data from 50,000 sensors into a BigQuery table. You expect significant growth in data volume and need the data to be available within 1 minute of ingestion for real-time analysis of aggregated trends. What should you do?

A. Use bq load to load a batch of sensor data every 60 seconds.
B. Use a Cloud Dataflow pipeline to stream data into the BigQuery table.
C. Use the MERGE statement to apply updates in batch every 60 seconds.
D. Use the INSERT statement to insert a batch of data every 60 seconds.
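
To get a feel for the data volume behind this question, the ingest rate can be estimated with simple arithmetic. This is only a rough sketch: it assumes each sensor emits exactly one row per minute (the question says "minute-resolution" but does not give a row count per reading), and the constant names are made up for illustration.

```python
# Rough ingest estimate for the scenario in Question 102.
# Assumption: one row per sensor per minute (not stated explicitly in the question).
SENSORS = 50_000
ROWS_PER_SENSOR_PER_MINUTE = 1

rows_per_minute = SENSORS * ROWS_PER_SENSOR_PER_MINUTE
rows_per_second = rows_per_minute / 60
rows_per_day = rows_per_minute * 60 * 24

print(f"rows per minute: {rows_per_minute:,}")
print(f"rows per second: {rows_per_second:,.1f}")
print(f"rows per day:    {rows_per_day:,}")
```

Under that assumption the table grows by tens of millions of rows per day before any of the expected growth, which is worth keeping in mind when weighing the four options.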

Question 103

Your company has a hybrid cloud initiative. You have a complex data pipeline that moves data between cloud provider services and leverages services from each of the cloud providers. Which cloud-native service should you use to orchestrate the entire pipeline?

A. Cloud Dataflow
B. Cloud Composer
C. Cloud Dataprep
D. Cloud Dataproc

Question 104

When you design a Google Cloud Bigtable schema, it is recommended that you _________.

A. Avoid schema designs that are based on NoSQL concepts
B. Create schema designs that are based on a relational database design
C. Avoid schema designs that require atomicity across rows
D. Create schema designs that require atomicity across rows

Question 105

You are running a pipeline in Cloud Dataflow that receives messages from a Cloud Pub/Sub topic and writes the results to a BigQuery dataset in the EU. Currently, your pipeline runs in europe-west4 with a maximum of 3 workers of instance type n1-standard-1. You notice that during peak periods your pipeline struggles to process records in a timely fashion, and all 3 workers are at maximum CPU utilization. Which two actions can you take to increase performance of your pipeline? (Choose two.)

A. Create a temporary table in Cloud Spanner that will act as a buffer for new data. Create a new step in your pipeline to write to this table first, and then create a new pipeline to write from Cloud Spanner to BigQuery
B. Use a larger instance type for your Cloud Dataflow workers
C. Create a temporary table in Cloud Bigtable that will act as a buffer for new data. Create a new step in your pipeline to write to this table first, and then create a new pipeline to write from Cloud Bigtable to BigQuery
D. Increase the number of max workers
E. Change the zone of your Cloud Dataflow pipeline to run in us-central1
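
For reference, the worker settings mentioned in the scenario (region, maximum worker count, instance type) are typically supplied to a Dataflow job as pipeline flags. The sketch below only builds such a flag list as plain strings; it does not launch anything. The helper name `dataflow_worker_flags` and the example values (10 workers, n1-standard-4) are hypothetical, while `--max_num_workers` and `--worker_machine_type` are flags of the Apache Beam Python SDK.

```python
def dataflow_worker_flags(region, max_workers, machine_type):
    """Build Beam Python SDK style flags for Dataflow worker sizing.

    Hypothetical helper for illustration; it returns strings only.
    """
    if max_workers < 1:
        raise ValueError("max_workers must be at least 1")
    return [
        "--runner=DataflowRunner",
        f"--region={region}",
        f"--max_num_workers={max_workers}",
        f"--worker_machine_type={machine_type}",
    ]


# Setup described in the question: europe-west4, up to 3 n1-standard-1 workers.
current = dataflow_worker_flags("europe-west4", 3, "n1-standard-1")

# Illustrative alternative values (assumed, not taken from the question).
scaled = dataflow_worker_flags("europe-west4", 10, "n1-standard-4")

print(" ".join(current))
print(" ".join(scaled))
```

A list like this could be passed to a pipeline's options object or appended to the launch command; how the values are chosen depends on the workload.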
