Free Access to Google.Professional-Data-Engineer.v2022-10-14.q166 with Valid Practice Test (Page 24)

Question 111

You are designing storage for 20 TB of text files as part of deploying a data pipeline on Google Cloud.
Your input data is in CSV format. You want to minimize the cost of querying aggregate values for multiple users who will query the data in Cloud Storage with multiple engines. Which storage service and schema design should you use?

A. Use Cloud Storage for storage. Link as temporary tables in BigQuery for query.
B. Use Cloud Bigtable for storage. Link as permanent tables in BigQuery for query.
C. Use Cloud Storage for storage. Link as permanent tables in BigQuery for query.
D. Use Cloud Bigtable for storage. Install the HBase shell on a Compute Engine instance to query the Cloud Bigtable data.

Question 112

You operate a database that stores stock trades and an application that retrieves average stock price for a given company over an adjustable window of time. The data is stored in Cloud Bigtable where the datetime of the stock trade is the beginning of the row key. Your application has thousands of concurrent users, and you notice that performance is starting to degrade as more stocks are added. What should you do to improve the performance of your application?

A. Change the data pipeline to use BigQuery for storing stock trades, and update your application.
B. Use Cloud Dataflow to write a summary of each day's stock trades to an Avro file on Cloud Storage. Update your application to read from Cloud Storage and Cloud Bigtable to compute the responses.
C. Change the row key syntax in your Cloud Bigtable table to begin with a random number per second.
D. Change the row key syntax in your Cloud Bigtable table to begin with the stock symbol.
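
The row-key layouts discussed in this question can be compared with a small sketch. It only models Cloud Bigtable's lexicographic row ordering by sorting plain Python strings; the key format, the `#` separator, the helper names, and the sample trades are all assumptions made for illustration and are not part of the question.

```python
# Cloud Bigtable keeps rows sorted lexicographically by row key.
# Here sorted() on plain strings stands in for that ordering.

# Hypothetical sample trades: (timestamp, stock symbol)
trades = [
    ("20221014093000", "GOOG"),
    ("20221014093000", "AAPL"),
    ("20221014093001", "GOOG"),
    ("20221014093001", "AAPL"),
    ("20221014093002", "GOOG"),
]


def timestamp_first_key(ts, symbol):
    # Layout described in the question: datetime at the start of the key.
    return f"{ts}#{symbol}"


def symbol_first_key(ts, symbol):
    # Alternative layout: stock symbol at the start of the key.
    return f"{symbol}#{ts}"


timestamp_first = sorted(timestamp_first_key(ts, s) for ts, s in trades)
symbol_first = sorted(symbol_first_key(ts, s) for ts, s in trades)


def is_contiguous(keys, symbol):
    """True if all rows for `symbol` sit next to each other in sorted order."""
    positions = [i for i, k in enumerate(keys) if symbol in k.split("#")]
    return positions == list(range(positions[0], positions[-1] + 1))
```

In this toy data, rows for one symbol are interleaved with other symbols when the timestamp leads the key, and form a single contiguous range when the symbol leads it. That affects how much of the table a per-company time-window read has to scan.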

Question 113

You are building a model to make clothing recommendations. You know a user's fashion preference is likely to change over time, so you build a data pipeline to stream new data back to the model as it becomes available.
How should you use this data to train the model?

A. Continuously retrain the model on just the new data.
B. Continuously retrain the model on a combination of existing data and the new data.
C. Train on the existing data while using the new data as your test set.
D. Train on the new data while using the existing data as your test set.

Question 114

The marketing team at your organization provides regular updates of a segment of your customer dataset. The marketing team has given you a CSV with 1 million records that must be updated in BigQuery. When you use the UPDATE statement in BigQuery, you receive a quotaExceeded error. What should you do?

A. Split the source CSV file into smaller CSV files in Cloud Storage to reduce the number of BigQuery UPDATE DML statements per BigQuery job.
B. Import the new records from the CSV file into a new BigQuery table. Create a BigQuery job that merges the new records with the existing records and writes the results to a new BigQuery table.
C. Increase the BigQuery UPDATE DML statement limit in the Quota management section of the Google Cloud Platform Console.
D. Reduce the number of records updated each day to stay within the BigQuery UPDATE DML statement limit.
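
The difference between issuing one update per record and merging a staged batch in a single pass can be sketched in plain Python. This is not BigQuery code and uses no BigQuery API; the `merge_records` helper, the `customer_id` key, and the sample rows are hypothetical and exist only to illustrate the merge semantics.

```python
def merge_records(existing, updates, key="customer_id"):
    """Return a new list of rows: existing rows overwritten by updates that
    share the same key, plus update rows whose key was not present."""
    merged = {row[key]: dict(row) for row in existing}
    for row in updates:
        # Overwrite matching rows, or add the row if the key is new.
        merged.setdefault(row[key], {}).update(row)
    return list(merged.values())


# Hypothetical existing table and staged update batch.
existing = [
    {"customer_id": 1, "segment": "A"},
    {"customer_id": 2, "segment": "B"},
]
updates = [
    {"customer_id": 2, "segment": "C"},
    {"customer_id": 3, "segment": "A"},
]

# One pass over the whole batch produces the new table contents.
result = merge_records(existing, updates)
```

The whole batch is applied in one operation and the inputs are left unmodified, which mirrors writing the merged output to a separate table rather than mutating rows one statement at a time.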

Question 115

You used Cloud Dataprep to create a recipe on a sample of data in a BigQuery table. You want to reuse this recipe on a daily upload of data with the same schema, after the load job with variable execution time completes. What should you do?

A. Create a cron schedule in Cloud Dataprep.
B. Export the recipe as a Cloud Dataprep template, and create a job in Cloud Scheduler.
C. Export the Cloud Dataprep job as a Cloud Dataflow template, and incorporate it into a Cloud Composer job.
D. Create an App Engine cron job to schedule the execution of the Cloud Dataprep job.
