Free Access to Google.Professional-Data-Engineer.v2024-01-19.q177 with Valid Practice Test (Page 16)

Question 71

What are two methods that can be used to denormalize tables in BigQuery?

A.1) Split table into multiple tables; 2) Use a partitioned table
B.1) Join tables into one table; 2) Use nested repeated fields
C.1) Use a partitioned table; 2) Join tables into one table
D.1) Use nested repeated fields; 2) Use a partitioned table
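
For readers unfamiliar with the terms in the options above, here is a minimal sketch in plain Python (not BigQuery itself) of what "join tables into one table" and "nested repeated fields" look like as data shapes. The `orders` and `items` tables and all of their column names are hypothetical, made up for illustration only.

```python
from collections import defaultdict

# Two normalized tables (hypothetical example data).
orders = [
    {"order_id": 1, "customer": "acme"},
    {"order_id": 2, "customer": "globex"},
]
items = [
    {"order_id": 1, "sku": "A", "qty": 2},
    {"order_id": 1, "sku": "B", "qty": 1},
    {"order_id": 2, "sku": "A", "qty": 5},
]

# Shape 1: join into one flat table. Parent columns repeat on every child row.
orders_by_id = {o["order_id"]: o for o in orders}
flat = [
    {**orders_by_id[i["order_id"]], "sku": i["sku"], "qty": i["qty"]}
    for i in items
]

# Shape 2: nested repeated field. One row per order, child rows kept
# as a list of records inside the parent row.
grouped = defaultdict(list)
for i in items:
    grouped[i["order_id"]].append({"sku": i["sku"], "qty": i["qty"]})
nested = [{**o, "items": grouped[o["order_id"]]} for o in orders]

print(len(flat), "flat rows;", len(nested), "nested rows")
```

The flat shape has one row per item, while the nested shape keeps one row per order with its items attached.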

Question 72

You operate a logistics company, and you want to improve event delivery reliability for vehicle-based sensors. You operate small data centers around the world to capture these events, but leased lines that provide connectivity from your event collection infrastructure to your event processing infrastructure are unreliable, with unpredictable latency. You want to address this issue in the most cost-effective way. What should you do?

A.Deploy small Kafka clusters in your data centers to buffer events.
B.Have the data acquisition devices publish data to Cloud Pub/Sub.
C.Establish a Cloud Interconnect between all remote data centers and Google.
D.Write a Cloud Dataflow pipeline that aggregates all data in session windows.

Question 73

You want to rebuild your batch pipeline for structured data on Google Cloud. You are using PySpark to conduct data transformations at scale, but your pipelines are taking over twelve hours to run. To expedite development and pipeline run time, you want to use a serverless tool and SQL syntax. You have already moved your raw data into Cloud Storage. How should you build the pipeline on Google Cloud while meeting speed and processing requirements?

A.Ingest your data into Cloud SQL, convert your PySpark commands into SparkSQL queries to transform the data, and then use federated queries from BigQuery for machine learning.
B.Convert your PySpark commands into SparkSQL queries to transform the data, and then run your pipeline on Dataproc to write the data into BigQuery.
C.Use Apache Beam Python SDK to build the transformation pipelines, and write the data into BigQuery.
D.Ingest your data into BigQuery from Cloud Storage, convert your PySpark commands into BigQuery SQL queries to transform the data, and then write the transformations to a new table.

Question 74

After migrating ETL jobs to run on BigQuery, you need to verify that the output of the migrated jobs is the same as the output of the original. You've loaded a table containing the output of the original job and want to compare the contents with output from the migrated job to show that they are identical. The tables do not contain a primary key column that would enable you to join them together for comparison.
What should you do?

A.Select random samples from the tables using the HASH() function and compare the samples.
B.Select random samples from the tables using the RAND() function and compare the samples.
C.Use a Dataproc cluster and the BigQuery Hadoop connector to read the data from each table and calculate a hash from non-timestamp columns of the table after sorting. Compare the hashes of each table.
D.Create stratified random samples using the OVER() function and compare equivalent samples from each table.
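
One of the options above mentions hashing the non-timestamp columns of each table after sorting. As a rough illustration of that idea only, here is a minimal sketch in plain Python rather than on Dataproc or BigQuery. The function name `table_fingerprint`, the `skip_columns` parameter, and the sample rows are all hypothetical.

```python
import hashlib

def table_fingerprint(rows, skip_columns=()):
    """Hash a table's contents so the result does not depend on row order.

    Each row is reduced to a canonical string (columns in sorted order,
    skipped columns dropped), the strings are sorted, and the sorted
    sequence is fed to SHA-256. Duplicate rows are preserved.
    """
    canonical = sorted(
        repr(tuple((col, row[col]) for col in sorted(row) if col not in skip_columns))
        for row in rows
    )
    digest = hashlib.sha256()
    for line in canonical:
        digest.update(line.encode("utf-8"))
        digest.update(b"\n")
    return digest.hexdigest()

# Hypothetical outputs: same data, different row order and load timestamps.
original = [
    {"sku": "A", "qty": 2, "loaded_at": "2024-01-01"},
    {"sku": "B", "qty": 1, "loaded_at": "2024-01-01"},
]
migrated = [
    {"sku": "B", "qty": 1, "loaded_at": "2024-01-19"},
    {"sku": "A", "qty": 2, "loaded_at": "2024-01-19"},
]

same = (
    table_fingerprint(original, skip_columns=("loaded_at",))
    == table_fingerprint(migrated, skip_columns=("loaded_at",))
)
print("identical ignoring timestamps:", same)
```

Sorting before hashing is what makes the comparison work without a primary key, and dropping the timestamp column keeps load-time differences from changing the hash.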

Question 75

You need to move 2 PB of historical data from an on-premises storage appliance to Cloud Storage within six months, and your outbound network capacity is constrained to 20 Mb/sec. How should you migrate this data to Cloud Storage?

A.Use Transfer Appliance to copy the data to Cloud Storage
B.Use gsutil cp -J to compress the content being uploaded to Cloud Storage
C.Create a private URL for the historical data, and then use Storage Transfer Service to copy the data to Cloud Storage
D.Use trickle or ionice along with gsutil cp to limit the amount of bandwidth gsutil utilizes to less than 20 Mb/sec so it does not interfere with the production traffic
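
A quick arithmetic check of the figures in this question, as a rough sketch. It assumes decimal units (1 PB = 10^15 bytes, 1 Mb = 10^6 bits) and a link that sustains its full 20 Mb/sec with no protocol overhead. The function name `transfer_days` is illustrative.

```python
def transfer_days(size_bytes: float, link_bits_per_sec: float) -> float:
    """Days needed to move size_bytes over a link running at full rate."""
    seconds = size_bytes * 8 / link_bits_per_sec
    return seconds / 86_400  # seconds per day

# Assumed decimal units: 2 PB = 2 * 10**15 bytes, 20 Mb/sec = 20 * 10**6 bits/sec.
days = transfer_days(2 * 10**15, 20 * 10**6)
print(f"{days:,.0f} days (~{days / 365:.1f} years)")
# prints "9,259 days (~25.4 years)"
```

Under these assumptions, a network-only transfer would take decades, compared with the six-month window (about 183 days) stated in the question.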
