Free Access to Google.Professional-Data-Engineer.v2022-10-14.q166 with Valid Practice Test (Page 27)

Question 126

You work for a car manufacturer and have set up a data pipeline using Google Cloud Pub/Sub to capture anomalous sensor events. You are using a push subscription in Cloud Pub/Sub that calls a custom HTTPS endpoint that you have created to take action on these anomalous events as they occur. Your custom HTTPS endpoint keeps receiving an inordinate number of duplicate messages. What is the most likely cause of these duplicate messages?

A.The message body for the sensor event is too large.
B.Your custom endpoint has an out-of-date SSL certificate.
C.The Cloud Pub/Sub topic has too many messages published to it.
D.Your custom endpoint is not acknowledging messages within the acknowledgement deadline.
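
For background on the scenario above: a push subscription treats a delivery as acknowledged only when the endpoint returns a success HTTP status (for example 200 or 204) before the acknowledgement deadline; otherwise Pub/Sub sends the message again. The sketch below is a minimal illustration, not a production handler. The names `handle_push`, `process_event`, and `processed_events` are hypothetical, the in-memory set stands in for a durable deduplication store, and the function only models the status code that a web framework would return.

```python
import base64
import json

# Hypothetical in-memory state; a real endpoint would use a durable store.
_seen_message_ids = set()
processed_events = []


def process_event(payload: bytes) -> None:
    """Hypothetical business logic: here it only records the payload."""
    processed_events.append(payload)


def handle_push(body: bytes) -> int:
    """Return the HTTP status to send back for one Pub/Sub push request.

    The push body is a JSON envelope whose "message" object carries
    base64-encoded "data" and a "messageId".
    """
    try:
        envelope = json.loads(body)
        message = envelope["message"]
        message_id = message["messageId"]
        payload = base64.b64decode(message.get("data", ""))
    except (ValueError, KeyError, TypeError, AttributeError):
        # A non-success status means the message is not acknowledged,
        # so Pub/Sub will attempt delivery again.
        return 400

    # Delivery is at-least-once, so skip messages already handled.
    if message_id not in _seen_message_ids:
        _seen_message_ids.add(message_id)
        process_event(payload)

    # A success status acknowledges the message.
    return 204
```

Keeping the work done before the response short (or handing it to a background worker) is what lets the endpoint answer within the deadline; the deduplication check only limits the impact of redeliveries that still occur.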

Question 127

You decided to use Cloud Datastore to ingest vehicle telemetry data in real time. You want to build a storage system that will account for the long-term data growth, while keeping the costs low. You also want to create snapshots of the data periodically, so that you can make a point-in-time (PIT) recovery, or clone a copy of the data for Cloud Datastore in a different environment. You want to archive these snapshots for a long time.
Which two methods can accomplish this? Choose 2 answers.

A.Use managed export, and store the data in a Cloud Storage bucket using Nearline or Coldline class.
B.Use managed export, and then import to Cloud Datastore in a separate project under a unique namespace reserved for that export.
C.Use managed export, and then import the data into a BigQuery table created just for that export, and delete temporary export files.
D.Write an application that uses Cloud Datastore client libraries to read all the entities. Treat each entity as a BigQuery table row via BigQuery streaming insert. Assign an export timestamp for each export, and attach it as an extra column for each row. Make sure that the BigQuery table is partitioned using the export timestamp column.
E.Write an application that uses Cloud Datastore client libraries to read all the entities. Format the exported data into a JSON file. Apply compression before storing the data in Cloud Source Repositories.

Question 128

If a dataset contains rows with individual people and columns for year of birth, country, and income, how many of the columns are continuous and how many are categorical?

A.1 continuous and 2 categorical
B.3 categorical
C.3 continuous
D.2 continuous and 1 categorical

Question 129

Your company produces 20,000 files every hour. Each data file is formatted as a comma-separated values (CSV) file that is less than 4 KB. All files must be ingested on Google Cloud Platform before they can be processed. Your company site has a 200 ms latency to Google Cloud, and your Internet connection bandwidth is limited to 50 Mbps. You currently deploy a secure FTP (SFTP) server on a virtual machine in Google Compute Engine as the data ingestion point. A local SFTP client runs on a dedicated machine to transmit the CSV files as is. The goal is to make reports with data from the previous day available to the executives by 10:00 a.m. each day. This design is barely able to keep up with the current volume, even though the bandwidth utilization is rather low. You are told that due to seasonality, your company expects the number of files to double for the next three months. Which two actions should you take? (Choose two.)

A.Assemble 1,000 files into a tape archive (TAR) file. Transmit the TAR files instead, and disassemble the CSV files in the cloud upon receiving them.
B.Create an S3-compatible storage endpoint in your network, and use Google Cloud Storage Transfer Service to transfer on-premises data to the designated storage bucket.
C.Contact your internet service provider (ISP) to increase your maximum bandwidth to at least 100 Mbps.
D.Redesign the data ingestion process to use gsutil tool to send the CSV files to a storage bucket in parallel.
E.Introduce data compression for each file to increase the rate of file transfer.
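
The figures in the question can be checked with some rough arithmetic. The sketch below rests on stated assumptions rather than on measurements: each file is taken at its 4 KB upper bound (4 × 1024 bytes), 50 Mbps is read as 50,000,000 bits per second, and a sequential transfer is assumed to cost at least one 200 ms round trip per file. The function names are illustrative only.

```python
FILES_PER_HOUR = 20_000
MAX_FILE_BYTES = 4 * 1024            # assumed upper bound per file
LINK_BITS_PER_SECOND = 50_000_000    # 50 Mbps, assuming 10^6 bits per Mb
ROUND_TRIP_MS = 200


def bandwidth_utilization(files_per_hour: int) -> float:
    """Fraction of the link used if only payload bytes are counted."""
    bits_per_second = files_per_hour * MAX_FILE_BYTES * 8 / 3600
    return bits_per_second / LINK_BITS_PER_SECOND


def sequential_latency_seconds(files_per_hour: int,
                               round_trips_per_file: int = 1) -> float:
    """Time spent waiting on round trips when files go one at a time."""
    return files_per_hour * round_trips_per_file * ROUND_TRIP_MS / 1000


if __name__ == "__main__":
    for volume in (FILES_PER_HOUR, 2 * FILES_PER_HOUR):
        print(
            f"{volume} files/hour: "
            f"{bandwidth_utilization(volume):.2%} of the link, "
            f"{sequential_latency_seconds(volume):.0f} s of round-trip "
            f"waiting per 3600 s hour"
        )
```

Under these assumptions the payload uses well under 1% of the link, while one-at-a-time transfers spend more than an hour of round-trip waiting on each hour of files, which matches the question's remark that the design barely keeps up despite low bandwidth utilization.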

Question 130

Your company currently runs a large on-premises cluster using Spark, Hive, and Hadoop Distributed File System (HDFS) in a colocation facility. The cluster is designed to support peak usage on the system; however, many jobs are batch in nature, and usage of the cluster fluctuates quite dramatically.
Your company is eager to move to the cloud to reduce the overhead associated with on-premises infrastructure and maintenance and to benefit from the cost savings. They are also hoping to modernize their existing infrastructure to use more serverless offerings in order to take advantage of the cloud. Because of the timing of their contract renewal with the colocation facility, they have only 2 months for their initial migration. How should you recommend they approach their upcoming migration strategy so they can maximize their cost savings in the cloud while still executing the migration in time?

A.Migrate the Spark workload to Dataproc plus HDFS, and modernize the Hive workload for BigQuery
B.Migrate the workloads to Dataproc plus HDFS, modernize later
C.Modernize the Spark workload for Dataflow and the Hive workload for BigQuery
D.Migrate the workloads to Dataproc plus Cloud Storage, modernize later
