You need to store and analyze social media postings in Google BigQuery at a rate of 10,000 messages per minute in near real-time. Initially, design the application to use streaming inserts for individual postings. Your application also performs data aggregations right after the streaming inserts. You discover that the queries after streaming inserts do not exhibit strong consistency, and reports from the queries might miss in-flight data.
How can you adjust your application design?
You work for a large real estate firm and are preparing 6 TB of home sales data to be used for machine learning. You will use SQL to transform the data and use BigQuery ML to create a machine learning model. You plan to use the model for predictions against a raw dataset that has not been transformed. How should you set up your workflow in order to prevent skew at prediction time?
You are developing an application on Google Cloud that will automatically generate subject labels for users' blog posts. You are under competitive pressure to add this feature quickly, and you have no additional developer resources. No one on your team has experience with machine learning. What should you do?
You have spent a few days loading data from comma-separated values (CSV) files into the Google
BigQuery table CLICK_STREAM. The column DTstores the epoch time of click events. For convenience,
you chose a simple schema where every field is treated as the STRINGtype. Now, you want to compute
web session durations of users who visit your site, and you want to change its data type to the
TIMESTAMP. You want to minimize the migration effort without making future queries computationally
expensive. What should you do?
Your company uses a proprietary system to send inventory data every 6 hours to a data ingestion service
in the cloud. Transmitted data includes a payload of several fields and the timestamp of the transmission. If
there are any concerns about a transmission, the system re-transmits the data. How should you
deduplicate the data most efficiency?
Enter your email address to download Google.Professional-Data-Engineer.v2022-05-18.q125 Dumps