In the context of Cloudera's Optimization Framework, what role does data statistics collection play?
A data engineer needs to query a table stored in Apache Hive using Spark SQL. Which of the following commands correctly retrieves data from a Hive table named 'sales_data'?
You have an Airflow DAG that includes tasks for data extraction, transformation, and loading. You notice that the transformation tasks are computationally intensive and are causing delays in the DAG's execution. To optimize performance, you decide to offload these tasks to a cloud-based service that can scale dynamically. Which approach ensures minimal changes to the DAG structure while integrating this optimization?
In Apache Airflow, how can you dynamically generate tasks for each table in your database that needs a quality check?
You want to track changes to an Iceberg table over time for auditing purposes. Which combination of Iceberg features would best support this?