If a Spark Driver pod in Kubernetes is reaching its CPU limit and experiencing performance issues, what is the most appropriate first action?
You're designing a schema for an Iceberg table that will store time-series sensor data. Which of the following considerations is most important for optimal query performance and storage efficiency?
You have deployed a Spark application on Kubernetes, which is experiencing intermittent failures. To improve fault tolerance, you decide to implement checkpointing. Which of the following is the best approach to add checkpointing in a PySpark application?
If you want to set a minimum and maximum number of Executor pods for a Spark application in Kubernetes, which pair of PySpark configuration settings would you use?
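As an illustration of how executor-pod bounds are usually expressed, here is a minimal sketch. It assumes Spark 3.0+ on Kubernetes with dynamic allocation, where shuffle tracking stands in for the external shuffle service that Kubernetes does not provide. The helper name `to_submit_args` and the example bounds (2 and 10) are hypothetical. The code only builds the `--conf` arguments for `spark-submit` and does not start Spark.

```python
# Minimal sketch: Spark properties that bound the executor-pod count via dynamic allocation.
# The helper name and the example values (2 and 10) are hypothetical.


def to_submit_args(conf):
    """Flatten a dict of Spark properties into spark-submit --conf arguments."""
    args = []
    for key, value in conf.items():
        args.extend(["--conf", f"{key}={value}"])
    return args


DYNAMIC_ALLOCATION_CONF = {
    "spark.dynamicAllocation.enabled": "true",
    # No external shuffle service on Kubernetes, so shuffle tracking is enabled instead (Spark 3.0+).
    "spark.dynamicAllocation.shuffleTracking.enabled": "true",
    "spark.dynamicAllocation.minExecutors": "2",   # lower bound on executor pods
    "spark.dynamicAllocation.maxExecutors": "10",  # upper bound on executor pods
}

if __name__ == "__main__":
    print(" ".join(to_submit_args(DYNAMIC_ALLOCATION_CONF)))
```

In PySpark code, you can pass the same key/value pairs to `SparkSession.builder.config(key, value)` before calling `getOrCreate()`.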
What does setting the Spark configuration parameter 'spark.sql.shuffle.partitions' impact?
A. The default level of parallelism for joins and aggregations