In a PySpark application, you're writing a function that reads a CSV file and shows the first few rows. Which of the following code snippets correctly accomplishes this task?
You're working with a large dataset stored in multiple Parquet files across different HDFS directories. How can you efficiently load and process this data using Spark, ensuring data locality and minimizing shuffle operations?
In a Kubernetes environment, you want to restrict inbound traffic to your Spark application pods so that only pods in a specific namespace can reach them. Which Kubernetes feature would you use to implement this?
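For context, the kind of restriction described in this question is commonly expressed as a NetworkPolicy with a namespace selector. The following is a minimal sketch, not a definitive configuration: it builds the manifest as a plain Python dictionary and serializes it to JSON. The function name `namespace_ingress_policy`, the policy name, the namespace names `spark` and `analytics`, and the pod label `app: spark-app` are all hypothetical. It also assumes the cluster's network plugin enforces NetworkPolicy and that namespaces carry the automatically assigned `kubernetes.io/metadata.name` label.

```python
import json


def namespace_ingress_policy(policy_name, target_namespace, pod_labels, allowed_namespace):
    """Build a NetworkPolicy manifest that admits ingress from a single namespace.

    All argument values are supplied by the caller; nothing here is specific
    to any particular cluster.
    """
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": policy_name, "namespace": target_namespace},
        "spec": {
            # Pods the policy applies to (the Spark application pods).
            "podSelector": {"matchLabels": pod_labels},
            # Only ingress is restricted; egress is left untouched.
            "policyTypes": ["Ingress"],
            "ingress": [
                {
                    "from": [
                        {
                            # Allow traffic only from pods in the named namespace.
                            "namespaceSelector": {
                                "matchLabels": {
                                    "kubernetes.io/metadata.name": allowed_namespace
                                }
                            }
                        }
                    ]
                }
            ],
        },
    }


# Hypothetical values, chosen only for illustration.
policy = namespace_ingress_policy(
    policy_name="spark-allow-from-analytics",
    target_namespace="spark",
    pod_labels={"app": "spark-app"},
    allowed_namespace="analytics",
)

# JSON manifests are accepted wherever YAML manifests are.
manifest_json = json.dumps(policy, indent=2)
print(manifest_json)
```

Because the policy selects the Spark pods and lists only one ingress source, traffic from pods in any other namespace would be dropped once the policy is applied.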
You have deployed a Spark application on Kubernetes, and it is experiencing intermittent failures. To improve fault tolerance, you decide to implement checkpointing. Which of the following is the best approach to add checkpointing to a PySpark application?
You've discovered that a production Iceberg table has several corrupted data files. Which of the following actions could help address this issue?