Your project involves integrating Spark with a NoSQL database, MongoDB. You need to write a DataFrame 'df' into a MongoDB collection named 'orders'. Which PySpark code snippet correctly achieves this?
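One plausible snippet, sketched against the MongoDB Spark Connector's v10+ API; the connection URI and database name are placeholders, and the connector JAR being on the cluster classpath is an assumption:

```python
# Sketch: write an existing DataFrame `df` into the 'orders' collection.
# Assumes the MongoDB Spark Connector (org.mongodb.spark:mongo-spark-connector)
# is available; URI and database name below are hypothetical.
(df.write
   .format("mongodb")                                        # data source name in connector v10+
   .mode("append")                                           # add documents, don't truncate
   .option("connection.uri", "mongodb://localhost:27017")    # placeholder URI
   .option("database", "mydb")                               # placeholder database
   .option("collection", "orders")
   .save())
```

Note that older 3.x versions of the connector use `.format("mongo")` and `spark.mongodb.output.*` options instead, so the exact answer depends on the connector version in use.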
For a Hive table that is both partitioned and bucketed, what considerations must be taken into account to optimize a join query involving this table?
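The two main considerations are partition pruning (filter on the partition column so only the relevant partitions are scanned) and bucket alignment (join on the bucketing column, with compatible bucket counts on both sides, so the engine can avoid a full shuffle). A minimal sketch, assuming a Hive-enabled `SparkSession` named `spark` and hypothetical tables `sales` (partitioned by `ds`, bucketed by `customer_id`) and `customers` (bucketed by `customer_id` into the same number of buckets):

```python
# Sketch only: table names, columns, and the partition value are hypothetical.
joined = spark.sql("""
    SELECT s.*, c.name
    FROM sales s
    JOIN customers c
      ON s.customer_id = c.customer_id   -- join key matches the bucket column
    WHERE s.ds = '2024-01-01'            -- predicate on the partition column
                                         -- enables partition pruning
""")
```

If the bucket counts differ and are not multiples of each other, or the join key is not the bucketing column, the bucketing gives no benefit and the join degrades to an ordinary shuffled join.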
Which command line tool is essential for interacting with Cloudera's Hadoop ecosystem for file operations?
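The tool in question is the `hdfs dfs` command (the HDFS file system shell). A few representative operations, with placeholder paths:

```shell
# HDFS file system shell basics; all paths are placeholders.
hdfs dfs -ls /user/data               # list a directory
hdfs dfs -put local.csv /user/data/   # upload a local file into HDFS
hdfs dfs -get /user/data/out.csv .    # download a file from HDFS
hdfs dfs -rm -r /user/data/tmp        # recursively delete a directory
```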
Which approach can help mitigate issues with schema inference for complex data types in a big data environment?
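One common approach is to skip inference entirely and declare an explicit schema. A sketch in PySpark, where the field names and input path are hypothetical:

```python
from pyspark.sql.types import (StructType, StructField, StringType,
                               IntegerType, ArrayType, MapType)

# Sketch: declaring nested/complex types up front instead of inferring them.
# Field names and the input path are hypothetical; `spark` is an existing
# SparkSession.
schema = StructType([
    StructField("order_id", StringType(), nullable=False),
    StructField("items", ArrayType(StructType([        # array of structs
        StructField("sku", StringType()),
        StructField("qty", IntegerType()),
    ]))),
    StructField("attributes", MapType(StringType(), StringType())),  # string map
])

# Passing the schema means Spark does not sample the data to infer types.
df = spark.read.schema(schema).json("/data/orders/*.json")
```

An explicit schema avoids both the extra pass over the data that inference requires and the inconsistent types inference can produce when complex fields vary across records.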
In Apache Airflow, what is the purpose of setting max_active_runs in a DAG's configuration?
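`max_active_runs` caps how many runs of that one DAG may execute concurrently, which matters for backfills or catch-up scheduling where overlapping runs could collide. A minimal sketch (DAG id and schedule are placeholders; the `schedule` keyword assumes Airflow 2.4+):

```python
from datetime import datetime
from airflow import DAG

# Sketch: with max_active_runs=1, queued catch-up runs execute one at a
# time instead of in parallel. DAG id and dates are hypothetical.
with DAG(
    dag_id="nightly_etl",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=True,
    max_active_runs=1,   # at most one active run of this DAG at once
) as dag:
    pass  # tasks would be defined here
```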