Apache Spark guidelines
This article collects guidance for using Apache Spark on Azure HDInsight, organized by common questions. Each table points to the document that covers an option in detail.
How do I run or submit Spark jobs?
How do I monitor and debug Spark jobs?
| Option | Documents |
|---|---|
| Azure Toolkit for IntelliJ | Failure spark job debugging with Azure Toolkit for IntelliJ (preview) |
| Azure Toolkit for IntelliJ through SSH | Debug Apache Spark applications locally or remotely on an HDInsight cluster with Azure Toolkit for IntelliJ through SSH |
| Azure Toolkit for IntelliJ through VPN | Use Azure Toolkit for IntelliJ to debug Apache Spark applications remotely in HDInsight through VPN |
| Job graph on Apache Spark History Server | Use extended Apache Spark History Server to debug and diagnose Apache Spark applications |
How do I make my Spark jobs run more efficiently?
| Option | Documents |
|---|---|
| IO Cache | Improve performance of Apache Spark workloads using Azure HDInsight IO Cache (Preview) |
| Configuration options | Optimize Apache Spark jobs |
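As a rough illustration of what setting configuration options can look like in practice, the sketch below assembles a `spark-submit` command line with one `--conf` flag per setting. The helper `build_spark_submit`, the script name `my_job.py`, and the specific values are assumptions made for this example; they are not recommendations from the linked documents, and suitable values depend on the cluster and workload.

```python
import shlex


def build_spark_submit(app_path, conf, app_args=()):
    """Return a spark-submit argument list with one --conf per setting.

    Hypothetical helper for illustration; it only builds the command,
    it does not run it.
    """
    cmd = ["spark-submit"]
    # Sort keys so the generated command is stable across runs.
    for key in sorted(conf):
        cmd += ["--conf", f"{key}={conf[key]}"]
    cmd.append(app_path)
    cmd.extend(app_args)
    return cmd


# Illustrative values only; tune them for your cluster and workload.
example_conf = {
    "spark.executor.memory": "4g",
    "spark.executor.cores": "2",
    "spark.sql.shuffle.partitions": "200",
}

command = build_spark_submit("my_job.py", example_conf)
print(shlex.join(command))
```

Keeping the settings in a dictionary makes it easy to compare configurations between runs before deciding which ones to keep.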
How do I connect to other Azure Services?
| Option | Documents |
|---|---|
| Apache Hive on HDInsight | Integrate Apache Spark and Apache Hive with the Hive Warehouse Connector |
| Apache HBase on HDInsight | Use Apache Spark to read and write Apache HBase data |
| Apache Kafka on HDInsight | Tutorial: Use Apache Spark Structured Streaming with Apache Kafka on HDInsight |
| Azure Cosmos DB | Azure Synapse Link for Azure Cosmos DB |
What are my storage options?
| Option | Documents |
|---|---|
| Azure Data Lake Storage Gen2 | Use Azure Data Lake Storage Gen2 with Azure HDInsight clusters |
| Azure Blob Storage | Use Azure storage with Azure HDInsight clusters |
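Spark jobs typically address these two storage options through URIs: the `abfs://` scheme (or `abfss://` for the TLS variant) for Azure Data Lake Storage Gen2, and the `wasbs://` scheme for Azure Blob Storage. The sketch below shows how such URIs are composed. The helper names, the container `data`, and the account `myaccount` are hypothetical placeholders for this example.

```python
def adls_gen2_uri(container, account, path=""):
    """Build an abfs:// URI for Azure Data Lake Storage Gen2."""
    return f"abfs://{container}@{account}.dfs.core.windows.net/{path.lstrip('/')}"


def blob_uri(container, account, path=""):
    """Build a wasbs:// URI for Azure Blob Storage."""
    return f"wasbs://{container}@{account}.blob.core.windows.net/{path.lstrip('/')}"


# Hypothetical container and account names, for illustration only.
print(adls_gen2_uri("data", "myaccount", "/raw/events.csv"))
print(blob_uri("data", "myaccount", "raw/events.csv"))
```

A URI built this way can be passed wherever Spark expects a path, for example as the argument to a DataFrame reader, assuming the cluster has access to the storage account.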
