URL: https://docs.datadoghq.com/data_observability/jobs_monitoring/kubernetes/

Data Observability: Jobs Monitoring for Spark on Kubernetes



Data Observability: Jobs Monitoring gives visibility into the performance and reliability of Apache Spark applications on Kubernetes.

Setup

Data Observability: Jobs Monitoring requires Datadog Agent version 7.64.0 or later, and Java tracer version 1.38.0 or later.
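If your deployment scripts pin the Agent or tracer version, a small pre-flight check can catch a version below these minimums before anything is deployed. The following is a minimal sketch, not part of Datadog's tooling: `version_at_least` is a hypothetical helper, and it assumes a `sort` that supports `-V` (version sort), as GNU coreutils does.

```shell
# Hypothetical helper: succeeds when version $1 is greater than or equal to version $2.
# Assumes `sort -V` (version sort) is available.
version_at_least() {
  # Sort the two versions; if the smaller one is the minimum, the check passes.
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Example values; replace with the versions you actually pin.
AGENT_VERSION="7.64.0"
TRACER_VERSION="1.38.0"

version_at_least "$AGENT_VERSION" "7.64.0" || echo "Agent version is too old"
version_at_least "$TRACER_VERSION" "1.38.0" || echo "Java tracer version is too old"
```

This check applies only to pinned numeric versions. A floating tag such as latest cannot be compared this way.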

Follow these steps to enable Data Observability: Jobs Monitoring for Spark on Kubernetes.

  1. Install the Datadog Agent on your Kubernetes cluster.
  2. Enable Single Step Instrumentation.

Install the Datadog Agent on your Kubernetes cluster

If you have already installed the Datadog Agent on your Kubernetes cluster, make sure you’ve enabled the Datadog Admission Controller. You can then go to the next step, Enable Single Step Instrumentation.

You can install the Datadog Agent using the Datadog Operator or Helm.

Prerequisites

Installation

Using the Datadog Operator:

  1. Install the Datadog Operator by running the following commands:

    helm repo add datadog https://helm.datadoghq.com
    helm install my-datadog-operator datadog/datadog-operator
    
  2. Create a Kubernetes Secret to store your Datadog API key.

    kubectl create secret generic datadog-secret --from-literal api-key=<DATADOG_API_KEY>
    

    Replace <DATADOG_API_KEY> with your Datadog API key.

  3. Create a file, datadog-agent.yaml, that contains the following configuration:

    kind: DatadogAgent
    apiVersion: datadoghq.com/v2alpha1
    metadata:
      name: datadog
    spec:
      features:
        apm:
          enabled: true
          hostPortConfig:
            enabled: true
            hostPort: 8126
        admissionController:
          enabled: true
          mutateUnlabelled: false
        # (Optional) Uncomment the next three lines to enable logs collection
        # logCollection:
        #   enabled: true
        #   containerCollectAll: true
      global:
        site: <DATADOG_SITE>
        credentials:
          apiSecret:
            secretName: datadog-secret
            keyName: api-key
      override:
        nodeAgent:
          image:
            tag: <DATADOG_AGENT_VERSION>

    Replace <DATADOG_SITE> with your Datadog site.

    Replace <DATADOG_AGENT_VERSION> with version 7.64.0 or later.

    Optional: Uncomment the logCollection section to collect application logs and correlate them with Spark job run traces. Once enabled, logs are collected from all discovered containers by default. See the Kubernetes log collection documentation for more details on the setup process.

  4. Deploy the Datadog Agent with the above configuration file:

    kubectl apply -f /path/to/your/datadog-agent.yaml
    
Using Helm:

  1. Create a Kubernetes Secret to store your Datadog API key.

    kubectl create secret generic datadog-secret --from-literal api-key=<DATADOG_API_KEY>
    

    Replace <DATADOG_API_KEY> with your Datadog API key.

  2. Create a file, datadog-values.yaml, that contains the following configuration:

    datadog:
      apiKeyExistingSecret: datadog-secret
      site: <DATADOG_SITE>
      apm:
        portEnabled: true
        port: 8126
      # (Optional) Uncomment the next three lines to enable logs collection
      # logs:
      #   enabled: true
      #   containerCollectAll: true
    agents:
      image:
        tag: <DATADOG_AGENT_VERSION>
    clusterAgent:
      admissionController:
        enabled: true
        mutateUnlabelled: false

    Replace <DATADOG_SITE> with your Datadog site.

    Replace <DATADOG_AGENT_VERSION> with version 7.64.0 or later.

    Optional: Uncomment the logs section to collect application logs and correlate them with Spark job run traces. Once enabled, logs are collected from all discovered containers by default. See the Kubernetes log collection documentation for more details on the setup process.

  3. Run the following command:

    helm install <RELEASE_NAME> \
     -f datadog-values.yaml \
     --set targetSystem=<TARGET_SYSTEM> \
     datadog/datadog
    
    • Replace <RELEASE_NAME> with your release name. For example, datadog-agent.

    • Replace <TARGET_SYSTEM> with the name of your OS. For example, linux or windows.

Enable Single Step Instrumentation

Single Step Instrumentation (SSI) injects the Java tracer into your Spark driver and executor pods at startup. It works regardless of whether your Spark driver runs in cluster mode (as a dedicated Kubernetes pod) or client mode (as a process inside your submitter pod; for example, an Airflow scheduler or worker).

Spark automatically sets spark-role: driver on driver pods and spark-role: executor on executor pods. In client mode, replace spark-role: driver with the labels that identify your submitter pod instead. To find those labels, run:

kubectl get pod <SUBMITTER_POD_NAME> -n <NAMESPACE> --show-labels
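The LABELS column printed by the command above is a comma-separated list of key=value pairs, while podSelector.matchLabels expects YAML key: value entries. As a convenience, the conversion can be sketched as below. `labels_to_match_labels` is a hypothetical helper, and the labels in the example call are made up for illustration.

```shell
# Hypothetical helper: turn "k1=v1,k2=v2" (as shown in the LABELS column)
# into YAML "key: value" lines that can be pasted under matchLabels.
labels_to_match_labels() {
  printf '%s\n' "$1" | tr ',' '\n' | while IFS='=' read -r key value; do
    if [ -n "$key" ]; then
      # Quote the value so YAML always reads it as a string.
      printf '%s: "%s"\n' "$key" "$value"
    fi
  done
}

# Example with made-up labels for an Airflow worker pod.
labels_to_match_labels "app=airflow,component=worker"
```

You usually want only a stable subset of the pod's labels in the selector. Generated labels, such as a pod template hash, change on every rollout and would stop matching.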

If you use the Datadog Operator, this step requires Datadog Operator version 1.13.0 or later.

Add the features.apm.instrumentation section to your datadog-agent.yaml and apply it:

features:
  apm:
    instrumentation:
      enabled: true
      enabledNamespaces:
        - <NAMESPACE> # namespace where your Spark jobs run
      targets:
        - name: spark-driver
          podSelector:
            matchLabels:
              spark-role: driver # replace with your submitter pod labels if running in client mode
          ddTraceVersions:
            java: "latest"
          ddTraceConfigs:
            - name: DD_DATA_JOBS_ENABLED
              value: "true"
        - name: spark-executor
          podSelector:
            matchLabels:
              spark-role: executor
          ddTraceVersions:
            java: "latest"
          ddTraceConfigs:
            - name: DD_DATA_JOBS_ENABLED
              value: "true"

kubectl apply -f /path/to/your/datadog-agent.yaml

If you use Helm, add the following to your datadog-values.yaml and apply it:

datadog:
  apm:
    instrumentation:
      enabled: true
      enabledNamespaces:
        - <NAMESPACE> # namespace where your Spark jobs run
      targets:
        - name: spark-driver
          podSelector:
            matchLabels:
              spark-role: driver # replace with your submitter pod labels if running in client mode
          ddTraceVersions:
            java: "latest"
          ddTraceConfigs:
            - name: DD_DATA_JOBS_ENABLED
              value: "true"
        - name: spark-executor
          podSelector:
            matchLabels:
              spark-role: executor
          ddTraceVersions:
            java: "latest"
          ddTraceConfigs:
            - name: DD_DATA_JOBS_ENABLED
              value: "true"

helm upgrade <RELEASE_NAME> datadog/datadog -f datadog-values.yaml

After applying the configuration, restart the targeted pods. SSI injects its init container when a pod starts, so pods that were already running are not instrumented until they restart.
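One way to check that a restarted pod was picked up is to list its init containers and look for one added by Datadog. The sketch below wraps a standard kubectl JSONPath query in a hypothetical helper, `list_init_containers`. The exact init container names depend on your Agent and tracer versions and are not specified here.

```shell
# Hypothetical helper: print the names of all init containers in a pod.
# $1 = pod name, $2 = namespace
list_init_containers() {
  kubectl get pod "$1" -n "$2" -o jsonpath='{.spec.initContainers[*].name}'
}
```

For example, `list_init_containers <DRIVER_POD_NAME> <NAMESPACE>` prints a space-separated list of names. If the list contains no Datadog init container, check that the pod's namespace and labels match the instrumentation targets.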

Validation

In Datadog, view the Data Observability: Jobs Monitoring page to see a list of all your data processing jobs.

Advanced Configuration

Set service, environment, and version tags

To attach service, environment, and version tags to your job traces, pass the following JVM options in your spark-submit configuration or spark-defaults.conf:

spark.driver.extraJavaOptions=-Ddd.service=<JOB_NAME> -Ddd.env=<ENV> -Ddd.version=<VERSION>
spark.executor.extraJavaOptions=-Ddd.service=<JOB_NAME> -Ddd.env=<ENV> -Ddd.version=<VERSION>
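Because the same three options go to both the driver and the executors, it can help to build the option string once and reuse it. The following is a minimal sketch: `dd_java_opts` is a hypothetical helper, and the job name, environment, and version are example values.

```shell
# Hypothetical helper: build the Datadog JVM options string.
# $1 = service (job name), $2 = environment, $3 = version
dd_java_opts() {
  printf '%s\n' "-Ddd.service=$1 -Ddd.env=$2 -Ddd.version=$3"
}

# Example values; replace with your own.
DD_OPTS="$(dd_java_opts nightly-etl prod 1.4.2)"
echo "spark.driver.extraJavaOptions=${DD_OPTS}"
echo "spark.executor.extraJavaOptions=${DD_OPTS}"

# When calling spark-submit, quote each --conf value so the spaces are preserved:
#   spark-submit \
#     --conf "spark.driver.extraJavaOptions=${DD_OPTS}" \
#     --conf "spark.executor.extraJavaOptions=${DD_OPTS}" \
#     <REST_OF_YOUR_ARGUMENTS>
```

If you already set extraJavaOptions for other purposes, append these options to your existing value. Setting the property a second time replaces the first value.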

Tag spans at runtime

You can set tags on Spark spans at runtime. These tags are applied only to spans that start after the tag is added.

// Add the tag to all subsequent Spark computations
sparkContext.setLocalProperty("spark.datadog.tags.key", "value")
spark.read.parquet(...)

To remove a runtime tag:

// Remove the tag from all subsequent Spark computations
sparkContext.setLocalProperty("spark.datadog.tags.key", null)
