URL: https://docs.datadoghq.com/llm_observability/monitoring/metrics/

Agent Observability Metrics


After you instrument your application with Agent Observability, you can access Agent Observability metrics for use in dashboards and monitors. These metrics capture span counts, error counts, token usage, and latency measures for your LLM applications. These metrics are calculated based on 100% of the application’s traffic.

The ml_obs.* entries on this page are Datadog Metrics: numerical values that describe an aspect of your LLM application over time, derived from your LLM spans (counts, distributions of cost, tokens, latency, errors). They are 100%-sampled, follow standard Datadog metric retention (15 months at full granularity), and are queryable from dashboards, monitors, and notebooks like any other Datadog metric.

They are distinct from two other things in Agent Observability:
  • Per-span operational data (cost, tokens, latency, errors on each individual trace or span): the raw values these metrics roll up from. These values are stored with spans, follow Agent Observability trace retention, and are queried from the Traces explorer rather than as metrics.
  • Evaluation scores (also called "evals"): quality and safety judgments (for example, hallucination, faithfulness, custom LLM-as-a-judge) attached to individual spans or experiment rows. These are not derived from operational telemetry, and follow Agent Observability trace and experiment retention rather than Datadog metric retention.
Only the tags listed for each metric in the tables below are available on Agent Observability metrics. Other tags set on spans are not available as metric tags.
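
As an illustration of using these metrics in a dashboard or monitor, the sketch below assembles query strings in the standard Datadog metric query form (aggregator, metric name, tag scope, optional grouping). The helper name build_query and the tag values prod and my-chatbot are hypothetical; only the metric and tag names are taken from this page.

```python
def build_query(aggregator, metric, filters=None, group_by=None, as_count=False):
    """Assemble a Datadog-style metric query string.

    Hypothetical helper for illustration; it only formats text and does
    not call any Datadog API.
    """
    # Tag scope: "key:value" pairs joined by commas, or "*" for no filter.
    scope = ",".join(f"{k}:{v}" for k, v in sorted((filters or {}).items())) or "*"
    query = f"{aggregator}:{metric}{{{scope}}}"
    if group_by:
        query += " by {" + ",".join(group_by) + "}"
    if as_count:
        # .as_count() is typically applied to count-type metrics.
        query += ".as_count()"
    return query


# Span errors for one application (assumed tag values), split by span kind.
errors_by_kind = build_query(
    "sum",
    "ml_obs.span.error",
    filters={"ml_app": "my-chatbot", "env": "prod"},
    group_by=["span_kind"],
    as_count=True,
)
print(errors_by_kind)

# Average span duration across all applications.
print(build_query("avg", "ml_obs.span.duration"))
```

Because only the listed tags exist on each metric, filtering or grouping by any other span tag in a query like this would return no data.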

Span metrics

| Metric Name | Description | Metric Type | Tags |
| --- | --- | --- | --- |
| ml_obs.span | Total number of spans with a span kind | Count | env, error, ml_app, model_name, model_provider, service, span_kind, version |
| ml_obs.span.duration | Total duration of spans in seconds | Distribution | env, error, ml_app, model_name, model_provider, service, span_kind, version |
| ml_obs.span.error | Number of errors that occurred in the span | Count | env, error, ml_app, model_name, model_provider, service, span_kind, version |
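
One way to combine the two count metrics above is a rough error-rate ratio. This is a minimal sketch, assuming you have already retrieved the summed values of ml_obs.span.error and ml_obs.span for the same time window and tag scope; the function name and the sample numbers are hypothetical.

```python
def error_rate_percent(error_count, span_count):
    """Return errors as a percentage of spans, or 0.0 when there are no spans.

    Inputs are assumed to be sums of ml_obs.span.error and ml_obs.span
    over the same window and tags (illustrative values, not real data).
    """
    if span_count <= 0:
        return 0.0
    return 100.0 * error_count / span_count


# Hypothetical window: 5 errors out of 200 spans.
print(error_rate_percent(5, 200))  # 2.5
```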

LLM token metrics

| Metric Name | Description | Metric Type | Tags |
| --- | --- | --- | --- |
| ml_obs.span.llm.input.tokens | Number of tokens in the input sent to the LLM | Distribution | env, error, ml_app, model_name, model_provider, service, version, matched_model_name, matched_model_provider |
| ml_obs.span.llm.output.tokens | Number of tokens in the output | Distribution | env, error, ml_app, model_name, model_provider, service, version, matched_model_name, matched_model_provider |
| ml_obs.span.llm.output.reasoning.tokens | Number of reasoning tokens in the output | Distribution | env, error, ml_app, model_name, model_provider, service, version, matched_model_name, matched_model_provider |
| ml_obs.span.llm.prompt.tokens | Number of tokens used in the prompt | Distribution | env, error, ml_app, model_name, model_provider, service, version, matched_model_name, matched_model_provider |
| ml_obs.span.llm.completion.tokens | Tokens generated as a completion during the span | Distribution | env, error, ml_app, model_name, model_provider, service, version, matched_model_name, matched_model_provider |
| ml_obs.span.llm.total.tokens | Total tokens consumed during the span (input + output + prompt) | Distribution | env, error, ml_app, model_name, model_provider, service, version, matched_model_name, matched_model_provider |
| ml_obs.span.llm.input.cache_write.tokens | Number of input tokens written to the prompt cache in an LLM span | Distribution | env, error, ml_app, model_name, model_provider, service, version, matched_model_name, matched_model_provider |
| ml_obs.span.llm.input.cache_read.tokens | Number of input tokens served from the prompt cache in an LLM span | Distribution | env, error, ml_app, model_name, model_provider, service, version, matched_model_name, matched_model_provider |
| ml_obs.span.llm.input.non_cached.tokens | Number of input tokens that did not interact with the prompt cache in an LLM span | Distribution | env, error, ml_app, model_name, model_provider, service, version, matched_model_name, matched_model_provider |
| ml_obs.span.llm.input.characters | Number of characters in the input sent to the LLM | Distribution | env, error, ml_app, model_name, model_provider, service, version, matched_model_name, matched_model_provider |
| ml_obs.span.llm.output.characters | Number of characters in the output | Distribution | env, error, ml_app, model_name, model_provider, service, version, matched_model_name, matched_model_provider |
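
The three cache-related input metrics can be used to see how much of the input is served from the prompt cache. The sketch below assumes, as the descriptions suggest but this page does not state outright, that cache_read, cache_write, and non_cached tokens together account for all input tokens. The function name and sample totals are hypothetical.

```python
def input_token_shares(cache_read, cache_write, non_cached):
    """Return each category's share of input tokens as a fraction of the total.

    Assumption: the three categories partition the input tokens. Inputs
    are summed token counts over one time window (illustrative values).
    """
    total = cache_read + cache_write + non_cached
    if total == 0:
        return {"cache_read": 0.0, "cache_write": 0.0, "non_cached": 0.0}
    return {
        "cache_read": cache_read / total,
        "cache_write": cache_write / total,
        "non_cached": non_cached / total,
    }


# Hypothetical sums of the three *.tokens metrics over one hour.
shares = input_token_shares(cache_read=600, cache_write=100, non_cached=300)
for category, share in shares.items():
    print(f"{category}: {share:.0%}")
```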

Embedding metrics

| Metric Name | Description | Metric Type | Tags |
| --- | --- | --- | --- |
| ml_obs.span.embedding.input.tokens | Number of input tokens used for generating an embedding | Distribution | env, error, ml_app, model_name, model_provider, service, version, matched_model_name, matched_model_provider |

LLM cost metrics

The unit for estimated cost metrics for Agent Observability is nanodollars.

| Metric Name | Description | Metric Type | Tags |
| --- | --- | --- | --- |
| ml_obs.span.llm.input.cost | Estimated input cost in an LLM span | Distribution | env, error, ml_app, model_name, model_provider, service, version, source, matched_model_name, matched_model_provider |
| ml_obs.span.embedding.input.cost | Estimated input cost in an embedding span | Distribution | env, error, ml_app, model_name, model_provider, service, version, source, matched_model_name, matched_model_provider |
| ml_obs.span.llm.output.reasoning.cost | Estimated reasoning output cost in an LLM span | Distribution | env, error, ml_app, model_name, model_provider, service, version, source, matched_model_name, matched_model_provider |
| ml_obs.span.llm.output.cost | Estimated output cost in an LLM span | Distribution | env, error, ml_app, model_name, model_provider, service, version, source, matched_model_name, matched_model_provider |
| ml_obs.span.llm.total.cost | Estimated total cost in an LLM or embedding span | Distribution | env, error, ml_app, model_name, model_provider, service, version, source, matched_model_name, matched_model_provider |
| ml_obs.span.llm.input.cache_write.cost | Estimated cache write input cost in an LLM span | Distribution | env, error, ml_app, model_name, model_provider, service, version, source, matched_model_name, matched_model_provider |
| ml_obs.span.llm.input.cache_read.cost | Estimated cache read input cost in an LLM span | Distribution | env, error, ml_app, model_name, model_provider, service, version, source, matched_model_name, matched_model_provider |
| ml_obs.span.llm.input.non_cached.cost | Estimated non-cached input cost in an LLM span | Distribution | env, error, ml_app, model_name, model_provider, service, version, source, matched_model_name, matched_model_provider |
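
Because the cost metrics are reported in nanodollars, raw values usually need converting before they are shown to people. A minimal sketch of that arithmetic, where one nanodollar is one billionth of a dollar; the function name and the sample value are hypothetical.

```python
NANODOLLARS_PER_DOLLAR = 1_000_000_000  # "nano" means 10^-9


def nanodollars_to_dollars(nanodollars):
    """Convert an estimated cost value from nanodollars to dollars."""
    return nanodollars / NANODOLLARS_PER_DOLLAR


# Hypothetical ml_obs.span.llm.total.cost value for a single span.
span_cost = nanodollars_to_dollars(2_500_000)
print(f"${span_cost:.4f}")  # $0.0025
```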

Trace metrics

| Metric Name | Description | Metric Type | Tags |
| --- | --- | --- | --- |
| ml_obs.trace | Number of traces | Count | env, error, ml_app, service, span_kind, version |
| ml_obs.trace.duration | Total duration of all traces across all spans | Distribution | env, error, ml_app, service, span_kind, version |
| ml_obs.trace.error | Number of errors that occurred during the trace | Count | env, error, ml_app, service, span_kind, version |

Estimated usage metrics

| Metric Name | Description | Metric Type | Tags |
| --- | --- | --- | --- |
| ml_obs.estimated_usage.llm.input.tokens | Estimated number of input tokens used | Distribution | evaluation_name, ml_app, model_name, model_provider, model_server |

Deprecated metrics

The following metrics are deprecated and are maintained only for backward compatibility. Datadog strongly recommends using the non-deprecated token metrics for all token usage measurements.

| Metric Name | Description | Metric Type | Tags |
| --- | --- | --- | --- |
| ml_obs.estimated_usage.llm.output.tokens | Estimated number of output tokens generated | Distribution | evaluation_name, ml_app, model_name, model_provider, model_server |
| ml_obs.estimated_usage.llm.total.tokens | Total estimated tokens (input + output) used | Distribution | evaluation_name, ml_app, model_name, model_provider, model_server |
