Soft deprecation notice: For new monitor resources, prefer DatadogGenericResource with type: monitor. DatadogMonitor remains supported for existing users, but DatadogGenericResource is the preferred path for new Datadog API capabilities.
To deploy a Datadog monitor, you can use the Datadog Operator and the DatadogMonitor custom resource definition (CRD).
Prerequisites
Setup
Create a file with the spec of your DatadogMonitor deployment configuration.
Example:
The following spec creates a metric monitor that alerts on the query avg(last_10m):avg:system.disk.in_use{*} by {host} > 0.5.
datadog-metric-monitor.yaml
apiVersion: datadoghq.com/v1alpha1
kind: DatadogMonitor
metadata:
  name: datadog-monitor-test
  namespace: datadog
spec:
  query: "avg(last_10m):avg:system.disk.in_use{*} by {host} > 0.5"
  type: "metric alert"
  name: "Test monitor made from DatadogMonitor"
  message: "1-2-3 testing"
  tags:
    - "test:datadog"
  priority: 5
  controllerOptions:
    disableRequiredTags: false
  options:
    evaluationDelay: 300
    includeTags: true
    locked: false
    newGroupDelay: 300
    notifyNoData: true
    noDataTimeframe: 30
    renotifyInterval: 1440
    thresholds:
      critical: "0.5"
      warning: "0.28"
See the complete list of configuration fields.
Deploy your DatadogMonitor:
kubectl apply -f /path/to/your/datadog-metric-monitor.yaml
Additional examples
Metric monitors
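The following is a minimal sketch of a metric monitor, assuming the Datadog Operator is installed and reconciles DatadogMonitor resources in the datadog namespace. It reuses the system.disk.in_use metric from the setup example, but groups by both host and device and then uses notifyBy so that notifications arrive once per host rather than once per host/device pair. The resource name, thresholds, and re-notification values are hypothetical and only illustrate the fields.

```yaml
apiVersion: datadoghq.com/v1alpha1
kind: DatadogMonitor
metadata:
  name: disk-usage-per-host          # hypothetical resource name
  namespace: datadog
spec:
  # Grouped by host AND device; notifyBy below narrows notifications to host.
  query: "avg(last_10m):avg:system.disk.in_use{*} by {host,device} > 0.9"
  type: "metric alert"
  name: "Disk usage above 90%"
  message: "Disk usage is above 90% on at least one device of this host."
  options:
    notifyBy:
      - "host"                       # must be a subset of the query's group-by tags
    renotifyInterval: 60             # minutes between re-notifications while unresolved
    renotifyOccurrences: 3           # send at most three re-notifications
    thresholds:
      critical: "0.9"                # matches the threshold in the query
      warning: "0.8"
```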
Other monitors
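As a sketch of a non-metric monitor, the manifest below defines a log alert. The log query syntax (logs(...).index(...).rollup(...).by(...).last(...)) is an assumption based on the Datadog monitor API, and the service:web search, resource name, and threshold are hypothetical. Because the query is a count, onMissingData: default treats an empty evaluation as 0, as described for onMissingData in the Options reference.

```yaml
apiVersion: datadoghq.com/v1alpha1
kind: DatadogMonitor
metadata:
  name: web-error-logs               # hypothetical resource name
  namespace: datadog
spec:
  # Assumed log monitor query syntax; adjust the search string and index.
  query: 'logs("service:web status:error").index("*").rollup("count").by("host").last("5m") > 100'
  type: "log alert"
  name: "High error log volume for service:web"
  message: "More than 100 error logs in 5 minutes."
  options:
    enableLogsSample: true           # attach a log sample to the notification
    onMissingData: "default"         # count query: an empty evaluation counts as 0
    groupRetentionDuration: "2h"     # drop host groups that stop reporting for 2 hours
    thresholds:
      critical: "100"                # matches the threshold in the query
```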
All available configuration fields
The following reference lists all available configuration fields for the DatadogMonitor custom resource.
message - required - string
A message to include with notifications for this monitor.
name - required - string
The monitor name.
query - required - string
The monitor query.
type - required - enum
The type of the monitor.
Allowed enum values: metric alert, query alert, service check, event alert, log alert, process alert, rum alert, trace-analytics alert, slo alert, event-v2 alert, audit alert, composite
controllerOptions.disableRequiredTags - boolean
Disables the automatic addition of required tags to monitors.
priority - int64
An integer from 1 (high) to 5 (low) indicating alert severity.
restrictedRoles - [string]
A list of unique role identifiers that defines which roles are allowed to edit the monitor. You can retrieve the unique identifiers for all roles from the Roles API, in the data.id field.
tags - [string]
Tags associated with your monitor.
options - object
Options associated with your monitor. See Options.
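The fragment below is a sketch of the top-level spec fields, not a complete manifest. It shows restrictedRoles, the replacement for the deprecated locked option. The role identifier is a placeholder for a data.id value returned by the Roles API, and the team:storage tag is hypothetical.

```yaml
# Fragment of a DatadogMonitor spec (merge into a full manifest).
spec:
  priority: 2                        # 1 = highest severity, 5 = lowest
  tags:
    - "team:storage"                 # hypothetical tag
  restrictedRoles:
    - "<ROLE_ID>"                    # placeholder: a data.id value from the Roles API
  controllerOptions:
    disableRequiredTags: false       # false keeps the automatic required tags enabled
```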
Options
The following fields are set in the options property.
For example:
apiVersion: datadoghq.com/v1alpha1
kind: DatadogMonitor
metadata:
  name: datadog-monitor-test
  namespace: datadog
spec:
  query: "avg(last_10m):avg:system.disk.in_use{*} by {host} > 0.5"
  type: "metric alert"
  name: "Test monitor made from DatadogMonitor"
  message: "1-2-3 testing"
  options:
    enableLogsSample: true
    thresholds:
      critical: "0.5"
      warning: "0.28"
enableLogsSample - boolean
Whether or not to send a log sample when the log monitor triggers.
escalationMessage - string
A message to include with a re-notification.
evaluationDelay - int64
Time (in seconds) to delay evaluation, as a non-negative integer. For example, if the value is set to 300 (5 minutes), the timeframe is set to last_5m, and the time is 7:00, the monitor evaluates data from 6:50 to 6:55. This is useful for AWS CloudWatch and other backfilled metrics, because it ensures the monitor always has data during evaluation.
groupRetentionDuration - string
The time span after which groups with missing data are dropped from the monitor state. The minimum value is one hour, and the maximum value is 72 hours. Example values are 60m, 1h, and 2d. This option is only available for APM Trace Analytics, Audit Trail, CI, Error Tracking, Event, Logs, and RUM monitors.
groupbySimpleMonitor - boolean
DEPRECATED: Whether the log alert monitor triggers a single alert or multiple alerts when any group breaches a threshold. Use notifyBy instead.
includeTags - boolean
A Boolean indicating whether notifications from this monitor automatically insert its triggering tags into the title.
locked - boolean
DEPRECATED: Whether or not the monitor is locked (only editable by its creator and admins). Use restrictedRoles instead.
newGroupDelay - int64
Time (in seconds) to allow a host to boot and applications to fully start before starting the evaluation of monitor results. Must be a non-negative integer.
noDataTimeframe - int64
The number of minutes before a monitor notifies after data stops reporting. Datadog recommends at least 2x the monitor timeframe for metric alerts or 2 minutes for service checks. If omitted, 2x the evaluation timeframe is used for metric alerts, and 24 hours is used for service checks.
notificationPresetName - enum
Toggles the display of additional content sent in the monitor notification.
Allowed enum values: show_all, hide_query, hide_handles, hide_all
Default: show_all
notifyAudit - boolean
A Boolean indicating whether tagged users are notified on changes to this monitor.
notifyBy - [string]
A list of tags that sets the granularity at which a monitor alerts. Only available for monitors with groupings. For example, if you have a monitor grouped by cluster, namespace, and pod, and you set notifyBy to ["cluster"], your monitor only notifies on each new cluster violating the alert conditions.
Tags mentioned in notifyBy must be a subset of the grouping tags in the query. For example, a query grouped by cluster and namespace cannot notify on region.
Setting notifyBy to ["*"] configures the monitor to notify as a simple alert. In YAML, quote the asterisk, because an unquoted * starts an alias and does not parse as a string.
notifyNoData - boolean
A Boolean indicating whether this monitor notifies when data stops reporting.
Default: false.
onMissingData - enum
Controls how groups or monitors are treated if an evaluation does not return any data points. The default option results in different behavior depending on the monitor query type. For monitors using Count queries, an empty monitor evaluation is treated as 0 and is compared to the threshold conditions. For monitors using any query type other than Count, for example Gauge, Measure, or Rate, the monitor shows the last known status. This option is only available for APM Trace Analytics, Audit Trail, CI, Error Tracking, Event, Logs, and RUM monitors.
Allowed enum values: default, show_no_data, show_and_notify_no_data, resolve
renotifyInterval - int64
The number of minutes after the last notification before a monitor re-notifies on the current status. It only re-notifies if it’s not resolved.
renotifyOccurrences - int64
The number of times re-notification messages should be sent on the current status at the provided re-notification interval.
renotifyStatuses - [string]
The types of monitor statuses for which re-notification messages are sent.
If renotifyInterval is null, defaults to null.
If renotifyInterval is not null, defaults to ["Alert", "No Data"].
Values for monitor status: Alert, No Data, Warn
requireFullWindow - boolean
A Boolean indicating whether this monitor needs a full window of data before it’s evaluated. Datadog strongly recommends setting this to false for sparse metrics; otherwise, some evaluations are skipped.
Default: false.
schedulingOptions - object
Configuration options for scheduling:
customSchedule - object
Configuration options for the custom schedule:
recurrence - [object]
Array of custom schedule recurrences:
rrule - string
The recurrence rule in iCalendar format. For example, FREQ=MONTHLY;BYMONTHDAY=28,29,30,31;BYSETPOS=-1.
start - string
The start date of the recurrence rule, in YYYY-MM-DDThh:mm:ss format. If omitted, the monitor creation time is used.
timezone - string
The timezone in tz database format, in which the recurrence rule is defined. For example, America/New_York or UTC.
evaluationWindow - object
Configuration options for the evaluation window. If hourStarts is set, no other fields may be set. Otherwise, dayStarts and monthStarts must be set together.
dayStarts - string
The time of the day at which a one-day cumulative evaluation window starts. Must be defined in UTC time in HH:mm format.
hourStarts - integer
The minute of the hour at which a one-hour cumulative evaluation window starts.
monthStarts - integer
The day of the month at which a one-month cumulative evaluation window starts.
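The two fragments below sketch schedulingOptions, using the field names from the reference above; they are partial specs, not complete manifests. The first reuses the rrule example from the rrule description, which selects the last day of each month; the start date and timezone are hypothetical. The second sets dayStarts and monthStarts together, as required when hourStarts is not used. How each schedule interacts with a given query type is not covered here, so treat both as starting points to verify.

```yaml
# Fragment 1: custom schedule recurrence (last day of each month).
spec:
  options:
    schedulingOptions:
      customSchedule:
        recurrence:
          - rrule: "FREQ=MONTHLY;BYMONTHDAY=28,29,30,31;BYSETPOS=-1"
            start: "2024-01-01T00:00:00"   # hypothetical start date
            timezone: "America/New_York"
---
# Fragment 2: cumulative evaluation window.
# Days start at 04:00 UTC and months start on the 1st.
spec:
  options:
    schedulingOptions:
      evaluationWindow:
        dayStarts: "04:00"
        monthStarts: 1
```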
thresholdWindows - object
Alerting time window options:
recoveryWindow - string
Describes how long an anomalous metric must be normal before the alert recovers.
triggerWindow - string
Describes how long a metric must be anomalous before an alert triggers.
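Threshold windows apply to anomaly monitors. The manifest below is a sketch under two assumptions: that anomaly monitors use the "query alert" type with an anomalies(...) query, and that window values take the form last_15m. The resource name, metric, and window lengths are illustrative.

```yaml
apiVersion: datadoghq.com/v1alpha1
kind: DatadogMonitor
metadata:
  name: cpu-anomaly                  # hypothetical resource name
  namespace: datadog
spec:
  # Assumed anomaly query syntax: metric, algorithm ('basic'), bounds (2).
  query: "avg(last_4h):anomalies(avg:system.cpu.user{*}, 'basic', 2) >= 1"
  type: "query alert"
  name: "Anomalous CPU usage"
  message: "CPU usage is outside the expected range."
  options:
    thresholdWindows:
      triggerWindow: "last_15m"      # how long the metric must be anomalous to alert
      recoveryWindow: "last_15m"     # how long it must be normal to recover
    thresholds:
      critical: "1"                  # matches the threshold in the query
```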
thresholds - object
The monitor thresholds:
critical - string
The monitor CRITICAL threshold.
criticalRecovery - string
The monitor CRITICAL recovery threshold.
ok - string
The monitor OK threshold.
unknown - string
The monitor UNKNOWN threshold.
warning - string
The monitor WARNING threshold.
warningRecovery - string
The monitor WARNING recovery threshold.
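Recovery thresholds add hysteresis, so a value hovering near a threshold does not repeatedly trigger and resolve. The fragment below sketches them for a "greater than" query such as avg(last_10m):avg:system.disk.in_use{*} by {host} > 0.9. For this comparison direction each recovery value sits below its alert threshold; the specific numbers are hypothetical.

```yaml
# Fragment: thresholds for a query ending in "> 0.9".
spec:
  options:
    thresholds:
      critical: "0.9"                # must match the threshold in the query
      criticalRecovery: "0.85"       # value must fall past 0.85, not just below 0.9, to recover
      warning: "0.8"
      warningRecovery: "0.75"        # same idea for the WARNING state
```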
timeoutH - int64
The number of hours of the monitor not reporting data before it automatically resolves from a triggered state.
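Several of the options above work together for backfilled metrics such as AWS CloudWatch data. The manifest below is a sketch that applies the guidance from evaluationDelay, noDataTimeframe (at least 2x the monitor timeframe), and requireFullWindow (false for sparse metrics). The 900-second delay is an assumed value to tune to your integration's latency, and the resource name and threshold are hypothetical.

```yaml
apiVersion: datadoghq.com/v1alpha1
kind: DatadogMonitor
metadata:
  name: ec2-cpu-high                 # hypothetical resource name
  namespace: datadog
spec:
  query: "avg(last_10m):avg:aws.ec2.cpuutilization{*} by {host} > 80"
  type: "metric alert"
  name: "High EC2 CPU utilization"
  message: "EC2 CPU utilization is above 80%."
  options:
    # 900 s = 15 min delay: at 7:00 the monitor evaluates data from 6:35 to 6:45.
    evaluationDelay: 900
    newGroupDelay: 300               # give new hosts 5 minutes before evaluating them
    notifyNoData: true
    noDataTimeframe: 20              # 2x the 10-minute query timeframe
    requireFullWindow: false         # sparse metrics: do not skip partial windows
    thresholds:
      critical: "80"                 # matches the threshold in the query
```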