Logs
The Sensitive Data Scanner processor scans logs to detect and redact or hash sensitive information such as PII, PCI, and custom sensitive data. You can pick from Datadog’s library of predefined rules, or enter custom regex rules to scan for sensitive data.
You can set up the pipeline and processor in the UI, API, or Terraform.
See Best practices to optimize performance for tips on reducing resource usage.
To set up the processor:
When excluding attributes from the scan: use path notation (outer_key.inner_key) to access nested keys. For specified attributes with nested data, all nested data is excluded.
When scanning specific attributes: use path notation (outer_key.inner_key) to access nested keys. For specified attributes with nested data, all nested data is scanned.
After adding scanning rules from the library, you can edit each rule separately and add additional keywords to the keyword dictionary.
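As a rough illustration of how a keyword dictionary narrows matches, the sketch below keeps a regex match only when a keyword appears within a fixed number of characters before it. The function name `match_with_keywords`, the simplified card pattern, and the case-insensitive substring comparison are assumptions made for this example; they are not the processor's actual implementation. The 30-character default mirrors the default proximity this page gives for keywords.

```python
import re

def match_with_keywords(text, pattern, keywords, window=30):
    """Return regex matches that have a keyword within `window` characters before them."""
    lowered = text.lower()
    hits = []
    for m in re.finditer(pattern, text):
        # Only the characters immediately before the match are inspected.
        start = max(0, m.start() - window)
        prefix = lowered[start:m.start()]
        if any(k.lower() in prefix for k in keywords):
            hits.append(m.group(0))
    return hits

# Hypothetical, simplified 16-digit card pattern (not a library rule).
CARD_PATTERN = r"\b\d{4}[ -]?\d{4}[ -]?\d{4}[ -]?\d{4}\b"
KEYWORDS = ["visa", "credit", "card"]

# A keyword sits just before the number, so the match is kept.
near = match_with_keywords("visa card: 4111 1111 1111 1111", CARD_PATTERN, KEYWORDS)
# No keyword precedes the number, so the match is dropped.
far = match_with_keywords("order id 4111 1111 1111 1111", CARD_PATTERN, KEYWORDS)
```

Requiring a nearby keyword trades some recall for fewer false positives, which is why a number with no surrounding context is ignored in the second call.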
Unsupported regex patterns include:
The \C “single-byte” directive (which breaks UTF-8 sequences)
The \R newline match
The \K start of match reset directive
Example keywords: visa, credit, and card. You can also require that these keywords be within a specified number of characters of a match. By default, keywords must be within 30 characters before a matched value.
When excluding attributes from the scan: use path notation (outer_key.inner_key) to access nested keys. For specified attributes with nested data, all nested data is excluded.
When scanning specific attributes: use path notation (outer_key.inner_key) to access nested keys. For specified attributes with nested data, all nested data is scanned.
To delete a rule in the Sensitive Data Scanner:
For this log structure example:
{
  "outer_key": {
    "inner_key": "inner_value",
    "a": {
      "double_inner_key": "double_inner_value",
      "b": "b value"
    },
    "c": "c value"
  },
  "d": "d value"
}
Follow these reference rules:
Use outer_key.inner_key to reference the key with the value inner_value.
Use outer_key.a.double_inner_key to reference the key with the value double_inner_value.
To specify a nested field with a literal . in the attribute key, wrap the key in escaped quotes in the search query. For example, the search query "service.status":disabled matches the event {"service.status": "disabled"}.
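The path notation above can be pictured as a walk through nested dictionaries, one key per dot-separated segment. The following is a minimal sketch under that assumption; `resolve_path` is a hypothetical helper written for this example, and it deliberately does not handle keys that contain a literal dot.

```python
def resolve_path(event, path):
    """Follow a dotted path such as "outer_key.a.double_inner_key" through nested dicts."""
    current = event
    for key in path.split("."):
        # Stop as soon as a segment is missing or the value is not a nested object.
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current

# The log structure example from this page.
event = {
    "outer_key": {
        "inner_key": "inner_value",
        "a": {"double_inner_key": "double_inner_value", "b": "b value"},
        "c": "c value",
    },
    "d": "d value",
}

inner = resolve_path(event, "outer_key.inner_key")
double_inner = resolve_path(event, "outer_key.a.double_inner_key")
nested = resolve_path(event, "outer_key.a")
```

A path that stops at `outer_key.a` returns the whole nested object, which matches the rule that specifying an attribute with nested data covers all of its nested data.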
You can use the Datadog Observability Pipeline Terraform resource to set up a pipeline with the Sensitive Data Scanner processor. To add a rule to the Sensitive Data Scanner processor using Terraform:
Use the Datadog Sensitive Data Scanner Standard Pattern data source to retrieve the rule ID of the Sensitive Data Scanner library rule.
data "datadog_sensitive_data_scanner_standard_pattern" "<RULE_IDENTIFIER>" {
  filter = "<RULE_NAME>"
}
Replace the placeholders:
<RULE_IDENTIFIER> with a name to use when you later set up the Sensitive Data Scanner processor in the Observability Pipeline resource.
<RULE_NAME> with the exact name of the rule. See Library Rules for the full list of rules.
For example, if you want to use the AWS Access Key ID Scanner, configure the data source as follows:
data "datadog_sensitive_data_scanner_standard_pattern" "aws_access_key" {
  filter = "AWS Access Key ID Scanner"
}
Add a rule block in your Observability Pipeline resource for the library rule.
...
sensitive_data_scanner {
  rule {
    name = "<YOUR_RULE_NAME>"
    tags = []
    on_match {
      redact {
        replace = "***"
      }
    }
    pattern {
      library {
        id                       = data.datadog_sensitive_data_scanner_standard_pattern.<RULE_IDENTIFIER>.id
        use_recommended_keywords = true
      }
    }
    scope {
      all = true
    }
  }
}
Replace the placeholders:
<YOUR_RULE_NAME> with a name for the rule. This name is shown in the Pipelines UI.
<RULE_IDENTIFIER> with the rule identifier you used in the data source in step 1.
For example, if you use the AWS Access Key ID Scanner data source from step 1, configure the rule block as follows:
...
sensitive_data_scanner {
  rule {
    name = "Redact AWS Access Key IDs"
    tags = []
    on_match {
      redact {
        replace = "***"
      }
    }
    pattern {
      library {
        id                       = data.datadog_sensitive_data_scanner_standard_pattern.aws_access_key.id
        use_recommended_keywords = true
      }
    }
    scope {
      all = true
    }
  }
}
See the full configuration example on how to add multiple rules.
Repeat steps 1 and 2 for all library rules you want to add.
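To make the effect of a redact action with replace = "***" concrete, the sketch below applies a substitution to a sample log message. The regular expression is a simplified stand-in written for this example and is not the pattern behind the AWS Access Key ID Scanner library rule; `redact` is likewise a hypothetical helper.

```python
import re

# Assumption: a simplified access key ID shape (prefix plus 16 uppercase
# letters or digits). The real library rule may differ.
AWS_KEY_ID = re.compile(r"\b(?:AKIA|ASIA)[A-Z0-9]{16}\b")

def redact(message, pattern, replace="***"):
    """Replace every match of `pattern` in `message` with the `replace` string."""
    return pattern.sub(replace, message)

redacted = redact("key=AKIAIOSFODNN7EXAMPLE rotated", AWS_KEY_ID)
untouched = redact("no secrets here", AWS_KEY_ID)
```

Only the matched value is replaced; the rest of the message is left as it was.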
The following example uses the Sensitive Data Scanner processor to scan for AWS Access Key IDs and US Social Security Numbers, and redacts them by replacing them with the string ***:
data "datadog_sensitive_data_scanner_standard_pattern" "aws_access_key" {
  filter = "AWS Access Key ID Scanner"
}

data "datadog_sensitive_data_scanner_standard_pattern" "us_ssn" {
  filter = "US Social Security Number Scanner"
}

resource "datadog_observability_pipeline" "sensitive_data_pipeline" {
  name = "Sensitive Data Pipeline"
  config {
    source {
      id = "source-0"
      datadog_agent {}
    }
    processor_group {
      display_name = "Processors"
      enabled      = true
      id           = "group-0"
      include      = "*"
      inputs       = ["source-0"]
      processor {
        display_name = "Sensitive Data Scanner"
        enabled      = true
        id           = "processor-sds-0"
        include      = "*"
        sensitive_data_scanner {
          rule {
            name = "Redact AWS Access Key IDs"
            tags = []
            on_match {
              redact {
                replace = "***"
              }
            }
            pattern {
              library {
                id                       = data.datadog_sensitive_data_scanner_standard_pattern.aws_access_key.id
                use_recommended_keywords = true
              }
            }
            scope {
              all = true
            }
          }
          rule {
            name = "Redact US SSNs"
            tags = []
            on_match {
              redact {
                replace = "***"
              }
            }
            pattern {
              library {
                id                       = data.datadog_sensitive_data_scanner_standard_pattern.us_ssn.id
                use_recommended_keywords = true
              }
            }
            scope {
              all = true
            }
          }
        }
      }
    }
    destination {
      id     = "destination-0"
      inputs = ["group-0"]
      datadog_logs {}
    }
  }
}
The Sensitive Data Scanner processor is CPU intensive. Use the following best practices to optimize performance.
Rules that are enabled but not used consume unnecessary resources. Check the Sensitive Data Scanner processor to view how many matches each rule has had over the past 24 hours.
See Delete a rule to delete an unused rule.
The time it takes the Sensitive Data Scanner to scan an event roughly scales with the size of the event. To optimize processor performance:
If you know the types of events you want to scan, define a processor query that only sends the events you want to the processor.
Reduce scanning time by targeting specific event attributes for scanning or excluding event attributes from being scanned. See the Define rule target and conditions step in Set up the processor.
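The benefit of targeting specific attributes can be sketched by counting how many characters a scan has to read. This is an illustration only: `scan`, `get_path`, and `iter_strings` are hypothetical helpers, the SSN pattern is a simplified stand-in, and character count is used here as a rough proxy for scanning time.

```python
import re

# Assumption: a simplified SSN shape, not the library rule's pattern.
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def get_path(event, path):
    """Resolve a dotted path through nested dicts, returning None if it is missing."""
    current = event
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current

def iter_strings(value):
    """Yield every string found under `value`, descending into nested dicts."""
    if isinstance(value, str):
        yield value
    elif isinstance(value, dict):
        for child in value.values():
            yield from iter_strings(child)

def scan(event, pattern, include=None):
    """Return (matches, characters scanned) for the whole event or only `include` paths."""
    roots = [event] if include is None else [get_path(event, p) for p in include]
    matches = 0
    scanned_chars = 0
    for root in roots:
        for text in iter_strings(root):
            scanned_chars += len(text)
            matches += len(pattern.findall(text))
    return matches, scanned_chars

event = {
    "user": {"ssn": "123-45-6789"},
    "message": "lorem ipsum " * 100,  # large attribute with nothing sensitive
}

full = scan(event, SSN)                        # scans every string in the event
scoped = scan(event, SSN, include=["user.ssn"])  # scans only the targeted attribute
```

Both calls find the same match, but the scoped scan reads only the targeted attribute instead of the entire event.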
Use the pipelines.component_latency_seconds metric to:
To view the pipelines.component_latency_seconds metric:
Query the pipelines.component_latency_seconds metric.
Filter by component_id:<COMPONENT_ID>, where <COMPONENT_ID> is the ID for your Sensitive Data Scanner processor.
Note: pipelines.component_latency_seconds is a distribution metric, so you must enable percentiles for that metric. See Enabling advanced query functionality for instructions.