Datadog automatically parses JSON-formatted logs. For other formats, you can enrich your logs with the Grok Parser. Grok syntax is easier to write than pure regular expressions, and the Grok Parser lets you extract attributes from semi-structured text messages.
Grok comes with reusable patterns to parse integers, IP addresses, hostnames, etc. These values must be sent into the grok parser as strings.
You can write parsing rules with the %{MATCHER:EXTRACT:FILTER} syntax:
Matcher: A rule (possibly a reference to another token rule) that describes what to expect (number, word, notSpace, etc.).
Extract (optional): An identifier representing the capture destination for the piece of text matched by the Matcher.
Filter (optional): A post-processor of the match to transform it.
Example for a classic unstructured log:
john connected on 11/08/2017
With the following parsing rule:
MyParsingRule %{word:user} connected on %{date("MM/dd/yyyy"):date}
After processing, the following structured log is generated:
{
"user": "john",
"date": 1510099200000
}
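To make the matcher/extract mechanics concrete, here is a minimal Python sketch of what the rule above does. It is not Datadog's implementation: the function name `parse_connection` is hypothetical, the regular expression stands in for the `word` and `date` matchers, and the date is assumed to be midnight UTC.

```python
import re
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def parse_connection(line):
    """Rough stand-in for: %{word:user} connected on %{date("MM/dd/yyyy"):date}"""
    match = re.fullmatch(
        r"(?P<user>\w+) connected on (?P<date>\d{2}/\d{2}/\d{4})", line
    )
    if match is None:
        return None  # the rule must match the whole line
    # Assumption: no timezone is given, so the date is read as UTC.
    parsed = datetime.strptime(match["date"], "%m/%d/%Y").replace(tzinfo=timezone.utc)
    return {
        "user": match["user"],
        "date": (parsed - EPOCH) // timedelta(milliseconds=1),  # epoch milliseconds
    }

print(parse_connection("john connected on 11/08/2017"))
# {'user': 'john', 'date': 1510099200000}
```

The named groups play the role of the Extract part, and the conversion to epoch milliseconds mirrors what the `date` matcher produces.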
Note:
- The rule name must contain only alphanumeric characters, _, and .. It must start with an alphanumeric character.
- Use \n and \s+ to account for newlines and whitespace.

Here is a list of all the matchers and filters natively implemented by Datadog:
Query-time and ingest-time matchers:
The following matchers are available for both query-time parsing (Log Explorer) and ingest-time parsing (Grok Parser):
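- word: Matches a word made of letters, digits, and the _ (underscore) character; it starts and ends with a word boundary. Equivalent to \b\w+\b in regex.
- notSpace
- number
- integer
- data: Matches any string. Equivalent to .* in regex. Use when none of the above patterns is appropriate.

Ingest-time only matchers: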
The following matchers are only available for ingest-time parsing with the Grok Parser processor and cannot be used in the Log Explorer:
- date("pattern"[, "timezoneId"[, "localeId"]])
- regex("pattern")
- boolean("truePattern", "falsePattern"): The patterns default to true and false, ignoring case.
- numberStr
- numberExtStr
- numberExt
- integerStr
- integerExtStr
- integerExt
- doubleQuotedString
- singleQuotedString
- quotedString
- uuid
- mac
- ipv4
- ipv6
- ip
- hostname
- ipOrHost
- port

Query-time and ingest-time filters:
The following filters are available for both query-time parsing (Log Explorer) and ingest-time parsing (Grok Parser):
- number
- integer

Ingest-time only filters:
The following filters are only available for ingest-time parsing with the Grok Parser processor and cannot be used in the Log Explorer:
- boolean
- nullIf("value")
- json
- rubyhash: For example, {name => "John", "job" => {"company" => "Big Company", "title" => "CTO"}}
- useragent([decodeuricomponent:true/false])
- querystring: For example, ?productId=superproduct&promotionCode=superpromo
- decodeuricomponent: For example, transforms %2Fservice%2Ftest into /service/test.
- lowercase
- uppercase
- keyvalue([separatorStr[, characterAllowList[, quotingStr[, delimiter]]]])
- xml
- csv(headers[, separator[, quotingcharacter]])
- scale(factor)
- array([[openCloseStr, ] separator][, subRuleOrFilter])
- url

Use the Advanced Settings section at the bottom of your Grok processor to parse a specific attribute instead of the default message attribute, or to define helper rules that reuse common patterns across multiple parsing rules.
Use the Extract from field to apply your Grok processor on a given text attribute instead of the default message attribute.
For example, consider a log containing a command.line attribute that should be parsed as a key-value. Extract from command.line to parse its contents and create structured attributes from the command data.
Use the Helper Rules field to define tokens for your parsing rules. Helper rules let you reuse common Grok patterns across your parsing rules. This is useful when you have several rules in the same Grok parser that use the same tokens.
Example for a classic unstructured log:
john id:12345 connected on 11/08/2017 on server XYZ in production
Use the following parsing rule:
MyParsingRule %{user} %{connection} %{server}
With the following helpers:
user %{word:user.name} id:%{integer:user.id}
connection connected on %{date("MM/dd/yyyy"):connect_date}
server on server %{notSpace:server.name} in %{notSpace:server.env}
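One way to picture helper rules is as textual substitution: each bare `%{name}` token in the parsing rule is replaced by the helper of that name. The Python sketch below illustrates that mental model only; the function `expand_helpers` is hypothetical and this is not a description of how Datadog resolves helpers internally.

```python
import re

# Helper rules from the example above, keyed by helper name.
HELPERS = {
    "user": "%{word:user.name} id:%{integer:user.id}",
    "connection": 'connected on %{date("MM/dd/yyyy"):connect_date}',
    "server": "on server %{notSpace:server.name} in %{notSpace:server.env}",
}

def expand_helpers(rule, helpers):
    """Replace bare %{name} tokens with the helper body of the same name."""
    def substitute(match):
        # Unknown names are left untouched.
        return helpers.get(match.group(1), match.group(0))
    # Only tokens without ':' (no extract or filter part) are candidates.
    return re.sub(r"%\{(\w+)\}", substitute, rule)

print(expand_helpers("%{user} %{connection} %{server}", HELPERS))
```

Under this model, the three-token rule expands to one long rule that matches the whole example log, which is why helpers are convenient when several rules share the same tokens.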
Some examples demonstrating how to use parsers:
This is the key-value core filter: keyvalue([separatorStr[, characterAllowList[, quotingStr[, delimiter]]]]) where:
- separatorStr: Defines the separator between key and values. Defaults to =.
- characterAllowList: Defines extra non-escaped value chars in addition to the default \\w.\\-_@. Used only for non-quoted values (for example, key=@valueStr).
- quotingStr: Defines quotes, replacing the default quotes detection: <>, "", ''.
- delimiter: Defines the separator between the different key-value pairs (for example, | is the delimiter in key1=value1|key2=value2). Defaults to (normal space), , and ;.

Use filters such as keyvalue to more easily map strings to attributes for keyvalue or logfmt formats:
Log:
user=john connect_date=11/08/2017 id=123 action=click
Rule:
rule %{data::keyvalue}
You don’t need to specify the name of your parameters as they are already contained in the log.
If you add an extract attribute my_attribute in your rule pattern you will see:
{
"my_attribute": {
"user": "john",
"id": 123,
"action": "click"
}
}
If your keys and values are separated by something other than the default =, pass the separator as a parameter in your parsing rule.
Log:
user: john connect_date: 11/08/2017 id: 123 action: click
Rule:
rule %{data::keyvalue(": ")}
If logs contain special characters in an attribute value, such as / in a URL, add them to the allowlist in the parsing rule:
Log:
url=https://app.datadoghq.com/event/stream user=john
Rule:
rule %{data::keyvalue("=","/:")}
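The separator and allowlist parameters can be approximated with a regular expression. The Python sketch below is a rough illustration for unquoted values only, not Datadog's implementation: the function `keyvalue_sketch` and its parameters are hypothetical, quoting is ignored, and the conversion of all-digit values to integers is an assumption made to mirror the "id": 123 output shown earlier.

```python
import re

def keyvalue_sketch(text, separator="=", allow_extra=""):
    """Approximate keyvalue(separatorStr, characterAllowList) for unquoted values."""
    # Default value characters plus any extra allowlisted characters.
    chars = r"\w.\-@" + "".join(re.escape(c) for c in allow_extra)
    pattern = re.compile(
        rf"(?<![{chars}])([{chars}]+){re.escape(separator)}([{chars}]+)(?=[\s,;]|$)"
    )
    result = {}
    for key, value in pattern.findall(text):
        # Assumption: integer-looking values become numbers.
        result[key] = int(value) if value.isdecimal() else value
    return result

print(keyvalue_sketch("user=john connect_date=11/08/2017 id=123 action=click"))
print(keyvalue_sketch("user: john id: 123 action: click", separator=": "))
print(keyvalue_sketch("url=https://app.datadoghq.com/event/stream user=john",
                      allow_extra="/:"))
```

In the first call, connect_date is skipped because / is not an allowed value character, which is the situation the allowlist parameter addresses.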
Other examples:
| Raw string | Parsing rule | Result |
|---|---|---|
| key=valueStr | %{data::keyvalue} | {"key": "valueStr"} |
| key=<valueStr> | %{data::keyvalue} | {"key": "valueStr"} |
| "key"="valueStr" | %{data::keyvalue} | {"key": "valueStr"} |
| key:valueStr | %{data::keyvalue(":")} | {"key": "valueStr"} |
| key:"/valueStr" | %{data::keyvalue(":", "/")} | {"key": "/valueStr"} |
| /key:/valueStr | %{data::keyvalue(":", "/")} | {"/key": "/valueStr"} |
| key:={valueStr} | %{data::keyvalue(":=", "", "{}")} | {"key": "valueStr"} |
| key1=value1\|key2=value2 | %{data::keyvalue("=", "", "", "\|")} | {"key1": "value1", "key2": "value2"} |
| key1="value1"\|key2="value2" | %{data::keyvalue("=", "", "", "\|")} | {"key1": "value1", "key2": "value2"} |
Multiple quotingStr example: When a quotingStr is defined, the default quote detection is replaced by the quoting characters you define.
The key-value always matches inputs without any quoting characters, regardless of what is specified in quotingStr. When quoting characters are used, the characterAllowList is ignored as everything between the quoting characters is extracted.
Log:
key1:=valueStr key2:=</valueStr2> key3:="valueStr3"
Rule:
rule %{data::keyvalue(":=","","<>")}
Result:
{"key1": "valueStr", "key2": "/valueStr2"}
Note:
- Empty values (key=) or null values (key=null) are not displayed in the output JSON.
- If you define a keyvalue filter on a data object, and this filter is not matched, then an empty JSON {} is returned (for example, input: key:=valueStr, parsing rule: rule_test %{data::keyvalue("=")}, output: {}).
- Defining "" as quotingStr keeps the default configuration for quoting.

The date matcher transforms your timestamp into the EPOCH format (unit of measure: millisecond).
| Raw string | Parsing rule | Result |
|---|---|---|
| 14:20:15 | %{date("HH:mm:ss"):date} | {"date": 51615000} |
| 02:20:15 PM | %{date("hh:mm:ss a"):date} | {"date": 51615000} |
| 11/10/2014 | %{date("dd/MM/yyyy"):date} | {"date": 1412985600000} |
| Thu Jun 16 08:29:03 2016 | %{date("EEE MMM dd HH:mm:ss yyyy"):date} | {"date": 1466065743000} |
| Tue Nov 1 08:29:03 2016 | %{date("EEE MMM d HH:mm:ss yyyy"):date} | {"date": 1477988943000} |
| 06/Mar/2013:01:36:30 +0900 | %{date("dd/MMM/yyyy:HH:mm:ss Z"):date} | {"date": 1362501390000} |
| 2016-11-29T16:21:36.431+0000 | %{date("yyyy-MM-dd'T'HH:mm:ss.SSSZ"):date} | {"date": 1480436496431} |
| 2016-11-29T16:21:36.431+00:00 | %{date("yyyy-MM-dd'T'HH:mm:ss.SSSZZ"):date} | {"date": 1480436496431} |
| 06/Feb/2009:12:14:14.655 | %{date("dd/MMM/yyyy:HH:mm:ss.SSS"):date} | {"date": 1233922454655} |
| 2007-08-31 19:22:22.427 ADT | %{date("yyyy-MM-dd HH:mm:ss.SSS z"):date} | {"date": 1188598942427} |
| Thu Jun 16 08:29:03 2016 ¹ | %{date("EEE MMM dd HH:mm:ss yyyy","Europe/Paris"):date} | {"date": 1466058543000} |
| Thu Jun 16 08:29:03 2016 ¹ | %{date("EEE MMM dd HH:mm:ss yyyy","UTC+5"):date} | {"date": 1466047743000} |
| Thu Jun 16 08:29:03 2016 ¹ | %{date("EEE MMM dd HH:mm:ss yyyy","+3"):date} | {"date": 1466054943000} |

¹ Use the timezone parameter if you perform your own localizations and your timestamps are not in UTC.
The supported formats for timezones are:
- GMT, UTC, UT or Z
- +hh:mm, -hh:mm, +hhmm, -hhmm. The maximum supported range is from +18:00 to -18:00 inclusive.
- Timezones starting with UTC+, UTC-, GMT+, GMT-, UT+ or UT-. The maximum supported range is from +18:00 to -18:00 inclusive.

Note: Parsing a date doesn't set its value as the log official date. For this, use the Log Date Remapper in a subsequent processor.
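The effect of the timezone parameter on the resulting epoch value can be checked with a few lines of Python. This is a sketch of the arithmetic only: `to_epoch_ms` is a hypothetical helper, the format string uses Python's strptime directives rather than the Grok date pattern, and timezones are modeled as fixed UTC offsets (Europe/Paris is taken as UTC+2, its offset in June).

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def to_epoch_ms(raw, fmt, utc_offset_hours=0):
    """Interpret a naive timestamp in the given UTC offset; return epoch milliseconds."""
    local = datetime.strptime(raw, fmt)
    aware = local.replace(tzinfo=timezone(timedelta(hours=utc_offset_hours)))
    return (aware - EPOCH) // timedelta(milliseconds=1)

RAW = "Thu Jun 16 08:29:03 2016"
FMT = "%a %b %d %H:%M:%S %Y"  # Python equivalent of "EEE MMM dd HH:mm:ss yyyy"

print(to_epoch_ms(RAW, FMT))      # 1466065743000 (read as UTC)
print(to_epoch_ms(RAW, FMT, 2))   # 1466058543000 (UTC+2)
print(to_epoch_ms(RAW, FMT, 3))   # 1466054943000 (UTC+3)
print(to_epoch_ms(RAW, FMT, 5))   # 1466047743000 (UTC+5)
```

Each additional hour of positive offset moves the epoch value 3,600,000 ms earlier, because the same wall-clock time occurs earlier in UTC.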
If you have logs with two possible formats that differ in only one attribute, set a single rule using alternation with (<REGEX_1>|<REGEX_2>). This rule is equivalent to a Boolean OR.
Log:
john connected on 11/08/2017
12345 connected on 11/08/2017
Rule: Note that “id” is an integer and not a string.
MyParsingRule (%{integer:user.id}|%{word:user.firstname}) connected on %{date("MM/dd/yyyy"):connect_date}
Results:
%{integer:user.id}
{
"user": {
"id": 12345
},
"connect_date": 1510099200000
}
%{word:user.firstname}
{
"user": {
"firstname": "john"
},
"connect_date": 1510099200000
}
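The same alternation can be tried with a plain regular expression. The Python sketch below is an analogy rather than the Grok engine: the pattern and the `parse` function are hypothetical, and the date is left as a string to keep the example short. It also shows why the integer alternative is listed first, since \w+ would otherwise match the digits as a word.

```python
import re

# Integer alternative first: \w+ alone would also accept "12345".
RULE = re.compile(
    r"(?:(?P<id>\d+)|(?P<firstname>\w+)) connected on (?P<date>\d{2}/\d{2}/\d{4})"
)

def parse(line):
    match = RULE.fullmatch(line)
    if match is None:
        return None
    if match["id"] is not None:
        user = {"id": int(match["id"])}  # integer, not string
    else:
        user = {"firstname": match["firstname"]}
    return {"user": user, "connect_date": match["date"]}

print(parse("john connected on 11/08/2017"))
print(parse("12345 connected on 11/08/2017"))
```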
Some logs contain values that only appear part of the time. In this case, make attribute extraction optional with ()?.
Log:
john 1234 connected on 11/08/2017
john connected on 11/08/2017
Rule:
MyParsingRule %{word:user.firstname} (%{integer:user.id} )?connected on %{date("MM/dd/yyyy"):connect_date}
Note: A rule will not match if you include a space after the first word in the optional section.
Result:
(%{integer:user.id} )?
{
"user": {
"firstname": "john",
"id": 1234
},
"connect_date": 1510099200000
}
%{word:user.firstname} (%{integer:user.id} )?
{
"user": {
"firstname": "john"
},
"connect_date": 1510099200000
}
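Where the space sits relative to the optional group matters. The Python sketch below uses ordinary regular expressions as a stand-in for the rule (the pattern names and `parse` are hypothetical, and the date is kept as a string). With the trailing space inside the group, both log lines match; with the space outside, the line without an id would need two consecutive spaces and fails.

```python
import re

DATE = r"(?P<date>\d{2}/\d{2}/\d{4})"

# Trailing space inside the optional group, as in the rule above.
RULE = re.compile(r"(?P<firstname>\w+) (?:(?P<id>\d+) )?connected on " + DATE)
# Space moved outside the group: requires two spaces when the id is absent.
SPACE_OUTSIDE = re.compile(r"(?P<firstname>\w+) (?:(?P<id>\d+))? connected on " + DATE)

def parse(line, rule=RULE):
    match = rule.fullmatch(line)
    if match is None:
        return None
    user = {"firstname": match["firstname"]}
    if match["id"] is not None:
        user["id"] = int(match["id"])  # only present when the optional part matched
    return {"user": user, "connect_date": match["date"]}

print(parse("john 1234 connected on 11/08/2017"))
print(parse("john connected on 11/08/2017"))
print(parse("john connected on 11/08/2017", SPACE_OUTSIDE))  # None
```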
Use the json filter to parse a JSON object nested after a raw text prefix:
Log:
Sep 06 09:13:38 vagrant program[123]: server.1 {"method":"GET", "status_code":200, "url":"https://app.datadoghq.com/logs/pipelines", "duration":123456}
Rule:
parsing_rule %{date("MMM dd HH:mm:ss"):timestamp} %{word:vm} %{word:app}\[%{number:logger.thread_id}\]: %{notSpace:server} %{data::json}
Result:
{
"timestamp": 1567761218000,
"vm": "vagrant",
"app": "program",
"logger": {
"thread_id": 123
}
}
Log:
john_1a2b3c4 connected on 11/08/2017
Rule:
MyParsingRule %{regex("[a-z]*"):user.firstname}_%{regex("[a-zA-Z0-9]*"):user.id} .*
Result:
{
"user": {
"firstname": "john",
"id": "1a2b3c4"
}
}
Use the array([[openCloseStr, ] separator][, subRuleOrFilter]) filter to extract a list into an array in a single attribute. The subRuleOrFilter is optional and accepts these filters.
Log:
Users [John, Oliver, Marc, Tom] have been added to the database
Rule:
myParsingRule Users %{data:users:array("[]",",")} have been added to the database
Result:
{
"users": [
"John",
" Oliver",
" Marc",
" Tom"
]
}
Log:
Users {John-Oliver-Marc-Tom} have been added to the database
Rule:
myParsingRule Users %{data:users:array("{}","-")} have been added to the database
Rule using subRuleOrFilter:
myParsingRule Users %{data:users:array("{}","-", uppercase)} have been added to the database
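The behavior of the three parameters can be sketched in Python as follows. This is an illustration under simple assumptions, not the filter's actual implementation: `array_sketch` is a hypothetical name, openCloseStr is taken to be exactly two characters, and the sub-filter is modeled as a plain function applied to each item.

```python
def array_sketch(text, open_close="[]", separator=",", sub_filter=None):
    """Strip the enclosing characters, split on the separator, optionally map a filter."""
    open_char, close_char = open_close[0], open_close[1]
    if not (text.startswith(open_char) and text.endswith(close_char)):
        return None
    items = text[1:-1].split(separator)
    # Whitespace around items is kept, as in the first result above.
    return [sub_filter(item) for item in items] if sub_filter else items

print(array_sketch("[John, Oliver, Marc, Tom]"))
# ['John', ' Oliver', ' Marc', ' Tom']
print(array_sketch("{John-Oliver-Marc-Tom}", "{}", "-", str.upper))
# ['JOHN', 'OLIVER', 'MARC', 'TOM']
```

Splitting on "," alone is what leaves the leading spaces in " Oliver", " Marc", and " Tom".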
Kubernetes components sometimes log in the glog format; this example is from the Kube Scheduler item in the Pipeline Library.
Example log line:
W0424 11:47:41.605188 1 authorization.go:47] Authorization is disabled
Parsing rule:
kube_scheduler %{regex("\\w"):level}%{date("MMdd HH:mm:ss.SSSSSS"):timestamp}\s+%{number:logger.thread_id} %{notSpace:logger.name}:%{number:logger.lineno}\] %{data:msg}
And extracted JSON:
{
"level": "W",
"timestamp": 1587728861605,
"logger": {
"thread_id": 1,
"name": "authorization.go",
"lineno": 47
},
"msg": "Authorization is disabled"
}
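For comparison, the same glog line can be taken apart with a regular expression in Python. This is a sketch, not the pipeline's implementation: `parse_glog` is a hypothetical function, and because the glog timestamp carries no year, the year is passed in explicitly (2020 is an assumption chosen to reproduce the timestamp shown above). The time is read as UTC.

```python
import re
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

GLOG = re.compile(
    r"(?P<level>\w)(?P<ts>\d{4} \d{2}:\d{2}:\d{2}\.\d{6})\s+"
    r"(?P<thread>\d+) (?P<name>\S+):(?P<lineno>\d+)\] (?P<msg>.*)"
)

def parse_glog(line, year):
    match = GLOG.fullmatch(line)
    if match is None:
        return None
    # glog omits the year, so the caller supplies it (assumption).
    stamp = datetime.strptime(f"{year} {match['ts']}", "%Y %m%d %H:%M:%S.%f")
    stamp = stamp.replace(tzinfo=timezone.utc)
    return {
        "level": match["level"],
        "timestamp": (stamp - EPOCH) // timedelta(milliseconds=1),
        "logger": {
            "thread_id": int(match["thread"]),
            "name": match["name"],
            "lineno": int(match["lineno"]),
        },
        "msg": match["msg"],
    }

line = "W0424 11:47:41.605188 1 authorization.go:47] Authorization is disabled"
print(parse_glog(line, 2020))
```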
The XML parser transforms XML formatted messages into JSON.
Log:
<book category="CHILDREN">
<title lang="en">Harry Potter</title>
<author>J K. Rowling</author>
<year>2005</year>
</book>
Rule:
rule %{data::xml}
Result:
{
"book": {
"year": "2005",
"author": "J K. Rowling",
"category": "CHILDREN",
"title": {
"lang": "en",
"value": "Harry Potter"
}
}
}
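The shape of this conversion can be reproduced with Python's standard `xml.etree.ElementTree` module. The sketch below is only a model of the mapping shown in the result: the functions `element_to_value` and `xml_to_dict` are hypothetical, and the handling of repeated tags (collected into a list) and of text next to attributes (stored under "value") is an assumption about the general rule, based on the examples in this section.

```python
import xml.etree.ElementTree as ET

def element_to_value(elem):
    """Convert one XML element into a string, or a dict of attributes and children."""
    children = list(elem)
    text = (elem.text or "").strip()
    if not children and not elem.attrib:
        return text  # plain text element
    node = dict(elem.attrib)
    for child in children:
        value = element_to_value(child)
        if child.tag in node:
            # Repeated tag: collect the values into a list.
            existing = node[child.tag]
            if isinstance(existing, list):
                existing.append(value)
            else:
                node[child.tag] = [existing, value]
        else:
            node[child.tag] = value
    if text:
        node["value"] = text  # text alongside attributes goes under "value"
    return node

def xml_to_dict(xml_text):
    root = ET.fromstring(xml_text)
    return {root.tag: element_to_value(root)}

BOOK = """<book category="CHILDREN">
<title lang="en">Harry Potter</title>
<author>J K. Rowling</author>
<year>2005</year>
</book>"""
print(xml_to_dict(BOOK))
```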
Notes:
- If the XML contains tags that have both an attribute and a string value between the two tags, a value attribute is generated. For example: <title lang="en">Harry Potter</title> is converted to {"title": {"lang": "en", "value": "Harry Potter" } }
- Repeated tags are automatically converted to arrays. For example: <bookstore><book>Harry Potter</book><book>Everyday Italian</book></bookstore> is converted to { "bookstore": { "book": [ "Harry Potter", "Everyday Italian" ] } }

Use the csv filter to more easily map strings to attributes when separated by a given character (, by default).
The CSV filter is defined as csv(headers[, separator[, quotingcharacter]]) where:
- headers: Defines the key names, separated by ,. Key names must start with an alphabetical character and can contain any alphanumerical character in addition to _.
- separator: Defines the separator used between the different values. Only one character is accepted. Default: ,. Note: Use tab for the separator to represent the tabulation character for TSVs.
- quotingcharacter: Defines the quoting character. Only one character is accepted. Default: "

Note:
- "" within a quoted value represents ".

Log:
John,Doe,120,Jefferson St.,Riverside
Rule:
myParsingRule %{data:user:csv("first_name,name,st_nb,st_name,city")}
Result:
{
"user": {
"first_name": "John",
"name": "Doe",
"st_nb": 120,
"st_name": "Jefferson St.",
"city": "Riverside"
}
}
Other examples:
| Raw string | Parsing rule | Result |
|---|---|---|
| John,Doe | %{data::csv("firstname,name")} | {"firstname": "John", "name": "Doe"} |
| "John ""Da Man""",Doe | %{data::csv("firstname,name")} | {"firstname": "John \"Da Man\"", "name": "Doe"} |
| 'John ''Da Man''',Doe | %{data::csv("firstname,name",",","'")} | {"firstname": "John 'Da Man'", "name": "Doe"} |
| John\|Doe | %{data::csv("firstname,name","\|")} | {"firstname": "John", "name": "Doe"} |
| value1,value2,value3 | %{data::csv("key1,key2")} | {"key1": "value1", "key2": "value2"} |
| value1,value2 | %{data::csv("key1,key2,key3")} | {"key1": "value1", "key2": "value2"} |
| value1,,value3 | %{data::csv("key1,key2,key3")} | {"key1": "value1", "key3": "value3"} |
| Value1 Value2 Value3 (TSV) | %{data::csv("key1,key2,key3","tab")} | {"key1": "Value1", "key2": "Value2", "key3": "Value3"} |
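The rows in this table can be reproduced approximately with Python's standard `csv` module. The sketch below is an analogy, not the filter's implementation: `csv_sketch` is a hypothetical function, all values are kept as strings (no numeric conversion), and dropping empty values and surplus columns is modeled on the table rows above.

```python
import csv

def csv_sketch(text, headers, separator=",", quoting_character='"'):
    """Map one delimited line onto the given comma-separated header names."""
    delimiter = "\t" if separator == "tab" else separator
    row = next(csv.reader([text], delimiter=delimiter, quotechar=quoting_character))
    keys = headers.split(",")
    # zip() drops surplus values or surplus keys; empty values are omitted.
    return {key: value for key, value in zip(keys, row) if value != ""}

print(csv_sketch('"John ""Da Man""",Doe', "firstname,name"))
print(csv_sketch("John|Doe", "firstname,name", "|"))
print(csv_sketch("value1,,value3", "key1,key2,key3"))
```

A doubled quoting character inside a quoted value is read as a single literal quote, which matches the note about "" above.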
If, once you have parsed what you need, the rest of the log is safe to discard, use the data matcher to capture it. In the following log example, the data matcher captures the trailing % so the rule still matches the whole line.
Log:
Usage: 24.3%
Rule:
MyParsingRule Usage\:\s+%{number:usage}%{data:ignore}
Result:
{
"usage": 24.3,
"ignore": "%"
}
If your logs contain ASCII control characters, they are serialized upon ingestion. These can be handled by explicitly escaping the serialized value within your grok parser.