This page provides troubleshooting guidance for common issues you may encounter when deploying or operating Datadog BYOC (Bring Your Own Cloud) Logs. It includes typical error messages, diagnostic steps, and tips for resolving problems related to access permissions, storage configuration, and component health. Use this guide to quickly diagnose issues or to gather context before reaching out to Datadog support.
Check pod events:
kubectl describe pod -n datadog-byoc-logs <pod-name>
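To decide which pods are worth describing, it can help to filter the pod listing first. The following is a minimal sketch, not part of the product: `unhealthy_pods` is a hypothetical helper that reads the default `kubectl get pods` table on standard input and assumes the status is in the third column.

```shell
# Hypothetical helper: list pods whose STATUS is neither Running nor Completed.
# Assumes the default `kubectl get pods` column layout:
#   NAME READY STATUS RESTARTS AGE
unhealthy_pods() {
  # Skip the header row, then print the pod name and its status.
  awk 'NR > 1 && $3 != "Running" && $3 != "Completed" { print $1, $3 }'
}

# Usage (requires cluster access):
#   kubectl get pods -n datadog-byoc-logs | unhealthy_pods
```

A pod that reports `Running` but is not ready (for example `0/1` in the READY column) is not flagged by this filter, so check the READY column separately.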
Common issues:
kubectl describe nodes
kubectl get secrets -n datadog-byoc-logs
The most common errors come from access permissions to the object storage or to the metastore. To troubleshoot, use kubectl and verify logs from BYOC Logs components: indexer pods, metastore pods, and searcher pods.
Error: failed to connect to metastore: connection error: pool timed out
Solution: Verify that PostgreSQL is reachable from the cluster:
kubectl run psql-client \
--rm -it \
--image=bitnami/postgresql:latest \
--command -- psql "host=<HOST> port=<PORT> dbname=<DATABASE> user=<USERNAME> password=<PASSWORD>"
Common causes:
byoc-logs-metastore-uri secret
Error: failed to connect to metastore: invalid port number
Solution: Confirm the password in the metastore URI is URL-encoded. Special characters must be escaped:
# Correct format
postgresql://user:abc%2Fdef%2Bghi%3D@host:5432/byoc-logs
# Incorrect format (fails)
postgresql://user:abc/def+ghi=@host:5432/byoc-logs
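One way to produce the encoded form is a small shell function. This is a sketch under stated assumptions: `urlencode` is a hypothetical helper name, it handles ASCII passwords only, and it percent-encodes every character outside the RFC 3986 unreserved set (letters, digits, `-`, `.`, `_`, `~`).

```shell
# Hypothetical helper: percent-encode a string for use in a connection URI.
# Assumption: the input is ASCII; multi-byte characters are not handled.
# The function body is a subshell so LC_ALL and the variables stay local.
urlencode() (
  LC_ALL=C
  s=$1
  out=
  while [ -n "$s" ]; do
    rest=${s#?}          # everything after the first character
    c=${s%"$rest"}       # the first character itself
    case $c in
      [a-zA-Z0-9.~_-]) out="$out$c" ;;                   # unreserved: keep as-is
      *) out="$out$(printf '%%%02X' "'$c")" ;;           # everything else: %XX
    esac
    s=$rest
  done
  printf '%s\n' "$out"
)

# Usage:
#   urlencode 'abc/def+ghi='
```

With the password from the example above, the output is `abc%2Fdef%2Bghi%3D`, which matches the correct format shown.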
Symptom: Indexer logs repeatedly show:
ERROR quickwit: command failed error=metastore error `index `datadog` not found`
The cluster eventually crashes, and the BYOC Logs console shows multiple clusters where you expect one.
Cause: The metastore URI is not set correctly, so the metastore falls back to a local file-backed store. Each time the metastore pod restarts, the file is wiped and a fresh metastore is created—all index metadata is lost. An earlier error in the logs often points to the misconfiguration:
ERROR quickwit: command failed error=failed to resolve metastore uri postgresql://user:***redacted***@<host>/<database>
Solution: Verify the metastore URI is set correctly. Port-forward to the metastore pod and inspect the running configuration:
kubectl port-forward -n datadog-byoc-logs <pod-name> 7280:7280
curl -s http://localhost:7280/api/v1/config
Confirm metastore_uri points to your PostgreSQL instance. If the password contains special characters, verify it is URL-encoded (see Metastore cannot connect to PostgreSQL).
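The check can be scripted roughly as follows. This is a sketch, not an official tool: `metastore_backend` is a hypothetical helper, and it assumes the config endpoint returns JSON with the `metastore_uri` key and its string value on the same line. It prints only a classification, never the URI itself, so credentials do not end up in the terminal history.

```shell
# Hypothetical helper: classify the metastore_uri found in the config JSON
# read from standard input. Assumes "metastore_uri": "<value>" is on one line.
metastore_backend() (
  uri=$(sed -n 's/.*"metastore_uri"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1)
  case $uri in
    postgres://*|postgresql://*) echo "postgresql" ;;  # expected for BYOC Logs
    "") echo "missing" ;;                              # key not found in the output
    *) echo "other" ;;                                 # for example a file-backed store
  esac
)

# Usage (with the port-forward from above still running):
#   curl -s http://localhost:7280/api/v1/config | metastore_backend
```

Any result other than `postgresql` suggests the metastore is not using your PostgreSQL instance and may be running on the file-backed fallback described above.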
If the AWS, GKE, or Azure credentials or region are set incorrectly, the indexer logs show an error of kind Unauthorized or Internal:
Command failed: Another error occurred. `Metastore error`. Cause: `StorageError(kind=Unauthorized, source=failed to fetch object: s3://my-bucket/datadog-index/some-id.split)`
Action: Check if your pod has access to the bucket.
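To see which bucket and error kind are involved, the indexer logs can be summarized with a small filter. This is a sketch that assumes log lines follow the shape of the example above; `storage_error_summary` is a hypothetical helper name.

```shell
# Hypothetical helper: extract the error kind and the bucket URI from
# StorageError log lines read on standard input.
# Assumes the line format shown in the example error message above.
storage_error_summary() {
  sed -n 's|.*StorageError(kind=\([A-Za-z]*\), source=.*: \([a-z0-9]*://[^/]*\)/.*|\1 \2|p'
}

# Usage (requires cluster access):
#   kubectl logs -n datadog-byoc-logs <pod-name> | storage_error_summary | sort | uniq -c
```

Each output line holds the kind followed by the bucket URI, which makes it easier to confirm that the pod is targeting the bucket you expect before reviewing its permissions.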
Symptom: Indexer logs show ingest errors, or the ingest_requests.count metric shows failures in the OOTB dashboard.
Common causes:
pending_merge_ops.gauge metric. If merge operations are backing up, indexers need more CPU or additional pods.
disk.available_space.gauge. If the write-ahead log (WAL) fills up, indexers stop accepting new data. Increase persistent volume size or add more indexer pods.
If you use Observability Pipelines in front of BYOC Logs, also check for bottlenecks there; see OP Scaling and Performance.
A low rate of 429 errors is not a problem in itself. The Datadog Agent or Observability Pipelines buffers payloads and retries the request automatically.
429s usually mean the cluster is temporarily short on shards. Common triggers:
The real concern is sustained 429s that overflow the client buffer. If 429s persist, the cluster is likely undersized—add indexer pods or increase indexer.podSize.
Monitor for client-side log loss: Watch the following Datadog Agent metrics to detect dropped logs:
| Metric | What it measures |
|---|---|
| datadog.logs.bytes_missed | Bytes from logs that could not be tailed after a file rotation (the Agent had not finished reading the previous file). |
| datadog.logs_client_http_destination.payloads_dropped | Log payloads dropped because of HTTP errors. The Agent retries on almost all errors, so a non-zero value indicates a real issue. |
If either metric is rising, contact Datadog support.
Symptom: Search queries in Log Explorer return errors or take longer than expected. Dashboard widgets show “No Data” or loading spinners.
Common causes:
searcher.podSize.
*:*abcd*) are significantly more expensive than prefix or term queries targeting one single field. Consider using more specific fields and query terms.