This page outlines operational best practices for running BYOC (Bring Your Own Cloud) Logs in production to help reduce common deployment issues.
Set up BYOC Logs monitoring before sending production traffic. Without monitoring, diagnosing ingestion or search issues is difficult.
Key metrics to watch include indexed_events.count, search_requests.count, and disk.available_space.gauge.
See Monitor BYOC Logs for detailed setup instructions.
Always use the indexer.podSize and searcher.podSize parameters in the Helm chart instead of manually setting CPU and memory resource limits.
The podSize parameter automatically configures component-specific settings (cache sizes, queue limits, concurrent search settings) that are tuned for each tier. Manually setting resources without these tuned values can lead to degraded performance.
indexer:
  podSize: xlarge   # 4 vCPUs, 16 GB RAM + tuned settings
searcher:
  podSize: 2xlarge  # 8 vCPUs, 32 GB RAM + tuned settings
See Helm chart sizing tiers for the full list of available sizes and their configurations.
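One way to catch manually set limits before deploying is to scan the values file. The sketch below assumes manual CPU and memory limits would appear under a key named `resources` (the usual Kubernetes convention, not confirmed for this chart). `check_values_file` is a hypothetical helper name.

```shell
# Hypothetical pre-deploy check: warn when a values file sets a `resources:` key.
# Assumption: manual CPU/memory limits live under `resources:`, as in most Helm charts.
check_values_file() {
  values_file=$1
  if grep -Eq '^[[:space:]]*resources:' "$values_file"; then
    echo "warning: $values_file sets resources manually; prefer podSize"
    return 1
  fi
  echo "ok: no manual resources found in $values_file"
}
```

Running `check_values_file values.yaml` before an install or upgrade returns a non-zero status when the key is present, so it could gate a deployment script.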
Indexers use a write-ahead log (WAL) to temporarily buffer data before uploading splits to object storage. Configure persistent volumes to prevent data loss if an indexer pod restarts.
Recommended configuration:
Network-attached storage (gp3 on AWS, Persistent Disk on GCP, Managed Disks on Azure)
Note: Local SSDs are not recommended because the WAL is not replicated. Ephemeral disks can result in data loss if the disk fails. Use network-attached storage for built-in redundancy, and always enable persistent volumes for production deployments.
Example Helm values:
indexer:
  persistentVolume:
    enabled: true
    storage: 250Gi
    storageClass: gp3
Indexer disk usage grows as the WAL accumulates data and during merge operations. If disk space runs out, indexers stop ingesting logs.
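Because indexers stop ingesting when the disk fills, a quick headroom check can complement the metric-based alerting this page recommends. The sketch below applies the same 20% free-space threshold this page uses for the disk.available_space.gauge alert. `check_free_space` is a hypothetical helper, not part of BYOC Logs.

```shell
# Hypothetical helper: report whether free space is below a threshold percentage.
# Arguments: available KiB, total KiB, optional threshold percent (defaults to 20).
check_free_space() {
  avail_kb=$1
  total_kb=$2
  threshold=${3:-20}
  pct=$(( avail_kb * 100 / total_kb ))  # integer percentage, rounded down
  if [ "$pct" -lt "$threshold" ]; then
    echo "low: ${pct}% free"
    return 1
  fi
  echo "ok: ${pct}% free"
}
```

The two inputs could come from `df -Pk` on the WAL volume. The mount path depends on your deployment and is not specified here.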
Set up a monitor on disk.available_space.gauge that notifies you when available space drops below 20% of total capacity.
Track pending_merge_ops.gauge. A growing backlog of pending merges can indicate indexers are falling behind.
Searcher sizing depends more on query patterns than ingestion volume:
Filter queries (for example, status:error AND service:web): lower resource requirements
If you observe search timeouts or slow dashboard loads, adjust capacity:
Increase searcher.podSize for more memory per pod (especially for aggregation-heavy workloads).
On AWS, BYOC Logs can offload leaf search operations to AWS Lambda. Instead of provisioning searcher pods for peak query load, you can let Lambda handle overflow automatically.
This is useful when:
With Lambda offloading enabled, you can run fewer searcher pods sized for your baseline query load, and let Lambda absorb spikes. See Lambda Search Offloading for setup instructions.
BYOC Logs improvements and bug fixes are delivered through Helm chart updates.
Refresh the Datadog repository and upgrade to the latest chart version with your existing values file:
helm repo update datadog
helm upgrade <RELEASE_NAME> datadog/cloudprem \
--namespace datadog-byoc-logs \
--values values.yaml
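Before upgrading, it can be useful to save the values currently applied to the release so they can be compared or restored later. The sketch below uses the standard `helm get values` command. `backup_release_values` and the backup file name are hypothetical.

```shell
# Hypothetical wrapper: save the release's current user-supplied values before upgrading.
# `helm get values` prints the values that were set on the deployed release.
backup_release_values() {
  release=$1
  namespace=$2
  backup_file=$3
  helm get values "$release" --namespace "$namespace" --output yaml > "$backup_file"
}
```

For example, `backup_release_values <RELEASE_NAME> datadog-byoc-logs values.backup.yaml`. If an upgrade misbehaves, `helm history` and `helm rollback` are the standard Helm commands for returning a release to an earlier revision.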
To list the available chart versions before upgrading:
helm search repo datadog/cloudprem --versions
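After picking a version from that list, you may prefer to pin it rather than take whatever is latest. The sketch below adds Helm's standard `--version` flag to the upgrade command shown earlier. `upgrade_to_version` is a hypothetical wrapper, and the chart, namespace, and values file names are the ones used on this page.

```shell
# Hypothetical wrapper: upgrade to an explicitly chosen chart version.
# --version is a standard `helm upgrade` flag; other arguments mirror this page's example.
upgrade_to_version() {
  release=$1
  chart_version=$2
  helm upgrade "$release" datadog/cloudprem \
    --namespace datadog-byoc-logs \
    --values values.yaml \
    --version "$chart_version"
}
```

Usage would look like `upgrade_to_version <RELEASE_NAME> <CHART_VERSION>`. Pinning keeps upgrades reproducible across environments.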