URL: https://thenewstack.io/mastering-deadman-alerts-to-prevent-silent-failures/

Mastering Deadman Alerts To Prevent Silent Failures
Databases / Observability / Software Testing


Unlike alerts that trigger when values exceed expected ranges, deadman alerts fire when expected data simply doesn't arrive.
Jul 28th, 2025 8:00am by Anais Dotis Georgiou
Featured image by A. C. on Unsplash+.
InfluxData sponsored this post.

In the world of monitoring and observability, silence often speaks louder than noise. When your Internet of Things (IoT) sensors stop reporting, your application metrics go dark or your system logs cease flowing, these gaps in data can signal critical failures that demand immediate attention. Missing data isn’t just an inconvenience; it’s often the first indicator of network outages, device failures, security breaches or stalled processes that could cascade into major operational disruptions.

This is where deadman alerts become invaluable. Unlike traditional threshold-based alerts that trigger when values exceed expected ranges, deadman alerts fire when expected data simply doesn’t arrive. They’re an early warning system for the silent failures that might otherwise go unnoticed until it’s too late.
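To make the contrast concrete, here is a minimal Python sketch of the two alert styles. The function names and sample values are illustrative assumptions, not part of InfluxDB or the deadman check plugin.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def threshold_alert(value: float, upper_limit: float) -> bool:
    """Classic alert: fires when a reported value is out of range."""
    return value > upper_limit


def deadman_alert(last_seen: Optional[datetime], now: datetime,
                  max_silence: timedelta) -> bool:
    """Deadman alert: fires when nothing has been reported recently.

    A stream that has never reported (last_seen is None) counts as silent.
    """
    if last_seen is None:
        return True
    return now - last_seen > max_silence


now = datetime(2025, 7, 28, 8, 0, 0, tzinfo=timezone.utc)
last_point = now - timedelta(minutes=5)

# The last reported value was in range, so the threshold alert stays quiet...
print(threshold_alert(20.0, upper_limit=30.0))  # False
# ...but nothing has arrived for 5 minutes, so the deadman alert fires.
print(deadman_alert(last_point, now, timedelta(minutes=1)))  # True
```

The threshold check can only evaluate values that actually arrive, which is why a stopped sensor never trips it.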

This tutorial explores implementing deadman checks using the InfluxDB 3 time series database and its Python Processing Engine with schedule triggers — specifically, deadman triggers. These specialized triggers provide immediate notification when anticipated data streams fall silent, offering a crucial layer of operational visibility that can mean the difference between catching issues early and discovering them after significant damage has occurred.

Why Time Series Databases Excel at Deadman Alerts

Time series databases are uniquely positioned to handle deadman alert scenarios, particularly in DevOps environments where monitoring infrastructure health is paramount. Here’s why this combination is so powerful:

Temporal precision and context: Time series databases inherently understand time as a first-class citizen. They can efficiently query for data gaps, calculate time-based aggregations and maintain historical context about when data was last received. This temporal awareness is crucial for deadman alerts, which fundamentally depend on time-based thresholds.

High-performance gap detection: General-purpose relational databases often struggle with time-based queries across large datasets unless they are carefully indexed and partitioned. Time series databases are optimized for these operations, so they can efficiently check whether any writes have occurred within a specified time window, even across millions of data points.
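As a rough illustration of what gap detection means, the following sketch scans a list of timestamps in memory and reports every stretch longer than an allowed window. It is not how a time series database implements the operation internally; the function name and sample data are assumptions made for this example.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Tuple


def find_gaps(timestamps: List[datetime],
              max_gap: timedelta) -> List[Tuple[datetime, datetime]]:
    """Return (start, end) pairs where consecutive points are more than max_gap apart."""
    ordered = sorted(timestamps)
    gaps = []
    for earlier, later in zip(ordered, ordered[1:]):
        if later - earlier > max_gap:
            gaps.append((earlier, later))
    return gaps


base = datetime(2025, 7, 28, 8, 0, 0, tzinfo=timezone.utc)
# Points at 0s, 10s and 20s, then nothing until 200s and 210s.
points = [base + timedelta(seconds=s) for s in (0, 10, 20, 200, 210)]
gaps = find_gaps(points, max_gap=timedelta(minutes=1))
for start, end in gaps:
    print(f"gap of {(end - start).total_seconds():.0f}s starting at {start.isoformat()}")
```

A database pushes this comparison down to its storage and query engine, so it does not have to load every point into application memory first.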

DevOps-centric benefits: In DevOps workflows, deadman alerts serve multiple critical functions:

  • Infrastructure monitoring: Detect when servers, containers or services stop reporting health metrics.
  • Pipeline reliability: Identify when data ingestion pipelines, ETL jobs or streaming processes stall.
  • Application health: Monitor when applications stop sending telemetry, logs or performance metrics.
  • Distributed system oversight: Track when microservices or distributed components become unresponsive.
  • Compliance and SLA monitoring: Ensure continuous data flow for regulatory requirements and service-level agreements (SLAs).

Scalability for modern operations: DevOps teams often manage hundreds or thousands of monitored endpoints. Time series databases can handle the scale and cardinality required to track individual deadman states across massive infrastructures while maintaining query performance.
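Tracking deadman state per endpoint boils down to keeping one "last seen" timestamp for each source and comparing them all against the same window. A small sketch, with hypothetical endpoint names and timings:

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List


def silent_endpoints(last_seen: Dict[str, datetime], now: datetime,
                     max_silence: timedelta) -> List[str]:
    """Return the names of endpoints whose latest report is older than max_silence."""
    return sorted(
        name for name, seen in last_seen.items() if now - seen > max_silence
    )


now = datetime(2025, 7, 28, 8, 0, 0, tzinfo=timezone.utc)
# Hypothetical endpoints and the time each one last reported.
last_seen = {
    "web-01": now - timedelta(seconds=15),
    "web-02": now - timedelta(minutes=12),
    "db-01": now - timedelta(seconds=40),
    "queue-01": now - timedelta(minutes=3),
}
print(silent_endpoints(last_seen, now, timedelta(minutes=1)))  # ['queue-01', 'web-02']
```

In a time series database, the endpoint name would typically be a tag, and the "last seen" value would come from a grouped query rather than a Python dictionary.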

Integration with existing toolchains: Time series databases naturally integrate with popular DevOps tools like Grafana, Prometheus and various alerting platforms, making deadman alerts part of a comprehensive monitoring strategy.

Getting Started With Deadman Alerts

The InfluxDB deadman check plugin monitors target tables for recent writes and sends Slack alerts when no new data arrives within configurable time thresholds. This approach transforms silence into actionable intelligence.

This guide will walk you through:

  • Requirements and setup.
  • Configuring Slack webhook integration.
  • Creating and managing InfluxDB 3 resources.
  • Testing deadman alert functionality.
  • Leveraging the new Model Context Protocol (MCP) server for streamlined setup.

Requirements and Setup

Begin by downloading InfluxDB 3 Core or Enterprise, following the appropriate installation guide. While you can run this locally, we recommend Docker for simplified setup, better isolation and easier cleanup. This tutorial assumes a Docker containerized environment.

Ensure Docker is installed on your system and pull the latest InfluxDB 3 image for your chosen edition. I’ll use InfluxDB 3 Core as the open source option. If you need long-term storage and advanced features after setup, you can easily upgrade to InfluxDB 3 Enterprise.

After cloning the plugin repository, save the deadman alert file as deadman_alert.py in your configured plugin directory (e.g., /path/to/plugins/). Then execute:

docker run -it --rm --name test_influx \
 -v ~/influxdb3/data:/var/lib/influxdb3 \
 -v /path/to/plugins/:/plugins \
 -p 8181:8181 \
 quay.io/influxdb/influxdb3-core:latest serve \
 --node-id my_host \
 --object-store file \
 --data-dir /var/lib/influxdb3 \
 --plugin-dir /plugins

This command creates a temporary InfluxDB 3 Core container named test_influx using the latest image. It mounts your local data directory for persistence and the plugin directory containing the deadman check plugin. Port 8181 is exposed for local database access, and the server starts with file-based object storage (AWS S3 buckets are also supported), a custom node ID and the mounted plugin directory.

For Slack integration, follow the official documentation to create a webhook URL. You’ll need this webhook as an argument during trigger creation. Alternatively, use our public webhook for testing InfluxDB-related notifications, available in the #notifications-testing channel of the InfluxDB Slack.
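If you want to confirm the webhook works before wiring it into a trigger, a short Python sketch like the one below can help. Slack incoming webhooks accept a JSON body with a `text` field; the URL and message text here are placeholders, and the helper names are assumptions made for this example.

```python
import json
import urllib.request


def build_slack_request(webhook_url: str, text: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for a Slack incoming webhook."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_slack_message(webhook_url: str, text: str, timeout: float = 10.0) -> int:
    """Send the message and return the HTTP status code."""
    request = build_slack_request(webhook_url, text)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


# Placeholder URL: substitute the webhook you created for your workspace.
request = build_slack_request(
    "https://hooks.slack.com/services/YOUR/WEBHOOK/PATH",
    "Test message: deadman alerts will be posted here.",
)
print(request.get_method(), request.full_url)
```

Calling `send_slack_message` with your real webhook URL should post the message to the channel the webhook is bound to.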

Generating Deadman Alerts

Start by creating a database to monitor for heartbeat signals:

influxdb3 create database my_database

Write initial data to establish a baseline:

influxdb3 write --database my_database "sensor_data temp=20"
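The quoted string in that command is line protocol: a table name, a space, then one or more field=value pairs. If your heartbeats come from a script, a helper like this hypothetical one can build the string. It is a simplified sketch that handles numeric fields only and applies no escaping.

```python
from typing import Dict


def heartbeat_line(table: str, fields: Dict[str, float]) -> str:
    """Format numeric fields as one line of line protocol, with no tags or timestamp.

    Simplification: names are assumed to contain no spaces, commas or equals
    signs, so no escaping is applied.
    """
    if not fields:
        raise ValueError("line protocol requires at least one field")
    field_set = ",".join(f"{key}={value}" for key, value in fields.items())
    return f"{table} {field_set}"


print(heartbeat_line("sensor_data", {"temp": 20}))  # sensor_data temp=20
```

Because the line carries no timestamp, the server assigns the write time, which is what a deadman check on recent writes needs.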

Create and enable the deadman trigger:

influxdb3 create trigger \
 --trigger-spec "every:10s" \
 --plugin-filename "deadman_alert.py" \
 --trigger-arguments table=sensor_data,threshold_minutes=1,slack_webhook=https://hooks.slack.com/services/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 \
 --database my_database \
 sensor_deadman

The deadman check plugin executes every 10 seconds, monitoring the sensor_data table in my_database for data written within the last minute. When data exists, you’ll see this log output:

INFO influxdb3_py_api::system_py: processing engine: Data exists in 'sensor_data' in the last 1 minutes.

If no data has been written within the threshold period, you’ll receive a Slack notification alerting you to the silence.
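For readers curious about what such a check might look like inside a plugin, here is a minimal sketch. It is not the source of the deadman check plugin. It assumes the scheduled-trigger entry point `process_scheduled_call(influxdb3_local, call_time, args)` and the `influxdb3_local.query` and `influxdb3_local.info` calls described in the Processing Engine documentation, and it assumes trigger arguments arrive as a dictionary of strings. The SQL text, the alert wording and the `FakeLocal` stand-in are illustrative.

```python
import json
import urllib.request
from typing import Callable


def post_to_slack(webhook_url: str, text: str) -> None:
    """POST a plain-text message to a Slack incoming webhook."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps({"text": text}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10):
        pass


def check_table(influxdb3_local, table: str, threshold_minutes: int,
                notify: Callable[[str], None]) -> bool:
    """Return True if recent data exists; otherwise call notify and return False."""
    # Assumption: the table name comes from trusted trigger arguments.
    query = (
        f'SELECT COUNT(*) AS row_count FROM "{table}" '
        f"WHERE time > now() - INTERVAL '{int(threshold_minutes)} minutes'"
    )
    rows = influxdb3_local.query(query)
    row_count = rows[0]["row_count"] if rows else 0
    if row_count > 0:
        influxdb3_local.info(
            f"Data exists in '{table}' in the last {threshold_minutes} minutes."
        )
        return True
    notify(f"Deadman alert: no data in '{table}' in the last {threshold_minutes} minutes.")
    return False


def process_scheduled_call(influxdb3_local, call_time, args=None):
    """Entry point assumed for schedule triggers; args holds the trigger arguments."""
    args = args or {}
    webhook = args["slack_webhook"]
    check_table(
        influxdb3_local,
        args["table"],
        int(args.get("threshold_minutes", 1)),
        lambda text: post_to_slack(webhook, text),
    )


class FakeLocal:
    """Stand-in for influxdb3_local so the sketch runs outside the database."""

    def __init__(self, row_count: int):
        self.row_count = row_count
        self.logs = []

    def query(self, query, params=None):
        return [{"row_count": self.row_count}]

    def info(self, message):
        self.logs.append(message)


alerts = []
print(check_table(FakeLocal(row_count=3), "sensor_data", 1, alerts.append))  # True
print(check_table(FakeLocal(row_count=0), "sensor_data", 1, alerts.append))  # False
print(alerts)
```

With a check like this running every 10 seconds against a 1-minute window, silence would be noticed between 60 and 70 seconds after the last write: the window empties at 60 seconds, and the next scheduled run follows within 10 seconds.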

The trigger continues monitoring until disabled:

influxdb3 disable trigger --database my_database sensor_deadman
Trigger sensor_deadman disabled successfully

Streamlining Setup With the MCP Server

The new InfluxDB MCP server enables you to manage deadman alerts and time series infrastructure through natural language interactions. This open source service connects InfluxDB 3 to AI tools like Claude Desktop, eliminating the need for manual command-line operations.

Database Management

Instead of manually creating databases and configuring triggers, you can use natural language prompts:

  • “Create a new database called ‘production_monitoring’ for deadman alert monitoring”
  • “Set up a deadman trigger for the ‘api_health’ table with a 5-minute threshold”
  • “Configure a Slack webhook for the sensor monitoring alerts”

Operational Efficiency

The MCP server transforms complex time series operations into conversational workflows:

Schema exploration: Ask “What tables exist in my monitoring database?” or “Show me the structure of the sensor_data table” to understand your data landscape without writing queries.

Token management: Manage authentication through prompts like “Create a read-only token for the monitoring team” or “List all active admin tokens.”

Health monitoring: Get real-time status updates with requests like “Check the connection status of my InfluxDB instance” or “Show me recent write activity across all databases.”

Query Generation

The MCP server can analyze your schema and generate appropriate deadman alert queries:

  • “Find all tables that haven’t received data in the last hour.”
  • “Show me the last write time for each measurement in the production database.”
  • “Identify sensors that stopped reporting in the past 24 hours.”

Final Thoughts and Next Steps

Deadman alerts represent a critical component of comprehensive monitoring strategies, particularly in DevOps environments where silence often indicates serious issues.

This deadman check plugin provides real-time monitoring of data pipeline durability, helping you maintain operational visibility across your infrastructure. We encourage you to explore the InfluxData/influxdb3_plugins repository for additional examples and contribute your own plugins to the community.

The future of monitoring lies in intelligent, proactive systems that can detect problems before they escalate. Deadman alerts are a crucial piece of that puzzle, and InfluxDB 3’s processing engine is a good option for building robust, scalable monitoring that keeps your systems running smoothly.

InfluxData is the creator of InfluxDB, the leading time series platform. More than 1,900 customers use InfluxDB to collect, store, and analyze all time series data at any scale. Developers can query and analyze their time-stamped data to predict, respond, and adapt in real-time.
Anais Dotis-Georgiou is a developer advocate for InfluxData with a passion for making data beautiful with the use of data analytics, AI, and machine learning. She takes the data that she collects, does a mix of research, exploration, and engineering...
TNS owner Insight Partners is an investor in: Docker.