VOOZH about

URL: https://thenewstack.io/creating-an-iot-data-pipeline-using-influxdb-and-aws/

⇱ Creating an IoT Data Pipeline Using InfluxDB and AWS - The New Stack


TNS
SUBSCRIBE
Join our community of software engineering leaders and aspirational developers. Always stay in-the-know by getting the most important news and exclusive content delivered fresh to your inbox to learn more about at-scale software development.
REQUIRED
It seems that you've previously unsubscribed from our newsletter in the past. Click the button below to open the re-subscribe form in a new tab. When you're done, simply close that tab and continue with this form to complete your subscription.
The New Stack does not sell your information or share it with unaffiliated third parties. By continuing, you agree to our Terms of Use and Privacy Policy.
Welcome and thank you for joining The New Stack community!
Please answer a few simple questions to help us deliver the news and resources you are interested in.
REQUIRED
REQUIRED
REQUIRED
REQUIRED
REQUIRED
Great to meet you!
Tell us a bit about your job so we can cover the topics you find most relevant.
REQUIRED
REQUIRED
REQUIRED
REQUIRED
REQUIRED
Welcome!

We’re so glad you’re here. You can expect all the best TNS content to arrive Monday through Friday to keep you on top of the news and at the top of your game.

What’s next?

Check your inbox for a confirmation email where you can adjust your preferences and even join additional groups.

Follow TNS on your favorite social media networks.

Become a TNS follower on LinkedIn.

Check out the latest featured and trending stories while you wait for your first TNS newsletter.

PREV
1 of 2
NEXT
VOXPOP
As a JavaScript developer, what non-React tools do you use most often?
Angular
0%
Astro
0%
Svelte
0%
Vue.js
0%
Other
0%
I only use React
0%
I don't use JavaScript
0%
Thanks for your opinion! Subscribe below to get the final results, published exclusively in our TNS Update newsletter:
NEW! Try Stackie AI
From clobbered drafts to real-time sync
Apr 14th 2026 10:00am, by David Moore
TypeScript 6.0 RC arrives as a bridge to a faster future
Mar 14th 2026 9:00am, by Darryl K. Taft
Mastra empowers web devs to build AI agents in TypeScript
Jan 28th 2026 11:00am, by Loraine Lawson
2023-06-05 10:25:11
Creating an IoT Data Pipeline Using InfluxDB and AWS
sponsor-influxdata,sponsored-post-contributed,
Data / Edge Computing

Creating an IoT Data Pipeline Using InfluxDB and AWS

Reliable data pipelines are critical to industry, providing key information about the state of machinery and to train predictive models to improve operations.
Jun 5th, 2023 10:25am by Jason Myers
👁 Featued image for: Creating an IoT Data Pipeline Using InfluxDB and AWS
InfluxData sponsored this post.
The Internet of Things (IoT) and operations technology (OT) spaces are awash in time series data. Sensors churn out time-stamped data at a steady and voluminous pace, and wrangling all that data can be a challenge. But you don’t want to collect a bunch of data and let it just sit there, taking up storage space without providing any value. So, in the spirit of gaining value from data for IoT/OT, let’s look at how to configure a sensor using InfluxDB and Amazon Web Services (AWS) to collect the type of time-stamped data generated by industrial equipment. You can adapt, expand and scale the concepts here to meet the needs of full industrial production needs. Here’s what we’ll use in the example and why.
  • InfluxDB: This time series database just released an updated version that outperforms previous versions in several key areas, such as query performance against high cardinality data, compression and data ingest. Critically for this example, InfluxDB’s cloud products are available in AWS. Having everything in the same place reduces the number of data transfers and data-latency issues in many instances.
  • Telegraf: Telegraf is an open source, plugin-based data-collection agent. Lightweight and written in Go, you can deploy Telegraf instances anywhere. With more than 300 available plugins, Telegraf can collect data from any source. You can also write custom plugins to collect data from any source that doesn’t already have a plugin.
  • AWS: Amazon Web Services has a whole range of tools and services geared toward IoT. We can tap into some of these services to streamline data processing and analysis.
  • M5stickC+: This is a simple IoT device that can detect a range of measurements, including position, acceleration metrics, gyroscope metrics, humidity, temperature and pressure. This device provides multiple data streams, which is similar to what industrial operators face with manufacturing equipment.
InfluxData is the creator of InfluxDB, the leading time series platform. More than 1,900 customers use InfluxDB to collect, store, and analyze all time series data at any scale. Developers can query and analyze their time-stamped data to predict, respond, and adapt in real-time.
Learn More
The latest from InfluxData

InfluxDB and AWS IoT in Action

The following example illustrates one of many possible data pipelines that you can scale to meet the needs of a range of industrial equipment. That may include machines on the factory floor or distributed devices in the field. Let’s start with a quick overview; then we’ll dive into the details. The basic data flow is: Device → AWS IoT Core → MQTT (rules/routing) → Kinesis → Telegraf → InfluxDB → Visualization 👁 Image
The M5stickC+ device is set up to authenticate directly with AWS. The data comes into AWS IoT Core, which is an AWS service that uses MQTT to add the data to a topic that’s specific to the device. Then there’s a rule that selects any data that goes through the topic and redirects it to the monitoring tool, Amazon Kinesis. A virtual machine running in AWS also runs two Docker containers. One contains an instance of Telegraf and the other an instance of InfluxDB. Telegraf collects the data stream from Kinesis and writes it to InfluxDB. Telegraf also uses a DynamoDB table as a checkpoint in this setup so if the container or the instance goes down, it will restart at the correct point in the data stream when the application is up again. Once the data is in InfluxDB, we can use it to create visualizations. OK, so that’s the basic data pipeline. Now, how do we make it work?

Device Firmware

The first step is creating a data connection from the M5 device to AWS. To accomplish this we use UI Flow, a drag-and-drop editor that’s part of the M5 stack. Check out the blocks in the image below to get a sense for what the device is collecting and how these things map to our eventual output. 👁 Image
We can see here that this data publishes to the MQTT topic m5sticks/MyThingGG/sensors.

AWS IoT Core Rules

With the device data publishing to an MQTT broker, next we need to subscribe to that topic in AWS IoT Core. In the topic filter field, enter “`m5sticks/+/sensors“` to ensure that all the data from the device ends up in the MQTT topic. Next, we need to create another rule to ensure that the data in the MQTT topic goes to Kinesis. In IoT Core, you can use a SQL query to accomplish this: SELECT *, topic(2) as thing, 'sensors' as measurement, timestamp() as timestamp FROM 'm5sticks/+/sensors' In an industrial setting, each device should have a unique name. So, to scale this data pipeline to accommodate multiple devices, we use the “`+“ wildcard in the MQTT topic to ensure that all the data from all devices end up in the correct place. This query adds a timestamp to the data so that it adheres to line protocol, InfluxDB’s data model.

Telegraf Policy

Now that data is flowing from the device to Kinesis, we need to get it into InfluxDB. To do that, we use Telegraf. The code below is the policy that determines how Telegraf interacts with Kinesis. It enables Telegraf to read from the Kinesis stream and enables read and write access to DynamoDB for checkpointing.
{
	"Version": "2012-10-17"
	"Statement": [
		{
			"Sid": "AllowReadFromKinesis",
			"Effect": "Allow",
			"Action": [
				"kinesis:GetShardIterator",
				"kinesis:GetRecords",
				"kinesis:DescribeStream"
			],
			"Resource": [
				"arn:aws:kinesis:eu-west-3:xxxxxxxxx:stream/InfluxDBStream"
			]
		},
		{
			"Sid": "AllowReadAndWriteDynamoDB",
			"Effect": "Allow",
			"Action": [
				"dynamodb:PutItem",
				"dynamodb:GetItem"
			],
			"Resource": [
				"arn:aws:kinesis:eu-west-3:xxxxxxxxx:table/influx-db-telegraf"
			]		
		}
	]
}

Telegraf Config

The following Telegraf config uses Docker container networking, the Telegraf Kinesis Consumer plugin to read the data from Kinesis and the InfluxDB v2 output plugin to write that data to InfluxDB. Note that the string fields match the values from the device firmware UI.
[agent]
debug = true

[[outputs.influxdb_v2]]

## The URLs of the InfluxDB Cluster nodes.
##
## Multiple URLs can be specified for a single cluster, only ONE of the
## urls will be written to each interval.
## urls exp: http://127.0.0.1:8086
urls = ["http://influxdb:8086"]

## Token for authentication.
token = "toto-token"

## Organization is the name of the organization you wish to write to; must exist.
organization = "aws"

## Destination bucket to write into.
bucket = "toto-bucket"


[[inputs.kinesis_consumer]]
## Amazon REGION of kinesis endpoint.
region = "eu-west-3"

## Amazon Credentials
## Credentials are loaded in the following order
## 1) Web identity provider credentials via STS if role_arn and web_identity_token_file are specified
## 2) Assumed credentials via STS if role_arn is specified
## 3) explicit credentials from 'access_key' and 'secret_key'
## 4) shared profile from 'profile'
## 5) environment variables
## 6) shared credentials file
## 7) EC2 Instance Profile

## Endpoint to make request against, the correct endpoint is automatically
## determined and this option should only be set if you wish to override the
## default.
## ex: endpoint_url = "http://localhost:8000"
# endpoint_url = ""

## Kinesis StreamName must exist prior to starting telegraf.
streamname = "InfluxDBStream"

## Shard iterator type (only 'TRIM_HORIZON' and 'LATEST' currently supported)
# shard_iterator_type = "TRIM_HORIZON"

## Max undelivered messages
## This plugin uses tracking metrics, which ensure messages are read to
## outputs before acknowledging them to the original broker to ensure data
## is not lost. This option sets the maximum messages to read from the
## broker that have not been written by an output.
##
## This value needs to be picked with awareness of the agent's
## metric_batch_size value as well. Setting max undelivered messages too high
## can result in a constant stream of data batches to the output. While
## setting it too low may never flush the broker's messages.
# max_undelivered_messages = 1000

## Data format to consume.
## Each data format has its own unique set of configuration options, read
## more about them here:
## https://github.com/influxdata/telegraf/blob/master/docs/DATA_FORMATS_INPUT.md
data_format = "json"

## Tag keys is an array of keys that should be added as tags.
tag_keys = [
	"thing"
]

## String fields is an array of keys that should be added as string fields.
json_string_fields = [
	"pressure",
	"xGyr",
	"yAcc",
	"batteryPower",
	"xAcc",
	"temperature",
	"zAcc",
	"zGyr",
	"y",
	"x",
	"yGry",
	"humidity"

]

## Name key is the key used as the measurement name.
json_name_key = "measurement"

## Time key is the key containing the time that should be used to create the
## metric.
json_time_key "timestamp"

## Time format is the time layout that should be used to interpret the
## json_time_key. The time must be 'unix', 'unix_ms' or a time in the 
## "reference_time".
## ex: json_time_format = "Mon Jan 2 15:04:05 -0700 MST 2006"
## json_time_format = "2006-01-02T15:04:05Z07:00"
## json_time_format = "unix"
## json_time_format = "unix_ms"
json_time_format = "unix_ms"

## Optional
## Configuration for a dynamodb checkpoint
[inputs.kinesis_consumer.checkpoint_dynamodb]
## unique name for this consumer
app_name = "default"
table_name = "influx-db-telegraf"

Docker Compose

This Docker compose file gets uploaded to an EC2 instance using SSH.
version: '3'

services:
 influxdb:
 image: influxdb:2.0.6
 volumes:
 # Mount for influxdb data directory and configuration
 - influxdbv2:/root/.influxdbv2
 ports:
 - "8086:8086"
# Use the influx cli to set up an influxdb instance
 influxdb_cli:
 links:
 - influxdb
 image: influxdb:2.0.6
# Use these same configurations parameters in you telegraf configuration, mytelegraf.conf
	# Wait for the influxd service in the influxdb has been fully bootstrapped before trying to set up an influxdb
 restart: on-failure:10
 depends_on:
 	- influxdb
 telegraf:
 image: telegraf:1.25-alpine
 links:
 - influxdb
 volumes:
 # Mount for telegraf config
 - ./telegraf.conf:/etc/telegraf/telegraf.conf
 depends_on:
 - influxdb_cli

volumes:
 influxdbv2:

Visualization

Once your data is in InfluxDB, users running InfluxDB can create visualizations and dashboards using your tool of choice. InfluxDB offers a native integration with Grafana and supports SQL queries using Flight SQL-compatible tools.

Conclusion

Building reliable data pipelines is a critical aspect of industrial operations. This data provides key information about the current state of machinery and equipment, and can train predictive models used to improve machinery operations and effectiveness. Combining leading technologies like InfluxDB and AWS provides the tools necessary to capture, store, analyze and visualize this critical process data. The use case described here is just one way to build a data pipeline. You can update, scale and amend it to accommodate a wide range of industrial IoT/OT operations or build something completely custom using the range of IoT solutions available for InfluxDB and AWS.
InfluxData is the creator of InfluxDB, the leading time series platform. More than 1,900 customers use InfluxDB to collect, store, and analyze all time series data at any scale. Developers can query and analyze their time-stamped data to predict, respond, and adapt in real-time.
Learn More
The latest from InfluxData
TRENDING STORIES
Jason Myers earned a PhD in modern Irish history from Loyola University Chicago. Since then, he has used the writing skills he developed in his academic work to create content for a range of startup and technology companies. When he's...
Read more from Jason Myers
InfluxData sponsored this post.
SHARE THIS STORY
TRENDING STORIES
TNS owner Insight Partners is an investor in: Pragma, Docker, Statement.
SHARE THIS STORY
TRENDING STORIES
TNS DAILY NEWSLETTER Receive a free roundup of the most recent TNS articles in your inbox each day.
The New Stack does not sell your information or share it with unaffiliated third parties. By continuing, you agree to our Terms of Use and Privacy Policy.
👁 Image
Join the millions of developers using InfluxDB to predict, respond, and adapt in real-time.