In this tutorial we’ll learn how to use Python to get time-series data from the
OpenWeatherMap API and convert it to a
Pandas DataFrame. Next we’ll write that data to
InfluxDB, a time-series data platform, with the
InfluxDB Python Client. We’ll convert the JSON response from our API call to a Pandas DataFrame because I find that that’s the easiest way to write data to InfluxDB. We’re writing to InfluxDB because it’s a
purpose-built database that’s meant to handle the high ingest requirements and of
time-series data.
Requirements
This tutorial was executed on a macOS system with
Python 3 installed via Homebrew. I recommend setting up additional tooling like
virtualenv,
pyenv, or
conda-env to simplify Python and Client installations. Otherwise, the full requirements are here:
txt
influxdb-client=1.30.0
pandas=1.4.3
requests>=2.27.1
This tutorial also assumes that you have either created a
Free Tier InfluxDB Cloud account or are using
InfluxDB OSS. It also assumes that you have:
Finally, this tutorial requires that you have created an account with OpenWeatherMap and have
created a token.
InfluxData is the creator of InfluxDB, the leading time series platform. More than 1,900 customers use InfluxDB to collect, store, and analyze all time series data at any scale. Developers can query and analyze their time-stamped data to predict, respond, and adapt in real-time.
The latest from InfluxData
Request Weather Data
First, we need to request our data. We’ll use the request library to return hourly weather data from a specified longitude and latitude with the OpenWeatherMap API.
python
# Get time series data from OpenWeatherMap API
params = {'lat':openWeatherMap_lat, 'lon':openWeatherMap_lon, 'exclude': "minutely,daily", 'appid':openWeatherMap_token}
r = requests.get(openWeather_url, params = params).json()
hourly = r['hourly']
Convert the Data to a Pandas DataFrame
Next, convert the JSON data to a Pandas DataFrame. We’ll also convert the timestamp from a Unix timestamp with second precision to a datetime object. This conversion is being made because the InfluxDB write method requires that the timestamp is in datetime object format. We’ll use this method next to write our data to InfluxDB. We also drop columns that we don’t want to write to InfluxDB.
python
# Convert data to Pandas DataFrame and convert timestamp to datetime object
df = pd.json_normalize(hourly)
df = df.drop(columns=['weather', 'pop'])
df['dt'] = pd.to_datetime(df['dt'], unit='s')
print(df.head)
Write the Pandas DataFrame to InfluxDB
Anais Dotis-Georgiou
Anais is a developer advocate for InfluxData with a passion for making data beautiful with the use of data analytics, AI, and machine learning. She takes the data that she collects, does a mix of research, exploration and engineering to translate the data into something of function, value, and beauty.
Now instantiate the InfluxDB Python client library and write the DataFrame to InfluxDB. We specify a measurement name. A measurement contains data within a bucket. You can think of it as the second-highest hierarchical level of data organization in InfluxDB after a bucket.
You can also specify which columns to convert to tags with the `data_frame__tag_columns` parameter.
Since we haven’t specified any columns as tags, all of our columns will be converted into fields in InfluxDB. Tags are used to write metadata about your time-series data and can be used to query data subsets together more efficiently. Fields are where you store your actual time-series data within InfluxDB.
This documentation covers these data concepts within InfluxDB in more detail.
python
# Write data to InfluxDB
with InfluxDBClient(url=url, token=token, org=org) as client:
df = df
client.write_api(write_options=SYNCHRONOUS).write(bucket=bucket,record=df,
data_frame_measurement_name="weather",
data_frame_timestamp_column="dt")
Full Script
To review, let’s take a look at the full script all together. We followed the following steps:
- Import libraries
- Gather the following:
- InfluxDB bucket
- InfluxDB org
- InfluxDB token
- InfluxDB URL
- OpenWeatherMap URL
- OpenWeatherMap token
- Build your request.
- Convert the JSON response to a Pandas DataFrame.
- Drop any columns you don’t want to write to InfluxDB.
- Convert the timestamp column from unix time to a Pandas datetime object.
- Instantiate the InfluxDB Python Client Library.
- Write the DataFrame and specify the measurement name and timestamp column.
python
import requests
import influxdb_client
import pandas as pd
from influxdb_client import InfluxDBClient
from influxdb_client.client.write_api import SYNCHRONOUS
bucket = "OpenWeather"
org = "<my_InfluxDB_org>" # or email you used to create your Free Tier InfluxDB Cloud account
token = "<my_InfluxDB_token"
url = "<my_InfluxDB_url>" # for example, https://us-west-2-1.aws.cloud2.influxdata.com/
openWeatherMap_token = "<my_OpenWeatherMap_token>"
openWeatherMap_lat = "33.44"
openWeatherMap_lon = "-94.04"
openWeather_url = "https://api.openweathermap.org/data/2.5/onecall"
# Get time series data from OpenWeatherMap API
params = {'lat':openWeatherMap_lat, 'lon':openWeatherMap_lon, 'exclude': "minutely,daily", 'appid':openWeatherMap_token}
r = requests.get(openWeather_url, params = params).json()
hourly = r['hourly']
# Convert data to Pandas DataFrame and convert timestamp to datetime object
df = pd.json_normalize(hourly)
df = df.drop(columns=['weather', 'pop'])
df['dt'] = pd.to_datetime(df['dt'], unit='s')
print(df.head)
# Write data to InfluxDB
with InfluxDBClient(url=url, token=token, org=org) as client:
df = df
client.write_api(write_options=SYNCHRONOUS).write(bucket=bucket,record=df,
data_frame_measurement_name="weather",
data_frame_timestamp_column="dt")
Query the Data
Now that we’ve written our data to InfluxDB, we can use the InfluxDB UI to query for our data. Navigate to the Data Explorer (from the left-hand navigation bar). Use the
Query Builder to select for the data that you want to visualize and the range that you want to
visualize it for and then hit
Submit.
👁 ImageA default materialized view of our weather data. InfluxDB automatically aggregates the time-series data so that new users don’t accidentally query for too much data and get a timeout.
Pro Tip: InfluxDB automatically downsamples your data when you query for it with the
Query Builder. To query for your raw data, navigate to the
Script Editor to view the underlying Flux query.
Flux is the native query and scripting language for InfluxDB, which can be used for
analyzing and
creating forecasts with your time-series data. Uncomment or delete the line with the
aggregateWindow() function to view your raw data.
👁 Image
Navigate to the
Script Editor and uncomment or delete the aggregateWindow() function to view the raw weather data.
Conclusion
I hope this blog post inspires you to take advantage of the InfluxDB Python Client library to obtain and store time-series data in InfluxDB. If you want to learn more about using the Python Client Library to query data from InfluxDB, I encourage you to take a look at
this post. It’s also worth noting that you can use Flux to obtain data from the OpenWeatherMap API and store it with InfluxDB with a task. If you’re using InfluxDB Cloud this means that this Flux script will be hosted and executed regularly, so you can get a reliable stream of weather data into your instance. To learn more about how to obtain weather data with Flux on a user-defined schedule read
this post.
InfluxData is the creator of InfluxDB, the leading time series platform. More than 1,900 customers use InfluxDB to collect, store, and analyze all time series data at any scale. Developers can query and analyze their time-stamped data to predict, respond, and adapt in real-time.
The latest from InfluxData