8 Visualizations with Python to Handle Multiple Time-Series Data
Visualization ideas for coping with overlapping lines in multiple time-series plots.
Dr. Strange, a fictional character, is my favorite superhero. One of his incredible abilities is seeing many possible outcomes of events at once. It would be cool if I could see many things simultaneously like him. In fact, I can do that without any superpower. The right type of visualization and Python are enough.
Multiple Time-Series Data
A time-series plot with a single line is a helpful graph for expressing data with long sequences. It consists of an X-axis representing the timeline and a Y-axis showing the value. This is a standard method since the concept is simple and easy to understand. The plot can help us extract insights such as trends and seasonal effects.
However, plotting many lines on the same time-series plot can make things difficult. Let’s consider the examples below.
This may seem exaggerated, but it is possible to encounter this kind of data, as will be shown next. In the first picture, the lines are hard to tell apart, so the plot is hard to read. With the same dataset, the Radial Plots in the second picture, one of the ideas explained in this article, help handle the overlapping.
This article will demonstrate 8 visualization ideas with Python code to cope with the chaos in plotting multiple time-series data. Let's get started.
Get Data
To work with a real-world example, I will use the Air Pollution in Seoul dataset from Kaggle (link). The data was provided by the Seoul Metropolitan Government. It contains air pollution measurements of SO2, NO2, CO, O3, PM10, and PM2.5 between 2017 and 2019 from 25 districts in Seoul, the capital city of South Korea.
In this article, PM2.5 from the 25 districts will be the primary variable, plotted as multiple time-series lines. PM2.5 is defined as fine particulate matter with a diameter smaller than 2.5 µm. It is considered a type of pollution that causes short-term health effects.
Visualizing PM2.5 from many locations helps compare how pollution affects the city.
Import Data
Start by importing the libraries.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import matplotlib.ticker as ticker
Read Measurement_summary.csv with Pandas.
df = pd.read_csv('<file location>/Measurement_summary.csv')
df.head()
Explore Data
Now that we have imported the dataset, continue by checking for missing values and the data type of each column.
df.info()
The good news is that there are no missing values. The next step is to check the number of distinct station codes.
df['Station code'].nunique()
###output:
###25
There are 25 stations in total. Check the station codes.
# sorted() gives a stable order; a bare set has no guaranteed ordering
list_scode = sorted(set(df['Station code']))
list_scode
###output:
### [101, 102, 103, 104, ..., 125]
Preprocess Data
From 101 to 125, the station codes represent the districts in Seoul. I find district names more convenient for labeling the visualizations because they are easier to read. The names will be extracted from the ‘Address’ column to create the ‘District’ column.
list_add = list(df['Address'])
District = [i.split(', ')[2] for i in list_add]
df['District'] = District
Create a list with the 25 district names for use later.
list_district = list(set(District))
Prepare another three columns, YM (Year-Month), Year, and Month, to use with some of the graphs. To make visualization easier, we will then aggregate the data into a DataFrame of monthly averages.
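The aggregation step can be sketched as follows. This is a minimal example on synthetic data, not the original code: the column names 'Measurement date' and 'PM2.5' are assumptions about the CSV, and the name df_monthly is chosen to match the DataFrame used later in the article. Year is kept as a string because the later filter compares it with '2019'.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the measurements (two districts, two months).
# 'Measurement date' and 'PM2.5' are ASSUMED column names; values are random.
rng = np.random.default_rng(0)
dates = pd.date_range('2019-01-01', '2019-02-28', freq='D')
df = pd.DataFrame({
    'Measurement date': np.tile(dates.strftime('%Y-%m-%d %H:%M'), 2),
    'District': np.repeat(['Jongno-gu', 'Jung-gu'], len(dates)),
    'PM2.5': rng.uniform(5, 60, size=2 * len(dates)),
})

# Derive the three time columns as strings.
ts = pd.to_datetime(df['Measurement date'])
df['YM'] = ts.dt.strftime('%Y-%m')
df['Year'] = ts.dt.strftime('%Y')
df['Month'] = ts.dt.strftime('%m')

# Average PM2.5 per district and month.
df_monthly = (
    df.groupby(['District', 'YM', 'Year', 'Month'], as_index=False)['PM2.5']
      .mean()
)
print(df_monthly)
```

With the real dataset, only the synthetic DataFrame at the top would be replaced by the DataFrame read from the CSV.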
Plot data
Now that everything is ready, let's draw the multiple time-series plot.
The overlapping lines are hard to read. In 2017, the PM2.5 levels at many stations moved in the same direction. In 2018 and 2019, however, the lines diverge and become hard to tell apart.
Visualization Ideas
The main purpose of this article is to share some visualization ideas, with Python, for handling multiple time-series data.
Before continuing, I need to clarify something. The visualizations recommended in this article are mainly for coping with overlapping plots, since that is a main problem in plotting multiple time-series data, as we have already seen.
Each graph has its pros and cons. Obviously, nothing is perfect. Some may be just for an eye-catching effect. But all of them have the same purpose: comparing sequences between categories.
1. Changing nothing but making the plot interactive.
Plotly is a graphing library for making interactive graphs. The interactive chart helps zoom in on the area with overlapping lines.
With Plotly, an interactive area chart can also be made.
2. Comparing one by one with Small Multiple Time Series.
With the Seaborn library, we can draw small multiple time series. The idea behind these plots is to plot each line one by one while still comparing it with the silhouette of the other lines. The code is based on an example from the official Seaborn website (link).
3. Changing the point of view with Facet Grid
FacetGrid from Seaborn can be used to make multi-plot grids. In this case, the ‘Month’ and ‘Year’ attributes are set as rows and columns, respectively. This gives another perspective: reading down a column compares the months within a year, while reading across a row compares the same month between years.
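The grid layout can be sketched as below. The source does not say what is drawn inside each facet, so the per-district bar plot here is an assumption, and the data is synthetic (three districts, three months, two years, random values).

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic monthly averages; real district names, RANDOM values.
rng = np.random.default_rng(0)
districts = ['Jongno-gu', 'Jung-gu', 'Yongsan-gu']
rows = [(d, y, m, rng.uniform(10, 40))
        for d in districts
        for y in ('2018', '2019')
        for m in ('01', '02', '03')]
df_monthly = pd.DataFrame(rows, columns=['District', 'Year', 'Month', 'PM2.5'])

# Rows are months, columns are years.
g = sns.FacetGrid(df_monthly, row='Month', col='Year',
                  height=1.6, aspect=2.5, margin_titles=True)

# ASSUMPTION: each facet shows one bar per district.
g.map_dataframe(sns.barplot, x='District', y='PM2.5',
                order=districts, color='steelblue')
g.set_axis_labels('', 'PM2.5')
```

With the full dataset the grid would have 12 rows and 3 columns, so a small height per facet keeps the figure manageable.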
4. Using color with Heat Map
A heat map represents the data as a two-dimensional chart that shows values as colors. To deal with time-series data, we can put the groups on the vertical axis and the timeline on the horizontal axis. The difference in color helps distinguish between groups.
Pivot the DataFrame
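A minimal sketch of the pivot and the heat map follows, using synthetic monthly averages (random values, four districts). The name df_pivot and the color map are choices made for this example.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Synthetic monthly averages; real district names, RANDOM values.
rng = np.random.default_rng(0)
districts = ['Jongno-gu', 'Jung-gu', 'Yongsan-gu', 'Mapo-gu']
months = pd.period_range('2017-01', '2019-12', freq='M')
df_monthly = pd.DataFrame(
    [(d, str(m), rng.uniform(10, 40)) for d in districts for m in months],
    columns=['District', 'YM', 'PM2.5'],
)

# Districts become rows, year-months become columns.
df_pivot = df_monthly.pivot(index='District', columns='YM', values='PM2.5')

# Color encodes the PM2.5 level.
fig, ax = plt.subplots(figsize=(14, 4))
sns.heatmap(df_pivot, cmap='YlOrRd', ax=ax, cbar_kws={'label': 'PM2.5'})
ax.set_title('Monthly average PM2.5 (synthetic data)')
```

Because every district gets its own row, nothing overlaps, at the cost of reading values from color rather than position.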
5. Applying angles with a Radar chart
We can set the angular axis on the scatter plot in Plotly to create an interactive Radar Chart. Each month will be selected as a variable on the circle. For example, in this article, we will create a radar chart comparing the average monthly PM2.5 of the 25 districts in 2019.
Filter the DataFrame with only data from 2019
df_19 = df_monthly[df_monthly['Year']=='2019']
Create Radar Chart. A good thing about using Plotly is that the Radar chart is interactive. So we can easily filter the chart.
Let's go further by filling the radar area of each district one by one and comparing each one with the rest. Then create a photo collage.
Define a function to create a photo collage. I found this excellent method to combine the plots from this link on Stack Overflow.
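The linked answer is not reproduced here. As a stand-in, the sketch below shows one simple way to tile images into a grid with Pillow; the function name make_collage and its parameters are hypothetical, and the demo uses solid-color tiles instead of saved plots.

```python
from PIL import Image


def make_collage(images, n_cols, cell_size=(300, 300), background='white'):
    """Paste images left to right, top to bottom, into a grid of equal cells."""
    n_rows = -(-len(images) // n_cols)  # ceiling division
    cell_w, cell_h = cell_size
    collage = Image.new('RGB', (n_cols * cell_w, n_rows * cell_h), background)
    for i, img in enumerate(images):
        tile = img.convert('RGB').resize((cell_w, cell_h))
        col, row = i % n_cols, i // n_cols
        collage.paste(tile, (col * cell_w, row * cell_h))
    return collage


# Demo with in-memory tiles; real usage would open saved plot images instead.
tiles = [Image.new('RGB', (120, 80), color)
         for color in ('red', 'green', 'blue', 'orange', 'purple')]
collage = make_collage(tiles, n_cols=3, cell_size=(100, 100))
print(collage.size)
```

For saved charts, the list could be built with Image.open(path) for each exported file, and the result written out with collage.save(...). Note that resize stretches each image to the cell size, so plots exported at a common aspect ratio work best.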
Use the function
Voilà!
6. Fancy the bar plot with Circular Bar Plot (Race Track Plot)
The concept of a Circular Bar Plot (aka Race Track Plot) is simple: it is just a bar plot drawn around a circle. We can draw a Circular Bar Plot for each month and then make a photo collage to compare how the values change over time.
The picture below shows an example of the Circular Bar Plot we are going to create. The disadvantage of this chart is that it is hard to compare between categories. Still, it is a good choice for getting attention with an eye-catching effect.
Define a function to create a Circular Bar plot
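A possible version of such a function is sketched below with Matplotlib's polar axes, where horizontal bars become concentric arcs. The function name circular_bar_plot, the 270-degree maximum sweep, and the color map are choices made for this sketch, and the demo values are random.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


def circular_bar_plot(values, title, max_angle=1.5 * np.pi):
    """Draw one arc per category; the largest value sweeps max_angle radians."""
    values = values.sort_values()
    angles = (values / values.max() * max_angle).to_numpy()
    rings = np.arange(len(values)) + 3  # start at r=3 to leave a hole in the middle

    fig, ax = plt.subplots(figsize=(6, 6), subplot_kw={'projection': 'polar'})
    ax.set_theta_zero_location('N')  # bars start at 12 o'clock
    ax.set_theta_direction(-1)       # and run clockwise
    colors = plt.cm.viridis((values / values.max()).to_numpy())
    bars = ax.barh(rings, angles, height=0.8, color=colors)

    # Category labels sit at the start of each arc.
    ax.set_rlabel_position(0)
    ax.set_yticks(rings)
    ax.set_yticklabels(values.index, fontsize=7)
    ax.set_xticks([])
    ax.set_ylim(0, rings[-1] + 1)
    ax.grid(False)
    ax.spines['polar'].set_visible(False)
    ax.set_title(title)
    return fig, ax, bars


# Demo with RANDOM values for six districts.
rng = np.random.default_rng(0)
districts = ['Jongno-gu', 'Jung-gu', 'Yongsan-gu',
             'Seongdong-gu', 'Gwangjin-gu', 'Mapo-gu']
values = pd.Series(rng.uniform(15, 40, len(districts)), index=districts)
fig, ax, bars = circular_bar_plot(values, 'PM2.5, one month (synthetic data)')
```

Calling the function once per month and saving each figure with fig.savefig(...) gives the images needed for a collage.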
Apply the function
Create a photo collage
7. Starting from the center with Radial Plot
Like Circular Bar Plot, Radial Plot is based on bar charts that use polar coordinates instead of cartesian coordinates. This chart type is inconvenient when comparing categories located far away from each other, but it is an excellent choice to get attention. It can be used in Infographics.
The picture below shows an example of Radial plots showing the average PM2.5 from the 25 districts in January 2019.
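A radial plot of this kind can be sketched with a regular bar chart on polar axes, as below. The function name radial_plot, the inner offset, and the color map are choices made for this sketch, and the demo values are random rather than the real January 2019 averages.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


def radial_plot(values, title):
    """Draw one bar per category, radiating outward from an inner circle."""
    values = values.sort_values(ascending=False)
    n = len(values)
    theta = np.linspace(0, 2 * np.pi, n, endpoint=False)
    width = 2 * np.pi / n * 0.9

    fig, ax = plt.subplots(figsize=(6, 6), subplot_kw={'projection': 'polar'})
    ax.set_theta_zero_location('N')
    ax.set_theta_direction(-1)
    colors = plt.cm.plasma((values / values.max()).to_numpy())
    # 'bottom' lifts the bars off the center so short bars stay visible.
    bars = ax.bar(theta, values.to_numpy(), width=width,
                  bottom=values.max() * 0.3, color=colors)

    ax.set_xticks(theta)
    ax.set_xticklabels(values.index, fontsize=8)
    ax.set_yticks([])
    ax.set_title(title)
    return fig, ax, bars


# Demo with RANDOM values for six districts.
rng = np.random.default_rng(0)
districts = ['Jongno-gu', 'Jung-gu', 'Yongsan-gu',
             'Seongdong-gu', 'Gwangjin-gu', 'Mapo-gu']
values = pd.Series(rng.uniform(15, 40, len(districts)), index=districts)
fig, ax, bars = radial_plot(values, 'Average PM2.5 (synthetic data)')
```

Sorting the values before plotting places similar districts next to each other, which softens the difficulty of comparing bars on opposite sides of the circle.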
Apply the function
Create a photo collage
8. Showing densities with Overlapping densities (Ridge plot)
Overlapping densities (Ridge plot) can be used with multiple time-series data by setting one axis as the timeline. Like the Circular Bar Plot and the Radial Plot, the Ridge plot can get people's attention. The code is based on an example from the official Seaborn website (link).
The following picture shows an example of the Ridge plot with the densities of PM2.5 in a district in 2019.
Define a function for creating the Ridge plot
Apply the function
Create photo collage
Summary
This article shows some visualizations with Python code examples for handling overlapping lines in multiple time-series plots. The two main concepts are making the plot interactive and separating the lines into individual plots. An interactive chart lets users select categories freely, while separating the plots makes each category easier to compare.
These are just some ideas. I’m sure there are more visualization ideas to handle multiple time-series data than the graphs mentioned in this article. If you have any questions or suggestions, please feel free to leave a comment. Thanks for reading.
These are other articles about data visualization that you may find interesting.
- Visualizing the Speed of Light with Python (link)
- Visualizing the Invisible SO2 with NASA Data and Python (link)
- Image Color Extraction with Python in 4 Steps (link)
Reference
- Seoul Metropolitan Government. (2021, May). Air Pollution Measurement Information in Seoul, Korea. Retrieved April 24, 2022 from https://www.kaggle.com/datasets/bappekim/air-pollution-in-seoul
- Plotly Technologies Inc. (2015). Collaborative data science. Montréal, QC: Plotly Technologies Inc. https://plot.ly