URL: https://towardsdatascience.com/8-visualizations-with-python-to-handle-multiple-time-series-data-19b5b2e66dd0/

8 Visualizations with Python to Handle Multiple Time-Series Data

Visualization ideas for coping with overlapping lines in multiple time-series plots.

Photo by Juliana on Unsplash

Dr. Strange, a fictional character, is my favorite superhero. One of his incredible abilities is seeing the possibilities of events. It would be cool if I could see many things simultaneously like him. In fact, I can do that without any superpower: the right type of visualization and Python are enough.

Multiple Time-Series Data

A time-series plot with a single line is a helpful graph for expressing data with long sequences. It consists of an X-axis representing the timeline and a Y-axis showing the value. This is a standard method since the concept is simple and easy to understand. The plot can help us extract insights such as trends and seasonal effects.

However, many lines on the multiple time-series plot can make things difficult. Let’s consider the examples below.

The first picture: Multiple time-series line plot shows PM2.5 in 25 districts of Seoul, 2019. The second picture: The same dataset in Radial Plot. Air Pollution in Seoul data from Kaggle. Images by the author.

This may seem exaggerated, but data like this does turn up in practice, as will be shown next. In the first picture, the lines are hard to distinguish and the plot is hard to read. With the same dataset, the Radial Plots in the second picture, one of the ideas explained in this article, help handle the overlap.

This article will demonstrate 8 visualization ideas with Python code to cope with the chaos in plotting multiple time-series data. Let's get started.

Get Data

To work with a real-world example, I will use the Air Pollution in Seoul dataset from Kaggle (link). The data was provided by the Seoul Metropolitan Government. It contains air pollution measurements (SO2, NO2, CO, O3, PM10, and PM2.5) between 2017 and 2019 from 25 districts in Seoul, the capital city of South Korea.

In this article, PM2.5 from 25 districts will be the primary variable plotted as multiple time-series lines. PM2.5 is defined as fine particulate matter with a diameter smaller than 2.5 µm. It is considered a type of pollution that causes short-term health effects.

Visualizing PM2.5 from many locations helps compare how pollution affects the city.

Import Data

Start by importing the libraries.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import matplotlib.ticker as ticker

Read Measurement_summary.csv with Pandas.

df = pd.read_csv('<file location>/Measurement_summary.csv')
df.head()

Explore Data

Now that we have imported the dataset, continue by checking for missing values and the data type of each column.

df.info()

The good news is that there are no missing values. The next step is to check the number of distinct station codes.

df['Station code'].nunique()
###output:
###25

There are 25 stations in total. Check the station codes.

# A set has no guaranteed order, so sort the codes to get 101..125 in sequence
list_scode = sorted(set(df['Station code']))
list_scode
###output:
### [101, 102, 103, 104, ..., 125]

Preprocess Data

The station codes, 101 to 125, represent the districts in Seoul. Personally, I find the district names more convenient for labeling the visualizations since they are easier to read. The names will be extracted from the ‘Address’ column to create the ‘District’ column.

list_add = list(df['Address'])
District = [i.split(', ')[2] for i in list_add]
df['District'] = District

Create a list with the 25 district names for use later.

list_district = list(set(District))

Prepare another three columns, YM (Year-Month), Year, and Month, for use in some of the graphs. For easier visualization, we will group the data into a DataFrame of monthly averages.
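The original code for this step is not shown, so here is a minimal sketch of one way to do it. It assumes the timestamp column is named 'Measurement date' and holds strings such as '2019-01-01 00:00'; the small DataFrame below is synthetic and only stands in for the real file. Year and Month are kept as strings, which matches the later filter `df_monthly['Year']=='2019'`.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for Measurement_summary.csv (one row per day, two districts).
# The column name 'Measurement date' and its string format are assumptions.
dates = pd.date_range("2019-01-01", "2019-02-28", freq="D").strftime("%Y-%m-%d %H:%M")
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Measurement date": np.tile(dates, 2),
    "District": np.repeat(["Jongno-gu", "Jung-gu"], len(dates)),
    "PM2.5": rng.uniform(5, 60, size=2 * len(dates)),
})

# Slice the date string into the three helper columns (all kept as strings).
df["YM"] = df["Measurement date"].str[:7]      # e.g. '2019-01'
df["Year"] = df["Measurement date"].str[:4]    # e.g. '2019'
df["Month"] = df["Measurement date"].str[5:7]  # e.g. '01'

# Average PM2.5 per district and month.
df_monthly = (
    df.groupby(["District", "YM", "Year", "Month"], as_index=False)["PM2.5"].mean()
)
print(df_monthly)
```

With the real dataset, only the construction of `df` would be replaced by the `pd.read_csv` call and the ‘District’ column created earlier.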


Plot data

Now that everything is ready, let's do the multiple time-series plot.

Multiple time-series lines plot shows the average monthly PM2.5 from 25 districts in Seoul from 2017 to 2019. Image by the author.

Overlapping lines are hard to read. In 2017, the PM2.5 levels at many stations moved in the same direction. However, in 2018 and 2019, the lines diverge with no common pattern and are hard to tell apart.


Visualization Ideas

The main purpose of this article is to present some visualization ideas with Python for handling multiple time-series data.

Before continuing, I need to clarify something. The visualizations recommended in this article are mainly for coping with overlapping plots, since that is the main problem in plotting multiple time-series data, as we have already seen.

Each graph has its pros and cons. Obviously, nothing is perfect. Some may be just for an eye-catching effect. But all of them have the same purpose: comparing sequences between categories.

1. Changing nothing but making the plot interactive.

Plotly is a graphing library for making interactive graphs. The interactive chart helps zoom in on the area with overlapping lines.

The result of using Plotly to create an interactive Multiple time-series lines plot. Images by the author.

With Plotly, an interactive area chart can also be made.

Fill the area under the lines with the colors. Image by the author.

2. Comparing one by one with Small Multiple Time Series.

With the Seaborn library, we can create small multiple time series. The idea behind these plots is to plot each line one by one while still comparing it with the silhouettes of the other lines. The code is on the official Seaborn website (link).

A part of the Small Multiple Time-Series plot. Images by the author.

3. Changing the point of view with Facet Grid

FacetGrid from Seaborn can be used to make multi-plot grids. In this case, the ‘Month’ and ‘Year’ attributes are set as rows and columns, respectively. This gives another perspective: the values can be compared by month along the vertical direction and by year along the horizontal direction at the same time.

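A minimal sketch of such a grid follows. The choice of a bar plot per district inside each cell is an assumption, since the original code is not shown; the data is synthetic and limited to a few months to keep the example small.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic monthly averages: three districts, two years, three months (random values).
rng = np.random.default_rng(0)
districts = ["Jongno-gu", "Jung-gu", "Yongsan-gu"]
df_monthly = pd.DataFrame(
    [(d, y, m, rng.uniform(10, 40))
     for d in districts for y in ["2018", "2019"] for m in ["01", "02", "03"]],
    columns=["District", "Year", "Month", "PM2.5"],
)

# Rows are months, columns are years; each cell compares the districts.
g = sns.FacetGrid(df_monthly, row="Month", col="Year",
                  height=1.8, aspect=2.5, margin_titles=True)
g.map_dataframe(sns.barplot, x="District", y="PM2.5",
                order=districts, color="steelblue")
g.set_axis_labels("", "PM2.5")
for ax in g.axes.flat:
    ax.tick_params(axis="x", rotation=90)
```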
A part of the Facet Grid plot. Images by the author.

4. Using color with Heat Map

A heat map represents the data as a two-dimensional chart showing values in colors. To deal with time-series data, we can set the groups on the vertical dimension and the timeline on the horizontal dimension. The difference in color helps distinguish between groups.

Pivot the DataFrame
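The pivot and the heat map could look roughly like this. It is a sketch on synthetic data; the column names and the color map are assumptions.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Synthetic monthly averages for three districts (random values, for illustration only).
rng = np.random.default_rng(0)
periods = pd.period_range("2017-01", "2019-12", freq="M")
districts = ["Jongno-gu", "Jung-gu", "Yongsan-gu"]
df_monthly = pd.DataFrame(
    [(d, str(p), rng.uniform(10, 40)) for d in districts for p in periods],
    columns=["District", "YM", "PM2.5"],
)

# Wide format: one row per district, one column per month.
df_pivot = df_monthly.pivot(index="District", columns="YM", values="PM2.5")

# Districts on the vertical axis, timeline on the horizontal axis, PM2.5 as color.
fig, ax = plt.subplots(figsize=(12, 4))
sns.heatmap(df_pivot, cmap="viridis", linewidths=0.5, ax=ax)
fig.tight_layout()
```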

The heat map shows the average monthly PM2.5 from 25 districts in Seoul from 2017 to 2019. Image by the author.

5. Applying angles with a Radar chart

We can set the angular axis on the scatter plot in Plotly to create an interactive Radar Chart. Each month will be selected as a variable on the circle. For example, in this article, we will create a radar chart comparing the average monthly PM2.5 of the 25 districts in 2019.

Filter the DataFrame with only data from 2019

df_19 = df_monthly[df_monthly['Year']=='2019']

Create Radar Chart. A good thing about using Plotly is that the Radar chart is interactive. So we can easily filter the chart.

The result of using Plotly to create an interactive Radar chart. Images by the author.

Let's go further by filling the radar area of each district one by one and comparing each one with the rest. Then create a photo collage.

Define a function to create a photo collage. I found this excellent method to combine the plots from this link on Stack Overflow.
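The Stack Overflow method itself is not reproduced here. As a stand-in, the following sketch uses Pillow to paste saved images into a grid; `make_collage` and its parameters are hypothetical names chosen for this illustration, and the solid-color test images replace the saved plots.

```python
import os
import tempfile
from PIL import Image

def make_collage(paths, n_cols, cell_size=(400, 400), background="white"):
    """Paste the images in `paths` row by row into a grid with n_cols columns."""
    n_rows = -(-len(paths) // n_cols)  # ceiling division
    w, h = cell_size
    collage = Image.new("RGB", (n_cols * w, n_rows * h), background)
    for i, path in enumerate(paths):
        with Image.open(path) as img:
            tile = img.convert("RGB").resize((w, h))
        collage.paste(tile, ((i % n_cols) * w, (i // n_cols) * h))
    return collage

# Demo with five solid-color images standing in for saved plots.
tmp_dir = tempfile.mkdtemp()
paths = []
for i, color in enumerate(["red", "green", "blue", "orange", "purple"]):
    path = os.path.join(tmp_dir, f"plot_{i}.png")
    Image.new("RGB", (200, 150), color).save(path)
    paths.append(path)

collage = make_collage(paths, n_cols=3, cell_size=(100, 100))
print(collage.size)  # (300, 200): three columns, two rows
```

Resizing every image to a fixed cell size keeps the grid regular; plots saved with the same figure size would not be distorted by it.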

Use the function

Voila !!…

Some Radar charts from the photo collage. Images by the author.

6. Fancy the bar plot with Circular Bar Plot (Race Track Plot)

The concept of a Circular Bar Plot (aka Race Track Plot) is simple: it is just a bar plot drawn in a circle. We can plot a Circular Bar Plot for each month and then make a photo collage to compare the progression over time.

The picture below shows an example of a Circular Bar Plot we are going to create. The disadvantage of this chart is that it is hard to compare between categories. Still, it is a good choice for getting attention with an eye-catching effect.

Circular Bar Plot, Images by the author.

Define a function to create a Circular Bar plot
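One way to write such a function with Matplotlib is sketched below. It is not the original implementation: `circular_bar_plot`, its parameters, and the three-quarter-circle scaling are assumptions made for this example, and the values are made up.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

def circular_bar_plot(df_month, title, max_value):
    """Draw each district as an arc on its own ring; arc length encodes PM2.5."""
    df_sorted = df_month.sort_values("PM2.5")
    # Scale so that max_value spans three quarters of the circle.
    angles = df_sorted["PM2.5"].to_numpy() / max_value * 1.5 * np.pi
    radii = np.arange(len(df_sorted)) + 2  # start at 2 to leave a hole in the middle
    colors = plt.cm.viridis(np.linspace(0, 1, len(radii)))

    fig, ax = plt.subplots(figsize=(6, 6), subplot_kw={"projection": "polar"})
    ax.set_theta_zero_location("N")  # bars start at 12 o'clock
    ax.set_theta_direction(-1)       # and run clockwise
    ax.barh(radii, angles, height=0.8, color=colors)
    ax.set_ylim(0, radii.max() + 1)
    ax.set_rlabel_position(0)        # put the ring labels where the bars start
    ax.set_yticks(radii)
    ax.set_yticklabels(df_sorted["District"].tolist())
    ax.set_xticks([])
    ax.set_title(title)
    return fig, ax

# Made-up averages for one month and three districts.
df_month = pd.DataFrame({"District": ["Jongno-gu", "Jung-gu", "Yongsan-gu"],
                         "PM2.5": [31.0, 27.5, 35.2]})
fig, ax = circular_bar_plot(df_month, "2019-01", max_value=40)
```

Passing the same `max_value` for every month keeps the arc lengths comparable across the saved figures.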

Apply the function

Create a photo collage

Circular Bar plots show monthly PM2.5 from 25 districts in Seoul from 2017 to 2019. Images by the author.

7. Starting from the center with Radial Plot

Like the Circular Bar Plot, the Radial Plot is based on bar charts that use polar coordinates instead of Cartesian coordinates. This chart type is inconvenient when comparing categories located far away from each other, but it is an excellent choice to get attention. It can be used in infographics.

The picture below shows an example of Radial plots showing the average PM2.5 from the 25 districts in January 2019.

Radial Plot, Images by the author.
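A function for a radial plot could be sketched as follows, with one bar per district growing outward from the center. `radial_plot` and its parameters are hypothetical names for this illustration, and the values are made up.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

def radial_plot(df_month, title):
    """Draw one bar per district, starting at the center of a polar axis."""
    n = len(df_month)
    theta = np.linspace(0, 2 * np.pi, n, endpoint=False)  # one angle per district
    width = 2 * np.pi / n * 0.9                           # small gap between bars
    colors = plt.cm.plasma(np.linspace(0, 1, n))

    fig, ax = plt.subplots(figsize=(6, 6), subplot_kw={"projection": "polar"})
    ax.bar(theta, df_month["PM2.5"].to_numpy(), width=width, bottom=0, color=colors)
    ax.set_xticks(theta)
    ax.set_xticklabels(df_month["District"].tolist())
    ax.set_title(title)
    return fig, ax

# Made-up averages for one month and three districts.
df_month = pd.DataFrame({"District": ["Jongno-gu", "Jung-gu", "Yongsan-gu"],
                         "PM2.5": [31.0, 27.5, 35.2]})
fig, ax = radial_plot(df_month, "2019-01")
```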

Apply the function

Create a photo collage

Radial plots show monthly PM2.5 from 25 districts in Seoul from 2017 to 2019. Images by the author.

8. Showing densities with Overlapping densities (Ridge plot)

Overlapping densities (Ridge plot) can be used with multiple time-series data by setting one axis as the timeline. Like the Circular Bar Plot and the Radial Plot, the Ridge plot can get people's attention. The code is on the official Seaborn website (link).

The following picture shows an example of the Ridge plot with the densities of PM2.5 in a district in 2019.

Overlapping densities (Ridge plot), Images by the author.

Define a function for creating the Ridge plot

Apply the function

Create photo collage

Some Ridge plots from the photo collage. Images by the author.

Summary

This article shows some visualizations with Python code examples for handling overlapping lines in multiple time-series plots. The two main concepts are using interactive plots and separating the lines into individual plots. The interactive chart is helpful with options that allow users to select categories freely, while separating the plots helps users compare them easily.

These are just some ideas. I’m sure there are more visualization ideas to handle multiple time-series data than the graphs mentioned in this article. If you have any questions or suggestions, please feel free to leave a comment. Thanks for reading.


These are other articles about data visualization that you may find interesting.

  • Visualizing the Speed of Light with Python (link)
  • Visualizing the Invisible SO2 with NASA Data and Python (link)
  • Image Color Extraction with Python in 4 Steps (link)

Written By

Boriharn K
