
URL: https://towardsdatascience.com/rnn-recurrent-neural-networks-how-to-successfully-model-sequential-data-in-python-5a0b9e494f92/

RNN: Recurrent Neural Networks - How to Successfully Model Sequential Data in Python | Towards Data Science


RNN: Recurrent Neural Networks - How to Successfully Model Sequential Data in Python

A visual explanation of RNNs and a step-by-step guide to building them with Keras and Tensorflow Python libraries


๐Ÿ‘ Recurrent Neural Networks. Image by author.
Recurrent Neural Networks. Image by author.

Intro

Modeling and predicting sequential data requires a different approach from standard regression or classification. Luckily, a particular type of Neural Network, called a Recurrent Neural Network (RNN), is specifically designed for that purpose.

In this article, I will cover the structure of RNNs and give you a complete example of how to build a simple RNN using Keras and Tensorflow in Python.

If you are not familiar with the basic structure of Neural Networks, you may prefer to familiarize yourself with Feed Forward and Deep Feed Forward NNs first.

Contents

  • A look at the Machine Learning universe
  • The architecture of Recurrent Neural Networks
  • Python example of how to build and train your own RNN

A look at the Machine Learning universe

While Neural Networks are most frequently used in a supervised manner with labeled training data, I felt that their unique approach to Machine Learning deserves a separate category.

Recurrent Neural Networks have their own sub-branch consisting of Simple RNNs, LSTMs (Long Short Term Memory), and GRUs (Gated Recurrent Unit).


The structure of Recurrent Neural Networks (RNNs)

First, let's remind ourselves what a typical Feed Forward Neural Network looks like. Note that it can contain any number of input nodes, hidden nodes, and output nodes. The 2-3-2 structure below is purely for illustration.

๐Ÿ‘ Simple Feed Forward Neural Network architecture. Image by author.
Simple Feed Forward Neural Network architecture. Image by author.

Next, if we look at an RNN, we notice a slight difference. The hidden units inside an RNN have a built-in feedback loop, which enables information to be passed back to the same node multiple times. These hidden units are commonly called recurrent units.

๐Ÿ‘ Simple Recurrent Neural Network architecture. Image by author.
Simple Recurrent Neural Network architecture. Image by author.

A recurrent unit processes information for a predefined number of timesteps, each time passing a hidden state and an input for that specific timestep through an activation function.

Timestep - a single pass through the recurrent unit, processing one element of the input sequence. E.g., if you have only one timestep, your input will be processed only once (equivalent to a regular hidden node). If you have seven timesteps, the unit will process a sequence of seven inputs, one per timestep, carrying the hidden state from one step to the next.

See the illustration below showing the feedback loop inside the recurrent unit:

๐Ÿ‘ Recurrent unit operation at timestep t. Image by author.
Recurrent unit operation at timestep t. Image by author.

Note that at the initial timestep, the hidden state h0 is initialized to 0. Next, the output (a hidden state h at t+1) is passed back to a recurrent unit and processed again together with the following input:

๐Ÿ‘ Recurrent unit operation at timestep t+1. Image by author.
Recurrent unit operation at timestep t+1. Image by author.

The process repeats until the specified number of timesteps is reached.
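As a minimal sketch of this loop, assuming a single unit with one input and the tanh activation used later in the Keras example, the NumPy code below computes the hidden state at each timestep. The function name `simple_rnn_forward` and the values of `w_x`, `w_h`, and `b` are illustrative assumptions, not learned weights.

```python
import numpy as np

def simple_rnn_forward(x_seq, w_x, w_h, b):
    """Run one recurrent unit over a 1-D input sequence.

    Returns the hidden state after every timestep.
    """
    h = 0.0  # h0 is initialized to 0
    states = []
    for x_t in x_seq:
        # The same three parameters are reused at every timestep
        h = np.tanh(w_x * x_t + w_h * h + b)
        states.append(h)
    return np.array(states)

# Three timesteps, e.g. scaled temperatures for three consecutive days
x_seq = np.array([0.2, 0.4, 0.6])
states = simple_rnn_forward(x_seq, w_x=0.5, w_h=0.8, b=0.1)
print(states)  # the last element is the unit's output after the final timestep
```

By default, Keras' `SimpleRNN` layer returns only the final hidden state (`return_sequences=False`), which corresponds to `states[-1]` here.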

Let's tie all of it together and see what a simple RNN with one input, one hidden node (containing three timesteps), and one output would look like.

๐Ÿ‘ Unfolding of the recurrent unit. Image by author.
Unfolding of the recurrent unit. Image by author.

To help explain what is happening in more detail, let's look at a simple example.

Assume you want to predict tomorrow's air temperature based on the sequence of air temperatures from the last three days. Then:

  • Inputs - while you may have only one input node, you would have to pass sequences of three numbers as your input, one per timestep, because that is what the recurrent layer requires, i.e. [x_0, x_1, x_2], ..., [x_{n-2}, x_{n-1}, x_n].
  • Recurrent layer - in a typical feed-forward neural network, a hidden node with one input would have two parameters: a weight and a bias. However, a recurrent unit has three parameters to optimize: a weight for the input, a weight for the hidden state (the recurrent weight), and a bias. Note that it would still be three parameters even if you had ten timesteps, because the same weights are reused at every timestep.
  • Training - a typical feed-forward neural network is trained using the backpropagation algorithm. Training an RNN uses a slightly modified version of backpropagation, which unfolds the network in time to train its weights. The algorithm is based on computing the gradient vector and is called backpropagation through time, or BPTT for short.
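The BPTT idea can be sketched for the single-unit case described above. This is a minimal NumPy illustration, assuming a squared-error loss on the final hidden state and no output layer (both simplifications chosen for the example); `unrolled_forward` and `bptt_grads` are hypothetical helper names. It also shows why the parameter count does not grow with the number of timesteps: the same `w_x`, `w_h`, and `b` are reused, so their gradients are summed over all timesteps.

```python
import numpy as np

def unrolled_forward(x_seq, w_x, w_h, b):
    """Unroll one tanh unit over x_seq; returns [h0, h1, ..., hT] with h0 = 0."""
    hs = [0.0]
    for x_t in x_seq:
        hs.append(np.tanh(w_x * x_t + w_h * hs[-1] + b))
    return hs

def bptt_grads(x_seq, y, w_x, w_h, b):
    """Gradients of L = 0.5 * (h_T - y)**2 w.r.t. (w_x, w_h, b) via BPTT."""
    hs = unrolled_forward(x_seq, w_x, w_h, b)
    g_wx = g_wh = g_b = 0.0
    dh = hs[-1] - y  # dL/dh_T
    # Walk backwards through the unrolled timesteps T, T-1, ..., 1
    for t in range(len(x_seq), 0, -1):
        dz = dh * (1.0 - hs[t] ** 2)  # derivative of tanh: 1 - tanh(z)**2
        g_wx += dz * x_seq[t - 1]     # the shared input weight collects a term per timestep
        g_wh += dz * hs[t - 1]        # so does the shared recurrent weight
        g_b += dz
        dh = dz * w_h                 # gradient flowing back to h_{t-1}
    return g_wx, g_wh, g_b

x_seq = [0.2, 0.4, 0.6, 0.5, 0.3, 0.1, 0.7, 0.9, 0.4, 0.2]  # ten timesteps
params = dict(w_x=0.5, w_h=0.8, b=0.1)
grads = bptt_grads(x_seq, y=0.3, **params)
print(len(grads))  # 3: one gradient per parameter, regardless of sequence length
```

A finite-difference check (nudging each parameter slightly and re-running `unrolled_forward`) is a quick way to confirm gradients like these; in the Keras example below, TensorFlow computes them automatically.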

As you are now familiar with the architecture of a simple RNN, let's go through a Python example.

๐Ÿ‘ Image

Python example of how to build and train your own RNN

Setup

We'll need the Australian weather data from Kaggle and the following libraries: Tensorflow/Keras, pandas, NumPy, scikit-learn, and Plotly.

Let's import all the libraries:

# Tensorflow / Keras
from tensorflow import keras # for building Neural Networks
print('Tensorflow/Keras: %s' % keras.__version__) # print version
from keras.models import Sequential # for creating a linear stack of layers for our Neural Network
from keras import Input # for instantiating a keras tensor
from keras.layers import Dense, SimpleRNN # for creating regular densely-connected NN layers and RNN layers

# Data manipulation
import pandas as pd # for data manipulation
print('pandas: %s' % pd.__version__) # print version
import numpy as np # for data manipulation
print('numpy: %s' % np.__version__) # print version
import math # to help with reshaping the data

# Sklearn
import sklearn # for model evaluation
print('sklearn: %s' % sklearn.__version__) # print version
from sklearn.model_selection import train_test_split # for splitting the data into train and test samples
from sklearn.metrics import mean_squared_error # for model evaluation metrics
from sklearn.preprocessing import MinMaxScaler # for feature scaling

# Visualization
import plotly 
import plotly.express as px
import plotly.graph_objects as go
print('plotly: %s' % plotly.__version__) # print version

The above code prints package versions used in this example:

Tensorflow/Keras: 2.7.0
pandas: 1.3.4
numpy: 1.21.4
sklearn: 1.0.1
plotly: 5.4.0

Next, we download and ingest Australian weather data (source: Kaggle). We also perform some simple data manipulation and derive a new variable (Median Temperature) for us to use.

# Set Pandas options to display more columns
pd.options.display.max_columns=50

# Read in the weather data csv
df=pd.read_csv('weatherAUS.csv', encoding='utf-8')

# Drop records where target MinTemp=NaN or MaxTemp=NaN
df=df[pd.isnull(df['MinTemp'])==False]
df=df[pd.isnull(df['MaxTemp'])==False]

# Median daily temperature (mid point between Daily Max and Daily Min)
df['MedTemp']=df[['MinTemp', 'MaxTemp']].median(axis=1)

# Show a snapshot of data
df
๐Ÿ‘ A snippet of Kaggle's Australian weather data with some modifications. Image by author.
A snippet of Kaggle's Australian weather data with some modifications. Image by author.
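A note on the derived column: with only two values per row, the row-wise median is simply their midpoint, (MinTemp + MaxTemp) / 2. The toy DataFrame below uses made-up numbers (not the Kaggle data) to check this, and also shows `dropna(subset=...)` as a shorter equivalent of the two `pd.isnull(...)==False` filters above.

```python
import numpy as np
import pandas as pd

# Hypothetical toy rows standing in for weatherAUS.csv
df_toy = pd.DataFrame({
    'MinTemp': [5.0, np.nan, 8.0, 2.5],
    'MaxTemp': [15.0, 20.0, np.nan, 12.5],
})

# The article's filtering, step by step
filtered_a = df_toy[pd.isnull(df_toy['MinTemp']) == False]
filtered_a = filtered_a[pd.isnull(filtered_a['MaxTemp']) == False]
# The same filtering in one call
filtered_b = df_toy.dropna(subset=['MinTemp', 'MaxTemp'])

# Row-wise median of two columns equals their midpoint
med = filtered_b[['MinTemp', 'MaxTemp']].median(axis=1)
mid = (filtered_b['MinTemp'] + filtered_b['MaxTemp']) / 2
print(med.tolist())  # [10.0, 7.5]
```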

Given the data contains weather information for multiple locations across Australia, let's pick one city (Canberra) and plot the daily median temperature on a chart.

# Select only Canberra 
dfCan=df[df['Location']=='Canberra'].copy()

# Plot daily median temperatures in Canberra
fig = go.Figure()
fig.add_trace(go.Scatter(x=dfCan['Date'], 
 y=dfCan['MedTemp'],
 mode='lines',
 name='Median Temperature',
 opacity=0.8,
 line=dict(color='black', width=1)
 ))

# Change chart background color
fig.update_layout(dict(plot_bgcolor = 'white'))

# Update axes lines
fig.update_xaxes(showgrid=True, gridwidth=1, gridcolor='lightgrey', 
 zeroline=True, zerolinewidth=1, zerolinecolor='lightgrey', 
 showline=True, linewidth=1, linecolor='black',
 title='Date'
 )

fig.update_yaxes(showgrid=True, gridwidth=1, gridcolor='lightgrey', 
 zeroline=True, zerolinewidth=1, zerolinecolor='lightgrey', 
 showline=True, linewidth=1, linecolor='black',
 title='Degrees Celsius'
 )

# Set figure title
fig.update_layout(title=dict(text="Median Daily Temperatures in Canberra", 
 font=dict(color='black')))

fig.show()
๐Ÿ‘ Daily median temperatures in Canberra. Image by author.
Daily median temperatures in Canberra. Image by author.

Training and Evaluating a Recurrent Neural Network (RNN)

Before we train and evaluate our Recurrent Neural Network, we need to create a function that will help us reshape the data to follow the required format.

##### Step 0 - We will use this function in step 3 to get the data into the right shape
def prep_data(datain, time_step):
    # 1. y-array
    # First, create an array with indices for y elements based on the chosen time_step
    y_indices = np.arange(start=time_step, stop=len(datain), step=time_step)
    # Create y array based on the above indices
    y_tmp = datain[y_indices]

    # 2. X-array
    # We want to have the same number of rows for X as we do for y
    rows_X = len(y_tmp)
    # Since the last element in y_tmp may not be the last element of the datain,
    # let's ensure that the X array stops just before the last y
    X_tmp = datain[range(time_step*rows_X)]
    # Now take this array and reshape it into the desired shape (samples, timesteps, features)
    X_tmp = np.reshape(X_tmp, (rows_X, time_step, 1))
    return X_tmp, y_tmp

The above function can restructure the data for any number of timesteps. For example, since I am using seven timesteps (i.e., a sequence of 7 days of temperatures to predict the air temperature of the next day), it splits the data like this:

๐Ÿ‘ Illustration of how the sequential data needs to be restructured for RNN. Image by author.
Illustration of how the sequential data needs to be restructured for RNN. Image by author.
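To see the reshaping in action without the weather data, here is `prep_data` (repeated so the snippet runs on its own) applied to a toy column holding the numbers 0 to 19, with `time_step=7`. The toy input is an assumption for illustration only.

```python
import numpy as np

def prep_data(datain, time_step):
    # Target indices: time_step, 2*time_step, ... (one target after each window)
    y_indices = np.arange(start=time_step, stop=len(datain), step=time_step)
    y_tmp = datain[y_indices]
    # Keep exactly time_step input rows per target
    rows_X = len(y_tmp)
    X_tmp = datain[range(time_step * rows_X)]
    X_tmp = np.reshape(X_tmp, (rows_X, time_step, 1))
    return X_tmp, y_tmp

toy = np.arange(20, dtype=float).reshape(-1, 1)  # shape (20, 1), like the scaled column
X_toy, y_toy = prep_data(toy, time_step=7)
print(X_toy.shape, y_toy.shape)  # (2, 7, 1) (2, 1)
print(X_toy[:, :, 0])  # first row holds days 0-6, second row days 7-13
print(y_toy.ravel())   # targets: day 7 and day 14
```

The windows do not overlap: the target of one window (day 7) is also the first input of the next, so only every 7th day becomes a target.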

Now we can train and evaluate our RNN. We use an extremely simple Neural Network for this example, with four layers and only one node in each layer. Feel free to experiment by adding additional layers, nodes, or by changing activation functions.

๐Ÿ‘ The structure of RNN used in the example. Image by author.
The structure of RNN used in the example. Image by author.
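What this four-layer model computes can be written out by hand. The NumPy sketch below mirrors its forward pass (SimpleRNN with tanh, then a tanh Dense node, then a linear Dense output) using made-up weights; in practice Keras learns these values. It also tallies the trainable parameters from the weight shapes Keras uses: a SimpleRNN layer stores an input kernel, a recurrent kernel, and a bias (3 values here), and each Dense layer stores a kernel and a bias (2 values each), for 7 in total. The helper names are hypothetical.

```python
import numpy as np

def model_forward(window, p):
    """Forward pass of the 1-node RNN -> Dense(tanh) -> Dense(linear) model.

    window: array of shape (time_step,), one input value per timestep.
    p: dict of scalar weights (hypothetical values, not trained ones).
    """
    h = 0.0
    for x_t in window:  # Hidden-Recurrent-Layer
        h = np.tanh(p['rnn_wx'] * x_t + p['rnn_wh'] * h + p['rnn_b'])
    d = np.tanh(p['dense_w'] * h + p['dense_b'])  # Hidden-Layer
    return p['out_w'] * d + p['out_b']           # Output-Layer (linear)

def simple_rnn_params(units, input_dim):
    # input kernel + recurrent kernel + bias
    return units * input_dim + units * units + units

def dense_params(units, input_dim):
    # kernel + bias
    return units * input_dim + units

total = simple_rnn_params(1, 1) + dense_params(1, 1) + dense_params(1, 1)
print(total)  # 7

p = dict(rnn_wx=0.5, rnn_wh=0.8, rnn_b=0.1, dense_w=1.2, dense_b=0.0, out_w=0.9, out_b=0.05)
window = np.linspace(0.3, 0.6, 7)  # hypothetical 7 scaled daily temperatures
print(model_forward(window, p))
```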

I have commented the code below extensively to give you a clear understanding of what each part does. Hence, I will not repeat the same explanations in the body of the article.

##### Step 1 - Select data for modeling and apply MinMax scaling
X=dfCan[['MedTemp']]
scaler = MinMaxScaler()
X_scaled=scaler.fit_transform(X)

##### Step 2 - Create training and testing samples
train_data, test_data = train_test_split(X_scaled, test_size=0.2, shuffle=False)

##### Step 3 - Prepare input X and target y arrays using previously defined function
time_step = 7
X_train, y_train = prep_data(train_data, time_step)
X_test, y_test = prep_data(test_data, time_step)

##### Step 4 - Specify the structure of a Neural Network
model = Sequential(name="First-RNN-Model") # Model
model.add(Input(shape=(time_step,1), name='Input-Layer')) # Input Layer - need to specify the shape of inputs
model.add(SimpleRNN(units=1, activation='tanh', name='Hidden-Recurrent-Layer')) # Hidden Recurrent Layer, Tanh(x) = sinh(x)/cosh(x) = ((exp(x) - exp(-x))/(exp(x) + exp(-x)))
model.add(Dense(units=1, activation='tanh', name='Hidden-Layer')) # Hidden Layer, Tanh(x) = sinh(x)/cosh(x) = ((exp(x) - exp(-x))/(exp(x) + exp(-x)))
model.add(Dense(units=1, activation='linear', name='Output-Layer')) # Output Layer, Linear(x) = x

##### Step 5 - Compile keras model
model.compile(optimizer='adam', # default='rmsprop', an algorithm to be used in backpropagation
 loss='mean_squared_error', # Loss function to be optimized. A string (name of loss function), or a tf.keras.losses.Loss instance.
 metrics=['MeanSquaredError', 'MeanAbsoluteError'], # List of metrics to be evaluated by the model during training and testing. Each of these can be a string (name of a built-in function), function or a tf.keras.metrics.Metric instance. 
 loss_weights=None, # default=None, Optional list or dictionary specifying scalar coefficients (Python floats) to weight the loss contributions of different model outputs.
 weighted_metrics=None, # default=None, List of metrics to be evaluated and weighted by sample_weight or class_weight during training and testing.
 run_eagerly=None, # Defaults to False. If True, this Model's logic will not be wrapped in a tf.function. Recommended to leave this as None unless your Model cannot be run inside a tf.function.
 steps_per_execution=None # Defaults to 1. The number of batches to run during each tf.function call. Running multiple batches inside a single tf.function call can greatly improve performance on TPUs or small models with a large Python overhead.
 )

##### Step 6 - Fit keras model on the dataset
model.fit(X_train, # input data
 y_train, # target data
 batch_size=1, # Number of samples per gradient update. If unspecified, batch_size will default to 32.
 epochs=20, # default=1, Number of epochs to train the model. An epoch is an iteration over the entire x and y data provided
 verbose='auto', # default='auto', ('auto', 0, 1, or 2). Verbosity mode. 0 = silent, 1 = progress bar, 2 = one line per epoch. 'auto' defaults to 1 for most cases, but 2 when used with ParameterServerStrategy.
 callbacks=None, # default=None, list of callbacks to apply during training. See tf.keras.callbacks
 validation_split=0.0, # default=0.0, Fraction of the training data to be used as validation data. The model will set apart this fraction of the training data, will not train on it, and will evaluate the loss and any model metrics on this data at the end of each epoch. 
 #validation_data=(X_test, y_test), # default=None, Data on which to evaluate the loss and any model metrics at the end of each epoch. 
 shuffle=True, # default=True, Boolean (whether to shuffle the training data before each epoch) or str (for 'batch').
 class_weight=None, # default=None, Optional dictionary mapping class indices (integers) to a weight (float) value, used for weighting the loss function (during training only). This can be useful to tell the model to "pay more attention" to samples from an under-represented class.
 sample_weight=None, # default=None, Optional Numpy array of weights for the training samples, used for weighting the loss function (during training only).
 initial_epoch=0, # Integer, default=0, Epoch at which to start training (useful for resuming a previous training run).
 steps_per_epoch=None, # Integer or None, default=None, Total number of steps (batches of samples) before declaring one epoch finished and starting the next epoch. When training with input tensors such as TensorFlow data tensors, the default None is equal to the number of samples in your dataset divided by the batch size, or 1 if that cannot be determined. 
 validation_steps=None, # Only relevant if validation_data is provided and is a tf.data dataset. Total number of steps (batches of samples) to draw before stopping when performing validation at the end of every epoch.
 validation_batch_size=None, # Integer or None, default=None, Number of samples per validation batch. If unspecified, will default to batch_size.
 validation_freq=1, # default=1, Only relevant if validation data is provided. If an integer, specifies how many training epochs to run before a new validation run is performed, e.g. validation_freq=2 runs validation every 2 epochs.
 max_queue_size=10, # default=10, Used for generator or keras.utils.Sequence input only. Maximum size for the generator queue. If unspecified, max_queue_size will default to 10.
 workers=1, # default=1, Used for generator or keras.utils.Sequence input only. Maximum number of processes to spin up when using process-based threading. If unspecified, workers will default to 1.
 use_multiprocessing=False, # default=False, Used for generator or keras.utils.Sequence input only. If True, use process-based threading. If unspecified, use_multiprocessing will default to False. 
 )

##### Step 7 - Use model to make predictions
# Predict the result on training data
pred_train = model.predict(X_train)
# Predict the result on test data
pred_test = model.predict(X_test)

##### Step 8 - Model Performance Summary
print("")
print('-------------------- Model Summary --------------------')
model.summary() # print model summary
print("")
print('-------------------- Weights and Biases --------------------')
print("Note, the last parameter in each layer is bias while the rest are weights")
print("")
for layer in model.layers:
    print(layer.name)
    for item in layer.get_weights():
        print("  ", item)
print("")
print('---------- Evaluation on Training Data ----------')
print("MSE: ", mean_squared_error(y_train, pred_train))
print("")

print('---------- Evaluation on Test Data ----------')
print("MSE: ", mean_squared_error(y_test, pred_test))
print("")

The above code prints the following summary and evaluation metrics for our Recurrent Neural Network:

๐Ÿ‘ Recurrent Neural Network performance. Image by author.
Recurrent Neural Network performance. Image by author.
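The MSE values printed above are computed on MinMax-scaled values in [0, 1], not in degrees. Because MinMax scaling is an affine map, x_scaled = (x - min) / (max - min), errors in degrees equal the scaled errors multiplied by (max - min), so the MSE in °C² is the scaled MSE times (max - min)². The sketch below checks this with scikit-learn's `MinMaxScaler` on made-up temperatures; the arrays are placeholders, not the article's results, and the scaler is deliberately named `demo_scaler` so it does not overwrite the article's `scaler`.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error

# Hypothetical daily median temperatures (°C) and predictions
actual_c = np.array([[3.0], [7.5], [12.0], [18.5], [25.0]])
pred_c = np.array([[4.0], [7.0], [13.5], [17.0], [24.0]])

demo_scaler = MinMaxScaler().fit(actual_c)
mse_scaled = mean_squared_error(demo_scaler.transform(actual_c), demo_scaler.transform(pred_c))

# Convert the scaled MSE back to degrees
value_range = demo_scaler.data_max_[0] - demo_scaler.data_min_[0]
mse_celsius = mse_scaled * value_range ** 2
rmse_celsius = np.sqrt(mse_celsius)

# Same result computed directly on the original scale
print(np.isclose(mse_celsius, mean_squared_error(actual_c, pred_c)))  # True
print(f"RMSE: {rmse_celsius:.2f} °C")
```

In the article's code, the equivalent is to pass `scaler.inverse_transform(y_test)` and `scaler.inverse_transform(pred_test)` to `mean_squared_error`.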

Let's now plot the results on a chart and compare actual and predicted values. Note that we use the inverse_transform function to convert targets and predictions from the scaled range (we applied MinMaxScaler before training the RNN) back to the original value range.

fig = go.Figure()
fig.add_trace(go.Scatter(x=np.array(range(0,len(y_test))),
 y=scaler.inverse_transform(y_test).flatten(),
 mode='lines',
 name='Median Temperature - Actual (Test)',
 opacity=0.8,
 line=dict(color='black', width=1)
 ))
fig.add_trace(go.Scatter(x=np.array(range(0,len(pred_test))),
 y=scaler.inverse_transform(pred_test).flatten(),
 mode='lines',
 name='Median Temperature - Predicted (Test)',
 opacity=0.8,
 line=dict(color='red', width=1)
 ))

# Change chart background color
fig.update_layout(dict(plot_bgcolor = 'white'))

# Update axes lines
fig.update_xaxes(showgrid=True, gridwidth=1, gridcolor='lightgrey', 
 zeroline=True, zerolinewidth=1, zerolinecolor='lightgrey', 
 showline=True, linewidth=1, linecolor='black',
 title='Observation'
 )

fig.update_yaxes(showgrid=True, gridwidth=1, gridcolor='lightgrey', 
 zeroline=True, zerolinewidth=1, zerolinecolor='lightgrey', 
 showline=True, linewidth=1, linecolor='black',
 title='Degrees Celsius'
 )

# Set figure title
fig.update_layout(title=dict(text="Median Daily Temperatures in Canberra", 
 font=dict(color='black')),
 legend=dict(orientation="h", yanchor="bottom", y=1.02, xanchor="right", x=1)
 )

fig.show()
๐Ÿ‘ RNN model predictions on test data. Image by author.
RNN model predictions on test data. Image by author.

The above results are for the test dataset. The prediction looks pretty accurate, but remember that we take seven prior data points in each case and only predict the next one. Therefore, the results of this specific model would be a lot less accurate if we tried to predict multiple points into the future, as I will show in a later example.

Using RNN to generate predictions

You will recall that, during the training and prediction of the above model, our targets were only every 7th observation: the day immediately following each non-overlapping 7-day input sequence. But what if we wanted to use the model to generate predictions for every item (day) in our dataframe? The following code does exactly that:

# With the current setup, we feed in 7 days worth of data and get the prediction for the next day
# We want to create an array that contains 7-day chunks offset by one day at a time
# This is so we can make a prediction for every day in the data instead of every 7th day
X_every=dfCan[['MedTemp']]
X_every=scaler.transform(X_every)

for i in range(0, len(X_every)-time_step):
    if i==0:
        X_comb=X_every[i:i+time_step]
    else:
        X_comb=np.append(X_comb, X_every[i:i+time_step])
X_comb=np.reshape(X_comb, (math.floor(len(X_comb)/time_step), time_step, 1))
print(X_comb.shape)

# Use the reshaped data to make predictions and add back into the dataframe
# np.zeros(time_step) - Set the first 7 numbers to 0 as we do not have data to predict
dfCan['MedTemp_prediction'] = np.append(np.zeros(time_step), scaler.inverse_transform(model.predict(X_comb)))
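The loop above grows `X_comb` with `np.append`, which copies the whole array on every iteration. A vectorized alternative, assuming NumPy 1.20 or later (the article uses 1.21.4), is `np.lib.stride_tricks.sliding_window_view`. The sketch below builds the same (n - time_step, time_step, 1) array from a toy column and checks it against the loop; `X_every_toy` is a hypothetical stand-in for the scaled temperature column.

```python
import numpy as np

time_step = 7
X_every_toy = np.arange(20, dtype=float).reshape(-1, 1)  # hypothetical stand-in for the scaled column

# Loop version, as in the article
for i in range(0, len(X_every_toy) - time_step):
    if i == 0:
        X_loop = X_every_toy[i:i + time_step]
    else:
        X_loop = np.append(X_loop, X_every_toy[i:i + time_step])
X_loop = np.reshape(X_loop, (len(X_loop) // time_step, time_step, 1))

# Vectorized version: every window of length time_step, minus the final one,
# which would predict a day beyond the end of the data
windows = np.lib.stride_tricks.sliding_window_view(X_every_toy[:, 0], time_step)[:-1]
X_fast = windows[..., np.newaxis]

print(X_loop.shape, X_fast.shape)  # (13, 7, 1) (13, 7, 1)
print(np.array_equal(X_loop, X_fast))  # True
```

Note that `sliding_window_view` returns a read-only view rather than a copy, which is fine for passing to `model.predict`.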

Since we added model predictions into the original dataframe, we can use it to plot the results.

fig = go.Figure()
fig.add_trace(go.Scatter(x=dfCan['Date'],
 y=dfCan['MedTemp'],
 mode='lines',
 name='Median Temperature - Actual',
 opacity=0.8,
 line=dict(color='black', width=1)
 ))
fig.add_trace(go.Scatter(x=dfCan['Date'],
 y=dfCan['MedTemp_prediction'],
 mode='lines',
 name='Median Temperature - Predicted',
 opacity=0.8,
 line=dict(color='red', width=1)
 ))

# Change chart background color
fig.update_layout(dict(plot_bgcolor = 'white'))

# Update axes lines
fig.update_xaxes(showgrid=True, gridwidth=1, gridcolor='lightgrey', 
 zeroline=True, zerolinewidth=1, zerolinecolor='lightgrey', 
 showline=True, linewidth=1, linecolor='black',
 title='Date'
 )

fig.update_yaxes(showgrid=True, gridwidth=1, gridcolor='lightgrey', 
 zeroline=True, zerolinewidth=1, zerolinecolor='lightgrey', 
 showline=True, linewidth=1, linecolor='black',
 title='Degrees Celsius'
 )

# Set figure title
fig.update_layout(title=dict(text="Median Daily Temperatures in Canberra", 
 font=dict(color='black')),
 legend=dict(orientation="h", yanchor="bottom", y=1.02, xanchor="right", x=1)
 )

fig.show()
๐Ÿ‘ RNN model predictions on the entire data sample. Image by author.
RNN model predictions on the entire data sample. Image by author.

Again, a pretty decent result, keeping in mind that we only predict the temperature one day ahead.

What if we tried to predict temperatures for the next 365 days by generating predictions one day at a time? We will attempt this by iteratively adding each new prediction to our 7-day sequence while dropping the oldest value from it.

# Let's take the last sequence in the data to start predictions
inputs=X_comb[-1:]

# Create empty list
pred_list = []

# Loop 365 times to create predictions for the next year
for i in range(365): 
 pred_list.append(list(model.predict(inputs)[0])) # Generate prediction and add it to the list
 inputs = np.append(inputs[:,1:,:],[[pred_list[i]]],axis=1) # Drop oldest and append latest prediction

# Create a dataframe containing 365 days starting 2017-06-26
newdf=pd.DataFrame(pd.date_range(start='2017-06-26', periods=365, freq='D'), columns=['Date'])

# Add 365 days of model prediction from previous step
newdf['MedTemp_prediction']=scaler.inverse_transform(pred_list)

# Concatenate the original dataframe containing Canberra data and the new one containing predictions for the next 365 days
dfCan2=pd.concat([dfCan, newdf], ignore_index=False, axis=0, sort=False)
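The recursive loop can be exercised without a trained network. In the sketch below, `toy_predict` is a hypothetical stand-in for `model.predict` that returns the mean of the current window with shape (1, 1), the same shape Keras returns for a single sample from this model. The window update drops the oldest value and appends the newest prediction, as in the loop above, so the input shape stays (1, 7, 1).

```python
import numpy as np

def toy_predict(window):
    """Hypothetical stand-in for model.predict: mean of the window, shape (1, 1)."""
    return window.mean(axis=1)

def recursive_forecast(predict_fn, start_window, n_steps):
    window = start_window.copy()  # shape (1, time_step, 1)
    preds = []
    for _ in range(n_steps):
        next_val = predict_fn(window)[0]  # shape (1,)
        preds.append(next_val)
        # Drop the oldest timestep and append the new prediction at the end
        window = np.append(window[:, 1:, :], [[next_val]], axis=1)
    return np.array(preds), window

start = np.arange(7, dtype=float).reshape(1, 7, 1)  # toy "last observed week"
preds, last_window = recursive_forecast(toy_predict, start, n_steps=3)
print(preds.ravel())      # first prediction is the mean of 0..6, i.e. 3.0
print(last_window.shape)  # (1, 7, 1)
```

After the first step, every prediction is computed partly from earlier predictions rather than observations, so errors can compound as the horizon grows.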

Finally, we reuse the chart plotting code from the previous step to show a two-year window: the last 365 days of observed data followed by the predictions for the next 365 days.

Replace:
x=dfCan['Date'] → x=dfCan2['Date'][-730:] # in both traces
y=dfCan['MedTemp'] → y=dfCan2['MedTemp'][-730:] # in the first trace
y=dfCan['MedTemp_prediction'] → y=dfCan2['MedTemp_prediction'][-730:] # in the second trace
๐Ÿ‘ RNN model predictions for the next 365 days. Image by author.
RNN model predictions for the next 365 days. Image by author.

As we can see, using the existing RNN model for anything beyond a one-day-ahead (day+1) prediction is not wise. This is mainly because we designed it to predict only one day ahead, and partly because RNNs have a relatively "short memory."
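One way to see the "short memory" is to measure how strongly the final hidden state responds to a change in the initial state. For the single tanh unit h_t = tanh(w_x x_t + w_h h_{t-1} + b), the chain rule gives ∂h_T/∂h_0 as a product of T factors w_h (1 - h_t²). Each factor is smaller than 1 in magnitude whenever |w_h| < 1, so the influence of early steps shrinks roughly geometrically. The sketch below uses made-up weights and random inputs purely for illustration; it is not computed from the trained model.

```python
import numpy as np

def dh_T_dh_0(x_seq, w_x, w_h, b):
    """Derivative of the final hidden state w.r.t. the initial state h0 = 0."""
    h = 0.0
    grad = 1.0
    for x_t in x_seq:
        h = np.tanh(w_x * x_t + w_h * h + b)
        grad *= w_h * (1.0 - h ** 2)  # chain rule through one timestep
    return grad

rng = np.random.default_rng(0)
x_long = rng.uniform(0.0, 1.0, size=60)  # hypothetical scaled inputs
for T in (1, 7, 30, 60):
    print(T, dh_T_dh_0(x_long[:T], w_x=0.5, w_h=0.8, b=0.1))
```

Gated units such as LSTM and GRU were designed to mitigate this kind of decay.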

In upcoming articles, I will analyze more advanced versions of Recurrent Neural Networks, such as LSTM (Long Short Term Memory) and GRU (Gated Recurrent Units), so don't forget to subscribe so you don't miss them.

Final remarks

I sincerely hope you enjoyed reading this article and obtained some new knowledge.

Feel free to use the code provided in this article to build your own Recurrent Neural Networks. You can find the complete Jupyter Notebook in my GitHub repository.

As I try to make my articles more useful for readers, I would appreciate it if you could let me know what has driven you to read this piece and whether it has given you the answers you were looking for. If not, what was missing?

Cheers! Saul Dobilas




Written By

Saul Dobilas
