Feed Forward Neural Networks: How To Successfully Build Them in Python
A detailed graphical explanation of Neural Networks with a Python example using real-life data
Neural Networks
Intro
Neural Networks have been the central talking point over the last few years. While they may initially seem intimidating, I assure you that you do not need a Ph.D. to understand how they work.
In this article, I will take you through the main ideas behind basic Neural Networks, also known as Feed Forward NNs or Multilayer Perceptrons (MLPs), and show you how to build them in Python using the TensorFlow and Keras libraries.
Contents
- Feed Forward Neural Network's place within the universe of Machine Learning
- A visual explanation of how Feed Forward NNs work
- Network structure and terminology
- Parameters and activation functions
- Loss functions, optimizers, and training
- Python examples of how to build and train your own Feed Forward Neural Networks
Feed Forward Neural Network's place within the universe of Machine Learning
Machine Learning is a vast and ever-expanding space with new algorithms developed daily. I have attempted to bring structure to this world by categorizing some of the most commonly used algorithms in the interactive chart below. Click on different categories to enlarge and reveal more.
While this categorization is not perfect, it brings a general understanding of how different pieces fit together, and hopefully, it can also facilitate your data science learning journey.
I have placed Neural Networks in a distinct category recognizing their unique approach to Machine Learning. However, it is essential to remember that Neural Networks are most frequently employed to solve classification and regression problems using labeled training data. Hence, an alternative approach could be to put them under the Supervised branch of Machine Learning.
A visual explanation of how Feed Forward NNs work
Structure and terminology
First, let's familiarize ourselves with the basic structure of a Neural Network.
- Input Layer: contains one or more input nodes. For example, suppose you want to predict whether it will rain tomorrow and base your decision on two variables, humidity and wind speed. In that case, your first input would be the value for humidity, and the second input would be the value for wind speed.
- Hidden Layer: this layer houses hidden nodes, each containing an activation function (more on these later). Note that a Neural Network with multiple hidden layers is known as a Deep Neural Network.
- Output Layer: contains one or more output nodes. Following the same weather prediction example above, you could choose to have only one output node generating a rain probability (where >0.5 means rain tomorrow, and ≤0.5 no rain tomorrow). Alternatively, you could have two output nodes, one for rain and another for no rain. Note, you can use a different activation function for output nodes vs. hidden nodes.
- Connections: lines joining different nodes are known as connections. These contain kernels (weights) and biases, the parameters that get optimized during the training of a neural network.
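To make the terminology concrete, here is a small sketch that counts the trainable parameters of a fully connected network. It assumes the usual dense-layer convention of one weight per connection and one bias per hidden or output node; the function name `count_parameters` is my own and not part of any library.

```python
def count_parameters(layer_sizes):
    """Count weights and biases in a fully connected feed forward network.

    layer_sizes lists the number of nodes per layer, input layer first,
    e.g. [1, 2, 1] for one input, two hidden nodes, and one output node.
    """
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        weights = n_in * n_out  # one weight per connection between the two layers
        biases = n_out          # one bias per receiving node (assumed convention)
        total += weights + biases
    return total


print(count_parameters([1, 2, 1]))  # 7
print(count_parameters([2, 2, 1]))  # 9
```

Under these assumptions, a network with one input, two hidden nodes, and one output has only seven numbers to learn.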
Parameters and activation functions
Let's take a closer look at kernels (weights) and biases to understand what they do. For simplicity, we create a basic neural network with one input node, two hidden nodes, and one output node (1-2-1).
- Kernels (weights): used to scale input and hidden node values. Each connection typically holds a different weight.
- Biases: used to adjust scaled values before passing them through an activation function.
- Activation functions: think of activation functions as standard curves (building blocks) used by the Neural Network to create a custom curve to fit the training data. Passing different input values through the network selects different sections of the standard curve, which are then assembled into a final custom-fit curve.
There are many activation functions to choose from, with Softplus, ReLU, and Sigmoid being the most commonly used. Here are the shapes and equations of six frequently used activation functions in Neural Networks:
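As a quick reference, the three activation functions named above can be written in a few lines of plain Python. This is only a sketch using the standard formulas (the softplus and sigmoid equations match the comments in the Keras code later in this article), evaluated one value at a time rather than on arrays.

```python
import math


def softplus(x):
    # softplus(x) = log(exp(x) + 1), a smooth version of ReLU
    return math.log(1.0 + math.exp(x))


def relu(x):
    # relu(x) = max(0, x), zero for negative inputs and linear otherwise
    return max(0.0, x)


def sigmoid(x):
    # sigmoid(x) = 1 / (1 + exp(-x)), squashes any value into (0, 1)
    return 1.0 / (1.0 + math.exp(-x))


# Evaluate each function at a few sample points
for x in (-2.0, 0.0, 2.0):
    print(x, round(softplus(x), 4), relu(x), round(sigmoid(x), 4))
```

Because sigmoid always returns a value between 0 and 1, it is a natural choice for an output node that should produce a probability.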
As we are now familiar with kernels (weights), biases, and activation functions, let's use the same Neural Network to calculate the probability of rain tomorrow based on today's humidity.
Note, I have already trained this Neural Network (see the Python section below). Hence, we already know the values for kernels (weights) and biases. The illustration below shows, step by step, how the FF Neural Network takes an input value and produces the answer (output value).
As you can see, the above Neural Network tells us that a 50% humidity today implies a 33% probability of rain tomorrow.
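If you prefer code to pictures, the same kind of calculation can be sketched in plain Python. Note that the weights and biases below are made-up values chosen for illustration, not the trained values from this article, so the result will not match the 33% figure. The function name `forward_1_2_1` is also my own.

```python
import math


def softplus(x):
    return math.log(1.0 + math.exp(x))


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward_1_2_1(x, hidden_weights, hidden_biases, output_weights, output_bias):
    """Forward pass through a network with 1 input, 2 hidden nodes, 1 output."""
    # Hidden layer: scale the input by each weight, add the bias, apply softplus
    hidden = [softplus(x * w + b) for w, b in zip(hidden_weights, hidden_biases)]
    # Output layer: weighted sum of the hidden values plus a bias, then sigmoid
    z = sum(h * w for h, w in zip(hidden, output_weights)) + output_bias
    return sigmoid(z)


# Hypothetical parameters (NOT the trained values from the illustration)
hidden_weights = [0.05, -0.03]
hidden_biases = [-1.0, 0.5]
output_weights = [1.2, -0.8]
output_bias = -1.5

p_rain = forward_1_2_1(50.0, hidden_weights, hidden_biases, output_weights, output_bias)
print(f"Predicted probability of rain: {p_rain:.3f}")
```

Swapping in the kernels and biases printed by a trained model would let you reproduce that model's predictions by hand.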
Loss functions, optimizers, and training
Training Neural Networks involves a complicated process known as backpropagation. I will not go through a step-by-step explanation of how backpropagation works since it is a big enough topic deserving a separate article.
Instead, let me briefly introduce you to loss functions and optimizers and summarize what happens when we "train" a Neural Network.
- Loss: represents the "size" of error between the true values/labels and the predicted values/labels. The goal of training a Neural Network is to minimize this loss. The smaller the loss, the closer the match between the true and the predicted data. There are many loss functions to choose from, with BinaryCrossentropy, CategoricalCrossentropy, and MeanSquaredError being the most common.
- Optimizers: the algorithms that update the parameters during training, using the gradients computed by backpropagation. The goal of an optimizer is to find the set of kernels (weights) and biases that minimizes the loss. Optimizers typically use a gradient descent approach, which allows them to iteratively find the "best" possible configuration of weights and biases. The most commonly used ones are SGD, Adam, and RMSProp.
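To give a feel for what a loss function measures, here is a minimal sketch of binary cross-entropy in plain Python. The clipping constant `eps` and the example predictions are my own illustrative choices.

```python
import math


def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid taking log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


y_true = [1, 0, 1, 0]
confident_and_right = [0.9, 0.1, 0.8, 0.2]  # made-up predictions close to the labels
mostly_wrong = [0.4, 0.6, 0.3, 0.7]         # made-up predictions far from the labels

print(binary_crossentropy(y_true, confident_and_right))
print(binary_crossentropy(y_true, mostly_wrong))
```

The first set of predictions yields a much smaller loss than the second, which is exactly the signal an optimizer follows when adjusting weights and biases.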
Training a Neural Network is basically fitting a custom curve through the training data until it can approximate it as well as possible. The graph below illustrates what a custom-fitted curve could look like in a specific scenario. This example contains a set of data that seem to flip between 0 and 1 as the value for input increases.
In general, the wide selection of activation functions combined with the ability to add as many hidden nodes as we wish (provided we have sufficient computational power) means that Neural Networks can create a curve of any shape to fit the data.
However, having this extreme flexibility may sometimes lead to overfitting the data. Hence, we must always ensure that we validate the model on the test/validation set before using it to make predictions.
Summarizing what we have learned
Feed Forward Neural Networks take one or multiple input values and apply transformations using kernels (weights) and biases before passing results through activation functions. In the end, we get an output (prediction), which is a result of this complex set of transformations optimized through training.
We train Neural Networks by fitting a custom curve through the training data, guided by loss minimization and achieved through parameter (kernels and biases) optimization.
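The idea of "loss minimization through parameter optimization" can be sketched with the simplest possible case: a single sigmoid node trained with plain gradient descent on binary cross-entropy. This is a toy illustration under my own assumptions (made-up data, a fixed learning rate, no hidden layer). Real training uses backpropagation to extend the same idea to hidden layers, and optimizers such as Adam refine the update rule.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def train_single_neuron(xs, ys, learning_rate=1.0, epochs=2000):
    """Fit p = sigmoid(w * x + b) by gradient descent on binary cross-entropy."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w, grad_b = 0.0, 0.0
        for x, y in zip(xs, ys):
            error = sigmoid(w * x + b) - y  # gradient of the loss w.r.t. w * x + b
            grad_w += error * x / n
            grad_b += error / n
        # Move each parameter a small step against its gradient
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Toy data: low inputs are labeled 0 and high inputs are labeled 1
xs = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
ys = [0, 0, 0, 1, 1, 1]

w, b = train_single_neuron(xs, ys)
print("P(label=1 | x=0.1):", round(sigmoid(w * 0.1 + b), 3))
print("P(label=1 | x=0.9):", round(sigmoid(w * 0.9 + b), 3))
```

After training, the fitted curve assigns a low probability to small inputs and a high probability to large ones, which is the "custom curve" idea in miniature.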
Building and training Feed Forward Neural Networks in Python
Let's now have some fun and build our own Neural Network. We will use historic Australian weather data to train a Neural Network that predicts whether it will rain tomorrow or not.
Setup
We'll need the following data and libraries:
- Australian weather data from Kaggle (license: Creative Commons, original source of the data: Commonwealth of Australia, Bureau of Meteorology).
- Pandas and Numpy for data manipulation
- Plotly for data visualizations
- Tensorflow/Keras for Neural Networks
- Scikit-learn library for splitting the data into train-test samples, and for some basic model evaluation
Let's import all the libraries:
# Tensorflow / Keras
from tensorflow import keras # for building Neural Networks
print('Tensorflow/Keras: %s' % keras.__version__) # print version
from keras.models import Sequential # for creating a linear stack of layers for our Neural Network
from keras import Input # for instantiating a keras tensor
from keras.layers import Dense # for creating regular densely-connected NN layers.
# Data manipulation
import pandas as pd # for data manipulation
print('pandas: %s' % pd.__version__) # print version
import numpy as np # for data manipulation
print('numpy: %s' % np.__version__) # print version
# Sklearn
import sklearn # for model evaluation
print('sklearn: %s' % sklearn.__version__) # print version
from sklearn.model_selection import train_test_split # for splitting data into train and test samples
from sklearn.metrics import classification_report # for model evaluation metrics
# Visualization
import plotly
import plotly.express as px
import plotly.graph_objects as go
print('plotly: %s' % plotly.__version__) # print version
The above code prints package versions used in this example:
Tensorflow/Keras: 2.7.0
pandas: 1.3.4
numpy: 1.21.4
sklearn: 1.0.1
plotly: 5.4.0
Next, we download and ingest Australian weather data (source: Kaggle). We also do some simple data manipulations and derive new variables for our models.
# Set Pandas options to display more columns
pd.options.display.max_columns=50
# Read in the weather data csv
df=pd.read_csv('weatherAUS.csv', encoding='utf-8')
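# Drop records where target RainTomorrow=NaN
df=df[df['RainTomorrow'].notna()]
# For other numeric columns with missing values, fill them in with column mean
# (numeric_only=True skips text columns, which newer pandas versions otherwise reject)
df=df.fillna(df.mean(numeric_only=True))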
# Create a flag for RainToday and RainTomorrow, note RainTomorrowFlag will be our target variable
df['RainTodayFlag']=df['RainToday'].apply(lambda x: 1 if x=='Yes' else 0)
df['RainTomorrowFlag']=df['RainTomorrow'].apply(lambda x: 1 if x=='Yes' else 0)
# Show a snapshot of data
df
And this is what the data looks like:
Neural Networks
Now we train and evaluate our Feed Forward (FF) Neural Network. I have extensively commented the code below to provide you with a clear understanding of what each part does. Hence, I will not repeat the same in the body of the article.
Using one input (Humidity3pm)
In short, we are using humidity at 3 pm today to predict whether it will rain tomorrow or not. Our Neural Network has a simple structure (1-2-1) analyzed earlier in this article: one input node, two hidden nodes, and one output node.
A couple of things to note:
- The below code performs validation twice, once on a portion of X_train data (see validation_split in step 5) and another time on a test sample created in step 2. Of course, there is no need to do it twice, so feel free to use either method to validate your model.
- The data was imbalanced (more sunny days than rainy days), so I've adjusted class_weight in step 5.
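The class weights used in step 5 below (0.3 and 0.7) are simply values I picked. If you would rather derive weights from the data, one common heuristic is "balanced" weighting, where each class gets n_samples / (n_classes * class_count). Here is a minimal sketch of that formula; the function name and the 78/22 label split are hypothetical and used only for illustration.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class inversely to its frequency (the 'balanced' heuristic)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical label distribution: 78 dry days (0) and 22 rainy days (1)
labels = [0] * 78 + [1] * 22
weights = balanced_class_weights(labels)
print(weights)
```

The resulting dictionary has the same shape as the class_weight argument of model.fit, so the rarer class receives the larger weight.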
##### Step 1 - Select data for modeling
X=df[['Humidity3pm']]
y=df['RainTomorrowFlag'].values
##### Step 2 - Create training and testing samples
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
##### Step 3 - Specify the structure of a Neural Network
model = Sequential(name="Model-with-One-Input") # Model
model.add(Input(shape=(1,), name='Input-Layer')) # Input Layer - need to specify the shape of inputs
model.add(Dense(2, activation='softplus', name='Hidden-Layer')) # Hidden Layer, softplus(x) = log(exp(x) + 1)
model.add(Dense(1, activation='sigmoid', name='Output-Layer')) # Output Layer, sigmoid(x) = 1 / (1 + exp(-x))
##### Step 4 - Compile keras model
model.compile(optimizer='adam', # default='rmsprop', an algorithm to be used in backpropagation
loss='binary_crossentropy', # Loss function to be optimized. A string (name of loss function), or a tf.keras.losses.Loss instance.
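metrics=['accuracy', 'Precision', 'Recall'], # List of metrics to be evaluated by the model during training and testing. Each of these can be a string (name of a built-in function), function or a tf.keras.metrics.Metric instance. Lowercase 'accuracy' lets Keras pick binary accuracy for this loss; 'Accuracy' would instead count exact matches between labels and raw probabilities.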
loss_weights=None, # default=None, Optional list or dictionary specifying scalar coefficients (Python floats) to weight the loss contributions of different model outputs.
weighted_metrics=None, # default=None, List of metrics to be evaluated and weighted by sample_weight or class_weight during training and testing.
run_eagerly=None, # Defaults to False. If True, this Model's logic will not be wrapped in a tf.function. Recommended to leave this as None unless your Model cannot be run inside a tf.function.
steps_per_execution=None # Defaults to 1. The number of batches to run during each tf.function call. Running multiple batches inside a single tf.function call can greatly improve performance on TPUs or small models with a large Python overhead.
)
##### Step 5 - Fit keras model on the dataset
model.fit(X_train, # input data
y_train, # target data
batch_size=10, # Number of samples per gradient update. If unspecified, batch_size will default to 32.
epochs=3, # default=1, Number of epochs to train the model. An epoch is an iteration over the entire x and y data provided
verbose='auto', # default='auto', ('auto', 0, 1, or 2). Verbosity mode. 0 = silent, 1 = progress bar, 2 = one line per epoch. 'auto' defaults to 1 for most cases, but 2 when used with ParameterServerStrategy.
callbacks=None, # default=None, list of callbacks to apply during training. See tf.keras.callbacks
validation_split=0.2, # default=0.0, Fraction of the training data to be used as validation data. The model will set apart this fraction of the training data, will not train on it, and will evaluate the loss and any model metrics on this data at the end of each epoch.
#validation_data=(X_test, y_test), # default=None, Data on which to evaluate the loss and any model metrics at the end of each epoch.
shuffle=True, # default=True, Boolean (whether to shuffle the training data before each epoch) or str (for 'batch').
class_weight={0 : 0.3, 1 : 0.7}, # default=None, Optional dictionary mapping class indices (integers) to a weight (float) value, used for weighting the loss function (during training only). This can be useful to tell the model to "pay more attention" to samples from an under-represented class.
sample_weight=None, # default=None, Optional Numpy array of weights for the training samples, used for weighting the loss function (during training only).
initial_epoch=0, # Integer, default=0, Epoch at which to start training (useful for resuming a previous training run).
steps_per_epoch=None, # Integer or None, default=None, Total number of steps (batches of samples) before declaring one epoch finished and starting the next epoch. When training with input tensors such as TensorFlow data tensors, the default None is equal to the number of samples in your dataset divided by the batch size, or 1 if that cannot be determined.
validation_steps=None, # Only relevant if validation_data is provided and is a tf.data dataset. Total number of steps (batches of samples) to draw before stopping when performing validation at the end of every epoch.
validation_batch_size=None, # Integer or None, default=None, Number of samples per validation batch. If unspecified, will default to batch_size.
validation_freq=3, # default=1, Only relevant if validation data is provided. If an integer, specifies how many training epochs to run before a new validation run is performed, e.g. validation_freq=2 runs validation every 2 epochs.
max_queue_size=10, # default=10, Used for generator or keras.utils.Sequence input only. Maximum size for the generator queue. If unspecified, max_queue_size will default to 10.
workers=1, # default=1, Used for generator or keras.utils.Sequence input only. Maximum number of processes to spin up when using process-based threading. If unspecified, workers will default to 1.
use_multiprocessing=False, # default=False, Used for generator or keras.utils.Sequence input only. If True, use process-based threading. If unspecified, use_multiprocessing will default to False.
)
##### Step 6 - Use model to make predictions
# Predict class labels on training data
pred_labels_tr = (model.predict(X_train) > 0.5).astype(int)
# Predict class labels on a test data
pred_labels_te = (model.predict(X_test) > 0.5).astype(int)
##### Step 7 - Model Performance Summary
print("")
print('-------------------- Model Summary --------------------')
model.summary() # print model summary
print("")
print('-------------------- Weights and Biases --------------------')
for layer in model.layers:
    print("Layer: ", layer.name) # print layer name
    print(" --Kernels (Weights): ", layer.get_weights()[0]) # weights
    print(" --Biases: ", layer.get_weights()[1]) # biases
print("")
print('---------- Evaluation on Training Data ----------')
print(classification_report(y_train, pred_labels_tr))
print("")
print('---------- Evaluation on Test Data ----------')
print(classification_report(y_test, pred_labels_te))
print("")
The above code prints the following summary and evaluation metrics for our 1-2-1 Neural Network:
Note that weights and biases for this model are different from the ones in the calculated example earlier in this article. This is because Neural Network training is stochastic (random): the initial weights are set randomly, and the training data is shuffled before each epoch. Hence, your model will be different every time you re-train it.
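The evaluation reports printed in step 7 include precision and recall for each class. To show what those numbers mean, here is a small sketch that turns probabilities into labels with the same 0.5 cut-off used in step 6 and then computes both metrics by hand. The probabilities and labels are made up, and `precision_recall` is my own helper rather than a scikit-learn function.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Compute precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Made-up predicted probabilities of rain and the corresponding true labels
probabilities = [0.9, 0.4, 0.7, 0.2, 0.6, 0.1]
y_true = [1, 1, 0, 0, 1, 0]

# Same thresholding rule as in step 6: probability above 0.5 means "rain"
y_pred = [int(p > 0.5) for p in probabilities]

precision, recall = precision_recall(y_true, y_pred)
print(f"precision={precision:.2f}, recall={recall:.2f}")
```

Precision tells you how many of the predicted rainy days were actually rainy, while recall tells you how many of the actual rainy days the model caught.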
Let's now plot the prediction curve on a chart.
# Create 100 evenly spaced points from smallest X to largest X
X_range = np.linspace(X.min(), X.max(), 100)
# Predict probabilities for rain tomorrow
y_predicted = model.predict(X_range.reshape(-1, 1))
# Create a scatter plot
fig = px.scatter(x=X_range.ravel(), y=y_predicted.ravel(),
opacity=0.8, color_discrete_sequence=['black'],
labels=dict(x="Value of Humidity3pm", y="Predicted Probability of Rain Tomorrow",))
# Change chart background color
fig.update_layout(dict(plot_bgcolor = 'white'))
# Update axes lines
fig.update_xaxes(showgrid=True, gridwidth=1, gridcolor='lightgrey',
zeroline=True, zerolinewidth=1, zerolinecolor='lightgrey',
showline=True, linewidth=1, linecolor='black')
fig.update_yaxes(showgrid=True, gridwidth=1, gridcolor='lightgrey',
zeroline=True, zerolinewidth=1, zerolinecolor='lightgrey',
showline=True, linewidth=1, linecolor='black')
# Set figure title
fig.update_layout(title=dict(text="Feed Forward Neural Network (1 Input) Model Results",
font=dict(color='black')))
# Update marker size
fig.update_traces(marker=dict(size=7))
fig.show()
Using two inputs (WindGustSpeed and Humidity3pm)
Let's see how the network and predictions change when we use two inputs (WindGustSpeed and Humidity3pm) to train a Neural Network that has a 2-2-1 structure.
Feel free to experiment in your own time by training a model with 17 inputs and a different number of hidden nodes.
##### Step 1 - Select data for modeling
X=df[['WindGustSpeed', 'Humidity3pm']]
y=df['RainTomorrowFlag'].values
##### Step 2 - Create training and testing samples
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
##### Step 3 - Specify the structure of a neural network
model2 = Sequential(name="Model-with-Two-Inputs") # Model
model2.add(Input(shape=(2,), name='Input-Layer')) # Input Layer - need to specify the shape of inputs
model2.add(Dense(2, activation='softplus', name='Hidden-Layer')) # Hidden Layer, softplus(x) = log(exp(x) + 1)
model2.add(Dense(1, activation='sigmoid', name='Output-Layer')) # Output Layer, sigmoid(x) = 1 / (1 + exp(-x))
##### Step 4 - Compile the keras model
model2.compile(optimizer='adam', # default='rmsprop', an algorithm to be used in backpropagation
loss='binary_crossentropy', # Loss function to be optimized. A string (name of loss function), or a tf.keras.losses.Loss instance.
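metrics=['accuracy', 'Precision', 'Recall'], # List of metrics to be evaluated by the model during training and testing. Each of these can be a string (name of a built-in function), function or a tf.keras.metrics.Metric instance. Lowercase 'accuracy' lets Keras pick binary accuracy for this loss; 'Accuracy' would instead count exact matches between labels and raw probabilities.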
loss_weights=None, # default=None, Optional list or dictionary specifying scalar coefficients (Python floats) to weight the loss contributions of different model outputs.
weighted_metrics=None, # default=None, List of metrics to be evaluated and weighted by sample_weight or class_weight during training and testing.
run_eagerly=None, # Defaults to False. If True, this Model's logic will not be wrapped in a tf.function. Recommended to leave this as None unless your Model cannot be run inside a tf.function.
steps_per_execution=None # Defaults to 1. The number of batches to run during each tf.function call. Running multiple batches inside a single tf.function call can greatly improve performance on TPUs or small models with a large Python overhead.
)
##### Step 5 - Fit keras model on the dataset
model2.fit(X_train, # input data
y_train, # target data
batch_size=10, # Number of samples per gradient update. If unspecified, batch_size will default to 32.
epochs=3, # default=1, Number of epochs to train the model. An epoch is an iteration over the entire x and y data provided
verbose='auto', # default='auto', ('auto', 0, 1, or 2). Verbosity mode. 0 = silent, 1 = progress bar, 2 = one line per epoch. 'auto' defaults to 1 for most cases, but 2 when used with ParameterServerStrategy.
callbacks=None, # default=None, list of callbacks to apply during training. See tf.keras.callbacks
validation_split=0.2, # default=0.0, Fraction of the training data to be used as validation data. The model will set apart this fraction of the training data, will not train on it, and will evaluate the loss and any model metrics on this data at the end of each epoch.
#validation_data=(X_test, y_test), # default=None, Data on which to evaluate the loss and any model metrics at the end of each epoch.
shuffle=True, # default=True, Boolean (whether to shuffle the training data before each epoch) or str (for 'batch').
class_weight={0 : 0.3, 1 : 0.7}, # default=None, Optional dictionary mapping class indices (integers) to a weight (float) value, used for weighting the loss function (during training only). This can be useful to tell the model to "pay more attention" to samples from an under-represented class.
sample_weight=None, # default=None, Optional Numpy array of weights for the training samples, used for weighting the loss function (during training only).
initial_epoch=0, # Integer, default=0, Epoch at which to start training (useful for resuming a previous training run).
steps_per_epoch=None, # Integer or None, default=None, Total number of steps (batches of samples) before declaring one epoch finished and starting the next epoch. When training with input tensors such as TensorFlow data tensors, the default None is equal to the number of samples in your dataset divided by the batch size, or 1 if that cannot be determined.
validation_steps=None, # Only relevant if validation_data is provided and is a tf.data dataset. Total number of steps (batches of samples) to draw before stopping when performing validation at the end of every epoch.
validation_batch_size=None, # Integer or None, default=None, Number of samples per validation batch. If unspecified, will default to batch_size.
validation_freq=3, # default=1, Only relevant if validation data is provided. If an integer, specifies how many training epochs to run before a new validation run is performed, e.g. validation_freq=2 runs validation every 2 epochs.
max_queue_size=10, # default=10, Used for generator or keras.utils.Sequence input only. Maximum size for the generator queue. If unspecified, max_queue_size will default to 10.
workers=1, # default=1, Used for generator or keras.utils.Sequence input only. Maximum number of processes to spin up when using process-based threading. If unspecified, workers will default to 1.
use_multiprocessing=False, # default=False, Used for generator or keras.utils.Sequence input only. If True, use process-based threading. If unspecified, use_multiprocessing will default to False.
)
##### Step 6 - Use model to make predictions
# Predict class labels on training data
pred_labels_tr = (model2.predict(X_train) > 0.5).astype(int)
# Predict class labels on a test data
pred_labels_te = (model2.predict(X_test) > 0.5).astype(int)
##### Step 7 - Model Performance Summary
print("")
print('-------------------- Model Summary --------------------')
model2.summary() # print model summary
print("")
print('-------------------- Weights and Biases --------------------')
for layer in model2.layers:
    print("Layer: ", layer.name) # print layer name
    print(" --Kernels (Weights): ", layer.get_weights()[0]) # kernels (weights)
    print(" --Biases: ", layer.get_weights()[1]) # biases
print("")
print('---------- Evaluation on Training Data ----------')
print(classification_report(y_train, pred_labels_tr))
print("")
print('---------- Evaluation on Test Data ----------')
print(classification_report(y_test, pred_labels_te))
print("")
And the results are:
Since we used two inputs, we can still visualize the predictions. However, this time we need a 3D chart to do it:
def Plot_3D(X, X_test, y_test, clf, x1, x2, mesh_size, margin):
    # Create a mesh grid on which we will run our model
    # (mesh_size sets the grid step, margin pads the axis ranges)
    x_min, x_max = X.iloc[:, 0].min() - margin, X.iloc[:, 0].max() + margin
    y_min, y_max = X.iloc[:, 1].min() - margin, X.iloc[:, 1].max() + margin
    xrange = np.arange(x_min, x_max, mesh_size)
    yrange = np.arange(y_min, y_max, mesh_size)
    xx, yy = np.meshgrid(xrange, yrange)
    # Calculate Neural Network predictions on the grid using the model passed in as clf
    Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)
    # Create a 3D scatter plot
    fig = px.scatter_3d(x=X_test[x1], y=X_test[x2], z=y_test,
                        opacity=0.8, color_discrete_sequence=['black'], height=900, width=1000)
    # Set figure title and colors
    fig.update_layout(#title_text="Scatter 3D Plot with FF Neural Network Prediction Surface",
                      paper_bgcolor = 'white',
                      scene_camera=dict(up=dict(x=0, y=0, z=1),
                                        center=dict(x=0, y=0, z=-0.1),
                                        eye=dict(x=0.75, y=-1.75, z=1)),
                      margin=dict(l=0, r=0, b=0, t=0),
                      scene = dict(xaxis=dict(title=x1,
                                              backgroundcolor='white',
                                              color='black',
                                              gridcolor='#f0f0f0'),
                                   yaxis=dict(title=x2,
                                              backgroundcolor='white',
                                              color='black',
                                              gridcolor='#f0f0f0'
                                              ),
                                   zaxis=dict(title='Probability of Rain Tomorrow',
                                              backgroundcolor='lightgrey',
                                              color='black',
                                              gridcolor='#f0f0f0',
                                              )))
    # Update marker size
    fig.update_traces(marker=dict(size=1))
    # Add prediction plane
    fig.add_traces(go.Surface(x=xrange, y=yrange, z=Z, name='FF NN Prediction Plane',
                              colorscale='Bluered',
                              reversescale=True,
                              showscale=False,
                              contours = {"z": {"show": True, "start": 0.5, "end": 0.9, "size": 0.5}}))
    fig.show()
    return fig
# Call the above function
fig = Plot_3D(X, X_test, y_test, model2, x1='WindGustSpeed', x2='Humidity3pm', mesh_size=1, margin=0)
Conclusions
Neural Networks are not as scary as they seem at first. I sincerely hope you enjoyed reading this article and obtained some new knowledge.
Feel free to use the code provided in this article to build your own Neural Networks. Also, you can find the complete Jupyter Notebook in my GitHub repository.
As I try to make my articles more useful for readers, I would appreciate it if you could let me know what has driven you to read this piece and whether it has given you the answers you were looking for. If not, what was missing?
Cheers! Saul Dobilas