URL: https://www.analyticsvidhya.com/blog/2021/03/improving-your-deep-learning-model-using-model-checkpointing-implementation-part-2/


Improving your Deep Learning model using Model Checkpointing (Implementation) - Part 2

Himanshi Singh Last Updated : 18 Mar, 2021
7 min read

Introduction

In the last article, we covered the model checkpointing technique, which monitors model performance after every epoch and saves the best model. In this article, we’ll see how to implement it in Keras, using Emergency vs Non-Emergency vehicle classification as the example. The focus is on the implementation of model checkpointing, so a little prior knowledge of building models in Keras is assumed: I walk through those steps but do not explain them in detail.

Without any further delay let’s begin!

Here are the steps we’ll follow, as discussed in the previous article. Model checkpointing is set up at the time of model training:

    1. Loading the dataset
    2. Pre-processing the data
    3. Creating training and validation set
    4. Defining the model architecture
    5. Compiling the model
    6. Training the model
        1. Setting up model checkpointing
    7. Evaluating model performance

1. Loading the dataset

Let’s start with the first step, which is loading the required libraries and the dataset:

# import necessary libraries and functions
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

# importing layers from keras
from keras.layers import Dense, InputLayer
from keras.models import Sequential

# importing adam optimizer from keras optimizer module
from keras.optimizers import Adam

# train_test_split to create training and validation set
from sklearn.model_selection import train_test_split

# accuracy_score to calculate the accuracy of predictions
from sklearn.metrics import accuracy_score

With the required libraries imported, the next step is to mount Google Drive:

from google.colab import drive
drive.mount('/content/drive')

Once the drive is mounted, we’ll unzip the file:

!unzip /content/drive/My\ Drive/Dataset.zip

Then we read the CSV file, which stores the image names as well as the target:

# reading the csv file
data = pd.read_csv('Dataset/emergency_classification.csv')

After that, set a fixed seed value so that results are reproducible, and look at the first five rows of the file:

seed = 42
data.head()

Next, we use this file to load the images into a variable named “X”, and we store the target separately in a variable named “y”:

# load images and store them in numpy array

# empty list to store the images
X = []

# iterating over each image
for img_name in data.image_names:

    # loading the image using its name
    img = plt.imread('Dataset/images/' + img_name)

    # saving each image in the list
    X.append(img)

# converting the list of images into array
X=np.array(X)

# storing the target variable in separate variable
y = data.emergency_or_not.values

Let’s look at the shape of our array:

# shape of the images
X.shape

So the output confirms that we have 2352 input images and each image has the shape of 224 x 224 x 3. Now the next step is to preprocess the data.

2. Pre-processing the data

First, we flatten each three-dimensional image into a one-dimensional vector: the three dimensions 224 x 224 x 3 become a single dimension of length 224*224*3 = 150528.

# converting 3 dimensional image to 1 dimensional image
X = X.reshape(X.shape[0], 224*224*3)
X.shape

Then we check the minimum and maximum pixel values of the images:

# minimum and maximum pixel values of images
X.min(), X.max()

The values range from 0 to 255. We rescale this range to 0 to 1 by dividing each pixel value by the maximum, which is 255 here:

# normalizing the pixel values
X = X / X.max()
# minimum and maximum pixel values of images after normalizing
X.min(), X.max()

3. Creating training and validation set

Now we’ll create the training and validation set:

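# creating a training and validation set
# (the source line is cut off after "random_sta"; passing the seed defined earlier is an assumption)
X_train, X_valid, y_train, y_valid = train_test_split(X, y, test_size=0.3, random_state=seed)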

# shape of training and validation set
(X_train.shape, y_train.shape), (X_valid.shape, y_valid.shape)
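The sizes of the two sets follow from the 2352 images and test_size=0.3. The short sketch below works them out by hand. It assumes that the validation count is rounded up and the remainder goes to training, which is how I understand scikit-learn to handle a fractional test_size; treat it as a sanity check rather than a statement about the library internals.

```python
import math

n_samples = 2352      # number of images reported earlier
test_size = 0.3       # fraction passed to train_test_split

# assumption: the validation count is rounded up and the rest goes to training
n_valid = math.ceil(test_size * n_samples)
n_train = n_samples - n_valid

print(n_train, n_valid)  # 1646 706
```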

Our training and validation sets are ready. Next, we define the architecture of the model.

4. Defining the model architecture

The input shape is the number of features for each image, which is 224*224*3. We use two hidden layers, each with 100 neurons and a sigmoid activation function, followed by an output layer with one neuron and a sigmoid activation function:

# defining the model architecture

model=Sequential()
model.add(InputLayer(input_shape=(224*224*3,)))
model.add(Dense(100, activation='sigmoid'))
model.add(Dense(100, activation='sigmoid'))
model.add(Dense(units=1, activation='sigmoid'))
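As a rough check on the size of this network, the number of trainable parameters in each Dense layer can be worked out by hand: inputs times units for the weights, plus one bias per unit. The sketch below is plain Python and does not touch Keras; the helper name dense_params is my own and not part of any library.

```python
# Hypothetical helper (not a Keras function): parameters in a fully connected layer
def dense_params(n_inputs, n_units):
    # one weight per input-unit pair, plus one bias per unit
    return n_inputs * n_units + n_units

n_features = 224 * 224 * 3      # flattened image length
layer_sizes = [100, 100, 1]     # two hidden layers and the output layer

params_per_layer = []
n_inputs = n_features
for n_units in layer_sizes:
    params_per_layer.append(dense_params(n_inputs, n_units))
    n_inputs = n_units          # this layer's output feeds the next layer

total_params = sum(params_per_layer)
print(params_per_layer)  # [15052900, 10100, 101]
print(total_params)      # 15063101
```

Calling model.summary() on the model above should report matching figures. Almost all of the parameters sit in the first layer because of the large flattened input.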

5. Compiling the model

Next, we will compile the model where we will define the loss function as well as the optimizer.

Here we define the Adam optimizer with its learning rate, then pass the loss, the optimizer, and the metric to model.compile:

# defining the adam optimizer and setting the learning rate as 10^-5
# (recent Keras versions name this argument learning_rate; lr is the older alias)
adam = Adam(learning_rate=1e-5)

# compiling the model
# defining loss as binary cross-entropy
# defining optimizer as Adam
# defining metrics as accuracy

model.compile(loss='binary_crossentropy', optimizer=adam, metrics=['accuracy'])
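To make the loss concrete, here is a minimal plain-Python sketch of binary cross-entropy averaged over a batch. It is not the Keras implementation: the function name, the clipping constant eps, and the toy numbers are assumptions chosen for illustration.

```python
import math

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy over paired labels and predicted probabilities."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        # clip probabilities away from 0 and 1 so log() stays finite
        p = min(max(p, eps), 1 - eps)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(y_true)

# toy labels and predicted probabilities (made up for illustration)
y_true = [1, 0, 1, 0]
confident = [0.9, 0.1, 0.8, 0.2]
unsure = [0.5, 0.5, 0.5, 0.5]

loss_confident = binary_crossentropy(y_true, confident)
loss_unsure = binary_crossentropy(y_true, unsure)
print(loss_confident, loss_unsure)
```

Predictions that are confident and correct give a lower loss than predictions stuck at 0.5, which is the quantity the optimizer pushes down during training.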

6. Training the model

Setting up model checkpointing

Now we will train the model, and this is also where we define model checkpointing. For that, we first import the ModelCheckpoint callback from the “callbacks” module of Keras:

# importing model checkpointing from keras callbacks
from keras.callbacks import ModelCheckpoint

Next, we define the model checkpoint, starting with the path where we wish to save the model weights.

Here I am storing the model, or rather its weights and biases, as best_weights.hdf5, where “.hdf5” is the file format.

# defining model checkpointing

# defining the path to store the weights
filepath="best_weights.hdf5"

As I mentioned in the last article, we have to define the metric to monitor and its mode, so we’ll define those next.

# defining the model checkpointing and metric to monitor

checkpoint = ModelCheckpoint(filepath, monitor='val_accuracy', verbose=1, save_best_only=True, mode='max')

# defining checkpointing variable
callbacks_list = [checkpoint]

Here we create the ModelCheckpoint callback. Its first argument is the path where we wish to save the model, i.e. best_weights.hdf5. Next comes the metric to monitor: validation accuracy, given as val_accuracy, which is checked after every epoch. Then we have “verbose = 1”, so that a message about the checkpoint is printed after every epoch. The parameter “save_best_only” is set to “True”, which means the file is written only when the monitored metric improves; each new best model overwrites the previously saved one, so the file always holds the best model seen so far. Finally, we set “mode = ‘max’”, since we want the maximum validation accuracy.

This is how we define the model checkpoint.
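The save_best_only behaviour with mode = 'max' can be sketched without Keras as a simple loop over per-epoch validation accuracies. This is an illustration of the logic only: simulate_checkpoint and the accuracy values are made up, and the real callback writes the weights to disk where this sketch merely records the epoch number.

```python
def simulate_checkpoint(val_accuracies):
    """Return (best_epoch, best_value, saved_epochs) for mode='max', save_best_only=True."""
    best = float('-inf')     # nothing saved yet, so any value is an improvement
    best_epoch = None
    saved_epochs = []
    for epoch, val_acc in enumerate(val_accuracies, start=1):
        if val_acc > best:   # mode='max': only a strictly higher value counts
            best = val_acc
            best_epoch = epoch
            saved_epochs.append(epoch)   # here the real callback would overwrite the file
    return best_epoch, best, saved_epochs

# made-up validation accuracies for five epochs
history = [0.60, 0.65, 0.64, 0.70, 0.69]
best_epoch, best_value, saved_epochs = simulate_checkpoint(history)
print(best_epoch, best_value, saved_epochs)  # 4 0.7 [1, 2, 4]
```

Epochs 3 and 5 do not trigger a save because their accuracy is lower than the best value seen so far, so the file on disk would still hold the weights from epoch 4 when training ends.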

Next, we train the model for 50 epochs with a batch size of 128, passing in our training and validation sets along with the callbacks list. Let us run this cell:

# training the model for 50 epochs
model_history = model.fit(X_train, y_train, epochs=50, batch_size=128, validation_data=(X_valid,y_valid), callbacks=callbacks_list)

We can see that the model training is complete and that the best model was saved at epoch number 48; after that, the validation accuracy did not improve.

7. Evaluating model performance

Now that we have trained the model, let us evaluate it by checking the accuracy score on the validation set:

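# accuracy on validation set
# (predict_classes is unavailable in newer Keras versions, so the sigmoid output is thresholded at 0.5 instead)
print('Accuracy on validation set:', accuracy_score(y_valid, (model.predict(X_valid) > 0.5).astype('int32')[:, 0]))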

The accuracy score on the validation set comes out to be 0.7181. This is the model as it stood after the last epoch. Now let’s load the best model:

# loading the best model
model.load_weights("best_weights.hdf5")

Once the best model is loaded, we use it to make predictions and compare how it performs:

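# accuracy on validation set
# (predict_classes is unavailable in newer Keras versions, so the sigmoid output is thresholded at 0.5 instead)
print('Accuracy on validation set:', accuracy_score(y_valid, (model.predict(X_valid) > 0.5).astype('int32')[:, 0]))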

So we can see that the accuracy score with the best model has come out to be 0.7237, which is better than the previous validation accuracy.
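The accuracy computation used in both evaluation cells comes down to two steps: turning each sigmoid output into a class label with a 0.5 threshold, and counting how many labels match the targets. The sketch below shows this in plain Python. The helper names and the five sample values are made up for illustration and are not taken from the dataset.

```python
def to_class_labels(probabilities, threshold=0.5):
    # probability above the threshold -> class 1, otherwise class 0
    return [1 if p > threshold else 0 for p in probabilities]

def accuracy(y_true, y_pred):
    # fraction of predictions that match the true labels
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

# made-up sigmoid outputs and labels for five images
probs = [0.91, 0.32, 0.55, 0.48, 0.07]
labels = [1, 0, 0, 0, 1]

preds = to_class_labels(probs)
print(preds)                    # [1, 0, 1, 0, 0]
print(accuracy(labels, preds))  # 0.6
```

The result is a fraction between 0 and 1, which is why the scores above read 0.7181 and 0.7237 rather than percentages.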

So this is how we can implement model checkpointing in Keras.

End Notes

In this article, we saw the implementation of the Model Checkpointing technique on the Emergency vs Non-Emergency classification dataset.

The validation accuracy of the model using the weights and biases from the last epoch was 0.7181, and when we calculated the validation accuracy using our best model, it came out to be 0.7237.

If you have any questions, let me know in the comments section!
