
Residual Networks (ResNet) - Deep Learning

Last Updated : 12 May, 2026

Residual Networks (ResNet) is a deep learning architecture designed to enable efficient training of very deep neural networks. It introduces skip (shortcut) connections, which allow the model to learn residual mappings instead of direct transformations.

  • Helps mitigate the vanishing gradient problem in very deep models
  • Allows information to flow directly across layers using skip connections
  • Enables building networks with hundreds or even thousands of layers
[Figure: Residual Block]

Challenges in Deep Neural Networks

Deep Neural Networks are powerful models, but training them becomes difficult as network depth increases. Two major issues are:

1. Vanishing/Exploding Gradient Problem: As the number of layers increases, gradients can become extremely small (vanishing) or very large (exploding) during backpropagation, making training unstable.

2. Degradation Problem: Increasing network depth does not always improve performance and can even degrade it.

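  • Performance Plateau: Accuracy saturates once the network reaches a certain depth
  • Accuracy Degradation: Adding more layers beyond that point increases both training and test error; because training error rises too, this is an optimization difficulty rather than overfitting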

Key Features

  • Residual Connections: Enable very deep networks by allowing gradients to flow through identity shortcuts, reducing the vanishing gradient problem.
  • Identity Mapping: Simplifies training by learning residual functions instead of full mappings.
  • Depth: Supports extremely deep architectures for improved image recognition performance.
  • Fewer Parameters: Achieves high accuracy with fewer parameters than VGG-style networks, which improves computational efficiency.

The following graph compares training and test errors of 20-layer and 56-layer networks, highlighting the limitations of deeper networks without residual connections.

  • Training error: The 56-layer network has higher training error than the 20-layer network throughout training, so the problem is not caused by overfitting
  • Test error: The deeper network also has higher test error (the degradation problem), whereas the shallower network performs better
[Figure: Comparison of 20-layer vs 56-layer architecture]

ResNet-34

ResNet-34 is a deep residual network built on a 34-layer plain network inspired by VGG-19, with shortcut connections forming 16 residual blocks (3 + 4 + 6 + 3). The architecture is organized into stages as follows:

  • Stem: a 7×7 convolution with 64 filters and stride 2, followed by 3×3 max pooling with stride 2
  • First stage: 3 residual blocks, each with two 3×3 convolution layers of 64 filters and identity skip connections
  • Second stage: 4 residual blocks, each with two 3×3 convolution layers of 128 filters; the first block downsamples with stride 2 and uses a 1×1 projection or zero padding on the shortcut for dimension matching
  • Third stage: 6 residual blocks, each with two 3×3 convolution layers of 256 filters
  • Fourth stage: 3 residual blocks, each with two 3×3 convolution layers of 512 filters
  • Output layer: Feature maps are passed through Global Average Pooling followed by a fully connected layer with softmax for classification

Counting weighted layers: 1 stem convolution + 16 × 2 block convolutions + 1 fully connected layer = 34.
[Figure: ResNet-34 architecture]
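The layer count above can be checked with a little arithmetic. The sketch below is plain Python and does not build a network; it assumes the standard 224×224 ImageNet input and tracks the number of weighted layers and the feature-map size at each stage.

```python
# ResNet-34 stage layout as described above: (residual blocks, filters) per stage.
# Assumptions: 224x224 input, 7x7 stride-2 stem conv, 3x3 stride-2 max pool,
# and stride 2 in the first block of stages 2-4.
STAGES = [(3, 64), (4, 128), (6, 256), (3, 512)]

def resnet34_summary(input_size=224):
    """Return (weighted layer count, [(filters, spatial size) per stage])."""
    size = input_size // 2      # 7x7 conv, stride 2: 224 -> 112
    size = size // 2            # 3x3 max pool, stride 2: 112 -> 56
    layers = 1                  # the stem convolution
    shapes = []
    for i, (blocks, filters) in enumerate(STAGES):
        if i > 0:
            size //= 2          # first block of stages 2-4 halves height and width
        layers += 2 * blocks    # two 3x3 convolutions per basic block
        shapes.append((filters, size))
    layers += 1                 # final fully connected layer
    return layers, shapes

layers, shapes = resnet34_summary()
print(layers)   # 34
print(shapes)   # [(64, 56), (128, 28), (256, 14), (512, 7)]
```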

Working

Conventional networks try to learn the full mapping $H(x)$ directly. ResNet instead learns a residual function $F(x) = H(x) - x$ and combines it with the input via a skip connection:

$$H(x) = F(x) + x$$

where:

  • $x$: input to the block
  • $H(x)$: desired mapping
  • $F(x)$: residual function to be learned

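Learning the simpler residual makes optimization easier: if the desired mapping is close to the identity, the layers only need to push $F(x)$ toward zero, which is easier than fitting an identity mapping with a stack of nonlinear layers.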
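A toy NumPy version makes the formula concrete. This is a minimal sketch assuming a fully connected residual branch $F(x) = W_2\,\mathrm{ReLU}(W_1 x)$ with a ReLU after the addition; real ResNet blocks use convolutions and batch normalization. When the residual weights are zero, $F(x) = 0$ and the block passes its (non-negative) input through unchanged.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def residual_block(x, W1, W2):
    """Toy residual block: y = relu(F(x) + x) with F(x) = W2 @ relu(W1 @ x)."""
    f = W2 @ relu(W1 @ x)   # residual function F(x)
    return relu(f + x)      # skip connection adds the input back

rng = np.random.default_rng(0)
x = relu(rng.standard_normal(4))   # non-negative, as if produced by a previous ReLU
zeros = np.zeros((4, 4))

# With all residual weights at zero, the block reduces to the identity mapping.
print(np.allclose(residual_block(x, zeros, zeros), x))   # True
```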

1. Residual Block: A residual block is the core unit of ResNet and consists of:

  • One or more convolutional layers
  • A skip connection that bypasses these layers
  • Addition of input to the convolution output

This design ensures smooth flow of information and gradients across layers.

[Figure: Residual Block]
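A basic (two-layer) residual block can be sketched in Keras as follows. This is an illustrative sketch, not the article's code: the function name `basic_block` is hypothetical, and it assumes the common Conv → BatchNorm → ReLU ordering with a 1×1 projection shortcut whenever the shape changes.

```python
import tensorflow as tf
from tensorflow.keras import layers

def basic_block(x, filters, stride=1):
    """Hypothetical helper: two 3x3 convolutions plus a shortcut (ResNet v1 style)."""
    shortcut = x
    # Residual branch F(x); biases are omitted because BatchNorm follows each conv.
    y = layers.Conv2D(filters, 3, strides=stride, padding="same", use_bias=False)(x)
    y = layers.BatchNormalization()(y)
    y = layers.Activation("relu")(y)
    y = layers.Conv2D(filters, 3, padding="same", use_bias=False)(y)
    y = layers.BatchNormalization()(y)
    # If the spatial size or channel count changes, project the shortcut with a 1x1 conv.
    if stride != 1 or x.shape[-1] != filters:
        shortcut = layers.Conv2D(filters, 1, strides=stride, use_bias=False)(x)
        shortcut = layers.BatchNormalization()(shortcut)
    y = layers.Add()([y, shortcut])   # F(x) + x
    return layers.Activation("relu")(y)

inputs = tf.keras.Input(shape=(32, 32, 16))
outputs = basic_block(inputs, 32, stride=2)
model = tf.keras.Model(inputs, outputs)
print(tuple(outputs.shape))   # (None, 16, 16, 32)
```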

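2. Skip (Shortcut) Connection

  • Bypasses one or more layers
  • Adds the input directly to the output of those layers
  • Mitigates vanishing gradients, because the gradient can flow back through the identity path unchanged
  • Lets early layers keep receiving useful parameter updates even in very deep networks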
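Why the identity path helps can be seen with a scalar toy model (an assumption for illustration, not a real network). For $y = x + F(x)$ the local derivative is $1 + \partial F / \partial x$, whereas a plain layer $y = F(x)$ contributes only $\partial F / \partial x$. The sketch multiplies these local derivatives through a chain of layers $f(y) = w \tanh(y)$ with a hypothetical weight $w = 0.5$.

```python
import numpy as np

def input_gradient(depth, w=0.5, residual=False, x0=1.0):
    """d(output)/d(input) through `depth` scalar layers f(y) = w * tanh(y)."""
    y, grad = x0, 1.0
    for _ in range(depth):
        local = w * (1.0 - np.tanh(y) ** 2)   # f'(y), at most w
        if residual:
            grad *= 1.0 + local                # y_next = y + f(y)  ->  1 + f'(y)
            y = y + w * np.tanh(y)
        else:
            grad *= local                      # y_next = f(y)      ->  f'(y)
            y = w * np.tanh(y)
    return grad

for depth in (5, 20, 50):
    print(depth, input_gradient(depth), input_gradient(depth, residual=True))
```

In the plain chain every factor is at most 0.5, so the gradient shrinks geometrically with depth; in the residual chain every factor is at least 1, so the gradient cannot vanish.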

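3. Handling Dimension Mismatch: When the input and output dimensions of a block differ (for example, when the number of filters doubles and the feature map is downsampled), the shortcut must be adapted:

  • Zero Padding: Pads the extra channels of the (spatially subsampled) input with zeros to match the output dimensions; this adds no parameters
  • Linear Projection: Uses a learnable 1x1 convolution (with the same stride) to match input and output dimensions for the skip connection.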
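Both shortcut options can be sketched in NumPy for an NHWC tensor going from 64 to 128 channels with stride 2. This is an illustration under assumed shapes; the function names are hypothetical. A 1×1 convolution with stride 2 and no padding reads every second pixel, which is the same as subsampling and then multiplying each pixel's channel vector by a weight matrix.

```python
import numpy as np

def shortcut_zero_pad(x, out_channels, stride=2):
    """Zero padding: subsample spatially, then append zero-filled channels (no parameters)."""
    x = x[:, ::stride, ::stride, :]
    extra = out_channels - x.shape[-1]
    return np.pad(x, ((0, 0), (0, 0), (0, 0), (0, extra)))   # pads with zeros by default

def shortcut_projection(x, W, stride=2):
    """Projection: strided 1x1 convolution = per-pixel matrix multiply over channels."""
    x = x[:, ::stride, ::stride, :]
    return x @ W                                            # W: (C_in, C_out)

rng = np.random.default_rng(0)
x = rng.standard_normal((1, 32, 32, 64))
W = rng.standard_normal((64, 128)) * 0.01   # would be learned in a real network

print(shortcut_zero_pad(x, 128).shape)     # (1, 16, 16, 128)
print(shortcut_projection(x, W).shape)    # (1, 16, 16, 128)
```

The projection costs 64 × 128 = 8192 extra weights per shortcut, while zero padding is free but cannot learn anything for the new channels.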

4. Stacking Residual Blocks: Multiple residual blocks can be stacked to create deep architectures. This allows networks to go very deep without the degradation seen in equally deep plain networks.

5. Global Average Pooling (GAP): Before the final fully connected layer, ResNet applies GAP, which:

  • Converts each feature map to a single value by averaging over its spatial positions
  • Reduces the number of parameters in the classifier, which lowers the risk of overfitting
  • Produces a compact feature representation
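The parameter saving is easy to quantify. The sketch below assumes ResNet-34's final 7×7×512 feature maps and a 1000-class ImageNet classifier, and compares a Dense layer on the GAP output with a Dense layer on the flattened feature maps.

```python
import numpy as np

feature_maps = np.random.default_rng(0).standard_normal((2, 7, 7, 512))  # (N, H, W, C)

# Global Average Pooling: one value per channel, averaged over all spatial positions.
gap = feature_maps.mean(axis=(1, 2))   # (2, 512)

num_classes = 1000
params_gap_dense = 512 * num_classes + num_classes               # weights + biases after GAP
params_flatten_dense = 7 * 7 * 512 * num_classes + num_classes   # weights + biases after Flatten

print(gap.shape)              # (2, 512)
print(params_gap_dense)       # 513000
print(params_flatten_dense)   # 25089000
```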

Implementation

We will implement ResNet (v1 and v2) for CIFAR-10 and cover data preprocessing, model creation, training and plotting graphs step by step.

Step 1: Importing Libraries

Import the following libraries:

  • tensorflow for building and training the model
  • keras for defining model layers and structure
  • numpy for numerical operations
  • os for managing files and directories
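A typical import block for this walkthrough looks like this; the exact imports in the full code may differ.

```python
import os

import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

print(tf.__version__)
```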

Step 2: Setting Hyperparameters

  • Set batch_size, epochs, num_classes and data_augmentation
  • Choose ResNet version and number of residual blocks
  • Compute depth based on CIFAR ResNet rules
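A sketch of the hyperparameter setup follows. The values are hypothetical, and the depth rule is an assumption in the style of the Keras CIFAR-10 ResNet example: v1 has 3 stages × n blocks × 2 layers plus the first convolution and the final Dense layer (6n + 2), while v2 uses 3-layer bottlenecks (9n + 2).

```python
# Hypothetical training settings; tune for your hardware and time budget.
batch_size = 32
epochs = 200
num_classes = 10
data_augmentation = True
subtract_pixel_mean = True

version = 1   # 1 -> ResNet v1, 2 -> ResNet v2
n = 3         # residual blocks per stage

def compute_depth(n, version):
    """Assumed CIFAR ResNet depth rule: 6n + 2 for v1, 9n + 2 for v2."""
    if version == 1:
        return n * 6 + 2
    if version == 2:
        return n * 9 + 2
    raise ValueError("version must be 1 or 2")

depth = compute_depth(n, version)
model_type = f"ResNet{depth}v{version}"
print(model_type)   # ResNet20v1
```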

Step 3: Loading and Preprocessing CIFAR-10 Data

  • Load the CIFAR-10 dataset using Keras.
  • Normalize pixel values to the range [0, 1].
  • Optionally subtract the mean of the training images for zero-centered input.
  • Convert labels to one-hot vectors.
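The preprocessing steps can be sketched as one function. This is an illustrative version, not the article's code: `preprocess` is a hypothetical name, and NumPy one-hot encoding stands in for `keras.utils.to_categorical`, which does the same job. With the real data you would call it on the output of `tf.keras.datasets.cifar10.load_data()`, which returns uint8 images and labels of shape (N, 1); random arrays are used below so the snippet runs offline.

```python
import numpy as np

def preprocess(x_train, y_train, x_test, y_test, num_classes=10, subtract_pixel_mean=True):
    """Scale to [0, 1], optionally subtract the training mean image, one-hot the labels."""
    x_train = x_train.astype("float32") / 255.0
    x_test = x_test.astype("float32") / 255.0
    if subtract_pixel_mean:
        mean = x_train.mean(axis=0)   # mean image, from training data only
        x_train -= mean
        x_test -= mean
    # One-hot encode integer labels of shape (N, 1).
    y_train = np.eye(num_classes, dtype="float32")[y_train.reshape(-1)]
    y_test = np.eye(num_classes, dtype="float32")[y_test.reshape(-1)]
    return x_train, y_train, x_test, y_test

# Random stand-ins shaped like CIFAR-10 (32x32 RGB, uint8 pixels, (N, 1) labels).
rng = np.random.default_rng(0)
x_tr = rng.integers(0, 256, size=(8, 32, 32, 3), dtype=np.uint8)
y_tr = rng.integers(0, 10, size=(8, 1))
x_te = rng.integers(0, 256, size=(4, 32, 32, 3), dtype=np.uint8)
y_te = rng.integers(0, 10, size=(4, 1))

xtr, ytr, xte, yte = preprocess(x_tr, y_tr, x_te, y_te)
print(xtr.shape, ytr.shape)   # (8, 32, 32, 3) (8, 10)
```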

Output:

[Figure: Load Dataset]

Step 4: Defining Learning Rate

Define the learning rate for the model as a function of the epoch, so that it can be lowered as training progresses.
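A step-decay schedule is one common choice. The thresholds and factors below are assumptions modeled on the Keras CIFAR-10 ResNet example. The function deliberately takes only `epoch`: Keras's `LearningRateScheduler` first tries calling the schedule with `(epoch, current_lr)`, and a second parameter with a default would silently receive the current rate and compound the decay.

```python
BASE_LR = 1e-3   # assumed initial learning rate

def lr_schedule(epoch):
    """Step decay: drop the learning rate at fixed epochs (thresholds are assumptions)."""
    lr = BASE_LR
    if epoch > 180:
        lr *= 0.5e-3
    elif epoch > 160:
        lr *= 1e-3
    elif epoch > 120:
        lr *= 1e-2
    elif epoch > 80:
        lr *= 1e-1
    return lr

for epoch in (0, 81, 121, 161, 181):
    print(epoch, lr_schedule(epoch))
```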

Step 5: Defining a ResNet Layer Function

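  • Defines a single convolutional layer, optionally followed by BatchNorm and ReLU.
  • conv_first controls the order of operations: Conv → BatchNorm → ReLU when True (used by v1), BatchNorm → ReLU → Conv when False (used by v2).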
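A `resnet_layer` helper along these lines could look as follows. This is a sketch: the He-normal initializer and the L2 weight decay of 1e-4 are assumptions, not confirmed details of the article's code.

```python
import tensorflow as tf
from tensorflow.keras import layers, regularizers

def resnet_layer(inputs, num_filters=16, kernel_size=3, strides=1,
                 activation="relu", batch_normalization=True, conv_first=True):
    """Conv2D with optional BatchNorm and activation.

    conv_first=True:  Conv -> BN -> activation  (ResNet v1 ordering)
    conv_first=False: BN -> activation -> Conv  (pre-activation, ResNet v2 ordering)
    """
    conv = layers.Conv2D(num_filters, kernel_size=kernel_size, strides=strides,
                         padding="same", kernel_initializer="he_normal",
                         kernel_regularizer=regularizers.l2(1e-4))
    x = inputs
    if conv_first:
        x = conv(x)
        if batch_normalization:
            x = layers.BatchNormalization()(x)
        if activation is not None:
            x = layers.Activation(activation)(x)
    else:
        if batch_normalization:
            x = layers.BatchNormalization()(x)
        if activation is not None:
            x = layers.Activation(activation)(x)
        x = conv(x)
    return x

inp = tf.keras.Input(shape=(32, 32, 3))
out = resnet_layer(inp, num_filters=16, strides=2)
print(tuple(out.shape))   # (None, 16, 16, 16)
```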

Step 6: Defining ResNet v1

  • Uses two-layer residual blocks (two 3×3 convolutions) for each residual unit
  • Computes the number of residual blocks per stage from the depth
  • Adds identity shortcuts, or projection shortcuts when feature map dimensions change
  • Ends with Global Average Pooling and a Dense softmax layer
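A compact ResNet v1 builder for CIFAR-10 might look like the sketch below. It assumes three stages starting at 16 filters (doubling per stage), (depth − 2) / 6 blocks per stage, and 1×1 projection shortcuts at the start of stages 2 and 3. The `conv_bn` helper is a hypothetical, self-contained stand-in for `resnet_layer`.

```python
import tensorflow as tf
from tensorflow.keras import layers

def conv_bn(x, filters, kernel_size=3, strides=1, activation="relu"):
    """Conv -> BatchNorm -> optional activation (conv-first ordering of v1)."""
    x = layers.Conv2D(filters, kernel_size, strides=strides, padding="same",
                      kernel_initializer="he_normal")(x)
    x = layers.BatchNormalization()(x)
    return layers.Activation(activation)(x) if activation else x

def resnet_v1(input_shape=(32, 32, 3), depth=20, num_classes=10):
    if (depth - 2) % 6 != 0:
        raise ValueError("depth should be 6n + 2 (e.g. 20, 32, 44, 56, 110)")
    num_res_blocks = (depth - 2) // 6
    num_filters = 16

    inputs = tf.keras.Input(shape=input_shape)
    x = conv_bn(inputs, num_filters)
    for stage in range(3):
        for block in range(num_res_blocks):
            downsample = stage > 0 and block == 0   # first block of stages 2 and 3
            strides = 2 if downsample else 1
            y = conv_bn(x, num_filters, strides=strides)
            y = conv_bn(y, num_filters, activation=None)
            if downsample:
                # Projection shortcut: 1x1 conv matches the new height, width and channels.
                x = layers.Conv2D(num_filters, 1, strides=strides, padding="same",
                                  kernel_initializer="he_normal")(x)
            x = layers.Activation("relu")(layers.Add()([x, y]))
        num_filters *= 2

    x = layers.GlobalAveragePooling2D()(x)
    outputs = layers.Dense(num_classes, activation="softmax",
                           kernel_initializer="he_normal")(x)
    return tf.keras.Model(inputs, outputs)

model = resnet_v1(depth=20)
print(tuple(model.outputs[0].shape))   # (None, 10)
```

For depth 20 this gives 1 + 3 × 3 × 2 + 1 = 20 weighted layers, not counting the two projection convolutions.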

Step 7: Defining ResNet v2

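  • Uses three-layer bottleneck residual blocks (1×1, 3×3 and 1×1 convolutions) with pre-activation, i.e. BatchNorm and ReLU placed before each convolution.
  • Handles identity or projection shortcuts for dimension matching.
  • Ends with BatchNorm, ReLU, GAP and a Dense softmax layer.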
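The v2 building block differs from v1 in two ways: the ordering is BatchNorm → ReLU → Conv, and there is no ReLU after the addition. The sketch below shows one pre-activation bottleneck unit and the v2 classification head; the helper names and filter counts are hypothetical.

```python
import tensorflow as tf
from tensorflow.keras import layers

def bn_relu_conv(x, filters, kernel_size, strides=1):
    """Pre-activation ordering used by ResNet v2: BN -> ReLU -> Conv."""
    x = layers.BatchNormalization()(x)
    x = layers.Activation("relu")(x)
    return layers.Conv2D(filters, kernel_size, strides=strides, padding="same",
                         kernel_initializer="he_normal")(x)

def bottleneck_v2(x, filters_in, filters_out, strides=1):
    """1x1 (reduce) -> 3x3 -> 1x1 (expand), plus identity or projection shortcut."""
    y = bn_relu_conv(x, filters_in, 1, strides=strides)
    y = bn_relu_conv(y, filters_in, 3)
    y = bn_relu_conv(y, filters_out, 1)
    shortcut = x
    if strides != 1 or x.shape[-1] != filters_out:
        shortcut = layers.Conv2D(filters_out, 1, strides=strides, padding="same",
                                 kernel_initializer="he_normal")(x)
    return layers.Add()([shortcut, y])   # no activation after the addition

inputs = tf.keras.Input(shape=(32, 32, 64))
outputs = bottleneck_v2(inputs, filters_in=16, filters_out=64)              # identity shortcut
outputs = bottleneck_v2(outputs, filters_in=32, filters_out=128, strides=2)  # projection

# v2 head: final BN and ReLU, then GAP and a softmax classifier.
x = layers.BatchNormalization()(outputs)
x = layers.Activation("relu")(x)
x = layers.GlobalAveragePooling2D()(x)
probs = layers.Dense(10, activation="softmax")(x)
model = tf.keras.Model(inputs, probs)
print(tuple(outputs.shape))   # (None, 16, 16, 128)
```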

Step 8: Compiling the Model

  • Instantiate v1 or v2 based on version.
  • Compile with the Adam optimizer, categorical_crossentropy loss and the accuracy metric.
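The compile step is the same whichever version is chosen. In the full code the model would come from the v1 or v2 builder; here a tiny hypothetical `build_model` stands in so the snippet runs on its own, and the learning rate of 1e-3 is an assumption.

```python
import tensorflow as tf
from tensorflow.keras import layers

version = 1   # hypothetical switch: 1 -> resnet_v1, 2 -> resnet_v2

def build_model(version, input_shape=(32, 32, 3), num_classes=10):
    """Stand-in for resnet_v1 / resnet_v2 so this snippet is self-contained."""
    inputs = tf.keras.Input(shape=input_shape)
    x = layers.Conv2D(16, 3, padding="same", activation="relu")(inputs)
    x = layers.GlobalAveragePooling2D()(x)
    outputs = layers.Dense(num_classes, activation="softmax")(x)
    return tf.keras.Model(inputs, outputs, name=f"stand_in_v{version}")

# Full code would be something like:
# model = resnet_v2(...) if version == 2 else resnet_v1(...)
model = build_model(version)
model.compile(loss="categorical_crossentropy",
              optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
              metrics=["accuracy"])
model.summary()
```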

Step 9: Setting Up Callbacks

  • ModelCheckpoint saves the best model.
  • LearningRateScheduler adjusts learning rate during training.
  • ReduceLROnPlateau reduces LR if validation performance plateaus.
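The three callbacks can be configured as below. The directory and file name, the monitored metric and the plateau settings are assumptions; `lr_schedule` repeats the hypothetical step decay from Step 4 so the snippet is self-contained.

```python
import os

import numpy as np
import tensorflow as tf

def lr_schedule(epoch):
    """Hypothetical step decay (same shape as the Step 4 sketch)."""
    lr = 1e-3
    if epoch > 180:
        lr *= 0.5e-3
    elif epoch > 160:
        lr *= 1e-3
    elif epoch > 120:
        lr *= 1e-2
    elif epoch > 80:
        lr *= 1e-1
    return lr

save_dir = os.path.join(os.getcwd(), "saved_models")
os.makedirs(save_dir, exist_ok=True)
filepath = os.path.join(save_dir, "cifar10_resnet.keras")   # hypothetical file name

# Keep only the model with the best validation accuracy seen so far.
checkpoint = tf.keras.callbacks.ModelCheckpoint(filepath=filepath,
                                                monitor="val_accuracy",
                                                save_best_only=True,
                                                verbose=1)
lr_scheduler = tf.keras.callbacks.LearningRateScheduler(lr_schedule)
# Multiply the learning rate by sqrt(0.1) when val_loss (the default monitor) stalls.
lr_reducer = tf.keras.callbacks.ReduceLROnPlateau(factor=np.sqrt(0.1),
                                                  cooldown=0,
                                                  patience=5,
                                                  min_lr=0.5e-6)
callbacks = [checkpoint, lr_reducer, lr_scheduler]
```

One design note: `LearningRateScheduler` sets the rate at the start of every epoch, so any reduction made by `ReduceLROnPlateau` only lasts until the next epoch begins. If you want the plateau logic to take effect, use one of the two mechanisms rather than both.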

Step 10: Data Augmentation & Training

  • Uses ImageDataGenerator for real-time augmentation if data_augmentation is enabled.
  • The history object returned by training stores the per-epoch metrics used for plotting.
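`ImageDataGenerator` is marked as not recommended for new code in recent TensorFlow documentation, so the sketch below swaps in Keras preprocessing layers (`RandomFlip`, `RandomTranslation`), which are only active during training. The tiny model and the random arrays are hypothetical stand-ins for the ResNet and the preprocessed CIFAR-10 data, and the single epoch is just to keep the example fast.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

data_augmentation = True
batch_size = 32
epochs = 1   # a real CIFAR-10 run uses far more epochs

# Random horizontal flips and shifts of up to 10% of height/width.
augment = tf.keras.Sequential([
    layers.RandomFlip("horizontal"),
    layers.RandomTranslation(0.1, 0.1),
])

def build(num_classes=10):
    """Tiny stand-in for the ResNet model so this snippet is self-contained."""
    inputs = tf.keras.Input(shape=(32, 32, 3))
    x = augment(inputs) if data_augmentation else inputs
    x = layers.Conv2D(16, 3, padding="same", activation="relu")(x)
    x = layers.GlobalAveragePooling2D()(x)
    outputs = layers.Dense(num_classes, activation="softmax")(x)
    model = tf.keras.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
    return model

# Random arrays stand in for the preprocessed CIFAR-10 tensors.
rng = np.random.default_rng(0)
x_train = rng.random((64, 32, 32, 3), dtype=np.float32)
y_train = tf.keras.utils.to_categorical(rng.integers(0, 10, 64), 10)
x_test = rng.random((16, 32, 32, 3), dtype=np.float32)
y_test = tf.keras.utils.to_categorical(rng.integers(0, 10, 16), 10)

model = build()
history = model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs,
                    validation_data=(x_test, y_test), shuffle=True, verbose=0)
print(sorted(history.history))   # metric names recorded per epoch, e.g. for plotting
```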

Output:

[Figure: Training]


ResNet Results on ImageNet and COCO

On the ImageNet dataset, a 152-layer ResNet, much deeper than VGG-19, achieved high accuracy with fewer parameters. An ensemble of ResNet models reached 3.57% top-5 error on the ImageNet test set. On the COCO dataset, ResNet gave a 28% relative improvement in object detection performance.

[Figure: Error-rate on ResNet Architecture]

These results show that shortcut connections effectively address the problems caused by increasing network depth: on the ImageNet validation set, going from 18 to 34 layers lowers the error of ResNet, whereas for plain networks the 34-layer model has a higher error than the 18-layer one.

[Figure: top-1 and top-5 error rates on the ImageNet validation set]

Below are the results on the ImageNet test set. ResNet's 3.57% top-5 error rate was the lowest, and the ResNet architecture won first place in the ILSVRC 2015 (ImageNet) classification challenge.

[Figure: top-5 error rates on the ImageNet test set]

Advantages

  • Eases training of deep networks by allowing direct gradient flow through skip connections, reducing vanishing gradient problems
  • Enables very deep architectures (50–152+ layers) with stable training
  • Improves accuracy through residual learning in tasks like image classification and object detection
  • Reduces degradation: in ResNet, adding depth does not increase training error the way it does in plain networks
  • Achieves better performance with fewer parameters than traditional deep networks such as VGG

Challenges

  • Requires high computational power due to its deep architecture
  • Needs projection layers to handle dimension mismatch in skip connections
  • May overfit on small datasets because of large model capacity
  • Training can become unstable without proper batch normalization
  • Very deep networks may still face performance degradation in extreme cases