URL: https://www.geeksforgeeks.org/machine-learning/understanding-googlenet-model-cnn-architecture/

Understanding GoogLeNet Model - CNN Architecture

Last Updated : 12 May, 2026

GoogLeNet (Inception V1) is a convolutional neural network designed for efficient image classification. It uses the Inception module to process multiple filter sizes in parallel, improving feature extraction while keeping computation low.

  • Inception modules combine 1×1, 3×3, 5×5 convolutions and pooling in parallel
  • Uses 1×1 convolutions and global average pooling to reduce computation and parameters
  • Designed to achieve high accuracy with efficient use of resources

Key Features of GoogLeNet

1. 1×1 Convolutions

GoogLeNet uses 1×1 convolutions mainly for dimensionality reduction, which reduces computation and the number of trainable parameters while preserving important features.

Example comparison, for a 5×5 convolution that maps a 14×14 feature map with 480 input channels to 48 output channels:

  • Without 1×1 convolution: (14×14×48)×(5×5×480) = 112.9M operations
  • With a 1×1 convolution that first reduces 480 channels to 16: (14×14×16)×(1×1×480) + (14×14×48)×(5×5×16) = 5.3M operations

The 1×1 bottleneck cuts the cost of this layer by roughly 21× (from about 112.9M to about 5.3M operations) while the layer still produces the same 14×14×48 output.
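The arithmetic above can be reproduced with a short script. This is a minimal sketch that counts multiplications only, assuming stride 1 and an unchanged 14×14 output size; the helper name `conv_ops` is hypothetical and introduced here purely for illustration.

```python
def conv_ops(height, width, out_channels, kernel, in_channels):
    """Multiplications for a convolution producing a height x width x out_channels map."""
    return (height * width * out_channels) * (kernel * kernel * in_channels)

# Direct 5x5 convolution: 480 input channels -> 48 output channels on a 14x14 map.
direct = conv_ops(14, 14, 48, 5, 480)

# Bottleneck: 1x1 convolution down to 16 channels, then the 5x5 convolution.
reduce_step = conv_ops(14, 14, 16, 1, 480)
expand_step = conv_ops(14, 14, 48, 5, 16)
bottleneck = reduce_step + expand_step

print(f"direct:     {direct / 1e6:.1f}M operations")
print(f"bottleneck: {bottleneck / 1e6:.1f}M operations")
print(f"reduction:  {direct / bottleneck:.1f}x")
```

Most of the remaining cost sits in the 5×5 step, which now reads 16 channels instead of 480.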

2. Global Average Pooling

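Instead of flattening the final feature maps into large fully connected layers, GoogLeNet uses Global Average Pooling, which averages each feature map into a single value. Only one fully connected layer then maps the pooled vector to class scores.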

  • Eliminates a large number of parameters
  • Reduces overfitting
  • Improves generalization and accuracy
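To make the parameter saving concrete, the sketch below averages two toy feature maps and then compares the size of a classifier layer with and without pooling. The 7×7×1024 final feature volume and the 1000 classes are assumptions taken from the commonly cited GoogLeNet configuration, not from the text above, and `global_average_pool` is a hypothetical helper.

```python
def global_average_pool(feature_maps):
    """Average each 2-D feature map (a list of rows) into a single value."""
    pooled = []
    for fmap in feature_maps:
        total = sum(sum(row) for row in fmap)
        count = sum(len(row) for row in fmap)
        pooled.append(total / count)
    return pooled

# Two toy 2x2 feature maps, one per channel.
maps = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[0.0, 0.0], [8.0, 0.0]],
]
pooled = global_average_pool(maps)  # one value per channel
print(pooled)

# Assumed final feature volume and class count (illustrative values).
H, W, C, CLASSES = 7, 7, 1024, 1000
fc_on_flattened = H * W * C * CLASSES + CLASSES  # dense layer on all 7*7*1024 inputs
fc_after_gap = C * CLASSES + CLASSES             # dense layer on the 1024 pooled values
print(fc_on_flattened, fc_after_gap)
```

Under these assumptions the classifier shrinks from about 50.2M to about 1.0M parameters, since the pooling step itself has no weights.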

3. Inception Module

The Inception module is the core building block of GoogLeNet. It applies multiple operations in parallel:

  • 1×1 convolutions
  • 3×3 convolutions
  • 5×5 convolutions
  • 3×3 max pooling

The branch outputs all share the same spatial size and are concatenated along the channel dimension, so the module captures multi-scale features without a large increase in computation.
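The channel bookkeeping of one module can be sketched as follows. The branch widths are the ones commonly quoted for the Inception (3a) module and are an assumption here, as are the 1×1 reduction layers in front of the 3×3 and 5×5 branches and the 1×1 projection after pooling; `InceptionSpec` is a hypothetical name.

```python
from dataclasses import dataclass

@dataclass
class InceptionSpec:
    in_channels: int
    c1x1: int          # 1x1 branch
    c3x3_reduce: int   # 1x1 reduction before the 3x3 convolution
    c3x3: int          # 3x3 branch
    c5x5_reduce: int   # 1x1 reduction before the 5x5 convolution
    c5x5: int          # 5x5 branch
    pool_proj: int     # 1x1 projection after 3x3 max pooling

    def out_channels(self):
        # Concatenation stacks the four branch outputs along the channel axis.
        return self.c1x1 + self.c3x3 + self.c5x5 + self.pool_proj

    def weight_count(self):
        # Convolution weights only; biases are ignored.
        cin = self.in_channels
        return (
            cin * self.c1x1
            + cin * self.c3x3_reduce + 9 * self.c3x3_reduce * self.c3x3
            + cin * self.c5x5_reduce + 25 * self.c5x5_reduce * self.c5x5
            + cin * self.pool_proj
        )

# Assumed widths for Inception (3a) with a 192-channel input.
spec = InceptionSpec(192, 64, 96, 128, 16, 32, 32)
print(spec.out_channels())   # channels after concatenation
print(spec.weight_count())   # total convolution weights in the module
```

With these widths the four branches contribute 64, 128, 32 and 32 channels, so the module outputs 256 channels.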

Figure: Inception Module

4. Auxiliary Classifiers

To reduce vanishing gradient problems, GoogLeNet uses auxiliary classifiers during training.

Each classifier includes:

  • Average pooling
  • 1×1 convolution
  • Fully connected layers
  • Softmax output

These branches help stabilize training and improve generalization. They are used only during training and are removed at inference time.
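One way the auxiliary outputs can enter the training objective is as a weighted sum with the main loss. The sketch below assumes a weight of 0.3 per auxiliary loss, the value usually attributed to the original GoogLeNet training setup; the function names are hypothetical and the example uses plain Python lists rather than a deep learning framework.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, target):
    """Negative log-probability of the target class."""
    return -math.log(softmax(logits)[target])

def training_loss(main_logits, aux_logits_list, target, aux_weight=0.3):
    """Main loss plus down-weighted auxiliary losses (training only)."""
    main = cross_entropy(main_logits, target)
    aux = sum(cross_entropy(logits, target) for logits in aux_logits_list)
    return main + aux_weight * aux

# Toy example: a main head and two auxiliary heads, three classes, target class 0.
loss = training_loss(
    main_logits=[2.0, 0.5, 0.1],
    aux_logits_list=[[1.0, 0.8, 0.2], [0.9, 1.1, 0.3]],
    target=0,
)
print(f"{loss:.4f}")
```

At inference only the main head's prediction would be used, so the auxiliary terms affect the gradients but not the deployed model.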

5. Model Architecture

GoogLeNet is a 22-layer deep network (excluding pooling layers) that emphasizes computational efficiency, making it feasible to run even on hardware with limited resources. The layer-by-layer architecture of GoogLeNet is shown below.

Figure: Layer-by-Layer Inception

The architecture also contains two auxiliary classifiers, connected to the outputs of the Inception (4a) and Inception (4d) modules.

Inception V1 architecture

  • Input Layer: Accepts a 224×224 RGB image
  • Initial Convolutions and Pooling: Applies convolution and max pooling layers to extract low-level features and reduce spatial dimensions
  • Local Response Normalization (LRN): Normalizes feature maps early to improve generalization
  • Inception Modules: Apply 1×1, 3×3, 5×5 convolutions and 3×3 max pooling in parallel, then concatenate outputs to capture multi-scale features
  • Auxiliary Classifiers: Intermediate branches with pooling, convolutions, fully connected layers, and softmax used to improve training stability
  • Final Layers: Uses global average pooling followed by a fully connected layer and softmax for final classification
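The reduction in spatial size from the 224×224 input down to a single value per channel can be traced with the standard output-size formula. This is a rough sketch: the kernel sizes, strides, padding values and the use of ceiling rounding in the pooling layers are assumptions chosen to match commonly published implementations, and `out_size` is a hypothetical helper.

```python
import math

def out_size(size, kernel, stride, padding=0, ceil_mode=False):
    """Output width/height of a convolution or pooling layer."""
    span = (size + 2 * padding - kernel) / stride
    return (math.ceil(span) if ceil_mode else math.floor(span)) + 1

pool = dict(kernel=3, stride=2, ceil_mode=True)  # assumed 3x3 max pooling, stride 2

# (stage name, layer parameters); None marks stages that keep the spatial size.
stages = [
    ("conv 7x7 / 2", dict(kernel=7, stride=2, padding=3)),
    ("max pool", pool),
    ("conv 3x3 / 1", dict(kernel=3, stride=1, padding=1)),
    ("max pool", pool),
    ("inception 3a-3b", None),
    ("max pool", pool),
    ("inception 4a-4e", None),
    ("max pool", pool),
    ("inception 5a-5b", None),
    ("global average pool 7x7", dict(kernel=7, stride=1)),
]

size = 224
trace = [("input", size)]
for name, params in stages:
    if params is not None:
        size = out_size(size, **params)
    trace.append((name, size))

for name, value in trace:
    print(f"{name:<26}{value}x{value}")
```

Under these assumptions the Inception stages operate at 28×28, 14×14 and 7×7 resolution, and the final pooling collapses each 7×7 map to a single value.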

Performance and Results

  • Winner of ILSVRC 2014 in both classification and detection tasks
  • Achieved a top-5 error rate of 6.67% in image classification
  • An ensemble of six GoogLeNet models achieved 43.9% mAP (mean Average Precision) on the ImageNet detection task
Figure: GoogLeNet Classification top-5 Error
Figure: GoogLeNet Detection Performance
