Deep Learning: Advanced Backbones and Efficient GPU Training
This course is part of the Advanced Deep Learning Architectures Specialization.
Instructor: Board Infinity
What you'll learn
Build and fine-tune ConvNeXt and Vision Transformer models using PyTorch Lightning and the timm library
Apply RMSNorm, SwiGLU, and Rotary Position Embeddings (RoPE) in modern transformer architectures
Implement mixed precision, gradient accumulation, and DDP/FSDP for efficient multi-GPU training
Design, track, and benchmark CNN vs. ViT experiments using TensorBoard, W&B, and PyTorch Profiler
Details to know
May 2026
16 assignments
There are 4 modules in this course
Master advanced deep learning architectures and efficient training techniques using PyTorch Lightning, timm, ConvNeXt, Vision Transformers, RoPE, SwiGLU, RMSNorm, and Weights & Biases. This course equips you to design, train, and benchmark modern backbones on limited GPU hardware for real-world production use.
Module 1 introduces modern backbone architectures. It traces the evolution from ResNets to ConvNeXt and Vision Transformers and covers patch embeddings, multi-head self-attention, and position encodings. Module 2 dives into training dynamics and stabilization techniques, including RMSNorm, SwiGLU activations, and Rotary Position Embeddings (RoPE), for stable, scalable training. Module 3 focuses on efficient training on limited GPUs using mixed precision (FP16/BF16), gradient accumulation, efficient data pipelines, and distributed training with DDP/FSDP in Lightning. Module 4 covers experiment tracking with TensorBoard and W&B, profiling FLOPs and throughput, and a hands-on ViT vs. CNN Showdown project with fine-tuning in timm.
By the end of this course, you will:
- Build and fine-tune ConvNeXt and Vision Transformer backbones using PyTorch Lightning and timm
- Apply RMSNorm, SwiGLU, and RoPE to stabilize and scale deep transformer training
- Implement mixed precision, gradient accumulation, and DDP/FSDP for efficient multi-GPU training
- Design controlled CNN vs. ViT experiments with W&B tracking and PyTorch profiling
Disclaimer: This is an independent educational resource created by Board Infinity for informational and educational purposes only. This course is not affiliated with, endorsed by, sponsored by, or officially associated with any company, organization, or certification body unless explicitly stated. The content provided is based on industry knowledge and best practices but does not constitute official training material for any specific employer or certification program. All company names, trademarks, service marks, and logos referenced are the property of their respective owners and are used solely for educational identification and comparison purposes.
Module 1
Explore the evolution of deep learning backbones from classical CNNs to ConvNeXt and Vision Transformers, understanding their mechanics, trade-offs, and industry relevance.
What's included
10 videos, 3 readings, 4 assignments
10 videos • Total 80 minutes
- Where Advanced Architectures Are Used Today • 11 minutes
- CNNs vs Transformers: Industry Reality • 8 minutes
- Skills You Need as a Vision Engineer • 7 minutes
- Why Classic CNNs Started Failing • 8 minutes
- What ConvNeXt Fixed in Old CNNs • 8 minutes
- ResNet vs ConvNeXt - Part 1 • 8 minutes
- ResNet vs ConvNeXt - Part 2 • 10 minutes
- How Images Become Tokens • 6 minutes
- What Attention Really Does • 6 minutes
- CNN vs ViT: Choosing the Right Backbone • 9 minutes
3 readings • Total 90 minutes
- Industry Landscape: Modern Backbones & Global Attention • 30 minutes
- Architectural Transition: From ResNet to ConvNeXt • 30 minutes
- Inside the ViT Forward Pass: Tokens, Attention & Positional Structure • 30 minutes
4 assignments • Total 150 minutes
- Modern Backbone Architectures (ConvNeXt & Vision Transformers) • 60 minutes
- Career Scope in Advanced Architectures • 30 minutes
- The Evolution Beyond ResNets • 30 minutes
- Vision Transformers Under the Hood • 30 minutes
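The Module 1 topics "How Images Become Tokens" and "What Attention Really Does" can be made concrete with the minimal plain-Python sketch below. It is not taken from the course materials. It assumes square, non-overlapping patches, a single attention head, nested lists in place of tensors, and a prepended class token as in the original ViT. The helper names (`num_patch_tokens`, `attention`, and so on) are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def num_patch_tokens(height: int, width: int, patch: int, cls_token: bool = True) -> int:
    """Tokens produced by a ViT-style patch embedding for one image (hypothetical helper)."""
    if height % patch or width % patch:
        raise ValueError("image size must be divisible by the patch size")
    # One token per non-overlapping patch, plus an optional class token.
    return (height // patch) * (width // patch) + (1 if cls_token else 0)


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    # zip(*b) iterates over the columns of b.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row: List[float]) -> List[float]:
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Single-head scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = len(q[0])
    scores = matmul(q, transpose(k))  # (n_queries x n_keys) similarity scores
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(weights, v)  # each output row is a weighted average of the value rows


# A 224x224 image with 16x16 patches gives 14 * 14 = 196 patch tokens plus 1 class token.
print(num_patch_tokens(224, 224, 16))  # 197
# Identical keys give uniform weights, so the output is the mean of the value rows.
print(attention([[1.0, 0.0]], [[1.0, 0.0], [1.0, 0.0]], [[2.0, 0.0], [4.0, 2.0]]))  # [[3.0, 1.0]]
```

Production models run the same per-head arithmetic with batched tensor operations across several heads.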
Module 2
Learn modern stabilization and efficiency techniques including RMSNorm, SwiGLU activations, and Rotary Position Embeddings that power state-of-the-art transformers.
What's included
8 videos, 3 readings, 4 assignments
8 videos • Total 66 minutes
- Why Normalization Is Needed • 6 minutes
- BatchNorm vs LayerNorm vs RMSNorm • 8 minutes
- Practical Effects on Training Stability • 7 minutes
- Why ReLU Is Not Enough Anymore • 9 minutes
- GELU & SwiGLU Explained Visually • 8 minutes
- Practical Gains: Stability, Expressiveness, Convergence Speed • 11 minutes
- Why Position Encoding Matters • 8 minutes
- RoPE Explained Intuitively • 9 minutes
3 readings • Total 90 minutes
- Normalization Benchmarks in Modern Architectures • 30 minutes
- SwiGLU in Production Transformers • 30 minutes
- RoPE Explained: Sequence Extrapolation & Rotary Geometry • 30 minutes
4 assignments • Total 150 minutes
- Training Dynamics & Stabilization Techniques • 60 minutes
- RMSNorm & Normalization Strategies • 30 minutes
- SwiGLU & Modern Activation Functions • 30 minutes
- Rotary Position Embeddings (RoPE) • 30 minutes
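The three Module 2 techniques are illustrated in the plain-Python sketch below, which is illustrative only. It uses lists instead of tensors and assumes the SwiGLU gate receives inputs that have already been projected. For RoPE, it assumes the interleaved-pair layout with base 10000 from the original formulation; some libraries instead rotate the first half of the vector against the second half. All function names are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def rms_norm(x: Vector, gain: Vector, eps: float = 1e-6) -> Vector:
    """RMSNorm: divide by the root-mean-square of x, then apply a learned per-feature gain.
    Unlike LayerNorm, there is no mean subtraction and no bias."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [g * v / rms for g, v in zip(gain, x)]


def silu(z: float) -> float:
    """SiLU (Swish with beta = 1): z * sigmoid(z), written to avoid overflow in exp."""
    if z >= 0:
        return z / (1.0 + math.exp(-z))
    e = math.exp(z)
    return z * e / (1.0 + e)


def swiglu(gate: Vector, up: Vector) -> Vector:
    """SwiGLU gating: SiLU(x W) * (x V), elementwise.
    `gate` and `up` are assumed to be the already-projected vectors x W and x V.
    A full feed-forward block would follow this with a down-projection."""
    return [silu(a) * b for a, b in zip(gate, up)]


def rope(x: Vector, position: int, base: float = 10000.0) -> Vector:
    """Rotate each consecutive pair (x[2i], x[2i+1]) by angle position * base**(-2i/d)."""
    d = len(x)
    if d % 2:
        raise ValueError("RoPE needs an even feature dimension")
    out = [0.0] * d
    for i in range(d // 2):
        theta = position * base ** (-2 * i / d)
        c, s = math.cos(theta), math.sin(theta)
        x0, x1 = x[2 * i], x[2 * i + 1]
        out[2 * i] = x0 * c - x1 * s
        out[2 * i + 1] = x0 * s + x1 * c
    return out


def dot(a: Vector, b: Vector) -> float:
    return sum(p * q for p, q in zip(a, b))


q, k = [1.0, 2.0, 3.0, 4.0], [0.5, -1.0, 2.0, 0.0]
# The query-key score depends only on the relative offset (here 2), not on absolute positions.
print(math.isclose(dot(rope(q, 3), rope(k, 1)), dot(rope(q, 7), rope(k, 5))))  # True
```

The final check shows the property that motivates RoPE. Once queries and keys are rotated by their positions, their dot product depends only on the relative offset between those positions.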
Module 3
Master practical techniques for training large models on limited hardware including mixed precision, gradient accumulation, and distributed training strategies.
What's included
9 videos, 3 readings, 4 assignments
9 videos • Total 79 minutes
- Why Mixed Precision Matters • 9 minutes
- FP16 vs BF16: When to Use What • 9 minutes
- Common Mixed Precision Failures • 8 minutes
- What Gradient Accumulation Really Does • 10 minutes
- Effective Batch Size Explained Clearly • 10 minutes
- Stability Issues with Large Batches • 9 minutes
- Single GPU vs Multi-GPU: When to Scale • 9 minutes
- DDP vs FSDP (Decision-Based) • 8 minutes
- Measuring Speed & Memory Correctly • 8 minutes
3 readings • Total 90 minutes
- AMP Benchmarks & Failure Patterns • 30 minutes
- Efficient Data Pipelines for Transformers & ViTs • 30 minutes
- Distributed Training on Commodity Hardware • 30 minutes
4 assignments • Total 150 minutes
- Efficient Training on Limited GPUs • 60 minutes
- Mixed Precision Training (FP16/BF16) • 30 minutes
- Gradient Accumulation & Large-Batch Simulation • 30 minutes
- Distributed Training with Lightning (DDP/FSDP) • 30 minutes
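The idea behind "What Gradient Accumulation Really Does" and "Effective Batch Size Explained Clearly" can be checked numerically. This plain-Python sketch assumes a one-weight linear model with mean-squared-error loss and equal-size micro-batches, and the function names are hypothetical. In PyTorch Lightning, the corresponding setting is the `Trainer`'s `accumulate_grad_batches` argument.

```python
from typing import Sequence


def mse_grad(w: float, xs: Sequence[float], ys: Sequence[float]) -> float:
    """Gradient of mean((w * x - y) ** 2) with respect to the single weight w."""
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def accumulated_grad(w: float, xs: Sequence[float], ys: Sequence[float], micro_batch: int) -> float:
    """Accumulate micro-batch gradients before a single optimizer step."""
    if len(xs) % micro_batch:
        raise ValueError("this sketch assumes equal-size micro-batches")
    accum_steps = len(xs) // micro_batch
    grad = 0.0
    for start in range(0, len(xs), micro_batch):
        chunk_x, chunk_y = xs[start:start + micro_batch], ys[start:start + micro_batch]
        # Dividing by accum_steps mirrors scaling each micro-batch loss before backward(),
        # so the sum equals the gradient of the mean loss over the whole batch.
        grad += mse_grad(w, chunk_x, chunk_y) / accum_steps
    return grad


def effective_batch_size(micro_batch: int, accum_steps: int, num_devices: int = 1) -> int:
    """Samples that contribute to one optimizer step with accumulation and data parallelism."""
    return micro_batch * accum_steps * num_devices


xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [2.0 * x + 1.0 for x in xs]
full = mse_grad(0.5, xs, ys)
accum = accumulated_grad(0.5, xs, ys, micro_batch=2)
print(abs(full - accum) < 1e-9)  # True
print(effective_batch_size(micro_batch=8, accum_steps=4, num_devices=2))  # 64
```

One caveat applies. The equivalence holds for the gradient of a mean loss, but layers such as BatchNorm still compute their statistics per micro-batch. Accumulation therefore does not reproduce large-batch BatchNorm behavior.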
Module 4
Learn to track experiments professionally and apply all course concepts in a hands-on ViT vs CNN Showdown project using fine-tuning with timm and PyTorch Lightning.
What's included
12 videos, 3 readings, 4 assignments
12 videos • Total 101 minutes
- What to Track • 9 minutes
- Visualizing Loss Curves, Gradient Norms & Failure Modes • 9 minutes
- Profiling Memory & FLOPs • 10 minutes
- How Bad Comparisons Happen • 8 minutes
- Controlling Variables Properly • 8 minutes
- Forming Clear Hypotheses • 6 minutes
- Fine-Tuning ConvNeXt & ViT • 10 minutes
- Fine-Tuning ConvNeXt & ViT Part 2 • 6 minutes
- Fine-Tuning ConvNeXt & ViT Part 3 • 9 minutes
- Applying Mixed Precision & Efficiency Techniques Part 1 • 7 minutes
- Applying Mixed Precision & Efficiency Techniques Part 2 • 7 minutes
- Interpreting Results Like an Engineer • 10 minutes
3 readings • Total 90 minutes
- Experiment Reproducibility & Performance Debugging • 30 minutes
- Backbone Design Patterns: Freezing, Unfreezing, Adapters, Head Tuning • 30 minutes
- Case Study: Fine-Grained Classification with Modern Backbones • 30 minutes
4 assignments • Total 150 minutes
- Experimentation, Tracking & The ViT vs CNN Showdown Project • 60 minutes
- Experiment Tracking (TensorBoard & W&B) • 30 minutes
- Designing a CNN vs ViT Experiment • 30 minutes
- The Hands-On Project - The ViT vs CNN Showdown • 30 minutes
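Throughput profiling, as in "Profiling Memory & FLOPs", has two common pitfalls: timing warm-up iterations, and reading the clock before GPU work has finished. The sketch below is a hypothetical helper, not the course's profiling workflow, which uses the PyTorch Profiler. It assumes that `step` runs one batch, and it uses only the Python standard library.

```python
import statistics
import time
from typing import Callable


def measure_throughput(step: Callable[[], None], batch_size: int, warmup: int = 3, iters: int = 10) -> float:
    """Samples per second for `step`, ignoring warm-up iterations (hypothetical helper)."""
    # Warm-up runs absorb one-time costs (allocation, caching, autotuning) and are not timed.
    for _ in range(warmup):
        step()
    timings = []
    for _ in range(iters):
        start = time.perf_counter()
        step()
        # On a GPU, call torch.cuda.synchronize() here before reading the clock,
        # because CUDA kernels launch asynchronously.
        timings.append(time.perf_counter() - start)
    # The median is less sensitive than the mean to occasional stalls.
    return batch_size / statistics.median(timings)


def fake_step() -> None:
    # Stand-in for one training or inference step on a batch.
    sum(i * i for i in range(10_000))


print(f"{measure_throughput(fake_step, batch_size=32):.0f} samples/s")
```

Reporting the median of several timed iterations after warm-up keeps one-off costs and occasional stalls from skewing a CNN vs. ViT comparison.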
Frequently asked questions
Prerequisites: You should have working knowledge of PyTorch, CNNs, and standard training loops. Familiarity with transformers is helpful but not mandatory.
You'll work with PyTorch Lightning, the timm library, Weights & Biases, TensorBoard, and the PyTorch Profiler throughout the course.
The course prepares you for roles such as Deep Learning Engineer, Computer Vision Engineer, ML Research Engineer, and AI Infrastructure Engineer.
