Model Serving Systems: Containers, APIs & Scalability

This course is part of the Machine Learning Operations (MLOps) Specialization

Instructor: Board Infinity

4 modules: Gain insight into a topic and learn the fundamentals.
Intermediate level
2 weeks to complete at 10 hours a week
Flexible schedule: learn at your own pace

What you'll learn

Build optimized Docker images and multi-container ML apps using Docker Compose and multi-stage builds
Design scalable REST APIs with FastAPI, Pydantic validation, versioning, and error handling
Scale ML serving with async queues, load balancing, and latency profiling for production workloads

Details to know

Shareable certificate

Add to your LinkedIn profile

Build your subject-matter expertise

This course is part of the Machine Learning Operations (MLOps) Specialization

When you enroll in this course, you'll also be enrolled in this Specialization.

Learn new concepts from industry experts
Gain a foundational understanding of a subject or tool
Develop job-relevant skills with hands-on projects
Earn a shareable career certificate

There are 4 modules in this course

Docker and Model Serving: Deploy ML APIs with FastAPI and ONNX is designed for ML engineers, MLOps practitioners, and backend developers who want to take models from notebooks to production. You'll learn to build Docker containers for ML workloads, design scalable REST APIs with FastAPI, serialize models with ONNX and SavedModel, and deploy with zero-downtime strategies like blue-green and canary releases.

The first module covers Docker fundamentals, image optimization, multi-stage builds, secrets management, and Docker Compose for multi-container ML apps. The second module focuses on REST API design with FastAPI, model versioning, input validation with Pydantic, structured logging, and production-grade error handling. The third module teaches scaling strategies: horizontal scaling, async queues, load balancing, batch vs. real-time inference, and latency optimization for high-throughput serving. The final module covers model serialization formats (ONNX, pickle, SavedModel), blue-green and canary deployments, automated rollback, and disaster recovery.

By the end of this course, you will:

- Build and optimize Docker images for ML models using multi-stage builds and Compose
- Design scalable FastAPI endpoints with versioning, validation, and observability
- Scale ML inference with async queues, load balancing, and latency optimization
- Deploy models with ONNX serialization and zero-downtime blue-green rollbacks

This module introduces containerization fundamentals and shows learners how to build efficient Docker images for ML workloads, ensuring portability and reproducibility across environments.

What's included

12 videos • 4 readings • 5 assignments

12 videos•Total 105 minutes

Role of Containers in MLOps Careers•9 minutes
Industry Trends in ML Containerization•10 minutes
Key Tools and Platforms•8 minutes
Understanding Containers vs. VMs•11 minutes
Building a Docker Image for ML Models•9 minutes
Running Containers Locally•8 minutes
Multi-Stage Builds•6 minutes
Managing Environment Variables•9 minutes
Secrets and Credentials in Containers•6 minutes
Introduction to Docker Compose•10 minutes
Running ML APIs and Databases Together•9 minutes
Networking Between Containers•10 minutes

4 readings•Total 60 minutes

Career Scope in ML Containerization•15 minutes
Understanding Containers vs. VMs•15 minutes
Optimizing Docker•15 minutes
Environment Configuration•15 minutes

5 assignments•Total 180 minutes

Docker for ML•60 minutes
Career Scope in ML Containerization•30 minutes
Container Fundamentals•30 minutes
Optimizing Docker Images•30 minutes
Multi-Container Deployments•30 minutes

Learners develop and refine REST APIs for ML model inference, focusing on reliability, scalability, and real-world best practices.

What's included

9 videos • 3 readings • 4 assignments

9 videos•Total 81 minutes

Principles of RESTful API Design•9 minutes
Structuring Endpoints for ML Models•8 minutes
Using FastAPI for ML Endpoints•12 minutes
Why Version Models•7 minutes
Implementing Versioned Endpoints•8 minutes
Handling Multiple Models in Production•10 minutes
Input Schema Validation•10 minutes
Managing Errors and Exceptions•8 minutes
Logging and Observability•8 minutes

3 readings•Total 45 minutes

Compose Syntax•15 minutes
Multi-Container Deployment Guide•15 minutes
Structuring Endpoints for ML Models•15 minutes

4 assignments•Total 150 minutes

API Design for ML Serving•60 minutes
REST API Architecture for ML•30 minutes
Model Versioning and Routing•30 minutes
Handling Input Validation•30 minutes

This module emphasizes scalability, concurrency, and optimization for production-grade model serving systems.

What's included

9 videos • 3 readings • 4 assignments

9 videos•Total 79 minutes

Vertical vs. Horizontal Scaling•11 minutes
Async Processing and Queues•8 minutes
Load Balancing Basics•9 minutes
When to Use Batch Serving•11 minutes
Building Batch Pipelines•6 minutes
Real-Time Inference with Queues•9 minutes
Profiling Inference Performance•10 minutes
Latency Reduction Techniques•8 minutes
Monitoring Throughput and Cost•7 minutes

3 readings•Total 45 minutes

Why Version Models•15 minutes
Model Registry Integration•15 minutes
API Error Codes•15 minutes

4 assignments•Total 150 minutes

Scaling Model Serving•60 minutes
Scaling Strategies•30 minutes
Batch vs. Real-Time Serving•30 minutes
Performance Optimization•30 minutes
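
As a rough illustration of the latency profiling topic in this module, the sketch below times repeated calls to an inference function and reports percentile latencies. It is not taken from the course materials: `fake_predict`, `profile_latency`, and the run count are hypothetical names and values chosen for the example, and only the Python standard library is used.

```python
import statistics
import time


def fake_predict(features):
    """Hypothetical stand-in for a real model's inference call."""
    return sum(v * v for v in features)


def profile_latency(predict, payload, runs=200):
    """Call predict(payload) `runs` times and return p50/p95/p99 in milliseconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        predict(payload)
        samples.append((time.perf_counter() - start) * 1000.0)
    # quantiles(n=100) returns 99 cut points: index 49 is p50, 94 is p95, 98 is p99.
    # method="inclusive" keeps every result within the observed min/max.
    cuts = statistics.quantiles(samples, n=100, method="inclusive")
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}


if __name__ == "__main__":
    print(profile_latency(fake_predict, list(range(1000))))
```

Tail percentiles (p95, p99) are usually more informative than the mean for serving workloads, because a small share of slow requests can dominate user-perceived latency.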

The final module demonstrates how to save, deploy, and safely roll back production models while maintaining uptime and integrity.

What's included

9 videos • 3 readings • 4 assignments

9 videos•Total 79 minutes

Common Serialization Techniques•10 minutes
Converting Between Formats•8 minutes
Storing and Loading Models•7 minutes
Zero-Downtime Deployments•8 minutes
Blue-Green and Canary Patterns•9 minutes
Staging and Validation•8 minutes
Detecting Failed Deployments•8 minutes
Automated Rollback Workflows•9 minutes
Validating Restored Versions•12 minutes

3 readings•Total 45 minutes

Debugging Production API Issues•15 minutes
Load Balancing Basics•15 minutes
Building Batch Pipelines•15 minutes

4 assignments•Total 150 minutes

Model Serialization and Deployment•60 minutes
Model Serialization Formats•30 minutes
Deployment Strategies•30 minutes
Rollback and Recovery•30 minutes
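
The canary and automated rollback topics listed above can be sketched in a few lines. The example below is an assumption-laden toy, not the course's implementation: `CanaryRouter`, its thresholds, and the version labels are all hypothetical, and in practice traffic splitting is typically handled by a load balancer or deployment platform rather than application code.

```python
import random


class CanaryRouter:
    """Send a fraction of requests to a canary version; roll back on high error rate."""

    def __init__(self, canary_weight=0.1, max_error_rate=0.05, min_requests=20, rng=None):
        self.canary_weight = canary_weight    # share of traffic sent to the canary
        self.max_error_rate = max_error_rate  # rollback threshold for canary errors
        self.min_requests = min_requests      # canary requests needed before judging
        self.rng = rng or random.Random()
        self.canary_total = 0
        self.canary_errors = 0
        self.rolled_back = False

    def choose(self):
        """Pick the version that should serve the next request."""
        if self.rolled_back:
            return "stable"
        return "canary" if self.rng.random() < self.canary_weight else "stable"

    def record(self, version, ok):
        """Record a request outcome and trigger rollback if the canary is unhealthy."""
        if version != "canary":
            return
        self.canary_total += 1
        if not ok:
            self.canary_errors += 1
        error_rate = self.canary_errors / self.canary_total
        if self.canary_total >= self.min_requests and error_rate > self.max_error_rate:
            self.rolled_back = True


if __name__ == "__main__":
    router = CanaryRouter(canary_weight=0.2, rng=random.Random(0))
    for _ in range(500):
        version = router.choose()
        # Simulate a canary that always fails, so the router should roll back.
        router.record(version, ok=(version == "stable"))
    print("rolled back:", router.rolled_back)
```

Waiting for `min_requests` before evaluating the error rate avoids rolling back on one or two early failures; the right threshold depends on traffic volume and tolerance for risk.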

Earn a career certificate

Add this credential to your LinkedIn profile, resume, or CV. Share it on social media and in your performance review.

Instructor

Board Infinity

261 Courses•428,749 learners

Offered by

Board Infinity

Frequently asked questions

No prior Docker experience is required. Module 1 starts with container fundamentals and guides you through building ML-optimized images from scratch.

The course covers Docker, Docker Compose, FastAPI, Pydantic, ONNX, pickle, TensorFlow SavedModel, load balancers, and message queues for real-time inference.

Yes, basic Python is expected since you'll be writing FastAPI endpoints and model serialization scripts. ML training experience is helpful but not required.

To access the course materials and assignments and to earn a Certificate, you will need to purchase the Certificate experience when you enroll in a course. You can try a Free Trial instead, or apply for Financial Aid. The course may instead offer 'Full Course, No Certificate'. This option lets you see all course materials, submit required assessments, and get a final grade, but you will not be able to purchase a Certificate experience.

When you enroll in the course, you get access to all of the courses in the Specialization, and you earn a certificate when you complete the work. Your electronic Certificate will be added to your Accomplishments page - from there, you can print your Certificate or add it to your LinkedIn profile.

Yes. In select learning programs, you can apply for financial aid or a scholarship if you can’t afford the enrollment fee. If financial aid or a scholarship is available for your learning program selection, you’ll find a link to apply on the description page.

URL: https://www.coursera.org/learn/model-serving-systems-containers-apis-and-scalability