Model Serving Systems: Containers, APIs & Scalability
This course is part of Machine Learning Operations (MLOps) Specialization
Instructor: Board Infinity
What you'll learn
Build optimized Docker images and multi-container ML apps using Docker Compose and multi-stage builds
Design scalable REST APIs with FastAPI, Pydantic validation, versioning, and error handling
Scale ML serving with async queues, load balancing, and latency profiling for production workloads
Details to know
May 2026
17 assignments
There are 4 modules in this course
"Docker and Model Serving: Deploy ML APIs with FastAPI and ONNX is designed for ML engineers, MLOps practitioners, and backend developers who want to take models from notebooks to production. You'll learn to build Docker containers for ML workloads, design scalable REST APIs with FastAPI, serialize models with ONNX and SavedModel, and deploy with zero-downtime strategies like blue-green and canary releases.
The first module covers Docker fundamentals, image optimization, multi-stage builds, secrets management, and Docker Compose for multi-container ML apps. The second module focuses on REST API design with FastAPI, model versioning, input validation with Pydantic, structured logging, and production-grade error handling. The third module teaches scaling strategies: horizontal scaling, async queues, load balancing, batch vs. real-time inference, and latency optimization for high-throughput serving. The final module covers model serialization formats (ONNX, pickle, SavedModel), blue-green and canary deployments, automated rollback, and disaster recovery.
By the end of this course, you will:
- Build and optimize Docker images for ML models using multi-stage builds and Compose
- Design scalable FastAPI endpoints with versioning, validation, and observability
- Scale ML inference with async queues, load balancing, and latency optimization
- Deploy models with ONNX serialization and zero-downtime blue-green rollbacks"
This module introduces containerization fundamentals and shows learners how to build efficient Docker images for ML workloads, ensuring portability and reproducibility across environments.
What's included
12 videos, 4 readings, 5 assignments
12 videos • Total 105 minutes
- Role of Containers in MLOps Careers • 9 minutes
- Industry Trends in ML Containerization • 10 minutes
- Key Tools and Platforms • 8 minutes
- Understanding Containers vs. VMs • 11 minutes
- Building a Docker Image for ML Models • 9 minutes
- Running Containers Locally • 8 minutes
- Multi-Stage Builds • 6 minutes
- Managing Environment Variables • 9 minutes
- Secrets and Credentials in Containers • 6 minutes
- Introduction to Docker Compose • 10 minutes
- Running ML APIs and Databases Together • 9 minutes
- Networking Between Containers • 10 minutes
4 readings • Total 60 minutes
- Career Scope in ML Containerization • 15 minutes
- Understanding Containers vs. VMs • 15 minutes
- Optimizing Docker • 15 minutes
- Environment Configuration • 15 minutes
5 assignments • Total 180 minutes
- Docker for ML • 60 minutes
- Career Scope in ML Containerization • 30 minutes
- Container Fundamentals • 30 minutes
- Optimizing Docker Images • 30 minutes
- Multi-Container Deployments • 30 minutes
Learners develop and refine REST APIs for ML model inference, focusing on reliability, scalability, and real-world best practices.
What's included
9 videos, 3 readings, 4 assignments
9 videos • Total 81 minutes
- Principles of RESTful API Design • 9 minutes
- Structuring Endpoints for ML Models • 8 minutes
- Using FastAPI for ML Endpoints • 12 minutes
- Why Version Models • 7 minutes
- Implementing Versioned Endpoints • 8 minutes
- Handling Multiple Models in Production • 10 minutes
- Input Schema Validation • 10 minutes
- Managing Errors and Exceptions • 8 minutes
- Logging and Observability • 8 minutes
3 readings • Total 45 minutes
- Compose Syntax • 15 minutes
- Multi-Container Deployment Guide • 15 minutes
- Structuring Endpoints for ML Models • 15 minutes
4 assignments • Total 150 minutes
- API Design for ML Serving • 60 minutes
- REST API Architecture for ML • 30 minutes
- Model Versioning and Routing • 30 minutes
- Handling Input Validation • 30 minutes
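To give a flavor of what this module's topics (versioned endpoints, input schema validation, error handling) involve, here is a minimal sketch. It is not taken from the course materials. It uses only the Python standard library, so hand-written checks stand in for what FastAPI routing and Pydantic models would normally do. The names `PredictRequest`, `MODELS`, and `handle`, the three-feature input, and the status codes chosen are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


class ValidationError(ValueError):
    """Raised when a request payload does not match the expected schema."""


@dataclass(frozen=True)
class PredictRequest:
    features: List[float]

    @classmethod
    def parse(cls, payload: dict, n_features: int) -> "PredictRequest":
        # Reject missing, mistyped, or wrongly sized inputs before inference.
        values = payload.get("features")
        if not isinstance(values, list) or len(values) != n_features:
            raise ValidationError(f"'features' must be a list of {n_features} numbers")
        if not all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values):
            raise ValidationError("'features' must contain only numbers")
        return cls([float(v) for v in values])


# Hypothetical stand-ins for two deployed model versions.
MODELS: Dict[str, Callable[[List[float]], float]] = {
    "v1": lambda x: sum(x),
    "v2": lambda x: sum(x) / len(x),
}


def handle(version: str, payload: dict) -> Tuple[int, dict]:
    """Return (status_code, body) as a versioned predict route might."""
    model = MODELS.get(version)
    if model is None:
        return 404, {"error": f"unknown model version '{version}'"}
    try:
        request = PredictRequest.parse(payload, n_features=3)
    except ValidationError as exc:
        return 422, {"error": str(exc)}
    return 200, {"version": version, "prediction": model(request.features)}
```

Calling `handle("v1", {"features": [1, 2, 3]})` routes to the first stand-in model, while an unknown version or a malformed payload yields an error body instead of reaching the model.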
This module emphasizes scalability, concurrency, and optimization for production-grade model serving systems.
What's included
9 videos, 3 readings, 4 assignments
9 videos • Total 79 minutes
- Vertical vs. Horizontal Scaling • 11 minutes
- Async Processing and Queues • 8 minutes
- Load Balancing Basics • 9 minutes
- When to Use Batch Serving • 11 minutes
- Building Batch Pipelines • 6 minutes
- Real-Time Inference with Queues • 9 minutes
- Profiling Inference Performance • 10 minutes
- Latency Reduction Techniques • 8 minutes
- Monitoring Throughput and Cost • 7 minutes
3 readings • Total 45 minutes
- Why Version Models • 15 minutes
- Model Registry Integration • 15 minutes
- API Error Codes • 15 minutes
4 assignments • Total 150 minutes
- Scaling Model Serving • 60 minutes
- Scaling Strategies • 30 minutes
- Batch vs. Real-Time Serving • 30 minutes
- Performance Optimization • 30 minutes
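One common way to combine async queues with batch inference is micro-batching: requests are queued, and a worker groups whatever arrives within a short window into one model call. The sketch below shows the idea with `asyncio` only. It is an assumption about how such a system could look, not the course's implementation. `predict_batch` is a hypothetical stand-in model, and `max_batch` and `max_wait_s` are illustrative tuning knobs.

```python
import asyncio
from typing import List


def predict_batch(batch: List[float]) -> List[float]:
    # Hypothetical model: doubles each input. A real model would run here.
    return [2.0 * x for x in batch]


async def batch_worker(queue: asyncio.Queue, max_batch: int, max_wait_s: float) -> None:
    """Drain the queue into batches and resolve each caller's future."""
    loop = asyncio.get_running_loop()
    while True:
        items = [await queue.get()]  # block until one request arrives
        deadline = loop.time() + max_wait_s
        while len(items) < max_batch:
            timeout = deadline - loop.time()
            if timeout <= 0:
                break
            try:
                items.append(await asyncio.wait_for(queue.get(), timeout))
            except asyncio.TimeoutError:
                break
        outputs = predict_batch([x for x, _ in items])
        for (_, future), y in zip(items, outputs):
            future.set_result(y)


async def infer(queue: asyncio.Queue, x: float) -> float:
    """Client-side call: enqueue one input and await its result."""
    future = asyncio.get_running_loop().create_future()
    await queue.put((x, future))
    return await future


async def main() -> List[float]:
    queue: asyncio.Queue = asyncio.Queue()
    worker = asyncio.create_task(batch_worker(queue, max_batch=4, max_wait_s=0.01))
    results = await asyncio.gather(*(infer(queue, float(i)) for i in range(6)))
    worker.cancel()
    return results
```

Running `asyncio.run(main())` sends six concurrent requests through the worker. Larger `max_batch` values tend to raise throughput, while `max_wait_s` bounds the latency added to any single request.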
The final module demonstrates how to save, deploy, and safely roll back production models while maintaining uptime and integrity.
What's included
9 videos, 3 readings, 4 assignments
9 videos • Total 79 minutes
- Common Serialization Techniques • 10 minutes
- Converting Between Formats • 8 minutes
- Storing and Loading Models • 7 minutes
- Zero-Downtime Deployments • 8 minutes
- Blue-Green and Canary Patterns • 9 minutes
- Staging and Validation • 8 minutes
- Detecting Failed Deployments • 8 minutes
- Automated Rollback Workflows • 9 minutes
- Validating Restored Versions • 12 minutes
3 readings • Total 45 minutes
- Debugging Production API Issues • 15 minutes
- Load Balancing Basics • 15 minutes
- Building Batch Pipelines • 15 minutes
4 assignments • Total 150 minutes
- Model Serialization and Deployment • 60 minutes
- Model Serialization Formats • 30 minutes
- Deployment Strategies • 30 minutes
- Rollback and Recovery • 30 minutes
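As a rough illustration of the canary and automated-rollback topics listed for this module, the sketch below sends a configurable fraction of traffic to a new version and returns all traffic to the stable version once the canary's error rate crosses a threshold. It is a simplified assumption of how such logic might look. The `CanaryRouter` class, its default weight, and its thresholds are hypothetical and do not come from the course.

```python
import random
from dataclasses import dataclass, field


@dataclass
class CanaryRouter:
    canary_weight: float = 0.1   # fraction of traffic sent to the new version
    max_error_rate: float = 0.2  # rollback threshold (assumed value)
    min_requests: int = 20       # do not judge the canary on too few samples
    canary_requests: int = 0
    canary_errors: int = 0
    rng: random.Random = field(default_factory=random.Random)

    def choose(self) -> str:
        """Pick which version serves the next request."""
        return "canary" if self.rng.random() < self.canary_weight else "stable"

    def record(self, version: str, ok: bool) -> None:
        """Track canary outcomes and roll back when the error rate is too high."""
        if version != "canary":
            return
        self.canary_requests += 1
        self.canary_errors += 0 if ok else 1
        if (self.canary_requests >= self.min_requests
                and self.canary_errors / self.canary_requests > self.max_error_rate):
            self.canary_weight = 0.0  # rollback: all traffic returns to stable
```

A serving loop would call `choose()` per request and `record()` with the outcome. A blue-green switch can be seen as the special case where the weight moves from 0.0 to 1.0 in a single step.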
Earn a career certificate
Add this credential to your LinkedIn profile, resume, or CV. Share it on social media and in your performance review.
Frequently asked questions
No prior Docker experience is required. Module 1 starts with container fundamentals and guides you through building ML-optimized images from scratch.
The course covers Docker, Docker Compose, FastAPI, Pydantic, ONNX, pickle, TensorFlow SavedModel, load balancers, and message queues for real-time inference.
Yes, basic Python is expected since you'll be writing FastAPI endpoints and model serialization scripts. ML training experience is helpful but not required.
