MLOps and LLMOps: Deploying and Scaling AI in Production
This course is part of Managing AI Systems: Development, Deployment, and Governance Specialization
Instructor: Board Infinity
Recommended experience
What you'll learn
Configure CI/CD pipelines for ML and LLM systems using GitHub Actions and MLflow
Optimize LLM inference pipelines for reduced latency, token cost, and improved reliability
Build automated evaluation frameworks using LLM-as-a-Judge and quality gates
Instrument production AI systems with tracing, drift detection, and observability dashboards
Skills you'll gain
- Application Deployment
- MLOps (Machine Learning Operations)
- Responsible AI
- Scalability
- AI Security
- Site Reliability Engineering
- LLM Application
- Data Ethics
- CI/CD
- System Monitoring
- Cloud Deployment
- Containerization
- Continuous Deployment
- Model Evaluation
- Release Management
- Model Training
- Retrieval-Augmented Generation
Details to know
April 2026
16 assignments
There are 4 modules in this course
This intermediate course equips ML engineers, data scientists, and software engineers with the practical skills needed to design, deploy, and scale production AI systems. You’ll learn how to architect reliable ML and LLM applications, including model serving patterns, feature stores, and retrieval-augmented generation (RAG) components. The course walks through reproducible training and experimentation pipelines with tools like MLflow and Weights & Biases, from experiment tracking and model registration to production deployment.
You will configure CI/CD workflows tailored to ML and LLM systems, covering data, model, and prompt versioning; automated testing; and safe rollback strategies. The course emphasizes security, privacy, and compliance best practices, including access control, secrets management, and safe handling of user and training data. You’ll design scalable serving infrastructure using containers, Kubernetes, and autoscaling, and apply deployment patterns such as canary, blue-green, shadow, and A/B testing to introduce changes safely. Finally, you’ll build automated evaluation and observability for production AI. This includes wiring automated evaluation pipelines (e.g., LLM-as-a-judge) into CI/CD gates; defining and tracking key quality and performance metrics such as hallucination rate, latency, throughput, and cost per request; and implementing robust logging, metrics, distributed tracing, and telemetry. You will also detect and monitor data and model drift, bias, and degradation over time using tools such as Arize Phoenix, design alerting strategies, and collaborate with product and reliability teams to establish incident response, runbooks, and continuous improvement processes for AI systems at scale.
Disclaimer: This is an independent educational resource created by Board Infinity for informational and educational purposes only. This course is not affiliated with, endorsed by, sponsored by, or officially associated with any company, organization, or certification body unless explicitly stated. The content provided is based on industry knowledge and best practices but does not constitute official training material for any specific employer or certification program. All company names, trademarks, service marks, and logos referenced are the property of their respective owners and are used solely for educational identification and comparison purposes.
Start by grounding learners in practical, production-ready system design for ML and LLM applications. This module connects architectural patterns—serving topologies, feature stores, and retrieval-augmented generation (RAG)—to reproducible experimentation and compliant design decisions. Expect short instructor videos, readings that map design trade-offs, and hands-on exercises using experiment-tracking tools to make architectures actionable.
What's included
9 videos, 3 readings, 4 assignments, 1 plugin
9 videos•Total 92 minutes
- ML/LLM CI/CD Architecture: How It's Different from DevOps•9 minutes
- Automating Build → Test → Deploy for ML Pipelines•10 minutes
- Integrating Model & Data Validation into CI/CD•9 minutes
- Semantic Versioning for Models, Prompts, & Datasets•12 minutes
- Model Registries: MLflow, W&B, and Custom Systems•7 minutes
- Rollbacks & Lineage Tracking for Experiment Safety•8 minutes
- Why ML Environments Drift•18 minutes
- Reproducibility with Docker, Conda, Lockfiles, and Hashes•11 minutes
- Promoting Environments Across Dev → Staging → Production•9 minutes
3 readings•Total 90 minutes
- “CI/CD + CT/CD: Patterns & Anti-patterns in ML Deployment Pipelines”•30 minutes
- “Model Registry Design: Governance, Lineage, and Auditability”•30 minutes
- “Environment Parity Checklist for ML Systems”•30 minutes
4 assignments•Total 105 minutes
- Graded Quiz: Operationalizing AI Pipelines (CI/CD, CT/CD, Versioning)•60 minutes
- Practice Quiz: Foundations of CI/CD for ML & LLM Systems•15 minutes
- Practice Quiz: Model Versioning & Release Management•15 minutes
- Practice Quiz: Environment & Dependency Management•15 minutes
1 plugin•Total 5 minutes
- Quick Course Check-In•5 minutes
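The module's focus on lockfiles, hashes, and promoting environments from dev to staging to production can be illustrated with a small parity check. This Python sketch assumes each environment exports a pip-style lockfile of `name==version` pins; the function names and the pinned versions are hypothetical. It hashes the normalized pins so that environments can be compared by a single digest.

```python
import hashlib


def environment_fingerprint(lockfile_text: str) -> str:
    """Return a SHA-256 digest of the normalized dependency pins in a lockfile."""
    pins = sorted(
        line.strip().lower()
        for line in lockfile_text.splitlines()
        # Ignore blank lines and comments so cosmetic edits don't change the hash.
        if line.strip() and not line.strip().startswith("#")
    )
    return hashlib.sha256("\n".join(pins).encode("utf-8")).hexdigest()


def environments_match(*lockfiles: str) -> bool:
    """True when every lockfile resolves to the same fingerprint."""
    return len({environment_fingerprint(text) for text in lockfiles}) == 1


# Hypothetical lockfiles: staging differs only in ordering and a comment,
# while production pins an older MLflow release.
dev = "numpy==1.26.4\nmlflow==2.12.1\n"
staging = "# staging lockfile\nmlflow==2.12.1\nnumpy==1.26.4\n"
prod = "numpy==1.26.4\nmlflow==2.11.0\n"

print(environments_match(dev, staging))  # True
print(environments_match(dev, prod))     # False
```

Sorting the pins makes the fingerprint independent of line order, so only a real dependency change blocks promotion.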
Move from design to continuous delivery: this module teaches how to build CI/CD pipelines tailored to ML and LLM systems and how to gate changes with automated evaluation. Learners will set up data, model, and prompt versioning, define meaningful metrics (accuracy, hallucination rate, latency, cost), and implement evaluation pipelines—including LLM-as-a-judge methods—that plug into CI/CD gates. Activities include guided configuration examples, scenario-driven readings, and automated practice quizzes.
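A quality gate of the kind described above can be reduced to comparing an evaluation report against release thresholds and blocking the release when any threshold is violated. The following is a minimal Python sketch; the metric names, threshold values, and the `gate_failures`/`main` helpers are hypothetical, and in a real pipeline the metrics would come from the evaluation step's output.

```python
# Hypothetical release thresholds; real values depend on the product and its SLOs.
THRESHOLDS: dict[str, tuple[str, float]] = {
    "accuracy": ("min", 0.85),
    "hallucination_rate": ("max", 0.05),
    "p95_latency_ms": ("max", 1200.0),
    "cost_per_request_usd": ("max", 0.002),
}


def gate_failures(metrics: dict[str, float]) -> list[str]:
    """Return one message per missing metric or violated threshold."""
    failures = []
    for name, (kind, limit) in THRESHOLDS.items():
        value = metrics.get(name)
        if value is None:
            failures.append(f"{name}: missing from evaluation report")
        elif kind == "min" and value < limit:
            failures.append(f"{name}: {value} is below the minimum {limit}")
        elif kind == "max" and value > limit:
            failures.append(f"{name}: {value} exceeds the maximum {limit}")
    return failures


def main(metrics: dict[str, float]) -> int:
    """Print failures and return a CI-style exit code (0 = pass, 1 = block release)."""
    failures = gate_failures(metrics)
    for message in failures:
        print(f"GATE FAILED: {message}")
    return 1 if failures else 0


# Hypothetical candidate metrics, e.g. parsed from an evaluation job's JSON output.
candidate = {"accuracy": 0.88, "hallucination_rate": 0.07,
             "p95_latency_ms": 950.0, "cost_per_request_usd": 0.0015}
exit_code = main(candidate)  # 1: hallucination_rate exceeds 0.05
```

Treating a missing metric as a failure keeps a broken evaluation step from silently passing the gate. Returning a non-zero exit code (for example via `raise SystemExit(exit_code)`) is what causes most CI runners, including a GitHub Actions step, to mark the job as failed.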
What's included
9 videos, 3 readings, 4 assignments
9 videos•Total 78 minutes
- Designing Efficient Context Windows•9 minutes
- Structured Prompts for Reliability & Determinism•6 minutes
- Techniques to Reduce Hallucination via Prompt Engineering•8 minutes
- Understanding Latency Budgets & Token Cost Drivers•13 minutes
- Batching, Caching, Streaming, Compression•12 minutes
- Model Choices: API vs Local Models•11 minutes
- Logging Prompt Variants with W&B/MLflow•3 minutes
- Tracking Prompt-Response Deltas•9 minutes
- Scientific Evaluation of Prompt Variants•8 minutes
3 readings•Total 90 minutes
- “Prompt Architecture Patterns for Production LLM Systems”•30 minutes
- “Token Economics: Understanding Cost Structures of LLM Pipelines”•30 minutes
- “Prompt Versioning Framework Example Repository”•30 minutes
4 assignments•Total 105 minutes
- “Help me reduce the latency and cost of my LLM pipeline.”•60 minutes
- Practice Quiz: Managing Context Windows & Prompt Structure•15 minutes
- Practice Quiz: Inference Optimization: Latency & Token Cost•15 minutes
- Practice Quiz: Prompt Versioning & Experiment Tracking•15 minutes
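The latency-and-cost topics in this module rest on simple token arithmetic: cost per request is input tokens times the input price plus output tokens times the output price. For example, 2,000 input tokens at $3 per million plus 500 output tokens at $15 per million costs $0.006 + $0.0075 = $0.0135. The Python sketch below uses those made-up prices (check your provider's actual price sheet) and assumes, for illustration, that cached responses cost nothing, to show how a cache hit rate reduces spend.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Hypothetical per-million-token prices in USD."""
    input_per_million: float
    output_per_million: float


def request_cost(input_tokens: int, output_tokens: int, pricing: Pricing) -> float:
    """Cost of one request: tokens in each direction times that direction's price."""
    return (input_tokens * pricing.input_per_million
            + output_tokens * pricing.output_per_million) / 1_000_000


def monthly_cost(requests_per_day: int, input_tokens: int, output_tokens: int,
                 pricing: Pricing, cache_hit_rate: float = 0.0, days: int = 30) -> float:
    """Estimate spend, assuming requests served from a response cache cost nothing."""
    billable_requests = requests_per_day * days * (1.0 - cache_hit_rate)
    return billable_requests * request_cost(input_tokens, output_tokens, pricing)


price = Pricing(input_per_million=3.0, output_per_million=15.0)
print(request_cost(2_000, 500, price))  # 0.0135
print(monthly_cost(10_000, 2_000, 500, price))
print(monthly_cost(10_000, 2_000, 500, price, cache_hit_rate=0.3))
```

Because output tokens are priced higher in this example, trimming verbose responses can matter as much as shortening the context window.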
This module focuses on the operational mechanics of serving models and LLMs at scale. You will design and implement containerized serving architectures using orchestration (e.g., Kubernetes), autoscaling, and cost-aware inference pipelines; practice deployment patterns such as canary, blue-green, shadow, and A/B testing; and learn prompt and context-window optimization techniques to balance latency, quality, and cost. Practical labs and demonstrations show real-world manifests, autoscaling configs, and inference pipeline tuning.
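Canary and A/B rollouts both depend on splitting traffic by a fixed percentage. In Kubernetes this split is usually configured in an ingress controller or service mesh rather than in application code; the Python sketch below only illustrates the idea. It hashes a hypothetical request or user ID into one of 10,000 buckets so that the same caller is always routed to the same version.

```python
import hashlib
from collections import Counter


def route(request_id: str, canary_percent: float) -> str:
    """Deterministically send roughly canary_percent of callers to the canary."""
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000  # bucket in 0..9999
    # 1% of traffic corresponds to 100 buckets.
    return "canary" if bucket < canary_percent * 100 else "stable"


# Simulate 20,000 hypothetical users with a 5% canary.
counts = Counter(route(f"user-{i}", canary_percent=5.0) for i in range(20_000))
print(counts)
```

Hashing instead of random sampling keeps each user's experience consistent across requests, which makes canary metrics easier to compare against the stable version.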
What's included
9 videos, 3 readings, 4 assignments
9 videos•Total 66 minutes
- Constructing Realistic Evaluation Data•9 minutes
- Sampling Edge Cases & Failure Modes•8 minutes
- Avoiding Bias in Test Data•5 minutes
- Designing Evaluator Prompts•8 minutes
- Scoring for Consistency, Relevance, Correctness•8 minutes
- Limits of Automated Scoring•6 minutes
- Evaluation Triggers During Deployment•7 minutes
- Quality Gates & Release Thresholds•7 minutes
- Reading Evaluation Dashboards for Release Readiness•7 minutes
3 readings•Total 90 minutes
- “LLM Evaluation Dataset Blueprint”•30 minutes
- “Automated Scoring Frameworks for LLM Evaluation”•30 minutes
- “Evaluation Automation Templates Using MLflow/W&B”•30 minutes
4 assignments•Total 75 minutes
- Covers: dataset design, automated evaluation, CI/CD integration.•30 minutes
- Designing LLM Evaluation Datasets•15 minutes
- LLM-as-a-Judge Methodologies•15 minutes
- Practice Quiz: Integrating Evaluation into CI/CD Pipelines•15 minutes
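The evaluator-prompt and scoring topics above can be sketched as a rubric prompt plus strict parsing of the judge's reply. In this Python example the rubric, the 1-to-5 scale, and the JSON reply format are assumptions, and `fake_judge` is a hypothetical stand-in for a real LLM call. Validating the verdict before using it guards against one limit of automated scoring: judges that return malformed or out-of-range output.

```python
import json
from statistics import mean

# Hypothetical rubric mirroring the consistency/relevance/correctness criteria.
RUBRIC = ("consistency", "relevance", "correctness")


def build_judge_prompt(question: str, answer: str, reference: str) -> str:
    """Assemble an evaluator prompt that asks for JSON scores only."""
    criteria = ", ".join(RUBRIC)
    return (
        "You are grading an assistant's answer.\n"
        f"Score each criterion ({criteria}) from 1 to 5 and reply with JSON only, "
        'e.g. {"consistency": 4, "relevance": 5, "correctness": 3}.\n\n'
        f"Question: {question}\nReference answer: {reference}\nCandidate answer: {answer}\n"
    )


def parse_verdict(raw: str) -> dict[str, int]:
    """Validate the judge's JSON; reject missing, non-integer, or out-of-range scores."""
    data = json.loads(raw)
    scores = {}
    for key in RUBRIC:
        value = data.get(key)
        if isinstance(value, bool) or not isinstance(value, int) or not 1 <= value <= 5:
            raise ValueError(f"invalid score for {key!r}: {value!r}")
        scores[key] = value
    return scores


def fake_judge(prompt: str) -> str:
    # Hypothetical stand-in for a call to a judge model.
    return '{"consistency": 4, "relevance": 5, "correctness": 3}'


prompt = build_judge_prompt("What is 2 + 2?", "4", "4")
scores = parse_verdict(fake_judge(prompt))
print(scores, mean(scores.values()))
```

A `ValueError` from `parse_verdict` can be counted as an evaluation failure, rather than silently dropping the sample from the aggregate score.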
Close the loop by instrumenting systems for deep observability and long-term reliability. Learners will add logging, metrics, distributed tracing, and telemetry; use monitoring platforms (e.g., Arize Phoenix) to detect data/model drift, bias, and degradation; and design alerting and runbooks while coordinating incident response with product and reliability teams. The module culminates in a hands-on capstone programming project that integrates architecture, CI/CD, serving, evaluation, and monitoring into a production-ready AI solution.
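One common way to quantify the data drift mentioned above is the Population Stability Index (PSI), which compares the binned proportions of a feature in a reference window (r_i) against production (p_i): PSI = Σ (p_i - r_i) · ln(p_i / r_i). The Python sketch below is a swapped-in illustration, not the method any particular monitoring tool uses; the bin count, the epsilon for empty bins, and the synthetic Gaussian data are assumptions. A commonly cited rule of thumb treats PSI below 0.1 as stable and above 0.25 as significant drift, though thresholds should be tuned per feature.

```python
import math
import random


def psi(reference: list[float], production: list[float], bins: int = 10) -> float:
    """Population Stability Index over equal-width bins of the reference range."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0  # guard against a constant reference feature

    def proportions(values: list[float]) -> list[float]:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            # Clamp values outside the reference range into the edge bins.
            counts[min(max(idx, 0), bins - 1)] += 1
        # A small epsilon avoids log(0) and division by zero for empty bins.
        return [max(c / len(values), 1e-6) for c in counts]

    ref_p, prod_p = proportions(reference), proportions(production)
    return sum((p - r) * math.log(p / r) for r, p in zip(ref_p, prod_p))


rng = random.Random(0)
reference = [rng.gauss(0.0, 1.0) for _ in range(5_000)]
same = [rng.gauss(0.0, 1.0) for _ in range(5_000)]     # no drift
shifted = [rng.gauss(0.8, 1.0) for _ in range(5_000)]  # mean has drifted
print(round(psi(reference, same), 3), round(psi(reference, shifted), 3))
```

PSI covers input-data drift; behavioral drift in LLM outputs usually also needs evaluation scores tracked over time.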
What's included
9 videos, 3 readings, 4 assignments
9 videos•Total 53 minutes
- Logging Prompts, Responses, and Metadata•8 minutes
- Comparing Experiments Across Versions•8 minutes
- Tracking Inference Metrics•7 minutes
- How Chains and Agents Break•8 minutes
- Using Phoenix to Trace Execution Steps•5 minutes
- Identifying Hallucination Triggers and Bottlenecks•5 minutes
- Data Drift vs Behavioral Drift•5 minutes
- Drift Dashboards & Alerting•3 minutes
- When to Retrain or Update the Pipeline•4 minutes
3 readings•Total 90 minutes
- “Telemetry Best Practices for Production AI”•30 minutes
- “Tracing Playbook for Complex AI Systems”•30 minutes
- “Drift Detection Techniques for LLM Applications”•30 minutes
4 assignments•Total 105 minutes
- “Help me diagnose the failure points in this trace and recommend fixes.”•60 minutes
- Practice Quiz: Experiment Tracking & Telemetry (W&B / MLflow)•15 minutes
- Practice Quiz: Tracing and Debugging with Arize Phoenix•15 minutes
- Practice Quiz: Monitoring Drift & System Health•15 minutes
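Tracing a chain or agent amounts to recording nested, timed spans for each step so that slow or failing steps can be located. Tools such as Arize Phoenix collect far richer spans. The Python sketch below is only a toy, in-process stand-in: the `span` and `answer` helpers are hypothetical, the retriever is faked, and `time.sleep` stands in for model latency.

```python
import time
from contextlib import contextmanager

SPANS: list[dict] = []   # finished spans, appended when each span closes
_stack: list[str] = []   # names of currently open spans, used to find parents


@contextmanager
def span(name: str):
    """Record a named, nested timing span (a toy stand-in for a tracing SDK)."""
    parent = _stack[-1] if _stack else None
    _stack.append(name)
    start = time.perf_counter()
    try:
        yield
    finally:
        _stack.pop()
        SPANS.append({"name": name, "parent": parent,
                      "ms": (time.perf_counter() - start) * 1000})


def answer(question: str) -> str:
    """A hypothetical two-step chain: retrieve documents, then call a model."""
    with span("chain"):
        with span("retrieve"):
            docs = ["doc-1", "doc-2"]  # fake retriever result
        with span("llm_call"):
            time.sleep(0.01)  # stands in for model latency
            reply = f"Answer to {question!r} using {len(docs)} docs"
    return reply


answer("What is drift?")
# The slowest child span is the first place to look for a bottleneck.
slowest = max(SPANS, key=lambda s: s["ms"] if s["parent"] else -1)
for s in SPANS:
    print(f'{s["name"]:<10} parent={s["parent"]} {s["ms"]:.1f} ms')
print("bottleneck:", slowest["name"])
```

Recording the parent of each span is what lets a trace viewer reconstruct the call tree and attribute latency to individual steps.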
Earn a career certificate
Add this credential to your LinkedIn profile, resume, or CV. Share it on social media and in your performance review.
Frequently asked questions
Basic familiarity with Python and ML concepts is recommended. No prior MLOps experience is required — the course builds from foundational CI/CD concepts.
You'll work with MLflow, Weights & Biases (W&B), Arize Phoenix, GitHub Actions, Docker, and various prompt engineering frameworks.
Yes. The course introduces CI/CD and deployment concepts from an ML-first perspective, making it accessible for data scientists.
