Training, Evaluating, and Monitoring Machine Learning Models
This course is part of Machine Learning Made Easy for Software Engineers Specialization
What you'll learn
Train machine learning models and analyze training dynamics using logs and loss curves
Evaluate model performance using metrics, confusion matrices, and statistical analysis
Design monitoring strategies to detect model drift and maintain model reliability
Details to know
March 2026
Build your subject-matter expertise
- Learn new concepts from industry experts
- Gain a foundational understanding of a subject or tool
- Develop job-relevant skills with hands-on projects
- Earn a shareable career certificate
There are 10 modules in this course
Building machine learning models is only the first step. To create reliable ML systems, engineers must evaluate model performance, diagnose prediction errors, and monitor deployed models over time. In this course, you'll learn how to train, evaluate, and monitor machine learning models using practical engineering techniques.
You’ll begin by exploring model training strategies that improve convergence and performance. You’ll analyze training logs, loss curves, and class imbalance effects to understand how models learn and where they struggle.
Next, you’ll learn how to evaluate machine learning models using appropriate performance metrics. You’ll analyze confusion matrices and residual patterns to identify systematic prediction errors and assess the statistical significance of model improvements.
Finally, you’ll focus on monitoring machine learning models in production environments. You’ll apply validation techniques, analyze A/B testing results, and monitor model behavior over time to detect performance drift and trigger retraining workflows.
Through a hands-on project, you’ll design a model evaluation and monitoring framework that helps ensure machine learning systems remain accurate and reliable after deployment.
You will apply batch and mini-batch training procedures to optimize model convergence.
What's included
3 videos, 1 reading, 1 assignment
3 videos•Total 13 minutes
- Introduction and Welcome•4 minutes
- Why Mini-Batches Improve Training Stability•5 minutes
- How Schedulers Influence Convergence•4 minutes
1 reading•Total 6 minutes
- Batch vs Mini-Batch: What Changes in Practice•6 minutes
1 assignment•Total 15 minutes
- Hands-On Activity: Train a PyTorch Model with Mini-Batches and Scheduler•15 minutes
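To make the idea of mini-batch training with a learning-rate scheduler concrete, here is a minimal sketch in plain Python. It is not the course's PyTorch activity: the toy data, the function names `step_lr` and `train_minibatch`, and all hyperparameters are assumptions chosen for illustration. It fits a line with mini-batch gradient descent and halves the learning rate every 50 epochs.

```python
import random

def step_lr(base_lr, epoch, step_size=50, gamma=0.5):
    """Step-decay schedule: multiply the rate by gamma every step_size epochs."""
    return base_lr * (gamma ** (epoch // step_size))

def train_minibatch(xs, ys, epochs=200, batch_size=4, base_lr=0.1, seed=0):
    """Fit y = w*x + b by mini-batch gradient descent on mean squared error."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    history = []  # full-dataset loss recorded after each epoch
    for epoch in range(epochs):
        lr = step_lr(base_lr, epoch)
        rng.shuffle(indices)  # reshuffle so batches differ between epochs
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            # Gradients of the mean squared error over this batch only
            grad_w = sum(2 * (w * xs[i] + b - ys[i]) * xs[i] for i in batch) / len(batch)
            grad_b = sum(2 * (w * xs[i] + b - ys[i]) for i in batch) / len(batch)
            w -= lr * grad_w
            b -= lr * grad_b
        history.append(sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs))
    return w, b, history

# Hypothetical toy data: a noiseless line with slope 3 and intercept 1
xs = [i / 10 for i in range(20)]
ys = [3.0 * x + 1.0 for x in xs]
w, b, history = train_minibatch(xs, ys)
print(round(w, 3), round(b, 3), history[0] > history[-1])
```

Setting `batch_size=len(xs)` turns the same loop into full-batch training, which is one way to compare the two procedures on identical data.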
You will analyze training logs and loss curves to diagnose common model training issues.
What's included
2 videos, 1 reading, 1 ungraded lab
2 videos•Total 5 minutes
- Reading Loss Curves Like an Analyst•3 minutes
- Spotting Instability Using Training Logs•2 minutes
1 reading•Total 6 minutes
- Common Training Issues and How Logs Reveal Them•6 minutes
1 ungraded lab•Total 60 minutes
- Fix Overfitting by Analyzing Divergence Patterns•60 minutes
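One divergence pattern you can look for in training logs is validation loss rising while training loss keeps falling. The sketch below is only one possible heuristic, not the lab's method: the function name `first_divergence_epoch`, the `patience` parameter, and the loss values are made up for illustration.

```python
def first_divergence_epoch(train_loss, val_loss, patience=3):
    """Return the first epoch of a run of `patience` consecutive epochs in which
    validation loss rises while training loss falls, or None if there is none."""
    run = 0
    for epoch in range(1, len(val_loss)):
        val_up = val_loss[epoch] > val_loss[epoch - 1]
        train_down = train_loss[epoch] < train_loss[epoch - 1]
        run = run + 1 if (val_up and train_down) else 0
        if run == patience:
            return epoch - patience + 1
    return None

# Hypothetical per-epoch losses read from a training log
train = [1.00, 0.80, 0.65, 0.55, 0.47, 0.40, 0.34, 0.29]
val = [1.05, 0.88, 0.76, 0.70, 0.69, 0.72, 0.76, 0.81]
print(first_divergence_epoch(train, val))  # → 5
```

Here the curves part ways at epoch 5 (counting from 0), so a checkpoint from epoch 4 would be a reasonable candidate for early stopping under this heuristic.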
You will evaluate the impact of class-imbalance techniques on model performance.
What's included
1 video, 1 reading, 2 assignments
1 video•Total 3 minutes
- Choosing Class-Imbalance Methods with Confidence•3 minutes
1 reading•Total 7 minutes
- How Balanced Data Shapes Your Model’s F1 Score•7 minutes
2 assignments•Total 37 minutes
- Graded Quiz: Assessing Training, Diagnostics, and Imbalance Methods•25 minutes
- Hands-On Activity: Compare F1 Scores Using Class-Weights and SMOTE•12 minutes
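As a rough illustration of why F1 matters on imbalanced data, the sketch below computes inverse-frequency class weights (the same formula scikit-learn uses for `class_weight="balanced"`) and a binary F1 score in plain Python. The labels and predictions are invented, and SMOTE is not implemented here.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * count_of_class)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * c) for cls, c in counts.items()}

def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical imbalanced labels: 8 negatives, 2 positives
y_true = [0] * 8 + [1] * 2
always_negative = [0] * 10              # 80% accuracy, but finds no positives
mixed = [0] * 7 + [1] + [1, 0]          # one false positive, one hit, one miss

print(balanced_class_weights(y_true))   # → {0: 0.625, 1: 2.5}
print(f1_score(y_true, always_negative))  # → 0.0
print(f1_score(y_true, mixed))            # → 0.5
```

The always-negative predictor looks acceptable on accuracy yet scores an F1 of 0, which is the kind of gap that class weights or resampling are meant to close.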
You will apply appropriate performance metrics to evaluate machine learning models.
What's included
2 videos, 1 reading, 1 assignment
2 videos•Total 10 minutes
- Why Metrics Matter in Model Evaluation?•4 minutes
- RMSE vs. MAE for Regression Models•6 minutes
1 reading•Total 10 minutes
- Reflecting on Model Performance Metrics•10 minutes
1 assignment•Total 15 minutes
- Hands-On Activity: Metric Matching Exercise•15 minutes
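The difference between RMSE and MAE shows up most clearly when errors are unevenly distributed. In this small sketch (the numbers are invented), two sets of predictions have the same MAE, but the one with a single large miss has a higher RMSE because squaring penalizes large errors more.

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

y_true = [10, 10, 10, 10]
pred_even = [12, 8, 12, 8]        # every prediction is off by 2
pred_outlier = [10, 10, 10, 18]   # three exact predictions, one off by 8

print(mae(y_true, pred_even), rmse(y_true, pred_even))        # → 2.0 2.0
print(mae(y_true, pred_outlier), rmse(y_true, pred_outlier))  # → 2.0 4.0
```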
You will analyze confusion matrices and residual plots to identify systematic model prediction errors.
What's included
2 videos, 1 reading, 1 assignment
2 videos•Total 9 minutes
- Looking Inside the Confusion Matrix•5 minutes
- Residual Plots for Regression Diagnostics•4 minutes
1 reading•Total 10 minutes
- Diagnosing Systematic Model Errors with Confusion Matrices and Residual Plots•10 minutes
1 assignment•Total 15 minutes
- Hands-On Activity: Spam Filter Failure Analysis•15 minutes
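A confusion matrix is just a count of (actual, predicted) label pairs. The sketch below builds one for a made-up spam filter; the labels and predictions are illustrative and not taken from the course activity.

```python
from collections import Counter

def confusion_matrix(y_true, y_pred):
    """Count (actual, predicted) label pairs; works for any number of classes."""
    return Counter(zip(y_true, y_pred))

# Hypothetical spam-filter results
y_true = ["spam", "spam", "spam", "ham", "ham", "ham", "ham", "ham"]
y_pred = ["spam", "ham", "ham", "ham", "ham", "ham", "ham", "spam"]
cm = confusion_matrix(y_true, y_pred)

labels = ["spam", "ham"]
for actual in labels:
    # One row per actual class, one column per predicted class
    print(actual, [cm[(actual, predicted)] for predicted in labels])

missed_spam = cm[("spam", "ham")]    # false negatives
false_alarms = cm[("ham", "spam")]   # false positives
```

In this example the filter misses two of three spam messages, a systematic weakness that overall accuracy (5 of 8 correct) does not make obvious.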
You will evaluate the statistical significance of differences in metrics.
What's included
2 videos, 1 reading, 1 assignment, 1 ungraded lab
2 videos•Total 10 minutes
- Why Statistical Significance Matters in Model Comparison•4 minutes
- Bootstrapping Metrics Step by Step•6 minutes
1 reading•Total 10 minutes
- Evaluating Statistical Significance in Automated Model Monitoring•10 minutes
1 assignment•Total 20 minutes
- Graded Quiz: Interpreting Metrics and Model Improvements•20 minutes
1 ungraded lab•Total 60 minutes
- End-to-End Model Evaluation Practice•60 minutes
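Bootstrapping a metric can be sketched in a few lines. The example below is an illustration under assumptions, not the course's exact procedure: it uses a paired percentile bootstrap on invented per-example correctness flags for two models, and the name `bootstrap_diff_ci` is hypothetical. If the resulting interval for the accuracy difference excludes zero, the improvement is unlikely to be resampling noise.

```python
import random

def bootstrap_diff_ci(correct_a, correct_b, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for accuracy(B) - accuracy(A).

    Both lists hold 1/0 correctness flags for the same test examples,
    so each resample draws example indices and keeps the pairing intact.
    """
    rng = random.Random(seed)
    n = len(correct_a)
    diffs = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        diffs.append(sum(correct_b[i] - correct_a[i] for i in idx) / n)
    diffs.sort()
    lo = diffs[int((alpha / 2) * n_resamples)]
    hi = diffs[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi

# Hypothetical results on 100 shared test examples:
# model A gets 70 right; model B gets the same 70 right plus 10 more.
correct_a = [1] * 70 + [0] * 30
correct_b = [1] * 80 + [0] * 20
lo, hi = bootstrap_diff_ci(correct_a, correct_b)
print(f"95% interval for accuracy gain: [{lo:.2f}, {hi:.2f}]")
```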
You will apply validation techniques to assess model performance on unseen data.
What's included
2 videos, 1 reading, 1 assignment
2 videos•Total 6 minutes
- Why Validation Is a Release Gate•3 minutes
- Hold-Out Sets and Evaluation Metrics in Practice•3 minutes
1 reading•Total 10 minutes
- Designing a Validation Checklist for Release Candidates•10 minutes
1 assignment•Total 15 minutes
- Hands-On Activity: Validate a Release Candidate Model•15 minutes
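Treating validation as a release gate can be reduced to two steps: hold out data the model never trains on, then block the release if the candidate regresses against the baseline. The sketch below assumes higher-is-better metrics; the helper names, the 0.01 tolerance, and the metric values are hypothetical.

```python
import random

def holdout_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of rows and return (train, test) with no overlap."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def passes_release_gate(candidate, baseline, max_regression=0.01):
    """True if no baseline metric drops by more than max_regression."""
    return all(candidate[name] >= baseline[name] - max_regression for name in baseline)

train_rows, test_rows = holdout_split(list(range(100)))

# Hypothetical hold-out metrics for the current model and two candidates
baseline = {"accuracy": 0.90, "recall": 0.80}
candidate_ok = {"accuracy": 0.92, "recall": 0.795}   # tiny recall dip, within tolerance
candidate_bad = {"accuracy": 0.93, "recall": 0.70}   # better accuracy, much worse recall

print(passes_release_gate(candidate_ok, baseline))   # → True
print(passes_release_gate(candidate_bad, baseline))  # → False
```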
You will analyze A/B test or shadow deployment results to compare new model performance against a baseline.
What's included
2 videos, 1 reading, 1 assignment
2 videos•Total 8 minutes
- From Offline Metrics to Online Impact•4 minutes
- A/B Tests vs. Shadow Deployments Explained•4 minutes
1 reading•Total 10 minutes
- Comparing Models Using A/B Testing and Shadow Deployments•10 minutes
1 assignment•Total 15 minutes
- Hands-On Activity: Analyze Shadow Deployment Results•15 minutes
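One common way to compare a new model against a baseline in an A/B test is a two-proportion z-test on a success rate such as click-through. The sketch below is one possible analysis rather than the one the course prescribes, and the traffic numbers are invented.

```python
import math

def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Two-sided z-test for a difference between two success rates."""
    p_a, p_b = success_a / n_a, success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value under the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical experiment: baseline converts 200 of 2000, new model 250 of 2000
z, p_value = two_proportion_z_test(200, 2000, 250, 2000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

With these made-up numbers the p-value falls below 0.05, so the lift from 10% to 12.5% would usually be treated as statistically significant at that level.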
You will evaluate model-drift indicators to trigger retraining workflows.
What's included
2 videos, 1 reading, 1 assignment, 1 ungraded lab
2 videos•Total 8 minutes
- Why Models Drift in Production•4 minutes
- Using PSI for Ongoing Monitoring•4 minutes
1 reading•Total 10 minutes
- Automating Monitoring and Retraining Triggers•10 minutes
1 assignment•Total 20 minutes
- Graded Quiz: Validate, Analyze, and Monitor ML Models•20 minutes
1 ungraded lab•Total 60 minutes
- Build a Drift Monitoring Workflow•60 minutes
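The Population Stability Index compares how a feature or score is distributed across bins in a baseline sample versus a recent sample: PSI = sum over bins of (actual% - expected%) * ln(actual% / expected%). The sketch below is a minimal version under assumptions: the cut points and data are invented, empty bins are floored at a small `eps` to avoid dividing by zero, and the helper names are hypothetical.

```python
import bisect
import math

def bin_fractions(values, cut_points, eps=1e-6):
    """Fraction of values in each bin; cut_points are the sorted interior edges."""
    counts = [0] * (len(cut_points) + 1)
    for v in values:
        counts[bisect.bisect_right(cut_points, v)] += 1
    return [max(c / len(values), eps) for c in counts]

def psi(expected, actual, cut_points):
    """Population Stability Index between a baseline and a recent sample."""
    e = bin_fractions(expected, cut_points)
    a = bin_fractions(actual, cut_points)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

cut_points = [0.25, 0.5, 0.75]
# Hypothetical model scores: uniform at training time, skewed high in production
baseline = [0.1] * 25 + [0.3] * 25 + [0.6] * 25 + [0.9] * 25
recent = [0.1] * 10 + [0.3] * 20 + [0.6] * 30 + [0.9] * 40

score = psi(baseline, recent, cut_points)
print(round(score, 3))  # → 0.228
```

A commonly cited rule of thumb treats PSI below 0.1 as stable and above 0.25 as a significant shift; where exactly to trigger retraining is a choice each team has to make for its own system.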
In this project, you will design and implement a machine learning model evaluation and monitoring framework for a production system. A technology company has deployed a recommendation model that predicts user engagement with content, but its performance has become inconsistent, possibly because of data drift and evolving user behavior. Your task is to build an evaluation pipeline that compares model versions, analyzes prediction errors, and monitors performance stability over time.
You will train baseline and improved models, analyze training logs and loss curves to verify convergence, and evaluate class-imbalance handling techniques to ensure fair evaluation across classes. You will then evaluate the models using appropriate metrics, analyze errors with confusion matrices and residual plots, and perform statistical comparisons. Finally, you will simulate monitoring scenarios such as A/B testing or shadow deployments, calculate drift indicators such as the Population Stability Index (PSI), and define conditions for model retraining.
The final deliverable is a modular Python evaluation framework along with a written engineering explanation demonstrating how evaluation insights support reliable model deployment decisions.
What's included
2 readings, 1 assignment
2 readings•Total 12 minutes
- Why Model Evaluation and Monitoring Matter in Production ML Systems•6 minutes
- Project Requirements•6 minutes
1 assignment•Total 70 minutes
- End-to-End Model Evaluation & Monitoring Framework•70 minutes
Earn a career certificate
Add this credential to your LinkedIn profile, resume, or CV. Share it on social media and in your performance review.
Frequently asked questions
This course is designed for learners with some experience in programming and machine learning. It focuses on techniques used to evaluate and maintain ML models in real-world systems.
You'll learn how to use performance metrics, confusion matrices, residual analysis, and statistical evaluation techniques to assess model performance and diagnose prediction errors.
Models can degrade over time as data changes. Monitoring helps detect issues such as model drift or performance drops so teams can retrain or update models before problems affect users.
¹ Some assignments in this course are AI-graded. For these assignments, your data will be used in accordance with Coursera's Privacy Notice.
