Machine Learning: Classification
This course is part of Machine Learning Specialization
Instructors: Emily Fox
130,473 already enrolled
3,739 reviews
Skills you'll gain
- Decision Tree Learning
- Model Optimization
- Model Evaluation
- Natural Language Processing
- Scalability
- Machine Learning
- Probability & Statistics
- Predictive Modeling
- Applied Machine Learning
- Text Mining
- Supervised Learning
- Data Cleansing
- Model Training
- Logistic Regression
- Feature Engineering
- Data Preprocessing
- Machine Learning Algorithms
- Classification And Regression Tree (CART)
Details to know
19 assignments
Build your subject-matter expertise
- Learn new concepts from industry experts
- Gain a foundational understanding of a subject or tool
- Develop job-relevant skills with hands-on projects
- Earn a shareable career certificate
There are 10 modules in this course
Case Studies: Analyzing Sentiment & Loan Default Prediction
In our case study on analyzing sentiment, you will create models that predict a class (positive or negative sentiment) from input features (text of the reviews, user profile information, and so on). In our second case study for this course, loan default prediction, you will tackle financial data and predict whether a loan is likely to be risky or safe for the bank. These tasks are examples of classification, one of the most widely used areas of machine learning, with a broad array of applications, including ad targeting, spam detection, medical diagnosis and image classification. In this course, you will create classifiers that provide state-of-the-art performance on a variety of tasks. You will become familiar with the most successful techniques that are widely used in practice, including logistic regression, decision trees and boosting. In addition, you will be able to design and implement the underlying algorithms that can learn these models at scale, using stochastic gradient ascent. You will implement these techniques on real-world, large-scale machine learning tasks. You will also address significant tasks you will face in real-world applications of ML, including handling missing data and measuring precision and recall to evaluate a classifier. This course is hands-on, action-packed, and full of visualizations and illustrations of how these techniques will behave on real data. We've also included optional content in every module, covering advanced topics for those who want to go even deeper!
Learning Objectives: By the end of this course, you will be able to:
- Describe the input and output of a classification model.
- Tackle both binary and multiclass classification problems.
- Implement a logistic regression model for large-scale classification.
- Create a non-linear model using decision trees.
- Improve the performance of any model using boosting.
- Scale your methods with stochastic gradient ascent.
- Describe the underlying decision boundaries.
- Build a classification model to predict sentiment in a product review dataset.
- Analyze financial data to predict loan defaults.
- Use techniques for handling missing data.
- Evaluate your models using precision-recall metrics.
- Implement these techniques in Python (or in the language of your choice, though Python is highly recommended).
Classification is one of the most widely used techniques in machine learning, with a broad array of applications, including sentiment analysis, ad targeting, spam detection, risk assessment, medical diagnosis and image classification. The core goal of classification is to predict a category or class y from some inputs x. Through this course, you will become familiar with the fundamental models and algorithms used in classification, as well as a number of core machine learning concepts. Rather than covering all aspects of classification, you will focus on a few core techniques, which are widely used in the real-world to get state-of-the-art performance. By following our hands-on approach, you will implement your own algorithms on multiple real-world tasks, and deeply grasp the core techniques needed to be successful with these approaches in practice. This introduction to the course provides you with an overview of the topics we will cover and the background knowledge and resources we assume you have.
What's included
8 videos, 4 readings
8 videos • Total 27 minutes
- Welcome to the classification course, a part of the Machine Learning Specialization • 1 minute
- What is this course about? • 6 minutes
- Impact of classification • 1 minute
- Course overview • 3 minutes
- Outline of first half of course • 6 minutes
- Outline of second half of course • 6 minutes
- Assumed background • 3 minutes
- Let's get started! • 1 minute
4 readings • Total 35 minutes
- Important Update regarding the Machine Learning Specialization • 10 minutes
- Slides presented in this module • 10 minutes
- Get help and meet other learners. Join your Community! • 5 minutes
- Reading: Software tools you'll need • 10 minutes
Linear classifiers are amongst the most practical classification methods. For example, in our sentiment analysis case study, a linear classifier associates a coefficient with the count of each word in the sentence. In this module, you will become proficient in this type of representation. You will focus on a particularly useful type of linear classifier called logistic regression, which, in addition to allowing you to predict a class, provides a probability associated with the prediction. These probabilities are extremely useful, since they provide a degree of confidence in the predictions. In this module, you will also be able to construct features from categorical inputs, and to tackle classification problems with more than two classes (multiclass problems). You will examine the results of these techniques on a real-world product sentiment analysis task.
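As a rough illustration of the idea described above, the sketch below scores a review by summing a coefficient times the count of each word, then passes the score through the sigmoid to obtain a probability. The function names, the coefficient values and the example review are all hypothetical and chosen by hand; this is not the course's own code.

```python
import math

def sigmoid(score):
    """Logistic link: maps any real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-score))

def predict_probability(word_counts, coefficients, intercept=0.0):
    """Estimate P(y = +1 | x) for a review given as {word: count}."""
    score = intercept + sum(
        coefficients.get(word, 0.0) * count
        for word, count in word_counts.items()
    )
    return sigmoid(score)

# Hypothetical coefficients, picked by hand for illustration only.
coefficients = {"great": 1.5, "awful": -2.0}
review = {"great": 2, "awful": 1, "the": 3}  # words without a coefficient count as 0

p_positive = predict_probability(review, coefficients)
label = +1 if p_positive > 0.5 else -1
```

Here the score is 2 * 1.5 - 2.0 = 1.0, so the predicted class is positive, and the probability says how confident the classifier is in that prediction.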
What's included
18 videos, 2 readings, 2 assignments
18 videos • Total 78 minutes
- Linear classifiers: A motivating example • 2 minutes
- Intuition behind linear classifiers • 4 minutes
- Decision boundaries • 3 minutes
- Linear classifier model • 6 minutes
- Effect of coefficient values on decision boundary • 2 minutes
- Using features of the inputs • 2 minutes
- Predicting class probabilities • 2 minutes
- Review of basics of probabilities • 6 minutes
- Review of basics of conditional probabilities • 9 minutes
- Using probabilities in classification • 3 minutes
- Predicting class probabilities with (generalized) linear models • 5 minutes
- The sigmoid (or logistic) link function • 5 minutes
- Logistic regression model • 5 minutes
- Effect of coefficient values on predicted probabilities • 7 minutes
- Overview of learning logistic regression models • 2 minutes
- Encoding categorical inputs • 5 minutes
- Multiclass classification with 1 versus all • 7 minutes
- Recap of logistic regression classifier • 1 minute
2 readings • Total 20 minutes
- Slides presented in this module • 10 minutes
- Predicting sentiment from product reviews • 10 minutes
2 assignments • Total 60 minutes
- Linear Classifiers & Logistic Regression • 30 minutes
- Predicting sentiment from product reviews • 30 minutes
Once familiar with linear classifiers and logistic regression, you can now dive in and write your first learning algorithm for classification. In particular, you will use gradient ascent to learn the coefficients of your classifier from data. You first will need to define the quality metric for these tasks using an approach called maximum likelihood estimation (MLE). You will also become familiar with a simple technique for selecting the step size for gradient ascent. An optional, advanced part of this module will cover the derivation of the gradient for logistic regression. You will implement your own learning algorithm for logistic regression from scratch, and use it to learn a sentiment analysis classifier.
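The learning procedure described above can be sketched in a few lines of plain Python. The example assumes labels in {+1, -1}, a constant first feature acting as the intercept, a fixed step size and a tiny made-up dataset; all names and values are illustrative assumptions, not the course's reference implementation.

```python
import math

def prob_positive(features, weights):
    """P(y = +1 | x, w) under the logistic regression model."""
    score = sum(w * h for w, h in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-score))

def log_likelihood(X, y, weights):
    """Quality metric used by MLE: sum over data of log P(y_i | x_i, w)."""
    total = 0.0
    for features, label in zip(X, y):
        p = prob_positive(features, weights)
        total += math.log(p if label == +1 else 1.0 - p)
    return total

def gradient_ascent(X, y, step_size=0.1, iterations=200):
    """Repeatedly move the weights in the direction of the gradient."""
    weights = [0.0] * len(X[0])
    for _ in range(iterations):
        # error_i = indicator(y_i = +1) - P(y = +1 | x_i, w)
        errors = [(1.0 if label == +1 else 0.0) - prob_positive(features, weights)
                  for features, label in zip(X, y)]
        for j in range(len(weights)):
            # Partial derivative of the log likelihood w.r.t. weight j.
            derivative = sum(e * features[j] for e, features in zip(errors, X))
            weights[j] += step_size * derivative
    return weights

# Made-up data: feature 0 is a constant 1 (intercept), feature 1 is a single input.
X = [[1.0, 2.0], [1.0, 1.0], [1.0, -1.0], [1.0, -2.0], [1.0, 0.5], [1.0, -0.5]]
y = [+1, +1, -1, -1, -1, +1]
learned = gradient_ascent(X, y)
```

A step size that is too large can make the log likelihood oscillate or diverge, which is why the choice of step size gets its own treatment in this module.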
What's included
18 videos, 2 readings, 2 assignments
18 videos • Total 83 minutes
- Goal: Learning parameters of logistic regression • 2 minutes
- Intuition behind maximum likelihood estimation • 4 minutes
- Data likelihood • 8 minutes
- Finding best linear classifier with gradient ascent • 3 minutes
- Review of gradient ascent • 6 minutes
- Learning algorithm for logistic regression • 3 minutes
- Example of computing derivative for logistic regression • 6 minutes
- Interpreting derivative for logistic regression • 6 minutes
- Summary of gradient ascent for logistic regression • 2 minutes
- Choosing step size • 6 minutes
- Careful with step sizes that are too large • 4 minutes
- Rule of thumb for choosing step size • 4 minutes
- (VERY OPTIONAL) Deriving gradient of logistic regression: Log trick • 5 minutes
- (VERY OPTIONAL) Expressing the log-likelihood • 3 minutes
- (VERY OPTIONAL) Deriving probability y=-1 given x • 2 minutes
- (VERY OPTIONAL) Rewriting the log likelihood into a simpler form • 8 minutes
- (VERY OPTIONAL) Deriving gradient of log likelihood • 8 minutes
- Recap of learning logistic regression classifiers • 2 minutes
2 readings • Total 20 minutes
- Slides presented in this module • 10 minutes
- Implementing logistic regression from scratch • 10 minutes
2 assignments • Total 60 minutes
- Learning Linear Classifiers • 30 minutes
- Implementing logistic regression from scratch • 30 minutes
As we saw in the regression course, overfitting is perhaps the most significant challenge you will face as you apply machine learning approaches in practice. This challenge can be particularly significant for logistic regression, as you will discover in this module, since we not only risk getting an overly complex decision boundary, but your classifier can also become overly confident about the probabilities it predicts. In this module, you will investigate overfitting in classification in significant detail, and obtain broad practical insights from some interesting visualizations of the classifiers' outputs. You will then add a regularization term to your optimization to mitigate overfitting. You will investigate both L2 regularization to penalize large coefficient values, and L1 regularization to obtain additional sparsity in the coefficients. Finally, you will modify your gradient ascent algorithm to learn regularized logistic regression classifiers. You will implement your own regularized logistic regression classifier from scratch, and investigate the impact of the L2 penalty on real-world sentiment analysis data.
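To make the effect of the L2 penalty concrete, here is a minimal sketch of gradient ascent on the regularized objective (log likelihood minus penalty times the sum of squared coefficients). It assumes labels in {+1, -1}, treats feature 0 as an unpenalized intercept, and uses a made-up, linearly separable dataset; the function name `fit_l2` and all values are hypothetical.

```python
import math

def prob_positive(features, weights):
    """P(y = +1 | x, w) under the logistic regression model."""
    score = sum(w * h for w, h in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-score))

def fit_l2(X, y, l2_penalty, step_size=0.05, iterations=500):
    """Gradient ascent on: log likelihood - l2_penalty * sum_j w_j^2."""
    weights = [0.0] * len(X[0])
    for _ in range(iterations):
        errors = [(1.0 if label == +1 else 0.0) - prob_positive(features, weights)
                  for features, label in zip(X, y)]
        for j in range(len(weights)):
            derivative = sum(e * features[j] for e, features in zip(errors, X))
            if j > 0:
                # Assumption: feature 0 is the constant intercept and is not penalized.
                derivative -= 2.0 * l2_penalty * weights[j]
            weights[j] += step_size * derivative
    return weights

# Made-up, linearly separable data: without a penalty the coefficient keeps growing.
X = [[1.0, 2.0], [1.0, 1.0], [1.0, -1.0], [1.0, -2.0]]
y = [+1, +1, -1, -1]

unregularized = fit_l2(X, y, l2_penalty=0.0)
regularized = fit_l2(X, y, l2_penalty=5.0)
```

On separable data like this, the unpenalized coefficient grows with every iteration and the predicted probabilities drift toward 0 and 1, which is the overconfidence described above. The penalized fit keeps the coefficient small.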
What's included
13 videos, 2 readings, 2 assignments
13 videos • Total 66 minutes
- Evaluating a classifier • 3 minutes
- Review of overfitting in regression • 3 minutes
- Overfitting in classification • 6 minutes
- Visualizing overfitting with high-degree polynomial features • 4 minutes
- Overfitting in classifiers leads to overconfident predictions • 5 minutes
- Visualizing overconfident predictions • 4 minutes
- (OPTIONAL) Another perspective on overfitting in logistic regression • 9 minutes
- Penalizing large coefficients to mitigate overfitting • 5 minutes
- L2 regularized logistic regression • 5 minutes
- Visualizing effect of L2 regularization in logistic regression • 6 minutes
- Learning L2 regularized logistic regression with gradient ascent • 8 minutes
- Sparse logistic regression with L1 regularization • 8 minutes
- Recap of overfitting & regularization in logistic regression • 1 minute
2 readings • Total 20 minutes
- Slides presented in this module • 10 minutes
- Logistic Regression with L2 regularization • 10 minutes
2 assignments • Total 60 minutes
- Overfitting & Regularization in Logistic Regression • 30 minutes
- Logistic Regression with L2 regularization • 30 minutes
Along with linear classifiers, decision trees are amongst the most widely used classification techniques in the real world. This method is extremely intuitive, simple to implement and provides interpretable predictions. In this module, you will become familiar with the core decision tree representation. You will then design a simple, recursive greedy algorithm to learn decision trees from data. Finally, you will extend this approach to deal with continuous inputs, a fundamental requirement for practical problems. In this module, you will investigate a brand new case study in the financial sector: predicting the risk associated with a bank loan. You will implement your own decision tree learning algorithm on real loan data.
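The greedy step at the heart of the recursive algorithm, choosing which feature to split on, might look like the sketch below. It assumes categorical features and uses classification error (the fraction of points that a majority-vote prediction gets wrong) as the split criterion; the loan records and feature names are invented for illustration.

```python
from collections import Counter

def node_mistakes(labels):
    """Mistakes made by predicting the majority class for every point in a node."""
    if not labels:
        return 0
    return len(labels) - Counter(labels).most_common(1)[0][1]

def best_splitting_feature(rows, labels, features):
    """Return (feature, error) for the split with the lowest classification error."""
    best_feature, best_error = None, float("inf")
    for feature in features:
        # Group the labels by the value this feature takes (one child per value).
        groups = {}
        for row, label in zip(rows, labels):
            groups.setdefault(row[feature], []).append(label)
        error = sum(node_mistakes(group) for group in groups.values()) / len(labels)
        if error < best_error:
            best_feature, best_error = feature, error
    return best_feature, best_error

# Invented loan records, for illustration only.
rows = [
    {"credit": "excellent", "term": "3yr"},
    {"credit": "excellent", "term": "5yr"},
    {"credit": "fair", "term": "3yr"},
    {"credit": "fair", "term": "5yr"},
    {"credit": "poor", "term": "3yr"},
    {"credit": "poor", "term": "5yr"},
]
labels = ["safe", "safe", "safe", "risky", "risky", "risky"]

feature, error = best_splitting_feature(rows, labels, ["credit", "term"])
```

Splitting on `credit` leaves one mistake out of six, while splitting on `term` leaves two, so the greedy choice is `credit`. A full learner would apply the same selection recursively to each child node until a stopping condition is met.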
What's included
13 videos, 3 readings, 3 assignments
13 videos • Total 47 minutes
- Predicting loan defaults with decision trees • 4 minutes
- Intuition behind decision trees • 2 minutes
- Task of learning decision trees from data • 3 minutes
- Recursive greedy algorithm • 4 minutes
- Learning a decision stump • 4 minutes
- Selecting best feature to split on • 6 minutes
- When to stop recursing • 4 minutes
- Making predictions with decision trees • 1 minute
- Multiclass classification with decision trees • 3 minutes
- Threshold splits for continuous inputs • 6 minutes
- (OPTIONAL) Picking the best threshold to split on • 3 minutes
- Visualizing decision boundaries • 6 minutes
- Recap of decision trees • 1 minute
3 readings • Total 30 minutes
- Slides presented in this module • 10 minutes
- Identifying safe loans with decision trees • 10 minutes
- Implementing binary decision trees • 10 minutes
3 assignments • Total 74 minutes
- Decision Trees • 30 minutes
- Identifying safe loans with decision trees • 14 minutes
- Implementing binary decision trees • 30 minutes
Out of all machine learning techniques, decision trees are amongst the most prone to overfitting. No practical implementation is possible without including approaches that mitigate this challenge. In this module, through various visualizations and investigations, you will investigate why decision trees suffer from significant overfitting problems. Using the principle of Occam's razor, you will mitigate overfitting by learning simpler trees. At first, you will design algorithms that stop the learning process before the decision trees become overly complex. In an optional segment, you will design a very practical approach that learns an overly-complex tree, and then simplifies it with pruning. Your implementation will investigate the effect of these techniques on mitigating overfitting on our real-world loan data set.
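One way to picture early stopping is as a check run before every split. The sketch below uses three commonly used conditions (a depth limit, a minimum node size and a minimum reduction in error); these specific conditions, the default values and the function name are assumptions made for illustration, since the description above does not list them.

```python
def should_stop(labels, depth, error_before, error_after,
                max_depth=6, min_node_size=10, min_error_reduction=0.0):
    """Return a reason to stop splitting this node, or None to keep growing."""
    if len(set(labels)) <= 1:
        return "pure node"
    if depth >= max_depth:
        return "max depth reached"
    if len(labels) <= min_node_size:
        return "node too small"
    if error_before - error_after <= min_error_reduction:
        return "split does not reduce error enough"
    return None

# A mixed node with 20 points at depth 2, where the best split cuts error from 0.5 to 0.2.
mixed = ["safe", "risky"] * 10
decision = should_stop(mixed, depth=2, error_before=0.5, error_after=0.2)
```

Tightening any of the three limits yields a simpler tree. Pruning takes the opposite route: grow the tree first, then remove splits that do not justify their added complexity.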
What's included
8 videos, 2 readings, 2 assignments
8 videos • Total 40 minutes
- A review of overfitting • 3 minutes
- Overfitting in decision trees • 6 minutes
- Principle of Occam's razor: Learning simpler decision trees • 5 minutes
- Early stopping in learning decision trees • 7 minutes
- (OPTIONAL) Motivating pruning • 8 minutes
- (OPTIONAL) Pruning decision trees to avoid overfitting • 6 minutes
- (OPTIONAL) Tree pruning algorithm • 4 minutes
- Recap of overfitting and regularization in decision trees • 1 minute
2 readings • Total 20 minutes
- Slides presented in this module • 10 minutes
- Decision Trees in Practice • 10 minutes
2 assignments • Total 60 minutes
- Preventing Overfitting in Decision Trees • 30 minutes
- Decision Trees in Practice • 30 minutes
Real-world machine learning problems are fraught with missing data. That is, very often, some of the inputs are not observed for all data points. This challenge is very significant, arises in most real datasets, and needs to be addressed carefully to obtain great performance. Yet this issue is rarely discussed in machine learning courses. In this module, you will tackle the missing data challenge head on. You will start with the two most basic techniques to convert a dataset with missing data into a clean dataset, namely skipping missing values and imputing missing values. In an advanced section, you will also design a modification of the decision tree learning algorithm that builds decisions about missing data right into the model. You will also explore these techniques in your real-data implementation.
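The two basic strategies can be contrasted on a toy table. In this sketch, missing entries are represented by `None`, numeric columns are imputed with the median and other columns with the most common value; those representation and fill choices, like the data itself, are assumptions made for illustration.

```python
from collections import Counter
from statistics import median

def skip_missing(rows):
    """Strategy 1: drop every row that has at least one missing value."""
    return [row for row in rows if all(value is not None for value in row.values())]

def impute_missing(rows):
    """Strategy 2: fill missing values (median for numbers, mode otherwise)."""
    filled = [dict(row) for row in rows]  # copy so the input is left untouched
    for column in rows[0]:
        observed = [row[column] for row in rows if row[column] is not None]
        if all(isinstance(value, (int, float)) for value in observed):
            fill = median(observed)
        else:
            fill = Counter(observed).most_common(1)[0][0]
        for row in filled:
            if row[column] is None:
                row[column] = fill
    return filled

# Invented loan records with gaps.
rows = [
    {"income": 40, "term": "3yr"},
    {"income": None, "term": "5yr"},
    {"income": 60, "term": None},
    {"income": 80, "term": "3yr"},
]

kept = skip_missing(rows)
imputed = impute_missing(rows)
```

Skipping keeps only two of the four rows here, which shows how quickly it can discard data. Imputing keeps all four but fills the gaps with guesses, which can bias the model.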
What's included
6 videos, 1 reading, 1 assignment
6 videos • Total 25 minutes
- Challenge of missing data • 4 minutes
- Strategy 1: Purification by skipping missing data • 4 minutes
- Strategy 2: Purification by imputing missing data • 5 minutes
- Modifying decision trees to handle missing data • 5 minutes
- Feature split selection with missing data • 6 minutes
- Recap of handling missing data • 2 minutes
1 reading • Total 10 minutes
- Slides presented in this module • 10 minutes
1 assignment • Total 30 minutes
- Handling Missing Data • 30 minutes
One of the most exciting theoretical questions asked about machine learning is whether simple classifiers can be combined into a highly accurate ensemble. This question led to the development of boosting, one of the most important and practical techniques in machine learning today. This simple approach can boost the accuracy of any classifier, and is widely used in practice, e.g., it's used by more than half of the teams who win the Kaggle machine learning competitions. In this module, you will first define the ensemble classifier, where multiple models vote on the best prediction. You will then explore a boosting algorithm called AdaBoost, which provides a great approach for boosting classifiers. Through visualizations, you will become familiar with many of the practical aspects of this technique. You will create your very own implementation of AdaBoost, from scratch, and use it to boost the performance of your loan risk predictor on real data.
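A single round of AdaBoost can be sketched as follows, using the standard formulation: compute the weighted error of the current weak classifier, turn it into a coefficient, then reweight the data so that mistakes count more in the next round. The function names and the four-point example are hypothetical, and the sketch assumes the weighted error lies strictly between 0 and 1.

```python
import math

def adaboost_round(weights, labels, predictions):
    """One AdaBoost round: returns (classifier coefficient, normalized new weights)."""
    total = sum(weights)
    weighted_error = sum(
        w for w, y, p in zip(weights, labels, predictions) if y != p
    ) / total
    # Assumption: 0 < weighted_error < 1, otherwise the log below is undefined.
    coefficient = 0.5 * math.log((1.0 - weighted_error) / weighted_error)
    # Shrink the weight of correctly classified points, grow the weight of mistakes.
    updated = [
        w * math.exp(-coefficient if y == p else coefficient)
        for w, y, p in zip(weights, labels, predictions)
    ]
    norm = sum(updated)
    return coefficient, [w / norm for w in updated]

def ensemble_predict(coefficients, votes):
    """Weighted vote of the weak classifiers; each vote is +1 or -1."""
    score = sum(c * v for c, v in zip(coefficients, votes))
    return +1 if score > 0 else -1

# Four equally weighted points; the weak classifier gets the last one wrong.
labels = [+1, +1, -1, -1]
predictions = [+1, +1, -1, +1]
coefficient, new_weights = adaboost_round([0.25] * 4, labels, predictions)
```

With one mistake out of four equally weighted points, the misclassified point ends up carrying half of the total weight, so the next weak classifier is pushed to get it right.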
What's included
13 videos, 3 readings, 3 assignments
13 videos • Total 58 minutes
- The boosting question • 4 minutes
- Ensemble classifiers • 5 minutes
- Boosting • 6 minutes
- AdaBoost overview • 3 minutes
- Weighted error • 5 minutes
- Computing coefficient of each ensemble component • 5 minutes
- Reweighing data to focus on mistakes • 5 minutes
- Normalizing weights • 2 minutes
- Example of AdaBoost in action • 5 minutes
- Learning boosted decision stumps with AdaBoost • 4 minutes
- The Boosting Theorem • 4 minutes
- Overfitting in boosting • 6 minutes
- Ensemble methods, impact of boosting & quick recap • 4 minutes
3 readings • Total 30 minutes
- Slides presented in this module • 10 minutes
- Exploring Ensemble Methods • 10 minutes
- Boosting a decision stump • 10 minutes
3 assignments • Total 90 minutes
- Exploring Ensemble Methods • 30 minutes
- Boosting • 30 minutes
- Boosting a decision stump • 30 minutes
In many real-world settings, accuracy or error are not the best quality metrics for classification. You will explore a case study that highlights this issue: using sentiment analysis to display positive reviews on a restaurant website. Instead of accuracy, you will define two metrics, precision and recall, which are widely used in real-world applications to measure the quality of classifiers. You will explore how the probabilities output by your classifier can be used to trade off precision against recall, and dive into this spectrum using precision-recall curves. In your hands-on implementation, you will compute these metrics with your learned classifier on real-world sentiment analysis data.
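The trade-off described above can be seen by computing both metrics at different probability thresholds. In this sketch the labels and predicted probabilities are made up, and returning 1.0 when a denominator is zero is a convention assumed here for simplicity.

```python
def precision_recall(labels, probabilities, threshold=0.5):
    """Precision and recall when predicting +1 whenever P(y = +1 | x) >= threshold."""
    predicted = [+1 if p >= threshold else -1 for p in probabilities]
    true_pos = sum(1 for y, p in zip(labels, predicted) if y == +1 and p == +1)
    false_pos = sum(1 for y, p in zip(labels, predicted) if y == -1 and p == +1)
    false_neg = sum(1 for y, p in zip(labels, predicted) if y == +1 and p == -1)
    # Assumed convention: report 1.0 when there is nothing to be wrong about.
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 1.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 1.0
    return precision, recall

# Made-up labels and classifier probabilities for five reviews.
labels = [+1, +1, +1, -1, -1]
probabilities = [0.95, 0.7, 0.4, 0.6, 0.1]

strict = precision_recall(labels, probabilities, threshold=0.9)
default = precision_recall(labels, probabilities, threshold=0.5)
lenient = precision_recall(labels, probabilities, threshold=0.3)
```

Raising the threshold makes the classifier pickier: at 0.9 every review it shows is truly positive, but it finds only one of the three positive reviews. Lowering the threshold to 0.3 finds all three at the cost of showing one negative review. Sweeping the threshold over its whole range traces out the precision-recall curve.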
What's included
8 videos, 2 readings, 2 assignments
8 videos • Total 31 minutes
- Case-study where accuracy is not best metric for classification • 4 minutes
- What is good performance for a classifier? • 4 minutes
- Precision: Fraction of positive predictions that are actually positive • 6 minutes
- Recall: Fraction of positive data predicted to be positive • 3 minutes
- Precision-recall extremes • 3 minutes
- Trading off precision and recall • 5 minutes
- Precision-recall curve • 6 minutes
- Recap of precision-recall • 1 minute
2 readings • Total 20 minutes
- Slides presented in this module • 10 minutes
- Exploring precision and recall • 10 minutes
2 assignments • Total 60 minutes
- Precision-Recall • 30 minutes
- Exploring precision and recall • 30 minutes
With the advent of the internet, the growth of social media, and the embedding of sensors in the world, the amount of data that our machine learning algorithms must handle has grown tremendously over the last decade. This effect is sometimes called "Big Data". Thus, our learning algorithms must scale to bigger and bigger datasets. In this module, you will develop a small modification of gradient ascent called stochastic gradient, which provides significant speedups in the running time of our algorithms. This simple change can drastically improve scaling, but makes the algorithm less stable and harder to use in practice. In this module, you will investigate the practical techniques needed to make stochastic gradient viable, and thus to obtain learning algorithms that scale to huge datasets. You will also address a new kind of machine learning problem, online learning, where the data streams in over time, and we must learn the coefficients as the data arrives. This task can also be solved with stochastic gradient. You will implement your very own stochastic gradient ascent algorithm for logistic regression from scratch, and evaluate it on sentiment analysis data.
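The small modification mentioned above amounts to updating the coefficients after each data point instead of after a full pass over the data. The sketch below applies it to logistic regression with labels in {+1, -1}; shuffling the data each pass and returning the average of the coefficients rather than the last ones are included as illustrative practical techniques. The data, step size and names are assumptions, not the course's reference code.

```python
import math
import random

def prob_positive(features, weights):
    """P(y = +1 | x, w) under the logistic regression model."""
    score = sum(w * h for w, h in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-score))

def stochastic_gradient_ascent(X, y, step_size=0.1, passes=20, seed=0):
    """Update the weights one data point at a time; return the averaged weights."""
    rng = random.Random(seed)
    order = list(range(len(X)))
    weights = [0.0] * len(X[0])
    weight_sums = [0.0] * len(X[0])
    updates = 0
    for _ in range(passes):
        rng.shuffle(order)  # visit the data in a fresh random order each pass
        for i in order:
            error = (1.0 if y[i] == +1 else 0.0) - prob_positive(X[i], weights)
            for j in range(len(weights)):
                # Gradient contribution of the single data point i.
                weights[j] += step_size * error * X[i][j]
                weight_sums[j] += weights[j]
            updates += 1
    # The last coefficients are noisy, so report their running average instead.
    return [total / updates for total in weight_sums]

# Made-up data: feature 0 is a constant 1 (intercept), feature 1 is a single input.
X = [[1.0, 2.0], [1.0, 1.0], [1.0, -1.0], [1.0, -2.0], [1.0, 0.5], [1.0, -0.5]]
y = [+1, +1, -1, -1, -1, +1]
averaged = stochastic_gradient_ascent(X, y)
```

Because each update touches a single data point, the same loop also fits the online learning setting: replace the shuffled passes with a stream of incoming examples.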
What's included
16 videos, 2 readings, 2 assignments
16 videos • Total 52 minutes
- Gradient ascent won't scale to today's huge datasets • 3 minutes
- Timeline of scalable machine learning & stochastic gradient • 4 minutes
- Why gradient ascent won't scale • 4 minutes
- Stochastic gradient: Learning one data point at a time • 3 minutes
- Comparing gradient to stochastic gradient • 4 minutes
- Why would stochastic gradient ever work? • 4 minutes
- Convergence paths • 2 minutes
- Shuffle data before running stochastic gradient • 2 minutes
- Choosing step size • 4 minutes
- Don't trust last coefficients • 2 minutes
- (OPTIONAL) Learning from batches of data • 4 minutes
- (OPTIONAL) Measuring convergence • 4 minutes
- (OPTIONAL) Adding regularization • 3 minutes
- The online learning task • 3 minutes
- Using stochastic gradient for online learning • 4 minutes
- Scaling to huge datasets through parallelization & module recap • 2 minutes
2 readings • Total 20 minutes
- Slides presented in this module • 10 minutes
- Training Logistic Regression via Stochastic Gradient Ascent • 10 minutes
2 assignments • Total 60 minutes
- Scaling to Huge Datasets & Online Learning • 30 minutes
- Training Logistic Regression via Stochastic Gradient Ascent • 30 minutes
Earn a career certificate
Add this credential to your LinkedIn profile, resume, or CV. Share it on social media and in your performance review.
Offered by
University of Washington
Learner reviews
- 5 stars
76.78%
- 4 stars
18.58%
- 3 stars
3.04%
- 2 stars
0.61%
- 1 star
0.96%
Showing 3 of 3739
Reviewed on May 11, 2016
A bit easy to get through the exercises but otherwise a very enlightening and inspiring course. - This is btw a positive review if anybody should be in doubt after taking this course :)
Reviewed on Jun 23, 2017
Great course. I learned a lot about Classification theories as well as practical issues. The assignments are very informative, providing complementary understanding to the lectures.
Reviewed on Apr 14, 2017
Extremely clear and informative. Good introduction to ML. I felt the labs could have had us write a little more of our own code, and would have been better to use non-proprietary libraries.
Frequently asked questions
To access the course materials, assignments and to earn a Certificate, you will need to purchase the Certificate experience when you enroll in a course. You can try a Free Trial instead, or apply for Financial Aid. The course may offer 'Full Course, No Certificate' instead. This option lets you see all course materials, submit required assessments, and get a final grade. This also means that you will not be able to purchase a Certificate experience.
When you enroll in the course, you get access to all of the courses in the Specialization, and you earn a certificate when you complete the work. Your electronic Certificate will be added to your Accomplishments page - from there, you can print your Certificate or add it to your LinkedIn profile.
Yes. In select learning programs, you can apply for financial aid or a scholarship if you can't afford the enrollment fee. If financial aid or a scholarship is available for your learning program selection, you'll find a link to apply on the description page.
