Machine Learning for Data Analysis
This course is part of Data Analysis and Interpretation Specialization
Instructors: Jen Rose
47,202 already enrolled
328 reviews
Skills you'll gain
- Machine Learning Algorithms
- Statistical Machine Learning
- Machine Learning Methods
- Model Evaluation
- Exploratory Data Analysis
- Predictive Analytics
- Data Analysis
- Random Forest Algorithm
- Feature Engineering
- Unsupervised Learning
- Classification And Regression Tree (CART)
- Regression Analysis
- Decision Tree Learning
- Applied Machine Learning
- Predictive Modeling
- Machine Learning
Build your subject-matter expertise
- Learn new concepts from industry experts
- Gain a foundational understanding of a subject or tool
- Develop job-relevant skills with hands-on projects
- Earn a shareable career certificate
There are 4 modules in this course
Are you interested in predicting future outcomes using your data? This course helps you do just that! Machine learning is the process of developing, testing, and applying predictive algorithms to achieve this goal. Make sure to familiarize yourself with course 3 of this specialization before diving into these machine learning concepts. Building on Course 3, which introduces students to integral supervised machine learning concepts, this course will provide an overview of many additional concepts, techniques, and algorithms in machine learning, from basic classification to decision trees and clustering. By completing this course, you will learn how to apply, test, and interpret machine learning algorithms as alternative methods for addressing your research questions.
In this session, you will learn about decision trees, a type of data mining algorithm that can select, from among a large number of variables, those variables and interactions that are most important in predicting the target or response variable to be explained. Decision trees create segmentations or subgroups in the data by repeatedly applying a series of simple rules or criteria that choose the variable constellations that best predict the target variable.
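The "simple rules" a decision tree applies can be illustrated with a single split. The sketch below is not the course's own code: it assumes a binary target, rules of the form "value <= threshold", and Gini impurity as the splitting criterion, and the toy data and column meanings are hypothetical. It searches every variable and threshold for the one rule that produces the purest pair of subgroups.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means perfectly pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows, labels):
    """Return (feature_index, threshold, weighted_impurity) for the best
    rule of the form 'row[feature_index] <= threshold'."""
    best = (None, None, gini(labels))  # start from the unsplit impurity
    n = len(rows)
    for j in range(len(rows[0])):
        for t in sorted({row[j] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[j] <= t]
            right = [y for row, y in zip(rows, labels) if row[j] > t]
            if not left or not right:
                continue  # a rule must create two non-empty subgroups
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best[2]:
                best = (j, t, score)
    return best


# Hypothetical toy data: columns are [age, weekly_hours_of_exercise].
rows = [[15, 1], [16, 5], [17, 2], [18, 6], [19, 1], [20, 7]]
labels = [1, 0, 1, 0, 1, 0]

feature, threshold, impurity = best_split(rows, labels)
print(feature, threshold, impurity)
```

A full tree would repeat this search inside each resulting subgroup until a stopping rule is met; here the second column separates the two classes with a single rule, while the first column is not selected.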
What's included
7 videos, 15 readings, 1 peer review
7 videos • Total 40 minutes
- What Is Machine Learning? • 2 minutes
- Machine Learning and the Bias Variance Trade-Off • 6 minutes
- What Is a Decision Tree? • 5 minutes
- What is the Process of Growing a Decision Tree? • 4 minutes
- Building a Decision Tree with SAS • 9 minutes
- Strengths and Weaknesses of Decision Trees in SAS • 4 minutes
- Building a Decision Tree with Python • 9 minutes
15 readings • Total 150 minutes
- Some Guidance for Learners New to the Specialization • 10 minutes
- SAS or Python - Which to Choose? • 10 minutes
- Getting Started with SAS • 10 minutes
- Getting Started with Python • 10 minutes
- Course Codebooks • 10 minutes
- Course Data Sets • 10 minutes
- Uploading Your Own Data to SAS • 10 minutes
- Data Set for Decision Tree Videos (tree_addhealth.csv) • 10 minutes
- SAS Code: Decision Trees • 10 minutes
- CART Paper - Prevention Science • 10 minutes
- Python Code: Decision Trees • 10 minutes
- Installing Graphviz and pydotplus • 10 minutes
- Getting Set up for Assignments • 10 minutes
- Tumblr Instructions • 10 minutes
- Assignment Example • 10 minutes
1 peer review • Total 60 minutes
- Running a Classification Tree • 60 minutes
In this session, you will learn about random forests, a type of data mining algorithm that can select from among a large number of variables those that are most important in determining the target or response variable to be explained. Unlike decision trees, the results of random forests generalize well to new data.
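One way to see why a forest tends to generalize better than a single tree is to build many small trees on resampled data and let them vote. The sketch below is a simplified illustration, not the course's code: each "tree" is a one-split stump, each stump sees a bootstrap sample and a random subset of the variables, and the data, the function names, and the default of 25 trees are all assumptions made for the example.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a non-empty list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def fit_stump(rows, labels, features):
    """Fit a one-split tree that may only use the given candidate features."""
    majority = Counter(labels).most_common(1)[0][0]
    best = None  # (score, feature, threshold, left_label, right_label)
    for j in features:
        for t in sorted({row[j] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[j] <= t]
            right = [y for row, y in zip(rows, labels) if row[j] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if best is None or score < best[0]:
                best = (score, j, t,
                        Counter(left).most_common(1)[0][0],
                        Counter(right).most_common(1)[0][0])
    if best is None:  # no usable split: always predict the majority class
        return lambda row: majority
    _, bj, bt, left_label, right_label = best
    return lambda row: left_label if row[bj] <= bt else right_label


def fit_forest(rows, labels, n_trees=25, seed=0):
    """Grow n_trees stumps, each on a bootstrap sample and a random feature subset."""
    rng = random.Random(seed)
    n, p = len(rows), len(rows[0])
    k = max(1, int(p ** 0.5))  # number of candidate features per stump
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        feats = rng.sample(range(p), k)
        forest.append(fit_stump([rows[i] for i in idx],
                                [labels[i] for i in idx], feats))
    return forest


def predict(forest, row):
    """Majority vote across all trees in the forest."""
    return Counter(tree(row) for tree in forest).most_common(1)[0][0]


# Hypothetical toy data with two informative variables.
rows = [[1, 2], [2, 1], [1, 1], [2, 2], [8, 9], [9, 8], [8, 8], [9, 9]]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

forest = fit_forest(rows, labels)
print(predict(forest, [0, 0]), predict(forest, [10, 10]))
```

Real random forests grow much deeper trees and redraw the feature subset at every split; the stumps here keep the example short while preserving the bootstrap-and-vote idea.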
What's included
4 videos, 4 readings, 1 peer review
4 videos • Total 25 minutes
- What Is A Random Forest and How Is It "Grown"? • 4 minutes
- Building a Random Forest with SAS • 7 minutes
- Building a Random Forest with Python • 6 minutes
- Validation and Cross-Validation • 8 minutes
4 readings • Total 40 minutes
- SAS code: Random Forests • 10 minutes
- The HPForest Procedure in SAS • 10 minutes
- Python Code: Random Forests • 10 minutes
- Assignment Example • 10 minutes
1 peer review • Total 60 minutes
- Running a Random Forest • 60 minutes
Lasso regression analysis is a shrinkage and variable selection method for linear regression models. The goal of lasso regression is to obtain the subset of predictors that minimizes prediction error for a quantitative response variable. The lasso does this by imposing a constraint on the model parameters that causes regression coefficients for some variables to shrink toward zero. Variables with a regression coefficient equal to zero after the shrinkage process are excluded from the model. Variables with non-zero regression coefficients are the ones most strongly associated with the response variable. Explanatory variables can be quantitative, categorical, or both. In this session, you will apply and interpret a lasso regression analysis. You will also develop experience using k-fold cross-validation to select the best fitting model and obtain a more accurate estimate of your model's test error rate. To test a lasso regression model, you will need to identify a quantitative response variable from your data set if you haven't already done so, and choose a few additional quantitative and categorical predictor (i.e., explanatory) variables to develop a larger pool of predictors. Having a larger pool of predictors to test will maximize your experience with lasso regression analysis. Remember that lasso regression is a machine learning method, so your choice of additional predictors does not necessarily need to depend on a research hypothesis or theory. Take some chances, and try some new variables. The lasso regression analysis will help you determine which of your predictors are most important. Note also that if you are working with a relatively small data set, you do not need to split your data into training and test data sets. The cross-validation method you apply is designed to eliminate the need to split your data when you have a limited number of observations.
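The shrinkage described above can be made concrete with a small solver. The sketch below is an illustration under stated assumptions, not the course's code: it uses coordinate descent with soft-thresholding, one common way to fit a lasso, on the penalized form of the problem (minimizing (1/2n) times the squared error plus alpha times the sum of absolute coefficients). It assumes centered predictors and response, so no intercept is fitted, and the data and the penalty value alpha = 0.5 are hypothetical.

```python
def soft_threshold(rho, alpha):
    """Shrink rho toward zero by alpha; values within [-alpha, alpha] become 0.0."""
    if rho > alpha:
        return rho - alpha
    if rho < -alpha:
        return rho + alpha
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_iter=100):
    """Minimize (1/(2n)) * ||y - X b||^2 + alpha * ||b||_1 one coefficient at a time.

    Assumes the columns of X and the response y are already centered and
    that no column of X is all zeros.
    """
    n, p = len(X), len(X[0])
    coef = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of predictor j with the residual that ignores predictor j.
            rho = sum(
                X[i][j] * (y[i] - sum(X[i][k] * coef[k] for k in range(p) if k != j))
                for i in range(n)
            ) / n
            z = sum(X[i][j] ** 2 for i in range(n)) / n
            coef[j] = soft_threshold(rho, alpha) / z
    return coef


# Hypothetical centered data: y depends strongly on x1 and only weakly on x2.
X = [[-1, -1], [1, -1], [-1, 1], [1, 1]]
y = [3 * x1 + 0.1 * x2 for x1, x2 in X]

unpenalized = lasso_coordinate_descent(X, y, alpha=0.0)
coef = lasso_coordinate_descent(X, y, alpha=0.5)
print(unpenalized)  # both predictors keep a non-zero coefficient
print(coef)         # the weak predictor's coefficient is exactly 0.0
```

With the penalty applied, the strong predictor's coefficient is shrunk but survives, while the weak predictor drops out of the model. In practice the penalty is not fixed by hand; it is chosen by comparing prediction error across cross-validation folds.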
What's included
5 videos, 3 readings, 1 peer review
5 videos • Total 32 minutes
- What is Lasso Regression? • 5 minutes
- Testing a Lasso Regression with SAS • 10 minutes
- Data Management for Lasso Regression in Python • 4 minutes
- Testing a Lasso Regression Model in Python • 11 minutes
- Lasso Regression Limitations • 2 minutes
3 readings • Total 30 minutes
- SAS Code: Lasso Regression • 10 minutes
- Python Code: Lasso Regression • 10 minutes
- Assignment Example • 10 minutes
1 peer review • Total 60 minutes
- Running a Lasso Regression Analysis • 60 minutes
Cluster analysis is an unsupervised machine learning method that partitions the observations in a data set into a smaller set of clusters where each observation belongs to only one cluster. The goal of cluster analysis is to group, or cluster, observations into subsets based on their similarity of responses on multiple variables. Clustering variables should be primarily quantitative variables, but binary variables may also be included. In this session, we will show you how to use k-means cluster analysis to identify clusters of observations in your data set. You will gain experience in interpreting cluster analysis results by using graphing methods to help you determine the number of clusters to interpret, and examining clustering variable means to evaluate the cluster profiles. Finally, you will get the opportunity to validate your cluster solution by examining differences between clusters on a variable not included in your cluster analysis. You can use the same variables that you have used in past weeks as clustering variables. If most or all of your previous explanatory variables are categorical, you should identify some additional quantitative clustering variables from your data set. Ideally, most of your clustering variables will be quantitative, although you may also include some binary variables. In addition, you will need to identify a quantitative or binary response variable from your data set that you will not include in your cluster analysis. You will use this variable to validate your clusters by evaluating whether your clusters differ significantly on this response variable using statistical methods, such as analysis of variance or chi-square analysis, which you learned about in Course 2 of the specialization (Data Analysis Tools). Note also that if you are working with a relatively small data set, you do not need to split your data into training and test data sets.
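The partitioning step of k-means can be sketched in a few lines. The example below is a minimal illustration, not the course's code: it assumes squared Euclidean distance, uses the first k observations as starting centroids (a simplification; random or smarter initialization is usual in practice), and runs on hypothetical two-variable data.

```python
def kmeans(points, k, n_iter=100):
    """Assign each observation to exactly one of k clusters.

    Alternates between assigning points to their nearest centroid and
    moving each centroid to the mean of its members, stopping when the
    assignments no longer change.
    """
    centroids = [list(p) for p in points[:k]]  # simplistic initialization
    labels = None
    for _ in range(n_iter):
        new_labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        if new_labels == labels:
            break  # converged
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical data: two quantitative clustering variables per observation.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]

labels, centroids = kmeans(points, k=2)
print(labels)
print(centroids)
```

The centroids returned here are the clustering variable means for each cluster, which is the information used to describe cluster profiles. Validating the solution would then mean comparing the clusters on a variable that was left out of `points`.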
What's included
6 videos, 3 readings, 1 peer review
6 videos • Total 42 minutes
- What Is a k-Means Cluster Analysis? • 7 minutes
- Running a k-Means Cluster Analysis in SAS, pt. 1 • 8 minutes
- Running a k-Means Cluster Analysis in SAS, pt. 2 • 6 minutes
- Running a k-Means Cluster Analysis in Python, pt. 1 • 8 minutes
- Running a k-Means Cluster Analysis in Python, pt. 2 • 10 minutes
- k-Means Cluster Analysis Limitations • 3 minutes
3 readings • Total 30 minutes
- SAS Code: k-Means Cluster Analysis • 10 minutes
- Python Code: k-Means Cluster Analysis • 10 minutes
- Assignment Example • 10 minutes
1 peer review • Total 60 minutes
- Running a k-means Cluster Analysis • 60 minutes
Learner reviews
- 5 stars: 56.70%
- 4 stars: 25.60%
- 3 stars: 7.62%
- 2 stars: 3.96%
- 1 star: 6.09%
Showing 3 of 328
Reviewed on Mar 21, 2016
More examples in coding and results are expected. So it is more convenient for students to compare different results and understand deeper
Reviewed on Oct 4, 2016
Very good course. I recommend to anyone who's interested in data analysis and machine learning.
Reviewed on Apr 26, 2020
Since it is a part of a specialization, the topics start somewhere in between and is only recommended for those who have completed the previous courses with in these specialization.
Frequently asked questions
To access the course materials, assignments and to earn a Certificate, you will need to purchase the Certificate experience when you enroll in a course. You can try a Free Trial instead, or apply for Financial Aid. The course may offer 'Full Course, No Certificate' instead. This option lets you see all course materials, submit required assessments, and get a final grade. This also means that you will not be able to purchase a Certificate experience.
When you enroll in the course, you get access to all of the courses in the Specialization, and you earn a certificate when you complete the work. Your electronic Certificate will be added to your Accomplishments page - from there, you can print your Certificate or add it to your LinkedIn profile.
Yes. In select learning programs, you can apply for financial aid or a scholarship if you can't afford the enrollment fee. If financial aid or a scholarship is available for your learning program selection, you'll find a link to apply on the description page.
