Statistics and Clustering in Python

This course is part of the Data Science Foundations Specialization

Instructor: Robert Zimmer

2,915 already enrolled

4 modules

Gain insight into a topic and learn the fundamentals.

4.5

22 reviews

Beginner level

No prior experience required

2 weeks to complete

at 10 hours a week

Flexible schedule

Learn at your own pace

What you'll learn

In this course you will engage in a variety of mathematical and programming exercises while completing a data clustering project.

Details to know

Shareable certificate

Add to your LinkedIn profile

Assessments

34 assignments (AI graded)

Taught in English

Build your subject-matter expertise

This course is part of the Data Science Foundations Specialization

When you enroll in this course, you'll also be enrolled in this Specialization.

Learn new concepts from industry experts
Gain a foundational understanding of a subject or tool
Develop job-relevant skills with hands-on projects
Earn a shareable career certificate

There are 4 modules in this course

This course is the sixth of eight in the specialization. It provides an in-depth exploration of key data science concepts, with a focus on algorithm design, and strengthens the mathematics, statistics, and programming skills needed for common data analysis tasks. You will work through a variety of mathematical and programming exercises while completing a data clustering project that applies the K-means algorithm to a provided dataset.

This week, we will delve into the core concepts of mean, variance, and other basic statistics, laying the groundwork for a solid understanding of data analysis principles. Through hands-on exercises and demonstrations in Python and Jupyter notebooks, we'll explore practical techniques for calculating and interpreting statistical measures.
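As a rough illustration of the statistics named in this module, the sketch below computes the mean, variance, and standard deviation of a one-dimensional list. It uses Python's standard `statistics` module rather than the course's own notebooks, and the `values` list is made-up example data, not course material. Both the population and the sample versions are shown, since the readings distinguish the two.

```python
import statistics

# Hypothetical example data (not taken from the course).
values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

# Mean: sum of the values divided by how many there are.
mean = statistics.mean(values)

# Population variance and standard deviation: divide by n.
pop_var = statistics.pvariance(values)
pop_std = statistics.pstdev(values)

# Sample variance and standard deviation: divide by n - 1,
# which corrects the bias when the data is a sample of a larger population.
sample_var = statistics.variance(values)
sample_std = statistics.stdev(values)

print(mean, pop_var, pop_std, sample_var, sample_std)
```

The sample variance is always slightly larger than the population variance for the same data, because it divides by n - 1 instead of n.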

What's included

10 videos • 7 readings • 10 assignments • 1 peer review • 1 ungraded lab

10 videos•Total 38 minutes

Introduction to this course in the specialisation•2 minutes
Introduction to Mathematical Concepts of Data Clustering•2 minutes
Mean of One Dimensional Lists•2 minutes
Variance and Standard Deviation•4 minutes
Jupyter Notebooks•6 minutes
Variables•4 minutes
Lists•5 minutes
Computing the Mean•3 minutes
Better Lists: NumPy•4 minutes
Computing the Standard Deviation•6 minutes

7 readings•Total 75 minutes

Course Syllabus•10 minutes
Getting ready for this course•10 minutes
Population vs Sample, Bias•10 minutes
Variability, Standard Deviation and Bias•10 minutes
How to back-up your virtual lab work•5 minutes
Python Style Guide•10 minutes
Numpy and Array Creation•20 minutes

10 assignments•Total 84 minutes

Population vs Sample – Review Information•10 minutes
Mean of One-Dimensional Lists – Review Information•3 minutes
Variance and Standard Deviation – Review Information•3 minutes
Jupyter Notebooks – Review Information•5 minutes
Variables – Review Information•5 minutes
Lists – Review Information•5 minutes
Computing the Mean – Review Information•3 minutes
Better Lists – Review Information•5 minutes
Computing the Standard Deviation – Review Information•5 minutes
Week 1 Summative Assessment•40 minutes

1 peer review•Total 30 minutes

Use Jupyter Notebooks•30 minutes

1 ungraded lab•Total 15 minutes

Jupyter Notebook Environment•15 minutes

This week, we will explore mathematics for multidimensional data. You will also learn how to work with multidimensional data in Python.
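The ideas in this module (multidimensional mean, distance to the mean, and normalisation) can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the `points` data is invented, the helper names `mean_point` and `min_max_normalise` are hypothetical, Euclidean distance is assumed as the distance metric, and min-max scaling is assumed as the normalisation method, which may differ from the one taught in the course.

```python
import math

# Hypothetical 2D data points (feature_1, feature_2); not course data.
points = [(1.0, 10.0), (2.0, 20.0), (3.0, 30.0), (6.0, 60.0)]


def mean_point(pts):
    """Return the per-feature mean of a list of equal-length tuples."""
    n = len(pts)
    return tuple(sum(column) / n for column in zip(*pts))


def min_max_normalise(pts):
    """Rescale every feature to the range [0, 1].

    Assumes each feature takes at least two distinct values,
    otherwise the division below would be by zero.
    """
    columns = list(zip(*pts))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        tuple((x - lo) / (hi - lo) for x, lo, hi in zip(p, lows, highs))
        for p in pts
    ]


centre = mean_point(points)

# Euclidean distance from each point to the mean, via a list comprehension.
distances = [math.dist(p, centre) for p in points]

normalised = min_max_normalise(points)

print(centre)
print(distances)
print(normalised)
```

Without normalisation the second feature dominates every distance because its values are ten times larger; after rescaling, both features contribute equally.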

What's included

14 videos • 10 readings • 14 assignments

14 videos•Total 52 minutes

Multidimensional Data Points and Features•2 minutes
Multidimensional Mean•3 minutes
Dispersion: Multidimensional Variables•3 minutes
Distance Metrics•5 minutes
Normalisation•1 minute
Outliers•1 minute
Basic Plotting•3 minutes
Storing 2D Coordinates in a Single Data Structure•6 minutes
Multidimensional Mean•5 minutes
Adding Graphical Overlays•6 minutes
Calculating the Distance to the Mean•4 minutes
List Comprehension•4 minutes
Normalisation in Python•6 minutes
Outliers and Plotting Normalised Data•3 minutes

10 readings•Total 120 minutes

Multidimensional Data Points and Features Recap•10 minutes
Multidimensional Mean Recap•10 minutes
Multidimensional Variables Recap•10 minutes
Distance Metrics Recap•10 minutes
Normalisation Recap•10 minutes
Note on Matplotlib•10 minutes
Matplotlib Scatter Plot Documentation•20 minutes
Matplotlib Patches Documentation•10 minutes
List Comprehension Documentation•20 minutes
Errata•10 minutes

14 assignments•Total 110 minutes

Multidimensional Data Points and Features – Review Information•3 minutes
Multidimensional Mean – Review Information•3 minutes
Dispersion: Multidimensional Variables – Review Information•5 minutes
Distance Metrics – Review Information•10 minutes
Normalisation – Review Information•5 minutes
Outliers – Review Information•5 minutes
Basic Plotting – Review Information•6 minutes
Storing 2D Coordinates – Review Information•5 minutes
Multidimensional Mean – Review Information•5 minutes
Adding Graphical Overlays – Review Information•10 minutes
Calculating Distance – Review Information•5 minutes
Normalisation in Python – Review Information•5 minutes
Outliers – Review Information•3 minutes
Week 2 Summative Assessment•40 minutes

This week, we will explore data manipulation and visualisation with Python's Pandas library. You will learn to read CSV files, sort and filter data, and label points on a plot, so that you can manipulate, analyse, and interpret data efficiently.
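A minimal sketch of the kind of Pandas workflow this module covers is shown below: reading CSV data, sorting it, and filtering rows. The column names and values are invented for illustration and are not the course's happiness dataset; the CSV is read from an in-memory string so that the example runs without any external file.

```python
import io

import pandas as pd

# Hypothetical CSV content; column names and values are made up.
csv_text = """country,happiness,gdp
A,7.5,1.4
B,5.1,0.9
C,6.3,1.2
D,4.2,0.5
"""

# read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

# Sort rows from highest to lowest happiness score.
ranked = df.sort_values("happiness", ascending=False)

# Keep only the rows whose happiness score is above 6.0.
happy = df[df["happiness"] > 6.0]

print(ranked)
print(happy)
```

With a real file, the in-memory string would be replaced by the file's path in the call to `pd.read_csv`.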

What's included

6 videos • 6 readings • 7 assignments • 1 peer review

6 videos•Total 36 minutes

Using the Pandas Library to Read csv Files•5 minutes
Sorting and Filtering Data Using Pandas•8 minutes
Labelling Points on a Graph•4 minutes
Labelling all the Points on a Graph•3 minutes
Eyeballing the Data•6 minutes
Using K-Means to Interpret the Data•9 minutes

6 readings•Total 60 minutes

Code Resources•5 minutes
Pandas Read_CSV Function•15 minutes
More Pandas Library Documentation•10 minutes
The Pyplot Text Function•10 minutes
For Loops in Python•10 minutes
Documentation for sklearn.cluster.KMeans•10 minutes

7 assignments•Total 68 minutes

Using the Pandas Library to Read csv Files – Review Information•5 minutes
Sorting and Filtering Data Using Pandas – Review Information•5 minutes
Labelling Points on a Graph – Review Information•5 minutes
Labelling all the Points on a Graph – Review Information•5 minutes
Eyeballing the Data – Review Information•5 minutes
Using K-Means to Interpret the Data – Review Information•3 minutes
Week 3 Summative Assessment•40 minutes

1 peer review•Total 60 minutes

Create a Labelled Plot of the Happiness Data•60 minutes

This week, we will study unsupervised learning, where patterns are found in data without labelled examples. You will implement the K-means algorithm to solve a real-world problem.
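To make the K-means idea concrete, here is a small from-scratch sketch of the standard algorithm (often called Lloyd's algorithm) in plain Python. It is not the course's project code: the course readings point to `sklearn.cluster.KMeans`, whereas the `k_means` function, its parameters, and the `points` data below are hypothetical and exist only to show the two alternating steps, assignment and update.

```python
import math
import random


def k_means(points, k, iterations=100, seed=0):
    """Cluster points into k groups; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Start from k distinct data points chosen at random.
    centroids = rng.sample(points, k)
    labels = [0] * len(points)

    for _ in range(iterations):
        # Assignment step: label each point with its nearest centroid.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]

        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(c) / len(members) for c in zip(*members))
                )
            else:
                # Keep an empty cluster's centroid where it was.
                new_centroids.append(centroids[j])

        # Stop once the centroids no longer move.
        if new_centroids == centroids:
            break
        centroids = new_centroids

    return centroids, labels


# Hypothetical data: two well-separated groups of 2D points.
points = [
    (1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
    (8.0, 8.0), (9.0, 8.5), (8.5, 9.0),
]

centroids, labels = k_means(points, k=2)
print(centroids)
print(labels)
```

The result of K-means can depend on the starting centroids, which is why the sketch fixes a random seed; on data this well separated, the two groups are recovered regardless of the start.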

What's included

8 videos • 3 readings • 3 assignments • 3 peer reviews • 5 discussion prompts

8 videos•Total 28 minutes

Can a Machine Detect Fake Notes?•2 minutes
Working for a Client•5 minutes
How to Organize Work on Your Project•4 minutes
Dealing With Difficulties•3 minutes
No Data no Data Science: Introduction of the Dataset•5 minutes
Modelling•5 minutes
Presenting the Project Results•3 minutes
End of course•1 minute

3 readings•Total 25 minutes

Week 4 Code Resource – the Dataset for our Project•10 minutes
Saving plt.scatter Outputs as Figures•10 minutes
Additional Recommended Reading for Week 4•5 minutes

3 assignments•Total 22 minutes

How Would You Help? – Review Information•2 minutes
Python – Review Information•5 minutes
Week 4 Summative Assessment•15 minutes

3 peer reviews•Total 180 minutes

Exploratory Data Analysis•60 minutes
Clustering•60 minutes
Your Report•60 minutes

5 discussion prompts•Total 70 minutes

What Is Required to Train a Machine to Detect Fake Notes?•10 minutes
Your Project Plan•30 minutes
Self-reflection•10 minutes
Tips for Other Learners•10 minutes
Do You have Data Science Plans?•10 minutes

Instructor

Robert Zimmer

University of London

5 Courses•20,995 learners

Frequently asked questions

To access the course materials, assignments and to earn a Certificate, you will need to purchase the Certificate experience when you enroll in a course. You can try a Free Trial instead, or apply for Financial Aid. The course may offer 'Full Course, No Certificate' instead. This option lets you see all course materials, submit required assessments, and get a final grade. This also means that you will not be able to purchase a Certificate experience.

When you enroll in the course, you get access to all of the courses in the Specialization, and you earn a certificate when you complete the work. Your electronic Certificate will be added to your Accomplishments page - from there, you can print your Certificate or add it to your LinkedIn profile.

Yes. In select learning programs, you can apply for financial aid or a scholarship if you can't afford the enrollment fee. If financial aid or a scholarship is available for your chosen learning program, you'll find a link to apply on the description page.

URL: https://www.coursera.org/learn/statistics-and-clustering-in-python