Statistics and Clustering in Python
This course is part of Data Science Foundations Specialization
Instructor: Robert Zimmer
2,915 already enrolled
22 reviews
What you'll learn
In this course you will engage in a variety of mathematical and programming exercises while completing a data clustering project.
There are 4 modules in this course
This course is the sixth of eight courses in the specialization. It explores key data science concepts with a focus on algorithm design, and strengthens the mathematics, statistics, and programming skills needed for common data analysis tasks. You will work through a variety of mathematical and programming exercises while completing a data clustering project that applies the K-means algorithm to a provided dataset.
This week, we will delve into the core concepts of mean, variance, and other basic statistics, laying the groundwork for a solid understanding of data analysis principles. Through hands-on exercises and demonstrations in Python and Jupyter notebooks, we'll explore practical techniques for calculating and interpreting statistical measures.
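As a rough illustration of the statistics this module names, the sketch below computes a mean, variance, and standard deviation, first with plain Python lists and then with NumPy. The data values are invented for the example and are not taken from the course materials. Showing both the population form (divide by n) and the sample form (divide by n - 1) is an assumption about what the readings on population vs sample cover.

```python
import numpy as np

# Hypothetical one-dimensional data, invented for illustration only.
values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

# Mean: the sum of the items divided by how many there are.
mean = sum(values) / len(values)

# Squared deviation of every item from the mean.
squared_deviations = [(v - mean) ** 2 for v in values]

# Population variance divides by n; sample variance divides by n - 1.
population_variance = sum(squared_deviations) / len(values)
sample_variance = sum(squared_deviations) / (len(values) - 1)

# Standard deviation is the square root of the variance.
population_std = population_variance ** 0.5

# NumPy equivalents: ddof=0 (the default) gives the population form,
# ddof=1 gives the sample form.
arr = np.array(values)
numpy_mean = arr.mean()
numpy_population_std = arr.std()
numpy_sample_variance = arr.var(ddof=1)
```

For small samples the two variance forms can differ noticeably, so it is worth checking which one a given library call returns by default.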
What's included
10 videos, 7 readings, 10 assignments, 1 peer review, 1 ungraded lab
10 videos • Total 38 minutes
- Introduction to this course in the specialisation • 2 minutes
- Introduction to Mathematical Concepts of Data Clustering • 2 minutes
- Mean of One Dimensional Lists • 2 minutes
- Variance and Standard Deviation • 4 minutes
- Jupyter Notebooks • 6 minutes
- Variables • 4 minutes
- Lists • 5 minutes
- Computing the Mean • 3 minutes
- Better Lists: NumPy • 4 minutes
- Computing the Standard Deviation • 6 minutes
7 readings • Total 75 minutes
- Course Syllabus • 10 minutes
- Getting ready for this course • 10 minutes
- Population vs Sample, Bias • 10 minutes
- Variability, Standard Deviation and Bias • 10 minutes
- How to back-up your virtual lab work • 5 minutes
- Python Style Guide • 10 minutes
- Numpy and Array Creation • 20 minutes
10 assignments • Total 84 minutes
- Population vs Sample – Review Information • 10 minutes
- Mean of One-Dimensional Lists – Review Information • 3 minutes
- Variance and Standard Deviation – Review Information • 3 minutes
- Jupyter Notebooks – Review Information • 5 minutes
- Variables – Review Information • 5 minutes
- Lists – Review Information • 5 minutes
- Computing the Mean – Review Information • 3 minutes
- Better Lists – Review Information • 5 minutes
- Computing the Standard Deviation – Review Information • 5 minutes
- Week 1 Summative Assessment • 40 minutes
1 peer review • Total 30 minutes
- Use Jupyter Notebooks • 30 minutes
1 ungraded lab • Total 15 minutes
- Jupyter Notebook Environment • 15 minutes
This week, we will explore mathematics for multidimensional data. You will also learn how to work with multidimensional data in Python.
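The multidimensional ideas in this module (mean, distance to the mean, normalisation) can be sketched with NumPy as follows. The points are invented for the example, and the choices of Euclidean distance and min-max scaling are assumptions; the course may use other distance metrics or normalisation schemes.

```python
import numpy as np

# Hypothetical 2D data: each row is a data point, each column a feature.
points = np.array([
    [1.0, 10.0],
    [2.0, 20.0],
    [3.0, 30.0],
    [6.0, 60.0],
])

# Multidimensional mean: average each feature (column) separately.
centre = points.mean(axis=0)

# Euclidean distance from every point to the mean point.
distances = np.linalg.norm(points - centre, axis=1)

# Min-max normalisation (one possible scheme): rescale each feature to [0, 1]
# so that features with large raw ranges do not dominate the distances.
mins = points.min(axis=0)
maxs = points.max(axis=0)
normalised = (points - mins) / (maxs - mins)
```

After normalisation both columns span the same range, so a distance computed on `normalised` weights the two features equally.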
What's included
14 videos, 10 readings, 14 assignments
14 videos • Total 52 minutes
- Multidimensional Data Points and Features • 2 minutes
- Multidimensional Mean • 3 minutes
- Dispersion: Multidimensional Variables • 3 minutes
- Distance Metrics • 5 minutes
- Normalisation • 1 minute
- Outliers • 1 minute
- Basic Plotting • 3 minutes
- Storing 2D Coordinates in a Single Data Structure • 6 minutes
- Multidimensional Mean • 5 minutes
- Adding Graphical Overlays • 6 minutes
- Calculating the Distance to the Mean • 4 minutes
- List Comprehension • 4 minutes
- Normalisation in Python • 6 minutes
- Outliers and Plotting Normalised Data • 3 minutes
10 readings • Total 120 minutes
- Multidimensional Data Points and Features Recap • 10 minutes
- Multidimensional Mean Recap • 10 minutes
- Multidimensional Variables Recap • 10 minutes
- Distance Metrics Recap • 10 minutes
- Normalisation Recap • 10 minutes
- Note on Matplotlib • 10 minutes
- Matplotlib Scatter Plot Documentation • 20 minutes
- Matplotlib Patches Documentation • 10 minutes
- List Comprehension Documentation • 20 minutes
- Errata • 10 minutes
14 assignments • Total 110 minutes
- Multidimensional Data Points and Features – Review Information • 3 minutes
- Multidimensional Mean – Review Information • 3 minutes
- Dispersion: Multidimensional Variables – Review Information • 5 minutes
- Distance Metrics – Review Information • 10 minutes
- Normalisation – Review Information • 5 minutes
- Outliers – Review Information • 5 minutes
- Basic Plotting – Review Information • 6 minutes
- Storing 2D Coordinates – Review Information • 5 minutes
- Multidimensional Mean – Review Information • 5 minutes
- Adding Graphical Overlays – Review Information • 10 minutes
- Calculating Distance – Review Information • 5 minutes
- Normalisation in Python – Review Information • 5 minutes
- Outliers – Review Information • 3 minutes
- Week 2 Summative Assessment • 40 minutes
This week, we will explore data manipulation and visualisation with Python's Pandas library. You will use Pandas to read, sort, filter, and interpret data, and to label points on plots.
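A minimal sketch of the Pandas operations this module lists (reading CSV data, sorting, filtering) might look like the following. The CSV text, column names, and threshold are hypothetical and are not the course's dataset; the data is read from an in-memory string so the example runs without a file on disk.

```python
import io

import pandas as pd

# Hypothetical CSV content, invented for illustration only.
csv_text = """name,score,value
a,3.5,10
b,7.1,40
c,5.0,25
"""

# read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

# Sort rows by one column, highest first.
sorted_df = df.sort_values("score", ascending=False)

# Filter rows with a boolean mask (the threshold 4.0 is arbitrary).
high_scores = df[df["score"] > 4.0]
```

With a real file, the `io.StringIO(...)` argument would be replaced by the file's path.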
What's included
6 videos, 6 readings, 7 assignments, 1 peer review
6 videos • Total 36 minutes
- Using the Pandas Library to Read csv Files • 5 minutes
- Sorting and Filtering Data Using Pandas • 8 minutes
- Labelling Points on a Graph • 4 minutes
- Labelling all the Points on a Graph • 3 minutes
- Eyeballing the Data • 6 minutes
- Using K-Means to Interpret the Data • 9 minutes
6 readings • Total 60 minutes
- Code Resources • 5 minutes
- Pandas Read_CSV Function • 15 minutes
- More Pandas Library Documentation • 10 minutes
- The Pyplot Text Function • 10 minutes
- For Loops in Python • 10 minutes
- Documentation for sklearn.cluster.KMeans • 10 minutes
7 assignments • Total 68 minutes
- Using the Pandas Library to Read csv Files – Review Information • 5 minutes
- Sorting and Filtering Data Using Pandas – Review Information • 5 minutes
- Labelling Points on a Graph – Review Information • 5 minutes
- Labelling all the Points on a Graph – Review Information • 5 minutes
- Eyeballing the Data – Review Information • 5 minutes
- Using K-Means to Interpret the Data – Review Information • 3 minutes
- Week 3 Summative Assessment • 40 minutes
1 peer review • Total 60 minutes
- Create a Labelled Plot of the Happiness Data • 60 minutes
This week, we will explore unsupervised learning, where patterns are found in data without labelled examples to guide the process. You will implement the K-means algorithm to solve a real-world problem.
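For orientation, K-means can be sketched in a few lines of NumPy. This is a simplified version of the standard Lloyd's iteration, not the course's own implementation: the function name `k_means`, the fixed iteration count, the random initialisation from the data points, and the toy points are all assumptions made for this example.

```python
import numpy as np


def assign_labels(points, centroids):
    """Return, for each point, the index of its nearest centroid (Euclidean)."""
    distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    return distances.argmin(axis=1)


def k_means(points, k, n_iterations=10, seed=0):
    """Simplified K-means sketch; returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    # Initialise with k distinct data points chosen at random.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = assign_labels(points, centroids)
        # Update step: move each centroid to the mean of its points.
        # An empty cluster keeps its previous centroid.
        centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
    # Final assignment so the labels match the returned centroids.
    return centroids, assign_labels(points, centroids)


# Hypothetical toy data: two well-separated groups of three points each.
toy_points = np.array([
    [0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
    [10.0, 10.0], [10.0, 11.0], [11.0, 10.0],
])
toy_centroids, toy_labels = k_means(toy_points, k=2)
```

In practice a library implementation such as `sklearn.cluster.KMeans`, which the Week 3 readings reference, adds better initialisation and a convergence check; the version above only shows the two alternating steps.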
What's included
8 videos, 3 readings, 3 assignments, 3 peer reviews, 5 discussion prompts
8 videos • Total 28 minutes
- Can a Machine Detect Fake Notes? • 2 minutes
- Working for a Client • 5 minutes
- How to Organize Work on Your Project • 4 minutes
- Dealing With Difficulties • 3 minutes
- No Data no Data Science: Introduction of the Dataset • 5 minutes
- Modelling • 5 minutes
- Presenting the Project Results • 3 minutes
- End of course • 1 minute
3 readings • Total 25 minutes
- Week 4 Code Resource – the Dataset for our Project • 10 minutes
- Saving plt.scatter Outputs as Figures • 10 minutes
- Additional Recommended Reading for Week 4 • 5 minutes
3 assignments • Total 22 minutes
- How Would You Help? – Review Information • 2 minutes
- Python – Review Information • 5 minutes
- Week 4 Summative Assessment • 15 minutes
3 peer reviews • Total 180 minutes
- Exploratory Data Analysis • 60 minutes
- Clustering • 60 minutes
- Your Report • 60 minutes
5 discussion prompts • Total 70 minutes
- What Is Required to Train a Machine to Detect Fake Notes? • 10 minutes
- Your Project Plan • 30 minutes
- Self-reflection • 10 minutes
- Tips for Other Learners • 10 minutes
- Do You have Data Science Plans? • 10 minutes
Frequently asked questions
To access the course materials, assignments and to earn a Certificate, you will need to purchase the Certificate experience when you enroll in a course. You can try a Free Trial instead, or apply for Financial Aid. The course may offer 'Full Course, No Certificate' instead. This option lets you see all course materials, submit required assessments, and get a final grade. This also means that you will not be able to purchase a Certificate experience.
When you enroll in the course, you get access to all of the courses in the Specialization, and you earn a certificate when you complete the work. Your electronic Certificate will be added to your Accomplishments page - from there, you can print your Certificate or add it to your LinkedIn profile.
Yes. In select learning programs, you can apply for financial aid or a scholarship if you can't afford the enrollment fee. If financial aid or a scholarship is available for your learning program selection, you'll find a link to apply on the description page.
¹ Some assignments in this course are AI-graded. For these assignments, your data will be used in accordance with Coursera's Privacy Notice.
