URL: https://www.coursera.org/learn/data-io-and-preprocessing-with-python-and-sql

Data I/O and Preprocessing with Python and SQL | Coursera


Instructor: Sean Barnes

Top Instructor

5,124 already enrolled

Gain insight into a topic and learn the fundamentals.

Rating: 4.7 (20 reviews)
Beginner level
3 weeks to complete at 10 hours a week
Flexible schedule: learn at your own pace

What you'll learn

  • You’ll work with real-world data as it exists in practice: messy, unstructured, and spread across sources.

  • You’ll learn to extract data from websites, APIs, and databases, and clean it using both Python and SQL, an essential step in any analysis pipeline.

Details to know

Shareable certificate

Add to your LinkedIn profile

Assessments

16 assignments

Taught in English

Build your Data Analysis expertise

This course is part of the DeepLearning.AI Data Analytics Professional Certificate.
When you enroll in this course, you'll also be enrolled in this Professional Certificate.
  • Learn new concepts from industry experts
  • Gain a foundational understanding of a subject or tool
  • Develop job-relevant skills with hands-on projects
  • Earn a shareable career certificate from DeepLearning.AI

There are 4 modules in this course

Most real-world data isn’t clean: it’s messy, incomplete, and spread across sources like websites, APIs, and databases. In this course, you’ll learn how to collect that data, clean it, and prepare it for analysis using Python and SQL.

You’ll start by extracting data from webpages using tools like Pandas and Beautiful Soup, while also learning how to handle unstructured text and apply ethical scraping practices. Next, you’ll access real-time data through APIs, parse JSON files, and clean numerical data using techniques like normalization and binning. You’ll also learn how to manage authentication with API keys and store them securely. Finally, you’ll work with databases: querying and joining tables using SQL, validating results, and understanding when to use SQL versus Python for different preprocessing tasks. By the end of the course, you’ll be able to turn raw, real-world data into reliable, analysis-ready inputs, a core skill for any data professional.

This module introduces techniques for acquiring data from a wide range of sources, with a focus on web scraping and text processing. You'll begin by exploring how data flows into analysis pipelines and gain hands-on experience using tools like Pandas and Beautiful Soup to extract, clean, and structure data. You'll apply text preprocessing methods to handle missing values and parse HTML. Plus, you’ll consider the ethical implications of scraping data from the web.
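
As a rough illustration of the extract-then-clean workflow described above, here is a minimal sketch. The course itself uses Pandas and Beautiful Soup; this example swaps in Python's standard-library `html.parser` so it runs without extra packages. The HTML snippet, the `TableParser` class, and the `clean_salary` helper are all hypothetical and not taken from the course materials.

```python
from html.parser import HTMLParser

# Hypothetical HTML standing in for a scraped page.
HTML = """
<table>
  <tr><th>Company</th><th>Salary</th></tr>
  <tr><td> Acme Corp </td><td>$120,000</td></tr>
  <tr><td>Globex</td><td>N/A</td></tr>
</table>
"""


class TableParser(HTMLParser):
    """Collect the text of every <th>/<td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th"):
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag == "tr":
            self.rows.append(self._row)
            self._row = None
        elif tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        # Ignore whitespace between tags; keep only text inside cells.
        if self._in_cell:
            self._row[-1] += data


def clean_salary(text):
    """Strip currency formatting and cast to int; return None when missing."""
    text = text.strip().replace("$", "").replace(",", "")
    return int(text) if text.isdigit() else None


parser = TableParser()
parser.feed(HTML)
header, *body = parser.rows
records = [
    {"company": company.strip(), "salary": clean_salary(salary)}
    for company, salary in body
]
```

Here the `replace`, `strip`, and cast steps mirror the string-method and casting topics listed for this module, and the `N/A` cell becomes `None` so that missing values stay explicit instead of being silently converted.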

What's included

22 videos, 3 readings, 4 assignments, 1 programming assignment, 3 ungraded labs

22 videos (total 81 minutes)
  • Welcome to this course! (5 minutes)
  • Generative AI in this course (2 minutes)
  • Module 1 introduction (1 minute)
  • The many sources of data (4 minutes)
  • Data cleaning and processing (4 minutes)
  • ETL and ELT (4 minutes)
  • Introduction to web scraping (3 minutes)
  • Scraping tables with Pandas (4 minutes)
  • String methods: replace (4 minutes)
  • Casting (3 minutes)
  • Handling missing values (5 minutes)
  • String methods: contains (3 minutes)
  • String methods: split and strip (4 minutes)
  • Networking (3 minutes)
  • Scraping webpages with requests (4 minutes)
  • HTML (5 minutes)
  • Planning HTML parsing (3 minutes)
  • Parsing HTML with Beautiful Soup (5 minutes)
  • DataFrame setup (5 minutes)
  • Regular expressions (4 minutes)
  • Writing regular expressions with LLMs (2 minutes)
  • The ethics of web scraping (4 minutes)
3 readings (total 13 minutes)
  • Join the DeepLearning.AI Forum to ask questions, get support, or share amazing ideas! (2 minutes)
  • Additional Web Scraping Practice (10 minutes)
  • Module 1 lecture notes (1 minute)
4 assignments (total 80 minutes)
  • Module 1 quiz (30 minutes)
  • Lesson 1 quiz (10 minutes)
  • Lesson 2 quiz (10 minutes)
  • Lesson 3 quiz (30 minutes)
1 programming assignment (total 90 minutes)
  • Analyzing Tech Industry Jobs and Companies (90 minutes)
3 ungraded labs (total 90 minutes)
  • Module 1 lecture code (30 minutes)
  • Practice Lab: Web Scraping with Pandas (30 minutes)
  • Practice Lab: Web Scraping with Beautiful Soup (30 minutes)

This module focuses on acquiring data using APIs, as well as applying numerical cleaning techniques. You’ll learn how to retrieve data from web-based APIs, handle authentication securely, and transform raw JSON responses into usable dataframes. The module also covers techniques for cleaning and preparing numerical data, including scaling, binning, normalization, and outlier handling.
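
To make the API workflow above more concrete, here is a minimal sketch of reading an API key from an environment variable, following pagination, and flattening JSON responses into a list of records. It makes no network calls: `PAGES` and `fetch_page` are hypothetical stand-ins for a real HTTP client, and the `results` / `next_page` field names and the `EXAMPLE_API_KEY` variable are assumptions, not details of any API used in the course.

```python
import json
import os

# Hypothetical paginated responses standing in for real API calls.
PAGES = [
    '{"results": [{"county": "A", "population": 1200}], "next_page": 2}',
    '{"results": [{"county": "B", "population": 850}], "next_page": null}',
]


def fetch_page(page, api_key):
    """Stand-in for an HTTP GET; a real client would send api_key with the request."""
    return json.loads(PAGES[page - 1])


def fetch_all(api_key):
    """Follow next_page until the API reports there are no more pages."""
    records, page = [], 1
    while page is not None:
        payload = fetch_page(page, api_key)
        records.extend(payload["results"])
        page = payload["next_page"]
    return records


# Read the key from an environment variable rather than hard-coding it.
# EXAMPLE_API_KEY is a hypothetical variable name.
api_key = os.environ.get("EXAMPLE_API_KEY", "")
rows = fetch_all(api_key)
```

A list of flat dictionaries like `rows` is a shape that converts directly into a dataframe, which is presumably why the combined result is analyzed as a single DataFrame in the lessons.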

What's included

17 videos, 2 readings, 4 assignments, 1 programming assignment, 3 ungraded labs

17 videos (total 61 minutes)
  • Module 2 introduction (1 minute)
  • Introduction to APIs (4 minutes)
  • JSON (5 minutes)
  • API requests and responses (2 minutes)
  • Query parameters (4 minutes)
  • From JSON to a dataframe (4 minutes)
  • Pagination (4 minutes)
  • Analyzing the combined DataFrame (4 minutes)
  • API keys (3 minutes)
  • Using an API key (3 minutes)
  • Environmental variables (3 minutes)
  • Scaling (4 minutes)
  • Binning (4 minutes)
  • Normalization (5 minutes)
  • Identifying outliers (2 minutes)
  • Handling outliers (5 minutes)
  • Data quality (4 minutes)
2 readings (total 11 minutes)
  • Mechanics of API keys (10 minutes)
  • Module 2 lecture notes (1 minute)
4 assignments (total 60 minutes)
  • Module 2 quiz (30 minutes)
  • Lesson 1 quiz (10 minutes)
  • Lesson 2 quiz (10 minutes)
  • Lesson 3 quiz (10 minutes)
1 programming assignment (total 90 minutes)
  • Identifying Vulnerable Communities using the U.S. Census API (90 minutes)
3 ungraded labs (total 90 minutes)
  • Module 2 lecture code (30 minutes)
  • Practice Lab: Using APIs (30 minutes)
  • Practice Lab: API keys and numerical cleaning (30 minutes)
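
The numerical cleaning topics in this module (scaling, binning, outlier handling) can be sketched in plain Python as follows. The page does not say which specific methods the course teaches, so min-max scaling, fixed-edge binning, and the 1.5 × IQR outlier rule are assumptions chosen as common defaults; the data and function names are hypothetical.

```python
import statistics

# Hypothetical measurements; 95 is an obvious outlier.
values = [10, 12, 13, 15, 18, 95]


def min_max_scale(xs):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]


def bin_label(x, edges, labels):
    """Return the label of the first bin whose upper edge is >= x.

    edges holds the upper bounds of every bin except the last one.
    """
    for edge, label in zip(edges, labels):
        if x <= edge:
            return label
    return labels[-1]


def iqr_outliers(xs, k=1.5):
    """Return values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(xs, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [x for x in xs if x < lower or x > upper]


scaled = min_max_scale(values)
bins = [bin_label(v, edges=[12, 20], labels=["low", "mid", "high"]) for v in values]
outliers = iqr_outliers(values)
```

Note that min-max scaling is sensitive to outliers: with 95 in the data, the other five values are squeezed into roughly the bottom tenth of the 0 to 1 range, which is one reason to identify outliers before scaling.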

This module introduces the fundamentals of data storage and retrieval using databases and SQL. You’ll learn how data is structured in relational systems; explore core concepts like entities, relationships, and schemas; and gain hands-on experience writing SQL queries. You’ll also explore how to query databases from a Python notebook, as well as how generative AI tools can support SQL-based tasks.
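
A minimal sketch of querying a database from Python, using the standard-library `sqlite3` module with an in-memory database. The `movies` table, its columns, and its rows are hypothetical examples, not the dataset used in the course.

```python
import sqlite3

# An in-memory database keeps the example self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movies (title TEXT, year INTEGER, rating REAL)")
conn.executemany(
    "INSERT INTO movies VALUES (?, ?, ?)",
    [("Film A", 2001, 7.1), ("Film B", 1995, 8.3), ("Film C", 2010, 6.4)],
)

# Select two columns and order the results from highest to lowest rating.
top = conn.execute(
    "SELECT title, rating FROM movies ORDER BY rating DESC LIMIT 2"
).fetchall()
conn.close()
```

Each row comes back as a tuple, so `top` is a list of `(title, rating)` pairs that can be passed on to further Python processing.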

What's included

15 videos, 3 readings, 4 assignments, 1 programming assignment, 2 ungraded labs

15 videos (total 55 minutes)
  • Module 3 introduction (1 minute)
  • Data storage systems (4 minutes)
  • What is a database? (4 minutes)
  • Database management systems (3 minutes)
  • Tidy data (4 minutes)
  • Entities and attributes (4 minutes)
  • Relationships (4 minutes)
  • Data models and data schemas (4 minutes)
  • Types of tables (3 minutes)
  • Introduction to SQL (3 minutes)
  • SQL code (3 minutes)
  • Selecting (5 minutes)
  • Ordering results (5 minutes)
  • LLMs for databases (4 minutes)
  • SQL in Python (5 minutes)
3 readings (total 31 minutes)
  • [Optional] Practice with selecting (20 minutes)
  • [Optional] Practice with ordering results (10 minutes)
  • Module 3 lecture notes (1 minute)
4 assignments (total 65 minutes)
  • Module 3 quiz (30 minutes)
  • Lesson 1 quiz (20 minutes)
  • Lesson 2 quiz (10 minutes)
  • Lesson 3 quiz (5 minutes)
1 programming assignment (total 90 minutes)
  • Analyzing Movie Data with SQL (90 minutes)
2 ungraded labs (total 60 minutes)
  • Module 3 lecture code (30 minutes)
  • Practice Lab: SQLite in Python (30 minutes)

In this module, you’ll expand your SQL skills into data preprocessing, validation, and joins (combining tables). You’ll learn how to use SQL for filtering, conditional logic, and handling missing values, and apply validation techniques using aggregation and grouping. The module also explores different types of joins and demonstrates how to use them to combine and analyze data across multiple tables—especially in real-world scenarios like analyzing sports performance data.
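
The sketch below combines several of the techniques named above (a LEFT JOIN, NULL handling, CASE, and a GROUP BY / HAVING validation check) in one small SQLite example run from Python. The `players` and `games` tables and their contents are hypothetical and only loosely modeled on the sports scenario mentioned in the description.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE players (player_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE games (player_id INTEGER, points INTEGER);
INSERT INTO players VALUES (1, 'Ann'), (2, 'Bo'), (3, 'Cy');
INSERT INTO games VALUES (1, 30), (1, 20), (2, 8), (2, NULL);
""")

# LEFT JOIN keeps players with no games; COALESCE turns a NULL sum into 0;
# CASE derives a category from the aggregated value.
summary = conn.execute("""
    SELECT p.name,
           COUNT(g.points) AS games_scored,
           COALESCE(SUM(g.points), 0) AS total_points,
           CASE WHEN COALESCE(SUM(g.points), 0) >= 40 THEN 'high'
                ELSE 'low' END AS tier
    FROM players AS p
    LEFT JOIN games AS g ON g.player_id = p.player_id
    GROUP BY p.player_id, p.name
    ORDER BY p.player_id
""").fetchall()

# Validation: COUNT(*) counts rows, COUNT(points) skips NULLs, so HAVING
# surfaces players whose game rows are missing a points value.
missing = conn.execute("""
    SELECT player_id, COUNT(*) AS n_rows, COUNT(points) AS n_scored
    FROM games
    GROUP BY player_id
    HAVING COUNT(*) > COUNT(points)
""").fetchall()
conn.close()
```

With an INNER JOIN instead of the LEFT JOIN, the player with no game rows would disappear from `summary`, which is the kind of difference worth validating before drawing conclusions from joined tables.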

What's included

17 videos, 11 readings, 4 assignments, 2 programming assignments, 4 ungraded labs

17 videos (total 52 minutes)
  • Module 4 introduction (1 minute)
  • SQL vs Python (2 minutes)
  • Filtering (4 minutes)
  • Filtering: Compound conditions (4 minutes)
  • Filtering: String-based conditions (3 minutes)
  • Conditionals: CASE (3 minutes)
  • Handling NULL values (3 minutes)
  • Data validation (4 minutes)
  • Validation: COUNT and DISTINCT (3 minutes)
  • Validation: GROUP BY (4 minutes)
  • Validation: MIN, MAX, SUM (4 minutes)
  • Validation: HAVING (3 minutes)
  • Introduction to joins (3 minutes)
  • Left joins (4 minutes)
  • Inner joins (3 minutes)
  • Outer joins (2 minutes)
  • Your next steps (1 minute)
11 readings (total 101 minutes)
  • [Optional] Practice with filtering (10 minutes)
  • [Optional] Practice with conditionals and NULL values (10 minutes)
  • SQL Basics: Data Creation and Modification (15 minutes)
  • [Optional] Practice with COUNT and DISTINCT (10 minutes)
  • [Optional] Practice with GROUP BY and aggregations (10 minutes)
  • [Optional] Practice with HAVING (10 minutes)
  • [Optional] Practice with LEFT JOINs (10 minutes)
  • Master INNER JOINs (10 minutes)
  • [Optional] Practice with INNER JOINs (10 minutes)
  • Module 4 lecture notes (1 minute)
  • Acknowledgments (5 minutes)
4 assignments (total 60 minutes)
  • Module 4 quiz (30 minutes)
  • Lesson 1 quiz (10 minutes)
  • Lesson 2 quiz (10 minutes)
  • Lesson 3 quiz (10 minutes)
2 programming assignments (total 180 minutes)
  • Deeper Analysis of the Movie Data with SQL (90 minutes)
  • NYC Restaurant Inspections (90 minutes)
4 ungraded labs (total 120 minutes)
  • Module 4 lecture code (30 minutes)
  • Practice Lab: Analyzing NBA games: Best Players (30 minutes)
  • Practice Lab: Analyzing NBA games - Validating your data (30 minutes)
  • Practice Lab: Analyzing NBA games: Performance per game (30 minutes)

Earn a career certificate

Add this credential to your LinkedIn profile, resume, or CV. Share it on social media and in your performance review.

Instructor

Instructor ratings
5.0 (6 ratings)

Top Instructor

DeepLearning.AI
5 Courses · 49,517 learners

Learner reviews

  • 5 stars: 90%
  • 4 stars: 0%
  • 3 stars: 5%
  • 2 stars: 0%
  • 1 star: 5%

Showing 3 of 20

NR · Reviewed on Jun 20, 2025

very precise. touches all relevant concepts with perfect examples. Good datasets and great evaluation.

MN · Reviewed on Jun 27, 2025

Very broad and thorough course on data collection techniques, preprocessing, analysis, and visualization. Highly recommend.

CC · Reviewed on Oct 22, 2025

Sean Barnes is a great teacher and his courses are terrific. How I wish his courses were available when I first decided to learn data science!

Frequently asked questions

To access the course materials and assignments and to earn a Certificate, you will need to purchase the Certificate experience when you enroll in a course. You can try a Free Trial instead, or apply for Financial Aid. The course may offer 'Full Course, No Certificate' instead. This option lets you see all course materials, submit required assessments, and get a final grade, but you will not be able to purchase a Certificate experience.

When you enroll in the course, you get access to all of the courses in the Certificate, and you earn a certificate when you complete the work. Your electronic Certificate will be added to your Accomplishments page - from there, you can print your Certificate or add it to your LinkedIn profile.
