Data I/O and Preprocessing with Python and SQL
This course is part of DeepLearning.AI Data Analytics Professional Certificate
Instructor: Sean Barnes
Top Instructor
5,124 already enrolled
20 reviews
What you'll learn
You’ll work with real-world data as it exists in practice: messy, unstructured, and spread across sources.
You’ll learn to extract data from websites, APIs, and databases, and clean it using both Python and SQL, an essential step in any analysis pipeline.
Build your Data Analysis expertise
- Learn new concepts from industry experts
- Gain a foundational understanding of a subject or tool
- Develop job-relevant skills with hands-on projects
- Earn a shareable career certificate from DeepLearning.AI
There are 4 modules in this course
Most real-world data isn’t clean. It’s messy, incomplete, and spread across sources like websites, APIs, and databases. In this course, you’ll learn how to collect that data, clean it, and prepare it for analysis using Python and SQL.
You’ll start by extracting data from webpages using tools like Pandas and Beautiful Soup, while also learning how to handle unstructured text and apply ethical scraping practices. Next, you’ll access real-time data through APIs, parse JSON files, and clean numerical data using techniques like normalization and binning. You’ll also learn how to manage authentication with API keys and store them securely. Finally, you’ll work with databases: querying and joining tables using SQL, validating results, and understanding when to use SQL versus Python for different preprocessing tasks. By the end of the course, you’ll be able to turn raw, real-world data into reliable, analysis-ready inputs, a core skill for any data professional.
This module introduces techniques for acquiring data from a wide range of sources, with a focus on web scraping and text processing. You'll begin by exploring how data flows into analysis pipelines and gain hands-on experience using tools like Pandas and Beautiful Soup to extract, clean, and structure data. You'll apply text preprocessing methods to handle missing values and parse HTML. Plus, you’ll consider the ethical implications of scraping data from the web.
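As a rough illustration of the extract-then-clean flow described above, the sketch below parses a small HTML table and cleans its text values. It is not taken from the course materials: it swaps in Python's standard-library `html.parser` for Pandas and Beautiful Soup so that it runs without extra installs, and the HTML snippet, the `TableParser` class, and the `clean_salary` helper are all hypothetical.

```python
from html.parser import HTMLParser
import re

# Hypothetical HTML fragment standing in for a scraped page.
HTML = """
<table>
  <tr><th>Company</th><th>Salary</th></tr>
  <tr><td> Acme Corp </td><td>$120,000</td></tr>
  <tr><td>Globex</td><td>N/A</td></tr>
</table>
"""


class TableParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, one list per <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])          # start a new row
        elif tag in ("td", "th"):
            self._in_cell = True
            self.rows[-1].append("")      # start a new, empty cell

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.rows[-1][-1] += data     # text may arrive in several chunks


def clean_salary(text):
    """Remove currency formatting and cast to int; None marks a missing value."""
    digits = re.sub(r"[^\d]", "", text)
    return int(digits) if digits else None


parser = TableParser()
parser.feed(HTML)
header, *body = parser.rows

# Strip stray whitespace from names and cast salaries to numbers.
records = [
    {"company": name.strip(), "salary": clean_salary(salary)}
    for name, salary in body
]
print(records)
```

The same steps (strip, replace, cast, flag missing values) map onto string methods on a DataFrame column once the table is loaded with a library such as Pandas.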
What's included
22 videos • 3 readings • 4 assignments • 1 programming assignment • 3 ungraded labs
22 videos•Total 81 minutes
- Welcome to this course!•5 minutes
- Generative AI in this course•2 minutes
- Module 1 introduction•1 minute
- The many sources of data•4 minutes
- Data cleaning and processing•4 minutes
- ETL and ELT•4 minutes
- Introduction to web scraping•3 minutes
- Scraping tables with Pandas•4 minutes
- String methods: replace•4 minutes
- Casting•3 minutes
- Handling missing values•5 minutes
- String methods: contains•3 minutes
- String methods: split and strip•4 minutes
- Networking•3 minutes
- Scraping webpages with requests•4 minutes
- HTML•5 minutes
- Planning HTML parsing•3 minutes
- Parsing HTML with Beautiful Soup•5 minutes
- DataFrame setup•5 minutes
- Regular expressions•4 minutes
- Writing regular expressions with LLMs•2 minutes
- The ethics of web scraping•4 minutes
3 readings•Total 13 minutes
- Join the DeepLearning.AI Forum to ask questions, get support, or share amazing ideas!•2 minutes
- Additional Web Scraping Practice•10 minutes
- Module 1 lecture notes•1 minute
4 assignments•Total 80 minutes
- Module 1 quiz•30 minutes
- Lesson 1 quiz•10 minutes
- Lesson 2 quiz•10 minutes
- Lesson 3 quiz•30 minutes
1 programming assignment•Total 90 minutes
- Analyzing Tech Industry Jobs and Companies•90 minutes
3 ungraded labs•Total 90 minutes
- Module 1 lecture code•30 minutes
- Practice Lab: Web Scraping with Pandas•30 minutes
- Practice Lab: Web Scraping with Beautiful Soup•30 minutes
This module focuses on acquiring data using APIs, as well as applying numerical cleaning techniques. You’ll learn how to retrieve data from web-based APIs, handle authentication securely, and transform raw JSON responses into usable dataframes. The module also covers techniques for cleaning and preparing numerical data, including scaling, binning, normalization, and outlier handling.
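To make the JSON-to-table and numerical cleaning steps concrete, here is a minimal sketch using only the standard library. The JSON payload, its field names, and the income thresholds are invented for illustration, and min-max scaling is just one common way to rescale values; the course may use different methods or libraries.

```python
import json

# Hypothetical API response body; real responses will have a different shape.
RAW = """
{"results": [
  {"tract": "A", "median_income": 30000},
  {"tract": "B", "median_income": 60000},
  {"tract": "C", "median_income": 90000}
]}
"""

payload = json.loads(RAW)
rows = payload["results"]            # a list of dicts, one per record

incomes = [row["median_income"] for row in rows]
lo, hi = min(incomes), max(incomes)


def min_max(value, lo, hi):
    """Rescale value to the range [0, 1]; a constant column maps to 0.0."""
    return (value - lo) / (hi - lo) if hi != lo else 0.0


def income_bin(value):
    """Assign a coarse category using assumed, illustrative thresholds."""
    if value < 40000:
        return "low"
    if value < 80000:
        return "middle"
    return "high"


for row in rows:
    row["income_scaled"] = min_max(row["median_income"], lo, hi)
    row["income_bin"] = income_bin(row["median_income"])

for row in rows:
    print(row)
```

A list of flat dicts like `rows` is also a convenient input for building a dataframe.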
What's included
17 videos • 2 readings • 4 assignments • 1 programming assignment • 3 ungraded labs
17 videos•Total 61 minutes
- Module 2 introduction•1 minute
- Introduction to APIs•4 minutes
- JSON•5 minutes
- API requests and responses•2 minutes
- Query parameters•4 minutes
- From JSON to a dataframe•4 minutes
- Pagination•4 minutes
- Analyzing the combined DataFrame•4 minutes
- API keys•3 minutes
- Using an API key•3 minutes
- Environmental variables•3 minutes
- Scaling•4 minutes
- Binning•4 minutes
- Normalization•5 minutes
- Identifying outliers•2 minutes
- Handling outliers•5 minutes
- Data quality•4 minutes
2 readings•Total 11 minutes
- Mechanics of API keys•10 minutes
- Module 2 lecture notes•1 minute
4 assignments•Total 60 minutes
- Module 2 quiz•30 minutes
- Lesson 1 quiz•10 minutes
- Lesson 2 quiz•10 minutes
- Lesson 3 quiz•10 minutes
1 programming assignment•Total 90 minutes
- Identifying Vulnerable Communities using the U.S. Census API•90 minutes
3 ungraded labs•Total 90 minutes
- Module 2 lecture code•30 minutes
- Practice Lab: Using APIs•30 minutes
- Practice Lab: API keys and numerical cleaning•30 minutes
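One common way to keep an API key out of notebooks and source files is to read it from an environment variable at run time. The sketch below assumes a variable named `EXAMPLE_API_KEY` and a query parameter named `key`; both names are hypothetical, and a real API will document its own parameter or header.

```python
import os


def get_api_key(name="EXAMPLE_API_KEY"):
    """Read the key from the environment instead of hard-coding it."""
    key = os.environ.get(name)
    if not key:
        raise RuntimeError(f"Set the {name} environment variable before running.")
    return key


def build_params(api_key, **query):
    """Merge query parameters with the key (parameter name is an assumption)."""
    return {**query, "key": api_key}
```

With the variable set in the shell, `build_params(get_api_key(), state="06")` returns a dictionary that can be passed as the query parameters of an HTTP request.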
This module introduces the fundamentals of data storage and retrieval using databases and SQL. You’ll learn how data is structured in relational systems; explore core concepts like entities, relationships, and schemas; and gain hands-on experience writing SQL queries. You’ll also explore how to query databases from a Python notebook, as well as how generative AI tools can support SQL-based tasks.
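Querying a database from Python can be sketched with the standard-library `sqlite3` module. The `movies` table and its rows below are made up for illustration, and an in-memory database stands in for a real database file.

```python
import sqlite3

# In-memory database with a small, hypothetical table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movies (title TEXT, year INTEGER, rating REAL)")
conn.executemany(
    "INSERT INTO movies VALUES (?, ?, ?)",
    [("Alpha", 2001, 7.5), ("Beta", 1999, 8.2), ("Gamma", 2010, 6.9)],
)

# Select two columns and order the results from highest to lowest rating.
query = """
SELECT title, rating
FROM movies
ORDER BY rating DESC
"""
ranked = conn.execute(query).fetchall()   # list of (title, rating) tuples

for title, rating in ranked:
    print(title, rating)
```

Each row comes back as a tuple in the column order of the `SELECT` clause.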
What's included
15 videos • 3 readings • 4 assignments • 1 programming assignment • 2 ungraded labs
15 videos•Total 55 minutes
- Module 3 introduction•1 minute
- Data storage systems•4 minutes
- What is a database?•4 minutes
- Database management systems•3 minutes
- Tidy data•4 minutes
- Entities and attributes•4 minutes
- Relationships•4 minutes
- Data models and data schemas•4 minutes
- Types of tables•3 minutes
- Introduction to SQL•3 minutes
- SQL code•3 minutes
- Selecting•5 minutes
- Ordering results•5 minutes
- LLMs for databases•4 minutes
- SQL in Python•5 minutes
3 readings•Total 31 minutes
- [Optional] Practice with selecting•20 minutes
- [Optional] Practice with ordering results•10 minutes
- Module 3 lecture notes•1 minute
4 assignments•Total 65 minutes
- Module 3 quiz•30 minutes
- Lesson 1 quiz•20 minutes
- Lesson 2 quiz•10 minutes
- Lesson 3 quiz•5 minutes
1 programming assignment•Total 90 minutes
- Analyzing Movie Data with SQL•90 minutes
2 ungraded labs•Total 60 minutes
- Module 3 lecture code•30 minutes
- Practice Lab: SQLite in Python•30 minutes
In this module, you’ll expand your SQL skills into data preprocessing, validation, and joins (combining tables). You’ll learn how to use SQL for filtering, conditional logic, and handling missing values, and apply validation techniques using aggregation and grouping. The module also explores different types of joins and demonstrates how to use them to combine and analyze data across multiple tables—especially in real-world scenarios like analyzing sports performance data.
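The filtering, conditional, NULL-handling, and validation ideas named above can be sketched in a few queries. The `inspections` table, its rows, and the score thresholds are hypothetical and are run here through Python's `sqlite3` module against an in-memory database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE inspections (restaurant TEXT, borough TEXT, score INTEGER)"
)
conn.executemany(
    "INSERT INTO inspections VALUES (?, ?, ?)",
    [
        ("Cafe A", "Queens", 10),
        ("Cafe A", "Queens", 30),
        ("Diner B", "Bronx", None),   # missing score, stored as NULL
        ("Grill C", "Bronx", 12),
    ],
)

# Compound filter, a CASE label, and COALESCE to fill NULL scores.
labeled = conn.execute("""
    SELECT restaurant,
           COALESCE(score, -1) AS score_filled,
           CASE
               WHEN score IS NULL THEN 'unscored'
               WHEN score <= 13 THEN 'good'
               ELSE 'needs review'
           END AS label
    FROM inspections
    WHERE borough = 'Bronx' OR score > 20
    ORDER BY restaurant
""").fetchall()

# Validation: COUNT(*) counts rows, COUNT(score) skips NULLs;
# HAVING keeps only restaurants that appear more than once.
repeats = conn.execute("""
    SELECT restaurant, COUNT(*) AS n_rows, COUNT(score) AS n_scored
    FROM inspections
    GROUP BY restaurant
    HAVING COUNT(*) > 1
""").fetchall()

print(labeled)
print(repeats)
```

Comparing `COUNT(*)` with `COUNT(column)` is a quick way to see how many values in a column are missing.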
What's included
17 videos • 11 readings • 4 assignments • 2 programming assignments • 4 ungraded labs
17 videos•Total 52 minutes
- Module 4 introduction•1 minute
- SQL vs Python•2 minutes
- Filtering•4 minutes
- Filtering: Compound conditions•4 minutes
- Filtering: String-based conditions•3 minutes
- Conditionals: CASE•3 minutes
- Handling NULL values•3 minutes
- Data validation•4 minutes
- Validation: COUNT and DISTINCT•3 minutes
- Validation: GROUP BY•4 minutes
- Validation: MIN, MAX, SUM•4 minutes
- Validation: HAVING•3 minutes
- Introduction to joins•3 minutes
- Left joins•4 minutes
- Inner joins•3 minutes
- Outer joins•2 minutes
- Your next steps•1 minute
11 readings•Total 101 minutes
- [Optional] Practice with filtering•10 minutes
- [Optional] Practice with conditionals and NULL values•10 minutes
- SQL Basics: Data Creation and Modification•15 minutes
- [Optional] Practice with COUNT and DISTINCT•10 minutes
- [Optional] Practice with GROUP BY and aggregations•10 minutes
- [Optional] Practice with HAVING•10 minutes
- [Optional] Practice with LEFT JOINs•10 minutes
- Master INNER JOINs•10 minutes
- [Optional] Practice with INNER JOINs•10 minutes
- Module 4 lecture notes•1 minute
- Acknowledgments•5 minutes
4 assignments•Total 60 minutes
- Module 4 quiz•30 minutes
- Lesson 1 quiz•10 minutes
- Lesson 2 quiz•10 minutes
- Lesson 3 quiz•10 minutes
2 programming assignments•Total 180 minutes
- Deeper Analysis of the Movie Data with SQL•90 minutes
- NYC Restaurant Inspections•90 minutes
4 ungraded labs•Total 120 minutes
- Module 4 lecture code•30 minutes
- Practice Lab: Analyzing NBA games: Best Players•30 minutes
- Practice Lab: Analyzing NBA games - Validating your data•30 minutes
- Practice Lab: Analyzing NBA games: Performance per game•30 minutes
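The difference between the inner and left joins covered in this module can be seen on a toy example. The `players` and `game_stats` tables below are invented for illustration and are not the dataset used in the labs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE players (player_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE game_stats (player_id INTEGER, points INTEGER);
    INSERT INTO players VALUES (1, 'Ann'), (2, 'Bo'), (3, 'Cy');
    INSERT INTO game_stats VALUES (1, 20), (1, 30), (2, 10);
""")

# INNER JOIN keeps only players with at least one matching stats row.
inner = conn.execute("""
    SELECT p.name, SUM(s.points) AS total
    FROM players AS p
    INNER JOIN game_stats AS s ON s.player_id = p.player_id
    GROUP BY p.name
    ORDER BY p.name
""").fetchall()

# LEFT JOIN keeps every player; unmatched players get NULL, filled with 0 here.
left = conn.execute("""
    SELECT p.name, COALESCE(SUM(s.points), 0) AS total
    FROM players AS p
    LEFT JOIN game_stats AS s ON s.player_id = p.player_id
    GROUP BY p.name
    ORDER BY p.name
""").fetchall()

print(inner)
print(left)
```

Here the player with no recorded games drops out of the inner join but stays in the left join with a total of 0.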
Earn a career certificate
Add this credential to your LinkedIn profile, resume, or CV. Share it on social media and in your performance review.
Learner reviews
- 5 stars: 90%
- 4 stars: 0%
- 3 stars: 5%
- 2 stars: 0%
- 1 star: 5%
Showing 3 of 20
Reviewed on Jun 20, 2025
very precise. touches all relevant concepts with perfect examples. Good datasets and great evaluation.
Reviewed on Jun 27, 2025
Very broad and thorough course on data collection techniques, preprocessing, analysis, and visualization. Highly recommend.
Reviewed on Oct 22, 2025
Sean Barnes is a great teacher and his courses are terrific. How I wish his courses were available when I first decided to learn data science!
Frequently asked questions
To access the course materials, assignments and to earn a Certificate, you will need to purchase the Certificate experience when you enroll in a course. You can try a Free Trial instead, or apply for Financial Aid. The course may offer 'Full Course, No Certificate' instead. This option lets you see all course materials, submit required assessments, and get a final grade. This also means that you will not be able to purchase a Certificate experience.
When you enroll in the course, you get access to all of the courses in the Certificate, and you earn a certificate when you complete the work. Your electronic Certificate will be added to your Accomplishments page - from there, you can print your Certificate or add it to your LinkedIn profile.
