BeautifulSoup4 Module - Python

Last Updated : 23 Jul, 2025

BeautifulSoup4 is a Python library for parsing HTML and XML documents. It simplifies web scraping by letting developers navigate, search and modify the parse tree of a webpage. With BeautifulSoup4, we can extract specific elements, attributes and text from complex pages with a small set of methods, without handling the raw markup ourselves. It supports multiple parsers (Python’s built-in html.parser, plus lxml and html5lib, which are installed separately), so we can choose the one that best fits the task. This makes it useful whether we’re gathering data for research, automating data extraction or building web applications.

For example:
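
A minimal sketch of such a script, assuming a small sample page (the html_doc string below is illustrative and not taken from the original listing):

```python
from bs4 import BeautifulSoup

# Sample markup, assumed here purely for illustration
html_doc = "<html><head><title>Test Page</title></head><body></body></html>"

# Parse the string with Python's built-in parser
soup = BeautifulSoup(html_doc, "html.parser")

# soup.title is shorthand for the first <title> tag in the document
title_tag = soup.title
print(title_tag)
```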

Output:

<title>Test Page</title>

Explanation:

The BeautifulSoup() constructor parses the provided HTML content.
Accessing soup.title retrieves the first <title> tag from the HTML.

Importing BeautifulSoup4

from bs4 import BeautifulSoup

html_doc = "<p>Hello</p>"  # placeholder: any HTML or XML string
soup = BeautifulSoup(html_doc, 'html.parser')

Parameters :

html_doc is a string containing the HTML or XML content to be parsed.
'html.parser' is the parser to use. (Alternatives include 'lxml' or 'html5lib'.)

Return Type : Returns a BeautifulSoup object that represents the parsed document.

Parsing HTML with BeautifulSoup4

BeautifulSoup4 converts raw HTML content into a navigable parse tree.
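
As a short illustration, the snippet below parses an assumed one-heading document and reads the heading text. The html_doc content is a stand-in chosen for this sketch.

```python
from bs4 import BeautifulSoup

# Assumed sample document containing a single heading
html_doc = "<html><body><h1>Welcome to BeautifulSoup4</h1></body></html>"

soup = BeautifulSoup(html_doc, "html.parser")

# find() returns the first matching tag, or None if nothing matches
header = soup.find("h1")
print(header.text)
```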

Output:

Welcome to BeautifulSoup4

Explanation:

find() method searches for the first <h1> tag in the document.
Printing header.text outputs the text content of the <h1> tag.

Extracting Data with BeautifulSoup4

BeautifulSoup4 offers methods like find_all() to extract multiple elements from an HTML document.
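
One way this could look, assuming a simple unordered list as input (the markup is a made-up sample):

```python
from bs4 import BeautifulSoup

# Assumed sample markup: a list with three items
html_doc = """
<ul>
  <li>Item 1</li>
  <li>Item 2</li>
  <li>Item 3</li>
</ul>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# find_all() returns every matching tag in document order
items = soup.find_all("li")
for item in items:
    print(item.text)
```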

Output:

Item 1
Item 2
Item 3

Explanation:

find_all() method retrieves all <li> elements.
Iterating through the returned list prints the text of each list item.

Navigating the Parse Tree with BeautifulSoup4

Beyond simple extraction, BeautifulSoup4 allows you to traverse the document structure using attributes like .parent, .children, .next_sibling and .previous_sibling.
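
A small sketch of upward, downward and sideways navigation, assuming a sample page written without whitespace between tags so that no text nodes appear among the children. Only the parent is printed; the child and sibling lookups are stored in variables for comparison.

```python
from bs4 import BeautifulSoup

# Assumed sample page; no whitespace between tags keeps the tree free of stray text nodes
html_doc = "<html><head><title>Test Page</title></head><body><p>Some text</p></body></html>"

soup = BeautifulSoup(html_doc, "html.parser")
head_tag = soup.find("head")

# Upwards: the immediate parent of <head> is <html>
print("Parent tag:", head_tag.parent.name)

# Downwards: .children iterates over the direct children of a tag
child_names = [child.name for child in soup.html.children]

# Sideways: .next_sibling is the next node at the same level
next_name = head_tag.next_sibling.name
```

Here child_names holds ['head', 'body'] and next_name holds 'body'. With indented or multi-line markup, whitespace text nodes would also show up among the children and siblings.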

Output:

Parent tag: html

Explanation: The .parent attribute returns the immediate parent of the found tag, allowing you to traverse upwards in the parse tree.

Using CSS Selectors with BeautifulSoup4

select() method lets you search for elements using CSS selector syntax.
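
A sketch of a selector query, assuming a sample <div> with two matching paragraphs and one non-matching paragraph (the markup is illustrative):

```python
from bs4 import BeautifulSoup

# Assumed sample markup: only the first two paragraphs carry class "info"
html_doc = """
<div id="main">
  <p class="info">Info Paragraph 1</p>
  <p class="info">Info Paragraph 2</p>
  <p>Other Paragraph</p>
</div>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# select() takes a CSS selector and returns all matching tags
paragraphs = soup.select("div#main p.info")
for p in paragraphs:
    print(p.text)
```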

Output:

Info Paragraph 1
Info Paragraph 2

Explanation:

CSS selector 'div#main p.info' locates all <p> tags with class "info" that are descendants of the <div> with id "main".
select() method returns a list of matching elements.

URL: https://www.geeksforgeeks.org/python/beautifulsoup4-module-python/