VOOZH about

URL: https://www.geeksforgeeks.org/python/how-to-use-xpath-with-beautifulsoup/

โ‡ฑ How to use Xpath with BeautifulSoup ? - GeeksforGeeks


  • Courses
  • Tutorials
  • Interview Prep

How to use Xpath with BeautifulSoup ?

Last Updated : 12 Apr, 2025

We have an HTML page and our task is to extract specific elements using XPath, which BeautifulSoup doesn't support directly. For example, if we want to extract the heading from the Wikipedia page on Nike, we canโ€™t do it with just BeautifulSoup, but with a mix of lxml and etree, we can. This article explains how to use XPath with BeautifulSoup by leveraging the lxml module.

Required Libraries

Before we start, install the following Python libraries:

pip install requests
pip install beautifulsoup4
pip install lxml

  • requests: Fetches the HTML content of a webpage.
  • bs4: Parses HTML using BeautifulSoup.
  • lxml: Enables XPath support and integrates with BeautifulSoup.

Understanding XPath

XPath works very much like a traditional file system.

๐Ÿ‘ Image

To access file 1,

C:/File1

Similarly, To access file 2,

C:/Documents/User1/File2

Similarly in HTML:

//div[@id="content"]/h1 #(Finds an <h1> tag inside a <div> with id="content")

With XPath, we can target tags, IDs, classes, text values or even element positions.

Why BeautifulSoup Alone Isnโ€™t Enough

BeautifulSoup supports CSS selectors like .find() and .select(), but not XPath.

To use XPath:

  1. Parse the page with BeautifulSoup.
  2. Convert it into an lxml.etree object.
  3. Use .xpath() to extract data.

This hybrid approach gives you the flexibility of XPath with the simplicity of BeautifulSoup.

To find the XPath for a particular element on a page:

  • Right-click the element in the page and click on Inspect.
  • Right-click on the element in the Elements Tab.
  • Click on copy XPath.
๐Ÿ‘ Image

Approach

  • Import the required modules
  • Send a request to the target webpage
  • Parse the HTML with BeautifulSoup
  • Convert it to lxml.etree format
  • Use .xpath() to extract elements

Note: If XPath is not giving you the desired result then copy the full XPath instead of XPath and the rest other steps would be the same.

Given below is an example to show how Xpath can be used with Beautifulsoup

Example: Extracting the title from the Wikipedia page for Nike using XPath.

Output:

Nike, Inc.

Explanation:

  • etree.HTML(str(soup)): Converts BeautifulSoup object to an XPath-compatible structure.
  • .xpath(...): Executes the XPath query.
  • [0].text: Retrieves the text of the first matching element.
Comment
Article Tags: