In this article, we are going to extract JSON from HTML using BeautifulSoup in Python.
Modules needed
- bs4: Beautiful Soup (bs4) is a Python library for pulling data out of HTML and XML files. It is not part of the Python standard library. To install it, run the following command in the terminal.
pip install bs4
- requests: Requests allows you to send HTTP/1.1 requests extremely easily. It is also not part of the Python standard library. To install it, run the following command in the terminal.
pip install requests
Approach:
- Import all the required modules.
- Pass the URL to the requests.get() function, which sends a GET request to that URL and returns a response object.
Syntax: requests.get(url, params=None, **kwargs)
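A minimal sketch of this step. The helper name fetch_html and the URL in the comment are hypothetical, chosen only for illustration. The timeout argument and the raise_for_status() call are optional additions that are not part of the steps described here; they keep the script from hanging or from silently parsing an error page.

```python
import requests


def fetch_html(url, timeout=10):
    """Send a GET request and return the response body as text."""
    page = requests.get(url, timeout=timeout)
    # Raise requests.HTTPError if the server answered with a 4xx/5xx status.
    page.raise_for_status()
    return page.text


# Example call (hypothetical URL, requires network access):
# html = fetch_html("https://example.com/customers")
```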
- Now parse the HTML content using bs4.
Syntax: BeautifulSoup(page.text, 'html.parser')
Parameters:
- page.text: the raw HTML content of the response.
- html.parser: the HTML parser to use (here, the one built into Python).
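As a small illustration of the parsing step, the sketch below feeds an inline HTML string to BeautifulSoup in place of page.text. The sample markup is made up for the example, so it runs without a network connection.

```python
from bs4 import BeautifulSoup

# Inline HTML standing in for page.text (hypothetical sample markup).
html = "<html><head><title>Customers</title></head><body><p>Hello</p></body></html>"

# Build the parse tree with Python's built-in HTML parser.
soup = BeautifulSoup(html, "html.parser")

title_text = soup.title.string        # text inside the <title> tag
first_paragraph = soup.p.get_text()   # text of the first <p> tag
print(title_text, first_paragraph)
```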
- Now get all the required data with the find() function.
Locate the customer list by looking for the li, a, and p tags that carry a unique class or id. To find them, open the web page in a browser, right-click the relevant element, and inspect it, as shown in the figure.
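The lookup can be sketched as follows, assuming a hypothetical page where each customer is an li tag with class "customer" inside a ul with id "customers". These names are placeholders; substitute the class or id you see when inspecting the real page. find() returns the first match, while find_all() (also part of bs4) returns every match.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the real class and id names come from inspecting the page.
html = """
<ul id="customers">
  <li class="customer"><a href="/c/1">Alice</a><p>alice@example.com</p></li>
  <li class="customer"><a href="/c/2">Bob</a><p>bob@example.com</p></li>
</ul>
"""
soup = BeautifulSoup(html, "html.parser")

# find() returns the first matching tag, or None if nothing matches.
customer_list = soup.find("ul", id="customers")

customers = []
# find_all() returns every matching tag under customer_list.
for item in customer_list.find_all("li", class_="customer"):
    link = item.find("a")
    customers.append({
        "name": link.get_text(strip=True),
        "url": link["href"],
        "email": item.find("p").get_text(strip=True),
    })
```

Each entry ends up as a plain dictionary, which is the shape json.dump() can serialize directly.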
[Image: inspecting the element in the browser]
- Create a JSON file and use the json.dump() method to convert Python objects into the corresponding JSON objects.
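A short sketch of the json.dump() step on its own. The sample record and the file name customers.json are made up for the example, and the file is written to a temporary directory so that nothing in the working directory is overwritten.

```python
import json
import os
import tempfile

# Hypothetical sample data in the shape produced by the scraping step.
customers = [{"name": "Alice", "email": "alice@example.com"}]

path = os.path.join(tempfile.mkdtemp(), "customers.json")

# json.dump() serializes Python lists and dicts to JSON arrays and objects.
with open(path, "w", encoding="utf-8") as f:
    json.dump(customers, f, indent=4)

# Read the file back to confirm that the data round-trips unchanged.
with open(path, encoding="utf-8") as f:
    loaded = json.load(f)
```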
Below is the full implementation:
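A minimal sketch of the whole pipeline under stated assumptions: the URL, the "customer" class, the output file name data.json, and the function names extract_customers and save_customers are all hypothetical and should be adapted to the page being scraped.

```python
import json

import requests
from bs4 import BeautifulSoup


def extract_customers(html):
    """Parse HTML and return a list of dicts, one per customer entry."""
    soup = BeautifulSoup(html, "html.parser")
    customers = []
    # Assumed markup: <li class="customer"><a href="...">name</a><p>email</p></li>
    for item in soup.find_all("li", class_="customer"):
        link = item.find("a")
        email = item.find("p")
        customers.append({
            "name": link.get_text(strip=True) if link else None,
            "url": link.get("href") if link else None,
            "email": email.get_text(strip=True) if email else None,
        })
    return customers


def save_customers(url, out_path="data.json"):
    """Fetch url, extract the customer entries, and write them as JSON."""
    page = requests.get(url, timeout=10)
    page.raise_for_status()
    customers = extract_customers(page.text)
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(customers, f, indent=4)
    print("Created Json File")
    return customers


# Example call (hypothetical URL, requires network access):
# save_customers("https://example.com/customers")
```

Keeping the parsing in its own function means it can be tried on a saved HTML string before any request is sent.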
Output:
Created Json File
Our JSON file output:
[Image: contents of the generated JSON file]