Beautiful Soup is a Python library for parsing HTML and XML, which makes web scraping much easier. Web scraping is the process of extracting data from a website with automated tools rather than by hand. The library's BeautifulSoup object represents the parsed document as a whole. In this article, we'll scrape a simple website and replace content in the parsed "soup" variable.
For this article, let's create a virtual environment (venv). It keeps package installations for different projects separate and avoids conflicts between dependencies and interpreters.
To learn more about creating a virtual environment, see: Create a virtual environment
Navigate to your project directory and run this command to create a virtual environment named "env":
python3 -m venv env
Activate the environment by running:
source env/bin/activate
With the environment activated, its name appears in the command prompt before the :~$ symbol.
Next, install Beautiful Soup and Requests:
pip install bs4
pip install requests
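The request step, whose status code is shown in the output below, can be sketched as follows. This is a minimal sketch rather than the article's original listing: the helper name fetch_html and the timeout value are assumptions chosen for illustration.

```python
import requests


def fetch_html(url, timeout=10):
    """Request a page, print its HTTP status code, and return the HTML text."""
    response = requests.get(url, timeout=timeout)
    # 200 means the server handled the request successfully.
    print(response.status_code)
    # Raise requests.HTTPError for 4xx/5xx responses instead of parsing an error page.
    response.raise_for_status()
    return response.text
```

Call fetch_html with the address of the page you want to scrape, for example fetch_html("https://example.com") (a placeholder URL, not the site used in the article).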
Output:
200
A status code of 200 indicates a successful request.
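Once the page has been downloaded, it can be parsed into a soup object and inspected. The sketch below assumes a small inline HTML string in place of the downloaded page, so the markup, the html_doc name, and the choice of Python's built-in "html.parser" are illustrative assumptions rather than the article's original code.

```python
from bs4 import BeautifulSoup

# Stand-in for the HTML returned by the request (hypothetical markup).
html_doc = """
<html>
  <head><title>Original Title</title></head>
  <body>
    <h1>Original Heading</h1>
    <p>Some paragraph text.</p>
  </body>
</html>
"""

# Parse with Python's built-in parser, which needs no extra install.
soup = BeautifulSoup(html_doc, "html.parser")

print(soup.title)         # <title>Original Title</title>
print(soup.title.string)  # Original Title
print(soup.h1.string)     # Original Heading
```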
Next, replace the content of tags in the parsed soup object by assigning to the ".string" attribute.
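A minimal sketch of this replacement, again assuming a hypothetical inline HTML string instead of the scraped page:

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the scraped page.
html_doc = (
    "<html><head><title>Original Title</title></head>"
    "<body><h1>Original Heading</h1></body></html>"
)
soup = BeautifulSoup(html_doc, "html.parser")

# Assigning to .string replaces the tag's contents with the new text.
soup.title.string = "New Title"
soup.h1.string = "New Heading"

print(soup.title)  # <title>New Title</title>
print(soup.h1)     # <h1>New Heading</h1>
```

Keep in mind that assigning to .string replaces everything inside the tag, so any child tags it contained are discarded along with the old text.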
Thus, the title tag and heading tags in the original soup variable now hold the new content.
Note: We can't push the modified page back to the website, because its pages are served from the servers that host it. The changes exist only in our local soup object.
Below is the complete program:
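The following is a sketch of how the steps fit together, not the article's original listing. The function names fetch_html, replace_title_and_headings, and main, the timeout value, the replacement strings, and the decision to target only h1 headings are all assumptions made for illustration.

```python
import requests
from bs4 import BeautifulSoup


def fetch_html(url, timeout=10):
    """Download a page and return its HTML, raising on HTTP errors."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()
    return response.text


def replace_title_and_headings(html, new_title, new_heading):
    """Return a soup whose <title> and <h1> texts have been replaced."""
    soup = BeautifulSoup(html, "html.parser")
    # Pages without a <title> are left untouched rather than raising an error.
    if soup.title is not None:
        soup.title.string = new_title
    # Assumption: only top-level <h1> headings are rewritten.
    for heading in soup.find_all("h1"):
        heading.string = new_heading
    return soup


def main(url):
    """Fetch the page at url, swap its title and headings, and print the result."""
    soup = replace_title_and_headings(fetch_html(url), "New Title", "New Heading")
    print(soup.prettify())
```

To try it, call main with the address of a page you are allowed to scrape, for example main("https://example.com") (a placeholder URL). The modified HTML is only printed locally; nothing is sent back to the site.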