GitHub is where developers build software together, contribute to the open-source community, and manage Git repositories. It is one of the tools developers use most, and a GitHub profile is often shared to showcase projects or invite contributions. Web scraping with Python is one practical way to get data from such pages.
In this article, we will create an API to fetch a user's profile image and number of repositories. The steps below walk through building the API:
Step 1: Create a folder (e.g., GitHubGFG).
Step 2: Set up the virtual environment. Here we create an environment named .env:
python -m venv .env
Step 3: Activate the environment (the command below is for Windows).
.env\Scripts\activate
Step 1: Python has Beautiful Soup, a library for pulling data out of HTML files. To install Beautiful Soup, run:
pip install beautifulsoup4
Step 2: Install Python's Requests module. Requests lets you send HTTP/1.1 requests easily.
pip install requests
Create a Python file (e.g., github.py).
Step 3: The following steps scrape data from the web page. To get the HTML text of the page:
github_html = requests.get(f'https://github.com/{username}').text
The {username} placeholder holds the GitHub username of the required user. To represent the parsed document as a whole, we use the BeautifulSoup object:
soup = BeautifulSoup(github_html, "html.parser")
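As a rough sketch of these two steps, the snippet below wraps the download and the parsing in small functions. The function names (`profile_url`, `parse_html`, `fetch_profile_soup`), the timeout value, and the `raise_for_status()` call are illustrative assumptions, not part of the original listing.

```python
import requests
from bs4 import BeautifulSoup


def profile_url(username):
    """Build the public profile URL for a GitHub username."""
    return f"https://github.com/{username}"


def parse_html(html):
    """Parse raw HTML using Python's built-in html.parser backend."""
    return BeautifulSoup(html, "html.parser")


def fetch_profile_soup(username, timeout=10):
    """Download a profile page and return it as a BeautifulSoup object."""
    response = requests.get(profile_url(username), timeout=timeout)
    # Assumption: treat HTTP errors (such as 404 for an unknown user) as failures.
    response.raise_for_status()
    return parse_html(response.text)
```

Calling `fetch_profile_soup("some-username")` needs network access; `parse_html` can be tried offline on any HTML string.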
Example output: [image]
Now find the avatar class in the HTML document, as it holds the URL of the profile image.
find_all(): The find_all() method looks through a tag's descendants and retrieves all descendants that match the filters. Here our filter is an img tag with the class avatar.
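A minimal sketch of this filter is shown here on a small, made-up HTML fragment rather than a live GitHub page. The markup in `SAMPLE_HTML` is a hypothetical stand-in; the real page contains far more elements and may change over time.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified stand-in for a GitHub profile page.
SAMPLE_HTML = """
<div>
  <img class="avatar avatar-user" src="https://avatars.githubusercontent.com/u/1?v=4" alt="@example">
  <img class="logo" src="/logo.png" alt="logo">
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# class_ matches any img tag whose class list contains "avatar".
avatar_block = soup.find_all("img", class_="avatar")
print(avatar_block)
```

`find_all()` returns a list-like result, so an empty result simply means no tag matched the filter.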
Output of avatar_block: [image]
The image URL is in the src attribute; to get the URL text, use .get():
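The lookup might be sketched as below, again on a hypothetical fragment. Taking the first match (`avatar_block[0]`) is an assumption made for this sample; on the live page, which match holds the profile picture depends on GitHub's current markup.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified stand-in for a GitHub profile page.
SAMPLE_HTML = (
    '<img class="avatar" '
    'src="https://avatars.githubusercontent.com/u/1?v=4" alt="@example">'
)

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
avatar_block = soup.find_all("img", class_="avatar")

# .get() returns the attribute value, or None if the attribute is absent.
img_url = avatar_block[0].get("src") if avatar_block else None
missing = avatar_block[0].get("srcset") if avatar_block else None
print(img_url)
```

Using `.get("src")` instead of `["src"]` avoids a `KeyError` when a tag has no such attribute.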
Output of img_url: [image]
Find the first Counter class in the HTML document, as it holds the number of repositories.
find(): The find() method looks through a tag's descendants and retrieves a single descendant that matches the filters. Here our filter is a span tag with the class Counter.
repos = soup.find('span',class_="Counter").text
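To illustrate that `find()` stops at the first match, here is a small sketch on a made-up navigation fragment with two `Counter` spans. The markup and the `None` check are assumptions added for the example.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment: the first Counter is assumed to be the repository count.
SAMPLE_HTML = """
<nav>
  <a href="?tab=repositories">Repositories <span class="Counter">33</span></a>
  <a href="?tab=stars">Stars <span class="Counter">5</span></a>
</nav>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# find() returns only the first matching tag, or None when nothing matches.
counter = soup.find("span", class_="Counter")
repos = counter.text.strip() if counter is not None else None
print(repos)
```

The value is a string, so it would need `int(repos)` before any arithmetic.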
The entire code would be as follows:
Output:
https://avatars.githubusercontent.com/u/59017652?v=4 33
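The scraping steps above can be combined roughly as follows. This is a sketch under assumptions: the function names `scrape_profile` and `github_profile`, the choice of the first avatar match, the timeout, and the `ValueError` are illustrative and not taken from the original listing.

```python
import requests
from bs4 import BeautifulSoup


def scrape_profile(html):
    """Extract (avatar URL, repository count) from profile page HTML."""
    soup = BeautifulSoup(html, "html.parser")
    avatar_block = soup.find_all("img", class_="avatar")
    counter = soup.find("span", class_="Counter")
    if not avatar_block or counter is None:
        # Assumption: missing markup is reported as an error, not ignored.
        raise ValueError("profile data not found in page")
    return avatar_block[0].get("src"), counter.text.strip()


def github_profile(username):
    """Download a user's profile page and scrape it (needs network access)."""
    github_html = requests.get(f"https://github.com/{username}", timeout=10).text
    return scrape_profile(github_html)
```

A call such as `img_url, repos = github_profile("some-username")` followed by `print(img_url, repos)` would print the two values on one line.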
We will use Flask, a micro web framework written in Python.
pip install Flask
Following is the starter code for our Flask application.
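A minimal Flask starter might look like the sketch below. The route path, the function name `home`, and the greeting text are placeholders chosen for illustration.

```python
from flask import Flask

# Create the application object; __name__ tells Flask where to find resources.
app = Flask(__name__)


@app.route("/")
def home():
    # Hypothetical placeholder response for the root URL.
    return "Hello, World!"
```

To serve it locally, one option is to call `app.run(debug=True)` under an `if __name__ == "__main__":` guard at the bottom of the file.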
Open localhost in your browser: [image]
Getting the GitHub username from the URL:
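One way to read the username is a variable rule in the route, sketched below. The function name `profile` and the response text are illustrative assumptions.

```python
from flask import Flask

app = Flask(__name__)


@app.route("/<username>")
def profile(username):
    # Flask passes the matched URL segment to the view as a string argument.
    return f"Username: {username}"
```

With this route, a request to `/some-username` hands the string `some-username` to the view function.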
Output: [image]
We now add our web scraping code and some helper methods provided by Flask to return JSON data properly. jsonify is a Flask function that serializes data to JavaScript Object Notation (JSON) format. Consider the following code:
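A sketch of the combined route is given below. The JSON key names (`img_url`, `repos`), the helper `scrape_profile`, and the choice of the first avatar match are assumptions; error handling is deliberately left out here.

```python
import requests
from bs4 import BeautifulSoup
from flask import Flask, jsonify

app = Flask(__name__)


def scrape_profile(html):
    """Return the avatar URL and repository count as a plain dict."""
    soup = BeautifulSoup(html, "html.parser")
    avatar_block = soup.find_all("img", class_="avatar")
    repos = soup.find("span", class_="Counter").text.strip()
    return {"img_url": avatar_block[0].get("src"), "repos": repos}


@app.route("/<username>")
def get_profile(username):
    github_html = requests.get(f"https://github.com/{username}", timeout=10).text
    # jsonify builds a response with a JSON body and the JSON content type.
    return jsonify(scrape_profile(github_html))
```

Returning `jsonify(...)` rather than a hand-built string leaves the serialization and response headers to Flask.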
Output: [image]
If the username is incorrect, or the request fails for any other reason, we need to wrap our code in a try and except block to handle exceptions. The final code would be as follows:
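A possible final version is sketched below. The 404 status code, the error message format, the JSON key names, and the specific exceptions caught are assumptions made for illustration; the original listing may have handled errors differently.

```python
import requests
from bs4 import BeautifulSoup
from flask import Flask, jsonify

app = Flask(__name__)


def scrape_profile(html):
    """Return the avatar URL and repository count found in profile HTML."""
    soup = BeautifulSoup(html, "html.parser")
    avatar_block = soup.find_all("img", class_="avatar")
    counter = soup.find("span", class_="Counter")
    if not avatar_block or counter is None:
        raise ValueError("profile data not found in page")
    return {"img_url": avatar_block[0].get("src"), "repos": counter.text.strip()}


@app.route("/<username>")
def get_profile(username):
    try:
        response = requests.get(f"https://github.com/{username}", timeout=10)
        response.raise_for_status()  # an unknown user yields an HTTP error
        result = scrape_profile(response.text)
    except (requests.RequestException, ValueError) as exc:
        # Assumption: report any fetch or parse failure as a JSON error with 404.
        return jsonify({"error": str(exc)}), 404
    return jsonify(result)
```

Catching only request and parsing errors keeps unrelated bugs visible instead of hiding them behind a generic error response.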