Web scraping is the process of extracting data from websites automatically. It allows us to collect and use real-time information from the web for various applications.
In this project, we'll understand web scraping by building a Flask app that fetches and displays live cricket scores from an online sports website. This will help us see how to extract specific data using Python and present it in a user-friendly way.
To create a basic Flask app, refer to Create Flask App.
After creating and activating a virtual environment, install Flask and the other libraries required for this project using these commands:
pip install flask
pip install requests
pip install beautifulsoup4
We will use the NDTV Sports cricket scorecard page as the data source. The steps for scraping data from the web page are as follows. To get the HTML text of the page:
html_text = requests.get('https://sports.ndtv.com/cricket/live-scores').text
To parse that HTML into a single object representing the whole document, we create a BeautifulSoup object:
soup = BeautifulSoup(html_text, "html.parser")
Note: It is recommended to run and check the code after each step to know the difference and thoroughly understand the concepts.
Let's look at how to fetch and parse the HTML content of our target website:
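The fetch-and-parse step can be sketched as follows. This is a minimal sketch rather than a definitive implementation: the helper names `fetch_html` and `page_title`, the 10-second timeout, and the sample HTML string are assumptions added for illustration. Only the URL and the `html.parser` choice come from the steps above.

```python
import requests
from bs4 import BeautifulSoup

# URL taken from the steps above.
URL = "https://sports.ndtv.com/cricket/live-scores"


def fetch_html(url, timeout=10.0):
    """Download a page and return its HTML text (hypothetical helper)."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # fail loudly on 4xx/5xx responses
    return response.text


def page_title(html_text):
    """Parse HTML and return the <title> text, or "" if there is none."""
    soup = BeautifulSoup(html_text, "html.parser")
    return soup.title.get_text(strip=True) if soup.title else ""


# Offline demonstration on a made-up HTML string (not the real page).
sample_html = "<html><head><title> Live Cricket Scores </title></head></html>"
print(page_title(sample_html))

# For the live page (needs network access):
# print(page_title(fetch_html(URL)))
```

Running the parser on a small hand-written string first makes it easy to confirm that parsing works before depending on the live site, whose markup can change at any time.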
Output:
Explanation:
Now that we have a basic idea of how to fetch live data from a URL, we can create a Flask app and use the same approach to get the live cricket scores.
This file (app.py) will contain the code for our main Flask application. We are going to scrape live cricket scores from NDTV Sports using BeautifulSoup and display them in JSON format.
In this part, we will fetch live cricket scores from the NDTV Sports website using requests and BeautifulSoup. This will allow us to extract match details from the webpage.
Explanation
Now that we have fetched the HTML content, we will extract important match details such as teams, scores, location, and match status.
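One way this extraction step might look is sketched below. The tag and class names (`match-card`, `description`, `location`, `status`) and the sample HTML are invented for illustration and are not the real NDTV Sports markup; the actual names have to be found by inspecting the page in the browser. The helper names are also hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the tag and class names below are invented for this
# sketch and must be replaced with the ones found on the real page.
SAMPLE_HTML = """
<div class="match-card">
  <span class="description">Team A vs Team B, 1st T20I</span>
  <span class="location">Sample Stadium, Sample City</span>
  <div class="status">Team A need 12 runs in 6 balls</div>
</div>
"""


def text_or_none(parent, tag_name, class_name):
    """Return the stripped text of the first matching tag, or None if absent."""
    node = parent.find(tag_name, class_=class_name)
    return node.get_text(strip=True) if node else None


def extract_match_details(html_text):
    """Return description, location, and status of the first match card."""
    soup = BeautifulSoup(html_text, "html.parser")
    card = soup.find("div", class_="match-card")
    if card is None:
        return None  # no match card found, e.g. the page layout changed
    return {
        "description": text_or_none(card, "span", "description"),
        "location": text_or_none(card, "span", "location"),
        "status": text_or_none(card, "div", "status"),
    }


details = extract_match_details(SAMPLE_HTML)
print(details)
```

Returning `None` for a missing element, instead of calling `.text` on it directly, keeps the scraper from raising `AttributeError` when a field is absent from the page.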
Explanation
In the final part, we will extract the teams' names and scores, then return all the match details as a JSON API using Flask.
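A minimal sketch of this final part is shown below, assuming hypothetical class names (`team`, `team-name`, `team-score`) and a made-up HTML sample in place of the live page. In the real app the HTML would come from `requests.get(...).text`, and the selectors would match the site's actual markup.

```python
from bs4 import BeautifulSoup
from flask import Flask, jsonify

app = Flask(__name__)

# Hypothetical markup standing in for the scraped page.
SAMPLE_HTML = """
<div class="team">
  <div class="team-name">Team A</div>
  <span class="team-score">150/4 (18.2)</span>
</div>
<div class="team">
  <div class="team-name">Team B</div>
  <span class="team-score">149/8 (20.0)</span>
</div>
"""


def extract_teams(html_text):
    """Return a list of {"name", "score"} dicts, one per team block."""
    soup = BeautifulSoup(html_text, "html.parser")
    teams = []
    for block in soup.find_all("div", class_="team"):
        name = block.find("div", class_="team-name")
        score = block.find("span", class_="team-score")
        teams.append({
            "name": name.get_text(strip=True) if name else None,
            "score": score.get_text(strip=True) if score else None,
        })
    return teams


@app.route("/")
def live_scores():
    # The real app would fetch and parse the live page here instead.
    return jsonify({"teams": extract_teams(SAMPLE_HTML)})
```

The route can be exercised without starting a server through `app.test_client().get("/")`. To start the development server with `python app.py`, the file would also need an `app.run()` call under the usual `if __name__ == "__main__":` guard.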
Explanation
To run the application, use this command in the terminal:
python app.py
Then visit the development URL: http://127.0.0.1:5000
Step 1: You need to create an account on Heroku.
Step 2: Install Git on your machine.
Step 3: Install the Heroku CLI on your machine.
Step 4: Log in to your Heroku account:
heroku login
Step 5: Install gunicorn, a pure-Python HTTP server for WSGI applications. It serves a Python web application concurrently by running multiple worker processes.
pip install gunicorn
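Step 6: We need to create a Procfile, a text file in the root directory of our application that explicitly declares which command should be executed to start our app. In the entry below, CricGFG is the name of the Python module (CricGFG.py) and app is the Flask object inside it; if the application lives in app.py, the entry becomes web: gunicorn app:app.
web: gunicorn CricGFG:app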
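The `CricGFG:app` part of the `web: gunicorn CricGFG:app` entry follows gunicorn's `MODULE_NAME:VARIABLE_NAME` pattern. The sketch below illustrates that lookup with the standard library only. It is a simplified illustration, not gunicorn's actual code, and `resolve_app` is a hypothetical helper. Since no `CricGFG` module exists here, the standard `json` module and its `dumps` function stand in for the module and the app object.

```python
import importlib


def resolve_app(spec):
    """Resolve a "module:variable" string to the object it names."""
    module_name, separator, variable_name = spec.partition(":")
    if not separator or not variable_name:
        raise ValueError("expected a spec of the form 'module:variable'")
    module = importlib.import_module(module_name)  # e.g. imports CricGFG.py
    return getattr(module, variable_name)          # e.g. the Flask `app`


# Stand-in example: "json:dumps" plays the role of "CricGFG:app".
target = resolve_app("json:dumps")
print(target.__name__)
```

This is why the module name in the Procfile has to match the Python file name, and the variable name has to match the Flask object defined in that file.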
Step 7: We then create a requirements.txt file listing all the modules that Heroku needs to run our Flask application.
pip freeze > requirements.txt
Step 8: Create a new app on Heroku (named cricgfg in this example).
Step 9: We now initialize a git repository and add our files to it.
git init
git add .
git commit -m "Cricket API Completed"
Step 10: We will now direct Heroku towards our git repository.
heroku git:remote -a cricgfg
Step 11: We will now push our files to Heroku.
git push heroku master
Finally, our API is now available at https://cricgfg.herokuapp.com/