SpaCy is an open-source Python library for advanced natural language processing. It is designed to process large volumes of text efficiently, which makes it suitable for both industrial and academic applications. SpaCy provides pre-trained models for multiple languages that support tasks such as dependency parsing, named entity recognition, and part-of-speech tagging. Its modular design lets developers integrate it with other libraries and tools in the NLP ecosystem.
The goal of this project is to build an application that can efficiently summarize a long article or text document. This helps users such as students, researchers, and teachers get the gist of a text quickly. To follow along, you need a basic knowledge of Flask, HTML, and NLP.
Open Anaconda Navigator and launch VS Code, or open any other IDE such as PyCharm. To create and activate a virtual environment, run the following commands in the terminal (the activation path shown is the Windows one).
- python -m venv <environment name>
- <environment name>\Scripts\activate
app.py: The app.py begins by importing necessary libraries for web handling, form creation, and text processing, and initializes a Flask instance with a secret key for session management while loading the SpaCy English model for NLP tasks. It defines a Form class using Flask-WTF, featuring a text input field and a submit button with validation to ensure the field isn't empty.
The application also downloads essential NLTK resources (stopwords and punkt) for tokenization and stopword removal. The root route (/) of the web application creates an instance of the Form, checks if it has been submitted and validated, processes the input text using the prediction function to generate a summary if valid, and renders the home.html template, passing the form and summary for display.
remove_punc(text): This function starts by tokenizing the input text into individual sentences and then further breaks down each sentence into words. It filters out any punctuation marks from these words. After filtering, it reconstructs the sentences from the remaining words and finally returns the text devoid of punctuation.
remove_tags(text): This function defines a list of HTML tags to be removed. It then tokenizes the input text into sentences and further into words within each sentence. It filters out the specified HTML tags from these words. After filtering, the function reconstructs the sentences from the remaining words and returns the cleaned text.
remove_stpwrds(text): This function begins by loading a set of English stopwords. It then tokenizes the text into sentences and further into words within each sentence. The function filters out any stopwords from these words. After filtering, it reconstructs the sentences from the remaining words and returns the text without stopwords.
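The three cleaning helpers above share one shape: split into sentences, split into words, filter, and rejoin. The sketch below illustrates that shape using only the standard library. The regex tokenizers are stand-ins for the NLTK tokenizers (nltk.tokenize.sent_tokenize and word_tokenize could be swapped in), and the STOPWORDS and HTML_TAGS sets are small illustrative subsets chosen here, not the lists the application actually uses. The helper name _filter_words is hypothetical.

```python
import re
import string

# Stand-in sentence splitter: break after ., ! or ? followed by whitespace.
_SENT_SPLIT = re.compile(r"(?<=[.!?])\s+")
# Stand-in word tokenizer: whole HTML tags, words (with apostrophes), or
# single punctuation characters.
_WORD = re.compile(r"</?\w+>|\w+(?:'\w+)?|[^\w\s]")

PUNCTUATION = set(string.punctuation)
# Illustrative subsets (assumptions), not the full lists.
HTML_TAGS = {"<p>", "</p>", "<br>", "<div>", "</div>"}
STOPWORDS = {"a", "an", "the", "is", "are", "of", "to", "and", "in", "on", "it"}


def _filter_words(text, keep):
    """Tokenize into sentences and words, keep words passing `keep`, rejoin."""
    cleaned = []
    for sentence in _SENT_SPLIT.split(text.strip()):
        words = [w for w in _WORD.findall(sentence) if keep(w)]
        if words:
            cleaned.append(" ".join(words))
    return " ".join(cleaned)


def remove_punc(text):
    return _filter_words(text, lambda w: w not in PUNCTUATION)


def remove_tags(text):
    return _filter_words(text, lambda w: w.lower() not in HTML_TAGS)


def remove_stpwrds(text):
    return _filter_words(text, lambda w: w.lower() not in STOPWORDS)
```

Factoring the shared loop into one helper is a design choice of this sketch; writing the loop out separately in each function, as the descriptions suggest, behaves the same way.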
extract_keywords(text): This function processes the input text using SpaCy to obtain part-of-speech tags for each token. It then filters tokens based on specified tags (PROPN, ADJ, NOUN, VERB). Finally, it collects and returns the filtered keywords that meet the criteria.
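The part-of-speech filter can be sketched as below. To keep the example runnable without a SpaCy model installed, this version takes an already parsed document (any iterable of tokens exposing SpaCy's text and pos_ attributes) rather than raw text, which is a deviation from the extract_keywords(text) signature described above.

```python
# Coarse part-of-speech tags treated as keyword-bearing.
POS_TAGS = ("PROPN", "ADJ", "NOUN", "VERB")


def extract_keywords(doc, pos_tags=POS_TAGS):
    """Return the text of every token whose POS tag is in pos_tags."""
    return [token.text for token in doc if token.pos_ in pos_tags]
```

With SpaCy available, the document would typically come from something like `nlp = spacy.load("en_core_web_sm")` followed by `extract_keywords(nlp(text))`. The model name en_core_web_sm is an assumption here; the article only says that the English model is loaded.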
summarize_text(text): This function preprocesses the input text by removing punctuation, HTML tags, and stopwords. It then extracts keywords from the cleaned text and calculates their frequency. The function normalizes the keyword frequencies and assigns a strength score to each sentence based on these frequencies. Finally, it selects and returns the top sentences with the highest scores as the summary.
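The scoring step can be illustrated with a small standard-library sketch. It assumes the keywords have already been extracted and are passed in as a list, normalizes each frequency by dividing by the largest one (the description does not say which normalization is used, so this is an assumption), and uses a simple regex sentence splitter as a stand-in for the NLTK tokenizer. The function name score_and_select is hypothetical.

```python
import re
from collections import Counter
from heapq import nlargest


def score_and_select(text, keywords, top_n=3):
    """Return the top_n sentences ranked by summed keyword weight."""
    freq = Counter(k.lower() for k in keywords)
    if not freq:
        return ""
    # Normalize so the most frequent keyword has weight 1.0.
    max_freq = max(freq.values())
    weights = {word: count / max_freq for word, count in freq.items()}

    # Stand-in sentence splitter: break after ., ! or ? plus whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

    # Sentence strength = sum of the weights of the words it contains.
    strength = {}
    for index, sentence in enumerate(sentences):
        words = re.findall(r"\w+", sentence.lower())
        strength[index] = sum(weights.get(w, 0.0) for w in words)

    # Pick the strongest sentences, then restore their original order.
    best = sorted(nlargest(top_n, strength, key=strength.get))
    return " ".join(sentences[i] for i in best)
```

Returning the selected sentences in their original order is a choice made in this sketch so the summary reads naturally; returning them in score order would be an equally valid reading of the description.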
home.html: This template renders the page where users enter text and receive a summarized version in three key points. The form uses the POST method to submit data to the root URL (/) and includes a text area for input and a submit button.
Output:
Run "python app.py" in the terminal to start the application.
Then click on "http://127.0.0.1:5000" in the terminal output and you will be taken to the homepage of the application in your browser.