![]() |
VOOZH | about |
Parts of Speech (PoS) tagging is a fundamental task in Natural Language Processing (NLP) where each word in a sentence is assigned a grammatical category such as noun, verb, adjective or adverb. This process help machines to understand the structure and meaning of sentences by identifying the roles of words and their relationships.
It plays an important role in various NLP applications including machine translation, sentiment analysis and information retrieval, by bridging the gap between human language and machine understanding.
Consider the sentence: "The quick brown fox jumps over the lazy dog."
After performing POS Tagging, we get:
- "The" is tagged as determiner (DT)
- "quick" is tagged as adjective (JJ)
- "brown" is tagged as adjective (JJ)
- "fox" is tagged as noun (NN)
- "jumps" is tagged as verb (VBZ)
- "over" is tagged as preposition (IN)
- "the" is tagged as determiner (DT)
- "lazy" is tagged as adjective (JJ)
- "dog" is tagged as noun (NN)
Each word is assigned a tag based on its role in the sentence. For example, "quick" and "brown" are adjectives that describe the noun "fox."
Letβs see various steps involved in POS tagging:
There are different types and each has its strengths and use cases. Let's see few common methods:
Rule-based POS tagging assigns POS tags based on predefined grammatical rules. These rules are crafted based on morphological features (like word endings) and syntactic context, making the approach highly interpretable and transparent.
Example:
1. Rule: Assign the POS tag "Noun" to words ending in "-tion" or "-ment".
2. Sentence: "The presentation highlighted the key achievements of the project's development."
3. Tagged Output:
TBT refines POS tags through a series of context-based transformations. Unlike statistical taggers that rely on probabilities or rule-based taggers, it starts with initial tags and improves them iteratively by applying transformation rules.
Example:
It uses probabilistic models to assign grammatical categories (e.g noun, verb, adjective) to words in a text. Unlike rule-based methods which rely on handcrafted rules, it learns patterns from large annotated corpora using machine learning techniques.
These models calculate the likelihood of a tag based on a word and its context, helping to resolve ambiguities and handle complex grammar. Common models include:
Let's see step by step process how POS Tagging works with NLTK:
Here we import the NLTK library and download the necessary datasets using nltk.download().
First we store the sentence and tokenize it into words using word_tokenize(text). Then we apply POS tagging to the tokenized words using pos_tag(words). This assigns a part-of-speech to each word.
Now we print the original sentence and loop through the tagged words to show each word with its POS tag.
Output:
Let's see step by step process how POS Tagging works with SpaCy:
Here we install and import the SpaCy library and install the pre-trained English language model (en_core_web_sm).
We store the sentence in a variable and process it with nlp(text) to get linguistic annotations.
Print the original sentence then loop through the tokens and display each word with its POS tag.
Output: