URL: https://www.geeksforgeeks.org/nlp/rule-based-approach-in-nlp/

Rule Based Approach in NLP

Last Updated : 23 Jul, 2025

Natural Language Processing (NLP) serves as a bridge between human language and computers. It is a subfield of Artificial Intelligence that helps machines process, understand, and generate natural language. Common NLP tasks include text and speech processing, language translation, and sentiment analysis. Use cases include spam detection, chatbots, and text summarization.

There are three types of NLP approaches:

  1. Rule-based Approach - Based on linguistic rules and patterns
  2. Machine Learning Approach - Based on statistical analysis
  3. Neural Network Approach - Based on neural network architectures such as feed-forward, recurrent, and convolutional networks

Rule-based approach in NLP

The rule-based approach is one of the oldest NLP methods. It uses predefined linguistic rules to analyze and process textual data: a set of rules or patterns is applied to capture specific structures, extract information, or perform tasks such as text classification. Common rule-based techniques include regular expressions and pattern matching.

Steps in Rule-based approach in NLP:

  1. Rule Creation: Based on the desired task, domain-specific linguistic rules are created, such as grammar rules, syntax patterns, semantic rules, or regular expressions.
  2. Rule Application: The predefined rules are applied to the input data to capture matching patterns.
  3. Rule Processing: The text is processed according to the matched rules to extract information, make decisions, or perform other tasks.
  4. Rule Refinement: The rules are refined iteratively to improve accuracy and performance. Based on feedback from earlier runs, they are modified and updated when needed.
Figure: Steps in Rule-Based Approach
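
The four steps above can be sketched with Python's standard `re` module. This is a minimal illustration rather than a recommended design: the rule names (`EMAIL`, `DATE`), the patterns, the helper functions, and the sample sentence are all assumptions made for this example.

```python
import re

# Step 1, rule creation: each rule is a name plus a regular expression.
# The rule names and patterns are illustrative assumptions.
RULES = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "DATE": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
}


def apply_rules(text, rules):
    """Step 2, rule application: collect every match of every rule."""
    matches = []
    for name, pattern in rules.items():
        for m in pattern.finditer(text):
            matches.append((name, m.group(), m.start(), m.end()))
    # Order the matches by their position in the text.
    return sorted(matches, key=lambda item: item[2])


def process_matches(matches):
    """Step 3, rule processing: group the matched text by rule name."""
    extracted = {}
    for name, value, _start, _end in matches:
        extracted.setdefault(name, []).append(value)
    return extracted


sample = "Mail jane@example.com before 2025-07-23 or call the office."
found = apply_rules(sample, RULES)
print(process_matches(found))
```

Step 4, rule refinement, corresponds to editing `RULES` after inspecting the output, for example tightening the `DATE` pattern if it turns out to accept impossible months.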

Libraries that can be used for a rule-based approach include spaCy (well suited for production), fast.ai, and NLTK (less commonly preferred today).
In this article, we'll work with the spaCy library to demonstrate the rule-based approach. spaCy is an open-source software library designed for advanced Natural Language Processing (NLP) tasks. It is built in Python and provides a wide range of functionality for processing and analyzing large volumes of text data.

spaCy's rule-matching engine, the Matcher, works over tokens, entities, and phrases in a manner similar to regular expressions.

spaCy Installation:

# spaCy installation (the leading "!" runs a shell command from a notebook cell)
!pip install -U spacy
!pip install -U spacy-lookups-data
!python -m spacy download en_core_web_sm  # English model

Example 1: Matching Token with Rule-based Approach

Step 1: The necessary modules are imported

Step 2: The English-language spaCy model is loaded.

Step 3: The input text is passed to the model and split into tokens.
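
Steps 1 to 3 can be sketched as follows. The listing is a reconstruction, not the article's original code: the input text is taken from the document printed in Example 2, and the fallback to a blank English pipeline is an assumption added so the sketch also runs when `en_core_web_sm` has not been downloaded.

```python
import spacy

# Step 2: load the small English model. If it is not installed, fall back to
# a blank English pipeline (tokenizer only); this fallback is an assumption
# made for portability and is enough for tokenization.
try:
    nlp = spacy.load("en_core_web_sm")
except OSError:
    nlp = spacy.blank("en")

# Step 3: the input text, processed into a Doc (a sequence of Token objects).
text = (
    "Natural Language Processing serves as an interrelationship between "
    "human language and computers. Natural Language Processing is a "
    "subfield of Artificial Intelligence that helps machines process, "
    "understand and generate natural language intuitively."
)
doc = nlp(text)

tokens = [token for token in doc]
print("Tokens:", tokens)
print("Number of token :", len(tokens))
```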

Output:

Tokens: [Natural, Language, Processing, serves, as, an, interrelationship, between, human, 
language, and, computers, ., Natural, Language, Processing, is, a, subfield, of, Artificial,
 Intelligence, that, helps, machines, process, ,, understand, and, generate, natural, 
 language, intuitively, .]
Number of token : 34

Step 4: The rule-based matching engine 'Matcher' is instantiated.

Step 5: The rule or pattern to be searched for in the text is defined. Here the words 'language' and 'human' are set as patterns; as the output shows, they match regardless of case.

Step 6: The patterns are added to the matcher object using the 'add' method, with the match ID (a string key) as the first parameter and the list of patterns as the second.

Step 7: The matcher object is called on the 'doc' object to find the patterns. The result is stored in the 'matches' variable.

Step 8: The matched results are extracted and printed.
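
Steps 4 to 8 can be sketched as follows. This is a reconstruction under assumptions rather than the article's original listing: the use of the `LOWER` token attribute is inferred from the fact that both "Language" and "language" appear in the output, and the fallback to a blank pipeline is added only so the sketch runs without the model download.

```python
import spacy
from spacy.matcher import Matcher

try:
    nlp = spacy.load("en_core_web_sm")
except OSError:
    nlp = spacy.blank("en")  # assumption: tokenizer-only fallback

text = (
    "Natural Language Processing serves as an interrelationship between "
    "human language and computers. Natural Language Processing is a "
    "subfield of Artificial Intelligence that helps machines process, "
    "understand and generate natural language intuitively."
)
doc = nlp(text)

# Step 4: the Matcher shares the pipeline's vocabulary.
matcher = Matcher(nlp.vocab)

# Step 5: each pattern is a list of per-token descriptions.
# LOWER compares against the lowercase form of the token text.
patterns = [
    [{"LOWER": "language"}],
    [{"LOWER": "human"}],
]

# Step 6: register the patterns under the key "TokenMatch".
matcher.add("TokenMatch", patterns)

# Step 7: run the matcher; each match is (match_id, start, end).
matches = matcher(doc)

# Step 8: resolve the hashed match_id back to its string and print the span.
for match_id, start, end in matches:
    string_id = nlp.vocab.strings[match_id]
    span = doc[start:end]
    print(f"match_id:{match_id}, string_id:{string_id}, "
          f"Start:{start}, End:{end}, Text:{span.text}")
```

The `start` and `end` values are token indices, so `doc[start:end]` recovers the matched span.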

Output:

match_id:9580390278045680890, string_id:TokenMatch, Start:1, End:2, Text:Language
match_id:9580390278045680890, string_id:TokenMatch, Start:8, End:9, Text:human
match_id:9580390278045680890, string_id:TokenMatch, Start:9, End:10, Text:language
match_id:9580390278045680890, string_id:TokenMatch, Start:14, End:15, Text:Language
match_id:9580390278045680890, string_id:TokenMatch, Start:31, End:32, Text:language

Example 2: Matching Phrases with the Rule-based Approach

Step 1: The PhraseMatcher class is imported from spaCy.

Step 2: The English-language spaCy model is loaded.

Step 3: The input text is processed into a 'doc' object.

Output:

Natural Language Processing serves as an interrelationship between human language and computers.
 Natural Language Processing is a subfield of Artificial Intelligence that helps machines process,
 understand and generate natural language intuitively.

Step 4: The PhraseMatcher object is instantiated.

Step 5: The phrases to match are listed in term_list, and each phrase is converted to a Doc object with the 'make_doc' method, which runs only the tokenizer and is therefore faster than calling the full pipeline.

Step 6: The created patterns are added to the matcher object.

Step 7: The matcher object is called on the input 'doc' with the parameter 'as_spans=True', which returns Span objects directly. The extracted results are printed.
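
Steps 1 to 7 of this example can be sketched as follows. The listing is a reconstruction under assumptions: the contents of `term_list` and the key "Phrase Match" are inferred from the printed results, and the blank-pipeline fallback is added only so the sketch runs without the model download.

```python
import spacy
from spacy.matcher import PhraseMatcher

try:
    nlp = spacy.load("en_core_web_sm")
except OSError:
    nlp = spacy.blank("en")  # assumption: tokenizer-only fallback

text = (
    "Natural Language Processing serves as an interrelationship between "
    "human language and computers. Natural Language Processing is a "
    "subfield of Artificial Intelligence that helps machines process, "
    "understand and generate natural language intuitively."
)
doc = nlp(text)
print(doc)

# Step 4: by default the PhraseMatcher compares exact token text,
# so matching is case-sensitive.
matcher = PhraseMatcher(nlp.vocab)

# Step 5: make_doc only tokenizes, which is all the matcher needs.
term_list = ["Language Processing", "human language"]
patterns = [nlp.make_doc(term) for term in term_list]

# Step 6: register the phrase patterns under one key.
matcher.add("Phrase Match", patterns)

# Step 7: as_spans=True returns Span objects whose label is the match key.
matches = matcher(doc, as_spans=True)
for span in matches:
    print(span.text, ":-", span.label_)
```

Because the default comparison is on exact text, the lowercase "natural language" near the end of the input is not matched by either phrase.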

Output:

Language Processing :- Phrase Match
human language :- Phrase Match
Language Processing :- Phrase Match

Example 3: Named Entity Recognition with spaCy

Step 1: spaCy is imported and the English-language spaCy model is loaded.

Step 2: The input text is processed and the named entities found in it are printed with their labels.
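
The two steps can be sketched as follows. The article's input text is not preserved here, so the sample sentence is a hypothetical stand-in and its output will differ from the output listed for this example. Entity recognition needs the trained `en_core_web_sm` model; the blank-pipeline fallback is an assumption that only keeps the sketch runnable and yields no entities.

```python
import spacy

try:
    nlp = spacy.load("en_core_web_sm")
except OSError:
    # Assumption: fallback without a trained NER component,
    # so doc.ents will be empty in this case.
    nlp = spacy.blank("en")

# Hypothetical sample text, not the article's original input.
sample = (
    "India is a country in South Asia. It shares land borders with "
    "Pakistan, China, Nepal, Bhutan, Bangladesh and Myanmar."
)
doc = nlp(sample)

# doc.ents holds the recognized entity spans; label_ is the entity type.
entities = [(ent.text, ent.label_) for ent in doc.ents]
for ent_text, label in entities:
    print(f"Text:{ent_text}, Label:{label}")
```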

Output:

Text:Pawan Kumar Gunjan, Label:PERSON
Text:India, Label:GPE
Text:India, Label:GPE
Text:the Republic of India, Label:GPE
Text:South Asia, Label:LOC
Text:seventh, Label:ORDINAL
Text:second, Label:ORDINAL
Text:the Indian Ocean, Label:LOC
Text:the Arabian Sea, Label:LOC
Text:the Bay of Bengal, Label:LOC
Text:Pakistan, Label:GPE
Text:China, Label:GPE
Text:Nepal, Label:GPE
Text:Bhutan, Label:GPE
Text:Bangladesh, Label:GPE
Text:Myanmar, Label:GPE

Advantages of the Rule-based approach:

  • Easily interpretable, as rules are explicitly defined
  • Rule-based techniques can help semi-automatically annotate data in domains where no annotated data exists (for example, Named Entity Recognition (NER) tasks in a particular domain)
  • Works even with scant or poor training data
  • Computation is fast and precision is typically high
  • Tasks such as tokenization, sentence splitting, or morphology can often be solved deterministically through rules (at least in some languages)

Disadvantages of the Rule-based approach:

  • Labor-intensive as more rules are needed to generalize
  • Generating rules for complex tasks is time-consuming
  • Needs regular maintenance
  • May not handle variations and exceptions in language usage well
  • Recall may be low, since text that no rule covers is missed

Why Rule-based Approach with Machine Learning and Neural Network Approaches?

  1. When combined with other approaches, rule-based NLP usually handles the edge cases.
  2. It helps to speed up data annotation. For instance, a rule-based technique can be used for URL formats, date formats, etc., while a machine learning approach can be used to determine the position of text in a PDF file (including numerical data).
  3. In languages other than English, annotated data is scarce even for common tasks, so these tasks are often carried out with rule-based NLP.
  4. Using a rule-based approach can also improve the computational performance of the pipeline.
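
As a rough illustration of the annotation point above, rules can pre-label well-structured items such as URLs and dates and leave everything else to a human annotator or a trained model. The sketch below uses only Python's standard `re` module. The two patterns, the `pre_annotate` helper, and the sample record are hypothetical, and the date rule covers just one format.

```python
import re

# Hypothetical rules for two well-structured formats.
URL_RULE = re.compile(r"https?://[^\s<>\"']+")
DATE_RULE = re.compile(r"\b\d{1,2} [A-Z][a-z]{2} \d{4}\b")  # e.g. 23 Jul 2025


def pre_annotate(text):
    """Return (start, end, label) character spans found by the rules."""
    spans = []
    for label, rule in (("URL", URL_RULE), ("DATE", DATE_RULE)):
        spans.extend((m.start(), m.end(), label) for m in rule.finditer(text))
    # Sort by position so the spans read in document order.
    return sorted(spans)


record = "Report dated 23 Jul 2025 is at https://example.com/report.pdf for review"
for start, end, label in pre_annotate(record):
    print(label, repr(record[start:end]))
```

Character-offset spans of this kind are a common interchange format for annotations, so the rule output can be reviewed and then reused as training data.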