The Lancaster Stemmer, also known as the Paice-Husk Stemmer, is a rule-based algorithm used in natural language processing to reduce words to their stems. Developed by C.D. Paice in 1990, it aggressively applies rules to strip suffixes such as "ing" or "ed."
Prerequisites: NLP Pipeline, Stemming
You can apply the Lancaster Stemmer in Python without writing the rules yourself. Here’s a simple example using the 'stemming' library, which can be installed using the following command:
!pip install stemming
Now, proceed with the implementation:
Output:
Original words: ['The', 'cats', 'are', 'running', 'swiftly', '.']
Stemmed words: ['Th', 'cat', 'ar', 'run', 'swiftli', '.']
The Lancaster Stemmer works by repeatedly applying a set of rules to remove or replace word endings until no more changes can be made. It reduces words like "running" or "runner" to a stem such as "run". Because the rules are applied aggressively, some stems end up shorter than a recognizable word, as with "ar" for "are" in the output above.
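To make the "apply rules until nothing changes" idea concrete, here is a minimal sketch in plain Python. It is not the Lancaster rule set: the rule table `TOY_RULES`, the function `toy_stem`, and the `min_len` guard are all hypothetical names made up for illustration, and the five rules were chosen only to show the looping behaviour.

```python
# Toy illustration of iterative, rule-based suffix stripping.
# Each rule is (suffix, replacement, keep_going). These rules are
# illustrative assumptions and are NOT the real Lancaster rule table.
TOY_RULES = [
    ("ing", "", True),    # strip "ing", then try the rules again
    ("er", "", True),     # strip "er", then try the rules again
    ("ed", "", True),     # strip "ed", then try the rules again
    ("s", "", True),      # strip plural "s", then try the rules again
    ("nn", "n", False),   # collapse a doubled "n" and stop
]


def toy_stem(word, rules=TOY_RULES, min_len=2):
    """Apply the first matching rule repeatedly until none applies."""
    stem = word.lower()
    changed = True
    while changed:
        changed = False
        for suffix, replacement, keep_going in rules:
            if not stem.endswith(suffix):
                continue
            candidate = stem[: len(stem) - len(suffix)] + replacement
            if len(candidate) < min_len:
                continue  # rule would leave too little; try the next rule
            stem = candidate
            changed = keep_going  # a terminating rule ends the loop
            break
    return stem


for w in ["running", "runner", "singers", "red"]:
    print(w, "->", toy_stem(w))
# running -> run
# runner -> run
# singers -> sing
# red -> red
```

"singers" takes two passes ("s" is removed, then "er"), which is the repeated application described above. "red" is left alone because removing "ed" would leave a single letter; a minimum-length guard of this kind is one simple way to limit over-stemming in a sketch like this.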