URL: https://www.geeksforgeeks.org/dsa/remove-duplicate-words-from-sentence-using-regular-expression/


Remove duplicate words from Sentence using Regular Expression

Last Updated : 12 Jul, 2025

Given a string str that represents a sentence, the task is to remove consecutively repeated words from it using a regular expression, in a programming language such as C++, Java, C#, or Python.

Examples of Removing Duplicate Words from Sentences

Input: str = "Good bye bye world world"
Output: Good bye world
Explanation: We remove the second occurrence of "bye" and of "world" from "Good bye bye world world".

Input: str = "Ram went went to to to his home"
Output: Ram went to his home
Explanation: We remove the second occurrence of "went" and the second and third occurrences of "to" from "Ram went went to to to his home".

Input: str = "Hello hello world world"
Output: Hello world
Explanation: We remove the second occurrence of "hello" and of "world" from "Hello hello world world". The comparison ignores case, so "hello" counts as a repeat of "Hello".

Approach

1. Get the sentence.
2. Form a regular expression to remove duplicate words from sentences. 

regex = "\\b(\\w+)(?:\\W+\\1\\b)+";

The pattern is written as a Java string literal, so every backslash is doubled; the regex itself is \b(\w+)(?:\W+\1\b)+. Its parts can be understood as:

  • "\\b": A word boundary. Without it, the pattern could match inside a longer word. For example, in "My thesis is great", the "is" at the end of "thesis" followed by the word "is" would be treated as a repeated word.
  • "(\\w+)": One or more word characters, [a-zA-Z_0-9], captured as group 1. This is the word that may be repeated.
  • "(?:\\W+\\1\\b)+": A non-capturing group (denoted by (?:...)) that matches the repetitions of the captured word. It breaks down as follows:
      • "\\W+": One or more non-word characters (anything that is not a word character), such as the spaces between words.
      • "\\1": A back reference to the first capturing group (\\w+). It matches the exact text captured by that group, so only a repeat of the same word is accepted.
      • "\\b": Another word boundary, which ensures that the repeated word is a whole word and not the prefix of a longer one.
      • "+": A quantifier that lets the non-capturing group (?:\\W+\\1\\b) match one or more times, so any number of consecutive repeats is covered by a single match.

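3. Match the sentence against the regex. In Java, this can be done by compiling the pattern with Pattern.compile() and calling Pattern.matcher() on the sentence. Passing the Pattern.CASE_INSENSITIVE flag makes "Hello hello" count as a repeat.
4. Replace each match with the word captured in group 1 and return the modified sentence.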
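
The role of the word boundaries and of the back reference can be checked with a small Java sketch. The class name RegexPartsDemo, its helper methods, and the boundary-free variant of the pattern are hypothetical and exist only for this comparison; they are not part of the approach itself.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexPartsDemo {
    // The pattern from step 2, written as a Java string literal.
    static final String WITH_BOUNDARIES = "\\b(\\w+)(?:\\W+\\1\\b)+";
    // Hypothetical variant with the word boundaries removed, for comparison only.
    static final String WITHOUT_BOUNDARIES = "(\\w+)(?:\\W+\\1)+";

    // Returns the text of the first match of regex in text, or null if there is none.
    static String firstMatch(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.find() ? m.group() : null;
    }

    // Returns the word captured by group 1 in the first match, or null if there is none.
    static String firstCapturedWord(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // With boundaries, "thesis is" is not mistaken for a repeated "is".
        System.out.println(firstMatch(WITH_BOUNDARIES, "My thesis is great"));       // null
        // Without boundaries, the tail of "thesis" plus "is" is wrongly matched.
        System.out.println(firstMatch(WITHOUT_BOUNDARIES, "My thesis is great"));    // is is
        // The whole run of repeats is one match; group 1 holds the single word to keep.
        System.out.println(firstMatch(WITH_BOUNDARIES, "Ram went to to to his home"));        // to to to
        System.out.println(firstCapturedWord(WITH_BOUNDARIES, "Ram went to to to his home")); // to
    }
}
```

Because the entire run "to to to" is a single match and group 1 holds just "to", replacing each match with group 1 is enough to collapse the repeats.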

Below is the implementation of the above approach:
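
A minimal Java sketch of these steps follows. The class name DuplicateWordRemover and the method removeDuplicateWords are illustrative names, and the sketch assumes that Matcher.replaceAll("$1") is an acceptable way to perform the replacement in step 4; an explicit find() loop would work as well.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DuplicateWordRemover {
    // Pattern from step 2. CASE_INSENSITIVE lets "hello" count as a repeat of "Hello".
    private static final Pattern REPEATED_WORD =
            Pattern.compile("\\b(\\w+)(?:\\W+\\1\\b)+", Pattern.CASE_INSENSITIVE);

    // Collapses each run of consecutively repeated words to its first occurrence.
    static String removeDuplicateWords(String sentence) {
        Matcher matcher = REPEATED_WORD.matcher(sentence);
        // "$1" stands for the text captured by group 1, i.e. the first copy of the word.
        return matcher.replaceAll("$1");
    }

    public static void main(String[] args) {
        System.out.println(removeDuplicateWords("Good bye bye world world"));
        System.out.println(removeDuplicateWords("Ram went went to to to his home"));
        System.out.println(removeDuplicateWords("Hello hello world world"));
    }
}
```

Only adjacent repeats are removed: a word that appears again later in the sentence, separated by other words, is left in place.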


Output
Good bye world
Ram went to his home
Hello world

Complexity of the above Programs

Time Complexity: O(n), where n is the length of the string
Auxiliary Space: O(1), not counting the output string, which can take up to O(n)
