1. Overview
Working with Strings is an essential part of many Java applications. One of the most powerful tools is the String.replaceAll() method.
In this tutorial, we'll learn how this method works and explore some practical examples.
2. The String.replaceAll() Method
The String.replaceAll() method allows us to search for patterns within a String and replace them with a desired value. It's more than a simple search-and-replace tool, as it leverages regular expressions (regex) to identify patterns, making it highly flexible for a variety of use cases.
Let's look at its signature:
public String replaceAll(String regex, String replacement)
It accepts two parameters:
- regex: a regular expression used to match parts of the String
- replacement: the String that replaces each match found
Next, let's see an example:
String input = "Hello w o r l d";
String result = input.replaceAll("\\s", "_");
assertEquals("Hello_w_o_r_l_d", result);
In this example, the regex "\\s" matches a single whitespace character, and we replace each match with a "_" character.
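As a rough sketch of what happens under the hood: the Javadoc describes str.replaceAll(regex, repl) as yielding the same result as Pattern.compile(regex).matcher(str).replaceAll(repl). The class and method names below (ReplaceAllEquivalence, viaString, viaPattern) are made up for illustration. Precompiling the Pattern is an optional variation that may help when the same regex is applied many times:

```java
import java.util.regex.Pattern;

class ReplaceAllEquivalence {

    // Compiled once and reused across calls (illustrative constant name)
    private static final Pattern WHITESPACE = Pattern.compile("\\s");

    // The convenience method on String
    static String viaString(String input) {
        return input.replaceAll("\\s", "_");
    }

    // The same replacement expressed through Pattern and Matcher
    static String viaPattern(String input) {
        return WHITESPACE.matcher(input).replaceAll("_");
    }

    public static void main(String[] args) {
        String input = "Hello w o r l d";
        System.out.println(viaString(input));  // Hello_w_o_r_l_d
        System.out.println(viaPattern(input)); // Hello_w_o_r_l_d
    }
}
```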
Let's take the same input and perform a different replacement:
result = input.replaceAll("e.*o", "X");
assertEquals("HX r l d", result);
In the above code, the regex "e.*o" matches a segment that begins with "e" and ends with "o". Therefore, "ello w o" from input gets replaced with "X".
It's worth noting that ".*" performs a greedy match. That is to say, it matches from the first "e" to the last "o" in the String. If we want the match to run from the first "e" to the next "o" instead of the last, we can use the non-greedy quantifier in "e.*?o" or the negated character class in "e[^o]*o":
result = input.replaceAll("e.*?o", "X");
assertEquals("HX w o r l d", result);
result = input.replaceAll("e[^o]*o", "X");
assertEquals("HX w o r l d", result);
"e.*?o" matches any substring that starts with an "e", ends with an "o", and has the shortest possible sequence of characters in between. The non-greedy "*?" quantifier keeps the match as short as possible, stopping at the first "o" after the "e".
On the other hand, "e[^o]*o" matches Strings that:
- Start with an "e"
- Contain zero or more characters that aren't "o" between "e" and "o"
- End with an "o"
As we can see, both approaches produce the expected result: "HX w o r l d".
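The two patterns aren't fully interchangeable, though. By default, "." doesn't match line terminators, while the negated class "[^o]" does. The following minimal sketch (class and method names are hypothetical) shows the two patterns diverging on an input that contains a line break:

```java
class LazyVersusNegatedClass {

    // Non-greedy quantifier: "." stops at a line terminator by default
    static String lazy(String input) {
        return input.replaceAll("e.*?o", "X");
    }

    // Negated character class: matches any character except 'o', including '\n'
    static String negatedClass(String input) {
        return input.replaceAll("e[^o]*o", "X");
    }

    public static void main(String[] args) {
        String multiLine = "He\nllo world";

        // No match: the only 'e' is followed by a newline before any 'o'
        System.out.println(lazy(multiLine).equals(multiLine)); // true

        // "e\nllo" is matched and replaced
        System.out.println(negatedClass(multiLine)); // HX world
    }
}
```

If the lazy version should also cross line breaks, compiling the pattern with Pattern.DOTALL (or prefixing the regex with "(?s)") is one option.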
3. String.replace() vs String.replaceAll()
The String class provides replace() and replaceAll(). The two methods look similar, and they can produce the same results sometimes:
String input = "hello.java.hello.world";
String replaceResult = input.replace("hello", "hi");
assertEquals("hi.java.hi.world", replaceResult);
String replaceAllResult = input.replaceAll("hello", "hi");
assertEquals("hi.java.hi.world", replaceAllResult);
In this example, both replace("hello", "hi") and replaceAll("hello", "hi") replace every occurrence of "hello" in input with "hi". So, what's the difference between them?
The key difference between replace() and replaceAll() is that replace() always performs literal String replacement. In contrast, replaceAll() uses regex to match patterns, making it a more advanced tool for string manipulation.
The two methods produced the same result in the above example since the characters in the regex pattern "hello" have no special meanings.
Now, let's say we want to replace all dots "." with colons ":". If we still pass the same parameters to replace() and replaceAll() this time, they produce different results:
replaceResult = input.replace(".", ":");
assertEquals("hello:java:hello:world", replaceResult);
replaceAllResult = input.replaceAll(".", ":");
assertEquals("::::::::::::::::::::::", replaceAllResult);
As we can see, the replaceAll() method replaces every character in input with ":". This is because "." has a special meaning in regex: it matches any character (by default, any character except a line terminator).
Next, let's see how to tell the regex engine to treat special characters as literals.
4. Handling Special Characters in Regex
Many characters have special meanings in regex, for example:
- "." : Matches any character
- "[" and "]" : Define a character class
- "(" and ")" : Define a capture group
- "|" : Logical OR
- ...
Sometimes, when we have these characters in our regex pattern, we want the regex engine to treat them as literal characters. We have two options to achieve that: escaping the character using backslash or putting it in a character class.
Next, let's solve the problem from the previous example: replacing all "." characters with ":":
String input = "hi.java.hi.world";
String result = input.replaceAll("\\.", ":");
assertEquals("hi:java:hi:world", result);
result = input.replaceAll("[.]", ":");
assertEquals("hi:java:hi:world", result);
As we can see, we can get the expected result by escaping "." or putting "." in a character class.
Next, let's see another example. Let's say we have the String "(debug) hello.world" and we want to replace "(debug)" with "[info]":
input = "(debug) hello.world";
result = input.replaceAll("(debug)", "[info]");
assertEquals("([info]) hello.world", result);
result = input.replaceAll("[(]debug[)]", "[info]");
assertEquals("[info] hello.world", result);
result = input.replaceAll("\\(debug\\)", "[info]");
assertEquals("[info] hello.world", result);
As the above code shows, since "(" and ")" are special characters in regex, replaceAll("(debug)", "[info]") doesn't give us the expected output: the parentheses define a capture group, so only "debug" is matched and replaced.
However, escaping "(" and ")" or adding them to character classes solves the problem.
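When the text to match isn't known in advance, escaping each special character by hand becomes error-prone. One alternative, sketched below, relies on two standard helpers: Pattern.quote(), which turns a String into a literal pattern, and Matcher.quoteReplacement(), which neutralizes "$" and "\" in the replacement. The class and method names (QuoteHelpers, replaceLiteral, stripTag) are hypothetical:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuoteHelpers {

    // Treats both the target and the replacement as plain text
    static String replaceLiteral(String input, String target, String replacement) {
        return input.replaceAll(Pattern.quote(target), Matcher.quoteReplacement(replacement));
    }

    // Mixes a quoted literal with a regular regex fragment:
    // removes the tag and any whitespace that follows it
    static String stripTag(String input, String tag) {
        return input.replaceAll(Pattern.quote(tag) + "\\s+", "");
    }

    public static void main(String[] args) {
        System.out.println(replaceLiteral("(debug) hello.world", "(debug)", "[info]")); // [info] hello.world
        System.out.println(replaceLiteral("cost: 5 USD", "USD", "$"));                  // cost: 5 $
        System.out.println(stripTag("(debug) hello.world", "(debug)"));                 // hello.world
    }
}
```

When both arguments are plain literals, String.replace() is simpler. The quoting helpers are mostly useful when a literal has to be embedded in a larger regex, as in stripTag().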
5. When the Regex Is Invalid
We understand that String.replaceAll() works based on regex. Next, let's figure out what happens if the regex we pass to the method is invalid:
String input = "Hello world";
assertThrows(PatternSyntaxException.class, () -> input.replaceAll("e**", "X"));
In this example, we pass the regex "e**" to replaceAll(). However, "e**" is an invalid regex due to the improper usage of the "*" quantifier.
The first "*" is a quantifier, so "e*" means zero or more contiguous "e" characters. The second "*" is another quantifier, but it doesn't have a valid preceding element to act on. Therefore, "e**" is an invalid regex.
As the test above shows, when we pass an invalid regex to String.replaceAll(), a PatternSyntaxException is raised.
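Since PatternSyntaxException is an unchecked exception, it's easy to overlook when the regex comes from user input or configuration. Below is a minimal sketch of one way to handle it, assuming we want to fall back to the unchanged input. The class and method names (SafeReplace, replaceAllOrOriginal) are hypothetical, and the fallback policy is only one possible choice:

```java
import java.util.regex.PatternSyntaxException;

class SafeReplace {

    // Returns the input unchanged when the regex can't be compiled
    static String replaceAllOrOriginal(String input, String regex, String replacement) {
        try {
            return input.replaceAll(regex, replacement);
        } catch (PatternSyntaxException e) {
            // The exception exposes the offending pattern, an approximate
            // error index (-1 if unknown), and a description of the error
            System.err.println("Invalid regex '" + e.getPattern()
              + "' at index " + e.getIndex() + ": " + e.getDescription());
            return input;
        }
    }

    public static void main(String[] args) {
        System.out.println(replaceAllOrOriginal("Hello world", "o", "0"));   // Hell0 w0rld
        System.out.println(replaceAllOrOriginal("Hello world", "e**", "X")); // Hello world
    }
}
```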
6. It's More Than Just a Search-And-Replace Tool
Although String.replaceAll() sounds like a simple search-and-replace tool, it can do much more than that. In this section, let's look at some advanced usages.
6.1. Referencing Capture Groups in the Replacement
Let's say we have a nine-digit input:
String input = "123456789";
We aim to split the input into three three-digit groups, reverse their order, and join them with "-":
String expected = "789-456-123";
Next, let's solve this problem using String.replaceAll():
String result = input.replaceAll("(\\d{3})(\\d{3})(\\d{3})", "$3-$2-$1");
assertEquals(expected, result);
As the code shows, we created three capturing groups in the regex. Further, we can use $1, $2, and $3 to refer to those groups in the replacement String. This allows us to rearrange groups easily.
Additionally, we can use named capturing groups, "(?<groupName>...)", in the regex and reference them by name in the replacement with "${groupName}":
result = input.replaceAll("(?<first>\\d{3})(?<second>\\d{3})(?<third>\\d{3})", "${third}-${second}-${first}");
assertEquals(expected, result);
In this example, we defined names for the three capturing groups ("first", "second", and "third"), and referenced them by name in the replacement String.
Named capturing groups have several advantages regarding readability, maintainability, and clarity, particularly when dealing with complex patterns.
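To illustrate the readability benefit, here's a small sketch that rewrites ISO-style dates (yyyy-MM-dd) into a day-first layout. The class name, method name, and sample text are made up for this example, and the pattern only checks the digit layout, not whether the date is valid:

```java
class DateReorder {

    // The group names document what each part of the pattern captures
    static String isoToDayFirst(String text) {
        return text.replaceAll(
          "(?<year>\\d{4})-(?<month>\\d{2})-(?<day>\\d{2})",
          "${day}/${month}/${year}");
    }

    public static void main(String[] args) {
        System.out.println(isoToDayFirst("From 2023-12-01 to 2024-01-31"));
        // From 01/12/2023 to 31/01/2024
    }
}
```

Compared with "$3/$2/$1", the named version stays correct even if we later add another group to the pattern and the numbering shifts.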
6.2. Inserting Hyphens Between Any Two Contiguous Characters
Let's say we are given a String of unknown length:
String input = "abcdefg";
Now, our task is to insert a "-" between any two contiguous characters:
String expected = "a-b-c-d-e-f-g";
We can use String.replaceAll() with a lookaround assertion to solve it:
String result = input.replaceAll("(?<=.)(?=.)", "-");
assertEquals(expected, result);
In the regex "(?<=.)(?=.)":
- (?<=.) : a lookbehind assertion that requires a character (.) before the current position
- (?=.) : a lookahead assertion that requires a character (.) after the current position
It's important to note that lookbehind and lookahead assertions don't consume the characters they check. Therefore, this regex matches the zero-width position between any two contiguous characters.
When this pattern is passed to replaceAll(), it replaces those in-between positions with a hyphen ("-"), which effectively inserts the hyphen without removing any character.
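The same zero-width technique works with more selective assertions. As a sketch, the hypothetical helper below inserts a comma at every position that has a digit before it and a multiple of three digits between it and the end of the input. It assumes the input consists of digits only:

```java
class DigitGrouping {

    // (?<=\\d)        : there is a digit before the current position
    // (?=(\\d{3})+$)  : one or more complete groups of three digits follow, up to the end
    static String groupThousands(String digits) {
        return digits.replaceAll("(?<=\\d)(?=(\\d{3})+$)", ",");
    }

    public static void main(String[] args) {
        System.out.println(groupThousands("1234567")); // 1,234,567
        System.out.println(groupThousands("1000"));    // 1,000
        System.out.println(groupThousands("123"));     // 123
    }
}
```

For real number formatting, classes such as NumberFormat or DecimalFormat are usually a better fit. The regex version is shown here only to illustrate lookarounds.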
7. Conclusion
String.replaceAll() is a powerful tool for manipulating text in Java. Its power lies in its ability to work with regular expressions, enabling us to perform complex replacements in just a few lines of code.
In this article, we've explored its typical usage through examples and discussed the difference between String.replace() and String.replaceAll().
