In Perl, a regular expression (also known as a regex, regexp, or RE) is a way of describing a set of strings without having to list every string in your program. Put simply, it is a sequence of characters that defines a pattern for matching text. In Perl, regular expressions have three main uses:
- Firstly, they are used in conditionals to determine whether a string matches a particular pattern.
Example: Using regular expressions in conditionals
Here, the input provided by the user is matched against the pattern. If it contains the word "hungry", the if condition is true and the program prints "What would you have?"; otherwise, control moves on to the next condition or statement.
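A minimal sketch of such a conditional is shown below. The helper name `respond` and the fallback reply are assumptions made for illustration; only the "hungry" check and its reply come from the description above.

```perl
use strict;
use warnings;

# Hypothetical helper: picks a reply depending on whether the input
# contains "hungry". The fallback message is an assumption.
sub respond {
    my ($input) = @_;
    if ($input =~ /hungry/) {          # =~ binds the string to the pattern
        return "What would you have?";
    }
    return "Okay, maybe later.";
}

print respond("I am hungry"), "\n";    # prints: What would you have?
print respond("I am tired"), "\n";     # prints: Okay, maybe later.
```

In a real program the input would typically come from the user (for example, read from STDIN and chomped) rather than from a literal string.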
- Secondly, they can locate patterns within a string and can replace them with something else.
Example: Substitution Operator
In this example, the substitution operator replaces "worst" with "good".
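The substitution can be sketched as follows. The helper name `improve` and the sample sentence are assumptions; the replacement of "worst" with "good" follows the description above.

```perl
use strict;
use warnings;

# Hypothetical helper: replaces the first occurrence of "worst" with "good".
sub improve {
    my ($text) = @_;
    $text =~ s/worst/good/;    # s/PATTERN/REPLACEMENT/ edits $text in place
    return $text;
}

print improve("This is the worst day"), "\n";   # prints: This is the good day
```

Adding the /g modifier (s/worst/good/g) would replace every occurrence instead of only the first.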
- Finally, patterns can specify not only where something is, but also where it isn’t. The split operator uses a regular expression to specify where the data isn’t. That is, the regular expression defines the separators that delimit the fields of data.
Example: Split Operator
Here, the pattern given to split matches a single comma character, so the string is divided into fields at each comma.
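A short sketch of splitting on a comma is given below. The helper name `fields` and the sample data are assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: splits a line into fields on single commas.
sub fields {
    my ($line) = @_;
    return split /,/, $line;   # the pattern describes the separator, not the data
}

my @parts = fields("apple,banana,cherry");
print "$_\n" for @parts;       # prints apple, banana, cherry on separate lines
```

Because the separator is itself a pattern, something like /\s*,\s*/ could be used instead to also discard whitespace around each comma.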
Backtracking
Another important feature of regular expression matching is backtracking, which is currently used (when needed) by all regular non-possessive quantifiers, namely *, *?, +, +?, {n,m}, and {n,m}?. A quantifier specifies how many times the preceding element may repeat, instead of the default of matching just once. Backtracking is often optimized internally, but the general principle outlined here remains valid: it amounts to returning from an unsuccessful recursion on a tree of possibilities. Perl backtracks when it is trying to match a pattern and one of its earlier attempts does not pan out. The engine then returns to that earlier choice point and tries the next alternative, for example by letting a quantifier match fewer (or, for the non-greedy forms, more) characters.
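The difference between a greedy quantifier and its non-greedy counterpart can be sketched as below. The helper name `matched` and the sample string are assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the text matched by $re in $str, or undef.
sub matched {
    my ($re, $str) = @_;
    return $str =~ /($re)/ ? $1 : undef;
}

my $html = "<b>one</b><b>two</b>";

# Greedy: .+ first grabs as much as it can, then backtracks only as far
# as needed for the final '>' to match.
print matched(qr/<.+>/, $html), "\n";    # prints: <b>one</b><b>two</b>

# Non-greedy: .+? grabs as little as it can and extends only when needed.
print matched(qr/<.+?>/, $html), "\n";   # prints: <b>
```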
For example, /<(.*?)>.*?<\/\1>/ might be used to match something similar to an HTML tag pair like "<B>Bold</B>". The backreference \1 forces the two parts of the pattern to match exactly the same string, which in this case is 'B'.
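Assuming the pattern intended here captures a tag name and then repeats it with the backreference \1, the idea can be sketched as follows. The helper name `tag_name` is hypothetical, and a pattern like this is only an illustration, not a robust way to parse HTML.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the tag name if the string contains an
# opening tag followed later by a closing tag with the same name.
sub tag_name {
    my ($html) = @_;
    # (.*?) captures the tag name; \1 must match that same text again.
    return $html =~ /<(.*?)>.*?<\/\1>/ ? $1 : undef;
}

print tag_name("<B>Bold</B>"), "\n";     # prints: B
print defined tag_name("<B>Bold</I>") ? "match" : "no match", "\n";   # prints: no match
```

In the second call the engine backtracks through every possible length for (.*?) before giving up, because no closing tag repeats the captured text.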
Let us take another example,
/^ab*bc*d/
The above regexp can be read as:
1. Start at the beginning of the string.
2. Match an 'a'.
3. Match as many 'b's as possible, but not matching any is OK.
4. Match a 'b'.
5. Match as many 'c's as possible, but not matching any is OK.
6. Match a 'd'.
Match against 'abbbccdddd'.
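Here, b* (step 3) first consumes all three 'b's, so the literal 'b' that follows it in the pattern has nothing left to match. Because that next step is not feasible, the engine backtracks into step 3, which gives up one 'b' and matches only 'bb'. The literal 'b' can then match, and the rest of the pattern proceeds: 'cc' is matched, followed by the first 'd', so the overall match is 'abbbccd'.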
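The effect of this backtracking can be observed by capturing what each quantified part ends up matching. This is a minimal sketch; the helper name `match_parts` and the added capture groups are assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the whole match plus what b* and c* ended up
# matching, or an empty list if the pattern does not match.
sub match_parts {
    my ($str) = @_;
    if ($str =~ /^(a(b*)b(c*)d)/) {
        return ($1, $2, $3);
    }
    return;
}

my ($whole, $bs, $cs) = match_parts('abbbccdddd');
print "whole match: $whole\n";   # prints: whole match: abbbccd
print "b* matched:  $bs\n";      # prints: b* matched:  bb
print "c* matched:  $cs\n";      # prints: c* matched:  cc
```

Although the string contains three 'b's, b* ends up with only two, because it had to hand one back for the literal 'b' that follows it.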