URL: https://thenewstack.io/regular-expressions-and-solving-the-food-taster-dilemma/

Regular Expressions and Solving the Food Taster Dilemma

A look at lookaround functions in regular expressions, and a reminder of why paranoid kings and emperors employed food tasters.
Aug 24th, 2022 3:00am by David Eastman
In my previous regex post, we covered simple regular expressions that work with most code base libraries, as well as with the search functions in many editors. Things were fairly simple to comprehend, and all was good with the world. So now I’m going to ruin all that.
First, a reminder of why we should value regular expressions. By knowing one way to search through text (even a 70-year-old way) in a computational style, we can not only solve problems with different tools, we can also better understand the problems with search itself.
This time I’m going for the next step up in difficulty: the lookaround functions. But first, we have to understand consumption. You’ll soon appreciate why I didn’t cover this in the first post.

Consumption

When the regex process matches successfully, it includes the matched character in the result and then moves on to the next character. This sounds like the right thing to do, but it does mean you can’t check actions before you commit to them. This is why paranoid kings and emperors employed food tasters; you couldn’t un-eat an apple after you discovered it was poisoned. But if you remember, when we looked at anchors we noticed they didn’t consume. They were used to clamp the search to the beginning or end of a line. We used this to find the first word in a line of Shakespearean text.
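As a rough illustration of a non-consuming anchor, here is a minimal Python sketch. The two lines of Hamlet are just a hypothetical sample, and the pattern `^\w+` is an assumption about what the earlier search looked like.

```python
import re

# Hypothetical sample text: two lines from Hamlet.
text = "To be, or not to be, that is the question:\nWhether 'tis nobler in the mind to suffer"

# With re.MULTILINE, ^ pins the match to the start of each line.
# The anchor itself consumes nothing, so only the word characters
# end up in the result.
first_words = re.findall(r"^\w+", text, flags=re.MULTILINE)
print(first_words)  # ['To', 'Whether']
```

The anchor constrains where the match may start, yet it contributes no characters to the match itself.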
Here is an example that shows the food taster dilemma in detail. How can you find letter combinations that break the rule “i before e except after c”? This familiar but rather unfortunate rule of the English language has a lot of exceptions. If we just apply two simple searches for the two sets of offending exceptions, it won’t work.
Clearly, the search can’t tell if it’s looking at a good “cei” or it has found a genuine rule-breaking “ei”. At least applying the second pattern is straightforward for catching half the offenders that contain “cie”.
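The dilemma can be reproduced with a short Python sketch. The word lists are hypothetical samples, not necessarily the ones used in the original screenshots, and the two "simple searches" are assumed to be plain `ei` and `cie`.

```python
import re

# Hypothetical sample words: the first list follows the rule,
# the second list breaks it.
good = ["believe", "field", "receive", "ceiling"]
renegades = ["science", "species", "seize", "veil", "weird"]
words = good + renegades

# Searching for "ei" alone also flags the legitimate "cei" words.
has_ei = [w for w in words if re.search(r"ei", w)]
print(has_ei)   # ['receive', 'ceiling', 'seize', 'veil', 'weird']

# Searching for "cie" is safe, but it only finds half of the offenders.
has_cie = [w for w in words if re.search(r"cie", w)]
print(has_cie)  # ['science', 'species']
```

The first search wrongly reports "receive" and "ceiling", which is exactly the false-positive problem described above.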
But how can we detect both sets of offenders correctly with just one query? The answer would seem to be that we can use an alternation (an “either-or” rule) to combine both rules and then disallow the “c” when checking for the “ei” rule.
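One plausible form of that combined query is `cie|[^c]ei`; this exact pattern is an assumption, as are the sample words. A minimal Python sketch of what it returns:

```python
import re

# Hypothetical sample words (same assumption as before).
good = ["believe", "field", "receive", "ceiling"]
renegades = ["science", "species", "seize", "veil", "weird"]

# Either "cie", or "ei" preceded by any single character other than "c".
pattern = re.compile(r"cie|[^c]ei")

matches = {w: m.group() for w in good + renegades if (m := pattern.search(w))}
print(matches)
# {'science': 'cie', 'species': 'cie', 'seize': 'sei', 'veil': 'vei', 'weird': 'wei'}
```

The right words are flagged, but notice what each match actually contains: `[^c]` has to consume a real character in order to succeed.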
The above solution looks good at first, but the last three of the renegade spellings have also captured the “s”, “v” and “w”. These are not in themselves problem letters, but they matched the negated character class.

Look Around without Consuming

We need to be able to “look around” without consuming. Command the food taster to have a nibble, and see what happens. The correct expression ignores the result of the “ei” check when a “c” appears in front of it.
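A minimal Python sketch of this idea, assuming the expression is `cie|(?<!c)ei` and reusing the same hypothetical word lists:

```python
import re

# Hypothetical sample words (same assumption as before).
good = ["believe", "field", "receive", "ceiling"]
renegades = ["science", "species", "seize", "veil", "weird"]

# (?<!c) checks that the previous character is not "c",
# but it does not add that character to the match.
pattern = re.compile(r"cie|(?<!c)ei")

matches = {w: m.group() for w in good + renegades if (m := pattern.search(w))}
print(matches)
# {'science': 'cie', 'species': 'cie', 'seize': 'ei', 'veil': 'ei', 'weird': 'ei'}
```

Python's `re` module accepts this because the lookbehind has a fixed width of one character; other engines may differ in what they allow inside a lookbehind.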
That worked. The new terrible-looking profusion of characters is a negative lookbehind. A close inspection of the screenshot above shows that there is a little warning flag on the right side — it is warning us that “the browser may not support negative lookbehind”! Let us admire the whole lookaround family:
<code class="language-diff">
(?=a) positive lookahead

(?!a) negative lookahead

(?<=a) positive lookbehind 

(?<!a) negative lookbehind
</code>
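To see how the four family members differ, here is a small Python sketch. The sample string and the choice of "x" as the character being matched are purely illustrative assumptions.

```python
import re

# Hypothetical sample string, chosen so each lookaround picks different x's.
sample = "xa ax xb bx"

patterns = {
    "positive lookahead": r"x(?=a)",    # x followed by a
    "negative lookahead": r"x(?!a)",    # x not followed by a
    "positive lookbehind": r"(?<=a)x",  # x preceded by a
    "negative lookbehind": r"(?<!a)x",  # x not preceded by a
}

# Record the start index of every match for each pattern.
positions = {
    name: [m.start() for m in re.finditer(pattern, sample)]
    for name, pattern in patterns.items()
}
for name, starts in positions.items():
    print(f"{name:20} {starts}")
# positive lookahead   [0]
# negative lookahead   [4, 6, 10]
# positive lookbehind  [4]
# negative lookbehind  [0, 6, 10]
```

In every case the matched text is just the single "x"; the neighbouring "a" is inspected but never consumed.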

The Sub Regex

In the list above, where I have put an “a”, you can place a character or indeed any regex. This is matched by the lookaround before the rest of the expression is processed. So technically, we can have a form of “if .. then”, like a programming fork. Let’s say we want to find words for a Wordle-type puzzle; for example, words five letters long that include the renegade combinations from our earlier problem. So we only want to look at five-letter words. To do this we use the following constructs, most of which I introduced in my previous regex article:
<code class="language-diff">\w A word character
\b The boundary between a word and a non-word character (this doesn’t consume)
{5} Repeat the preceding metacharacter 5 times
</code>
So, a word of length five letters would be matched by the expression:
<code class="language-diff">\b\w{5}\b
</code>
Think of this as five consecutive letters sandwiched between two word boundaries, where each boundary sits next to a non-word character (e.g. a space or punctuation) or the start or end of the line. It works with our example words.
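A quick Python check of the five-letter expression, run against a hypothetical sentence of sample words:

```python
import re

# Hypothetical sample text built from the same assumed word lists.
text = "believe field receive ceiling science species seize veil weird"

# \b matches at a word boundary without consuming anything;
# \w{5} then consumes exactly five word characters.
five_letter_words = re.findall(r"\b\w{5}\b", text)
print(five_letter_words)  # ['field', 'seize', 'weird']
```

Longer words are rejected because the closing `\b` cannot match in the middle of a word, and shorter ones because `\w{5}` runs out of letters.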
So now, we can combine this with our “i before e” renegade detector. Now, doing one “sub” calculation before going on to do another is fine for a general-purpose programming language, but a little bit of a stretch for regex. Sure it works, but you will start to produce some difficult-to-read code that might be tough to debug. Nevertheless, let’s do a positive lookahead sub calculation with our length solution, and then follow it with our renegade test.
It doesn’t work! But that’s because we have positioned ourselves in front of the word and we are not just freely looking for the combo anywhere on the line. We need to represent the entire word. So we pad our detector expression with word characters that might appear before and after.
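Both the failed attempt and the padded fix can be sketched in Python. The exact expressions are assumptions reconstructed from the description, and the sample text is hypothetical.

```python
import re

# Hypothetical sample text built from the same assumed word lists.
text = "believe field receive ceiling science species seize veil weird"

# First attempt: the lookahead succeeds only at the start of a five-letter
# word, so the detector must then match right there, at the first letter.
unpadded = r"(?=\b\w{5}\b)(?:cie|(?<!c)ei)"
print(re.findall(unpadded, text))  # []

# Padded version: \w* lets the detector sit anywhere inside the word,
# and the match now covers the whole word.
padded = r"(?=\b\w{5}\b)\w*(?:cie|(?<!c)ei)\w*"
five_letter_renegades = re.findall(padded, text)
print(five_letter_renegades)       # ['seize', 'weird']
```

Note that "field" passes the length check but is correctly dropped, because it contains neither "cie" nor a stray "ei".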
That worked! Finally, let’s just prove that if we expand the repetition length to between four and 10 letters, we really capture all the renegade examples (and none of the good guys).
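The same check in Python, with the repetition widened to `{4,10}`. As before, the expression and the sample words are assumptions for illustration.

```python
import re

# Hypothetical sample words (same assumption as before).
good = ["believe", "field", "receive", "ceiling"]
renegades = ["science", "species", "seize", "veil", "weird"]
text = " ".join(good + renegades)

# Length check of 4 to 10 letters, then the padded renegade detector.
pattern = r"(?=\b\w{4,10}\b)\w*(?:cie|(?<!c)ei)\w*"
found = re.findall(pattern, text)
print(found)  # ['science', 'species', 'seize', 'veil', 'weird']
```

Every renegade in the sample is caught, and none of the rule-abiding words slips through.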
Now part of the reason we are doing this is that we can use this tool in other editors. Or can we? Let us try just the renegade detector in two different editors. First, Microsoft Word.
“In its own unique way” is a rather large red flag. Not only does it not support lookaround (no surprise there), it doesn’t even support alternation. Sad face.
Sublime is a developer-friendly editor, and has no trouble with regex. You just have to hit the “.*” button (that is the informal sign that regex is welcome here) and off you go.
So, some success at least. I hope text editors and search facilities retain the faithful regex, and that you remember this independent solution when you need to find text treasure hidden in a forest of words.
David has been a London-based professional software developer with Oracle Corp. and British Telecom, and a consultant helping teams work in a more agile fashion. He wrote a book on UI design and has been writing technical articles ever since....