VOOZH about

URL: https://thenewstack.io/tla-the-best-debugger-optimizer-youve-never-heard-of/

⇱ TLA+: The Best Debugger/ Optimizer You’ve Never Heard of - The New Stack


TNS
SUBSCRIBE
Join our community of software engineering leaders and aspirational developers. Always stay in-the-know by getting the most important news and exclusive content delivered fresh to your inbox to learn more about at-scale software development.
REQUIRED
It seems that you've previously unsubscribed from our newsletter in the past. Click the button below to open the re-subscribe form in a new tab. When you're done, simply close that tab and continue with this form to complete your subscription.
The New Stack does not sell your information or share it with unaffiliated third parties. By continuing, you agree to our Terms of Use and Privacy Policy.
Welcome and thank you for joining The New Stack community!
Please answer a few simple questions to help us deliver the news and resources you are interested in.
REQUIRED
REQUIRED
REQUIRED
REQUIRED
REQUIRED
Great to meet you!
Tell us a bit about your job so we can cover the topics you find most relevant.
REQUIRED
REQUIRED
REQUIRED
REQUIRED
REQUIRED
Welcome!

We’re so glad you’re here. You can expect all the best TNS content to arrive Monday through Friday to keep you on top of the news and at the top of your game.

What’s next?

Check your inbox for a confirmation email where you can adjust your preferences and even join additional groups.

Follow TNS on your favorite social media networks.

Become a TNS follower on LinkedIn.

Check out the latest featured and trending stories while you wait for your first TNS newsletter.

PREV
1 of 2
NEXT
VOXPOP
As a JavaScript developer, what non-React tools do you use most often?
Angular
0%
Astro
0%
Svelte
0%
Vue.js
0%
Other
0%
I only use React
0%
I don't use JavaScript
0%
Thanks for your opinion! Subscribe below to get the final results, published exclusively in our TNS Update newsletter:
NEW! Try Stackie AI
From clobbered drafts to real-time sync
Apr 14th 2026 10:00am, by David Moore
TypeScript 6.0 RC arrives as a bridge to a faster future
Mar 14th 2026 9:00am, by Darryl K. Taft
Mastra empowers web devs to build AI agents in TypeScript
Jan 28th 2026 11:00am, by Loraine Lawson
2022-01-26 08:00:26
TLA+: The Best Debugger/ Optimizer You’ve Never Heard of
news,
Software Development

TLA+: The Best Debugger/ Optimizer You’ve Never Heard of

Jan 26th, 2022 8:00am by Joab Jackson
👁 Featued image for: TLA+: The Best Debugger/ Optimizer You’ve Never Heard of
Image by Andrzej Rembowski from Pixabay. 

👁 TLA+ logo
In 2012, when Amazon Web Services launched DynamoDB as a commercial service, it arrived with an extra assurance for reliability. The replication and fault tolerance mechanisms for the distributed database system went the usual gauntlet of design and code reviews, extensive testing and was examined in a series of informal correctness proofs.

But in this case, an additional check was taken. The codebase for these functions was also modeled through a formal specification language built on discrete math, called  TLA+. TLA+ found a previously missed bug that could have led to data loss when a particular sequence of failures and recovery steps was interleaved with other processing. This was an extremely subtle bug; the shortest error trace exhibiting the bug contained 35 high-level steps.

Through TLA+, AWS engineers also found other previously undetectable subtle bugs, and, better yet, were able to” make aggressive performance optimizations without sacrificing correctness,” noted a subsequent paper from the cloud giant “How Amazon Web Services Uses Formal Methods.

“Some of the more subtle, dangerous bugs turn out to be errors in design. We have found that testing the code is inadequate as a method for finding these errors, as the number of reachable states of the code is astronomical,” the engineers wrote.

Formally verifying the correctness of a program has long been assumed to be a costly task, hence its use is restricted primarily for safety-critical applications. But AWS software engineers have asserted it can be used more broadly. In particular, it could be used to solve thorny concurrency problems in distributed cloud native systems.

Microsoft has benefitted from TLA+ as well. TLA+ caught a critical error in the memory design of the Xbox, a bug that would have shut down the console after four hours of gameplay.

Amazon Web Services’ distributed systems engineer Murat Demirbas tested TLA+ in a class he taught at the University of Buffalo and found the language useful. Even though the learning curve was steep, “The students liked TLA+ since it gave them a way to experiment and supported them in reasoning about the algorithms,” he wrote. Citing a paper, industry observer Adrian Coyler suggested TLA+ could be used to check smart contracts.

Still finding bugs despite rigorous testing? It might be time for some formal specification and model checking to help solve difficult design problems in critical systems.

How It Works

Software consultant Hillel Wayne first got to know TLA+ working on an educational system, which was, in his words, an “accidentally-distributed system,” where a lot of reoccurring “weird bugs” kept cropping up. They were often collisions between third-party software that the ops team had little control over.

“I saw how Amazon used it, and thought that maybe it would be something worth trying out. I ended up learning it in a couple of weeks,” Wayne said. Applying at work, he caught a bunch of critical bugs, as well as shaved some time off of the delivery schedules.

Wayne wondered why few others seemed to be using it, aside from Amazon. It turned out there was a serious lack of educational material on using the specification language, aside a few books that were more focused on the formal mathematical side of things. To fill this gap of information around TLA+, Wayne wrote a series of tutorials, a book and now professionally teaches TLA+.

A mathematical language, TLA+ uses temporal logic to reason about concurrent systems. The name is an acronym for Temporal Logic of Actions, as it was defined by its creator Leslie Lamport. A pioneer in the science of distributed systems, Lamport has always advocated the importance of developers modeling their systems before actually coding them.

For any system, it specifies both what a system should do — its desired “correctness properties” —  and how that system should work, its design. An accompanying explicit state model checker checks for desired correctness across all possible execution states

TLA+ is a bit like architectural design. Just as home builders just don’t start erecting a house at random, so too should large-scale software development follow a high-level design. Coders won’t be able how different components may interact in a higher level.

A architectural diagram by itself provides no way of being tested. But TLA+ also provides a place to include properties of each component. With properties, TLA+ can model check the design, looking possible combinations and edge conditions that could cause bad states. One easy example is a queue that could change the order in which it processes messages, which would potentially confuse a transaction between two components.

There are a number of ways to work with TLA+. One can sketch out a high-level overview of the program they want to create, then map to TLA+ principles, then derive the code from that. Or they can go in the other direction and translate an existing code base to TLA+ to get a high-level overview.

Many TLA+ users rely on TLC, a model checker, part of Lamport’s TLA+ Toolbox, that can run through all the possible permutations of a system. Models can also be checked by hand, but this truly is time-consuming.

A typical problem that TLA+ could tackle is that of guaranteeing idempotency, where a message needs to be sent only once. This requirement can be derived from the various vendor guarantees, such as simply never sending a message more than once, and sending it at least once.

“It’s being able to essentially test your architectural diagrams,” Wayne said.

https://www.youtube.com/watch?v=ATobswwFwQA&t=19s

Slow Acceptance

Although there was a bit of buzz around TLA+ about a decade ago, the hype seems to have waned, despite its immediate applicability to distributed cloud native systems. When it surfaces on discussion boards such as Hacker News, most curious parties admit it could help them with their work, but they don’t have time to learn or use TLA+.

There is a bit of a learning curve. In the workshop Wayne teaches, the students are usually able to find actual bugs in complex code in about day three. Most of his students are developers and software engineers, with a few IT architects and project managers as well. On their own, they may reach that level of proficiency after a few weeks, Wayne estimates. Often a tool called PlusCal can be used, which offers a more developer-centric (rather than mathematician-centric) pseudocode-like interface to the specs.

A bigger barrier to adoption is the time it demands from the project engineers. The output is a design, not code, and so the engineer still must make the translation from design to code or vice-versa (Though there is an upside — even the manual work of mapping a program, or proposed program, can make obvious to engineers errors that they may not have thought about before).

Plus, an enterprise’s full adoption of TLA+ goes beyond the coder, necessitating change to how the project work itself is structured. Managers of a time-sensitive project (and what project isn’t?) may not want to factor in an extra step for verification. To get the full benefit of TLA+, a project team should factor in the additional step to incorporate TLA+.

TRENDING STORIES
Joab Jackson is a senior editor for The New Stack, covering cloud native computing and system operations. He has reported on IT infrastructure and development for over 30 years, including stints at IDG and Government Computer News. Before that, he...
Read more from Joab Jackson
SHARE THIS STORY
TRENDING STORIES
Amazon Web Services is a sponsor of The New Stack.
SHARE THIS STORY
TRENDING STORIES
TNS DAILY NEWSLETTER Receive a free roundup of the most recent TNS articles in your inbox each day.
The New Stack does not sell your information or share it with unaffiliated third parties. By continuing, you agree to our Terms of Use and Privacy Policy.