URL: https://thenewstack.io/kairux-root-cause-debugging-with-the-inflection-point-hypothesis/

Kairux: Identifying and Debugging Root Causes

Kairux is a fault localization research prototype that uses unit testing and adaptive dynamic slicing to work backward from the failure execution to identify what went wrong and why.
Feb 27th, 2023 5:00am by Jessica Wachtel

Anyone who’s worked with distributed systems knows there are far too many opportunities for something to go wrong. Pinpointing the exact spot in the source code is like looking for a needle in a haystack.

Kairux is a fault localization tool for Java-based systems, one that uses unit testing and adaptive dynamic slicing to work backward from the failure execution to identify what went wrong and why.

By implementing the Inflection Point Hypothesis — that the root cause of any failure is also the step where a failure execution and successful execution diverge — Kairux can identify the bug and present the source code needed to understand why it happened.

About 77% of failures in distributed systems are caused by more than just one thing. There’s no perfect solution, and picking the “right” root cause is subjective. While Kairux can’t guarantee that the “right” root cause is selected, it can identify the last root cause, so applying Kairux repeatedly could identify multiple bugs in an execution.

Kairux first reduces the potential number of successful threads to compare against, then returns the one with the longest common prefix to the failure thread.

This research project was highlighted this month by Usenix. The authors have not indicated whether they plan to open source the code, which targets Java, or pursue a commercial implementation.

What’s the Inflection Point Hypothesis?

The Inflection Point Hypothesis is simple and straightforward. The inflection point is the root cause of an issue: the first instruction where the failure execution diverges from the non-failure execution. So, the best way to pinpoint the inflection point is to find the successful execution with the longest common prefix to the failure execution.

The inflection point holds significance as the last step in the failure execution where failure is still avoidable. By using this hypothesis as the guiding principle, fault localization becomes a principled search problem.
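
To make the search concrete, here is a minimal sketch in Python, not Kairux’s actual code. It assumes executions are already available as flat lists of instruction names; `common_prefix_len`, `find_inflection_point` and the instruction names are hypothetical.

```python
from typing import List, Sequence, Tuple


def common_prefix_len(a: Sequence[str], b: Sequence[str]) -> int:
    """Count how many leading instructions two sequences share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def find_inflection_point(
    failure: Sequence[str], successes: List[Sequence[str]]
) -> Tuple[Sequence[str], int]:
    """Return the closest successful run and the index where the failure diverges from it."""
    if not successes:
        raise ValueError("need at least one successful execution")
    best = max(successes, key=lambda s: common_prefix_len(failure, s))
    return best, common_prefix_len(failure, best)


# Hypothetical instruction names, for illustration only.
failure = ["open", "write", "close", "read"]
successes = [
    ["open", "read", "close"],
    ["open", "write", "read", "close"],
]
closest, index = find_inflection_point(failure, successes)
print(closest, index, failure[index])
```

In this toy input the second successful run shares the first two instructions with the failure, so the sketch reports index 2 (`close`) as the point of divergence.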

A failure caused by a read-after-write data race. In example-failure, Thread 1 sets a = -1, so a is no longer 0, which causes the failure. failure-instr-seq is the instruction sequence that caused the bug in example-failure. If all successful combinations of instructions are created and compared to failure-instr-seq, eventually the sequence with the longest common prefix will surface. In this example, it’s instr-seq-n. (Source: Usenix)
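
The brute-force idea in the caption can be sketched in Python. This is an illustration under assumptions, not a reconstruction of the figure: the two threads, their instruction names and the `succeeds` check are all made up, and the run is treated as failing whenever the read sees a value other than 0.

```python
from itertools import combinations

# Hypothetical two-thread program; the instruction names are assumptions.
THREAD_1 = ["T1: x = 1", "T1: a = -1"]
THREAD_2 = ["T2: y = 2", "T2: read a"]


def interleavings(t1, t2):
    """Yield every merge of t1 and t2 that keeps each thread's own order."""
    total = len(t1) + len(t2)
    for slots in combinations(range(total), len(t1)):
        it1, it2 = iter(t1), iter(t2)
        yield [next(it1) if k in slots else next(it2) for k in range(total)]


def succeeds(seq):
    """The run fails if the read observes a value other than 0."""
    a = 0
    for instr in seq:
        if instr == "T1: a = -1":
            a = -1
        elif instr == "T2: read a" and a != 0:
            return False
    return True


def common_prefix_len(a, b):
    """Count how many leading instructions two sequences share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


failure_seq = ["T1: x = 1", "T2: y = 2", "T1: a = -1", "T2: read a"]
successful = [s for s in interleavings(THREAD_1, THREAD_2) if succeeds(s)]
closest = max(successful, key=lambda s: common_prefix_len(failure_seq, s))
inflection = common_prefix_len(failure_seq, closest)
print(closest)
print(failure_seq[inflection])
```

Of the six possible interleavings, three succeed. The closest one shares two instructions with the failing sequence, and the first differing instruction in the failure is the write to `a`. Enumerating every interleaving only works for tiny programs, which is why the search space has to be pruned in practice.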

Kairux Automates the Inflection Point Hypothesis

Kairux is a set of algorithms for Java-based distributed systems that abstracts away all the heavy lifting. It takes the following inputs:

  • Steps needed to reproduce the bug, usually packaged in a unit test.
  • The failure symptom itself.
  • The source code.
  • All unit tests.

It produces the following outputs:

  • The inflection point.
  • The instruction sequence with the longest common prefix.
  • The steps needed to reproduce the instruction sequence with the longest common prefix.

The “why” and “how” of the bug can be understood by comparing the source code behind the failure execution with the source code behind the successful execution that shares the longest common prefix. The “where” is easier to locate once the number of candidate sequences is minimized. Kairux reduces the potentially infinite number of combinations with the following key concepts:

  • Adaptive dynamic slicing: Only sequences and source code related to the bug are considered. The remaining target instruction sequences are split into subsequences by thread. The thread containing the bug is processed first, and the analysis is then adaptively extended to other threads.
  • Unit test utilization: Kairux only considers successful sequences exercised by existing unit tests, and it narrows that list further by prioritizing those most similar to the failure execution.
  • Valid execution modification: To make the best use of the test-provided sequences, Kairux attempts to modify the test’s input parameters to reduce any divergences from the failure thread when they arise. When the updated parameters no longer lead to valid executions, the attempts end.
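
The last concept, valid execution modification, can be illustrated with a toy Python sketch. Everything here is an assumption made for the example: `run_write_test` stands in for a unit test with one tunable parameter, an "invalid" attempt is taken to mean either rejected input or a run that fails, and Kairux’s real search strategy is described in the paper rather than here.

```python
def common_prefix_len(a, b):
    """Count how many leading instructions two sequences share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def run_write_test(size):
    """Hypothetical test body: returns (trace, passed) and rejects invalid input."""
    if size < 0:
        raise ValueError("size must be non-negative")
    trace = ["open"] + ["write"] * size
    passed = size <= 2  # assumed capacity of two writes
    trace.append("flush" if passed else "overflow")
    return trace, passed


def closest_passing_param(failure_trace, run, candidates):
    """Try parameters in order and stop at the first one that is invalid or fails."""
    best_param, best_len = None, -1
    for param in candidates:
        try:
            trace, passed = run(param)
        except ValueError:
            break  # the parameter no longer leads to a valid execution
        if not passed:
            break
        shared = common_prefix_len(failure_trace, trace)
        if shared > best_len:
            best_param, best_len = param, shared
    return best_param, best_len


failure_trace, _ = run_write_test(3)  # the failing run in this toy setup
print(closest_passing_param(failure_trace, run_write_test, [0, 1, 2, 3]))
```

Starting from the test’s original parameter (0 here), each modification lengthens the prefix shared with the failure until the next step would stop producing a passing run.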

Greater detail about Kairux’s algorithms can be found in the technical paper.

Architecture and Implementation of Kairux

Kairux’s architecture. (Source: Usenix)

Kairux favors static analysis over dynamic analysis because of the high overhead of dynamic analysis. The static slice is a superset of the instructions that belong in the dynamic slice of any failure execution.

Kairux builds the dynamic slice it needs by setting breakpoints and then reproducing the failure execution. It records each breakpoint that is hit to obtain a trace, runs a dependency analysis on that trace, and annotates network communication libraries. The result is the dynamic slice.
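
As a rough illustration of the dependency-analysis step, the Python sketch below computes a backward slice over a recorded trace. It is a simplification under stated assumptions: only data dependencies are followed (no control or network dependencies), and the `Event` record with its `defs` and `uses` sets is a hypothetical trace format, not Kairux’s.

```python
from typing import List, NamedTuple, Set


class Event(NamedTuple):
    """One recorded instruction with the variables it writes and reads."""
    instr: str
    defs: Set[str]
    uses: Set[str]


def backward_slice(trace: List[Event], symptom: int) -> List[int]:
    """Return indices of events the symptom depends on through data flow."""
    needed = set(trace[symptom].uses)
    kept = [symptom]
    for i in range(symptom - 1, -1, -1):
        event = trace[i]
        if event.defs & needed:
            kept.append(i)
            # The latest definition satisfies the need; its inputs become needed.
            needed = (needed - event.defs) | event.uses
    return sorted(kept)


# Hypothetical trace; the failing assertion is the last event.
trace = [
    Event("a = 0", {"a"}, set()),
    Event("b = 5", {"b"}, set()),
    Event("a = -1", {"a"}, set()),
    Event("c = b + 1", {"c"}, {"b"}),
    Event("assert a == 0", set(), {"a"}),
]
print([trace[i].instr for i in backward_slice(trace, 4)])
```

Only the second write to `a` and the assertion remain in the slice. The earlier write is overwritten before the symptom, and the `b` and `c` instructions never feed into it.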

Additional work includes using breakpoints to enforce different thread schedules, and assigning unique tags to distinguish runtime instances of the same source code object and to track data flow.
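
The idea of pausing threads to force a particular schedule can be sketched in Python. This is an analogy, not Kairux’s mechanism: `threading.Event` gates stand in for debugger breakpoints, and the writer/reader pair and the `run_with_schedule` helper are hypothetical.

```python
import threading


def run_with_schedule(order):
    """Run a writer and a reader thread, forcing their steps to happen in `order`."""
    if sorted(order) != ["read", "write"]:
        raise ValueError("order must contain 'read' and 'write' exactly once")
    shared = {"a": 0}
    log = []
    gates = {name: threading.Event() for name in order}
    done = {name: threading.Event() for name in order}

    def step(name, action):
        gates[name].wait()  # pause here, like a breakpoint, until released
        action()
        log.append(name)
        done[name].set()

    def writer():
        step("write", lambda: shared.update(a=-1))

    def reader():
        step("read", lambda: shared.update(seen=shared["a"]))

    threads = [threading.Thread(target=writer), threading.Thread(target=reader)]
    for t in threads:
        t.start()
    for name in order:
        gates[name].set()   # release exactly one step at a time
        done[name].wait()   # wait for it to finish before releasing the next
    for t in threads:
        t.join()
    return log, shared["seen"]


print(run_with_schedule(["read", "write"]))
print(run_with_schedule(["write", "read"]))
```

Releasing one gate at a time makes each ordering reproducible, so the same pair of threads can be driven down either the read-first or the write-first path on demand.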

The End Result

The short answer: it works. Kairux located the root cause of a failure 70% of the time. Along the way, the number of sequences examined was reduced from a potentially infinite set to 0.2%. Kairux explained both failures caused by “missing events,” meaning something that should have happened but didn’t, and failures caused by anomalous events.

Kairux was evaluated on randomly sampled, real-world failures from HBase, HDFS and ZooKeeper.

Jessica Wachtel is a developer marketing writer at InfluxData where she creates content that helps make the world of time series data more understandable and accessible. Jessica has a background in software development and technical journalism.