URL: https://thenewstack.io/distributed-tracing-why-its-needed-and-how-it-evolved/

Distributed Tracing: Why It’s Needed and How It Evolved - The New Stack


2020-09-16 12:00:30
Observability / Software Development

Distributed Tracing: Why It’s Needed and How It Evolved

Distributed tracing is the “call stack” for a distributed system, a way to represent a single request as it flows from one computer to another.
Sep 16th, 2020 12:00pm by Austin Parker
Feature image via Pixabay.
ServiceNow sponsored this post.
Austin Parker
Austin Parker is the Principal Developer Advocate at Lightstep and maintainer on the OpenTracing and OpenTelemetry projects. In addition to his professional work, he's taught college classes, spoken about all things DevOps and Distributed Tracing, and even found time to start a podcast. Austin is also the co-author of Distributed Tracing in Practice, published by O'Reilly Media.

When you think about “tracing,” what do you think of? It’s one of those words in software development that is overloaded with meaning. Some people may think of a “stack trace,” the familiar blob of text issued by an application runtime that shows each function call preceding a point in the code where an exception occurred. Other people may go a step further, and think of the action of tracing — navigating through logs from different services and computers, literally tracing the path of a request as it moves through a system. Conveniently enough, “distributed tracing” is really a combination of both of these processes — you can think of it as the “call stack” for a distributed system, a way to represent a single request as it flows from one computer to another.
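To make the “call stack for a distributed system” idea a little more concrete, here is a minimal sketch in Python. It is not the API of any real tracing library: the `Span` type, the `start_span` and `render` helpers, and the service names are all made up for illustration. The only idea it tries to show is that every unit of work in a request shares one trace ID and points back at its parent, which is enough to rebuild a stack-like view across machines.

```python
from dataclasses import dataclass
from typing import Optional
import uuid


# Hypothetical, illustrative type: not taken from any tracing library.
@dataclass
class Span:
    trace_id: str             # shared by every span that belongs to one request
    span_id: str              # unique per unit of work
    parent_id: Optional[str]  # None for the root span
    service: str
    operation: str


def start_span(service, operation, parent=None):
    """Create a span, inheriting the trace ID from the parent if there is one."""
    trace_id = parent.trace_id if parent else uuid.uuid4().hex
    parent_id = parent.span_id if parent else None
    return Span(trace_id, uuid.uuid4().hex[:16], parent_id, service, operation)


def render(spans, parent_id=None, depth=0):
    """Return the spans as indented lines, like a call stack that crosses machines."""
    lines = []
    for span in spans:
        if span.parent_id == parent_id:
            lines.append("  " * depth + f"{span.service}: {span.operation}")
            lines.extend(render(spans, span.span_id, depth + 1))
    return lines


# One request touching four hypothetical services.
root = start_span("frontend", "GET /checkout")
cart = start_span("cart-service", "load_cart", parent=root)
db = start_span("database", "SELECT items", parent=cart)
pay = start_span("payment-service", "authorize", parent=root)
trace = [root, cart, db, pay]

print("\n".join(render(trace)))
```

Each level of indentation in the printed result is a hop to another service, which is roughly what a trace view in a real tool shows, along with timing and other details this sketch leaves out.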

Now, if this doesn’t make a ton of sense, that’s normal — I just dropped a lot of terminology on you! This series is going to demystify distributed tracing, starting from the basics. Today, I’m going to talk about distributed systems — what they are, why we use them, and why the rise of distributed systems has made tracing so important.

So, let’s start at the beginning: code. A computer, without software, can’t do a whole lot — it can do a lot of math, really fast, but not a whole lot else. Software is just instructions to the computer, and code is a human-readable way to express those instructions. You can code in a lot of different programming languages — Java, C#, Go, JavaScript, Ruby, and hundreds more — but all of the programs you write have things in common with each other. Software needs to do something to be useful to people.

Twenty years ago, the way we used software was very different from the way we use it today. We didn’t have “the cloud,” and the internet itself was a nascent technology. That said, a new type of software system had been in development since the 1970s: the distributed system. The idea of a distributed system wasn’t new, per se, but by the 1970s computer technology had advanced to the point where such systems were feasible. In a distributed system, computers can act as both clients and servers, allowing tasks to be performed on different machines. These systems can leverage economies of scale: large quantities of messages or data can be stored on a central server and accessed by lightweight clients over a network. The servers take care of the “heavy lifting” of processing the data, while the clients simply make requests for what they need. This basic idea led to more codified forms, such as three-tier or n-tier architecture, or even peer-to-peer architecture, where an “application” spread out into more independent services, working in concert with each other to satisfy a user’s request.

A note on architectures, tiers, and layers: Formally, “tiers” and “layers” are not substitutable; a tier refers to a discrete, physical unit, whereas a layer is a logical group of software components. That said, the two terms are often used interchangeably in conversation.

As high-speed internet access became more prevalent throughout the United States and the rest of the world, software architecture changed with it. Rather than specialized client software on home computers, web browsers began to act as an interface to more complex server applications running in remote data centers. These server applications, in turn, began to grapple with a problem — scaling. Not the kind of scaling you do trying to climb a wall — although, I’m sure that many programmers were driven up the wall trying to bring more capacity online! Scaling an application under load can be challenging, depending on how it’s designed. If your application is stateful (as in, it maintains some sort of “user state” in-memory that needs to exist for a long period of time), then it can be extremely difficult to add capacity — especially when you need more memory, storage, or CPUs that can only be obtained by physically buying and installing more servers. These challenges led to changing techniques: creating stateless services, and breaking them into smaller units of functionality. If you’ve heard of a “service-oriented architecture” (or SOA), this is where it came into its own — being able to split up a service into different parts, communicating with each other over a network, made it possible to more easily scale your application in response to demand.

It is into this world that distributed tracing found its purchase. If your application is split across many individual servers, you need a way to understand the behavior and performance of that entire system, rather than just its individual parts. The failure mode of your application changes — an individual service crashing may result in unexpected or unexplained behavior in a completely different part of the system. When these failures occur, you need more than just a stack trace logged to the offending machine — you need to be able to see the entire request, from beginning to end. Developers came up with a lot of different solutions to this problem — centralized logging, remote debugging, and a variety of other tools to aid in diagnosing problems with distributed systems. Over time, though, the problems continued to compound. Applications became more complex, more distributed. New deployment platforms and tooling — virtual machines, containers, Kubernetes — made it easier to create more complex applications, with more moving parts. The cloud made it possible to easily provision new infrastructure and scale it around the world, and all of this led to even more complexity and confusion.
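As a rough illustration of the centralized logging approach mentioned above, the sketch below groups log records from several machines by a shared request ID so that a single request can be read from beginning to end. The record layout (`ts`, `host`, `request_id`, `msg`), the host names, and the `group_by_request` helper are assumptions made up for this example. It also assumes the timestamps are comparable across machines, which real clocks don't guarantee.

```python
from collections import defaultdict

# Hypothetical log records as they might arrive at a central log store.
# Field names and values are assumptions for this sketch.
logs = [
    {"ts": 3, "host": "db-1", "request_id": "req-42", "msg": "query timed out"},
    {"ts": 1, "host": "web-1", "request_id": "req-42", "msg": "GET /checkout"},
    {"ts": 2, "host": "web-1", "request_id": "req-77", "msg": "GET /home"},
    {"ts": 2, "host": "cart-1", "request_id": "req-42", "msg": "load_cart"},
    {"ts": 4, "host": "web-1", "request_id": "req-42", "msg": "500 Internal Server Error"},
]


def group_by_request(records):
    """Bucket records by request ID, then order each bucket by timestamp."""
    by_request = defaultdict(list)
    for record in records:
        by_request[record["request_id"]].append(record)
    for entries in by_request.values():
        entries.sort(key=lambda r: r["ts"])
    return dict(by_request)


# Follow one failing request across every machine it touched.
timeline = group_by_request(logs)["req-42"]
for entry in timeline:
    print(entry["ts"], entry["host"], entry["msg"])
```

Read in order, the error reported on `web-1` turns out to follow a timeout on `db-1`, which a stack trace logged on the web server alone would not reveal. Stitching logs together this way gets harder as services multiply, and that is part of the gap distributed tracing aims to fill.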

Let’s look at this in a bit more detail — what are the problems that crop up with these distributed systems and why do we need these tools?

  • Conway’s Law states that, loosely, a system will mirror the way an organization is structured. As software organizations become more complex, naturally their applications will as well. If you work for a very large company, this should be apparent — you may work on a small part of a much larger system that is expected to work in concert with other services written by developers across the country, or even around the world. This makes it challenging to understand how a failure in your service impacts other services — or vice versa.
  • Developers, broadly speaking, don’t want to be tied down to a single language or technology. Some of this is a result of organizational dynamics (for example, integrating a team or product from an acquired company), and some of it is due to the rapidly changing nature of the software industry. New languages like Go and Rust continue to gain adoption and favor with developers wanting to write high-performance, maintainable code. TypeScript, Python, and other dynamic languages offer benefits of their own to different developers, especially those working in data science. Web developers have seen rapid growth in the sophistication of their tools as well, as the browser becomes the predominant application runtime. Everyone has different needs and wants to use different tools.
  • Small teams, too, are feeling the pain of distributed systems. Even with a single language and a relatively small application, the rise of “cloud native” software, which is designed to be built as a collection of smaller services that rely heavily on external deployment systems and APIs for functionality, leaves developers unable to tell what broke and why. This is exacerbated by the velocity of a small team: if you’re deploying new releases of your software multiple times a day, then you need immediate information and live traceability of what’s happening in production.

One solution to this constellation of complexity is distributed tracing, specifically distributed tracing built for cloud native, polyglot applications. But what is distributed tracing? Why do we need it? What, exactly, are the problems caused by these distributed systems? In the next part of this series, I’ll cover the issues that distributed systems can lead to, and why distributed tracing is the backbone of understanding how our systems function.

TNS owner Insight Partners is an investor in: Pragma.