URL: https://thenewstack.io/google-that-code-how-sourcegraph-simplifies-development/

Google That Code: How Sourcegraph Simplifies Development - The New Stack


With the insight that big code is big data, we can use the power of knowledge graphs to help us to search and understand any codebase in the world.
Dec 22nd, 2022 6:28am by Sam Ramji
DataStax sponsored this post.

We often think that modern computing is divided between code and data. Functionally, this makes sense when we look at any given app. But in a standard microservices architecture, the code is more than a few text files: in its breadth and depth, it becomes a dataset of its own. Our ability to manage codebases is limited by our understanding of them, and it is time to apply the tools built for big data to the era of big code.

The most famous big data tool is search. The power of search saves precious time. Sourcegraph co-founder and chief technology officer Beyang Liu understood this when he set out to introduce it to the developer world. He knew the pain of entering a new company and learning a new codebase.

Understanding different people’s opinions and styles of code can be overwhelming, and codebases grow over time in unpredictable and confusing ways. So Liu built Sourcegraph, a tool to help developers be more productive. It’s essentially a search engine for code.

I recently spoke with Liu about his journey with Sourcegraph and his long-term goals (to hear the full conversation, listen to the Open Source Data podcast).

What Is Sourcegraph?

Sourcegraph is a free, open source technology that enables you to search across your entire codebase. Its primary goal is to help tackle the most significant part of a software engineer’s job: understanding existing code.

It does so by:

  • Searching everything at once, without cloning repositories and searching them locally.
  • Making it easy to share essential lines of code.
  • Offering integrated development environment (IDE)-inspired features for navigating code.

“For most software engineers, the biggest part of your job is not writing new code. It’s making sense and understanding all of the code that already exists,” Liu said.

There are two foundational components to Sourcegraph: the search component and the global reference graph.

Search Component

Like most search engines, Sourcegraph’s search component takes a query and presents the best results. Say that a developer is looking for to-dos in a specific repository. The developer can enter a search query such as repo:facebook/react content:TODO, and Sourcegraph will search for any to-dos in the specified repository. You can see a real example by running such a search against the Facebook React Native repository. One of the key technologies that makes this possible is an index format optimized for searching code.
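To make the query above concrete, here is a minimal Python sketch of how a query with a repo filter and a content filter could be parsed and applied to a small in-memory corpus. It is not Sourcegraph’s implementation: the names Query, parse_query and search, and the corpus layout, are hypothetical. The sketch assumes the repo filter is a regular expression matched against repository names and that content is a literal substring.

```python
import re
from dataclasses import dataclass


@dataclass
class Query:
    repo: str     # regular expression matched against repository names
    content: str  # literal text to look for in file contents


def parse_query(text: str) -> Query:
    """Split 'key:value' tokens; only 'repo' and 'content' are understood here."""
    fields = {}
    for token in text.split():
        key, sep, value = token.partition(":")
        if not sep or key not in ("repo", "content"):
            raise ValueError(f"unsupported token: {token!r}")
        fields[key] = value
    return Query(repo=fields.get("repo", ""), content=fields.get("content", ""))


def search(query: Query, corpus: dict) -> list:
    """corpus maps repo name -> {path: file text} (a hypothetical layout).

    Returns (repo, path, line_number, line) tuples for every matching line.
    """
    hits = []
    for repo, files in corpus.items():
        if not re.search(query.repo, repo):
            continue  # repository filtered out by the repo: pattern
        for path, text in files.items():
            for line_no, line in enumerate(text.splitlines(), start=1):
                if query.content in line:
                    hits.append((repo, path, line_no, line.strip()))
    return hits
```

Because the repo filter is treated as a regular expression, a pattern like facebook/react would also match a repository named facebook/react-native. Anchoring the pattern, for example facebook/react$, narrows it to one repository.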

While he was an engineering intern on Google Apps’ backend team in 2010, Liu was inspired by his use of Google Code Search, which is what led him to employ the index format. He also took note of Russ Cox’s work on the initial implementation of Google’s internal code search and of Han-Wen Nienhuys’s reimplementation of it as an open source library called Zoekt.
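The article does not describe the index format itself. Russ Cox’s public write-up on code search and the Zoekt project both describe trigram-based indexes, so the following Python sketch assumes that technique. The class and method names are hypothetical, and real engines add much more (regular expressions, ranking, on-disk layout).

```python
from collections import defaultdict


def trigrams(text: str) -> set:
    """Return every 3-character substring of text."""
    return {text[i:i + 3] for i in range(len(text) - 2)}


class TrigramIndex:
    """Toy in-memory trigram index for literal substring search."""

    def __init__(self):
        self.postings = defaultdict(set)  # trigram -> ids of documents containing it
        self.docs = {}                    # document id -> full text

    def add(self, doc_id: str, text: str) -> None:
        self.docs[doc_id] = text
        for gram in trigrams(text):
            self.postings[gram].add(doc_id)

    def search(self, needle: str) -> list:
        grams = trigrams(needle)
        if not grams:
            # Needle shorter than three characters: the index cannot help,
            # so every document is a candidate.
            candidates = set(self.docs)
        else:
            # A document can only contain the needle if it contains all of
            # the needle's trigrams.
            candidates = set.intersection(
                *(self.postings.get(g, set()) for g in grams)
            )
        # Candidates may be false positives, so confirm against the text.
        return sorted(d for d in candidates if needle in self.docs[d])
```

The index narrows the search to a few candidate files, and the final substring check removes files that contain all the trigrams but not the query itself.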

“The centerpiece of that experience was this Code Search engine that indexed all of the codes at Google and made it accessible to every developer, whether you were an intern or a very senior, Jeff Dean-level engineer,” Liu said.

Global Reference Graph

The global reference graph helps you understand the codebase and perform functions such as “go to definition” and “find references,” which require mapping the entire codebase to take you to the right place.

Sourcegraph uses a range of compiler libraries and open protocols to achieve this. It has also developed its own protocols, such as srclib and SCIP, which are better suited to its requirements.

“It’s all about providing this language-agnostic interface to these language-specific indexers that use compiler knowledge to build the global reference graph,” Liu said.
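The idea in Liu’s quote can be sketched in Python as a shared record format that any language-specific indexer emits and a graph that aggregates those records. This is an illustration only, not SCIP’s schema: Occurrence, Indexer, PythonIndexer and ReferenceGraph are hypothetical names, and the sketch assumes bare function names are globally unique, which real indexers avoid by using fully qualified symbols.

```python
import ast
from collections import defaultdict
from dataclasses import dataclass
from typing import Iterable, Protocol


@dataclass(frozen=True)
class Occurrence:
    """Language-agnostic record emitted by every indexer."""
    symbol: str
    path: str
    line: int
    is_definition: bool


class Indexer(Protocol):
    """Interface each language-specific indexer is assumed to implement."""
    def index(self, path: str, source: str) -> Iterable[Occurrence]: ...


class PythonIndexer:
    """Toy indexer: records function definitions and calls to bare names."""

    def index(self, path: str, source: str) -> Iterable[Occurrence]:
        for node in ast.walk(ast.parse(source)):
            if isinstance(node, ast.FunctionDef):
                yield Occurrence(node.name, path, node.lineno, True)
            elif isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
                yield Occurrence(node.func.id, path, node.lineno, False)


class ReferenceGraph:
    """Aggregates occurrences from any indexer and answers navigation queries."""

    def __init__(self):
        self.definitions = {}                # symbol -> defining Occurrence
        self.references = defaultdict(list)  # symbol -> referencing Occurrences

    def ingest(self, occurrences: Iterable[Occurrence]) -> None:
        for occ in occurrences:
            if occ.is_definition:
                self.definitions[occ.symbol] = occ
            else:
                self.references[occ.symbol].append(occ)

    def go_to_definition(self, symbol: str):
        return self.definitions.get(symbol)

    def find_references(self, symbol: str) -> list:
        return sorted(self.references.get(symbol, []),
                      key=lambda o: (o.path, o.line))
```

Supporting another language would mean writing another indexer that yields the same Occurrence records. The graph and its two queries would not need to change.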

From Chaos to Action

Sourcegraph started when Liu got his first job out of school, at Palantir Technologies. He faced one of the problems everyone has faced when starting a new job as a software engineer:

“I got drop-shipped into this large, complex codebase that had been through multiple owners,” Liu recalled. “It was a bit messy, and I remember, at the end of that first month or so, looking back and asking myself, ‘What have I accomplished here? I’ve been spending all my time just trying to make sense of what’s going on in this code and figuring out why it’s written the way it is. It seems like more of my job is just exploring the existing code and figuring out how the relatively small piece I’m trying to add fits into that broader picture.’”

Liu’s time at Google exposed him to a suite of internal developer tools, one of which was Google Code Search, which made all the code at Google accessible. This experience, along with the onboarding pain at Palantir, drove Liu to create something that would help other software engineers avoid the same issues.

Conversations with Quinn Slack, a colleague of Liu’s at Palantir, about creating a universal code search tool turned into action, and out of that came Sourcegraph.

Sourcegraph’s Future

In 2011, Marc Andreessen wrote about how software is eating the world. The signs are everywhere, from ordering food to booking a ride to controlling your home’s heating.

But Liu thinks that we’re only seeing the tip of the iceberg. He said that understanding code will become an everyday thing.

He compared it to literacy, saying, “We once lived in a world where being able to read and write was limited to a very small, elite portion of society, limiting the extent to which human civilization could advance.”

When code powers almost everything in our lives, understanding it will become a universal requirement, Liu said. This thought is what fueled his passion for building Sourcegraph. Creating a search engine for code will grant people access to the vast open source ecosystem, all with a simple search query.

Sam Ramji is chief strategy officer of DataStax. A 25-year veteran of the Silicon Valley and Seattle technology scenes, Sam has helped build two multibillion dollar markets (API management at Apigee and enterprise service bus at BEA Systems) and redefined...
TNS owner Insight Partners is an investor in: Sourcegraph, Pragma.