URL: https://thenewstack.io/apache-pinot-brings-real-time-analysis-to-columnar-data/

Apache Pinot Brings Real Time Analysis to Columnar Data

Look out, there is a new open source data analytics database system on the scene, called Apache Pinot — and it is fast.
Dec 13th, 2024 1:11pm by Joab Jackson

Apache Pinot began life in 2013 as a project within LinkedIn, built to run an analysis against a single metric captured across the millions of users of all of its services.

The company had already developed Apache Kafka to manage the millions of messages its systems were producing each day. Still, this task wasn’t just a message-passing problem but one of analyzing a single column of data, answering a question such as “who viewed each user’s profile?” quickly enough to be useful to its users in real time.
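To make the "single column" idea concrete, here is a toy sketch in Python. It is only an illustration of row-oriented versus column-oriented layout, not a description of how Pinot stores or indexes data internally; the field names and sample records are made up for the example.

```python
from collections import Counter

# Hypothetical profile-view events in a row-oriented layout:
# every record carries every field.
rows = [
    {"viewer": "ana", "viewed": "bo", "ts": 1},
    {"viewer": "cy", "viewed": "bo", "ts": 2},
    {"viewer": "ana", "viewed": "di", "ts": 3},
]

# The same data in a column-oriented layout: one list per field.
columns = {name: [row[name] for row in rows] for name in rows[0]}

# Counting views per profile only needs to scan the "viewed" column;
# the other fields are never touched.
views_per_profile = Counter(columns["viewed"])
```

In a columnar layout, a question about one metric reads one column rather than every full record, which is the property the paragraph above is pointing at.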

The feature was originally developed on a combination of Elasticsearch and an online transactional processing (OLTP) database, but it involved running thousands of servers concurrently to get the answer, an expensive proposition.

With Pinot, the company’s engineers were able to bring the number of servers needed down to around 75.

Pinot was born to solve the problem of “running analytical queries for hundreds of millions of users at scale, in a low-cost manner,” explained Chinmay Soman, head of product for StarTree, which offers a fully managed cloud native version of Pinot.

Pinot brings “simplification in the data stack,” Soman said in an interview with TNS. “The problem is not new. It’s been solved by many legacy technologies. What Pinot brings is the simplification and the scale for these problems.”

Real-Time Analytics

The technology was quickly picked up by other webscale companies, such as Uber, Google, DoorDash and Stripe. About 1,000 organizations are using the open source version of the software.

Stripe, which does billions of transactions a day, uses Pinot to give payment analysis data back to its merchants: cashflow analysis, late collection payments, revenue-per-user, and so on.

Think of Apache Pinot as a combination of analytical and traditional transactional databases. “It’s built [as] an analytical database but can handle the scale of an OLTP database.” It can do the kind of large-scale analysis typically run on Google BigQuery or Snowflake, but in a fraction of the time.

Pinot can process hundreds of thousands of SQL-based queries per second with less than 99-millisecond latency, a throughput that even MySQL scaled out to thousands of nodes could not match, Soman said. Some of the largest Pinot deployments index up to a million events per second.

Pinot was open sourced in 2015 and entered the Apache Software Foundation’s incubator in 2018. Version 1 of Pinot, released in September 2023, added the ability to do query-time joins of two tables, as well as “upserts,” a combination of UPDATE and INSERT that ensures each incoming record either updates the existing row with the same key or is inserted as a new one.
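The upsert idea can be sketched in a few lines of Python. This is a minimal illustration of the general semantics, assuming a primary key and a timestamp decide which record wins; it is not Pinot's implementation or API, and the `Event` type, its fields and the `upsert` helper are hypothetical names chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    key: str       # hypothetical primary key
    ts: int        # hypothetical event timestamp
    amount: float  # hypothetical payload


def upsert(table: dict[str, Event], event: Event) -> None:
    """Insert the event if its key is new; otherwise keep the newer record."""
    current = table.get(event.key)
    if current is None or event.ts >= current.ts:
        table[event.key] = event


table: dict[str, Event] = {}
for e in [
    Event("order-1", 1, 10.0),
    Event("order-2", 2, 5.0),
    Event("order-1", 3, 12.5),  # same key, later timestamp: an update
]:
    upsert(table, e)

# A late-arriving, older record for an existing key is ignored.
upsert(table, Event("order-2", 1, 99.0))
```

After the loop the table holds one row per key, and queries see only the latest version of each record.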

[Figure: Pinot connection diagram]

A Serving Layer for Data

One can think of Pinot as a serving layer for data. Data can be stored in an object store such as Amazon Web Services’ Simple Storage Service (S3), and perhaps formatted with Apache Iceberg.

“Kafka is semi-stateful,” Soman explained. “It will store data for one week, but it is not designed to store stateful data. With Pinot, you can store data wherever you want and query individual items.”

Nor is Kafka an analytics engine. Even Apache Flink, often used with Kafka, is designed more for processing and filtering. In fact, all three tools (Kafka, Flink and Pinot) can be used together, in what is referred to as the KFP stack.

On GitHub, StarTree offers a series of recipes showing where Pinot would be a good fit, for tasks such as:

  • Batch data ingestion
  • Streaming ingestion
  • Upserts
  • Geospatial processing
  • Transformation functions
  • Similarity search (AI)

In November, StarTree updated its StarTree Cloud service to include role-based access control (RBAC), pauseless ingestion, schema evolution and data backfill.

Joab Jackson is a senior editor for The New Stack, covering cloud native computing and system operations. He has reported on IT infrastructure and development for over 30 years, including stints at IDG and Government Computer News. Before that, he...
AWS, Google and Snowflake are sponsors of The New Stack.
TNS owner Insight Partners is an investor in: Real.