VOOZH about

URL: https://thenewstack.io/two-sizes-fit-most-postgresql-and-clickhouse/

⇱ Two Sizes Fit Most: PostgreSQL and ClickHouse - The New Stack


TNS
SUBSCRIBE
Join our community of software engineering leaders and aspirational developers. Always stay in-the-know by getting the most important news and exclusive content delivered fresh to your inbox to learn more about at-scale software development.
REQUIRED
It seems that you've previously unsubscribed from our newsletter in the past. Click the button below to open the re-subscribe form in a new tab. When you're done, simply close that tab and continue with this form to complete your subscription.
The New Stack does not sell your information or share it with unaffiliated third parties. By continuing, you agree to our Terms of Use and Privacy Policy.
Welcome and thank you for joining The New Stack community!
Please answer a few simple questions to help us deliver the news and resources you are interested in.
REQUIRED
REQUIRED
REQUIRED
REQUIRED
REQUIRED
Great to meet you!
Tell us a bit about your job so we can cover the topics you find most relevant.
REQUIRED
REQUIRED
REQUIRED
REQUIRED
REQUIRED
Welcome!

We’re so glad you’re here. You can expect all the best TNS content to arrive Monday through Friday to keep you on top of the news and at the top of your game.

What’s next?

Check your inbox for a confirmation email where you can adjust your preferences and even join additional groups.

Follow TNS on your favorite social media networks.

Become a TNS follower on LinkedIn.

Check out the latest featured and trending stories while you wait for your first TNS newsletter.

PREV
1 of 2
NEXT
VOXPOP
As a JavaScript developer, what non-React tools do you use most often?
Angular
0%
Astro
0%
Svelte
0%
Vue.js
0%
Other
0%
I only use React
0%
I don't use JavaScript
0%
Thanks for your opinion! Subscribe below to get the final results, published exclusively in our TNS Update newsletter:
NEW! Try Stackie AI
From clobbered drafts to real-time sync
Apr 14th 2026 10:00am, by David Moore
TypeScript 6.0 RC arrives as a bridge to a faster future
Mar 14th 2026 9:00am, by Darryl K. Taft
Mastra empowers web devs to build AI agents in TypeScript
Jan 28th 2026 11:00am, by Loraine Lawson
2022-04-13 07:56:26
Two Sizes Fit Most: PostgreSQL and ClickHouse
contributed,sponsor-gitlab,sponsored,sponsored-post-contributed,
Data / Storage

Two Sizes Fit Most: PostgreSQL and ClickHouse

PostgreSQL has been an improvement on its predecessors and successors, while ClickHouse enables qualitatively different approaches to analytics.
Apr 13th, 2022 7:56am by Sid Sijbrandij
👁 Featued image for: Two Sizes Fit Most: PostgreSQL and ClickHouse
Feature image via Pixabay
GitLab sponsored this post.

Since the introduction of System R in 1974, relational databases in general, and SQL databases in particular, have risen to become the dominant approach to data persistence in the industry and have maintained that dominance despite various significant challengers. Though some have rumored the death and decline of traditional relational databases, PostgreSQL has turned out to be an improvement on its predecessors, as well as its supposed successors.

In fact, the open source MySQL database was so ubiquitous that it became part of the LAMP stack (Linux, Apache, MySQL, Perl) that dominated early web development.

The one big exception to this trend is online analytical processing (OLAP), where specialized techniques that can drastically improve the performance of certain workloads have met with use cases that actually require these techniques, with new contenders such as ClickHouse enabling qualitatively different approaches to analytics.

One Size Does Not Fit All

Sid Sijbrandij
Sid is the co-founder, CEO and board chair of GitLab, Inc., the DevOps platform. Sid spent four years building recreational submarines for U-Boat Worx and, while at the Dutch Ministry of Justice and Security (Ministerie van Justitie en Veiligheid), he worked on the Legis project, which developed several innovative web applications to aid lawmaking. He first saw Ruby code in 2007 and loved it so much that he taught himself how to program. In 2012, as a Ruby programmer, he encountered GitLab and discovered his passion for open source. Soon after, Sid commercialized GitLab, and by 2015, he led the company through Y Combinator’s Winter 2015 cohort. Under his leadership, the company has grown with more than 30 million registered users from startups to global enterprises.

As often happens when a technology becomes dominant, it gets applied unthinkingly, even when it may not be appropriate, so all kinds of data was and is being pushed into general-purpose relational databases.

Extreme examples could be found, such as developers creating remote Oracle databases for data sets with a total of five small elements (not columns, pieces of data) or Apple pushing their system logs into an SQLite database (a mistake they later corrected).

Bind10 development started under the premise to solve scaling issues with Bind9 as DNS nameserver, using SQLite as backend. The Internet System Consortium discontinued DNS development in 2014, and the OSS project Bundy remains inactive. PowerDNS focused on performance scaling with MySQL/PostgreSQL early.

In 2005, Michael Stonebraker, database researcher behind Ingres and later PostgreSQL, together with Uğur Çetintemel, penned a paper titled “One Size Fits All”: An Idea Whose Time Has Come and Gone in which they argued that this had gone too far for too long and backed up their argument with benchmark results.

In short, there were many workloads outside the core application of databases, online transaction processing (OLTP), where the general database architectures were outclassed sufficiently that it did not make sense to use them.

It should be noted that Stonebraker and Çetintemel did not argue against relational databases or SQL but against a specific architecture descendent from the original System R and Ingres systems that were, and still are, being used by most general-purpose database systems.

This architecture has the following features:

  • Disk and row-oriented storage and indexing structures.
  • Multithreading to hide latency.
  • Locking-based concurrency control mechanisms.
  • Log-based recovery.

In addition to special-purpose text indexing, the primary use case for which the traditional architecture was proving inadequate was data warehouses, for which column stores were proving to be 10 to 100 times more efficient than traditional row stores.

ClickHouse

The prediction that OLAP database engines would split off from mainstream databases has largely come to pass in the industry. OLAP databases have become a significant category in their own right with Vertica, the commercial offshoot of the original cstore discussed in the paper, as one of the major players.

The practical advantages of these databases for analytical work are, as predicted, substantial enough that having a separate database engine is warranted.

Or even necessary, as was the case for Yandex’s ClickHouse OLAP database, which recently spun out into a startup that just received a $250 million Series B.

The ClickHouse developers wanted to have real-time analytics not with customized data structures, but instead with a generalized database engine query-able with SQL.

Of course, there is a reason this is usually done with customized data structures: Doing so with a generalized database engine was considered impossible, partly because it was impossible with existing engines.

As is often the case, impossible turns out to “just” be a lot of work and some brilliant engineering, and after a few years, the developers had what they sought: a database engine specialized for OLAP but general enough, query-able via SQL and still capable of real-time analytics.

Both the engineering and the benchmark results are impressive, including our own tests (video).

It is significantly faster than PostgreSQL extensions such as Citus or TimescaleDB, and reportedly also faster than Vertica.

The End of An Era?

The 2005 paper left OLTP as the only area where the traditional (disk-based, row-oriented, multithreaded) architecture was viable. Two years later, Stonebraker published “The End of an Architectural Era (It’s Time for a Complete Rewrite),” where he argued that even for OLTP, the existing engines can be surpassed by more than a factor of 10.

The key insight was that long-held assumptions about the relative performance of different components were no longer accurate, and so it turns out that, according to Stonebraker et al, around 90% of the performance budget of the database engine is used not for actually processing data but for overheads such as buffer management and locking.

So if we could remove those overheads, we could make a database engine that is 10 times faster than existing ones. Achieving these gains would require building a database engine that is single-threaded and works in main memory, a radical departure from the existing architecture.

But not an unprecedented one. Main memory capacities have improved many more than a million times since those early databases were designed, so many workloads that used to require persistent storage due to size can now be handled in memory.

For example, even in the early 2000s, Yahoo had a policy that any dataset less than 2GB should live in RAM, not on disk. A little later, event poster architectures, in-process REST and the LMAX exchange with the Disruptor pattern demonstrated that moving from complex multithreaded, disk-based systems to single-threaded RAM-based architectures could yield tremendous benefits in terms of simplicity, reliability and performance.

And that was with 32-bit computing. Nowadays, we can get single servers with tens of terabytes of memory configured for us at the click of a mouse (with a subsequent bill…), so the workloads we can keep in RAM are quite substantial.

Stonebraker and his team built H-Store, an academic prototype, and VoltDB, a commercial product with the associated startup.

It hasn’t taken the database world by storm.

Surprisingly, that isn’t because it doesn’t work as intended. From all reports, it seems like it does work pretty much as advertised.

However, its promises came with tradeoffs, and it seems like those tradeoffs aren’t ideal for most domains. First, though keeping the entire database in RAM at all times is feasible nowadays, it’s probably not a good price/performance tradeoff for most domains, for which most data is cold.

Second, although much higher peak performance is possible, machines are now so fast that the highest possible peak performance is only necessary for very special domains.

Third, with machines now so fast and performance now usually adequate, performance focus has shifted from peak or even throughput to worst-case latencies. With only a single thread accessing the database, a single long query can stall the entire database and cause extremely bad tail-end latencies. So the fastest database using conventional measures of throughput and peak performance may actually require supreme care not to score worst in the performance metric people now care most about.

Last, but not least, requiring a distributed setup, while fine for large installs, creates a high burden for entry-level setups, meaning there is no good on-ramp for this technology.

PostgreSQL

So it looks like the era of the conventionally architected OLTP database has not, in fact, ended. Of course, this shouldn’t be of too much concern to Professor Stonebraker as the contender coming out on top is still his brainchild, just an earlier one. No, not Ingres the early public peer to IBM’s System R, but its successor, PostgreSQL.

PostgreSQL is becoming, or has become, the dominant variant of these conventionally architected databases, according to both industry sentiment and database rankings. It is certainly among the engines not beholden to a major database vendor in one way or another.

At GitLab, we also use PostgreSQL with replication, having moved away from MySQL in 2019, partly because PostgreSQL has features that are important for us, partly because most of our customers were using it anyhow.

GitLab is the most comprehensive, intelligent DevSecOps platform for software innovation. GitLab enables organizations to increase developer productivity, improve operational efficiency, reduce security and compliance risk, and accelerate digital transformation.
Learn More
The latest from GitLab

What About NoSQL?

Although the NoSQL movement of the early 2000s did point to some shortcomings in the dominant databases, the technological prescriptions actually only made sense in rare circumstances.

NoSQL isn’t really a category but more a feature of rich database engines like PostgreSQL.

With increasing compute power and fast storage, high volumes and transaction rates can be handled with traditional database engines, and nontraditional engines can serve relational models using SQL as the interface, see VoltDB and Google’s Spanner, built on top of Bigtable.

There are use cases where a relational database is not needed and a key-value store is sufficient or a JSON document store is called for; for example, JSON types in PostgreSQL handle most of these use cases just fine. Even for very specialized use cases such as text or geographic information system data, modern relational database engines provide direct support.

GitLab is the most comprehensive, intelligent DevSecOps platform for software innovation. GitLab enables organizations to increase developer productivity, improve operational efficiency, reduce security and compliance risk, and accelerate digital transformation.
Learn More
The latest from GitLab
TRENDING STORIES
Sid Sijbrandij is the co-founder, CEO and board chair of GitLab, Inc., the DevOps platform. Sid spent four years building recreational submarines for U-Boat Worx and, while at the Dutch Ministry of Justice and Security (Ministerie van Justitie en Veiligheid),...
Read more from Sid Sijbrandij
GitLab sponsored this post.
SHARE THIS STORY
TRENDING STORIES
TNS owner Insight Partners is an investor in: Pragma, ClickHouse.
SHARE THIS STORY
TRENDING STORIES
TNS DAILY NEWSLETTER Receive a free roundup of the most recent TNS articles in your inbox each day.
The New Stack does not sell your information or share it with unaffiliated third parties. By continuing, you agree to our Terms of Use and Privacy Policy.