Today, we’re venturing into the riveting realm of databases. Now, I can hear you sigh: “Another database to learn? Seriously?” But before you sprint for the hills, allow me to introduce you to ClickHouse, the Sonic the Hedgehog of the database multiverse.
ClickHouse is an open source, column-oriented database management system (DBMS) designed for running real-time analytical queries on mammoth datasets. And by “mammoth” I mean “if you printed it out, you’d probably need a forest’s worth of paper” big.
To see what makes it so fast, let’s peer into its architecture.
ClickHouse is like a powerful system used by university administrators. As a student, you may want to quickly check your grade on a single assignment or test, which is a simple, straightforward transaction. However, the administrators need to perform more complex operations. They’re calculating class averages, evaluating grade distributions for the entire semester, analyzing patterns in student performance across all subjects and more. To accomplish these tasks, they’re not just looking at one student’s grades, but rather, they’re analyzing vast volumes of data from all students.
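To make the analogy concrete, here is a minimal Python sketch contrasting the two kinds of work. The grade records and function names are hypothetical, and this is plain Python rather than ClickHouse: `student_grade` is the student’s single-row lookup, while `class_averages` has to scan every record and aggregate by subject, which is the access pattern analytical databases like ClickHouse are built for.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical grade records: (student, subject, score).
GRADES = [
    ("ana", "math", 90),
    ("ana", "physics", 78),
    ("ben", "math", 72),
    ("ben", "physics", 88),
    ("cho", "math", 81),
    ("cho", "physics", 95),
]


def student_grade(records, student, subject):
    """Transactional-style point lookup: one student, one subject."""
    for name, subj, score in records:
        if name == student and subj == subject:
            return score
    return None


def class_averages(records):
    """Analytical-style query: scan every record and aggregate per subject."""
    scores_by_subject = defaultdict(list)
    for _name, subject, score in records:
        scores_by_subject[subject].append(score)
    return {subject: mean(scores) for subject, scores in scores_by_subject.items()}


if __name__ == "__main__":
    print(student_grade(GRADES, "ben", "math"))
    print(class_averages(GRADES))
```

The lookup can stop at the first match, but the averages touch every row; at university scale that difference is small, at billions of rows it is the whole game.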
Did I mention ClickHouse loves big data? This database system scales beautifully across clusters, so your data can grow bigger than a reality TV star’s ego, and ClickHouse would still handle it without breaking a sweat. Need to add more nodes to your cluster? No problem. Want to keep your data replicated for higher availability? ClickHouse says, “Sure, why not?”
At the heart of ClickHouse’s distinctiveness is its true column-oriented DBMS design: no extra data is stored alongside the values. Among other things, this means ClickHouse supports constant-length values, so it doesn’t need to store a length next to each one, which keeps storage compact and speeds up processing. The distinction matters because some systems that also store columns separately, such as HBase and Cassandra, are optimized for other scenarios: they typically reach around a hundred thousand rows per second on analytical queries, whereas ClickHouse can process hundreds of millions of rows per second.
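A rough way to see why column orientation helps is to store the same rows two ways. In this Python sketch, the page-view data and names are hypothetical, and Python’s `array` module merely stands in for a column file (ClickHouse’s compression and on-disk format are not modeled). The row layout has to walk whole rows to average one field, while the column layout reads a single typed array whose values are fixed-width, with no length stored next to each one.

```python
from array import array

# Hypothetical page-view events, row-oriented: each tuple keeps all fields together.
# Fields: (event_id, url, status, response_ms)
rows = [
    (1, "/home", 200, 12),
    (2, "/docs", 200, 30),
    (3, "/home", 404, 5),
    (4, "/blog", 200, 21),
]

# The same data, column-oriented: one contiguous container per column.
# Numeric columns use fixed-width typed arrays, so no per-value length is stored.
columns = {
    "event_id": array("I", (r[0] for r in rows)),     # unsigned int
    "url": [r[1] for r in rows],                       # variable-length strings stay a list
    "status": array("H", (r[2] for r in rows)),       # unsigned short, 2 bytes each
    "response_ms": array("H", (r[3] for r in rows)),  # unsigned short, 2 bytes each
}


def avg_response_row_store(table):
    # Walks every full row even though only one field is needed.
    return sum(row[3] for row in table) / len(table)


def avg_response_column_store(cols):
    # Reads only the single column the query touches.
    col = cols["response_ms"]
    return sum(col) / len(col)


if __name__ == "__main__":
    print(avg_response_row_store(rows), avg_response_column_store(columns))
    col = columns["response_ms"]
    print(col.itemsize * len(col), "bytes for the response_ms column")
```

Both functions return the same answer; the difference is how much data each has to read to get there, which is what dominates on large analytical scans.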
ClickHouse is also a full database management system rather than a single database. You can create tables and databases, load data and run queries at runtime, without reconfiguring or restarting the server.
Additional features that amplify ClickHouse’s uniqueness include:
Together, these features make ClickHouse a fast, flexible and efficient system, well suited to large-scale, real-time data processing.
Just to prove I’m not pulling your leg here, let’s look at some real-world use cases.
Yes, the folks who practically hold up half the internet use ClickHouse for real-time analytics on terabytes of data every single day! Cloudflare uses ClickHouse to power real-time HTTP analytics for up to 6 million requests per second. From an architectural perspective, ClickHouse’s column-oriented design plays a pivotal role.
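For a sense of scale, here is what the quoted rate works out to if it were sustained for a whole day. Treat the result as an upper bound rather than Cloudflare’s actual daily volume, since 6 million requests per second is an “up to” figure.

```python
REQUESTS_PER_SECOND = 6_000_000  # peak rate quoted for Cloudflare
SECONDS_PER_DAY = 24 * 60 * 60   # 86,400


def requests_per_day(rate_per_second: int) -> int:
    """Upper bound on daily requests if the given rate were sustained all day."""
    return rate_per_second * SECONDS_PER_DAY


if __name__ == "__main__":
    print(f"{requests_per_day(REQUESTS_PER_SECOND):,}")  # 518,400,000,000
```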
The new architecture includes:
Server diagram for Cloudflare’s central data center based on ClickHouse. Source: https://blog.cloudflare.com/http-analytics-for-6m-requests-per-second-using-clickhouse/
As you can see, the architecture of the new data pipeline is simpler and fault-tolerant. It provides analytics for the domains of all of Cloudflare’s more than 7 million customers, totaling more than 2.5 billion monthly unique visitors and over 1.5 trillion monthly page views.
Yandex.Metrica, the world’s second-largest web analytics platform, uses ClickHouse to handle over a trillion rows of data. A trillion! Yandex uses ClickHouse for:
These use cases and the colossal amount of data processed speak volumes about ClickHouse’s capabilities, but the intriguing part is how ClickHouse handles this scale. The underlying architectural design of ClickHouse, including its distributed storage and computing capabilities, allows Yandex to handle such a large amount of data with ease. The flexible sharding and replication strategies implemented by ClickHouse ensure data reliability and high availability, key elements in Yandex’s high-volume, high-velocity data scenario.
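The following Python sketch illustrates the general idea of sharding plus replication under simplifying assumptions: a made-up three-shard cluster with two replicas per shard, and a hash of the key choosing the shard. It is not ClickHouse’s implementation; in ClickHouse this is configured declaratively (for example, through the Distributed table engine’s sharding key and replicated table engines) rather than written as application code.

```python
import hashlib

# Hypothetical 3-shard cluster with 2 replicas per shard; node names are made up.
CLUSTER = {
    0: ["shard0-a", "shard0-b"],
    1: ["shard1-a", "shard1-b"],
    2: ["shard2-a", "shard2-b"],
}


def shard_for(key: str, num_shards: int) -> int:
    """Map a sharding key to a shard with a stable hash (Python's hash() is salted per run)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def write(key: str, storage: dict) -> int:
    """Route a row to one shard and keep a copy on every replica of that shard."""
    shard = shard_for(key, len(CLUSTER))
    for node in CLUSTER[shard]:
        storage.setdefault(node, []).append(key)
    return shard


def read(key: str, storage: dict, down: set) -> bool:
    """Answer from the first healthy replica of the key's shard."""
    shard = shard_for(key, len(CLUSTER))
    for node in CLUSTER[shard]:
        if node not in down:
            return key in storage.get(node, [])
    raise RuntimeError(f"all replicas of shard {shard} are down")


if __name__ == "__main__":
    store = {}
    for user in ["user-1", "user-2", "user-3", "user-4"]:
        print(user, "-> shard", write(user, store))
    # Losing one replica of a shard does not make its data unavailable.
    failed = {CLUSTER[shard_for("user-1", len(CLUSTER))][0]}
    print(read("user-1", store, down=failed))  # True
```

Because every row lives on both replicas of its shard, a read still succeeds when one of them is down; only losing all replicas of a shard makes its data unavailable.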
Let’s look at how ClickHouse compares with PostgreSQL on a typical workload for clickstream and traffic analysis, web analytics, machine-generated data, structured logs and web event data. This benchmark scenario reflects the typical queries in ad hoc analytics and real-time dashboards. The dataset comes from an actual traffic recording of one of the world’s largest web analytics platforms. Both ClickHouse and PostgreSQL were optimally tuned and deployed on a c6a.4xlarge server with 500 GB of gp2 storage.
The benchmark data has been obtained from the ClickHouse Benchmark.
Load time is the time taken to load the dataset into the database.
The benchmark shows that ClickHouse loads the data significantly faster than PostgreSQL: about 23 times faster.
Storage size is the space the data occupies in the database.
ClickHouse is also more storage-efficient: for the same dataset, PostgreSQL needs about 8.5 times as much storage as ClickHouse.
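To read the two ratios the same way, it can help to turn “N times” into a fraction. This short Python calculation uses only the two figures quoted above, not any raw benchmark measurements.

```python
def fraction_of(times_factor: float) -> float:
    """If X is N times faster or smaller than Y, X needs 1/N of Y's time or space."""
    return 1 / times_factor


LOAD_SPEEDUP = 23     # PostgreSQL load time / ClickHouse load time, as quoted above
STORAGE_RATIO = 8.5   # PostgreSQL storage / ClickHouse storage, as quoted above

if __name__ == "__main__":
    print(f"ClickHouse load time: about {fraction_of(LOAD_SPEEDUP):.1%} of PostgreSQL's")
    print(f"ClickHouse storage: about {fraction_of(STORAGE_RATIO):.1%} of PostgreSQL's")
```

In other words, ClickHouse needed roughly 4.3% of PostgreSQL’s load time and roughly 11.8% of its disk space in this scenario.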
Based on the ClickHouse benchmark, ClickHouse significantly outperforms PostgreSQL in data load time and storage size efficiency when optimized and deployed under the same conditions. It is important to note that these results pertain to a specific analytical scenario and real-world results might vary based on the specific use case and tuning of the systems.
You can also check out how ClickHouse compares to other databases in the benchmark report.
Think you might be ready to try ClickHouse? There are a few ways to start, most fundamentally with the open source version.
Prefer to avoid hosting and scaling yourself? Tinybird, a tool that developers affectionately dub “ClickHouse++,” takes ClickHouse’s already robust capabilities, offers serverless hosting and adds even more developer-focused goodness to the mix, including:
If you’re a data engineer or a software developer constantly juggling large volumes of data and crunching numbers for real-time analytics, ClickHouse is your best bet. Once you’ve tasted the speed of ClickHouse (and Tinybird), there’s no going back.
Q: Is ClickHouse suitable for online transaction processing (OLTP) systems?
No, ClickHouse is designed primarily for online analytical processing (OLAP). It’s perfect for real-time analytical queries on large data volumes, not transactional systems.
Q: How does ClickHouse manage data redundancy and availability?
ClickHouse supports asynchronous multimaster replication. You can configure it to keep copies of your data on different nodes for higher availability.
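As a rough illustration of asynchronous multimaster replication, here is a toy Python model. The `SharedLog` and `Replica` classes are hypothetical: they loosely mirror the idea that replicated ClickHouse tables coordinate through a shared log kept in ZooKeeper or ClickHouse Keeper, but they skip real concerns such as merging, deduplication and failure handling. Any replica accepts a write immediately, and the others catch up later when they sync, so replicas can briefly disagree before converging.

```python
from dataclasses import dataclass, field


@dataclass
class SharedLog:
    """Stand-in for a coordination service that records every insert in order."""
    entries: list = field(default_factory=list)


@dataclass
class Replica:
    name: str
    log: SharedLog
    rows: list = field(default_factory=list)
    synced_upto: int = 0  # index of the next log entry this replica has not yet examined

    def insert(self, row) -> None:
        # Multimaster: any replica accepts the write, applies it locally and logs it.
        self.rows.append(row)
        self.log.entries.append((self.name, row))

    def sync(self) -> None:
        # Asynchronous catch-up: copy rows that other replicas logged since the last sync.
        for origin, row in self.log.entries[self.synced_upto:]:
            if origin != self.name:
                self.rows.append(row)
        self.synced_upto = len(self.log.entries)


if __name__ == "__main__":
    log = SharedLog()
    a, b = Replica("a", log), Replica("b", log)
    a.insert("row-1")  # written on replica a
    b.insert("row-2")  # written on replica b
    print(a.rows, b.rows)  # replicas briefly differ
    a.sync()
    b.sync()
    print(sorted(a.rows) == sorted(b.rows))  # True once both have caught up
```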
Q: What language does ClickHouse use for queries?
ClickHouse uses SQL for queries. So, if you’re familiar with SQL, you’ll feel right at home.
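For example, an analytical query like the one below is ordinary SQL. To keep the sketch runnable without a ClickHouse server, it uses Python’s built-in `sqlite3` module as a stand-in, with a hypothetical `page_views` table. The `SELECT` should also run in ClickHouse, but the `CREATE TABLE` statement is SQLite-specific, since ClickHouse tables typically also declare a table engine.

```python
import sqlite3

# Plain SQL aggregation; the same SELECT should work in ClickHouse.
QUERY = """
SELECT url, count(*) AS views, avg(response_ms) AS avg_ms
FROM page_views
GROUP BY url
ORDER BY views DESC, url
"""


def make_demo_db() -> sqlite3.Connection:
    """Build an in-memory SQLite table with hypothetical page-view rows."""
    conn = sqlite3.connect(":memory:")
    # SQLite-specific DDL; a ClickHouse table definition would differ.
    conn.execute("CREATE TABLE page_views (url TEXT, response_ms INTEGER)")
    conn.executemany(
        "INSERT INTO page_views VALUES (?, ?)",
        [("/home", 12), ("/docs", 30), ("/home", 5), ("/blog", 21)],
    )
    return conn


def top_pages(conn: sqlite3.Connection):
    return conn.execute(QUERY).fetchall()


if __name__ == "__main__":
    for row in top_pages(make_demo_db()):
        print(row)
```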
Q: How does Tinybird enhance ClickHouse’s functionality?
Tinybird is a serverless platform that lets you build real-time analytics APIs on top of ClickHouse at high speed. It provides a much more ergonomic developer experience with features designed for real-time app development. So, it’s like adding an extra layer of speed and convenience to your ClickHouse setup.