First ClickHouse research paper: How do you make a modern data analytics database lightning-fast?

We’re thrilled to announce that the first ClickHouse research paper was accepted and is now published at VLDB.

VLDB, the International Conference on Very Large Data Bases, is widely regarded as one of the leading conferences in the field of data management. Of the hundreds of papers submitted, generally only about 20% are accepted.

This year, VLDB 2024, held in Guangzhou, China, marked the 50th anniversary of the conference, making it one of the longest-running data management conferences.

The conference featured 250 paper presentations and 10 accompanying workshops on the latest research and industry trends.

This year’s dominant topic was machine learning in all shapes and forms, but there were also many papers in core database areas such as query engines, storage, and database theory.

A sneak peek into the ClickHouse paper

Our publication is the culmination of a months-long, cross-functional effort to offer readers a concise description of ClickHouse’s most interesting architectural and system design components that make it so fast. And now, for the very first time, it’s available.

In the paper, you’ll learn about:

The history of ClickHouse

When were major features described in this paper introduced to ClickHouse, and what features and enhancements are planned for the future?

The architecture of ClickHouse

Layers, components, and execution modes.

The storage layer of ClickHouse

On-disk format, data pruning techniques, merge-time data transformations, updates and deletes, idempotent inserts, data replication, and ACID compliance.

The query processing layer of ClickHouse

SIMD parallelization, multi-core parallelization, multi-node parallelization, and performance optimization techniques.

The integration layer of ClickHouse

Native support for 90+ file formats and 50+ integrations with external systems.

Benchmarks

Performance comparison of ClickHouse with other databases frequently used for analytics. Note: lower is better.

VLDB 2024 Research paper.009.png

We hope this has whetted your appetite. If you’re interested, you can read the whole paper right here, right now (you can scroll through the pages):

ClickHouse at VLDB 2024

Paper presentation

Alexey Milovidov, our CTO and the creator of ClickHouse, presented the paper last week in Guangzhou (slides here), followed by a Q&A (that quickly ran out of time!). You can catch the recorded presentation here:

Poster presentation

In addition to the paper presentation, authors of accepted VLDB papers were asked to give a poster presentation.

Bonus meetup talk

As luck would have it, we also hosted a ClickHouse Guangzhou User Group Meetup just a few days before VLDB. At that meetup, we presented an extended version (slides here) of Alexey’s conference talk:

From coast to coast: the journey of our first research paper

We conclude with a bonus section for readers curious about the backstory of our first research paper.

After ClickHouse became open source in 2016, its popularity grew and the pace of development accelerated. Over the past eight years, the ClickHouse team has been so focused on building the world’s fastest analytics database that there was no time to publish an academic paper on ClickHouse.

However, during a ClickHouse company offsite on the stunning Mediterranean coastline of the French Riviera in October 2023, Tanya Bragin, our VP of Product and Marketing, raised the idea of finally writing a foundational paper on ClickHouse and submitting it to VLDB. This year’s conference took place in Guangzhou, in China’s Guangdong province, on the north shore of the South China Sea.

We quickly put a small team of authors together, and while some of us had already written research papers as PhD students at university, others were new to this. An intensive writing process kicked off in November 2023 with status calls almost daily, as the paper authors live in different locations. We submitted our final version in April 2024.

Summary

We had a blast last week! Besides feasting on delicious Cantonese cuisine, the ClickHouse team attended the special 50th-anniversary VLDB 2024 conference in Guangzhou, China, where our CTO and the creator of ClickHouse, Alexey Milovidov, proudly presented our first ClickHouse research paper to the scientific community.

We hope you enjoy reading the paper and watching the recording of Alexey’s presentation. We would love to hear what you think.

Lastly, for your convenience, here is a list of links to the paper and all its accompanying material mentioned in this post: