Welcome to the September ClickHouse newsletter, which will round up what’s
happened in real-time data warehouses over the last month. This month, we have
the much-awaited JSON data type, our first ClickHouse research paper, a Private
Preview of BYOC on AWS, better PyPI stats with Ibis, and more!
Inside this issue
- Featured community member
- Upcoming events
- VLDB 2024: First ClickHouse research paper
- How Reco leverages advanced analytics to detect sophisticated SaaS threats
- 24.8 LTS release
- Better PyPI stats with Ibis, ClickHouse, and Shiny
- ClickHouse Cloud: BYOC AWS in Private Preview
- Quick reads
- Post of the month
Featured community member
beehiiv is a newsletter platform that helps creators, publishers, and
businesses build and grow their email audiences. They collect events capturing
every time an email is processed, every time it lands in an inbox, every time
it’s deferred, every time it’s bounced, every time you open it, every time you
click a link, and so on.
Eric has worked at beehiiv for just over a year and was responsible for moving
data operations from Postgres to ClickHouse Cloud. There’s a user story on the
work he and his team did, and he also presented at the New York meetup in the
summer.
Eric previously worked as a Tech Lead at Arthur.ai, where he architected and built the company’s data ingestion pipeline, storage, and much of the backend infrastructure.
Upcoming events
Global events
- ClickHouse Cloud Live Update – Sep 24
- 24.9 release community call – Sep 26
Free training
- Query optimization with ClickHouse workshop – Sep 25
- In-Person ClickHouse Workshop – Singapore – Oct 3
Events in EMEA
- Meetup in Tel Aviv – Sep 22
- Meetup in Madrid – Oct 22
- Meetup in Barcelona – Oct 29
- Meetup in Oslo – Oct 31
- Meetup in Ghent – Nov 19
- Meetup in Dubai – Nov 21
- Meetup in Paris – Nov 26
Events in Asia Pacific
- DataEngBytes – Sydney – Sep 24
- DataEngBytes – Perth – Sep 27
- DataEngBytes – Melbourne – Oct 1
- DataEngBytes – Auckland – Oct 4
- Big Data & AI World Asia – Oct 10
- Cloud Excellence Summit NSW – Oct 17
- Data & AI Summit VIC – Oct 22
VLDB 2024: First ClickHouse research paper
It’s been almost a year in the making, and at the end of August, we presented
our first research paper at VLDB 2024.
VLDB—the international conference on very large databases—is widely regarded
as one of the leading conferences in data management. VLDB generally has an
acceptance rate of ~20% among the hundreds of submissions.
The paper concisely describes ClickHouse’s most interesting architectural and
system design components, which make it so fast. We’ve embedded the PDF of the
paper in the blog post linked below.
How Reco leverages advanced analytics to detect sophisticated SaaS threats
Reco is a full-lifecycle SaaS security solution that uses ClickHouse as the
foundation of its advanced analytics system. Nir Barak explains how ClickHouse
gives them a holistic view of data across multiple layers and allows them to
detect outliers and anomalies.
24.8 LTS release
The 24.8 release is here, and it has an exciting feature that I (and many of
you) have been waiting for – the new JSON data type!
It’s in experimental mode, but that didn’t stop us from putting it through its
paces while exploring structured data of events in football/soccer matches.
This release also introduces the TimeSeries table engine, which can store
Prometheus data, and a new Kafka table engine that supports exactly-once event
processing.
Better PyPI stats with Ibis, ClickHouse, and Shiny
ClickPy is a ClickHouse-backed application that analyzes downloads of Python
packages published on PyPI. In addition to the front-end application, you can
also query the underlying data, which is exactly what Cody Peterson has
done.
Cody shows how to connect to ClickPy using Ibis and then explores the
seasonality of downloads of the clickhouse-connect package by day of the week
and month. The results are visualized using plot.ly, and Cody then puts
everything together into a Shiny application.
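To give a feel for what a day-of-the-week breakdown involves, here is a minimal sketch in plain Python. It does not use Ibis or connect to ClickPy, and the download counts are made-up placeholder values rather than real PyPI data; the function name `average_by_weekday` is also just an illustration.

```python
from collections import defaultdict
from datetime import date

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

# Hypothetical daily download counts (day, downloads); not real PyPI data.
daily_downloads = [
    (date(2024, 9, 2), 120),   # Monday
    (date(2024, 9, 3), 150),   # Tuesday
    (date(2024, 9, 7), 40),    # Saturday
    (date(2024, 9, 9), 100),   # Monday
    (date(2024, 9, 14), 60),   # Saturday
]


def average_by_weekday(rows):
    """Return the mean download count per weekday name."""
    totals = defaultdict(lambda: [0, 0])  # weekday name -> [sum, count]
    for day, downloads in rows:
        bucket = totals[WEEKDAYS[day.weekday()]]
        bucket[0] += downloads
        bucket[1] += 1
    return {name: total / count for name, (total, count) in totals.items()}


weekday_averages = average_by_weekday(daily_downloads)
print(weekday_averages)
```

With real data, the same grouping would normally be pushed down to ClickHouse as an aggregation query instead of being computed client-side.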
ClickHouse Cloud: BYOC AWS in Private Preview
ClickHouse Cloud has been running for almost two years and supports all the
major cloud platforms: AWS, Azure, and GCP. So far, it’s been a SaaS offering
that runs entirely on ClickHouse’s cloud account, which made it a non-starter
for users with strict data residency and compliance requirements.
We’re therefore happy to announce the Private Preview release of Bring Your
Own Cloud (BYOC) on AWS. BYOC is a fully managed ClickHouse Cloud service
deployed to your AWS account.
The waiting list is now open, so be sure to sign up, and we’ll contact you to
set you up.
Quick reads
- Heng Ma shows how to build a system that enriches shopping cart events with
  product details. Using RisingWave, a Kafka event data stream is joined with a
  product catalog, and the enriched events are written to ClickHouse using the
  RisingWave-ClickHouse connector.
- Auxten released a new version of chDB, the in-process embedded version of
  ClickHouse, that can query Pandas DataFrames 87 times faster than the initial
  version.
- I loved this video from Jess Archer’s talk at Laracon US 2024. It is an
  excellent introduction to ClickHouse and shows where it’s better than MySQL.
- Sai Srirampur shares his tips for ClickHouse data modeling aimed at Postgres
  users. He explains various strategies to handle duplicates when using the
  ReplacingMergeTree table engine, how to handle null values, and the
  importance of ordering keys.
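For readers new to ReplacingMergeTree, the duplicate handling mentioned in the last item can be pictured with a small simulation. This is plain Python rather than ClickHouse code, and the column names (`user_id`, `email`, `updated_at`) and the helper `replacing_merge` are hypothetical. It assumes a version column is configured, in which case the row with the highest version survives for each ordering key.

```python
def replacing_merge(rows, key, version):
    """Keep one row per ordering key.

    The row with the highest version wins; on a tie, the row
    inserted later replaces the earlier one.
    """
    latest = {}
    for row in rows:
        k = tuple(row[col] for col in key)
        kept = latest.get(k)
        if kept is None or row[version] >= kept[version]:
            latest[k] = row
    # Return rows sorted by the ordering key, as a merged part would be.
    return sorted(latest.values(), key=lambda r: tuple(r[col] for col in key))


# Hypothetical rows: user 1 was written twice with different versions.
rows = [
    {"user_id": 1, "email": "a@old.example", "updated_at": 1},
    {"user_id": 2, "email": "b@example.com", "updated_at": 1},
    {"user_id": 1, "email": "a@new.example", "updated_at": 2},
]

deduplicated = replacing_merge(rows, key=["user_id"], version="updated_at")
print(deduplicated)
```

In ClickHouse itself this replacement happens during background merges, so a query can still see both versions of a row until the parts containing them have been merged.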
Post of the month
Our favorite post this month was by Michael Driscoll about the new JSON data
type.