259 points by markhneedham 13 hours ago | 48 comments
fuziontech 9 hours ago
Since we've been using ClickHouse long before this JSON functionality was available (even before the earlier version of this, `Object('json')`, was available), we ended up setting up a job that would materialize JSON fields out of a JSON blob into materialized columns, based on query patterns against the keys in the blob. Then, once those materialized columns were created, we would just route queries to those columns at runtime if they were available. This saved us a _ton_ of CPU and IO. Even though ClickHouse uses some really fast SIMD JSON functions, the best way to make a computer go faster is to make it do less, and this new JSON type does exactly that. It's so turnkey!
https://posthog.com/handbook/engineering/databases/materiali...
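The materialized-column trick described above can be sketched in ClickHouse SQL roughly like this (table, column, and key names are hypothetical, not PostHog's actual schema):

```sql
-- Hypothetical events table storing a raw JSON blob in a String column.
CREATE TABLE events
(
    uuid UUID,
    properties String  -- raw JSON blob
)
ENGINE = MergeTree
ORDER BY uuid;

-- Materialize a frequently queried key out of the blob. ClickHouse
-- computes the value on insert, so reads hit a plain typed column
-- instead of re-parsing the JSON every time.
ALTER TABLE events
    ADD COLUMN mat_user_id String
    MATERIALIZED JSONExtractString(properties, 'user_id');

-- A query against the key can then be routed at runtime from
--   JSONExtractString(properties, 'user_id') = '42'
-- to the cheap column read:
SELECT count() FROM events WHERE mat_user_id = '42';
```

The new JSON type makes this kind of hand-rolled extraction unnecessary by storing each path as a typed subcolumn automatically.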
The team over at ClickHouse Inc., as well as the community behind it, moves surprisingly fast. I can't recommend it enough, and I'm excited for everything else on the roadmap, especially what's on the horizon with Parquet and Iceberg support.
ramraj07 10 hours ago
Snowflake released a white paper before its IPO that described this same feature (quietly exploding JSON into columns). It explains why Snowflake feels faster than it should: they've quietly done a lot of amazing things and just offered it all as a polished product, like Apple.
breadwinner 8 hours ago
[1] https://docs.pinot.apache.org/basics/indexing/star-tree-inde...
zX41ZdbW 7 hours ago
This is incorrect. ClickHouse is designed for distributed setups from the beginning, including cross-DC installations. It has been used on large production clusters even before it was open-sourced. When it became open-source in June 2016, the largest cluster was 394 machines across 6 data-centers with 25 ms RTT between the most distant data-centers.
breadwinner 8 hours ago
Spark can do analysis on huge quantities of data, and so can Microsoft Fabric. What Pinot offers that those tools don't is extremely low latency (milliseconds vs. seconds), high concurrency (thousands of queries per second), and the ability to update data in real time.
Excellent intro video on Pinot: https://www.youtube.com/watch?v=_lqdfq2c9cQ
peteforde 1 hour ago
Can someone briefly explain how, or whether, adding data types to JSON, a standardized grammar, leaves something that still qualifies as JSON?
I have no problem with people creating supersets of JSON, but if my standard lib JSON parser can't read your "JSON" then wouldn't it be better to call it something like "CH-JSON"?
If I am wildly missing something, I'm happy to be schooled. The end result certainly sounds cool, even though I haven't needed ClickHouse yet.
chirau 21 minutes ago
The data remains standard JSON and so standard JSON parsers wouldn’t be affected since the optimizations are part of the storage layer and not the JSON structure itself.
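For illustration, here's roughly what using the new JSON type looks like (a sketch based on ClickHouse's announced syntax; the inserted value is ordinary JSON text that any standard parser can read):

```sql
CREATE TABLE logs
(
    data JSON
)
ENGINE = MergeTree
ORDER BY tuple();

-- The inserted value is plain, standards-compliant JSON text.
INSERT INTO logs VALUES ('{"user": {"id": 42, "name": "alice"}}');

-- Internally each path is stored as a typed subcolumn, but that is
-- invisible to clients; paths are queried with dot notation.
SELECT data.user.id FROM logs;
```

The typing lives entirely in how ClickHouse stores and reads the columns; nothing about the JSON wire format changes.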
lemax 54 minutes ago