Skip to content

Instantly share code, notes, and snippets.

@remorses
Last active March 24, 2026 15:02
Show Gist options
  • Select an option

  • Save remorses/2e33b4ce7ed28d64e5d38a58d875e768 to your computer and use it in GitHub Desktop.

Select an option

Save remorses/2e33b4ce7ed28d64e5d38a58d875e768 to your computer and use it in GitHub Desktop.

Hybrid ClickHouse + Iceberg Architecture

Overview

Three-tier query architecture with Iceberg on R2 as the source of truth, chDB in a Cloudflare Container as the always-available query engine, and an optional ClickHouse server on Fly.io for fast pre-aggregated dashboards.

┌──────────────────────────────────────────────────────────────────────┐
│  Your app / browser / backend                                        │
│  OTel SDK sends logs/spans/metrics                                   │
└──────────────┬───────────────────────────────────────────────────────┘
               │
               │  OTLP/HTTP (protobuf)
               ▼
┌──────────────────────────────────────────────────────────────────────┐
│  Cloudflare Pipelines → R2 (Iceberg, Parquet)                        │
│  SOURCE OF TRUTH — immutable, self-describing                        │
└──────────────────────────────────────────────────────────────────────┘
               │
       ┌───────┴──────────────────┐
       ▼                          ▼
┌──────────────────┐    ┌─────────────────────────┐
│  chDB            │    │  ClickHouse on Fly.io   │
│  (CF Container)  │    │  (OPTIONAL)             │
│                  │    │                         │
│  No state needed │    │  MergeTree + MVs         │
│  Reads Parquet   │    │  syncs from R2 every 5m  │
│  from R2 direct  │    │  persistent volume       │
│                  │    │                         │
│  1-10s (scan)    │    │  5-50ms (indexed, cached) │
└──────────────────┘    └─────────────────────────┘
       ▲                          ▲
       │                          │
       └────────┬─────────────────┘
                │
                ▼
┌──────────────────────────────────────────────────────────────────────┐
│  Query Router (Cloudflare Worker)                                    │
│                                                                      │
│  ClickHouse healthy + has MV for this query?                         │
│    YES → ClickHouse on Fly.io (5-50ms)                               │
│    NO  → chDB in CF Container (1-10s)                                │
└──────────────┬───────────────────────────────────────────────────────┘
               ▼
         Dashboard UI

Ingestion flow

Ingestion never touches ClickHouse or chDB. Data goes straight to R2.

┌──────────────────────────────────────────────────────────────────────┐
│  Your app / browser / backend                                        │
│                                                                      │
│  OTel SDK sends logs/spans/metrics                                   │
└──────────────┬───────────────────────────────────────────────────────┘
               │
               │  OTLP/HTTP (protobuf over HTTPS)
               │  POST https://pipelines.cloudflare.com/v1/logs
               ▼
┌──────────────────────────────────────────────────────────────────────┐
│  Cloudflare Pipelines                                                │
│                                                                      │
│  Receives OTel events, batches them, writes Parquet files            │
│  into Iceberg table format with R2 Data Catalog as the catalog       │
│                                                                      │
│  Partitioned by: day (e.g. date=2026-03-24/)                         │
└──────────────┬───────────────────────────────────────────────────────┘
               │
               │  S3-compatible API (HTTPS PUT)
               │  Writes Parquet files + updates Iceberg manifests
               ▼
┌──────────────────────────────────────────────────────────────────────┐
│  Cloudflare R2                                                       │
│                                                                      │
│  r2://my-bucket/otel/                                                │
│    ├── metadata/          ← Iceberg catalog (manifests, snapshots)   │
│    └── data/                                                         │
│         ├── date=2026-03-22/  part-00001.parquet                     │
│         ├── date=2026-03-23/  part-00001.parquet                     │
│         └── date=2026-03-24/  part-00001.parquet  part-00002.parquet │
│                                                                      │
│  THIS IS THE SOURCE OF TRUTH. Immutable. Self-describing.            │
│  If everything else dies, data is safe here.                         │
└──────────────────────────────────────────────────────────────────────┘

If ClickHouse on Fly.io exists, it syncs from R2 on a schedule:

┌──────────────────────────────────────────────────────────────────────┐
│  ClickHouse on Fly.io (OPTIONAL)                                     │
│  (persistent volume for MergeTree, but rebuildable from R2)          │
│                                                                      │
│  Sync job every 5 min:                                               │
│  INSERT INTO otel_logs SELECT ... FROM iceberg_db.otel_logs          │
│  WHERE timestamp > last_sync_timestamp                               │
│         │                                                            │
│         │  S3-compatible API (HTTPS GET)                              │
│         ▼                                                            │
│  Reads Parquet from R2, inserts into local MergeTree                 │
│                                                                      │
│  As rows land in otel_logs, materialized views fire automatically:   │
│                                                                      │
│  otel_logs ──┬──► mv_errors          (errors per hour per type)      │
│              ├──► mv_pageviews       (views per day per page)        │
│              ├──► mv_web_vitals      (LCP/CLS p75/p95 per page)     │
│              ├──► mv_api_latency     (p50/p99 per route per hour)    │
│              └──► mv_funnel          (conversion steps per day)      │
│                                                                      │
│  TTL: raw data auto-deletes after 30 days                            │
│  MVs: aggregated data kept longer (90-365 days, tiny)                │
└──────────────────────────────────────────────────────────────────────┘

Query flow

┌───────────────────┐
│  Dashboard UI     │
└────────┬──────────┘
         │
         │  HTTPS POST /query
         │  { "query": "top_errors", "timeRange": "7d" }
         ▼
┌──────────────────────────────────────────────────────────────────────┐
│  Query Router (Cloudflare Worker)                                    │
│                                                                      │
│  1. Map dashboard query → SQL                                        │
│     "top_errors" → SELECT error_type, count() ...                    │
│                                                                      │
│  2. Health check: is ClickHouse reachable?                           │
│     Cache health status in Worker global (refresh every 30s)         │
│                                                                      │
│  3. Route decision:                                                  │
│     ┌────────────────────────────────────────────────────────┐       │
│     │  if (clickhouseHealthy && hasMaterializedView(query))  │       │
│     │    → rewrite SQL to use MV table name                  │       │
│     │    → proxy to ClickHouse HTTP (Fly.io)                 │       │
│     │                                                        │       │
│     │  else                                                  │       │
│     │    → send SQL to chDB Container                        │       │
│     │    → chDB reads Parquet from R2 directly               │       │
│     └────────────────────────────────────────────────────────┘       │
│                                                                      │
│  4. Normalize response format (both return same row shape)           │
│                                                                      │
└─────────┬──────────────────────────────────┬─────────────────────────┘
          │                                  │
          │ Fast path                        │ Default path
          │ ClickHouse HTTP                  │ chDB Container
          │ port 8443 (TLS)                  │ CF Container HTTP
          │ 5-50ms                           │ 1-10s
          ▼                                  ▼
┌──────────────────┐              ┌──────────────────────────────────┐
│  ClickHouse      │              │  chDB in Cloudflare Container    │
│  on Fly.io       │              │                                  │
│  (OPTIONAL)      │              │  Stateless HTTP service          │
│                  │              │  POST /query { sql: "..." }      │
│  Reads from:     │              │       │                          │
│  local MergeTree │              │       ▼                          │
│  materialized    │              │  chDB engine (in-process)        │
│  views on NVMe   │              │       │                          │
│                  │              │       │ S3 API (HTTPS GET)       │
│                  │              │       │ same region, fast         │
│                  │              │       ▼                          │
│                  │              │  Reads Parquet from R2            │
│                  │              │                                  │
│                  │              │  No disk, no state, no persist   │
│                  │              │  Scale to zero when idle          │
│                  │              │  Full ClickHouse SQL dialect      │
└────────┬─────────┘              └───────────────┬──────────────────┘
         │                                        │
         │ (sync from R2 every 5 min)             │ (reads on every query)
         ▼                                        ▼
┌──────────────────────────────────────────────────────────────────────┐
│  Cloudflare R2 — Iceberg tables (Parquet)                            │
│  SINGLE SOURCE OF TRUTH                                              │
└──────────────────────────────────────────────────────────────────────┘

chDB Container detail

chDB runs the full ClickHouse query engine in-process. No server, no persistent state. Perfect for Cloudflare Containers' ephemeral model.

┌─────────────────────────────────────────────┐
│  Cloudflare Container                        │
│                                              │
│  Tiny HTTP server (Bun / Node)               │
│    POST /query { sql: "SELECT ..." }         │
│         │                                    │
│         ▼                                    │
│    chDB engine (linked in-process)           │
│         │                                    │
│         │ S3 API — reads Parquet from R2     │
│         │ (same Cloudflare region, low lat)   │
│         ▼                                    │
│    Returns JSON rows                         │
│                                              │
│  No disk, no state, no persistence           │
│  Container can die and restart any time       │
│  Scale to zero, spin up on first query        │
│  Cold start: ~2-5s                            │
│  Query: 1-10s (Parquet scan from R2)          │
│  Full ClickHouse SQL (JOINs, window funcs,    │
│  subqueries, DISTINCT, UNION — all work)      │
└─────────────────────────────────────────────┘

Why chDB over R2 SQL as the fallback

Both ClickHouse server and chDB speak the same SQL dialect. The Router Worker doesn't need to translate between two languages.

R2 SQL chDB
JOINs No Yes
Subqueries No Yes
Window functions No Yes
DISTINCT No Yes
UNION No Yes
SQL dialect DataFusion (Postgres-like) ClickHouse (same as Fly server)
Functions 190 1000+ (full ClickHouse)
Query speed on Parquet 3-30s 1-10s (faster engine)
Persistent state needed No No
Scale to zero Yes Yes (CF Container)

Three deployment tiers

Same codebase at all tiers. Upgrade by adding infrastructure, not changing code.

Tier 1: chDB only (zero server infra)

  App → Pipelines → R2
                     │
                     ▼
              chDB in CF Container
              reads Parquet from R2
              1-10s queries
              full ClickHouse SQL
              scale to zero
                     │
                     ▼
               Dashboard

Tier 2: Add ClickHouse on Fly.io (fast dashboards)

  App → Pipelines → R2
                     │
                ┌────┴────┐
                ▼         ▼
           chDB       ClickHouse ←── syncs from R2 every 5m
           (fallback)  (5-50ms, MVs, indexes)
                │         │
                └────┬────┘
                Router Worker
                     │
                 Dashboard

Tier 3: Multi-region ClickHouse + chDB fallback

  App → Pipelines → R2
                     │
          ┌──────────┼──────────┐
          ▼          ▼          ▼
        CH (US)    CH (EU)    chDB
        Fly.io     Fly.io     (fallback)
          │          │          │
          └────┬─────┴────┬─────┘
          Router Worker (routes to nearest healthy CH)
               │
           Dashboard

Protocol summary

Hop Protocol Required?
App → Pipelines OTLP/HTTP (protobuf) Always
Pipelines → R2 S3 API (internal) Always
Dashboard → Router Worker HTTPS REST Always
Router → chDB Container HTTPS (container HTTP) Always (default path)
chDB → R2 S3 API over HTTPS Always (on every query)
Router → ClickHouse ClickHouse HTTP (port 8443) Optional (Tier 2+)
ClickHouse → R2 (sync) S3 API over HTTPS Optional (Tier 2+)

Failure & recovery

Normal operation (Tier 1, chDB only):
  Dashboard → Router → chDB Container → R2 (1-10s) ✓
  No sync, no state, nothing to break

Normal operation (Tier 2, with ClickHouse):
  Dashboard → Router → ClickHouse MV (5-50ms) ✓
  Sync job pulls from R2 every 5 min ✓

ClickHouse disk dies:
  Router detects CH unhealthy (health check fails)
  Dashboard → Router → chDB Container → R2 (1-10s) ✓  ← auto fallback
  CH restarts, backfills from R2 (5-30 min)
  Dashboard → Router → ClickHouse MV (5-50ms) ✓  ← back to normal

ClickHouse process crashes:
  Fly.io restarts it in ~5-10s
  MergeTree data intact on persistent volume
  During restart: Router → chDB (1-10s) ✓
  After restart: Router → ClickHouse (5-50ms) ✓

chDB Container cold start:
  First query after idle: ~2-5s container boot + query time
  Subsequent queries: 1-10s (container stays warm)

R2 outage (very rare):
  chDB queries fail (can't read Parquet)
  ClickHouse still works from local MergeTree data ✓
  No new data until R2 recovers

Query latency comparison

Query chDB only (Tier 1) With ClickHouse (Tier 2)
Error count per hour, 7 days 2-5s 10ms
Top 10 errors 2-6s 30ms
Unique visitors per day 5-15s 5ms (HLL sketch)
Top pages by views 3-10s 20ms
Errors for user_id = X 2-6s 5ms (indexed)
Bounce rate per page 8-20s 10ms (pre-aggregated)
Ad-hoc JOIN query 3-15s 3-15s (no MV, falls to chDB)
Ad-hoc historical (6 months) 10-30s 10-30s (falls to chDB)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment