If you've made it this far, you're probably following my journey building Userloom.io. Things are going well so far. In the last post, I explained why ClickHouse was the obvious choice for analytics storage. Now it's time to actually deploy the thing.
I decided to go self-hosted for two reasons:
- Control. I want full control over cluster topology, configuration tuning, and scaling decisions.
- Learning. I want to understand how to architect and deploy a distributed database cluster myself. No black boxes.
So the goal for the week was clear: build a production-ready ClickHouse cluster with complete operational control.
Then the R&D rabbit hole began.
Why I Skipped ClickHouse Cloud
Let me get this out of the way first. ClickHouse Cloud is a great product. But I ruled it out for this project.
It comes back to the control point above: I want to own cluster topology, configuration tuning, and scaling decisions, and I don't want to wait behind support tickets to change settings or debug issues.
Bare metal (or VMs) it is.
Why Terraform + Ansible Over Kubernetes
Here's where I'll be completely honest: I don't have deep experience with clusters or advanced DevOps. So I turned to everyone's best friends, Claude and ChatGPT, to help me think through options.
My first instinct was to learn Kubernetes. I have plenty of experience with Docker and containers, so K8s felt like the natural next step. But after some research, I realized it wasn't the right choice for this stage.
Info
At small scale (5 nodes), Kubernetes adds significant overhead without proportional benefits. The 2-4GB RAM per node for K8s components alone would eat into resources better spent on ClickHouse itself.
Here's the comparison that convinced me:
| Factor | Kubernetes | Terraform + Ansible |
|---|---|---|
| Operational complexity | High (need to manage K8s itself) | Low (direct server management) |
| Resource overhead | 2-4GB RAM per node for K8s components | Near zero |
| ClickHouse I/O performance | Tricky with PVCs | Native disk access, optimal |
| Debugging | More layers to troubleshoot | SSH in, check logs, done |
| Learning curve | Steep if not already using K8s | Moderate |
| Time to production | Longer | Faster |
The decision became obvious. Terraform + Ansible gives me infrastructure-as-code discipline without the Kubernetes tax.
Why Both Terraform and Ansible?
Terraform is declarative for cloud resources. Ansible is procedural for server configuration. They complement each other, though there's some overlap.
I could use Ansible alone. If you're comfortable creating servers manually in Hetzner's UI (takes 10 minutes), Ansible handles the rest just fine.
But Terraform adds value when you want to:
- Tear down and recreate environments reliably
- Manage DNS, firewalls, private networks, and load balancers as code
- Maintain multiple environments (staging, production)
- Review infrastructure changes in pull requests
I'm planning to replicate this environment later, so investing in Terraform now reduces future headaches. If we're doing this, let's do it right.
The Cluster Architecture
After a week of research, here's the architecture I landed on:
Five nodes total:
- Nodes 1-4: ClickHouse servers (2 shards × 2 replicas)
- Nodes 1-3: Also run ClickHouse Keeper for consensus
- Node 5: Dedicated monitoring stack (Prometheus, Grafana, Alertmanager, backups)
Everything lives inside a private network. The only public exposure is through the load balancer with SSL termination.
SSH Access
SSH (port 22) is restricted to your IP only. Never expose SSH to the public internet on production infrastructure.
Understanding ClickHouse Keeper and Quorum
Info
ClickHouse Keeper is the modern replacement for ZooKeeper, built directly into ClickHouse. It uses the same Raft consensus protocol but eliminates the need to manage a separate Java-based ZooKeeper cluster.
Raft needs a quorum (a majority of Keeper nodes) to commit any decision.
Here's the math:
| Nodes | Quorum | Survives N Failures |
|---|---|---|
| 1 | 1 | 0 (no fault tolerance) |
| 2 | 2 | 0 (both needed, pointless) |
| 3 | 2 | 1 node can fail ✓ |
| 4 | 3 | 1 node can fail (same as 3) |
| 5 | 3 | 2 nodes can fail |
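Every row in that table falls out of one formula: the quorum is floor(N / 2) + 1, and the cluster survives N minus quorum failures. If you want to sanity-check the numbers yourself, a quick ClickHouse query reproduces them:

```sql
-- Quorum math: a majority is floor(N / 2) + 1, and the cluster
-- tolerates (N - quorum) simultaneous Keeper failures.
SELECT
    arrayJoin([1, 2, 3, 4, 5]) AS keeper_nodes,
    intDiv(keeper_nodes, 2) + 1 AS quorum,
    keeper_nodes - quorum AS survivable_failures;
```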
Why 3 Keeper Nodes?
Three is the minimum for fault tolerance (it survives one failure), and odd numbers avoid split-brain scenarios. A fourth node adds no fault tolerance over three. For my scale, five Keeper nodes would be overkill.
What Keeper actually does:
- Coordinates which replica is the leader for writes
- Tracks replication state across nodes
- Manages distributed DDL operations (CREATE/ALTER TABLE)
- Stores merge scheduling metadata
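If you want to see that coordination metadata for yourself once the cluster is up, ClickHouse exposes the Keeper tree through the built-in system.zookeeper table (the name is a holdover from ZooKeeper). This is just a peek, not something you need day to day:

```sql
-- Browse the Keeper tree. system.zookeeper requires a path filter;
-- '/' lists the top-level znodes, and the replication metadata for the
-- tables defined later in this post lives under /clickhouse/tables.
SELECT name, path
FROM system.zookeeper
WHERE path = '/';
```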
Sharding vs Replication: What's the Difference?
This confused me at first. Let me break it down simply.
Sharding = Horizontal Scaling (Split Your Data)
Sharding distributes data across multiple servers. Each shard holds a subset of the total data, partitioned by a shard key (like user_id, tenant_id, or date).
When to add shards:
- Single node can't hold all your data
- Query performance degrades due to data volume
- Write throughput exceeds single node capacity
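As a rough mental model for how rows end up on a shard: with equally weighted shards (my setup has two), the Distributed engine takes the sharding key modulo the total shard weight and maps the remainder to a shard. A tiny illustration, assuming the tenant_id key from my schema:

```sql
-- With 2 equally weighted shards, a row lands on the shard whose slot
-- matches sharding_key % 2 (remainder 0 -> first shard, 1 -> second).
SELECT
    arrayJoin([7, 8, 1001, 1002]) AS tenant_id,
    tenant_id % 2 AS shard_slot;
```

In practice many setups wrap the key in a hash like cityHash64(tenant_id) to spread skewed keys more evenly; I'm keeping the raw key for now.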
Replication = High Availability (Copy Your Data)
Replication creates identical copies of each shard on multiple nodes. Writes go to any replica, then sync automatically. If one replica dies, others keep serving.
Benefits:
- Zero data loss if a node fails
- Read scaling (distribute reads across replicas)
- Online maintenance (take one replica offline without downtime)
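Replication health is also easy to watch from SQL. Here's a minimal check you can run against any node, using ClickHouse's built-in system.replicas table (what thresholds to alert on is up to you):

```sql
-- Replication health at a glance: one row per replicated table on this node.
SELECT
    database,
    table,
    is_readonly,        -- 1 usually means the replica lost its Keeper session
    active_replicas,    -- replicas currently connected to Keeper
    total_replicas,
    queue_size,         -- pending fetches and merges for this replica
    absolute_delay      -- seconds this replica lags behind the freshest one
FROM system.replicas;
```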
My Setup: 2 Shards × 2 Replicas
Here's what this configuration provides:
- Can lose any 1 node without data loss
- Can lose 1 node per shard and stay online
- 2x write throughput (parallel writes to shards)
- 4x read throughput (all 4 nodes serve reads)
- 2x storage capacity (data split across shards)
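Once the cluster definition is in place, you can verify the topology from any node via system.clusters. A quick sketch, using the 'analytics' cluster name from the table definitions below:

```sql
-- Confirm ClickHouse sees 2 shards x 2 replicas for the 'analytics' cluster.
-- Expect four rows: shard_num 1-2, replica_num 1-2 (hostnames will be yours).
SELECT cluster, shard_num, replica_num, host_name, is_local
FROM system.clusters
WHERE cluster = 'analytics';
```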
How Distributed Queries Work
The table setup uses two layers:
```sql
-- Local table: lives on each shard, holds that shard's data only
CREATE TABLE events ON CLUSTER 'analytics' (
    tenant_id UInt32,
    event_time DateTime,
    event_type String
) ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/events', '{replica}')
PARTITION BY toYYYYMM(event_time)
ORDER BY (tenant_id, event_time);

-- Distributed table: virtual table that queries ALL shards
CREATE TABLE events_distributed ON CLUSTER 'analytics' AS events
ENGINE = Distributed('analytics', 'default', 'events', tenant_id);
```
When you query events_distributed, here's what happens:
1. The query hits any node through the load balancer
2. That node fans out the query to one replica per shard
3. Each shard processes its portion in parallel
4. Results are merged and returned to the client
The client doesn't need to know anything about shards or replicas. It just queries events_distributed and ClickHouse handles the rest.
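For completeness, here's what day-to-day usage looks like against the tables above (the values are made up):

```sql
-- Writes go through the distributed table too: each row is routed to a
-- shard by tenant_id, then replicated within that shard.
INSERT INTO events_distributed VALUES
    (42, now(), 'page_view'),
    (43, now(), 'signup');

-- Reads fan out to one replica per shard; partial results are merged
-- before being returned to the client.
SELECT event_type, count() AS events
FROM events_distributed
WHERE tenant_id = 42
GROUP BY event_type;
```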
Key Takeaways for Self-Hosted ClickHouse
If you're considering self-hosted ClickHouse:
- Skip Kubernetes at small scale. Terraform + Ansible gives you infrastructure-as-code without the overhead.
- Use 3 Keeper nodes. It's the minimum for fault tolerance, and 4 nodes doesn't improve anything.
- Understand the difference between shards and replicas. Shards split data for scale. Replicas copy data for availability.
- Start with 2 shards × 2 replicas. It's a solid foundation that survives node failures and scales reads/writes.
- Keep everything in a private network. Expose only what you need through a load balancer.
The whole setup took about a week of learning and iteration. Most of that time was understanding concepts, not writing code. Once the architecture clicked, the Terraform and Ansible configs came together quickly.
Next up: Building the JavaScript SDK for client-side event tracking and batching.