Why I'm Obsessing Over Signal Capture

Analytics isn't the product. It's the foundation. A deep dive into the API design decisions behind Userloom's event ingestion schema, and why getting behavioral signals right matters more than anything else.

Seerat Awan
January 1, 2026
5 min read

Userloom has analytics. Good analytics. But I'm not building it so you can watch numbers go up. I'm building it so you know exactly when to reach out, and never miss the moment.

Behavioral email. In-app surveys. That's what saves users from churning. But those features are only as good as the signals feeding them. If we don't capture behavior perfectly, the emails go out at the wrong time. The surveys miss the moment. The whole system falls apart.

That's why I'm obsessing over the foundation. Analytics isn't the destination. It's the engine that powers everything else.

The Problem With Existing Tools

Here's what frustrated me about Mixpanel, Amplitude, and even Segment when building B2B products:

Three calls to do one thing. Want to identify a user and their company? That's an identify() call, a set_group() call, and maybe a get_group().set() call. Three network requests. Three chances to fail. Three opportunities for your data to end up in a weird partial state.

I've debugged too many "why is this user not linked to their company?" issues. It's always a race condition or a dropped request.

When your behavioral triggers depend on clean data, "mostly works" isn't good enough.
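To make the failure mode concrete, here's a sketch of the multi-call pattern those SDKs push you into. The method names are illustrative (modeled on Mixpanel-style group analytics, not an exact vendor API), and each call stands in for a separate network request:

```typescript
// Sketch of the legacy multi-call pattern. Each recorded call would be
// its own network request in a real SDK, and each can fail independently,
// leaving the user/company link in a partial state.
type Call = { method: string; args: unknown[] };

class LegacyAnalyticsClient {
  calls: Call[] = []; // records what would go over the wire

  identify(userId: string, traits: Record<string, unknown>) {
    this.calls.push({ method: "identify", args: [userId, traits] });
  }
  setGroup(groupType: string, groupId: string) {
    this.calls.push({ method: "set_group", args: [groupType, groupId] });
  }
  groupSet(groupType: string, groupId: string, props: Record<string, unknown>) {
    this.calls.push({ method: "group.set", args: [groupType, groupId, props] });
  }
}

// Linking one user to one company takes three requests:
const client = new LegacyAnalyticsClient();
client.identify("user_123", { email: "jane@acme.com" });
client.setGroup("company", "acme_inc");
client.groupSet("company", "acme_inc", { plan: "enterprise" });
// If request 2 or 3 is dropped, the user exists but isn't linked.
```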

One Call To Rule Them All

So Userloom's $identify does everything at once:

{
  "batch": [{
    "id": "evt_001",
    "event": "$identify",
    "distinct_id": "anonymous_session_abc",
    "timestamp": "2024-12-31T10:00:00Z",
    "properties": {
      "traits": {
        "$user_id": "user_123",
        "$email": "jane@acme.com",
        "$name": "Jane Smith",
        "$created_at": "2024-12-31T10:00:00Z",
        "$group": {
          "$group_id": "acme_inc",
          "$group_type": "company",
          "$name": "Acme Inc",
          "$created_at": "2024-01-01T00:00:00Z",
          "$plan": "enterprise"
        }
      }
    }
  }],
  "sent_at": "2024-12-31T10:00:01Z"
}

One request. User created. Company created (or updated). User linked to company. Anonymous session merged. Done.

If it fails, nothing happened. No zombie users. No orphaned companies. No "user exists but isn't linked" nightmares.

Clean data in. Reliable triggers out.
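Client-side, assembling that payload is one function call. Here's a sketch of a helper that builds the envelope shown above (`buildIdentifyBatch` is a hypothetical name, not the real Userloom SDK):

```typescript
// Hypothetical helper that assembles the single-request $identify payload.
// One object out, one POST to /batch: user, company, link, and merge
// all succeed or fail together.
function buildIdentifyBatch(opts: {
  anonymousId: string;
  userId: string;
  traits: Record<string, unknown>;
  group: Record<string, unknown>;
}) {
  const now = new Date().toISOString();
  return {
    batch: [
      {
        id: `evt_${Date.now()}`,
        event: "$identify",
        distinct_id: opts.anonymousId, // the pre-signup session
        timestamp: now,
        properties: {
          traits: {
            $user_id: opts.userId, // the known identity to merge into
            ...opts.traits,
            $group: opts.group,
          },
        },
      },
    ],
    sent_at: now,
  };
}

const payload = buildIdentifyBatch({
  anonymousId: "anonymous_session_abc",
  userId: "user_123",
  traits: { $email: "jane@acme.com", $name: "Jane Smith" },
  group: { $group_id: "acme_inc", $group_type: "company", $plan: "enterprise" },
});
```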

The $ Prefix Convention

I borrowed this from PostHog (they use $ for default properties), but took it further.

Fields with $ are system fields - reserved names with special meaning to Userloom. These power core features: $email for identity, $group for company relationships, $plan for segmentation.

Fields without $ are your fields - track whatever matters to your product. feature_used, export_format, team_size. All fully queryable, all available for triggers.

The convention keeps things clean: you'll never accidentally overwrite a system field, and Userloom will never clash with your custom properties.
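The rule is mechanical enough to sketch in a few lines. This helper (a made-up name, shown for illustration) routes `$`-prefixed keys to system fields and everything else to custom properties:

```typescript
// Sketch of the $ prefix convention: "$" keys are system fields with
// reserved meaning; everything else is user-defined and fully queryable.
function splitTraits(traits: Record<string, unknown>) {
  const system: Record<string, unknown> = {};
  const custom: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(traits)) {
    (key.startsWith("$") ? system : custom)[key] = value;
  }
  return { system, custom };
}

const { system, custom } = splitTraits({
  $email: "jane@acme.com", // system: powers identity
  $plan: "enterprise",     // system: powers segmentation
  feature_used: "export",  // custom: yours
  team_size: 12,           // custom: yours
});
```

One namespace check at ingest time, and the two worlds can never collide.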

Anonymous → Known: The Identity Gap

This is where most behavioral email fails.

User browses anonymously, views pricing three times, then signs up. Your email tool has no idea they were ever on the pricing page. You can't send them the discount offer because you don't know they need it.

The distinct_id + $user_id pattern solves this:

  1. Track anonymous user with distinct_id: "anon_xyz"
  2. User signs up
  3. Send $identify with both the anonymous ID and the new $user_id
  4. System merges the history

Now you can trigger: "User viewed pricing 3+ times → Send discount offer." That's the insight that actually drives conversions.
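The merge step itself is conceptually simple. This sketch models the server-side behavior described above (the real pipeline is Userloom's; this is just the idea): events tracked under the anonymous ID get reassigned to the identified user.

```typescript
// Sketch of identity merging: once $identify links anon_xyz to user_123,
// the anonymous history is reattributed to the known user.
type TrackedEvent = { distinct_id: string; event: string };

function mergeIdentity(
  events: TrackedEvent[],
  anonymousId: string,
  userId: string
): TrackedEvent[] {
  return events.map((e) =>
    e.distinct_id === anonymousId ? { ...e, distinct_id: userId } : e
  );
}

const history: TrackedEvent[] = [
  { distinct_id: "anon_xyz", event: "pricing_viewed" },
  { distinct_id: "anon_xyz", event: "pricing_viewed" },
  { distinct_id: "anon_xyz", event: "pricing_viewed" },
];
const merged = mergeIdentity(history, "anon_xyz", "user_123");
// All three pricing views now belong to user_123, so the
// "viewed pricing 3+ times" trigger can fire.
```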

Cookieless? Covered.

But what happens when there's no anonymous ID at all? Privacy-focused browsers, cookie blockers, incognito mode.

That's where fingerprinting comes in. When the SDK can't persist an anonymous ID, Userloom falls back to device fingerprinting: a combination of browser characteristics, screen resolution, timezone, and other signals that create a probabilistic identifier.

It's not foolproof, but it's good enough to stitch together most anonymous sessions. And when the user finally identifies themselves, all that fingerprint-linked history merges into their profile.

Important: No personal data is gathered or stored during the fingerprinting process. It's an anonymous device identifier, not PII.
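To make "probabilistic identifier, not PII" concrete, here's a minimal sketch: hash a handful of stable browser signals into an opaque ID. The signal set and hash function are illustrative, not Userloom's exact recipe.

```typescript
// Sketch of a probabilistic device fingerprint. Only environmental
// signals go in; the output is an opaque hash, never personal data.
function fingerprint(signals: {
  userAgent: string; // e.g. navigator.userAgent
  screen: string;    // e.g. "1920x1080"
  timezone: string;  // e.g. "Europe/Berlin"
  language: string;  // e.g. "en-US"
}): string {
  const input = [
    signals.userAgent,
    signals.screen,
    signals.timezone,
    signals.language,
  ].join("|");

  // FNV-1a: a tiny, deterministic, non-cryptographic hash.
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return "fp_" + hash.toString(16);
}

// Same device, same signals, same ID across sessions:
const id = fingerprint({
  userAgent: "Mozilla/5.0",
  screen: "1920x1080",
  timezone: "UTC",
  language: "en-US",
});
```

Because it's deterministic, the same device resolves to the same ID across sessions, which is what lets the history stitch together later.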

Batching By Default

There's no /track endpoint. No /identify endpoint. Everything goes through /batch.

Even if you're sending one event.

Why? Because:

  • SDKs can queue events and flush periodically
  • Offline-first becomes trivial
  • Retry logic is simpler (retry the batch, not individual calls)
  • Fewer connections = less overhead

It's a small API surface that handles everything. And when you're capturing signals from website, mobile, API, webhooks, and forms—simplicity matters.

Context Separation

Event data is split between properties (what happened) and context (where/how):

{
  "properties": {
    "feature_name": "export",
    "format": "csv",
    "row_count": 1500
  },

  "context": {
    "page": {
      "url": "https://app.example.com/reports",
      "path": "/reports",
      "title": "Reports Dashboard",
      "referrer": "https://app.example.com/home"
    },
    "screen": {
      "width": 1920,
      "height": 1080
    },
    "library": {
      "name": "userloom-js",
      "version": "1.0.0"
    },
    "locale": "en-US",
    "userAgent": "Mozilla/5.0..."
  }
}

Why this helps:

  • Clean separation of concerns
  • context can be auto-collected by SDKs (page, screen, library info)
  • properties stay focused on business-meaningful data
  • Easier to filter out noise when building triggers
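Here's a sketch of that split from the SDK's side: the caller supplies only `properties`, and the library fills in `context`. The browser globals are passed in as a parameter so the sketch is self-contained; a real SDK would read them from `window` and `navigator`.

```typescript
// Sketch of context auto-collection: properties come from the caller,
// context comes from the environment. Field names mirror the payload above.
function buildEvent(
  event: string,
  properties: Record<string, unknown>, // business-meaningful, caller-supplied
  env: {
    url: string; path: string; title: string; referrer: string;
    width: number; height: number; locale: string; userAgent: string;
  }
) {
  return {
    event,
    properties,
    context: { // environmental, auto-collected
      page: { url: env.url, path: env.path, title: env.title, referrer: env.referrer },
      screen: { width: env.width, height: env.height },
      library: { name: "userloom-js", version: "1.0.0" },
      locale: env.locale,
      userAgent: env.userAgent,
    },
  };
}
```

When a trigger only cares about "exported a CSV with 1,500 rows," everything in `context` is noise it can ignore without losing it.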

Flexible Group Types

I didn't hardcode "company." The schema uses $group_type:

"$group_type": "company"

Because B2B isn't always Company → Users. Sometimes it's:

  • Company → Workspace → Users
  • Organization → Project → Members
  • Franchise → Location → Staff

One schema handles all of it. One foundation for any structure.
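For example, a workspace-level identify just swaps the type; the envelope stays identical. (The workspace values here are made up for illustration, and how multi-level chains link together isn't covered by this schema excerpt.)

```json
"$group": {
  "$group_id": "ws_design_team",
  "$group_type": "workspace",
  "$name": "Design Team"
}
```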

Why This Matters

I could have shipped a basic analytics layer and moved on to the "exciting" features: email templates, survey builders, dashboards.

But behavioral email only works if you know the behavior. Triggered surveys only work if you catch the trigger. Every feature I build depends on signals being captured correctly, completely, and reliably.

So I'm doing this right. The unsexy work. The foundation.

Because when you send that perfectly-timed email that saves a user from churning? It all starts here.

What's Next

The schema is done (for now 😉). Now comes the fun part: building the ingest pipeline and seeing if this actually holds up at scale.

I'll share the infrastructure decisions next, including why I chose Cloudflare Workers over AWS Lambda, how events flow from SDK to ClickHouse, and how I'm getting 8-9x cost savings.


Under the Hood: Part 1 of 3

Follow along as I figure this out.

Next in series

Core Infra and the Ingest Pipeline

Seerat Awan


Founder & Builder in Chief

Building the product growth platform for teams who act on behavior. Capture signals, trigger emails, launch surveys, and turn more users into engaged customers.
