
Segment substrate

A segment in Active Reach is a named subset of contacts defined by a rule. Many features depend on knowing — quickly and accurately — whether a given contact is in a given segment: journey triggers, suppression, audience sync to ad platforms, message personalisation, analytics.

This page documents the canonical substrate that answers “is contact C in segment S right now?”.

Canonical storage

The source of truth is ClickHouse: a single wide table, aegis.segment_members, indexed for both lookup directions (segment to contacts, and contact to segments):

aegis.segment_members
  segment_id    UUID
  contact_id    UUID
  workspace_id  UUID
  added_at      DateTime
  expires_at    DateTime  -- for time-bounded segments
  source        Enum      -- 'rule_match' | 'manual_add' | 'import'

Every segment-aware system reads from this table. There is no per-contact denormalised column (e.g. contacts.segment_ids) — that pattern was orphaned in a 2026 migration after it kept drifting from the canonical source.

Read patterns

| Question | Query shape |
| --- | --- |
| “Is contact C in segment S?” | SELECT 1 FROM aegis.segment_members WHERE segment_id = S AND contact_id = C |
| “All members of segment S” | SELECT contact_id FROM aegis.segment_members WHERE segment_id = S |
| “All segments contact C is in” | SELECT segment_id FROM aegis.segment_members WHERE contact_id = C |
| “Members of S with expires_at > now()” | Add AND (expires_at IS NULL OR expires_at > now()) |
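The query shapes above can be sketched as a small builder that returns the SQL text together with its bound parameters. This is a minimal illustration, not the CDP query engine's actual API: the function name membership_query and the live_only flag are hypothetical, and the {name:Type} placeholders assume ClickHouse's query-parameter syntax. How the parameters are handed to the server depends on the client library in use.

```python
from typing import Optional

TABLE = "aegis.segment_members"
# Expiry predicate copied from the read-pattern table.
LIVE = "(expires_at IS NULL OR expires_at > now())"


def membership_query(
    segment_id: Optional[str] = None,
    contact_id: Optional[str] = None,
    live_only: bool = True,
) -> tuple[str, dict[str, str]]:
    """Build one of the read shapes as (sql, params).

    Values are returned as parameters and are never inlined into the SQL text.
    (Hypothetical helper, for illustration only.)
    """
    if segment_id is None and contact_id is None:
        raise ValueError("need segment_id, contact_id, or both")

    if segment_id is not None and contact_id is not None:
        select = "1"            # point lookup: is C in S?
    elif segment_id is not None:
        select = "contact_id"   # all members of S
    else:
        select = "segment_id"   # all segments C is in

    where: list[str] = []
    params: dict[str, str] = {}
    if segment_id is not None:
        where.append("segment_id = {segment_id:UUID}")
        params["segment_id"] = segment_id
    if contact_id is not None:
        where.append("contact_id = {contact_id:UUID}")
        params["contact_id"] = contact_id
    if live_only:
        where.append(LIVE)

    return f"SELECT {select} FROM {TABLE} WHERE " + " AND ".join(where), params
```

Defaulting live_only to True is a design choice in this sketch: a caller has to opt out of the expiry check explicitly, which makes it harder to forget.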

The table is partitioned by workspace_id, so a read scoped to one workspace does not touch another workspace's data. workspace_id (brand tenant) and location_id (outlet) are distinct dimensions: every membership row stamps workspace_id, and outlet-scoped segments additionally stamp location_id. Brand-tier segments leave location_id NULL.

Write patterns

Membership is recomputed by a per-segment evaluator that runs:

  • On schedule for static segments (every N minutes — N configured per segment)
  • On event-trigger for dynamic segments (the event itself re-evaluates the affected contacts)
  • On manual add / remove when an operator forces membership

Writes go through the CDP query engine (see CDP query engine), the same shared primitives that power the segment-rule parser and the dispatcher. SQL is parameterised: user input is bound as parameters and is never concatenated into the SQL string.
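One way to picture a recompute pass is as a set difference between the contacts stored for a segment and the contacts the rule matches now. The sketch below is an assumption about the shape of that step, not the evaluator's real code: MembershipDelta and diff_membership are hypothetical names, and it assumes the evaluator only manages rows whose source is 'rule_match', leaving 'manual_add' and 'import' rows alone.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MembershipDelta:
    """Contacts to write and contacts to retire for one segment (hypothetical type)."""
    to_add: frozenset[str]
    to_remove: frozenset[str]


def diff_membership(current_rule_members: set[str], matched: set[str]) -> MembershipDelta:
    """Compare stored rule-matched members with what the rule matches now.

    current_rule_members: contact IDs currently stored with source = 'rule_match'
                          (assumption: other sources are not touched by the evaluator).
    matched:              contact IDs the segment rule matches in this pass.
    """
    return MembershipDelta(
        to_add=frozenset(matched - current_rule_members),
        to_remove=frozenset(current_rule_members - matched),
    )
```

For an event-triggered re-evaluation, both input sets would be restricted to the contacts affected by the event, so the delta stays small.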

The orphan-removal history

contacts.segment_ids was a JSONB column on the contacts table that stored the contact’s current segment memberships as an array. It existed for fast point-lookup before the ClickHouse substrate was canonical.

It became an anti-pattern when:

  • Three different writers updated it on three different schedules (segment evaluator, manual-add API, import path)
  • Readers couldn’t tell whether a missing membership was “not in segment” or “writer hasn’t caught up yet”
  • Backfill jobs created duplicates that ClickHouse didn’t reflect

Resolution (Phases 0.5 / 1 / 2 / 3, shipped 2026-05-15): every reader migrated to aegis.segment_members; the column was scheduled for removal after a 30-day soak.

If you’re writing a new feature: never assume contacts.segment_ids is current. Query ClickHouse.

TTL-bound membership

Segments can be defined with a TTL, e.g. “in this segment for 24 hours after the trigger event”. Active Reach handles this by setting expires_at on the membership row, rather than relying on a background job to delete rows at the moment they expire.

Reads must include the expiry check (expires_at IS NULL OR expires_at > now()). A background compactor garbage-collects expired rows; queries don’t depend on it being timely.
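The expiry rule can be modelled in a few lines. The sketch below mirrors the read-time predicate (expires_at IS NULL OR expires_at > now()) in plain Python; the Membership class and the grant and is_live helpers are hypothetical names used for illustration, and a missing TTL is represented as None.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class Membership:
    segment_id: str
    contact_id: str
    added_at: datetime
    expires_at: Optional[datetime]  # None means the membership has no TTL


def grant(
    segment_id: str,
    contact_id: str,
    trigger_at: datetime,
    ttl: Optional[timedelta] = None,
) -> Membership:
    """Stamp expires_at when the row is written, instead of deleting it later."""
    expires_at = trigger_at + ttl if ttl is not None else None
    return Membership(segment_id, contact_id, trigger_at, expires_at)


def is_live(row: Membership, now: datetime) -> bool:
    """Same check as the SQL predicate: no expiry, or expiry strictly in the future."""
    return row.expires_at is None or row.expires_at > now


# Example: a 24-hour membership granted at a fixed trigger time.
trigger = datetime(2026, 1, 1, 12, 0, tzinfo=timezone.utc)
row = grant("segment-1", "contact-1", trigger, ttl=timedelta(hours=24))
```

Because the check lives in the read, an expired row that the compactor has not yet collected is still treated as a non-member.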

Streaming reads (analytics → segment bridge)

Some segments compute over analytical lenses — funnels, cohorts, attribution paths — without storing per-user materialised columns. These are evaluated at read time via FieldSource kinds:

| FieldSource | What it computes at read time |
| --- | --- |
| FunnelPosition | “Where is this contact in the configured funnel?” |
| CohortMembership | “Is this contact in the cohort defined by event X happening between dates Y and Z?” |
| ConversionAttribution | “Last-touch / first-touch attributed campaign / channel for this contact’s conversions” |
| ContactChannelEngagement | “Has this contact engaged on channel X in the last N days?” |

See USP derived fields for the read-time pattern.
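To make "evaluated at read time" concrete, here is a rough sketch of how a ContactChannelEngagement-style check could be answered from raw events without a stored per-contact column. The real FieldSource interface is not shown on this page, so the EngagementEvent type, the function signature, and the in-memory event list are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Iterable


@dataclass(frozen=True)
class EngagementEvent:
    """Hypothetical event shape; the real event schema may differ."""
    contact_id: str
    channel: str
    occurred_at: datetime


def contact_channel_engagement(
    events: Iterable[EngagementEvent],
    contact_id: str,
    channel: str,
    days: int,
    now: datetime,
) -> bool:
    """Has this contact engaged on `channel` in the last `days` days?

    Computed from the events on every call; nothing is materialised per contact.
    """
    cutoff = now - timedelta(days=days)
    return any(
        e.contact_id == contact_id
        and e.channel == channel
        and cutoff <= e.occurred_at <= now
        for e in events
    )


# Example input: one recent email open and one old SMS click.
now = datetime(2026, 1, 31, tzinfo=timezone.utc)
sample = [
    EngagementEvent("contact-1", "email", now - timedelta(days=3)),
    EngagementEvent("contact-1", "sms", now - timedelta(days=40)),
]
```

In a production system the scan would be a query against the event store rather than a Python loop; the point of the sketch is only that the answer is derived when asked.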

Sync to ad platforms

aegis.segment_members is also the source for ad-platform audience sync. For each segment marked Sync to ad platforms:

  1. A reader job pages through current aegis.segment_members rows
  2. For each contact, hashed identifiers (email, phone, mobile ad ID) are extracted
  3. Hashed identifiers are uploaded to the configured platforms (Meta Custom Audiences, Google Customer Match)
  4. Removals from the segment trigger high-priority remove operations on the platforms

The sync is incremental — only deltas since the last sync are pushed. See retargeting cascade for the operator-side configuration.
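The hashing and delta steps can be sketched as follows. This is an illustration under stated assumptions, not the sync job itself: hash_identifier and plan_sync are hypothetical names, SHA-256 over a trimmed, lowercased value is assumed as the hashing scheme for email addresses, and phone numbers or mobile ad IDs would need their own normalisation rules before hashing.

```python
import hashlib


def hash_identifier(value: str) -> str:
    """Normalise an email-style identifier (trim, lowercase), then SHA-256 it.

    Assumption: this normalisation suits email addresses only.
    """
    normalised = value.strip().lower()
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def plan_sync(last_synced: set[str], current: set[str]) -> list[tuple[str, str]]:
    """Return only the deltas since the last sync, as (operation, hashed_id) pairs.

    Removals are listed first to reflect their higher priority.
    """
    removes = [("remove", h) for h in sorted(last_synced - current)]
    adds = [("add", h) for h in sorted(current - last_synced)]
    return removes + adds


# Example: one contact left the segment and one joined since the last sync.
previous = {hash_identifier("ada@example.com"), hash_identifier("bob@example.com")}
latest = {hash_identifier("bob@example.com"), hash_identifier("cy@example.com")}
operations = plan_sync(previous, latest)
```

Contacts present in both snapshots produce no operation, which is what keeps the sync incremental.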

What’s next