Segment substrate
A segment in Active Reach is a named subset of contacts defined by a rule. Many features depend on knowing — quickly and accurately — whether a given contact is in a given segment: journey triggers, suppression, audience sync to ad platforms, message personalisation, analytics.
This page documents the canonical substrate that answers “is contact C in segment S right now?”.
Canonical storage
The source of truth is ClickHouse: a single wide table, aegis.segment_members, indexed for both lookup directions (segment to contacts, and contact to segments):
aegis.segment_members
segment_id UUID
contact_id UUID
workspace_id UUID
added_at DateTime
expires_at DateTime -- for time-bounded segments
source Enum -- 'rule_match' | 'manual_add' | 'import'
Every segment-aware system reads from this table. There is no per-contact denormalised column (e.g. contacts.segment_ids); that pattern was orphaned in a 2026 migration after it kept drifting from the canonical source.
Read patterns
| Question | Query shape |
|---|---|
| “Is contact C in segment S?” | SELECT 1 FROM aegis.segment_members WHERE segment_id = S AND contact_id = C |
| “All members of segment S” | SELECT contact_id FROM aegis.segment_members WHERE segment_id = S |
| “All segments contact C is in” | SELECT segment_id FROM aegis.segment_members WHERE contact_id = C |
| “Members of S with expires_at > now()” | Add AND (expires_at IS NULL OR expires_at > now()) |
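The query shapes in the table can be generated by one small builder. The following is a minimal sketch, not the CDP query engine itself: `membership_query` is a hypothetical helper, and the `%(name)s` placeholder style is an assumption that matches several Python ClickHouse clients. Values travel in the parameter dictionary and are never spliced into the SQL text.

```python
from __future__ import annotations

# Assumption: %(name)s placeholders; adjust to whatever your driver expects.
ACTIVE_ONLY = "(expires_at IS NULL OR expires_at > now())"


def membership_query(
    segment_id: str | None = None,
    contact_id: str | None = None,
    active_only: bool = True,
) -> tuple[str, dict[str, str]]:
    """Build one of the read shapes above as (sql, params).

    Hypothetical helper: it only assembles text and parameters and
    leaves execution to the caller's ClickHouse client.
    """
    if segment_id is None and contact_id is None:
        raise ValueError("need segment_id, contact_id, or both")

    params: dict[str, str] = {}
    where: list[str] = []
    if segment_id is not None:
        where.append("segment_id = %(segment_id)s")
        params["segment_id"] = segment_id
    if contact_id is not None:
        where.append("contact_id = %(contact_id)s")
        params["contact_id"] = contact_id
    if active_only:
        where.append(ACTIVE_ONLY)

    # Select whichever side of the pair the caller did not pin down.
    if segment_id is not None and contact_id is not None:
        select = "1"
    elif segment_id is not None:
        select = "contact_id"
    else:
        select = "segment_id"

    sql = f"SELECT {select} FROM aegis.segment_members WHERE " + " AND ".join(where)
    return sql, params
```

Defaulting `active_only` to `True` is a deliberate choice in this sketch: a caller has to opt out of the expiry check rather than remember to opt in.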
The table is partitioned by workspace_id, so a read that filters on workspace_id touches only that workspace's partitions. workspace_id (brand-tenant) and location_id (outlet) are distinct dimensions: every membership row stamps workspace_id, and outlet-scoped segments additionally stamp location_id. Brand-tier segments leave location_id NULL.
Write patterns
Membership is recomputed by a per-segment evaluator that runs:
- On schedule for static segments (every N minutes — N configured per segment)
- On event-trigger for dynamic segments (the event itself re-evaluates the affected contacts)
- On manual add / remove when an operator forces membership
Writes go through the CDP query engine (see CDP query engine), the same shared primitives that power the segment-rule parser and the dispatcher. SQL is parameterised; user input is never concatenated into a SQL string.
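One way to picture a recompute is as a diff between a fresh rule evaluation and the rows already stored. The sketch below is illustrative only: `plan_recompute` and `MembershipChange` are hypothetical names, and the ownership rule (the evaluator only removes rows whose source is 'rule_match', leaving 'manual_add' and 'import' rows alone) is an assumption, not documented behaviour.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class MembershipChange:
    to_add: frozenset[str]     # contacts that match the rule but have no row yet
    to_remove: frozenset[str]  # rule-derived rows whose contacts no longer match


def plan_recompute(
    rule_matches: set[str],
    current_by_source: dict[str, set[str]],
) -> MembershipChange:
    """Diff a fresh rule evaluation against stored membership.

    Assumption: only rows with source 'rule_match' belong to the evaluator,
    so operator-forced and imported memberships are never removed here.
    """
    rule_rows = current_by_source.get("rule_match", set())
    already_member = set().union(*current_by_source.values())
    return MembershipChange(
        to_add=frozenset(rule_matches - already_member),
        to_remove=frozenset(rule_rows - rule_matches),
    )


change = plan_recompute(
    rule_matches={"c1", "c2", "c4"},
    current_by_source={"rule_match": {"c1", "c3"}, "manual_add": {"c4", "c5"}},
)
print(sorted(change.to_add))     # ['c2']
print(sorted(change.to_remove))  # ['c3']
```

The resulting sets would then be handed to the parameterised write path as bound values.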
The orphan-removal history
contacts.segment_ids was a JSONB column on the contacts table that stored the contact’s current segment memberships as an array. It existed for fast point-lookup before the ClickHouse substrate was canonical.
It became an anti-pattern when:
- Three different writers updated it on three different schedules (segment evaluator, manual-add API, import path)
- Readers couldn’t tell whether a missing membership was “not in segment” or “writer hasn’t caught up yet”
- Backfill jobs created duplicates that ClickHouse didn’t reflect
Resolution (Phases 0.5 / 1 / 2 / 3, shipped 2026-05-15): every reader migrated to aegis.segment_members; the column was scheduled for removal after a 30-day soak.
If you’re writing a new feature: never assume contacts.segment_ids is current. Query ClickHouse.
TTL-bound membership
Segments can be defined with a TTL, e.g. “in this segment for 24 hours after the trigger event”. Active Reach handles this by setting expires_at on the membership row at write time; correctness does not rely on a background job deleting rows at the moment of expiry.
Reads must include the expiry check (expires_at IS NULL OR expires_at > now()). A background compactor garbage-collects expired rows; queries don’t depend on it being timely.
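The expiry rule is small enough to mirror in application code, which helps when a service filters rows it has already fetched. A minimal sketch, assuming timezone-aware timestamps; `expiry_for` and `is_active` are hypothetical helper names.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone


def expiry_for(added_at: datetime, ttl: timedelta | None) -> datetime | None:
    """expires_at for a new row; None means the membership never expires."""
    return None if ttl is None else added_at + ttl


def is_active(expires_at: datetime | None, now: datetime) -> bool:
    """Python mirror of: expires_at IS NULL OR expires_at > now()."""
    return expires_at is None or expires_at > now


trigger = datetime(2026, 5, 15, 12, 0, tzinfo=timezone.utc)
expires = expiry_for(trigger, timedelta(hours=24))

print(is_active(expires, trigger + timedelta(hours=23)))  # True
print(is_active(expires, trigger + timedelta(hours=24)))  # False (comparison is strict)
print(is_active(None, trigger + timedelta(days=365)))     # True (no TTL)
```

Because the comparison is strict, a membership stops counting at the exact expiry instant, whether or not the compactor has removed the row yet.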
Streaming reads (analytics → segment bridge)
Some segments compute over analytical lenses — funnels, cohorts, attribution paths — without storing per-user materialised columns. These are evaluated at read time via FieldSource kinds:
| FieldSource | What it computes at read time |
|---|---|
| FunnelPosition | “Where is this contact in the configured funnel?” |
| CohortMembership | “Is this contact in the cohort defined by event X happening between dates Y and Z?” |
| ConversionAttribution | “Last-touch / first-touch attributed campaign / channel for this contact’s conversions” |
| ContactChannelEngagement | “Has this contact engaged on channel X in the last N days?” |
See USP derived fields for the read-time pattern.
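To make “evaluated at read time” concrete, here is a rough sketch of how a ContactChannelEngagement check could be answered directly from events, with no stored per-contact column. The `Event` shape and the function signature are assumptions for illustration; the real FieldSource interface is described in USP derived fields.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Iterable


@dataclass(frozen=True)
class Event:
    # Hypothetical event shape; real events carry more fields.
    contact_id: str
    channel: str
    occurred_at: datetime


def contact_channel_engagement(
    events: Iterable[Event],
    contact_id: str,
    channel: str,
    days: int,
    now: datetime,
) -> bool:
    """True if the contact has an event on `channel` within the last `days` days."""
    cutoff = now - timedelta(days=days)
    return any(
        e.contact_id == contact_id
        and e.channel == channel
        and cutoff <= e.occurred_at <= now
        for e in events
    )


now = datetime(2026, 5, 15, tzinfo=timezone.utc)
events = [
    Event("c1", "email", now - timedelta(days=3)),
    Event("c1", "sms", now - timedelta(days=40)),
]
print(contact_channel_engagement(events, "c1", "email", 7, now))  # True
print(contact_channel_engagement(events, "c1", "sms", 30, now))   # False
```

Nothing is written back: the answer is recomputed on each read, which is what keeps it from drifting the way a materialised column can.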
Sync to ad platforms
aegis.segment_members is also the source for ad-platform audience sync. For each segment marked Sync to ad platforms:
- A reader job pages through current aegis.segment_members rows
- For each contact, hashed identifiers (email, phone, mobile ad ID) are extracted
- Hashed identifiers are uploaded to the configured platforms (Meta Custom Audiences, Google Customer Match)
- Removals from the segment trigger high-priority remove operations on the platforms
The sync is incremental — only deltas since the last sync are pushed. See retargeting cascade for the operator-side configuration.
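The hashing and delta steps can be sketched as below. This is an illustration under assumptions, not the sync job itself: `hash_identifier` and `audience_delta` are hypothetical names, the normalisation shown (trim, lower-case, SHA-256) is the common baseline for email matching, and each platform documents further identifier-specific rules that should be checked before upload.

```python
from __future__ import annotations

import hashlib
from typing import NamedTuple


class AudienceDelta(NamedTuple):
    to_remove: set[str]  # hashed IDs that left the segment since the last sync
    to_add: set[str]     # hashed IDs that joined the segment since the last sync


def hash_identifier(value: str) -> str:
    """SHA-256 hex digest of a trimmed, lower-cased identifier."""
    return hashlib.sha256(value.strip().lower().encode("utf-8")).hexdigest()


def audience_delta(last_synced: set[str], current: set[str]) -> AudienceDelta:
    """Compute what changed between two snapshots of hashed identifiers."""
    return AudienceDelta(
        to_remove=last_synced - current,
        to_add=current - last_synced,
    )


previous = {hash_identifier(e) for e in ["a@example.com", "b@example.com"]}
current = {hash_identifier(e) for e in [" A@Example.com ", "c@example.com"]}
delta = audience_delta(previous, current)

print(len(delta.to_remove), len(delta.to_add))  # 1 1
```

Listing `to_remove` first reflects the priority described above: removals are the operations a sync job would want to push before additions.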
What’s next
- CDP query engine — the shared primitive that backs segment SQL generation
- USP derived fields — the read-time FieldSource pattern
- Data model — how contacts, events, segments compose