Skip to Content
DevelopersConceptsCDP query engine

CDP query engine

The CDP query engine is the internal layer that translates segment definitions and field references into SQL — for ClickHouse, Postgres, or both. Every system that asks the substrate a question goes through these primitives. They exist so that segment-rule logic isn’t reimplemented per call site (which historically produced drift, bugs, and SQL-injection vectors).

What the engine is responsible for

ConcernHow the engine handles it
Translate operator vocabulary (equals, is_in, between, contains, is_null, relative_date) into SQLMapped at one place via the _build_pg_predicate and _build_ch_predicate helpers
Parameterise every operator argumentValues are bound via :param (Postgres) or {param:Type} (ClickHouse) — never interpolated
Coerce types correctlyPythonic values → DB-native types per dialect (e.g. ISO strings → DateTime, lists → IN tuples)
Short-circuit edge casesEmpty IN lists return false immediately, not WHERE col IN ()
Discover registered fieldsCompositeFieldRegistry enumerates available field paths from app-layer registrations
Disambiguate dialectPostgres-side resolution uses _build_pg_predicate; ClickHouse-side uses _build_ch_predicate; both share the same operator vocabulary

Why a shared engine

Before the engine was extracted, each consumer (segment evaluator, suppression checker, audience sync, AI parser) generated SQL its own way. The result:

  • 39 sites used SQL interpolation; several were unsafe
  • Operator vocabulary diverged: some sites supported between, others only gt/lt
  • transaction.purchase was canonicalised in some places, order_placed in others
  • The same segment rule could produce different result sets depending on which consumer evaluated it

The engine resolves these by being the single point of SQL generation. Adding an operator means adding it once.

Operator vocabulary

The supported operators are the same across PG and CH resolvers:

OperatorWhat it does
eq / neqEquality
gt / gte / lt / lteNumeric / date comparisons
containsSubstring match (case-insensitive)
betweenRange — inclusive both ends
in / not_inSet membership
is_null / is_not_nullNull checks
relative_date”Last N days”, “next N days”, “is today”, “is this week” — see Relative date predicates below

Relative date predicates

A class of operators that work across PG (for contacts.* attributes) and CH (for event-derived fields). Common forms:

PredicateBehaviour
”Birthday in next 7 days”Computes month/day window around now(), ignoring year
”Anniversary today”Auto-discovered for any contact attribute matching `(anniversary
”Delivery yesterday”Resolves actual_delivery vs scheduled delivery; supports delivery_recency_days
”Loyalty tier expires next 14 days”Reads loyalty_membership_subscriptions.tier_protection_expires_at

The “Ambiguous phrases” feature (AMBIGUOUS_PHRASES) auto-mirrors tense — “delivery today” resolves whether the delivery is scheduled for today or already happened today, depending on context.

Field discovery

Every queryable field is registered via CompositeFieldRegistry. Registration declares:

  • The field’s path (e.g. contact.email, event.transaction.purchase.amount)
  • The source kind (ContactsAttribute, EventsAggregate, ContactChannelEngagement, FunnelPosition, etc.)
  • Optional embedded config (e.g. the channel name for an engagement field, baked into the field path)

The is_filterable: true flag determines whether the field shows up in the segment-builder UI. The AI parser and the FE manual builder both read from the same registry — a registry-surface-parity drift guard ensures they stay in sync.

FieldSource pattern

Most field paths resolve to one of a small set of FieldSource kinds. Each kind handles its own SQL generation:

FieldSourceSubstrateExample field path
ContactsAttributePostgres contacts.*contact.email
EventsAggregateClickHouse events_with_ttlevent.transaction.purchase.count_30d
ContactChannelEngagementClickHouse delivery_eventsengagement.whatsapp.last_clicked_at
ConversionAttributionCH events_with_ttl WHERE event_type='conversion.attributed'attribution.last_campaign
FunnelPositionCH + funnel deffunnel.cart_to_checkout.position
CohortMembershipCH + cohort defcohort.q2_signups.is_member
LoyaltyAttributePostgres loyalty_* tablesloyalty.tier

Each kind has a focused implementation. New analytical lenses become new kinds — the engine itself doesn’t need to change.

See USP derived fields for the read-time pattern that lets us add analytical lenses without storing per-user materialised columns.

Drift guards (CI)

Three CI tests prevent the engine from drifting away from its callers:

TestWhat it asserts
test_field_registry_parityFIELD_SOURCES keys are a subset of SYSTEM_VARIABLES (AI surface and manual surface agree on filterable fields)
test_pg_resolver_vocabularyPostgres resolver supports the same operator vocabulary as CH resolver
test_canonical_purchase_eventNo call site uses legacy order_placed; all converged on transaction.purchase

These run on every PR and fail the build on regression.

SQL-injection posture

Every operator argument is bound, not interpolated. The engine refuses to accept a non-bound value — passing a raw string where a bound placeholder is expected is a TypeError, not an exception at SQL time. The 23-commit stack that extracted the engine closed 7 historical SQL-injection sites; the architecture is now safe-by-construction.

What’s next