Reference

Bridge Tables and Many-to-Many Modeling

Use this page to model many-to-many relationships safely in dimensional systems.

Goal:

preserve business meaning for multivalued relationships
prevent metric duplication in joins
expose analyst-friendly patterns for allocation and grouping

Where Many-to-Many Appears #

Common examples:

policy shared by multiple insured parties
sale attributed to multiple channels or campaigns
customer belonging to multiple segments
product mapped to multiple taxonomies

Bridge Table Pattern #

A bridge table resolves many-to-many by storing relationship rows and optional allocation weights.

sql

CREATE TABLE bridge_policy_insured (
  policy_key BIGINT,
  insured_party_key BIGINT,
  relationship_type VARCHAR,
  allocation_weight DECIMAL(18,8),
  effective_start_ts TIMESTAMP,
  effective_end_ts TIMESTAMP,
  is_current BOOLEAN
);

A policies table and an insured_parties table linked through a bridge_policy_insured table; policy 1002 splits across P-01 and P-02 with allocation weights of 0.5 each — A bridge table resolves many-to-many: each (policy, insured) pairing is one row, with an allocation_weight to split facts across parties.

Allocation Strategy #

If one fact row maps to multiple bridge rows, decide one of these explicitly:

full attribution (duplicate total intentionally)
weighted attribution (split total by weights)
primary attribution (single chosen relationship)

Document strategy in metric definitions.

Weighted Attribution Query #

sql

SELECT
  b.insured_party_key,
  SUM(f.written_premium * b.allocation_weight) AS allocated_written_premium,
  SUM(f.earned_premium * b.allocation_weight) AS allocated_earned_premium
FROM fact_policy_financials f
JOIN bridge_policy_insured b
  ON b.policy_key = f.policy_key
 AND f.snapshot_ts >= b.effective_start_ts
 AND f.snapshot_ts < b.effective_end_ts
GROUP BY b.insured_party_key;

Validation SQL #

sql

-- 1) for weighted-allocation bridges, weights should sum to 1 by policy/version
-- (full-attribution bridges carry weight = 1 on every row and intentionally
-- fail this check -- skip it there and validate the duplication factor instead)
SELECT policy_key, effective_start_ts, SUM(allocation_weight) AS weight_sum
FROM bridge_policy_insured
GROUP BY policy_key, effective_start_ts
HAVING ROUND(SUM(allocation_weight), 6) <> 1;

-- 2) duplicate relationship rows
SELECT policy_key, insured_party_key, effective_start_ts, COUNT(*) AS row_count
FROM bridge_policy_insured
GROUP BY policy_key, insured_party_key, effective_start_ts
HAVING COUNT(*) > 1;

-- 3) overlap checks on validity windows
SELECT a.policy_key
FROM bridge_policy_insured a
JOIN bridge_policy_insured b
  ON a.policy_key = b.policy_key
 AND a.insured_party_key = b.insured_party_key
 AND a.effective_start_ts < b.effective_end_ts
 AND b.effective_start_ts < a.effective_end_ts
 AND a.effective_start_ts <> b.effective_start_ts
GROUP BY a.policy_key;

Common Mistakes #

Joining a many-to-many relationship directly without weight logic.
Not versioning bridge rows when relationships change over time.
Mixing primary and weighted attribution in different dashboards.
Leaving allocation semantics undocumented.

Senior Engineer Playbook #

Allocation reconciliation: assert SUM(metric) before the bridge join equals SUM(metric * weight) after it, one check that catches both bad weights and unintended fanout.
Failure postmortem pattern: compare allocated vs unallocated totals by period to detect attribution drift.
Migration playbook: introduce bridge with parallel output tables, then switch consumers after weighted reconciliation sign-off.

Interactive SQL Check #

Allocate earned premium to insured parties with a weighted bridge. The seed data has no bridge table, so the starter seeds one inline via a VALUES CTE (weights per policy sum to 1).

Join the bridge to ins_premium_facts on policy_id, multiply earned_premium by allocation_weight, and return one row per insured_party with the allocated total rounded to 2 decimals.

Loading SQL editor...

Warning

Bridge modeling is mostly about semantic control. If allocation rules are unclear, every KPI downstream can diverge.