What Would Have Happened...?
Structural Causal Models turn risk modeling into risk AI — by moving beyond prediction to answering counterfactual questions about this claim.
The Bottom Line
- The Problem: You need to answer "Would this loss have occurred anyway?" for specific claims — not averages. Standard risk models can't do this.
- The Insight: Same peril, different answers — Rung 2 (average policyholder) → $85,000. Rung 3 (this policyholder) → $76,000. The difference is who you're asking about.
- The Action: Use Structural Causal Models with explicit exogenous variables. They enable individual-level counterfactual reasoning — the foundation for loss attribution, reserve adequacy, and defensible underwriting.
1. The Problem. Rung 1 gives wrong answers. Rung 2 gives right answers to the wrong question.
Risk decisions are about specific exposures: "Did the storm cause this loss, or was the structure already compromised?" "Would this claim have been filed without the policy change?" "Was the underwriting decision defensible for this applicant?"
These are counterfactual questions — they require Rung 3 on Pearl's Ladder of Causation. Rung 2 gets you closer: causal models that predict what happens on average when you intervene. But averages don't defend individual decisions. And most risk teams aren't even at Rung 2 — they're at Rung 1, where the problems are more fundamental.
Rung 1: The Correlation Trap
The trap is wrong answers (confusing correlation with causation).
Most risk models — GLMs, gradient-boosted trees, even deep learning — operate on Rung 1. They find patterns in historical data and project them forward. This works until it doesn't:
Spurious pricing. The model sees that coastal properties have higher claims and raises premiums accordingly. But it can't distinguish storm surge damage from aging infrastructure that happens to cluster near coasts. It optimizes for the pattern, not the mechanism — and misprices the entire book.
Confounded attribution. Claims involving attorneys cost more. Rung 1 concludes: attorneys increase severity. But the causal arrow may run the other way — high-severity claims attract attorneys. Without causal structure, the model conflates consequence with cause, and the reserve strategy follows the wrong signal.
Simpson's paradox4 in development. Aggregate data shows that a book's reserves are developing favorably. But segment by litigation status and the trend reverses in every subgroup. Rung 1 sees the aggregate. Rung 2 would see the subgroups. Rung 3 would tell you what this claim would have developed to without litigation.
The common thread: more data makes these problems worse, not better. A larger dataset reinforces the spurious pattern. The only escape is causal structure — and that requires climbing the ladder.
4 Simpson's paradox occurs when a trend that appears in aggregate data reverses when the data is segmented by a confounding variable. It's one of the clearest demonstrations of why correlation without causal structure misleads.
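Simpson's reversal is easy to reproduce with a few lines of arithmetic. The claim counts and development factors below are hypothetical, chosen so that the book's mix shifts toward the low-factor non-litigated segment: the aggregate factor falls even though both subgroups deteriorate.

```python
# Hypothetical loss-development factors, segmented by litigation status.
# Each entry is (claim count, development factor) -- illustrative only.
year1 = {"litigated": (80, 1.50), "non_litigated": (20, 1.10)}
year2 = {"litigated": (20, 1.60), "non_litigated": (80, 1.20)}

def aggregate_factor(book):
    """Claim-count-weighted average development factor for the whole book."""
    total = sum(n for n, _ in book.values())
    return sum(n * f for n, f in book.values()) / total

agg1, agg2 = aggregate_factor(year1), aggregate_factor(year2)
print(f"aggregate: {agg1:.2f} -> {agg2:.2f}")   # 1.42 -> 1.28: looks favorable
for seg in year1:
    # ...yet every subgroup's factor went UP: adverse in each segment.
    print(f"{seg}: {year1[seg][1]:.2f} -> {year2[seg][1]:.2f}")
```

The aggregate trend is favorable only because the mix shifted toward the low-factor segment. A Rung 1 model fit to the aggregate follows the mix, not the mechanism.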
Rung 2: Right Answer, Wrong Person
The trap is right answers to the wrong question (population averages applied to individual decisions).
Rung 2 fixes the causal confusion — it correctly estimates the effect of interventions. But it answers for the average policyholder, not the one sitting in front of you:
Mitigation that doesn't apply. A causal model shows that roof reinforcement reduces hurricane claims by 30% on average. The insurer offers a premium discount. But for this property — built on fill soil with a cracked foundation — the roof was never the vulnerability. The discount is wasted; the risk is unchanged.
Reserves that mask outliers. Average development factors say this book will settle within projections. But this claim — with its unique combination of jurisdiction, injury type, and legal representation — is not average. The population estimate absorbs the outlier. The individual reserve is wrong.
Pricing that ignores context. The average effect of a rating variable is statistically significant and directionally correct. But for this applicant — with their specific combination of exposure characteristics — the variable's effect is negligible or reversed. The price is defensible on average, but indefensible for this individual.
Rung 2 gets the mechanism right but treats everyone the same. Rung 3 treats everyone as themselves.
Rung 3: Right Answer, Right Person
The breakthrough is the right answer for the right person (individual counterfactual reasoning).
Rung 3 uses abduction to infer each individual's unique characteristics — the exogenous variables (U) — and then reasons2 about what would have happened to this person in particular:
Claim-level attribution. Not "storms increase losses on average" but "this storm caused $47,000 of this $62,000 claim — the remaining $15,000 was pre-existing deterioration." The adjuster, the reinsurer, and the regulator all get a defensible number.
Individualized reserves. Not average development factors applied to every open claim, but a specific projection for this claim given its unique combination of injury, jurisdiction, counsel, and treatment history. The reserve reflects the individual, not the portfolio.
Defensible pricing. Not "this rating variable is significant on average" but "for this applicant, with their specific exposure profile, the premium would have been $X if the variable had been different." That's the answer a regulator is looking for — and the answer Rung 2 cannot provide.
2 We say reasoning deliberately. SCMs don't pattern-match — they perform logical inference over causal structures: premises in, conclusions out. That's what makes them AI in the classical sense, not just statistics with more data.
Rung 1: Correlation
- ✗ "What patterns exist in historical losses?"
- ✗ Answers are associations — not causes
- ✗ Can't distinguish cause from confound
- ✗ More data reinforces spurious patterns
Rung 2: Population Averages
- ✗ "What's the average loss given this peril?"
- ✗ Answers are distributions across the book
- ✗ Can't attribute a specific loss to a specific cause
- ✗ Can't defend an individual underwriting decision
Rung 3: Individual Counterfactuals
- ✓ "Would this loss have occurred without the peril?"
- ✓ Answers are point values for this policyholder
- ✓ Enables claim-level attribution
- ✓ Foundation for defensible pricing and reserving
Pearl's Causal Hierarchy1 reveals a hard boundary. The hierarchy isn't a ranking of sophistication — it's a proof of impossibility. Each rung answers a class of questions that the rung below it is mathematically incapable of addressing.
No amount of data, compute, or model size can cross from one rung to the next. Each requires fundamentally different mathematical machinery.
1 Pearl, J. (2009). Causality: Models, Reasoning, and Inference (2nd ed.). Cambridge University Press.
2. How It Works. Abduction, action, prediction — the three-step counterfactual process.
A Structural Causal Model differs from a causal Bayesian network in one critical way: it includes exogenous variables (U) — the unobserved factors that make each individual unique. In risk: construction quality, maintenance history, local soil conditions, occupant behavior — everything the data doesn't capture but the outcome depends on.
With U variables in place, the model supports a three-step counterfactual process:
The Three-Step Counterfactual Process
Abduction
Given evidence — this property's claim history, construction, location, and mitigation — infer what values of U would produce that outcome for this specific property. This is working backwards: from what happened to why this property is different.
Action
Hypothetically change the intervention variable — e.g., "what if this property had had roof reinforcement?" This is Pearl's graph surgery: delete all incoming edges to the intervened variable and replace them with the new value. The variable no longer "listens" to its parents — the mitigation is imposed, not observed.
Prediction
Recompute the outcome using the same U values — because we're asking about the same property. Would this claim have occurred, given this property's unique characteristics, if the mitigation had been in place? The answer is deterministic for this individual.
This three-step process is what makes Rung 3 possible. Without abduction, you can't fix U. Without graph surgery, you can't intervene cleanly. Without the structural equations, you can't propagate the intervention through the causal mechanism. In risk modeling, this is the difference between "storms increase losses on average" and "this storm caused this loss to this property."
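The three steps can be sketched on a deliberately tiny SCM with a single structural equation, Y = 2X + U. The equation, coefficients, and observed values are hypothetical, chosen only so that abduction is a simple inversion:

```python
# Toy SCM: one endogenous outcome Y, one cause X, one exogenous U.
# Structural equation (hypothetical): Y = 2*X + U
def f_y(x, u):
    return 2 * x + u

# Step 1 -- Abduction: from the observed (x, y), recover this unit's U
# by inverting the structural equation.
x_obs, y_obs = 1.0, 5.0
u = y_obs - 2 * x_obs          # U = 3 for this particular unit

# Step 2 -- Action: graph surgery. X no longer "listens" to its parents;
# the counterfactual value is imposed, not observed.
x_cf = 3.0                     # "what if X had been 3?"

# Step 3 -- Prediction: recompute Y with the SAME U, because we are
# asking about the same unit.
y_cf = f_y(x_cf, u)
print(y_cf)                    # 9.0 -- deterministic for this individual
```

A Rung 2 answer would average over the population distribution of U; fixing U via abduction is exactly what makes the answer individual.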
3. The Framework. The SCM inference engine: assumptions in, counterfactual answers out.
The SCM Inference Engine
Pearl's Structural Causal Model isn't just a diagram — it's a schematic for a computational engine. It takes three inputs and produces three outputs, connecting assumptions to data to answers with mathematical precision.
Inputs
- Assumptions (causal graph)
- Queries (what you want to know)
- Data (what you observe)
SCM Engine
- ⟨U, V, F⟩
- do-calculus
- Abduction
Outputs
- Estimand (formula)
- Estimate (number)
- Fit (validation)
The formal definition is a triplet: ⟨U, V, F⟩ — exogenous variables (what you can't see), endogenous variables (what you can), and structural equations (how they connect). This is what separates a causal model from a statistical one: the equations represent mechanisms, not correlations.
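The triplet maps directly onto a data structure. A minimal sketch, not a library API — the class name, the `solve` method, and the toy mechanisms are all illustrative:

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

@dataclass
class SCM:
    """Pearl's triplet <U, V, F> as a plain data structure."""
    u: Dict[str, float]                  # exogenous: what you can't see
    f: Dict[str, Callable]               # structural equations: mechanisms, not correlations

    def solve(self, do: Optional[Dict[str, float]] = None) -> Dict[str, float]:
        """Evaluate the endogenous variables V. A do() intervention
        overrides a variable's equation (graph surgery)."""
        do = do or {}
        v: Dict[str, float] = {}
        for name, eq in self.f.items():  # assumes f is listed in causal order
            v[name] = do.get(name, eq(self.u, v))
        return v

# Toy mechanisms (hypothetical): Mitigation -> Exposure -> Claim
scm = SCM(
    u={"u_exp": -4.0, "u_c": 1_000.0},
    f={
        "mitigation": lambda u, v: 0.0,
        "exposure":   lambda u, v: 10 - 4 * v["mitigation"] + u["u_exp"],
        "claim":      lambda u, v: (65_000 + 2_500 * v["exposure"]
                                    + 5_000 * v["mitigation"] + u["u_c"]),
    },
)
observed = scm.solve()                         # what did happen
counterfactual = scm.solve({"mitigation": 1})  # do(Mitigation = 1), same U
```

Because U is stored explicitly, the same object answers both the factual and the counterfactual query — the only difference is the surgery applied to F.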
Three Levels of Query
| Rung | Formal Query | Plain English | Requires |
|---|---|---|---|
| 1. Seeing | P(Y \| X) | "What do I expect to see?" | Data |
| 2. Doing | P(Y \| do(X)) | "What if I intervene?" | Causal model |
| 3. Imagining | P(Yx \| X', Y') | "What would have happened?" | SCM + individual data |
The Rung 3 expression reads: "the probability that Y would be y had X been x, given that we actually observed X to be x' and Y to be y'." As Pearl illustrates: "the probability that Joe's salary would be y had he finished college, given that his actual salary is y' and he had only two years of college." In risk terms: the probability that this claim would have been y had mitigation been x, given that the actual claim was y' and the actual mitigation level was x'.
The hierarchy is a formal restriction, not a practical one. No dataset, however large, contains counterfactual information. You cannot observe what would have happened — only what did happen. A century of loss data tells you what storms did to properties. It does not tell you what a specific storm would have done to a specific property under different mitigation. Crossing from Rung 1 to Rung 2 requires a causal graph. Crossing from Rung 2 to Rung 3 requires structural equations with explicit U variables. These are assumptions — informed by actuarial and engineering expertise — not things you extract from data.
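The Rung 1 / Rung 2 gap can be computed exactly on a toy confounded model (all probabilities here are hypothetical): an unobserved factor U drives both X and Y, X has no causal effect on Y at all, and yet conditioning on X suggests a strong one.

```python
# Toy confounded model (hypothetical): U ~ Bernoulli(0.5) drives both
# X and Y; X has NO causal effect on Y.
#   X = U with probability 0.9, else 1 - U
#   Y = U
# Small enough to enumerate the joint distribution exactly.
p_u = {0: 0.5, 1: 0.5}
p_x_given_u = lambda x, u: 0.9 if x == u else 0.1

# Rung 1 -- seeing: P(Y=1 | X=1), Bayes over the hidden confounder.
num = sum(p_u[u] * p_x_given_u(1, u) * (u == 1) for u in (0, 1))
den = sum(p_u[u] * p_x_given_u(1, u) for u in (0, 1))
p_see = num / den        # 0.9 -- looks like a strong "effect" of X

# Rung 2 -- doing: do(X=1) severs the U -> X edge, so setting X
# carries no information about U, and P(Y=1 | do(X=1)) = P(U=1).
p_do = p_u[1]            # 0.5 -- the true (null) causal effect
print(p_see, p_do)
```

No amount of extra data closes this gap: both numbers are already exact. Only the causal graph tells you that the 0.9 is confounding, not causation.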
4. Why This Matters. The questions regulators are already asking.
The Question Regulators Are Already Asking
Counterfactual reasoning is now central to regulatory scrutiny in insurance and finance. When a model declines an applicant or sets a premium, the regulator wants to know: "Would the outcome have been different if this protected variable had been different?" That's a Rung 3 question. Without an SCM, the answer is a guess. With an SCM, it's a computation.
Risk Applications of Rung 3
| Use Case | Rung 2 Question | Rung 3 Question |
|---|---|---|
| Loss Attribution | "Does this peril increase losses on average?" | "Would this loss have occurred without the peril?" |
| Reserve Adequacy | "Does litigation increase development on this book?" | "What would this claim have settled at without the litigation?" |
| Pricing Fairness | "Does this rating factor predict losses?" | "Would this premium have been different if the applicant's ZIP code were different?" |
| Regulatory Explanation | "Does the model use protected variables?" | "Would this decision have changed if the protected variable had been different?" |
| Aspect | Rung 2 (Doing) | Rung 3 (Imagining) |
|---|---|---|
| U Variables | Distributions (variance > 0) | Fixed values (via abduction) |
| Question Type | "What happens on average?" | "What happens for this person?" |
| Answer Type | Distribution over outcomes | Single deterministic outcome |
| Use Case | Policy decisions | Loss attribution, pricing fairness, reserving |
5. What To Do. Three steps from Rung 1 tools to Rung 3 answers.
Ask if your risk team is answering Rung 3 questions with Rung 1 tools.
- "Can we attribute this loss to this peril — or just to the portfolio average?"
- "Can we explain this underwriting decision to a regulator?"
- "Can we answer 'what would have happened?' for a specific claim — or only for the book?"
- "Do our models have explicit U variables — or are they just distributions?"
Then take these three steps:
Identify Counterfactual Questions
Which risk decisions require individual-level "what if" reasoning? Loss attribution, reserve adequacy, pricing fairness, and regulatory explanation are strong candidates.
Build Structural Causal Models
Specify the causal structure with explicit exogenous variables. This requires underwriting and actuarial expertise — understanding the mechanisms that generate losses, not just the patterns in the data.
Implement the Three-Step Process
Abduction → Action → Prediction. Use the SCM to answer "what would have happened?" for specific claims and policyholders. Validate against known cases where the counterfactual can be checked.
6. Calculator. Same intervention, different answers — try it yourself.
Same Intervention, Different Answers
This calculator uses a simplified Mitigation → Exposure → Claim Severity model. The U variables are the unobserved factors that make each policyholder unique — construction quality, maintenance history, local conditions.
Scenario: Intervene with do(Mitigation = 1). Compare Rung 2 (random property from portfolio) vs. Rung 3 (specific property with known characteristics). What's the average effect of a mitigation action across the book vs. the effect on a specific policyholder?
Rung 2: Mitigation = 1 for random property → $85,000 ± $2,500
Rung 3: THIS property with its specific U values → $76,000 exactly
Why the Difference?
| Scenario | U_exp | U_c | Exposure | Claim |
|---|---|---|---|---|
| Rung 2 (average property) | 0 | 0 | 6 | $85,000 |
| Rung 3 (specific property) | -4 | 1,000 | 2 | $76,000 |
The specific property has U_exp = -4 (less exposure than the structural equation alone would predict — perhaps recently renovated) and U_c = 1,000 (a slight severity bump from unmeasured factors). The same intervention produces different outcomes because the U values — the idiosyncratic characteristics — are different. Two properties in the same flood zone, same mitigation, different claim outcomes — because construction quality, maintenance, and elevation differ.
Try It Yourself
Mitigation = 1.0 (intervention)
Exposure = 10 + (-4) × 1.0 + (-4.0) = 2.0 (the final term is U_exp, this property's exposure idiosyncrasy)
Claim = 65,000 + 2,500 × 2.0 + 5,000 × 1.0 + 1,000 = $76,000 (the final term is U_c)
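The calculator's two answers can be reproduced end to end. The structural equations and coefficients are the ones shown above; the observed pre-intervention state (Mitigation = 0, Exposure = 6, Claim = $81,000) is an assumed observation, chosen to be consistent with the U values in the table:

```python
# Structural equations from the calculator's simplified model:
#   Exposure = 10 - 4*Mitigation + U_exp
#   Claim    = 65,000 + 2,500*Exposure + 5,000*Mitigation + U_c
def exposure(m, u_exp):
    return 10 - 4 * m + u_exp

def claim(m, e, u_c):
    return 65_000 + 2_500 * e + 5_000 * m + u_c

# Rung 2: a random (average) property -- U variables at their means (0, 0).
e2 = exposure(1, 0)
rung2 = claim(1, e2, 0)                 # $85,000

# Rung 3, step 1 -- abduction: invert the equations at the observed,
# pre-intervention state (assumed observation: M=0, E=6, C=81,000).
m_obs, e_obs, c_obs = 0, 6, 81_000
u_exp = e_obs - (10 - 4 * m_obs)        # -4: this property's idiosyncrasy
u_c = c_obs - (65_000 + 2_500 * e_obs + 5_000 * m_obs)   # 1,000

# Steps 2-3 -- action do(Mitigation = 1), then predict with the SAME U.
e3 = exposure(1, u_exp)                 # 2
rung3 = claim(1, e3, u_c)               # $76,000
print(rung2, rung3)
```

Same peril, same intervention, different answers — the only difference between the two computations is whether U is a population average or this property's abducted values.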
The Five-Step Process
| Step | What Happens | U Variables | Result |
|---|---|---|---|
| 1. Observation | Baseline joint distribution | Distributions | Population statistics |
| 2. Intervention (Rung 2) | do(Mitigation = 1) for random property | Distributions | $85,000 ± $2,500 |
| 3. Abduction | Infer U values from observed data | Fixed | U_exp = -4, U_c = 1,000 |
| 4. U Values Fixed | Property's characteristics locked in | Fixed | Property identified |
| 5. Counterfactual (Rung 3) | do(Mitigation = 1) for this property | Fixed | $76,000 exactly |
When abduction doesn't fully pin down U (measurement error, unobserved confounders, stochastic mechanisms), counterfactuals become distributions rather than single values. You reason about P(Yx | E), the distribution of the counterfactual outcome given the evidence E, instead of a point estimate. In practice, this is common in risk modeling — you rarely know every factor that influenced a claim. The math still works; the answers are just less precise.
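When abduction only narrows U to a posterior, the counterfactual is computed by averaging over that posterior. A sketch on the calculator's model, assuming (hypothetically) that U_exp is fully identified but U_c is only known up to Gaussian uncertainty around $1,000 with a $500 standard deviation:

```python
import random

random.seed(0)

# Calculator's claim equation; U_exp pinned by abduction, U_c uncertain.
def claim(m, e, u_c):
    return 65_000 + 2_500 * e + 5_000 * m + u_c

u_exp = -4                          # fully identified by abduction
e_cf = 10 - 4 * 1 + u_exp           # exposure under do(Mitigation = 1) -> 2

# Assumed U_c posterior: N(mean=1,000, sd=500). Sample it and push each
# draw through the structural equations to get a counterfactual DISTRIBUTION.
draws = [claim(1, e_cf, random.gauss(1_000, 500)) for _ in range(100_000)]
mean = sum(draws) / len(draws)
sd = (sum((d - mean) ** 2 for d in draws) / len(draws)) ** 0.5
print(round(mean), round(sd))       # centered near 76,000, spread near 500
```

The point answer from the fully identified case reappears as the center of the distribution; the residual uncertainty in U shows up directly as the spread of the counterfactual claim.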
7. Reading. The foundational literature.
- Pearl, J. (2009). Causality: Models, Reasoning, and Inference (2nd ed.). Cambridge University Press.
- Pearl, J., Glymour, M., & Jewell, N. P. (2016). Causal Inference in Statistics: A Primer. Wiley.
- Pearl, J. & Mackenzie, D. (2018). The Book of Why: The New Science of Cause and Effect. Basic Books.
- Pearl, J. (2018). Theoretical Impediments to Machine Learning With Seven Sparks from the Causal Revolution. arXiv:1801.04016.
- Bareinboim, E. et al. (2022). On Pearl's Hierarchy and the Foundations of Causal Inference. Technical Report R-60, CausalAI Lab.
Software
- Bayes Server — Commercial tool for probabilistic graphical models and SCMs
- DoWhy (Microsoft) — Causal inference library
- DAGitty — DAG drawing and analysis