What Would Have Happened...?
Structural Causal Models turn risk modeling into risk AI — by moving beyond prediction to answering counterfactual questions about this claim.
The Bottom Line
- The Problem: You need to answer "Would this loss have occurred anyway?" for specific claims — not averages. Standard risk models can't do this.
- The Insight: Same peril, different answers — Rung 2 (average policyholder) → $85,000. Rung 3 (this policyholder) → $76,000. The difference is who you're asking about.
- The Action: Use Structural Causal Models with explicit exogenous variables. They enable individual-level counterfactual reasoning — the foundation for loss attribution, reserve adequacy, and defensible underwriting.
1. The Problem. Rung 1 gives wrong answers. Rung 2 gives right answers to the wrong question.
Risk decisions are about specific exposures: "Did the storm cause this loss, or was the structure already compromised?" "Would this claim have been filed without the policy change?" "Was the underwriting decision defensible for this applicant?"
These are counterfactual questions — they require Rung 3 on Pearl's Ladder of Causation. Rung 2 gets you closer: causal models that predict what happens on average when you intervene. But averages don't defend individual decisions. And most risk teams aren't even at Rung 2 — they're at Rung 1, where the problems are more fundamental.
Rung 1: The Correlation Trap
The trap is wrong answers (confusing correlation with causation).
Most risk models — GLMs, gradient-boosted trees, even deep learning — operate on Rung 1. They find patterns in historical data and project them forward. This works until it doesn't:
Spurious pricing. The model sees that coastal properties have higher claims and raises premiums accordingly. But it can't distinguish storm surge damage from aging infrastructure that happens to cluster near coasts. It optimizes for the pattern, not the mechanism — and misprices the entire book.
Confounded attribution. Claims involving attorneys cost more. Rung 1 concludes: attorneys increase severity. But the causal arrow may run the other way — high-severity claims attract attorneys. Without causal structure, the model conflates consequence with cause, and the reserve strategy follows the wrong signal.
Simpson's paradox4 in development. Aggregate data shows that a book's reserves are developing favorably. But segment by litigation status and the trend reverses in every subgroup. Rung 1 sees the aggregate. Rung 2 would see the subgroups. Rung 3 would tell you what this claim would have developed to without litigation.
The common thread: more data makes these problems worse, not better. A larger dataset reinforces the spurious pattern. The only escape is causal structure — and that requires climbing the ladder.
4 Simpson's paradox occurs when a trend that appears in aggregate data reverses when the data is segmented by a confounding variable. It's one of the clearest demonstrations of why correlation without causal structure misleads.
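Simpson's reversal is easy to reproduce with a few lines of arithmetic. The claim counts and development factors below are hypothetical, chosen so that the book's mix shifts toward the low-factor non-litigated segment: the aggregate factor falls even though both subgroups deteriorate.

```python
# Hypothetical loss-development factors, segmented by litigation status.
# Each entry is (claim count, development factor) -- illustrative only.
year1 = {"litigated": (80, 1.50), "non_litigated": (20, 1.10)}
year2 = {"litigated": (20, 1.60), "non_litigated": (80, 1.20)}

def aggregate_factor(book):
    """Claim-count-weighted average development factor for the whole book."""
    total = sum(n for n, _ in book.values())
    return sum(n * f for n, f in book.values()) / total

agg1, agg2 = aggregate_factor(year1), aggregate_factor(year2)
print(f"aggregate: {agg1:.2f} -> {agg2:.2f}")   # 1.42 -> 1.28: looks favorable
for seg in year1:
    # ...yet every subgroup's factor went UP: adverse in each segment.
    print(f"{seg}: {year1[seg][1]:.2f} -> {year2[seg][1]:.2f}")
```

The aggregate trend is favorable only because the mix shifted toward the low-factor segment. A Rung 1 model fit to the aggregate follows the mix, not the mechanism.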
Rung 2: Right Answer, Wrong Person
The trap is right answers to the wrong question (population averages applied to individual decisions).
Rung 2 fixes the causal confusion — it correctly estimates the effect of interventions. But it answers for the average policyholder, not the one sitting in front of you:
Mitigation that doesn't apply. A causal model shows that roof reinforcement reduces hurricane claims by 30% on average. The insurer offers a premium discount. But for this property — built on fill soil with a cracked foundation — the roof was never the vulnerability. The discount is wasted; the risk is unchanged.
Reserves that mask outliers. Average development factors say this book will settle within projections. But this claim — with its unique combination of jurisdiction, injury type, and legal representation — is not average. The population estimate absorbs the outlier. The individual reserve is wrong.
Pricing that ignores context. The average effect of a rating variable is statistically significant and directionally correct. But for this applicant — with their specific combination of exposure characteristics — the variable's effect is negligible or reversed. The price is defensible on average, but indefensible for this individual.
Rung 2 gets the mechanism right but treats everyone the same. Rung 3 treats everyone as themselves.
Rung 3: Right Answer, Right Person
The breakthrough is the right answer for the right person (individual counterfactual reasoning).
Rung 3 uses abduction to infer each individual's unique characteristics — the exogenous variables (U) — and then reasons2 about what would have happened to this person in particular:
Claim-level attribution. Not "storms increase losses on average" but "this storm caused $47,000 of this $62,000 claim — the remaining $15,000 was pre-existing deterioration." The adjuster, the reinsurer, and the regulator all get a defensible number.
Individualized reserves. Not average development factors applied to every open claim, but a specific projection for this claim given its unique combination of injury, jurisdiction, counsel, and treatment history. The reserve reflects the individual, not the portfolio.
Defensible pricing. Not "this rating variable is significant on average" but "for this applicant, with their specific exposure profile, the premium would have been $X if the variable had been different." That's the answer a regulator is looking for — and the answer Rung 2 cannot provide.
2 We say reasoning deliberately. SCMs don't pattern-match — they perform logical inference over causal structures: premises in, conclusions out. That's what makes them AI in the classical sense, not just statistics with more data.
Rung 1: Correlation
- ✗ "What patterns exist in historical losses?"
- ✗ Answers are associations — not causes
- ✗ Can't distinguish cause from confound
- ✗ More data reinforces spurious patterns
Rung 2: Population Averages
- ✗ "What's the average loss given this peril?"
- ✗ Answers are distributions across the book
- ✗ Can't attribute a specific loss to a specific cause
- ✗ Can't defend an individual underwriting decision
Rung 3: Individual Counterfactuals
- ✓ "Would this loss have occurred without the peril?"
- ✓ Answers are point values for this policyholder
- ✓ Enables claim-level attribution
- ✓ Foundation for defensible pricing and reserving
Pearl's Causal Hierarchy1 reveals a hard boundary. The hierarchy isn't a ranking of sophistication — it's a proof of impossibility. Each rung answers a class of questions that the rung below it is mathematically incapable of addressing.
No amount of data, compute, or model size can cross from one rung to the next. Each requires fundamentally different mathematical machinery.
1 Pearl, J. (2009). Causality: Models, Reasoning, and Inference (2nd ed.). Cambridge University Press.
2. How It Works. Abduction, action, prediction — the three-step counterfactual process.
A Structural Causal Model differs from a causal Bayesian network in one critical way: it includes exogenous variables (U) — the unobserved factors that make each individual unique. In risk: construction quality, maintenance history, local soil conditions, occupant behavior — everything the data doesn't capture but the outcome depends on.
With U variables in place, the model supports a three-step counterfactual process:
The Three-Step Counterfactual Process
Abduction
Given evidence — this property's claim history, construction, location, and mitigation — infer what values of U would produce that outcome for this specific property. This is working backwards: from what happened to why this property is different.
Action
Hypothetically change the intervention variable — e.g., "what if this property had had roof reinforcement?" This is Pearl's graph surgery: delete all incoming edges to the intervened variable and replace them with the new value. The variable no longer "listens" to its parents — the mitigation is imposed, not observed.
Prediction
Recompute the outcome using the same U values — because we're asking about the same property. Would this claim have occurred, given this property's unique characteristics, if the mitigation had been in place? The answer is deterministic for this individual.
This three-step process is what makes Rung 3 possible. Without abduction, you can't fix U. Without graph surgery, you can't intervene cleanly. Without the structural equations, you can't propagate the intervention through the causal mechanism. In risk modeling, this is the difference between "storms increase losses on average" and "this storm caused this loss to this property."
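The three steps can be sketched on a deliberately tiny SCM with a single structural equation, Y = 2X + U. The equation, coefficients, and observed values are hypothetical, chosen only so that abduction is a simple inversion:

```python
# Toy SCM: one endogenous outcome Y, one cause X, one exogenous U.
# Structural equation (hypothetical): Y = 2*X + U
def f_y(x, u):
    return 2 * x + u

# Step 1 -- Abduction: from the observed (x, y), recover this unit's U
# by inverting the structural equation.
x_obs, y_obs = 1.0, 5.0
u = y_obs - 2 * x_obs          # U = 3 for this particular unit

# Step 2 -- Action: graph surgery. X no longer "listens" to its parents;
# the counterfactual value is imposed, not observed.
x_cf = 3.0                     # "what if X had been 3?"

# Step 3 -- Prediction: recompute Y with the SAME U, because we are
# asking about the same unit.
y_cf = f_y(x_cf, u)
print(y_cf)                    # 9.0 -- deterministic for this individual
```

A Rung 2 answer would average over the population distribution of U; fixing U via abduction is exactly what makes the answer individual.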
3. The Framework. The SCM inference engine: assumptions in, counterfactual answers out.
The SCM Inference Engine
Pearl's Structural Causal Model isn't just a diagram — it's a schematic for a computational engine. It takes three inputs and produces three outputs, connecting assumptions to data to answers with mathematical precision.
Inputs
- Assumptions (causal graph)
- Queries (what you want to know)
- Data (what you observe)
SCM Engine
- ⟨U, V, F⟩
- do-calculus
- Abduction
Outputs
- Estimand (formula)
- Estimate (number)
- Fit (validation)
The formal definition is a triplet: ⟨U, V, F⟩ — exogenous variables (what you can't see), endogenous variables (what you can), and structural equations (how they connect). This is what separates a causal model from a statistical one: the equations represent mechanisms, not correlations.
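The triplet maps directly onto a data structure. A minimal sketch, not a library API — the class name, the `solve` method, and the toy mechanisms are all illustrative:

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

@dataclass
class SCM:
    """Pearl's triplet <U, V, F> as a plain data structure."""
    u: Dict[str, float]                  # exogenous: what you can't see
    f: Dict[str, Callable]               # structural equations: mechanisms, not correlations

    def solve(self, do: Optional[Dict[str, float]] = None) -> Dict[str, float]:
        """Evaluate the endogenous variables V. A do() intervention
        overrides a variable's equation (graph surgery)."""
        do = do or {}
        v: Dict[str, float] = {}
        for name, eq in self.f.items():  # assumes f is listed in causal order
            v[name] = do.get(name, eq(self.u, v))
        return v

# Toy mechanisms (hypothetical): Mitigation -> Exposure -> Claim
scm = SCM(
    u={"u_exp": -4.0, "u_c": 1_000.0},
    f={
        "mitigation": lambda u, v: 0.0,
        "exposure":   lambda u, v: 10 - 4 * v["mitigation"] + u["u_exp"],
        "claim":      lambda u, v: (65_000 + 2_500 * v["exposure"]
                                    + 5_000 * v["mitigation"] + u["u_c"]),
    },
)
observed = scm.solve()                         # what did happen
counterfactual = scm.solve({"mitigation": 1})  # do(Mitigation = 1), same U
```

Because U is stored explicitly, the same object answers both the factual and the counterfactual query — the only difference is the surgery applied to F.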
Three Levels of Query
| Rung | Formal Query | Plain English | Requires |
|---|---|---|---|
| 1. Seeing | P(Y \| X) | "What do I expect to see?" | Data |
| 2. Doing | P(Y \| do(X)) | "What if I intervene?" | Causal model |
| 3. Imagining | P(Yx \| X', Y') | "What would have happened?" | SCM + individual data |
The Rung 3 expression reads: "the probability that Y would be y had X been x, given that we actually observed X to be x' and Y to be y'." As Pearl illustrates: "the probability that Joe's salary would be y had he finished college, given that his actual salary is y' and he had only two years of college." In risk terms: the probability that this claim would have been y had mitigation been x, given that the actual claim was y' and the actual mitigation level was x'.
The hierarchy is a formal restriction, not a practical one. No dataset, however large, contains counterfactual information. You cannot observe what would have happened — only what did happen. A century of loss data tells you what storms did to properties. It does not tell you what a specific storm would have done to a specific property under different mitigation. Crossing from Rung 1 to Rung 2 requires a causal graph. Crossing from Rung 2 to Rung 3 requires structural equations with explicit U variables. These are assumptions — informed by actuarial and engineering expertise — not things you extract from data.
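The Rung 1 / Rung 2 gap can be computed exactly on a toy confounded model (all probabilities here are hypothetical): an unobserved factor U drives both X and Y, X has no causal effect on Y at all, and yet conditioning on X suggests a strong one.

```python
# Toy confounded model (hypothetical): U ~ Bernoulli(0.5) drives both
# X and Y; X has NO causal effect on Y.
#   X = U with probability 0.9, else 1 - U
#   Y = U
# Small enough to enumerate the joint distribution exactly.
p_u = {0: 0.5, 1: 0.5}
p_x_given_u = lambda x, u: 0.9 if x == u else 0.1

# Rung 1 -- seeing: P(Y=1 | X=1), Bayes over the hidden confounder.
num = sum(p_u[u] * p_x_given_u(1, u) * (u == 1) for u in (0, 1))
den = sum(p_u[u] * p_x_given_u(1, u) for u in (0, 1))
p_see = num / den        # 0.9 -- looks like a strong "effect" of X

# Rung 2 -- doing: do(X=1) severs the U -> X edge, so setting X
# carries no information about U, and P(Y=1 | do(X=1)) = P(U=1).
p_do = p_u[1]            # 0.5 -- the true (null) causal effect
print(p_see, p_do)
```

No amount of extra data closes this gap: both numbers are already exact. Only the causal graph tells you that the 0.9 is confounding, not causation.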
4. Why This Matters. The questions regulators are already asking.
The Question Regulators Are Already Asking
Counterfactual reasoning is now central to regulatory scrutiny in insurance and finance. When a model declines an applicant or sets a premium, the regulator wants to know: "Would the outcome have been different if this protected variable had been different?" That's a Rung 3 question. Without an SCM, the answer is a guess. With an SCM, it's a computation.
Risk Applications of Rung 3
| Use Case | Rung 2 Question | Rung 3 Question |
|---|---|---|
| Loss Attribution | "Does this peril increase losses on average?" | "Would this loss have occurred without the peril?" |
| Reserve Adequacy | "Does litigation increase development on this book?" | "What would this claim have settled at without the litigation?" |
| Pricing Fairness | "Does this rating factor predict losses?" | "Would this premium have been different if the applicant's ZIP code were different?" |
| Regulatory Explanation | "Does the model use protected variables?" | "Would this decision have changed if the protected variable had been different?" |
| Aspect | Rung 2 (Doing) | Rung 3 (Imagining) |
|---|---|---|
| U Variables | Distributions (variance > 0) | Fixed values (via abduction) |
| Question Type | "What happens on average?" | "What happens for this person?" |
| Answer Type | Distribution over outcomes | Single deterministic outcome |
| Use Case | Policy decisions | Loss attribution, pricing fairness, reserving |
5. What To Do. Three steps from Rung 1 tools to Rung 3 answers.
Ask if your risk team is answering Rung 3 questions with Rung 1 tools.
- "Can we attribute this loss to this peril — or just to the portfolio average?"
- "Can we explain this underwriting decision to a regulator?"
- "Can we answer 'what would have happened?' for a specific claim — or only for the book?"
- "Do our models have explicit U variables — or are they just distributions?"
Then take these three steps:
Identify Counterfactual Questions
Which risk decisions require individual-level "what if" reasoning? Loss attribution, reserve adequacy, pricing fairness, and regulatory explanation are strong candidates.
Build Structural Causal Models
Specify the causal structure with explicit exogenous variables. This requires underwriting and actuarial expertise — understanding the mechanisms that generate losses, not just the patterns in the data.
Implement the Three-Step Process
Abduction → Action → Prediction. Use the SCM to answer "what would have happened?" for specific claims and policyholders. Validate against known cases where the counterfactual can be checked.
6. Calculator. Same intervention, different answers — try it yourself.
Same Intervention, Different Answers
This calculator uses a simplified Mitigation → Exposure → Claim Severity model. The U variables are the unobserved factors that make each policyholder unique — construction quality, maintenance history, local conditions.
Scenario: Intervene with do(Mitigation = 1). Compare Rung 2 (random property from portfolio) vs. Rung 3 (specific property with known characteristics). What's the average effect of a mitigation action across the book vs. the effect on a specific policyholder?
Rung 2: Mitigation = 1 for random property → $85,000 ± $2,500
Rung 3: THIS property with its specific U values → $76,000 exactly
Why the Difference?
| Scenario | U_exp | U_c | Exposure | Claim |
|---|---|---|---|---|
| Rung 2 (average property) | 0 | 0 | 6 | $85,000 |
| Rung 3 (specific property) | -4 | 1,000 | 2 | $76,000 |
The specific property has U_exp = -4 (less exposure than the structural equation alone would predict — perhaps recently renovated) and U_c = 1,000 (a slight severity bump from unmeasured factors). The same intervention produces different outcomes because the U values — the idiosyncratic characteristics — are different. Two properties in the same flood zone, same mitigation, different claim outcomes — because construction quality, maintenance, and elevation differ.
Try It Yourself
Mitigation = 1.0 (intervention)
Exposure = 10 + (-4) × 1.0 + (-4.0) = 2.0 (the final term is U_exp, this property's exposure idiosyncrasy)
Claim = 65,000 + 2,500 × 2.0 + 5,000 × 1.0 + 1,000 = $76,000 (the final term is U_c)
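The calculator's two answers can be reproduced end to end. The structural equations and coefficients are the ones shown above; the observed pre-intervention state (Mitigation = 0, Exposure = 6, Claim = $81,000) is an assumed observation, chosen to be consistent with the U values in the table:

```python
# Structural equations from the calculator's simplified model:
#   Exposure = 10 - 4*Mitigation + U_exp
#   Claim    = 65,000 + 2,500*Exposure + 5,000*Mitigation + U_c
def exposure(m, u_exp):
    return 10 - 4 * m + u_exp

def claim(m, e, u_c):
    return 65_000 + 2_500 * e + 5_000 * m + u_c

# Rung 2: a random (average) property -- U variables at their means (0, 0).
e2 = exposure(1, 0)
rung2 = claim(1, e2, 0)                 # $85,000

# Rung 3, step 1 -- abduction: invert the equations at the observed,
# pre-intervention state (assumed observation: M=0, E=6, C=81,000).
m_obs, e_obs, c_obs = 0, 6, 81_000
u_exp = e_obs - (10 - 4 * m_obs)        # -4: this property's idiosyncrasy
u_c = c_obs - (65_000 + 2_500 * e_obs + 5_000 * m_obs)   # 1,000

# Steps 2-3 -- action do(Mitigation = 1), then predict with the SAME U.
e3 = exposure(1, u_exp)                 # 2
rung3 = claim(1, e3, u_c)               # $76,000
print(rung2, rung3)
```

Same peril, same intervention, different answers — the only difference between the two computations is whether U is a population average or this property's abducted values.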
The Five-Step Process
| Step | What Happens | U Variables | Result |
|---|---|---|---|
| 1. Observation | Baseline joint distribution | Distributions | Population statistics |
| 2. Intervention (Rung 2) | do(Mitigation = 1) for random property | Distributions | $85,000 ± $2,500 |
| 3. Abduction | Infer U values from observed data | Fixed | U_exp = -4, U_c = 1,000 |
| 4. U Values Fixed | Property's characteristics locked in | Fixed | Property identified |
| 5. Counterfactual (Rung 3) | do(Mitigation = 1) for this property | Fixed | $76,000 exactly |
When abduction doesn't fully pin down U (measurement error, unobserved confounders, stochastic mechanisms), counterfactuals become distributions rather than single values. You reason about P(Yx | E), the distribution of the counterfactual outcome given the evidence E, instead of a point estimate. In practice, this is common in risk modeling — you rarely know every factor that influenced a claim. The math still works; the answers are just less precise.
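When abduction only narrows U to a posterior, the counterfactual is computed by averaging over that posterior. A sketch on the calculator's model, assuming (hypothetically) that U_exp is fully identified but U_c is only known up to Gaussian uncertainty around $1,000 with a $500 standard deviation:

```python
import random

random.seed(0)

# Calculator's claim equation; U_exp pinned by abduction, U_c uncertain.
def claim(m, e, u_c):
    return 65_000 + 2_500 * e + 5_000 * m + u_c

u_exp = -4                          # fully identified by abduction
e_cf = 10 - 4 * 1 + u_exp           # exposure under do(Mitigation = 1) -> 2

# Assumed U_c posterior: N(mean=1,000, sd=500). Sample it and push each
# draw through the structural equations to get a counterfactual DISTRIBUTION.
draws = [claim(1, e_cf, random.gauss(1_000, 500)) for _ in range(100_000)]
mean = sum(draws) / len(draws)
sd = (sum((d - mean) ** 2 for d in draws) / len(draws)) ** 0.5
print(round(mean), round(sd))       # centered near 76,000, spread near 500
```

The point answer from the fully identified case reappears as the center of the distribution; the residual uncertainty in U shows up directly as the spread of the counterfactual claim.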
7. Reading. The foundational literature.
- Pearl, J. (2009). Causality: Models, Reasoning, and Inference (2nd ed.). Cambridge University Press.
- Pearl, J., Glymour, M., & Jewell, N. P. (2016). Causal Inference in Statistics: A Primer. Wiley.
- Pearl, J. & Mackenzie, D. (2018). The Book of Why: The New Science of Cause and Effect. Basic Books.
- Pearl, J. (2018). Theoretical Impediments to Machine Learning With Seven Sparks from the Causal Revolution. arXiv:1801.04016.
- Bareinboim, E. et al. (2022). On Pearl's Hierarchy and the Foundations of Causal Inference. Technical Report R-60, CausalAI Lab.
Software
- Bayes Server — Commercial tool for probabilistic graphical models and SCMs
- DoWhy (Microsoft) — Causal inference library
- DAGitty — DAG drawing and analysis