Pearl's Ladder of Causation

See. Do. Imagine.

Pearl's Ladder of Causation measures how far AI has climbed toward human reasoning. For all the hype, it hasn't left the first rung.


The Bottom Line

  • The Problem: Most AI — including LLMs — is stuck on Rung 1 (correlation). It cannot answer "What if I do X?" or "What would have happened?"
  • The Insight: Pearl's Ladder[1] defines three levels: Seeing (correlation), Doing (intervention), Imagining (counterfactual). True intelligence requires all three.
  • The Action: Build causal models. Correlation-based AI will never answer causal questions, no matter how much data you feed it.

[1] Pearl, J., & Mackenzie, D. (2018). The Book of Why: The New Science of Cause and Effect. Basic Books.

1. The Ladder
[Figure: Pearl's Ladder of Causation — three rungs: Seeing, Doing, Imagining]

Rung 1: Seeing

Association — "What is?"

The realm of observation and correlation. Given what I see, what can I predict? This is where most ML lives — pattern recognition, statistical relationships, prediction.

Query: P(Y | X)
"What's the probability of Y given that I observe X?"

Rung 2: Doing

Intervention — "What if I do X?"

The realm of action and experimentation. What happens if I actively change something? Answering this requires a causal model — an understanding of mechanisms, not just correlations.

Query: P(Y | do(X))
"What's the probability of Y if I intervene to set X?"

Rung 3: Imagining

Counterfactual — "What if I had done X?"

The realm of imagination and retrospection. What would have happened had things been different? This rung enables explanation, attribution, and reasoning about specific individuals.

Query: P(Y_x | X = x', Y = y')
"Given that I actually observed X = x' and Y = y', what would Y have been had X been set to x?"

| Rung | Question | Business Example | Requires |
|---|---|---|---|
| 1. Seeing | "What is?" | "Customers who buy X also buy Y" | Data |
| 2. Doing | "What if I do?" | "If we raise prices, what happens to sales?" | Causal model |
| 3. Imagining | "What if I had?" | "Would this customer have churned anyway?" | Causal model + individual data |

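The gap between the first two rungs can be made concrete with a little arithmetic. Below is a sketch (all numbers invented) in which a binary confounder Z drives both X and Y, and Y does not actually depend on X: conditioning on X = 1 (Rung 1) and intervening with do(X = 1) via the backdoor adjustment formula (Rung 2) give different answers.

```python
# Invented joint distribution: confounder Z drives both X and Y,
# and Y does not actually depend on X.
P_Z = {0: 0.5, 1: 0.5}                      # P(Z = z)
P_X1_given_Z = {0: 0.2, 1: 0.8}             # P(X = 1 | Z = z)
P_Y1_given_XZ = {(0, 0): 0.1, (0, 1): 0.5,  # P(Y = 1 | X = x, Z = z):
                 (1, 0): 0.1, (1, 1): 0.5}  # identical for x = 0 and x = 1

# Rung 1: P(Y = 1 | X = 1), by conditioning on the observation X = 1.
num = sum(P_Z[z] * P_X1_given_Z[z] * P_Y1_given_XZ[(1, z)] for z in (0, 1))
den = sum(P_Z[z] * P_X1_given_Z[z] for z in (0, 1))
p_obs = num / den

# Rung 2: P(Y = 1 | do(X = 1)), by backdoor adjustment over Z.
p_do = sum(P_Z[z] * P_Y1_given_XZ[(1, z)] for z in (0, 1))

print(round(p_obs, 2), round(p_do, 2))  # → 0.42 0.3
```

The observed rate (0.42) overstates the interventional one (0.30) because high-Z units are both more likely to get X = 1 and more likely to have Y = 1; the adjustment removes that selection.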
2. The Problem

Most machine learning — including deep learning and large language models — is fundamentally stuck on Rung 1. It can find correlations and make predictions, but cannot reason about interventions or counterfactuals.

What Rung 1 Can Do ✓

  • ✓ Find patterns in data
  • ✓ Predict outcomes given inputs
  • ✓ Classify and cluster
  • ✓ Recommend based on similarity

What Rung 1 Cannot Do ✗

  • ✗ Answer "What happens if we change X?"
  • ✗ Distinguish cause from correlation
  • ✗ Reason about individual cases
  • ✗ Explain why something happened
The Core Limitation

Correlation ≠ Causation. A system that only sees associations cannot answer: "What would happen if we changed X?" or "Would Y have occurred if X had been different?" These questions require understanding the underlying causal mechanism — not just the statistical relationship. No amount of data will bridge this gap.

Example: Your data shows that customers who receive discount emails have higher purchase rates. Should you send more discount emails?

  • Rung 1 answer: "Customers who get discounts buy more. Send more discounts."
  • Rung 2 question: "What happens if I send discounts to customers who weren't going to get them?"
  • The problem: Maybe you're only sending discounts to customers who were already likely to buy. The correlation exists, but the causal effect might be zero — or negative.
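The discount story can be worked through with invented numbers: suppose emails go only to high-intent customers, and intent alone drives purchases. The observed purchase rate among emailed customers looks great, yet forcing an email on a random customer changes nothing.

```python
# Invented numbers for the discount-email story.
# 30% of customers are high-intent; emails go ONLY to high-intent customers.
# Purchases depend on intent, not on the email itself.
p_intent = 0.3
p_buy = {"high": 0.6, "low": 0.1}   # P(buy | intent); the email has no effect

# Rung 1: observed purchase rate among emailed customers.
# Every emailed customer is high-intent, so the correlation looks strong:
p_buy_given_email = p_buy["high"]   # 0.6, versus 0.1 for un-emailed customers

# Rung 2: P(buy | do(email)) -- send the email regardless of intent.
# The intervention does not change intent, so intent simply averages out:
p_buy_do_email = p_intent * p_buy["high"] + (1 - p_intent) * p_buy["low"]
p_buy_do_no_email = p_buy_do_email  # identical: zero causal effect

print(p_buy_given_email, round(p_buy_do_email, 2))  # → 0.6 0.25
```

A Rung 1 system sees the 0.6 vs 0.1 gap and recommends more emails; the interventional query shows the campaign moves nothing.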
3. Why This Matters

Business Decisions Require Rungs 2 and 3

Nearly every important business decision is causal, not correlational:

| Decision | Rung Required | Why Rung 1 Fails |
|---|---|---|
| "Should we raise prices?" | Rung 2 | Correlation shows high prices = premium customers. But what happens if YOU raise prices? |
| "Did this campaign work?" | Rung 3 | You need to know what WOULD have happened without the campaign |
| "Why did this customer churn?" | Rung 3 | Requires reasoning about this specific individual, not averages |
| "How much of this damage was storm-caused?" | Rung 3 | Must compute the counterfactual: damage without the storm |

The AI Investment Problem

Companies are pouring money into correlation-based AI expecting causal answers. LLMs can generate text about causation — but they cannot actually reason causally. They will confidently answer "What would happen if..." by pattern-matching, not by understanding cause and effect. The answer will sound plausible and be meaningless.

The ROI is not in bigger models or more data. It's in building causal models — Structural Causal Models, Bayesian Networks — that can actually answer Rung 2 and Rung 3 questions.

4. What To Do
1. Identify Which Rung Your Questions Require

"What is?" = Rung 1. "What if I do?" = Rung 2. "What would have happened?" = Rung 3. Most important business questions are Rung 2 or 3.

2. Build Causal Models for Rung 2/3 Questions

Structural Causal Models, Bayesian Networks, causal DAGs. These require domain expertise to specify the causal structure — not just data.

3. Hire or Train Causal Inference Expertise

You need people who understand Pearl's framework, the do-calculus, and how to build and validate causal models for your particular business.
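One way step 2 can start is to write down the structural equations implied by a DAG and simulate do(X) by severing X's incoming edges (the "mutilated graph"). The graph Z → X, Z → Y, X → Y and all probabilities below are assumptions for illustration.

```python
import random

# Assumed DAG: Z -> X, Z -> Y, X -> Y (all values invented for illustration).
def sample(rng, do_x=None):
    z = 1 if rng.random() < 0.5 else 0
    if do_x is None:
        x = 1 if rng.random() < (0.8 if z else 0.2) else 0  # X listens to Z
    else:
        x = do_x              # do(X): sever the Z -> X edge, force X's value
    p_y = 0.2 + 0.3 * x + 0.4 * z                           # Y listens to X, Z
    y = 1 if rng.random() < p_y else 0
    return z, x, y

rng = random.Random(0)
draws = [sample(rng, do_x=1) for _ in range(100_000)]
p_y_do_x1 = sum(y for _, _, y in draws) / len(draws)
print(round(p_y_do_x1, 2))  # exact answer: 0.5*0.9 + 0.5*0.5 = 0.70
```

With do_x=None the same function samples the observational distribution, so one set of structural equations answers both Rung 1 and Rung 2 queries.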

Questions to Ask Your Team
  • "When we say our model 'predicts' X, are we claiming correlation or causation?"
  • "Can our AI answer 'What happens if we change X?' — or does it just find patterns?"
  • "Do we have anyone who can build a causal model — or are we dependent on correlation-based ML?"
  • "How many of our 'AI insights' are actually just correlations we're treating as causal?"
5. References
  • Pearl, J. (2009). Causality: Models, Reasoning, and Inference (2nd ed.). Cambridge University Press.
  • Pearl, J., & Mackenzie, D. (2018). The Book of Why: The New Science of Cause and Effect. Basic Books.
  • Pearl, J., Glymour, M., & Jewell, N. P. (2016). Causal Inference in Statistics: A Primer. Wiley.
  • Marcus, G., & Davis, E. (2019). Rebooting AI: Building Artificial Intelligence We Can Trust. Pantheon.

On the Limitations of LLMs

Several leading AI researchers have argued that LLMs cannot do causal reasoning: they pattern-match from training data. They can generate text about causation, but they cannot actually reason about interventions or counterfactuals.

  • Judea Pearl (UCLA, Turing Award winner) — "All the impressive achievements of deep learning amount to just curve fitting."
  • Yann LeCun (Meta Chief AI Scientist, Turing Award winner) — has characterized LLMs as a "hack" that lacks true understanding and world models.
  • Gary Marcus (NYU Professor Emeritus) — "If you don't know what can cause a fire, or what happens when a bottle breaks, it's hard to make inferences about what is happening around you."
  • François Chollet (Creator of Keras) — has argued LLMs lack true abstraction and reasoning capabilities required for general intelligence.