See. Do. Imagine.
Pearl's Ladder of Causation measures how far AI has climbed toward human reasoning. For all the hype, it hasn't left the first rung.
The Bottom Line
- The Problem: Most AI — including LLMs — is stuck on Rung 1 (correlation). It cannot answer "What if I do X?" or "What would have happened?"
- The Insight: Pearl's Ladder¹ defines three levels: Seeing (correlation), Doing (intervention), Imagining (counterfactual). True intelligence requires all three.
- The Action: Build causal models. Correlation-based AI will never answer causal questions, no matter how much data you feed it.
¹ Pearl, J., & Mackenzie, D. (2018). The Book of Why: The New Science of Cause and Effect. Basic Books.
1. The Ladder
Seeing
Association — "What is?"
The realm of observation and correlation. Given what I see, what can I predict? This is where most ML lives — pattern recognition, statistical relationships, prediction.
"What's the probability of Y given I observe X?"
Doing
Intervention — "What if I do X?"
The realm of action and experimentation. What happens if I actively change something? Requires causal models — understanding mechanisms, not just correlations.
"What's the probability of Y if I intervene to set X?"
Imagining
Counterfactual — "What if I had done X?"
The realm of imagination and retrospection. What would have happened had things been different? Enables explanation, attribution, and reasoning about specific individuals.
"Given what I observed, what would Y be had X been different?"
| Rung | Question | Business Example | Requires |
|---|---|---|---|
| 1. Seeing | "What is?" | "Customers who buy X also buy Y" | Data |
| 2. Doing | "What if I do?" | "If we raise prices, what happens to sales?" | Causal model |
| 3. Imagining | "What if I had?" | "Would this customer have churned anyway?" | Causal model + individual data |
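The gap between Rung 1 and Rung 2 can be made concrete with a toy simulation. In the sketch below (a hypothetical world, not a real dataset), a hidden confounder Z drives both X and Y, while X has no causal effect on Y at all. Passive observation (Rung 1) sees a strong association; intervening on X (Rung 2, Pearl's do-operator) reveals the truth:

```python
import random

random.seed(0)

def sample(intervene_x=None):
    """One draw from a toy world with a confounder Z -> X and Z -> Y.
    If intervene_x is given, X is set by fiat (the do-operator),
    which cuts the Z -> X arrow."""
    z = random.random() < 0.5                          # hidden confounder
    if intervene_x is None:
        x = random.random() < (0.8 if z else 0.2)      # Z influences X
    else:
        x = intervene_x
    y = random.random() < (0.9 if z else 0.1)          # Y depends only on Z, not X
    return z, x, y

N = 100_000

# Rung 1: P(Y=1 | X=1) from passive observation
obs = [sample() for _ in range(N)]
p_y_given_x1 = sum(y for _, x, y in obs if x) / sum(1 for _, x, _ in obs if x)

# Rung 2: P(Y=1 | do(X=1)) from simulated intervention
do = [sample(intervene_x=True) for _ in range(N)]
p_y_do_x1 = sum(y for _, _, y in do) / N

print(f"P(Y | X=1)     ~ {p_y_given_x1:.2f}")   # high: X is a marker for Z
print(f"P(Y | do(X=1)) ~ {p_y_do_x1:.2f}")      # ~0.50: X has no causal effect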
2. The Problem
Most machine learning — including deep learning and large language models — is fundamentally stuck on Rung 1. It can find correlations and make predictions, but cannot reason about interventions or counterfactuals.
What Rung 1 Can Do ✓
- ✓ Find patterns in data
- ✓ Predict outcomes given inputs
- ✓ Classify and cluster
- ✓ Recommend based on similarity
What Rung 1 Cannot Do ✗
- ✗ Answer "What happens if we change X?"
- ✗ Distinguish cause from correlation
- ✗ Reason about individual cases
- ✗ Explain why something happened
Correlation ≠ Causation. A system that only sees associations cannot answer: "What would happen if we changed X?" or "Would Y have occurred if X had been different?" These questions require understanding the underlying causal mechanism — not just the statistical relationship. No amount of data will bridge this gap.
Example: Your data shows that customers who receive discount emails have higher purchase rates. Should you send more discount emails?
- Rung 1 answer: "Customers who get discounts buy more. Send more discounts."
- Rung 2 question: "What happens if I send discounts to customers who weren't going to get them?"
- The problem: Maybe you're only sending discounts to customers who were already likely to buy. The correlation exists, but the causal effect might be zero — or negative.
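The discount scenario above can be simulated directly. This is an illustrative toy model (the policy, intent variable, and zero-effect assumption are all made up for the example): the historical marketing policy targets high-intent customers, so observational data shows a large "lift" even though the discount causes nothing. A randomized assignment, which is one way to climb to Rung 2, recovers the true effect:

```python
import random

random.seed(1)

def marketing_policy(intent):
    """Historical policy: discounts went mostly to high-intent customers."""
    return random.random() < intent        # P(discount) rises with intent

def purchases(intent, got_discount):
    """In this toy world the discount has ZERO causal effect:
    purchase depends on latent intent alone."""
    return random.random() < intent

N = 100_000

# Observational data generated by the historical policy (Rung 1)
treated, control = [], []
for _ in range(N):
    intent = random.random()               # hidden confounder
    d = marketing_policy(intent)
    (treated if d else control).append(purchases(intent, d))
lift_naive = sum(treated) / len(treated) - sum(control) / len(control)

# Randomized experiment: discount assigned by coin flip (Rung 2)
treated, control = [], []
for _ in range(N):
    intent = random.random()
    d = random.random() < 0.5
    (treated if d else control).append(purchases(intent, d))
lift_rct = sum(treated) / len(treated) - sum(control) / len(control)

print(f"naive 'lift' from observational data: {lift_naive:+.2f}")  # large, spurious
print(f"causal lift from randomized test:     {lift_rct:+.2f}")    # ~0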
3. Why This Matters
Business Decisions Require Rungs 2 and 3
Nearly every important business decision is causal, not correlational:
| Decision | Rung Required | Why Rung 1 Fails |
|---|---|---|
| "Should we raise prices?" | Rung 2 | Correlation shows high prices = premium customers. But what happens if YOU raise prices? |
| "Did this campaign work?" | Rung 3 | You need to know what WOULD have happened without the campaign |
| "Why did this customer churn?" | Rung 3 | Requires reasoning about this specific individual, not averages |
| "How much of this damage was storm-caused?" | Rung 3 | Must compute counterfactual: damage without the storm |
Companies are pouring money into correlation-based AI expecting causal answers. LLMs can generate text about causation — but they cannot actually reason causally. Asked "What would happen if...", they answer by pattern-matching against their training data, not by modeling cause and effect. The answer will sound plausible while being untethered to the actual mechanism.
The ROI is not in bigger models or more data. It's in building causal models — Structural Causal Models, Bayesian Networks — that can actually answer Rung 2 and Rung 3 questions.
4. What To Do
Identify Which Rung Your Questions Require
"What is?" = Rung 1. "What if I do?" = Rung 2. "What would have happened?" = Rung 3. Most important business questions are Rung 2 or 3.
Build Causal Models for Rung 2/3 Questions
Structural Causal Models, Bayesian Networks, causal DAGs. These require domain expertise to specify the causal structure — not just data.
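Counterfactuals (Rung 3) are computed from a structural causal model by Pearl's three-step recipe: abduction (infer the exogenous noise consistent with what you observed), action (surgically modify the model with the do-operator), and prediction (re-run the modified model). The sketch below is a deliberately minimal, hypothetical SCM for the "would this customer have churned anyway?" question; the structural equation and its thresholds are invented for illustration:

```python
import random

random.seed(2)

# Minimal structural causal model (illustrative, not a library API):
#   U := exogenous noise (this customer's latent price sensitivity)
#   X := whether the firm raised prices
#   Y := churn, determined by the structural equation f_churn(X, U)

def f_churn(price_raised, sensitivity):
    """Structural equation: the customer churns iff sensitivity
    exceeds a threshold, which is lower when prices were raised."""
    threshold = 0.4 if price_raised else 0.8
    return sensitivity > threshold

# Observed fact: this customer saw a price raise (X=1) and churned (Y=1).
observed_x, observed_y = True, True

# Step 1 — ABDUCTION: keep only noise values consistent with the evidence
# (here: sensitivity must exceed 0.4, since they churned under a raise).
candidates = [u for u in (random.random() for _ in range(100_000))
              if f_churn(observed_x, u) == observed_y]

# Step 2 — ACTION: modify the model, do(X = no price raise).
# Step 3 — PREDICTION: re-run the structural equation on the abducted noise.
would_churn_anyway = sum(f_churn(False, u) for u in candidates) / len(candidates)

print(f"P(churn | do(no raise), evidence) ~ {would_churn_anyway:.2f}")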
Hire or Train Causal Inference Expertise
You need your people to understand Pearl's framework, the do-calculus, and how to build and validate causal models for your particular business. Questions worth asking:
- "When we say our model 'predicts' X, are we claiming correlation or causation?"
- "Can our AI answer 'What happens if we change X?' — or does it just find patterns?"
- "Do we have anyone who can build a causal model — or are we dependent on correlation-based ML?"
- "How many of our 'AI insights' are actually just correlations we're treating as causal?"
5. References
- Pearl, J. (2009). Causality: Models, Reasoning, and Inference (2nd ed.). Cambridge University Press.
- Pearl, J., & Mackenzie, D. (2018). The Book of Why: The New Science of Cause and Effect. Basic Books.
- Pearl, J., Glymour, M., & Jewell, N. P. (2016). Causal Inference in Statistics: A Primer. Wiley.
- Marcus, G., & Davis, E. (2019). Rebooting AI: Building Artificial Intelligence We Can Trust. Pantheon.
On the Limitations of LLMs
Several leading AI researchers have argued that LLMs cannot perform genuine causal reasoning: they pattern-match from training data, and while they can generate text about causation, they cannot actually reason about interventions or counterfactuals.
- Judea Pearl (UCLA, Turing Award winner) — "All the impressive achievements of deep learning amount to just curve fitting."
- Yann LeCun (Meta Chief AI Scientist, Turing Award winner) — has characterized LLMs as a "hack" that lacks true understanding and world models.
- Gary Marcus (NYU Professor Emeritus) — "If you don't know what can cause a fire, or what happens when a bottle breaks, it's hard to make inferences about what is happening around you."
- François Chollet (Creator of Keras) — has argued LLMs lack true abstraction and reasoning capabilities required for general intelligence.