Encode.
Query.
Answer.
The Vision
Institutionalizing causality — from expert knowledge
to computable counterfactual answers.
The theoretical foundations exist. The tools exist. What's missing is a general, human-curated causal model library with LLM-based natural-language I/O and real-time counterfactual pruning.
How It Works
| Function | Who Does It |
|---|---|
| Causal knowledge | Humans |
| Model storage | Library |
| Math | SCM engine |
| Language (interpretation, routing, translation) | LLM |
| Judgment | User |
The LLM does:
Question interpretation, variable grounding, model selection, subgraph routing, answer translation.
The LLM does NOT:
Compute probabilities, invent causal structure, estimate parameters, or answer without a model.
The LLM routes; the model reasons. This division of labor avoids hallucinated causality, black-box counterfactuals, and unverifiable reasoning.
Pearl's Abduction–Action–Prediction Loop
Abduction (model instantiation)
System does: Infer latent variables, condition on observed data, choose model slice.
"What hidden causes must have been true?"
Action (intervention)
Replace the structural equation

```
Smoking = f(Genetics, U)
```

with the constant assignment

```
Smoking := 0
```
"Force the world to behave differently."
Prediction (counterfactual outcome)
"What would have happened instead?"
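The three steps above can be walked through numerically on a toy additive SCM. The equations and numbers here are contrived for illustration, assuming additive-noise form `V = f(Pa(V)) + U_V`.

```python
# Pearl's three steps on a toy additive SCM (illustrative equations/numbers).
# Structural equations: Smoking = Genetics + U_s ; Cancer = 2*Smoking + U_c

def smoking(genetics, u_s): return genetics + u_s
def cancer(smoking_val, u_c): return 2 * smoking_val + u_c

# Observed individual
obs = {"genetics": 1, "smoking": 3, "cancer": 8}

# 1. Abduction: invert the equations to recover this individual's U values
u_s = obs["smoking"] - obs["genetics"]    # u_s = 2
u_c = obs["cancer"] - 2 * obs["smoking"]  # u_c = 2

# 2. Action: replace the equation for Smoking with the constant 0
smoking_cf = 0

# 3. Prediction: propagate forward with the *same* U values
cancer_cf = cancer(smoking_cf, u_c)
print(cancer_cf)  # 2 -- "what would have happened instead"
```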
End-to-End Flow
The Pipeline
Example Walkthrough
"Would this patient have avoided hospitalization if we had prescribed statins?"
| U Variable | → Child | Captures |
|---|---|---|
| U_TotalChol | TotalCholesterol | Individual cholesterol factors |
| U_LDL | LDLCholesterol | Individual LDL factors |
| U_Heart | HeartDisease | Unmeasured cardiac risk |
| U_Hosp | Hospitalization | Individual hospitalization propensity |
| U Variable | Abducted Value | Interpretation |
|---|---|---|
| U_TotalChol | 40 | Elevated individual cholesterol factor |
| U_LDL | 24 | Above-average LDL propensity |
| U_Heart | 10 | Higher-than-expected cardiac risk |
| U_Hosp | 4 | Moderate hospitalization propensity |
Identity is now fixed. These U values won't change when we intervene.
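One way a value like `U_TotalChol = 40` could arise: under an additive-noise equation, abduction is just residual recovery. The observed and predicted numbers below are illustrative assumptions, not from the actual model.

```python
# Hedged sketch: under V = f(Pa(V)) + U_V, abduction reduces to a residual.

def abduce(observed: float, parent_prediction: float) -> float:
    """Recover U_V as the gap between observation and parent-based prediction."""
    return observed - parent_prediction

# e.g. the patient's TotalCholesterol was 240 while the model's
# parent-based prediction was 200 -> U_TotalChol = 40
u_total_chol = abduce(observed=240.0, parent_prediction=200.0)
print(u_total_chol)  # 40.0 -> "elevated individual cholesterol factor"
```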
Counterfactual Result
The Model Library
Building Causal Models
This is the human-intensive part — and the most valuable. The causal structure becomes a reusable institutional asset.
| Source | What It Contributes |
|---|---|
| Domain experts | Core causal structure, edge directions, known mechanisms |
| Literature review | Established relationships, effect directions, confounders |
| Data exploration | Candidate relationships to validate with experts |
| Causal discovery algorithms | Suggested structures to review (not to trust blindly) |
Model Composition
A query spanning two models requires bridging variables:
If bridge exists → compose into single subgraph. If no bridge → reject query, explain which connection is missing.
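The bridge check is a set intersection over the two models' variables. A minimal sketch, with each model represented as a variable-to-parents dict; the model contents are illustrative:

```python
# Two models compose only if they share at least one bridging variable.

def compose(a: dict[str, set[str]], b: dict[str, set[str]]):
    """Merge two variable->parents dicts if a bridge exists, else reject."""
    bridge = a.keys() & b.keys()
    if not bridge:
        raise ValueError("No bridging variable: reject query, report the gap")
    merged = {v: a.get(v, set()) | b.get(v, set()) for v in a.keys() | b.keys()}
    return merged, bridge

cardio = {"Cholesterol": {"Statins"}, "HeartDisease": {"Cholesterol"}}
lifestyle = {"Cholesterol": {"Diet"}}

merged, bridge = compose(cardio, lifestyle)
print(sorted(bridge))                 # ['Cholesterol']
print(sorted(merged["Cholesterol"]))  # ['Diet', 'Statins']
```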
Recommendation: Build modular, but audit holistically.
Query-Time Processing
Extracting the Counterfactual Subgraph
For counterfactuals, causal ancestry is required: identify intervention (X) and outcome (Y), include ancestors of Y, descendants of X, and confounders between them. Exclude descendants of Y.
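The extraction rule stated above can be sketched directly on a parents-dict DAG. The graph below is an illustrative example, not a model from the library:

```python
# Extract the counterfactual subgraph: keep X, Y, ancestors of Y, and
# descendants of X; drop descendants of Y.

def ancestors(g, v):
    out, stack = set(), list(g.get(v, set()))
    while stack:
        p = stack.pop()
        if p not in out:
            out.add(p)
            stack.extend(g.get(p, set()))
    return out

def descendants(g, v):
    children = {u: {w for w, ps in g.items() if u in ps} for u in g}
    out, stack = set(), list(children.get(v, set()))
    while stack:
        c = stack.pop()
        if c not in out:
            out.add(c)
            stack.extend(children.get(c, set()))
    return out

def counterfactual_subgraph(g, x, y):
    keep = {x, y} | ancestors(g, y) | descendants(g, x)
    keep -= descendants(g, y)  # descendants of Y are causally irrelevant here
    return {v: g.get(v, set()) & keep for v in keep}

# Illustrative DAG as variable -> parents
g = {
    "Statins": set(), "Diet": set(), "Age": set(),
    "Cholesterol": {"Statins", "Diet"},
    "HeartDisease": {"Cholesterol"},
    "Hospitalization": {"HeartDisease", "Age"},
    "Billing": {"Hospitalization"},  # descendant of Y -> excluded
}
sub = counterfactual_subgraph(g, "Statins", "Hospitalization")
print(sorted(sub))
# ['Age', 'Cholesterol', 'Diet', 'HeartDisease', 'Hospitalization', 'Statins']
```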
Determining the Required Data
The rule: For each variable, data is required for it and all its parents — pulled from the data warehouse at query time.
| If the subgraph includes... | Data is needed for... |
|---|---|
| Hospitalization (parents: Cholesterol, Age) | Hospitalization + Cholesterol + Age |
| Cholesterol (parents: Statins, Diet) | Cholesterol + Statins + Diet |
| LungCancer (parents: Smoking, Genetics) | LungCancer + Smoking + Genetics |
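The rule in the table reduces to a union over the subgraph: each variable plus all of its parents. A minimal sketch, with illustrative column names:

```python
# Data requirement: for each subgraph variable, pull the variable itself
# plus all of its parents from the warehouse.

def required_columns(subgraph: dict[str, set[str]]) -> set[str]:
    cols = set()
    for var, parents in subgraph.items():
        cols |= {var} | parents
    return cols

subgraph = {
    "Hospitalization": {"Cholesterol", "Age"},
    "Cholesterol": {"Statins", "Diet"},
}
print(sorted(required_columns(subgraph)))
# ['Age', 'Cholesterol', 'Diet', 'Hospitalization', 'Statins']
```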
Adding U Variables
One U variable per endogenous variable: V = f(Pa(V), U_V). U captures individual-specific factors, unmeasured causes, noise.
| Step | What happens to U |
|---|---|
| Abduction | Infer U values from observed evidence — "given what was observed, what must U have been?" |
| Intervention | Keep U fixed, modify the structural equation for X (replace with constant) |
| Prediction | Propagate forward with the original U values — "same person, different treatment" |
Interventions (query execution): Only on the target X. The other U's are inferred during abduction and held fixed — that's what makes it a counterfactual about this specific individual.
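The three table rows above can be collected into a tiny SCM sketch, assuming additive-noise equations `V = f(Pa(V)) + U_V`. The graph, equations, and observations are contrived illustrations (chosen so the abducted U values echo the walkthrough's table), not the real model:

```python
# Tiny SCM sketch: one U per endogenous variable, U held fixed across
# the intervention -- "same person, different treatment".

class SCM:
    def __init__(self, parents, f):
        self.parents = parents  # topologically ordered dict: V -> parent set
        self.f = f              # V -> function(parent values); roots omitted (f = 0)

    def _pred(self, v, vals):
        return self.f[v]({p: vals[p] for p in self.parents[v]}) if v in self.f else 0

    def abduce(self, observed):
        # Abduction: U_V = observed V minus the parent-based prediction
        return {v: observed[v] - self._pred(v, observed) for v in self.parents}

    def counterfactual(self, observed, do):
        # Action + prediction: hold every U fixed, override only the target X
        u = self.abduce(observed)
        vals = {}
        for v in self.parents:
            vals[v] = do[v] if v in do else self._pred(v, vals) + u[v]
        return vals

scm = SCM(
    parents={"Statins": set(), "Cholesterol": {"Statins"},
             "Hospitalization": {"Cholesterol"}},
    f={
        "Cholesterol": lambda p: 200 - 50 * p["Statins"],
        "Hospitalization": lambda p: p["Cholesterol"] - 180,
    },
)
obs = {"Statins": 0, "Cholesterol": 240, "Hospitalization": 64}
print(scm.abduce(obs))  # U_Cholesterol = 40, U_Hospitalization = 4
print(scm.counterfactual(obs, do={"Statins": 1})["Hospitalization"])  # 14
```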
Components
| Component | Role | Examples |
|---|---|---|
| Global Model Store | Persistent storage for the master causal graph | .bayes files, graph database, JSON-LD |
| LLM Layer | Query parsing, subgraph selection, SCM construction, answer generation | Claude, GPT-4, fine-tuned models |
| Inference Engine | Parameter learning, abduction, counterfactual computation | Bayes Server, PyMC, DoWhy |
| Data Layer | Historical and current datasets for parameter estimation | Data warehouse, feature store |
| Orchestrator | Coordinates the pipeline, manages state | Python service, workflow engine |
Further Reading
Foundational texts on causal inference, counterfactuals, and LLM reasoning.