Why Causal Inference
Machine learning is good at prediction. But prediction is not enough. An introduction to thinking causally about data.
Machine learning can translate languages, recognize faces, and drive cars. It does this by learning patterns from data — given enough examples, a model learns to predict what comes next. This is remarkable and genuinely useful.
But prediction has a blind spot.
The hotel problem
Imagine you run a hotel chain. You look at your data and notice: when prices are high, more rooms are sold. When prices are low, fewer rooms sell. A predictive model trained on this data would suggest that raising prices leads to more sales.
This is, of course, wrong. Prices are high during tourist season, when demand is already high. The data shows a correlation between price and sales, but the causal relationship runs through a third variable — seasonality — that the model cannot see.
This is the central problem that causal inference addresses: the numbers we observe don’t directly answer the questions we care about.
We don’t want to know “what happened when prices were high?” We want to know “what would happen if we raised the price?” That’s a counterfactual — a question about a world we didn’t observe.
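The trap is easy to reproduce. Here is a minimal simulation (hypothetical numbers, assuming NumPy): seasonal demand pushes prices up and fills rooms at the same time, the true causal effect of price on sales is negative, and yet a naive fit of sales on price finds a positive slope.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000

# Hypothetical data-generating process: seasonal demand raises prices
# AND fills rooms; the true causal effect of price on sales is -0.5.
demand = rng.normal(0, 1, n)                     # unobserved seasonality
price = 100 + 20 * demand + rng.normal(0, 5, n)
sales = 50 + 15 * demand - 0.5 * price + rng.normal(0, 3, n)

# Naive predictive fit: regress sales on price alone
naive_slope = np.polyfit(price, sales, 1)[0]
print(f"naive slope: {naive_slope:+.2f}")        # positive, despite the true -0.5
```

The model is not broken — it describes the data perfectly well. It just answers "what did sales look like when prices were high?" rather than "what would sales be if we raised prices?"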
Correlation is not nothing
It’s fashionable to dismiss correlation as meaningless. It isn’t. Correlation tells you that two things move together. That’s real information. The question is why.
The classic example: ice cream sales and drowning deaths rise and fall together. Both respond to temperature — not to each other. Banning ice cream would not prevent drownings.
Two variables can be correlated because:
- A causes B — the relationship is direct
- B causes A — the direction is reversed
- C causes both — a common cause (confounder) drives both
- Coincidence — with enough variables, some will correlate by chance
Machine learning treats all four the same. Causal inference distinguishes between them. This distinction is the difference between understanding the world and merely describing it.
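The last cause on the list — coincidence — is easy to underestimate. A short sketch (assuming NumPy; the data is pure noise): generate 50 independent random walks, series with no causal connection whatsoever, and look for the strongest correlation among them.

```python
import numpy as np

rng = np.random.default_rng(42)

# 50 independent random walks, 100 steps each -- no series causes any other
series = rng.normal(size=(50, 100)).cumsum(axis=1)

# Strongest pairwise correlation among the 1,225 possible pairs
corr = np.corrcoef(series)
np.fill_diagonal(corr, 0.0)
strongest = np.abs(corr).max()
print(f"strongest correlation between unrelated series: {strongest:.2f}")
```

Trending series are especially prone to this (the spurious-regression effect): scan enough pairs and some will look impressively related by chance alone.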
What causal inference actually does
Causal inference is a collection of methods for answering “what if” questions using observational data — data where you didn’t control the experiment.
The core idea: if you can’t run a randomized experiment (and you usually can’t — you can’t randomly assign hotel prices, or education levels, or medical treatments to people), you need a design that accounts for the reasons the data looks the way it does.
Some of the key methods:
- Randomized experiments — the gold standard. Randomly assign treatment and compare outcomes. Randomization balances confounders — observed and unobserved — across groups.
- Regression with controls — adjust for confounders statistically. Simple but fragile — you need to know, and measure, every confounder.
- Instrumental variables — find a variable that affects the treatment but not the outcome directly. Use it as a lever to isolate the causal effect.
- Difference-in-differences — compare changes over time between a treated and untreated group. The parallel trends assumption does the heavy lifting.
- Regression discontinuity — exploit a sharp cutoff (e.g., test scores above/below a threshold) where, for units near the cutoff, treatment assignment is essentially random.
Each method makes different assumptions. None is universally correct. The skill is in matching the method to the structure of your problem.
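As a taste of the second method, here is regression with controls applied to the hotel example (same hypothetical setup as before, assuming NumPy): once seasonality enters the regression as a control, the estimated price effect flips from positive to its true negative value.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000

# Same hypothetical hotel setup: season confounds price and sales,
# and the true causal effect of price on sales is -0.5.
season = rng.normal(0, 1, n)
price = 100 + 20 * season + rng.normal(0, 5, n)
sales = 50 + 15 * season - 0.5 * price + rng.normal(0, 3, n)

# Regression with controls: intercept, price, and the confounder
X = np.column_stack([np.ones(n), price, season])
coef, *_ = np.linalg.lstsq(X, sales, rcond=None)
print(f"price effect after controlling for season: {coef[1]:+.2f}")  # close to -0.5
```

This works here only because the simulation lets us observe the confounder directly — exactly the fragility the method's description warns about. Miss a confounder, or control for the wrong variable, and the estimate is biased again.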
Why this matters now
The current wave of AI is, at its core, a prediction wave. Large language models predict the next token. Image generators predict pixels. Recommendation engines predict clicks.
This is powerful, but it’s also limited. A model that predicts clicks cannot tell you why users click, or what would happen if you changed the interface. A model that predicts churn cannot tell you which intervention would reduce it.
As AI systems move from autocomplete to decision-making — in healthcare, policy, finance, hiring — the gap between prediction and causation becomes critical. A hiring model trained on historical data will replicate historical biases unless you explicitly account for causal structure. A policy model that confuses correlation with causation will recommend interventions that don’t work, or that make things worse.
Causal inference is not a replacement for machine learning. It’s the layer that tells you when your predictions are trustworthy enough to act on.
Where to go from here
This article is a starting point. Future pieces in this series will work through specific methods with real data and code. The plan:
- Randomized experiments — why randomization works and when you can’t use it
- The dangerous equation — why small samples lie, even in experiments
- Confounders and DAGs — drawing the causal structure of a problem
- Regression done right — what controls to include and which ones to avoid
- Natural experiments — finding causation in observational data
Each piece will include an interactive visualization, a worked example, and the code to reproduce it.
Further reading:
- Matheus Facure, Causal Inference for the Brave and True — rigorous, Python-based, free
- Nick Huntington-Klein, The Effect — clear writing, intuition-first
- Judea Pearl, The Book of Why — the philosophical foundation