Inferring Causation
This is a quick review of how researchers statistically infer causation.
- Trials - when you perform different actions on different people and infer that changes seen in one group and not the other were caused by the treatment.
- Randomized Controlled Trials - a group of people is randomly divided into groups, and each group receives a different treatment (see Randomized controlled trial). These are typically analyzed with a simple difference-in-means test. In the social sciences, they often arise from lotteries.
- Quasi-Experiment - a trial without random assignment, typically one that occurs "naturally": lotteries, one state changing a law while neighboring states don't, etc. (see Quasi-experiment). These are often analyzed with a regression discontinuity design (see Regression discontinuity design).
- Difference in Differences - start with a (non-random) treatment group and a (non-random) control group, observe the change in an outcome over time for each group, and assume the difference between these changes was caused by the treatment. Often used with matching (see Matching (statistics)).
- Efficient Market Outsourcing - Suppose Alice and Bob are running for president and Alice has an 80% chance of winning. If, on election day, that rises to 100% and, simultaneously, a prediction (or futures) market for unemployment rises by 1pp, it's reasonable to conclude that the expected effect of Alice becoming president is a 5pp increase in unemployment: the market price is a probability-weighted average over the two candidates, so the 1pp move divided by the 20pp change in Alice's win probability gives the implied effect. This can be and has been done more rigorously (see Bernanke).
- Koch's postulates - the standard used in microbiology (see Koch's postulates).
- Lack of Correlation - although I don't know of anyone who explicitly argues this, I see lack of correlation used implicitly as evidence for lack of causation all the time.
- Correlation - Obviously correlation doesn't imply causation, but, if you control for sufficient confounding variables, it is often used as (weak) evidence for causation. See Instrumental variables estimation for an example of how combining a supposed model with correlational data can help infer causation. See here for a discussion of bounding causal effects with correlation.
- Twin Fixed Effects - Suppose you notice that height correlates with income and wonder whether the former causes the latter. You know the reverse isn't true (virtually all your growing occurs before you earn a dime), but you're worried about confounders that might cause both. These confounders can be grouped into three buckets: genes, common environment, and unshared environment. If you look at the (correlational) slope between height and income for identical twins, you remove two of those three categories of confounders, which makes the estimated slope more likely to resemble the true causal effect. Similar reasoning applies when using sibling fixed effects (see also "Mendelian randomization").
- Controlling for Parent Phenotype - If you assume a variable, X, is entirely caused by parent phenotype and unshared environment (i.e. not by parent genotype or shared environment), then, when trying to determine whether X causes the child phenotype, it is sufficient to control for parent phenotype. Note: this type of model is fairly fragile to violations of its assumptions, and I don't generally recommend using it.
- Granger causality - when one variable's past values predict another variable's future values (see Granger causality).
- Mendelian randomization
- Twin studies
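The difference-in-means analysis behind a randomized controlled trial can be sketched in a few lines. The data here is simulated for illustration, with a true effect of 2:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Simulated outcomes: the treated group's mean is shifted up by 2.
control = rng.normal(loc=10.0, scale=3.0, size=500)
treated = rng.normal(loc=12.0, scale=3.0, size=500)

# Because assignment was random, the difference in means estimates the effect.
effect = treated.mean() - control.mean()

# Welch's t-test asks whether that difference is distinguishable from noise.
t_stat, p_value = stats.ttest_ind(treated, control, equal_var=False)
print(f"estimated effect: {effect:.2f}, p-value: {p_value:.4f}")
```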
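A regression discontinuity design can be sketched by fitting a separate line on each side of the cutoff and reading off the jump. The running variable, cutoff, and jump size below are all made up:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4_000
cutoff = 0.0

# Running variable (e.g. a lottery number or test score) and an outcome
# that jumps by 2.0 for units at or above the cutoff.
run = rng.uniform(-1, 1, size=n)
outcome = 1.0 + 0.5 * run + 2.0 * (run >= cutoff) + rng.normal(scale=0.5, size=n)

below, above = run < cutoff, run >= cutoff

# Fit a line on each side; the treatment effect estimate is the gap
# between the two fits evaluated at the cutoff.
fit_below = np.polyfit(run[below], outcome[below], 1)
fit_above = np.polyfit(run[above], outcome[above], 1)
jump = np.polyval(fit_above, cutoff) - np.polyval(fit_below, cutoff)
print(f"estimated jump at the cutoff: {jump:.2f}")
```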
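The difference-in-differences estimate is just two subtractions; a minimal sketch with made-up group means:

```python
# Hypothetical average outcomes before and after a policy change.
treated_pre, treated_post = 50.0, 58.0
control_pre, control_post = 48.0, 51.0

# Each group's change over time.
treated_change = treated_post - treated_pre
control_change = control_post - control_pre

# Under the parallel-trends assumption, the control group's change is what
# the treatment group would have done anyway; the excess is the effect.
did_estimate = treated_change - control_change
print(did_estimate)  # 5.0
```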
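The efficient-market arithmetic works because the market price is a probability-weighted average over the two candidates, so the implied effect is the market's move divided by the change in win probability (numbers from the Alice/Bob example above):

```python
# Alice's win probability before and on election day.
p_before, p_after = 0.80, 1.00

# The unemployment market's move over the same window, in percentage points.
market_move_pp = 1.0

# Price = p * E[U | Alice] + (1 - p) * E[U | Bob], so the observed move is
# (p_after - p_before) * (E[U | Alice] - E[U | Bob]); dividing recovers
# the implied effect of Alice winning.
implied_effect_pp = market_move_pp / (p_after - p_before)
print(implied_effect_pp)  # ~5.0 percentage points
```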
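Instrumental-variables estimation can be sketched with simulated data: an unobserved confounder biases OLS upward, but an instrument that affects x without directly affecting y recovers the true effect. All names and coefficients here are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000
true_effect = 2.0

z = rng.normal(size=n)  # instrument: moves x, has no direct path to y
u = rng.normal(size=n)  # unobserved confounder driving both x and y
x = 0.5 * z + u + rng.normal(size=n)
y = true_effect * x + u + rng.normal(size=n)

# Naive OLS slope is biased upward because u raises both x and y.
cxy = np.cov(x, y)
ols = cxy[0, 1] / cxy[0, 0]

# IV (Wald) estimator: reduced-form covariance over first-stage covariance.
iv = np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]
print(f"OLS: {ols:.2f}, IV: {iv:.2f}")
```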
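The twin fixed-effects idea can be simulated: taking within-pair differences cancels everything the twins share, so the within-pair slope is closer to the causal effect than the pooled slope. The data below is fabricated for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
n_pairs = 5_000
true_effect = 1.0

# Confounder shared within each identical-twin pair (genes + common environment).
shared = rng.normal(size=n_pairs)

# Each twin's height and income; the shared confounder raises both.
h1 = 3 * shared + rng.normal(size=n_pairs)
h2 = 3 * shared + rng.normal(size=n_pairs)
inc1 = true_effect * h1 + 2 * shared + rng.normal(size=n_pairs)
inc2 = true_effect * h2 + 2 * shared + rng.normal(size=n_pairs)

# Pooled (naive) slope is confounded.
pooled = np.cov(np.r_[h1, h2], np.r_[inc1, inc2])
naive = pooled[0, 1] / pooled[0, 0]

# Within-pair differencing cancels the shared term, deconfounding the slope.
within = np.cov(h1 - h2, inc1 - inc2)
fixed_effects = within[0, 1] / within[0, 0]
print(f"naive slope: {naive:.2f}, twin fixed-effects slope: {fixed_effects:.2f}")
```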
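Granger causality can be sketched by comparing a forecast of y from its own past against one that also uses x's past; if adding lagged x materially improves the fit, x is said to "Granger-cause" y. The series below is simulated so that x really does drive y:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 2_000

# x is autonomous noise; y is driven partly by x's previous value.
x = rng.normal(size=n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.5 * y[t - 1] + 0.8 * x[t - 1] + rng.normal()

def rss(design, target):
    """Residual sum of squares of an ordinary least-squares fit."""
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    resid = target - design @ coef
    return float(resid @ resid)

ones = np.ones(n - 1)
restricted = rss(np.column_stack([ones, y[:-1]]), y[1:])          # y's past only
unrestricted = rss(np.column_stack([ones, y[:-1], x[:-1]]), y[1:])  # + x's past
print(f"RSS without lagged x: {restricted:.0f}, with: {unrestricted:.0f}")
```

A formal test (e.g. an F-test) would compare these two residual sums of squares; here the drop in RSS already shows the predictive contribution of x's past.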
Further Reading
- Lucas Critique
- todo Angrist