epistemology

Inferring Causation

This is a quick review of how researchers statistically infer causation.

  • Trials - when you perform different actions on different people and infer that changes seen in one group and not the other were caused by the treatment.
    • Randomized Controlled Trials - when people are randomly divided into groups and each group receives a different treatment Randomized controlled trial. These are typically analyzed with a simple difference-in-means test. In the social sciences, the randomization often comes from lotteries.
    • Quasi-Experiment - a trial without researcher-controlled random assignment - typically one that occurs "naturally", such as a lottery or one state changing a law while neighboring states do not Quasi-experiment. These are often analyzed with a regression discontinuity design Regression discontinuity design.
    • Difference in Differences - You start with a (non-random) treatment group and a (non-random) control group. Observe the change in an outcome over time for each group. Assume the difference in these changes is caused by the treatment. Often used with matching Matching (statistics).
  • Efficient Market Outsourcing - Suppose Alice and Bob are running for president and Alice has an 80% chance of winning. If, on election day, that increases to 100% and, simultaneously, a prediction (or futures) market for unemployment rises by 1pp, it's reasonable to conclude that the expected effect of Alice (rather than Bob) becoming president is an increase in unemployment of 5pp: the market had already priced in 80% of the effect, so the 1pp move reflects only the remaining 20%, and 1pp / 0.2 = 5pp. This can be and has been done more rigorously Bernanke.
  • Koch's postulates - the standard used in microbiology Koch's postulates.
  • Lack of Correlation - although I don't know of anyone who explicitly argues this, I see lack of correlation used implicitly as evidence for lack of causation all the time.
  • Correlation - Obviously correlation doesn't imply causation, but, if you control for enough confounding variables, it is often used as (weak) evidence for causation. Likewise, lack of correlation is often taken as evidence for lack of causation. See Instrumental variables estimation for an example of how combining an assumed model with correlational data can help infer causation. See here for a discussion of bounding causal effects with correlation.
    • Twin Fixed Effects - Suppose you notice that height correlates with income and wonder whether the former causes the latter. You know the reverse isn't true (virtually all your growing occurs before you earn a dime), but you're worried about confounders that might cause both. These confounders can be grouped into three buckets: genes, common environment, and unshared environment. If you look at the (correlational) slope between height and income within pairs of identical twins, you remove two of those three categories of confounders (genes and common environment), which makes the estimated slope more likely to resemble the true causal effect. Similar reasoning applies when using sibling fixed effects (see also "Mendelian Randomisation").
    • Controlling for Parent Phenotype - If you assume a variable, X, is caused entirely by parent phenotype and unshared environment (i.e. not by parent genotype or shared environment), then, when trying to determine whether X causes the child phenotype, it is sufficient to control for parent phenotype. Note: this type of model is fairly fragile to violations of its assumptions, and I don't generally recommend using it.
  • Granger causality - when one variable's past values help predict another variable's future values, beyond what that second variable's own past values already predict Granger causality.
  • Mendelian randomization Mendelian randomization
  • Twin studies
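
The difference-in-differences bullet above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name and all of the outcome numbers are hypothetical, and the estimate is only causal if the two groups would have changed by the same amount without the treatment.

```python
from statistics import mean


def difference_in_differences(treated_before, treated_after,
                              control_before, control_after):
    """Return (change in treated group) minus (change in control group)."""
    treated_change = mean(treated_after) - mean(treated_before)
    control_change = mean(control_after) - mean(control_before)
    return treated_change - control_change


# Hypothetical outcomes, made up purely for illustration.
treated_before = [10.0, 12.0, 11.0]
treated_after = [15.0, 17.0, 16.0]
control_before = [9.0, 10.0, 11.0]
control_after = [11.0, 12.0, 13.0]

did_estimate = difference_in_differences(
    treated_before, treated_after, control_before, control_after
)
print(did_estimate)
```

Here the treated group's mean rises by 5 and the control group's by 2, so the estimate attributes the remaining 3 to the treatment.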
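
The arithmetic behind the election-market example can be made explicit. A rough sketch, assuming the market price equals expected unemployment, p * U_Alice + (1 - p) * U_Bob, and that nothing else moves the market on election day; the function name is hypothetical. Under that assumption, a price move equals the change in p times the gap between the two candidates.

```python
def implied_effect(prob_before, prob_after, price_change):
    """Gap in outcomes between the two candidates implied by a market move.

    Assumes price = p * U_alice + (1 - p) * U_bob, so that
    price_change = (prob_after - prob_before) * (U_alice - U_bob).
    """
    prob_change = prob_after - prob_before
    if prob_change == 0:
        raise ValueError("probability did not change; the effect is not identified")
    return price_change / prob_change


# Numbers from the election example: 80% -> 100%, market moves by 1pp.
effect = implied_effect(0.80, 1.00, 1.0)
print(round(effect, 2))  # about 5 percentage points, up to floating-point rounding
```

The smaller the surprise (the closer the prior probability was to the outcome), the more a given market move gets scaled up, and the noisier the estimate becomes.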
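
The twin fixed effects idea can also be sketched in code. With exactly two people per family, a regression with family fixed effects reduces to regressing within-pair income differences on within-pair height differences (with no intercept). The data below are hypothetical and constructed so that income is a family-specific constant plus 2 times height; the function name is an assumption, not a library API.

```python
def twin_fixed_effects_slope(pairs):
    """Within-pair slope of income on height.

    pairs: list of ((height_1, income_1), (height_2, income_2)), one per twin pair.
    Anything shared within a pair (genes for identical twins, common
    environment) cancels out when taking the within-pair differences.
    """
    numerator = 0.0
    denominator = 0.0
    for (h1, y1), (h2, y2) in pairs:
        dh = h1 - h2
        dy = y1 - y2
        numerator += dh * dy
        denominator += dh * dh
    if denominator == 0:
        raise ValueError("no within-pair variation in height")
    return numerator / denominator


# Hypothetical twin pairs: income = family_effect + 2 * height.
twin_pairs = [
    ((170.0, 440.0), (172.0, 444.0)),  # family effect 100
    ((180.0, 410.0), (175.0, 400.0)),  # family effect 50
    ((165.0, 530.0), (168.0, 536.0)),  # family effect 200
]

slope = twin_fixed_effects_slope(twin_pairs)
print(slope)
```

Even though the family effects differ a lot across pairs, the within-pair slope recovers the built-in coefficient of 2, because those shared effects drop out of the differences. Unshared environment is not removed, so it can still confound the estimate.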

Further Reading

  • Lucas Critique Lucas critique
  • todo Angrist
Wikipedia contributors. (2021, January 18). Randomized controlled trial. In Wikipedia, The Free Encyclopedia. Retrieved 17:01, January 19, 2021, from https://en.wikipedia.org/w/index.php?title=Randomized_controlled_trial&oldid=1001234455
Wikipedia contributors. (2021, January 18). Quasi-experiment. In Wikipedia, The Free Encyclopedia. Retrieved 17:02, January 19, 2021, from https://en.wikipedia.org/w/index.php?title=Quasi-experiment&oldid=1001186924
Wikipedia contributors. (2021, January 14). Regression discontinuity design. In Wikipedia, The Free Encyclopedia. Retrieved 17:04, January 19, 2021, from https://en.wikipedia.org/w/index.php?title=Regression_discontinuity_design&oldid=1000379562
Wikipedia contributors. (2021, January 12). Granger causality. In Wikipedia, The Free Encyclopedia. Retrieved 17:08, January 19, 2021, from https://en.wikipedia.org/w/index.php?title=Granger_causality&oldid=999947336
Wikipedia contributors. (2020, December 28). Koch's postulates. In Wikipedia, The Free Encyclopedia. Retrieved 17:10, January 19, 2021, from https://en.wikipedia.org/w/index.php?title=Koch%27s_postulates&oldid=996676127
Wikipedia contributors. (2021, January 9). Instrumental variables estimation. In Wikipedia, The Free Encyclopedia. Retrieved 17:16, January 19, 2021, from https://en.wikipedia.org/w/index.php?title=Instrumental_variables_estimation&oldid=999221026
Wikipedia contributors. (2020, December 18). Lucas critique. In Wikipedia, The Free Encyclopedia. Retrieved 17:20, January 19, 2021, from https://en.wikipedia.org/w/index.php?title=Lucas_critique&oldid=994989737
Angrist, J., & Pischke, J. (2014). Mastering 'Metrics: The Path from Cause to Effect. https://www.google.com/books/edition/Mastering_Metrics/s2eYDwAAQBAJ
Wikipedia contributors. (2022, January 24). Mendelian randomization. In Wikipedia, The Free Encyclopedia. Retrieved 20:34, March 8, 2022, from https://en.wikipedia.org/w/index.php?title=Mendelian_randomization&oldid=1067682978
Wikipedia contributors. (2021, July 3). Matching (statistics). In Wikipedia, The Free Encyclopedia. Retrieved 22:35, March 10, 2022, from https://en.wikipedia.org/w/index.php?title=Matching_(statistics)&oldid=1031833362
Bernanke, B. S., & Kuttner, K. N. (2005). What explains the stock market's reaction to Federal Reserve policy? The Journal of Finance, 60(3), 1221-1257. https://doi.org/10.1111/j.1540-6261.2005.00760.x