Measurement Error and Correlations
[ Since writing all this, I've found that this idea comes up quite a bit in the scientific literature. The terms are "correlation dilution" and "correction for attenuation"; see also regression dilution. ]
Suppose there are two variables ($X$ and $Y$) and you are interested in their correlation. The rub is that you can't observe these variables' values directly; instead, you have estimates of them ($X'$ and $Y'$) that add independent noise. You can, if you choose, think of this noise as measurement error.
Let
- $r$ be the correlation between $X$ and $Y$.
- $r'$ be the correlation between $X'$ and $Y'$.
- $r_x$ be the correlation between $X$ and $X'$.
- $r_y$ be the correlation between $Y$ and $Y'$.
Here are two interesting facts:
$$ r' = r \cdot r_x \cdot r_y $$

$$ r = \frac{r'}{r_x \cdot r_y} $$
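The first identity is easy to check numerically. Below is a minimal Monte Carlo sketch under assumptions the text doesn't spell out: $X$ and $Y$ are standard normal with true correlation 0.6, and each estimate adds Gaussian noise that is independent of everything else. The noise scales (0.5 and 0.8) and the helper names `pearson` and `simulate` are arbitrary choices for illustration.

```python
import math
import random


def pearson(a, b):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma = sum(a) / n
    mb = sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def simulate(r=0.6, noise_x=0.5, noise_y=0.8, n=100_000, seed=0):
    """Draw (X, Y) with correlation r, add independent noise, return correlations."""
    rng = random.Random(seed)
    xs, ys, xps, yps = [], [], [], []
    for _ in range(n):
        x = rng.gauss(0, 1)
        # Y has unit variance and correlation r with X
        y = r * x + math.sqrt(1 - r * r) * rng.gauss(0, 1)
        xs.append(x)
        ys.append(y)
        # Hypothetical measurement error: independent Gaussian noise on each variable
        xps.append(x + rng.gauss(0, noise_x))
        yps.append(y + rng.gauss(0, noise_y))
    return {
        "r": pearson(xs, ys),
        "r_prime": pearson(xps, yps),
        "r_x": pearson(xs, xps),
        "r_y": pearson(ys, yps),
    }


res = simulate()
# Observed correlation vs. the product r * r_x * r_y
print(res["r_prime"], res["r"] * res["r_x"] * res["r_y"])
# Correcting for attenuation recovers the true correlation
print(res["r_prime"] / (res["r_x"] * res["r_y"]), res["r"])
```

With 100,000 draws, the two numbers on each printed line should agree to within sampling error, and the observed $r'$ comes out well below the true $r$.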
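Why does this matter? Because correlations between noisy measurements systematically understate the true correlations, and the gap is largest where measurement error is high, as it tends to be in the social sciences.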
One example is IQ. Most studies treat IQ test scores as if they were intelligence itself, but the scores contain measurement error: test-retest correlations for IQ tests are typically around 0.9. Applying the first formula to two sittings of the same test (where the true correlation is 1 and the noise is the same size each time) shows that the test-retest correlation equals $r_x^2$, so when both twins take the same test, $r_x \cdot r_y \approx 0.9$. The correlation between identical twins' test scores is generally around 0.84, which implies a true correlation of about 0.933 (0.84/0.9). Meanwhile, the correlation for fraternal twins' test scores is around 0.42; correcting for measurement error increases this to about 0.467 (0.42/0.9).
What does this mean for heritability? Assuming the test scores are IQ implies a heritability of 0.84 (i.e. 2*(0.84 - 0.42)). Assuming the test scores are IQ plus measurement error implies a heritability of about 0.93 (i.e. 2*(0.933 - 0.467)). This is hardly the only problem with naive heritability estimates of IQ, but it is a major one.
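Because the correction divides both twin correlations by the same factor, its effect on Falconer's estimate (the $2(r_{MZ} - r_{DZ})$ formula used above) is a simple rescaling. Writing $r'_{MZ}$ and $r'_{DZ}$ for the observed identical- and fraternal-twin correlations:

$$ h^2 = 2\left(\frac{r'_{MZ}}{r_x \cdot r_y} - \frac{r'_{DZ}}{r_x \cdot r_y}\right) = \frac{2\,(r'_{MZ} - r'_{DZ})}{r_x \cdot r_y} $$

So the corrected heritability is just the naive estimate divided by $r_x \cdot r_y$.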
This pattern repeats itself across other heritability estimates (e.g. income and BMI): twin studies lump measurement error into the "unshared environment" category, which biases their estimates of heritability and shared environment downwards.
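To see where the lost variance goes, here is a sketch of the standard Falconer-style ACE split (additive genetics A, shared environment C, unshared environment E) computed before and after the correction. The twin correlations 0.6 and 0.4 and the product $r_x \cdot r_y = 0.8$ are hypothetical values chosen for illustration, the function names are made up, and the decomposition carries the usual simplifying assumptions of the classic twin design.

```python
def ace(r_mz, r_dz):
    """Falconer-style ACE decomposition from MZ and DZ twin correlations.

    Returns (a2, c2, e2) as shares of standardized variance; they sum to 1.
    """
    a2 = 2 * (r_mz - r_dz)  # additive genetic share (heritability)
    c2 = 2 * r_dz - r_mz    # shared-environment share
    e2 = 1 - r_mz           # unshared environment; also absorbs measurement error
    return a2, c2, e2


def ace_corrected(r_mz_obs, r_dz_obs, rxry):
    """Divide both observed twin correlations by r_x * r_y, then decompose."""
    return ace(r_mz_obs / rxry, r_dz_obs / rxry)


# Hypothetical observed twin correlations and reliability product
observed = ace(0.6, 0.4)
corrected = ace_corrected(0.6, 0.4, rxry=0.8)
print("observed  a2=%.3f c2=%.3f e2=%.3f" % observed)
print("corrected a2=%.3f c2=%.3f e2=%.3f" % corrected)
```

Here A and C rise from 0.4 and 0.2 to 0.5 and 0.25, while E falls from 0.4 to 0.25: the 0.15 difference is measurement error that the uncorrected model counts as unshared environment.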