Chetty Mobility
[ This page is deprecated. See here for its replacement. ]
Chetty Background
Raj Chetty is one of the most cited academic economists in the world and received tenure from Harvard at the age of 28 (Raj Chetty, in Wikipedia, The Free Encyclopedia). He has published a number of (arguably) seminal papers on the causes of economic opportunity, with a particular focus on the use of centralized data including tax returns, standardized test scores, and federal financial aid documents. This data has allowed him to construct unparalleled datasets to explore the correlates and (hopefully) causes of opportunity to an extent never seen before in the US.
This page is dedicated to summarizing this work.
Intelligence & Parental Income
Chetty et al's analyses almost all convert income to a percentile before running any regressions. They do this, presumably, to avoid outlier issues that are fairly common at both the top and bottom of the distribution. TODO: this is actually explained in the Introduction of Where is the land of opportunity?.
Note: the percentiles are computed within each group - i.e. a child classified as 70th percentile is in the 70th percentile for the child sample, not the overall population.
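As a minimal sketch of what "percentile within each group" means, the snippet below ranks parents against parents and children against children. The helper name `percentile_ranks`, the midpoint tie rule, and the incomes are illustrative assumptions, not the paper's procedure or data.

```python
from bisect import bisect_left, bisect_right

def percentile_ranks(incomes):
    """Return each income's percentile rank (0-100) within its own sample.

    Tied incomes share the midpoint of the ranks they span.
    """
    ordered = sorted(incomes)
    n = len(ordered)
    ranks = []
    for x in incomes:
        below = bisect_left(ordered, x)         # values strictly below x
        at_or_below = bisect_right(ordered, x)  # values less than or equal to x
        ranks.append(100.0 * (below + at_or_below) / (2 * n))
    return ranks

# Made-up incomes, for illustration only.
parent_incomes = [20_000, 45_000, 60_000, 90_000, 1_500_000]
child_incomes = [15_000, 30_000, 42_000, 55_000, 70_000]

# Each group is ranked against itself, not against the pooled population.
parent_pct = percentile_ranks(parent_incomes)
child_pct = percentile_ranks(child_incomes)
print(parent_pct)  # [10.0, 30.0, 50.0, 70.0, 90.0]
print(child_pct)   # [10.0, 30.0, 50.0, 70.0, 90.0]
```

Note how the 1,500,000 outlier gets the same rank as any other top earner in its sample, which is the outlier-taming property mentioned above.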
One of the interesting things they've found is that the relationship between parental income percentile and child income percentile is roughly linear:
This is interesting because it suggests that the relationship between parental income (or log-income) and child income (or log-income) is not linear, even though linearity is an assumption often made in economics papers.
More plainly, it suggests that there is not much of a difference between the kids of 99th percentile parents and 95th percentile parents, even though the former's parents make more than double the latter's.
Another interesting thing to note is that the observed correlation is almost exactly what you'd expect from twin studies.
From Table XXII of the online appendix of Income segregation and intergenerational mobility across colleges in the United States, we can predict child income with any of
E[income_percentile] = 0.29 * parent_income_percentile + c
E[income_percentile] = 2.73 * (SAT/100) + c
E[income_percentile] = 0.25 * parent_income_percentile + 2.41 * (SAT/100) + c
What I find most fascinating about all this is how small the effect is: suppose Genius McRich is a child from the richest household with a 1600 SAT score. We'd expect him to (on average) end up at the ~76th percentile, earning about 80% more than average (see Table IV of the online appendix).
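The intercepts c are not reproduced on this page, so levels cannot be recomputed from the slopes alone, but expected differences between two children can. A small sketch using only the single-predictor slopes quoted above (the function names are made up for illustration):

```python
# Slopes quoted above from Table XXII of the online appendix.
SLOPE_PARENT_ONLY = 0.29  # child percentiles per parent income percentile
SLOPE_SAT_ONLY = 2.73     # child percentiles per 100 SAT points

def expected_gap_from_parents(parent_pct_a, parent_pct_b):
    """Expected child-percentile gap implied by a gap in parent percentile."""
    return SLOPE_PARENT_ONLY * (parent_pct_a - parent_pct_b)

def expected_gap_from_sat(sat_a, sat_b):
    """Expected child-percentile gap implied by a gap in SAT score."""
    return SLOPE_SAT_ONLY * (sat_a - sat_b) / 100

gap_parents = expected_gap_from_parents(99, 95)
gap_sat = expected_gap_from_sat(1600, 1000)
print(round(gap_parents, 2))  # 1.16
print(round(gap_sat, 2))      # 16.38
```

The intercept cancels in both differences, which is why no value for c is needed here.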
College
Chetty et al constructed a dataset and model where they predicted child income (age 32-34) based on race, ACT/SAT score, gender, parental income, and college fixed effects (FE) (Income segregation and intergenerational mobility across colleges in the United States). The naive regression finds that 100 SAT points is associated with a 2.73-percentile increase in child income, while the model with all the aforementioned controls finds the association drops to 1.27 percentiles (Table XXII of the online appendix).
A majority of this slope-drop occurs when college FE are added to the model - an observation consistent with the idea that one of the main vectors by which high intelligence confers greater income is by getting you into a good college.
Indeed, a couple of years earlier, Chetty et al investigated the relationship between parental income and child income and found that college FE reduced the slope between the two by a whopping 65% (Mobility report cards, Table III and Figure III).
Unfortunately, neither paper reports the relationship between parental income and child income after controlling for both SAT and college FEs. The second paper does give the relationship with just college FEs: a 1-percentile increase in parental income is associated with a 0.1-percentile increase in child income.
Given that parental income only dropped the SAT-child-income association by ~20% in the first paper, it seems likely that the non-SAT-controlled results in the second paper are just a tad high, so the slope between child and parent income percentiles in a model including all three variables is probably around 0.08.
In other words, a reasonable way to predict someone's income is
E[income_percentile] = 0.08 * parent_income_percentile + 1.27 * (SAT/100) + college_fe
One interesting thing to note about this is that it suggests roughly half of the benefits of having rich parents and being smart come from getting you into a good college. This is, perhaps, best shown visually here:
Remember Genius McRich? The one who ended up in the 72nd percentile? If he goes to an average college, this drops to the 61st.
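The "roughly half" claim can be checked against the slopes quoted on this page. This is a back-of-the-envelope sketch: the 0.08 parental slope is the rough estimate derived above rather than a reported coefficient, and the 1.27 SAT slope comes from a model that also controls for race and gender, not only college FE.

```python
def share_removed(slope_before, slope_after):
    """Fraction of a slope that disappears once extra controls are added."""
    return 1 - slope_after / slope_before

# SAT slope per 100 points: with parental income only vs. with all controls.
sat_share = share_removed(2.41, 1.27)
# Parental income slope: with SAT only vs. the rough all-controls estimate.
parent_share = share_removed(0.25, 0.08)

print(round(sat_share, 2))     # 0.47
print(round(parent_share, 2))  # 0.68
```

On these numbers, "roughly half" fits the SAT slope well, while the implied drop for parental income is closer to two thirds.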
The standard deviation of the SAT is about 210 points (Table 226.40), so this suggests a 1-SD increase in SAT score is associated with a 2.7-percentile increase in income after accounting for parental income and college FEs. Meanwhile, log_income has a standard deviation of about 1, which suggests a 1-SD increase in parental income is associated with a 2.3-percentile increase in child income with the same controls. In other words, intelligence and parental income both seem roughly equally important.
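The two per-SD figures can be reproduced with a couple of lines of arithmetic. The SAT conversion follows directly from the numbers above. For parental income, the sketch below takes a different route from the log-income argument in the text: it assumes parental income is measured as a percentile rank, which is uniform on 0-100 and so has a standard deviation of 100/sqrt(12). That assumption is not stated in the source, but it lands on the same 2.3.

```python
import math

# SAT: slope per 100 points (all controls) times the SD expressed in hundreds.
sat_slope_per_100 = 1.27
sat_sd_points = 210
sat_effect_per_sd = sat_slope_per_100 * sat_sd_points / 100

# Parental income: rough all-controls slope times the SD of a percentile rank.
# Assumption: ranks are uniform on 0-100, so SD = 100 / sqrt(12), about 28.9.
parent_slope = 0.08
rank_sd = 100 / math.sqrt(12)
parent_effect_per_sd = parent_slope * rank_sd

print(round(sat_effect_per_sd, 1))     # 2.7
print(round(parent_effect_per_sd, 1))  # 2.3
```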
Unfortunately, Chetty et al don't provide their estimated fixed effect coefficients for colleges. However, they do provide some hints towards estimating it:
Chetty et al define an "elite" college as (basically) the top 6% of 4-year institutions. Figure III shows that, controlling for parental income, students of these colleges earn about 12.5 percentiles more than graduates of non-elite 4-year colleges. The actual causal effect is certainly smaller: we haven't controlled for SAT, among other things. Table VII shows that the effect shrinks by about 15% when some other things are controlled for, which suggests that moving up 1 SD in college eliteness causally raises your income by less than 6.5 percentiles.
As some context, Bryan Caplan claims
moving from the bottom to the top quartile raises male income by about 12% and female income by about 8%.
So his estimate is about half the size of mine - a discrepancy probably mostly driven by his looking at the benefit of starting college and me looking at the benefit of finishing it.
So, big picture, how does Genius McRich end up at the 72nd percentile in practice? He graduates from an elite school (z-score ~ 2).
Neighborhoods
Chetty et al created a linear model predicting child income percentile from parent income percentile, race, and gender within each census tract (Where is the land of opportunity?). While this study is strictly correlational, they offer many reasons why a causal interpretation may be warranted.
So, practically speaking, what predicts a good neighborhood?
On a low level, you can just see the "Opportunity Atlas", which uses the data from this study to build a heat map revealing which neighborhoods are the best (The Opportunity Atlas).
On a higher level, Chetty et al actually provide the data used in the paper (Data Library: Publicly available data we've produced and replication code). This is not individual tax returns, but averages across race, gender, and neighborhood.
Finally and most abstractly, Chetty et al provide their own conclusions in the paper itself, which are probably best summarized by Figure VIII:
They also find significant heterogeneity at the race level: neighborhoods can be good for one race or economic level and bad for another.
Another major finding is that these effects decay rapidly with distance, where only people in a child's immediate area appear to have a significant impact:
On the other hand, effects vary quite slowly across time: poverty rates in 1990 are nearly (91%) as predictive as poverty rates in 2000. This suggests that mobility estimates from historical data (e.g. this study) remain useful today.
One of the issues with this analysis is that, aside from income and race, it's really unclear whether the "effects" being described here are coming from neighbors or parents. That is, if you come from a two-parent household but literally all your neighbors come from one-parent households, how much should we expect to see your income drop? What if the reverse is true? Except for income and race, this study doesn't really yield much in the way of answers.
See here for some interesting robustness checks.
Perspective
All of the above is fascinating, but it's also worth keeping a little perspective. Despite some of these moderate effect sizes, the actual percent of variance explained by things like your childhood neighborhood or your college is typically less than 3%. That is, these factors can have a significant effect (e.g. > 20%) on your expected income and are useful in considering how to be successful, but they are not a significant source of overall society-wide income inequality. This is how we square all this with the observation that shared environment is a minimal cause of society-wide income inequality.
The chief exceptions are parental income and test scores, but even after combining these with the above factors, we only end up explaining around 14% of the variance in income.
Note the stark contrast between these puny values and the much higher correlations found between, say, single motherhood and neighborhood income-mobility. In plain English: we aren't great at predicting someone's individual income, but we are actually quite good at predicting the average income of kids who grow up in a given neighborhood.
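A toy simulation makes the individual-versus-neighborhood contrast concrete. Everything in it is a made-up assumption for illustration (500 neighborhoods, 200 kids each, one hypothetical neighborhood trait with a small effect), not Chetty et al's data or model.

```python
import random

def pearson_r(xs, ys):
    """Plain Pearson correlation, standard library only."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

rng = random.Random(0)
n_neighborhoods, kids_per_neighborhood = 500, 200

trait_ind, income_ind = [], []    # one entry per kid
trait_nbhd, income_nbhd = [], []  # one entry per neighborhood
for _ in range(n_neighborhoods):
    trait = rng.gauss(0, 1)  # hypothetical neighborhood characteristic
    # Small neighborhood effect plus large individual-level noise.
    incomes = [0.17 * trait + rng.gauss(0, 0.985)
               for _ in range(kids_per_neighborhood)]
    trait_ind.extend([trait] * kids_per_neighborhood)
    income_ind.extend(incomes)
    trait_nbhd.append(trait)
    income_nbhd.append(sum(incomes) / kids_per_neighborhood)

r2_individual = pearson_r(trait_ind, income_ind) ** 2
r2_neighborhood = pearson_r(trait_nbhd, income_nbhd) ** 2
print(r2_individual, r2_neighborhood)
```

Under these assumptions the trait explains about 3% of individual income variance in expectation, but about 85% of the variance in neighborhood averages, because averaging 200 kids removes most of the individual noise while leaving the neighborhood effect intact.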
For more information on that topic, see here, but the basic gist is that most of income inequality is traced either to genes (including intelligence) or remains unaccounted for (i.e. unknown unshared environment factors).
If you've been reading this website a lot, you might have noticed a bit of a discrepancy here: how can a majority of lifetime income inequality be due to genes while only ~8% of the variance in child income is explainable by parent income?
The answer to this largely boils down to two factors:
- Chetty et al only use a few years of income, not lifetime income.
- Because children are not clones of their parents, significant genetic noise is introduced between the genes that cause parental income and the genes that cause child income. This causes a dramatic reduction in how well parent income predicts child income, as can easily be seen in this naive simulation, where parent income explains only 12.5% of the variance in child income, even though genes are assumed to explain 50%.
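The linked simulation is not reproduced here. The following is one minimal setup that lands on the same 12.5%, under assumptions that may differ from the original: two parents with independent additive genetic scores (no assortative mating), a child score equal to the mid-parent value plus segregation noise, genes explaining 50% of each person's income variance, and parental income measured as the sum of both parents' incomes.

```python
import random

def pearson_r(xs, ys):
    """Plain Pearson correlation, standard library only."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

rng = random.Random(0)
H2 = 0.5            # assumed share of income variance due to genes
N_FAMILIES = 100_000

def income(genes):
    """Standardized income: a genetic part plus independent noise."""
    return H2 ** 0.5 * genes + (1 - H2) ** 0.5 * rng.gauss(0, 1)

household_income, child_income = [], []
for _ in range(N_FAMILIES):
    g_mother = rng.gauss(0, 1)
    g_father = rng.gauss(0, 1)
    # Mid-parent value plus noise with variance 0.5, so the child's
    # genetic score also has variance 1.
    g_child = 0.5 * (g_mother + g_father) + rng.gauss(0, 0.5 ** 0.5)
    household_income.append(income(g_mother) + income(g_father))
    child_income.append(income(g_child))

r2 = pearson_r(household_income, child_income) ** 2
print(r2)  # close to the analytic value of 0.125
```

Analytically, the covariance between child income and household income is 0.5 and the household variance is 2, so the correlation is 0.5 / sqrt(2) ≈ 0.354 and the variance explained is 0.125.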
Finally, I think two of the more interesting open questions are (1) to what extent success is due to ambition, charisma, grit, and luck and (2) to what extent the first three are controllable.
The big issue here is that we don't really have great measurements for any of these variables.
Schools
Chetty et al examine how much of the variance in census-tract outcomes might be explained by high school quality (The opportunity atlas: Mapping the childhood roots of social mobility), concluding
Figure III shows that 28% of the total variance in outcomes – and about half of the local tract-within-county variation – can be explained by school catchment area fixed effects. Hence, although a significant share of the tract-level variation in outcomes could potentially be due to school effects, there is clearly substantial variation in outcomes even across neighborhoods among children who attend the same high school.
Common sense says that if you graduate from college, then the most significant vector by which your high school affects your future is by helping (or hurting) you get into a good college - after all few employers will care about the high school of a college graduate.
This narrative predicts that if you have a model with college FEs, including high school FEs should (a) add minimal predictive value and (b) not reduce the college FEs much. This is generally true (see Table VII of Income segregation and intergenerational mobility across colleges in the United States).
Another narrative is that the main avenue for child success that parents control is what school their child goes to. Indeed, given the sheer number of (non-sleep, non-screen) hours spent at school, it seems intuitively likely that most of the neighborhood effects observed above are, in fact, school effects.
This may seem contradicted by the quote above, but note that it only includes high school FEs - it's entirely possible that that 28% figure would grow to nearly 100% if you also included elementary- and middle school FEs.
Chetty et al explore this somewhat (Figure VIII from Where is the land of opportunity?) and find that, after controlling for income, primary school test scores and high school dropout rates are each associated (r~0.6) with positive mobility between census tracts.
Natural questions to ask at this point include
- How much of the neighborhood FE variance would be explained if both variables were included together?
- How much variance would be explained with better measures of school quality (e.g. FEs for every school)? How much variance would be left to explain by other neighborhood factors?
Unfortunately, both questions are impossible to answer without access to more fine-grained data, since there are more schools in the US than census tracts. Therefore, the best we can do is use variables as proxies for school quality and see how well they explain census tract outcomes.
I intend to do this analysis here.
Appendix: Race
This didn't really fit anywhere else, so I'm putting it here. Chetty et al published a paper focusing on the relationship between race and income mobility (Race and economic opportunity in the United States). Their findings are best summarized by Figure V:
As you can see, after controlling for parental income, there is no income gap between black and white women. There is, however, a large gap between black and white men.
The authors throw all sorts of controls at the male gap, but only manage to shrink it by about 30% (Figure VIII).
For the female gap, they investigate some progressive-counter-arguments but find them generally unconvincing, concluding
Conditional on parental income, black and white women have very similar wage rates, hours of work, and employment rates. These results suggest that the lack of an intergenerational gap in income for females is not entirely due to an income effect. In contrast, there are very large gaps in both wage rates and hours of work for men.
...
This is true not just for means: the entire distribution of black womens’ wage rates and hours of work is very similar to the corresponding distributions for whites, conditional on parent income (not reported). We also find that the occupational distributions of black and white women are similar conditional on parental income (Online Appendix Figure IV), suggesting that black women are not substituting toward occupations with lower amenities to obtain higher wages.
They do find quite a bit of heterogeneity regarding the black-white male income gap based on the neighborhood the child grew up in, including this fascinating graph:
Finally, some people believe the income gap is largely driven by genetic differences between whites and blacks. The female graph should make these people reconsider, since that genetic assumption implies a large gap should still occur even after conditioning on parental income (the gap should be roughly half the naive income gap).