
College Rankings: Reverse Engineering Chetty Et Al

[ Part of a sequence of posts constructing my own college rankings. ]

Context

This is one post in a series culminating in my own ranking of colleges. In addition to the previous posts in the sequence, you should read this post before continuing.

As motivated in the previous post, we construct a model involving a student's parental income, a student's SAT score, and a student's college. This model is based on Figure III and Table III from Income segregation and intergenerational mobility across colleges in the United States, and on Tables X and XXII from the online appendix. Parent and child incomes are each measured as a percentile (range: 0 to 100), with the child's income measured between the ages of 32 and 34. SAT is measured on the 1600-point scale.

Basically, Chetty et al construct this model, but irritatingly do not report all the relevant parameters, so we have to attempt to reconstruct them here.

Without College Fixed Effects

From Table XXII, we know that, for all college-goers with an SAT/ACT score, we can predict their earnings with

E[k_rank] = 2.73 * (SAT/100) + ?

or

E[k_rank] = 2.41 * (SAT/100) + X * parent_income + ?

Unfortunately, Chetty et al don't report the value of X, so we'll have to figure it out ourselves.

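We know the mean and standard deviation of a percentile rank are 0.50 and sqrt(1/12) on the unit scale (that is, 50 and roughly 28.9 on the 0-to-100 scale used here), since ranks are uniformly distributed (Continuous uniform distribution), and that the mean and standard deviation of (SAT/100) are ~10.6 and ~2.0, respectively (Table 226.40).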

If we knew (a) the slope between parent_income and k_rank or (b) the correlation between parent_income and SAT, we could compute X and thereby accurately estimate earnings using both variables.

Studies typically find the latter correlation to be around 0.25 (Geiser). I also constructed a simple linear model based on Table X from the online appendix of Income segregation and intergenerational mobility across colleges in the United States and found that the implied correlation between SAT score and parental income was approximately r~0.3 - see here for the code.

From this, we can compute the implied X and the implied slope between parent_income and k_rank. It turns out the answers are:

E[k_rank] = 0.113 * parent_income + ?

and

E[k_rank] = 2.41 * (SAT/100) + 0.0673 * parent_income + ?
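One way to back out these two numbers is the standard omitted-variable identity for a two-predictor linear model. The sketch below is my reconstruction of that algebra, not code from the post; the function name and arguments are hypothetical, and it assumes parent rank among these students is uniform on the 0-to-100 scale.

```python
import math


def implied_parent_coefficients(b_sat_simple, b_sat_joint, sd_sat, sd_rank, r_parent_sat):
    """Back out the parent-income coefficients implied by two reported SAT slopes.

    Relies on the omitted-variable identity for a linear model with two predictors:
        b_sat_simple = b_sat_joint + X * Cov(parent, sat) / Var(sat)
    """
    # Slope from regressing parent rank on SAT/100.
    parent_on_sat = r_parent_sat * sd_rank / sd_sat
    # Coefficient on parent income in the joint model (X in the text).
    x = (b_sat_simple - b_sat_joint) / parent_on_sat
    # Slope from regressing child rank on parent rank alone:
    #   b_parent_simple = X + b_sat_joint * Cov(parent, sat) / Var(parent)
    parent_only = x + b_sat_joint * r_parent_sat * sd_sat / sd_rank
    return x, parent_only


# Assumption: percentile ranks are uniform, so SD = 100 * sqrt(1/12) on the 0-100 scale.
SD_RANK = 100 * math.sqrt(1 / 12)

x, parent_only = implied_parent_coefficients(
    b_sat_simple=2.73, b_sat_joint=2.41, sd_sat=2.0, sd_rank=SD_RANK, r_parent_sat=0.3
)
print(f"X = {x:.4f}, parent-only slope = {parent_only:.4f}")
```

With the rounded inputs quoted above (SD of 2.0 and r of 0.3), this gives values near, but not identical to, the ones reported. The outputs move noticeably with the SD and correlation supplied, so small differences in those inputs may account for the gap.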

The slope of 0.113 for college-goers with an SAT/ACT score is interesting, because it is far lower than the parent-child income slope for the overall population (0.288). This suggests that the parent-child relation is far stronger among non-college-goers than among college-goers: around 0.6 versus 0.1. Is such a discrepancy really feasible?

I don't know. Fortunately, this section doesn't actually impact the rest of the analysis, and I'm keeping it mostly to (1) remind myself (and others) that there are some absolutely fascinating results here and (2) remind myself that there is something fishy going on here that I can't explain (yet?).

If I could solve this mystery, I could construct a model that predicts child income based on SAT score and parent income - that'd be nifty.

With College Fixed Effects

Ok, so we have a model where we predict child income based on parent income and one where we predict it based on SAT score, but there is a little fishiness. What happens if we combine them and throw in college FEs?

The short answer is that, while Chetty et al have done this, they haven't deigned to share the full results with us. The long answer is that they've given us enough information that we can make useful inferences.

The main pieces of information are Figure III, Table III, and Table XXII from the online appendix. I reproduce the first here:

The relevant factoids are

  • Among the full population: E[k_rank] = 0.288 * parent_income
  • Among the full population: E[k_rank] = 0.139 * parent_income + college_fe
  • Among the full population: E[k_rank] = 0.125 * parent_income + college_fe * sat
  • Among the college-goers: E[k_rank] = 2.73 * sat
  • Among the college-goers: E[k_rank] = X * parent_income + 2.41 * sat
  • Among the college-goers: E[k_rank] = 0.100 * parent_income + college_fe
  • After controlling for parent income, race, gender, and college FEs, the slope between SAT score and child income is 1.27.

The above suggests that if we had a model restricted to college-goers that included SAT scores, parent income, and college FEs, it would look something like

E[k_rank] = 1.27 * (SAT/100) + Y * parent_income + college_fe

The big question is now what the value of Y is.

The first hint is that the slope would be 0.100 if the model didn't have SAT score in it. The second hint is that, among the full population, including SAT score in the model dropped the coefficient by about 10% (0.139 to 0.125). Intuitively, then, we'd expect Y ~ 0.090. TODO: verify
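The arithmetic behind that guess can be written out as a small sketch. It assumes the proportional shrinkage seen in the full population carries over unchanged to college-goers, which is a heuristic rather than an estimate; the function name is hypothetical.

```python
def scale_by_analogy(full_without_sat, full_with_sat, goers_without_sat):
    """Apply the full-population shrinkage ratio to the college-goer slope."""
    # Fraction of the parent-income slope that survives once SAT enters the model.
    ratio = full_with_sat / full_without_sat
    return goers_without_sat * ratio


y = scale_by_analogy(full_without_sat=0.139, full_with_sat=0.125, goers_without_sat=0.100)
print(f"Y ~ {y:.3f}")  # prints "Y ~ 0.090"
```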

College FE Estimation

Unfortunately, Chetty et al don't provide their estimated fixed effect coefficients for colleges. However, equipped with the above model, we will attempt to recreate them in the next post...

Linearity

The linearity assumptions in this model are a simplification, but one I don't know how to avoid making in a principled way. Here, I'll put the evidence I've found against them.

First, Figure XI from the online appendix suggests there may be a small non-linearity relating SAT score to adult earnings.

Second, Figure IIIc (shown above) and appendix Table IV suggest that parent income matters less at elite colleges (0.065) than in general (0.100). If this is true, then my later rankings will improperly penalize elite colleges with high-earning parents while improperly favoring two-year colleges with low-income parents.

Unfortunately, Chetty et al don't give us a whole lot to go on in accounting for this lack of linearity. Really, the only thing useful is Figure IIIc from the paper.

Figure IIIc from the paper. Slopes represent within-college slopes between parent income rank and child income rank.

All things considered, this really isn't much, so I'll consider two very different yet simple models:

Model 1: Quadratic Income Model:

E[k_rank] = A * par_rank + B * sat - C * par_rank * par_rank + college_fe

Model 2: Interaction Model:

E[k_rank] = D * par_rank + E * sat - F * par_rank * sat + college_fe

Ignoring the two-year college group, the best fits for these models are

E[k_rank] = 0.2245 * par_rank + 1.27 * sat - 0.0009934 * par_rank * par_rank + college_fe

and

E[k_rank] = 0.2055 * par_rank + 1.27 * sat - 0.01053 * par_rank * sat + college_fe
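To see what these fits imply, it helps to look at the within-college slope on parent income under each model, that is, the derivative of E[k_rank] with respect to par_rank. The sketch below assumes par_rank is on the 0-to-100 scale and sat means SAT/100, as in the earlier equations; the function names are hypothetical.

```python
def slope_quadratic(par_rank, a=0.2245, c=0.0009934):
    """Model 1: d E[k_rank] / d par_rank = A - 2 * C * par_rank."""
    return a - 2 * c * par_rank


def slope_interaction(sat, d=0.2055, f=0.01053):
    """Model 2: d E[k_rank] / d par_rank = D - F * sat, with sat = SAT/100."""
    return d - f * sat


# Parent-income slope at a few parent ranks (Model 1) and SAT levels (Model 2).
for rank in (20, 50, 80):
    print(f"Model 1, par_rank={rank}: slope={slope_quadratic(rank):.3f}")
for sat in (10, 12, 14):
    print(f"Model 2, SAT={sat * 100}: slope={slope_interaction(sat):.3f}")
```

Under these coefficients, Model 1 gives a slope of about 0.125 at the median parent and about 0.066 at the 80th percentile, while Model 2 gives about 0.100 at an SAT of 1000 and about 0.058 at 1400. Both land in the same range as the 0.100 and 0.065 within-college slopes quoted above, though they attribute the decline to different variables.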

Note, if you dislike these assumptions, take comfort in the fact that the ultimate FE estimates they yield correlate r>0.999 with each other, so this ends up barely mattering.

Chetty, R., Friedman, J. N., Saez, E., Turner, N., & Yagan, D. (2020). Income segregation and intergenerational mobility across colleges in the United States. The Quarterly Journal of Economics, 135(3), 1567-1633. https://doi.org/10.1093/qje/qjaa005

Geiser, S. (2015). The growing correlation between race and SAT scores: New findings from California. https://escholarship.org/uc/item/9gs5v3pv

Table 226.40. SAT mean scores of high school seniors, standard deviations, and percentage of the graduating class taking the SAT, by state: 2017 through 2020. National Center for Education Statistics. https://nces.ed.gov/programs/digest/d20/tables/dt20_226.40.asp

Chetty, R., Friedman, J. N., Saez, E., Turner, N., & Yagan, D. (2017). Mobility report cards: The role of colleges in intergenerational mobility (No. w23618). National Bureau of Economic Research. https://doi.org/10.3386/w23618

Wikipedia contributors. (2022, June 4). Continuous uniform distribution. In Wikipedia, The Free Encyclopedia. Retrieved 17:10, July 6, 2022, from https://en.wikipedia.org/w/index.php?title=Continuous_uniform_distribution&oldid=1091406945