
School Value Added

Previous Work

See here for a very quick overview of some previous work using similar data.

In particular, from The opportunity atlas: Mapping the childhood roots of social mobility:

As a simple method of assessing the potential explanatory power of schools, we examine the fraction of variance that is across tracts within high school catchment areas vs. between high school catchment areas.

[33] We assign Census tracts to high school catchment areas in 2017 using data generously provided to us by Peter Bergman on the intersection of Census tracts with high school catchment boundaries in 2017, obtained from Maponics (2017); see Online Appendix B for details. We match 71,720 tracts to school catchment zones, covering roughly 97% of the population. Since school catchment areas do not perfectly nest Census tracts, we assign tracts to the school catchment zone that contains the largest share of their land area. Using information on exact school catchment boundaries in Mecklenburg County, NC we estimate that only 9.6% of the population gets misclassified into the wrong school catchment area using this approach because high school catchment boundaries follow tract boundaries fairly closely (see Online Appendix Figure I).

Figure III shows that 28% of the total variance in outcomes – and about half of the local tract-within-county variation – can be explained by school catchment area fixed effects.

[34] Insofar as there is spatial autocorrelation in outcomes across tracts for reasons unrelated to schools, this estimate likely provides an upper bound on the portion of the variance in outcomes that can be attributed to schools, since any randomly drawn set of contiguous tracts would share a common variance component in the presence of spatial autocorrelation. However, in the other direction, our use of 2017 high school catchment boundaries may lead us to understate the role of schools because they do not reflect the boundaries faced by children in our sample, who attended school in the 1990s and early 2000s. In practice, tract boundaries appear to be reasonably stable over time: 87% of tract pairs that fell on different sides of school catchment boundaries in 2002 in Charlotte did so in 2017 as well. Moreover, when examining variation in outcomes for more recent birth cohorts up to the 1989 birth cohort, we find no evidence that schools explain a larger share of the variance for more recent cohorts.

Hence, although a significant share of the tract-level variation in outcomes could potentially be due to school effects, there is clearly substantial variation in outcomes even across neighborhoods among children who attend the same high school.

The referenced online appendix elaborates.
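To make the quoted "share of variance explained by catchment area fixed effects" concrete, here is a minimal sketch of a between-group variance share. It is unweighted and uses made-up tract outcomes and catchment IDs; the paper's actual estimator (population weights, exact specification) may well differ, so treat this as an illustration of the idea rather than a replication.

```python
from collections import defaultdict

def between_group_share(values, groups):
    """Share of total variance explained by group means.

    This equals the R^2 from regressing values on group fixed effects
    (unweighted; population weighting is NOT applied here).
    """
    grand_mean = sum(values) / len(values)
    by_group = defaultdict(list)
    for value, group in zip(values, groups):
        by_group[group].append(value)
    total_ss = sum((v - grand_mean) ** 2 for v in values)
    between_ss = sum(
        len(vs) * (sum(vs) / len(vs) - grand_mean) ** 2
        for vs in by_group.values()
    )
    return between_ss / total_ss

# Hypothetical tract-level outcomes and the catchment area each tract falls in.
outcomes = [0.40, 0.44, 0.50, 0.58, 0.60, 0.66]
catchments = ["A", "A", "B", "B", "C", "C"]
share = between_group_share(outcomes, catchments)
print(f"Share of variance between catchment areas: {share:.2f}")
```

The remainder, one minus this share, is the variation across tracts within the same catchment area.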

Data Sources

We will use the census-tract-level data made publicly available by Opportunity Insights (Data Library: Publicly available data we've produced and replication code), particularly the tables described in Codebook for Table 4: All Outcomes by Census Tract, Race, Gender and Parental Income Percentile and Codebook for Table 9: Neighborhood Characteristics. For details on how the data was computed, see The opportunity atlas: Mapping the childhood roots of social mobility.

For school data, I scraped data from GreatSchools. Future analyses should also use data from Niche and the National Center for Education Statistics (NCES).

To map school attendance boundaries (SABs) to census tracts, I used The Boundary Collection and the US Census Bureau's TIGER files (TIGERweb Nation-Based Data Files). Chetty et al. cite:

Maponics (2017). Maponics School Boundaries. Pitney Bowes.

But this appears to require spending money unless you know Professor Peter Bergman.

Limitation: Temporal Availability

The census tract data provided by Chetty et al. is based on 2010 census tracts, but the underlying data comes from years ranging from 1990 to 2016 (Codebook for Table 4: All Outcomes by Census Tract, Race, Gender and Parental Income Percentile; Codebook for Table 9: Neighborhood Characteristics). Fortunately, the traits of census tracts are typically fairly stable over time (TODO: cite appropriate Chetty paper).

By necessity, the GreatSchools and Niche school data is recent. The NCES data is available from 1986 to the present (ElSi tableGenerator). While we can't realistically check whether GreatSchools or Niche scores are stable over time, we can do some basic tests for stability on the NCES data and the school shape files, which go back to 2009 (The Boundary Collection).

The NCES has SABs for 2009-10 and 2015-16. The first dataset has 22,582 schools while the second has 72,867 schools. This discrepancy is because the former only contains a sample of 500 school districts. 19,732 schools are in both datasets, representing 87% of the original sample.

For each of these 19,732 schools, I computed the fraction of the boundary area that changed using this formula:

(area_15 + area_09 - 2*intersection) /(area_15 + area_09 - intersection)

In this way, a school whose boundaries didn't change at all scores a 0 on this metric, while a school whose boundaries changed completely scores a 1. If a boundary doubles in area or is cut in half, then it scores 0.50.
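As a quick check of the formula, here is a minimal sketch that takes the three areas as plain numbers (computing them from the shapefiles is a separate step not shown here). The function name is my own. Algebraically, the metric is one minus intersection-over-union, i.e. the Jaccard distance between the two boundaries.

```python
def boundary_change(area_09, area_15, intersection):
    """Fraction of the combined 2009/2015 footprint that is not shared.

    0 means the boundary is unchanged; 1 means the two boundaries
    do not overlap at all.
    """
    union = area_09 + area_15 - intersection
    if union <= 0:
        raise ValueError("areas must describe a non-empty region")
    return (area_09 + area_15 - 2 * intersection) / union

print(boundary_change(1, 1, 1))  # unchanged boundary: 0.0
print(boundary_change(1, 2, 1))  # boundary doubled in area: 0.5
print(boundary_change(2, 1, 1))  # boundary cut in half: 0.5
print(boundary_change(1, 1, 0))  # no overlap at all: 1.0
```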

The distribution of this metric is

Percentile  Metric
p99         0.959
p95         0.745
p75         0.243
p50         0.043
p25         0.007
p05         0.000
p01         0.000

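So, essentially, a majority of SABs didn't change very much between 2009 and 2015 (the median change was about 4%), but a significant minority changed quite a bit: a quarter changed by more than 24%, and 5% changed by roughly 75% or more. This is a limitation of this analysis.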

Note that this issue plagues Chetty et al.'s analysis as well, since those authors use SABs from 2017, census tract boundaries from 2010, and data from a wide variety of years (The opportunity atlas: Mapping the childhood roots of social mobility).

Limitation: Census Tracts - School Attendance Boundary Crosswalks

The topic of constructing crosswalks between two shape files is fascinating.

The most common approach is to simply use the amount of overlapping area, which is a well-understood, purely mathematical problem. However, in practice, what we are often interested in is a crosswalk based on populations rather than areas. This has been done before (Ferrara et al., 2021), but it generally assumes researchers have access to additional information.

I have lots of clever thoughts on how to do this properly using quadratic optimization, but I decided to take a simpler and hopefully less controversial approach for this analysis, because we are missing some significant data.
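The simple area-overlap approach can be sketched as follows. This is a toy illustration only: it uses axis-aligned rectangles given as (xmin, ymin, xmax, ymax) so that it runs with no dependencies, whereas real SABs and tracts are arbitrary polygons and would need a GIS library to intersect. All names and coordinates are hypothetical.

```python
def overlap_area(a, b):
    """Intersection area of two rectangles given as (xmin, ymin, xmax, ymax)."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    return max(width, 0) * max(height, 0)

def area_crosswalk(sources, targets):
    """Map (source_id, target_id) to the share of the source's area in the target."""
    weights = {}
    for s_id, s_box in sources.items():
        s_area = (s_box[2] - s_box[0]) * (s_box[3] - s_box[1])
        for t_id, t_box in targets.items():
            inter = overlap_area(s_box, t_box)
            if inter > 0:
                weights[(s_id, t_id)] = inter / s_area
    return weights

# Hypothetical geometry: one school zone straddling two tracts.
school_zones = {"school_1": (0, 0, 4, 2)}
tracts = {"tract_a": (0, 0, 1, 2), "tract_b": (1, 0, 5, 2)}
crosswalk = area_crosswalk(school_zones, tracts)
print(crosswalk)
```

A population-based crosswalk would replace the area shares with shares of residents, which is exactly the information that is missing here.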

Results

Due to pragmatic and ethical limitations regarding scraping GreatSchools, I limited my analysis to Los Angeles.

If p% of a school's attendance boundary area intersected with a certain census tract, I assumed p% of that school's students came from that census tract. From this, I computed the (weighted) mean GreatSchools "overall" rating for each census tract. I repeated this for the other GreatSchools sub-scores (test scores, academic progress, equity, and college readiness).
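A minimal sketch of that aggregation step is below. It assumes the weights are the overlap shares themselves; weighting by share times enrollment would be a natural alternative if enrollment counts were available. The function name, school IDs, tract IDs, and ratings are all hypothetical.

```python
from collections import defaultdict

def tract_ratings(shares, ratings):
    """Share-weighted mean school rating per tract.

    shares:  {(school_id, tract_id): share of the school's boundary in the tract}
    ratings: {school_id: rating on a 10 point scale}
    """
    weighted_sum = defaultdict(float)
    weight_total = defaultdict(float)
    for (school, tract), share in shares.items():
        if school not in ratings:
            continue  # skip schools with no scraped rating
        weighted_sum[tract] += share * ratings[school]
        weight_total[tract] += share
    return {t: weighted_sum[t] / weight_total[t]
            for t in weighted_sum if weight_total[t] > 0}

# Hypothetical inputs: s1 spans two tracts, s2 only partly overlaps t2.
shares = {("s1", "t1"): 0.25, ("s1", "t2"): 0.75, ("s2", "t2"): 0.25}
ratings = {"s1": 8, "s2": 4}
result = tract_ratings(shares, ratings)
print(result)
```

The same function can be reused for each sub-score by passing a different ratings dictionary.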

I added many covariates provided by Opportunity Insights to the dataset. For my outcome, I used kir_white_male_p50, that is, the average income percentile in their 30s of white men who grew up in a given census tract in households at the national median income. However, I did perform a minor correction by penalizing census tracts with higher test scores, as explained here.

Because of Chetty et al's work, we know that much of the variance in this proxy is true causal effect from living in a census tract. If common measures of school quality had a causal effect on adult earnings, we would therefore expect a positive correlation between those measures of school quality and this outcome variable.

I found that the four most predictive variables for tract-level income outcomes were average test scores in third grade, white population share of the census tract, poor population share of the census tract, and average jobs per square mile in the county. Note that none of these are GreatSchools measures.

In a model employing those four variables as controls, only one of the GreatSchools variables was statistically significant (p = 0.046): equity. This did not survive correction for multiple hypothesis testing. The standard errors were small enough that we can confidently conclude that a 1 point increase on any of the measures (all 10 point scales) is associated with less than a 2% gain in adulthood income.
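To see why the equity result does not survive, here is a small worked example. It reads the 4.6% figure as a p-value and assumes a Bonferroni correction across the five GreatSchools measures (overall plus four sub-scores); the correction actually used is not stated above, so this is only one plausible version of the calculation.

```python
def bonferroni(p_value, n_tests):
    """Bonferroni-adjusted p-value: multiply by the number of tests, cap at 1."""
    return min(1.0, p_value * n_tests)

N_MEASURES = 5  # overall, test scores, academic progress, equity, college readiness
adjusted = bonferroni(0.046, N_MEASURES)
survives = adjusted < 0.05
print(f"adjusted p = {adjusted:.2f}, survives at 5% level: {survives}")
```

Equivalently, with five tests each individual p-value would need to fall below 0.01 to remain significant at the 5% level.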

On the other hand, if we employ absolutely no controls, all of the GreatSchools measures are positively and statistically significantly associated with our outcome measure, except college readiness, which fails after correcting for multiple hypothesis testing. The strongest predictor is the academic progress rating, where a 1 point increase is associated with a 2% increase in adulthood earnings (95% CI is 1-3%).

In other words, what we've found here is that school rankings definitely correlate with income mobility, but these correlations disappear after controlling for basic neighborhood-level confounders.

Obviously, this data isn't conclusive in itself, but I think it generally supports the other evidence I've found.

Data Library: Publicly available data we've produced and replication code. Opportunity Insights. https://opportunityinsights.org/data/
Chetty, R., Hendren, N., Kline, P., & Saez, E. (2014). Where is the land of opportunity? The geography of intergenerational mobility in the United States. The Quarterly Journal of Economics, 129(4), 1553-1623. https://doi.org/10.1093/qje/qju022
2022 Best High Schools in America. Niche. https://www.niche.com/k12/search/best-high-schools
Constructing the Opportunity Atlas: Methodology. (2018). https://opportunityinsights.org/wp-content/uploads/2018/10/Atlas_methods.pdf
Chetty, R., Friedman, J. N., Hendren, N., Jones, M. R., & Porter, S. (2020). The Opportunity Atlas: Mapping the Childhood Roots of Social Mobility. https://opportunityinsights.org/wp-content/uploads/2018/10/atlas_slides.pdf
The Boundary Collection. National Center for Education Statistics. https://nces.ed.gov/programs/edge/SABS
Chetty, R., Friedman, J. N., Hendren, N., Jones, M. R., & Porter, S. R. (2018). The opportunity atlas: Mapping the childhood roots of social mobility (No. w25147). National Bureau of Economic Research. https://doi.org/10.3386/w25147
Codebook for Table 4: All Outcomes by Census Tract, Race, Gender and Parental Income Percentile. Opportunity Insights. https://opportunityinsights.org/wp-content/uploads/2019/07/Codebook-for-Table-4.pdf
Codebook for Table 9: Neighborhood Characteristics. Census Tract. https://opportunityinsights.org/wp-content/uploads/2019/07/Codebook-for-Table-9.pdf
TIGERweb Nation-Based Data Files. United States Census Bureau. https://tigerweb.geo.census.gov/tigerwebmain/TIGERweb_nation_based_files.html
Ferrara, A., Testa, P., & Zhou, L. (2021). New area-and population-based geographic crosswalks for US counties and congressional districts, 1790-2020. Available at SSRN 4019521. http://doi.org/10.2139/ssrn.4019521
ElSi tableGenerator. National Center for Education Statistics. https://nces.ed.gov/ccd/elsi/tablegenerator.aspx