Presidential Elections
State Vote Results
Processing
First, let's try to understand the reported vote results for each major presidential candidate in each state from 1992 to 2016 (data from the MIT Election Data and Science Lab).
To process the data, I threw out votes for candidates who weren't one of the main two; then, I computed the share of the remaining votes cast for the Democratic candidate; finally, I passed the result through the inverse logistic (logit) function. This resulted in a score for each state/year. The logit step is important: scores are unbounded, so adding bias terms to them can never imply a state voting more than 100% for one party, which could happen if the analysis below were run on raw percentages.
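As a minimal sketch of the scoring step described above (the function names are my own for illustration, not taken from the original analysis), the score is the logit of the Democratic share of the two-party vote, and the logistic function maps any score back to a share strictly between 0 and 1:

```python
import math

def two_party_score(dem_votes, rep_votes):
    """Logit of the Democratic share of the two-party vote.

    Third-party votes are assumed to have been dropped already.
    """
    share = dem_votes / (dem_votes + rep_votes)
    return math.log(share / (1 - share))

def score_to_share(score):
    """Logistic function: maps any real-valued score back to a share in (0, 1)."""
    return 1 / (1 + math.exp(-score))

# A 50/50 split maps to a score of 0; Democratic leads give positive scores.
even_score = two_party_score(500, 500)
```

Because score_to_share always returns a value between 0 and 1, any sum of bias terms on the score scale still corresponds to a valid vote share.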
National Bias
The simplest model is to assume that the score for a state in a particular year is given by
nationalBias[year] + stateBias[state]
However, it's possible some states "swing more" than others:
swingFactor[state] * nationalBias[year] + stateBias[state]
Using the 7 elections from 1992 to 2016, I fit 51 linear models, each relating one state's scores to the national score over those 7 elections. None of the 51 models could reject the null hypothesis that swingFactor = 1.
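Here is a rough sketch of what one of those per-state tests could look like. It assumes an ordinary least squares fit and a two-sided t-test at the 5% level, which the text above does not specify; swing_fit and T_CRIT_DF5 are hypothetical names. With 7 elections and 2 fitted parameters there are 5 degrees of freedom.

```python
import math

# Two-sided 5% critical value of Student's t with 5 degrees of freedom
# (7 elections minus 2 fitted parameters).
T_CRIT_DF5 = 2.571

def swing_fit(national, state):
    """OLS fit of state = swingFactor * national + stateBias.

    Returns (swing_factor, state_bias, t_stat), where t_stat tests the
    null hypothesis swingFactor = 1. Assumes the fit is not perfect
    (a zero residual would make the standard error zero).
    """
    n = len(national)
    mean_x = sum(national) / n
    mean_y = sum(state) / n
    sxx = sum((x - mean_x) ** 2 for x in national)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(national, state))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(national, state))
    std_err = math.sqrt(rss / (n - 2) / sxx)
    t_stat = (slope - 1) / std_err
    return slope, intercept, t_stat

def rejects_unit_swing(national, state):
    """True if the state's swing factor differs significantly from 1."""
    _, _, t_stat = swing_fit(national, state)
    return abs(t_stat) > T_CRIT_DF5
```

Running rejects_unit_swing once per state, with that state's 7 scores and the 7 national scores, gives the kind of 51-way check described above. Note that testing 51 states at the 5% level would produce a couple of rejections by chance alone, so finding none is a fairly strong result.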
So, we'll revert to the original model:
nationalBias[year] + stateBias[state]
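One way to fit this additive model is sketched below. The model is only identified up to a constant shifted between the two terms, so the sketch assumes the stateBias values average to zero across states, and it assumes every state has a score for every year. Under those assumptions the least squares estimates are simple means. The name fit_additive and the nested-dict layout are my own choices for illustration.

```python
def fit_additive(scores):
    """Fit score = nationalBias[year] + stateBias[state] by least squares.

    scores is a dict: scores[state][year] -> score, with every state
    covering the same years. State biases are constrained to average zero.
    Returns (national_bias, state_bias) as dicts.
    """
    states = list(scores)
    years = list(scores[states[0]])
    # National bias for a year: mean score across states in that year.
    national_bias = {
        y: sum(scores[s][y] for s in states) / len(states) for y in years
    }
    grand_mean = sum(national_bias.values()) / len(years)
    # State bias: the state's mean score relative to the grand mean.
    state_bias = {
        s: sum(scores[s][y] for y in years) / len(years) - grand_mean
        for s in states
    }
    return national_bias, state_bias
```

Subtracting nationalBias[year] + stateBias[state] from each score leaves the residuals that the next section digs into.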
State Bias Over Time
At this point, I ran PCA to determine the most important non-national factor and found that this factor was largely just the year the election was held.
Further analysis revealed that, on average, a state's score got ~2% more extreme every year. For instance, over a 4-year period, a state with a 10% margin would grow to a 10.8% margin (10 × 1.02^4 ≈ 10.8).
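For readers who want to try something similar, here is a small sketch of finding the leading principal component with power iteration. It assumes the input is a matrix of residual scores (one row per state, one column per election year) with the national bias already removed; I'm not claiming this is the exact PCA setup used above, and first_component is a hypothetical name.

```python
import math

def first_component(rows, iters=200):
    """Leading principal axis of rows via power iteration.

    rows is a list of equal-length lists (one row per state, one column
    per year). Returns a unit-length loading vector with one entry per
    column. Assumes the data is not constant in every column.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix between columns (years).
    cov = [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]
    v = [1.0 / (k + 1) for k in range(d)]  # arbitrary non-degenerate start
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v
```

If the loadings rise or fall steadily from the first election to the last, the component is essentially tracking the year, which is the pattern described above.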
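The arithmetic behind that example is just compound growth. A tiny sketch (grow_margin is a made-up helper, and it treats the ~2% figure as an exact annual rate, which it is not):

```python
def grow_margin(margin, years, rate=0.02):
    """Compound a margin by `rate` per year for `years` years."""
    return margin * (1 + rate) ** years

# A 10-point margin after one 4-year election cycle at ~2% per year.
after_one_cycle = grow_margin(10.0, 4)
```

Here after_one_cycle comes out to roughly 10.8, matching the figure quoted above.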
However, unlike with nationalBias, where no state's swing differed significantly from the rest, 12 states' trends rejected the null hypothesis that this trend was homogeneous across states.
Many of these rejections were quite strong. In particular, West Virginia's and Arkansas's scores dropped by 0.22 and 0.27 points respectively relative to the national average, moves about 10x larger than the simple biasing model would have predicted.