Optimal Taxation Literature
[ This page is my attempt at an unbiased review of the optimal taxation literature, much as one would get at a university. It is based on videos of lectures from Harvard, the cited papers, and other reading. It is mostly based on this series of lectures: Topic 4: Optimal Taxation Part 1. I largely ignore empirical considerations, preferring to focus this page on the purely theoretical models and their results. ]
TODO: Efficiency cost of Taxation (start here).
Lump Sum Transfers
The second fundamental theorem of welfare economics states that you can achieve any Pareto optimal outcome via only lump-sum wealth redistribution. For a proof see the Wikipedia page Fundamental theorems of welfare economics.
Unfortunately, this theory runs into serious problems when it makes contact with reality.
Probably the most fatal flaw is that it assumes the government has perfect information. In practice, governments generally want to transfer money from people with high earning ability to people with low earning ability, but they don't have a way to perfectly determine each person's earning ability.
Other problems come from the other, fairly strong, assumptions like that there are no transaction costs, that all actors have perfect information, and that there are no monopolies.
Nevertheless, this ideal has significant influence on the literature, and you'll often see papers make claims about being the second-best redistribution mechanism; the best is always implied to be lump-sum transfers.
Consumption Taxes
Ramsey Model
The overall elasticity of a good is the percent change in its equilibrium quantity due to a one-percent change in price from taxes. You can compute this from the elasticity of demand ($\epsilon_d$) and supply ($\epsilon_s$) using this formula:
$$ \frac{\epsilon_d \epsilon_s}{\epsilon_d + \epsilon_s} $$
Ramsey proved that if you're levying flat consumption taxes, then, to minimize deadweight loss, the tax on each good should be inversely proportional to the elasticity of that good (A Contribution to the Theory of Taxation).
This system obviously ignores redistributive goals and is probably regressive since necessities tend to be inelastic while luxury goods tend to be elastic.
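As a rough illustration of the inverse elasticity rule, here is a minimal sketch. The goods, their elasticities, and the `scale` constant (which in a real model would be pinned down by the revenue requirement) are all made up for the example, and both elasticities are treated as positive magnitudes.

```python
def overall_elasticity(eps_d: float, eps_s: float) -> float:
    """Combine demand and supply elasticities (both as positive magnitudes)."""
    return eps_d * eps_s / (eps_d + eps_s)


def ramsey_rates(elasticities: dict[str, float], scale: float) -> dict[str, float]:
    """Tax each good at scale / elasticity (the inverse elasticity rule)."""
    return {good: scale / e for good, e in elasticities.items()}


# Hypothetical goods and elasticities, chosen only for illustration.
goods = {
    "bread": overall_elasticity(eps_d=0.2, eps_s=1.0),
    "yachts": overall_elasticity(eps_d=2.0, eps_s=1.0),
}

# `scale` is an assumed constant; a real model would derive it from
# how much revenue the government needs to raise.
rates = ramsey_rates(goods, scale=0.05)
for good, rate in rates.items():
    print(f"{good}: elasticity {goods[good]:.3f}, tax rate {rate:.1%}")
```

The inelastic necessity ends up with the higher rate, which is the regressivity concern mentioned above.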
Diamond extends this result from optimizing merely efficiency to optimizing social welfare as a whole. He determined that if we still restrict ourselves to flat consumption taxes, the optimal rate on each good should be multiplied by the average marginal utility for consumers of that good (A many-person Ramsey tax rule).
Land Value Tax
As a special case of the Ramsey rule, perfectly inelastic goods and services should be taxed to the greatest extent possible. This is the main impetus behind a land value tax: taxing the value of all land but not the value of any improvements made to it (e.g. buildings).
The logic is simply that since the supply of land is perfectly inelastic (well, mostly; see Land reclamation), a tax on land has zero deadweight loss and is therefore optimal from an efficiency standpoint. In fact, from this perspective, we can levy a tax equal to the entire value of the land times the interest rate, effectively giving the government all income derived from unimproved land, all without any efficiency tradeoff.
Another reason the tax tends to be popular is that it's progressive, since the poor tend to own no land at all. A skeptic may think landlords will just raise rents, but since the supply is perfectly inelastic, the entire incidence of the tax should fall on landlords rather than renters.
Finally, if your unimproved plot of land gains value, it's because other people improved the land around you. This makes that value an externality and makes you essentially a rent seeker. A 100% land value tax removes this unfair value people get when they luck out and their land becomes more valuable through no actions of their own.
There are also arguments against:
 It's difficult to assess how much a piece of land would be worth without any improvements.
 If land values increase around you, you may be unable to afford the land value tax and will, thus, be forced from your home.
 A land value tax is unnecessary to the extent the Atkinson-Stiglitz result holds (discussed later).
Income Tax
Laffer Curve (Flat Income Tax)
Instead of optimizing for social welfare, what if we want to optimize for tax revenue? For simplicity, we'll consider a flat tax example.
First of all, it is obvious that revenue will be zero at $t=0$ since there is no tax. It is nearly as obvious that revenue will be zero at $t=1$, since no one will work if the tax rate is 100%. Assuming some revenue can be raised, it therefore follows that the revenue-maximizing rate is between 0% and 100%.
Let $\epsilon$ be the elasticity of income with respect to $1-t$. For instance, if $\epsilon=0.4$, then increasing taxes from 0% to 1% will cause income to decrease by 0.4%.
Given the above, we can show that the revenue-maximizing tax rate is given by
$$ t_{max} = \frac{1}{\epsilon + 1} $$
Using our arbitrary example of $\epsilon=0.4$, this implies a revenue-maximizing rate of ~71%.
The traditional interpretation is that you never want to have rates above the revenue-maximizing rate, since such rates harm both the government's ability to help society and the person being taxed. In principle (and maybe in practice), you could justify a higher tax rate if you actually want to discourage working.
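The revenue-maximizing rate can be checked numerically. The sketch below assumes a constant-elasticity income response, $z(t) = z_0 (1-t)^\epsilon$, which is one simple functional form consistent with the definition of $\epsilon$ above (the text does not commit to a specific form), and then grid-searches for the best flat rate.

```python
def revenue(t: float, eps: float, z0: float = 1.0) -> float:
    """Revenue from a flat tax t, assuming income is z0 * (1 - t) ** eps."""
    return t * z0 * (1.0 - t) ** eps


def revenue_maximizing_rate(eps: float, steps: int = 100_000) -> float:
    """Grid-search the rate in [0, 1] that maximizes revenue."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=lambda t: revenue(t, eps))


eps = 0.4
t_numeric = revenue_maximizing_rate(eps)
t_formula = 1.0 / (1.0 + eps)
print(f"numeric: {t_numeric:.4f}, formula: {t_formula:.4f}")
```

Under this assumed functional form, the numeric maximum and the closed-form $1/(\epsilon+1)$ agree to within the grid spacing.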
The Mirrlees Model (Income Taxes)
Suppose consumption is given by $c = wL - T(wL)$, where $w$ is your wage, $L$ is your labor, and $T(wL)$ is how much you pay in taxes.
Also, suppose everyone has the same utility function $u(c, L)$, where $c$ is consumption and $L$ is labor, and they're trying to maximize it. The first-order condition is:
$$ w(1-T')\frac{\partial u}{\partial c} + \frac{\partial u}{\partial L} = 0 $$
Finally, suppose there are $h(w)$ people with wage $w$ and that the government has its own social welfare function $G$ such that it is optimizing for
$$ \int_0^\infty{G(u(c,L)) h(w) dw} $$
but it needs to raise enough revenue to match our expenditures, $E$:
$$ \int_0^\infty T(w L) h(w) dw \geq E $$
Two notes:
 Conventional utilitarianism suggests $G(x)=x$. However, both unconventional utilitarians (e.g. Rawlsian ones) and conventional utilitarians who believe people optimize for the wrong things should choose a nontrivial $G$.
 This model is premised on the assumption that the only reason we don't perfectly redistribute income is due to behavioral responses. Some people argue that at least some people intrinsically deserve at least some of their income.
From this approach, a few general results have been proven:
 Taxes should be negative at low incomes.
 Taxes should be positive at high incomes.
 Marginal tax rates should never be negative (Seade).
 Marginal tax rates should never be above 100% (trivial since no one would work).
 The marginal tax rate on the very highest earner should be zero if the skill distribution is bounded (Sadka). This result is generally believed to be inapplicable to reality, as we'll discuss in a bit.
Suppose the maximum tax bracket has rate $\tau$ and is applied to all income above $\bar{z}$. Suppose there are $N$ people who earn above that and that their average income is $z^m$.
If I increase $\tau$, this
 increases tax revenue simply due to having a higher tax rate
 reduces tax revenue by discouraging labor
 reduces the welfare of these $N$ people by making them consume less
These three effects are called the fiscal effect, the behavioral effect, and the welfare effect. I won't really discuss them more, but they come up a lot in the literature and are the sources of some of the derivatives I merely state later on.
We can represent this mathematically using the derivative of social welfare with respect to $\tau$:
$$ N \left( (1-g)(z^m - \bar{z}) - \epsilon \frac{\tau}{1 - \tau}z^m \right) $$
where $g$ represents how much the government values a marginal dollar of consumption for these top earners relative to a marginal dollar of tax revenue.
We can optimize by setting this derivative to zero to obtain (Using elasticities to derive optimal income tax rates):
$$ \frac{\tau}{1 - \tau} = \frac{(1-g)(z^m/\bar{z} - 1)}{\epsilon \cdot z^m / \bar{z}} $$
Note, these results make intuitive sense:
 As $g$ increases, $\tau$ should decrease.
 As $\epsilon$ increases, $\tau$ should decrease.
 As $z^m/\bar{z}$ increases, income inequality increases, and $\tau$ increases.
A couple things to note:
 This formula is less precise than it appears. Most people believe, for instance, that changes to the top tax rate will affect $z^m$ and $\epsilon$.

This result can also be used to show both (1) that the top earner should face a 0% marginal tax rate and (2) that this "proof" is completely inapplicable to reality.
Basically, for the top earner, as $\bar{z}$ approaches their income, the optimal $\tau$ goes to zero. Hence, if we were to implement a tax bracket just below the top earner's income, that bracket's rate should be 0%.
In English, the basic argument goes "if we can predict the top earner's income, we can introduce a new 0% tax bracket just below to encourage them to work more with virtually no revenue loss."
However, this entire paradigm is inapplicable to reality, because we cannot (in fact) predict the top earner's income very well. If, for instance, the top bracket's threshold is half the top earner's income, this entire argument utterly breaks down.
 As a special case, we can reproduce the Laffer curve result by letting $\bar{z}=0$ and $g=0$.
After proving this, Saez uses this model to argue the optimal top tax rate is between 50% and 80% (Using elasticities to derive optimal income tax rates).
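To get a feel for the top-rate formula, here is a small sketch that solves it for $\tau$. It multiplies the numerator and denominator of the right-hand side by $\bar{z}$, giving $(1-g)(z^m - \bar{z})/(\epsilon z^m)$, so that the $\bar{z}=0$ case is well defined. All parameter values below are illustrative assumptions, not estimates from the literature.

```python
def top_rate(g: float, eps: float, z_mean: float, z_bar: float) -> float:
    """Solve tau / (1 - tau) = (1 - g)(z_mean - z_bar) / (eps * z_mean) for tau."""
    ratio = (1.0 - g) * (z_mean - z_bar) / (eps * z_mean)
    return ratio / (1.0 + ratio)


# Laffer special case: threshold at zero and no welfare weight on top earners.
laffer = top_rate(g=0.0, eps=0.4, z_mean=1.0, z_bar=0.0)

# Hypothetical bracket whose average income is twice the threshold.
typical = top_rate(g=0.0, eps=0.4, z_mean=2.0, z_bar=1.0)

# Threshold just below a single top earner's income: the rate collapses toward 0.
near_top = top_rate(g=0.0, eps=0.4, z_mean=1.0, z_bar=0.999)

print(f"laffer={laffer:.3f}, typical={typical:.3f}, near_top={near_top:.4f}")
```

The first case reproduces $1/(\epsilon+1)$, and the last shows the zero-top-rate result appearing only when the threshold sits essentially at the top earner's income.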
Let's generalize this to nonlinear income taxes.
Consider the people earning between $z$ and $z+dz$. If I change their marginal tax rate by a small amount, we can use the derivative of social welfare with respect to their marginal tax rate to compute the change in social welfare:
$$ (1 - H(z)) - (1 - H(z))G(z) - h(z) \cdot z \cdot \epsilon \cdot \frac{T'}{1-T'} $$
where $H$ is the CDF of income (as $h$ was the pdf) and $G(z)$ is the average marginal social welfare weight on people earning more than $z$.
When we set this to zero, we get
$$ \frac{T'(z)}{1-T'(z)} = \frac{1}{\epsilon} \frac{1 - H(z)}{z h(z)} (1 - G(z)) $$
Note, this result requires us to assume that labor decisions are only made based on the marginal tax rate. This is almost certainly false.
One note on the $(1-H(z))/h(z)$ term. This term implies that when few people are earning $z$ income and lots of people are earning above $z$, the marginal tax rate at $z$ should be high. The intuitive reason is that our distortion is low (few people pay the high marginal tax), but it lets us raise effective rates on everyone above that point without discouraging their labor.
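The formula for $T'(z)$ is easy to evaluate once you pick an income distribution. The sketch below assumes a Pareto distribution (with made-up shape and minimum income) and sets $G(z)=0$ purely for illustration; neither choice comes from the text.

```python
def marginal_rate(z, eps, H, h, G):
    """Solve T'/(1 - T') = (1/eps) * (1 - H(z)) / (z * h(z)) * (1 - G(z)) for T'."""
    ratio = (1.0 - H(z)) / (z * h(z)) * (1.0 - G(z)) / eps
    return ratio / (1.0 + ratio)


# Assumed Pareto income distribution with shape A and minimum income Z_MIN.
A, Z_MIN = 2.0, 10_000.0


def pareto_cdf(z):
    return 1.0 - (Z_MIN / z) ** A


def pareto_pdf(z):
    return A * Z_MIN ** A / z ** (A + 1.0)


def zero_weight(z):
    # Assumed: society puts no welfare weight on people earning above z.
    return 0.0


for z in (20_000.0, 100_000.0, 1_000_000.0):
    rate = marginal_rate(z, eps=0.4, H=pareto_cdf, h=pareto_pdf, G=zero_weight)
    print(f"z={z:>11,.0f}: T'={rate:.4f}")
```

For a Pareto distribution the ratio $(1-H(z))/(z h(z))$ is the constant $1/A$, so under these assumptions the marginal rate is the same at every income level.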
Although we can't elegantly solve for the entire tax system, we can solve for the best system computationally based on the current US tax system and income distribution. Saez did this (Using elasticities to derive optimal income tax rates, The Review of Economic Studies) and found:
TODO: Diamond 1998, Piketty 1997
Discrete Labor Model
One of the problems with the Mirrlees model is that it assumes people can choose any number of hours, when, in reality, it is difficult to work, say, 4 hours per week. Instead, people are likely to respond to taxes and transfers by dropping out of or re-entering the workforce.
Suppose we have a finite number of jobs, each individual is trained for only one of those jobs, and all people working the same job earn the same wage. Each individual can merely choose whether or not to work, and the government can choose to levy a tax on each job. Finally, suppose that if after-tax income goes down by 1%, the participation rate of a job goes down by some fixed percent.
This model implies that work subsidies are optimal (Optimal income transfer programs: intensive versus extensive labor supply responses), directly contradicting our earlier result that the marginal tax rate should never be negative. This justifies programs like the EITC.
Saez finally combines both the Mirrlees and the discrete labor model into a single general model, but that is more complicated, so we won't go into it other than to say that this can still support negative marginal tax rates at low incomes.
Capital Gains Taxation
The Wikipedia page Optimal capital income taxation provides a really good overview, to the extent that I feel like I'm almost copy-pasting it at times, but I've seen similar collections of arguments in lectures, so I guess this falls under "common knowledge". I'm mainly including this section for completeness.
Arguments Against
The Ramsey model implies there should be no capital gains tax. The reasoning was given by Judd and Chamley and is relatively straightforward:
 Suppose the interest rate is 5%, we levy a 10% capital gains tax, and you invest \$1 to spend later.
 If a consumption tax rate is $t$, then instead of being able to buy $A$ goods, you'll only be able to buy $A/(1+t)$ goods. Alternatively, if you can buy $B$ goods with the tax, then $t = A/B - 1$.
 After 1 year, you have \$1.045 to consume, but you would have had \$1.05 to consume absent the tax. This implies a tax rate of 1.05/1.045 - 1 ~ 0.48%
 After 100 years, you have \$81.59 to consume, but you would have had \$131.50 to consume absent the tax. This implies a tax rate of 131.50/81.59 - 1 ~ 61%
 After 1000 years, the implied tax rate grows to 11731%.
 etc. As time goes to infinity, the implied tax rate grows to infinity.
 However, the Ramsey formula implies the tax rate cannot be infinity for any good, so we have a contradiction.
From this we know that, in the long run, capital gains taxes must tend towards zero.
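The arithmetic in the example above can be reproduced with a few lines. This sketch assumes, as the numbers in the example do, that the tax is levied every year on that year's return, so the after-tax growth rate is 4.5% per year.

```python
def implied_consumption_tax(years: int, interest: float = 0.05,
                            capital_tax: float = 0.10) -> float:
    """Consumption tax rate equivalent to taxing returns annually for `years` years."""
    untaxed = (1.0 + interest) ** years
    taxed = (1.0 + interest * (1.0 - capital_tax)) ** years
    return untaxed / taxed - 1.0


for years in (1, 100, 1000):
    print(f"{years:>4} years: {implied_consumption_tax(years):.2%}")
```

Because the implied rate is $(1.05/1.045)^n - 1$, it grows without bound as the horizon $n$ grows, which is the source of the contradiction.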
Another argument comes from the Atkinson-Stiglitz theorem. As we showed above, capital gains taxes are equivalent to consumption taxes, so, to the extent you buy the Atkinson-Stiglitz result, you should agree capital gains taxes should be zero.
Some economists have also invoked the Diamond-Mirrlees production efficiency result (Optimal Taxation and Public Production I: Production Efficiency) by arguing capital is an input to production and therefore shouldn't be taxed (Mankiw), though this is disputed (The Case for a Progressive Tax: From Basic Research to Policy Recommendations).
Finally, there are more pragmatic arguments for taxing capital income at a lower rate than labor income:
 Most countries already tax corporate profits, so taxing stock dividends/gains is effectively double taxation, which makes the rates higher than they naively appear.
 Taxes on capital income are taxes on nominal returns rather than real returns. For instance, the value I derive from a 10% bond is very different if inflation is 2% vs 12%. Lower tax rates can be justified as a sort of ad hoc alternative to letting people deduct inflation from their capital income.
Finally, if we focus exclusively on corporate income taxes rather than taxes on dividends and capital gains, there are a variety of other arguments. One such argument is that if we assume the economy is open, then a tax on corporate profits will cause less investment in the country, reducing the marginal return of labor, causing labor incomes to fall. The capitalists within the country, on the other hand, will simply shift their investments abroad and not see their incomes change at all.
All that being said, it's not clear that capital is actually very mobile across countries (Domestic savings and international capital flows; see also Feldstein–Horioka puzzle and Equity home bias puzzle). On the other hand, economists do generally believe capital is getting more mobile over time.
Arguments For
Conversely, there are many arguments for a capital gains tax as well [TODO].
 Bernheim
 progressive income taxation  Golosov
 credit market imperfections  Aiyagari and (Farhi and Werning 2011)
 A theory of optimal capital taxation
More generally, are agents even rational when making saving choices?
What To Tax?
Production Efficiency
The Diamond-Mirrlees production efficiency result is basically a proof that in a competitive economy, the government shouldn't distort the inputs of firms, choosing instead to levy taxes only on the final goods and services (Optimal Taxation and Public Production I: Production Efficiency).
I don't see this result as politically controversial, since I can't really see much of a non-rent-seeking political motive to tax (or subsidize) inputs. Contrast this with taxing outputs, where there are lots of motives: don't tax food, tax luxuries, apply tariffs, etc.
TODO
 Optimal Inefficient Production
 Dasgupta
Atkinson-Stiglitz
See here.
Odds and Ends
Tagging
The idea behind tagging is that we alter taxes and transfers based on people's immutable characteristics. For instance, we might (and do) give more money to people who have certain disabilities.
Akerlof showed that under the optimal income tax system, the average marginal welfare weight of the blind should equal that of the non-blind (Akerlof). If, for instance, you define social welfare as $\ln(x)$, then the marginal social welfare is $1/x$. So, according to Akerlof, $\frac{1}{n}\sum{1/x_i}$ should be equal for blind and non-blind people. Equivalent logic holds for other immutable tags.
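As a toy illustration of this equalization condition, the sketch below takes two groups with made-up incomes, ignores behavioral responses entirely, and bisects for the budget-neutral per-person transfer to the tagged group that equalizes the average of $1/x$ across groups. This is a deliberately stripped-down stand-in, not Akerlof's actual model.

```python
def avg_marginal_weight(incomes):
    """Average of 1/x, the marginal social welfare when welfare is ln(x)."""
    return sum(1.0 / x for x in incomes) / len(incomes)


def equalizing_transfer(tagged, untagged, iterations=100):
    """Bisect for the per-person transfer to the tagged group that equalizes weights."""
    per_payer = len(tagged) / len(untagged)  # each untagged person pays transfer * per_payer
    lo, hi = 0.0, min(untagged) / per_payer  # upper bound keeps every payer's income positive
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        gap = (avg_marginal_weight([x + mid for x in tagged])
               - avg_marginal_weight([x - mid * per_payer for x in untagged]))
        if gap > 0:
            lo = mid  # tagged group still has the higher weight: transfer more
        else:
            hi = mid
    return (lo + hi) / 2.0


# Hypothetical incomes, for illustration only.
blind = [10.0, 20.0]
non_blind = [30.0, 60.0]

transfer = equalizing_transfer(blind, non_blind)
after_blind = [x + transfer for x in blind]
after_non_blind = [x - transfer * len(blind) / len(non_blind) for x in non_blind]
print(f"transfer per blind person: {transfer:.3f}")
print(avg_marginal_weight(after_blind), avg_marginal_weight(after_non_blind))
```

Note that equalizing average weights does not equalize incomes within or across groups; it only balances the marginal value of a dollar on average.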
If, on the other hand, these tags are partially mutable, things get more complicated.
Now, politically, there are lots of characteristics that, though pretty much immutable, are still quite controversial to use. Examples include height, sex, and race. People have argued that the fact people don't want to tax these tags implies the overall model is wrong (Optimal taxation in theory and practice). Two proposed corrections are
 That we value "horizontal equity": that two people with the same abilities but different immutable characteristics should pay the same taxes
 That we should only use tags that cause higher income, not ones that merely correlate with them. Likewise, people seem more open to giving welfare based on things that cause it to be harder to make ends meet (e.g. # of kids, medical expenses) than to characteristics that merely correlate with the same thing.
Cash vs In-Kind Transfer
Naively, theory suggests governments should prefer cash transfers to in-kind transfers (e.g. food stamps, housing vouchers, etc). The argument for this is straightforward: (1) the person receiving the welfare knows what they need better than the government and (2) cash transfers are easier/cheaper for both the government and the recipient to handle.
However, some economists argue that in-kind transfers have their place (Nichols). For instance, suppose you have a soup kitchen that requires waiting in line to get soup.
A poor person might get 2 utils from soup and might lose 1 util from waiting, so they choose to wait in line for free soup.
A rich person might get 1 util from soup (they can easily afford better food) and lose 2 utils from waiting, so they choose not to wait in line for free soup.
In this way, in-kind transfers can closely target the people who have more free time relative to income (i.e. the poor).
There are some theoretical situations where cash transfers work better and others where in-kind transfers work better. Which is better in the real world depends a great deal on the specific situation.
Tax Incidence
One issue with periodic (e.g. monthly) welfare payments is that they cause a temporary surge in demand, which firms can take advantage of by raising prices.
For instance, even though it is illegal for companies to price-discriminate based on whether purchases are made with food stamps, the fact that food stamps are paid out at the beginning of the month means that the food stamp program causes a ~30% surge in demand for food during the beginning of the month in high food-stamp areas. Presumably for this reason, stores raise prices at the beginning of the month by ~2.5%. In this way, some of the social welfare created by food stamps makes its way to the stores rather than the intended recipients (Hastings). Conversely, people who don't receive food stamps are also hurt by the higher prices.
Note that if everyone received food stamps (to destigmatize them), this effect would be significantly larger because (1) we'd see a larger surge and (2) companies would no longer be limited in their ability to set prices "optimally". Today that ability is limited because, even in high food-stamp areas, the vast majority of customers aren't using food stamps.
Conversely, this demonstrates that, today, legally requiring that food stamps be treated as cash largely prevents firms from reaping the benefits of this social program for themselves. In a similar way, the fact that the EITC only applies to workers with kids helps prevent employers from taking its money; likewise, some people believe that the EITC causes general wage cuts, so non-EITC workers end up being hurt (Rothstein).
An alternative approach to estimating the effect of tax/welfare policy is to look at how asset prices change when a policy is announced (see, for example, Friedman).
Finally, it's worth pointing out that mandates are very different than taxes. For instance, if the government implements a 10% payroll tax to pay for a healthcare program, we'd expect firms and laborers to both bear some of that tax while the unhealthy benefit. In the Mirrlees labor model, this will reduce total hours worked.
However, if the government mandates that employers pay for healthcare and makes it illegal to discriminate based on health, then we'd expect (for the most part) employees to bear the brunt of the cost via lower pay, but we wouldn't expect their overall compensation to fall. Likewise, it's entirely possible in the Mirrlees labor model that total hours worked won't change.
In particular, if an employee values the mandated benefit at $\alpha$ times its cash cost, the distortion's size is $(1 - \alpha)$ times what achieving the mandate with a tax would be.
Philosophical Considerations
Philosophically, you can think of redistributive concerns as "insurance behind a veil of ignorance". However, you can also justify progressive taxation as "insurance in front of a veil of ignorance." The basic idea is that if you face an adverse event (e.g. you become blind), a progressive tax system makes it so you have to reduce your consumption by less than you otherwise would have. See Varian for a more mathematical analysis.
Intertemporal Models
TODO 31:15 from Topic 5: Income Taxation and Labor Supply part 3.
Commentary
To stay "objective", I don't include my own comments on this page except for this section.
My main comment is that all the income tax literature derived from the Mirrlees model is utter bullshit. It's all based on the assumption that only marginal tax rates affect labor decisions, when this is laughably false.
The obvious counterexample is welfare: if I give everyone \$100k for free, a large number of them will work less (if at all) even though marginal tax rates haven't changed.
I can only assume this assumption is used because achieving nice mathematical results using a more complete model is either exceptionally difficult or impossible, but how on Earth can economists be using this to justify policy recommendations?
The only exceptions to my ire are (1) that marginal tax rates should never exceed 100% and (2) Saez's optimal top tax rate result.
This is also, in my opinion, why there's the ludicrous consensus that lump sum taxes aren't distortionary, when they obviously are. For instance, if Alice is working 1 hour a week for \$10 and living off beans, rice, and propane, she's making \$520 per year. You can bet your soul that she'd work more if the government levied a \$1000 lump sum tax on her.
Seriously, God have mercy on their souls. The relentless obsession with marginal tax rates is the single biggest sin in economics.