Thoughts on Optimal Investment

The Efficient Market Hypothesis

Generally speaking, there are two types of investment: active and passive. Active investing entails spending time, money, and effort to find good investments and good times to buy and sell in order to optimize your risk-adjusted rate of return, while passive investing entails picking one or more index funds and simply buying more of them each month.

There is, admittedly, a continuum here, but by and large passive investment tends to be as good as (if not superior to) active investment within an asset class for the majority of people Passive management. This claim is backed up by historical data, since low-fee index funds have outperformed most actively managed funds Liu. All of this supports the idea that financial markets are generally efficient and trying to "beat" them is typically foolhardy.

All that being said, the fact that financial markets are typically efficient is not a slam-dunk argument against applying intelligence to investment. In particular, there is good evidence markets are not efficient between countries Equity home bias puzzle, Backus–Kehoe–Kydland puzzle, Feldstein–Horioka puzzle. Moreover, different utility functions (i.e. different levels of risk aversion) imply different asset class mixes in some pretty significant ways. For instance, as I discuss later, the only people who buy treasury bonds are people who are extremely risk averse - or rather people who think they're extremely risk averse because they're bad at understanding risk quantitatively. What's more, there's at least some evidence the EMH doesn't really apply across asset classes.

For these reasons, I'm not going to talk about how to choose which particular stocks, bonds, etc. to invest in. Instead I'm going to discuss the other aspects of investing: which indices to use, how to allocate your savings between those indices to balance risk and return, and how to avoid large tax bills. If you're instead interested in "beating the stock market", a place to start is this dataset Marjanovic.


Any American who wants to understand optimal investment should have a solid understanding of the US tax code. I've listed the most important topics below, but this list isn't meant to explain any of these in-depth; it should instead be viewed as a good point to start your own research.

  1. Tax Advantaged Accounts - principally 401(k)s and IRAs Comparison of 401(k) and IRA accounts, but also HSAs and FSAs. After maxing out these contributions, you can save even more tax-advantaged money via the "mega backdoor" conversion Mega Backdoor Roths: How They Work. Technically, your house can be a tax-advantaged investment in that your mortgage interest may be tax deductible.
  2. Short Term Penalties - there are two main ways holding investments short-term is penalized by the US tax code. First, generally speaking, selling an asset less than a year after buying it will result in the gain being taxed as ordinary labor income. If you hold the asset for more than a year, the gain is generally taxed at the lower capital gains rates Topic No. 409 Capital Gains and Losses. Similarly, dividends are taxed as ordinary labor income by default, but if (a) you hold the stock for at least a couple of months and (b) the stock is in a US corporation, then you can generally pay the lower capital gains rates Qualified dividend.
  3. Tax loss harvesting - There are a number of tax advantages to selling an investment at a loss. Realized losses first cancel out capital gains made during the year. If your losses exceed your gains, up to $3,000 of the excess can be deducted from your ordinary labor income, and anything beyond that is carried forward to future years. If you keep harvesting losses and carrying them forward, then (with enough assets, discipline, and luck) you can leave around 10% more of your wealth to someone or something else when you die. Note: when selling for a tax loss, you can't buy the same (or a substantially identical) security within 30 days before or after the sale.
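As a rough illustration of how harvested losses get used up, here is a minimal Python sketch. It assumes the simplified ordering in IRS Topic No. 409 as I understand it: losses first offset the year's capital gains, up to $3,000 of any excess offsets ordinary income, and the rest carries forward. The function name and the dollar figures are hypothetical, and the sketch ignores the short-term/long-term split and filing status.

```python
def apply_capital_losses(gains, losses, carryover=0.0, ordinary_limit=3000.0):
    """Return (taxable_gains, ordinary_income_deduction, new_carryover).

    Simplified model: one bucket of gains and one bucket of losses.
    """
    net = gains - (losses + carryover)
    if net >= 0:
        # Gains absorb every loss; nothing is left to deduct or carry forward.
        return net, 0.0, 0.0
    excess_loss = -net
    ordinary_deduction = min(excess_loss, ordinary_limit)
    return 0.0, ordinary_deduction, excess_loss - ordinary_deduction


# Hypothetical year 1: $5,000 of gains and $12,000 of harvested losses.
year_one = apply_capital_losses(gains=5000, losses=12000)

# Hypothetical year 2: $10,000 of gains, no new losses, last year's carryover.
year_two = apply_capital_losses(gains=10000, losses=0, carryover=year_one[2])
```

In this toy case the first year wipes out the gains, deducts $3,000 from ordinary income, and carries $4,000 forward, which then shelters part of the second year's gains.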

Measuring Success

There are, financially speaking, two epochs to your adult life: saving for retirement and spending during retirement.

Given a fixed savings rate, the goal of the first epoch is to maximize your utility during retirement. Assuming you spend reasonably in retirement and assuming you have a typical utility-consumption elasticity of 0.35, that utility is given by

$$ u = -s^{-0.35} $$

where $s$ is how much you have saved when you retire. So, for investment pre-retirement, we want to find the asset mix that, historically, has optimized this utility function.
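To get a feel for what this utility function implies, here is a small Python sketch. The function name and the dollar amounts (in millions) are hypothetical; the point is only that a concave utility like this one prefers a sure amount to a gamble with the same average.

```python
def retirement_utility(savings, elasticity=0.35):
    """Utility u = -s^(-elasticity) of retiring with `savings`."""
    return -(savings ** (-elasticity))


# A guaranteed 1.0 (million) at retirement...
sure_thing = retirement_utility(1.0)

# ...versus a 50/50 gamble between 0.5 and 1.5, which has the same average.
gamble = 0.5 * retirement_utility(0.5) + 0.5 * retirement_utility(1.5)

print(f"sure thing: {sure_thing:.4f}, gamble: {gamble:.4f}")
```

The gamble has the same expected savings but lower expected utility, which is the sense in which this investor is risk averse.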

After you retire, the goals fundamentally change. Now you want to withdraw as much as possible while having minimal risk of going bankrupt. To determine this, we will find the asset mix that allows for the greatest withdrawals while never, historically speaking, going bankrupt.

An Easier Metric

The most common way to weight returns and risk is via the Sharpe ratio, which is defined as

$$ \frac{E[r]-r_0}{\sqrt{Var(r)}} $$

where $r_0$ is the risk-free rate of return.

However, I think that this measurement is silly (a rare moment where I disagree with the expert consensus). An easy way to see why is to notice that the only way to get a negative Sharpe ratio is to have expected returns below the risk-free rate. An investment that earns 0.01% higher expected returns than the risk-free rate but is incredibly risky (say an all-or-nothing coin flip) ends up with a positive Sharpe ratio, suggesting it's better overall than the risk-free option. This is obviously ridiculous.
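The coin-flip objection can be checked numerically. This is a minimal sketch with made-up numbers: a 2% risk-free rate and a bet that either loses everything or gains 104.02%, so its expected return is 2.01%.

```python
import math


def sharpe_ratio(outcomes, probabilities, risk_free):
    """Sharpe ratio of a discrete return distribution."""
    mean = sum(p * r for r, p in zip(outcomes, probabilities))
    variance = sum(p * (r - mean) ** 2 for r, p in zip(outcomes, probabilities))
    return (mean - risk_free) / math.sqrt(variance)


risk_free = 0.02                 # hypothetical risk-free rate
coin_flip = [-1.0, 1.0402]       # lose everything, or gain 104.02%
coin_flip_sharpe = sharpe_ratio(coin_flip, [0.5, 0.5], risk_free)

print(f"Sharpe ratio of the coin flip: {coin_flip_sharpe:.6f}")
```

The ratio comes out tiny but positive, even though half the time the investor is wiped out.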

The Sharpe ratio is based on the assumption that returns are normally distributed. This is wrong (log-returns skew negative and have fatter tails), but the model is still useful, and I'd like to suggest a different evaluation method based on the same assumption.

First, let our utility function be $-m^{-\epsilon}$, and then let $\mu$ and $\sigma^2$ be the mean and variance of our log-returns, respectively. We can then prove that expected utility is proportional to

$$ -e^{-\epsilon \left(\mu - \epsilon \cdot \sigma^2 / 2 \right)} $$

which increases with $\mu - \epsilon \cdot \sigma^2 / 2$.

This means choosing how to invest is equivalent to trying to optimize

$$ \begin{equation} E[R] - \frac{\epsilon \cdot Var[R]}{2} \end{equation} $$

where $R$ is real returns and $\epsilon \approx 0.35$.

To be clear, $R$ is referencing the distribution of returns over the time period of interest. For instance, if you're investing for 10 years, $R$ is the distribution of 10-year-returns.
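As a sanity check on this reduction, the sketch below compares a closed form with brute-force numerical integration. It assumes (my reading, not spelled out above) that wealth is $m = e^r$ with the log-return $r$ normally distributed, in which case $E[-m^{-\epsilon}] = -e^{-\epsilon(\mu - \epsilon \sigma^2 / 2)}$. The function names and the example means and variances are hypothetical.

```python
import math


def risk_adjusted_score(mean, variance, eps=0.35):
    """The quantity to maximize: E[R] - eps * Var[R] / 2."""
    return mean - eps * variance / 2


def expected_utility_closed_form(mean, variance, eps=0.35):
    """E[-exp(-eps * r)] for r ~ Normal(mean, variance)."""
    return -math.exp(-eps * risk_adjusted_score(mean, variance, eps))


def expected_utility_numeric(mean, variance, eps=0.35, steps=20000):
    """Same expectation via midpoint-rule integration over +/- 10 std devs."""
    sd = math.sqrt(variance)
    lo, hi = mean - 10 * sd, mean + 10 * sd
    width = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        r = lo + (k + 0.5) * width
        density = math.exp(-((r - mean) ** 2) / (2 * variance))
        density /= math.sqrt(2 * math.pi * variance)
        total += -math.exp(-eps * r) * density * width
    return total


# Two hypothetical investments: (mean, variance) of log-returns.
risky = (0.06, 0.03)
safe = (0.02, 0.001)
closed = expected_utility_closed_form(*risky)
numeric = expected_utility_numeric(*risky)
```

Under these assumptions, ranking investments by `risk_adjusted_score` gives the same order as ranking them by expected utility.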

Regarding the issue of kurtosis and skew in returns, I've only investigated kurtosis, using Student's t-distribution. I found that increasing kurtosis from 3 to 4 was nearly equivalent to increasing $\epsilon$ by 4.5%. This is, in my opinion, quite small and suggests that ignoring kurtosis can be largely compensated for by just being a little more risk-averse.

I haven't investigated skew and probably won't for the foreseeable future because I'm lazy.

Asset Allocation

Since we're committed to passive investment, our main question is simply how many eggs to put in different baskets. Your answer to this is called your "asset mix" and people typically think of it as being a mix of the following:

  • Domestic stocks
  • Foreign stocks
  • Domestic corporate bonds
  • Foreign corporate bonds
  • Treasury bonds
  • Gold
  • Housing/apartments

Using some mathematical properties of expectation and variance, we can take our earlier equation, which assumes a single investment,

$$ E[R] - \frac{\epsilon \cdot Var[R]}{2} $$

and generalize it to support multiple investments:

$$ \begin{equation} \sum_i^n{p_i \cdot E[R_i]} - \frac{\epsilon}{2} \left( \sum_{i,j}^{n,n}{p_i \cdot p_j \cdot Cov[R_i, R_j]} \right) \end{equation} $$

where $R_i$ represents the distribution of returns for an asset and $p_i$ represents the percent of our portfolio we have invested in that asset.

I know this formula looks really complicated, but it's really just a multivariate quadratic, which means it can be efficiently solved with quadratic programming Quadratic programming.
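To make the formula concrete, here is a minimal Python sketch that evaluates it and searches for the best long-only mix. Instead of a real quadratic programming solver it uses a brute-force grid over weights, which is only practical for a handful of assets. The expected returns and covariance matrix are invented for illustration and are not the historical data used later.

```python
from itertools import product


def portfolio_score(weights, means, cov, eps):
    """sum_i p_i E[R_i] - (eps / 2) * sum_ij p_i p_j Cov[R_i, R_j]."""
    n = len(weights)
    expected = sum(w * m for w, m in zip(weights, means))
    variance = sum(
        weights[i] * weights[j] * cov[i][j] for i in range(n) for j in range(n)
    )
    return expected - eps / 2 * variance


def best_long_only_mix(means, cov, eps, step=0.05):
    """Grid search over non-negative weights that sum to 1."""
    n = len(means)
    units = round(1 / step)
    best_score, best_weights = None, None
    for combo in product(range(units + 1), repeat=n - 1):
        if sum(combo) > units:
            continue
        # The last asset gets whatever weight is left over.
        weights = [c / units for c in combo] + [(units - sum(combo)) / units]
        score = portfolio_score(weights, means, cov, eps)
        if best_score is None or score > best_score:
            best_score, best_weights = score, weights
    return best_weights, best_score


# Hypothetical inputs, ordered as (stocks, housing, bonds).
means = [0.07, 0.05, 0.02]
cov = [
    [0.0400, 0.0050, 0.0000],
    [0.0050, 0.0100, 0.0010],
    [0.0000, 0.0010, 0.0025],
]

low_aversion_mix, _ = best_long_only_mix(means, cov, eps=0.5)
high_aversion_mix, _ = best_long_only_mix(means, cov, eps=5.0)
```

With these made-up numbers, the low risk-aversion investor puts everything in the highest-return asset, while the high risk-aversion investor diversifies.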

[I found out later that this formula was actually derived decades ago by Harry Markowitz in his seminal paper Portfolio Selection Harry Markowitz. My ego is a little stoked to have re-derived something myself that was published seven decades ago and was a large part of a Nobel prize.]

"Bootstrapping" Historical Data

There are two chief problems with using historical data. The first is obvious: past returns are no guarantee of future returns. The second is less obvious: we don't actually have that much historical data. The oldest dataset I could find was compiled by Shiller Shiller (of Case-Shiller fame). It only tracks two things, the S&P 500 and treasury bond interest rates, and goes back to 1871. 149 years might seem like a lot, but we're interested in 40-year investment horizons, and 149 years only contains 3.7 independent 40-year periods. For this reason, we'll have to be more "creative" with our analysis.

An interesting thing to note here is that if 1-day returns are independent, then expected value and covariance both grow in exact proportion to the length of the time period, which means the ranking of portfolios produced by the formula above is time-invariant. This means we could estimate the expected value and covariance from daily returns and get the same optimal allocation as we would from 10-year returns.
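The time-invariance claim can be spelled out in a few lines. This assumes (my assumption) that returns are measured as log-returns, so that an $n$-day return is the sum of $n$ daily returns, and that daily returns are independent with the same mean vector $\mu_1$ and covariance matrix $\Sigma_1$. Writing $p$ for the vector of portfolio weights $p_i$ and $R^{(n)}$ for the vector of $n$-day returns:

$$ E[R^{(n)}] = n \cdot \mu_1, \qquad Cov[R^{(n)}] = n \cdot \Sigma_1 $$

$$ p^\top E[R^{(n)}] - \frac{\epsilon}{2} \, p^\top Cov[R^{(n)}] \, p = n \left( p^\top \mu_1 - \frac{\epsilon}{2} \, p^\top \Sigma_1 \, p \right) $$

Since the factor $n$ is positive and doesn't depend on $p$, whichever allocation maximizes the 1-day objective also maximizes the $n$-day objective.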

If, conversely, daily returns are not independent but (say) monthly returns are, we can use monthly returns instead.

For methodological reasons, the Shiller dataset is really only accurate for annual return analysis, so we'll have to turn to a different dataset. I obtained daily S&P 500 total return data from Yahoo Finance S&P 500 (TR) (^SP500TR). Unfortunately, inflation is only estimated monthly rather than daily, so I ignored it for this analysis.

I computed the first four moments (mean, variance, skew, kurtosis) for returns every 1, 2, 3, 4, 5, 10, 20, 40, and 63 days. An important point to note is that I'm referring to trading days rather than calendar days. For this reason, the 20-day and 63-day returns are very close to monthly and quarterly returns.

Naturally, the mean return grew linearly with the timespan (as is mathematically required), but I found a statistically significant reduction in variance as the time period increased, relative to what independence would predict. This trend was mostly caused by shrinkage between 1-day and 3-day returns, but the sign persisted throughout the data with the exception of the step between 40 and 63 days.
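One simple way to run this kind of check is a variance ratio: compare the variance of k-day returns against k times the variance of 1-day returns. The sketch below is not necessarily the exact test used here. It assumes log-returns (so multi-day returns are sums) and non-overlapping windows, and the two toy series are made up to show the extremes.

```python
import statistics


def variance_ratio(log_returns, k):
    """Var(k-period returns) / (k * Var(1-period returns)).

    Roughly 1 if returns are independent, below 1 if they tend to reverse,
    above 1 if they tend to trend.
    """
    usable = len(log_returns) - len(log_returns) % k
    k_period = [sum(log_returns[i:i + k]) for i in range(0, usable, k)]
    return statistics.pvariance(k_period) / (k * statistics.pvariance(log_returns))


# Toy series where every gain is undone the next day (strong mean reversion).
reverting = [0.01, -0.01] * 50

# Toy series where moves come in same-signed pairs (short-term trending).
trending = [0.01, 0.01, -0.01, -0.01] * 25

reverting_ratio = variance_ratio(reverting, 2)
trending_ratio = variance_ratio(trending, 2)
```

A ratio below 1, as described above for the S&P 500 data, means multi-day variance is smaller than independence would predict.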

I also found that skew changed monotonically from -0.44 for 1-day returns to -1.23 for 40-day returns and kurtosis generally decreased from 14.6 to 6.1.

Together, this makes using daily or weekly returns to bootstrap long-term returns dubious at best. Though statistically insignificant, the same trends persisted through the 20- and 40-day returns, which makes even monthly returns suspect.

By the time we look at the difference between 40- and 63-day returns, the variance trend reverses, but not the skew or kurtosis trends. It's hard to know for sure, but since we know variance is far more important than kurtosis in determining asset optimality, it looks like treating quarterly returns as independent is plausibly okay for long-run analysis.

In addition to this empirical support, we have a good reason to prefer quarterly returns a priori: companies announce earnings every quarter, which can be responsible for large stock swings on those particular days. For this reason, earnings-call days probably can't be treated as coming from the same distribution as other trading days, which makes frequencies higher than quarterly theoretically suspect.

However, going forward, I'm going to ignore our ability to use quarterly returns for analysis since the oldest datasets I could find use only annual returns. The main point is that since quarterly returns are quite likely independent, annual returns are almost certainly independent.

Statistical Analysis

I put together real total annual returns for 5 asset classes from 1891 to 2015. I included (based on availability) returns on housing, the S&P 500, gold, government bills, and government bonds Shiller, Karsten, Compound Annual Growth Rate (Annualized Return), Jordà. If you want a convenient way to see this data, I've collected it in a spreadsheet. Note I do not include foreign stocks or corporate bonds in this analysis because I couldn't find sufficiently old datasets.

In any case, I used quadratic programming to solve for the optimal asset allocation across all five asset classes I had data for. I believe $\epsilon \approx 0.35$, but I've seen people suggest most investors assume $\epsilon \approx 1$, so I used $\epsilon = 0.5$ as a compromise. After solving, I found you should have invested only in domestic stocks (the S&P 500) from 1891 to 2015.

The devil, however, is in the details. For instance, the best asset class depends a lot on the decade. More generally, the earnings-to-price (e/p) ratio of the previous 5 years correlates with the subsequent 5-year returns for stocks but not housing:

| Asset Class | Slope |
| --- | --- |
| S&P 500 | +1.35 (CI = +0.18 to +2.52) |
| Housing | +0.18 (CI = -0.22 to +0.58) |
| S&P 500 - Housing | +1.17 (CI = +0.09 to +2.25) |

This is actually an important point. The e/p ratio has averaged 4.6% over the last 5 years, whereas it has historically averaged 7.4%. These facts and the slope above suggest that stocks will deliver ~3.6% lower expected returns over the next 5 years. This, in turn, has huge implications for the optimal investment portfolio, suggesting that you should only invest in real estate.

It makes sense that the e/p ratio predicts stock returns since the e/p acts as an effective long-term floor on stock returns. To see this, you just need to realize that any company that returns all profits as dividends will (in the long-run) have returns exactly equal to its e/p ratio. Assuming companies are profit maximizers, this proves the e/p ratio acts as a lower bound on returns in the long-run.

However, per the efficient-market hypothesis, we'd expect the e/p ratio to also predict other investment returns with similar strength. From the table we can see this, historically, hasn't been true.

Finally, I should admit that I don't think rational investors put money in gold or government bills/bonds. The reason is that even for very high values of $\epsilon$ those investments don't make sense given the available alternatives, and (as we'll get to in a bit) this holds equally true for retirement income. For this reason, while I do believe these markets are individually efficient, I don't think allocations between them are efficient. However, I do think reasonable people invest in both stocks and housing, so I'm only really interested in the discrepancy between how e/p predicts the S&P 500 and how it (doesn't) predict housing.

Pre & Post Retirement

All this analysis has assumed you have a fixed nest egg that you are neither adding money to nor withdrawing money from.

If you are still contributing money to your nest egg, then you are even more risk tolerant than our simple model investor, which makes the risky S&P 500 even more attractive relative to safe bonds and housing than the above analysis considered.

To determine the optimal asset allocation in retirement, I determined which allocation would have allowed you to withdraw the largest fixed amount of money per year (in real terms) without ever running out over any historical 40-year period - a number known as the safe withdrawal rate (SWR). Given the 5 above investment options, the best allocation is 39% housing and 61% housing - an allocation with a SWR of 5.3%. Note, the S&P 500 alone has a SWR of just 4.2%, so this is a significant improvement.
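Here is a minimal Python sketch of how a safe withdrawal rate could be computed from a series of annual real returns. The details are my assumptions rather than the exact procedure used here: withdrawals are a fixed fraction of the starting balance taken at the start of each year, and the SWR is the worst case over every rolling window. The example return series is invented.

```python
def survives(returns, rate):
    """True if withdrawing `rate` of the initial balance each year never overdraws."""
    balance = 1.0
    for r in returns:
        balance -= rate            # withdraw at the start of the year (real terms)
        if balance < 0:
            return False
        balance *= 1 + r           # then the remainder earns that year's return
    return True


def max_withdrawal_rate(returns, tol=1e-6):
    """Largest sustainable rate for one return sequence, found by bisection."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if survives(returns, mid):
            lo = mid
        else:
            hi = mid
    return lo


def safe_withdrawal_rate(annual_returns, horizon=40):
    """Worst-case sustainable rate over every rolling `horizon`-year window."""
    windows = range(len(annual_returns) - horizon + 1)
    return min(
        max_withdrawal_rate(annual_returns[start:start + horizon])
        for start in windows
    )


# Hypothetical history: a steady 5% real return for 45 years.
example_returns = [0.05] * 45
example_swr = safe_withdrawal_rate(example_returns, horizon=40)
```

To compare allocations, you would feed in the blended annual return of each candidate portfolio and keep the one with the highest result.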

We can do a bit better by changing our allocation over the course of our retirement. In particular, by starting at (0% S&P 500, 100% housing) and shifting to (80% S&P 500, 20% housing) over the 40 years, our SWR increases to 5.6%.

The reasoning behind this shift is that how you do in retirement depends a great deal on when the market moves. In particular, two decades of steady growth followed by a huge drop is much better for you than a huge drop followed by two decades of steady growth (you can prove this to yourself by using a toy example). That's why the optimal asset allocation is to hold safe investments during the early parts of your retirement and shift to riskier (i.e. higher-return) investments later on.
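Here is one such toy example in Python. All the numbers are made up: 5% annual growth for 20 years, a single 50% crash, and a withdrawal of 4% of the starting balance at the start of each year.

```python
def final_balance(returns, withdrawal, start=1.0):
    """Balance after withdrawing a fixed amount each year, then earning the return.

    A negative result means the money ran out along the way.
    """
    balance = start
    for r in returns:
        balance = (balance - withdrawal) * (1 + r)
    return balance


growth_then_crash = [0.05] * 20 + [-0.5]
crash_then_growth = [-0.5] + [0.05] * 20

# While withdrawing, the order of returns matters a great deal.
late_crash = final_balance(growth_then_crash, withdrawal=0.04)
early_crash = final_balance(crash_then_growth, withdrawal=0.04)

# With no withdrawals, the order makes no difference at all.
late_crash_no_spend = final_balance(growth_then_crash, withdrawal=0.0)
early_crash_no_spend = final_balance(crash_then_growth, withdrawal=0.0)

print(f"late crash: {late_crash:.3f}, early crash: {early_crash:.3f}")
```

With these numbers the retiree who suffers the crash last still has money left, while the one who suffers it first runs out before the 21 years are over.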

Note: this recommendation goes against conventional financial wisdom which generally says to invest more conservatively as you grow older.


  • Learn how to avoid taxes.
  • While saving for retirement, invest your money solely in the S&P 500 and housing, probably with an emphasis on the former.
  • After retiring, you can probably safely withdraw ~5% of your initial assets each year (adjusting for inflation) if you start out invested exclusively in housing and shift towards a more S&P 500-focused portfolio as you age.

Future Lines of Inquiry

  • Investigate how skew (not just kurtosis) affects the formula.
  • Investigate how uncertainty in the estimated covariance matrix affects optimal asset allocation. A good jumping off point is Wishart distribution.
  • This paper Fundamental Credit Special has a table of decade-based US corporate bond returns back to 1900 on page 10. They also have a graph that strongly suggests the authors have access to annual returns, but I can't find them on the internet. They claim corporate bonds have averaged 2.2% real annual returns since 1900. This is barely better than treasury bonds (1.5%), so I doubt bonds other than high-yield "junk" bonds are a sensible investment. As a sanity check on their data, they suggest equities have averaged 6.0% real returns, which is virtually identical to the returns given by Macrohistory. The other data I found on the topic was a graph of overall bond yields going back to 1926 When will we get back to average market returns. I've included that data (transcribed to numeric form) in the aforementioned spreadsheet. It generally also finds quite low returns (~2.5%), which confirms my general feeling that bonds are a poor investment.
  • I need to include international financial markets in the analysis. I recall seeing a dataset that cost a lot of money that had international returns going back about a century. I'm not willing to drop thousands of dollars on this, but this suggests that international returns also go back a long time - I just need to find them.

Wikipedia contributors. (2020, June 12). Passive management. In Wikipedia, The Free Encyclopedia. Retrieved 21:54, June 21, 2020.

Liu, B., Preston, H. (2018). SPIVA® Institutional Scorecard: How Much Do Fees Affect the Active versus Passive Debate?

Marjanovic, B. (2017). Huge Stock Market Dataset.

Shiller, R. J. Online Data Robert Shiller.

Wikipedia contributors. (2020, February 2). Comparison of 401(k) and IRA accounts. In Wikipedia, The Free Encyclopedia. Retrieved 21:20, June 21, 2020.

Coombes, A. (2020). Mega Backdoor Roths: How They Work. Nerd Wallet.

Topic No. 409 Capital Gains and Losses. (2020). Internal Revenue Service.

Wikipedia contributors. (2020, March 8). Qualified dividend. In Wikipedia, The Free Encyclopedia. Retrieved 21:37, June 21, 2020.

S&P 500 (TR) (^SP500TR). Yahoo Finance.

Wikipedia contributors. (2020, June 26). Quadratic programming. In Wikipedia, The Free Encyclopedia. Retrieved 05:37, June 30, 2020.

CPI for All Urban Consumers (CPI-U). U.S. Bureau of Labor Statistics.

Karsten. (2018). EarlyRetirementNow SWR Toolbox v2.0 - save your own copy before editing!

Compound Annual Growth Rate (Annualized Return).

Jordà, Ò., Schularick, M., and Taylor, A. Jordà-Schularick-Taylor Macrohistory Database. Macrofinance & Macrohistory Lab.

Markowitz, H. (1952). Portfolio Selection. The Journal of Finance, 7(1), 77-91.

Wikipedia contributors. (2020, July 14). Harry Markowitz. In Wikipedia, The Free Encyclopedia. Retrieved 17:18, July 22, 2020.

Wikipedia contributors. (2020, June 4). Wishart distribution. In Wikipedia, The Free Encyclopedia. Retrieved 03:39, July 23, 2020.

Reid, J., Burns, N., Ademakinwa, A. (2008). Fundamental Credit Special: 100 Years of Corporate Bond Returns Revisited. Deutsche Bank.

When will we get back to average market returns? (2018). Vanguard.

Wikipedia contributors. (2019, December 4). Equity home bias puzzle. In Wikipedia, The Free Encyclopedia. Retrieved 20:52, September 9, 2020.

Wikipedia contributors. (2020, June 23). Backus–Kehoe–Kydland puzzle. In Wikipedia, The Free Encyclopedia. Retrieved 20:52, September 9, 2020.

Wikipedia contributors. (2020, July 9). Feldstein–Horioka puzzle. In Wikipedia, The Free Encyclopedia. Retrieved 20:52, September 9, 2020.