
Bayes' Theorem: The Root of All Reasoning

[This post isn't a draft, but, upon further reflection, it's fairly disorganized. I want to rewrite it with more focus on the definition of evidence and how to evaluate the strength of evidence.]

Bayes' theorem in particular (and probability theory in general) offers the optimal way to reason under uncertainty. It is the “root of all reasoning” in the sense that an ideal reasoner would always change their beliefs according to these principles.

There are already lots of tutorials on Bayes' theorem on the internet. Some are formal encyclopedic articles (Wolfram MathWorld), while others are YouTube videos (Dana Scheider); some are brief introductions (Trinity University), while others contain in-depth, real-world applications (Better Explained).

So why am I writing this? Because, as far as I’ve found, all the other explanations rely on formulas and abstract examples. In this explanation, you’ll never even see Bayes’ theorem itself, and all our math will be basic arithmetic. Also, I’m going to show you how you can use these principles, even without numbers, to improve how you think critically about how the world works.

Example 1: Coins

To introduce the concept, let’s start with a simple example. Let’s say that I own two coins. One is a fair coin; the other is a trick coin that has heads on both sides. I randomly choose one coin, flip it, and tell you that it landed heads-up. Assuming (for the sake of the example) that everything I told you is completely true, how likely is it that I chose the trick coin?

There are four steps:

  1. Determine the priors.
  2. Condition on the theories.
  3. Eliminate outcomes based on evidence.
  4. Normalize the probabilities, so they add up to 100%.

Step 1: Priors

Before we do any flipping, there is a 50% chance that I am flipping the normal coin and a 50% chance that I am flipping the trick coin. These probabilities are called the priors - the probability of each theory before looking at the evidence. So, if a rectangle represented all probability space, it’d be split evenly 50-50:

  Fair Coin (50%) | Trick Coin (50%)

Step 2: Condition

If we condition on (assume) it being the trick coin, then we know that the coin must land heads-up. On the other hand, if we condition on me having the fair coin, then the coin is just as likely to land tails-up as heads-up. So, we can further divide our probabilities:

  Fair Coin, Heads (25%) | Trick Coin, Heads (50%)
  Fair Coin, Tails (25%) |

Step 3: Eliminate

Okay, finally, let’s use our new information: the coin landed heads-up. That means the “Fair Coin, Tails” region of the table is eliminated:
  Fair Coin, Heads (25%) | Trick Coin, Heads (50%)

Step 4: Normalize

However, we’d like all the probabilities to add up to 100%. To accomplish this we normalize our distribution, which is just a fancy way of saying “multiply everything by the right number to make it all add up to 100%.” In our case, the remaining probabilities add up to 75%, so multiplying everything by 100/75 (about 1.333) does this:
  Fair Coin, Heads (33%) | Trick Coin, Heads (67%)

And now you can see that the probability I flipped the trick coin is 67%, and the probability I flipped the fair coin is 33%.

I won’t go through all the math again, but if you think about it, you should be able to see that if the coin had landed tails-up, the probability of it being the fair coin would be 100%, and the probability of it being the trick coin 0%, because it’s impossible to get tails with the trick coin.

And that’s pretty much all there is to Bayesian reasoning. Let’s review the steps:

  1. List possible theories and their priors (how likely each theory is).
  2. Condition on each theory and compute how likely each outcome is.
  3. Eliminate the outcomes that didn’t happen.
  4. Normalize to make the probabilities add up to 100%.
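
If you prefer code, here is a minimal Python sketch of those four steps. The update function and its structure are my own invention for illustration, not any standard library:

    def update(priors, likelihoods, observed):
        """Bayesian update: list theories, condition, eliminate, normalize.

        priors:      {theory: prior probability}
        likelihoods: {theory: {outcome: probability of outcome given theory}}
        observed:    the outcome that actually happened
        """
        # Steps 1-2: probability of each theory producing the observed outcome.
        joint = {theory: priors[theory] * likelihoods[theory].get(observed, 0.0)
                 for theory in priors}
        # Step 3 is implicit: outcomes other than `observed` never enter the sum.
        # Step 4: normalize so the surviving probabilities add up to 100%.
        total = sum(joint.values())
        return {theory: p / total for theory, p in joint.items()}

    priors = {"fair": 0.5, "trick": 0.5}
    likelihoods = {"fair": {"heads": 0.5, "tails": 0.5},
                   "trick": {"heads": 1.0}}

    print(update(priors, likelihoods, "heads"))  # {'fair': 0.333..., 'trick': 0.666...}
    print(update(priors, likelihoods, "tails"))  # {'fair': 1.0, 'trick': 0.0}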

Example 2: God

Now that we’ve seen Bayes’ theorem in theory, let’s apply it in practice to a similar problem. Imagine we’re trying to determine whether God exists by finding out whether sick people who are prayed for get better faster, so we conduct a study where people are randomly assigned to the prayed-for group or the not-prayed-for group.

Step 1: Priors

How likely is it that God exists? This is (obviously) a subjective question, and it illustrates that probability theory doesn’t do everything for you. It allows you to take your beliefs and update them in light of new evidence - it does not tell you what to believe to start with. To make the math easier, I’m going to say there’s a 50-50 chance of God existing, but if you want to start with other probabilities, you should be able to follow along with similar reasoning.

  God Does Not Exist (50%) | God Exists (50%)

Step 2: Condition

Now, if you’re an atheist, you’d say the probability of the study finding a positive effect is about 5% (p-value): it’s possible that, just by random chance, the group of people who were prayed for got better faster than the group that wasn’t, and 5% is the uncertainty threshold typically used in statistical analysis.

  God Does Not Exist; Health Improves (2.5%)   | God Exists (50%)
  God Does Not Exist; Health Unchanged (47.5%) |

If you’re a theist, you have some wiggle room, as it depends on what exactly you believe. Again, to make the math easier, I’m going to assume that you think there’s a 50-50 chance of a study finding evidence of God answering prayers. If you want to try this with different probabilities, go for it!

  God Does Not Exist; Health Improves (2.5%)   | God Exists; Health Improves (25%)
  God Does Not Exist; Health Unchanged (47.5%) | God Exists; Health Unchanged (25%)

Step 3: Eliminate

Now, if we do this study and we find that prayer does seem to improve people’s recovery rates, we eliminate both “Health Unchanged” outcomes, and the probabilities become

  God Does Not Exist; Health Improves (2.5%) | God Exists; Health Improves (25%)

Step 4: Normalize

Then, we normalize (the surviving probabilities add up to 27.5%, so we multiply everything by 100/27.5, about 3.64) to get

  God Does Not Exist; Health Improves (9.1%) | God Exists; Health Improves (90.9%)

So, we conclude there is a 9.1% chance that God does not exist and a 90.9% chance that He does. While I won’t go through the math again, if the study had found no effect, the probabilities would be about a 66% chance of God not existing and a 34% chance of God existing.

Something to note is that these results don’t seem “fair”. If the study finds prayer is effective, God’s odds of existing jump from 50% all the way up to 91%. If no effect is found, then God’s odds only drop a bit: from 50% to 34%.

There’s a moral to this story: human intuitions about “fair” critical thinking aren’t always right. I don’t really think God exists, but His not answering prayers isn’t particularly strong evidence for this conclusion.
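
If you want to check the arithmetic yourself, you can reuse the update sketch from the coin example (again, my own toy code) on both possible study outcomes, and the asymmetry falls right out:

    priors = {"no god": 0.5, "god": 0.5}
    likelihoods = {"no god": {"improves": 0.05, "unchanged": 0.95},
                   "god":    {"improves": 0.50, "unchanged": 0.50}}

    print(update(priors, likelihoods, "improves"))   # {'no god': 0.0909..., 'god': 0.9090...}
    print(update(priors, likelihoods, "unchanged"))  # {'no god': 0.655..., 'god': 0.344...}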

Generalizing Probabilistic Reasoning

Okay, this is all nifty for math nerds, but why does this matter in real life?

Well, if you deal only with the probabilities 0 and 1, probabilistic reasoning simplifies into first-order logic (Stanford Encyclopedia of Philosophy), which is famous for the whole “Socrates is a man; all men are mortal; therefore, Socrates is mortal.”

In other words, first-order logic is a special case of probability theory. This means probabilistic reasoning can solve literally every problem traditional logic can solve, and many more. So, to the extent that you think logic is useful, probabilistic reasoning is at least as useful.
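
To see the reduction concretely, here is the Socrates syllogism run through the same toy update function, using a probability of 0 for the crucial step (the labels and the 50-50 numbers are just illustrative):

    # Theories about Socrates. "All men are mortal" means an immortal
    # being cannot be a man: P(man | immortal) = 0.
    priors = {"mortal": 0.5, "immortal": 0.5}
    likelihoods = {"mortal":   {"man": 0.5, "not a man": 0.5},
                   "immortal": {"man": 0.0, "not a man": 1.0}}

    # Observing "Socrates is a man" wipes out "immortal" entirely.
    # (The 0.5 under "mortal" is arbitrary; any nonzero value gives the same answer.)
    print(update(priors, likelihoods, "man"))  # {'mortal': 1.0, 'immortal': 0.0}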

Okay, fine. But, why bother with probabilistic reasoning if traditional logic is so much simpler?

The difference is rather straightforward: probabilistic reasoning can deal with uncertainty. Indeed, probabilistic reasoning forms the foundation of statistics, which has more-or-less taken over the hard and social sciences alike. So, I’d say its practicality for understanding the world is well verified.

Moreover, I think many basic statistical concepts also improve your ability to think critically, above and beyond just analyzing statistics themselves. I think the best way I can help you see this is to show how you can use probabilistic reasoning to think critically without using explicit numbers.

Conditioning: Reasoning Without Numbers

Remember, we looked at 4 steps:

  1. List possible theories and their priors (how likely each theory is).
  2. Condition on each theory and compute how likely each outcome is.
  3. Eliminate the outcomes that didn’t happen.
  4. Normalize to make the probabilities add up to 100%.

I don’t really have much to say about steps (3) and (4). There is some nerdy interestingness regarding (1) (see Solomonoff's theory of inductive inference), but the main thing to know about choosing your priors is simply Occam’s razor: “Among competing hypotheses, the one with the fewest assumptions should be selected” (Occam's razor).

Because of this, I want to focus on (2). I think the idea of conditioning is extremely powerful, because it

  1. encourages you to distinguish between objective causal relationships and your own values;
  2. encourages you to deal explicitly with your uncertainty, gradually shifting your beliefs in response to evidence rather than flip-flopping endlessly or pigheadedly believing the same thing;
  3. allows you to determine not only whether an argument is valid, but also how strong it is.

A belief that makes no predictions is useless, both in the sense that it can’t help you make decisions or plans and in the sense that no evidence can be gathered for or against it. Conversely, a belief is useful only to the extent it makes predictions, an idea called paying rent (Yudkowsky).

Likewise, Bayes' theorem implies that if you want to know whether a belief is true, the most important question you can ask is what it predicts. Conversely, the definition of evidence is an observation whose likelihood changes depending on whether the theory is true.
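
This also gives us a handle on the strength of evidence: the more an outcome’s likelihood differs between theories, the more observing it moves your beliefs. A toy illustration, again reusing the earlier update sketch with made-up numbers:

    # Weak evidence: the outcome is only slightly more likely under theory A.
    weak   = {"A": {"obs": 0.6, "other": 0.4},
              "B": {"obs": 0.5, "other": 0.5}}
    # Strong evidence: the outcome is far more likely under theory A.
    strong = {"A": {"obs": 0.9, "other": 0.1},
              "B": {"obs": 0.1, "other": 0.9}}

    priors = {"A": 0.5, "B": 0.5}
    print(update(priors, weak, "obs"))    # {'A': 0.545..., 'B': 0.454...} - barely moves
    print(update(priors, strong, "obs"))  # {'A': 0.9, 'B': 0.1} - moves a lot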

A Realistic Example

Let me give you an example. I think any reasonable person would agree that welfare reduces income inequality if you count welfare as income. However, I think some liberals believe that welfare also provides poor households with improved opportunities to increase their economic standing by (e.g.) going back to college, starting a business, or finding a better job.

The first thing we should note is that the proper question is not whether welfare improves the opportunities of poor households, but to what extent it does. However, we’re going to ignore this detail for now, because once you go down that rabbit hole, you pretty much have to use statistics.

So, let’s try to figure out whether that’s true without explicit probabilities: by conditioning.

Imagine, first, that the liberals are completely correct. What would we expect? Well, for instance, we’d expect countries similar to the US but with more extensive welfare programs to have lower pre-welfare income inequality, because welfare gives the poor improved opportunities.

Imagine, now, that the liberals are completely wrong. Then, we’d expect no such difference in pre-welfare income inequality.

Now, unlike in our previous examples, we can’t give specific probabilities. It’s possible that the liberals are correct but that cultural/social factors eliminate the benefits. It’s also possible that the liberals are wrong, but we see differences anyway due to social/cultural factors.

The next step is to check whether European countries actually do have reduced pre-welfare income inequality.

If there turns out to be no difference, this is evidence for the conservative hypothesis; if there is a significant reduction, this is evidence for the liberal hypothesis.

Of course, the evidence isn’t proof; it could be that other social differences between the US and Europe mess up the numbers. However, your degree of belief should change after the answer is revealed.

We’ll look more into this issue in another post, but for now, I’ll just tell you that pre-welfare income inequality is no lower in European countries than in the US (Gini in the bottle). Again, this isn’t proof, but it is evidence. So if you still believe welfare reduces pre-welfare income inequality, you should have even stronger evidence going the other direction.

Fallacies

Finally, probabilistic reasoning provides the main justification for a huge variety of fallacies. To be more precise, most fallacies are just special cases of probabilistic reasoning. Here are some examples:

  1. First of all, every logical fallacy follows directly from logic, which is just a special case of probabilistic reasoning.
  2. The anecdotal fallacy is using a personal experience instead of compelling evidence. This is a fallacy because you can find anecdotal evidence for almost any theory, whether the theory is true or false. This means that anecdotal evidence doesn’t really “cancel out” any probability space, making it not useful for having correct beliefs.
  3. The argument from fallacy is incorrectly reasoning that because an argument for X is bad, X must be false. However, you can come up with a bad argument for anything, so (again) this doesn’t eliminate any probability space, meaning it’s not useful for having correct beliefs.
  4. The ad hominem fallacy is attacking your opponent instead of their arguments. Because you can always attack your opponent, regardless of whether their theory is true, this also doesn’t eliminate any probability space.
I could go on. The point is, fallacies are just rules of thumb; if you learn the underlying probabilistic reasoning, you don’t need to memorize dozens of fallacies. Instead of identifying specific problems in arguments, you’ll be able to work from a solid foundation to begin with.
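
In each case the probabilistic diagnosis is the same: the “evidence” is about as likely whether the theory is true or false, so observing it eliminates almost no probability space. A quick demonstration with the same toy update function and invented numbers:

    # An anecdote is easy to find whether or not a theory is true, so its
    # likelihood is nearly identical under both theories.
    priors = {"theory true": 0.5, "theory false": 0.5}
    likelihoods = {"theory true":  {"anecdote found": 0.99, "none found": 0.01},
                   "theory false": {"anecdote found": 0.98, "none found": 0.02}}

    print(update(priors, likelihoods, "anecdote found"))
    # {'theory true': 0.502..., 'theory false': 0.497...} - the posterior barely moves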

Limitations

Probabilistic reasoning isn’t magical, and it has its limitations:

  1. Probabilistic reasoning can’t invent your theories for you - that still requires creativity and an in-depth understanding of the issue.
  2. Probabilistic reasoning doesn’t tell you what your priors are. How likely you think a theory is before you look at the evidence is purely subjective.
  3. Probabilistic reasoning is no substitute for scholarship. It doesn’t give you evidence; it just lets you weigh it. You still have to take the time and effort to become well-informed. That’s what most of this blog tries to accomplish.

All that being said, I hope I’ve shown you the power of probabilistic reasoning. Although most people know how to tell whether something is evidence, I am skeptical that we’re ever really taught how to weigh evidence. This, ultimately, is what I hope you can do a little better now.

References

Better Explained. An Intuitive (and Short) Explanation of Bayes’ Theorem. Retrieved from https://betterexplained.com/articles/an-intuitive-and-short-explanation-of-bayes-theorem/

Dana Scheider. Bayes' Theorem - Explained Like You're Five [video]. Retrieved from https://www.youtube.com/watch?v=2Df1sDAyRvQ

M, S. (2013, November 26). Gini in the bottle. The Economist. Retrieved from http://www.economist.com/blogs/democracyinamerica/2013/11/inequality-america

Occam's razor. (2017, May 12). In Wikipedia. Retrieved 19:01, May 20, 2017, from https://en.wikipedia.org/w/index.php?title=Special:CiteThisPage&page=Occam%27s_razor&id=780029050

p-value. (2017, May 6). In Wikipedia. Retrieved 18:57, May 20, 2017, from https://en.wikipedia.org/w/index.php?title=P-value&oldid=779078295

Solomonoff's theory of inductive inference. (2017, May 7). In Wikipedia. Retrieved 19:00, May 20, 2017, from https://en.wikipedia.org/w/index.php?title=Solomonoff%27s_theory_of_inductive_inference&oldid=779172590

Stanford Encyclopedia of Philosophy. Logic and Probability. Retrieved from https://plato.stanford.edu/entries/logic-probability/

Trinity University. An Introduction to Bayes' Theorem. Retrieved from https://web.archive.org/web/20160728072002/http://www.trinity.edu/cbrown/bayesweb/

Wolfram MathWorld. Bayes' Theorem. Retrieved from http://mathworld.wolfram.com/BayesTheorem.html

Yudkowsky, E. (2007). Making Beliefs Pay Rent (in Anticipated Experiences). Less Wrong. Retrieved from https://www.lesswrong.com/posts/a7n8GdKiAZRX86T5A/making-beliefs-pay-rent-in-anticipated-experiences