Understand Bayes Theorem (prior/likelihood/posterior/evidence)

Bayes' theorem is a fundamental and widely used result in data mining and machine learning. Its formula is simple:

P(X|Y) = P(Y|X) * P(X) / P(Y)

which reads: Posterior = Likelihood * Prior / Evidence

So I wondered why each term has the name it does.

Let’s use an example to find out their meanings.

Example

Suppose we have 100 movies and 50 books.
There are 3 movie types: Action, Sci-fi, and Romance.
There are 2 book types: Sci-fi and Romance.

Of the 100 movies, 20 are Action, 30 are Sci-fi, and 50 are Romance.

Of the 50 books, 15 are Sci-fi and 35 are Romance.
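To double-check the numbers, here is a minimal Python sketch that stores the counts in a nested dict and adds them up per kind and per genre. The names `counts`, `kind_totals`, and `genre_totals` are just illustrative choices.

```python
from collections import Counter

# Counts from the example: kind of object -> genre -> number of objects.
counts = {
    "movie": {"Action": 20, "Sci-fi": 30, "Romance": 50},
    "book": {"Sci-fi": 15, "Romance": 35},
}

# Number of objects of each kind.
kind_totals = {kind: sum(genres.values()) for kind, genres in counts.items()}

# Number of objects of each genre, summed across kinds.
genre_totals = Counter()
for genres in counts.values():
    genre_totals.update(genres)

grand_total = sum(kind_totals.values())

print(kind_totals)         # {'movie': 100, 'book': 50}
print(dict(genre_totals))  # {'Action': 20, 'Sci-fi': 45, 'Romance': 85}
print(grand_total)         # 150
```

The genre totals (45 Sci-fi, 20 Action, 85 Romance) are the numerators used for the genre probabilities of an unclassified object.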

So, given an unclassified object picked at random from these 150:

The probability that it's a movie is 100/150, and 50/150 that it's a book.
The probability that it's Sci-fi is 45/150 (30 movies + 15 books), 20/150 for Action, and 85/150 for Romance.
If we already know it's a movie, the probability that it's an Action movie is 20/100, 30/100 for Sci-fi, and 50/100 for Romance.
If we already know it's a book, the probability that it's a Sci-fi book is 15/50, and 35/50 for Romance.
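The same probabilities can be computed straight from the counts. This sketch assumes every object is equally likely to be picked and uses Python's `fractions.Fraction` so the results stay exact. `Fraction` reduces automatically, so 100/150 prints as 2/3. The helper names `p_kind`, `p_genre`, and `p_genre_given_kind` are hypothetical.

```python
from fractions import Fraction

counts = {
    "movie": {"Action": 20, "Sci-fi": 30, "Romance": 50},
    "book": {"Sci-fi": 15, "Romance": 35},
}
total = sum(sum(genres.values()) for genres in counts.values())  # 150 objects


def p_kind(kind):
    """P(kind): share of all objects that are of this kind."""
    return Fraction(sum(counts[kind].values()), total)


def p_genre(genre):
    """P(genre): share of all objects with this genre, across all kinds."""
    return Fraction(sum(genres.get(genre, 0) for genres in counts.values()), total)


def p_genre_given_kind(genre, kind):
    """P(genre | kind): share of objects of this kind that have this genre."""
    return Fraction(counts[kind].get(genre, 0), sum(counts[kind].values()))


print(p_kind("movie"), p_kind("book"))                            # 2/3 1/3
print(p_genre("Sci-fi"), p_genre("Action"), p_genre("Romance"))   # 3/10 2/15 17/30
print(p_genre_given_kind("Action", "movie"))                      # 1/5
print(p_genre_given_kind("Sci-fi", "book"))                       # 3/10
```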

Now, given an object of type Sci-fi, we want to know the probability that it's a movie.

Using Bayes' theorem, the formula is:

P(Movie|Sci-fi) = P(Sci-fi|Movie) * P(Movie) / P(Sci-fi)

Here, P(Movie|Sci-fi) is called the Posterior,
P(Sci-fi|Movie) is the Likelihood,
P(Movie) is the Prior,
P(Sci-fi) is the Evidence.
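Plugging the example's numbers into the formula gives the posterior. This small Python sketch (variable names are mine) uses exact fractions and also compares the result with a direct count: 30 of the 45 Sci-fi objects are movies.

```python
from fractions import Fraction

# Numbers taken from the example.
prior = Fraction(100, 150)      # P(Movie)
likelihood = Fraction(30, 100)  # P(Sci-fi | Movie)
evidence = Fraction(45, 150)    # P(Sci-fi)

# Bayes' theorem: posterior = likelihood * prior / evidence
posterior = likelihood * prior / evidence  # P(Movie | Sci-fi)

# Direct count: 30 of the 45 Sci-fi objects are movies.
direct = Fraction(30, 45)

print(posterior, float(posterior))  # 2/3 0.6666666666666666
print(posterior == direct)          # True
```

Both routes give 2/3, about 0.67. Notice that the posterior equals the prior P(Movie) = 2/3 here: because P(Sci-fi|Movie) = P(Sci-fi|Book) = 3/10, learning that the object is Sci-fi makes it neither more nor less likely to be a movie. The direct count works because we have the full table of counts; Bayes' theorem is what you reach for when you only know the likelihood, prior, and evidence.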

Now let's see why they have these names.

Prior: Before we observe that it's a Sci-fi type, the object is completely unknown to us. Our goal is to find the probability that it's a movie, and we already have some information prior to (before) the observation: the probability that a completely unknown object is a movie, P(Movie).

Posterior: After we observe that it's a Sci-fi type, we know something about the object. Because this probability comes post (after) the observation, we call it the posterior: P(Movie|Sci-fi).

Evidence: We already know it's a Sci-fi type; what has happened has happened. We witnessed its appearance, so to us it is evidence, and the probability of getting this evidence is P(Sci-fi).
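If P(Sci-fi) were not given directly, it could be rebuilt from the likelihoods and priors with the law of total probability: P(Sci-fi) = P(Sci-fi|Movie) * P(Movie) + P(Sci-fi|Book) * P(Book). A minimal Python check with the example's numbers (the dict names are illustrative):

```python
from fractions import Fraction

prior = {"movie": Fraction(100, 150), "book": Fraction(50, 150)}             # P(kind)
sci_fi_likelihood = {"movie": Fraction(30, 100), "book": Fraction(15, 50)}   # P(Sci-fi | kind)

# Law of total probability: sum P(Sci-fi | kind) * P(kind) over every kind.
evidence = sum(sci_fi_likelihood[kind] * prior[kind] for kind in prior)

print(evidence)                       # 3/10
print(evidence == Fraction(45, 150))  # True
```

This is why the evidence is the same number no matter which kind we are asking about: it already averages over all of them.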

Likelihood: The dictionary meaning of this word is the chance or probability that something will happen. Here it means: if the object is a movie, how likely is it to also be a Sci-fi type? This term is very important in machine learning.
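One way to see why the likelihood deserves its own name: P(Sci-fi|Movie) is read with the observation (Sci-fi) held fixed while the hypothesis (Movie or Book) varies. Read this way, the values do not have to add up to 1, unlike P(genre|Movie) summed over genres. A small Python sketch with the example's numbers; `likelihood` is a hypothetical helper name, and Action books get probability 0 since the example has none.

```python
from fractions import Fraction

# P(genre | kind) from the example. There are no Action books, so that entry is 0.
p_genre_given_kind = {
    "movie": {"Action": Fraction(20, 100), "Sci-fi": Fraction(30, 100), "Romance": Fraction(50, 100)},
    "book": {"Action": Fraction(0), "Sci-fi": Fraction(15, 50), "Romance": Fraction(35, 50)},
}

# With the kind fixed, P(genre | kind) is a distribution over genres, so it sums to 1.
for kind, dist in p_genre_given_kind.items():
    print(kind, sum(dist.values()))
# movie 1
# book 1


def likelihood(genre):
    """Likelihood of each kind for a fixed observed genre: kind -> P(genre | kind)."""
    return {kind: dist[genre] for kind, dist in p_genre_given_kind.items()}


# With the observation fixed and the kind varying, the values need not sum to 1.
for genre in ("Action", "Sci-fi", "Romance"):
    values = likelihood(genre)
    print(genre, {k: str(v) for k, v in values.items()}, "sum =", sum(values.values()))
# Action {'movie': '1/5', 'book': '0'} sum = 1/5
# Sci-fi {'movie': '3/10', 'book': '3/10'} sum = 3/5
# Romance {'movie': '1/2', 'book': '7/10'} sum = 6/5
```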

So the timing of the observation is an important reason why these probabilities have the names they do.

23 thoughts on “Understand Bayes Theorem (prior/likelihood/posterior/evidence)”

1. purplechun Post author

Thanks for the recommendation!

1. Moin

Nice and easy explanation. Keep up the good work! :)

2. Hari Anantharaman

Cool .. Thank You.

3. Akansha

Beautiful explanation. It took me 45 minutes to find this article, and this is the best explanation ever.

4. Dave Hatharian

You have been incredibly helpful here! THANK YOU! I really appreciate the elegance of your example as it has COMPLETELY cleared up any misunderstanding I had.

5. Steen

Very good terminology! Excellent example too!

1. purplechun Post author

Thanks! 🙂

6. Armin

“Right now, we want to know that given an object which has type Sci-fi, what the probability is if it’s a movie?”
So the answer will be (30/100 * 100/150) / (45/100) which is around 0.44 right? Thank you very much for a nice explanation.

7. Armin

Sorry it should be (30/100 * 100/150) / (45/150) which gives around 0.67.. Is that the correct answer?

1. rotesl1cht

You don't have to calculate P(Evidence) since it's a constant :/

1. Godavari

Here we do have to calculate it, but when we calculate probabilities in Bayesian classification it is not required (it does not matter). Suppose we want to calculate P(y1|x) and P(y2|x).
Ultimately we need to find the higher value of the two.
As per the formula, they are P(x|y1)*P(y1)/P(x) and P(x|y2)*P(y2)/P(x). Check here: the denominator is the same, so we don't need to calculate P(x) to find the higher value of the two. If a is greater than b, then a/c is also greater than b/c (for c > 0). Hope you got the point.
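A minimal Python sketch of this shortcut with the post's numbers: score each kind by P(genre|kind) * P(kind), pick the larger score, and divide by the total of the scores (which equals P(genre)) only if the actual posteriors are needed. The function name `classify` and the dict layout are illustrative.

```python
from fractions import Fraction

prior = {"movie": Fraction(2, 3), "book": Fraction(1, 3)}  # P(kind)
likelihood = {  # P(genre | kind)
    "movie": {"Action": Fraction(1, 5), "Sci-fi": Fraction(3, 10), "Romance": Fraction(1, 2)},
    "book": {"Action": Fraction(0), "Sci-fi": Fraction(3, 10), "Romance": Fraction(7, 10)},
}


def classify(genre):
    # Unnormalized posterior P(genre | kind) * P(kind); P(genre) is the same for every kind.
    scores = {kind: likelihood[kind][genre] * prior[kind] for kind in prior}
    best = max(scores, key=scores.get)
    # The scores sum to P(genre), so dividing by the sum gives the true posteriors.
    evidence = sum(scores.values())
    posteriors = {kind: score / evidence for kind, score in scores.items()}
    return best, posteriors


best, posteriors = classify("Romance")
print(best)                                     # movie
print(posteriors["movie"], posteriors["book"])  # 10/17 7/17
```

The winner is the same with or without the division; the division only matters when you want the probabilities themselves.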

8. Dhvanan Shah

Simple and great explanation! Helped get a very good intuition about the terminology!

9. jacobli

Very clearly written.

10. Vinu Raja Kumar C

Great one…

12. Godavari

Before going directly into the formula, I want to explain something for a better understanding of the theorem.
P(x) is the probability of x, and P(y) is the probability of y.
P(x and y) is the probability that x happens and y also happens given that x happened, or equivalently that y happens and x also happens given that y happened. So now:

P(x and y) = P(x) * P(y|x) = P(y) * P(x|y)
So P(x) * P(y|x) = P(y) * P(x|y)
Now P(y|x) = P(x|y) * P(y) / P(x)
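The product rule in this derivation can be checked against the post's example: P(Movie and Sci-fi) is 30/150, and both factorizations should give that same value. A short Python check (variable names are illustrative):

```python
from fractions import Fraction

joint = Fraction(30, 150)                # P(Movie and Sci-fi): 30 Sci-fi movies out of 150 objects
p_movie = Fraction(100, 150)             # P(Movie)
p_scifi = Fraction(45, 150)              # P(Sci-fi)
p_scifi_given_movie = Fraction(30, 100)  # P(Sci-fi | Movie)
p_movie_given_scifi = Fraction(30, 45)   # P(Movie | Sci-fi), counted directly

# Both ways of factoring the joint probability agree.
print(p_movie * p_scifi_given_movie == joint)  # True
print(p_scifi * p_movie_given_scifi == joint)  # True

# Dividing by P(Sci-fi) recovers P(Movie | Sci-fi), as in the last line of the derivation.
print(p_scifi_given_movie * p_movie / p_scifi)  # 2/3
```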

1. Parnia

Nice article and a helpful comment! Thank you both

13. Ahmed

Great, thanks.
Very helpful and easy.

14. Imtiaz Ahmad

I still have doubts.

15. Jasper

If we instead want to calculate P(Sci-fi|Movie), will P(Sci-fi|Movie) then be our posterior in question?

In some sense, does it mean a conditional probability will be either a likelihood or a posterior depending on whether we already know it or want to test it?

Does a similar analogy apply to evidence and prior?

16. Ram Karki

With the same example you presented above, if you gave me 45 sci-fi objects, I already know that 30 of them are movie types and 15 of them are book types. Thus, I can directly compute the probability that a sci-fi object is a movie by dividing 30 by 45. So, why would I need to apply the Baye’s theorem to get that value? Please help me understand this issue.