Understand Bayes Theorem (prior/likelihood/posterior/evidence)

Bayes' theorem is a fundamental and widely used result in data mining and machine learning. Its formula is simple:

P(X|Y) = ( P(Y|X) * P(X) ) / P(Y), that is, Posterior = ( Likelihood * Prior ) / Evidence

So I was wondering why each term has the name it does.

Let’s use an example to find out their meanings.


Suppose we have 100 movies and 50 books.
There are 3 movie types: Action, Sci-fi, and Romance,
and 2 book types: Sci-fi and Romance.

20 of those 100 movies are Action.
30 are Sci-fi
50 are Romance.

15 of those 50 books are Sci-fi
35 are Romance

So, given an unclassified object:

The probability that it's a movie is 100/150, and 50/150 that it's a book.
The probability that it's Sci-fi is 45/150, 20/150 for Action, and 85/150 for Romance.
If we already know it's a movie, the probability that it's an Action movie is 20/100, 30/100 for Sci-fi, and 50/100 for Romance.
If we already know it's a book, the probability that it's a Sci-fi book is 15/50, and 35/50 for Romance.
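
The probabilities listed above can be reproduced directly from the counts. The following is a minimal sketch in Python; the dictionary layout and the helper names (p_medium, p_genre, p_genre_given_medium) are illustrative assumptions, not part of the original example.

```python
from fractions import Fraction

# Counts taken from the example: (medium, genre) -> number of items.
counts = {
    ("movie", "Action"): 20,
    ("movie", "Sci-fi"): 30,
    ("movie", "Romance"): 50,
    ("book", "Sci-fi"): 15,
    ("book", "Romance"): 35,
}
total = sum(counts.values())  # 150 objects in all


def p_medium(medium):
    """Marginal probability that a random object is a movie or a book."""
    n = sum(c for (m, _), c in counts.items() if m == medium)
    return Fraction(n, total)


def p_genre(genre):
    """Marginal probability that a random object has the given genre."""
    n = sum(c for (_, g), c in counts.items() if g == genre)
    return Fraction(n, total)


def p_genre_given_medium(genre, medium):
    """Conditional probability of the genre once the medium is known."""
    n_medium = sum(c for (m, _), c in counts.items() if m == medium)
    return Fraction(counts.get((medium, genre), 0), n_medium)


print(p_medium("movie"))                        # 2/3  (= 100/150)
print(p_genre("Sci-fi"))                        # 3/10 (= 45/150)
print(p_genre_given_medium("Sci-fi", "movie"))  # 3/10 (= 30/100)
```

Fractions are used instead of floats so the results can be compared exactly with the hand-computed values.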

Now we want to know: given an object of type Sci-fi, what is the probability that it’s a movie?

Using Bayes theorem, we know that the formula is:

P(movie|Sci-fi) = P(Sci-fi|movie) * P(movie) / P(Sci-fi)

Here, P(movie|Sci-fi) is called the Posterior,
P(Sci-fi|movie) is the Likelihood,
P(movie) is the Prior,
P(Sci-fi) is the Evidence.
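
Plugging the counts from the example into this formula gives a quick worked check, using only the numbers stated above:

```latex
P(\text{movie} \mid \text{Sci-fi})
  = \frac{P(\text{Sci-fi} \mid \text{movie}) \, P(\text{movie})}{P(\text{Sci-fi})}
  = \frac{\frac{30}{100} \cdot \frac{100}{150}}{\frac{45}{150}}
  = \frac{30}{45}
  = \frac{2}{3} \approx 0.67
```

This agrees with counting directly: 30 of the 45 Sci-fi objects are movies.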

Now let’s see why they have these names.

Prior: Before we observe that it’s Sci-fi, the object is completely unknown to us. Our goal is to find the probability that it’s a movie, and we already have one such number prior to (before) our observation: the probability that a completely unknown object is a movie, P(movie).

Posterior: After we observe that it’s Sci-fi, we know something about the object. Because this probability comes post (after) the observation, we call it the posterior: P(movie|Sci-fi).

Evidence: We already know it’s Sci-fi; what has happened has happened. We witnessed its appearance, so to us it is evidence, and the chance of getting this evidence is P(Sci-fi).

Likelihood: The dictionary meaning of this word is the chance or probability that something will happen. Here it means the chance that the object is Sci-fi given that it is a movie: P(Sci-fi|movie). This term is very important in machine learning.

So the timing of the observation is an important reason why these probabilities are named the way they are.
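
The before/after view of the observation can be sketched as a small update step. The code below is a minimal illustration in Python, not a definitive implementation; the function name posterior and the dictionary layout are assumptions made for this sketch. It also relies on the law of total probability, which the text above does not state explicitly: the evidence is the sum of likelihood * prior over all classes.

```python
from fractions import Fraction


def posterior(priors, likelihoods):
    """Update beliefs about each class after one observation.

    priors:      class -> P(class), known before the observation
    likelihoods: class -> P(observation | class)
    Returns (class -> P(class | observation), evidence).
    """
    # Numerator of Bayes' theorem for every class: likelihood * prior.
    joint = {c: likelihoods[c] * priors[c] for c in priors}
    # Evidence P(observation): total probability over all classes.
    evidence = sum(joint.values())
    return {c: joint[c] / evidence for c in joint}, evidence


# Numbers from the example; the observation is "the object is Sci-fi".
priors = {"movie": Fraction(100, 150), "book": Fraction(50, 150)}
likelihoods = {"movie": Fraction(30, 100), "book": Fraction(15, 50)}

post, evidence = posterior(priors, likelihoods)
print(evidence)       # 3/10, the same as 45/150
print(post["movie"])  # 2/3
print(post["book"])   # 1/3
```

In this particular data set both likelihoods happen to equal 3/10, so observing Sci-fi does not shift the belief: the posterior for movie (2/3) equals its prior (100/150).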

23 thoughts on “Understand Bayes Theorem (prior/likelihood/posterior/evidence)”

  1. Akansha

    Beautiful explanation. It took me 45 minutes to find this article and this is the best explanation ever.

  2. Dave Hatharian

    You have been incredibly helpful here! THANK YOU! I really appreciate the elegance of your example as it has COMPLETELY cleared up any misunderstanding I had.

  3. Armin

    “Right now, we want to know that given an object which has type Sci-fi, what the probability is if it’s a movie?”
    So the answer will be (30/100 * 100/150) / (45/150), which is around 0.67, right? Thank you very much for a nice explanation.

      1. Godavari

        Here we have to calculate the value, but when we use probabilities in Bayes classification the denominator is not required (it does not matter). Suppose we want to compare P(y1|x) and P(y2|x).
        Ultimately we only need to find the higher of the two values.
        By the formula they are P(x|y1) * P(y1) / P(x) and P(x|y2) * P(y2) / P(x). Notice that the denominator is the same, so we do not need to calculate P(x) to find the higher of the two: if a is greater than b, then a/c is also greater than b/c (for positive c). Hope you got the point.

  4. Pingback: Quora

  5. Godavari

    Before going directly into the formula, I want to explain something to help in understanding the theorem.
    P(x) is the probability of x and P(y) is the probability of y.
    P(x and y) is P(x) times the probability that y happens given that x happens, or equivalently P(y) times the probability that x happens given that y happens. So:

    P(x and y) = P(x) * P(y|x) = P(y) * P(x|y)
    So P(x) * P(y|x) = P(y) * P(x|y)
    Now P(y|x) = P(x|y) * P(y) / P(x)

  6. Jasper

    If we instead want to calculate P(Sci-fi|Movie), does P(Sci-fi|Movie) then become the posterior in question?

    In some sense, does this mean a conditional probability is either the likelihood or the posterior, depending on whether we already know it or want to find it?

    Does a similar analogy apply to the evidence and the prior?

  7. Ram Karki

    With the same example you presented above, if you gave me 45 Sci-fi objects, I already know that 30 of them are movies and 15 of them are books. Thus, I can directly compute the probability that a Sci-fi object is a movie by dividing 30 by 45. So why would I need to apply Bayes’ theorem to get that value? Please help me understand this issue.

