Bayes' theorem is a fundamental result used throughout data mining and machine learning. Its formula is simple:

P(X|Y) = ( P(Y|X) * P(X) ) / P(Y), which is Posterior = ( Likelihood * Prior ) / Evidence

I wondered why the four terms have these names.

Let’s use an example to find out their meanings.

## Example

Suppose we have 100 movies and 50 books.

There are 3 movie types: Action, Sci-fi, and Romance.

There are 2 book types: Sci-fi and Romance.

Of the 100 movies, 20 are Action, 30 are Sci-fi, and 50 are Romance. Of the 50 books, 15 are Sci-fi and 35 are Romance.

So, given an unclassified object:

The probability that it's a movie is 100/150, and the probability that it's a book is 50/150. The probability that its type is Sci-fi is 45/150, Action 20/150, and Romance 85/150.

If we already know it's a movie, then the probability that it's an Action movie is 20/100, Sci-fi 30/100, and Romance 50/100. If we already know it's a book, then the probability that it's a Sci-fi book is 15/50, and Romance 35/50.
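The probabilities above can be reproduced from the raw counts. The following is a minimal sketch, assuming the counts are stored in a nested dictionary; the names `counts`, `p_medium`, `p_genre`, and `p_genre_given_medium` are hypothetical helpers introduced only for illustration.

```python
from fractions import Fraction

# Counts taken from the example: counts[medium][genre]
counts = {
    "movie": {"Action": 20, "Sci-fi": 30, "Romance": 50},
    "book": {"Sci-fi": 15, "Romance": 35},
}

# Total number of objects (100 movies + 50 books = 150)
total = sum(sum(genres.values()) for genres in counts.values())


def p_medium(medium):
    """P(medium): share of all objects that are this medium."""
    return Fraction(sum(counts[medium].values()), total)


def p_genre(genre):
    """P(genre): share of all objects that have this genre."""
    return Fraction(sum(g.get(genre, 0) for g in counts.values()), total)


def p_genre_given_medium(genre, medium):
    """P(genre | medium): share of this medium's objects with this genre."""
    return Fraction(counts[medium].get(genre, 0), sum(counts[medium].values()))


print(p_medium("movie"))                        # 2/3, i.e. 100/150
print(p_genre("Sci-fi"))                        # 3/10, i.e. 45/150
print(p_genre_given_medium("Sci-fi", "movie"))  # 3/10, i.e. 30/100
```

`Fraction` keeps the values exact, so they can be compared directly with the hand-computed ratios.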

Now we want to know: given an object whose type is Sci-fi, what is the probability that it's a movie?

Using Bayes' theorem, we know that the formula is:

P(Movie|Sci-fi) = P(Sci-fi|Movie) * P(Movie) / P(Sci-fi)

Here, P(Movie|Sci-fi) is called the **Posterior**,

P(Sci-fi|Movie) is the **Likelihood**,

P(Movie) is the **Prior**,

P(Sci-fi) is the **Evidence**.
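To make the four terms concrete, here is a small sketch that plugs the example's numbers into the formula. The function name `bayes_posterior` is a hypothetical helper, not part of any library.

```python
from fractions import Fraction


def bayes_posterior(likelihood, prior, evidence):
    """Posterior = (Likelihood * Prior) / Evidence."""
    return likelihood * prior / evidence


likelihood = Fraction(30, 100)  # P(Sci-fi | Movie)
prior = Fraction(100, 150)      # P(Movie)
evidence = Fraction(45, 150)    # P(Sci-fi)

posterior = bayes_posterior(likelihood, prior, evidence)  # P(Movie | Sci-fi)

# Cross-check by counting directly: 30 of the 45 Sci-fi objects are movies.
direct = Fraction(30, 45)

print(posterior)            # 2/3
print(posterior == direct)  # True
```

With the full table of counts available, the direct count gives the same answer. The formula is useful when only the likelihood, prior, and evidence are known.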

Now let's see where these names come from.

**Prior**: **Before** we **observe** that it's a Sci-fi type, the object is completely unknown to us. Our goal is to find the probability that it's a movie, and we already have a value for that **prior to (before)** our **observation**: the probability that a completely unknown object is a movie, **P(Movie)**.

**Posterior**: **After** we **observe** that it's a Sci-fi type, we know something about the object. Because this probability applies **post (after)** the **observation**, we call it the **posterior**: P(Movie|Sci-fi).

**Evidence**: We already know that it's a Sci-fi type; what has happened has happened. We **witnessed** its appearance, so to us it is **evidence**, and the chance of getting this evidence is **P(Sci-fi)**.

**Likelihood**: The dictionary meaning of this word is the chance or probability that something will happen. Here it means the chance that the object is a Sci-fi type given that it's a movie: P(Sci-fi|Movie). This term is very important in machine learning.
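The four terms fit together as follows: multiplying each likelihood by its prior and summing over every possible class gives the evidence (the law of total probability), and dividing by the evidence turns those products into posteriors. The sketch below assumes the two classes are "movie" and "book", as in the example; the dictionary names are illustrative only.

```python
from fractions import Fraction

# Prior: P(medium), known before observing the genre
prior = {"movie": Fraction(100, 150), "book": Fraction(50, 150)}

# Likelihood: P(Sci-fi | medium)
likelihood = {"movie": Fraction(30, 100), "book": Fraction(15, 50)}

# Likelihood * Prior for each class (the numerator of Bayes' theorem)
unnormalized = {m: likelihood[m] * prior[m] for m in prior}

# Evidence: P(Sci-fi), obtained by summing the numerators over all classes
evidence = sum(unnormalized.values())

# Posterior: P(medium | Sci-fi)
posterior = {m: unnormalized[m] / evidence for m in prior}

print(evidence)            # 3/10, the same as 45/150
print(posterior["movie"])  # 2/3
print(posterior["book"])   # 1/3
```

Because the evidence is the same for every class, comparing the unnormalized products is enough to pick the most probable class. In this particular data set both likelihoods are 3/10, so observing Sci-fi leaves the probability of a movie at its prior value of 2/3.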

So the **time of observation** is an important reason why these probabilities have the names they do.

Guichi Zhao: I like that you explain it in your own words.

Also:

This article is also good.

purplechun (Post author): Thanks for the recommendation!

Moin: Nice and easy explanation. Keep up the good work :)

Hari Anantharaman: Cool. Thank you.

Akansha: Beautiful explanation. It took me 45 minutes to find this article, and this is the best explanation ever.

Dave Hatharian: You have been incredibly helpful here! THANK YOU! I really appreciate the elegance of your example as it has COMPLETELY cleared up any misunderstanding I had.

Steen: Very good terminology! Excellent example too!

purplechun (Post author): Thanks! 🙂

Armin: “Right now, we want to know that given an object which has type Sci-fi, what the probability is if it’s a movie?”

So the answer will be (30/100 * 100/150) / (45/100), which is around 0.44, right? Thank you very much for a nice explanation.

Armin: Sorry, it should be (30/100 * 100/150) / (45/150), which gives around 0.67. Is that the correct answer?

rotesl1cht: You don't have to calculate P(Evidence), since it's a constant :/

Godavari: Here we have to calculate it, but when we calculate probabilities in Bayes classification it is not required (it does not matter). Suppose we want to calculate P(y1|x) and P(y2|x).

Ultimately we need to find the higher of the two values.

As per the formula, these are P(x|y1) * P(y1) / P(x) and P(x|y2) * P(y2) / P(x). Notice that the denominator is the same, so we do not need to calculate P(x) to find the higher of the two. If a is greater than b, then a/c is also greater than b/c. Hope you got the point.

Searene: Thank you, this helped me a lot.

Dhvanan Shah: Simple and great explanation! It helped me get a very good intuition about the terminology!

jacobli: Very clearly written.

Vinu Raja Kumar C: Great one…


Godavari: Before going directly into the formula, I want to explain something to give a better understanding of the theorem.

P(x) is the probability of x, and P(y) is the probability of y.

P(x and y) is P(x) times the probability that y happens given that x happens, or equivalently P(y) times the probability that x happens given that y happens. So:

P(x and y) = P(x) * P(y|x) = P(y) * P(x|y)

So P(x) * P(y|x) = P(y) * P(x|y)

Now P(y|x) = P(x|y) * P(y) / P(x)

Parnia: Nice article and a helpful comment! Thank you both.

Ahmed: Great, thanks.

very helpful and easy

Imtiaz Ahmad: I still have doubts.

Jasper: If we instead want to calculate P(Sci-fi|Movie), will P(Sci-fi|Movie) then be the posterior in question?

In some sense, does this mean that a conditional probability is either a likelihood or a posterior, depending on whether we already know it or want to test it?

Does a similar analogy apply to evidence and prior?

Ram Karki: With the same example you presented above, if you gave me 45 Sci-fi objects, I already know that 30 of them are movies and 15 of them are books. Thus, I can directly compute the probability that a Sci-fi object is a movie by dividing 30 by 45. So why would I need to apply Bayes' theorem to get that value? Please help me understand this issue.