Bayes' theorem is a fundamental and widely used theorem in data mining and machine learning. Its formula is pretty simple:
P(X|Y) = ( P(Y|X) * P(X) ) / P(Y), which is Posterior = ( Likelihood * Prior ) / Evidence
So I was wondering why the four terms are named that way.
Let’s use an example to find out their meanings.
Suppose we have 100 movies and 50 books.
There are 3 different movie types: Action, Sci-fi, Romance.
There are 2 different book types: Sci-fi, Romance.
20 of those 100 movies are Action, 30 are Sci-fi, and 50 are Romance. 15 of those 50 books are Sci-fi and 35 are Romance.
So given an unclassified object:
The probability that it's a movie is 100/150, and 50/150 that it's a book. The probability that it's a Sci-fi type is 45/150, 20/150 for Action and 85/150 for Romance.
If we already know it's a movie, then the probability that it's an Action movie is 20/100, 30/100 for Sci-fi and 50/100 for Romance. If we already know it's a book, then the probability that it's a Sci-fi book is 15/50, and 35/50 for Romance.
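The probabilities above can be reproduced by simple counting. Here is a minimal sketch in Python; the names `counts`, `p_kind`, `p_genre` and `p_genre_given_kind` are made up for this illustration, and only the numbers come from the example.

```python
from fractions import Fraction

# Counts from the example: (kind, genre) -> number of objects.
counts = {
    ("movie", "Action"): 20,
    ("movie", "Sci-fi"): 30,
    ("movie", "Romance"): 50,
    ("book", "Sci-fi"): 15,
    ("book", "Romance"): 35,
}
total = sum(counts.values())  # 150 objects in all


def p_kind(kind):
    """P(kind): share of all objects that are movies (or books)."""
    return Fraction(sum(n for (k, _), n in counts.items() if k == kind), total)


def p_genre(genre):
    """P(genre): share of all objects that have this genre."""
    return Fraction(sum(n for (_, g), n in counts.items() if g == genre), total)


def p_genre_given_kind(genre, kind):
    """P(genre | kind): share of the objects of this kind that have this genre."""
    in_kind = sum(n for (k, _), n in counts.items() if k == kind)
    return Fraction(counts.get((kind, genre), 0), in_kind)


print(p_kind("movie"))                       # 2/3, i.e. 100/150
print(p_genre("Sci-fi"))                     # 3/10, i.e. 45/150
print(p_genre_given_kind("Sci-fi", "movie")) # 3/10, i.e. 30/100
```

`Fraction` is used instead of floats so the results stay exact and can be compared with the fractions written above (it prints them in lowest terms).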
Now we want to know: given an object of type Sci-fi, what is the probability that it's a movie?
Using Bayes' theorem, the formula is:
P(Movie|Sci-fi) = P(Sci-fi|Movie) * P(Movie) / P(Sci-fi)
Here, P(Movie|Sci-fi) is called the Posterior,
P(Sci-fi|Movie) is the Likelihood,
P(Movie) is the Prior,
P(Sci-fi) is the Evidence.
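Plugging the numbers from the example into this formula gives the value directly. The snippet below is only a sketch; the function name `posterior` and the variable names are chosen for illustration.

```python
from fractions import Fraction


def posterior(likelihood, prior, evidence):
    """Bayes' theorem: Posterior = (Likelihood * Prior) / Evidence."""
    return likelihood * prior / evidence


likelihood = Fraction(30, 100)  # P(Sci-fi|Movie): 30 of the 100 movies are Sci-fi
prior = Fraction(100, 150)      # P(Movie): 100 of the 150 objects are movies
evidence = Fraction(45, 150)    # P(Sci-fi): 45 of the 150 objects are Sci-fi

result = posterior(likelihood, prior, evidence)
print(result)         # 2/3
print(float(result))  # about 0.67

# Cross-check by counting: 30 of the 45 Sci-fi objects are movies.
direct = Fraction(30, 45)
print(result == direct)  # True
```

So in this small example the formula and direct counting agree: about 0.67.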
Now let's see why they are named that way.
Prior: Before we observe that it's a Sci-fi type, the object is completely unknown to us. Our goal is to find the probability that it's a movie, and we already have a value for that prior to (before) our observation: the probability that a completely unknown object is a movie, P(Movie).
Posterior: After we observe that it's a Sci-fi type, we know something about the object. Because this probability comes post (after) the observation, we call it the posterior: P(Movie|Sci-fi).
Evidence: We already know it's a Sci-fi type; what has happened has happened. We witnessed its appearance, so to us it's a piece of evidence, and the chance of getting this evidence is P(Sci-fi).
Likelihood: The dictionary meaning of this word is the chance or probability that something will happen. Here it means the chance that the object is a Sci-fi type given that it's a movie: P(Sci-fi|Movie). This term is very important in machine learning.
So the timing of the observation is a big part of why these probabilities are named the way they are.
Nice and easy explanation. Keep up the good work :)
Cool .. Thank You.
Beautiful explanation. It took me 45 minutes to find out this article and this is the best explanation ever.
You have been incredibly helpful here! THANK YOU! I really appreciate the elegance of your example as it has COMPLETELY cleared up any misunderstanding I had.
Very good terminology! Excellent example too!
“Right now, we want to know that given an object which has type Sci-fi, what the probability is if it’s a movie?”
So the answer will be (30/100 * 100/150) / (45/100), which is around 0.44, right? Thank you very much for a nice explanation.
Sorry, it should be (30/100 * 100/150) / (45/150), which gives around 0.67. Is that the correct answer?
You don't have to calculate P(Evidence) since it's a constant :/
Here we have to calculate it, but when we compute probabilities for Bayes classification it is not required (it does not matter). Suppose we want to compare P(y1|x) and P(y2|x).
Ultimately we only need to find the higher of the two values.
By the formula these are P(x|y1)*P(y1)/P(x) and P(x|y2)*P(y2)/P(x). Notice that the denominator is the same, so we do not need to calculate P(x) to find the higher of the two. If a is greater than b, then a/c is also greater than b/c (for positive c). Hope you got the point.
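This can be checked with the numbers from the post, taking x = Sci-fi, y1 = movie and y2 = book. The sketch below uses made-up dictionary names and is only an illustration of the comparison, not a full classifier.

```python
from fractions import Fraction

# Likelihoods P(Sci-fi|class) and priors P(class), from the post's counts.
likelihood = {"movie": Fraction(30, 100), "book": Fraction(15, 50)}
prior = {"movie": Fraction(100, 150), "book": Fraction(50, 150)}

# Numerators only: Likelihood * Prior, without dividing by the evidence.
unnormalized = {c: likelihood[c] * prior[c] for c in prior}

# The evidence P(Sci-fi) is the sum of the numerators over both classes.
evidence = sum(unnormalized.values())
normalized = {c: v / evidence for c, v in unnormalized.items()}

# Dividing every score by the same positive number keeps the order,
# so the winning class is the same with or without the evidence.
best_unnormalized = max(unnormalized, key=unnormalized.get)
best_normalized = max(normalized, key=normalized.get)

print(unnormalized["movie"], unnormalized["book"])  # 1/5 1/10
print(normalized["movie"], normalized["book"])      # 2/3 1/3
print(best_unnormalized, best_normalized)           # movie movie
```

The evidence is still needed when the actual posterior value is wanted, as in the post; it can be skipped only when picking the most probable class.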
Simple and great explanation! Helped get a very good intuition about the terminology!
Before going directly into the formula, I want to explain something to make the theorem easier to understand.
P(x) is the probability of x, and P(y) is the probability of y.
P(x and y) is P(x) times the probability that y happens given that x happens, or equivalently P(y) times the probability that x happens given that y happens. So:
P(x and y) = P(x) * P(y|x) = P(y) * P(x|y)
Therefore P(y|x) = P(x|y) * P(y) / P(x)
Nice article and a helpful comment! Thank you both
Great , thanks.
very helpful and easy
I still have a doubt.
If we instead want to calculate P(Sci-fi|Movie), will P(Sci-fi|Movie) then be the posterior in question?
In some sense, does that mean a conditional probability is either the likelihood or the posterior depending on whether we already know it or want to test it?
Does a similar analogy apply to the evidence and the prior?
With the same example you presented above, if you gave me 45 Sci-fi objects, I already know that 30 of them are movies and 15 of them are books. Thus, I can directly compute the probability that a Sci-fi object is a movie by dividing 30 by 45. So why would I need to apply Bayes' theorem to get that value? Please help me understand this issue.