Intuitions behind Logistic Regression

Widely used in machine learning algorithms, the logistic function is often described as a function “with nice properties.” It is smooth, yet it also provides a threshold structure that is useful for problems like binary classification.

Take a look at this guy!

$f(x) = \frac{1}{1 + e^{-x}}$
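A quick way to get a feel for this function is to evaluate it at a few points. The minimal sketch below just implements the formula directly; the sample inputs are arbitrary.

```python
import math

def sigmoid(x):
    """Logistic function: maps any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

# The output approaches 0 for large negative inputs,
# is exactly 0.5 at x = 0, and approaches 1 for large positive inputs.
for x in (-6, -2, 0, 2, 6):
    print(x, round(sigmoid(x), 4))
```

Notice the smooth S-shape: there is no sudden jump anywhere, yet almost all of the transition from 0 to 1 happens in a narrow band around x = 0.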

In what situation might this function be useful? Well, imagine that you are a jury member who has to decide whether a person is guilty or not guilty. To reach a conclusion, you must take into account all the evidence and testimony. After hours of hard thinking, a “feeling” emerges in your mind, a sort of measurement of the defendant’s level of innocence. This measurement involves many variables; for example, the suspect’s fingerprints at the crime scene would significantly decrease the innocence level. Finally, if this measurement passes a certain threshold, your verdict is guilty; otherwise, not guilty.
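The jury’s reasoning can be sketched as a weighted sum of evidence feeding into a verdict. The pieces of evidence, their weights, and the zero threshold below are all purely illustrative, not part of any real model.

```python
def innocence_score(evidence, weights):
    """Combine weighted pieces of evidence into a single score,
    like the jury's overall 'feeling' about the defendant."""
    return sum(w * e for w, e in zip(weights, evidence))

# Hypothetical evidence values (1.0 = present, 0.0 = absent)
# and hand-picked weights (negative weight = points toward guilt).
evidence = [1.0, 0.0, 1.0]    # fingerprint at scene, alibi, motive
weights  = [-2.0, 1.5, -1.0]  # the fingerprint strongly lowers innocence

score = innocence_score(evidence, weights)
verdict = "guilty" if score < 0.0 else "not guilty"
print(score, verdict)  # -3.0 guilty
```

This is exactly the shape of the input that gets fed into the logistic function: many variables collapsed into one number, which is then compared against a threshold.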

The logistic function is such a mechanism. Given an input x, it can decide whether the input falls in class A or class B according to its value.
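Concretely, the decision rule can be sketched as: pass x through the logistic function and compare the result to a cutoff. The 0.5 cutoff and the class labels below are illustrative choices, not anything fixed by the function itself.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def classify(x, threshold=0.5):
    """Assign class B if the logistic output reaches the threshold, else class A."""
    return "B" if sigmoid(x) >= threshold else "A"

print(classify(-3.0))  # strongly negative input -> A
print(classify(2.5))   # positive input -> B
```

Because sigmoid(0) = 0.5, a threshold of 0.5 on the output is the same as checking the sign of x itself, which is why the function behaves like a smooth switch centered at zero.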

This is a very useful property for machine learning algorithms. Through training, a model can become more and more accurate at classification by tweaking the threshold and changing how the input x is computed from the data. Now, let’s not simply accept that the logistic function is just a magical tool with a weird name. Let’s go a step further and explore how it is derived, which is actually pretty straightforward.