Logistic regression outperforms most models in binary classification tasks.

It also supports multi-class classification, but it often runs into convergence issues and will usually not outperform more complex, less interpretable models such as regression trees and neural networks.

\subsubsection{Interpretability}

Logistic regression is an interpretable model, unlike a random forest, which is more of a black-box method. The name \emph{regression} comes from the right-hand side (RHS) of the following equation, as you must assume a linear relationship between the variates and the log odds:

\[
\log\!\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k
\]

Note that, like any regression, the variates can be transformed to achieve polynomial regression.

\subsubsection{Logit function}

\begin{itemize}

\item Recall that the logarithmic function is defined over the positive reals: $\log : (0, \infty) \to \mathbb{R}$. So $\log\!\left(\frac{p}{1-p}\right)$ is only defined for $p$ such that $\frac{p}{1-p} > 0$, which ends up being $p \in (0, 1)$ (easy to see graphically; a full analysis of asymptotes and inflections is needed to prove it rigorously, though the algebraic check below is quick).

\end{itemize}
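For a quick algebraic check of that interval (a short case analysis, not a full proof):

\[
\frac{p}{1-p} > 0 \iff \big(p > 0 \text{ and } 1-p > 0\big) \text{ or } \big(p < 0 \text{ and } 1-p < 0\big) \iff 0 < p < 1,
\]

since the second case would require $p < 0$ and $p > 1$ simultaneously, which is impossible.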

\subsubsection{Mathematical principle}

The logit function has a range of $(-\infty, \infty)$ and a domain of $(0, 1)$, meaning it maps probabilities from $(0, 1)$ onto the real line. Its inverse function, known as the expit (or sigmoid) function, is perhaps more intuitive in the context of logistic regression: it maps numerical values from $(-\infty, \infty)$ onto the probability range $(0, 1)$, ensuring the outputs represent valid probabilities.

To recover the success (label 1) probability $p$, we use the expit function:

\[
p = \operatorname{expit}\!\big(\beta_0 + \beta_1 x_1 + \dots + \beta_k x_k\big) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \dots + \beta_k x_k)}}
\]

As you would expect, $p$ is given by a function that maps the linear model onto the range $(0, 1)$, which is fit for a probability.
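This mapping is easy to verify numerically. Below is a minimal sketch (the use of \texttt{scipy.special}, which ships both functions, is our choice for illustration, not part of the notes):

\begin{verbatim}
# Minimal sketch: logit and expit are inverses. logit maps (0, 1)
# onto the real line; expit maps the real line back onto (0, 1).
import numpy as np
from scipy.special import expit, logit

p = np.array([0.1, 0.5, 0.9])   # probabilities in (0, 1)
z = logit(p)                    # log odds on the real line
print(z)                        # approx [-2.1972  0.  2.1972]
print(expit(z))                 # recovers [0.1 0.5 0.9]

# expit squashes any linear-model output into (0, 1):
eta = np.array([-100.0, 0.0, 100.0])
print(expit(eta))               # approx [0.  0.5  1.]
\end{verbatim}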

\subsubsection{Odds}

The log odds are exactly what they sound like: the logarithm of the odds, $\log\!\left(\frac{p}{1-p}\right)$, where $\frac{p}{1-p}$ are the odds of success.

A coefficient can be interpreted as a log odds ratio of a one unit difference in the direction of its variable:

If

\[
odds_i = \frac{p_i}{1 - p_i} \quad \text{when } x_1 = x,
\]

and

\[
odds_j = \frac{p_j}{1 - p_j} \quad \text{when } x_1 = x + 1 \text{ (all other variates held constant)},
\]

then

\[
\beta_1 = \log\!\left(\frac{odds_j}{odds_i}\right).
\]
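This follows directly from the model equation above; subtracting the two log odds, every term except the $x_1$ term cancels:

\[
\log(odds_j) - \log(odds_i) = \big(\beta_0 + \beta_1 (x + 1) + \dots + \beta_k x_k\big) - \big(\beta_0 + \beta_1 x + \dots + \beta_k x_k\big) = \beta_1 .
\]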

For interpretation:

Exponentiated coefficients give the odds ratio of a one unit increase in the coefficient's $x$ variate: $e^{\beta_1} = \frac{odds_j}{odds_i}$.

So if $\beta_1$ is the coefficient of a student's average, you can say: ``on average, each 1\% increase in average \textbf{increases the odds} of admission by \textbf{a factor of} $e^{\beta_1}$, with all other variables held constant.''

So if the odds ($\frac{p}{1-p}$) of admission for an average of $x$ are, say, 7.38, then the revised odds for an average of $x + 1$ are $7.38 \cdot e^{\beta_1}$.

We can only make statements about odds. We cannot say anything about how the probabilities themselves will change when you increase one variate by 1, because that change is non-constant: it depends on the current values of all the variates.
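Here is a minimal sketch of this interpretation in code, assuming synthetic admission data (the data, coefficient values, and the use of \texttt{statsmodels} are illustrative assumptions, not part of the notes):

\begin{verbatim}
# Illustrative sketch: fit a logistic regression on synthetic
# "admission" data and read the exponentiated coefficient as the
# odds ratio per one-unit increase in the variate.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
average = rng.uniform(60, 100, size=500)            # grade average in %
true_log_odds = -20 + 0.25 * average                # assumed true model
p = 1 / (1 + np.exp(-true_log_odds))
y = (rng.random(500) < p).astype(float)             # admission labels

X = sm.add_constant(average)                        # intercept + average
fit = sm.Logit(y, X).fit(disp=0)

beta1 = fit.params[1]
# Odds ratio for a 1% increase in average; should be close to e^0.25.
print(np.exp(beta1))
\end{verbatim}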

\subsubsection{Model evaluation}

Logistic regression is less prone to overfitting than more flexible learning methods.

\begin{itemize}

\item The linearity assumption makes the model highly inflexible.

\item There is no tuning parameter by default, though you could regularize (see the sketch after this list).

\item Usually no train/test/cross-validation split is needed for logistic regression.

\item It is hard to beat logistic regression (binary classification) in terms of prediction.

\item It is easier to beat linear regression or multi-class logistic regression.

\item It is affected a great deal by multicollinearity.

\item It only has \textbf{linear} decision boundaries.

\end{itemize}
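As mentioned in the list, regularization can introduce a tuning parameter. Here is a minimal sketch using \texttt{scikit-learn}, whose \texttt{LogisticRegression} applies an L2 penalty controlled by the inverse regularization strength \texttt{C} (the data below is synthetic and illustrative):

\begin{verbatim}
# Minimal sketch of regularized logistic regression with scikit-learn.
# LogisticRegression uses an L2 penalty by default; smaller C means
# stronger regularization, shrinking the coefficients toward zero.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
true_beta = np.array([1.5, -2.0, 0.0, 0.5, 0.0])
y = (X @ true_beta + rng.normal(size=200) > 0)

for C in [0.01, 1.0, 100.0]:
    model = LogisticRegression(penalty="l2", C=C).fit(X, y)
    print(C, np.round(model.coef_, 2))  # coefficients shrink as C drops
\end{verbatim}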