Some model selection criteria for predictive power:
\begin{itemize}
  \item Prediction standard error.
  \item Cross-validation criterion (see the sketch after this list).
\end{itemize}
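As a concrete illustration of the cross-validation criterion, here is a minimal sketch (not from the notes: the fold count \texttt{K}, the helper names, and the toy least-squares fit are all assumptions for illustration):

\begin{verbatim}
import numpy as np

def kfold_cv_risk(x, y, fit, loss, K=5, seed=0):
    """K-fold cross-validation estimate of predictive risk:
    fit on K-1 folds, average the loss on the held-out fold."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), K)
    errs = []
    for k in range(K):
        test = folds[k]
        train = np.concatenate([folds[j] for j in range(K) if j != k])
        f = fit(x[train], y[train])   # predictor trained without fold k
        errs.append(np.mean(loss(y[test], f(x[test]))))
    return np.mean(errs)

# Hypothetical usage: least-squares line, squared error loss.
fit_line = lambda x, y: (lambda t: np.polyval(np.polyfit(x, y, 1), t))
sq_loss = lambda y, f: (y - f) ** 2

x = np.linspace(0.0, 1.0, 40)
y = 2 * x + np.random.default_rng(1).normal(0, 0.1, 40)
print(kfold_cv_risk(x, y, fit_line, sq_loss))
\end{verbatim}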
Both criteria measure the quality of the fit. As you can imagine, there are several ways to measure “quality”.
We need a measure to compare the fitted (predicted) values to the truth.
Denote the loss function as $L(Y, f(X))$. It measures the error in estimating $Y$ with the estimate $f(X)$.
The loss function \textbf{is a random variable}. The expected value of the loss function with respect to the \textbf{joint probability function} is called the \textbf{risk}:
\[ R(f) = E_{X,Y}[L(Y, f(X))] \]
A couple of things here:
The expectation is over the joint distribution in theory, but in practice, since we are trying to use $X$ to predict $Y$, we \textit{assume $X$ is given}, so the expectation will be relative to the conditional distribution of $Y$ given $X$, i.e.
\[ E_{Y \mid X}[L(Y, f(X)) \mid X = x] = \int_{\mathbb{R}} L(y, f(x)) \, dP(y \mid X = x) \]
Even these conditional probabilities are hard to wrap your mind around; in practice we have to assume a distribution, often uniform over the observed sample, which gives the \textbf{empirical} risk estimate:
\[ \hat R(f) = \frac{1}{n} \sum_{i=1}^{n} L(y_i, f(x_i)) \]
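A minimal sketch of this estimate (function and variable names here are illustrative assumptions, not from the notes):

\begin{verbatim}
import numpy as np

def empirical_risk(y, y_hat, loss):
    """Empirical risk: average loss over the sample, weight 1/n each."""
    y, y_hat = np.asarray(y), np.asarray(y_hat)
    return np.mean(loss(y, y_hat))

# Hypothetical example with squared error loss L(y, f) = (y - f)^2:
sq_loss = lambda y, f: (y - f) ** 2
print(empirical_risk([1.0, 2.0, 3.0], [1.1, 1.9, 2.5], sq_loss))  # 0.09
\end{verbatim}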
\subsubsection{Quadratic Loss / Squared Error Loss}
\[ L_{SE}(Y, f(X)) = (Y - f(X))^2 \]
where $Y$ is the random variable estimated as a function $f$ of $X$.
The risk of this loss function also has a special name: \textbf{MSE}
\[ MSE(f) = E_{X,Y}[L_{SE}(Y, f(X))] = E_{X,Y}[(Y - f(X))^2] \]
\textbf{Conditional Expectation Theorem.} If $X, Y$ are two random variables with $E[Y^2] < \infty$, then the function $f$ which minimizes $E_{X,Y}[(Y - f(X))^2]$ is given by the conditional expectation:
\[ f(x) = E[Y \mid X = x] \]
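To see why (a standard add-and-subtract step, filled in here; the cross term vanishes because $E[Y - E[Y \mid X] \mid X] = 0$):
\[ E\big[(Y - f(X))^2 \mid X\big] = E\big[(Y - E[Y \mid X])^2 \mid X\big] + \big(E[Y \mid X] - f(X)\big)^2 \]
The first term does not depend on $f$ and the second is nonnegative, so the risk is minimized by taking $f(X) = E[Y \mid X]$.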
\subsubsection{Absolute Error Loss}
\[ L_{AE}(Y, f(X)) = |Y - f(X)| \]
The corresponding risk is minimized by the conditional median:
\[ f(x) = \operatorname{median}(Y \mid X = x) \]
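A quick numerical check of both facts (the sample below is made up for illustration): over a grid of constant predictors, the average squared error is minimized at the sample mean, while the average absolute error is minimized at the sample median.

\begin{verbatim}
import numpy as np

y = np.array([1.0, 2.0, 2.0, 3.0, 10.0])  # hypothetical sample
c = np.linspace(0.0, 12.0, 1201)          # candidate constant predictors

mse = [np.mean((y - ci) ** 2) for ci in c]
mae = [np.mean(np.abs(y - ci)) for ci in c]

print(c[np.argmin(mse)], y.mean())        # 3.6 3.6  (the mean)
print(c[np.argmin(mae)], np.median(y))    # 2.0 2.0  (the median)
\end{verbatim}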