Regression

In regression we are always modelling the conditional behaviour of $Y$ given $\bm X$, and not just $Y$, since we are actually using the explanatory variables to fit $\hat y$.

The MSE is minimized by the conditional expectation $E[Y \mid \bm X = \bm x]$. But we don't really know how to obtain this, so we take a step back, assume a linear form for the regression function in the risk minimization problem (ordinary least squares: minimizing the residual sum of squares), and then solve for the coefficients using matrix methods.
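A one-line justification that the conditional expectation is the minimizer: for any predictor $g(\bm x)$,

\[
E\big[(Y - g(\bm X))^2\big] = E\big[(Y - E[Y \mid \bm X])^2\big] + E\big[(E[Y \mid \bm X] - g(\bm X))^2\big] \geq E\big[(Y - E[Y \mid \bm X])^2\big],
\]

since the cross term vanishes after conditioning on $\bm X$, so $E[Y \mid \bm X]$ attains the minimum MSE.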

In multiple regression we have some simplifying assumptions (linear mean, independent homoscedastic normal errors) such that

\[
\bm y = \bm X \bm\beta + \bm\varepsilon, \qquad \bm\varepsilon \sim N(\bm 0, \sigma^2 \bm I),
\]

where $\bm X$ is the $n \times (p+1)$ design matrix.
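For concreteness, with an intercept and $p$ variates the design matrix stacks one row per observation:

\[
\bm X = \begin{pmatrix}
1 & x_{11} & \cdots & x_{1p} \\
1 & x_{21} & \cdots & x_{2p} \\
\vdots & \vdots & & \vdots \\
1 & x_{n1} & \cdots & x_{np}
\end{pmatrix}.
\]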

OLS

We call the risk under the assumed linear model the \textbf{Residual Sum of Squares}. We estimate the vector of parameters $\bm\beta$ from the training data, where $\bm x_i^T$ represents a row of the design matrix and each column is a variate (rooms, bathrooms, parking spots) or an attribute (bathrooms squared) of the $i$th observation/unit.

As a function of the coefficient vector, our estimate for $\bm\beta$ in quadratic matrix form is:

\[
\hat{\bm\beta} = \argmin_{\bm\beta} \, (\bm y - \bm X \bm\beta)^T (\bm y - \bm X \bm\beta)
\]

The OLS estimate for the coefficient vector is:

\[
\hat{\bm\beta} = (\bm X^T \bm X)^{-1} \bm X^T \bm y
\]
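This is obtained by setting the gradient of the RSS to zero, which gives the normal equations:

\[
\frac{\partial}{\partial \bm\beta}(\bm y - \bm X\bm\beta)^T(\bm y - \bm X\bm\beta) = -2\,\bm X^T(\bm y - \bm X\bm\beta) = \bm 0
\quad\Longrightarrow\quad
\bm X^T\bm X\,\hat{\bm\beta} = \bm X^T\bm y,
\]

which has the unique solution above whenever $\bm X^T\bm X$ is invertible (i.e. $\bm X$ has full column rank).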

\paragraph{Sampling distribution}

The sampling distribution of the OLS coefficient estimator is

\[
\hat{\bm\beta} \sim N\!\left(\bm\beta,\; \sigma^2 (\bm X^T \bm X)^{-1}\right).
\]
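This follows because $\hat{\bm\beta}$ is a linear function of the normally distributed $\bm y$:

\[
E[\hat{\bm\beta}] = (\bm X^T\bm X)^{-1}\bm X^T E[\bm y] = \bm\beta,
\qquad
\operatorname{Var}(\hat{\bm\beta}) = (\bm X^T\bm X)^{-1}\bm X^T(\sigma^2\bm I)\bm X(\bm X^T\bm X)^{-1} = \sigma^2(\bm X^T\bm X)^{-1}.
\]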

The variance estimator also has a sampling distribution, a scaled chi-squared:

\[
\frac{(n-p-1)\,\hat\sigma^2}{\sigma^2} \sim \chi^2_{n-p-1}.
\]

The test of hypothesis for $H_0: \beta_j = 0$ uses the pivotal quantity

\[
z_j = \frac{\hat \beta_j}{\hat \sigma \sqrt{v_j}} \sim t_{n-p-1}
\]

where $v_j$ is the $j$th diagonal element of $(\bm X^T \bm X)^{-1}$.

Properties of the OLS estimator

\begin{itemize}

\item it is unbiased, i.e. $E[\hat{\bm\beta}] = \bm\beta$.

\item its variance-covariance matrix is $\sigma^2 (\bm X^T \bm X)^{-1}$. This means that the variability, and thus the length of confidence and prediction intervals, depends on $\sigma^2$ (which we don't control that much) and on the design matrix, which we can indeed control. The size of the variance will explode if $(\bm X^T \bm X)^{-1}$ explodes. This happens when columns of $\bm X$ are \textit{correlated}, i.e. you have \textit{redundant} information in the design matrix.

\item $\sigma^2$ is estimated with the unbiased estimator $\hat\sigma^2 = \frac{1}{n-p-1}\sum_{i=1}^n (y_i - \hat y_i)^2$; recall $p$ is the number of explanatory variates.

\item a $100(1-\alpha)\%$ \textit{confidence interval} for $\beta_j$ is $\hat\beta_j \pm t_{n-p-1,\,\alpha/2}\, \mathrm{SE}(\hat\beta_j)$, where the standard error of the estimate is the square root of the corresponding diagonal element of the estimated variance-covariance matrix $\hat\sigma^2 (\bm X^T \bm X)^{-1}$ (see the R sketch after this list).

\end{itemize}
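A minimal R sketch of these quantities on simulated data (the variable names \texttt{x1}, \texttt{x2}, etc.\ are hypothetical, not from the notes); the hand-computed values should agree with \texttt{lm}:

\begin{verbatim}
set.seed(1)
n  <- 100
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + 2 * x1 - 0.5 * x2 + rnorm(n, sd = 2)

X    <- cbind(1, x1, x2)          # design matrix with an intercept column
XtXi <- solve(t(X) %*% X)         # (X^T X)^{-1}
bhat <- XtXi %*% t(X) %*% y       # OLS estimate of the coefficient vector
res  <- y - X %*% bhat            # estimated residuals
p    <- ncol(X) - 1               # number of explanatory variates
s2   <- sum(res^2) / (n - p - 1)  # unbiased estimate of sigma^2
se   <- sqrt(s2 * diag(XtXi))     # standard errors of the coefficients
ci   <- cbind(bhat - qt(0.975, n - p - 1) * se,
              bhat + qt(0.975, n - p - 1) * se)  # 95% confidence intervals

fit <- lm(y ~ x1 + x2)            # should match bhat, se and ci
coef(summary(fit))
confint(fit)
\end{verbatim}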

You can also derive the OLS estimate via maximum likelihood estimation under the normal error assumption.
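Sketch: under $\bm y \sim N(\bm X\bm\beta, \sigma^2\bm I)$ the log-likelihood is

\[
\ell(\bm\beta, \sigma^2) = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}(\bm y - \bm X\bm\beta)^T(\bm y - \bm X\bm\beta),
\]

so for any fixed $\sigma^2$ maximizing over $\bm\beta$ is equivalent to minimizing the RSS, and the MLE of $\bm\beta$ coincides with the OLS estimate. (The MLE of $\sigma^2$ divides the RSS by $n$ rather than $n-p-1$ and is therefore biased.)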

\paragraph{Sum of squares decomposition}

We can show that the total sum of squares decomposes as $\mathrm{TSS} = \mathrm{SSReg} + \mathrm{SSE}$, i.e.

\[
\sum_{i=1}^n (y_i - \bar y)^2 = \sum_{i=1}^n (\hat y_i - \bar y)^2 + \sum_{i=1}^n (y_i - \hat y_i)^2.
\]

We consider the hat matrix $\bm H = \bm X (\bm X^T \bm X)^{-1} \bm X^T$, which:

\begin{itemize}

\item is an \textit{orthogonal projection} matrix, since it is symmetric and idempotent.

\item is idempotent, so $\bm H \bm H = \bm H$.

\item the fitted values are the projection of the observed values $\bm y$ in the training set onto the linear subspace spanned by the columns of $\bm X$, i.e. $\hat{\bm y} = \bm H \bm y$.

\item the estimated residuals are the corresponding orthogonal complement, $\hat{\bm\varepsilon} = (\bm I - \bm H)\,\bm y$ (see the numerical check after this list).

\end{itemize}
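A quick numerical check of these properties and of the sum of squares decomposition, reusing the simulated \texttt{X} and \texttt{y} from the sketch above:

\begin{verbatim}
H    <- X %*% solve(t(X) %*% X) %*% t(X)  # hat matrix
yhat <- H %*% y                           # fitted values: projection of y
ehat <- y - yhat                          # residuals: orthogonal complement

max(abs(H %*% H - H))  # ~ 0: H is idempotent
max(abs(t(H) - H))     # ~ 0: H is symmetric
sum(yhat * ehat)       # ~ 0: fitted values orthogonal to residuals

TSS   <- sum((y - mean(y))^2)     # total sum of squares
SSReg <- sum((yhat - mean(y))^2)  # regression sum of squares
SSE   <- sum(ehat^2)              # residual sum of squares
TSS - (SSReg + SSE)               # ~ 0: the decomposition holds
\end{verbatim}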

Regression in R

\paragraph{lm}

Consider the output of \texttt{summary(lm(...))}. The $p$-values of the coefficients test the hypotheses $H_0: \beta_j = 0$, so small values mean that the corresponding variates are useful/necessary, and large ones mean that the variable is not adding much to the prediction, or maybe its information is also captured by another correlated variable.

The $p$-value of the $F$ statistic tests the hypothesis that all coefficients (except the intercept $\beta_0$) are zero, i.e. that we could model $y$ just as well with only a constant $\bar y$.

If many coefficient $p$-values are large but the $p$-value of the $F$ statistic is small, this is a sign of collinearity, as the simulation below illustrates.
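A small simulation illustrating this, with a hypothetical variate \texttt{b} that is nearly a copy of \texttt{a}:

\begin{verbatim}
set.seed(2)
n <- 50
a <- rnorm(n)
b <- a + rnorm(n, sd = 0.01)  # nearly identical to a: redundant information
y <- 1 + 3 * a + rnorm(n)

summary(lm(y ~ a + b))
# typical outcome: the coefficient p-values for a and b are both large,
# while the overall F statistic (and R-squared) is clearly significant
\end{verbatim}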

\begin{itemize}

\item \textbf{Residual standard error}: the estimate $\hat\sigma$ (e.g.\ ``2.92'') on $n-p-1$ degrees of freedom.

\item Multiple $R$-squared: the proportion of variability explained by the model, $R^2 = \mathrm{SSReg}/\mathrm{TSS} = 1 - \mathrm{SSE}/\mathrm{TSS}$.

\item Adjusted $R$-squared: $R^2$ penalized for the number of explanatory variates, $1 - \frac{\mathrm{SSE}/(n-p-1)}{\mathrm{TSS}/(n-1)}$.

\item $F$ statistic and its $p$-value: the overall significance of the model (these fields are recovered by hand in the sketch after this list).

\end{itemize}
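As a check on these definitions, the corresponding fields can be pulled out of the fitted object, continuing the simulated collinearity example above:

\begin{verbatim}
fit <- lm(y ~ a + b)  # the simulated fit from above
s   <- summary(fit)

s$sigma          # residual standard error = sqrt(SSE / (n - p - 1))
s$r.squared      # multiple R-squared      = 1 - SSE / TSS
s$adj.r.squared  # adjusted R-squared      = 1 - (SSE/(n-p-1)) / (TSS/(n-1))
s$fstatistic     # overall F statistic with its degrees of freedom
\end{verbatim}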

Polynomial