1 A bit of math housekeeping

\[\underset{n \times k}{\mathbf{X}}\]

In R, selecting row i in column j from matrix X is: X[i,j].

\[ \begin{equation} \mathbf{x} = \begin{bmatrix} x_1 & x_2 & \cdots & x_m \end{bmatrix} \end{equation} \]

\[ \begin{equation} \mathbf{x} =\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_m \end{bmatrix} \end{equation} \]

\[ \begin{equation} \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_m \end{bmatrix}' = \begin{bmatrix} x_1 & x_2 & \cdots & x_m \end{bmatrix} \end{equation} \]

I will probably screw these up along the way, but I wanted to mention them up front to be on the same page!

2 Overview of covariance, correlation, and regression

We’ll use housing price data from Boston’s 1970 census to review important concepts in correlation and regression. This is a nice dataset for regression because there are many interdependent variables: crime, pollutants, age of properties, etc.

#example dataset from mlbench package with home prices in Boston by census tract
data(BostonHousing2)
BostonSmall <- BostonHousing2 %>% dplyr::select(
  cmedv, #median value of home in 1000s
  crim, #per capita crime by town
  nox, #nitric oxide concentration
  lstat #proportion of lower status
  )

n <- nrow(BostonSmall) #number of observations
k <- ncol(BostonSmall) #number of variables

#scatterplot matrix of variables
splom(BostonSmall)