1 Matrix Algebra

In multiple regression, we are used to seeing a single regression equation such as:

\[ Y_i = \beta_0 + \beta_1 V_i + \beta_2 W_i + \beta_3 X_i + \varepsilon_i \] where \(i\) denotes the ith individual/observation. The goal of this tutorial is to provide some background by reviewing how such a regression equation can be represented by a set of matrices and vectors:

\[ \mathbf{Y} = \begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix} = \begin{bmatrix} \beta_0 + \beta_1 V_1 + \beta_2 W_1 + \beta_3 X_1 \\ \beta_0 + \beta_1 V_2 + \beta_2 W_2 + \beta_3 X_2 \\ \vdots \\ \beta_0 + \beta_1 V_n + \beta_2 W_n + \beta_3 X_n \\ \end{bmatrix} \enspace + \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{bmatrix} \]

If we factor out the parameters (\(\boldsymbol{\beta}\)) from the data, we can see the difference between the model matrix and the parameter estimates:

\[ \mathbf{Y} = \begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix} = \begin{bmatrix} 1 & V_1 & W_1 & X_1 \\ 1 & V_2 & W_2 & X_2 \\ \vdots & \vdots & \vdots & \vdots \\ 1 & V_n & W_n & X_n \\ \end{bmatrix} \enspace \begin{bmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \\ \beta_3 \end{bmatrix} + \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{bmatrix} \]
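For concreteness, here is a minimal R sketch (using simulated data, not any dataset from the class) showing that the matrix of 1s, \(V\), \(W\), and \(X\) values above is exactly the model matrix R constructs for this regression:

# simulate three predictors for n = 10 hypothetical observations
set.seed(1001)
n <- 10
dat <- data.frame(V = rnorm(n), W = rnorm(n), X = rnorm(n))

# the n x 4 model matrix: a column of 1s for the intercept, then V, W, X
head(model.matrix(~ V + W + X, data = dat))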

2 Types of matrices

Remember that matrices are defined by rows (the first dimension) and columns (the second dimension):

\[ \underset{m \times n}{\mathbf{A}} = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \\ a_{41} & a_{42} & a_{43} \end{bmatrix} \]

And a position in the matrix is specified by subscripting according to the row and column: \(a_{11}\) is the element in the first row and first column.

2.1 Square

A square matrix has the same number of rows and columns. Covariance matrices are always square.

\[ \underset{n \times n}{\mathbf{A}} = \begin{bmatrix} a_{11} & a_{12} & a_{13} & a_{14} \\ a_{21} & a_{22} & a_{23} & a_{24} \\ a_{31} & a_{32} & a_{33} & a_{34} \\ a_{41} & a_{42} & a_{43} & a_{44} \end{bmatrix} \]
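As a quick illustration in R (using the built-in mtcars data), the covariance matrix of any set of \(k\) variables is \(k \times k\):

# the covariance matrix of three variables has 3 rows and 3 columns
S <- cov(mtcars[, c("mpg", "hp", "wt")])
dim(S)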

2.2 Symmetric

A symmetric matrix is a square matrix that is identical when transposed. That is, flipping the rows and columns has no effect. Another way to think of it is that the upper and lower triangles mirror each other across the diagonal: \(a_{ij} = a_{ji}\).

\[ \begin{align} \underset{n \times n}{\mathbf{A}} &= \begin{bmatrix} a & ab & ac & ad \\ ab & b & bc & bd \\ ac & bc & c & cd \\ ad & bd & cd & d \end{bmatrix} \\ \cr \mathbf{A} &= \mathbf{A}' \end{align} \]

This is pretty close to the structure we’ll see in much of the class – with \(ab\) representing some function of both \(a\) and \(b\) (e.g., covariance).
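A quick R check (again with mtcars) shows this property for a covariance matrix: transposing it changes nothing.

S <- cov(mtcars[, c("mpg", "hp", "wt")])

# S equals its transpose, so it is symmetric
all.equal(S, t(S))
isSymmetric(S)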

2.3 Diagonal

A diagonal matrix is a special case of a square symmetric matrix in which there are values along the diagonal, but zeros elsewhere:

\[ \begin{align} \underset{n \times n}{\mathbf{A}} &= \begin{bmatrix} a & 0 & 0 & 0 \\ 0 & b & 0 & 0 \\ 0 & 0 & c & 0 \\ 0 & 0 & 0 & d \end{bmatrix} \\ \cr \mathbf{A} &= \mathbf{A}' \end{align} \]
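In R, diag() builds a diagonal matrix directly from a vector of diagonal values; a minimal sketch:

# a 4 x 4 diagonal matrix with 1, 4, 9, 16 on the diagonal and 0s elsewhere
D <- diag(c(1, 4, 9, 16))
D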

2.3.1 Matrix trace

The trace of a square matrix is the sum of elements along the diagonal:

\[ tr(\mathbf{A}) = a + b + c + d \]

Or more generally, if the matrix is \(n \times n\):

\[ tr(\mathbf{A}) = \sum_{i=1}^{n}{a_{ii}} = a_{11} + a_{22} + ... + a_{nn} \]
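Base R has no dedicated trace function, but diag() applied to an existing matrix extracts its diagonal, so the trace is just a sum:

A <- matrix(1:16, nrow = 4, ncol = 4)

# sum of the diagonal elements = trace
sum(diag(A))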

2.4 Identity

An identity matrix is a special case of a diagonal matrix in which the elements of the diagonal are all 1:

\[ \underset{n \times n}{\mathbf{I}} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \]

Why would this be useful? Mostly it helps make matrix algebra work out; for now, just remember that any matrix multiplied by an identity matrix is unchanged, just like multiplying a number by 1.

Here’s a square matrix:

A <- matrix(rnorm(25), nrow=5, ncol=5)  # fill a 5 x 5 matrix with standard-normal draws
print(A)
##         [,1]    [,2]   [,3]   [,4]    [,5]
## [1,] -0.6208 -0.6728  0.655  1.458  0.3162
## [2,] -0.0968 -0.0921  1.111 -1.590  0.0229
## [3,]  0.8798  0.8030  0.898 -0.529 -1.0620
## [4,] -1.1098 -1.2248 -0.805 -0.277  0.9781
## [5,] -0.9037  1.0291 -1.073  0.337 -1.0197

And now multiplied by \(\mathbf{I}\):

A %*% diag(5)  # diag(5) is the 5 x 5 identity matrix
##         [,1]    [,2]   [,3]   [,4]    [,5]
## [1,] -0.6208 -0.6728  0.655  1.458  0.3162
## [2,] -0.0968 -0.0921  1.111 -1.590  0.0229
## [3,]  0.8798  0.8030  0.898 -0.529 -1.0620
## [4,] -1.1098 -1.2248 -0.805 -0.277  0.9781
## [5,] -0.9037  1.0291 -1.073  0.337 -1.0197

3 Matrix addition and subtraction

Matrix addition and subtraction are straightforward. These operations are applied elementwise:

\[ \mathbf{A} = \begin{bmatrix} 10 & 5 \\ 9 & 1 \end{bmatrix} , \enspace \mathbf{B} = \begin{bmatrix} 2 & 1 \\ 20 & 0 \end{bmatrix}, \enspace \textrm{then } \mathbf{A}-\mathbf{B}= \begin{bmatrix} 8 & 4 \\ -11 & 1 \end{bmatrix} \]

Note that matrices must have the same dimensions (i.e., the same numbers of rows and columns) to be added or subtracted.
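As a quick check of the example above in R (entering each matrix column by column):

A <- matrix(c(10, 9, 5, 1), nrow = 2)   # 10 and 5 in row 1; 9 and 1 in row 2
B <- matrix(c(2, 20, 1, 0), nrow = 2)

A - B   # elementwise subtraction, matching the result above
A + B   # elementwise addition works the same way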

4 Matrix multiplication

Multiplication is more complex.

4.1 Multiplication of a matrix by a scalar value

To multiply a matrix \(\mathbf{A}\) by a scalar constant \(k\), one simply multiplies every element of \(\mathbf{A}\) by \(k\):

\[ \mathbf{A} = \begin{bmatrix} 10 & 5 \\ 9 & 1 \end{bmatrix}, \enspace k=2, \enspace k\mathbf{A} = \begin{bmatrix} 20 & 10 \\ 18 & 2 \end{bmatrix} \]
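In R, the ordinary * operator handles this; every element is multiplied by the scalar:

A <- matrix(c(10, 9, 5, 1), nrow = 2)
k <- 2

k * A   # every element of A is doubled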

4.2 Multiplication of a matrix by another matrix

Multiplication is a more complex operation when both objects are matrices. First, the order matters: \(\mathbf{AB}\) is not (usually) the same as \(\mathbf{BA}\). This gives rise to the terms ‘pre-multiplication’ and ‘post-multiplication’, though we don’t need those much in SEM. Second, if we are computing \(\mathbf{C} = \mathbf{AB}\), then the number of columns in \(\mathbf{A}\) must match the number of rows in \(\mathbf{B}\):

\[ \underset{n \times k}{\mathbf{C}} = \underset{n \times p}{\mathbf{A}} \cdot \underset{p \times k}{\mathbf{B}} \]

Thus, the resulting matrix \(\mathbf{C}\) has the number of rows of \(\mathbf{A}\) and the number of columns of \(\mathbf{B}\). Matrices that can be multiplied are called ‘compatible’ or ‘conformable.’ Matrices in which the inner dimensions (i.e., columns of \(\mathbf{A}\), rows of \(\mathbf{B}\)) do not match are called ‘incompatible’ or ‘non-conformable.’ These cannot be multiplied.

How does matrix multiplication work? One multiplies the elements of the ith row of \(\mathbf{A}\) by the corresponding elements of the jth column of \(\mathbf{B}\), then sums these products to obtain the element in the ith row and jth column of \(\mathbf{C}\). Like so:

\[c_{ij} = \sum_{k=1}^{p} a_{ik} b_{kj}\]
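In R, matrix multiplication uses the %*% operator (plain * would multiply elementwise instead). A minimal sketch with conformable matrices:

A <- matrix(rnorm(4 * 3), nrow = 4, ncol = 3)   # 4 x 3
B <- matrix(rnorm(3 * 2), nrow = 3, ncol = 2)   # 3 x 2

C <- A %*% B   # conformable: inner dimensions (3 and 3) match
dim(C)         # 4 x 2 -- rows of A by columns of B

# B %*% A would fail with a 'non-conformable arguments' error,
# because B has 2 columns but A has 4 rows.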