Matrix operations define how matrices interact with each other, with vectors, and with scalars.
These operations allow us to manipulate blocks of data, solve systems of equations, represent and combine linear transformations, and perform computations essential to machine learning models.
Core operations include matrix addition, scalar multiplication, matrix-vector multiplication, matrix-matrix multiplication, and transposition.
Matrix Addition/Subtraction — Definition: Adding or subtracting two matrices of the same dimensions by adding or subtracting their corresponding elements.
Rule: If \(\mathbf{A}\) and \(\mathbf{B}\) are both \(m \times n\) matrices, then \((\mathbf{A} + \mathbf{B})_{ij} = A_{ij} + B_{ij}\) and \((\mathbf{A} - \mathbf{B})_{ij} = A_{ij} - B_{ij}\). The result is also an \(m \times n\) matrix.
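A minimal sketch of the element-wise rule using NumPy (the matrix values here are illustrative):

```python
import numpy as np

# Two matrices of the same dimensions (2x3).
A = np.array([[1, 2, 3],
              [4, 5, 6]])
B = np.array([[10, 20, 30],
              [40, 50, 60]])

# Addition and subtraction act element-wise; the result is also 2x3.
S = A + B  # S[i, j] == A[i, j] + B[i, j]
D = A - B  # D[i, j] == A[i, j] - B[i, j]
```

NumPy raises an error if the shapes do not match (broadcasting aside), mirroring the same-dimensions requirement.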
Scalar Multiplication — Definition: Multiplying a matrix \(\mathbf{A}\) by a scalar \(c\).
Rule: If \(c\) is a scalar and \(\mathbf{A}\) is an \(m \times n\) matrix, then \((c\mathbf{A})_{ij} = c \cdot A_{ij}\). The result \(c\mathbf{A}\) is an \(m \times n\) matrix where every element is multiplied by \(c\).
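In NumPy the same rule is a single expression; every element is scaled by \(c\) (values chosen for illustration):

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
c = 3

# (cA)[i, j] == c * A[i, j]; the shape is unchanged.
cA = c * A
```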
Matrix-Vector Multiplication — Definition: Multiplying an \(m \times n\) matrix \(\mathbf{A}\) by an \(n \times 1\) column vector \(\mathbf{x}\). The number of columns of \(\mathbf{A}\) must equal the number of components of \(\mathbf{x}\) (both \(n\)).
Rule (Dot Product View): The \(i\)-th element of the resulting vector \(\mathbf{y}\) (\(y_i\)) is the dot product of the \(i\)-th row of matrix \(\mathbf{A}\) (viewed as a row vector) with the vector \(\mathbf{x}\).
$$ y_i = (\text{Row } i \text{ of } \mathbf{A}) \cdot \mathbf{x} = \sum_{j=1}^n A_{ij} x_j $$
Rule (Linear Combination View): The resulting vector \(\mathbf{y}\) is a linear combination of the columns of matrix \(\mathbf{A}\), where the coefficients are the elements of vector \(\mathbf{x}\).
$$ \mathbf{y} = x_1 (\text{Col } 1 \text{ of } \mathbf{A}) + x_2 (\text{Col } 2 \text{ of } \mathbf{A}) + \dots + x_n (\text{Col } n \text{ of } \mathbf{A}) $$
Significance: Represents applying the linear transformation defined by \(\mathbf{A}\) to the vector \(\mathbf{x}\), producing the transformed vector \(\mathbf{y}\). This is the core operation in solving \(\mathbf{Ax}=\mathbf{b}\) and in neural network layers (\(\text{output} = \text{activation}(\mathbf{W}\mathbf{x} + \mathbf{b})\)).
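Both views give the same result, which a short NumPy sketch can confirm (example values are arbitrary):

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4],
              [5, 6]])      # 3x2
x = np.array([10, 1])       # 2 components: cols(A) == len(x)

# Dot-product view: y_i is (row i of A) . x
y_rows = np.array([A[i, :] @ x for i in range(A.shape[0])])

# Linear-combination view: y = x_1 * (col 1 of A) + x_2 * (col 2 of A)
y_cols = x[0] * A[:, 0] + x[1] * A[:, 1]

# Built-in matrix-vector product agrees with both.
y = A @ x
```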
Matrix-Matrix Multiplication — Definition: Multiplying an \(m \times n\) matrix \(\mathbf{A}\) by an \(n \times p\) matrix \(\mathbf{B}\). The number of columns of \(\mathbf{A}\) must equal the number of rows of \(\mathbf{B}\) (both \(n\)).
Result: An \(m \times p\) matrix \(\mathbf{C}\). \(\mathbf{C} = \mathbf{AB}\).
Rule: The element \(C_{ij}\) (in row \(i\), column \(j\) of the result) is the dot product of the \(i\)-th row of matrix \(\mathbf{A}\) with the \(j\)-th column of matrix \(\mathbf{B}\).
$$ C_{ij} = (\text{Row } i \text{ of } \mathbf{A}) \cdot (\text{Col } j \text{ of } \mathbf{B}) = \sum_{k=1}^n A_{ik} B_{kj} $$
Properties:
NOT Commutative: In general, \(\mathbf{AB} \neq \mathbf{BA}\) (\(\mathbf{BA}\) might not even be defined, or may have different dimensions). Order matters!
Significance: Represents composing linear transformations (applying transformation B then transformation A). Fundamental to many algorithms and deep learning computations.
Foundation for computations in Linear Regression (Normal Equation involves \((\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}\)), Neural Networks, Principal Component Analysis (PCA), etc.
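A small NumPy sketch of the \(C_{ij}\) rule, using a permutation matrix (chosen for illustration) to make non-commutativity visible:

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])      # 2x2
B = np.array([[0, 1],
              [1, 0]])      # 2x2 permutation (swap) matrix

# C_ij = sum_k A_ik * B_kj: dot product of row i of A with col j of B.
C = A @ B      # swaps the *columns* of A -> [[2, 1], [4, 3]]
D = B @ A      # swaps the *rows* of A    -> [[3, 4], [1, 2]]

# Order matters: AB != BA in general.
assert not np.array_equal(C, D)
```

Composing with a permutation on the right reorders columns, while composing on the left reorders rows, a concrete instance of "applying transformation B then transformation A" versus the reverse.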
Matrix Addition/Subtraction: Element-wise, requires same dimensions.
Scalar Multiplication: Multiply every element by the scalar.
Matrix-Vector Multiplication (\(\mathbf{Ax=y}\)): Result \(y_i\) is dot product of \(\text{Row } i \text{ of } \mathbf{A}\) with \(\mathbf{x}\). Transforms vector \(\mathbf{x}\). Requires \(\text{cols}(\mathbf{A}) = \text{rows}(\mathbf{x})\).
Matrix-Matrix Multiplication (\(\mathbf{AB=C}\)): Result \(C_{ij}\) is dot product of \(\text{Row } i \text{ of } \mathbf{A}\) with \(\text{Col } j \text{ of } \mathbf{B}\). Composes transformations. Requires \(\text{cols}(\mathbf{A}) = \text{rows}(\mathbf{B})\). Not commutative.
Transpose (\(\mathbf{A}^T\)): Swap rows and columns, so \((\mathbf{A}^T)_{ij} = A_{ji}\). Reverse-order rule: \((\mathbf{AB})^T = \mathbf{B}^T\mathbf{A}^T\).
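The summary properties above can be checked directly in NumPy (matrix values are arbitrary, chosen so the shapes are compatible):

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])   # 2x3
B = np.array([[1, 0],
              [0, 1],
              [2, 2]])      # 3x2

# Transposing swaps rows and columns: an m x n matrix becomes n x m.
At = A.T

# Reverse-order rule for the transpose of a product: (AB)^T == B^T A^T.
lhs = (A @ B).T
rhs = B.T @ A.T
```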