Rectangular arrays of numbers.
Matrices are rectangular arrays of numbers with which we can carry out complicated algebraic operations.
A matrix is simply a rectangular array of numbers.
E.g., \[ \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix} \]
A matrix that has $m$ rows and $n$ columns is called an $m \times n$ matrix.
That is, we always follow the "rows-by-columns" convention when describing the size (dimension) of a matrix.
When using symbols to represent entries in a matrix, we write something like \[ \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{bmatrix} \]
Note the ordering in the subscripts: $a_{12}$ and $a_{21}$ represent two different entries.
If a matrix has the same number of rows as columns, i.e., it is an $n \times n$ matrix, then we call it a square matrix.
E.g., \[ \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix} \]
As far as algebra is concerned, matrices of only one column behave just like column vectors.
E.g., \[ \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} \]
Similarly, matrices of only one row behave just like row vectors.
E.g., \[ \begin{bmatrix} 1 & 2 & 3 \end{bmatrix} \]
We can add two matrices of the same size together simply by adding the corresponding entries.
\[ \begin{bmatrix} a & b \\ c & d \end{bmatrix} + \begin{bmatrix} x & y \\ z & w \end{bmatrix} = \begin{bmatrix} a+x & b+y \\ c+z & d+w \end{bmatrix}. \]
We do not allow matrices of different sizes to be added together.
This definition is consistent with the way we define vector sums.
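A minimal numerical sketch of entrywise addition, using NumPy (assumed available) purely for illustration:

```python
import numpy as np

# Two 2x2 matrices of the same size.
A = np.array([[1, 2],
              [3, 4]])
B = np.array([[10, 20],
              [30, 40]])

# Matrix addition is entrywise: corresponding entries are added.
C = A + B
print(C)  # [[11 22]
          #  [33 44]]
```

Note that NumPy refuses to add arrays of incompatible shapes, matching the rule above.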
The zero matrix, of any size, is the matrix all of whose entries are zero.
E.g., the $2 \times 3$ zero matrix is \[ \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}. \]
In general, a zero matrix is (dimension usually clear from context) \[ \mathbf{0} = \begin{bmatrix} 0 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & 0 \end{bmatrix}. \]
The sum of the zero matrix with any matrix $A$ of the same size is still $A$: \[ \mathbf{0} + A = A + \mathbf{0} = A \] So it really behaves like the number "0" in the world of matrices.
We can also multiply a matrix by a scalar, simply by multiplying every entry. E.g., for any (real) number $r$, \[ r \cdot \begin{bmatrix} a & b \\ c & d \end{bmatrix} = \begin{bmatrix} ra & rb \\ rc & rd \end{bmatrix}. \]
In general, for a scalar $r$, \[ r \cdot \begin{bmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{m1} & \cdots & a_{mn} \end{bmatrix} = \begin{bmatrix} r \, a_{11} & \cdots & r \, a_{1n} \\ \vdots & \ddots & \vdots \\ r \, a_{m1} & \cdots & r \, a_{mn} \end{bmatrix} \]
$A$ and $rA$ always have the same size (for any scalar $r$).
As expected, $-A$ simply means $(-1)A$.
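A quick numerical sketch of scalar multiplication with NumPy (assumed available):

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
r = 3

# Scalar multiplication scales every entry of the matrix.
rA = r * A
print(rA)        # [[ 3  6]
                 #  [ 9 12]]

# -A means (-1) * A, i.e., entrywise negation.
negA = (-1) * A
print(negA)      # [[-1 -2]
                 #  [-3 -4]]
```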
The transpose is an operation that reflects the entries of a matrix along the main diagonal (upper left to lower right).
\[ \begin{bmatrix} a & b \\ c & d \end{bmatrix}^\top \;=\; \begin{bmatrix} a & c \\ b & d \end{bmatrix}. \]
Similarly, \[ \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ \end{bmatrix}^\top \;=\; \begin{bmatrix} a_{11} & a_{21} \\ a_{12} & a_{22} \\ a_{13} & a_{23} \\ \end{bmatrix}. \]
This operation simply turns rows into columns.
In general, the transpose of an $m \times n$ matrix $A$ is an $n \times m$ matrix denoted by $A^\top$, with $[ a_{ij} ]^\top = [ a_{ji} ]$. Clearly, $(A^\top)^\top = A$.
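The transpose can be checked numerically; a minimal sketch using NumPy (assumed available):

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])   # a 2x3 matrix

# The transpose turns rows into columns: A.T is 3x2.
At = A.T
print(At)        # [[1 4]
                 #  [2 5]
                 #  [3 6]]

# Transposing twice recovers the original matrix.
assert (At.T == A).all()
```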
We can also define the product of a $2 \times 2$ matrix and a vector in $\mathbb{R}^2$.
\[ \begin{bmatrix} a & b \\ c & d \end{bmatrix} \, \begin{bmatrix} x \\ y \end{bmatrix} \;=\; \begin{bmatrix} ax + by \\ cx + dy \end{bmatrix}. \]
The two resulting entries are exactly the dot products of the two rows with the vector, respectively.
In general, we can multiply an $m \times n$ matrix (left) with a column vector in $\mathbb{R}^n$ (right) via the formula \[ \begin{bmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{m1} & \cdots & a_{mn} \end{bmatrix} \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} = \begin{bmatrix} a_{11} x_1 + \cdots + a_{1n} x_n \\ \vdots \\ a_{m1} x_1 + \cdots + a_{mn} x_n \\ \end{bmatrix} \]
The result is a column vector in $\mathbb{R}^m$.
And the entries are exactly the dot products between the rows of the matrix and the vector.
This multiplication is only possible when the number of columns of the matrix (left) matches the number of entries of the vector (right).
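The row-by-row dot-product description can be seen directly in code; a sketch using NumPy (assumed available), where `@` denotes the matrix-vector product:

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
v = np.array([5, 6])

# The product A v: each entry is the dot product of a row of A with v.
Av = A @ v
print(Av)                        # [17 39]

# The same entries, computed explicitly row by row as dot products.
print([row @ v for row in A])    # [17, 39]
```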
For an $m \times n$ matrix $A$, a vector $\mathbf{v} \in \mathbb{R}^n$, and a real number $r$, it is easy to verify that \[ A (r \, \mathbf{v}) = r \, A \, \mathbf{v}. \] (The entries of the resulting vector are dot products.)
Similarly, for an $m \times n$ matrix $A$ and two vectors $\mathbf{u}, \mathbf{v} \in \mathbb{R}^n$, we can also verify that \[ A (\mathbf{u} + \mathbf{v}) = A \, \mathbf{u} \, + \, A \, \mathbf{v} \]
Therefore, the function $\mathbf{v} \mapsto A \mathbf{v}$ is a linear function.
Indeed, we will show later that every linear function from $\mathbb{R}^n$ to $\mathbb{R}^m$ can be represented by a matrix-vector product in this way.
It is also easy to verify that for two $m \times n$ matrices $A$ and $B$ and a vector $\mathbf{v} \in \mathbb{R}^n$, \[ (A+B) \mathbf{v} = A \mathbf{v} + B \mathbf{v}. \]
Similarly, \[ (-A) \mathbf{v} = -(A \mathbf{v}) = A(-\mathbf{v}). \]
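These linearity identities are easy to spot-check numerically; a sketch with NumPy (assumed available) on arbitrarily chosen matrices and vectors:

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])
u = np.array([1, 0, 2])
v = np.array([3, 1, 1])
r = 7

# A(u + v) = A u + A v  (additivity)
assert ((A @ (u + v)) == (A @ u + A @ v)).all()

# A(r v) = r (A v)      (homogeneity)
assert ((A @ (r * v)) == (r * (A @ v))).all()

# (A + B) v = A v + B v, with B of the same size as A.
B = np.array([[1, 1, 1],
              [0, 2, 0]])
assert (((A + B) @ v) == (A @ v + B @ v)).all()
```

A spot check is not a proof, of course; the identities follow from expanding the dot products entry by entry.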
In a matrix-vector product, the order in which we write the factors is very important: For an $m \times n$ matrix $A$ and a vector $\mathbf{v} \in \mathbb{R}^n$, \[ A \mathbf{v} \] makes sense (as we have defined). However, \[ \mathbf{v} A \] does not make sense.
The $n \times n$ identity matrix, denoted $I_n$, is the matrix \[ I_n = \begin{bmatrix} 1 & & \\ & \ddots & \\ & & 1 \end{bmatrix}. \] (missing entries are $0$'s)
That is, it has $1$'s on the main diagonal and $0$'s elsewhere.
It has the very special property that \[ I_n \, \mathbf{v} = \mathbf{v} \] for any $\mathbf{v} \in \mathbb{R}^n$. It plays the role of "1", in matrix-vector products.
Whenever the dimension is clear from the context, we simply write $I = I_n$; it is always assumed to be square.
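The defining property $I_n \mathbf{v} = \mathbf{v}$ can be verified directly; a minimal sketch with NumPy (assumed available):

```python
import numpy as np

n = 3
I = np.eye(n)                  # the n x n identity matrix: 1's on the diagonal
v = np.array([1.0, 2.0, 3.0])

# I v = v for every v in R^n.
print(I @ v)                   # [1. 2. 3.]
assert np.allclose(I @ v, v)
```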
Matrices are nice containers for data (they look like spreadsheets).
But their real usefulness lies in their connection to linear functions.
As noted earlier, each $m \times n$ matrix $A$ defines a linear function $f : \mathbb{R}^n \to \mathbb{R}^m$ given by \[ f(\mathbf{x}) = A \mathbf{x}. \]
In mathematics, the terms function, transformation, and map have similar meanings and are often used in inconsistent ways. So you will often see people use "linear transformation" or "linear map" instead.
The converse is also true: For every linear function $f : \mathbb{R}^n \to \mathbb{R}^m$, there exists a unique $m \times n$ matrix $A$ such that \[ f(\mathbf{x}) = A \mathbf{x}. \]
This statement no longer holds for infinite-dimensional vector spaces.
That is, every linear function is associated with a unique matrix, and every matrix is associated with a unique linear function. Mathematicians call such a one-to-one correspondence a bijection.
Thus there is no essential difference between working with linear functions (important in applications) and working with matrices (their finite representations).
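One way to see this correspondence concretely: the $j$-th column of the matrix of a linear function $f$ is $f(\mathbf{e}_j)$, the image of the $j$-th standard basis vector. A sketch with NumPy (assumed available); the particular function `f` below is a made-up example:

```python
import numpy as np

# A made-up linear function f : R^2 -> R^3 (hypothetical example).
def f(x):
    x1, x2 = x
    return np.array([2 * x1 + x2,
                     x1 - x2,
                     3 * x2])

# Recover its unique 3x2 matrix A: the j-th column of A is f(e_j),
# where e_j is the j-th standard basis vector (a row of np.eye(2)).
A = np.column_stack([f(e) for e in np.eye(2)])
print(A)   # [[ 2.  1.]
           #  [ 1. -1.]
           #  [ 0.  3.]]

# Check: f(x) = A x on a sample vector.
x = np.array([4.0, -1.0])
assert np.allclose(f(x), A @ x)
```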