QR-decomposition

A product of an orthogonal and triangular matrix

The QR-decomposition is a decomposition of a matrix into the product of an orthogonal matrix and an upper triangular matrix.

Matrix decomposition

A decomposition of a matrix is simply the process of expressing a matrix as the product of two or more matrices. E.g., \[ A = X Y \]

This is similar to the problem of factorizing integers or polynomials, which is why matrix decompositions are also known as matrix factorizations.

There are many useful decomposition methods, each with its own use cases. They are crucial in numerical computations.

It may be difficult to appreciate the usefulness of these decompositions in the classroom. Once you encounter the large matrices that arise in real applications, you will understand why we need them.

  • LU decomposition
  • Cholesky/LDL decomposition
  • QR decomposition
  • Eigenvalue decomposition
  • Singular value decomposition (SVD)...

QR-decomposition (square case)

QR-decomposition is a special way of decomposing a matrix that is particularly useful for solving linear least-squares problems. It is closely related to the "Gram-Schmidt process" that is used to create orthonormal bases.

For any square $n \times n$ matrix $A$, there exists an orthogonal $n \times n$ matrix $Q$ and an upper triangular $n \times n$ matrix $R$ such that \[ A = QR. \] This factorization is known as a QR-decomposition of $A$.

Recall that $Q$, being orthogonal, must be nonsingular.

Also, $R$ is nonsingular if and only if $A$ is nonsingular.

If $A$ is nonsingular and we require the diagonal entries of $R$ to be positive, then the QR-decomposition is unique.

What's the point?

It is inherently difficult to appreciate the value of the QR-decomposition (or matrix decompositions in general) in classroom settings.

The real usefulness only becomes apparent when dealing with very large matrices (real-world applications) or matrices with floating-point entries (numerical analysis).

If it is known that $A = QR$, then the problem of solving \[ A \mathbf{x} = \mathbf{b} \] can be turned into the problem of solving \[ R \mathbf{x} = Q^\top \mathbf{b}. \]

Explain why these two problems are equivalent (even when $A$ itself is singular), and why we may prefer the second formulation.
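As a concrete sketch of the second formulation, here is a pure-Python illustration (the name `solve_via_qr` is purely illustrative, not a library routine): once $Q$ and $R$ are known, we multiply $\mathbf{b}$ by $Q^\top$ and then back-substitute through $R$, with no Gaussian elimination required.

```python
def solve_via_qr(Q, R, b):
    """Solve A x = b given A = Q R, with Q orthogonal and R upper
    triangular and nonsingular. Matrices are lists of rows."""
    n = len(b)
    # Form c = Q^T b; this costs only a matrix-vector product.
    c = [sum(Q[i][j] * b[i] for i in range(n)) for j in range(n)]
    # Back substitution on the triangular system R x = c.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(R[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (c[i] - s) / R[i][i]
    return x

Q = [[0.0, -1.0], [1.0, 0.0]]   # a rotation, hence orthogonal
R = [[1.0, 2.0], [0.0, 3.0]]
# A = QR = [[0, -3], [1, 2]]; with x = (1, 1) we get b = A x = (-3, 3).
x = solve_via_qr(Q, R, [-3.0, 3.0])
print(x)  # -> [1.0, 1.0]
```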

The true benefit is that orthogonal matrices are perfectly conditioned (their condition number equals $1$), which is crucially important in numerical computations. You need to study numerical analysis to see what's going on.

QR-decomposition and linear least-squares

Recall that orthogonal linear transformations preserve vector norms. That is, \[ \| Q \mathbf{x} \| = \| \mathbf{x} \| \] for any orthogonal $n \times n$ matrix $Q$ and any vector $\mathbf{x} \in \mathbb{R}^n$.
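This fact is easy to check numerically. A quick pure-Python sketch, using a $2 \times 2$ rotation matrix as one convenient example of an orthogonal matrix:

```python
import math

def norm(v):
    """Euclidean norm of a vector given as a list of floats."""
    return math.sqrt(sum(t * t for t in v))

theta = 0.7  # an arbitrary angle; every rotation matrix is orthogonal
Q = [[math.cos(theta), -math.sin(theta)],
     [math.sin(theta),  math.cos(theta)]]
x = [3.0, 4.0]
Qx = [sum(Q[i][j] * x[j] for j in range(2)) for i in range(2)]
print(norm(x), norm(Qx))  # both equal 5.0, up to rounding error
```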

This observation gives us a very nice way of solving linear least-squares problems.

Suppose we know the QR-decomposition \[ A = QR. \] Why is the problem of minimizing \[ \| A \mathbf{x} - \mathbf{b} \|^2 \] completely equivalent to the problem of minimizing \[ \| R \mathbf{x} - Q^{\top} \mathbf{b} \|^2? \]

Through QR-decomposition, general linear least-squares problems can be reduced to linear least-squares problems involving only upper triangular matrices, which are easy to solve.

Linear least-squares: Upper triangular case

Suppose \[ R = \begin{bmatrix} * & * & * \\ 0 & * & * \\ 0 & 0 & * \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix} \] where each "$*$" is a placeholder for a potentially nonzero number whose value is not important.

The "$*$"-notation is used widely in discussions of numerical computations so that we don't get too distracted by concrete numbers.

How should we minimize \[ \left\| \begin{bmatrix} * & * & * \\ 0 & * & * \\ 0 & 0 & * \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} - \begin{bmatrix} * \\ * \\ * \\ * \\ * \end{bmatrix} \right\|^2 \]

Assuming the top square block of $R$ is nonsingular, explain why the problem of minimizing \[ \| R \mathbf{x} - \mathbf{c} \|^2 \] is equivalent to the problem of solving a linear system.
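Here is how this could look in code (a pure-Python sketch; `lstsq_upper_triangular` is an illustrative name, not a library routine). Only the top square block of $R$ enters the computation: the zero rows contribute a fixed residual that no choice of $\mathbf{x}$ can change, so the minimizer comes from back substitution alone.

```python
def lstsq_upper_triangular(R, c):
    """Minimize ||R x - c||^2 where R is m x n (m >= n) upper
    triangular with a nonsingular top n x n block and zero rows
    below. The minimizer solves the top square system."""
    n = len(R[0])
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(R[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (c[i] - s) / R[i][i]
    return x

R = [[2.0, 1.0, 1.0],
     [0.0, 3.0, 1.0],
     [0.0, 0.0, 4.0],
     [0.0, 0.0, 0.0],
     [0.0, 0.0, 0.0]]
c = [5.0, 5.0, 8.0, 1.0, -1.0]
x = lstsq_upper_triangular(R, c)
print(x)  # -> [1.0, 1.0, 2.0]
# The leftover residual is ||(c[3], c[4])||^2 = 1 + 1 = 2,
# contributed entirely by the zero rows of R.
```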

QR and Gram-Schmidt process

The existence of a QR-decomposition for a matrix is equivalent to the possibility of carrying out the "Gram-Schmidt" process (see orthonormal basis for details).
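For illustration, the classical Gram-Schmidt process can be sketched in pure Python as follows. This is the textbook version, which assumes the columns of $A$ are linearly independent; in floating-point practice, modified Gram-Schmidt or Householder reflections are preferred for numerical stability.

```python
import math

def gram_schmidt_qr(A):
    """Classical Gram-Schmidt: given A (list of rows, m x n) with
    linearly independent columns, return Q with orthonormal columns
    and upper triangular R such that A = Q R."""
    m, n = len(A), len(A[0])
    Q = [[0.0] * n for _ in range(m)]
    R = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # Start from the j-th column of A...
        v = [A[i][j] for i in range(m)]
        # ...and subtract its projections onto the previous q's.
        for k in range(j):
            R[k][j] = sum(Q[i][k] * A[i][j] for i in range(m))
            for i in range(m):
                v[i] -= R[k][j] * Q[i][k]
        # Normalize what remains to get the next orthonormal column.
        R[j][j] = math.sqrt(sum(t * t for t in v))
        for i in range(m):
            Q[i][j] = v[i] / R[j][j]
    return Q, R

# A is already upper triangular here, so Q should come out as the
# identity and R should equal A.
Q, R = gram_schmidt_qr([[2.0, 1.0], [0.0, 1.0]])
print(Q)  # -> [[1.0, 0.0], [0.0, 1.0]]
print(R)  # -> [[2.0, 1.0], [0.0, 1.0]]
```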