Linear regression models

Learning formulas from data and predicting data from formulas

Linear regression models are among the most basic tools in statistics, data science, and machine learning. Essentially, they are linear or affine functions that "best fit" a given data set.

Linear functions

One of the simplest classes of functions consists of functions of the form \[ f(x) = a x, \] where $a$ is a real number.

The graph of such a function is a straight line passing through the origin. This is a linear function.

It satisfies the properties that \[ \begin{aligned} f(x_1 + x_2) &= f(x_1) + f(x_2) & f(r x) &= r \cdot f(x). \end{aligned} \] We will take these to be the defining properties and call any function that satisfies them a linear function.

Using these two defining properties, we can generalize the concept of linear functions.

A function $f : \mathbb{R}^n \to \mathbb{R}^m$ is a linear function if \[ \begin{aligned} f(\mathbf{x}_1 + \mathbf{x}_2) &= f(\mathbf{x}_1) + f(\mathbf{x}_2) & f(r \, \mathbf{x}) &= r \, f(\mathbf{x}). \end{aligned} \]
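To make the definition concrete, here is a minimal numerical check in Python (the function $f$ and the test vectors are made-up examples, and numpy is assumed to be available):

```python
import numpy as np

# A made-up linear function f : R^3 -> R, namely f(x) = a . x.
a = np.array([2.0, -1.0, 0.5])

def f(x):
    return a @ x  # dot product

x1 = np.array([1.0, 2.0, 3.0])
x2 = np.array([-4.0, 0.0, 1.5])
r = 7.0

# Both defining properties hold (up to floating-point rounding).
print(np.isclose(f(x1 + x2), f(x1) + f(x2)))  # True
print(np.isclose(f(r * x1), r * f(x1)))       # True
```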

It is quite easy to understand the behavior of linear functions. In particular, every linear function from $\mathbb{R}^n$ to $\mathbb{R}^1$ has a simple form that we already understand well.

Show that a linear function $f : \mathbb{R}^n \to \mathbb{R}^1$ must be of the form \[ f(\mathbf{x}) = \mathbf{a} \cdot \mathbf{x} \] for some vector $\mathbf{a} \in \mathbb{R}^n$.
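Hint (a sketch of one possible argument): write $\mathbf{x} = x_1 \mathbf{e}_1 + \cdots + x_n \mathbf{e}_n$ in terms of the standard basis vectors and apply the two defining properties: \[ f(\mathbf{x}) = f\!\Big(\sum_{i=1}^{n} x_i \, \mathbf{e}_i\Big) = \sum_{i=1}^{n} x_i \, f(\mathbf{e}_i) = \mathbf{a} \cdot \mathbf{x}, \] where $a_i = f(\mathbf{e}_i)$.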

Affine functions

A function of the form \[ f(x) = a x + b, \] for some real numbers $a,b$ is called an affine function.

The graph of such a function is a straight line (that may or may not pass through the origin). Similarly, we can generalize this to any dimension.

A function $f : \mathbb{R}^n \to \mathbb{R}^m$ is an affine function if \[ f(\mathbf{x}) = L(\mathbf{x}) + \mathbf{b} \] for some linear function $L : \mathbb{R}^n \to \mathbb{R}^m$ and a vector $\mathbf{b} \in \mathbb{R}^m$.
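As a quick sanity check, here is a small Python sketch (with made-up values for the weights and offset) illustrating that an affine function with $\mathbf{b} \neq \mathbf{0}$ is not linear, since linearity would force $f(\mathbf{0}) = \mathbf{0}$:

```python
import numpy as np

# A made-up affine function f : R^2 -> R, f(x) = a . x + b.
a = np.array([3.0, -2.0])
b = 5.0

def f(x):
    return a @ x + b

# Linearity would require f(0) = 0, but here f(0) = b = 5.0.
print(f(np.zeros(2)))  # 5.0
```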

Linear regression models

In our context, a regression model is simply a linear or affine function.

For example, \[ f(x_1,x_2) = a_1 x_1 + a_2 x_2 + v \] where $a_1,a_2$ and $v$ are real numbers.

In statistics, the variables $x_1,x_2$ are also called regressors. The coefficients $a_1,a_2$ are weights, and the constant term $v$ is the offset.

The function value \[ \hat{y} = f(x_1,x_2) = a_1 x_1 + a_2 x_2 + v \] is called the prediction. (The "hat" notation is just part of the name of the variable).
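In Python, such a model might be evaluated as follows (a minimal sketch with made-up weights and offset):

```python
# Made-up weights and offset for a two-regressor model.
a1, a2 = 0.8, -0.3   # weights
v = 2.0              # offset

def predict(x1, x2):
    return a1 * x1 + a2 * x2 + v

y_hat = predict(1.5, 4.0)  # 0.8*1.5 - 0.3*4.0 + 2.0 = 2.0
print(y_hat)
```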

Why is it called "regression"?

The term "regression" has its origin in biology. It was coined by Francis Galton in the 19th century in his study of the phenomenon that the heights of sons and daughters of tall parents tend to "regress" back toward the average.

Today, this important statistical phenomenon is known as regression toward the mean.

Over time, this fundamental idea has been applied to a wider and wider range of scientific studies, and the term "regression" is no longer tied to the original biological idea.

More general construction

In general, a regression model involving $n$ independent variables $x_1,\ldots,x_n$ is simply an affine function of the form \[ f(x_1,\ldots,x_n) = a_1 x_1 + \cdots + a_n x_n + v \] where $a_1,\ldots,a_n$ and $v$ are real numbers.

As before, the $x_1,\ldots,x_n$ are called regressors. The coefficients $a_1,\ldots,a_n$ are weights, and the constant term $v$ is the offset.

Using vector notation, such a function can be written as \[ \hat{y} = f(x_1,\ldots,x_n) = \begin{bmatrix} a_1 \\ \vdots \\ a_n \end{bmatrix} \cdot \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} + v \] where $\cdot$ is the dot product operation.
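In Python, this vector form might look like the following sketch (using numpy's dot product; all values are made-up):

```python
import numpy as np

# Made-up weights and offset for a model with n = 3 regressors.
a = np.array([0.8, -0.3, 1.1])   # weight vector
v = 2.0                          # offset

x = np.array([1.5, 4.0, -2.0])   # regressor values
y_hat = np.dot(a, x) + v         # y_hat = a . x + v
print(y_hat)  # 1.2 - 1.2 - 2.2 + 2.0 = -0.2
```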

Even more general construction

It is even possible to construct regression models with multiple output (dependent) variables \[ \begin{bmatrix} \hat{y}_1 \\ \vdots \\ \hat{y}_m \end{bmatrix} = f(x_1,\ldots,x_n) \]

where \[ \hat{y}_i = \begin{bmatrix} a_{i1} \\ \vdots \\ a_{in} \end{bmatrix} \cdot \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} + v_i \] for $i=1,\ldots,m$.

The notation is getting complicated. We will learn how to manage these complicated expressions using "matrices".
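As a preview, here is one way such a model could be evaluated in Python: collect the weight vectors into a single two-dimensional array $A$, where row $i$ holds the weights of output $i$. This is exactly the "matrix" idea we will develop. All values below are made-up:

```python
import numpy as np

# A made-up model with n = 3 inputs and m = 2 outputs.
# Row i of A holds the weights (a_i1, ..., a_in) of output i.
A = np.array([[0.8, -0.3, 1.1],
              [0.5,  0.0, -1.0]])
v = np.array([2.0, -1.0])        # one offset v_i per output

x = np.array([1.5, 4.0, -2.0])
y_hat = A @ x + v                # all m predictions at once
print(y_hat)                     # [-0.2, 1.75]
```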

How to construct regression models?

In math classes, we are often given the formula for a certain function and asked to perform interesting/boring calculations.

In the real world, the situation is reversed --- we almost always need to guess the "correct" formula from observations (data).

In the rest of this course, we will develop the language and concepts necessary for solving this problem.
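As a small preview of where we are headed, the following Python sketch "guesses" the weight and offset of a one-regressor affine model from made-up data, using numpy's least-squares routine:

```python
import numpy as np

# Made-up data: 5 observations of one regressor x and a response y,
# generated roughly by y = 2*x + 1 plus some noise.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])

# Fit y ~ a*x + v by least squares: stack x with a column of ones
# so the offset v is estimated along with the weight a.
X = np.column_stack([x, np.ones_like(x)])
(a, v), *_ = np.linalg.lstsq(X, y, rcond=None)
print(a, v)  # roughly a = 2, v = 1 for this made-up data
```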