Learning formulas from data and predicting data from formulas
Linear regression models are among the most basic tools in statistics, data science, and machine learning. Essentially, they are linear/affine functions that "best fit" some data set.
One of the simplest classes of functions consists of functions of the form \[ f(x) = a x, \] where $a$ is a real number.
The graph of such a function is a straight line passing through the origin. This is a linear function.
It satisfies the properties that \[ \begin{aligned} f(x_1 + x_2) &= f(x_1) + f(x_2) & f(r x) &= r \cdot f(x). \end{aligned} \] We will take these to be the defining properties and call any function that satisfies them a linear function.
Using these two defining properties, we can generalize the concept of linear functions.
It is quite easy to understand the behavior of linear functions. In particular, every linear function from $\mathbb{R}^n$ to $\mathbb{R}^1$ has a simple form that we already understand well.
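The two defining properties can be checked numerically. The following is a minimal sketch in plain Python, using an arbitrary, made-up coefficient vector $a$ and sample inputs; it verifies that the dot-product function $f(x) = a_1 x_1 + \cdots + a_n x_n$ satisfies both properties.

```python
def f(x):
    """A linear function from R^3 to R: the dot product of x with a fixed vector a."""
    a = [2.0, -1.0, 0.5]  # arbitrary illustrative coefficients
    return sum(ai * xi for ai, xi in zip(a, x))

x1 = [1.0, 2.0, 3.0]
x2 = [-1.0, 0.5, 4.0]
r = 3.0

# Additivity: f(x1 + x2) == f(x1) + f(x2)
x_sum = [u + w for u, w in zip(x1, x2)]
assert abs(f(x_sum) - (f(x1) + f(x2))) < 1e-9

# Homogeneity: f(r * x1) == r * f(x1)
assert abs(f([r * u for u in x1]) - r * f(x1)) < 1e-9
```

Of course, a numerical check on a few inputs is not a proof; it only illustrates what the two properties say.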
A function of the form \[ f(x) = a x + b, \] for some real numbers $a,b$ is called an affine function.
The graph of such a function is a straight line (that may or may not pass through the origin). Similarly, we can generalize this to any dimension.
In our context, a regression model is simply a linear or affine function.
E.g., \[ f(x_1,x_2) = a_1 x_1 + a_2 x_2 + v \] where $a_1,a_2$ and $v$ are real numbers.
In statistics, the variables $x_1,x_2$ are also called regressors. The coefficients $a_1,a_2$ are weights, and the constant term $v$ is the offset.
The function value \[ \hat{y} = f(x_1,x_2) = a_1 x_1 + a_2 x_2 + v \] is called the prediction. (The "hat" notation is just part of the name of the variable).
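Evaluating a regression model is just arithmetic. Here is a short sketch of the two-regressor model above, with made-up values for the weights $a_1, a_2$ and the offset $v$ chosen purely for illustration.

```python
# Hypothetical weights and offset (made-up values for illustration).
a1, a2 = 0.5, -1.2
v = 3.0

def f(x1, x2):
    """Regression model: weighted sum of the regressors plus the offset."""
    return a1 * x1 + a2 * x2 + v

# The prediction y_hat for regressor values x1 = 2.0, x2 = 1.0.
y_hat = f(2.0, 1.0)
```

In practice the weights and offset are not given in advance; finding them from data is the subject of the rest of the course.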
The term "regression" had an origin in biology. It was coined by Francis Galton in the 19th century in his study about the phenomenon that the heights of sons and daughters of tall parents tend to "regress" back towards an average.
Today, this important statistical phenomenon is known as the regression toward the mean.
Over time, this fundamental idea has been applied to an ever wider range of scientific fields, and the term "regression" is no longer connected to its original biological meaning.
In general, a regression model involving $n$ independent variables $x_1,\ldots,x_n$ is simply an affine function of the form \[ f(x_1,\ldots,x_n) = a_1 x_1 + \cdots + a_n x_n + v \] where $a_1,\ldots,a_n$ and $v$ are real numbers.
As before, the $x_1,\ldots,x_n$ are called regressors. The coefficients $a_1,\ldots,a_n$ are weights, and the constant term $v$ is the offset.
Using a vector notation, such a function can be written as \[ \hat{y} = f(x_1,\ldots,x_n) = \begin{bmatrix} a_1 \\ \vdots \\ a_n \end{bmatrix} \cdot \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} + v \] where $\cdot$ is the dot product operation.
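The vector form translates directly into code. The sketch below implements the prediction $\hat{y} = a \cdot x + v$ in plain Python, representing the vectors $a$ and $x$ as equal-length lists (the function name `predict` is our own choice).

```python
def predict(a, x, v):
    """Prediction y_hat = a . x + v, where a and x are equal-length lists."""
    return sum(ai * xi for ai, xi in zip(a, x)) + v

# Example: a = [1, 2], x = [3, 4], v = 5 gives 1*3 + 2*4 + 5.
y_hat = predict([1.0, 2.0], [3.0, 4.0], 5.0)
```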
It is even possible to construct regression models with multiple output (dependent) variables \[ \begin{bmatrix} \hat{y}_1 \\ \vdots \\ \hat{y}_m \end{bmatrix} = f(x_1,\ldots,x_n) \]
where \[ \hat{y}_i = \begin{bmatrix} a_{i1} \\ \vdots \\ a_{in} \end{bmatrix} \cdot \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} + v_i \] for $i=1,\ldots,m$.
This notation is quite cumbersome. We will learn how to manage such expressions using "matrices".
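As a preview of the matrix viewpoint, the multi-output model can be computed row by row: each output $\hat{y}_i$ is the dot product of the $i$-th row of the coefficient array with $x$, plus the offset $v_i$. A minimal sketch, representing the coefficients as a list of rows:

```python
def predict_multi(A, x, v):
    """Multi-output model: y_hat_i = (row i of A) . x + v_i, for i = 1, ..., m."""
    return [sum(aij * xj for aij, xj in zip(row, x)) + vi
            for row, vi in zip(A, v)]

# Example with m = 2 outputs and n = 2 regressors (made-up numbers).
A = [[1.0, 0.0],
     [0.0, 2.0]]
x = [3.0, 4.0]
v = [1.0, 1.0]
y_hat = predict_multi(A, x, v)
```

The double loop hidden in this code is exactly what matrix-vector multiplication packages into a single operation.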
In math classes, we are often given the formula for a function and asked to perform interesting (or boring) calculations.
In the real world, the situation is reversed --- we almost always need to guess the "correct" formula from observations (data).
In the rest of this course, we will develop the language and concepts necessary for solving this problem.