Critical points

A critical point of a real-valued function of several variables is a point in the domain where either all (first-order) partial derivatives are zero or at least one of the partial derivatives is undefined. Critical points are central to the study of functions. In particular, every local extremum must be a critical point (although the converse is not true).

Critical points of real-valued functions in two variables

How to find critical points

For a function \(f(x,y)\) that is defined on an open domain, the problem of finding critical points of \(f\) is exactly the problem of finding points \((x,y)\) such that

either \(\frac{\partial f}{\partial x} = \frac{\partial f}{\partial y} = 0\)
or at least one of \(\frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}\) is undefined.
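
As a small illustration (both functions below are chosen for this example and do not appear elsewhere in this text), consider \(f(x,y) = x^2 + y^2 - 2x\). Its partial derivatives are

\[ \begin{aligned} \frac{\partial f}{\partial x} &= 2x - 2 & \frac{\partial f}{\partial y} &= 2y \end{aligned} \]

Both are defined everywhere, and both vanish only when \(x = 1\) and \(y = 0\), so \((1,0)\) is the only critical point of \(f\). In contrast, \(g(x,y) = \sqrt{x^2 + y^2}\) has

\[ \begin{aligned} \frac{\partial g}{\partial x} &= \frac{x}{\sqrt{x^2 + y^2}} & \frac{\partial g}{\partial y} &= \frac{y}{\sqrt{x^2 + y^2}} \end{aligned} \]

away from the origin. These never vanish simultaneously, but neither partial derivative exists at \((0,0)\) (along the \(x\)-axis, \(g(x,0) = |x|\)), so the origin is a critical point of the second kind.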

Classification of critical points

For a differentiable function, isolated critical points can be categorized as degenerate or nondegenerate.

Nondegenerate critical points

Nondegenerate critical points are the simpler kind: each one is a local maximum, a local minimum, or a saddle point.

At a local maximum, the function possesses a peak value within a neighborhood of the point.
At a local minimum, the function has a low point (value-wise) within a neighborhood.
Saddle points, on the other hand, may appear to be a local maximum in one direction but a local minimum in another. They are neither local maxima nor local minima.

As long as the function has continuous second order partial derivatives, we can classify a nondegenerate critical point through a simple test.

Second derivative test

Suppose the second partial derivatives of \(f\) are continuous in a neighborhood of a critical point \((x_0,y_0)\) of \(f\). Let

\[ D = f_{xx} f_{yy} - (f_{xy})^2 \]

evaluated at this point (i.e., we plug \(x=x_0\) and \(y=y_0\) into all the partial derivatives).

If \(D > 0\) and \(f_{xx} > 0\), then \(f\) has a local minimum at \((x_0,y_0)\).
If \(D > 0\) and \(f_{xx} < 0\), then \(f\) has a local maximum at \((x_0,y_0)\).
If \(D < 0\), then \((x_0,y_0)\) is a saddle point for \(f\).

This test is inconclusive (i.e., it tells us nothing) if \(D = 0\).
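
The case analysis above can be sketched as a small Python function. This is only an illustration: the function name and the returned labels are made up for this example, and the three second partial derivatives are assumed to have been computed by hand and already evaluated at the critical point. The comparisons are exact, so with floating-point inputs a value of \(D\) that is merely close to zero would not be reported as inconclusive.

```python
def classify_critical_point(fxx, fyy, fxy):
    """Apply the second derivative test to second partials evaluated at a critical point."""
    d = fxx * fyy - fxy ** 2  # the quantity D from the test
    if d > 0 and fxx > 0:
        return "local minimum"
    if d > 0 and fxx < 0:
        return "local maximum"
    if d < 0:
        return "saddle point"
    return "inconclusive"  # only reached when D = 0


print(classify_critical_point(2, 2, 0))   # x^2 + y^2 at (0, 0): local minimum
print(classify_critical_point(2, -2, 0))  # x^2 - y^2 at (0, 0): saddle point
print(classify_critical_point(0, 0, 0))   # x^4 + y^4 at (0, 0): inconclusive
```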

In the language of linear algebra, we can state the same test in terms of the determinant and trace, or in terms of the eigenvalues.

Second derivative test (the determinant and trace version)

Suppose the second partial derivatives of \(f\) are continuous in a neighborhood of a critical point \((x_0,y_0)\) of \(f\). Define the Hessian matrix to be

\[ H = \begin{bmatrix} f_{xx} & f_{xy} \\ f_{yx} & f_{yy} \end{bmatrix} \]

evaluated at this point (i.e., we plug \(x=x_0\) and \(y=y_0\) into all the partial derivatives). In more familiar terms, this matrix is also the Jacobian matrix of \(\nabla f^\top\).

If \(\det(H) > 0\) and \(\operatorname{tr}(H) > 0\), then \(f\) has a local minimum at \((x_0,y_0)\).
If \(\det(H) > 0\) and \(\operatorname{tr}(H) < 0\), then \(f\) has a local maximum at \((x_0,y_0)\).
If \(\det(H) < 0\), then \((x_0,y_0)\) is a saddle point for \(f\).

This test is inconclusive (i.e., it tells us nothing) if \(\det(H) = 0\).
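
One way to see why the trace can stand in for \(f_{xx}\) here: continuity of the second partial derivatives gives \(f_{xy} = f_{yx}\), so \(\det(H) = f_{xx} f_{yy} - (f_{xy})^2 = D\). If \(\det(H) > 0\), then

\[ f_{xx} f_{yy} > (f_{xy})^2 \ge 0 \]

so \(f_{xx}\) and \(f_{yy}\) are both nonzero and share the same sign. Consequently \(\operatorname{tr}(H) = f_{xx} + f_{yy}\) has the same sign as \(f_{xx}\), and this version agrees with the version stated in terms of \(D\) and \(f_{xx}\).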

Second derivative test (the eigenvalue version)

Suppose the second partial derivatives of \(f\) are continuous in a neighborhood of a critical point \((x_0,y_0)\) of \(f\). Define the Hessian matrix to be

\[ H = \begin{bmatrix} f_{xx} & f_{xy} \\ f_{yx} & f_{yy} \end{bmatrix} \]

evaluated at this point.

If \(H\) has two positive eigenvalues, then \(f\) has a local minimum at \((x_0,y_0)\).
If \(H\) has two negative eigenvalues, then \(f\) has a local maximum at \((x_0,y_0)\).
If \(H\) has one positive and one negative eigenvalue, then \((x_0,y_0)\) is a saddle point for \(f\).

This test is inconclusive (i.e., it tells us nothing) if \(H\) has a zero eigenvalue.

It is worth noting that this version can be directly generalized to higher dimensions. For a real-valued function in any number of variables, we can simply replace the word “two” in the first two conditions by “all”. The third condition is replaced by “If \(H\) has eigenvalues of different signs, then the critical point is a saddle point for \(f\).”
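
The eigenvalue version, including its generalization, might be sketched in Python as follows. The function names and the tolerance `tol` are assumptions made for this illustration (they are not part of the test itself). The 2-by-2 helper relies on the symmetry \(f_{xy} = f_{yx}\), and the classifier accepts any number of eigenvalues, however they were obtained.

```python
import math


def symmetric_2x2_eigenvalues(fxx, fxy, fyy):
    """Eigenvalues of [[fxx, fxy], [fxy, fyy]] via the characteristic polynomial."""
    trace = fxx + fyy
    # For a symmetric matrix the discriminant (fxx - fyy)^2 + 4*fxy^2 is never negative.
    root = math.sqrt((fxx - fyy) ** 2 + 4 * fxy ** 2)
    return (trace - root) / 2, (trace + root) / 2


def classify_by_eigenvalues(eigenvalues, tol=1e-12):
    """Classify a critical point from the Hessian eigenvalues (any dimension).

    tol is a hypothetical threshold below which an eigenvalue is treated as zero.
    """
    has_positive = any(ev > tol for ev in eigenvalues)
    has_negative = any(ev < -tol for ev in eigenvalues)
    has_zero = any(abs(ev) <= tol for ev in eigenvalues)
    if has_positive and has_negative:
        return "saddle point"  # eigenvalues of different signs
    if has_zero:
        return "inconclusive"
    return "local minimum" if has_positive else "local maximum"


print(classify_by_eigenvalues(symmetric_2x2_eigenvalues(2, 0, -2)))  # saddle point
print(classify_by_eigenvalues((-1, -2, -3)))                         # local maximum
```

For larger Hessians the eigenvalues would normally come from a numerical linear algebra routine rather than a closed formula; that step is left out here.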

Degenerate critical points

Degenerate critical points are much more complex to classify since the second derivative test does not lead to a definitive conclusion. Essentially, the function’s graph is so “flat” near a degenerate critical point that the first and second order partial derivatives don’t provide enough information to understand the situation. To conclusively classify such points, one may need further analysis or higher derivative information.
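
For instance (these three functions are illustrations chosen here, not examples taken from elsewhere in this text), each of

\[ \begin{aligned} f(x,y) &= x^4 + y^4 & g(x,y) &= -(x^4 + y^4) & h(x,y) &= x^4 - y^4 \end{aligned} \]

has a critical point at \((0,0)\) where every second partial derivative vanishes, so \(D = 0\) in all three cases. Yet \(f\) has a local minimum there (since \(f \ge 0 = f(0,0)\)), \(g\) has a local maximum, and \(h\) has neither, because \(h(x,0) = x^4 > 0\) while \(h(0,y) = -y^4 < 0\) away from the origin. The second derivative test cannot tell these apart; comparing function values directly, as done here, settles each case.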

Fitting a line through two points

Suppose we have two points \((x_1,y_1) = (1, 1)\) and \((x_2,y_2) = (3, 2)\). Consider a straight line that is the graph of the function

\[ y = f(x) = mx + b \]

for some real numbers \(m\) and \(b\). Let us define the function

\[ E(m,b) = [ y_1 - f(x_1) ]^2 + [ y_2 - f(x_2) ]^2 \]

This function is the sum of squared \(y\)-differences between the line and the two given points. The problem of finding the line that passes through both points is therefore equivalent to the problem of finding the right choices of \(m\) and \(b\) so that \(E(m,b)\) is minimized. Even though we can find \(m\) and \(b\) directly, the goal of this exercise is to recover them using the tools of calculus.

Question

Find and classify the critical points for this function.

Answer

In this case, \(E\) is given by

\[ E(m,b) = \left(- b - 3 m + 2\right)^{2} + \left(- b - m + 1\right)^{2} \]

Its partial derivatives are

\[ \begin{aligned} \frac{ \partial E }{ \partial m } &= 8 b + 20 m - 14 & \frac{ \partial E }{ \partial b } &= 4 b + 8 m - 6 \end{aligned} \]

Both are defined for all \(m\) and \(b\). Therefore, to find the critical points, we simply have to find \(m\) and \(b\) such that \(\frac{\partial E}{\partial m} = 0\) and \(\frac{\partial E}{\partial b} = 0\) simultaneously. Solving this system, we get

\[ \left\{ \begin{aligned} m &= \frac{1}{2} \\ b &= \frac{1}{2} \\ \end{aligned} \right. \]

At this point,

\[ \begin{bmatrix} E_{mm} & E_{mb} \\ E_{bm} & E_{bb} \\ \end{bmatrix} = \left[\begin{matrix}20 & 8\\8 & 4\end{matrix}\right] \]

and \(E_{mm} E_{bb} - E_{mb} E_{bm} = 16 > 0\). Moreover, \(E_{mm} > 0\). By the second derivative test, we can conclude that this critical point is a local minimum. Indeed, since the original function is a sum of squares and equals zero at this point, we can also conclude that it is a global minimum. In other words, we found that the line that passes through the two given points is defined by

\[ y = \frac{1}{2} x + \frac{1}{2} \]
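
The hand computation can be double-checked with exact rational arithmetic. The following Python sketch is only a verification aid: the names `POINTS`, `squared_error`, and `gradient` are invented for this example, and the gradient formula is copied from the partial derivatives derived above rather than computed symbolically.

```python
from fractions import Fraction

POINTS = [(1, 1), (3, 2)]  # the two given points


def squared_error(m, b, points=POINTS):
    """E(m, b): sum of squared y-differences between y = m*x + b and the points."""
    return sum((y - (m * x + b)) ** 2 for x, y in points)


def gradient(m, b):
    """(dE/dm, dE/db) for the two points, using the formulas derived by hand."""
    return 20 * m + 8 * b - 14, 8 * m + 4 * b - 6


m = b = Fraction(1, 2)
print(gradient(m, b) == (0, 0))    # True: (1/2, 1/2) is a critical point
print(squared_error(m, b) == 0)    # True: the line passes through both points
print(20 * 4 - 8 * 8)              # 16: determinant of the Hessian [[20, 8], [8, 4]]
```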

Best fitting line among three points

Suppose we now have three points \[ \begin{aligned} (x_1,y_1) &= (1, 1) & (x_2,y_2) &= (3, 2) & (x_3,y_3) &= (4, 2) \end{aligned} \] Consider a straight line that is the graph of the function

\[ y = f(x) = mx + b \]

for some real numbers \(m\) and \(b\). Let us define the function

\[ E(m,b) = [ y_1 - f(x_1) ]^2 + [ y_2 - f(x_2) ]^2 + [ y_3 - f(x_3) ]^2 \]

This function is the sum of squared \(y\)-differences between the line and the given points. The problem of finding the “best fitting” line is therefore equivalent to the problem of finding the right choices of \(m\) and \(b\) so that \(E(m,b)\) is minimized. This problem is known as the linear least squares problem, and it is a form of linear regression.

Question

Find and classify the critical points for this function.

Answer

In this case, \(E\) is given by

\[ E(m,b) = \left(- b - 4 m + 2\right)^{2} + \left(- b - 3 m + 2\right)^{2} + \left(- b - m + 1\right)^{2} \]

Its partial derivatives are

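\[ \begin{aligned} \frac{ \partial E }{ \partial m } &= 16 b + 52 m - 30 & \frac{ \partial E }{ \partial b } &= 6 b + 16 m - 10 \end{aligned} \]

Both are defined for all \(m\) and \(b\), so the critical points are the solutions of \(\frac{\partial E}{\partial m} = 0\) and \(\frac{\partial E}{\partial b} = 0\). Solving this system, we get

\[ \left\{ \begin{aligned} m &= \frac{5}{14} \\ b &= \frac{5}{7} \\ \end{aligned} \right. \]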

At this point,

\[ \begin{bmatrix} E_{mm} & E_{mb} \\ E_{bm} & E_{bb} \\ \end{bmatrix} = \left[\begin{matrix}52 & 16\\16 & 6\end{matrix}\right] \]

and \(E_{mm} E_{bb} - E_{mb} E_{bm} = 56 > 0\). Moreover, \(E_{mm} > 0\). By the second derivative test, we can conclude that this critical point is a local minimum. Indeed, since \(E\) is a quadratic function whose Hessian is this same positive definite matrix at every point, it is convex, so we can also conclude that this is a global minimum. In other words, we found that the best fitting line for the three given points is defined by

\[ y = \frac{5}{14} x + \frac{5}{7} \]
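
The same steps work for any number of points. Setting both partial derivatives of \(E\) to zero for points \((x_1,y_1), \dots, (x_n,y_n)\) gives the linear system \(m \sum x_i^2 + b \sum x_i = \sum x_i y_i\) and \(m \sum x_i + n b = \sum y_i\). The Python sketch below solves this system by Cramer's rule with exact fractions. The function name `least_squares_line` is hypothetical, and this is a minimal illustration under the assumption that the \(x_i\) are not all equal, not a substitute for a numerical library.

```python
from fractions import Fraction


def least_squares_line(points):
    """Return (m, b) minimizing the sum of squared y-differences to y = m*x + b.

    Solves dE/dm = 0 and dE/db = 0, which reduce to
        m*sxx + b*sx = sxy
        m*sx  + b*n  = sy
    """
    n = len(points)
    sx = sum(Fraction(x) for x, _ in points)
    sy = sum(Fraction(y) for _, y in points)
    sxx = sum(Fraction(x) ** 2 for x, _ in points)
    sxy = sum(Fraction(x) * Fraction(y) for x, y in points)
    det = sxx * n - sx * sx  # one quarter of the Hessian determinant of E
    if det == 0:
        raise ValueError("all x-values coincide; no unique line of this form")
    m = (sxy * n - sx * sy) / det
    b = (sxx * sy - sx * sxy) / det
    return m, b


print(least_squares_line([(1, 1), (3, 2), (4, 2)]))  # (Fraction(5, 14), Fraction(5, 7))
print(least_squares_line([(1, 1), (3, 2)]))          # (Fraction(1, 2), Fraction(1, 2))
```

For the three points above, `det` is 14, which matches the Hessian determinant of 56 divided by 4.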