Constrained optimization problems

Constrained optimization problems involve finding the maxima or minima of a function subject to a specified set of restrictions, or constraints. These problems are ubiquitous in areas like economics, engineering, physics, and machine learning, where various conditions must be satisfied. Constraints can take the form of equalities or inequalities. Here, we focus only on constraints that can be formulated as equalities.

In two dimensions, a basic form is either \[ \begin{aligned} \text{Minimize } &f(x,y) \\ \text{subject to } &g(x,y) = c \end{aligned} \] or \[ \begin{aligned} \text{Maximize } &f(x,y) \\ \text{subject to } &g(x,y) = c \end{aligned} \] for some functions \(f(x,y)\) (called the objective function) and \(g(x,y)\), as well as a constant \(c\). The equation \(g(x,y)=c\) is called the constraint (or the constraint equation, in this case).

In general, such problems can be formulated as maximizing or minimizing a function (objective function) subject to a set of constraints.

Tip

There are several special types of constrained optimization problems. When both the objective and the constraints are given by linear functions/equations/inequalities, we call them linear programming problems. Beyond those, there are quadratic programming problems, semidefinite programming problems, etc.

Level set interpretation

The form \[ \begin{aligned} \text{Minimize } &f(x,y) \\ \text{subject to } &g(x,y) = c \end{aligned} \] can be interpreted as the problem of optimizing the value of \(f(x,y)\) while restricting ourselves to the level set of \(g\) defined by \(g(x,y)=c\).

Elimination/substitution method

If, in the equation \(g(x,y) = c\), we can solve for one variable in terms of the other (which may not be an easy task), the constrained optimization problem in two variables can be reduced to an unconstrained optimization problem in one variable. For example, if we can get \(y = h(x)\), then the problem \[ \begin{aligned} \text{Minimize } &f(x,y) \\ \text{subject to } &g(x,y) = c \end{aligned} \] can be reduced to \[ \text{Minimize } f(x,h(x)) \] over an appropriate domain.

Warning

It is often the case that the relation \(g(x,y) = c\) cannot be parametrized by a single function \(y=h(x)\). Instead, we have to piece together multiple functions, say \(y = h_1(x), y=h_2(x), \ldots\). In this case, the original optimization problem splits into several optimization problems.
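To make the substitution concrete, here is a minimal sketch in Python using SymPy. The objective \(x^2 + y^2\) and the constraint \(x + y = 1\) are chosen purely for illustration and are not part of the discussion above; note that `solve` may return several branches \(y = h_i(x)\), matching the warning.

```python
import sympy as sp

x, y = sp.symbols("x y", real=True)

# Illustrative data: minimize f = x**2 + y**2 subject to g: x + y = 1.
f = x**2 + y**2
constraint = sp.Eq(x + y, 1)

# Solve the constraint for y in terms of x; this may return several branches.
for h in sp.solve(constraint, y):
    one_var = f.subs(y, h)                   # the reduced objective f(x, h(x))
    crit = sp.solve(sp.diff(one_var, x), x)  # critical points of the reduced problem
    print(h, crit, [one_var.subs(x, c) for c in crit])
```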

Global parametrization method

In some cases, it is possible to parametrize the set defined by \(g(x,y)=c\) as the image of a function \(\mathbf{r} : I \to \mathbb{R}^2\) for some interval \(I \subseteq \mathbb{R}\). With that, the problem \[ \begin{aligned} \text{Minimize } &f(x,y) \\ \text{subject to } &g(x,y) = c \end{aligned} \] can be translated to the unconstrained optimization problem \[ \text{Minimize } f(\mathbf{r}(t)) \] over the interval \(I\).
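For instance, if the constraint set happens to be the unit circle \(x^2 + y^2 = 1\), it is globally parametrized by \(\mathbf{r}(t) = (\cos t, \sin t)\) with \(I = [0, 2\pi]\). Here is a minimal SymPy sketch; the objective \(f(x,y) = x + y\) is an assumption made just for this illustration.

```python
import sympy as sp

t = sp.symbols("t", real=True)

# Global parametrization of the unit circle x**2 + y**2 = 1:
x_t, y_t = sp.cos(t), sp.sin(t)

# An objective chosen only for illustration: f(x, y) = x + y.
f_t = x_t + y_t  # f(r(t)) is now a function of the single variable t

# Critical points of f(r(t)) on [0, 2*pi], and the value of f at each.
crit = sp.solveset(sp.diff(f_t, t), t, sp.Interval(0, 2 * sp.pi))
for c in crit:
    print(c, f_t.subs(t, c))  # t = pi/4 gives sqrt(2); t = 5*pi/4 gives -sqrt(2)
```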

Method of Lagrange multipliers

The method of Lagrange multipliers is a mathematical tool for solving constrained optimization problems. The method works by transforming the constrained problem into an almost equivalent problem without constraints. In the two-variable case, it says that from the constrained optimization problem \[ \begin{aligned} \text{Minimize } &f(x,y) \\ \text{subject to } &g(x,y) = c \end{aligned} \] we can derive the system of equations \[ \begin{aligned} \nabla f (x,y) &= \lambda \nabla g(x,y) \\ g(x,y) &= c \end{aligned} \] in the three variables \(x,y,\lambda\). Here, \(\nabla f\) and \(\nabla g\) are the gradient vectors of \(f\) and \(g\), respectively. After solving this system of equations, if the set of solutions is finite, then…

  1. We plug each solution into \(f\) to identify the maxima and minima.
  2. We also need to check the values of \(f\) where \(g(x,y) = c\) and \(\nabla g(x,y) = 0\) to locate any maxima and minima missed by the first step.
Warning

Note that the second step is missed by many textbooks and most internet resources.
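To see why, consider an example chosen specifically for this warning (it does not appear elsewhere in this section): minimize \(f(x,y) = x\) subject to \(g(x,y) = y^2 - x^3 = 0\). The Lagrange system has no solutions at all, yet the constrained minimum exists, precisely at the point where \(\nabla g = \mathbf{0}\). A SymPy sketch of both steps:

```python
import sympy as sp

x, y, lam = sp.symbols("x y lambda", real=True)

# Illustrative example: minimize f = x subject to g = y**2 - x**3 = 0 (a cusp curve).
f, g = x, y**2 - x**3
grad_f = [sp.diff(f, v) for v in (x, y)]
grad_g = [sp.diff(g, v) for v in (x, y)]

# Step 1: solve grad f = lambda * grad g together with g = 0.
eqs = [sp.Eq(grad_f[i], lam * grad_g[i]) for i in range(2)] + [sp.Eq(g, 0)]
print(sp.solve(eqs, [x, y, lam], dict=True))  # [] -- no solutions at all

# Step 2: find points on the constraint set where grad g = 0.
singular = sp.solve([grad_g[0], grad_g[1], g], [x, y], dict=True)
print(singular)  # [{x: 0, y: 0}] -- this is where the minimum of f is attained
```

On the curve \(y^2 = x^3\) we have \(x = (y^2)^{1/3} \ge 0\), so \(f = x\) attains its minimum \(0\) at the origin, which only the second step finds.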

Exercises

The saddle surface

Solve the optimization problems \[ \begin{aligned} \text{Minimize } &x^2 - y^2 \\ \text{subject to } &x^2 + y^2 = 1 \end{aligned} \] and \[ \begin{aligned} \text{Maximize } &x^2 - y^2 \\ \text{subject to } &x^2 + y^2 = 1 \end{aligned} \] Note that the equation \(x^2 + y^2 = 1\) defines the unit circle; therefore, these problems ask us to find the lowest and highest values of \(f(x,y) = x^2 - y^2\) while restricting ourselves to the unit circle.

The constraint is \(x^2 + y^2 = 1\), which is equivalent to \[ y = \pm \sqrt{1-x^2} \] within the interval \([-1,1]\). Then the original objective function becomes \[ h(x) = x^2 - y^2 = x^2 - \left(\pm \sqrt{1-x^2} \right)^2 = x^2 - (1 - x^2) = 2x^2 - 1. \] In other words, the original problem is reduced to the optimization problem for the function \(h(x) = 2x^2 - 1\) within the interval \([-1,1]\).

Within the open interval \((-1,1)\), \(h\) is differentiable and \[ h'(x) = 4x, \] so the only critical point within the open interval \((-1,1)\) is \(x=0\). Moreover, since \[ h''(0) = 4 > 0, \] by the 2nd derivative test, we can conclude that \(h\) has a local minimum at \(x=0\).

At the two endpoints \(x=\pm 1\), \(h(x) = 1\), and therefore we can conclude that \(h\) attains its global maxima (within \([-1,1]\)) at \(x=\pm 1\) and its global minimum at \(x=0\).

In the context of the original problem, we can conclude that subject to the constraint \(x^2+y^2=1\), the function \(x^2-y^2\) attains its maxima at \((x,y) = (\pm 1, 0)\) and its minima at \((x,y) = (0, \pm 1)\).
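As a quick sanity check of this reduction, the one-variable problem can be verified with SymPy; this sketch just recomputes the critical point and endpoint values found above.

```python
import sympy as sp

x = sp.symbols("x", real=True)
h = 2 * x**2 - 1                     # the reduced objective on [-1, 1]

crit = sp.solve(sp.diff(h, x), x)    # interior critical points: [0]
candidates = crit + [-1, 1]          # critical points plus the two endpoints
print({c: h.subs(x, c) for c in candidates})  # {0: -1, -1: 1, 1: 1}
```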

Alternatively, we can solve the same problems using the method of Lagrange multipliers. Let \(f(x,y) = x^2 - y^2\) be the objective function, and let \(g(x,y) = x^2 + y^2\). Then the problem is the constrained optimization problem for \(f(x,y)\) subject to \(g(x,y) = 1\).

We can compute that \[ \begin{aligned} \nabla f(x,y) &= \begin{bmatrix} f_x \\ f_y \end{bmatrix} = \begin{bmatrix} 2x \\ -2y \end{bmatrix} & \nabla g(x,y) &= \begin{bmatrix} g_x \\ g_y \end{bmatrix} = \begin{bmatrix} 2x \\ 2y \end{bmatrix} \end{aligned} \] Notice that \(\nabla g \ne \mathbf{0}\) at any \((x,y)\) such that \(g(x,y)=1\), so the second step of the method yields no additional candidates. The method of Lagrange multipliers transforms the constrained optimization problem into the system of equations \[ \left\{ \begin{aligned} \nabla f(x,y) &= \lambda \nabla g(x,y) \\ g(x,y) &= 1 \end{aligned} \right. \] I.e., \[ \left\{ \begin{aligned} 2x &= \lambda \cdot 2x \\ -2y &= \lambda \cdot 2y \\ x^2 + y^2 &= 1 \end{aligned} \right. \] Solving this system, we get the solutions \[ \begin{aligned} & \left\{ \begin{aligned} x &= 1 \\ y &= 0 \end{aligned} \right. && \left\{ \begin{aligned} x &= -1 \\ y &= 0 \end{aligned} \right. && \left\{ \begin{aligned} x &= 0 \\ y &= 1 \end{aligned} \right. && \left\{ \begin{aligned} x &= 0 \\ y &= -1 \end{aligned} \right. \end{aligned} \] Since \[ \begin{aligned} f(1,0) &= 1 & f(-1,0) &= 1 & f(0,1) &= -1 & f(0,-1) &= -1 \end{aligned} \] we can conclude that under the constraint that \(x^2 + y^2 = 1\), the function \(x^2 - y^2\) attains its maxima at \((x,y) = (\pm 1,0)\) and its minima at \((x,y) = (0,\pm 1)\).
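The same system can be handed to SymPy as a quick verification of the computation above.

```python
import sympy as sp

x, y, lam = sp.symbols("x y lambda", real=True)
f = x**2 - y**2
g = x**2 + y**2

# The Lagrange system: grad f = lambda * grad g, together with g = 1.
eqs = [sp.Eq(2 * x, lam * 2 * x),
       sp.Eq(-2 * y, lam * 2 * y),
       sp.Eq(g, 1)]
for s in sp.solve(eqs, [x, y, lam], dict=True):
    print(s, "f =", f.subs(s))  # f = 1 at (±1, 0); f = -1 at (0, ±1)
```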

Rectangles of given perimeter

What is the maximum area of a rectangle if the perimeter is 20? Is there a minimum area?

Solving this using the method of Lagrange multipliers is actually more complicated than it first appears. Answering the second question can help us simplify the problem significantly:

From our geometric intuition, we can see that there can be no minimum area: we can always make the area smaller by making the rectangle narrower and narrower while keeping the perimeter the same.

Let \(x\) and \(y\) be the width and height of a rectangle, respectively. Also let \(A(x,y) = xy\) and \(P(x,y) = 2x + 2y\) be the area and the perimeter of the rectangle, respectively. Then the problem is \[ \begin{aligned} \text{Maximize } &A(x,y) \\ \text{subject to } &P(x,y) = 20 \end{aligned} \] with the implicit restriction that \(x,y > 0\). We can compute that \[ \begin{aligned} \nabla A(x,y) &= \begin{bmatrix} A_x \\ A_y \end{bmatrix} = \begin{bmatrix} y \\ x \end{bmatrix} & \nabla P(x,y) &= \begin{bmatrix} P_x \\ P_y \end{bmatrix} = \begin{bmatrix} 2 \\ 2 \end{bmatrix} \end{aligned} \] Notice that \(\nabla P \ne \mathbf{0}\) for any \((x,y)\), so the second step of the method yields no additional candidates. The method of Lagrange multipliers transforms the constrained optimization problem into the system of equations \[ \left\{ \begin{aligned} \nabla A(x,y) &= \lambda \nabla P(x,y) \\ P(x,y) &= 20 \end{aligned} \right. \] I.e., \[ \left\{ \begin{aligned} y &= \lambda \cdot 2 \\ x &= \lambda \cdot 2 \\ 2x + 2y &= 20 \end{aligned} \right. \] Solving this system, we get the unique solution \[ \left\{ \begin{aligned} x &= 5 \\ y &= 5 \\ \lambda &= \frac{5}{2} \end{aligned} \right. \] Since we have already concluded that there can be no minimum area, this solution must correspond to a maximum. In other words, if we require the perimeter to be 20, the maximum area is achieved by the \(5 \times 5\) square.
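Again, a small SymPy sketch to double-check the algebra:

```python
import sympy as sp

x, y, lam = sp.symbols("x y lambda", positive=True)
A = x * y            # area (objective)
P = 2 * x + 2 * y    # perimeter (constraint)

# The Lagrange system: grad A = lambda * grad P, together with P = 20.
eqs = [sp.Eq(y, lam * 2), sp.Eq(x, lam * 2), sp.Eq(P, 20)]
sol = sp.solve(eqs, [x, y, lam], dict=True)
print(sol)             # [{lambda: 5/2, x: 5, y: 5}]
print(A.subs(sol[0]))  # 25, the maximum area
```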