Equality constrained optimization#

So far we have discussed how to find \(x\) that minimizes \(f(x)\), which is called ‘unconstrained minimization’. We now add equality constraints:

Find \(x\) that minimizes \(f(x)\) subject to \(g(x) = 0\). Note that \(g(x)\) can be a vector function if there are multiple equality constraints.

E.g.: If you wanted to minimize something on the unit circle, your constraint would be,

\[\begin{split}\begin{align} x^2+y^2&=1 \\ x^2+y^2-1 &= 0 \\ g(x,y) &=0 \end{align} \end{split}\]

Penalty method#

One way to enforce the constraints is to add them to the objective function as a penalty:

\[ f(x) + \lambda g(x) \]

where \(\lambda\) is some suitably chosen constant. If the constraint is violated, i.e. \(g(x) \ne 0\), the combined objective is penalized. By increasing \(\lambda\), one enforces the penalty more severely.

Recall that the solvability of a system is related to its landscape. An excessively large \(\lambda\) will enforce the constraint but introduce significant terrain to the surface. However, too small a \(\lambda\) will not affect the constraint at all! Algorithms that use this method have heuristic mechanisms for adjusting the penalty depending on the violation of the constraints.

Note, however, that there is a Goldilocks value that is just right: the smallest value that is large enough to enforce the constraint…
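To make this concrete, here is a minimal sketch (ours, not part of the original notes) that minimizes \(f(x,y) = x + y\) on the unit circle using the common quadratic variant of the penalty, \(f(x) + \lambda\, g(x)^2\); squaring keeps the penalty positive no matter which way the constraint is violated:

from scipy.optimize import minimize

def f(p):
    return p[0] + p[1]                       # objective: x + y

def g(p):
    return p[0]**2 + p[1]**2 - 1             # equality constraint: unit circle

# Sweep the penalty strength: the constraint violation shrinks as lambda grows,
# but the penalized surface becomes increasingly rugged.
for lam in [0.1, 1, 10, 100]:
    sol = minimize(lambda p: f(p) + lam * g(p)**2, x0=[1.0, 0.0])
    print(lam, sol.x, g(sol.x))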

Method of undetermined Lagrange Multipliers#

At the optimum, we know that \(g(x)=0\), in which case it doesn’t matter what \(\lambda\) is, as long as it is big enough to enforce the constraint.

So why not add \(\lambda\) as an unknown variable to be solved for? These \(\lambda\) are called the Lagrange Multipliers. The sum of the objective function and the penalized equality constraints is called the Lagrangian,

\[ L(x, \lambda) = f(x) + \lambda \cdot g(x) \]

and is now a function of \(\lambda\) as well.

The system is solved for the stationary point,

\[\frac{\partial L}{\partial [x, \lambda]} = 0\]

which implies, starting with \(\lambda\):

\[\frac{\partial L}{\partial \lambda} = 0 = g(x)\]

and thus the constraints are enforced.

The derivative wrt \(x\) is more interesting and reads,

\[\begin{split} \begin{align} \frac{\partial L}{\partial x} = 0 &= \nabla f(x) + \lambda \nabla g(x) \\ \nabla f(x) &= -\lambda \nabla g(x) \end{align} \end{split}\]

This means that along the curve defined by \(g(x)=0\) there is a point where the gradient \(\nabla f\) is exactly perpendicular to the curve (and therefore parallel to \(\nabla g\)). The factor relating their magnitudes, being the force with which one must be held to the curve, is \(\lambda\).

Consider a 2D analogy: You are hiking on a mountain along a path (\(g(x)=0\)). As you go, the path leads you along the height of the mountain (\(f(x)\)). You want to find the lowest point, so you simply walk ‘downhill’ along the path until you reach the lowest point. Here, the ground slopes exactly away (perpendicular) from the path. What’s more, for you to stay on the path, you will have to push your legs exactly uphill with precisely the necessary force \(\lambda\), so as to not fall to your death but stay on the path.
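As a concrete (illustrative) instance of these conditions: minimizing \(f(x,y) = x + y\) on the unit circle, stationarity of \(L = x + y + \lambda(x^2 + y^2 - 1)\) gives three equations in the three unknowns \((x, y, \lambda)\), which can be handed straight to a root finder:

from scipy.optimize import fsolve

# Stationarity of L(x, y, lam) = x + y + lam*(x^2 + y^2 - 1):
def grad_L(v):
    x, y, lam = v
    return [1 + 2*lam*x,            # dL/dx
            1 + 2*lam*y,            # dL/dy
            x**2 + y**2 - 1]        # dL/dlam = g(x, y)

print(fsolve(grad_L, [-1.0, -1.0, 1.0]))   # -> x = y = -1/sqrt(2), lam = 1/sqrt(2)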

Quadratic programming with equality constraints#

The interesting thing about this technique is that we have replaced a constrained optimization problem with an unconstrained one, which can now be solved using our established techniques!

Unfortunately, except in certain cases, this doesn’t usually work… One case where it works well is a quadratic objective function with a set of linear equality constraints,

\[g(x) = A x-b = 0\]

As we discussed, quadratic functions can be solved directly through Newton’s method. With a quadratic objective function \(f(x) = \tfrac{1}{2} x^T H x + c^T x\) and linear constraints \(g(x) = Ax - b\), the Lagrangian is also quadratic.

\[ L(x, \lambda) = f(x) + \lambda \cdot [Ax-b] \]

We have,
\[\begin{split} \begin{align} \frac{\partial f}{\partial x} &= H x + c \\ \frac{\partial L}{\partial x} &= H x + c + A^T \lambda \end{align} \end{split}\]

and the stationarity conditions can be written as a block matrix system,

\[\begin{split} \begin{bmatrix} H & A^T \\ A & 0 \end{bmatrix} \begin{bmatrix} x \\ \lambda \end{bmatrix} = \begin{bmatrix} -c \\ b \end{bmatrix} \end{split}\]

which is, once again, a (square) linear system!

NB: Don’t be scared off by the 0 block on the diagonal. For well-formed problems this matrix is non-singular even if H is not!
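Rather than typing the block matrix out by hand, it can be assembled directly from \(H\), \(A\), \(c\), and \(b\); a small helper sketch (the function name is ours), which the worked example below follows:

import numpy as np

def solve_eq_qp(H, A, c, b):
    """Solve min 1/2 x^T H x + c^T x subject to A x = b via the KKT system."""
    n, m = H.shape[0], A.shape[0]
    K = np.block([[H, A.T],
                  [A, np.zeros((m, m))]])   # the KKT matrix, zero block included
    v = np.linalg.solve(K, np.concatenate([-c, b]))
    return v[:n], v[n:]                     # primal solution x, multipliers lambda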

Example: Minimize \(xy + yz\) subject to \(x + 2y = 6\) and \(x = 3z\)#

We have the objective function \(f(x) = xy + yz\) with Hessian

\[H = \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix}\]

and

\[c = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}\]

The constraints are

\[g = \begin{bmatrix} 1 & 2 & 0 \\ 1 & 0 & -3 \end{bmatrix} x - \begin{bmatrix} 6 \\ 0 \end{bmatrix} = 0\]

Our KKT matrix then is,

import numpy as np
import scipy as sp

# KKT matrix [[H, A^T], [A, 0]] for the example above, assembled by hand
A = np.array([
    [0, 1, 0, 1, 1],
    [1, 0, 1, 2, 0],
    [0, 1, 0, 0, -3],
    [1, 2, 0, 0, 0],
    [1, 0,-3, 0, 0]])

# right-hand side [-c, b]
b = np.array([0, 0, 0, 6, 0])

sp.linalg.solve(A, b)   # -> [3, 1.5, 1, -2, 0.5], i.e. (x, y, z) = (3, 3/2, 1)

Let’s dig into this matrix a little more:

evals, evects = sp.linalg.eig(A)
print(np.round(evals, 2))   # the eigenvalues come in both signs

Note that the eigenvalues are of mixed sign! This means that the system is indefinite, and the optimum is a saddle point.

This is typical of Lagrange Multiplier problems and is the reason why standard optimization techniques have difficulty. However, the block structure of the KKT matrix does make it amenable to block matrix inversion (and the zero block helps!).

Constrained optimization algorithms exploit these properties.
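For instance, when \(H\) is invertible, the zero block lets us eliminate \(x\) and solve a smaller system for \(\lambda\) via the Schur complement \(S = A H^{-1} A^T\). A minimal sketch on a made-up problem (minimize \(x^2 + y^2\) subject to \(x + y = 1\)):

import numpy as np

# Block elimination of  H x + A^T lam = -c,  A x = b:
#   x = -H^{-1}(c + A^T lam)   =>   (A H^{-1} A^T) lam = -(b + A H^{-1} c)
H = np.array([[2., 0.], [0., 2.]])   # Hessian of x^2 + y^2 (SPD, so invertible)
A = np.array([[1., 1.]])             # single constraint: x + y = 1
c = np.zeros(2)
b = np.array([1.])

S = A @ np.linalg.solve(H, A.T)                            # Schur complement
lam = np.linalg.solve(S, -(b + A @ np.linalg.solve(H, c)))
x = np.linalg.solve(H, -(c + A.T @ lam))
print(x, lam)   # -> [0.5 0.5] [-1.]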

Meaning of the Lagrange Multipliers#

The Lagrange Multipliers are also physically important parameters, and this is why they are sought after over penalty methods: they give the value of relaxing a constraint (a numerical check follows the list below).

  • In engineering, they are the constraint force associated with some imposed condition.

  • In economics, this is the marginal cost / shadow price of the constraint, telling you the worth of doing a little bit more.

  • In thermodynamics they are the conjugate state variable corresponding to a particular equilibrium.
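As a quick numerical check of the ‘value of relaxing a constraint’ reading (a sketch reusing the worked KKT example above): nudge the first constraint from \(6\) to \(6 + \epsilon\) and compare the change in the optimal objective to \(-\lambda_1 \epsilon\); the minus sign follows from the \(L = f + \lambda \cdot (Ax - b)\) convention.

import numpy as np

K = np.array([
    [0, 1, 0, 1, 1],
    [1, 0, 1, 2, 0],
    [0, 1, 0, 0, -3],
    [1, 2, 0, 0, 0],
    [1, 0,-3, 0, 0]], dtype=float)

def solve_for(b1):
    """Solve the KKT system with the first constraint set to x + 2y = b1."""
    v = np.linalg.solve(K, np.array([0, 0, 0, b1, 0], dtype=float))
    x, y, z = v[:3]
    return x*y + y*z, v[3]        # objective at the optimum, lambda_1

eps = 1e-4
f0, lam1 = solve_for(6.0)
f1, _ = solve_for(6.0 + eps)
print((f1 - f0) / eps, -lam1)     # both ~ 2: df*/db_1 = -lambda_1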

General constrained optimization#

The algorithms for solving the general optimization problem, subject to nonlinear equality and inequality constraints, generally involve defining feasible regions in which the inequalities are satisfied and searching interior points for the constrained optimum. The derivations can be very sophisticated, well beyond the scope of this course!

Example: Minimize \(x^4 + y^4\) on the unit circle.#

# prompt: Optimize x^4+y^4 subject to x^2+y^2-1, plot the objective, constraint,  and the optimization path

import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import minimize
from mpl_toolkits.mplot3d import Axes3D


def objective(x):
  return x[0]**4 + x[1]**4

def constraint(x):
  return x[0]**2 + x[1]**2 - 1

# Initial guess
x0 = [0.5, 0.5]

# Constraints
cons = ({'type': 'eq', 'fun': constraint})

# Optimization
sol = minimize(objective, x0, method='SLSQP', constraints=cons)

# Print results
print(sol)

# Plot the objective function and constraint
x = np.linspace(-1.5, 1.5, 100)
y = np.linspace(-1.5, 1.5, 100)
X, Y = np.meshgrid(x, y)
Z = X**4 + Y**4

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.plot_surface(X, Y, Z, cmap='viridis', alpha=0.5)
ax.contour(X, Y, Z, zdir='z', offset=0, cmap='viridis')

# Plot the constraint
theta = np.linspace(0, 2 * np.pi, 100)
x_constraint = np.cos(theta)
y_constraint = np.sin(theta)
ax.plot(x_constraint, y_constraint, 0, color='red', label='Constraint')

# Plot the optimum
ax.scatter(sol.x[0], sol.x[1], sol.fun, color='blue', marker='o', label='Optimum')

ax.set_xlabel('x')
ax.set_ylabel('y')
ax.set_zlabel('Objective')
ax.legend()
plt.show()

# prompt: minimize x^2-6xy+y^2 on the unit circle using trust-constr and plot the surface and the constraint

import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import minimize
from mpl_toolkits.mplot3d import Axes3D

def objective(x):
  return x[0]**2 - 6*x[0]*x[1] + x[1]**2

def constraint(x):
  return x[0]**2 + x[1]**2 - 1

# Initial guess
x0 = [0., 0.]

# Constraints
cons = ({'type': 'eq', 'fun': constraint})

# Optimization using trust-constr
sol = minimize(objective, x0, method='trust-constr', constraints=cons)

# Print results
print(sol)

# Plot the objective function and constraint
x = np.linspace(-1.5, 1.5, 100)
y = np.linspace(-1.5, 1.5, 100)
X, Y = np.meshgrid(x, y)
Z = X**2 - 6*X*Y + Y**2

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.plot_surface(X, Y, Z, cmap='viridis', alpha=0.5)
ax.contour(X, Y, Z, zdir='z', offset=0, cmap='viridis')

# Plot the constraint
theta = np.linspace(0, 2 * np.pi, 100)
x_constraint = np.cos(theta)
y_constraint = np.sin(theta)
ax.plot(x_constraint, y_constraint, 0, color='red', label='Constraint')

# Plot the optimum
ax.scatter(sol.x[0], sol.x[1], sol.fun, color='blue', marker='o', label='Optimum')

ax.set_xlabel('x')
ax.set_ylabel('y')
ax.set_zlabel('Objective')
ax.legend()
plt.show()

# prompt: Minimize a complex 2D function on the unit disk using trust-constr

import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import minimize
from mpl_toolkits.mplot3d import Axes3D
import scipy as sp

def objective(x):
  """Objective function to minimize."""
  return np.sin(x[0]) * np.cos(x[1]) + x[0]**2 + x[1]**2

def constraint(x):
  """Constraint function: x^2 + y^2 <= 1 (unit disk)."""
  return x[0]**2 + x[1]**2 - 1

# Initial guess
x0 = [0.5, 0.5]

# Constraints
cons = sp.optimize.NonlinearConstraint(constraint, -np.inf, 0)

# Optimization using trust-constr
sol = minimize(objective, x0, method='trust-constr', constraints=cons)

# Print results
print(sol)

# Plot the objective function and constraint
x = np.linspace(-1.5, 1.5, 100)
y = np.linspace(-1.5, 1.5, 100)
X, Y = np.meshgrid(x, y)
Z = np.sin(X) * np.cos(Y) + X**2 + Y**2

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.plot_surface(X, Y, Z, cmap='viridis', alpha=0.5)
ax.contour(X, Y, Z, zdir='z', offset=0, cmap='viridis')

# Plot the constraint (unit disk boundary)
theta = np.linspace(0, 2 * np.pi, 100)
x_constraint = np.cos(theta)
y_constraint = np.sin(theta)
ax.plot(x_constraint, y_constraint, 0, color='red', label='Constraint')

# Plot the optimum
ax.scatter(sol.x[0], sol.x[1], sol.fun, color='blue', marker='o', label='Optimum')

ax.set_xlabel('x')
ax.set_ylabel('y')
ax.set_zlabel('Objective')
ax.legend()
plt.show()