Indefinite systems#
The quadratic \(x^2 - 6 x y +y^2\) is an example of a problem with an indefinite matrix. Such functions are neither convex nor concave, and the extremal point is neither a minimum nor a maximum, but a saddle point (as visualized).
Saddle point problems are an important subclass of optimization as we’ll see when we deal with constraints.
Newton’s method, and generally optimizers that employ gradients, typically can identify saddle points, but direct methods often fail unless one modifies the objective function (e.g.: Minimize \(f^2\))
Eigenvalues / functions#
The fact that \(f(x,y) = x^2 - 6 x y +y^2\) is indefinite may come as a surprise considering the quadratics along the axes, \(f(x,0) = x^2\) and \(f(0,y) = y^2\) are obviously positive definite.
The key is to realize that we could rotate the coordinate system by \(45^0\) (which obviously wouldn’t change the extremal), and write the same function as \(f(x',y') = 8 {x'}^2 - 4 {y'}^2\) from which the saddle point is clear.
But how would you know that?
The \(x',y'\) coordiate system is special in that its Hessian doesn’t have diagonal terms. They are the eigenvectors of the Hessian, and the \(8\) and \(-4\) are the eigenvalues. With this in mind, definiteness is defined:
Definiteness |
Eigenvalues |
---|---|
Positive Definite |
All eigenvalues are positive. |
Negative Definite |
All eigenvalues are negative. |
Indefinite |
Eigenvalues are positive and negative. |
Positive Semi-Definite |
All eigenvalues are non-negative (positive or zero). |
Negative Semi-Definite |
All eigenvalues are non-positive (negative or zero). |
# prompt: what are the eigen functions and values of x^2 - 6xy + y^2
import numpy as np
def hessian_f(x):
return np.array([[2, -6], [-6, 2]])
hessian = hessian_f([0,0])
eigenvalues, eigenvectors = np.linalg.eig(hessian)
print("Eigenvalues:", eigenvalues)
print("Eigenvectors:", eigenvectors)
Eigenvalues: [ 8. -4.]
Eigenvectors: [[ 0.70710678 0.70710678]
[-0.70710678 0.70710678]]
Newton’s optimization routine is highly sensative to initial guesses. Furthermore, the calculation and inversion of the Hessian can be computationally expensive and we don’t necessarily trust that it will give us a good result. We will return to it later in its more common form…