Maxima and Minima (Optimization Basics)¶

Simple Idea¶

Finding maxima (peaks) and minima (valleys) of functions is a core task in optimization. Calculus gives us tools to find these points.
Single Variable: Think about finding the highest or lowest point on a curve \(y=f(x)\). At these points, the slope (derivative) is usually zero (the tangent line is horizontal).
Multiple Variables: Think about finding peaks or valley bottoms on a surface \(z=f(x, y)\). At these points, the "ground" is flat in all directions, meaning all partial derivatives (and thus the gradient) must be zero.

Local Maximum: A point \(c\) where \(f(c) \ge f(x)\) for all \(x\) in some open interval around \(c\). It's a peak in its immediate neighborhood.
Local Minimum: A point \(c\) where \(f(c) \le f(x)\) for all \(x\) in some open interval around \(c\). It's a valley bottom in its immediate neighborhood.
Global Maximum: A point \(c\) where \(f(c) \ge f(x)\) for all \(x\) in the function's entire domain. The absolute highest point.
Global Minimum: A point \(c\) where \(f(c) \le f(x)\) for all \(x\) in the function's entire domain. The absolute lowest point.
Extrema: Collective term for maxima and minima.

Single Variable: A point \(c\) in the domain of \(f\) is a critical point if either \(f'(c) = 0\) or \(f'(c)\) does not exist.
Multiple Variables: A point \(\mathbf{a}\) in the domain of \(f\) is a critical point if either \(\nabla f(\mathbf{a}) = \mathbf{0}\) (the zero vector, meaning all partial derivatives are zero) or if \(\nabla f(\mathbf{a})\) does not exist.
Fermat's Theorem (for local extrema): If \(f\) has a local maximum or minimum at an interior point \(c\) (or \(\mathbf{a}\)) and the derivative (or gradient) exists there, then \(f'(c) = 0\) (or \(\nabla f(\mathbf{a}) = \mathbf{0}\)).
Important: The condition \(f'(c)=0\) or \(\nabla f(\mathbf{a})=\mathbf{0}\) is necessary for a local extremum (if the derivative/gradient exists), but it is not sufficient. Points where the derivative/gradient is zero could also be saddle points.

Find critical points (where \(f'(x)=0\) or DNE).
Check the sign of \(f'(x)\) on either side of a critical point \(c\):
- If \(f'\) changes from + to - at \(c\): Local maximum.
- If \(f'\) changes from - to + at \(c\): Local minimum.
- If \(f'\) does not change sign: Neither (possibly saddle point or inflection).

Find critical points where \(f'(c)=0\).
Evaluate the second derivative \(f''(c)\):
- If \(f''(c) > 0\): \(f\) is concave up, so \(c\) is a local minimum.
- If \(f''(c) < 0\): \(f\) is concave down, so \(c\) is a local maximum.
- If \(f''(c) = 0\): The test is inconclusive.

Find critical points where \(\nabla f(\mathbf{a}) = \mathbf{0}\).
Compute the Hessian matrix \(\mathbf{H}(\mathbf{a})\), the matrix of second partial derivatives evaluated at \(\mathbf{a}\).
Calculate the determinant of the Hessian, \(D = \det(\mathbf{H}(\mathbf{a}))\). Let \(f_{xx} = \partial^2 f / \partial x^2\) evaluated at \(\mathbf{a}\).
- If \(D > 0\) and \(f_{xx}(\mathbf{a}) > 0\): Local minimum. (Hessian is positive definite).
- If \(D > 0\) and \(f_{xx}(\mathbf{a}) < 0\): Local maximum. (Hessian is negative definite).
- If \(D < 0\): Saddle point. (Hessian is indefinite).
- If \(D = 0\): The test is inconclusive.

Find all critical points inside the interval/region.
Evaluate the function \(f\) at these critical points.
Evaluate the function \(f\) at the boundaries of the interval/region.
The largest value found is the global maximum; the smallest is the global minimum.

Gradient Descent & Optimization Algorithms: These algorithms aim to find minima (usually local minima) of loss functions by iteratively moving towards points where the gradient is zero.
Model Training: Finding the optimal parameters for a machine learning model involves minimizing a loss function, i.e., finding its minimum value.
Maximum Likelihood Estimation (MLE): A statistical method that finds parameter values maximizing the likelihood function (finding a maximum).
Convex Functions: For convex functions, any local minimum is also a global minimum, which significantly simplifies optimization.

Extrema are maximum or minimum values of a function.
Local Extrema occur at critical points where the derivative \(f'(x)=0\) (single var) or the gradient \(\nabla f(\mathbf{a})=\mathbf{0}\) (multi var), or where they don't exist.
First/Second Derivative Tests (single var) or the Second Derivative Test using the Hessian Matrix (multi var) help classify critical points as local maxima, minima, or saddle points.
Finding extrema is the core goal of optimization.