Lagrange Multipliers

Tags: math, calculus, optimization

Calculus-based optimization methods can be summarized in a single sentence: when minimizing a function, every point either has a nearby point that decreases the function, or is a candidate minimum. This can be seen in single-variable optimization problems, where we either solve for critical points by setting the derivative of the function equal to $0$, or use the derivative to indicate which direction to move to decrease the function. In multivariable optimization problems, all nearby points are obtained by small perturbations of each variable. When the partial derivatives of the function with respect to each variable are all $0$, then no nearby point decreases the function. Otherwise, for each nonzero partial derivative, there is a known direction to perturb the corresponding variable to decrease the function. Gradient descent does this while also setting the magnitude of each perturbation to be proportional to the corresponding partial derivative.
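This paragraph's version of gradient descent can be sketched in a few lines of Python. The quadratic $f(x, y) = (x - 1)^2 + (y + 2)^2$, the step size, and the iteration count here are illustrative choices of mine, not anything fixed by the method:

```python
# Minimize f(x, y) = (x - 1)**2 + (y + 2)**2 by perturbing each
# variable opposite to, and proportional to, its partial derivative.

def f(x, y):
    return (x - 1) ** 2 + (y + 2) ** 2

def grad_f(x, y):
    # Partial derivatives of f with respect to x and y.
    return 2 * (x - 1), 2 * (y + 2)

def gradient_descent(x, y, lr=0.1, steps=200):
    for _ in range(steps):
        dx, dy = grad_f(x, y)
        # Perturbation magnitude proportional to each partial derivative.
        x -= lr * dx
        y -= lr * dy
    return x, y

x_min, y_min = gradient_descent(0.0, 0.0)
print(x_min, y_min)  # converges toward the minimum at (1, -2)
```

Each step shrinks the distance to the minimum by a constant factor here, since the partial derivatives are linear in the error.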

The method of Lagrange multipliers solves optimization problems subject to constraints on the variables. For a single constraint, the problem is

$$\min_{x_1, \dots, x_n} f(x_1, \dots, x_n) \quad \text{subject to} \quad g(x_1, \dots, x_n) = c.$$

Instead of describing the method of Lagrange multipliers, we can reason about the core idea described in the first paragraph to obtain the same approach to the constrained optimization problem.

Without the constraint, we would ordinarily minimize $f$ by setting all of its partial derivatives to $0$. This is because every point near $(x_1, \dots, x_n)$ is of the form $(x_1 + \delta_1, \dots, x_n + \delta_n)$ for small perturbations $\delta_i$, and a first-order approximation gives

$$f(x_1 + \delta_1, \dots, x_n + \delta_n) \approx f(x_1, \dots, x_n) + \sum_{i} \frac{\partial f}{\partial x_i} \delta_i.$$

If any $\frac{\partial f}{\partial x_i}$ were nonzero, it would be possible to choose the sign of a small perturbation $\delta_i$ to make $\frac{\partial f}{\partial x_i} \delta_i$ negative and decrease the function as a result. Setting all partial derivatives to $0$ is equivalent to requiring $\sum_i \frac{\partial f}{\partial x_i} \delta_i = 0$ for all perturbations $(\delta_1, \dots, \delta_n)$.
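The first-order approximation above is easy to check numerically. The function $f(x, y) = x^2 + 3y$, the base point, and the perturbation below are illustrative choices of mine:

```python
# Check that f(x + dx, y + dy) is close to
# f(x, y) + (df/dx) * dx + (df/dy) * dy for a small perturbation,
# using f(x, y) = x**2 + 3*y as an example.

def f(x, y):
    return x ** 2 + 3 * y

x, y = 2.0, 1.0
dfdx, dfdy = 2 * x, 3.0   # exact partial derivatives at (x, y)
dx, dy = 1e-4, -2e-4      # a small perturbation

exact = f(x + dx, y + dy)
first_order = f(x, y) + dfdx * dx + dfdy * dy
print(exact - first_order)  # error is second order, on the order of dx**2
```

The leftover error comes entirely from the $x^2$ term, which the linear approximation ignores.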

The constraint gives us $g(x_1, \dots, x_n) = c$ and restricts the valid perturbations to those that preserve it. Since

$$g(x_1 + \delta_1, \dots, x_n + \delta_n) \approx g(x_1, \dots, x_n) + \sum_{i} \frac{\partial g}{\partial x_i} \delta_i,$$

combining with the requirement that $g(x_1, \dots, x_n) = c$ remains true after the perturbation gives $\sum_i \frac{\partial g}{\partial x_i} \delta_i = 0$. So with the constraint, we only need

$$\sum_{i} \frac{\partial f}{\partial x_i} \delta_i = 0 \quad \text{for all } (\delta_1, \dots, \delta_n) \text{ with } \sum_{i} \frac{\partial g}{\partial x_i} \delta_i = 0.$$

These expressions are dot products. Letting

$$\nabla f = \left( \frac{\partial f}{\partial x_1}, \dots, \frac{\partial f}{\partial x_n} \right), \quad \nabla g = \left( \frac{\partial g}{\partial x_1}, \dots, \frac{\partial g}{\partial x_n} \right), \quad \delta = (\delta_1, \dots, \delta_n),$$

the requirement is that $\nabla f \cdot \delta = 0$ for all $\delta$ satisfying $\nabla g \cdot \delta = 0$. Since $\delta$ can be any vector orthogonal to $\nabla g$, $\nabla f$ must be a scalar multiple of $\nabla g$.

To see this, suppose $\nabla f$ had a component not along $\nabla g$. That component would itself be orthogonal to $\nabla g$ and so would be a valid choice of $\delta$, but dotting it with $\nabla f$ would give its squared length, which is positive, contradicting the requirement. So $\nabla f$ must point entirely along $\nabla g$, meaning $\nabla f = \lambda \nabla g$ for some scalar $\lambda$.
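This orthogonal-component argument can be made concrete with a small projection computation. The example gradients below are my own illustrative values, deliberately chosen so that $\nabla f$ is not parallel to $\nabla g$:

```python
# The component of grad_f orthogonal to grad_g is itself a valid
# perturbation direction (it preserves the constraint to first order).
# Its dot product with grad_f is its squared length, so it vanishes
# only when grad_f is parallel to grad_g.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def orthogonal_component(u, v):
    # Component of u orthogonal to v: u minus its projection onto v.
    scale = dot(u, v) / dot(v, v)
    return [a - scale * b for a, b in zip(u, v)]

grad_f = [3.0, 1.0]
grad_g = [1.0, 1.0]

delta = orthogonal_component(grad_f, grad_g)
print(dot(delta, grad_g))  # 0: delta is a constraint-preserving direction
print(dot(delta, grad_f))  # positive, so moving along -delta decreases f
```

When $\nabla f$ really is a multiple of $\nabla g$, `delta` comes out as the zero vector and no decreasing direction remains.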

Together with the constraint $g(x_1, \dots, x_n) = c$, the conditions for a candidate minimum are:

$$\nabla f = \lambda \nabla g, \qquad g(x_1, \dots, x_n) = c$$

for some scalar $\lambda$. This is exactly the method of Lagrange multipliers, where $\lambda$ is the Lagrange multiplier. The gradient of $f$ must be proportional to the gradient of $g$, meaning the only directions that could decrease $f$ are directions that would violate the constraint.
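A small worked instance of these conditions, with $f(x, y) = x^2 + y^2$ and the constraint $x + y = 1$ as my own illustrative choice: $\nabla f = (2x, 2y)$ and $\nabla g = (1, 1)$, so $\nabla f = \lambda \nabla g$ forces $x = y = \lambda / 2$, and the constraint then gives $\lambda = 1$:

```python
# Solve min x**2 + y**2 subject to x + y = 1 via Lagrange multipliers:
# grad f = (2x, 2y) = lam * (1, 1) implies x = y = lam / 2,
# and the constraint x + y = 1 then forces lam = 1.
lam = 1.0
x, y = lam / 2, lam / 2
print(x, y)  # (0.5, 0.5)

# Sanity check: no sampled point on the constraint line does better.
candidate = x ** 2 + y ** 2
best = min(t ** 2 + (1 - t) ** 2 for t in (i / 1000 for i in range(-1000, 2001)))
print(candidate <= best + 1e-12)  # True
```

Geometrically, the circles $x^2 + y^2 = r^2$ first touch the line $x + y = 1$ at $(1/2, 1/2)$, where the circle's gradient is parallel to the line's normal.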

The same reasoning extends to multiple equality constraints $g_1 = c_1, \dots, g_k = c_k$. Valid perturbations must be orthogonal to all of the $\nabla g_j$, so $\nabla f$ must lie in their span, giving $\nabla f = \sum_{j} \lambda_j \nabla g_j$. It can also be used to derive the Karush-Kuhn-Tucker (KKT) conditions, which generalize Lagrange multipliers to inequality constraints.
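In three variables with two constraints, the span condition has a convenient check: $\nabla f$ lies in the span of $\nabla g_1$ and $\nabla g_2$ exactly when it is orthogonal to their cross product. The vectors below are illustrative values of mine, built so that $\nabla f = 2 \nabla g_1 - \nabla g_2$:

```python
# With two constraints in three variables, grad_f must lie in the span
# of grad_g1 and grad_g2, i.e. be orthogonal to their cross product.

def cross(u, v):
    return [u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

grad_g1 = [1.0, 0.0, 1.0]
grad_g2 = [0.0, 1.0, 1.0]
# grad_f = 2 * grad_g1 - 1 * grad_g2, i.e. lambda_1 = 2, lambda_2 = -1.
grad_f = [2 * a - b for a, b in zip(grad_g1, grad_g2)]

normal = cross(grad_g1, grad_g2)
print(dot(grad_f, normal))  # 0: grad_f is in the span of the constraint gradients
```

A nonzero dot product here would mean $\nabla f$ has a component along a direction that preserves both constraints, i.e. a feasible direction that still decreases $f$.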