Duality

Definition 40

A primal optimization problem is given by
$$p^* = \min_{\mathbf{x}\in\mathbb{R}^n} f_0(\mathbf{x}) : \forall i\in[1,m]\ f_i(\mathbf{x}) \leq 0,\ \forall k\in[1,n]\ h_k(\mathbf{x}) = 0$$
The primal problem is essentially the standard form of optimization. There are no assumptions of convexity on any of the functions involved. We would like to express primal problems as a min-max optimization with no constraints.

Definition 41

The Lagrangian $\mathcal{L}(\mathbf{x}, \boldsymbol{\lambda}, \boldsymbol{\mu})$ using Lagrange multipliers $\boldsymbol{\lambda}$ and $\boldsymbol{\mu}$ is given by
$$\mathcal{L}(\mathbf{x}, \boldsymbol{\lambda}, \boldsymbol{\mu}) = f_0(\mathbf{x}) + \sum_{i=1}^m\lambda_i f_i(\mathbf{x}) + \sum_{k=1}^n \mu_k h_k(\mathbf{x})$$
The Lagrangian achieves the goal of removing the constraints through the min-max optimization
$$p^* = \min_{\mathbf{x}\in\mathbb{R}^n}\max_{\boldsymbol{\lambda}\geq \boldsymbol{0}, \boldsymbol{\mu}} \mathcal{L}(\mathbf{x}, \boldsymbol{\lambda}, \boldsymbol{\mu})$$
This is true because if any inequality constraint is violated, then $f_i(\mathbf{x}) > 0$, and the maximization can make $\lambda_i$ arbitrarily large, driving the inner maximum to $\infty$. Likewise, if any equality constraint is violated, then $h_k(\mathbf{x}) \ne 0$, and the maximization can make $\mu_k$ arbitrarily large with the same sign as $h_k(\mathbf{x})$, again driving the inner maximum to $\infty$. At a feasible $\mathbf{x}$, every term $\lambda_i f_i(\mathbf{x})$ is at most $0$ and every term $\mu_k h_k(\mathbf{x})$ is $0$, so the inner maximum equals $f_0(\mathbf{x})$. Thus the min-max problem is equivalent to the original problem. At this point, it might be easier to solve the problem if the order of min and max were switched.
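The argument above can be illustrated numerically. The following is a minimal sketch on a toy problem that is not from the original text: minimize $x^2$ subject to $1 - x \leq 0$. The names `f0`, `f1`, `lagrangian`, and `inner_max` are illustrative, and the cap on $\lambda$ stands in for "arbitrarily large".

```python
def f0(x):
    # Toy objective (an assumption for illustration).
    return x * x

def f1(x):
    # Toy inequality constraint f1(x) <= 0, i.e. x >= 1.
    return 1.0 - x

def lagrangian(x, lam):
    return f0(x) + lam * f1(x)

def inner_max(x, cap):
    # The Lagrangian is affine in lam, so its maximum over [0, cap]
    # is attained at one of the two endpoints.
    return max(lagrangian(x, 0.0), lagrangian(x, cap))

for cap in (10.0, 1e3, 1e6):
    feasible_value = inner_max(2.0, cap)    # x = 2 satisfies the constraint
    violated_value = inner_max(0.5, cap)    # x = 0.5 violates the constraint
    print(cap, feasible_value, violated_value)
```

At the feasible point the inner maximum stays at $f_0(2) = 4$ no matter how large the cap is, while at the infeasible point it grows without bound as the cap increases.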

Theorem 22 (Minimax Inequality)

For any sets $X, Y$ and any function $F:X\times Y\to\mathbb{R}$,
$$\min_{\mathbf{x}\in X}\max_{\mathbf{y}\in Y} F(\mathbf{x}, \mathbf{y}) \geq \max_{\mathbf{y}\in Y}\min_{\mathbf{x}\in X}F(\mathbf{x}, \mathbf{y})$$
Theorem 22 can be interpreted as a game between a minimizing player and a maximizing player, in which the player who moves second sees the other player's choice before responding. The score when the minimizer moves first (the left-hand side) is never lower than the score when the maximizer moves first (the right-hand side), so moving second is never a disadvantage. We can now apply Theorem 22 to switch the $\min$ and $\max$ in our optimization with the Lagrangian.
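A small finite game makes the inequality concrete. The sketch below uses a matching-pennies payoff table chosen purely for illustration (it does not appear in the original text); the minimizer picks a row and the maximizer picks a column.

```python
# Payoff F[i][j]: the minimizer picks row i, the maximizer picks column j.
# The table is an illustrative assumption (matching pennies).
F = [
    [1, -1],
    [-1, 1],
]

rows = range(len(F))
cols = range(len(F[0]))

# Minimizer commits to a row first, then the maximizer responds.
min_max = min(max(F[i][j] for j in cols) for i in rows)

# Maximizer commits to a column first, then the minimizer responds.
max_min = max(min(F[i][j] for i in rows) for j in cols)

print(min_max, max_min)
```

Here the two values differ, which shows that the inequality can be strict when the sets are finite (and therefore not convex).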

Theorem 23 (Weak Duality)

$$\min_{\mathbf{x}\in\mathbb{R}^n}\max_{\boldsymbol{\lambda}\geq \boldsymbol{0}, \boldsymbol{\mu}} \mathcal{L}(\mathbf{x}, \boldsymbol{\lambda}, \boldsymbol{\mu}) \geq \max_{\boldsymbol{\lambda}\geq \boldsymbol{0}, \boldsymbol{\mu}} \min_{\mathbf{x}\in\mathbb{R}^n} \mathcal{L}(\mathbf{x}, \boldsymbol{\lambda}, \boldsymbol{\mu})$$
Weak duality replaces our minimization problem with a maximization problem whose optimal value is a lower bound on $p^*$.

Definition 42

The dual function of the primal problem is given by
$$g(\boldsymbol{\lambda}, \boldsymbol{\mu}) = \min_{\mathbf{x}} \mathcal{L}(\mathbf{x}, \boldsymbol{\lambda}, \boldsymbol{\mu})$$
Note that $g$ is a concave function because it is the pointwise minimum of functions that are affine in $\boldsymbol{\mu}$ and $\boldsymbol{\lambda}$. A maximization of a concave function over a convex set is a convex problem, so the dual problem (maximizing $g$ over $\boldsymbol{\lambda} \geq \boldsymbol{0}$ and $\boldsymbol{\mu}$) is convex. Thus duality achieves two primary purposes.
  1. It removes constraints, potentially making the problem easier to solve.
  2. It can turn a non-convex problem into a convex one.
Even when there are no constraints, we can sometimes introduce constraints to leverage duality by adding slack variables that are equal to expressions in the objective.
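To see the dual function in action, here is a minimal sketch that evaluates it by brute force on a toy problem assumed for illustration: minimize $x^2$ subject to $1 - x \leq 0$. For this problem the dual function has the closed form $g(\lambda) = \lambda - \lambda^2/4$. The grids and the names `dual`, `best_lam`, `d_star`, and `p_star` are illustrative choices, not part of the original text.

```python
def lagrangian(x, lam):
    # L(x, lam) = f0(x) + lam * f1(x) with f0(x) = x^2 and f1(x) = 1 - x.
    return x * x + lam * (1.0 - x)

# Grids for x in [-5, 5] and lam in [0, 5]. The x step is half the lam step,
# so the unconstrained minimizer x = lam / 2 always lies on the x grid.
xs = [i / 200.0 for i in range(-1000, 1001)]
lams = [j / 100.0 for j in range(0, 501)]

def dual(lam):
    # Dual function: minimize the Lagrangian over x (here, over the grid).
    return min(lagrangian(x, lam) for x in xs)

best_lam = max(lams, key=dual)   # dual problem: maximize g over lam >= 0
d_star = dual(best_lam)

# Primal optimum by brute force over the feasible grid points.
p_star = min(x * x for x in xs if 1.0 - x <= 0.0)

print(best_lam, d_star, p_star)
```

On this convex toy problem the dual optimum matches the primal optimum, so the lower bound from weak duality happens to be tight.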

Strong Duality

In some cases, duality gives not just a lower bound, but an exact value. When this happens, we have Strong Duality.

Theorem 24 (Sion's MiniMax Theorem)

Let $X\subseteq\mathbb{R}^n$ be convex, and $Y\subseteq\mathbb{R}^m$ be convex, bounded, and closed (compact). Let $F:X \times Y \to \mathbb{R}$ be a function such that $\forall \mathbf{y},\ F(\cdot, \mathbf{y})$ is convex and continuous, and $\forall \mathbf{x},\ F(\mathbf{x}, \cdot)$ is concave and continuous. Then
$$\min_{\mathbf{x}\in X}\max_{\mathbf{y}\in Y} F(\mathbf{x}, \mathbf{y}) = \max_{\mathbf{y}\in Y}\min_{\mathbf{x}\in X}F(\mathbf{x}, \mathbf{y})$$
If we focus on convex problems, then we can find conditions which indicate strong duality holds.

Theorem 25 (Slater's Condition)

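If a convex optimization problem is strictly feasible, meaning some point satisfies every inequality constraint strictly ($f_i(\mathbf{x}) < 0$) along with every equality constraint, then strong duality holds.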
Once we find a solution to the dual problem, the solution to the primal problem is recovered by minimizing $\mathcal{L}(\mathbf{x}, \boldsymbol{\lambda}^*, \boldsymbol{\mu}^*)$ over $\mathbf{x}$, where $\boldsymbol{\lambda}^*,\boldsymbol{\mu}^*$ are the optimal dual variables. If the minimizer is unique and feasible, it is primal optimal; if it is unique but not feasible, then the primal optimum is not attained. When searching for strong duality and an optimal solution $(\mathbf{x}, \boldsymbol{\lambda}, \boldsymbol{\mu})$, it can be useful to consider particular conditions.
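As a small worked illustration of Slater's condition and of recovering a primal solution from the dual, consider a toy problem assumed here for the example (it is not from the original text): minimize $x^2$ subject to $1 - x \leq 0$. The point $x = 2$ is strictly feasible, so strong duality holds.

$$\begin{aligned}
\mathcal{L}(x, \lambda) &= x^2 + \lambda(1 - x) \\
g(\lambda) &= \min_x \mathcal{L}(x, \lambda) = \lambda - \frac{\lambda^2}{4} \quad \text{(attained at } x = \lambda/2\text{)} \\
\lambda^* &= \arg\max_{\lambda \geq 0} g(\lambda) = 2, \qquad g(\lambda^*) = 1 \\
\mathcal{L}(x, \lambda^*) &= x^2 + 2(1 - x) = (x - 1)^2 + 1 \quad \Rightarrow \quad x^* = 1
\end{aligned}$$

The unique minimizer $x^* = 1$ is feasible, and $f_0(x^*) = 1 = g(\lambda^*)$, so it is primal optimal.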

Theorem 26

For a convex primal problem which is feasible and has a feasible dual where strong duality holds, a primal-dual pair $(\mathbf{x}, \boldsymbol{\lambda}, \boldsymbol{\mu})$ is optimal if and only if the KKT conditions are satisfied.
  1. Primal Feasibility: $\mathbf{x}$ satisfies $\forall i\in[1,m],\ f_i(\mathbf{x}) \leq 0$ and $\forall k\in[1,n],\ h_k(\mathbf{x}) = 0$.
  2. Dual Feasibility: $\boldsymbol{\lambda} \geq \boldsymbol{0}$.
  3. Complementary Slackness: $\forall i\in[1,m],\ \lambda_i f_i(\mathbf{x}) = 0$.
  4. Lagrangian Stationarity: If the Lagrangian is differentiable, then
$$\nabla_x f_0(\mathbf{x}) +\sum_{i=1}^m\lambda_i\nabla_x f_i(\mathbf{x}) + \sum_{k=1}^n\mu_k\nabla_x h_k(\mathbf{x})=0$$
The complementary slackness requirement essentially says that if a primal constraint is slack ($f_i(\mathbf{x}) < 0$), then $\lambda_i=0$, and if $\lambda_i > 0$, then $f_i(\mathbf{x}) = 0$.
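The four KKT conditions can be checked mechanically for a candidate pair. The following is a minimal sketch on a toy problem assumed for illustration (not from the original text): minimize $x_1^2 + x_2^2$ subject to $1 - x_1 - x_2 \leq 0$, with no equality constraints. The function name `kkt_satisfied` and the tolerance are illustrative choices.

```python
def kkt_satisfied(x, lam, tol=1e-9):
    """Check the KKT conditions for: min x1^2 + x2^2  s.t.  1 - x1 - x2 <= 0."""
    x1, x2 = x
    f1 = 1.0 - x1 - x2                 # inequality constraint value
    grad_f0 = (2.0 * x1, 2.0 * x2)     # gradient of the objective
    grad_f1 = (-1.0, -1.0)             # gradient of the constraint

    primal_feasible = f1 <= tol
    dual_feasible = lam >= -tol
    comp_slack = abs(lam * f1) <= tol
    stationary = all(
        abs(g0 + lam * g1) <= tol for g0, g1 in zip(grad_f0, grad_f1)
    )
    return primal_feasible and dual_feasible and comp_slack and stationary

print(kkt_satisfied((0.5, 0.5), 1.0))   # candidate on the constraint boundary
print(kkt_satisfied((1.0, 1.0), 0.0))   # feasible, but the gradient does not vanish
print(kkt_satisfied((0.0, 0.0), 0.0))   # stationary, but violates the constraint
```

Only the first candidate passes all four checks; the constraint is active there, which is consistent with its multiplier being positive.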