# Duality

The primal problem is essentially the standard form of optimization. There are no assumptions of convexity on any of the functions involved. We can would like to express primal problems as a min-max optimization with no constraints.

The Lagrangian

$\mathcal{L}(\mathbf{x}, \boldsymbol{\lambda}, \boldsymbol{\mu})$

using Lagrange Multipliers $\boldsymbol{\lambda}$

and $\boldsymbol{\mu}$

is given by

$\mathcal{L}(\mathbf{x}, \boldsymbol{\lambda}, \boldsymbol{\mu}) = f_0(\mathbf{x}) + \sum_{i=1}^m\lambda_i f_i(\mathbf{x}) + \sum_{k=1}^n \mu_i h_i(\mathbf{x})$

The Lagrangian achieves the goal of removing the constraints in the min-max optimization

$p^* = \min_{\mathbf{x}\in\mathbb{R}^n}\max_{\boldsymbol{\lambda}\geq \boldsymbol{0}, \boldsymbol{\mu}} \mathcal{L}(\mathbf{x}, \boldsymbol{\lambda}, \boldsymbol{\mu})$

This is true because if any inequality constraints are violated, then

$f_i(\mathbf{x}) \geq 0$

, and the maximization could set $\lambda_i$

very large to make the overall problem $\infty$

, and if any equality constraints are violated, then $h_k(\mathbf{x}) \ne 0$

, and the maximization would set $\mu_i$

to a very large number of the same sign as $h_k(\mathbf{x})$

to make the overall problem $\infty$

. Thus the minimax problem is equivalent to the original problem. At this point, it might be easier to solve the problem if the order of min and max were switched.Theorem 22 can be interpreted as a game where there is a minimizing player and a maximizing player. If the maximizer goes first, it will always produce a higher score than if the minimizer goes first (unless they are equal). We can now apply Theorem 22 to switch the

$\min$

and $\max$

in our optimization with the Lagrangian.

$\min_{\mathbf{x}\in\mathbb{R}^n}\max_{\boldsymbol{\lambda}\geq \boldsymbol{0}, \boldsymbol{\mu}} \mathcal{L}(\mathbf{x}, \boldsymbol{\lambda}, \boldsymbol{\mu}) \geq \max_{\boldsymbol{\lambda}\geq \boldsymbol{0}, \boldsymbol{\mu}} \min_{\mathbf{x}\in\mathbb{R}^n} \mathcal{L}(\mathbf{x}, \boldsymbol{\lambda}, \boldsymbol{\mu})$

What weak duality does is convert our minimization problem to a maximization problem.

Note that

$g$

is a concave function because it is the pointwise minimum of functions that are affine in $\boldsymbol{\mu}$

and $\boldsymbol{\lambda}$

. A maximization of a concave function over a convex set is a convex problem, so the dual problem (minimizing $g$

) is convex. Thus duality achieves two primary purposes.- 1.It removes constraints, potentially making the problem easier to solve.
- 2.It can turn a non-convex problems into a convex one.

Even when there are no constraints, we can sometimes introduce constraints to leverage duality by adding slack variables that are equal to expressions in the objective.

In some cases, duality gives not just a lower bound, but an exact value. When this happens, we have

**Strong Duality**.Let

$X\subseteq\mathbb{R}^n$

be convex, and $Y\subseteq\mathbb{R}^m$

be bounded and closed (compact). Let $F:X \times Y \to \mathbb{R}$

be a function such that $\forall y,\ F(\cdot, y)$

is convex and continuous, and $\forall x,\ F(x, \cdot)$

is concave and continuous, then

$\min_{\mathbf{x}\in X}\max_{\mathbf{y}\in Y} F(\mathbf{x}, \mathbf{y}) = \max_{\mathbf{y}\in Y}\min_{\mathbf{x}\in X}F(\mathbf{x}, \mathbf{y})$

If we focus on convex problems, then we can find conditions which indicate strong duality holds.

Once we find a solution to the dual problem, then the solution to the primal problem is recovered by minimized

$\mathcal{L}(\mathbf{x}, \boldsymbol{\lambda}^*, \boldsymbol{\mu}^*)$

where $\boldsymbol{\lambda}^*,\boldsymbol{\mu}^*$

are the optimal dual variables, and if no such feasible point $\mathbf{x}$

exists, then the primal itself is infeasible. When searching for strong duality and an optimal solution $(\mathbf{x}, \boldsymbol{\lambda}, \boldsymbol{\mu})$

, it can be useful to consider particular conditions.For a convex primal problem which is feasible and has a feasible dual where strong duality holds, a primal dual pair

$(\mathbf{x}, \boldsymbol{\lambda}, \boldsymbol{\mu})$

is optimal if and only if the KKT conditions are satisfied.- 1.
**Primal Feasibility**$\mathbf{x}$satisfies$\forall i\in[1,m],\ f_i(\mathbf{x}) \leq 0$and$\forall k\in[1,n],\ h_i(\mathbf{x}) = 0$. - 2.
**Dual Feasibility**$\boldsymbol{\lambda} \geq \boldsymbol{0}$. - 3.
**Complementary Slackness**$\forall i\in[1,m],\ \lambda_if_i(\mathbf{x}) = 0$ - 4.
**Lagrangian Stationarity**If the lagrangian is differentiable, then

$abla_xf_0(\mathbf{x}) +\sum_{i=1}^k\lambda_i\nabla_xf_i(\mathbf{x}) + \sum_{k=1}^n\mu_ih_k(\mathbf{x})=0$

The complementary slackness requirement essentially says that if a primal constraint is slack (

$f_i(\mathbf{x}) < 0)$

, then $\lambda_i=0$

, and if $\lambda_i > 0$

, then $f_i(\mathbf{x}) = 0$

.Last modified 2yr ago