$p^* = \min_{\mathbf{x}\in\mathbb{R}^n} f_0(\mathbf{x}) : \forall i\in[1,m]\ f_i(\mathbf{x}) \leq 0, \forall k\in[1,n]\ h_k(\mathbf{x}) = 0$

The Lagrangian $\mathcal{L}(\mathbf{x}, \boldsymbol{\lambda}, \boldsymbol{\mu})$ using Lagrange Multipliers $\boldsymbol{\lambda}$ and $\boldsymbol{\mu}$ is given by

$\mathcal{L}(\mathbf{x}, \boldsymbol{\lambda}, \boldsymbol{\mu}) = f_0(\mathbf{x}) + \sum_{i=1}^m\lambda_i f_i(\mathbf{x}) + \sum_{k=1}^n \mu_k h_k(\mathbf{x})$

$p^* = \min_{\mathbf{x}\in\mathbb{R}^n}\max_{\boldsymbol{\lambda}\geq \boldsymbol{0}, \boldsymbol{\mu}} \mathcal{L}(\mathbf{x}, \boldsymbol{\lambda}, \boldsymbol{\mu})$

This is true because if any inequality constraint is violated, then $f_i(\mathbf{x}) > 0$, and the maximization can take $\lambda_i$ arbitrarily large to drive the inner maximum to $\infty$; likewise, if any equality constraint is violated, then $h_k(\mathbf{x}) \ne 0$, and the maximization can take $\mu_k$ arbitrarily large with the same sign as $h_k(\mathbf{x})$, again driving the inner maximum to $\infty$. Conversely, if $\mathbf{x}$ is feasible, the best the maximizer can do is make every term $\lambda_i f_i(\mathbf{x})$ equal to zero, so the inner maximum is just $f_0(\mathbf{x})$. Thus the minimax problem is equivalent to the original problem. At this point, it might be easier to solve the problem if the order of min and max were switched.
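As a quick numerical sketch of this argument, consider a toy problem of my own choosing (not from the text): minimize $f_0(x) = x^2$ subject to $f_1(x) = 1 - x \leq 0$. Approximating the inner maximization over a finite grid of multipliers shows both behaviors:

```python
# Toy problem (illustrative, not from the text):
# minimize f0(x) = x^2  subject to  f1(x) = 1 - x <= 0.
def f0(x):
    return x ** 2

def f1(x):
    return 1.0 - x

def inner_max(x, grid=(0.0, 1.0, 10.0, 1000.0)):
    """Approximate max over lambda >= 0 of the Lagrangian on a finite grid."""
    return max(f0(x) + lam * f1(x) for lam in grid)

# Feasible point (x = 2): f1(2) = -1 < 0, so the maximizer's best choice is
# lambda = 0, and the inner max is just f0(2) = 4.
print(inner_max(2.0))

# Infeasible point (x = 0): f1(0) = 1 > 0, so the inner max grows without
# bound as larger multipliers are allowed (1000 on this grid).
print(inner_max(0.0))
```

Enlarging the grid at the infeasible point makes the value as large as desired, which is the "set $\lambda_i$ very large" step in the argument above.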

For any sets $X, Y$ and any function $F:X\times Y\to\mathbb{R}$

$\min_{\mathbf{x}\in X}\max_{\mathbf{y}\in Y} F(\mathbf{x}, \mathbf{y}) \geq \max_{\mathbf{y}\in Y}\min_{\mathbf{x}\in X}F(\mathbf{x}, \mathbf{y})$
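A small finite example (of my own construction) makes the inequality concrete: take $F$ to be a $2\times 2$ payoff matrix, with the minimizer choosing a row and the maximizer choosing a column.

```python
# Max-min inequality on a finite game (illustrative example).
# Rows are the minimizer's choices; columns are the maximizer's.
F = [[0, 2],
     [3, 1]]

# Minimizer commits first: the maximizer answers each row with its worst column.
min_max = min(max(row) for row in F)

# Maximizer commits first: the minimizer answers each column with its best row.
max_min = max(min(row[j] for row in F) for j in range(2))

print(min_max, max_min)  # here the gap is strict: 2 > 1
```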

Theorem 22 can be interpreted as a game between a minimizing player and a maximizing player, where the player who commits first is at a disadvantage: if the minimizer commits to $\mathbf{x}$ first (the left-hand side), the maximizer can exploit that choice, so the resulting value is at least as large as when the maximizer commits first (unless the two are equal). We can now apply Theorem 22 to switch the $\min$ and $\max$ in our optimization with the Lagrangian.

$\min_{\mathbf{x}\in\mathbb{R}^n}\max_{\boldsymbol{\lambda}\geq \boldsymbol{0}, \boldsymbol{\mu}} \mathcal{L}(\mathbf{x}, \boldsymbol{\lambda}, \boldsymbol{\mu}) \geq \max_{\boldsymbol{\lambda}\geq \boldsymbol{0}, \boldsymbol{\mu}} \min_{\mathbf{x}\in\mathbb{R}^n} \mathcal{L}(\mathbf{x}, \boldsymbol{\lambda}, \boldsymbol{\mu})$

$g(\boldsymbol{\lambda}, \boldsymbol{\mu}) = \min_{\mathbf{x}} \mathcal{L}(\mathbf{x}, \boldsymbol{\lambda}, \boldsymbol{\mu})$

Note that $g$ is a concave function because it is the pointwise minimum of functions that are affine in $\boldsymbol{\lambda}$ and $\boldsymbol{\mu}$. A maximization of a concave function over a convex set is a convex problem, so the dual problem (maximizing $g$ subject to $\boldsymbol{\lambda} \geq \boldsymbol{0}$) is convex. Thus duality achieves two primary purposes: it provides a lower bound on $p^*$ (weak duality), and it yields a convex problem even when the primal is not convex.
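For the toy problem minimize $x^2$ subject to $1 - x \leq 0$ (an example of mine, not from the text), the dual function has a closed form: $\mathcal{L}(x,\lambda) = x^2 + \lambda(1-x)$ is minimized over $x$ at $x = \lambda/2$, giving $g(\lambda) = \lambda - \lambda^2/4$, which is concave and everywhere below the primal optimum $p^* = 1$:

```python
# Dual function of the toy problem: minimize x^2 s.t. 1 - x <= 0.
# L(x, lam) = x^2 + lam*(1 - x) is minimized over x at x = lam/2.
def g(lam):
    x = lam / 2.0
    return x ** 2 + lam * (1.0 - x)   # equals lam - lam**2 / 4

p_star = 1.0  # primal optimum, attained at x* = 1

# Weak duality: g(lam) <= p* for every lam >= 0 we sample.
assert all(g(0.1 * k) <= p_star + 1e-12 for k in range(200))

# Concavity spot-check: the midpoint value dominates the chord.
a, b = 0.5, 3.0
assert g((a + b) / 2) >= (g(a) + g(b)) / 2
```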

Let $X\subseteq\mathbb{R}^n$ be convex, and $Y\subseteq\mathbb{R}^m$ be bounded and closed (compact). Let $F:X \times Y \to \mathbb{R}$ be a function such that $\forall y,\ F(\cdot, y)$ is convex and continuous, and $\forall x,\ F(x, \cdot)$ is concave and continuous, then

$\min_{\mathbf{x}\in X}\max_{\mathbf{y}\in Y} F(\mathbf{x}, \mathbf{y}) = \max_{\mathbf{y}\in Y}\min_{\mathbf{x}\in X}F(\mathbf{x}, \mathbf{y})$

Once we find a solution to the dual problem, the solution to the primal problem is recovered by minimizing $\mathcal{L}(\mathbf{x}, \boldsymbol{\lambda}^*, \boldsymbol{\mu}^*)$, where $\boldsymbol{\lambda}^*,\boldsymbol{\mu}^*$ are the optimal dual variables; if no such feasible point $\mathbf{x}$ exists, then the primal itself is infeasible. When searching for strong duality and an optimal solution $(\mathbf{x}, \boldsymbol{\lambda}, \boldsymbol{\mu})$, it can be useful to consider particular conditions.
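For the toy problem minimize $x^2$ subject to $1 - x \leq 0$ (my running example, not from the text), this recovery step is short: the dual function $g(\lambda) = \lambda - \lambda^2/4$ is maximized at $\lambda^* = 2$, and minimizing $\mathcal{L}(x, \lambda^*)$ returns the primal optimum.

```python
# Recovering the primal solution from the optimal dual variable in the toy
# problem: minimize x^2 s.t. 1 - x <= 0, with g(lam) = lam - lam**2 / 4.
lam_star = 2.0                 # maximizer of g: g'(lam) = 1 - lam/2 = 0
x_star = lam_star / 2.0        # minimizer of L(x, lam*) = x^2 + lam*(1 - x)

assert 1.0 - x_star <= 0       # the recovered point is primal feasible
p_star = x_star ** 2
d_star = lam_star - lam_star ** 2 / 4
print(x_star, p_star, d_star)  # strong duality: zero gap
```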

For a convex primal problem which is feasible and has a feasible dual where strong duality holds, a primal dual pair $(\mathbf{x}, \boldsymbol{\lambda}, \boldsymbol{\mu})$ is optimal if and only if the KKT conditions are satisfied.

**Primal Feasibility** $\mathbf{x}$ satisfies $\forall i\in[1,m],\ f_i(\mathbf{x}) \leq 0$ and $\forall k\in[1,n],\ h_k(\mathbf{x}) = 0$.

**Dual Feasibility** $\boldsymbol{\lambda} \geq \boldsymbol{0}$.

**Complementary Slackness** $\forall i\in[1,m],\ \lambda_if_i(\mathbf{x}) = 0$

**Stationarity** $\nabla_{\mathbf{x}}f_0(\mathbf{x}) +\sum_{i=1}^m\lambda_i\nabla_{\mathbf{x}}f_i(\mathbf{x}) + \sum_{k=1}^n\mu_k\nabla_{\mathbf{x}}h_k(\mathbf{x})=\boldsymbol{0}$

The complementary slackness requirement essentially says that if a primal constraint is slack ($f_i(\mathbf{x}) < 0$), then $\lambda_i=0$, and if $\lambda_i > 0$, then $f_i(\mathbf{x}) = 0$.
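These conditions are easy to verify mechanically. For the toy problem minimize $x^2$ subject to $1 - x \leq 0$ (an illustration of my own, not from the text), the candidate pair $(x^*, \lambda^*) = (1, 2)$ passes all four checks:

```python
# KKT check for: minimize x^2 s.t. 1 - x <= 0, at the candidate (x, lam) = (1, 2).
x, lam = 1.0, 2.0

assert 1.0 - x <= 0            # primal feasibility
assert lam >= 0                # dual feasibility
assert lam * (1.0 - x) == 0    # complementary slackness
assert 2 * x - lam == 0        # stationarity: d/dx [x^2 + lam*(1 - x)] = 0
print("KKT conditions satisfied")
```

Here the constraint is active ($f_1(x^*) = 0$), which is exactly why $\lambda^* > 0$ is permitted by complementary slackness.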