## Strong Duality and Slater’s Theorem

I left off a few days ago with my explanation of Weak Duality. Today, I will continue with the discussion of strong duality, which states

$d^* = p^*$.

Strong duality does not always hold but for certain types of problems (especially most convex optimality problems), strong duality holds. Constraint qualifications are the conditions under which strong duality holds when these conditions are applied to convex optimality problems.

One simple but well-known constraint qualification is Slater’s condition:

$\exists x \in$ relint $\mathcal{D}$

such that $f_i(x) < 0, i=1,...,m$ and $Ax = b$.

Slater’s condition basically states that there is a point in the relative interior of the domain of the optimization problem such that all constraints are strictly satisfied.

For linear programs, a well-known proof using Farkas’s Lemma shows that strong duality holds. You can view the proof here. However, here, I am interested in exploring the proof of Slater’s Theorem (below) since the main focus of this blog series is convex optimization. Slater’s Theorem states that strong duality holds for convex optimization problems that follow Slater’s condition.

Slater’s Theorem: If Slater’s condition is followed and the optimization problem is convex, then strong duality holds.

The proof of Slater’s Theorem relies on the geometric interpretation of duality. Suppose we define a set $\mathcal{G}$ where

$\mathcal{G} = \left\{((f_1(x), ..., f_m(x)), (h_1(x), ..., h_p(x)), f_0(x)) \in \mathrm{R}^m \times \mathrm{R}^p \times \mathrm{R} : x \in \mathcal{D}\right\}$.

We may define the optimal solution, $p^*$ to the optimization problem in terms of $\mathcal{G}$ since $\mathcal{G}$ contains the values of the constraint functions and the objective function.

$p^* = \inf \left\{t : (u, v, t) \in \mathcal{G}, u \preceq 0, v = 0 \right\}$.

Once we have defined the optimal solution value in terms of $\mathcal{G}$, we may then define the dual in terms of $\mathcal{G}$. Recall that the Lagrangian dual is

$g(\lambda, \nu) = \inf_{x \in \mathrm{D}}( f_0(x) + \sum_{i=1}^m \lambda_i f_i(x) + \sum_{i=1}^p \nu_i h_i(x))$.

We may rewrite the Lagrangian dual in terms of $(u, v, t) \in \mathcal{G}$ as

$g(\lambda, \nu) = \inf\left\{(\lambda, \nu, 1)^T (u, v, t) : (u, v, t) \in \mathcal{G}\right\}$.

If the infimum of the Lagrangian dual is finite, then, by the same argument we gave last time,

$(\lambda, \nu, 1)^T (u, v, t) \geq g(\lambda, \nu)$

for all $(u, v, t) \in \mathcal{G}$. The above inequality defines a hyperplane given by $(\lambda, \nu, 1)^T (u, v, t) = g(\lambda,\nu)$.

Suppose that $\lambda \succeq 0$, then $t \geq (\lambda, \nu, 1)^T (u, v, t)$ given a feasible solution where $u \preceq 0$ and $v = 0$. We can show weak duality by using the fact we just stated:

$p^* = \inf \left\{t : (u, v, t) \in \mathcal{G}, u \preceq 0, v = 0 \right\}$

$\geq \inf\left\{(\lambda, \nu, 1)^T (u, v, t): (u, v, t) \in \mathcal{G}, u \preceq 0, v= 0\right\}$

$\geq \inf\left\{(\lambda, \nu, 1)^T (u, v, t) : (u, v, t) \in \mathcal{G}\right\}$

$= g(\lambda, \nu)$.

Geometrically, each pair of values $\lambda, \nu$ defines a different hyperplane. The hyperplanes’ intersections with the axes define the Lagrangian dual values since at the intersections all constraints are satisfied.

Let’s look at a simple example with only one constraint:

Optimization problem with only one inequality constraint (i.e. $f_1(x)\leq 0$).

This graph assumes $\lambda \succeq 0$. We first see that the optimal value $p^*$ is given by the tangent horizontal line that indicates the minimum value when $u \preceq 0$ (when all constraints are satisfied). For a given $\lambda$, the line $\lambda u + t = g(\lambda)$ provides the lower bound $t$ on the objective value for each $x \in \mathcal{D}$. We may compute $u$ from our constraint $f_1(x)$, and we may also compute $g(\lambda)$ by minimizing $(\lambda, 1)^T (u, t)$.

To list some characteristics for the line (hyperplane) given above:

1. The hyperplane has slope $-\lambda$. Since we defined $\lambda \succeq 0$, $-\lambda \preceq 0$.
2. The hyperplane is always tangent to at least one point on the boundary of $\mathcal{G}$.

If we look at all $t$ values when $u \preceq 0$, we find that we obtain the minimum $t$ value when $u = 0$. We need to only look at the $t$ values when $u \preceq 0$ because our feasible region is defined by the region of $\mathcal{G}$ left of the $t$ axis. This $t$ value is then a lower bound on the optimal objective value.

However, what we want to find is the value of $\lambda$ that will give the maximum $t$ so that we find the best lower bound on the optimal objective value. Therefore, we will plot the hyperplanes for different values of $\lambda$.

Values of $t$ for different $\lambda$.

The figure shows the hyperplanes for different values of $\lambda$. We see that $\lambda^*$ gives $d^*$ which is the best lower bound we may hope to achieve. Here, we also see that strong duality does not hold because $p^* \neq d^*$.

We need to define one more concept before we may move on to the proof sketch of Slater’s Theorem. We define an epigraph of $\mathcal{G}$ to be the set of points

$\mathcal{A} = \left\{(u, v, t) : \exists x \in \mathcal{D}, f_i(x) \leq u_i, i = 1,..., m, h_i(x) = v_i, i = 1,..., p, f_0(x) \leq t\right\}$

where set $\mathcal{A}$ contains all the points in $\mathcal{G}$ and all points which lie above the points in set $\mathcal{G}$.

For example, for the one constraint problem, the graph below shows the set $\mathcal{A}$.

Set $\mathcal{A}$ given $\mathcal{G}$.

Set $\mathcal{A}$ may also be used to express the objective as well as the Lagrangian dual. In some ways, it is actually easier to express the objective and dual in terms of $\mathcal{A}$ than in terms of $\mathcal{G}$.

The optimal value in terms of $\mathcal{A}$ is

$p^* = \inf\left\{t : (0, 0, t) \in \mathcal{A}\right\}$

and the dual

$g(\lambda, \nu) = \inf\left\{(\lambda, \nu, 1)^T (u, v, t) : (u, v, t) \in \mathcal{A}\right\}$.

Using $\mathcal{A}$, we may also more easily show weak duality

$p^* = (\lambda, \nu, 1)^T (0, 0, p^*) \geq g(\lambda, \nu)$.

Proof Sketch of Slater’s Theorem:
Now that we know the geometric interpretation of duality, we may finally use it to provide a proof sketch of Slater’s Theorem. To see a more complete proof, refer to pg. 234 in Boyd and Vandenberghe’s Convex Optimization.

We first define a second convex set $\mathcal{B}$

$\mathcal{B} = \left\{(0, 0, t) \in \mathrm{R}^m \times \mathrm{R}^p \times \mathrm{R} : t < p^*\right\}$.

We can immediately see that sets $\mathcal{A}$ and $\mathcal{B}$ are disjoint since set $\mathcal{A}$ contains all elements whose $t$ values are greater than or equal to $p^*$. If any element in $\mathcal{B}$ is in $\mathcal{A}$, then we have a contradiction because $p^*$ would no longer be the optimal value.

Since $\mathcal{A}$ and $\mathcal{B}$ are disjoint, convex sets, by the Separating Hyperplane Theorem (more formally given in Section 2.5.1 in Boyd and Vandenberghe), there exists $(\widetilde{\lambda}, \widetilde{\nu}, \mu)\neq 0$ and $\alpha$ such that

$\forall (u, v, t) \in \mathcal{A} \rightarrow \widetilde{\lambda}^T u + \widetilde{\nu}^T v + \mu t \geq \alpha$

and

$\forall (u, v, t) \in \mathcal{B} \rightarrow \widetilde{\lambda}^T u + \widetilde{\nu}^T v + \mu t \leq \alpha$.

By a simple argument, $\widetilde{\lambda} \geq 0$ and $\mu \geq 0$. We will first prove the case that strong duality holds when $\mu>0$.

Because $\mu > 0$, we may rewrite the first inequality to be

$\forall (u, v, t) \in \mathcal{A} \rightarrow (\widetilde{\lambda}^T/\mu) u + (\widetilde{\nu}^T/\mu) v + t \geq \alpha/\mu$.

We see that minimizing over $x \in \mathcal{D}$ gives us the dual $g(\widetilde{\lambda}/\mu, \widetilde{\nu}/\mu)$. By the second inequality, we obtain

$\mu p^* \leq \alpha$

since $p^* = (\lambda, \nu, 1)^T (0, 0, p^*) \geq (\lambda, \nu)^T (0, 0, t)$ for all $(0, 0, t) \in \mathcal{B}$. We can conclude then

$g(\widetilde{\lambda}/\mu, \widetilde{\nu}/\mu) \geq \alpha/\mu \geq p^*$.

By Weak Duality, we have $g(\widetilde{\lambda}/\mu, \widetilde{\nu}/\mu) \leq p^*$. Therefore, in the case that $\mu > 0$, we have

$g(\widetilde{\lambda}/\mu, \widetilde{\nu}/\mu) = p^*$.

This proves strong duality because we know that $d^* \geq g(\widetilde{\lambda}/\mu, \widetilde{\nu}/\mu)$. We see that in the case when $\mu > 0$, $d^*= g(\widetilde{\lambda}/\mu, \widetilde{\nu}/\mu)$ (otherwise, we contradict weak duality) and the hyperplane is tangent to $(0, 0, p^*)\in \mathcal{B}$ (and also consequently, the corresponding point in $\mathcal{A}$ where $t = p^*$).

Now, we must prove the case when $\mu = 0$. For the purposes of contradiction, we assume:

1. The rank of $A$ is $a$.
2. relint $\mathcal{D} =$ int $\mathcal{D}$ (subtle distinction, mainly that the interior of the domain is not empty).

In this case, we may again simplify the first inequality

$\forall (u, v, t) \in \mathcal{A} \rightarrow \widetilde{\lambda}^T u + \widetilde{\nu}^T v \geq \alpha$.

From $\forall (u, v, t) \in \mathcal{A} \rightarrow \widetilde{\lambda}^T u + \widetilde{\nu}^T v +\mu t \geq \alpha \geq \mu p^*$ and since we are proving the case $\mu = 0$,

$\forall (u, v, t) \in \mathcal{A} \rightarrow \widetilde{\lambda}^T u + \widetilde{\nu}^T v \geq 0$.

Because Slater’s condition holds (i.e. we have a $\widetilde{x}$ where $f_i(\widetilde{x}) < 0$ and $A\widetilde{x} = b$), we may further simplify the inequality to

$\forall (u, v, t) \in \mathcal{A} \rightarrow \widetilde{\lambda}^T u \geq 0$

for some values of $x \in \mathcal{D}$.

Since $\widetilde{\lambda} \succeq 0$ by the first inequality and $f_i(\widetilde{x}) < 0$ by Slater’s condition, $\widetilde{\lambda} = 0$ is the only possibility. Since $\widetilde{\lambda} = 0$ and $\mu = 0$, $\widetilde{\nu} \neq 0$ to satisfy $(\widetilde{\lambda}, \widetilde{\nu}, \mu) \neq 0$.

Thus, we know that

$\widetilde{\nu}^T (Ax -b) \geq 0$.

By Slater’s condition, since there are points $\widetilde{x} \in$ int $\mathcal{D}$, so for some values, $\widetilde{\nu}^T(A\widetilde{x} - b) = 0$. We have a contradiction. Either $\widetilde{\nu}^T (Ax - b) < 0$, contradicting the condition we gave above, for some $x$ or $\widetilde{\nu}^T A = 0$, contradicting that the rank is $a$.

(We know that for some values, $\widetilde{\nu}^T (Ax - b) < 0$ because $x$ is in the interior of $\mathcal{D}$. Therefore, we may arbitrarily choose a $y$ such that $A(\widetilde{x} + y) = b + Ay$ so $\widetilde{\nu}(b + Ay - b) = \widetilde{\nu}^T Ay < 0$. It is easy to choose an arbitrary $y$ such that the inequality follows.)