## Lagrangian Dual

It’s January again and that means I have a whole month to do, essentially, whatever I want. And part of what I signed up for is the Directed Reading Program organized by the MIT math department. The topic I will be studying is Convex Optimization. I’ve been paired with a grad student, Adrian Vladu, to help me understand what I’m reading in the topic. So most of the cool things I’m posting here, I gleaned from him or the several resources which I will cite throughout.

A mathematical optimization problem has the following form:

minimize $f_0(x)$

subject to $f_i(x)\leq b_i$, $i = 1, ..., m$

where the vector $x=(x_1, ..., x_n)$ is the optimization variable, the function $f_0: \mathcal{R}^n \rightarrow \mathcal{R}$ is the objective function, the functions $f_i: \mathcal{R}^n \rightarrow \mathcal{R}$ are the constraint functions, and the constants $b_1, ..., b_m$ are upper bounds on the constraints. A vector $x^*$ is optimal if its objective value is the smallest among all vectors that satisfy the constraints, i.e. $f_0(v) \geq f_0(x^*)$ for every feasible $v$; the optimal value $f_0(x^*)$ is denoted $p^*$.

Linear optimization (or linear programming, LP) problems define a class of optimization problems where the objective and constraint functions are linear, i.e. $f_i(\alpha x + \beta y) = \alpha f_i(x) + \beta f_i(y)$ for all $i = 0, 1,..., m$, $x, y \in \mathcal{R}^n$ and $\alpha, \beta \in \mathcal{R}$.

The class of convex optimization problems contains optimization problems where the objective and constraint functions are convex, i.e. $f_i(\alpha x + \beta y) \leq \alpha f_i(x) + \beta f_i(y)$ for all $i = 0, 1, ..., m$, $x, y \in \mathcal{R}^n$, $\alpha, \beta \in \mathcal{R}$, $\alpha +\beta = 1$, and $\alpha, \beta \geq 0$.
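As a quick numerical sanity check of the convexity inequality, here is a sketch using the convex function $f(x) = x^2$ (an illustrative choice of mine, not from the definitions above; a linear function would satisfy the same inequality with equality):

```python
# Spot-check the convexity inequality
#   f(a*x + b*y) <= a*f(x) + b*f(y)   with a + b = 1 and a, b >= 0
# for the convex function f(x) = x**2 (an illustrative choice).
def f(x):
    return x * x

x, y = -2.0, 3.0  # arbitrary sample points
for a in [0.0, 0.25, 0.5, 0.75, 1.0]:
    b = 1.0 - a
    lhs = f(a * x + b * y)
    rhs = a * f(x) + b * f(y)
    assert lhs <= rhs + 1e-12  # convexity holds at every tested pair
```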

I won’t immediately use these definitions, but in later blog posts, I most certainly will.

But first, I will digress slightly to discuss the Lagrangian dual, which is key to understanding how the familiar dual of an optimization problem is derived. Consider an optimization problem in standard form:

minimize $f_0(x)$

subject to $f_i(x) \leq 0$, $i=1,...,m$

$h_i(x)=0$, $i=1,...,p$.

Let $x \in \mathcal{R}^n$, and let $p^*$ be the optimal value of the optimization problem.

The Lagrangian function $L:\mathcal{R}^n \times \mathcal{R}^m \times \mathcal{R}^p \rightarrow \mathcal{R}$ associated with this optimization problem is defined as:

$L(x, \lambda, \nu) = f_0(x) + \sum_{i=1}^m \lambda_i f_i(x) + \sum_{i=1}^p \nu_i h_i(x)$.
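In code, the Lagrangian is just the objective plus a weighted sum of the constraint functions. A minimal sketch for a hypothetical one-variable problem of my own choosing (minimize $x^2$ subject to $1 - x \leq 0$, with no equality constraints):

```python
# Toy problem (illustrative, not from the text):
#   minimize f0(x) = x**2   subject to   f1(x) = 1 - x <= 0.
def f0(x):
    return x * x

def f1(x):
    return 1.0 - x

def lagrangian(x, lam):
    # L(x, lambda) = f0(x) + lambda * f1(x); the nu-sum is empty here
    # because there are no equality constraints.
    return f0(x) + lam * f1(x)
```

At the boundary point $x = 1$ the constraint term vanishes, so `lagrangian(1.0, lam)` equals `f0(1.0)` for any multiplier `lam`.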

Intuitively, the Lagrangian function measures the amount by which a particular $x$ “violates” the constraints, a violation being a constraint that is not satisfied. To record exactly when all constraints are satisfied by a point $x$, we may create two indicator functions, $I_{-}(u)$ and $I_0(u)$, that “indicate” when a constraint is violated. We may define the indicator function $I_-(u)$ as

$\displaystyle I_{-}(u)=\begin{cases} \infty, & u > 0.\\ 0, & u \leq 0. \end{cases}$

and $I_0(u)$ as

$\displaystyle I_{0}(u)=\begin{cases} \infty, & u \neq 0.\\ 0, & u = 0. \end{cases}$.

We can then write an “indicator equation” for whether a point is feasible,

$f_0(x) + \sum_{i=1}^m I_-(f_i(x)) + \sum_{i=1}^p I_0(h_i(x))$.

Minimizing the above then yields the optimal value of the original optimization problem. This is easy to verify: if $x$ is infeasible, the expression evaluates to $\infty$ (and if no feasible solutions exist, it is $\infty$ for all $x$), while if $x$ is feasible, both $\sum_{i=1}^m I_-(f_i(x))$ and $\sum_{i=1}^p I_0(h_i(x))$ are zero, leaving us to minimize $f_0(x)$, the objective of the original problem.
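The indicator construction is easy to mirror in code. A sketch for a hypothetical toy problem of mine (minimize $x^2$ subject to $1 - x \leq 0$; not an example from the text):

```python
import math

# Indicator functions: an infinite penalty on any violation.
def I_minus(u):
    return math.inf if u > 0 else 0.0

def I_zero(u):
    return math.inf if u != 0 else 0.0

# Toy problem (illustrative): minimize x**2 subject to 1 - x <= 0.
def penalized(x):
    # f0(x) plus the indicator penalty on the single inequality constraint
    return x * x + I_minus(1.0 - x)
```

For example, `penalized(2.0)` returns `4.0` (feasible, so just the objective), while `penalized(0.0)` returns `math.inf` because the constraint $1 - x \leq 0$ is violated there.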

Because the Lagrangian function does not use indicator functions, it inflicts a less “severe punishment” for violations. But it is a lower bound on the indicator expression above: if we assume $\lambda_i \geq 0$ for all $i$, then $\lambda_i u \leq I_-(u)$ and $\nu_i u \leq I_0(u)$ for all $u$ and all $i$, so $L(x, \lambda, \nu)$ is bounded above, term by term, by the indicator expression.

We define the Lagrange dual function, $g: \mathcal{R}^m \times \mathcal{R}^p\rightarrow \mathcal{R}$, for $\lambda\in \mathcal{R}^m$ and $\nu \in \mathcal{R}^p$, to be the infimum of the Lagrangian over $x$:

$g(\lambda, \nu) = \inf_{x \in \mathcal{D}} L(x, \lambda, \nu)$,

where $\mathcal{D}$ is the intersection of the domains of the $f_i$ and $h_i$.
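For a concrete feel, the dual function can be approximated by minimizing the Lagrangian over a finite grid of candidate $x$ values. A sketch for a hypothetical toy problem of mine, minimize $x^2$ subject to $1 - x \leq 0$, whose optimal value is $p^* = 1$ at $x = 1$:

```python
# Toy problem (illustrative): minimize x**2 subject to 1 - x <= 0.
def lagrangian(x, lam):
    return x * x + lam * (1.0 - x)

def g(lam):
    # Approximate the infimum over x with a grid on [-5, 5]; for this
    # problem the true minimizer x = lam/2 lies on/inside the grid for
    # moderate lam, so the approximation is exact at grid points.
    xs = [i / 100.0 for i in range(-500, 501)]
    return min(lagrangian(x, lam) for x in xs)
```

For example, `g(2.0)` evaluates to `1.0`: the Lagrangian $x^2 + 2(1 - x) = (x-1)^2 + 1$ is minimized at $x = 1$, which lies on the grid.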

By our argument that the Lagrangian is a lower bound on the indicator expression, whose minimum is the optimal value of the original problem, we see that the Lagrange dual function is a lower bound on that optimal value.

Stated more rigorously, we may show that the dual function is a lower bound on the optimal value of the optimization problem, $g(\lambda, \nu) \leq p^*$ for any $\lambda \geq 0$, as follows. For any feasible $x'$, we have $f_i(x') \leq 0$ and $h_i(x') = 0$ by definition, so $\sum_{i=1}^m \lambda_i f_i(x') + \sum_{i=1}^p \nu_i h_i(x') \leq 0$. Therefore,

$L(x', \lambda, \nu) = f_0(x') + \sum_{i=1}^m \lambda_i f_i(x') + \sum_{i=1}^p \nu_i h_i(x') \leq f_0(x')$.

Since $g(\lambda, \nu) = \inf_x L(x, \lambda, \nu) \leq L(x', \lambda, \nu)$, it follows that $g(\lambda, \nu) \leq f_0(x')$ for every feasible $x'$, and hence $g(\lambda, \nu) \leq p^*$.
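Weak duality is easy to check numerically. For the hypothetical toy problem minimize $x^2$ subject to $1 - x \leq 0$ (my own example, optimal value $p^* = 1$), the Lagrangian $x^2 + \lambda(1 - x)$ is minimized over $x$ at $x = \lambda/2$, giving the closed-form dual function $g(\lambda) = \lambda - \lambda^2/4$:

```python
# Closed-form dual of the toy problem minimize x**2 s.t. 1 - x <= 0:
# g(lam) = lam - lam**2 / 4, while p* = 1 (attained at x = 1).
def g(lam):
    return lam - lam ** 2 / 4.0

p_star = 1.0
# Weak duality: g(lam) <= p* for every lam >= 0.
for lam in [0.0, 0.5, 1.0, 2.0, 3.0, 10.0]:
    assert g(lam) <= p_star
```

The bound is tight here: $g(2) = 1 = p^*$, an instance of strong duality.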

Thus, we conclude that the Lagrange dual function is a lower bound on the optimal value of its corresponding optimization problem. Maximizing this lower bound over $\lambda \geq 0$ and $\nu$ then yields the dual problem in the form we are used to seeing. (To be discussed tomorrow.)

References:

1. S. Boyd and L. Vandenberghe, *Convex Optimization*, Cambridge University Press, 2004.
2. Stephen Boyd’s lecture on the Lagrangian dual.