It’s January again and that means I have a whole month to do, essentially, whatever I want. Part of what I signed up for is the Directed Reading Program organized by the MIT math department. The topic I will be studying is Convex Optimization, and I’ve been paired with a grad student, Adrian Vladu, who will help me understand what I’m reading. So most of the cool things I post here I gleaned from him or from the several resources I will cite throughout.
A mathematical optimization problem has the following form:

$$\begin{aligned} \text{minimize} \quad & f_0(x) \\ \text{subject to} \quad & f_i(x) \le b_i, \quad i = 1, \dots, m, \end{aligned}$$
where the vector $x = (x_1, \dots, x_n)$ is the optimization variable, the function $f_0 : \mathbb{R}^n \to \mathbb{R}$ is the objective function, the functions $f_i : \mathbb{R}^n \to \mathbb{R}$ are the constraint functions, and the constants $b_1, \dots, b_m$ are upper bounds on the constraints. The optimal solution vector, $x^\star$, to the optimization problem is one whose objective value is the smallest among all vectors that satisfy the constraints: for any $z$ with $f_i(z) \le b_i$, $i = 1, \dots, m$, we have $f_0(z) \ge f_0(x^\star)$.
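For concreteness, here is a minimal numerical sketch of a problem in this form. The instance is a toy example of my own, not from the reading, solved with SciPy’s general-purpose `minimize`; note that SciPy’s `"ineq"` constraints mean $g(x) \ge 0$, so $f_1(x) \le b_1$ is encoded as $b_1 - f_1(x) \ge 0$.

```python
from scipy.optimize import minimize

# A toy instance of the standard form: minimize f0(x) subject to f1(x) <= b1.
f0 = lambda x: (x[0] - 1.0) ** 2 + (x[1] - 2.0) ** 2  # objective function
f1 = lambda x: x[0] + x[1]                            # constraint function
b1 = 2.0                                              # upper bound

# SciPy's "ineq" constraints require g(x) >= 0, so f1(x) <= b1 becomes b1 - f1(x) >= 0.
res = minimize(f0, x0=[0.0, 0.0],
               constraints=[{"type": "ineq", "fun": lambda x: b1 - f1(x)}])
print(res.x)  # the optimal vector x*, approximately [0.5, 1.5]
```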
Linear optimization (or linear programming, LP) problems define a class of optimization problems where the objective and constraint functions are linear, i.e.

$$f_i(\alpha x + \beta y) = \alpha f_i(x) + \beta f_i(y)$$

for all $x, y \in \mathbb{R}^n$ and all $\alpha, \beta \in \mathbb{R}$.
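In the same spirit, a toy LP (again a made-up instance of my own) can be handed to `scipy.optimize.linprog`, which takes the data $c$, $A$, $b$ of a linear program directly:

```python
from scipy.optimize import linprog

# A toy LP: maximize x1 + 2*x2 subject to x1 + x2 <= 4, x1 + 3*x2 <= 6, x >= 0.
# linprog minimizes, so we negate the objective; bounds default to x >= 0.
c = [-1.0, -2.0]
A_ub = [[1.0, 1.0],
        [1.0, 3.0]]
b_ub = [4.0, 6.0]

res = linprog(c, A_ub=A_ub, b_ub=b_ub)
print(res.x)     # optimal vertex, approximately [3.0, 1.0]
print(-res.fun)  # optimal value of the original maximization, 5.0
```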
The class of convex optimization problems contains optimization problems where the objective and constraint functions are convex, i.e.

$$f_i(\alpha x + \beta y) \le \alpha f_i(x) + \beta f_i(y)$$

for all $x, y \in \mathbb{R}^n$ and all $\alpha, \beta \in \mathbb{R}$ with $\alpha + \beta = 1$, $\alpha \ge 0$, $\beta \ge 0$.
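The convexity inequality is easy to spot-check numerically. A quick sketch, using $f(x) = \|x\|_2^2$ as a stand-in convex function:

```python
import numpy as np

rng = np.random.default_rng(0)
f = lambda x: float(np.sum(x ** 2))  # a convex function on R^n

# Spot-check f(a*x + b*y) <= a*f(x) + b*f(y) for random points x, y
# and random weights a, b >= 0 with a + b = 1.
for _ in range(1000):
    x, y = rng.normal(size=3), rng.normal(size=3)
    a = rng.uniform()
    b = 1.0 - a
    assert f(a * x + b * y) <= a * f(x) + b * f(y) + 1e-12
print("convexity inequality held on all samples")
```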
I won’t immediately use these definitions, but in later blog posts, I most certainly will.
But, first, I will digress slightly to discuss the Lagrangian dual, which is important for understanding how the familiar dual of an optimization problem is derived. For a given optimization problem in standard form:

$$\begin{aligned} \text{minimize} \quad & f_0(x) \\ \text{subject to} \quad & f_i(x) \le 0, \quad i = 1, \dots, m, \\ & h_i(x) = 0, \quad i = 1, \dots, p, \end{aligned}$$
let $x \in \mathbb{R}^n$ denote the optimization variable, and let $p^\star$ be the optimal value of the optimization problem.
The Lagrangian function associated with this optimization problem is defined as:

$$L(x, \lambda, \nu) = f_0(x) + \sum_{i=1}^m \lambda_i f_i(x) + \sum_{i=1}^p \nu_i h_i(x),$$

where the $\lambda_i$ and $\nu_i$ are the multipliers associated with the inequality and equality constraints, respectively.
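The definition transcribes almost directly into code. A sketch (the names `lagrangian`, `fs`, `hs` are my own, not standard):

```python
def lagrangian(x, lam, nu, f0, fs, hs):
    """L(x, lam, nu) = f0(x) + sum_i lam_i*f_i(x) + sum_i nu_i*h_i(x)."""
    return (f0(x)
            + sum(l * f(x) for l, f in zip(lam, fs))
            + sum(n * h(x) for n, h in zip(nu, hs)))

# A toy problem used again below: minimize x^2 subject to x >= 1,
# written as f1(x) = 1 - x <= 0, with no equality constraints.
f0 = lambda x: x ** 2
fs = [lambda x: 1.0 - x]
print(lagrangian(2.0, [0.5], [], f0, fs, []))  # 4 + 0.5*(1 - 2) = 3.5
```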
Intuitively, the Lagrangian function measures the amount of “violations” a particular $x$ incurs. A violation is a constraint that is not satisfied. To truly measure when all constraints are satisfied by a solution to the problem, we may create two indicator functions, $I_-$ and $I_0$, to “indicate” when a constraint is violated. We may define the indicator function $I_-$ as

$$I_-(u) = \begin{cases} 0 & u \le 0, \\ \infty & u > 0, \end{cases}$$

and, analogously, $I_0(u) = 0$ if $u = 0$ and $I_0(u) = \infty$ otherwise.
We can then write an “indicator equation” for whether a solution is a feasible solution,

$$f_0(x) + \sum_{i=1}^m I_-(f_i(x)) + \sum_{i=1}^p I_0(h_i(x)).$$
Then, minimizing the above will yield the optimal solution to the original optimization problem. We may easily verify that this is true: if no feasible solutions exist, the above function will output $\infty$ for all $x$. Furthermore, if $x$ is a feasible solution, then both $\sum_{i=1}^m I_-(f_i(x))$ and $\sum_{i=1}^p I_0(h_i(x))$ will be zero. Therefore, we are left to minimize $f_0(x)$, which was the objective function in our original optimization problem.
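Here is the indicator-penalized objective as a sketch in code, on the same toy problem as before:

```python
import math

I_minus = lambda u: 0.0 if u <= 0 else math.inf  # indicator for f_i(x) <= 0
I_zero  = lambda u: 0.0 if u == 0 else math.inf  # indicator for h_i(x) = 0

def hard_objective(x, f0, fs, hs):
    """f0(x) plus an infinite penalty for any violated constraint."""
    return (f0(x)
            + sum(I_minus(f(x)) for f in fs)
            + sum(I_zero(h(x)) for h in hs))

# Same toy problem: minimize x^2 subject to x >= 1.
f0 = lambda x: x ** 2
fs = [lambda x: 1.0 - x]
print(hard_objective(2.0, f0, fs, []))  # 4.0: feasible, so no penalty
print(hard_objective(0.0, f0, fs, []))  # inf: x = 0 violates x >= 1
```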
Because the Lagrangian function does not use indicator functions, it inflicts a less “severe punishment” for violations. But it is still a lower bound, since if we assume $\lambda_i \ge 0$ for all $i$, then $\lambda_i u \le I_-(u)$ for all $u$ and $\nu_i u \le I_0(u)$ for all $u$, so

$$L(x, \lambda, \nu) \le f_0(x) + \sum_{i=1}^m I_-(f_i(x)) + \sum_{i=1}^p I_0(h_i(x))$$

for every $x$.
We define the Lagrange dual function, $g$, for $\lambda \in \mathbb{R}^m$ and $\nu \in \mathbb{R}^p$, to be the minimum value of the Lagrangian over $x$:

$$g(\lambda, \nu) = \inf_{x} L(x, \lambda, \nu) = \inf_{x} \left( f_0(x) + \sum_{i=1}^m \lambda_i f_i(x) + \sum_{i=1}^p \nu_i h_i(x) \right).$$
By our argument that the Lagrangian function is a lower bound on the indicator form of the original optimization problem, we see that the Lagrange dual function is precisely a lower bound on the optimal value of the original optimization problem.
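To make this lower bound tangible, here is a quick worked example of my own (not from the reading). Take the one-dimensional problem of minimizing $x^2$ subject to $x \ge 1$, written as $f_1(x) = 1 - x \le 0$, whose optimal value is $p^\star = 1$. The Lagrangian is $L(x, \lambda) = x^2 + \lambda(1 - x)$, which is minimized over $x$ at $x = \lambda/2$, so

$$g(\lambda) = \frac{\lambda^2}{4} + \lambda\left(1 - \frac{\lambda}{2}\right) = \lambda - \frac{\lambda^2}{4}.$$

For every $\lambda \ge 0$ this is at most $p^\star = 1$, with equality exactly at $\lambda = 2$.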
Stated more rigorously, we may show that the Lagrange dual function is a lower bound on the optimal value $p^\star$ of the optimization problem, as follows. For any feasible $\tilde{x}$ and any $\lambda \succeq 0$, by definition, we have $f_i(\tilde{x}) \le 0$ and $h_i(\tilde{x}) = 0$, so $\sum_{i=1}^m \lambda_i f_i(\tilde{x}) + \sum_{i=1}^p \nu_i h_i(\tilde{x}) \le 0$. Therefore,

$$L(\tilde{x}, \lambda, \nu) = f_0(\tilde{x}) + \sum_{i=1}^m \lambda_i f_i(\tilde{x}) + \sum_{i=1}^p \nu_i h_i(\tilde{x}) \le f_0(\tilde{x}).$$
Since this is true for all feasible $\tilde{x}$, and since, by definition, the Lagrange dual function is a lower bound on the Lagrangian, we have

$$g(\lambda, \nu) \le L(\tilde{x}, \lambda, \nu) \le f_0(\tilde{x})$$

for all feasible $\tilde{x}$, and so $g(\lambda, \nu) \le p^\star$.
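The whole chain $g(\lambda) \le L(\tilde{x}, \lambda) \le f_0(\tilde{x})$ can also be checked numerically on the toy problem from above; this sketch evaluates $g$ by a one-dimensional minimization with `scipy.optimize.minimize_scalar`:

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Toy problem from above: minimize x^2 subject to 1 - x <= 0; p* = 1.
f0 = lambda x: x ** 2
f1 = lambda x: 1.0 - x
p_star = 1.0

def g(lam):
    """Dual function g(lam) = inf_x L(x, lam), computed numerically."""
    return minimize_scalar(lambda x: f0(x) + lam * f1(x)).fun

# Weak duality: g(lam) <= p* for every lam >= 0.
for lam in np.linspace(0.0, 5.0, 11):
    assert g(lam) <= p_star + 1e-9
print("g(lam) <= p* held for every sampled lam >= 0")
```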
Thus, we conclude that the Lagrange dual function is a lower bound on the optimal value of its corresponding optimization problem. A further transformation then yields the dual form of the primal that we are used to seeing. (To be discussed tomorrow.)