Lecture 14: Algorithms for Solving LPs
LP Solving Algorithms
So far we learned a lot about the structure of LPs. But how do we actually solve them? Can we do it faster than by just enumerating all basic feasible solutions (BFS)?
- Simplex -- 1940s.
- Ellipsoid method -- 1970s.
- Interior point method -- 1980s.
Simplex
- Recall that we know that for an LP (in standard form) opt always corresponds to a vertex of a polytope (a BFS).
- Imagine we are given some BFS, how to check if it is optimal?
- Naive approach: check all other BFS' and compare their costs.
$\Rightarrow$ Problem: there can be $m \choose n$ of them!
- But do we really need to look at all the other BFS'?
Interlude: Vertices in Standard form/bases
-
Without loss of generality make $A$ have full row rank (define):
- find basis in rows of $A$, say $a_1,\ldots,a_k$
- any other $a_\ell$ is linear combo of those.
- so $a_\ell x = \sum \lambda_i a_i x $
- so better have $b_l = \sum \lambda_i a_i$ if any solution.
- if so, anything feasible for $a_1,\ldots,a_\ell$ feasible for all.
- $m$ constraints $Ax=b$ all tight/active
- given this, need $n-m$ of the $x_i \ge 0$ constraints
- also, need them to form a basis with the $a_i$.
- write matrix of tight constraints, first $m$ rows then
identity matrix adjacent to zero matrix
\[
\begin{array}{l}
m\text{ constraint rows}
n\text{ row identity}
\end{array} \left( \begin{array}{lccccccc} &\multicolumn{4}{c}{n-m\text{ columns}}&\multicolumn{3}{c}{m\text{ columns}}
&\multicolumn{4}{c}{\overbrace{\qquad\qquad\qquad}}&\multicolumn{3}{c}{\overbrace{\qquad\qquad}}
&\multicolumn{3}{c}{\cdots}&A&\multicolumn{3}{c}{\cdots}
&1&
&&1&&&&0
&&&1
&&&&1
&&&&&1
&&0&&&&1
&&&&&&&1
\end{array} \right)x \begin{array}{c}
=
\ge
=
\end{array} \left( \begin{array}{c}
b
0
0
\end{array} \right) \]
- zero matrix corresponds to slack (nonzero) $x_i$
- need linearly independent rows
- equiv, need linearly independent columns
- but columns are linearly independent iff $m$ columns of $A$
including all that correspond to nonzero $x_i$ are linearly
independent
- because columns of identity matrix are clearly independent
- but columns of zero matrix depend entirely on $A$ for independence
- gives other way to define a vertex: $x$ is vertex if
- $Ax=b$
- $m$ linearly independent columns of $A$ include all $x_j \ne 0$
- $x_j$ of columns called basic set $B$, others nonbasic set $N$
- given bases, can compute $x$:
- $A_B$ is basis columns, $m \times m$ and full rank.
- solve $A_B x_B = b$, set other $x_N=0$.
- note can have many bases (column sets) for same vertex (choice of 0 $x_j$)
- note algebra is $m$-dimensional, so really depends only on number of constraints not variables
Summary: $x$ is vertex of $P$ if for some basis $B$,
- $x_N=0$
- $A_B$ nonsingular
- $A_B^{-1} b \ge 0$
Back to the Simplex Method
- The algorithm:
- start with a basic feasible soluion
- try to improve it
- picture: move on improving edge.
- math: work relative to current $x$
- rewrite LP: $\min c_Bx_B + c_N x_N$, $A_Bx_B+A_N x_N=b$, $x \ge 0$
- true for all $x$, not just current
- $B$ is basis for bfs
- since $A_Bx_B = b-A_Nx_N$, so $x_B = A_B^{-1}(b-A_Nx_N)$, know that $$ \begin{eqnarray*} cx &= &c_Bx_B+c_Nx_N\\ &= & c_B A_B^{-1}(b-A_Nx_N) + c_N x_N\\ &= & c_B A_B^{-1} b + (c_N-c_BA_B^{-1}A_N)x_N\\ \end{eqnarray*} $$
- reduced cost $\tilde c_N = c_N-c_BA_B^{-1}A_N$
- if no $\tilde c_j < 0$, then increasing any $x_j \in N$ increases (nondecreases) cost (may violate feasiblity for $x_B$, but who cares?), so are at optimum!
- if some $\tilde c_j < 0$, can increase $x_j$ to decrease cost
- but since $x_B$ is func of $x_N$, will have to stop when $x_B$
hits a constraint.
- this happens when some $x_i$, $i \in B$ hits 0.
- we bring $j$ into basis, take $i$ out of basis.
- we've moved to an adjacent basis.
- called a pivot
- show picture
Potential problems
- Need initial vertex. How find? HW.
- maybe some $x_i \in B$ already 0, so can't increase $x_j$, just pivot to same obj value.
- could lead to cycle in pivoting, infinite loop.
- can prove exist noncycling pivots (eg, lexicographically first $j$ and $i$)
- Would hope that you don't need to visit too many of the BFS. Not true! Klee-Minty cube (a slightly twisted hypercube)
- We can't prove though that there is no pivot choice rule that leads to always fast simplex (the best ones are “only” subexponential)
- Hirch conjecture (1957): diameter/number of pivots needed is at most $m+n$ (very optimistic!) Disproved by Santos in 2010 but only barely so: diameter $\geq (1+\eps)(m+n)$.
- Still possible that, e.g., diameter $\leq poly(m,n)$.
- Kalai-Kleitmen 1992: $m^{\log n}$ bound on path length, also $2^n m$.
- There is a linear time simplex for fixed $n$!
- Note small diameter does not guarantee fast simplex pivot rule - as some of the pivots might require increasing the value of the objective from time to time!
- In practice: works really really well (unless it doesn't). (Needs some linear algebra tricks to make pivoting fast.)
Simplex and Duality
- defined reduced costs of nonbasic vars $N$ by \[ {\tilde c}_N = c_N-c_BA^{-1}_BA_N \] and argued that when all ${\tilde c}_N \ge 0$, had optimum.
- Define $y=c_BA^{-1}_B$ (so of course $c_B=yA_B$)
- nonegative reduced costs means $c_N \ge yA_N$
- put together with $c_B=yA_B$, see $yA \le c$ so $y$ is dual feasible
- but, $yb = c_B A^{-1}_B b = c_Bx_B=cx$ (since $x_N=0$)
- so $y$ is dual optimum.
- simplex finds primal and dual optima simultaneously
- more generally, $yb-cx$ measures duality gap for current solution!
- another way to prove duality theorem: prove there is a terminating (non cycling) simplex algorithm.
Polynomial Time Bounds
We know a lot about structure. And we've seen how to verify optimality in polynomial time. Now turn to question: can we solve in polynomial time?
Yes, sort of (Khachiyan 1979):
- polynomial algorithms exist
- strongly polynomial unknown.
Ellipsoid Method
Basic idea: Reduce optimization to feasibility checking. (Remind how it is with polytope $P$.)
- Bolzano-Weierstrass Theorem---proves certain sequence has a subsequence with a limit by repeated subdividing of intervals to get a point in the subinterval.
- The Bolzano-Weierstrass method: Divide the desert by a line running from north to south. The lion is then either in the eastern or in the western part. Let's assume it is in the eastern part. Divide this part by a line running from east to west. The lion is either in the northern or in the southern part. Let's assume it is in the northern part. We can continue this process arbitrarily and thereby constructing with each step an increasingly narrow fence around the selected area. The diameter of the chosen partitions converges to zero so that the lion is caged into a fence of arbitrarily small diameter.
- Analogue of a fence: separation oracle. Given point $x$ that is not in $P$ come up with a separating hyperplane, i.e., $u$ such that $u^T x \leq 0$ but $y^Tx' > 0$ for all $x'\in P$.
- We want to use ellipsoid instead of boxes.
- Ellipsoid $E(D,z):=\{x \mid (x-z)^TD^{-1}(x-z) \leq 1 \}$, where $D=BB^T$ for some invertible matrix $B$.
- Note: $E(r\cdot I,z)$ is just an radius $r$ sphere centered at $z$.
- In general: $E(D,z)$ is a mapping of a unit sphere via an affine change of coordinates $x\rightarrow Bx +z$.
- Goal: start with a big ellipsoid that encompasses everything (for sure).
- In each step, query the center of the ellipsoid $z$. If $z\in P$ done. Otherwise, there is a separating hyperplane $u$. Shift it so it passes through the center and draw a smaller ellipsoid that encompasses the half in which the feasible point might lie.
- How to measure progress? Keep track of the volume of the ellipsoid. Want it to shrink significantly in each step.
- Outline of algorithm:
- solve feasibility, which we know is equivalent to optimizing
- goal: find a feasible point for $P=\set{Ax \le b}$
- start with ellipse containing $P$, center $z$
- check if $z \in P$
- if not, use separating hyperplane to get 1/2 of ellipse containing $P$
- find a smaller ellipse containing this 1/2 of original ellipse
- until center of ellipse is in $P$.
- Consider sphere case (wlog as affine transformations change the volume by the same factor and we care about the ratio change), separating hyperplane $x_1=0$
- try center at $(\epsilon,0,0,\ldots)$
- Draw picture to see constraints
- requirements:
- $d_1^{-1}(x_1-\epsilon )^2+\sum_{i > 1}d_i^{-1}x_i^2 \le 1$
- constraint at $(1,0,0)$: $d_1^{-1}(x-\epsilon )^2 = 1$ so $d_1 = (1-\epsilon )^2$
- constraint at $(0,1,0)$: $\epsilon ^2/(1-\epsilon )^2+d_2^{-1} = 1$ so $d_2^{-1} = 1-\epsilon ^2/(1-\epsilon )^2\approx 1-\epsilon ^2$
- What is volume? about $(1-\epsilon )/(1-\epsilon ^2)^{n/2}$
- set $\epsilon $ about $1/n$, get $(1-1/n)$ volume ratio.
- Formalize shrinking lemma:
- Let $E=(z,D)$ define an $n$-dimensional ellipsoid
- consider separating hyperplane $ax \le az$
- Define $E'=(z',D')$ ellipsoid: $$ \begin{eqnarray*} z' &= &z-\frac{1}{n+1}\frac{Da^T}{\sqrt{aDa^T}}\\ D' &= & \frac{n^2}{n^2-1}(D-\frac{2}{n+1}\frac{Da^TaD}{aDa^T}) \end{eqnarray*} $$
- then $$ \begin{eqnarray*} E \cap \set{x \mid ax \le ez} &\subseteq &E'\\ \vol(E') &\le &e^{1/(2n+1)}\vol(E) \end{eqnarray*} $$
- for proof, first show works with $D=I$ and $z=0$. new ellipse: $$ \begin{eqnarray*} z' &=&-\frac1{n+1}\\ D' &= & \frac{n^2}{n^2-1}(I-\frac{2}{n+1} I_{11}) \end{eqnarray*} $$ and volume ratio easy to compute directly.
- for general case, transform to coordinates where $D=I$ (using new basis $B$), get new ellipse, transform back to old coordinates, get $(z',D')$ (note transformation don't affect volume ratios.
- So ellipsoid shrinks. Now prove 2 things:
- needn't start infinitely large
- can't get infinitely small
- Starting size:
- recall bounds on size of vertices (polynomial)
- so coords of vertices are exponential but no larger
- so can start with sphere with radius exceeding this exponential bound
- this only uses polynomial values in $D$ matrix.
- if unbounded, no vertices of $P$, will get vertex of box.
- Ending size:
- suppose polytope full dimensional
- if so, it has $n+1$ affinely indpendent vertices
- all the vertices have poly size coordinates
- so they contain a box whose volume is a poly-size number (computable as determinant of vertex coordinates)
- Put together:
- starting volume $2^{n^{O(1)}}$
- ending volume $2^{-n^{O(1)}}$
- each iteration reduces volume by $e^{1/(2n+1)}$ factor
- so $2n+1$ iters reduce by $e$
- so $n^{O(1)}$ reduce by $e^{n^{O(1)}}$
- at which point, ellipse doesn't contain $P$, contra
- must have hit a point in $P$ before.
- Justifying full dimensional:
- feasible points form a “lattice” of grid points with bounded number of bits
- take $\set{Ax \le b}$, replace with $P'=\set{Ax \le b+\epsilon}$ for tiny $\epsilon$
- any point of $P$ is an interior of $P'$, so $P'$ full dimensional (only have interior for full dimensional objects)
- $P$ empty (of lattice points) iff $P'$ is (because $\epsilon$ so small)
- can “round” a point of $P'$ to $P$.
- Infinite precision:
- built a new ellipsoid each time.
- maybe its bits got big?
- no.
- Optimization
- Could use optimization to feasibility transform
- But an easier approach is binary search on value of optimum
- Add constraint that optimum must exceed proposed value
- Vary value
- Need to find vertex?
- use rounding technique
Feasibility vs. Separation vs. Optimization
Notice in ellipsoid, were only using one constraint at a time.
- didn't matter how many there were.
- didn't need to see all of them at once.
- just needed each to be represented in polynomial size.
- so ellipsoid works, even if huge number of constraints, so long as have separation oracle: given point not in $P$, find separating hyperplane.
- of course, feasibility is same as optimize, so can optimize with sep oracle too.
- this is on a polytope by polytope basis. If can separate a particular polytope, can optimize over that polytope.
- Another good reason to optimize by binary search on objective:
- just need feasibility oracle/separator
- Might not have separator for combined primal-dual formulation
- especially since dual can have exponentially many variables!
This is very useful in many applications. e.g. network design.
Can also show that optimization implies separation:
- suppose can optimize over $P$
- then of course can find a point in $P$
- suppose $0\in P$ (saves notation mess---just shift $P$)
- define $P^* = \set{z \mid zx \le 1\ \forall x \in P}$
- can separate over $P^*$:
- given $w$, run OPT$(p)$ with $w$ objective
- get $x^*$ maximizing $wx$
- if $wx^* \le 1$ then $w \in P^*$
- else $wx^* > 1 \ge x^*z \ \forall z \in P^*$ so $x^*$ is separating hyperplane
- since can separate $P^*$, can optimize it
- suppose want to separate $y$ from $P$
- let $z=$OPT$(P^*,y)$.
- if $yz>1$ then (since $z \in P^*$) we have $yz>1$ but $xz \le 1 \ \forall x \in P$ (separating hyperplane)
- if $y \le 1$ then suppose $y \notin P$.
- then $ax \le \beta$ for $x \in P$ but $ay > \beta$
- since $0 \in P$, $\beta \ge 0$
- if $\beta > 0$ then $\frac{a}{\beta}x \le 1\ \forall x \in P$ so its in $P^*$ but $\frac{a}{\beta}y > 1$ so it is a better opt for $y$ contra
- if $\beta = 0$ then $\lambda ax \le 0 \le 1 \forall \lambda>0$ so $\lambda a \in P^*$ but $\lambda a y>1$ for some $\lambda>0$ so is better opt for $y$ contra.