Linear programming.
- define
- assumptions:
- nonempty, bounded polyhedron
- minimizing $x_1$
- unique minimum, at a vertex
- exactly $d$ constraints per vertex
- definitions:
- hyperplanes $H$
- basis $B(H)$ of hyperplanes that define optimum
- optimum value $O(H)$
- Simplex
- exhaustive polytope search:
- walks on vertices
- runs in $O(n^{\lceil d/2 \rceil})$ time in theory
- often great in practice
- polytime algorithms exist (ellipsoid)
- but bit-dependent (weakly polynomial)!
- OPEN: strongly polynomial LP
- goal today: polynomial algorithms for small $d$
Random sampling algorithm
- Goal: find $B(H)$
- Plan: random sample
- solve random subproblem
- keep only violating constraints $V$
- recurse on leftover
- problem: violators may not contain all of $B(H)$
- but, contain some of $B(H)$
- opt of sample better than opt of whole
- but any point feasible for $B(H)$ no better than $O(H)$
- so current opt not feasible for $B(H)$
- so some $B(H)$ violated
- revised plan:
- random sample
- ignore useless planes, add violators to “active set”
- repeat sample on whole problem while keeping active set
- claim: add one $B(H)$ plane per iteration
- Algorithm SampLP:
- set $S$ of “active” hyperplanes.
- if $n< 9d^2$ do simplex ($d^{O(d)}$)
- pick $R \subseteq H-S$ of size $d\sqrt{n}$
- $x \leftarrow$ SampLP$(R \cup S)$
- $V\leftarrow$ hyperplanes of $H$ that violate $x$
- if $|V| \le 2\sqrt{n}$, add $V$ to $S$; repeat from the sampling step until $V=\emptyset$, then return $x$
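To make the control flow concrete, here is a minimal runnable sketch of SampLP, instantiated (my simplification, not from the lecture) for the trivial case $d=1$: minimize $x$ subject to $x \ge a$ for each $a \in H$, so the optimum of any constraint set is just its maximum and the simplex base case is a `max`.

```python
import math
import random

def samp_lp(H, d=1, rng=None):
    # Toy d=1 instantiation: a constraint is a number a, meaning x >= a,
    # so OPT of any set of constraints is simply its maximum.
    rng = rng or random.Random(0)
    H = set(H)
    n = len(H)
    if n < 9 * d * d:
        return max(H)                      # base case ("simplex")
    S = set()                              # active set of hyperplanes
    r = int(d * math.sqrt(n))
    while True:
        pool = sorted(H - S)
        R = set(rng.sample(pool, min(r, len(pool))))  # random sample
        x = samp_lp(R | S, d, rng)         # solve subproblem recursively
        V = {a for a in H if a > x}        # constraints violated by x
        if not V:
            return x
        if len(V) <= 2 * math.sqrt(n):     # "successful" iteration
            S |= V                         # keep the violators active
```

The $d=1$ case is of course trivial; the point is the structure — sample, recurse on sample plus active set, collect violators, stop when none.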
- Runtime analysis:
- mean size of $V$ at most $\sqrt{n}$ (todo)
- each successful iteration adds $\le 2\sqrt{n}$ planes to $S$
- and also must add one $B(H)$ hyperplane
- each iteration successful with prob. $\ge 1/2$ (Markov: $E[|V|] \le \sqrt{n}$, so $\Pr[|V| > 2\sqrt{n}] < 1/2$)
- deduce expected $2d$ iterations until $S$ contains all of $B(H)$; done.
- overall $S$ has $\le 2d\sqrt{n}$ violators plus $d\sqrt{n}$ sample: total $\le 3d\sqrt{n}$
- $O(d n)$ per phase needed to check violating constraints: $O(d^2 n)$ total
- recursion size at most $3d\sqrt{n}$ \[T(n)\le 2dT(3d\sqrt{n})+O(d^2 n)=O(d^2 n \log n) + (\log n/d^2)^{2+\log d}d^{O(d)}\] (Note valid use of linearity of expectation)
Must prove claim, that mean $V \le \sqrt{n}$.
- Lemma:
- suppose $|H-S|=m$.
- sample $R$ of size $r$ from $H-S$
- then expected violators at most $d(m-r+1)/(r-d)$
- Compare to MST: says roughly, if pick constraints with prob. $p$, see $d/p$ violations!
- book broken: only works for empty $S$
- Let $C_H$ be set of optima of subsets $T \cup S$, $T \subseteq H$
- Let $C_R$ be set of optima of subsets $T \cup S$, $T \subseteq R$
- note $C_R \subseteq C_H$, and $O(R \cup S)$ is only point violating no constraints of $R$
- Let $v_x$ be the number of constraints in $H$ violated by $x \in C_H$
- Let $i_x$ indicate the event $x=OPT(R \cup S)$ $$ \begin{eqnarray*} E[|V|] &= &E[\sum v_x i_x]\\ &= & \sum v_x \Pr[i_x=1] \end{eqnarray*} $$
- decide $\Pr[i_x]$
- $\binom{m }{ r}$ equally likely subsets.
- how many have optimum $x$?
- let $q_x$ be number of planes defining $x$ not already in $S$
- must choose $q_x$ planes to define $x$
- all other choices must avoid planes violating $x$.
- prob. $$ \begin{eqnarray*} \binom{m-v_x-q_x }{ r-q_x} / \binom{m }{ r} &= & \frac{(m-v_x-q_x)-(r-q_x)+1}{r-q_x} \binom{m-v_x-q_x }{ r-q_x-1} / \binom{m }{ r} \\ &\le & \frac{(m-r+1)}{r-d} \binom{m-v_x-q_x }{ r-q_x-1} / \binom{m }{ r} \end{eqnarray*} $$
- deduce \[ E[V] \le \frac{m-r+1}{r-d}\sum v_x \binom{m-v_x-q_x }{ r-q_x-1} / \binom{m }{ r} \]
- summand is prob. that $x$ is a point that violates exactly one constraint in $R$.
- must pick $q_x$ constraints defining $x$
- must pick $r-q_x-1$ constraints from $m-v_x-q_x$ nonviolators
- must pick one of $v_x$ violators
- therefore, sum is expected number of points that violate exactly one constraint in $R$.
- but this is only $d$ (one for each constraint in basis of $R$)
- nicer proof later.
Result:
- saw sampling LP that ran in time $O(d^2 n \log n) + (\log n/d^2)^{2+\log d}d^{O(d)}$
- key idea: if pick $r$ random hyperplanes and solve, expect only $dm/r$ violating hyperplanes.
Iterative Reweighting
Get rid of recursion and highest order term.
Idea:
- Multiplicative weights
- upweight important stuff, downweight unimportant
- even if sometimes wrong, important stuff grows faster
- Boosting in machine learning
- combination of experts in 6.854
idea: be “softer” regarding mistakes
- plane in $V$ gives “evidence” it's in $B(H)$
- Algorithm:
- give each plane weight one
- pick $9d^2$ planes with prob. proportional to weights
- find optimum of $R$
- find violators of $R$
- if \[ \sum_{h \in V} w_h \le (2\sum_{h \in H} w_h)/(9d-1) \] then double violator weights
- repeat till no violators
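A runnable sketch of the reweighting loop, again on the toy $d=1$ problem (minimize $x$ s.t. $x \ge a$, so OPT of a sample is its max); weighted sampling with replacement followed by deduplication is a simplification of picking $9d^2$ distinct planes by weight.

```python
import random

def reweight_lp(values, d=1, seed=0):
    # Toy d=1 LP: minimize x s.t. x >= a for a in values; optimum is max.
    rng = random.Random(seed)
    w = {a: 1.0 for a in values}           # every plane starts at weight 1
    r = 9 * d * d
    while True:
        # weighted sample (with replacement, then deduplicated -- a
        # simplification of sampling r distinct planes by weight)
        R = set(rng.choices(list(w), weights=list(w.values()), k=r))
        x = max(R)                          # OPT of the sample
        V = [a for a in values if a > x]    # violators
        if not V:
            return x
        W = sum(w.values())
        if sum(w[a] for a in V) <= 2 * W / (9 * d - 1):
            for a in V:                     # evidence they are in B(H):
                w[a] *= 2                   # double violator weights
```

Any basis plane that keeps showing up as a violator doubles until it dominates the weight and is almost surely sampled.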
- Analysis
- Iter successful with prob. 1/2
- Based on (homework) generalization of the lemma: if we choose $r$ planes with probability proportional to weight, total weight $W$, then expected violating weight is $Wd/r$.
- show weight of basis grows till rest is negligible.
- claim $O(d\log n)$ iterations suffice.
- deduce runtime $O(d^2 n \log n)+d^{d/2+O(1)}\log n$.
- proof of claim:
- after each iter, double weight of some basis element
- after $kd$ iterations, basis weight at least $d2^{k}$
- total weight is at most $n(1+2/(9d-1))^{kd} \le n\exp(2kd/(9d-1))$
- after $d\log n$ iterations, done.
- Iter successful with prob. 1/2
- so runtime $O(d^2n\log n)+(d\log n)\cdot d^{O(d)}$
- Can improve to linear in $n$
Mixed algorithm:
- use recursive algorithm at top level
- but solve each recursive call by iterative reweighting algorithm
- runtime $O(d^2n)+(d^2\log n)\cdot d^{O(d)} + O(d^4\sqrt{n}\log n)$
- so truly linear in $n$
- one bottleneck is simplex which gives $d^{O(d)}$ terms
- improve with subexponential algorithms to $O(d^2 n)+e^{O(\sqrt{d\log d})}\log n$.
Backwards Analysis
Seidel's Randomized incremental algorithm \[ T(n,d) \le T(n-1,d)+\frac{d}{n}\bigl(O(dn)+T(n-1,d-1)\bigr) = O(d!\,n) \]
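For intuition, here is a sketch of Seidel's algorithm in the plane ($d=2$): process constraints in random order, and whenever the new constraint is violated, re-solve the 1-dimensional LP on its boundary line. The bounding box $|x|,|y| \le M$ and the helper name `solve_on_line` are my additions, to keep the problem bounded and the sketch self-contained.

```python
import random

def seidel_2d(cons, obj, M=1e7, seed=0):
    # cons: list of (a, b, c) meaning a*x + b*y <= c; minimize obj . (x, y).
    # Assumes a bounding box |x|, |y| <= M so the optimum exists.
    rng = random.Random(seed)
    box = [(1, 0, M), (-1, 0, M), (0, 1, M), (0, -1, M)]
    # optimum over the box alone: the corner minimizing the objective
    v = (-M if obj[0] > 0 else M, -M if obj[1] > 0 else M)
    cons = list(cons)
    rng.shuffle(cons)                       # random insertion order
    done = list(box)
    for (a, b, c) in cons:
        if a * v[0] + b * v[1] > c + 1e-9:  # new constraint violated:
            v = solve_on_line((a, b, c), done, obj)  # opt is on its boundary
            if v is None:
                raise ValueError("infeasible")
        done.append((a, b, c))
    return v

def solve_on_line(line, cons, obj):
    # 1-d LP along the line a*x + b*y = c, subject to cons.
    a, b, c = line
    px, py = a * c / (a * a + b * b), b * c / (a * a + b * b)  # point on line
    dx, dy = -b, a                                             # direction
    lo, hi = float("-inf"), float("inf")
    for (ai, bi, ci) in cons:               # intersect with each half-plane
        coef = ai * dx + bi * dy
        rhs = ci - (ai * px + bi * py)
        if abs(coef) < 1e-12:
            if rhs < -1e-9:
                return None                 # line entirely infeasible
        elif coef > 0:
            hi = min(hi, rhs / coef)
        else:
            lo = max(lo, rhs / coef)
    if lo > hi:
        return None
    slope = obj[0] * dx + obj[1] * dy       # objective is linear in t
    t = lo if slope > 0 else hi
    return (px + t * dx, py + t * dy)
```

For example, minimizing $x$ subject to $x+y \ge 2$ and $y \le 1$ gives the vertex $(1,1)$.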
Also simplifies proof of main theorem:
- On choosing $r$ random hyperplanes $R$, want expected number of violators $v$
- instead, consider random $h \in H-R$, ask prob. of violation
- which is just $E[v]/(n-r)$
- for prob. $h$ is violated, use backwards analysis
- instead of picking $R$ then $h$, pick $R \cup \{h\}$ then pick $h$
- prob. $h$ violated is just prob. it is one of opt in $R \cup \{h\}$
- which is $d/(r+1)$
- so $E[v]/(n-r)=d/(r+1)$ so $E[v] = (n-r)d/(r+1)$
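The backwards-analysis formula can be checked exactly by brute-force enumeration in the toy case $d=1$ (minimize $x$ s.t. $x \ge a_i$, so $OPT(R)=\max R$ and a constraint is violated iff it exceeds the sample's max); the parameters $n=8$, $r=3$ here are arbitrary choices of mine.

```python
from fractions import Fraction
from itertools import combinations

n, r, d = 8, 3, 1
H = range(n)                       # distinct constraint values
subsets = list(combinations(H, r)) # all equally likely samples R
total = Fraction(0)
for R in subsets:
    opt = max(R)                   # OPT of the sample when d = 1
    total += sum(1 for h in H if h not in R and h > opt)  # violators
avg = total / len(subsets)
# avg equals (n-r)d/(r+1) exactly, matching the backwards analysis
```

Exact rational arithmetic over all $\binom{8}{3}$ subsets confirms $E[v] = (n-r)d/(r+1)$ with no sampling error.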
Hidden Dimension
Idea:
- when recurse, don't just start from the beginning
- When last hyperplane $h$ violates recursive solution,
- Use opt of recursion as starting point for finding opt including $h$
Intuition
- look more closely at Seidel's recursion
- Seidel's “advantage” was that when the omitted hyperplane $h$ was a violator, we knew it was in opt
- so reduce to finding $d-1$ more constraints of opt
- in fact this is pessimistic!
- optimum basis $h_1,\ldots,h_d$
- order such that $OPT(H-h_1)< OPT(H-h_2)< \cdots< OPT(H-h_d)$
- note none can be equal since each removal yields different optimal basis
- suppose discarded $h_j$ for recursion
- so found $OPT(H-h_j)$
- claim $h_1 \in OPT(H-h_j)$
- suppose not
- then $OPT(H-h_1 - h_j) = OPT(H-h_j)$
- but $OPT(H-h_1) \ge OPT(H-h_1-h_j)$
- which combines to imply $OPT(H-h_1) \ge OPT(H-h_j)$
- contradicts the ordering of the $h_i$
- conclusion: $OPT(H-h_j)$ already contains $h_1,\ldots h_{j-1}$
- so recursion found $j$ optimal constraints, not $1$ as Seidel thought
- in expectation, we find $d/2$.
- How can we use this?
- use optimum basis found so far as “hint” for OPT
- it contains many hyperplanes of OPT
- use recursion to add more hyperplanes
- show adding many at once
- so not too much recursion
Algorithm BasisLP
- Input constraints $H$ and kickoff basis $T$
- $T$ is a “hint” that affects runtime but not value
- If $H=T$, output (obvious OPT)
- Pick a random $h \in H-T$
- $T'\leftarrow$BasisLP$(H-h,T)$ (Seidel's recursion)
- if $h$ doesn't violate $T'$, done
- else, output BasisLP$(H,OPT(T'\cup h))$. (change Seidel's recursion)
- note finding $OPT(T' \cup h)$ trivial since only $d+1$ constraints
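A runnable sketch of BasisLP's control flow, instantiated (my simplification) for $d=1$, where a basis is a single constraint $\{a\}$, $OPT$ of a set is its max, and $OPT(T' \cup h)$ is trivially $\{h\}$ when $h$ violates $T'$:

```python
import random

def basis_lp(H, T, rng=None):
    # Toy d=1 case: constraints x >= a; a basis is one constraint {a}.
    rng = rng or random.Random(0)
    H, T = set(H), set(T)
    if H == T:
        return T                       # output the (obvious) optimal basis
    h = rng.choice(sorted(H - T))      # random constraint outside the hint
    T2 = basis_lp(H - {h}, T, rng)     # Seidel-style recursion
    if h <= max(T2):                   # h does not violate T2's optimum
        return T2
    # OPT(T2 ∪ {h}) is trivial to find; for d = 1 its basis is just {h}
    return basis_lp(H, {h}, rng)       # re-recurse with the better hint
```

Again the dimension-1 case is trivial; what it shows is the two recursive calls and how the kickoff basis tightens on the second one.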
Intuition:
- $T$ contains many of the optimum hyperplanes,
- so make sure what you discard is not in $T$
- through recursive calls, $T$ keeps getting tighter (larger OPT)
- so closer to OPT
- so has more of its hyperplanes
- Seidel showed you collected one when a violation happens
- strengthen: prove it collects many (in expectation) so get them all quickly.
Enforcing constraints
- Given basis $T \subseteq H$, say constraint $h$ is enforcing if $OPT(H-h) < OPT(T)$.
- else (since $OPT(T) \le OPT(H)$) must have $h$ in $OPT(H)$ (else $OPT(H-h)=OPT(H) \ge OPT(T)$)
- Also, must have $h \in T$ since otherwise $T \subseteq H-h$ which would make $OPT(H-h) \ge OPT(T)$.
- So enforcing is in $OPT(H) \cap T$
- So at most $d$ of them.
- And if exactly $d$, then $T=OPT(H)$.
Hidden dimension
- Define hidden dimension of $T$ as $d$ minus number of enforcing constraints.
- Claim: if second recursive call happens, hidden dimension drops:
- New kickoff basis is larger value than old
- So all previously enforcing constraints are enforcing
- And left-out constraint is enforcing
- In fact, much better:
- assume did second recursive call
- suppose $m$ opt constraints $h_i$ not yet in $T$
- order $h_i$ by value of $OPT(H-h_i)$
- by backwards analysis, equally likely to leave any out
- as above, if leave out $r^{th}$, then all $1,\ldots,r$ are enforcing on second recursive call
- Conclude that for $n$ constraints, hidden dimension $k$, and $m$ optimum constraints not yet enforcing, the number of violation tests satisfies $$ \begin{eqnarray*} T(n,k) &\le& T(n-1,k) + \frac{1}{n-d}\sum_{i=1}^{m} \left(1+T(n,k-i)\right) \end{eqnarray*} $$
- Careful analysis shows $T(n,d) \le ne^{O(\sqrt{d\log n})}$
- worse $n$ dependence than previous algorithms (nonlinear)
- but what's really cool here is that it is subexponential in $d$
- none of the previous algorithms are
- use as base case in SampLP to get algorithm that is linear in $n$ and subexponential in $d$
- This was the first subexponential algorithm, though it was not at first realized to be one, because the recurrence must be solved very carefully.
Simplex
The algorithms we studied above are not simplex algorithms
- they discard constraints to find an optimum of a subset, then jump there
- simplex has to move neighbor to neighbor
- In each step, move to a neighbor with better objective
- (we assume general position, so no degenerate or equal-objective vertices)
- (even in this case, Klee-Minty cube makes naive simplex take exponential time)
- open: is there a polynomial time simplex?
- this would appeal since simplex is so popular in practice
We don't have polynomial, but Kalai gave a subexponential simplex algorithm
- interestingly, it's just the dual of the BasisLP algorithm
- but was discovered independently (and after)
- has same runtime: $ne^{O(\sqrt{d\log n})}$
- (but its runtime was proven first)
- Also, Kalai and Kleitman have shown polytope diameter at most $n^{\log d}$
- still open: simplex algorithm with this runtime
Kalai's algorithm
- Facets
- a facet is a “side” of the polytope; an intersection of the polytope with one of the hyperplanes defining it
- so there are at most $n$ facets.
- possibly fewer, for redundant (never-tight) constraints.
- given current vertex $v$, a facet is active if it contains a better vertex
- Algorithm
- Step 1: starting from $v$, find $r$ active facets
- Step 2: choosing randomly among the $r$, use simplex to optimize within that facet
- this gives new $v$; repeat till done
- easy to find $d-1$ active facets
- $v$ has a better neighbor
- the edge to this neighbor is the intersection of $d-1$ constraints
- each of those constraints defines an active facet (containing $v$)
- so choosing one at random is really just choosing a random tight constraint at $v$ and using its facet
- Analysis
- order facets by value of best vertex on facet
- if you find opt in one of the facets, all worse facets become “inactive”
- so recurse on number of active facets $n$
- $$ \begin{align*} T(n,d) &= T(n-1,d-1) + \frac{1}{d-1}\sum_{i=1}^{d-1} T(n-i,d)\\ &= e^{O(\sqrt{n\log d})} \end{align*} $$
- for better runtime, find $\max(d,n/2)$ facets
- finding $n/2$ takes more work; a kind of BFS over pivots from $v$
- recurrence yields $T(n,d) = e^{O(\sqrt{d\log n})}$
- better for small $d$
- if $d$ and $n$ comparable, get $e^{\sqrt{n}}$
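As a sanity check on the growth of the first recurrence above, one can tabulate it numerically; the base cases $T(n,d)=1$ for $d \le 1$ or $n \le d$ are a simplifying assumption of mine, not from the lecture.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def T(n, d):
    # Kalai's recurrence: T(n,d) = T(n-1,d-1) + avg_{1<=i<d} T(n-i,d),
    # with assumed trivial base cases just to observe the growth.
    if d <= 1 or n <= d:
        return 1.0
    return T(n - 1, d - 1) + sum(T(n - i, d) for i in range(1, d)) / (d - 1)
```

For $d=2$ this collapses to $T(n,2)=n-1$, and for fixed small $d$ the values stay far below $2^n$, consistent with subexponential behavior.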
Contrast
- We saw various sampling algorithms linear in $n$ but exponential in $d$
- So for worst case $d=n$ still exponential
- Kalai algorithm is subexponential in all cases
- Also, Kalai's algorithm is simplex. All moves are pivots.
- Unlike sampling algorithm that optimized over subsets (jumps, not pivots)
- Later, was discovered that BasisLP has same runtime
- and in fact, that BasisLP is exactly Kalai's algorithm running in dual space