Linear programming.
- define: minimize a linear objective over half-space constraints in $\mathbb{R}^d$ (concrete instance below)
- assumptions:
- nonempty, bounded polyhedron
- minimizing $x_1$
- unique minimum, at a vertex
- exactly $d$ constraints per vertex
- definitions:
- hyperplanes $H$
- basis $B(H)$ of hyperplanes that define optimum
- optimum value $O(H)$
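As a concrete sanity check of this setup (a minimal sketch; the random instance, the tolerance, and the use of scipy as solver are our illustration, not part of the notes):

```python
# Minimize x_1 over a polyhedron A x <= b in R^d.
# With this many random constraints, 0 is feasible and the polytope is
# bounded with high probability (an unlucky seed could give an unbounded LP).
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
d, n = 3, 50
A = rng.normal(size=(n, d))              # constraint normals (hyperplanes H)
b = np.ones(n)
c = np.zeros(d); c[0] = 1.0              # objective: minimize x_1

res = linprog(c, A_ub=A, b_ub=b, bounds=[(None, None)] * d)
tight = np.flatnonzero(np.isclose(A @ res.x, b, atol=1e-7))
print("O(H) =", res.x[0], " B(H) =", tight)   # expect exactly d tight planes
```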
- Simplex
- exhaustive polytope search:
- walks on vertices
- runs in $O(n^{\lfloor d/2 \rfloor})$ time in the worst case (the max number of vertices)
- often great in practice
- polytime algorithms exist (ellipsoid)
- but bit-dependent (weakly polynomial)!
- OPEN: strongly polynomial LP
- goal today: polynomial algorithms for small $d$
Random sampling algorithm
- Goal: find $B(H)$
- Plan: random sample
- solve random subproblem
- keep only violating constraints $V$
- recurse on leftover
- problem: violators may not contain all of $B(H)$
- but, contain some of $B(H)$
- opt of sample no worse than opt of whole (fewer constraints)
- but any point feasible for $B(H)$ is no better than $O(H)$
- so if violators exist, the sample opt is not feasible for $B(H)$
- so some plane of $B(H)$ is violated
- revised plan:
- random sample
- ignore useless planes, add violators to “active set”
- repeat sample on whole problem while keeping active set
- claim: add one $B(H)$ plane per iteration
- Algorithm SampLP:
- set $S$ of “active” hyperplanes.
- if $n< 9d^2$ do simplex ($d^{O(d)}$)
- pick $R \subseteq H-S$ of size $d\sqrt{n}$
- $x \leftarrow$ SampLP$(R \cup S)$
- $V\leftarrow$ hyperplanes of $H$ that violate $x$
- if $|V| \le 2\sqrt{n}$, add $V$ to $S$; repeat until $V = \emptyset$, then return $x$ (sketch below)
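A minimal sketch of SampLP under the assumptions above (constraints $Ax \le b$, objective "minimize $x_1$"); scipy stands in for the simplex base case, and the violation tolerance is our choice:

```python
import math
import random
import numpy as np
from scipy.optimize import linprog

def base_lp(A, b):
    """Small problem: solve min x[0] s.t. A x <= b directly (stand-in for simplex)."""
    d = A.shape[1]
    c = np.zeros(d); c[0] = 1.0
    return linprog(c, A_ub=A, b_ub=b, bounds=[(None, None)] * d).x

def samp_lp(A, b):
    n, d = A.shape
    if n < 9 * d * d:
        return base_lp(A, b)
    H, S = set(range(n)), set()                        # all planes; active set
    r = int(d * math.sqrt(n))
    while True:
        R = random.sample(sorted(H - S), min(r, len(H - S)))
        idx = sorted(set(R) | S)
        x = samp_lp(A[idx], b[idx])                    # solve sample + active set
        V = {i for i in H if A[i] @ x > b[i] + 1e-9}   # violators of x
        if not V:
            return x                                   # no violators: x = O(H)
        if len(V) <= 2 * math.sqrt(n):                 # successful iteration
            S |= V                                     # keep violators active
```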
- Runtime analysis:
- mean size of $V$ at most $\sqrt{n}$ (proved below)
- each successful iteration adds $\le 2\sqrt{n}$ planes to $S$
- and also must add one $B(H)$ hyperplane
- each iteration successful with prob. 1/2 (Markov's inequality)
- deduce: expect $2d$ iterations until $S$ contains all of $B(H)$, so done.
- overall $S$ has $\le 2d\sqrt{n}$ violators plus $d\sqrt{n}$ sample: total $\le 3d\sqrt{n}$
- $O(d n)$ per phase needed to check violating constraints: $O(d^2 n)$ total
- recursion size at most $3d\sqrt{n}$
- so about $\log\log n$ levels of recursion, each multiplying the number of subproblems by $2d$
- so about $d^{\log\log n}=(\log n)^{\log d}$ subproblems (unrolled below); more precisely \[T(n)\le 2dT(3d\sqrt{n})+O(d^2 n)=O(d^2 n \log n) + (\log n/d^2)^{2+\log d}d^{O(d)}\] (note the valid use of linearity of expectation when summing expected work over levels)
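To see where these bounds come from, unroll the recursion (a back-of-the-envelope check, constants suppressed): the subproblem size at level $i$ satisfies \[ n_0 = n, \qquad n_{i+1} = 3d\sqrt{n_i} \quad\Longrightarrow\quad n_i = (9d^2)^{1-2^{-i}}\, n^{2^{-i}}, \] so the $O(d^2)$ base case is reached after about $\log\log n$ levels, and with at most $2d$ recursive calls per level the recursion tree has $(2d)^{O(\log\log n)}=(\log n)^{O(\log d)}$ leaves.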
Must prove the claim that the mean of $|V|$ is at most $\sqrt{n}$.
- Lemma:
- suppose $|H-S|=m$.
- sample $R$ of size $r$ from $H-S$
- then expected number of violators is at most $d(m-r+1)/(r-d)$
- compare to the MST sampling lemma: roughly, if we pick constraints with prob. $p$, we see $d/p$ violations!
- the book's proof is broken: it only works for empty $S$
- Let $C_H$ be set of optima of subsets $T \cup S$, $T \subseteq H$
- note $O(R \cup S)$ is the only point of $C_H$ violating no constraints of $R$
- Let $v_x$ be number of constraints in $H$ violated by $x \in C_H$,
- Let $i_x$ indicate the event $x=OPT(R \cup S)$ $$ \begin{eqnarray*} E[|V|] &= &E[\sum v_x i_x]\\ &= & \sum v_x \Pr[i_x=1] \end{eqnarray*} $$
- compute $\Pr[i_x=1]$
- $\binom{m }{ r}$ equally likely subsets.
- how many have optimum $x$?
- let $q_x$ be number of planes defining $x$ not already in $S$
- must choose $q_x$ planes to define $x$
- all other choices must avoid planes violating $x$.
- prob. $$ \begin{eqnarray*} \binom{m-v_x-q_x }{ r-q_x} / \binom{m }{ r} &= & \frac{(m-v_x-q_x)-(r-q_x)+1}{r-q_x} \binom{m-v_x-q_x }{ r-q_x-1} / \binom{m }{ r} \\ &\le & \frac{(m-r+1)}{r-d} \binom{m-v_x-q_x }{ r-q_x-1} / \binom{m }{ r} \end{eqnarray*} $$
- deduce \[ E[|V|] \le \frac{m-r+1}{r-d}\sum v_x \binom{m-v_x-q_x }{ r-q_x-1} / \binom{m }{ r} \]
- summand is prob that $x$ is a point that violates exactly one constraint in $R$:
- must pick $q_x$ constraints defining $x$
- must pick $r-q_x-1$ constraints from $m-v_x-q_x$ nonviolators
- must pick one of $v_x$ violators
- therefore, sum is expected number of points that violate exactly one constraint in $R$.
- but this is at most $d$ (one for each constraint in the basis of $R$)
- nicer proof later.
Result:
- saw sampling LP that ran in time $O(d^2 n \log n) + (\log n/d^2)^{2+\log d}d^{O(d)}$
- key idea: if pick $r$ random hyperplanes and solve, expect only $dm/r$ violating hyperplanes.
Iterative Reweighting
Get rid of recursion and highest order term.
Idea:
- Multiplicative weights
- upweight important stuff, downweight unimportant
- even if sometimes wrong, important stuff grows faster
- Boosting in machine learning
- combination of experts in 6.854
idea: be “softer” regarding mistakes
- plane in $V$ gives “evidence” it's in $B(H)$
- Algorithm:
- give each plane weight one
- pick $9d^2$ planes with prob. proportional to weights
- find optimum of $R$
- find violators $V$: planes of $H$ violated by that optimum
- if \[ \sum_{h \in V} w_h \le (2\sum_{h \in H} w_h)/(9d-1) \] then double violator weights
- repeat till no violators (sketch below)
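A minimal sketch of the reweighting loop under the same assumptions as before (constraints $Ax \le b$, objective "minimize $x_1$", scipy as the small solver); numpy's weighted sampling without replacement stands in for "pick proportional to weights":

```python
import numpy as np
from scipy.optimize import linprog

def small_lp(A, b):
    d = A.shape[1]
    c = np.zeros(d); c[0] = 1.0
    return linprog(c, A_ub=A, b_ub=b, bounds=[(None, None)] * d).x

def reweight_lp(A, b, seed=0):
    n, d = A.shape
    r = 9 * d * d
    if n <= r:
        return small_lp(A, b)
    rng = np.random.default_rng(seed)
    w = np.ones(n)                                  # each plane starts at weight one
    while True:
        R = rng.choice(n, size=r, replace=False, p=w / w.sum())
        x = small_lp(A[R], b[R])                    # optimum of the sample
        V = A @ x > b + 1e-9                        # mask of violators
        if not V.any():
            return x                                # no violators: done
        if w[V].sum() <= 2 * w.sum() / (9 * d - 1):
            w[V] *= 2                               # double violator weights
```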
- Analysis
- Iter successful with prob. 1/2
- based on (homework) generalization of the lemma: if we choose $r$ planes with probability proportional to weights, total weight $W$, the expected violating weight is at most $Wd/r$.
- show weight of basis grows till rest is negligible.
- claim $O(d\log n)$ iterations suffice.
- deduce runtime $O(d^2 n \log n)+d^{d/2+O(1)}\log n$.
- proof of claim:
- after each successful iteration, the weight of some basis element doubles
- after $kd$ successful iterations, basis weight at least $d2^{k}$ (by convexity, doublings spread evenly over the $d$ elements is the worst case)
- total weight after $kd$ iterations at most $n(1+2/(9d-1))^{kd} \le n\exp(2kd/(9d-1)) \approx ne^{2k/9}$
- since $2^k$ outgrows $ne^{2k/9}$ once $k = O(\log n)$, done after $O(d\log n)$ iterations.
- so runtime $O(d^2n\log n)+(d\log n)\cdot d^{O(d)}$
Mixed algorithm:
- Can improve to linear in $n$
- use recursive algorithm at top level
- but solve each recursive call by iterative reweighting algorithm
- runtime $O(d^2n)+(d^2\log n)\cdot d^{O(d)} + O(d^4\sqrt{n}\log n)$
- so truly linear in $n$
- one bottleneck is simplex which gives $d^{O(d)}$ terms
- improve with subexponential algorithms to $O(d^2 n)+e^{O(\sqrt{d\log d})}\log n$.
Backwards Analysis
Seidel's randomized incremental algorithm: insert constraints in random order, maintaining the current optimum; by backwards analysis the newest constraint is violated with probability at most $d/n$, in which case the new optimum is found by solving a $(d-1)$-dimensional problem on that constraint's hyperplane (2-d sketch below): \[ T(n,d) \le T(n-1,d)+\frac{d}{n}\left(O(dn)+T(n-1,d-1)\right) = O(d!\,n) \]
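A minimal sketch of the incremental idea for $d=2$ (minimize $y$ over half-planes $a_1x+a_2y\le b$); the bounding box, tolerances, and the general-position assumption $a_2\neq 0$ are our simplifications:

```python
import random

def solve_1d(line, constraints, lo, hi):
    """Minimize y on the line a1*x + a2*y = b, subject to earlier constraints."""
    (a1, a2), b = line                       # assume a2 != 0 (general position)
    for (c1, c2), e in constraints:          # substitute y = (b - a1*x) / a2
        coef = c1 - c2 * a1 / a2
        rhs = e - c2 * b / a2
        if abs(coef) < 1e-12:
            if rhs < -1e-9:
                return None                  # constraint excludes the whole line
        elif coef > 0:
            hi = min(hi, rhs / coef)
        else:
            lo = max(lo, rhs / coef)
    if lo > hi + 1e-9:
        return None                          # infeasible on this line
    x = hi if a1 / a2 > 0 else lo            # endpoint where y is smaller
    return x, (b - a1 * x) / a2

def seidel_2d(H, box=1e9):
    """H: list of ((a1, a2), b), meaning a1*x + a2*y <= b; minimizes y."""
    random.shuffle(H)                        # random insertion order
    x, y = 0.0, -box                         # optimum before any constraints
    for i, ((a1, a2), b) in enumerate(H):
        if a1 * x + a2 * y <= b + 1e-9:
            continue                         # not violated: optimum unchanged
        pt = solve_1d(((a1, a2), b), H[:i], -box, box)   # 1-d subproblem
        if pt is None:
            raise ValueError("infeasible")
        x, y = pt
    return x, y
```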
Also simplifies proof of main sampling theorem:
- On choosing $r$ random hyperplanes $R$, want expected number of violators $v$
- instead, consider random $h \in H-R$, ask prob. of violation
- which is just $E[v]/(n-r)$
- for prob. $h$ is violated, use backwards analysis
- instead of picking $R$ then $h$, pick $R \cup \{h\}$ then pick $h$
- prob. $h$ violated is just prob. it is one of the $\le d$ basis constraints of $R \cup \{h\}$
- which is $d/(r+1)$
- so $E[v]/(n-r)=d/(r+1)$, i.e., $E[v] = (n-r)d/(r+1)$ (sanity check below)
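A quick Monte Carlo check of this formula in the toy case $d=1$ (constraints $x\ge a_i$, minimize $x$, so $OPT(R)$ is the sample max and the violators are the $a_i$ above it); the parameters are arbitrary:

```python
import random

n, r, trials = 100, 9, 100_000
total = 0
for _ in range(trials):
    a = [random.random() for _ in range(n)]
    opt = max(random.sample(a, r))           # OPT of the random sample
    total += sum(ai > opt for ai in a)       # constraints violating OPT(R)
print(total / trials, "vs predicted", (n - r) / (r + 1))   # both about 9.1
```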
Hidden Dimension
Idea:
- when recurse, don't just start from the beginning
- When last hyperplane $h$ violates recursive solution,
- Use opt of recursion as starting point for finding opt including $h$
Intuition
- look more closely at Seidel's recursion
- Seidel's “advantage” was that when the omitted hyperplane $h$ was a violator, we knew it was in the optimal basis
- so reduce to finding $d-1$ more constraints of opt
- in fact this is pessimistic!
- optimum basis $h_1,\ldots,h_d$
- order such that $OPT(H-h_1)< OPT(H-h_2)< \cdots< OPT(H-h_d)$
- note none can be equal since each removal yields different optimal basis
- suppose discarded $h_j$ for recursion
- so found $OPT(H-h_j)$
- claim $h_1 \in B(H-h_j)$
- suppose not
- then $OPT(H-h_1 - h_j) = OPT(H-h_j)$
- but $OPT(H-h_1) \ge OPT(H-h_1-h_j)$
- which combines to imply $OPT(H-h_1) \ge OPT(H-h_j)$
- contradicts the ordering, which has $OPT(H-h_1) < OPT(H-h_j)$
- conclusion: $B(H-h_j)$ already contains $h_1,\ldots,h_{j-1}$
- so the recursion found $j$ optimal constraints ($h_1,\ldots,h_{j-1}$ plus the violator $h_j$), not $1$ as Seidel thought
- in expectation, we find $d/2$.
- How can we use this?
- use optimum basis found so far as “hint” for OPT
- it contains many hyperplanes of OPT
- use recursion to add more hyperplanes
- show adding many at once
- so not too much recursion
Algorithm BasisLP
- Input constraints $H$ and kickoff basis $T$
- $T$ is a “hint” that affects runtime but not value
- If $H=T$, output $T$ (obvious OPT)
- Pick a random $h \in H-T$
- $T'\leftarrow$BasisLP$(H-h,T)$ (Seidel's recursion)
- if $h$ doesn't violate $T'$, done
- else, output BasisLP$(H,OPT(T'\cup h))$. (change Seidel's recursion)
- note finding $OPT(T' \cup h)$ is trivial since there are only $d+1$ constraints (toy sketch below)
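A toy sketch of BasisLP for $d=1$ (constraints $x \ge a$, minimize $x$, so a basis is a single constraint and the optimum of a set is its max); `basis_opt` and `violates` are our stand-ins for the $(d+1)$-constraint solve and the violation test:

```python
import random

def basis_opt(C):
    return frozenset({max(C)})                 # optimal basis of a small set (d = 1)

def violates(h, B):
    return h > max(B)                          # does h cut off the optimum of B?

def basis_lp(H, T):
    if H == T:
        return T                               # everything is basis: obvious OPT
    h = random.choice(sorted(H - T))           # discard a random non-hint constraint
    Tp = basis_lp(H - {h}, T)                  # Seidel-style recursion
    if not violates(h, Tp):
        return Tp                              # h already satisfied: done
    return basis_lp(H, basis_opt(Tp | {h}))    # re-recurse from a better hint

H = frozenset({1, 2, 3, 5, 9})                 # constraints x >= a
print(basis_lp(H, frozenset({1})))             # -> frozenset({9}), so OPT = 9
```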
Intuition:
- $T$ contains many of the optimum hyperplanes,
- so make sure what you discard is not in $T$
- through recursive calls, $T$ keeps getting tighter (larger OPT)
- so closer to OPT
- so has more of its hyperplanes
- Seidel showed you collected one when a violation happens
- strengthen: prove it collects many (in expectation) so get them all quickly.
Enforcing constraints
- Given basis $T \subseteq H$, say constraint $h$ is enforcing if $OPT(H-h) < OPT(T)$.
- Claim must have $h \in B(H)$
- suppose $h \notin B(H)$
- then $OPT(H-h)=OPT(H) \ge OPT(T)$
- so $h$ not enforcing
- Also, must have $h \in T$
- since otherwise $T \subseteq H-h$
- which would make $OPT(H-h) \ge OPT(T)$.
- So an enforcing constraint is in $B(H) \cap T$
- So at most $d$ of them.
- And if exactly $d$, then $T=B(H)$.
Hidden dimension
- Define hidden dimension of $T$ as $d$ minus the number of enforcing constraints.
- number of “distractions” from optimum
- if 0, found opt.
- Also, when we discard a random constraint not in $T$, the number of $B(H)$ constraints we might discard is at most the hidden dimension
- Which determines probability we need to recurse
- Claim: if second recursive call happens, hidden dimension drops:
- new kickoff basis has larger value than the old
- so all previously enforcing constraints remain enforcing
- and the left-out constraint becomes enforcing
- In fact, much better:
- assume did second recursive call
- suppose $m$ (the hidden dimension) optimal constraints $h_i$ are not yet in $T$
- order $h_i$ by value of $OPT(H-h_i)$
- by backwards analysis, equally likely to leave any out
- as above, if we leave out the $r$-th, then $h_1,\ldots,h_r$ are all enforcing in the second recursive call
- Conclude for hidden dimension $k$, number of violation tests satisfies $$ \begin{eqnarray*} T(n,k) = T(n-1,k) + \frac{1}{n-d}\left(1+\sum_{i=1}^{k} T(n,i)\right) \end{eqnarray*} $$
- Analysis shows $T(n,k) \le 2^k(n-d)$
- so BasisLP runtime is $O(d^4 2^d n)$ (multiplying by the base-case time).
Combine with SampLP gives $O(d^2n)+e^{O(\sqrt{d\log d})}\log n$
- worse $n$ dependence than previous algorithms (nonlinear)
- but what's really cool here is that it is subexponential in $d$
- none of the previous algorithms are
- use as base case in SampLP to get algorithm that is linear in $n$ and subexponential in $d$
Simplex
The algorithms we studied above are not simplex algorithms
- they discard constraints to find an optimum of a subset, then jump there
- simplex has to move neighbor to neighbor
- In each step, move to a neighbor with better objective
- (we assume general position, so no degenerate or equal-objective vertices)
- (even in this case, Klee-Minty cube makes naive simplex take exponential time)
- open: is there a polynomial time simplex?
- this would appeal since simplex is so popular in practice
We don't have polynomial, but Kalai gave a subexponential simplex algorithm
- interestingly, it's just the dual of the BasisLP algorithm
- but was discovered independently (and after)
- has same runtime: $ne^{O(\sqrt{d\log n})}$
- (but its runtime was proven first)
- Also, Kalai and Kleitman have shown polytope diameter at most $n^{\log d+2}$
- still open: simplex algorithm with this runtime
Kalai's algorithm
- Facets
- a facet is a “side” of the polytope; an intersection of the polytope with one of the hyperplanes defining it
- so there are at most $n$ facets.
- possibly fewer, since some constraints may never be tight on the feasible polytope.
- given current vertex $v$, a facet is active if it contains a better vertex
- Algorithm
- Step 1: starting from $v$, find $r$ active facets
- Step 2: choosing randomly among the $r$, use simplex to optimize within that facet
- this gives new $v$; repeat till done
- easy to find $d-1$ active facets
- $v$ has a better neighbor
- the edge to this neighbor is the intersection of $d-1$ constraints
- each of those constraints defines an active facet (containing improvement on $v$)
- so choosing one at random is really just choosing a random tight constraint at $v$ and using its facet
- Analysis
- order facets by value of best vertex on facet
- if you find opt in one of the facets, all worse facets become “inactive”
- so recurse on number of active facets $n$
- $$ \begin{align*} T(n,d) &= T(n-1,d-1) + \frac{1}{d-1}\sum_{i=1}^{d-1} T(n-i,d)\\ &= e^{O(\sqrt{n\log d})} \end{align*} $$
- for better runtime, find $\max(d,n/2)$ facets
- finding $n/2$ takes more work; a kind of BFS over pivots from $v$
- recurrence yields $T(n,d) = e^{O(\sqrt{d\log n})}$
- better for small $d$
- if $d$ and $n$ comparable, get roughly $e^{\sqrt{n}}$ (numeric probe below)
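A numeric probe of the first recurrence above, with base cases of our choosing ($T=1$ once $n\le d$ or $d\le 1$); purely illustrative of the subexponential growth, not a proof:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def T(n, d):
    if d <= 1 or n <= d:
        return 1.0                  # assumed base cases
    return T(n - 1, d - 1) + sum(T(n - i, d) for i in range(1, d)) / (d - 1)

for n in (20, 40, 80):
    print(n, T(n, n // 2), 2.0 ** n)   # grows far more slowly than 2^n
```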
Contrast
- We saw various sampling algorithms linear in $n$ but exponential in $d$
- So for worst case $d=n$ still exponential
- Kalai algorithm is subexponential in all cases
- Also, Kalai's algorithm is simplex: all moves are pivots.
- Unlike the sampling algorithms, which optimized over subsets (jumps, not pivots)
- Later, was discovered that BasisLP has same runtime
- and in fact, that BasisLP is exactly Kalai's algorithm running in dual space