6.854 Lecture Notes

Linear programming.

define
assumptions:
- nonempty, bounded polyhedron
- minimizing $x_1$
- unique minimum, at a vertex
- exactly $d$ constraints per vertex
definitions:
- hyperplanes $H$
- basis $B(H)$ of hyperplanes that define optimum
- optimum value $O(H)$
Simplex
- exhaustive polytope search:
- walks on vertices
- runs in $O(n^{\lceil d/2 \rceil})$ time in theory
- often great in practice
polytime algorithms exist (ellipsoid)
but bit-dependent (weakly polynomial)!
OPEN: strongly polynomial LP
goal today: polynomial algorithms for small $d$

Random sampling algorithm

Goal: find $B(H)$
Plan: random sample
- solve random subproblem
- keep only violating constraints $V$
- recurse on leftover
problem: violators may not contain all of $B(H)$
but, contain some of $B(H)$
- opt of sample better than opt of whole
- but any point feasible for $B(H)$ no better than $O(H)$
- so current opt not feasible for $B(H)$
- so some $B(H)$ violated
revised plan:
- random sample
- ignore useless planes, add violators to “active set”
- repeat sample on whole problem while keeping active set
- claim: add one $B(H)$ plane per iteration
Algorithm SampLP:
- set $S$ of “active” hyperplanes.
- if $n< 9d^2$ do simplex ($d^{O(d)}$)
- pick $R \subseteq H-S$ of size $d\sqrt{n}$
- $x \leftarrow$ SampLP$(R \cup S)$
- $V\leftarrow$ hyperplanes of $H$ that violate $x$
- if $V \le 2\sqrt{n}$, add to $S$
Runtime analysis:
- mean size of $V$ at most $\sqrt{n}$ (todo)
- each successful recursion adds $\le 2\sqrt{n}$ planes to $S$
- and also must add one $B(H)$ hyperplane
- each iteration successful with prob. 1/2.
- deduce expect $2d$ iterations till have all of $B(H)$ so done.
- overall $S$ has $\le 2d\sqrt{n}$ violators plus $d\sqrt{n}$ sample: total $\le 3d\sqrt{n}$
- $O(d n)$ per phase needed to check violating constraints: $O(d^2 n)$ total
- recursion size at most $3d\sqrt{n}$
- so about $\log\log n$ levels of recursion each expanding by $d$
- so about $d^{{\log\log n}}=(\log n)^{\log d}$. more precisely \[T(n)\le 2dT(3d\sqrt{n})+O(d^2 n)=O(d^2 n \log n) + (\log n/d^2)^{2+\log d}d^{O(d)}\] (Note valid use of linearity of expectation)

Must prove claim, that mean $V \le \sqrt{n}$.

Lemma:
- suppose $|H-S|=m$.
- sample $R$ of size $r$ from $H-S$
- then expected violators $d(m-r-1)/(r-d)$
- Compare to MST: says roughly, if pick constraints with prob. $p$, see $d/p$ violations!
book broken: only works for empty $S$
Let $C_H$ be set of optima of subsets $T \cup S$, $T \subseteq H$
Let $C_R$ be set of optima of subsets $T \cup S$, $T \subseteq R$
note $C_R \subseteq C_H$, and $O(R \cup S)$ is only point violating no constraints of $R$
Let $v_x$ be number of constraints in $H$ violated by $x \in C_H$,
Let $i_x$ indicate $x=OPT(R \cup S)$ $$ \begin{eqnarray*} E[|V|] &= &E[\sum v_x i_x]\\ &= & \sum v_x \Pr[i_x] \end{eqnarray*} $$
decide $\Pr[i_x]$
- $\binom{m }{ r}$ equally likely subsets.
- how many have optimum $x$?
- let $q_x$ be number of planes defining $x$ not already in $S$
- must choose $q_x$ planes to define $x$
- all other choices must avoid planes violating $x$.
- prob. $$ \begin{eqnarray*} \binom{m-v_x-q_x }{ r-q_x} / \binom{m }{ r} &= & \frac{(m-v_x-q_x)-(r-q_x)+1}{r-q_x} \binom{m-v_x-q_x }{ r-q_x-1} / \binom{m }{ r} \\ &\le & \frac{(m-r+1)}{r-d} \binom{m-v_x-q_x }{ r-q_x-1} / \binom{m }{ r} \end{eqnarray*} $$
- deduce \[ E[V] \le \frac{m-r+1}{r-d}\sum v_x \binom{m-v_x-q_x }{ r-q_x-1} / \binom{m }{ r} \]
- summand is prob that $x$ is a point that violates exactly one constraint in $R$.
  - must pick $q_x$ constraints defining $x$
  - must pick $r-q_x-1$ constraints from $m-v_x-q_x$ nonviolators
  - must pick one of $v_x$ violators
- therefore, sum is expected number of points that violate exactly one constraint in $R$.
- but this is only $d$ (one for each constraint in basis of $R$)
nicer proof later.

Result:

saw sampling LP that ran in time $O(d^2 n \log n) + (\log n/d^2)^{2+\log d}d^{O(d)}$
key idea: if pick $r$ random hyperplanes and solve, expect only $dm/r$ violating hyperplanes.

Iterative Reweighting

Get rid of recursion and highest order term.

Idea:

Multiplicative weights
upweight important stuff, downweight unimportant
even if sometimes wrong, important stuff grows faster
Boosting in machine learning
combination of experts in 6.854

idea: be “softer” regarding mistakes

plane in $V$ gives “evidence” it's in $B(H)$
Algorithm:
- give each plane weight one
- pick $9d^2$ planes with prob. proportional to weights
- find optimum of $R$
- find violators of $R$
- if \[ \sum_{h \in V} w_h \le (2\sum_{h \in H} w_h)/(9d-1) \] then double violator weights
- repeat till no violators
Analysis
- Iter successful with prob. 1/2
  - Based on (homework) generalizing lemma: if choose $r$ with weighted proportions for $W$, then expected violating weight is $Wd/r$.
- show weight of basis grows till rest is negligible.
- claim $O(d\log n)$ iterations suffice.
- deduce runtime $O(d^2 n \log n)+d^{d/2+O(1)}\log n$.
- proof of claim:
  - after each iter, double weight of some basis element
  - after $kd$ iterations, basis weight at least $d2^{k}$
  - total weight increase at most $(1+2/(9d-1))^{kd} \le n\exp(2kd/(9d-1))$
- after $d\log n$ iterations, done.
so runtime $O(d^2n\log n)+(d\log n)\dot d^{O(d)}$

Mixed algorithm:

Can improve to linear in $n$
use recursive algorithm at top level
but solve each recursive call by iterative reweighting algorithm
runtime $O(d^2n)+(d^2\log n)\cdot d^{O(d)} + O(d^4\sqrt{n}\log n)$
so truly linear in $n$
one bottleneck is simplex which gives $d^{O(d)}$ terms
improve with subexponential algorithms to $O(d^2 n)+e^{O(\sqrt{d\log d})}\log n$.

Backwards Analysis

Seidel's Randomized incremental algorithm \[ T(n) \le T(n-1,d)+\frac{d}{n}(O(dn)+T(n-1,d-1)) = O(d!n) \]

Also simplifies proof of main sampling theorem:

On choosing $r$ random hyperplanes $R$, want expected number of violators $v$
instead, consider random $h \in H-R$, ask prob. of violation
which is just $E[v]/(n-r)$
for prob. $h$ is violated, use backwards analysis
instead of picking $R$ then $h$, pick $R \cup \{h\}$ then pick $h$
prob. $h$ violated is just prob. it is one of opt in $R \cup \{h\}$
which is $d/(r+1)$
so $E[v]/(n-r)=d/(r+1)$ so $E[v] = (n-r)d/(r+1)$

Hidden Dimension

Idea:

when recurse, don't just start from the beginning
When last hyperplane $h$ violates recursive solution,
Use opt of recursion as starting point for finding opt including $h$

Intuition

look more closely at Seidel's recursion
Seidel's “advantage” was that when omitted hyperplan $H$ was violator, we knew it was in opt
so reduce to findinig $d-1$ more constraints of opt
in fact this is pessimistic!
optimum basis $h_1,\ldots,h_d$
order such that $OPT(H-h_1)< OPT(H-h_2)< \cdots< OPT(H-h_d)$
note none can be equal since each removal yields different optimal basis
suppose discarded $h_j$ for recursion
so found $OPT(H-h_j)$
claim $h_1 \in B(H-h_j)$
suppose not
- then $OPT(H-h_1 - h_j) = OPT(H-h_j)$
- but $OPT(H-h_1) \ge OPT(H-h_1-h_j)$
- which combines to imply $OPT(H-h_1) \ge OPT(H-h_j)$
- contradicts $h_i$ order
conclusion: $OPT(H-h_j)$ already contains $h_1,\ldots h_{j-1}$
so recursion found $j$ optimal constraints, not $1$ as Seidel thought
in expectation, we find $d/2$.
How can we use this?
- use optimum basis found so far as “hint” for OPT
- it contains many hyperplanes of OPT
- use recursion to add more hyperplanes
- show adding many at once
- so not too much recursion

Algorithm BasisLP

Input constraints $H$ and kickoff basis $T$
$T$ is a “hint” that affects runtime but not value
If $H=T$, output (obvious OPT)
Pick a random $h \in H-T$
$T'\leftarrow$BasisLP$(H-h,T)$ (Seidel's recursion)
if $h$ doesn't violate $T'$, done
else, output BasisLP$(H,OPT(T'\cup h))$. (change Seidel's recursion)
note finding $OPT(T' \cup h)$ trivial since only $d+1$ constraints

Intuition:

$T$ contains many of the optimum hyperplanes,
so make sure what you discard is not in $T$
through recursive calls, $T$ keeps getting tighter (larger OPT)
so closer to OPT
so has more of its hyperplanes
Seidel showed you collected one when a violation happens
strengthen: prove it collects many (in expectation) so get them all quickly.

Enforcing constraints

Given basis $T \subseteq H$, say constraint $h$ is enforcing if $OPT(H-h) < OPT(T)$.
Claim must have $h \in B(H)$
- suppose $h \notin B(H)$
- then $OPT(H-h)=OPT(H) \ge OPT(T)$
- so $h$ not enforcing
Also, must have $h \in T$
- since otherwise $T \subseteq H-h$
- which would make $OPT(H-h) \ge OPT(T)$.
So enforcing is in $OPT(H) \cap T$
So at most $d$ of them.
And if exactly $d$, then $T=OPT(H)$.

Hidden dimension

Define hidden dimension of $T$ as $d$ minus number of enforcing constraints.
- number of “distractions” from optimum
- if 0, found opt.
- Also, when discard random constraint not in $T$, number of $B(H)$ constraints we can discard is only hidden dimension
- Which determines probability we need to recurse
Claim: if second recursive call happens, hidden dimension drops:
- New kickoff basis is larger value than old
- So all previously enforcing constraints are enforcing
- And left-out constraint is enforcing
In fact, much better:
- assume did second recursive call
- suppose $m$ (hidden dimension) opt constraints $h_i$ not yet in $T$
- order $h_i$ by value of $OPT(H-h_i)$
- by backwards analysis, equally likely to leave any out
- as above, if leave out $r^{th}$, then all $1,\ldots,r$ are enforcing on second recursive call
Conclude for hidden dimension $k$, number of violation tests satisfies $$ \begin{eqnarray*} T(n,k) = T(n-1,k) + \frac{1}{n-d}\left(1+\sum_{i=1}^{k} T(n,i)\right) \end{eqnarray*} $$
Analysis shows $T(n,k) \le 2^k(n-d)$
so BasisLP runtimes is $O(d^42^dn)$ (multiply by base case time).

Combine with SampLP gives $O(d^2n+b^{\sqrt{d\log d}}\log n)$

worse $n$ dependence than previous algorithms (nonlinear)
but what's really cool here is that it is subexponential in $d$
none of the previous algorithms are
use as base case in SampLP to get algorithm that is linear in $n$ and subexponential in $d$

Simplex

The algorithms we studied above are not simplex algorithms

they discard constraints to find an optimum of a subset, then jump there
simplex has to move neighbor to neighbor
In each step, move to a neighbor with better objective
(we assume general position, so no degenerate or equal-objective vertices)
(even in this case, Klee-Minty cube makes naive simplex take exponential time)
open: is there a polynomial time simplex?
this would appeal since simplex is so popular in practice

We don't have polynomial, but Kalai gave a subexponential simplex algorithm

interestingly, it's just the dual of the BasisLP algorithm
but was discovered independently (and after)
has same runtime: $ne^{O(\sqrt{d\log n})}$
(but it's runtime was proven first)
Also, Kalai and Kleitman have shown polytope diameter at most $n^{\log d}$
still open: simplex algorithm with this runtime

Kalai's algorithm

Facets
- a facet is a “side” of the polytope; an intersection of the polytope with one of the hyperplanes defining it
- so there are at most $n$ facets.
- possibly less for “never feasible” constraints.
- given current vertex $v$, a facet is active if it contains a better vertex
Algorithm
- Step 1: starting from $v$, find $r$ active facets
- Step 2: choosing randomly among the $r$, use simplex to optimize within that facet
- this gives new $v$; repeat till done
easy to find $d-1$ active facets
- $v$ has a better neighbor
- the edge to this neighbor is the intersection of $d-1$ constraints
- each of those constraints defines an active facet (containing improvement on $v$)
- so choosing one at random is really just choosing a random tight constraint at $v$ and using its facet
Analysis
- order facets by value of best vertex on facet
- if you find opt in one of the facets, all worse facets become “inactive”
- so recurse on number of active facets $n$
- $$ \begin{align*} T(n,d) &= T(n-1,d-1) + \frac{1}{d-1}\sum_{i=1}^{d-1} T(n-i,d)\\ &= e^{O(\sqrt{n\log d})} \end{align*} $$
for better runtime, find $\max(d,n/2)$ facets
- finding $n/2$ takes more work; a kind of BFS over pivots from $v$
- recurrence yields $T(d,n) = e^{O(\sqrt{d\log n})}$
- better for small $d$
- if $d$ and $n$ comparable, get $e^{\sqrt{n}}$

Contrast

We saw various sampling algorithms linear in $n$ but exponential in $d$
So for worst case $d=n$ still exponential
Kalai algorithm is subexponential in all cases
Also, Kalai's algorithm if simplex. All moves are pivots.
Unlike sampling algorithm that optimized over subsets (jumps, not pivots)
Later, was discovered that BasisLP has same runtime
and in fact, that BasisLP is exactly Kalai's algorithm running in dual space