
### Linear Programming

• define
• assumptions:
• nonempty, bounded polyhedron
• minimizing $x_1$
• unique minimum, at a vertex
• exactly $d$ constraints per vertex
• definitions:
• hyperplanes $H$
• basis $B(H)$ of hyperplanes that define optimum
• optimum value $O(H)$
• Simplex
• exhaustive polytope search:
• walks on vertices
• runs in $O(n^{\lceil d/2 \rceil})$ time in theory
• often great in practice
• polytime algorithms exist (ellipsoid)
• but bit-dependent (weakly polynomial)!
• OPEN: strongly polynomial LP
• goal today: polynomial algorithms for small $d$

Random sampling algorithm

• Goal: find $B(H)$
• Plan: random sample
• solve random subproblem
• keep only violating constraints $V$
• recurse on leftover
• problem: violators may not contain all of $B(H)$
• but, contain some of $B(H)$
• opt of sample better than opt of whole
• but any point feasible for $B(H)$ no better than $O(H)$
• so current opt not feasible for $B(H)$
• so some $B(H)$ violated
• revised plan:
• random sample
• ignore useless planes, add violators to “active set”
• repeat sample on whole problem while keeping active set
• claim: add one $B(H)$ plane per iteration
• Algorithm SampLP:
• set $S$ of “active” hyperplanes.
• if $n< 9d^2$ do simplex ($d^{O(d)}$)
• pick $R \subseteq H-S$ of size $d\sqrt{n}$
• $x \leftarrow$ SampLP$(R \cup S)$
• $V\leftarrow$ hyperplanes of $H$ that violate $x$
• if $|V| \le 2\sqrt{n}$, add $V$ to $S$
• Runtime analysis:
• mean size of $V$ at most $\sqrt{n}$ (proven below)
• each successful recursion adds $\le 2\sqrt{n}$ planes to $S$
• and also must add one $B(H)$ hyperplane
• each iteration successful with prob. 1/2.
• deduce: expect $2d$ iterations until we have all of $B(H)$, so done.
• overall $S$ has $\le 2d\sqrt{n}$ violators plus $d\sqrt{n}$ sample: total $\le 3d\sqrt{n}$
• $O(d n)$ per phase needed to check violating constraints: $O(d^2 n)$ total
• recursion size at most $3d\sqrt{n}$
• so about $\log\log n$ levels of recursion each expanding by $d$
• so about $d^{\log\log n}=(\log n)^{\log d}$. More precisely, $T(n)\le 2dT(3d\sqrt{n})+O(d^2 n)=O(d^2 n \log n) + (\log n/d^2)^{2+\log d}d^{O(d)}$ (note: valid use of linearity of expectation)
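As a sanity check on the control flow above, here is a toy instantiation of SampLP for the 1-D case (minimize $x$ subject to $x \ge h$, so $d=1$ and the optimum of any subset is its max); the function name and the simplifications are mine, not from the lecture:

```python
import math
import random

def samp_lp(H, rng):
    """Toy 1-D SampLP: minimize x subject to x >= h for each h in H.

    With d = 1 the basis B(H) is the single largest constraint, so the
    optimum of any subset is its max.  Illustrative only.
    """
    H = list(H)
    n = len(H)
    if n < 9:  # base case n < 9d^2 with d = 1: solve directly ("simplex")
        return max(H)
    S = set()  # the "active" hyperplanes
    while True:
        rest = [h for h in H if h not in S]
        R = rng.sample(rest, min(int(math.sqrt(n)), len(rest)))  # size ~ d*sqrt(n)
        x = samp_lp(set(R) | S, rng)     # solve random subproblem
        V = [h for h in H if h > x]      # constraints of H violated by x
        if not V:
            return x                     # no violators: x is the optimum
        if len(V) <= 2 * math.sqrt(n):   # successful iteration:
            S |= set(V)                  # keep the violators

print(samp_lp(range(50), random.Random(1)))  # always the max constraint, 49
```

Since the loop only returns when no constraint of $H$ is violated, the answer is always $\max(H)$; the point of the exercise is that each recursive call only ever sees a small subset of the constraints.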

Must prove the claim that mean $|V| \le \sqrt{n}$.

• Lemma:
• suppose $|H-S|=m$.
• sample $R$ of size $r$ from $H-S$
• then expected number of violators is at most $d(m-r+1)/(r-d)$
• Compare to MST: roughly says that if you pick constraints with prob. $p$, you see $d/p$ violations!
• book broken: only works for empty $S$
• Let $C_H$ be set of optima of subsets $T \cup S$, $T \subseteq H$
• Let $C_R$ be set of optima of subsets $T \cup S$, $T \subseteq R$
• note $C_R \subseteq C_H$, and $O(R \cup S)$ is only point violating no constraints of $R$
• Let $v_x$ be number of constraints in $H$ violated by $x \in C_H$,
• Let $i_x$ indicate the event $x=OPT(R \cup S)$ $$\begin{eqnarray*} E[|V|] &= &E[\sum v_x i_x]\\ &= & \sum v_x \Pr[i_x=1] \end{eqnarray*}$$
• determine $\Pr[i_x=1]$
• $\binom{m }{ r}$ equally likely subsets.
• how many have optimum $x$?
• let $q_x$ be number of planes defining $x$ not already in $S$
• must choose $q_x$ planes to define $x$
• all other choices must avoid planes violating $x$.
• prob. $$\begin{eqnarray*} \binom{m-v_x-q_x }{ r-q_x} / \binom{m }{ r} &= & \frac{(m-v_x-q_x)-(r-q_x)+1}{r-q_x} \binom{m-v_x-q_x }{ r-q_x-1} / \binom{m }{ r} \\ &\le & \frac{(m-r+1)}{r-d} \binom{m-v_x-q_x }{ r-q_x-1} / \binom{m }{ r} \end{eqnarray*}$$
• deduce $E[V] \le \frac{m-r+1}{r-d}\sum v_x \binom{m-v_x-q_x }{ r-q_x-1} / \binom{m }{ r}$
• summand is prob that $x$ is a point that violates exactly one constraint in $R$.
• must pick $q_x$ constraints defining $x$
• must pick $r-q_x-1$ constraints from $m-v_x-q_x$ nonviolators
• must pick one of $v_x$ violators
• therefore, sum is expected number of points that violate exactly one constraint in $R$.
• but this is only $d$ (one for each constraint in basis of $R$)
• nicer proof later.

Result:

• saw sampling LP that ran in time $O(d^2 n \log n) + (\log n/d^2)^{2+\log d}d^{O(d)}$
• key idea: if pick $r$ random hyperplanes and solve, expect only $dm/r$ violating hyperplanes.

### Iterative Reweighting

Get rid of recursion and highest order term.

Idea:

• Multiplicative weights
• upweight important stuff, downweight unimportant
• even if sometimes wrong, important stuff grows faster
• Boosting in machine learning
• combination of experts in 6.854

idea: be “softer” regarding mistakes

• plane in $V$ gives “evidence” it's in $B(H)$
• Algorithm:
• give each plane weight one
• pick $9d^2$ planes with prob. proportional to weights
• find optimum of $R$
• find violators of $R$
• if $\sum_{h \in V} w_h \le (2\sum_{h \in H} w_h)/(9d-1)$ then double violator weights
• repeat till no violators
• Analysis
• each iteration successful with prob. $\ge 1/2$ (by Markov's inequality)
• Based on (homework) generalizing lemma: if choose $r$ with weighted proportions for $W$, then expected violating weight is $Wd/r$.
• show weight of basis grows till rest is negligible.
• claim $O(d\log n)$ iterations suffice.
• deduce runtime $O(d^2 n \log n)+d^{d/2+O(1)}\log n$.
• proof of claim:
• after each iter, double weight of some basis element
• after $kd$ iterations, basis weight at least $d2^{k}$
• total weight after $kd$ iterations at most $n(1+2/(9d-1))^{kd} \le n\exp(2kd/(9d-1))$
• after $d\log n$ iterations, done.
• so runtime $O(d^2n\log n)+(d\log n)\cdot d^{O(d)}$
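The same toy 1-D instance (minimize $x$ s.t. $x \ge a_i$, so $d=1$ and the basis is the single max constraint) illustrates the reweighting loop; sampling here is with replacement via `random.choices`, a simplification, and `reweight_lp` is an invented name:

```python
import random

def reweight_lp(a, seed=0):
    """Toy 1-D iterative reweighting (d = 1): minimize x s.t. x >= a_i."""
    d = 1
    rng = random.Random(seed)
    w = [1.0] * len(a)              # every plane starts with weight one
    r = 9 * d * d                   # sample size 9d^2
    while True:
        idx = range(len(a))
        # pick r planes with prob. proportional to weight (with replacement here)
        R = rng.choices(idx, weights=w, k=r)
        x = max(a[i] for i in R)                  # optimum of the sample
        V = [i for i in idx if a[i] > x]          # violators of that optimum
        if not V:
            return x                              # no violators: done
        if sum(w[i] for i in V) <= 2 * sum(w) / (9 * d - 1):
            for i in V:                           # successful iteration:
                w[i] *= 2                         # double violator weights

print(reweight_lp([3, 7, 1, 9, 4]))  # basis is the max constraint: 9
```

Note that the basis element (the max) is a violator whenever it is missing from the sample, so its weight doubles on every successful iteration until it dominates and is almost surely sampled, which is exactly the termination argument in the analysis above.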

Mixed algorithm:

• Can improve to linear in $n$
• use recursive algorithm at top level
• but solve each recursive call by iterative reweighting algorithm
• runtime $O(d^2n)+(d^2\log n)\cdot d^{O(d)} + O(d^4\sqrt{n}\log n)$
• so truly linear in $n$
• one bottleneck is simplex which gives $d^{O(d)}$ terms
• improve with subexponential algorithms to $O(d^2 n)+e^{O(\sqrt{d\log d})}\log n$.

### Backwards Analysis

Seidel's randomized incremental algorithm: $T(n,d) \le T(n-1,d)+\frac{d}{n}(O(dn)+T(n-1,d-1)) = O(d!\,n)$
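A minimal sketch of Seidel's incremental algorithm for $d=2$, assuming constraints of the form $a \cdot x \le b$, general position, and boundedness (enforced here by an artificial bounding box); the names and tolerances are my own:

```python
import random

def solve_1d(cons, c, eps=1e-9):
    """Minimize c*t subject to a*t <= b for (a, b) in cons; None if infeasible."""
    lo, hi = -1e9, 1e9
    for a, b in cons:
        if a > eps:
            hi = min(hi, b / a)
        elif a < -eps:
            lo = max(lo, b / a)
        elif b < -eps:
            return None          # 0*t <= b < 0: infeasible
    if lo > hi:
        return None
    return lo if c > 0 else hi

def seidel_2d(cons, c, bound=1e9, seed=0):
    """Minimize c.x subject to a.x <= b for (a, b) in cons, incrementally."""
    rng = random.Random(seed)
    cons = list(cons)
    rng.shuffle(cons)            # random insertion order
    x = (-bound if c[0] > 0 else bound, -bound if c[1] > 0 else bound)
    seen = []
    for a, b in cons:
        if a[0] * x[0] + a[1] * x[1] <= b + 1e-9:
            seen.append((a, b))  # current optimum survives; nothing to do
            continue
        # violated constraint: the new optimum lies on the line a.x = b,
        # so eliminate the variable with the larger coefficient and
        # solve a 1-D LP over the constraints seen so far
        k, j = (1, 0) if abs(a[1]) >= abs(a[0]) else (0, 1)
        proj = [(1.0, bound), (-1.0, bound)]   # bounding box on free variable
        for p, q in seen:
            proj.append((p[j] - p[k] * a[j] / a[k], q - p[k] * b / a[k]))
        t = solve_1d(proj, c[j] - c[k] * a[j] / a[k])
        if t is None:
            return None                        # infeasible
        xk = (b - a[j] * t) / a[k]
        x = (t, xk) if j == 0 else (xk, t)
        seen.append((a, b))
    return x

# minimize x1 + x2 subject to x1 >= 1 and x2 >= 2: optimum (1, 2)
print(seidel_2d([((-1, 0), -1), ((0, -1), -2)], (1, 1)))
```

The recurrence above falls out of this structure: with probability $d/n$ the last constraint inserted is violated, and then a $(d-1)$-dimensional subproblem on $n-1$ constraints is solved on its boundary.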

Also simplifies proof of main sampling theorem:

• On choosing $r$ random hyperplanes $R$, want expected number of violators $v$
• instead, consider random $h \in H-R$, ask prob. of violation
• which is just $E[v]/(n-r)$
• for prob. $h$ is violated, use backwards analysis
• instead of picking $R$ then $h$, pick $R \cup \{h\}$ then pick $h$
• prob. $h$ violated is just prob. it is one of opt in $R \cup \{h\}$
• which is $d/(r+1)$
• so $E[v]/(n-r)=d/(r+1)$ so $E[v] = (n-r)d/(r+1)$
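The formula is easy to check numerically in the 1-D case ($d=1$), where the optimum of a sample is its max and a violator is any larger constraint; this Monte Carlo check is my own illustration:

```python
import random

def mean_violators(n, r, trials=20000, seed=0):
    """Monte Carlo estimate of E[v] for the 1-D LP: minimize x s.t. x >= a_i."""
    rng = random.Random(seed)
    a = range(n)                          # n distinct constraint values
    total = 0
    for _ in range(trials):
        R = rng.sample(a, r)              # random sample of r constraints
        x = max(R)                        # optimum of the sampled subproblem
        total += sum(1 for v in a if v > x)   # violators of that optimum
    return total / trials

est = mean_violators(100, 9)
print(est, (100 - 9) * 1 / (9 + 1))   # estimate vs. exact (n-r)d/(r+1) = 9.1
```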

### Hidden Dimension

Idea:

• when recurse, don't just start from the beginning
• When last hyperplane $h$ violates recursive solution,
• Use opt of recursion as starting point for finding opt including $h$

Intuition

• look more closely at Seidel's recursion
• Seidel's “advantage” was that when the omitted hyperplane $h$ was a violator, we knew it was in opt
• so reduce to finding $d-1$ more constraints of opt
• in fact this is pessimistic!
• optimum basis $h_1,\ldots,h_d$
• order such that $OPT(H-h_1)< OPT(H-h_2)< \cdots< OPT(H-h_d)$
• note none can be equal since each removal yields different optimal basis
• suppose discarded $h_j$ for recursion
• so found $OPT(H-h_j)$
• claim $h_1 \in B(H-h_j)$
• suppose not
• then $OPT(H-h_1 - h_j) = OPT(H-h_j)$
• but $OPT(H-h_1) \ge OPT(H-h_1-h_j)$
• which combines to imply $OPT(H-h_1) \ge OPT(H-h_j)$
• contradicts $h_i$ order
• conclusion: $B(H-h_j)$ already contains $h_1,\ldots,h_{j-1}$
• so recursion found $j$ optimal constraints, not $1$ as Seidel thought
• in expectation, we find $d/2$.
• How can we use this?
• use optimum basis found so far as “hint” for OPT
• it contains many hyperplanes of OPT
• use recursion to add more hyperplanes
• show adding many at once
• so not too much recursion

Algorithm BasisLP

• Input constraints $H$ and kickoff basis $T$
• $T$ is a “hint” that affects runtime but not value
• If $H=T$, output (obvious OPT)
• Pick a random $h \in H-T$
• $T'\leftarrow$BasisLP$(H-h,T)$ (Seidel's recursion)
• if $h$ doesn't violate $T'$, done
• else, output BasisLP$(H,OPT(T'\cup h))$. (change Seidel's recursion)
• note finding $OPT(T' \cup h)$ trivial since only $d+1$ constraints
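In the degenerate 1-D case ($d=1$: minimize $x$ s.t. $x \ge h$, bases are singletons, and $OPT(T' \cup h) = \{h\}$ whenever $h$ violates $T'$), BasisLP reads as below; purely illustrative:

```python
import random

def basis_lp(H, T, rng):
    """Toy 1-D BasisLP: H is a set of constraints x >= h, T a singleton hint."""
    if H == T:
        return T                             # obvious optimum
    h = rng.choice(sorted(H - T))            # random constraint outside the hint
    Tp = basis_lp(H - {h}, T, rng)           # Seidel-style recursion
    if h <= max(Tp):                         # h does not violate opt(T')
        return Tp
    return basis_lp(H, {h}, rng)             # restart with OPT(T' u {h}) = {h}

print(basis_lp({1, 5, 3, 8}, {1}, random.Random(0)))  # basis is the max: {8}
```

With $d=1$ the hidden-dimension machinery is trivial (a single enforcing constraint ends the recursion), but the shape of the two recursive calls matches the algorithm above.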

Intuition:

• $T$ contains many of the optimum hyperplanes,
• so make sure what you discard is not in $T$
• through recursive calls, $T$ keeps getting tighter (larger OPT)
• so closer to OPT
• so has more of its hyperplanes
• Seidel showed you collected one when a violation happens
• strengthen: prove it collects many (in expectation) so get them all quickly.

Enforcing constraints

• Given basis $T \subseteq H$, say constraint $h$ is enforcing if $OPT(H-h) < OPT(T)$.
• Claim must have $h \in B(H)$
• suppose $h \notin B(H)$
• then $OPT(H-h)=OPT(H) \ge OPT(T)$
• so $h$ not enforcing
• Also, must have $h \in T$
• since otherwise $T \subseteq H-h$
• which would make $OPT(H-h) \ge OPT(T)$.
• So enforcing is in $OPT(H) \cap T$
• So at most $d$ of them.
• And if exactly $d$, then $T=OPT(H)$.

Hidden dimension

• Define hidden dimension of $T$ as $d$ minus number of enforcing constraints.
• number of “distractions” from optimum
• if 0, found opt.
• Also, when discard random constraint not in $T$, number of $B(H)$ constraints we can discard is only hidden dimension
• Which determines probability we need to recurse
• Claim: if second recursive call happens, hidden dimension drops:
• New kickoff basis is larger value than old
• So all previously enforcing constraints are enforcing
• And left-out constraint is enforcing
• In fact, much better:
• assume did second recursive call
• suppose $m$ (hidden dimension) opt constraints $h_i$ not yet in $T$
• order $h_i$ by value of $OPT(H-h_i)$
• by backwards analysis, equally likely to leave any out
• as above, if leave out $r^{th}$, then all $1,\ldots,r$ are enforcing on second recursive call
• Conclude for hidden dimension $k$, number of violation tests satisfies $$\begin{eqnarray*} T(n,k) = T(n-1,k) + \frac{1}{n-d}\left(1+\sum_{i=1}^{k} T(n,i)\right) \end{eqnarray*}$$
• Analysis shows $T(n,k) \le 2^k(n-d)$
• so BasisLP runtime is $O(d^42^dn)$ (multiply by base case time).

Combining with SampLP gives $O(d^2n)+e^{O(\sqrt{d\log d})}\log n$

• worse $n$ dependence than previous algorithms (nonlinear)
• but what's really cool here is that it is subexponential in $d$
• none of the previous algorithms are
• use as base case in SampLP to get algorithm that is linear in $n$ and subexponential in $d$

### Simplex

The algorithms we studied above are not simplex algorithms

• they discard constraints to find an optimum of a subset, then jump there
• simplex has to move from neighbor to neighbor
• In each step, move to a neighbor with better objective
• (we assume general position, so no degenerate or equal-objective vertices)
• (even in this case, Klee-Minty cube makes naive simplex take exponential time)
• open: is there a polynomial time simplex?
• this would appeal since simplex is so popular in practice

We don't have polynomial, but Kalai gave a subexponential simplex algorithm

• interestingly, it's just the dual of the BasisLP algorithm
• but was discovered independently (and after)
• has same runtime: $ne^{O(\sqrt{d\log n})}$
• (but its runtime was proven first)
• Also, Kalai and Kleitman have shown polytope diameter at most $n^{\log d}$
• still open: simplex algorithm with this runtime

Kalai's algorithm

• Facets
• a facet is a “side” of the polytope; an intersection of the polytope with one of the hyperplanes defining it
• so there are at most $n$ facets.
• possibly fewer, for constraints whose hyperplane never touches the feasible region.
• given current vertex $v$, a facet is active if it contains a better vertex
• Algorithm
• Step 1: starting from $v$, find $r$ active facets
• Step 2: choosing randomly among the $r$, use simplex to optimize within that facet
• this gives new $v$; repeat till done
• easy to find $d-1$ active facets
• $v$ has a better neighbor
• the edge to this neighbor is the intersection of $d-1$ constraints
• each of those constraints defines an active facet (containing improvement on $v$)
• so choosing one at random is really just choosing a random tight constraint at $v$ and using its facet
• Analysis
• order facets by value of best vertex on facet
• if you find opt in one of the facets, all worse facets become “inactive”
• so recurse on number of active facets $n$
• \begin{align*} T(n,d) &= T(n-1,d-1) + \frac{1}{d-1}\sum_{i=1}^{d-1} T(n-i,d)\\ &= e^{O(\sqrt{n\log d})} \end{align*}
• for better runtime, find $\max(d,n/2)$ facets
• finding $n/2$ takes more work; a kind of BFS over pivots from $v$
• recurrence yields $T(n,d) = e^{O(\sqrt{d\log n})}$
• better for small $d$
• if $d$ and $n$ comparable, get $e^{\sqrt{n}}$

Contrast

• We saw various sampling algorithms linear in $n$ but exponential in $d$
• So for worst case $d=n$ still exponential
• Kalai algorithm is subexponential in all cases
• Also, Kalai's algorithm is simplex. All moves are pivots.
• Unlike sampling algorithm that optimized over subsets (jumps, not pivots)
• Later, was discovered that BasisLP has same runtime
• and in fact, that BasisLP is exactly Kalai's algorithm running in dual space