Max Cut by Semidefinite Programming
Quadratic formulation
- No obvious linear programming formulation
- How about quadratic? $$ \begin{eqnarray*} \max \sum_{ij \in E} \frac{1-x_ix_j}{2}\\ x_i &\in &\pm 1 \end{eqnarray*} $$
- Relax integrality? Doesn't matter: $$ \begin{eqnarray*} \max \sum_{ij \in E} \frac{1-x_ix_j}{2}\\ x_i^2 &= &1 \end{eqnarray*} $$
- Unfortunately, this shows quadratic programming is NP-hard, even without integrality constraints.
Vector relaxation:
- Relax $x_i$ to a vector $v_i$
- Product becomes dot product $$ \begin{eqnarray*} \max \sum_{ij \in E} \frac{1-v_i \cdot v_j}{2}\\ \|v_i\| &= &1 \end{eqnarray*} $$
- This is a relaxation, so opt is better.
- Note: no constraints possible on individual coordinates
- Remarkable fact: solving this is tractable!
- Consider matrix of vectors $V$
- Matrix of dot products is $D=V^TV$ (columns of $V$ are the $v_i$)
- In other words, $D$ is positive semidefinite (all eigenvalues nonnegative)
- Conversely, every positive semidefinite matrix $D$ has a “square root” $V$ with $V^TV=D$.
- What does positive semidefinite mean? That $(\forall y) yDy^T \ge 0$
- Can write this down as a linear constraint on $D$! $\sum y_iy_j d_{ij} \ge 0$
- Infinitely many linear constraints, but can optimize anyway by
linear programming techniques.
- For example, finding a negative eigenvector serves as a separation oracle
- best current (interior point) algorithms run in $O(n^{3.5}\log(1/\epsilon))$ time
- 1000s of variables can be solved in practice
- potential problem: irrationals. but we only care about approximating anyway
- Then square root can be computed by “Cholesky Factorization”
- One issue: results in $n$-dimensional vectors, where $n$ is number of constraints. We want boolean values!
- Requiring dimension of vectors to be small makes it NP-hard again.
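The Cholesky step above can be sketched in pure Python (a toy illustration, not a production factorization; the example Gram matrix of three unit vectors at $120^\circ$ is my choice):

```python
import math

def cholesky(D):
    """Cholesky factorization of a symmetric PSD matrix D,
    returning lower-triangular L with L L^T = D."""
    n = len(D)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                # max(..., 0) guards against tiny negative rounding error
                L[i][j] = math.sqrt(max(D[i][i] - s, 0.0))
            else:
                L[i][j] = (D[i][j] - s) / L[j][j] if L[j][j] else 0.0
    return L

# A PSD matrix of dot products (Gram matrix of three unit vectors
# at 120-degree angles in the plane; eigenvalues 0, 3/2, 3/2).
D = [[1.0, -0.5, -0.5],
     [-0.5, 1.0, -0.5],
     [-0.5, -0.5, 1.0]]
L = cholesky(D)
# Rows of L are recovered vectors v_i with v_i . v_j = D[i][j].
dot01 = sum(L[0][k] * L[1][k] for k in range(3))
```

Each row of `L` is one recovered vector; their pairwise dot products reproduce `D`, which is all the rounding step needs.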
Embeddings
- We have put the graph in a high-dimensional space
- Such that $1-v_i \cdot v_j$ is “large” for each edge $ij$
- i.e., $v_i \cdot v_j$ is small
- i.e., $i$ and $j$ are “far apart”
- Draw
Rounding:
- We need to return to $\pm 1$.
- i.e., need a rounding scheme.
- embedding gives large objective to far-apart points
- to keep that value, need to cut far apart items
- Given points in the plane, how separate into two groups such that far apart points get separated?
- Use a random hyperplane
- How generate? Random $n$-dimensional Gaussian vector
- What are the odds $i$ and $j$ are separated?
- Gaussian is “spherically symmetric”: $\Pr[x_1,\ldots,x_n] \propto e^{-\sum x_i^2}$ depends only on radius
- To generate 2 Gaussians, pick a random $\theta$ and exponential squared-radius in polar coordinates, then look at $x$ and $y$
- change coordinates so $i$ and $j$ are in 2-dimensional basis plane
- values of Gaussian in other direction irrelevant
- all that matters is Gaussian within the plane
- i.e., look at the random line the Gaussian generates, check if it separates $v_i$ and $v_j$
- odds depend on angle between
- in particular, probability of separation is $\arccos(v_i \cdot v_j)/\pi$
- So, expected cut value is $\sum_{ij \in E} \arccos(v_i \cdot v_j)/\pi$
- Compare to optimum:
- Our objective function is $\sum \frac{1-v_i\cdot v_j}{2}$
- which is better than opt
- so our approximation is better than worst case ratio
- What is worst case ratio? \[ \min_{-1 \le x < 1} \frac{\arccos(x)/\pi}{(1-x)/2} \ge .878 \]
- In other words, .878 approx to opt.
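The whole rounding scheme can be sketched on a toy instance (a triangle, whose SDP optimum is the three unit vectors at $120^\circ$; the grid search for the constant is a numerical illustration, not a proof):

```python
import math, random

random.seed(0)

def random_hyperplane_round(vectors):
    """Round unit vectors to +/-1 by the sign of the dot product
    with a random Gaussian vector (a uniformly random hyperplane)."""
    n = len(vectors[0])
    r = [random.gauss(0, 1) for _ in range(n)]
    return [1 if sum(ri * vi for ri, vi in zip(r, v)) >= 0 else -1
            for v in vectors]

def cut_value(edges, signs):
    return sum((1 - signs[i] * signs[j]) / 2 for i, j in edges)

# Toy instance: a triangle; the SDP optimum places the three unit
# vectors at 120-degree angles (pairwise dot product -1/2).
vectors = [(1.0, 0.0), (-0.5, math.sqrt(3) / 2), (-0.5, -math.sqrt(3) / 2)]
edges = [(0, 1), (1, 2), (0, 2)]

trials = 2000
avg = sum(cut_value(edges, random_hyperplane_round(vectors))
          for _ in range(trials)) / trials
# Expected cut value: 3 * arccos(-1/2)/pi = 3 * (2/3) = 2 (in fact, no
# half-plane contains all three vectors, so every hyperplane cuts 2 edges).

# The Goemans-Williamson constant, located by a crude grid search:
alpha = min((math.acos(x) / math.pi) / ((1 - x) / 2)
            for x in [i / 10000 - 1 for i in range(1, 19999)])
```

The grid minimum lands near $x \approx -0.69$ with ratio $\approx 0.8786$, matching the $.878$ constant quoted above.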
Summary
- “Semidefinite programming” lets you relax quadratic programs to vector programs
- Gives .878 approximation to max cut
- Can derandomize using conditional expectations
- in practice, solving SDPs is slow and hard, but can be done
- Can you approx better?
- This rounding scheme is tight
- But maybe there is a better SDP?
- 5-cycle has integrality gap $.884\ldots$
- other worse examples converge to .878
- Khot: can't beat $.878$ in $P$ assuming Unique Games Conjecture
- Håstad proved one cannot beat $16/17$ unless P=NP
Graph Coloring
3-coloring
- define
- NP-complete
- no good approximation
- Wigderson: $O(\sqrt{n})$ colors
- explain how: greedy plus large independent sets
Relax:
- $k$-vector coloring: ensure dot products at most $-1/(k-1)$ on edges
- prove for 3-coloring by embedding in plane
- write as semidefinite program
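A quick numeric check of the 3-coloring claim (the planar embedding at $120^\circ$ is the standard one; the code is just a sanity check):

```python
import math

# For k = 3, a valid vector coloring places the three color classes at
# unit vectors 120 degrees apart in the plane: pairwise dot product -1/2.
k = 3
colors = [(math.cos(2 * math.pi * c / 3), math.sin(2 * math.pi * c / 3))
          for c in range(3)]
dots = [sum(a * b for a, b in zip(colors[i], colors[j]))
        for i in range(3) for j in range(i + 1, 3)]
target = -1.0 / (k - 1)   # = -1/2, the k-vector-coloring bound
```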
Rounding?
- Pull out a large independent set, repeat
- how pull independent set? random hyperplane, shifted
- gives independent set of size $n/\Delta^{1/3}$
- so gives $\Delta^{1/3}$ by repeated removal
- so get $n^{1/3}$
- Improve with Wigderson:
- If $\Delta > n^{3/4}$, remove a neighbor set
- Consumes at most $n^{1/4}\log n$ colors
- when $\Delta < n^{3/4}$, use semidefinite
- Result: $O(n^{1/4}\log n)$ colors.
Finding the independent set:
- Pick a random $r$, and take all vertices $v$ with $r\cdot v \ge c$
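A minimal sketch of the cap-rounding step, on a hypothetical instance (the graph, vectors, threshold, and greedy cleanup of violated edges are my illustration; the real analysis chooses $c$ to balance cap size against expected violations):

```python
import math, random

random.seed(1)

# A 3-colorable graph: complete tripartite K_{2,2,2}, each vertex
# assigned its color class's unit vector (120 degrees apart in the plane).
groups = [(0, 1), (2, 3), (4, 5)]
vec = {}
for c, grp in enumerate(groups):
    v = (math.cos(2 * math.pi * c / 3), math.sin(2 * math.pi * c / 3))
    for u in grp:
        vec[u] = v
edges = [(u, w) for a in range(3) for b in range(a + 1, 3)
         for u in groups[a] for w in groups[b]]

def cap_independent_set(vec, edges, c=0.5):
    """Take all vertices with r . v >= c for a random Gaussian r,
    then discard an endpoint of any surviving edge (cleanup step)."""
    r = [random.gauss(0, 1) for _ in range(2)]
    S = {u for u, v in vec.items() if sum(a * b for a, b in zip(r, v)) >= c}
    for u, w in edges:
        if u in S and w in S:
            S.discard(w)
    return S

S = cap_independent_set(vec, edges)
independent = all(not (u in S and w in S) for u, w in edges)
```

Because adjacent vertices have dot product $-1/2$, both endpoints rarely land in the cap, so little is lost in cleanup.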
Min Ratio Cut
Lead-In: Max-Flow Min-Cut
Probably cover after introducing ratio cut.
Find $s$-$t$ mincut
- LP-relaxation
- $z_v$ indicator of being on $t$-side of cut
- $y_{vw}$ indicator of edge being cut
- constraint: if edge not cut, endpoints must have same indicator $$ \begin{eqnarray*} D&=&\min \sum_{vw} y_{vw} u_{vw}\\ y_{vw} &\ge &0\\ z_w &\le &z_v + y_{vw}\\ z_t &= &1\\ z_s &= &0 \end{eqnarray*} $$
- Relax?
- Think of $y_{vw}$ as “lengths”
- Main constraint is then a triangle inequality
- Which means $z_v$ are distances from $s$
- Randomized rounding:
- pick a random distance in $(0,1)$
- Expected number of cut edges is equal to objective
- i.e. round with approx ratio 1!
- so in fact, any distance must work.
- each cut costs at least the LP value (a lower bound), and the average equals the LP value, so no cut can go above it either.
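A toy check of threshold rounding (the graph, capacities, and the optimal LP solution $z$ are hard-coded by hand here, not produced by an LP solver):

```python
# Tiny instance: s -> a -> t plus a direct edge s -> t, with capacities u.
cap = {('s', 'a'): 1, ('a', 't'): 2, ('s', 't'): 2}
z = {'s': 0.0, 'a': 1.0, 't': 1.0}   # an optimal LP solution (value 3)
# Tight y values: y_vw = max(0, z_w - z_v).
y = {e: max(0.0, z[e[1]] - z[e[0]]) for e in cap}
lp_value = sum(y[e] * cap[e] for e in cap)

def threshold_cut(T):
    """Cut every edge whose tail is at distance <= T < head."""
    return sum(cap[(v, w)] for (v, w) in cap if z[v] <= T < z[w])

cuts = [threshold_cut(t / 10) for t in range(10)]  # T = 0.0, 0.1, ..., 0.9
```

Every threshold yields cost exactly `lp_value`, illustrating the ratio-1 rounding.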
Motivation
Common task: computation/simulation over a mesh.
- vertices have state
- state influences neighbors
- whole system evolves over time
- examples: weather simulation, wind tunnel, differential equations
Suppose you want to parallelize
- Update state by checking neighbors
- Each node handled by some processor
- if neighbors on other processors, communication required
- Goal: good partition of nodes to processors
- Require: same number of nodes per processor (balanced work)
- Require: small number of neighbor links across processors (low communication)
Graph bisection
- Special case for 2 processors
- Balanced cut---half vertices on each
- Minimize number of crossing edges.
$b$-balanced separator
- Loosen up a bit
- Require only $\ge bn$ vertices per side, for some $b\le 1/2$
Divide and conquer
- Given hard graph problem, find separator/bisection
- Solve each half
- patch clumsily at cut
- most of problem in pieces
- so clumsy patch doesn't add much cost
- But patch paid at each level of divide and conquer---adds up
- So divide and conquer must finish fast
- So needs to be balanced
- Example: min linear arrangement (VLSI)
- line up nodes to minimize edges crossing worst perpendicular cut
- Given opt solution, half-way down is a balanced cut
- so min balanced cut no worse
- find, lay out two halves on left and right side of line recursively
- worst cut on LHS combines edges entirely in LHS with edges crossing to RHS
- So, worst cut is less than worst(LHS,RHS)+bisection
- So is $O(\log n \cdot \mbox{bisection})=O(\log n \cdot \mbox{linear arrangement})$.
Ratio Cut
Definition:
- partition $(S,S')$ minimizing $$ \begin{eqnarray*} \frac{|(S \times S') \cap E|}{|S|\cdot|S'|} \end{eqnarray*} $$
- Also known as “sparsest cut”.
- since the larger side has at least $n/2$ vertices, this is $\Theta(\mbox{crossing}/(n\cdot \mbox{smaller side size}))$
- i.e. proportional to average crossing degree of smaller side
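Brute force makes the definition concrete on a small example (a 6-cycle; exponential enumeration, for illustration only):

```python
from itertools import combinations

# Brute-force the ratio (sparsest) cut of a 6-cycle.
n = 6
edges = [(i, (i + 1) % n) for i in range(n)]

def ratio(S):
    """Crossing edges divided by |S| * |S'|."""
    size_c = n - len(S)
    crossing = sum(1 for u, w in edges if (u in S) != (w in S))
    return crossing / (len(S) * size_c)

best = min(ratio(set(c)) for k in range(1, n)
           for c in combinations(range(n), k))
```

For the 6-cycle the optimum splits it into two paths of 3: crossing $2$, denominator $9$, ratio $2/9$.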
Use to find separator
- suppose bisection of value $b$
- can find $(1/3,2/3)$ separator of value $O(b\log n)$
- to do so, approx sparsest cut
- Remove smaller side
- Repeat till larger side smaller than $2/3 n$
why good?
- consider what remains of opt bisection on large side
- no more edges
- each side still has $n/6$ vertices
- so good ratio
- so always remove good ratio
- so total removed has good ratio
- LP formulation: generalize $s$-$t$ min-cut
- indicator variable $y_{vw}$ says edge $vw$ cut
- indicator variable $z_{vw}$ says $v$, $w$ separated by cut
- “triangle inequality” constraints relate $z_{vw}$ to the $y_{vw}$
- want to minimize ratio $\sum y_{vw}/\sum z_{vw}$
Relaxation:
- instead of minimizing ratio, minimize numerator while requiring denominator exceed $1$ (equivalent by scaling).
- Assign lengths to edges such that
- Sum of pairwise distances is 1 (or more---doesn't hurt)
- Sum of edge lengths is minimized
- Why is this a relaxation?
- Consider any $S$
- Assign length $1/(|S|\cdot|S'|)$ to each edge in cut
- Sum of pairwise distance is that times number of cut pairs, i.e. 1
- Sum of lengths is value of that ratio cut
- So LP opt below min ratio cut
- Write as LP
- edge lengths $d_{ij}$
- distances $d_{ij} \ge 0$
- triangle constraints $d_{ik} \le d_{ij}+d_{jk}$
- constrain $\sum d_{ij} \ge 1$ (over all pairs)
- minimize “volume” (over existing edges) $\sum_{ij \in E}d_{ij}$
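The relaxation argument can be checked mechanically on an example (a 6-cycle and one cut $S$, both my choice):

```python
from itertools import combinations

# Check on a 6-cycle: the scaled cut metric of (S, S') is LP-feasible
# and its objective equals the cut's ratio.
n = 6
edges = [(i, (i + 1) % n) for i in range(n)]
S = {0, 1, 2}
scale = 1.0 / (len(S) * (n - len(S)))
d = {(i, j): (scale if (i in S) != (j in S) else 0.0)
     for i in range(n) for j in range(n)}

pair_sum = sum(d[(i, j)] for i, j in combinations(range(n), 2))  # should be 1
edge_sum = sum(d[(i, j)] for i, j in edges)                      # LP objective
triangle_ok = all(d[(i, k)] <= d[(i, j)] + d[(j, k)]
                  for i in range(n) for j in range(n) for k in range(n))
cut_ratio = (sum(1 for u, w in edges if (u in S) != (w in S))
             / (len(S) * (n - len(S))))
```

So the LP optimum is at most the minimum cut ratio, exactly as argued above.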
Embeddings
Goal
- Above relaxation turns the graph into a metric space.
- A cut is also a metric, but a special kind
- We want to “round” arbitrary metric to cut metric
How do we measure error in rounding?
- An embedding $\phi$ from $(X,d)$ to $(X',d')$
- is a contraction if $d'(\phi(u),\phi(v)) \le d(u,v)$
- has distortion $c=\max(d(u,v)/d'(\phi(u),\phi(v)))$
- is an isometry if distortion 1, i.e. $d'(\phi(u),\phi(v)) = d(u,v)$
- Standard Metrics over $\Re^n$
- $\ell_1(x,y) = \sum |x_i-y_i|$
- $\ell_2(x,y) = \sqrt{\sum (x_i-y_i)^2}$
- $\ell_\infty(x,y) = \max |x_i-y_i|$
- $\ell_p(x,y) = ({\sum (x_i-y_i)^p})^{1/p}$
Every metric (isometric) embeddable in $\ell_\infty$
- Just make one coordinate for each point in $X$
- Set $\phi(u)_x = d(u,x)$
- Then $\phi(u)_x-\phi(v)_x = d(u,x)-d(v,x) \le d(u,v)$ by triangle inequality
- But $\phi(u)_u = 0$ while $\phi(v)_u = d(u,v)$
- So max is $d(u,v)$
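This Fréchet-style construction is short enough to verify directly (the path metric is an arbitrary example):

```python
import itertools

# Frechet embedding of a finite metric into l_infinity: one coordinate
# per point, phi(u)_x = d(u, x); the embedding is an isometry.
points = ['a', 'b', 'c', 'd']
# Shortest-path metric of the unit-length path a - b - c - d.
d = {(p, q): abs(points.index(p) - points.index(q))
     for p in points for q in points}

phi = {u: [d[(u, x)] for x in points] for u in points}

def linf(x, y):
    return max(abs(a - b) for a, b in zip(x, y))

isometric = all(linf(phi[u], phi[v]) == d[(u, v)]
                for u, v in itertools.combinations(points, 2))
```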
Isometric Embedding in $\ell_2$ tractable
- suppose embeddings into $v_1,v_2,\ldots$
- wlog map $v_1=0$
- $d^2_{ij} = (v_i-v_j)^2 = v_i^2+v_j^2-2v_iv_j$
- But $v_i^2 = (v_i-v_1)^2 = d_{1i}^2$
- So $v_i\cdot v_j = \frac12(d_{1j}^2+d_{1i}^2-d^2_{ij})$
- Write matrix of dot products
- take Cholesky factorization
- Similarly, can find minimum distortion by optimizing SDP.
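The dot-product identity can be checked against actual points in the plane (the points are an arbitrary example; Cholesky of the resulting Gram matrix would then recover an embedding):

```python
import math

# Check the Gram-matrix identity behind l_2 embedding: with v_1 mapped
# to the origin, v_i . v_j = (d_{1i}^2 + d_{1j}^2 - d_{ij}^2) / 2.
pts = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0), (3.0, 4.0)]   # v_1 at origin

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

n = len(pts)
gram = [[(dist(pts[0], pts[i]) ** 2 + dist(pts[0], pts[j]) ** 2
          - dist(pts[i], pts[j]) ** 2) / 2 for j in range(n)]
        for i in range(n)]
actual = [[pts[i][0] * pts[j][0] + pts[i][1] * pts[j][1] for j in range(n)]
          for i in range(n)]
match = all(abs(gram[i][j] - actual[i][j]) < 1e-9
            for i in range(n) for j in range(n))
```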
We are actually going to round via embedding in $\ell_1$:
- Suppose have $\ell_1$ embedding with ratio $\beta$ between “edge volume” (sum of edge lengths) and “all-pairs volume” (sum of all-pairs distances)
- i.e. such that edge-volume $\le \beta \cdot$ total-volume (note $\beta< 1$)
- Claim can round to cut with same ratio
- For coordinate $j$ define $\delta_j^{u,v}(t) = 1$ iff $t$ between $u_j$ and $v_j$, and 0 otherwise.
- Each $j,t$ defines a cut (exactly like $s$-$t$ cut case)
- Integrating over $t$ gives $|u_j-v_j|$
- So summing over dimensions $j$ gives $\ell_1$ metric sum (for edge or total volume)
- know edge-volume $\le \beta \cdot$ total-volume
- So must be some specific $t,j$ where this ratio achieved
- i.e. $\int f(x) \le \beta \int g(x)$
- so know $f(x) \le \beta g(x)$ somewhere
- This is a cut achieving the ratio.
- Actually, don't need to integrate, since $\delta$ is constant except at $n$ places ($j$ coords of points)
- so if our LP gave us an $\ell_1$ metric we'd be done
- unfortunately it gives us an arbitrary metric.
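The cut-decomposition argument in one dimension, on a made-up line embedding (coordinates and edges are illustrative):

```python
from itertools import combinations

# l_1 rounding in one dimension: points on a line, threshold cuts.
# If edge-volume <= beta * total-volume, some threshold cut has
# cut-ratio <= beta.
coord = {0: 0.0, 1: 0.1, 2: 0.9, 3: 1.0, 4: 1.1}
edges = [(0, 1), (2, 3), (3, 4), (1, 2)]

edge_vol = sum(abs(coord[u] - coord[v]) for u, v in edges)
total_vol = sum(abs(coord[u] - coord[v])
                for u, v in combinations(coord, 2))
beta = edge_vol / total_vol

def cut_ratio(t):
    """Ratio of the cut separating points left of threshold t."""
    S = {u for u in coord if coord[u] <= t}
    Sc = set(coord) - S
    if not S or not Sc:
        return float('inf')
    crossing = sum(1 for u, v in edges if (u in S) != (v in S))
    return crossing / (len(S) * len(Sc))

# Only thresholds at the coordinates matter (the delta functions are
# piecewise constant), so check each point's coordinate.
best = min(cut_ratio(coord[u]) for u in coord)
```

Some threshold cut always achieves ratio at most `beta`, which is the averaging claim above.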
Embedding
- If our metric were $\ell_1$, done
- So, try embedding
- Bourgain: can embed any metric into $\ell_1$ with $O(\log n)$ distortion, using $\log^2 n$ dimensions (dimension unimportant)
- Conclude $O(\log n)$ approx for ratio cut
Algorithm
- Define embedding
- For each scale $t = 1,\ldots,\log n$, pick $L=O(\log n)$ random subsets of size $n/2^t$
- Gives collection $A_i$ of $L\log n$ sets
- Set $\phi(u)_i = d(u,A_i)$
- then divide by $L\log n$ (scaling)
- this is like $\ell_{\infty}$ embedding, but distance to sets instead of points
- This is a contraction:
- contraction for any one $A$: if $s \in A$ determines $d(u,A)$, and $t\in A$ determines $d(v,A)$, then $$ \begin{eqnarray*} d(u,s) &\le & d(u,t)\\ &\le &d(u,v)+d(v,t)\\ d(u,s)-d(v,t) &\le &d(u,v)\\ d(u,A)-d(v,A) &\le &d(u,v) \end{eqnarray*} $$
- so each coordinate is at most $d(u,v)$
- dividing by $L\log n$ coordinates scales to a contraction.
- Distortion:
- we need to prove it doesn't contract too much
- let $\rho_t$ be minimum radius such that both $B(u,\rho_t)$ and $B(v,\rho_t)$ have $2^t$ points in them.
- if $\rho_t < d_{uv}/3$ then these radii define disjoint balls
- wlog, $B(u,\rho_t)$ has exactly $2^t$ points;
- meanwhile, $B(v,\rho_{t-1})$ has at least $2^{t-1}$ points
- then with constant probability, a sample of size $n/2^t$ has points in $B(v,\rho_{t-1})$ but not $B(u,\rho_t)$
- in other words, distance from $v$ is less than $\rho_{t-1}$ but from $u$ is more than $\rho_t$
- so contributes at least $\rho_t-\rho_{t-1}$ to difference in $\phi_i$ coordinate
- we pick $L$ samples, so a Chernoff bound says we are near expectation whp
- true for all $t$ until $\rho_t \ge d(u,v)/3$
- so, summing over all $t$, difference in coords is $\Omega(L\sum_t (\rho_t-\rho_{t-1})) = \Omega(L d_{uv})$ (the sum telescopes)
- we then scale by $1/(L\log n)$
- so distance at least $\Omega(d(u,v)/\log n)$
- Note: $L$ didn't matter for distortion, just dimension
- But we do achieve $O(\log^2 n)$-dimensional embedding
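The embedding and its contraction property can be sketched directly (path metric on 16 points as a stand-in for an arbitrary metric; the distortion bound is probabilistic, so only the deterministic contraction side is asserted):

```python
import math, random
from itertools import combinations

random.seed(2)

# Bourgain-style embedding: coordinates are scaled distances to random
# subsets at geometric scales. The scaled map is a contraction in l_1.
n = 16
V = list(range(n))
d = {(u, v): abs(u - v) for u in V for v in V}   # path metric on 16 points

L = max(1, round(math.log2(n)))                  # samples per scale
scales = int(math.log2(n))                       # t = 1 .. log n
subsets = []
for t in range(1, scales + 1):
    size = max(1, n // 2 ** t)
    for _ in range(L):
        subsets.append(random.sample(V, size))

def phi(u):
    # Distance to each random subset, scaled by 1 / (L log n).
    return [min(d[(u, a)] for a in A) / len(subsets) for A in subsets]

def l1(x, y):
    return sum(abs(a - b) for a, b in zip(x, y))

# Each coordinate satisfies |d(u,A) - d(v,A)| <= d(u,v), so after
# dividing by the number of coordinates the l_1 distance contracts.
contraction = all(l1(phi(u), phi(v)) <= d[(u, v)] + 1e-9
                  for u, v in combinations(V, 2))
```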
Gap is tight:
- Take constant degree expander
- Set edge lengths $1/\log n$
- Most pairs are $\Omega(\log n)$ edges apart, hence at distance $\Omega(1)$
- So sum of pairwise distances is $\Omega(n^2)$
- While sum of edge lengths is only $O(n/\log n)$ ($O(n)$ edges of length $1/\log n$)
- So the LP ratio is $O(1/(n\log n))$
- But any cut with $k$ vertices on small side has $\Omega(k)$ edges leaving
- So best cut ratio is $\Omega(k/kn)=\Omega(1/n)$
- So $\log n$ gap in our rounding is tight.
Extensions
- Generalization to multicommodity flow/cut
- $\Theta(\log k)$ gap for $k$ commodities
- Arora, Rao, and Vazirani use a more sophisticated SDP approach to approximate ratio cut to $O(\sqrt{\log n})$
- Other powerful metric embedding methods for near neighbor search, tree metrics for online algorithms