Max Cut by Semidefinite Programming
Quadratic formulation
- No obvious linear programming formulation
- How about quadratic? $$ \begin{eqnarray*} \max \sum_{ij \in E} \frac{1-x_ix_j}{2}\\ x_i &\in &\pm 1 \end{eqnarray*} $$
- Relax integrality? Doesn't matter: $$ \begin{eqnarray*} \max \sum_{ij \in E} \frac{1-x_ix_j}{2}\\ x_i^2 &= &1 \end{eqnarray*} $$
- Unfortunately, this shows quadratic programming is NP-hard, even without integrality constraints.
Vector relaxation:
- Relax $x_i$ to a vector $v_i$
- Product becomes dot product $$ \begin{eqnarray*} \max \sum_{ij \in E} \frac{1-v_i \cdot v_j}{2}\\ \|v_i\| &= &1 \end{eqnarray*} $$
- This is a relaxation, so opt is better.
- Note: no constraints possible on individual coordinates
- Remarkable fact: solving this is tractable!
- Consider matrix of vectors $V$
- Matrix of dot products is $D=V^TV$ (columns of $V$ are the $v_i$)
- In other words, $D$ is positive semidefinite (all eigenvalues nonnegative)
- Conversely, every positive semidefinite matrix $D$ has a “square root” $V$ with $V^TV=D$.
- What does positive semidefinite mean? That $(\forall y) yDy^T \ge 0$
- Can write this down as a linear constraint on $D$! $\sum y_iy_j d_{ij} \ge 0$
- Infinitely many linear constraints, but can optimize anyway by
linear programming techniques.
- For example, finding a negative eigenvector serves as a separation oracle
- best current (interior point) algorithms run in $O(n^{3.5}\log(1/\epsilon))$ time
- 1000s of variables can be solved in practice
- potential problem: irrationals. but we only care about approximating anyway
- Then square root can be computed by “Cholesky Factorization”
- One issue: results in $n$-dimensional vectors, where $n$ is number of constraints. We want boolean values!
- Requiring dimension of vectors to be small makes it NP-hard again.
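The Cholesky step above can be sketched in pure Python (a toy illustration, not a production factorization; the example Gram matrix of three unit vectors at $120^\circ$ is my choice):

```python
import math

def cholesky(D):
    """Cholesky factorization of a symmetric PSD matrix D,
    returning lower-triangular L with L L^T = D."""
    n = len(D)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                # max(..., 0) guards against tiny negative rounding error
                L[i][j] = math.sqrt(max(D[i][i] - s, 0.0))
            else:
                L[i][j] = (D[i][j] - s) / L[j][j] if L[j][j] else 0.0
    return L

# A PSD matrix of dot products (Gram matrix of three unit vectors
# at 120-degree angles in the plane; eigenvalues 0, 3/2, 3/2).
D = [[1.0, -0.5, -0.5],
     [-0.5, 1.0, -0.5],
     [-0.5, -0.5, 1.0]]
L = cholesky(D)
# Rows of L are recovered vectors v_i with v_i . v_j = D[i][j].
dot01 = sum(L[0][k] * L[1][k] for k in range(3))
```

Each row of `L` is one recovered vector; their pairwise dot products reproduce `D`, which is all the rounding step needs.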
Embeddings
- We have put the graph in a high-dimensional space
- Such that $1-v_i \cdot v_j$ is “large” for each edge $ij$
- i.e., $v_i \cdot v_j$ is small
- i.e., $i$ and $j$ are “far apart”
- Draw
Rounding:
- We need to return to $\pm 1$.
- i.e., need a rounding scheme.
- embedding gives large objective to far-apart points
- to keep that value, need to cut far apart items
- Given points in the plane, how separate into two groups such that far apart points get separated?
- Use a random hyperplane
- How generate? Random $n$-dimensional Gaussian vector
- What are the odds $i$ and $j$ are separated?
- Gaussian is “spherically symmetric”: $\Pr[x_1,\ldots,x_n] \propto e^{-\sum x_i^2}$ depends only on radius
- To generate 2 Gaussians, pick a random $\theta$ and exponential squared-radius in polar coordinates, then look at $x$ and $y$
- change coordinates so $i$ and $j$ are in 2-dimensional basis plane
- values of Gaussian in other direction irrelevant
- all that matters is Gaussian within the plane
- i.e., look at the random line the Gaussian generates, check if it separates $v_i$ and $v_j$
- odds depend on angle between
- in particular, probability of separation is $\arccos(v_i \cdot v_j)/\pi$
- So, expected cut value is $\sum_{ij \in E} \arccos(v_i \cdot v_j)/\pi$
- Compare to optimum:
- Our objective function is $\sum \frac{1-v_i\cdot v_j}{2}$
- which is better than opt
- so our approximation is better than worst case ratio
- What is worst case ratio? \[ \min_{-1 \le x < 1} \frac{\arccos(x)/\pi}{(1-x)/2} \ge .878 \]
- In other words, .878 approx to opt.
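The whole rounding scheme can be sketched on a toy instance (a triangle, whose SDP optimum is the three unit vectors at $120^\circ$; the grid search for the constant is a numerical illustration, not a proof):

```python
import math, random

random.seed(0)

def random_hyperplane_round(vectors):
    """Round unit vectors to +/-1 by the sign of the dot product
    with a random Gaussian vector (a uniformly random hyperplane)."""
    n = len(vectors[0])
    r = [random.gauss(0, 1) for _ in range(n)]
    return [1 if sum(ri * vi for ri, vi in zip(r, v)) >= 0 else -1
            for v in vectors]

def cut_value(edges, signs):
    return sum((1 - signs[i] * signs[j]) / 2 for i, j in edges)

# Toy instance: a triangle; the SDP optimum places the three unit
# vectors at 120-degree angles (pairwise dot product -1/2).
vectors = [(1.0, 0.0), (-0.5, math.sqrt(3) / 2), (-0.5, -math.sqrt(3) / 2)]
edges = [(0, 1), (1, 2), (0, 2)]

trials = 2000
avg = sum(cut_value(edges, random_hyperplane_round(vectors))
          for _ in range(trials)) / trials
# Expected cut value: 3 * arccos(-1/2)/pi = 3 * (2/3) = 2 (in fact, no
# half-plane contains all three vectors, so every hyperplane cuts 2 edges).

# The Goemans-Williamson constant, located by a crude grid search:
alpha = min((math.acos(x) / math.pi) / ((1 - x) / 2)
            for x in [i / 10000 - 1 for i in range(1, 19999)])
```

The grid minimum lands near $x \approx -0.69$ with ratio $\approx 0.8786$, matching the $.878$ constant quoted above.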
Summary
- “Semidefinite programming” lets you relax quadratic programs to vector programs
- Gives .878 approximation to max cut
- Can derandomize using conditional expectations
- in practice, solving SDPs is slow and hard, but can be done
- Can you approx better?
- This rounding scheme is tight
- But maybe there is a better SDP?
- 5-cycle has integrality gap $.884\ldots$
- other worse examples converge to .878
- Khot: can't beat $.878$ in $P$ assuming Unique Games Conjecture
- Håstad proved one cannot beat $16/17$ unless P=NP
Graph Coloring
3-coloring
- define
- NP-complete
- no good approximation
- Wigderson: $O(\sqrt{n})$ colors
- explain how: greedy plus large independent sets
Relax:
- $k$-vector coloring: ensure dot products at most $-1/(k-1)$ on edges
- prove for 3-coloring by embedding in plane
- write as semidefinite program
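A quick numeric check of the 3-coloring claim (the planar embedding at $120^\circ$ is the standard one; the code is just a sanity check):

```python
import math

# For k = 3, a valid vector coloring places the three color classes at
# unit vectors 120 degrees apart in the plane: pairwise dot product -1/2.
k = 3
colors = [(math.cos(2 * math.pi * c / 3), math.sin(2 * math.pi * c / 3))
          for c in range(3)]
dots = [sum(a * b for a, b in zip(colors[i], colors[j]))
        for i in range(3) for j in range(i + 1, 3)]
target = -1.0 / (k - 1)   # = -1/2, the k-vector-coloring bound
```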
Rounding?
- Pull out a large independent set, repeat
- how pull independent set? random hyperplane, shifted
- gives independent set of size $n/\Delta^{1/3}$
- so gives $\Delta^{1/3}$ by repeated removal
- so get $n^{1/3}$
- Improve with Wigderson:
- If $\Delta > n^{3/4}$, remove a neighbor set
- Consumes at most $n^{1/4}\log n$ colors
- when $\Delta < n^{3/4}$, use semidefinite
- Result: $O(n^{1/4}\log n)$ colors.
Finding the independent set:
- Pick a random $r$, and take all vertices $v$ with $r\cdot v \ge c$
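A minimal sketch of the cap-rounding step, on a hypothetical instance (the graph, vectors, threshold, and greedy cleanup of violated edges are my illustration; the real analysis chooses $c$ to balance cap size against expected violations):

```python
import math, random

random.seed(1)

# A 3-colorable graph: complete tripartite K_{2,2,2}, each vertex
# assigned its color class's unit vector (120 degrees apart in the plane).
groups = [(0, 1), (2, 3), (4, 5)]
vec = {}
for c, grp in enumerate(groups):
    v = (math.cos(2 * math.pi * c / 3), math.sin(2 * math.pi * c / 3))
    for u in grp:
        vec[u] = v
edges = [(u, w) for a in range(3) for b in range(a + 1, 3)
         for u in groups[a] for w in groups[b]]

def cap_independent_set(vec, edges, c=0.5):
    """Take all vertices with r . v >= c for a random Gaussian r,
    then discard an endpoint of any surviving edge (cleanup step)."""
    r = [random.gauss(0, 1) for _ in range(2)]
    S = {u for u, v in vec.items() if sum(a * b for a, b in zip(r, v)) >= c}
    for u, w in edges:
        if u in S and w in S:
            S.discard(w)
    return S

S = cap_independent_set(vec, edges)
independent = all(not (u in S and w in S) for u, w in edges)
```

Because adjacent vertices have dot product $-1/2$, both endpoints rarely land in the cap, so little is lost in cleanup.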
Min Ratio Cut
Lead-In: Max-Flow Min-Cut
Probably cover after introducing ratio cut.
Find $s$-$t$ mincut
- LP-relaxation
- $z_v$ indicator of being on $t$-side of cut
- $y_{vw}$ indicator of edge being cut
- constraint: if edge not cut, endpoints must have same indicator $$ \begin{eqnarray*} D&=&\min \sum_{vw} y_{vw} u_{vw}\\ y_{vw} &\ge &0\\ z_w &\le &z_v + y_{vw}\\ z_t &= &1\\ z_s &= &0 \end{eqnarray*} $$
- Relax?
- Think of $y_{vw}$ as “lengths”
- Main constraint is then a triangle inequality
- Which means $z_v$ are distances from $s$
- Randomized rounding:
- pick a random distance in $(0,1)$
- Expected number of cut edges is equal to objective
- i.e. round with approx ratio 1!
- so in fact, any distance must work.
- each cut costs at least the LP value (a lower bound), and the average equals the LP value, so no cut can go above it either.
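A toy check of threshold rounding (the graph, capacities, and the optimal LP solution $z$ are hard-coded by hand here, not produced by an LP solver):

```python
# Tiny instance: s -> a -> t plus a direct edge s -> t, with capacities u.
cap = {('s', 'a'): 1, ('a', 't'): 2, ('s', 't'): 2}
z = {'s': 0.0, 'a': 1.0, 't': 1.0}   # an optimal LP solution (value 3)
# Tight y values: y_vw = max(0, z_w - z_v).
y = {e: max(0.0, z[e[1]] - z[e[0]]) for e in cap}
lp_value = sum(y[e] * cap[e] for e in cap)

def threshold_cut(T):
    """Cut every edge whose tail is at distance <= T < head."""
    return sum(cap[(v, w)] for (v, w) in cap if z[v] <= T < z[w])

cuts = [threshold_cut(t / 10) for t in range(10)]  # T = 0.0, 0.1, ..., 0.9
```

Every threshold yields cost exactly `lp_value`, illustrating the ratio-1 rounding.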
Motivation
Common task: computation/simulation over a mesh.
- vertices have state
- state influences neighbors
- whole system evolves over time
- examples: weather simulation, wind tunnel, differential equations
Suppose you want to parallelize
- Update state by checking neighbors
- Each node handled by some processor
- if neighbors on other processors, communication required
- Goal: good partition of nodes to processors
- Require: same number of nodes per processor (balanced work)
- Require: small number of neighbor links across processors (low communication)
Graph bisection
- Special case for 2 processors
- Balanced cut---half vertices on each
- Minimize number of crossing edges.
$b$-balanced separator
- Loosen up a bit
- Require only $\ge bn$ vertices per side, for some $b\le 1/2$
Divide and conquer
- Given hard graph problem, find separator/bisection
- Solve each half
- patch clumsily at cut
- most of problem in pieces
- so clumsy patch doesn't add much cost
- But patch paid at each level of divide and conquer---adds up
- So divide and conquer must finish fast
- So needs to be balanced
- Example: min linear arrangement (VLSI)
- line up nodes to minimize edges crossing worst perpendicular cut
- Given opt solution, half-way down is a balanced cut
- so min balanced cut no worse
- find, lay out two halves on left and right side of line recursively
- worst cut on LHS combines edges entirely in LHS with edges crossing to RHS
- So, worst cut is less than worst(LHS,RHS)+bisection
- So is $O(\log n \cdot \mbox{bisection})=O(\log n \cdot \mbox{linear arrangement})$.
Ratio Cut
Definition:
- partition $(S,S')$ minimizing $$ \begin{eqnarray*} \frac{|(S \times S') \cap E|}{|S|\cdot|S'|} \end{eqnarray*} $$
- Also known as “sparsest cut”.
- since the larger side has at least $n/2$ vertices, this is $\Theta(\mbox{crossing}/(n\cdot \mbox{smaller side size}))$
- i.e. proportional to average crossing degree of smaller side
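Brute force makes the definition concrete on a small example (a 6-cycle; exponential enumeration, for illustration only):

```python
from itertools import combinations

# Brute-force the ratio (sparsest) cut of a 6-cycle.
n = 6
edges = [(i, (i + 1) % n) for i in range(n)]

def ratio(S):
    """Crossing edges divided by |S| * |S'|."""
    size_c = n - len(S)
    crossing = sum(1 for u, w in edges if (u in S) != (w in S))
    return crossing / (len(S) * size_c)

best = min(ratio(set(c)) for k in range(1, n)
           for c in combinations(range(n), k))
```

For the 6-cycle the optimum splits it into two paths of 3: crossing $2$, denominator $9$, ratio $2/9$.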
Use to find separator
- suppose bisection of value $b$
- can find $(1/3,2/3)$ separator of value $O(b\log n)$
- to do so, approx sparsest cut
- Remove smaller side
- Repeat till larger side smaller than $2/3 n$
why good?
- consider what remains of opt bisection on large side
- no more edges
- each side still has $n/6$ vertices
- so good ratio
- so always remove good ratio
- so total removed has good ratio
- LP formulation: generalize $s$-$t$ min-cut
- indicator variable $y_{vw}$ says edge $vw$ cut
- indicator variable $z_{vw}$ says $v$, $w$ separated by cut
- “triangle inequality” constraints relate $z_{vw}$ to the $y_{vw}$
- want to minimize ratio $\sum y_{vw}/\sum z_{vw}$
Relaxation:
- instead of minimizing ratio, minimize numerator while requiring denominator exceed $1$ (equivalent by scaling).
- Assign lengths to edges such that
- Sum of pairwise distances is 1 (or more---doesn't hurt)
- Sum of edge lengths is minimized
- Why is this a relaxation?
- Consider any $S$
- Assign length $1/(|S|\cdot|S'|)$ to each edge in cut
- Sum of pairwise distance is that times number of cut pairs, i.e. 1
- Sum of lengths is value of that ratio cut
- So LP opt below min ratio cut
- Write as LP
- edge lengths $d_{ij}$
- distances $d_{ij} \ge 0$
- triangle constraints $d_{ik} \le d_{ij}+d_{jk}$
- constrain $\sum d_{ij} \ge 1$ (over all pairs)
- minimize “volume” (over existing edges) $\sum_{ij \in E}d_{ij}$
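The relaxation argument can be checked mechanically on an example (a 6-cycle and one cut $S$, both my choice):

```python
from itertools import combinations

# Check on a 6-cycle: the scaled cut metric of (S, S') is LP-feasible
# and its objective equals the cut's ratio.
n = 6
edges = [(i, (i + 1) % n) for i in range(n)]
S = {0, 1, 2}
scale = 1.0 / (len(S) * (n - len(S)))
d = {(i, j): (scale if (i in S) != (j in S) else 0.0)
     for i in range(n) for j in range(n)}

pair_sum = sum(d[(i, j)] for i, j in combinations(range(n), 2))  # should be 1
edge_sum = sum(d[(i, j)] for i, j in edges)                      # LP objective
triangle_ok = all(d[(i, k)] <= d[(i, j)] + d[(j, k)]
                  for i in range(n) for j in range(n) for k in range(n))
cut_ratio = (sum(1 for u, w in edges if (u in S) != (w in S))
             / (len(S) * (n - len(S))))
```

So the LP optimum is at most the minimum cut ratio, exactly as argued above.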
Embeddings
Goal
- Above relaxation turns the graph into a metric space.
- A cut is also a metric, but a special kind
- We want to “round” arbitrary metric to cut metric
How do we measure error in rounding?
- An embedding $\phi$ from $(X,d)$ to $(X',d')$
- is a contraction if $d'(\phi(u),\phi(v)) \le d(u,v)$
- has distortion $c=\max(d(u,v)/d'(\phi(u),\phi(v)))$
- is an isometry if distortion 1, i.e. $d'(\phi(u),\phi(v)) = d(u,v)$
- Standard Metrics over $\Re^n$
- $\ell_1(x,y) = \sum |x_i-y_i|$
- $\ell_2(x,y) = \sqrt{\sum (x_i-y_i)^2}$
- $\ell_\infty(x,y) = \max |x_i-y_i|$
- $\ell_p(x,y) = ({\sum (x_i-y_i)^p})^{1/p}$
Every metric (isometric) embeddable in $\ell_\infty$
- Just make one coordinate for each point in $X$
- Set $\phi(u)_x = d(u,x)$
- Then $\phi(u)_x-\phi(v)_x = d(u,x)-d(v,x) \le d(u,v)$ by triangle inequality
- But $\phi(u)_u = 0$ while $\phi(v)_u = d(u,v)$
- So max is $d(u,v)$
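This Fréchet-style construction is short enough to verify directly (the path metric is an arbitrary example):

```python
import itertools

# Frechet embedding of a finite metric into l_infinity: one coordinate
# per point, phi(u)_x = d(u, x); the embedding is an isometry.
points = ['a', 'b', 'c', 'd']
# Shortest-path metric of the unit-length path a - b - c - d.
d = {(p, q): abs(points.index(p) - points.index(q))
     for p in points for q in points}

phi = {u: [d[(u, x)] for x in points] for u in points}

def linf(x, y):
    return max(abs(a - b) for a, b in zip(x, y))

isometric = all(linf(phi[u], phi[v]) == d[(u, v)]
                for u, v in itertools.combinations(points, 2))
```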
Isometric Embedding in $\ell_2$ tractable
- suppose embeddings into $v_1,v_2,\ldots$
- wlog map $v_1=0$
- $d^2_{ij} = (v_i-v_j)^2 = v_i^2+v_j^2-2v_iv_j$
- But $v_i^2 = (v_i-v_1)^2 = d_{1i}^2$
- So $v_i\cdot v_j = \frac12(d_{1j}^2+d_{1i}^2-d^2_{ij})$
- Write matrix of dot products
- take Cholesky factorization
- Similarly, can find minimum distortion by optimizing SDP.
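The dot-product identity can be checked against actual points in the plane (the points are an arbitrary example; Cholesky of the resulting Gram matrix would then recover an embedding):

```python
import math

# Check the Gram-matrix identity behind l_2 embedding: with v_1 mapped
# to the origin, v_i . v_j = (d_{1i}^2 + d_{1j}^2 - d_{ij}^2) / 2.
pts = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0), (3.0, 4.0)]   # v_1 at origin

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

n = len(pts)
gram = [[(dist(pts[0], pts[i]) ** 2 + dist(pts[0], pts[j]) ** 2
          - dist(pts[i], pts[j]) ** 2) / 2 for j in range(n)]
        for i in range(n)]
actual = [[pts[i][0] * pts[j][0] + pts[i][1] * pts[j][1] for j in range(n)]
          for i in range(n)]
match = all(abs(gram[i][j] - actual[i][j]) < 1e-9
            for i in range(n) for j in range(n))
```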
We are actually going to round via embedding in $\ell_1$:
- Suppose have $\ell_1$ embedding with ratio $\beta$ between “edge volume” (sum of edge lengths) and “all-pairs volume” (sum of all-pairs distances)
- i.e. such that edge-volume $\le \beta \cdot$ total-volume (note $\beta< 1$)
- Claim can round to cut with same ratio
- For coordinate $j$ define $\delta_j^{u,v}(t) = 1$ iff $t$ between $u_j$ and $v_j$, and 0 otherwise.
- Each $j,t$ defines a cut (exactly like $s$-$t$ cut case)
- Integrating over $t$ gives $|u_j-v_j|$
- So summing over dimensions $j$ gives $\ell_1$ metric sum (for edge or total volume)
- know edge-volume $\le \beta \cdot$ total-volume
- So must be some specific $t,j$ where this ratio achieved
- i.e. $\int f(x) \le \beta \int g(x)$
- so know $f(x) \le \beta g(x)$ somewhere
- This is a cut achieving the ratio.
- Actually, don't need to integrate, since $\delta$ is constant except at $n$ places ($j$ coords of points)
- so if our LP gave us an $\ell_1$ metric we'd be done
- unfortunately it gives us an arbitrary metric.
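The cut-decomposition argument in one dimension, on a made-up line embedding (coordinates and edges are illustrative):

```python
from itertools import combinations

# l_1 rounding in one dimension: points on a line, threshold cuts.
# If edge-volume <= beta * total-volume, some threshold cut has
# cut-ratio <= beta.
coord = {0: 0.0, 1: 0.1, 2: 0.9, 3: 1.0, 4: 1.1}
edges = [(0, 1), (2, 3), (3, 4), (1, 2)]

edge_vol = sum(abs(coord[u] - coord[v]) for u, v in edges)
total_vol = sum(abs(coord[u] - coord[v])
                for u, v in combinations(coord, 2))
beta = edge_vol / total_vol

def cut_ratio(t):
    """Ratio of the cut separating points left of threshold t."""
    S = {u for u in coord if coord[u] <= t}
    Sc = set(coord) - S
    if not S or not Sc:
        return float('inf')
    crossing = sum(1 for u, v in edges if (u in S) != (v in S))
    return crossing / (len(S) * len(Sc))

# Only thresholds at the coordinates matter (the delta functions are
# piecewise constant), so check each point's coordinate.
best = min(cut_ratio(coord[u]) for u in coord)
```

Some threshold cut always achieves ratio at most `beta`, which is the averaging claim above.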
Embedding
- If our metric were $\ell_1$, done
- So, try embedding
- Bourgain: can embed any metric into $\ell_1$ with $O(\log n)$ distortion, using $\log^2 n$ dimensions (dimension unimportant)
- Conclude $O(\log n)$ approx for ratio cut
Algorithm
- Define embedding
- For each scale $t = 1,\ldots,\log n$, pick $L=O(\log n)$ random subsets of size $n/2^t$
- Gives collection $A_i$ of $L\log n$ sets
- Set $\phi(u)_i = d(u,A_i)$
- then divide by $L\log n$ (scaling)
- this is like $\ell_{\infty}$ embedding, but distance to sets instead of points
- This is a contraction:
- contraction for any one $A$: if $s \in A$ determines $d(u,A)$, and $t\in A$ determines $d(v,A)$, then $$ \begin{eqnarray*} d(u,s) &\le & d(u,t)\\ &\le &d(u,v)+d(v,t)\\ d(u,s)-d(v,t) &\le &d(u,v)\\ d(u,A)-d(v,A) &\le &d(u,v) \end{eqnarray*} $$
- so each coordinate is at most $d(u,v)$
- dividing by $L\log n$ coordinates scales to a contraction.
- Distortion:
- we need to prove it doesn't contract too much
- let $\rho_t$ be minimum radius such that both $B(u,\rho_t)$ and $B(v,\rho_t)$ have $2^t$ points in them.
- if $\rho_t < d_{uv}/3$ then these radii define disjoint balls
- wlog, $B(u,\rho_t)$ has exactly $2^t$ points;
- meanwhile, $B(v,\rho_{t-1})$ has at least $2^{t-1}$ points
- then with constant probability, a sample of size $n/2^t$ has points in $B(v,\rho_{t-1})$ but not $B(u,\rho_t)$
- in other words, distance from $v$ is less than $\rho_{t-1}$ but from $u$ is more than $\rho_t$
- so contributes at least $\rho_t-\rho_{t-1}$ to difference in $\phi_i$ coordinate
- we pick $L$ samples, so a Chernoff bound says we are near expectation whp
- true for all $t$ until $\rho_t \ge d(u,v)/3$
- so, summing over all $t$, difference in coords is $\Omega(L\sum_t (\rho_t-\rho_{t-1})) = \Omega(L d_{uv})$ (the sum telescopes)
- we then scale by $1/(L\log n)$
- so distance at least $\Omega(d(u,v)/\log n)$
- Note: $L$ didn't matter for distortion, just dimension
- But we do achieve $O(\log^2 n)$-dimensional embedding
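The embedding and its contraction property can be sketched directly (path metric on 16 points as a stand-in for an arbitrary metric; the distortion bound is probabilistic, so only the deterministic contraction side is asserted):

```python
import math, random
from itertools import combinations

random.seed(2)

# Bourgain-style embedding: coordinates are scaled distances to random
# subsets at geometric scales. The scaled map is a contraction in l_1.
n = 16
V = list(range(n))
d = {(u, v): abs(u - v) for u in V for v in V}   # path metric on 16 points

L = max(1, round(math.log2(n)))                  # samples per scale
scales = int(math.log2(n))                       # t = 1 .. log n
subsets = []
for t in range(1, scales + 1):
    size = max(1, n // 2 ** t)
    for _ in range(L):
        subsets.append(random.sample(V, size))

def phi(u):
    # Distance to each random subset, scaled by 1 / (L log n).
    return [min(d[(u, a)] for a in A) / len(subsets) for A in subsets]

def l1(x, y):
    return sum(abs(a - b) for a, b in zip(x, y))

# Each coordinate satisfies |d(u,A) - d(v,A)| <= d(u,v), so after
# dividing by the number of coordinates the l_1 distance contracts.
contraction = all(l1(phi(u), phi(v)) <= d[(u, v)] + 1e-9
                  for u, v in combinations(V, 2))
```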
Gap is tight:
- Take constant degree expander
- Set edge lengths $1/\log n$
- Most pairs are $\Omega(\log n)$ edges apart, hence at distance $\Omega(1)$
- So sum of pairwise distances is $\Omega(n^2)$
- While sum of edge lengths is only $O(n/\log n)$ ($O(n)$ edges of length $1/\log n$)
- So the LP ratio is $O(1/(n\log n))$
- But any cut with $k$ vertices on small side has $\Omega(k)$ edges leaving
- So best cut ratio is $\Omega(k/kn)=\Omega(1/n)$
- So $\log n$ gap in our rounding is tight.
Extensions
- Generalization to multicommodity flow/cut
- $\Theta(\log k)$ gap for $k$ commodities
- Arora, Rao, and Vazirani use a more sophisticated SDP approach to approximate ratio cut to $O(\sqrt{\log n})$
- Other powerful metric embedding methods for near neighbor search, tree metrics for online algorithms