Max Cut by Semidefinite Programming
Quadratic formulation
- No obvious linear
- How about quadratic? \begin{eqnarray*} \min \sum_{ij} \frac{1-x_ix_j}{2}\\ x_i \in \pm 1 \end{eqnarray*}
- Relax integrality? Doesn't matter: \begin{eqnarray*} \min \sum_{ij \in E} \frac{1-x_ix_j}{2}\\ x_i^2 &= &1 \end{eqnarray*}
- Unfortunately, proves quadratic programming is NP-complete, even nonintegral.
Vector relaxation:
- Relax x_i to a vector v_i
- Product becomes dot product \begin{eqnarray*} \min \sum_{ij \in E} \frac{1-v_i \cdot v_j}{2}\\ \|v_i\| &= &1 \end{eqnarray*}
- This is a relaxation, so opt is better.
- Note: no constraints possible on individual coordinates
- Remarkable fact: solving this is tractable!
- Consider matrix of vectors V
- Matrix of dot products is D=V^2
- In other words, D is positive semidefinite (all eigenvalues nonnegative)
- Conversely, every positive semidefinite matrix D has a “square root” V with V^2=D.
- What does positive semidefinite mean? That (\forall y) yDy^T \ge 0
- Can write this down as a linear constraint on D! \sum y_iy_j d_{ij} \ge 0
- Infinitely many linear constraints, but can optimize anyway by
linear programming techniques.
- For example, finding a negative eigenvector serves as a separation oracle
- best current (interior point) algorithms have O(n^{3.5}\log 1/\epsilon)
- 1000s of variables can be solved in practice
- potential problem: irrationals. but we only care about approximating anyway
- Then square root can be computed by “Cholesky Factorization”
- One issue: results in n-dimensional vectors, where n is number of constraints. We want boolean values!
- Requiring dimension of vectors to be small makes it NP-hard again.
Embeddings
- We have put graph in high dimensional space
- Such the 1-v_i \cdot v_j is “large” for edge ij
- i.e., v_i \cdot v_j is small
- i.e., i and j are “far apart”
- Draw
Rounding:
- We need to return to \pm 1.
- i.e., need a rounding scheme.
- embedding gives large objective to far-apart points
- to keep that value, need to cut far apart items
- Given points in the plane, how separate into two groups such that far apart points get separated?
- Use a random hyperplane
- How generate? Random n-dimensional Gaussian vector
- What is odds i and j are separated?
- Gaussian is “spherically symmetric”: \Pr[x_1,\ldots,x_n] \propto e^{-\sum x_i^2} depends only on radius
- To generate 2 Gaussians, pick a random \theta and exponential squared-radius in polar coordinates, then look at x and y
- change coordinates so i and j are in 2-dimensional basis plane
- values of Gaussian in other direction irrelevant
- all that matters is Gaussian within the plane
- i.e., look at the random line the gausian generates, check if separates v_i and v_j
- odds depend on angle between
- in particular, probability of separation is \arccos(v_i \cdot v_j)/\pi
- So, expected cut value is \sum \arccos(v_i \cdot v_j)/\pi
- Compare to optimum:
- Our objective function is \sum \frac{1-v_i\cdot v_j}{2}
- which is better than opt
- so our approximation is better than worst case ratio
- What is worst case ratio? \min \frac{\arccos(x)/\pi}{(1-x)/2} \ge .878
- In other words, .878 approx to opt.
Summary
- “Semidefinite programming” lets you relax quadratic programs to vector programs
- Gives .878 approximation to max cut
- Can derandomize using conditional expectations
- in practice, solving SDPs is slow and hard, but can be done
- Can you approx better?
- This rounding scheme is tight
- But maybe there is a better SDP?
- 5-cycle has integrality gap .884\ldots
- other worse examples converge to .878
- Khot: can't beat .878 in P assuming Unique Games Conjecture
- Hastad proved cannot beat 16/17 unless P=NP
2 hours from here to end
Graph Coloring
3-coloring
- define
- NP-complete
- no good approximation
- wigderson: \sqrt{n}
- explain how: greedy plus large independent sets
Relax:
- k-vector coloring: ensure dot products -1/(k-1)
- prove for 3-coloring by embedding in plane
- write as semidefinite program
Rounding?
- Pull out a large independent set, repeat
- how pull independent set? random hyperplane, shifted
- gives independent set of size n/\Delta^{1/3}
- so gives \Delta^{1/3} by repeated removal
- so get n^{1/3}
- Improve with wigderson:
- If \Delta > n^{3/4}, remove a neighbor set
- Consumes at most n^{1/4}\log n colors
- when \Delta < n^{3/4}, use semidefinite
- Result: O(n^{1/4}\log n) colors.
Finding the independent set:
- Pick a random r, and take all vertices v with r\cdot v \ge c
Min Ratio Cut
Lead-In: Max-Flow Min-Cut
Probably cover after introducing ratio cut.
Find s-t mincut
- LP-relaxation
- z_v indicator of being on t-side of cut
- y_{vw} indicator of edge being cut
- constraint: if edge not cut, endpoints must have same indicator \begin{eqnarray*} D&=&\min \sum_{vw} y_{vw} u_{vw}\\ y_{vw} &\ge &0\\ z_w &\le z_v + y_{vw}\\ z_t &= &1\\ z_s &= 0 \end{eqnarray*}
- Relax?
- Think of y_{vw} as “lengths”
- Main constraint is then a triangle inequality
- Which means z_v are distances from s
- Randomized rounding:
- pick a random distance in (0,1)
- Expected number of cut edges is equal to objective
- i.e. round with approx ratio 1!
- so in fact, any distance must work.
- never go below 1, so cannot go above it either.
Motivation
Common task: computation/simulation over a mesh.
- vertices have state
- state influences neighbors
- whole system evolves over time
- examples: weather simulation, wind tunnel, differential equations
Suppose you want to parallelize
- Update state by checking neighbors
- Each node handled by some processor
- if neighbors on other processors, communication required
- Goal: good partition of nodes to processors
- Require: same number of neighbors per processor (balanced work)
- Require: small number of neighbor links across processors (low communication)
Graph bisection
- Special case for 2 processors
- Balanced cut---half vertices on each
- Minimize number of crossing edges.
b-balanced separator
- Loosen up a bit
- Require only \ge bn vertices per side, for some b\le 1/2
Divide and conquer
- Given hard graph problem, find separator/bisection
- Solve each half
- patch clumsily at cut
- most of problem in pieces
- so clumsy patch doesn't add much cost
- But patch paid at each level of divide and conquer---adds up
- So divide and conquer must finish fast
- So needs to be balanced
- Example: min linear arrangement (VLSI)
- line up nodes to minimize edges crossing worst perpendicular cut
- Given opt solution, half-way down is a balanced cut
- so min balanced cut no worse
- find, lay out two halves on left and right side of line recursively
- worst cut on LHS combines edges entirely in LHS with edges crossing to RHS
- So, worst cut is less than worst(LHS,RHS)+bisection
- So is O(\log n \cdot bisection)=O(\log n \cdot )linear arrangement).
Ratio Cut
Definition:
- partition (S,S') minimizing \begin{eqnarray*} \frac{|(S \times S') \cap E|}{|S|\cdot|S'|} \end{eqnarray*}
- Also known as “sparsest cut”.
- since larger size at least n/2, this is \Theta(\mbox{crossing}/n\cdot \mbox{smaller side vertices})
- i.e. proportional to average crossing degree of smaller side
Use to find separator
- suppose bisection of value b
- can find (1/3,2/3) separator of value O(b\log n)
- to do so, approx sparsest cut
- Remove smaller side
- Repeat till larger side smaller than 2/3 n
why good?
- consider what remains of opt bisection on large side
- no more edges
- each side still has n/6 vertices
- so good ratio
- so always remove good ratio
- so total removed has good ratio
- generalize s-t min-cut
- indicator variable y_{vw} says edge vw cut
- indicator variable z_{vw} says v, w separated by cut
- “triangle inequalities” constraints z_{vw} based on y_{vw}
- want to minimize ratio \sum y_{vw}/\sum z_{vw}
Relaxation:
- instead of minimizing ratio, minimize numerator while requiring denominator exceed 1 (equivalent by scaling).
- Assign lengths to edges such that
- Sum of pairwise distances is 1 (or more---doesn't hurt)
- Sum of edge lengths is minimized
- Why is this a relaxation?
- Consider any S
- Assign length 1/|S|\cdot|S'| to each edge in cut
- Sum of pairwise distance is that times number of cut pairs, i.e. 1
- Sum of lengths is value of that ratio cut
- So LP opt below min ratio cut
- Write as LP
- edge lengths d_{ij}
- distances d_{ij} \ge 0
- triangle constraints d_{ik} \le d_{ij}+d_{jk}
- constrain \sum d_{ij} \ge 1 (over all pairs)
- minimize “volume” (over existing edges) \sum_{ij \in E}y_{ij}
Embeddings
Goal
- Above relaxation gives turns graph into a metric space.
- A cut is also a metric, but a special kind
- We want to “round” arbitrary metric to cut metric
How do we measure error in rounding?
- An embedding \phi from (X,d) to (X',d')
- is a contraction if d'(\phi(u),\phi(v)) \le d(u,v)
- has distortion c=\max(d(u,v)/d'(\phi(u),\phi(v)))
- is an isometry if distortion 1, i.e. d'(\phi(u),\phi(v)) = d(u,v)
- Standard Metrics over \Re^n
- \ell_1(x,y) = \sum |x_i-y_i|
- \ell_2(x,y) = \sqrt{\sum (x_i-y_i)^2}
- \ell_\infty(x,y) = \max |x_i-y_i|
- \ell_p(x,y) = ({\sum (x_i-y_i)^p})^{1/p}
Every metric (isometric) embeddable in \ell_\infty
- Just make one coordinate for each point in X
- Set \phi(u)_x = d(u,x)
- Then \phi(u)_x-\phi(v)_x = d(u,x)-d(v,x) \le d(u,v) by triangle inequalty
- But \phi(v)_u = d(u,v)
- So max is d(u,v)
Isometric Embedding in \ell_2 tractable
- suppose embeddings into v_1,v_2,\ldots
- wlog map v_1=0
- d^2_{ij} = (v_i-v_j)^2 = v_i^2+v_j^2-2v_iv_j
- But v_i^2 = (v_i-v_1)^2 = d_{i0}^2
- So v_i\cdot v_j = \frac12(d_{1j}^2+d_{1i}^2-d^2_{ij})
- Write matrix of dot products
- take Cholesky factorization
- Similarly, can find minimum distortion by optimizing SDP.
We are actually going to round via embedding in \ell_1:
- Suppose have \ell_1 embedding with ratio \beta between “edge volume” (sum of edge lengths) and “all-pairs volume” (sum of all-pairs distances)
- i.e. such that edge-volume \le \beta \cdot total-volume (note \beta< 1)
- Claim can round to cut with same ratio
- For coordinate j define \delta_j^{u,v}(t) = 1 iff t between u_j and v_j, and 0 otherwise.
- Each j,t defines a cut (exactly like s-t cut case)
- Integrating over t gives |u_j-v_j|
- So summing over dimensions j gives \ell_1 metric sum (for edge or total volume)
- know edge-volume \le \beta \cdot total-volume
- So must be some specific t,j where this ratio achieved
- i.e \int f(x) \le \beta \int g(x)
- so know f(x) \le \beta g(x) somewhere
- This is a cut achieving the ratio.
- Actually, don't need to integrate, since \delta is constant except at n places (j coords of points)
- so if our LP gave us an \ell_1 metric we'd be done
- unfortunately it gives us an arbitrary metric.
Embedding
- If our metric were \ell_1, done
- So, try embedding
- Bourgain: can embed any metric into \log^2n dimensions (unimportant) with O(\log n) distortion
- Conclude O(\log n) approx for ratio cut
Algorithm
- Define embedding
- For each power of 2, pick L=O(\log n) random subsets of size n/2^t
- Gives collection A_i of L\log n sets
- Set \phi(u)_i = d(u,A)
- then divide by L\log n (scaling)
- this is like \ell_{\infty} embedding, but distance to sets instead of points
- This is a contraction:
- contraction for any one A: if s \in A determines d(u,A), and t\in A determines d(v,A), then \begin{eqnarray*} d(u,s) &\le & d(u,t)\\ &\le &d(u,v)+d(v,t)\\ d(u,s)-d(v,t) &\le &d(u,v)\\ d(u,A)-d(v,A) &\le &d(u,v) \end{eqnarray*}
- so each coordinate is at most d(u,v)
- dividing by L\log n coordinates scales to a contraction.
- Distortion:
- we need to prove it doesn't contract too much
- let \rho_t be minimum radius such that both B(u,\rho_t) and B(v,\rho_t) have 2^t points in them.
- if \rho_t < d_{uv}/3 then these radii define disjoint balls
- wlog, B(u,\rho_t) has exactly 2^t points;
- meanwhile, B(v,\rho_{t-1}) has at least 2^{t-1} points
- then with constant probability, a sample of size n/2^t has points in B(v,\rho_{t-1}) but not B(u,\rho_t)
- in other words, distance from u is less than \rho_t but from v is more than \rho_{t-1}
- so contributes at least \rho_t-\rho_{t-1} to difference in \phi_i coordinate
- we pick L samples, so chernoff bound says we are near expectation whp
- true for all t until \rho_t \ge d(u,v)/3
- so, summing over all i, difference in coords is \Omega(L\sum \rho_t-\rho_{t-1}) = \Omega(L d_{uv})
- we then scale by 1/L\log n
- so distance at least d(u,v)/\log n
- Note: L didn't matter for distortion, just dimension
- But we do achieve O(\log^2 n)-dimensional embedding
Gap is tight:
- Take constant degree expander
- Set edge lengths 1/\log n
- Most vertices \Omega(\log n) distant
- So sum of distances is \Omega(n^2)
- While sum of edge lengths is only O(n\log n)
- So ratio is O((\log n)/n)
- But any cut with k vertices on small side has \Omega(k) edges leaving
- So best cut ratio is \Omega(k/kn)=\Omega(1/n)
- So \log n gap in our rounding is tight.
Extensions
- Generalization to multicommodity flow/cut
- \Theta(\log k) gap for k commodities
- Arora Rao Vazirani use a more sophisticated SDP approach to approximate ration cut to O(\sqrt{\log n})
- Other powerful metric embedding methods for near neighbor search, tree metrics for online algorithms