Max Cut by Semidefinite Programming
Quadratic formulation
- No obvious linear programming formulation
- How about quadratic? max Σ_{ij∈E} (1 − x_i x_j)/2 subject to x_i ∈ {±1}
- Relax integrality? Doesn't matter: max Σ_{ij∈E} (1 − x_i x_j)/2 subject to x_i² = 1 is the same problem
- Unfortunately, this shows quadratic programming is NP-hard, even without integrality constraints.
Vector relaxation:
- Relax xi to a vector vi
- Product becomes dot product: max Σ_{ij∈E} (1 − v_i·v_j)/2 subject to ‖v_i‖ = 1
- This is a relaxation (take v_i = (x_i, 0, …, 0)), so its optimum is at least the max-cut value.
- Note: no constraints possible on individual coordinates
- Remarkable fact: solving this is tractable!
- Consider the matrix V whose columns are the vectors v_i
- Matrix of dot products is D = VᵀV
- In other words, D is positive semidefinite (all eigenvalues nonnegative)
- Conversely, every positive semidefinite matrix D has a “square root” V with VᵀV = D.
- What does positive semidefinite mean? That (∀y) yᵀDy ≥ 0
- Can write this down as a linear constraint on D: Σ y_i y_j d_ij ≥ 0 for each fixed y
- Infinitely many linear constraints, but can optimize anyway by
linear programming techniques.
- For example, an eigenvector with negative eigenvalue serves as a separation oracle
- best current (interior point) algorithms run in O(n^3.5 log(1/ε)) time
- 1000s of variables can be solved in practice
- potential problem: irrationals. but we only care about approximating anyway
- Then square root can be computed by “Cholesky Factorization”
- One issue: results in n-dimensional vectors, where n is the number of vertices. We want boolean values!
- Requiring dimension of vectors to be small makes it NP-hard again.
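The PSD facts above can be checked numerically. A minimal numpy sketch (the matrix D below is a hypothetical dot-product matrix: three unit vectors in the plane at 120°; we factor by eigendecomposition, which, unlike strict Cholesky, also handles singular PSD matrices):

```python
import numpy as np

# Hypothetical dot-product matrix: three planar unit vectors at 120 degrees,
# so every off-diagonal dot product is -1/2.
D = np.array([[1.0, -0.5, -0.5],
              [-0.5, 1.0, -0.5],
              [-0.5, -0.5, 1.0]])

# Positive semidefinite <=> all eigenvalues nonnegative.
eigenvalues = np.linalg.eigvalsh(D)
assert eigenvalues.min() >= -1e-9

# "Square root": factor D = V^T V via the eigendecomposition (Cholesky
# requires strict positive definiteness; this D is singular).
w, U = np.linalg.eigh(D)
V = np.diag(np.sqrt(np.clip(w, 0, None))) @ U.T   # columns are the v_i

# The recovered vectors reproduce the dot products exactly.
assert np.allclose(V.T @ V, D)
```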
Embeddings
- We have put the graph in high-dimensional space
- such that (1 − v_i·v_j)/2 is “large” for each edge ij
- i.e., v_i·v_j is small (near −1)
- i.e., i and j are “far apart”
- Draw
Rounding:
- We need to return to ±1.
- i.e., need a rounding scheme.
- embedding gives large objective to far-apart points
- to keep that value, need to cut far apart items
- Given points in the plane, how do we separate them into two groups so that far-apart points end up on opposite sides?
- Use a random hyperplane
- How to generate one? Pick a random n-dimensional Gaussian vector as its normal
- What are the odds that i and j are separated?
- Gaussian is “spherically symmetric”: the density Pr[x_1,…,x_n] ∝ exp(−Σ x_i²/2) depends only on the radius
- To generate 2 Gaussians, pick a random angle θ and an exponentially distributed squared radius in polar coordinates, then read off x and y
- change coordinates so i and j are in 2-dimensional basis plane
- values of Gaussian in other direction irrelevant
- all that matters is Gaussian within the plane
- i.e., look at the random line the Gaussian generates within the plane, check whether it separates v_i and v_j
- odds depend on angle between
- in particular, probability of separation is arccos(v_i·v_j)/π
- So, expected cut value is Σ_{ij∈E} arccos(v_i·v_j)/π
- Compare to optimum:
- Our objective function is Σ_{ij∈E} (1 − v_i·v_j)/2
- which is at least opt (it's a relaxation)
- so our approximation ratio is at least the worst-case per-edge ratio
- What is that worst-case ratio? min_{−1≤x≤1} [arccos(x)/π] / [(1−x)/2] ≥ .878
- In other words, .878 approx to opt.
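The whole pipeline can be sketched on a toy instance. The 5-cycle's good vector solution is hand-coded here (vectors at angles 4πi/5, a feasible SDP solution, not solver output), and the .878 bound is checked numerically:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy instance: the 5-cycle, with hand-picked vectors at angle 4*pi*i/5,
# so adjacent vectors are nearly antipodal (a feasible vector solution).
n = 5
edges = [(i, (i + 1) % n) for i in range(n)]
angles = 4 * np.pi * np.arange(n) / n
V = np.stack([np.cos(angles), np.sin(angles)])   # column i is v_i

sdp_value = sum((1 - V[:, i] @ V[:, j]) / 2 for i, j in edges)

# Round: random Gaussian normal r, cut by the sign of r . v_i.
cut_sizes = []
for _ in range(2000):
    r = rng.standard_normal(2)
    side = np.sign(r @ V)
    cut_sizes.append(sum(side[i] != side[j] for i, j in edges))
avg = np.mean(cut_sizes)

# Expected cut is sum of arccos(v_i . v_j)/pi >= .878 * SDP value;
# the simulation average should be close to that expectation.
assert avg >= 0.878 * sdp_value * 0.95

# The worst-case ratio arccos(x)/pi over (1-x)/2 dips to about .878.
xs = np.linspace(-1, 0.99, 10001)
ratios = (np.arccos(xs) / np.pi) / ((1 - xs) / 2)
assert 0.87 < ratios.min() < 0.88
```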
Summary
- “Semidefinite programming” lets you relax quadratic programs to vector programs
- Gives .878 approximation to max cut
- Can derandomize using conditional expectations
- in practice, solving SDPs is slow and hard, but can be done
- Can you approx better?
- This rounding scheme is tight
- But maybe there is a better SDP?
- 5-cycle has integrality gap .884…
- other worse examples converge to .878
- Khot: can't beat .878 in P assuming Unique Games Conjecture
- Håstad proved cannot beat 16/17 unless P=NP
2 hours from here to end
Graph Coloring
3-coloring
- define
- NP-complete
- no good approximation
- Wigderson: √n colors
- explain how: greedy coloring plus pulling out large independent sets (2-colorable neighborhoods)
Relax:
- k-vector coloring: require v_i·v_j ≤ −1/(k−1) for each edge ij
- prove for 3-coloring by embedding in plane
- write as semidefinite program
Rounding?
- Pull out a large independent set, repeat
- how pull independent set? random hyperplane, shifted
- gives an independent set of size n/Δ^{1/3}
- so gives Δ^{1/3} colors by repeated removal
- so get n^{1/3} colors
- Improve with Wigderson:
- If Δ > n^{3/4}, 2-color and remove a high-degree vertex's neighborhood
- Consumes at most n^{1/4} log n colors
- when Δ < n^{3/4}, use the semidefinite rounding
- Result: O(n^{1/4} log n) colors.
Finding the independent set:
- Pick a random Gaussian r, and take all vertices v with r·v ≥ c for a suitable threshold c (then delete an endpoint of any surviving edge)
Min Ratio Cut
Lead-In: Max-Flow Min-Cut
Probably cover after introducing ratio cut.
Find s-t mincut
- LP-relaxation
- z_v: indicator of v being on the t-side of the cut
- y_vw: indicator of edge vw being cut
- constraint: if an edge is not cut, its endpoints must have the same indicator:
  D = min Σ_{vw} u_vw y_vw subject to y_vw ≥ 0, z_w ≤ z_v + y_vw, z_t = 1, z_s = 0
  (where u_vw are the edge capacities)
- Relax?
- Think of y_vw as “lengths”
- The main constraint is then a triangle inequality
- Which means the z_v are distances from s
- Randomized rounding:
- pick a random threshold r in (0,1); cut the edges whose endpoints' z-values straddle r
- Expected number of cut edges is Σ |z_v − z_w| ≤ Σ y_vw, i.e. at most the objective
- i.e. rounding with approximation ratio 1!
- so in fact, any threshold must work:
- no threshold cut can go below the min cut, so (with expectation at most the LP value) none can go above it either.
Motivation
Common task: computation/simulation over a mesh.
- vertices have state
- state influences neighbors
- whole system evolves over time
- examples: weather simulation, wind tunnel, differential equations
Suppose you want to parallelize
- Update state by checking neighbors
- Each node handled by some processor
- if neighbors on other processors, communication required
- Goal: good partition of nodes to processors
- Require: same number of neighbors per processor (balanced work)
- Require: small number of neighbor links across processors (low communication)
Graph bisection
- Special case for 2 processors
- Balanced cut---half vertices on each
- Minimize number of crossing edges.
b-balanced separator
- Loosen up a bit
- Require only ≥bn vertices per side, for some b≤1/2
Divide and conquer
- Given hard graph problem, find separator/bisection
- Solve each half
- patch clumsily at cut
- most of problem in pieces
- so clumsy patch doesn't add much cost
- But patch paid at each level of divide and conquer---adds up
- So divide and conquer must finish fast
- So needs to be balanced
- Example: min linear arrangement (VLSI)
- line up nodes to minimize edges crossing worst perpendicular cut
- Given opt solution, half-way down is a balanced cut
- so min balanced cut no worse
- find, lay out two halves on left and right side of line recursively
- worst cut on LHS combines edges entirely in LHS with edges crossing to RHS
- So, worst cut is less than worst(LHS,RHS)+bisection
- So is O(log n · bisection) = O(log n · linear arrangement).
Ratio Cut
Definition:
- partition (S, S′) minimizing |(S×S′)∩E| / (|S|·|S′|)
- Also known as “sparsest cut”.
- since the larger side has at least n/2 vertices, this is Θ(crossing edges / (n · |smaller side|))
- i.e. proportional to the average crossing degree of the smaller side
Use to find separator
- suppose bisection of value b
- can find a (1/3, 2/3) separator of value O(b log n)
- to do so, approx sparsest cut
- Remove smaller side
- Repeat till larger side has fewer than (2/3)n vertices
why good?
- consider what remains of the opt bisection on the large side
- it cuts no more edges than before
- each of its sides still has at least n/6 vertices
- so it is a cut of good ratio
- so always remove good ratio
- so total removed has good ratio
- generalize s-t min-cut:
- indicator variable y_vw says edge vw is cut
- indicator variable z_vw says v, w are separated by the cut
- “triangle inequality” constraints: z_vw is at most the sum of the y's along any v–w path
- want to minimize the ratio Σ y_vw / Σ z_vw
Relaxation:
- instead of minimizing the ratio, minimize the numerator while requiring the denominator to be at least 1 (equivalent by scaling).
- Assign lengths to edges such that
- Sum of pairwise distances is 1 (or more---doesn't hurt)
- Sum of edge lengths is minimized
- Why is this a relaxation?
- Consider any S
- Assign length 1/(|S|·|S′|) to each cut edge
- Sum of pairwise distances is that length times the number of separated pairs |S|·|S′|, i.e. 1
- Sum of edge lengths is the value of that ratio cut
- So LP opt below min ratio cut
- Write as LP
- distance variables d_ij ≥ 0 for every pair
- triangle constraints d_ik ≤ d_ij + d_jk
- constrain Σ d_ij ≥ 1 (over all pairs)
- minimize “volume” Σ_{ij∈E} d_ij (over edges only)
Embeddings
Goal
- The above relaxation turns the graph into a metric space.
- A cut is also a metric, but a special kind
- We want to “round” arbitrary metric to cut metric
How do we measure error in rounding?
- An embedding ϕ from (X, d) to (X′, d′)
- is a contraction if d′(ϕ(u), ϕ(v)) ≤ d(u, v)
- has distortion c = max_{u,v} d(u, v)/d′(ϕ(u), ϕ(v)) (for a contraction)
- is an isometry if distortion 1, i.e. d′(ϕ(u), ϕ(v)) = d(u, v)
- Standard Metrics over ℝ^n
- ℓ1(x, y) = Σ |x_i − y_i|
- ℓ2(x, y) = √(Σ (x_i − y_i)²)
- ℓ∞(x, y) = max |x_i − y_i|
- ℓp(x, y) = (Σ |x_i − y_i|^p)^{1/p}
Every metric (isometric) embeddable in ℓ∞
- Just make one coordinate for each point x ∈ X
- Set ϕ(u)_x = d(u, x)
- Then ϕ(u)_x − ϕ(v)_x = d(u, x) − d(v, x) ≤ d(u, v) by the triangle inequality
- But ϕ(u)_u = 0 and ϕ(v)_u = d(u, v)
- So the max coordinate difference is exactly d(u, v)
Isometric Embedding in ℓ2 tractable
- suppose the points embed to v_1, v_2, …
- wlog translate so v_1 = 0
- d²_ij = ‖v_i − v_j‖² = ‖v_i‖² + ‖v_j‖² − 2 v_i·v_j
- But ‖v_i‖² = ‖v_i − v_1‖² = d²_{1i}
- So v_i·v_j = ½(d²_{1i} + d²_{1j} − d²_ij)
- Write matrix of dot products
- take Cholesky factorization
- Similarly, can find minimum distortion by optimizing SDP.
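The Gram-matrix recovery can be sketched as follows (the starting points are random and hypothetical, kept only so the answer can be checked; eigendecomposition stands in for Cholesky since the Gram matrix here is singular, its first row and column being zero):

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical points, kept only to verify the reconstruction.
P = rng.standard_normal((5, 3))                        # 5 points in R^3
D2 = ((P[:, None, :] - P[None, :, :]) ** 2).sum(-1)    # squared distances

# Translate so point 0 maps to the origin; then
#   v_i . v_j = (d_{0i}^2 + d_{0j}^2 - d_{ij}^2) / 2.
G = (D2[0][:, None] + D2[0][None, :] - D2) / 2

# Factor G = V V^T by eigendecomposition (G is only semidefinite).
w, U = np.linalg.eigh(G)
V = U @ np.diag(np.sqrt(np.clip(w, 0, None)))          # rows: new points

# All pairwise distances are reproduced exactly.
D2_new = ((V[:, None, :] - V[None, :, :]) ** 2).sum(-1)
assert np.allclose(D2_new, D2)
```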
We are actually going to round via embedding in ℓ1:
- Suppose have ℓ1 embedding with ratio β between “edge volume” (sum of edge lengths) and “all-pairs volume” (sum of all-pairs distances)
- i.e. such that edge-volume ≤β⋅ total-volume (note β<1)
- Claim can round to cut with same ratio
- For coordinate j define δ^j_{u,v}(t) = 1 iff t lies between u_j and v_j, and 0 otherwise.
- Each (j, t) defines a cut, S = {w : w_j ≤ t} (exactly like the s-t cut case)
- Integrating δ^j_{u,v}(t) over t gives |u_j − v_j|
- So summing over dimensions j gives the ℓ1 metric sum (for edge or total volume)
- know edge-volume ≤β⋅ total-volume
- So there must be some specific (t, j) where this ratio is achieved
- i.e. if ∫f(x)dx ≤ β∫g(x)dx
- then f(x) ≤ β g(x) somewhere
- The cut at that (t, j) achieves the ratio.
- Actually, don't need to integrate, since the cut changes only at n places (the j-th coordinates of the points)
- so if our LP gave us an ℓ1 metric we'd be done
- unfortunately it gives us an arbitrary metric.
Embedding
- If our metric were ℓ1, done
- So, try embedding
- Bourgain: can embed any metric into O(log² n) dimensions (unimportant) with O(log n) distortion
- Conclude O(logn) approx for ratio cut
Algorithm
- Define embedding
- For each scale t = 1, …, log n, pick L = O(log n) random subsets of size n/2^t
- Gives a collection of L log n sets A_i
- Set ϕ(u)_i = d(u, A_i)
- then divide by L log n (scaling)
- this is like the ℓ∞ embedding, but with distances to sets instead of points
- This is a contraction:
- contraction for any one A: if s ∈ A achieves d(u, A) and t ∈ A achieves d(v, A), then d(u, A) = d(u, s) ≤ d(u, t) ≤ d(u, v) + d(v, t) = d(u, v) + d(v, A), so d(u, A) − d(v, A) ≤ d(u, v) (and symmetrically)
- so each coordinate difference is at most d(u, v)
- dividing by the L log n coordinates scales the ℓ1 sum to a contraction.
- Distortion:
- we need to prove it doesn't contract too much
- let ρ_t be the minimum radius such that both B(u, ρ_t) and B(v, ρ_t) contain 2^t points.
- if ρ_t < d(u, v)/3 then these radii define disjoint balls
- wlog, B(u, ρ_t) has exactly 2^t points;
- meanwhile, B(v, ρ_{t−1}) has at least 2^{t−1} points
- then with constant probability, a random sample of size n/2^t has a point in B(v, ρ_{t−1}) but none in B(u, ρ_t)
- in other words, its distance from v is at most ρ_{t−1} while its distance from u is more than ρ_t
- so it contributes at least ρ_t − ρ_{t−1} to the difference in that ϕ_i coordinate
- we pick L samples per scale, so a Chernoff bound says we are near this expectation whp
- true for all t until ρ_t ≥ d(u, v)/3
- so, summing over all coordinates i, the difference is Ω(L Σ_t (ρ_t − ρ_{t−1})) = Ω(L · d(u, v)) (the sum telescopes)
- we then scale by 1/(L log n)
- so the embedded distance is at least Ω(d(u, v)/log n)
- Note: L didn't matter for distortion, just dimension
- But we do achieve an O(log² n)-dimensional embedding
Gap is tight:
- Take constant degree expander
- Set edge lengths to 1/log n
- Most pairs of vertices are Ω(log n) hops apart, hence at metric distance Ω(1)
- So sum of pairwise distances is Ω(n²)
- While the sum of edge lengths is only O(n/log n)
- So the LP ratio is O(1/(n log n))
- But any cut with k vertices on the small side has Ω(k) edges leaving (expansion)
- So best cut ratio is Ω(k/(k·n)) = Ω(1/n)
- So the log n gap in our rounding is tight.
Extensions
- Generalization to multicommodity flow/cut
- Θ(log k) gap for k commodities
- Arora, Rao, and Vazirani use a more sophisticated SDP approach to approximate ratio cut to O(√(log n))
- Other powerful metric embedding methods for near neighbor search, tree metrics for online algorithms