
## Max Cut by Semidefinite Programming

• No obvious linear programming formulation
• How about quadratic? $$\begin{eqnarray*} \max \sum_{ij \in E} \frac{1-x_ix_j}{2}\\ x_i &\in &\pm 1 \end{eqnarray*}$$
• Relax integrality? Doesn't matter: $$\begin{eqnarray*} \max \sum_{ij \in E} \frac{1-x_ix_j}{2}\\ x_i^2 &= &1 \end{eqnarray*}$$
• Unfortunately, this shows quadratic programming is NP-hard, even without integrality constraints.

Vector relaxation:

• Relax $x_i$ to a vector $v_i$
• Product becomes dot product $$\begin{eqnarray*} \max \sum_{ij \in E} \frac{1-v_i \cdot v_j}{2}\\ \|v_i\| &= &1 \end{eqnarray*}$$
• This is a relaxation, so its optimum is at least the max-cut value.
• Note: no constraints possible on individual coordinates
• Remarkable fact: solving this is tractable!
• Consider the matrix $V$ whose columns are the vectors $v_i$
• Matrix of dot products is $D=V^TV$
• In other words, $D$ is positive semidefinite (all eigenvalues nonnegative)
• Conversely, every positive semidefinite matrix $D$ has a “square root” $V$ with $V^TV=D$.
• What does positive semidefinite mean? That $(\forall y) yDy^T \ge 0$
• Can write this down as a linear constraint on $D$! $\sum y_iy_j d_{ij} \ge 0$
• Infinitely many linear constraints, but can optimize anyway by linear programming techniques.
• For example, finding a negative eigenvector serves as a separation oracle
• best current (interior point) algorithms run in $O(n^{3.5}\log (1/\epsilon))$ time
• 1000s of variables can be solved in practice
• potential problem: irrationals. but we only care about approximating anyway
• Then square root can be computed by “Cholesky Factorization”
• One issue: results in $n$-dimensional vectors, where $n$ is the number of vertices. We want boolean values!
• Requiring dimension of vectors to be small makes it NP-hard again.
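The “square root” step above can be sketched numerically. A hypothetical illustration in numpy: since the optimal $D$ is typically singular, an eigendecomposition is safer than a strict Cholesky factorization, which requires positive definiteness.

```python
import numpy as np

def psd_square_root(D):
    """Recover vectors V (as columns) with V.T @ V ~= D for a PSD matrix D.

    Cholesky requires strict positive definiteness, so we use the
    eigendecomposition instead, clipping tiny negative eigenvalues
    caused by floating-point roundoff.
    """
    eigvals, eigvecs = np.linalg.eigh(D)
    eigvals = np.clip(eigvals, 0.0, None)   # D is PSD up to roundoff
    V = (eigvecs * np.sqrt(eigvals)).T      # V.T @ V == eigvecs diag(vals) eigvecs.T
    return V

# toy check: a random Gram matrix is PSD (and singular), and we recover it
rng = np.random.default_rng(0)
W = rng.standard_normal((3, 5))
D = W.T @ W                                 # rank 3, so Cholesky would fail
V = psd_square_root(D)
assert np.allclose(V.T @ V, D)
```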

Embeddings

• We have put graph in high dimensional space
• Such that $1-v_i \cdot v_j$ is “large” for each edge $ij$
• i.e., $v_i \cdot v_j$ is small
• i.e., $i$ and $j$ are “far apart”
• Draw

Rounding:

• We need to return to $\pm 1$.
• i.e., need a rounding scheme.
• embedding gives large objective to far-apart points
• to keep that value, need to cut far apart items
• Given points in the plane, how separate into two groups such that far apart points get separated?
• Use a random hyperplane
• How to generate one? Use a random $n$-dimensional Gaussian vector as its normal
• What are the odds that $i$ and $j$ are separated?
• Gaussian is “spherically symmetric”: $\Pr[x_1,\ldots,x_n] \propto e^{-\sum x_i^2/2}$ depends only on the radius
• To generate 2 Gaussians, pick a uniformly random angle $\theta$ and an exponentially distributed squared radius in polar coordinates, then read off $x$ and $y$
• change coordinates so $i$ and $j$ are in 2-dimensional basis plane
• values of Gaussian in other direction irrelevant
• all that matters is Gaussian within the plane
• i.e., look at the random line the Gaussian generates, check if it separates $v_i$ and $v_j$
• odds depend on angle between
• in particular, probability of separation is $\arccos(v_i \cdot v_j)/\pi$
• So, expected cut value is $\sum_{ij \in E} \arccos(v_i \cdot v_j)/\pi$
• Compare to optimum:
• Our SDP objective is $\sum_{ij \in E} \frac{1-v_i\cdot v_j}{2}$
• which is at least opt
• so bounding the ratio edge-by-edge bounds the approximation ratio
• What is the worst-case ratio? $\min_{-1 \le x \le 1} \frac{\arccos(x)/\pi}{(1-x)/2} \ge .878$
• In other words, a $.878$-approximation to opt.
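The separation probability driving this analysis can be checked by simulation. A toy sketch (the angle and vectors are arbitrary choices): two unit vectors land on opposite sides of a random hyperplane with probability $\arccos(v_i\cdot v_j)/\pi$.

```python
import numpy as np

# Monte Carlo check of random-hyperplane rounding: vectors u, v at angle
# theta are separated with probability theta / pi = arccos(u.v) / pi.
rng = np.random.default_rng(42)
theta = 2.0                                   # arbitrary angle in radians
u = np.array([1.0, 0.0])
v = np.array([np.cos(theta), np.sin(theta)])

r = rng.standard_normal((200_000, 2))         # Gaussian hyperplane normals
separated = (r @ u > 0) != (r @ v > 0)        # opposite sides of hyperplane
assert abs(separated.mean() - theta / np.pi) < 0.01
```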

Summary

• “Semidefinite programming” lets you relax quadratic programs to vector programs
• Gives .878 approximation to max cut
• Can derandomize using conditional expectations
• in practice, solving SDPs is slow and hard, but can be done
• Can you approx better?
• This rounding scheme is tight
• But maybe there is a better SDP?
• 5-cycle has integrality gap $.884\ldots$
• other worse examples converge to .878
• Khot: can't beat $.878$ in polynomial time assuming the Unique Games Conjecture
• Håstad proved one cannot beat $16/17$ unless P=NP


## Graph Coloring

3-coloring

• define
• NP-complete
• no good approximation
• Wigderson: $O(\sqrt{n})$ colors
• explain how: greedy plus large independent sets
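The greedy-plus-neighborhoods idea might look like the following. A sketch under the assumption that the input is 3-colorable, so every neighborhood is bipartite; the function name and the toy test graph are illustrative choices.

```python
import math
from collections import deque

def wigderson_color(adj):
    """Sketch of Wigderson's O(sqrt(n))-coloring for a 3-colorable graph:
    while some uncolored vertex has >= sqrt(n) uncolored neighbors, its
    neighborhood is 2-colorable, so spend two fresh colors on it;
    afterwards the remaining degree is < sqrt(n), so finish greedily."""
    n = len(adj)
    color = {}
    palette = 0                          # next unused color index
    thresh = math.isqrt(n)
    while True:
        pick = next((v for v in range(n) if v not in color
                     and sum(w not in color for w in adj[v]) >= thresh), None)
        if pick is None:
            break
        nbr_set = {w for w in adj[pick] if w not in color}
        for s in nbr_set:                # BFS 2-coloring, component by component
            if s in color:
                continue
            color[s] = palette
            q = deque([s])
            while q:
                u = q.popleft()
                for w in adj[u]:
                    if w in nbr_set and w not in color:
                        color[w] = palette + 1 - (color[u] - palette)
                        q.append(w)
        palette += 2
    for v in range(n):                   # first-fit greedy on what remains
        if v not in color:
            used = {color[w] for w in adj[v] if w in color}
            c = 0
            while c in used:
                c += 1
            color[v] = c
    return color

# toy 3-colorable input: complete tripartite K_{3,3,3}
adj = {v: [w for w in range(9) if w // 3 != v // 3] for v in range(9)}
coloring = wigderson_color(adj)
assert all(coloring[v] != coloring[w] for v in adj for w in adj[v])
```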

Relax:

• $k$-vector coloring: unit vectors whose dot products are at most $-1/(k-1)$ across edges
• prove achievable for 3-colorable graphs by embedding the three colors at $120^\circ$ in the plane
• write as semidefinite program

Rounding?

• Pull out a large independent set, repeat
• how pull independent set? random hyperplane, shifted
• gives independent set of size $n/\Delta^{1/3}$
• so gives $\Delta^{1/3}$ by repeated removal
• so get $n^{1/3}$
• Improve with Wigderson's trick:
• If $\Delta > n^{3/4}$, remove a neighbor set
• Consumes at most $n^{1/4}\log n$ colors
• when $\Delta < n^{3/4}$, use semidefinite
• Result: $O(n^{1/4}\log n)$ colors.

Finding the independent set:

• Pick a random $r$, and take all vertices $v$ with $r\cdot v \ge c$
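A toy sketch of this cap rounding: the graph, the threshold $c$, and the patch of deleting an endpoint of each edge surviving inside the cap are illustrative choices, not the full analysis.

```python
import numpy as np

# Hyperplane-cap rounding of a vector 3-coloring: place the three color
# classes at 120 degrees in the plane (pairwise dot products -1/2), take
# the cap {v : r.v >= c}, and delete one endpoint of any edge inside the
# cap to force independence.
rng = np.random.default_rng(1)
angles = [0.0, 2 * np.pi / 3, 4 * np.pi / 3]
vecs = [np.array([np.cos(a), np.sin(a)]) for a in angles]

# toy 3-colorable graph: a 6-cycle, vertex u gets color u % 3
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]
pos = {u: vecs[u % 3] for u in range(6)}

r = rng.standard_normal(2)               # random hyperplane direction
c = 0.3                                  # illustrative shift threshold
cap = {u for u in pos if r @ pos[u] >= c}
for (u, w) in edges:                     # patch: break any internal edges
    if u in cap and w in cap:
        cap.discard(w)
assert all(not (u in cap and w in cap) for (u, w) in edges)
```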

## Min Ratio Cut


Find $s$-$t$ mincut

• LP-relaxation
• $z_v$ indicator of being on $t$-side of cut
• $y_{vw}$ indicator of edge being cut
• constraint: if edge not cut, endpoints must have same indicator ($u_{vw}$ denotes edge capacities) $$\begin{eqnarray*} D&=&\min \sum_{vw} u_{vw}y_{vw}\\ y_{vw} &\ge &0\\ z_w &\le &z_v + y_{vw}\\ z_t &= &1\\ z_s &= &0 \end{eqnarray*}$$
• Relax?
• Think of $y_{vw}$ as “lengths”
• Main constraint is then a triangle inequality
• Which means $z_v$ are distances from $s$
• Randomized rounding:
• pick a random threshold $r \in (0,1)$; cut edges whose endpoints' $z$ values straddle $r$
• Expected capacity of the cut is $\sum u_{vw}|z_w-z_v| \le \sum u_{vw}y_{vw} = D$
• i.e. rounding with approximation ratio 1!
• so in fact, every threshold must work:
• every threshold cut is a valid $s$-$t$ cut, so never below the min cut; since the average is at most the LP value, which is at most the min cut, none can go above it either.
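The threshold rounding can be spot-checked by simulation. A toy sketch: the $z$ values below are an arbitrary fractional assignment, not an actual LP optimum, and the check is that each edge is cut with probability $|z_w - z_v|$.

```python
import numpy as np

# Threshold rounding for the s-t min-cut LP: pick r uniform in (0,1) and
# cut edge (v, w) iff r falls between z_v and z_w, which happens with
# probability |z_w - z_v| <= y_vw.
z = {'s': 0.0, 'a': 0.4, 'b': 0.7, 't': 1.0}
edges = [('s', 'a'), ('a', 'b'), ('b', 't'), ('s', 'b'), ('a', 't')]

rng = np.random.default_rng(7)
r = rng.uniform(0.0, 1.0, size=100_000)
empirical = sum(((min(z[v], z[w]) < r) & (r <= max(z[v], z[w]))).mean()
                for v, w in edges)
expected = sum(abs(z[v] - z[w]) for v, w in edges)
assert abs(empirical - expected) < 0.05   # expected cut matches sum |z_w - z_v|
```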

### Motivation

Common task: computation/simulation over a mesh.

• vertices have state
• state influences neighbors
• whole system evolves over time
• examples: weather simulation, wind tunnel, differential equations

Suppose you want to parallelize

• Update state by checking neighbors
• Each node handled by some processor
• if neighbors on other processors, communication required
• Goal: good partition of nodes to processors
• Require: same number of neighbors per processor (balanced work)
• Require: small number of neighbor links across processors (low communication)

Graph bisection

• Special case for 2 processors
• Balanced cut---half vertices on each
• Minimize number of crossing edges.

$b$-balanced separator

• Loosen up a bit
• Require only $\ge bn$ vertices per side, for some $b\le 1/2$

Divide and conquer

• Given hard graph problem, find separator/bisection
• Solve each half
• patch clumsily at cut
• most of problem in pieces
• so clumsy patch doesn't add much cost
• But patch paid at each level of divide and conquer---adds up
• So divide and conquer must finish fast
• So needs to be balanced
• Example: min linear arrangement (VLSI)
• line up nodes to minimize edges crossing worst perpendicular cut
• Given opt solution, half-way down is a balanced cut
• so min balanced cut no worse
• find, lay out two halves on left and right side of line recursively
• worst cut on LHS combines edges entirely in LHS with edges crossing to RHS
• So, worst cut is less than worst(LHS,RHS)+bisection
• So is $O(\log n \cdot \mbox{bisection}) = O(\log n \cdot \mbox{linear arrangement})$.

### Ratio Cut

Definition:

• partition $(S,S')$ minimizing $$\begin{eqnarray*} \frac{|(S \times S') \cap E|}{|S|\cdot|S'|} \end{eqnarray*}$$
• Also known as “sparsest cut”.
• since the larger side has at least $n/2$ vertices, this is $\Theta(\mbox{crossing edges}/(n \cdot \mbox{smaller side size}))$
• i.e. proportional to the average crossing degree of the smaller side

Use to find separator

• suppose bisection of value $b$
• can find $(1/3,2/3)$ separator of value $O(b\log n)$
• to do so, approx sparsest cut
• Remove smaller side
• Repeat till larger side smaller than $2/3 n$

why good?

• consider what remains of the optimal bisection on the large side
• it cuts no more edges than the bisection did
• each of its sides still has at least $n/6$ vertices
• so it has good ratio
• so the sparsest cut we remove always has good ratio
• so the total removed has good ratio
Given the above, can find a 1/3-2/3 cut with value close to the minimum bisection.

ILP:
• generalize $s$-$t$ min-cut
• indicator variable $y_{vw}$ says edge $vw$ cut
• indicator variable $z_{vw}$ says $v$, $w$ separated by cut
• “triangle inequalities” constraints $z_{vw}$ based on $y_{vw}$
• want to minimize the ratio $\sum_{vw \in E} y_{vw}/\sum_{vw} z_{vw}$

Relaxation:

• instead of minimizing the ratio, minimize the numerator while requiring the denominator be at least $1$ (equivalent by scaling).
• Assign lengths to edges such that
• Sum of pairwise distances is 1 (or more---doesn't hurt)
• Sum of edge lengths is minimized
• Why is this a relaxation?
• Consider any $S$
• Assign distance $1/(|S|\cdot|S'|)$ to each pair separated by the cut (the cut metric)
• Sum of pairwise distances is that times the number of cut pairs, i.e. 1
• Sum of lengths is value of that ratio cut
• So LP opt below min ratio cut
• Write as LP
• edge lengths $d_{ij}$
• distances $d_{ij} \ge 0$
• triangle constraints $d_{ik} \le d_{ij}+d_{jk}$
• constrain $\sum d_{ij} \ge 1$ (over all pairs)
• minimize “volume” (over existing edges) $\sum_{ij \in E}d_{ij}$
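The relaxation argument can be verified directly for a concrete cut. A toy sketch (the 5-cycle and the set $S$ are arbitrary choices): the cut metric is LP-feasible and its objective equals the cut's ratio.

```python
from itertools import combinations

# Any cut (S, S') yields a feasible LP point: assign distance
# 1 / (|S| |S'|) to each separated pair and 0 to the rest.
n = 5
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]     # 5-cycle
S = {0, 1}
size_prod = len(S) * (n - len(S))

d = {}
for i, j in combinations(range(n), 2):
    d[(i, j)] = 1.0 / size_prod if (i in S) != (j in S) else 0.0
dist = lambda i, j: 0.0 if i == j else d[(min(i, j), max(i, j))]

# triangle inequalities hold for any cut (pseudo)metric
for i, j, k in combinations(range(n), 3):
    assert dist(i, k) <= dist(i, j) + dist(j, k) + 1e-12
    assert dist(i, j) <= dist(i, k) + dist(j, k) + 1e-12
    assert dist(j, k) <= dist(j, i) + dist(i, k) + 1e-12
# pair-sum constraint is tight, and the objective equals the cut ratio
assert abs(sum(d.values()) - 1.0) < 1e-12
cut_edges = sum(1 for i, j in edges if (i in S) != (j in S))
lp_obj = sum(dist(i, j) for i, j in edges)
assert abs(lp_obj - cut_edges / size_prod) < 1e-12
```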

### Embeddings

Goal

• Above relaxation turns the graph into a metric space.
• A cut is also a metric, but a special kind
• We want to “round” arbitrary metric to cut metric

How do we measure error in rounding?

• An embedding $\phi$ from $(X,d)$ to $(X',d')$
• is a contraction if $d'(\phi(u),\phi(v)) \le d(u,v)$
• has distortion $c=\max(d(u,v)/d'(\phi(u),\phi(v)))$
• is an isometry if distortion 1, i.e. $d'(\phi(u),\phi(v)) = d(u,v)$
• Standard Metrics over $\Re^n$
• $\ell_1(x,y) = \sum |x_i-y_i|$
• $\ell_2(x,y) = \sqrt{\sum (x_i-y_i)^2}$
• $\ell_\infty(x,y) = \max |x_i-y_i|$
• $\ell_p(x,y) = ({\sum |x_i-y_i|^p})^{1/p}$

Every metric (isometric) embeddable in $\ell_\infty$

• Just make one coordinate for each point in $X$
• Set $\phi(u)_x = d(u,x)$
• Then $\phi(u)_x-\phi(v)_x = d(u,x)-d(v,x) \le d(u,v)$ by the triangle inequality
• But $\phi(v)_u = d(u,v)$
• So max is $d(u,v)$
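A quick numerical check of this embedding (the path-graph metric is a toy choice): one coordinate per point, and the coordinate indexed by $v$ itself witnesses the distance $d(u,v)$.

```python
import numpy as np

# Frechet-style embedding into l_infinity: phi(u)_x = d(u, x).  The
# triangle inequality makes every coordinate difference at most d(u, v),
# and coordinate v achieves it, so the embedding is an isometry.
D = np.array([[0, 1, 2, 3],
              [1, 0, 1, 2],
              [2, 1, 0, 1],
              [3, 2, 1, 0]], dtype=float)   # path graph 0-1-2-3

phi = D.copy()                              # row u is the embedding of u
linf = np.max(np.abs(phi[:, None, :] - phi[None, :, :]), axis=2)
assert np.allclose(linf, D)                 # exact isometry
```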

Isometric Embedding in $\ell_2$ tractable

• suppose the points embed to vectors $v_1,v_2,\ldots$
• wlog map $v_1=0$
• $d^2_{ij} = (v_i-v_j)^2 = v_i^2+v_j^2-2v_i\cdot v_j$
• But $v_i^2 = (v_i-v_1)^2 = d_{1i}^2$
• So $v_i\cdot v_j = \frac12(d_{1i}^2+d_{1j}^2-d^2_{ij})$
• Write matrix of dot products
• take Cholesky factorization
• Similarly, can find minimum distortion by optimizing SDP.
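The dot-product reconstruction can be sketched as follows (toy points; this is essentially classical multidimensional scaling, and the check is that pairwise distances are recovered exactly).

```python
import numpy as np

# With v_1 mapped to the origin, v_i . v_j = (d_1i^2 + d_1j^2 - d_ij^2)/2.
# For an l_2-embeddable metric this Gram matrix is PSD, and factoring it
# recovers points realizing the distances.
pts = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 2.0], [-1.0, 1.0]])
D2 = ((pts[:, None, :] - pts[None, :, :]) ** 2).sum(axis=2)   # squared dists

G = 0.5 * (D2[0][None, :] + D2[0][:, None] - D2)              # Gram matrix
eigvals, eigvecs = np.linalg.eigh(G)
V = eigvecs * np.sqrt(np.clip(eigvals, 0.0, None))            # rows = points
D2_rec = ((V[:, None, :] - V[None, :, :]) ** 2).sum(axis=2)
assert np.allclose(D2_rec, D2)                                # isometric
```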

We are actually going to round via embedding in $\ell_1$:

• Suppose have $\ell_1$ embedding with ratio $\beta$ between “edge volume” (sum of edge lengths) and “all-pairs volume” (sum of all-pairs distances)
• i.e. such that edge-volume $\le \beta \cdot$ total-volume (note $\beta< 1$)
• Claim can round to cut with same ratio
• For coordinate $j$ define $\delta_j^{u,v}(t) = 1$ iff $t$ between $u_j$ and $v_j$, and 0 otherwise.
• Each $j,t$ defines a cut (exactly like $s$-$t$ cut case)
• Integrating over $t$ gives $|u_j-v_j|$
• So summing over dimensions $j$ gives $\ell_1$ metric sum (for edge or total volume)
• know edge-volume $\le \beta \cdot$ total-volume
• So must be some specific $t,j$ where this ratio achieved
• i.e $\int f(x) \le \beta \int g(x)$
• so know $f(x) \le \beta g(x)$ somewhere
• This is a cut achieving the ratio.
• Actually, don't need to integrate, since $\delta$ is constant except at $n$ places ($j$ coords of points)
• so if our LP gave us an $\ell_1$ metric we'd be done
• unfortunately it gives us an arbitrary metric.
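The threshold-cut argument can be sketched for a concrete $\ell_1$ embedding (the points and edges below are toy choices): enumerating coordinate thresholds finds a cut whose edge/pair ratio is at most the embedding's ratio $\beta$.

```python
import numpy as np

# Every coordinate threshold (j, t) defines a cut, and integrating over
# thresholds reproduces the l_1 distances, so some threshold cut achieves
# edge-volume / pair-volume at most beta (an averaging argument).
pts = np.array([[0.0, 0.0], [0.1, 0.9], [1.0, 1.0], [0.9, 0.1]])
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
n = len(pts)
pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
l1 = lambda i, j: np.abs(pts[i] - pts[j]).sum()

beta = sum(l1(i, j) for i, j in edges) / sum(l1(i, j) for i, j in pairs)

best = np.inf
for jdim in range(pts.shape[1]):
    for t in pts[:, jdim]:               # cuts change only at point coords
        side = pts[:, jdim] <= t
        cut_pairs = sum(side[i] != side[j] for i, j in pairs)
        if cut_pairs:
            cut_edges = sum(side[i] != side[j] for i, j in edges)
            best = min(best, cut_edges / cut_pairs)
assert best <= beta + 1e-12              # some threshold matches the ratio
```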

Embedding

• If our metric were $\ell_1$, done
• So, try embedding
• Bourgain: can embed any metric into $\log^2n$ dimensions (unimportant) with $O(\log n)$ distortion
• Conclude $O(\log n)$ approx for ratio cut

Algorithm

• Define embedding
• For each scale $t = 1,\ldots,O(\log n)$, pick $L=O(\log n)$ random subsets of size $n/2^t$
• Gives a collection $A_i$ of $L\log n$ sets
• Set $\phi(u)_i = d(u,A_i)$
• then divide by $L\log n$ (scaling)
• this is like $\ell_{\infty}$ embedding, but distance to sets instead of points
• This is a contraction:
• contraction for any one $A$: if $s \in A$ determines $d(u,A)$, and $t\in A$ determines $d(v,A)$, then $$\begin{eqnarray*} d(u,s) &\le & d(u,t)\\ &\le &d(u,v)+d(v,t)\\ d(u,s)-d(v,t) &\le &d(u,v)\\ d(u,A)-d(v,A) &\le &d(u,v) \end{eqnarray*}$$
• so each coordinate is at most $d(u,v)$
• dividing by $L\log n$ coordinates scales to a contraction.
• Distortion:
• we need to prove it doesn't contract too much
• let $\rho_t$ be minimum radius such that both $B(u,\rho_t)$ and $B(v,\rho_t)$ have $2^t$ points in them.
• if $\rho_t < d_{uv}/3$ then these radii define disjoint balls
• wlog, $B(u,\rho_t)$ has exactly $2^t$ points;
• meanwhile, $B(v,\rho_{t-1})$ has at least $2^{t-1}$ points
• then with constant probability, a sample of size $n/2^t$ has points in $B(v,\rho_{t-1})$ but not $B(u,\rho_t)$
• in other words, distance from $u$ is less than $\rho_t$ but from $v$ is more than $\rho_{t-1}$
• so contributes at least $\rho_t-\rho_{t-1}$ to difference in $\phi_i$ coordinate
• we pick $L$ samples per scale, so a Chernoff bound says we are near expectation whp
• true for all $t$ until $\rho_t \ge d(u,v)/3$
• so, summing over all coordinates, the difference is $\Omega(L\sum_t (\rho_t-\rho_{t-1})) = \Omega(L\, d_{uv})$
• we then scale by $1/(L\log n)$
• so distance at least $\Omega(d(u,v)/\log n)$
• Note: $L$ didn't matter for distortion, just dimension
• But we do achieve $O(\log^2 n)$-dimensional embedding
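A minimal sketch of the embedding itself (toy shortest-path metric; $L$ and the number of scales are small illustrative constants, and only the contraction half is checked, since the distortion bound holds only with high probability):

```python
import numpy as np

# Bourgain-style embedding: for each scale t, pick L random subsets of
# size ~ n / 2^t and use distance-to-subset as a coordinate, scaled by
# 1 / (L log n) so the l_1 embedding is a contraction.
rng = np.random.default_rng(3)

D = np.array([[0, 1, 2, 3, 4],
              [1, 0, 1, 2, 3],
              [2, 1, 0, 1, 2],
              [3, 2, 1, 0, 1],
              [4, 3, 2, 1, 0]], dtype=float)   # path graph on 5 vertices
n = len(D)
L = 8                                          # illustrative constant
scales = int(np.log2(n)) + 1

coords = []
for t in range(1, scales + 1):
    size = max(1, n // 2 ** t)
    for _ in range(L):
        A = rng.choice(n, size=size, replace=False)
        coords.append(D[:, A].min(axis=1))     # d(u, A) for every u
phi = np.stack(coords, axis=1) / (L * scales)  # rows = embedded points

# contraction check: each coordinate differs by at most d(u, v), so the
# scaled l_1 distance never exceeds the original metric
diff = np.abs(phi[:, None, :] - phi[None, :, :]).sum(axis=2)
assert np.all(diff <= D + 1e-12)
```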

Gap is tight:

• Take constant degree expander
• Set edge lengths $1/\log n$
• Most pairs are $\Omega(1)$ apart: hop distance $\Omega(\log n)$ times edge length $1/\log n$
• So sum of pairwise distances is $\Omega(n^2)$
• While sum of edge lengths is only $O(n/\log n)$
• So the LP value is $O(1/(n\log n))$
• But by expansion, any cut with $k \le n/2$ vertices on the small side has $\Omega(k)$ edges leaving
• So the best cut ratio is $\Omega(k/(kn))=\Omega(1/n)$
• So the $\log n$ gap in our rounding is tight.

Extensions

• Generalization to multicommodity flow/cut
• $\Theta(\log k)$ gap for $k$ commodities
• Arora, Rao, and Vazirani use a more sophisticated SDP approach to approximate ratio cut to $O(\sqrt{\log n})$
• Other powerful metric embedding methods for near neighbor search, tree metrics for online algorithms