Markov Chains for Sampling
Sampling: Given complex state space
 Want to sample from it
 Use some Markov Chain
 Run for a long time
 end up “near” stationary distribution
 Reduces sampling to local moves (easier)
 no need for global description of state space
 Allows sampling from an exponentially large state space
Formalize: what is “near” and “long time”?
 Stationary distribution $\pi$
 arbitrary distribution $q$
 relative pointwise distance (r.p.d.) $\max_j |q_j-\pi_j|/\pi_j$
 Intuitively close.
 Formally, suppose r.p.d. $\delta$.
 Then $(1-\delta)\pi \le q$
 So can express distribution $q$ as “with probability $1-\delta$, sample from $\pi$; else, do something weird.”
 So if $\delta$ small, “as if” sampling from $\pi$ each time.
 If $\delta$ polynomially small, can take polynomially many samples without a mistake
 Gives “almost stationary” sample from Markov Chain
 Mixing Time: time to reduce r.p.d. to some $\epsilon$
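To make “near” concrete, a small sketch (the 2-state chain here is a made-up example, not from the notes): run the chain and watch the r.p.d. $\max_j |q_j-\pi_j|/\pi_j$ fall below a target $\epsilon$.

```python
# Toy illustration: the r.p.d. of a small chain decays geometrically,
# and the mixing time is the number of steps to push it below eps.
P = [[0.9, 0.1],
     [0.2, 0.8]]          # row-stochastic transition matrix (assumed example)
pi = [2/3, 1/3]           # its stationary distribution (check: pi P = pi)

def step(q, P):
    """One step of the chain: q -> qP (row-vector convention)."""
    n = len(q)
    return [sum(q[i] * P[i][j] for i in range(n)) for j in range(n)]

def rpd(q, pi):
    """Relative pointwise distance max_j |q_j - pi_j| / pi_j."""
    return max(abs(qj - pj) / pj for qj, pj in zip(q, pi))

q = [1.0, 0.0]            # start far from stationary
t = 0
while rpd(q, pi) > 1e-6:  # mixing time for eps = 1e-6
    q = step(q, P)
    t += 1
print(t, rpd(q, pi))
```

The decay rate here is the chain's second eigenvalue (0.7 for this matrix), foreshadowing the eigenvalue method below.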
Eigenvalues
Method 1 for mixing time: Eigenvalues.
 Consider transition matrix $P$.
 Eigenvalues $\lambda_1\ge \cdots \ge\lambda_n$
 Corresponding Eigenvectors $e_1,\ldots,e_n$.
 Any vector $q$ can be written as $\sum a_i e_i$
 Then $qP = \sum a_i\lambda_i e_i$
 and $qP^k = \sum a_i \lambda_{i}^k e_i$
 so sufficient to understand eigenvalues and vectors.
 Complex eigenvectors?
 Perron-Frobenius Theorem says for an aperiodic irreducible nonnegative square matrix (our case) the eigenvector of the largest eigenvalue is unique
 And has all nonnegative values
 (If periodic, then the number of eigenvalues of maximum modulus equals the period, but the corresponding eigenvectors are all rotations of the same (possibly complex) eigenvector.)
 Is any $\lambda_i > 1$?
 If so, $P e_i = \lambda_i e_i$ (argue with right eigenvectors; $P$ and $P^T$ have the same eigenvalues)
 let $M$ be max entry of $e_i$ (in absolute value)
 if $\lambda_i > 1$, then some entry of $P e_i$ has absolute value $\lambda_i M > M$
 but any entry of $P e_i$ is a convex combination of entries of $e_i$ (rows of $P$ sum to $1$), so has absolute value at most $M$, contradiction.
 Deduce: all eigenvalues of a stochastic matrix are at most $1$ in absolute value.
 How many $\lambda_i=1$?
 Stationary distribution ($e_1=\pi$)
 if any others, could add a little bit of it to $e_1$, get second stationary distribution
 What about $-1$? Only if periodic.
 so all other coordinates of eigenvalue decomposition decay as $\lambda_i^k$.
 So if can show other $\lambda_i$ small, converge to stationary distribution fast.
 In particular, if $\lambda_2 < 1-1/\mathrm{poly}$, get polynomial mixing time
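A quick numeric sketch of this decay (the 3-state lazy walk below is an assumed example): the residual $qP^k - \pi$ shrinks per step by a factor approaching $\lambda_2$, because the slowest-decaying term in $\sum a_i\lambda_i^k e_i$ eventually dominates.

```python
# Watch the residual q P^k - pi shrink; its per-step decay rate
# approaches lambda_2 (here exactly 0.5 for this lazy path walk).
P = [[0.5, 0.5, 0.0],
     [0.25, 0.5, 0.25],
     [0.0, 0.5, 0.5]]      # lazy random walk on a 3-vertex path
pi = [0.25, 0.5, 0.25]     # stationary distribution (pi P = pi)

def step(q):
    return [sum(q[i] * P[i][j] for i in range(3)) for j in range(3)]

def dist(q):
    """2-norm of the residual q - pi."""
    return sum((a - b) ** 2 for a, b in zip(q, pi)) ** 0.5

q = [1.0, 0.0, 0.0]
ratios = []
for _ in range(30):
    d0 = dist(q)
    q = step(q)
    ratios.append(dist(q) / d0)
print(ratios[-1])          # ratio settles at lambda_2 = 0.5
```

After one step the $\lambda_3=0$ component dies and the residual lies entirely along the $\lambda_2$ eigenvector, so the ratio is exactly $1/2$ thereafter.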
Expanders:
Definition
 bipartite
 $n$ vertices, regular degree $d$
 for every left set $S$: $|\Gamma(S)| \ge (1+c(1-2|S|/n))|S|$
Take random walk on $(n,d,c)$ expander with constant $c$
 add self loops (with probability $1/2$) to deal with periodicity.
 uniform stationary distribution
 lemma: second eigenvalue $1-O(1/d)$ \[ \lambda_2 \le 1-\frac{c^2}{d(2048+4c^2)} \]
 Intuition on convergence: because neighborhoods grow, position becomes unpredictable very fast.
 proof: messy math
 Converse theorem: if $\lambda_2 \le 1-\epsilon$, get expander with \[ c \ge 4(\epsilon-\epsilon^2) \]
 Walks that mix fast are on expanders.
Do expanders exist? Yes! proof: probabilistic method.

Definition: $(n,d,\alpha,c)$ OR-concentrator
 bipartite $2n$ vertices
 degree at most $d$ in $L$
 expansion $c$ on sets $< \alpha n$.

claim: $(n,18,1/3,2)$-concentrator exists
 Construct by sampling $d$ random neighbors with replacement
 $E_s$: Specific size $s$ set has $< cs$ neighbors.
 fix $S$ of size $s$. $T$ of size $< cs$.
 prob. $S$ goes to $T$ at most $(cs/n)^{ds}$
 $\binom{n}{cs}$ sets $T$
 $\binom{n}{s}$ sets $S$
 $$ \begin{eqnarray*} \Pr[E_s] &\le &\binom{n}{s}\binom{n}{cs}(cs/n)^{ds}\\ &\le &(en/s)^s(en/cs)^{cs}(cs/n)^{ds}\\ &= &[(s/n)^{d-c-1}e^{c+1}c^{d-c}]^s\\ &\le &[(1/3)^{d-c-1}e^{c+1}c^{d-c}]^s\\ &\le &[(c/3)^d(3e)^{c+1}]^s \end{eqnarray*} $$
 Take $c=2, d=18$, get $[(2/3)^{18}(3e)^3]^{s} < 2^{-s}$
 sum over $s$, get $< 1$
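The final numeric steps can be checked directly; a throwaway computation (not part of the proof itself):

```python
# Verify: with c = 2, d = 18 the bracketed base (c/3)^d (3e)^(c+1)
# is below 1/2, so Pr[E_s] < 2^{-s} and the sum over s stays below 1.
import math

c, d = 2, 18
base = (c / 3) ** d * (3 * math.e) ** (c + 1)
print(base)                      # about 0.37, indeed < 1/2

# union bound over all set sizes s >= 1: a geometric series
total = sum(base ** s for s in range(1, 1000))
print(total)                     # base/(1-base), comfortably < 1
```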
Gabber-Galil expanders:
 But in this case, can do better: an explicit deterministic construction (Gabber-Galil).
 Let $n=2m^2$. Vertices are $(x,y)$ where $x,y \in Z_m$ (one set per side)
 5 neighbors: $(x,y), (x, x+y), (x,x+y+1),(x+y,y),(x+y+1,y)$ (add mod $m$)
 or 7 neighbors of similar form.
 Theorem: this $d=5$ graph has $c=(2-\sqrt{3})/4$; the degree-7 graph has twice the expansion.
 in other words, $c$ and $d$ are constant.
 meaning $\lambda_2=1-\epsilon$ for some constant $\epsilon$
 So random walks on this expander mix very fast: for polynomially small r.p.d., $O(\log n)$ steps of random walk suffice.
 Note also that $n$ can be huge, since only need to store one vertex ($O(\log n)$ bits).
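A sketch of the degree-5 neighbor map, treating it as a walk on $Z_m \times Z_m$ (this simplification ignores the bipartite structure and the self-loop trick, which a real implementation would keep). The point is that only the current vertex, $O(\log n)$ bits, is ever stored.

```python
# Sketch of the Gabber-Galil degree-5 neighbor map; m is an assumed
# parameter and can be huge, since the walk stores only one vertex.
import random

m = 1 << 20

def neighbors(x, y):
    """The five neighbors of (x, y), arithmetic mod m."""
    return [(x, y), (x, (x + y) % m), (x, (x + y + 1) % m),
            ((x + y) % m, y), ((x + y + 1) % m, y)]

def walk(steps, seed=0):
    """Random walk: each step picks one of 5 neighbors (~3 bits)."""
    rng = random.Random(seed)
    v = (0, 0)
    for _ in range(steps):
        v = rng.choice(neighbors(*v))
    return v

print(walk(100))
```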
Application: conserving randomness.
 Consider a BPP algorithm that gives the right answer with probability $99/100$ (constant irrelevant) using $n$ random bits.
 $t$ independent trials with majority rule reduce failure probability to $2^{-O(t)}$ (Chernoff), but need $tn$ bits
 in case of $RP$, used $2$-point sampling to get error $O(1/t)$ with $2n$ bits and $t$ trials.
 Use walk instead.
 vertices are the $N=2^n$ ($n$-bit) random strings for the algorithm.
 edges as degree7 expander
 only $1/100$ of vertices are bad.
 what is probability majority of time spent there?
 in limit, spend $1/100$ of time there
 how fast converge to limit? How long must we run?
 Power the Markov chain so $\lambda_2^{\beta}\le 1/10$ ($\beta$ a constant number of steps)
 use random seeds encountered every $\beta$ steps.
 number of bits needed:
 $O(n)$ for stationary starting point
 $3\beta$ more per trial,
 Theorem: after $7k$ samples, probability majority wrong is $1/2^k$. So error $1/2^n$ with $O(n)$ bits! (compare to naive $O(n^2)$ bits)
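A toy simulation of the theorem's setup. Everything here is an assumption for illustration: a random 8-regular graph (union of 4 random permutations and their inverses) stands in for the expander, there is no powering, and the sizes are tiny; the point is only that majority-of-walk essentially never lands mostly in the 1/100 bad set.

```python
# Simulate majority vote along a walk on a (probably) expanding graph.
import random

rng = random.Random(1)
N, k = 500, 2
bad = set(range(5))               # 1/100 of vertices answer wrongly
# 4 random permutations plus inverses: an 8-regular graph with uniform
# stationary distribution; random such graphs expand w.h.p. (assumed).
perms = [rng.sample(range(N), N) for _ in range(4)]
inv = [[0] * N for _ in range(4)]
for t, p in enumerate(perms):
    for i, j in enumerate(p):
        inv[t][j] = i

def nbrs(v):
    return [p[v] for p in perms] + [q[v] for q in inv]

def majority_bad(rng):
    v = rng.randrange(N)          # uniform start: n random bits
    hits = 0
    for _ in range(7 * k):        # 7k walk steps, O(1) bits each
        v = rng.choice(nbrs(v))
        hits += v in bad
    return 2 * hits > 7 * k       # majority of samples were bad?

failures = sum(majority_bad(rng) for _ in range(300))
print(failures)
```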
 Let $B$ be powered transition matrix
 let $p^{(i)}$ be distribution of sample $i$, namely $p^{(0)}B^i$
 Let $W$ be the indicator matrix for good witnesses, namely $1$ at diagonal entry $i$ if $i$ is a witness; $\bar{W}=I-W$ for the complementary set.
 $\|p^{(i)}W\|_1$ is the probability sample $i$ is in the witness set; similarly for nonwitnesses.
 Consider a sequence of $7k$ results “witness or not”
 represent as matrices $S=(S_1,\ldots,S_{7k}) \in \{W,\bar{W}\}^{7k}$
 claim \[ \Pr[S] = \|p^{(0)}(BS_1)(BS_2)\cdots(BS_{7k})\|_1. \] (draw layered graph; sums prob. of paths through correct sequence of witness/nonwitness)
 Use $2$norm since easier to work with.
 Note $\|a\|_1 \le \sqrt{N}\|a\|_2$ for any $N$-vector $a$
 For fixed sum of (absolute) values, sum of squares is minimized by setting all equal
 ie, for sum $\alpha$, set all equal to $\alpha/N$
 2-norm $\alpha/\sqrt{N}$
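The 1-norm/2-norm comparison is easy to sanity-check numerically (a throwaway check, not from the notes):

```python
# Check ||a||_1 <= sqrt(N) ||a||_2 on random vectors; equality is
# approached when all entries are equal, e.g. the uniform vector.
import random

rng = random.Random(0)
N = 64
for _ in range(100):
    a = [rng.uniform(-1, 1) for _ in range(N)]
    n1 = sum(abs(x) for x in a)
    n2 = sum(x * x for x in a) ** 0.5
    assert n1 <= N ** 0.5 * n2 + 1e-12

uniform = [1 / N] * N            # sum alpha = 1, 2-norm alpha/sqrt(N)
print(sum(uniform), sum(x * x for x in uniform) ** 0.5)
```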
 defer: $\|pBW\|_2 \le \|p\|_2$ and $\|pB\bar{W}\|_2 \le \frac15\|p\|_2$
 deduce if more than $7k/2$ bad witnesses, $$ \begin{eqnarray*} \|p^{(0)}\prod BS_i\|_1 &\le & \sqrt{N}\|p^{(0)}\prod BS_i\|_2\\ &\le & \sqrt{N}(\tfrac15)^{7k/2}\|p^{(0)}\|_2\\ &= &(\tfrac15)^{7k/2} \end{eqnarray*} $$ (since $\|p^{(0)}\|_2=1/\sqrt{N}$)
 At same time, only $2^{7k}$ bad sequences, so error prob. $\le 2^{7k}5^{-7k/2} \le 2^{-k}$
 proof of lemma:
 write $p=\sum a_ie_i$ with $e_1=\pi$
 obviously $\|pBW\|_2 \le \|pB\|_2$ since $W$ just zeros some entries.
 But $\|pB\|_2=\sqrt{\sum a_i^2\lambda_i^2}\le\sqrt{\sum a_i^2} = \|p\|_2$ (the $e_i$ are orthonormal since $B$ is symmetric)
 Now write $p=a_1\pi+y$ where $y \cdot \pi=0$
 argue that $\|\pi B\bar{W}\|\le \|\pi\|/10\le \|p\|/10$ and $\|yB\bar{W}\|\le\|y\|/10\le\|p\|/10$, done by the Pythagorean theorem
 First $\pi$:
 recall $\pi B=\pi$ is the uniform vector, all coords $1/N$, so norm $1/\sqrt{N}$
 $\bar{W}$ has only $1/100$ of coordinates nonzero, so
 $\|e_1\bar{W}\| = \sqrt{(N/100)(1/N)}=1/10$ (for the unit eigenvector $e_1$)
 Now $y$: just note $\|yB\| \le \|y\|/10$ since $\lambda_2 \le 1/10$. Then $\bar{W}$ only zeros entries.
 summary: $\pi$ part likely to be in good witness set, $y$ part unlikely to be relevant.