$\newcommand{\Wbar}{{\overline W}}$

### Markov Chains for Sampling

Sampling:
• Given complex state space
• Want to sample from it
• Use some Markov Chain
• Run for a long time
• end up “near” stationary distribution
• Reduces sampling to local moves (easier)
• no need for global description of state space
• Allows sample from exponential state space

Formalize: what is “near” and “long time”?

• Stationary distribution $\pi$
• arbitrary distribution $q$
• relative pointwise distance (r.p.d.) $\max_j |q_j-\pi_j|/\pi_j$
• Intuitively close.
• Formally, suppose r.p.d. $\delta$.
• Then $(1-\delta)\pi \le q$
• So can express distribution $q$ as “with probability $1-\delta$, sample from $\pi$; else, do something weird.”
• So if $\delta$ small, “as if” sampling from $\pi$ each time.
• If $\delta$ poly small, can take polynomially many samples without noticeable deviation
• Gives “almost stationary” sample from Markov Chain
• Mixing Time: time to reduce r.p.d to some $\epsilon$
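A minimal numeric sketch of these definitions (the symmetric 3-state chain below is an arbitrary illustration, not from the notes): iterate a distribution under $P$ and watch the r.p.d. fall geometrically.

```python
import numpy as np

# A hypothetical symmetric 3-state chain (illustration only).
P = np.array([[0.50, 0.25, 0.25],
              [0.25, 0.50, 0.25],
              [0.25, 0.25, 0.50]])

# P is doubly stochastic, so the stationary distribution is uniform.
pi = np.full(3, 1 / 3)

def rpd(q, pi):
    """Relative pointwise distance: max_j |q_j - pi_j| / pi_j."""
    return np.max(np.abs(q - pi) / pi)

# Start at a point mass and track the r.p.d. as the chain runs.
q = np.array([1.0, 0.0, 0.0])
dists = []
for _ in range(10):
    dists.append(rpd(q, pi))
    q = q @ P
```

Each step shrinks the deviation from $\pi$ by the chain's second eigenvalue (here $1/4$), previewing the eigenvalue analysis that follows.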

### Eigenvalues

Method 1 for mixing time: Eigenvalues.

• Consider transition matrix $P$.
• Eigenvalues $\lambda_1\ge \cdots \ge\lambda_n$
• Corresponding Eigenvectors $e_1,\ldots,e_n$.
• Any vector $q$ can be written as $\sum a_i e_i$
• Then $qP = \sum a_i\lambda_i e_i$
• and $qP^k = \sum a_i \lambda_{i}^k e_i$
• so sufficient to understand eigenvalues and vectors.
• Complex eigenvectors?
• Perron-Frobenius Theorem says for an aperiodic irreducible nonnegative square matrix (our case) the largest eigenvalue is simple (unique)
• And its eigenvector has all nonnegative entries
• (If periodic, then the number of eigenvalues of maximum modulus equals the period, but their eigenvectors are all rotations of the same (possibly complex) eigenvector.)
• Is any $|\lambda_i| > 1$?
• Work with right eigenvectors $Pv_i = \lambda_i v_i$ (same eigenvalues as the left ones)
• let $M$ be max entry of $v_i$ (in absolute value)
• if $|\lambda_i| > 1$, then some entry of $Pv_i$ has absolute value $|\lambda_i| M > M$
• but any entry of $Pv_i$ is a convex combination of values of absolute value at most $M$ (rows of $P$ sum to 1), so has absolute value at most $M$, contradiction.
• Deduce: all eigenvalues of a stochastic matrix have absolute value at most 1.
• How many $\lambda_i=1$?
• Stationary distribution ($e_1=\pi$)
• if any others, could add a little bit of it to $e_1$, get second stationary distribution
• What about $-1$? Only if periodic.
• so all other coordinates of eigenvalue decomposition decay as $\lambda_i^k$.
• So if can show other $\lambda_i$ small, converge to stationary distribution fast.
• In particular, if $\lambda_2 < 1-1/poly$, get polynomial mixing time
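To make the decomposition concrete, a hedged numeric sketch using a lazy walk on a 4-cycle (an arbitrary stand-in chain): the top eigenvalue is $1$, all others are strictly smaller, and the error decays like $\lambda_2^k$.

```python
import numpy as np

# Lazy walk on a 4-cycle (an arbitrary stand-in chain): aperiodic,
# irreducible, and symmetric, so its eigenvalues are real.
n = 4
P = np.zeros((n, n))
for i in range(n):
    P[i, i] = 0.5                 # self-loop
    P[i, (i + 1) % n] += 0.25     # step right
    P[i, (i - 1) % n] += 0.25     # step left

# Perron-Frobenius: top eigenvalue is 1; all others are strictly smaller.
evals = np.sort(np.linalg.eigvalsh(P))[::-1]

# Distance to stationarity decays like lambda_2^k (= 0.5 for this chain).
pi = np.full(n, 1 / n)
q = np.array([1.0, 0.0, 0.0, 0.0])
err = [np.linalg.norm(q @ np.linalg.matrix_power(P, k) - pi)
       for k in range(1, 6)]
```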

### Expanders:

Definition

• bipartite
• $n$ vertices, regular degree $d$
• $|\Gamma(S)| \ge (1+c(1-2|S|/n))|S|$
factor $c$ more neighbors, at least until $|S|$ nears $n/2$.

Take random walk on $(n,d,c)$ expander with constant $c$

• add self loops (with probability $1/2$) to deal with periodicity.
• uniform stationary distribution
• lemma: second eigenvalue is $1-\Omega(1/d)$; specifically $\lambda_2 \le 1-\frac{c^2}{d(2048+4c^2)}$
• Intuition on convergence: because neighborhoods grow, position becomes unpredictable very fast.
• proof: messy math
• Deduce: mixing time in expander is $O(\log n)$ to get $\epsilon$ r.p.d. (since $\pi_i = 1/n$)
• Converse theorem: if $\lambda_2 \le 1-\epsilon$, get expander with $c \ge 4(\epsilon-\epsilon^2)$
• Walks that mix fast are on expanders.
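The lazy-walk construction $P = \tfrac12 I + \tfrac{1}{2d}A$ is easy to check numerically. The cube graph $Q_3$ below is only an illustrative 3-regular graph (not an actual expander family); the point is that rows sum to 1 and the uniform distribution is stationary.

```python
import numpy as np

# Lazy walk on a d-regular graph: P = I/2 + A/(2d).  For illustration take
# the 3-regular cube graph Q_3 (NOT an actual expander family).
n, d = 8, 3
A = np.zeros((n, n))
for u in range(n):
    for b in range(d):
        A[u, u ^ (1 << b)] = 1    # flip one of the 3 address bits

P = 0.5 * np.eye(n) + A / (2 * d)

# Rows sum to 1, and P is symmetric, so the uniform distribution is stationary.
pi = np.full(n, 1 / n)

# The spectral gap 1 - lam[1] is what controls the mixing time.
lam = np.sort(np.linalg.eigvalsh(P))[::-1]
```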

Do expanders exist? Yes! proof: probabilistic method.

• Definition: $(n,d,\alpha,c)$ OR-concentrator
• bipartite $2n$ vertices
• degree at most $d$ in $L$
• expansion $c$ on sets $< \alpha n$.
• claim: $(n,18,1/3,2)$-concentrator exists
• Construct by sampling $d$ random neighbors with replacement
• $E_s$: Specific size $s$ set has $< cs$ neighbors.
• fix $S$ of size $s$ and a candidate neighborhood $T$ of size $cs$
• prob. every edge out of $S$ lands in $T$ is at most $(cs/n)^{ds}$
• $\binom{n}{cs}$ sets $T$
• $\binom{n}{s}$ sets $S$
• $$\begin{eqnarray*} \Pr[] &\le &\binom{n}{s}\binom{n}{cs}(cs/n)^{ds}\\ &\le &(en/s)^s(en/cs)^{cs}(cs/n)^{ds}\\ &= &[(s/n)^{d-c-1}e^{c+1}c^{d-c}]^s\\ &\le &[(1/3)^{d-c-1}e^{c+1}c^{d-c}]^s\\ &\le &[(c/3)^d(3e)^{c+1}]^s \end{eqnarray*}$$
• Take $c=2, d=18$, get $[(2/3)^{18}(3e)^3]^{s} < 2^{-s}$
• sum over $s$, get $< 1$
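The final numeric step can be verified with a few lines of arithmetic (assuming the stated $c=2$, $d=18$): the base is below $1/2$, so the union bound summed over set sizes $s$ is a convergent geometric series below 1.

```python
import math

# Check the final bound for c = 2, d = 18: the base (c/3)^d (3e)^(c+1)
# is below 1/2, so the failure probability is < 2^-s for each set size s.
c, d = 2, 18
base = (c / 3) ** d * (3 * math.e) ** (c + 1)

# Union bound over s = 1, 2, ...: a geometric series summing below 1.
total = base / (1 - base)
```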

Gabber-Galil expanders:

• But in this case, can do better deterministically.
• Gabber Galil expanders.
• Let $n=2m^2$. Vertices are $(x,y)$ where $x,y \in Z_m$ (one set per side)
• 5 neighbors: $(x,y), (x, x+y), (x,x+y+1),(x+y,y),(x+y+1,y)$ (add mod $m$)
• or 7 neighbors of similar form.
• Theorem: this $d=5$ graph has $c=(2-\sqrt{3})/4$, degree 7 has twice the expansion.
• in other words, $c$ and $d$ are constant.
• meaning $\lambda_2=1-\epsilon$ for some constant $\epsilon$
• So random walks on this expander mix very fast: for polynomially small r.p.d., $O(\log n)$ steps of random walk suffice.
• Note also that $n$ can be huge, since only need to store one vertex ($O(\log n)$ bits).
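The neighbor map is simple enough to write down directly; `gg_neighbors` below is a hypothetical helper name for the degree-5 rule above, with all arithmetic mod $m$.

```python
# Hypothetical helper: the five Gabber-Galil neighbors of (x, y), mod m.
def gg_neighbors(x, y, m):
    return [(x % m, y % m),
            (x % m, (x + y) % m),
            (x % m, (x + y + 1) % m),
            ((x + y) % m, y % m),
            ((x + y + 1) % m, y % m)]

# The graph on n = 2 m^2 vertices can be astronomically large; a walk only
# ever stores the current vertex, i.e. O(log n) bits.
m = 2 ** 20
step = gg_neighbors(123456, 654321, m)
```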


### Application: conserving randomness.

• Consider a BPP algorithm that gives the right answer with probability $99/100$ (the constant is irrelevant) using $n$ random bits.
• $t$ independent trials with majority rule reduce failure probability to $2^{-O(t)}$ (Chernoff), but need $tn$ bits
• in case of $RP$, used $2$-point sampling to get error $O(1/t)$ with $2n$ bits and $t$ trials.
• vertices are $N=2^n$ ($n$-bit) random strings for algorithm.
• edges as degree-7 expander
• only $1/100$ of vertices are bad.
• what is probability majority of time spent there?
• in limit, spend $1/100$ of time there
• how fast converge to limit? How long must we run?
• Power the Markov chain so $\lambda_2^{\beta}\le 1/10$ ($\beta$ a constant number of steps)
• use the random seeds encountered every $\beta$ steps as the trial inputs.
• number of bits needed:
• $O(n)$ for stationary starting point
• $3\beta$ more per trial,
• Theorem: after $7k$ samples, probability majority wrong is $1/2^k$. So error $1/2^n$ with $O(n)$ bits! (compare to the naive $tn = O(n^2)$ bits)
• Let $B$ be powered transition matrix
• let $p^{(i)}$ be distribution of sample $i$, namely $p^0B^i$
• Let $W$ be the indicator matrix for good witnesses, namely $1$ at diagonal entry $i$ if $i$ is a witness; $\Wbar = I-W$ indicates the complementary set.
• $\|p^{(i)}W\|_1$ is the probability that sample $i$ is a witness; similarly $\|p^{(i)}\Wbar\|_1$ for nonwitnesses.
• Consider a sequence of $7k$ results “witness or not”
• represent as matrices $S=(S_1,\ldots,S_{7k}) \in \{W,\Wbar\}^{7k}$
• claim $\Pr[S] = \|p^{(0)}(BS_1)(BS_2)\cdots(BS_{7k})\|_1.$ (draw layered graph. sums prob. of paths through correct sequence of witness/nonwitness)
• Use $2$-norm since easier to work with.
• Note $\|A\|_1 \le \sqrt{N}\|A\|_2$
• For fixed sum of values, minimize sum of square by setting all equal
• ie, for sum $\alpha$, set all equal to $\alpha/N$
• 2-norm $\alpha/\sqrt{N}$
• defer: $\|pBW\|_2 \le \|p\|_2$ and $\|pB\Wbar\|_2 \le \frac15\|p\|_2$
• deduce if more than $7k/2$ of the $S_i$ equal $\Wbar$, $$\begin{eqnarray*} \|p^{(0)}\prod BS_i\|_1 &\le & \sqrt{N}\|p^{(0)}\prod BS_i\|_2\\ &\le & \sqrt{N}\left(\frac15\right)^{7k/2}\|p^{(0)}\|_2\\ &= &\left(\frac15\right)^{7k/2} \end{eqnarray*}$$ (since $\|p^{(0)}\|_2=1/\sqrt{N}$)
• At same time, only $2^{7k}$ bad sequences, so error prob. $2^{7k}5^{-7k/2} \le 2^{-k}$
• proof of lemma:
• write $p=\sum a_ie_i$ with $e_1=\pi$
• obviously $\|pBW\| \le \|pB\|$ since $W$ just zeros some stuff out.
• But $\|pB\|=\sqrt{\sum a_i^2\lambda_i^2}\le\sqrt{\sum a_i^2} = \|p\|$ (the $e_i$ are orthonormal since $B$ is symmetric)
• Now write $p=a_1\pi+y$ where $y \cdot \pi=0$
• argue that $\|a_1\pi B\Wbar\|\le a_1\|\pi\|/10\le \|p\|/10$ and $\|yB\Wbar\|\le\|y\|/10\le\|p\|/10$; add via the triangle inequality (Pythagoras gives $a_1\|\pi\|,\|y\|\le\|p\|$) to get $\|pB\Wbar\|\le\|p\|/5$
• First $\pi$:
• recall $\pi B=\pi$ is uniform vector, all coords $1/N$, so norm $1/\sqrt{N}$
• $\Wbar$ has only $1/100$ of coordinates nonzero, so
• $\|\pi\Wbar\| = \sqrt{(N/100)(1/N)^2}=\frac{1}{10\sqrt{N}}=\|\pi\|/10$
• Now $y$: just note $\|yB\| \le \|y\|/10$ since $\lambda_2 \le 1/10$. Then $\Wbar$ zeros out.
• summary: $\pi$ part likely to be in good witness set, $y$ part unlikely to be relevant.
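Two arithmetic steps in the argument above can be checked mechanically: the counting step $2^{7k}5^{-7k/2}\le 2^{-k}$ and the norm comparison $\|v\|_1\le\sqrt{N}\|v\|_2$ (the random vector below is arbitrary illustration data).

```python
import math
import numpy as np

# Counting step: 2^(7k) * 5^(-7k/2) <= 2^(-k) for all k >= 1.  In log base 2
# this is 7k - (7k/2) log2(5) <= -k, i.e. 7 - 3.5 log2(5) <= -1 per unit k.
assert 7 - 3.5 * math.log2(5) <= -1

# Norm switch: ||v||_1 <= sqrt(N) ||v||_2 (Cauchy-Schwarz), spot-checked on
# arbitrary illustration data with N = 64.
rng = np.random.default_rng(0)
v = rng.standard_normal(64)
assert np.abs(v).sum() <= math.sqrt(64) * np.linalg.norm(v)
```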