6.856 Lecture Notes

Markov Chains

Markov chains:

Powerful tool for sampling from complicated distributions
rely only on local moves to explore state space.
Many use Markov chains to model events that arise in nature.
We create Markov chains to explore and sample from problems.

2SAT:

Fix some assignment $A$
let $f(k)$ be expected time to get all $n$ variables to match $A$ if $k$ currently match.
Then $f(n)=0$, $f(0)=1+f(1)$, and $f(k)=1+\frac12(f(k+1)+f(k-1)$.
Rewrite: $f(0)-f(1) = 1$ and $f(k)-f(k+1) = 2+f(k-1)-f(k)$
So $f(k)-f(k+1) = 2k+1$
deduce $f(0) = 1+3+\cdots+(2n-1) = n^2$
so, find with probability $1/2$ in $2n^2$ time.
With high probability, find in $O(n^2\log n)$ time.

More general formulation: Markov chain

State space $S$
markov chain begins in a start state $X_0$, moves from state to state, so output of chain is a sequence of states $X_0,X_1,\ldots = \{X_t\}_{t=0}^\infty$
movement controlled by matrix of transition probabilities $p_{ij}=$ probability next state will be $j$ given current is $i$.
thus, $\sum_j p_{ij} = 1$ for every $i \in S$
implicit in definition is memorylessness property: \[ \Pr[X_{t+1} = j \mid X_0 = i_0,X_1=i_1,\ldots,X_t = i] = \Pr[X_{t+1} = j \mid X_t = i] = p_{ij}. \]
Initial state $X_0$ can come from any probability distribution, or might be fixed (trivial prob. dist.)
Dist for $X_0$ leads to dist over sequences $\{X_t\}$
Suppose $X_t$ has distribution $q$ (vector, $q_i$ is prob. of state $i$). Then $X_{t+1}$ has dist $qP$. Why?
Observe $\Pr[X_{t+r} = j \mid X_t = i] = P^r_{ij}$

Graph of MC:

Vertex for every state
Edge $(i,j)$ if $p_{ij} > 0$
Edge weight $p_{ij}$
weighted outdegree 1
Possible state sequences are paths through the graph

Stationary distribution:

a $\pi$ such that $\pi P = \pi$
left eigenvector, eigenvalue 1
steady state behavior of chain: if in stationary, stay there.
note stationary distribution is a sample from state space, so if can get right stationary distribution, can sample
lots of chains have them.
to say which, need definitions.

Things to rule out:

infinite directed line (no stationary)
disconnected graph (multiple)
2-cycle (back and forth is not stationary)

Irreducibility

any state can each any other state
i.e. path between any two states
i.e. single strong component in graph

Persistence/Transience:

$r_{ij}^{(t)}$ is probability first hit state $j$ at $t$, given start state $i$.
$f_{ij}$ is probability eventually reach $j$ from $i$, so $\sum r_{ij}^{(t)}$
expected time to reach is hitting time $h_{ij} = \sum tr_{ij}^{(t)}$
If $f_{ij}< 1$ then $h_{ij}=\infty$ since might never reach. Converse not always true.
If $f_{ii}< 1$, state is transient. Else persistent. If $h_{ii}=\infty$, null persistent.

Persistence in finite graphs:

graph has strong components
final strong component has no outgoing edges
Nonfinal components:
- once leave nonfinal component, cannot return
- if nonfinal, nonzero probability of leaving in $n$ steps.
- so guaranteed to leave eventually
- so, vertices in nonfinal components are transient
Final components
- if final, will stay in that component
- If two vertices in same strong component, have path between them
- so nonzero probability of reaching in (say) $n$ steps.
- so, vertices in final components are persistent
- geometric distribution on time to reach, so expected time finite. Not null-persistent
Conclusion:
- In finite chain, no null-persistent states
- In finite irreducible chain, all states non-null persistent (no transient states)

Periodicity:

Periodicity of a state is max $T$ such that some state only has nonzero probability at times $a+Ti$ for integer $i$
Chain periodic if some state has periodicity $>1$
In graph, all cycles containing state have length multiple of $T$
Easy to eliminate: add self loops
slows down chain, otherwise same

Ergodic:

aperiodic and non-null persistent
means might be in state at any time in (sufficiently far) future

Fundamental Theorem of Markov chains: Any irreducible, finite, aperiodic Markov chain satisfies:

All states ergodic (reachable at any time in future)
unique stationary distribution $\pi$, with all $\pi_i > 0$
$f_{ii}=1$ and $h_{ii}=1/\pi_i$
number of times visit $i$ in $t$ steps approaches $t\pi_i$ in limit of $t$.

Justify all except uniqueness here.

Finite irreducible aperiodic implies ergodic (since finite irreducible implies non-null persistent)

Intuitions for quantities:

$h_{ii}$ is expected return time
So hit every $1/h_{ii}$ steps on average
So $h_{ii} = 1/\pi_i$
If in stationary dist, $t\pi_i$ visits follows from linearity of expectation

Random walks on undirected graphs:

general Markov chains are directed graphs. But undirected have some very nice properties.
take a connected, non-bipartite undirected graph on $n$ vertices
states are vertices.
move to uniformly chosen neighber.
So $p_{uv} = 1/d(u)$ for every neighbor $v$
stationary distribution: $\pi_v = d(v)/2m$
unqiqueness says this is only one
deduce $h_{vv}=2m/d(v)$

Definitions:

Hitting time $h_{uv}$ is expected time to reach $u$ from $v$
commute time is $h_{uv}+h_{vu}$
$C_u(G)$ is expected time to visit all vertices of $G$, starting at $u$
cover time is $\max_u C_u(G)$ (so in fact is max over any starting distribution).
let's analyze max cover time

Examples:

clique: commute time $n$, cover time $\Theta(n\log n)$
line: commute time between ends is $\Theta(n^2)$
lollipop: $h_{uv}=\Theta(n^3)$ while $h_{vu} = \Theta(n^2)$ (big difference!)
also note: lollipop has edges added to line, but higher cover time: adding edges can increase cover time even though improves connectivity.

general graphs: adjacent vertices:

lemma: for adjcaent $(u,v)$, $h_{uv}+h_{vu} \le 2m$
proof: new markov chain on edge traversed following vertex MC
- transition matrix is doubly stochastic: column sums are 1 (exactly $d(v)$ edges can transit to edge $(v,w)$, each does so with probability $1/d(v)$)
- In homework, show such matrices have uniform stationary distribution.
- Deduce $\pi_e = 1/2m$. Thus $h_{ee}=2m$.
So consider suppose original chain on vertex $v$.
- suppose arrived via $(u,v)$
- expected to traverse $(u,v)$ again in $2m$ steps
- at this point will have commuted $u$ to $v$ and back.
- so conditioning on arrival method, commute time $2m$ (thanks to memorylessness)

General graph cover time:

theorem: cover time $O(mn)$
proof: find a spanning tree
consider a dfs of tree-crosses each edge once in each direction, gives order $v_1,\ldots,v_{2n-1}$
time for the vertices to be visited in this order is upper bounded by commute time
but vertices adjacent, so commute times $O(m)$
total time $O(mn)$
tight for lollipop, loose for line.

Tighter analysis:

analogue with electrical networks
- Assume unit edge resistance
- Kirchoff's law: current (rate of transitions) conservation
- Ohm's law
- Gives effective resistance $R_{uv}$ between two vertices.
Theorem: $C_{uv} = 2mR_{uv}$
(tightens previous theorem, since $R_{uv} \le 1$)
Proof:
- Suppose put $d(x)$ amperes into every $x$, remove $2m$ from $v$
- $\phi_{uv}$ voltage at $u$ with respect to $v$
- Ohm: Current from $u$ to neighboring $w$ is $\phi_{uv}-\phi_{wv}$
- Kirchoff: $d(u) = \sum_{w \in N(u)}$ currents $=\sum_{w \in N(u)} \phi_{uv}-\phi_{wv}=d(u)\phi_{uv}-\sum \phi_{wv}$
- Also, $h_{uv} = \sum (1/d(u))(1+h_{wv})$, ie $d(u)h_{uv} = d(u)+ \sum_w h_{wv}$
- same soln to both linear equations, so $\phi_{uv} = h_{uv}$
- By same arg, $h_{vu}$ is voltage at $v$ wrt $u$, if insert $2m$ at $u$ and remove $d(x)$ from every $x$
- add linear systems, find $h_{uv}+h_{vu}$ is voltage difference when insert $2m$ at $u$ and remove at $v$.
- now apply ohm.

Applications

Testing graph connectivity in logspace.

Deterministic algorithm (matrix squaring) gives $\log^2 n$ space
Smarter algorithms gives $\log^{4/3} n$ space
Randomized logspace achieves one-sided error
derandomized $\log n$ recently achieved by Reingold
proves $SL=L$ where $SL$ is nondeterministic logspace algorithms and $L$ is deterministic ones.

universal traversal sequences.

Define labelled graph
UTS covers any labelled graph
deterministic construction known for cycle only
we showed cover time $O(n^3)$
so probability takes more than $2n^3$ to cover is $1/2$
repeat $k$ times. Prob fail $1/2^k$
How many graphs? $(nd)^{O(nd)}$
So set $k=O(nd\log nd)$
probabilistic method

Nisan $n^{O(\log n)}$ via pseudorandom generator that fools logspace machines.

Regev.

Bipartite Matching

Goel, Kapralov, Khanna 2010.

An amazing algorithm for regular bipartite graphs.

general bipartite graphs: $\Olog(m\sqrt{n})$
based on augmenting paths
use regularity to find them super fast
by taking a random walk from left to right side!
$O(n\log n)$---sublinear!

Construction

degree $d$
suppose have $2k$ unmatched nodes ($k$ per side)
need augmenting path to add one.
direct all edges left to right
contract $n-k$ matching edges
now any directed path through merged vertex is following back edge
so any path from (unmatch) left vertex to (unmatched) right vertex is augmenting path
merged vertices have in and outdegree $d-1$
add source $s$ with $d$ parallel edges to every unmatched left vertex (so indegree equals outdegree $d$)
add sink $t$ with $d$ parallel edge from every unmatched right vertex (so indegree equals outdegree $d$)
merge $s$ and $t$ (so indegree equals outdegree $dk$)
conclude: every vertex has same in and out degree (Eulerian)

Algorithm

Start a random walk at $s$.
When you return to $s$ you must have reached $t$
so you found an augmenting path!

Analysis

like undirected graphs, Eulerian graphs have stationary distribution uniform over $m$ edges
since for degree $d$ vertex $v$ $$ \begin{align*} \Pr[(v,w)] &= \sum_{(u,v)} \Pr[(u,v)]\cdot (1/d)\\ &=\sum_{(u,v)} (1/m)\cdot(1/d)\\ &= d\cdot(1/m)\cdot(1/d)\\ &=1/m \end{align*} $$
so, vertex probability is indegree over total degree
thus, by fundamental theorem, expected return time is total degree over individual degree
total degree is $nd-(n-k)$ original edges plus $2kd$ edges for $s$ and $t$ for total at most $(n+2k)d$
degree of $s$ is $kd$
ratio (expected time) $2+n/k$
sum over $k$: entire time is $2n+O(n\log n)$