Basic Probability Tools
This lecture will review everything you learned in 6.042.
- Basic tools in probability
- Expectations
- High probability events
- Deviations from expectation
Deviations from Expectation
Sometimes expectation isn't enough. Want to study deviations --- the probability and magnitude of deviation from expectation.
Example: coupon collection/stable marriage.
- Probability didn't get $k^{th}$ coupon after $r$ rounds is $(1-1/n)^r \le e^{-r/n}$
- which is $n^{-\beta}$ for $r=\beta n\ln n$
- so probability didn't get some coupon is at most $n\cdot n^{-\beta} = n^{1-\beta}$ (using union bound)
- we say “time is $O(n\ln n)$ with high probability” because we can make the probability $n^{-\beta}$ for any desired $\beta$ by changing a constant that doesn't affect the asymptotic claim.
- sometimes we say “with high probability” when we prove it for some $\beta > 1$ even if we didn't prove it for all.
- Saying “almost never above $O(n\ln n)$” is a much stronger statement than saying “$O(n\ln n)$ on average.”
Lower bound:
- After $n\ln n$ steps, $1-1/n$ chance of having gotten a given coupon
- So chance of getting all $< (1-1/n)^n \approx 1/e$ (note negative correlation)
- So reasonable chance of not being faster.
- Can show bound is very tight.
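A quick simulation sketch (my addition, not from the notes; the choices $n=200$, $\beta=2$, and 1000 trials are arbitrary) illustrating both bounds: finishing within $\beta n\ln n$ draws happens with high probability, while $n\ln n$ draws alone miss some coupon with constant probability.

```python
# Sketch: simulate coupon collection and check the upper and lower bounds above.
import math
import random

def coupon_time(n):
    """Number of draws until all n coupon types have been seen."""
    seen, draws = set(), 0
    while len(seen) < n:
        seen.add(random.randrange(n))
        draws += 1
    return draws

n, beta, trials = 200, 2, 1000
times = [coupon_time(n) for _ in range(trials)]
whp_bound = beta * n * math.log(n)
print("fraction exceeding beta*n*ln(n):",
      sum(t > whp_bound for t in times) / trials)         # about n^{1-beta} = 1/n
print("fraction exceeding n*ln(n):     ",
      sum(t > n * math.log(n) for t in times) / trials)   # constant, roughly 1 - 1/e
```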
Do Exact Balls in Bins here next time
Tail Bounds---Markov Inequality
At other times, don't want to get down and dirty with the problem. So have developed a set of bounding techniques that are basically problem-independent.
- few assumptions, so applicable almost anywhere
- but for same reason, don't give as tight bounds
- the more you require of problem, the tighter bounds you can prove.
Markov inequality.
- $\Pr[Y \ge t] \le E[Y]/t$ for nonnegative $Y$
- $\Pr[Y \ge tE[Y]] \le 1/t$.
- Only requires an expectation! So very widely applicable.
Example: coupon collecting: $\Pr[\mbox{time} > tn\ln n]=O(1/t)$.
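A tiny numeric check (my addition; the mean-1 exponential via random.expovariate is just an arbitrary nonnegative example) showing how loose, but universally valid, Markov's bound is:

```python
# Sketch: compare Markov's bound E[Y]/t against the empirical tail Pr[Y >= t].
import random

samples = [random.expovariate(1.0) for _ in range(100_000)]   # any nonnegative variable
mean = sum(samples) / len(samples)
for t in (2, 4, 8):
    empirical = sum(y >= t for y in samples) / len(samples)
    print(f"t={t}: Markov bound {mean / t:.3f}, empirical tail {empirical:.4f}")
```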
Application: $ZPP=RP \cap coRP$.
- If $RP \cap coRP$
- just run both
- if neither affirms, run again
- Each iteration has probability $\ge 1/2$ that one of them affirms
- So expected iterations $\le 2$
- So $ZPP$.
- If $ZPP$
- suppose expected time $T(n)$
- Run for time $2T(n)$, then stop and give default answer
- Probability of default answer at most $1/2$ (Markov)
- So, $RP$.
- If flip default answer, $coRP$
Does this mean can switch Las Vegas to Monte Carlo? Yes.
Monte Carlo to Las Vegas? No, unless have checker.
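A minimal sketch of the $RP \cap coRP \Rightarrow ZPP$ direction (my rendering; rp_alg and corp_alg are hypothetical placeholders for the two one-sided Monte Carlo algorithms):

```python
# Sketch: build a Las Vegas algorithm from two one-sided Monte Carlo algorithms.
# Assumed contracts (hypothetical placeholders):
#   rp_alg(x)   returns True  only on YES instances (misses a YES with prob <= 1/2)
#   corp_alg(x) returns False only on NO  instances (misses a NO  with prob <= 1/2)
def las_vegas(x, rp_alg, corp_alg):
    while True:
        if rp_alg(x):         # affirms membership: never wrong
            return True
        if not corp_alg(x):   # affirms non-membership: never wrong
            return False
        # Neither affirmed; each round affirms with probability >= 1/2,
        # so the expected number of rounds is at most 2.
```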
Chebyshev.
Can make Markov much stronger by generalizing: $\Pr[h(Y) \ge t] \le E[h(Y)]/t$ for any nonnegative function $h$.
Better than Markov because uses more info: variance.
- Remind variance, standard deviation. $\sigma_X^2 = E[(X-\mu_X)^2]=E[X^2]-\mu_X^2$
- For many distributions, this amount of variation is “right amount”
- $E[XY] = E[X]E[Y]$ if independent
- variance of independent variables: sum of variances
- $E[(\sum X_i)^2] = E[\sum_i X_i^2 + 2\sum_{i<j} X_iX_j]$
- cross terms expand as $E[X_i]E[X_j]$ by independence
- so canceled by the $\mu_{X_i}\mu_{X_j}$ terms in $(\sum \mu_{X_i})^2$
- $\Pr[|X-\mu| \ge t\sigma] = \Pr [(X-\mu)^2 \ge t^2\sigma^2] \le 1/t^2$
- Or $\Pr[|X-\mu| \ge d] \le \sigma^2/d^2$
- So Chebyshev predicts we rarely stray more than a few standard deviations from the mean.
- binomial distribution: variance $np(1-p)$, stdev $\sqrt{np(1-p)}=O(\sqrt{n})$.
- requires (only) a mean and variance. less applicable but more powerful than Markov
- Real applications later.
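A numeric sketch (my addition, with $n=1000$ and $p=1/2$ chosen arbitrarily) comparing Chebyshev's bound with the exact binomial tail:

```python
# Sketch: Chebyshev bound vs. exact tail probability for Binomial(n, 1/2).
from math import comb, sqrt

n, p = 1000, 0.5
mu, sigma = n * p, sqrt(n * p * (1 - p))

def exact_tail(d):
    """Pr[|X - mu| >= d] for X ~ Binomial(n, p), summed exactly."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n + 1) if abs(k - mu) >= d)

for t in (1, 2, 4):
    print(f"{t} sigma: Chebyshev <= {1 / t**2:.3f}, exact = {exact_tail(t * sigma):.4f}")
```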
Example: coupon collecting.
- Start with geometric random variable with success prob. $p$
- mean $1/p$
- Variance:
- $E[Y^2] = E[Y^2\mid 1 \mbox{ step}]\cdot p + E[Y^2\mid >1 \mbox{ step}]\cdot (1-p)$
- $E[Y^2]=p+E[(Y+1)^2](1-p)=p+(E[Y^2]+2E[Y]+1)(1-p)$
- $Z=(1-p)Z+(2-p)/p$ where $Z=E[Y^2]$
- $Z=(2-p)/p^2$
- variance $(1-p)/p^2< 1/p^2$
- Coupon collection is a sum of independent geometric vars: variance upper bound \[ \sum_{i=1}^n \left(\frac{n}{n-i+1}\right)^2=n^2 \sum_{i=1}^n\frac1{i^2}=O(n^2) \]
- so $\Pr[\mbox{time} \ge 2nH_n] \le O(n^2)/(nH_n)^2=O(1/\ln^2 n)$
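A short numeric check (my addition; $n=10^4$ is arbitrary) of the variance bound and the resulting Chebyshev tail:

```python
# Sketch: variance upper bound for coupon collection and the Chebyshev bound
# on taking more than twice the expected time.
n = 10_000
H_n = sum(1 / i for i in range(1, n + 1))
# phase i (waiting for the i-th new coupon) is geometric with p = (n-i+1)/n,
# so its variance is < 1/p^2 = (n/(n-i+1))^2
var_bound = sum((n / (n - i + 1)) ** 2 for i in range(1, n + 1))
print("variance bound / n^2:", var_bound / n ** 2)             # about pi^2/6
print("Pr[time >= 2*n*H_n] <=", var_bound / (n * H_n) ** 2)    # O(1/ln^2 n)
```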
Median Finding
In homework, simple randomized linear-time algorithm
- But, what's the constant?
- Optimum deterministic: $ \ge (2+\epsilon)n$
- Can we do better randomized?
Sampling idea
- List $L$
- median of the sample looks like the median of the whole list --- it lands in the median's neighborhood.
- analysis via Chebyshev bound
Approximation
- choose $s$ samples with replacement
- (replacement simplifies analysis)
- what guess for median?
- sort, take $x_{s/2}$
- will it be near median?
- $\Pr[\mbox{guess} < x_{n/4}]$ is the probability we choose at least $s/2$ items less than $x_{n/4}$
- each choice has probability $1/4$
- so expect to choose only $s/4$ in smallest quarter
- variance is $s(1-1/4)/4=3s/16$
- $\sigma = \sqrt{3s}/4$
- $s/2$ such small choices would be deviation by $s/4=\sqrt{s/3}\sigma$
- so probability $3/s$ by Chebyshev
- same analysis shows unlikely to exceed $x_{3n/4}$
- combine with union bound
- conclude probably in middle half
- similarly conclude any constant error has probability $O(1/s)$
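A simulation sketch of this estimate (my addition; $n$, $s$, and the trial count are arbitrary), checking how often the sample median lands in the middle half:

```python
# Sketch: estimate the median from s samples with replacement and check that
# the estimate falls between x_{n/4} and x_{3n/4} almost always.
import random

def sample_median(L, s):
    sample = sorted(random.choice(L) for _ in range(s))
    return sample[s // 2]

n, s, trials = 100_000, 400, 200
L = list(range(n))                      # rank of an element equals its value
lo, hi = n // 4, 3 * n // 4
hits = sum(lo <= sample_median(L, s) <= hi for _ in range(trials))
print("fraction in middle half:", hits / trials)    # failure probability is O(1/s)
```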
Exact Algorithm?
- choose $s$ samples with replacement
- take fences before and after sample median
- keep items between fences. sort.
- count items on both sides of fences
- find right element in between
- $s=n^{3/4}$ samples $x_1,\ldots, x_s$ in sorted order.
- Sample with replacement to keep analysis clean
- lemma: $x_r$ near $rn/s$ position of original list.
- To find median element, look at two positions in sorted order of sample: $a$ at pos. $s/2-\sqrt{n}$ and $b$ at $s/2+\sqrt{n}$
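A runnable sketch of this algorithm (my rendering of the steps above; failure is simply reported rather than retried):

```python
# Sketch: exact median via sampling, with s = n^{3/4} samples and fences at
# sample positions s/2 +- sqrt(n).
import random

def median_by_sampling(L):
    n = len(L)
    s = int(n ** 0.75)
    sample = sorted(random.choice(L) for _ in range(s))     # with replacement
    offset = int(n ** 0.5)
    a = sample[max(s // 2 - offset, 0)]                     # lower fence
    b = sample[min(s // 2 + offset, s - 1)]                 # upper fence
    below = sum(x < a for x in L)                           # rank of the lower fence
    middle = sorted(x for x in L if a <= x <= b)            # items between the fences
    k = n // 2                                              # median rank (0-indexed)
    if below <= k < below + len(middle):
        return middle[k - below]
    return None   # median fell outside the fences (prob. O(n^{-1/4})): rerun or fall back

L = random.sample(range(10 ** 6), 10001)
print(median_by_sampling(L), sorted(L)[len(L) // 2])        # should agree
```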
Analysis
- works so long as median is between fences
- fast so long as few items between fences
- Will prove (i) in class, (ii) in homework
- Expected number preceding median is $s/2$.
- What about variance:
- each sample indicator has variance $p(1-p)=1/4$
- so $s$ choices have variance $< s/4$
- so $\sigma \le \sqrt{s/4} = n^{3/8}/2$
- Chebyshev: prob. that fewer than $s/2-\sqrt{n}$ samples precede the median involves a deviation of $2n^{1/8}\sigma$, so probability $O(n^{-1/4})$.
Running time:
- Sorting takes $O(n^{3/4}\log n)=o(n)$ time
- Comparing to fences takes $2n$ time
- Fail probability $O(n^{-1/4})$
Want to improve?
- usually, repetition works
- but here, we're looking at the constant
- even repeating once kills the constant
- but we almost always succeed first time
- (later we'll see failure probability much smaller than shown here)
Randomized is strictly better:
- Gives important constant factor improvement
- Optimum deterministic: $ \ge (2+\epsilon)n$
- We claimed $2n$ above
- But actually $\le (3/2) n+o(n)$ w.h.p. since half of items will fail first comparison
Pairwise Independence
pseudorandom generators.
- Motivation.
- Idea of randomness as (complexity theoretic) resource like space or time.
- pseudorandom generators turn a little “real” randomness into lots of “fake” randomness
- can be used in many randomized algorithms without change to algorithm or analysis
Pairwise independence
- $\Pr[X_i = a \wedge X_j=b] = \Pr[X_i=a]\Pr[X_j=b]$ for all $i \ne j$ and all $a,b$
- equivalently $\Pr[X_i=a \mid X_j=b] = \Pr[X_i = a]$
- sometimes full independence is unnecessary
- e.g. pairwise independent vars are sufficient for Chebyshev bounds
More bits
- Given $t$ independent random vars $\mod n$
- For each nonempty subset, define a random variable as the sum over the subset, $\bmod\ n$
- Any two subsets differ on a variable, so pairwise independent
- from $t$ independent random bits, get $2^t-1$ pairwise independent
- So with $2\log p$ random bits get $p^2-1$ pairwise independent bits
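A small sketch of this construction (my addition) for the bit case $n=2$, where subset sums become parities:

```python
# Sketch: from t truly random bits, derive 2^t - 1 pairwise independent bits
# by taking the parity (sum mod 2) of every nonempty subset of the seed bits.
import random

def pairwise_independent_bits(t):
    seed = [random.randrange(2) for _ in range(t)]      # t independent random bits
    out = []
    for mask in range(1, 2 ** t):                       # every nonempty subset
        parity = 0
        for i in range(t):
            if (mask >> i) & 1:
                parity ^= seed[i]
        out.append(parity)
    return out

print(pairwise_independent_bits(4))                     # 15 bits from a 4-bit seed
```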
Generating larger numbers over $Z_p$.
- Want random numbers in range $[1,\ldots,p]$
- pick random $a,b$
- $i^{th}$ random number is $ai+b \bmod p$
- Works because invertible over field
- Note that each $ai+b$ is uniform random since $b$ is
- Must stop after $p$ values since arithmetic loops back
- Uses $2\log p$ random bits to generate $p$ random values in $1,\ldots,p$.
- If want a range whose size isn't prime, use a “slightly larger” prime $p$
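A sketch of the generator (my addition; the demo prime $p=13$ is arbitrary):

```python
# Sketch: p pairwise independent values in Z_p from only 2*log(p) random bits,
# namely a random pair (a, b), outputting a*i + b mod p for i = 0, ..., p-1.
import random

def pairwise_independent_values(p):
    """p should be prime; returns p pairwise independent values in Z_p."""
    a, b = random.randrange(p), random.randrange(p)     # the only true randomness
    return [(a * i + b) % p for i in range(p)]

print(pairwise_independent_values(13))
```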
Application: median
- median algorithm only depended on Chebyshev
- which only depended on pairwise independence
- so $O(\log n)$ bits suffice for result
Application: conserving Random Bits
- Suppose $RP$ algorithm using $n$ bits.
- Reduce error to $2^{-t}$ using $t$ independent trials with $tn$ bits
- What can we do with only $2n$ bits?
- two direct draws:
- error prob. 1/4.
- pseudorandom generator
- set $p \sim 2^n$ so the algorithm takes one integer in $Z_p$
- Note all bits in this one integer are independent.
- random variable for number of correct outcomes
- $\mu = t/2$.
- $\sigma = \sqrt{t}/2$.
- error only if no trial finds a certificate,
- i.e. $E[Y]-Y \ge t/2$,
- prob. $1/t$ by Chebyshev
- even though only pairwise independent
- so long as $t \le p$, i.e. $t \le 2^n$
- Trades time for randomness:
- with $tn$ random bits can get error $2^{-t}$ with $t$ trials
- with $2n$ random bits, can get same probability with $2^t$ trials
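A sketch of this trade-off (my rendering; rp_alg is a hypothetical placeholder, and the toy seed length $n=16$ with prime $p = 2^{16}+1$ is chosen only for illustration):

```python
# Sketch: amplify a one-sided Monte Carlo test rp_alg(x, r), where r is an
# n-bit random seed, using only 2n truly random bits: run it on t pairwise
# independent seeds a*i + b mod p instead of t fully independent ones.
import random

P = 2 ** 16 + 1       # a prime slightly above 2^16, so one seed ~ 16 random bits

def amplify(x, rp_alg, t):
    """Accept iff some trial affirms.  Error probability O(1/t) by Chebyshev,
    even though the seeds are only pairwise independent (requires t <= P)."""
    a, b = random.randrange(P), random.randrange(P)     # the only true randomness
    return any(rp_alg(x, (a * i + b) % P) for i in range(t))

def demo(x, r):
    """Toy stand-in for an RP algorithm: affirms a YES instance iff seed is even."""
    return x and r % 2 == 0

print(amplify(True, demo, 64), amplify(False, demo, 64))   # almost always "True False"
```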