Basic Probability Tools
This lecture will review everything you learned in 6.042.
- Basic tools in probability
- Expectations
- High probability events
- Deviations from expectation
Deviations from Expectation
Sometimes expectation isn't enough. Want to study deviations --- the probability and magnitude of deviation from expectation.
Example: coupon collection/stable marriage.
- Probability didn't get $k^{th}$ coupon after $r$ rounds is $(1-1/n)^r \le e^{-r/n}$
- which is $n^{-\beta}$ for $r=\beta n\ln n$
- so probability didn't get some coupon is at most $n\cdot n^{-\beta} = n^{1-\beta}$ (using union bound)
- we say “time is $O(n\ln n)$ with high probability” because we can make the probability $n^{-\beta}$ for any desired $\beta$ by changing a constant that doesn't affect the asymptotic claim.
- sometimes we say “with high probability” when we prove it for some $\beta > 1$ even if we didn't prove it for all.
- Saying “almost never above $O(n\ln n)$” is a much stronger statement than saying “$O(n\ln n)$ on average.”
Lower bound:
- After $n\ln n$ steps, $1-1/n$ chance of having gotten a given coupon
- So chance of getting all $< (1-1/n)^n \approx 1/e$ (note negative correlation)
- So reasonable chance of not being faster.
- Can show bound is very tight.
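A quick simulation sketch (my addition, not from the notes; the choices $n=200$, $\beta=2$, and 1000 trials are arbitrary) illustrating both bounds: finishing within $\beta n\ln n$ draws happens with high probability, while $n\ln n$ draws alone miss some coupon with constant probability.

```python
# Sketch: simulate coupon collection and check the upper and lower bounds above.
import math
import random

def coupon_time(n):
    """Number of draws until all n coupon types have been seen."""
    seen, draws = set(), 0
    while len(seen) < n:
        seen.add(random.randrange(n))
        draws += 1
    return draws

n, beta, trials = 200, 2, 1000
times = [coupon_time(n) for _ in range(trials)]
whp_bound = beta * n * math.log(n)
print("fraction exceeding beta*n*ln(n):",
      sum(t > whp_bound for t in times) / trials)         # about n^{1-beta} = 1/n
print("fraction exceeding n*ln(n):     ",
      sum(t > n * math.log(n) for t in times) / trials)   # constant, roughly 1 - 1/e
```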
Do Exact Balls in Bins here next time
Tail Bounds---Markov Inequality
At other times, don't want to get down and dirty with the problem. So have developed a set of bounding techniques that are basically problem-independent.
- few assumptions, so applicable almost anywhere
- but for same reason, don't give as tight bounds
- the more you require of problem, the tighter bounds you can prove.
Markov inequality.
- $\Pr[Y \ge t] \le E[Y]/t$ for nonnegative $Y$
- $\Pr[Y \ge tE[Y]] \le 1/t$.
- Only requires an expectation! So very widely applicable.
Example: coupon collecting: $\Pr[\mbox{time} > tn\ln n]=O(1/t)$.
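A tiny numeric check (my addition; the mean-1 exponential via random.expovariate is just an arbitrary nonnegative example) showing how loose, but universally valid, Markov's bound is:

```python
# Sketch: compare Markov's bound E[Y]/t against the empirical tail Pr[Y >= t].
import random

samples = [random.expovariate(1.0) for _ in range(100_000)]   # any nonnegative variable
mean = sum(samples) / len(samples)
for t in (2, 4, 8):
    empirical = sum(y >= t for y in samples) / len(samples)
    print(f"t={t}: Markov bound {mean / t:.3f}, empirical tail {empirical:.4f}")
```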
Application: $ZPP=RP \cap coRP$.
- If $RP \cap coRP$
- just run both
- if neither affirms, run again
- Each iteration has probability $\ge 1/2$ that one of them affirms
- So expected iterations $\le 2$
- So $ZPP$.
- If $ZPP$
- suppose expected time $T(n)$
- Run for time $2T(n)$, then stop and give default answer
- Probability of default answer at most $1/2$ (Markov)
- So, $RP$.
- If flip default answer, $coRP$
Does this mean can switch Las Vegas to Monte Carlo? Yes.
Monte Carlo to Las Vegas? No, unless have checker.
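A minimal sketch of the $RP \cap coRP \Rightarrow ZPP$ direction (my rendering; rp_alg and corp_alg are hypothetical placeholders for the two one-sided Monte Carlo algorithms):

```python
# Sketch: build a Las Vegas algorithm from two one-sided Monte Carlo algorithms.
# Assumed contracts (hypothetical placeholders):
#   rp_alg(x)   returns True  only on YES instances (misses a YES with prob <= 1/2)
#   corp_alg(x) returns False only on NO  instances (misses a NO  with prob <= 1/2)
def las_vegas(x, rp_alg, corp_alg):
    while True:
        if rp_alg(x):         # affirms membership: never wrong
            return True
        if not corp_alg(x):   # affirms non-membership: never wrong
            return False
        # Neither affirmed; each round affirms with probability >= 1/2,
        # so the expected number of rounds is at most 2.
```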
Chebyshev.
Can make Markov much stronger by generalizing: $\Pr[h(Y) \ge t] \le E[h(Y)]/t$ for any nonnegative function $h$.
Better than Markov because uses more info: variance.
- Remind variance, standard deviation. $\sigma_X^2 = E[(X-\mu_X)^2]=E[X^2]-\mu_X^2$
- For many distributions, this amount of variation is “right amount”
- $E[XY] = E[X]E[Y]$ if independent
- variance of independent variables: sum of variances
- $E[(\sum X_i)^2] = E[\sum_i X_i^2 + 2\sum_{i<j} X_iX_j]$
- cross terms expand as $E[X_i]E[X_j]$ by independence
- so canceled by the $\mu_{X_i}\mu_{X_j}$ terms in $(\sum \mu_{X_i})^2$
- $\Pr[|X-\mu| \ge t\sigma] = \Pr [(X-\mu)^2 \ge t^2\sigma^2] \le 1/t^2$
- Or $\Pr[|X-\mu| \ge d] \le \sigma^2/d^2$
- So Chebyshev predicts we rarely stray more than a few standard deviations from the mean.
- binomial distribution: variance $np(1-p)$, stdev $\sqrt{np(1-p)}=O(\sqrt{n})$.
- requires (only) a mean and variance. less applicable but more powerful than Markov
- Real applications later.
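A numeric sketch (my addition, with $n=1000$ and $p=1/2$ chosen arbitrarily) comparing Chebyshev's bound with the exact binomial tail:

```python
# Sketch: Chebyshev bound vs. exact tail probability for Binomial(n, 1/2).
from math import comb, sqrt

n, p = 1000, 0.5
mu, sigma = n * p, sqrt(n * p * (1 - p))

def exact_tail(d):
    """Pr[|X - mu| >= d] for X ~ Binomial(n, p), summed exactly."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n + 1) if abs(k - mu) >= d)

for t in (1, 2, 4):
    print(f"{t} sigma: Chebyshev <= {1 / t**2:.3f}, exact = {exact_tail(t * sigma):.4f}")
```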
Example: coupon collecting.
- Start with geometric random variable with success prob. $p$
- mean $1/p$
- Variance:
- $E[Y^2] = E[Y^2\mid 1 \mbox{ step}]\cdot p + E[Y^2\mid >1 \mbox{ step}]\cdot (1-p)$
- $E[Y^2]=p+E[(Y+1)^2](1-p)=p+(E[Y^2]+2E[Y]+1)(1-p)$
- $Z=(1-p)Z+(2-p)/p$ where $Z=E[Y^2]$
- $Z=(2-p)/p^2$
- variance $(1-p)/p^2< 1/p^2$
- Coupon collection is a sum of independent geometric vars: variance upper bound \[ \sum_{i=1}^n \left(\frac{n}{n-i+1}\right)^2=n^2 \sum_{i=1}^n\frac1{i^2}=O(n^2) \]
- so $\Pr[\mbox{time} \ge 2nH_n] \le O(n^2)/(nH_n)^2=O(1/\ln^2 n)$
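A short numeric check (my addition; $n=10^4$ is arbitrary) of the variance bound and the resulting Chebyshev tail:

```python
# Sketch: variance upper bound for coupon collection and the Chebyshev bound
# on taking more than twice the expected time.
n = 10_000
H_n = sum(1 / i for i in range(1, n + 1))
# phase i (waiting for the i-th new coupon) is geometric with p = (n-i+1)/n,
# so its variance is < 1/p^2 = (n/(n-i+1))^2
var_bound = sum((n / (n - i + 1)) ** 2 for i in range(1, n + 1))
print("variance bound / n^2:", var_bound / n ** 2)             # about pi^2/6
print("Pr[time >= 2*n*H_n] <=", var_bound / (n * H_n) ** 2)    # O(1/ln^2 n)
```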
Median Finding
In homework, simple randomized linear-time algorithm
- But, what's the constant?
- Optimum deterministic: $ \ge (2+\epsilon)n$
- Can we do better randomized?
Sampling idea
- List $L$
- median of the sample looks like the median of the whole list --- it lands in the median's neighborhood.
- analysis via Chebyshev bound
Approximation
- choose $s$ samples with replacement
- (replacement simplifies analysis)
- what guess for median?
- sort, take $x_{s/2}$
- will it be near median?
- $\Pr[\mbox{guess} < x_{n/4}]$ is the probability we choose at least $s/2$ items less than $x_{n/4}$
- each choice has probability $1/4$
- so expect to choose only $s/4$ in smallest quarter
- variance is $s(1-1/4)/4=3s/16$
- $\sigma = \sqrt{3s}/4$
- $s/2$ such small choices would be deviation by $s/4=\sqrt{s/3}\sigma$
- so probability $3/s$ by Chebyshev
- same analysis shows unlikely to exceed $x_{3n/4}$
- combine with union bound
- conclude probably in middle half
- similarly conclude any constant error has probability $O(1/s)$
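A simulation sketch of this estimate (my addition; $n$, $s$, and the trial count are arbitrary), checking how often the sample median lands in the middle half:

```python
# Sketch: estimate the median from s samples with replacement and check that
# the estimate falls between x_{n/4} and x_{3n/4} almost always.
import random

def sample_median(L, s):
    sample = sorted(random.choice(L) for _ in range(s))
    return sample[s // 2]

n, s, trials = 100_000, 400, 200
L = list(range(n))                      # rank of an element equals its value
lo, hi = n // 4, 3 * n // 4
hits = sum(lo <= sample_median(L, s) <= hi for _ in range(trials))
print("fraction in middle half:", hits / trials)    # failure probability is O(1/s)
```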
Exact Algorithm?
- choose $s$ samples with replacement
- take fences before and after sample median
- keep items between fences. sort.
- count items on both sides of fences
- find right element in between
- $s=n^{3/4}$ samples $x_1,\ldots, x_s$ in sorted order.
- Sample with replacement to keep analysis clean
- lemma: $x_r$ near $rn/s$ position of original list.
- To find median element, look at two positions in sorted order of sample: $a$ at pos. $s/2-\sqrt{n}$ and $b$ at $s/2+\sqrt{n}$
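A runnable sketch of this algorithm (my rendering of the steps above; failure is simply reported rather than retried):

```python
# Sketch: exact median via sampling, with s = n^{3/4} samples and fences at
# sample positions s/2 +- sqrt(n).
import random

def median_by_sampling(L):
    n = len(L)
    s = int(n ** 0.75)
    sample = sorted(random.choice(L) for _ in range(s))     # with replacement
    offset = int(n ** 0.5)
    a = sample[max(s // 2 - offset, 0)]                     # lower fence
    b = sample[min(s // 2 + offset, s - 1)]                 # upper fence
    below = sum(x < a for x in L)                           # rank of the lower fence
    middle = sorted(x for x in L if a <= x <= b)            # items between the fences
    k = n // 2                                              # median rank (0-indexed)
    if below <= k < below + len(middle):
        return middle[k - below]
    return None   # median fell outside the fences (prob. O(n^{-1/4})): rerun or fall back

L = random.sample(range(10 ** 6), 10001)
print(median_by_sampling(L), sorted(L)[len(L) // 2])        # should agree
```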
Analysis
- works so long as median is between fences
- fast so long as few items between fences
- Will prove (i) in class, (ii) in homework
- Expected number preceding median is $s/2$.
- What about variance:
- each sample indicator has variance $p(1-p)=1/4$
- so $s$ choices have variance $< s/4$
- so $\sigma \le \sqrt{s/4} = n^{3/8}/2$
- Chebyshev: prob. that fewer than $s/2-\sqrt{n}$ samples precede the median involves a deviation of $2n^{1/8}\sigma$, so probability $O(n^{-1/4})$.
Running time:
- Sorting takes $O(n^{3/4}\log n)=o(n)$ time
- Comparing to fences takes $2n$ time
- Fail probability $O(n^{-1/4})$
Want to improve?
- usually, repetition works
- but here, we're looking at the constant
- even repeating once kills the constant
- but we almost always succeed first time
- (later we'll see failure probability much smaller than shown here)
Randomized is strictly better:
- Gives important constant factor improvement
- Optimum deterministic: $ \ge (2+\epsilon)n$
- We claimed $2n$ above
- But actually $\le (3/2) n+o(n)$ w.h.p. since half of items will fail first comparison
Pairwise Independence
pseudorandom generators.
- Motivation.
- Idea of randomness as (complexity theoretic) resource like space or time.
- pseudorandom generators turn a little “real” randomness into lots of “fake” randomness
- can be used in many randomized algorithms without change to algorithm or analysis
Pairwise independence
- $\Pr[X_i = a \wedge X_j=b] = \Pr[X_i=a]\Pr[X_j=b]$ for all $i \ne j$ and all $a,b$
- equivalently $\Pr[X_i=a \mid X_j=b] = \Pr[X_i = a]$
- sometimes full independence is unnecessary
- e.g. pairwise independent vars are sufficient for Chebyshev bounds
More bits
- Given $t$ independent random vars $\mod n$
- For each nonempty subset, define a random variable as the sum over the subset, $\bmod\ n$
- Any two subsets differ on a variable, so pairwise independent
- from $t$ independent random bits, get $2^t-1$ pairwise independent
- So with $2\log p$ random bits get $p^2-1$ pairwise independent bits
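A small sketch of this construction (my addition) for the bit case $n=2$, where subset sums become parities:

```python
# Sketch: from t truly random bits, derive 2^t - 1 pairwise independent bits
# by taking the parity (sum mod 2) of every nonempty subset of the seed bits.
import random

def pairwise_independent_bits(t):
    seed = [random.randrange(2) for _ in range(t)]      # t independent random bits
    out = []
    for mask in range(1, 2 ** t):                       # every nonempty subset
        parity = 0
        for i in range(t):
            if (mask >> i) & 1:
                parity ^= seed[i]
        out.append(parity)
    return out

print(pairwise_independent_bits(4))                     # 15 bits from a 4-bit seed
```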
Generating larger numbers over $Z_p$.
- Want random numbers in range $[1,\ldots,p]$
- pick random $a,b$
- $i^{th}$ random number is $ai+b \bmod p$
- Works because invertible over field
- Note that each $ai+b$ is uniform random since $b$ is
- Must stop after $p$ values since arithmetic loops back
- Uses $2\log p$ random bits to generate $p$ random values in $1,\ldots,p$.
- If want a range whose size isn't prime, use a “slightly larger” prime $p$
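A sketch of the generator (my addition; the demo prime $p=13$ is arbitrary):

```python
# Sketch: p pairwise independent values in Z_p from only 2*log(p) random bits,
# namely a random pair (a, b), outputting a*i + b mod p for i = 0, ..., p-1.
import random

def pairwise_independent_values(p):
    """p should be prime; returns p pairwise independent values in Z_p."""
    a, b = random.randrange(p), random.randrange(p)     # the only true randomness
    return [(a * i + b) % p for i in range(p)]

print(pairwise_independent_values(13))
```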
Application: median
- median algorithm only depended on Chebyshev
- which only depended on pairwise independence
- so $O(\log n)$ bits suffice for result
Application: conserving Random Bits
- Suppose $RP$ algorithm using $n$ bits.
- Reduce error to $2^{-t}$ using $t$ independent trials with $tn$ bits
- What can we do with only $2n$ bits?
- two direct draws:
- error prob. 1/4.
- pseudorandom generator
- set $p \sim 2^n$ so the algorithm takes one integer in $Z_p$
- Note all bits in this one integer are independent.
- random variable for number of correct outcomes
- $\mu = t/2$.
- $\sigma = \sqrt{t}/2$.
- error only if no trial finds a certificate,
- i.e. $E[Y]-Y \ge t/2$,
- prob. $1/t$ by Chebyshev
- even though only pairwise independent
- so long as $t \le p$, i.e. $t \le 2^n$
- Trades time for randomness:
- with $tn$ random bits can get error $2^{-t}$ with $t$ trials
- with $2n$ random bits, can get same probability with $2^t$ trials
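A sketch of this trade-off (my rendering; rp_alg is a hypothetical placeholder, and the toy seed length $n=16$ with prime $p = 2^{16}+1$ is chosen only for illustration):

```python
# Sketch: amplify a one-sided Monte Carlo test rp_alg(x, r), where r is an
# n-bit random seed, using only 2n truly random bits: run it on t pairwise
# independent seeds a*i + b mod p instead of t fully independent ones.
import random

P = 2 ** 16 + 1       # a prime slightly above 2^16, so one seed ~ 16 random bits

def amplify(x, rp_alg, t):
    """Accept iff some trial affirms.  Error probability O(1/t) by Chebyshev,
    even though the seeds are only pairwise independent (requires t <= P)."""
    a, b = random.randrange(P), random.randrange(P)     # the only true randomness
    return any(rp_alg(x, (a * i + b) % P) for i in range(t))

def demo(x, r):
    """Toy stand-in for an RP algorithm: affirms a YES instance iff seed is even."""
    return x and r % 2 == 0

print(amplify(True, demo, 64), amplify(False, demo, 64))   # almost always "True False"
```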