# 1 Online algorithms

Motivation:

• (exception: data structures—come back later)

• now, suppose input arrives a little at a time, need instant response

• eg stock market, paging

• question: what is a “good” algorithm?

• depends on what we measure.

• if knew whole input $$\sigma$$ in advance, easy to optimize $$C_{MIN}(\sigma)$$

• but online, early decision without complete info can mess you up

• ski rental problem: rent 1, buy $$T$$. don’t know how often we are going to ski.

• notice that on some inputs, can’t do well! (stock market that only goes down, thrashing in paging)

• problem isn’t to decide fast, rather what to decide.

Definition: competitive ratio

• compare to full knowledge optimum

• $$k$$-competitive if for all sequences $$\sigma$$: $$C_A(\sigma) \le kC_{MIN}(\sigma)$$

• sometimes, to ignore edge effects, $$C_A(\sigma) \le kC_{MIN}(\sigma)+ O(1)$$.

• idea: “regret ratio”

• analyze ski rental

• Only choice is after how many days to buy.

• If I rent $$d$$ days and then buy (total cost of $$d+T$$) what is worst competitive ratio for me?

• Before I buy skis: if the adversary is also renting, the ratio is 1; if the adversary has already bought, my ratio worsens each day I keep renting. Either way, the ratio is not improving while I rent.

• Once I have bought, my cost doesn’t increase and adversary’s doesn’t decrease, so ratio doesn’t worsen

• combine: worst ratio is at moment I have bought skis (adversary will stop there).

• I pay $$d+T$$, adversary pays $$\min(d+1,T)$$

• when $$d+1 \le T$$ the ratio is $$(d+T)/(d+1)$$, which decreases as $$d$$ grows; when $$d+1 \ge T$$ it is $$(d+T)/T$$, which increases with $$d$$. So best at $$d+1=T$$.

• Ratio $$(d+T)/\min(d+1,T)$$ minimized at $$d+1=T$$.

• ratio is $$2-1/T<2$$.

• we think of competitive analysis as a (zero-sum) game between algorithm and adversary. Want to find the best strategy for the algorithm.

• supposed to be competitive against all sequences. So, can imagine that adversary is adapting to algorithm’s choices (to get worst sequence)
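The rent-then-buy tradeoff above is easy to check numerically. A minimal sketch (function name is mine), assuming renting costs 1 per day and buying costs $$T$$:

```python
def worst_ratio(d, T):
    """Worst-case competitive ratio of "rent d days, then buy".

    The adversary stops skiing right after we buy: we paid d + T,
    while the optimum pays min(d + 1, T)."""
    return (d + T) / min(d + 1, T)

T = 10
ratios = {d: worst_ratio(d, T) for d in range(2 * T)}
best_d = min(ratios, key=ratios.get)
assert best_d == T - 1                             # buy on day T
assert abs(ratios[best_d] - (2 - 1 / T)) < 1e-9    # ratio 2 - 1/T
```

Buying immediately ($$d=0$$) gives ratio $$T$$ against a one-day season, which is why waiting until day $$T$$ is the right deterministic strategy.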

## 1.1 Finance

Duration may be known or unknown, but assume we know when the last offer arrives.

Need fluctuation ratio $$\phi = M/m$$ between largest price $$M$$ and smallest price $$m$$.

• even if know $$\phi$$ but don’t know $$m$$, can just run alg after seeing first price

Selling peanuts:

• Break into $$\log \phi$$ groups of equal amounts

• Sell group $$i$$ for value $$m \cdot 2^i$$

• One group sold for at least half of max price

• So achieve $$\log \phi$$ competitive
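The peanut-selling scheme can be sketched as follows (the function name and the revenue accounting are my assumptions; prices are presumed to lie in $$[m, M]$$):

```python
import math

def sell_peanuts(amount, prices, m, phi):
    """Split the stock into log2(phi) equal groups; group i has
    reserve price m * 2**i and is sold at the first offer meeting it."""
    g = int(math.log2(phi))
    per_group = amount / g
    revenue = 0.0
    unsold = set(range(g))
    for p in prices:
        for i in sorted(unsold):        # copy, so we may discard below
            if p >= m * 2 ** i:
                revenue += per_group * p
                unsold.discard(i)
    return revenue

# one price of 3 with m=1, phi=4: both groups (reserves 1 and 2) sell
assert sell_peanuts(2, [3], 1, 4) == 6.0
```

The group whose reserve is within a factor 2 of the maximum offered price sells for at least half the maximum, giving the $$O(\log \phi)$$ bound.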

Selling (one) car: Best deterministic algorithm: agree to first price exceeding $$\sqrt{Mm}$$

• $$\sqrt{\phi}$$ competitive

• note have to know when last offer

Can achieve $$\log \phi$$ randomized

• Consider powers of 2 between $$m$$ and $$M$$

• Choose one at random

• sell all at first bid exceeding

• with prob $$1/\log \phi$$, pick the power of 2 that is within factor 2 of highest offered price.
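The randomized single-item rule is a one-liner reserve price; a toy sketch (names mine), assuming $$\phi$$ is a power of 2 and prices lie in $$[m, M]$$:

```python
import math
import random

def sell_once(prices, m, phi, rng=random):
    """Pick i uniformly from 0..log2(phi)-1 and sell at the first
    offer >= m * 2**i; if none qualifies, take the last offer."""
    i = rng.randrange(int(math.log2(phi)))
    reserve = m * 2 ** i
    for p in prices:
        if p >= reserve:
            return p
    return prices[-1]
```

With probability $$1/\log\phi$$ the chosen reserve lands within a factor 2 below the best offer, so expected revenue is within an $$O(\log\phi)$$ factor of the maximum.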

## Graham’s Rule

Define $$P||C_{\max}$$: schedule jobs on $$m$$ identical machines to minimize the maximum load (makespan).

NP-hard to solve exactly!

Always assign to least loaded machine:

• any alg has 2 lower bounds: average load and maximum job size.

• Suppose $$M_1$$ has max load $$L$$, let $$p_j$$ be biggest job.

• claim every machine has load at least $$L-p_j$$ (else wouldn’t have assigned the last job on $$M_1$$ there)

• thus total load $$\sum p_i \ge m(L-p_j)+p_j$$

• thus OPT $$\ge L-p_j+p_j/m$$

• but OPT$$\ge p_j$$, so $$(2-1/m)OPT \ge L$$
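Graham's rule itself is a few lines; a sketch (names mine) with a check of the $$(2-1/m)$$ bound against the two lower bounds on OPT:

```python
import heapq

def graham(jobs, m):
    """Greedy list scheduling: put each job on the least-loaded machine."""
    loads = [0] * m
    heapq.heapify(loads)
    for p in jobs:
        least = heapq.heappop(loads)
        heapq.heappush(loads, least + p)
    return max(loads)

jobs = [3, 5, 2, 7, 4, 1, 6]
m = 3
makespan = graham(jobs, m)
lower = max(sum(jobs) / m, max(jobs))   # the two lower bounds on OPT
assert makespan <= (2 - 1 / m) * lower
```

The heap makes each assignment $$O(\log m)$$; the analysis above needs nothing about the job order, which is what makes the rule online.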

More recent algs do somewhat better:

• keep some machines small

• algorithms not too bad, proofs awful!

• Bartal et al ’95: 2-1/70=1.986

• Karger et al ’96: 1.945

• Albers ’97: 1.923

## 1.2 Paging problem

• define: on miss, must evict

• LRU, FIFO, LIFO, Flush when full, Least Freq Use

• LIFO, LFU not competitive

• LRU, FIFO $$k$$-competitive.

• will see this is best possible (det)

Easy argument: $$(k+1)$$-competitive

• break input into consecutive phases, each containing exactly $$k+1$$ LRU faults

• if a phase had only $$k$$ distinct pages, LRU would fault at most $$k$$ times in it

• thus each phase contains at least $$k+1$$ distinct requests

• OPT holds only $$k$$ pages, so it faults on at least one of them

With work, we can improve to $$k$$ (worth it for tightness):

• break into phases: maximal groups containing $$k$$ faults

• (so group starts with a fault)

• let $$q$$ be last page before current phase

• so $$q$$ initially in memory

• case 1: LRU faults on $$q$$

• then must see $$k$$ other pages to evict $$q$$

• so see $$k+1$$ requests including $$q$$ request

• so OPT must fault

• case 2: LRU faults on a different page, twice

• again, to evict that page must see $$k$$ other pages between faults

• so $$k+1$$ distinct pages

• so OPT faults

• case 3: no fault on $$q$$ or twice on any page

• $$q$$ is in cache at start and stays there

• $$k$$ distinct pages other than $$q$$ must be requested to cause the $$k$$ faults

• so $$k+1$$ distinct pages in the phase counting $$q$$, and OPT must fault

Generalize: LRU is $$k/(k-h+1)$$ competitive against $$h$$-page adversary

Observations:

• proved without knowing optimum

• instead, derived lower bound on cost of any algorithm

• same argument applies to FIFO.

Lower bound: no online algorithm beats $$k$$-competitive.

• set of $$k+1$$ pages

• always ask for the one $$A$$ doesn’t have

• faults every time.

• so, just need to show can get away with 1 fault every $$k$$ steps

• offline has $$k$$ pages in memory. On a fault, look ahead: one of the $$k+1$$ pages isn’t used in the next $$k$$ requests, so evict that one.

• one fault every $$k$$ steps

• so $$A$$ is at best $$k$$-competitive.

Observations:

• Lower Bound can be proven without knowing OPT, often is.

• competitive analysis doesn’t distinguish LRU and FIFO, even though know different in practice.

• still trying to refine competitive analysis to measure better: SODA paper: “LRU is better than FIFO”

• applies even if just have $$k+1$$ pages!

Optimal offline algorithm: Longest Forward Distance

• evict page that will be asked for farthest in future.

• suppose some optimal MIN differs from LFD. We build NEW, as good as MIN, that agrees with LFD for longer.

• Let $$\sigma_i$$ be first divergence of MIN and LFD (at page fault)

• LFD discards $$q$$, MIN discards $$p$$ (so $$p$$ will be accessed before $$q$$ after time $$i$$)

• Let $$t$$ be time MIN discards $$q$$

• revise schedule so MIN and LFD agree up to $$t$$, yielding NEW

• NEW discards $$q$$ at $$i$$, like LFD

• so MIN and NEW share $$k-1$$ pages. will preserve till merge

• in fact, $$q$$ is unique page that MIN has that NEW doesn’t

• case 1: $$\sigma_i,\ldots,\sigma_t,\ldots,p,\ldots,q$$

• until reach $$q$$

• let $$e$$ be unique page NEW has that MIN doesn’t (init $$e=p$$)

• when get $$\sigma_l\ne e$$, evict same page from both

• note $$\sigma_l \ne q$$, so MIN does fault when NEW does

• both fault, and preserves invariant

• when $$\sigma_l=e$$, only MIN faults

• when get to $$q$$, both fault, but NEW evicts $$e$$ and converges to MIN.

• clearly, NEW no worse than MIN

• case 2: $$t$$ after $$q$$

• follow same approach as above till hit $$q$$

• since MIN hasn’t discarded $$q$$ yet, it doesn’t fault at $$q$$, but NEW does

• since $$p$$ is requested before $$q$$, we had $$\sigma_l=e$$ at least once (MIN doesn’t have $$p$$ until it faults on it), so MIN paid a fault that NEW didn’t

• so NEW’s fault at $$q$$ is already paid for; NEW evicts $$e$$ there and converges to MIN

• still NEW no worse than MIN.

• prove that can get to LFD without getting worse.

• so LFD is optimal.
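Both LRU and Longest Forward Distance are easy to simulate; a fault-counting sketch (function names mine) that checks the $$k$$-competitiveness bound on a sample sequence:

```python
def lfd_faults(requests, k):
    """Belady's rule: on a fault with a full cache, evict the page
    whose next request is farthest in the future."""
    cache, faults = set(), 0
    for i, p in enumerate(requests):
        if p in cache:
            continue
        faults += 1
        if len(cache) == k:
            def next_use(q):
                for j in range(i + 1, len(requests)):
                    if requests[j] == q:
                        return j
                return float('inf')        # never requested again
            cache.remove(max(cache, key=next_use))
        cache.add(p)
    return faults

def lru_faults(requests, k):
    """LRU: evict the least recently used page."""
    cache, faults = [], 0                  # front of list = least recent
    for p in requests:
        if p in cache:
            cache.remove(p)
        else:
            faults += 1
            if len(cache) == k:
                cache.pop(0)
        cache.append(p)
    return cache and faults or faults

seq = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
k = 3
assert lfd_faults(seq, k) <= lru_faults(seq, k) <= k * lfd_faults(seq, k) + k
```

The additive $$+k$$ slack covers the cold-start faults, matching the $$O(1)$$ term in the competitive-ratio definition.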

## 1.3 Move to front

Studying heuristics for reorganizing a list after you access it

• a natural heuristic: move accessed item towards front

• maybe all the way to front?

• is it a good heuristic?

• compare to omniscient algorithms that can move accessed item any amount towards front (or not at all)

• so e.g. won’t move item to front if not accessed again soon

Potential function: number of inversions.

• amortized cost: real plus potential

• summing telescopes potential and leaves real cost

• suppose search for item $$x_j$$ at $$j$$ in opt, at $$k$$ in MTF

• suppose $$v$$ items precede in MTF but not OPT

• then $$k-v-1$$ precede in BOTH

• so $$k-v-1 \le j-1$$ so $$k-v \le j$$

• MTF creates $$k-v-1$$ new inversions and kills $$v$$ old ones,

• so amortized cost is $$k+(k-v-1)-v \le 2(k-v)\le 2j$$

• now do opt’s move.

• moving $$x_j$$ towards front only decreases inversions (since $$x_j$$ already at front in MTF)

• so cannot increase potential

• so doesn’t increase amortized cost of access

Strengthen

• we can allow adversary to make arbitrary transposes

• if we charge the adversary 1 for each transpose

• since then any increase in potential (and thus our amortized cost) is upper bounded by the adversary’s cost

• so we remain 2-competitive vs. algorithm that can move accessed item anywhere and then do any transposes it likes

Lower bound:

• suppose $$n$$ items in list

• nasty algorithm: always request last in list

• generates a sequence of length $$m$$

• total cost $$mn$$

• consider alg that picks random order and doesn’t change

• expected access cost for each request is $$(n+1)/2$$ (the requested item sits at a uniformly random position)

• expected total cost $$m(n+1)/2$$

• some algorithm achieves that cost

• offline alg can use that one

• ratio $$2n/(n+1) \rightarrow 2$$
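Move-to-front and the best static list are easy to compare on small examples; a brute-force sketch (names mine), assuming access to position $$i$$ costs $$i$$ and the competitor is the best fixed order in hindsight, as in the lower-bound argument above:

```python
from itertools import permutations

def mtf_cost(items, requests):
    """Total access cost under move-to-front."""
    lst = list(items)
    cost = 0
    for x in requests:
        i = lst.index(x)
        cost += i + 1              # pay the (1-indexed) position
        lst.pop(i)
        lst.insert(0, x)           # move accessed item to front
    return cost

def best_static_cost(items, requests):
    """Best fixed order in hindsight (brute force; tiny lists only)."""
    return min(
        sum(perm.index(x) + 1 for x in requests)
        for perm in permutations(items)
    )

items = [1, 2, 3]
requests = [3, 3, 2, 2, 1, 1, 3]
assert mtf_cost(items, requests) <= 2 * best_static_cost(items, requests)
```

The static competitor is weaker than the full OPT of the 2-competitiveness theorem (which may also rearrange), so this check is illustrative, not a proof.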

Note: our competitive bound assumes opt is paying 1 for transposes

• if opt can rearrange anything it sees, can’t beat $$\Omega(n/\log n)$$ competitive

• “Order by Next Request” heuristic—rearrange everything you pass

• Munro 2000

• achieves cost equal to entropy

• so consider a uniform random sequence

• entropy is $$O(\log n)$$

• an online algorithm will pay $$n/2$$ in expectation every time.

• so $$\Omega(n/\log n)$$ competitive

2011 End lecture 20

## Randomized Online Algorithms

An online algorithm is a two-player zero sum game between algorithm and adversary. Well known that optimal strategies require randomization.

A randomized online algorithm is a probability distribution over deterministic online algorithms.

• idea: if adversary doesn’t know what you are doing, can’t mess you up.

• idea: can’t see adversary’s “traps”, but have certain probability of wiggling out of them.

• in practice, don’t randomly pick 1 det algorithm at start. Instead, make random choices on the way. But retrospectively, gives 1 deterministic algorithm.

Algorithm is $$k$$-competitive if for any $$\sigma$$, $$E[C_A(\sigma)] \le k\cdot C_{OPT}(\sigma)+O(1)$$.

• oblivious: knows probability distribution but not coin tosses. Might as well pick input in advance.

• fully adaptive: knows all coin tosses. So algorithm is deterministic for it.

• adaptive: knows coin tosses up to present—picks sequence based on what did.

• clearly adaptive stronger than oblivious.

• oblivious adversary plausible in many cases (eg paging)

• problematic if online behavior affects nature (eg, paging an alg that changes behavior if it sees itself thrashing)

• our lower bound applies to adaptive

• but in other cases, can do better against adaptive

• we’ll show much better vs. oblivious

### 1.3.1 Randomized Paging

Idea: evict random page?

• $$k$$-competitive against adaptive adversary

• but uses no memory

Marking algorithm:

• initially, all pages marked (technicality)

• on fault, if all marked, unmark all

• evict random unmarked page

• mark new page.

Fiat et al. proved: Marking is $$O(\log k)$$ competitive for cache size $$k$$.

Phases:

• first starts on first fault

• ends when get $$(k+1)^{\text{st}}$$ distinct page request.

• so a phase has $$k$$ distinct pages

• cost of $$M$$ is cost of phases

• note: defined by input, independent of coin tosses by $$M$$

• but, marking tracks:

• by induction, unmark iff at end of phase

• by induction, all pages requested in phase stay marked till end of phase

• so, pay for page (if at all) only on first request in phase.

• by induction, at end of phase memory contains the $$k$$ pages requested during the phase.

Analysis:

• ignore all but first request to a page (doesn’t affect $$M$$, helps offline)

• compare phase-by-phase cost

• phase $$i$$ starts with $$S_i$$ (ends with $$S_{i+1}$$)

• request clean if not in $$S_i$$. $$M$$ must fault, but show offline pays too

• request stale if in $$S_i$$. $$M$$ faults if evicted during phase. Show unlikely.

Online cost:

• Expected cost of stale request:

• suppose had $$s$$ stale and $$c$$ clean requests so far.

• at this point, $$s+c$$ pages are marked and can’t be evicted; the $$c$$ clean faults have bumped out $$c$$ of the $$k-s$$ stale pages not yet requested this phase, and by symmetry each such stale page is equally likely to have been evicted

• so the probability the current stale request was already evicted is at most $$c/(k-s)$$

• this is expected cost of stale request.

• Cost of phase.

• Suppose phase $$i$$ has $$c_i$$ clean requests, $$k-c_i$$ stale.

• Pay $$c_i$$ for clean.

• for stale requests, pay at most $$c_i\left(\frac1k+\frac1{k-1}+\cdots+\frac{1}{c_i+1}\right) = c_i(H_k-H_{c_i})$$

• so total at most $$c_i\log k$$

Offline cost:

• potential function $$\Phi_i=$$ difference between $$M$$ and $$O$$ (offline) at start of phase $$i$$.

• got $$c_i$$ clean requests, not in $$M$$’s memory. So at least $$c_i-\Phi_i$$ not in $$O$$’s memory.

• at end of round, $$M$$ has all $$k$$ most recent requests. So $$O$$ is missing $$\Phi_{i+1}$$ of this round’s $$k$$ requests, and must have evicted (thus paid for) them.

• so, $$C_i(O) \ge \max(c_i-\Phi_i,\Phi_{i+1}) \ge \frac12(c_i-\Phi_i+\Phi_{i+1})$$.

• sum over all phases; the $$\Phi$$ terms telescope. Deduce $$C(O) \ge \frac12 \sum c_i - O(1)$$.

Summary: online pays at most $$\sum c_i\log k$$, offline pays at least $$\frac12\sum c_i$$. So Marking is $$O(\log k)$$-competitive.
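The marking algorithm itself fits in a few lines; a sketch (the `sorted` call just makes the random choice reproducible under a seeded `rng`, which is my addition):

```python
import random

def marking_faults(requests, k, rng=random):
    """Randomized marking: on a fault with a full cache, if every cached
    page is marked, unmark all (phase boundary); then evict a uniformly
    random unmarked page. Every requested page gets marked."""
    cache, marked, faults = set(), set(), 0
    for p in requests:
        if p not in cache:
            faults += 1
            if len(cache) == k:
                if marked == cache:
                    marked = set()              # new phase: unmark all
                victim = rng.choice(sorted(cache - marked))
                cache.remove(victim)
                marked.discard(victim)
            cache.add(p)
        marked.add(p)
    return faults

# warm-up: with only k distinct pages and cache size k, just cold misses
assert marking_faults([1, 2, 3, 1, 2, 3], k=3) == 3
```

Note the phase boundaries here depend only on the request sequence, matching the observation above that phases are independent of the coin tosses.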

## Lower Bounds via Yao’s Minimax Principle

Online algorithms is a two player game:

• online wants to optimize ratio, adversary want it to be bad

• online chooses random alg, adversary chooses random input

• leads to payoff matrix—expected value of game

• number in matrix is ratio for alg on that input

• in general, optimal strategy of both sides is randomized

• Von Neumann proved equality of minimax and maximin

• notice: player who picks strategy second can use deterministic choice

• note when one player’s strategy known, other player can play deterministically to meet optimum.

• above, assumed adversary knew online’s strategy, so he played deterministically

• for lower bound, we let adversary have randomized strategy, look for best deterministic counter!

• If give random input for which no deterministic alg does well, we get a lower bound.

Formalize:

• say online $$A$$ is $$k$$-competitive against an input distribution $$p_\sigma$$ if $$E_\sigma[C_A(\sigma)] \le kE_\sigma[C_{OPT}(\sigma)]$$ (note: OPT gets to see sequence before going)

• Theorem: if for some distribution no deterministic alg is $$k$$-competitive, then no randomized algorithm is $$k$$-competitive.

• to prove, suppose we have a $$k$$-competitive randomized alg; show that against any distribution $$p_\sigma$$ some deterministic alg is $$k$$-competitive.

• consider payoff $$E_{A} [C_A(\sigma) - kC_{OPT}(\sigma)]$$

• by assumption, some dist on $$A$$ achieves non-positive payoff.

• this stays non-positive in expectation over any input distribution $$p_\sigma$$

• once $$p_\sigma$$ is fixed, the best algorithm response is deterministic: some single $$A$$ does at least as well as the mixture

• so for any $$p_\sigma$$ on $$\sigma$$, some $$A$$ such $$E_{\sigma}[C_A(\sigma)-kC_{OPT}(\sigma)] \le 0$$

• in other words, $$A$$ is $$k$$-competitive against $$p_\sigma$$.

For paging:

• set of $$k+1$$ pages

• uniform random sequence of requests

• any deterministic (or randomized!) algorithm faults with probability $$1/(k+1)$$ per request (the requested page is the missing one). So cost about $$n/k$$ if seq length $$n$$

• what is offline cost? on fault, look ahead to page that is farthest in future.

• phase ends when all $$k+1$$ pages requested

• offline faults once per phase

• how long is a phase? coupon collection. $$\Omega(k\log k)$$.

• intuitively, number of faults is $$n/(k\log k)$$

• formally, use “renewal theory” that works because phase lengths are i.i.d.

• deduce expected faults $$n/(k\log k)$$, while online is about $$n/k$$

• $$\log k$$ gap, so no online algorithm is better than $$\Omega(\log k)$$-competitive.

• it’s known that if you can be $$c$$-competitive versus an adaptive online adversary and $$d$$-competitive ($$d<c$$) against an oblivious one, then you can be $$cd$$-competitive deterministically.

• so, since we know $$d=O(\log k)$$ for paging and no deterministic alg beats $$k$$, you can’t beat $$k/\log k$$ against an adaptive online adversary. Not sure if that has been achieved.
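The coupon-collector quantity in the lower bound can be computed exactly; a small sketch (name mine), assuming $$k+1$$ equally likely pages:

```python
from fractions import Fraction

def expected_phase_length(k):
    """Expected number of uniform requests over k+1 pages until all
    k+1 distinct pages appear: (k+1) * H_{k+1} (coupon collector)."""
    n = k + 1
    return n * sum(Fraction(1, i) for i in range(1, n + 1))

# k = 1: two pages, expected 2 * (1 + 1/2) = 3 requests per phase
assert expected_phase_length(1) == 3
```

Offline faults once per phase while online faults about once every $$k+1$$ requests, so the ratio is the harmonic factor $$H_{k+1} = \Theta(\log k)$$.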

## 1.4 $$k$$-server

Manasse et al. 1988.

Definition:

• metric space with $$k$$ servers on points

• request is a point in space

• must move a server, cost is distance.

• eg taxi company

• paging is special case: all distances $$1$$, servers on “memory pages”

• compute offline by dynamic program or reduction to min cost flow

Greedy doesn’t work:

• 2 servers, 1 far away, other flips between 2 points.

• need an algorithm that moves a far away server sometimes in case a certain region is popular

Fancy algorithmics:

• HARMONIC: randomized, move with probability inversely proportional to distance from goal: $$O(k^k)$$ [Fiat, Rabani, Ravid]

• WORK FUNCTION: track what offline algorithms would have done (computationally very expensive) and then do your best to move into a similar configuration.

• define $$W_i(X)$$ to be opt cost of serving first $$i$$ requests and ending with servers in state $$X$$

• choose configuration $$X_i$$ to optimize $$\min_X W_i(X)+d(X_{i-1},X)$$

• you do this when you are already in state $$X_{i-1}$$ that you picked in the previous request

• the work function algorithm was proven $$(2k-1)$$-competitive (Koutsoupias–Papadimitriou) using a black magic potential function

• conjectured $$k$$-competitive.

• exponential time to compute a move!

• questions remain on finding an algorithm that does little work per input.

### 1.4.1 On a Line

A truly “On Line” algorithm.

greedy algorithm is bad if requests alternate between nearby points $$a$$ and $$b$$ while a second server sits far away at $$c$$: greedy shuttles one server forever instead of ever bringing in the far one.

Intuition

• What is optimum action right now if you know the future?

• Move one of the two nearest servers

• Since moving anything else will eventually cross a nearest

• so might as well move nearest instead, then replace it later

Problem: which of two nearest to move?

double coverage algorithm (DC):

• If request outside convex hull, move nearest point to it.

• else, move nearest point on each side towards it equal distance till one hits.

$$k$$-competitive.

• let $$M$$ be min-cost matching of opt points to DC points

• let $$d(s_i,s_j)$$ be the distance between DC’s servers $$s_i$$ and $$s_j$$

• $$\Phi=kM+\sum_{i < j}d(s_i,s_j)$$

• analyze change when (first) adversary moves, (then) DC moves

• show:

• if adversary moves $$d$$, increases $$\Phi$$ by $$\le kd$$

• if DC moves total distance $$d$$, decreases $$\Phi$$ by $$d$$

• deduce: DC is $$k$$-competitive

• assume start in same configuration at starting node

• then $$\Phi=0$$

• $$\Phi$$ always positive

• if DC moved more than $$k$$ times OPT’s cost, $$\Phi$$ would have to go negative

• but can’t have negative $$\Phi$$

• if start in different configuration

• adds a constant (dependent on metric space) to $$\Phi$$

• so asymptotically $$k$$-competitive.

Analysis:

• adv moving $$d$$ increases $$M$$ by at most $$d$$, so $$\Delta \Phi \le kd$$

• Consider DC move to outside hull,

• WLOG moving point matched to it (else matches something else; uncross).

• so $$\Delta M=-d$$

• $$\Delta \Sigma = (k-1)d$$.

• claim follows: $$\Delta \Phi = -kd+(k-1)d = -d$$

• consider DC move to inside hull

• Consider $$M$$

• one of moving points is matched to request.

• So that move decreases $$M$$.

• Other move may increase $$M$$ at most by same amount,

• so no change to $$M$$.

• Now consider $$\Sigma$$.

• Moves of two points cancel out with respect to other points

• but the two moving servers get $$2d$$ closer to each other, so $$\Delta\Phi \le -2d$$, matching DC’s total movement of $$2d$$
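Double coverage on a line fits in a few lines of code (a sketch, names mine); on the greedy-killer instance the far server gets dragged in:

```python
def double_coverage(servers, requests):
    """DC on a line: if the request is outside the hull, move the nearest
    server; otherwise move the two flanking servers toward it equally
    until one arrives. Returns total movement cost."""
    s = sorted(servers)
    cost = 0.0
    for r in requests:
        if r <= s[0]:                          # outside hull, left
            cost += s[0] - r
            s[0] = r
        elif r >= s[-1]:                       # outside hull, right
            cost += r - s[-1]
            s[-1] = r
        else:                                  # flanked by s[i], s[i+1]
            i = max(j for j in range(len(s) - 1) if s[j] <= r)
            d = min(r - s[i], s[i + 1] - r)    # both move d; one arrives
            cost += 2 * d
            s[i] += d
            s[i + 1] -= d
    return cost

# requests alternate near one end; DC slowly pulls the distant server over
cost = double_coverage([0, 100], [10, 12, 10, 12, 10, 12])
assert cost == 36.0
```

Greedy would pay 2 per alternation forever; DC also pays 2 to the distant server each round, but that server makes monotone progress and eventually parks inside the busy region.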

Generalizes to trees.

Tree embeddings then yield a randomized $$O(\log^3 n \log^2 k)$$-competitive algorithm for $$k$$-server on general metrics.

Can we do better? No, because paging is special case.