6.854 Notes #1

Shortest path/MST motivation. Discuss Prim/Dijkstra algorithm.

Note: lots more decrease-key than delete.

Response: *balancing*

trade off costs of operations

making different parts equal time.

\(d\)-heaps:

\(m\log_d n + nd\log_d n\).

set \(d=m/n\)

\(O(m\log_{m/n} n)\)

already nice, gives \(O(n^2)\) (matching trivial alg) on complete graph.

Fredman-Tarjan, JACM 34(3) 1987.

`http://dl.acm.org/citation.cfm?id=28874`

Key principles:

Lazy: don’t work till you must

If you must work, use your work to “simplify” data structure too

force user (adversary) to spend lots of time to make you work

analysis via potential function measuring “complexity” of structure.

user has to do lots of insertions to raise potential,

so you can spread cost of complex ops over many insertions.

potential says how much work

*adversary*has done that you haven’t had to deal with yet.You can charge your work against that work.

another perspective: procrastinate. if you don’t do the work, might never need to.

Why the name? Wait and see.

Lazy approach:

During insertion, do minimum, i.e. nothing.

Indeed, for only insert, don’t even need to remember item!

but we also want delete min, so save items

For first delete-min, cost is \(n\)

So, amortized cost 1.

Problem?

issue with second and further delete mins

\(n\) delete mins cost \(n^2\)—means amortized \(n\)

Solution: Use your work to simplify

As do comparisons, remember outcomes

point from loser to winner

creates “heap ordered tree” (HOT)

not full or balanced, but heap ordered

in fact, this tree will be a

**caterpillar**: one spine with legsbut we can consider general trees with this property

remembers everything we figured out while finding min

Benefit: on delete-min, new min is child of current root

find by scanning children

but if we build a star, no benefit over brute force

can we force few children?

Can we be lazy and simultaneously prepare for future laziness?

Problem is if we start with min and it beats everything

it plays too often

how can we ensure nobody plays too often?

Tournament

pairs compete

winners pair to play each other

winners of winners pair

etc.

Still \(n-1\) comparisons

But now each node plays only \(\log n\) times

What is resulting HOT tree structure?

*binomial trees*`x x x x x x x x x x x x \ \ \ \ \ \ \ \ x x x x x--- x--- \ \ \ \ \ - \ - x x x x \ \ \ \ x x x--------- \ \ \ \ - ------ x x x--- \ \ \ \ \ - x x x \ \ x`

degree \(d\) has \(2^d\) descendants

so max degree \(\log n\)

Generalize

As we interleave inserts and deletes, can’t run a tournament from clean start

but can achieve same result

So max degree \(O(\log n)\) for \(n\) items

how?

record each node’s degree

i.e., its round in the tournament so far

only link HOTs of same degree

same “union by rank” idea as union find

induction: if only link heaps of same degree, than any degree-\(d\) heap has \(2^{d}\) nodes.

creates “binomial trees” saw above

“Binomial heaps” do this aggressively—when delete items, split up trees to preserve exact tree shapes.

but we’ll be sloppier/faster

Consolidation algorithm for list of arbitrary HOTS

create buckets

bucket HOTs by degree (maintain degree in node)

start at smallest bucket

link pairs till \(<2\) left.

next bucket.

at end, collect items from buckets and discard buckets

Combine with lazy insert

keep pointer to min root (or just scan on delete min)

on insert, just add new (degree 0) HOT to set of HOTs

on delete min, output min and consolidate HOTs

also need to add children of min as new HOTs

Formalize notion that scan through inserted items is “paid for” by consolidation

Heap ordered trees implementation

definition

represent using left-child, parent, and sibling pointers (both directions)

keep double linked list of HOTs

and a pointer to min HOT root.

so in constant time, can find min

in constant time, can add item

in contant time, can link two HOT lists (Fibonacci heaps are

*mergeable*in constant time)time to delete-min equal to number of roots, and simplifies struct.

Insert solution idea: adversary has to do many insertions to make consolidation expensive.

analysis:

**potential function**\(\phi\) equal to number of roots.amortized\(_i=\) real\(_i+\phi_i-\phi_{i-1}\)

then \(\sum a_i=\sum r_i+\phi_n-\phi_0\)

upper bounds real cost if \(\phi_n \ge \phi_0\).

sufficient that \(\phi_n \ge 0\) and \(\phi_0\) fixed

insertion real cost 1, potential cost 1. total 2.

deletion: take \(r\) roots and add \(c\) children, then do \(r+c\) scan work.

\(r\) roots at start, \(\log n\) roots at end. So, \(r-\log n\) potential decrease

so, total work \(O(c+\log n)=O(\log n)\)

Result: \(O(1)\) amortized insert, \(O(\log n)\) amortized delete

What about decrease-key?

basically easy: cut off node from parent, make root.

constant time decrease-key

what goes wrong?

may violate exponential-in-degree property

“saving private ryan”

fix: if a node loses more than one child, cut it from parent, make it a root (adds 1 to root potential—ok).

implement using “mark bit” in node if has lost 1 child (clear when becomes root)

may cause “cascading cut” until reach unmarked node

why 2 children? We’ll see.

Analysis: must show

cascading cuts “free”

tree size is exponential in degree

Second potential function: number of mark bits.

if cascading cut hits \(r\) nodes, clears \(r\) mark bits

adds 1 mark bit where stops

amortized cost: \(O(1)\) per decrease key

note: if cut without marking, couldn’t pay for cascade!

this is binomial heaps approach. may do same \(O(\log n)\) consolidation and cutting over and over.

Wait, problem

new root per cut

adds to first potential function

and thus to amortized cost

fix: double new potential function.

use one unit to pay for cut, one to pay for increase in 1st potential

so, doesn’t harm first potential function analysis

Analysis of tree size:

node \(x\). consider

*current*children in order were added (indexed from 1, not 0).claim: \(i^{th}\) remaining child (in addition order) has degree at least \(i-2\)

proof:

Let \(y\) be \(i^{th}\) added child

When added, the \(i-1\) items preceding it in the add-order were already there

i.e., \(x\) had degree \(\ge i-1\)

So \(i^{th}\) child \(y\) had (same) degree \(\ge i-1\)

\(y\) could lose only 1 child before getting cut

let \(S_k\) be minimum number of descendants (inc self) of degree \(k\) node. Deduce \(S_0 =1\), \(S_1=2\), and \[\begin{aligned} S_k &\ge \sum_{i=0}^{k-2} S_i\end{aligned}\]

to upper bound, solve equality \[\begin{aligned} S_k &= \sum_{i=0}^{k-2} S_i\\ S_k - S_{k-1} &= S_{k-2}\end{aligned}\]

deduce \(S_k \ge F_{k+2}\) fibonacci numbers

reason for name

we know \(F_k \ge \phi^k\)

so biggest possible \(k=\log_\phi n\)

Practical?

non-amortized versions with same bounds exist (good for realtime apps).

Constants not that bad

ie fib heaps reduces comparisons on moderate sized problems

we’ve done experiments with graph problems where fib wins

but, regular heaps are in an array

fib heaps use lots of pointer manipulations

lose locality of reference, mess up cache.

work on “implicit heaps” that don’t use pointers

minimum spanning tree (and shortest path) easy in \(O(m+n\log n)\).

More sophisticated MST:

why \(n\log n\)? Because deleting from size-\(n\) heap

idea: keep heap small to reduce cost.

choose a parameter \(k\)

run prim till region has \(k\) neighbors (i.e. heap members)

set aside and start over elsewhere.

heap size bounded by \(k\), delete by \(\log k\)

“contract” regions (a la Kruskal) and start over.

Formal:

phase starts with \(t\) vertices.

pick max region size \(k\) (TBD)

unmark all vertices and repeat following

choose unmarked vertex

Prim until attach to marked vertex or heap reaches size \(k\)

ie \(k\) edges incident on current region

mark all vertices in region

contract graph in \(O(m)\) time and repeat

Analysis:

time for phase: \(m\) decrease keys, \(t\) delete-mins from size-\(k\) heaps, so \(O(m+t\log k)=O(m)\).

balance:

have to spend \(m\)

so can make \(t\log k = m\) “for free” (no asymptotic change)

set \(k=2^{m/t}\).

number of phases:

At end of phase, each compressed vertex “owns” \(k\) edges (one or both endpoints)

so next number of vertices \(t' \le m/k\)

so \(k' = 2^{m/t'} \ge 2^k\)

when reach \(k=n\), done (last pass)

what is starting \(k\)?

Initially \(t=n\)

so need \(n\log k = O(m)\) for \(O(m)\)-time phase

so can take \(k=2^{m/n}\)

(note if \(m>n\log n\) then \(k>n\) and one phase suffices to finish)

number of phases: \(\beta(m,n) = \min\set{i \mid \log^{(i)} n \le m/n} \le \log^* n\).

Remarks:

subsequently improved to \(O(m\log\beta(m,n))\) using edge packets

chazelle improved to \(O(m\alpha(n)\log\alpha(n))\) using “error-prone heaps”

ramachandran gave optimal algorithm (runtime not clear)

randomization gives linear.

fails for Dijkstra.

Why? Because cannot contract.