\documentclass{article} \usepackage{me} \setlength{\parindent}{0pt}
This material takes 1:15

Heaps

Shortest path/MST motivation. Discuss Prim/Dijkstra algorithm.

Note: lots more decrease-key than delete.

Response: balancing

• trade off costs of operations
• making different parts equal time.

$d$-heaps:

• $m\log_d n + nd\log_d n$.
• set $d=m/n$
• $O(m\log_{m/n} n)$
• already nice, gives $O(n^2)$ (matching trivial alg) on complete graph.

Fibonacci Heaps

Fredman-Tarjan, JACM 34(3) 1987.

 http://www.acm.org/pubs/citations/journals/jacm/1987-34-3/p596-fredman/

Key principles:

• Lazy: don't work till you must
• If you must work, use your work to “simplify” data structure too
• force user (adversary) to spend lots of time to make you work
• analysis via potential function measuring “complexity” of structure.
• user has to do lots of insertions to raise potential,
• so you can spread cost of complex ops over many insertions.
• potential says how much work adversary has done that you haven't had to deal with yet.
• You can charge your work against that work.
• another perspective: procrastinate. if you don't do the work, might never need to.
• Why the name? Wait and see.

Lazy approach:

• During insertion, do minimum, i.e. nothing.
• Indeed, for only insert, don't even need to remember item!
• but we also want delete min, so save items
• For first delete-min, cost is $n$
• So, amortized cost 1.

Problem?

• issue with second and further delete mins
• $n$ delete mins cost $n^2$---means amortized $n$

Solution: Use your work to simplify

• As do comparisons, remember outcomes
• point from loser to winner
• creates “heap ordered tree” (HOT)
• not full or balanced, but heap ordered
• in fact, this tree will be a caterpillar: one spine with legs
• but we can consider general trees with this property
• remembers everything we figured out while finding min

Benefit: on delete-min, new min is child of current root

• find by scanning children
• but if we build a star, no benefit over brute force
• can we force few children?

Can we be lazy and simultaneously prepare for future laziness?

• Problem is if we start with min and it beats everything
• it plays too often
• how can we ensure nobody plays too often?
• Tournament
• pairs compete
• winners pair to play each other
• winners of winners pair
• etc.
• Still $n-1$ comparisons
• But now each node plays only $\log n$ times
• What is resulting HOT tree structure?
• binomial trees
x  x    x  x    x  x    x  x

x       x       x       x
\       \       \       \
\       \       \       \
x       x       x       x

x---             x---
\  \             \  \
\  -             \  -
x  x             x  x
\                \
\                \
x                 x

x---------
\  \     \
\  -     ------
x  x          x---
\          \  \
\          \  -
x           x  x
\
\
x

• degree $d$ has $2^d$ descendants
• so max degree $\log n$

Generalize

• As we interleave inserts and deletes, can't run a tournament from clean start
• but can achieve same result
• So max degree $O(\log n)$ for $n$ items
• how?
• record each node's degree
• i.e., its round in the tournament so far
• only link HOTs of same degree
• same “union by rank” idea as union find

induction: if only link heaps of same degree, than any degree-$d$ heap has $2^{d}$ nodes.

• creates “binomial trees” saw above
• “Binomial heaps” do this aggressively---when delete items, split up trees to preserve exact tree shapes.
• but we'll be sloppier/faster

Consolidation algorithm for list of arbitrary HOTS

• create buckets
• bucket HOTs by degree (maintain degree in node)
• start at smallest bucket
• link pairs till $< 2$ left.
• next bucket.
• at end, collect items from buckets and discard buckets

Combine with lazy insert

• on insert, just add new (degree 0) HOT to set of HOTs
• on delete min, output min and consolidate HOTs
• Formalize notion that scan through inserted items is “paid for” by consolidation

Heap ordered trees implementation

• definition
• represent using left-child, parent, and sibling pointers (both directions)
• keep double linked list of HOTs
• and a pointer to min HOT root.
• so in constant time, can find min
• in constant time, can add item
• in contant time, can link two HOT lists (Fibonacci heaps are mergeable in constant time)
• time to delete-min equal to number of roots, and simplifies struct.

Insert solution idea: adversary has to do many insertions to make consolidation expensive.

• analysis: potential function $\phi$ equal to number of roots.
• amortized$_i=$ real$_i+\phi_i-\phi_{i-1}$
• then $\sum a_i=\sum r_i+\phi_n-\phi_0$
• upper bounds real cost if $\phi_n \ge \phi_0$.
• sufficient that $\phi_n \ge 0$ and $\phi_0$ fixed
• insertion real cost 1, potential cost 1. total 2.
• deletion: take $r$ roots and add $c$ children, then do $r+c$ scan work.
• $r$ roots at start, $\log n$ roots at end. So, $r-\log n$ potential decrease
• so, total work $O(c+\log n)=O(\log n)$
Result: $O(1)$ amortized insert, $O(\log n)$ amortized delete

• basically easy: cut off node from parent, make root.
• constant time decrease-key
• what goes wrong?
• may violate exponential-in-degree property

“saving private ryan”

• fix: if a node loses more than one child, cut it from parent, make it a root (adds 1 to root potential---ok).
• implement using “mark bit” in node if has lost 1 child (clear when becomes root)
• may cause “cascading cut” until reach unmarked node
• why 2 children? We'll see.

Analysis: must show

• tree size is exponential in degree

Second potential function: number of mark bits.

• if cascading cut hits $r$ nodes, clears $r$ mark bits
• adds 1 mark bit where stops
• amortized cost: $O(1)$ per decrease key
• note: if cut without marking, couldn't pay for cascade!
• this is binomial heaps approach. may do same $O(\log n)$ consolidation and cutting over and over.
• Wait, problem
• new root per cut
• adds to first potential function
• and thus to amortized cost
• fix: double new potential function.
• use one unit to pay for cut, one to pay for increase in 1st potential
• so, doesn't harm first potential function analysis

Analysis of tree size:

• node $x$. consider current children in order were added (indexed from 1, not 0).
• claim: $i^{th}$ remaining child (in addition order) has degree at least $i-2$
• proof:
• Let $y$ be $i^{th}$ added child
• When added, the $i-1$ items preceding it in the add-order were already there
• i.e., $x$ had degree $\ge i-1$
• So $i^{th}$ child $y$ had (same) degree $\ge i-1$
• $y$ could lose only 1 child before getting cut
• let $S_k$ be minimum number of descendants (inc self) of degree $k$ node. Deduce $S_0 =1$, $S_1=2$, and \begin{align*} S_k &\ge \sum_{i=0}^{k-2} S_i \end{align*}
• to upper bound, solve equality \begin{align*} S_k &= \sum_{i=0}^{k-2} S_i\\ S_k - S_{k-1} &= S_{k-2} \end{align*}
• deduce $S_k \ge F_{k+2}$ fibonacci numbers
• reason for name
• we know $F_k \ge \phi^k$
• so biggest possible $k=\log_\phi n$

Practical?

• non-amortized versions with same bounds exist (good for realtime apps).
• ie fib heaps reduces comparisons on moderate sized problems
• we've done experiments with graph problems where fib wins
• but, regular heaps are in an array
• fib heaps use lots of pointer manipulations
• lose locality of reference, mess up cache.
• work on “implicit heaps” that don't use pointers

Minimum Spanning Tree

minimum spanning tree (and shortest path) easy in $O(m+n\log n)$.

More sophisticated MST:

• why $n\log n$? Because deleting from size-$n$ heap
• idea: keep heap small to reduce cost.
• choose a parameter $k$
• run prim till region has $k$ neighbors (i.e. heap members)
• set aside and start over elsewhere.
• heap size bounded by $k$, delete by $\log k$
• “contract” regions (a la Kruskal) and start over.

Formal:

• phase starts with $t$ vertices.
• pick max region size $k$ (TBD)
• unmark all vertices and repeat following
• choose unmarked vertex
• Prim until attach to marked vertex or heap reaches size $k$
• ie $k$ edges incident on current region
• mark all vertices in region
• contract graph in $O(m)$ time and repeat

Analysis:

• time for phase: $m$ decrease keys, $t$ delete-mins from size-$k$ heaps, so $O(m+t\log k)=O(m)$.
• balance:
• have to spend $m$
• so can make $t\log k = m$ “for free” (no asymptotic change)
• set $k=2^{m/t}$.
• number of phases:
• At end of phase, each compressed vertex “owns” $k$ edges (one or both endpoints)
• so next number of vertices $t' \le m/k$
• so $k' = 2^{m/t'} \ge 2^k$
• when reach $k=n$, done (last pass)
• what is starting $k$?
• Initially $t=n$
• so need $n\log k = O(m)$ for $O(m)$-time phase
• so can take $k=m/n$
• number of phases: $\beta(m,n) = \min\set{i \mid \log^{(i)} n \le m/n} \le \log^* n$.

Remarks:

• subsequently improved to $O(m\log\beta(m,n))$ using edge packets
• chazelle improved to $O(m\alpha(n)\log\alpha(n))$ using “error-prone heaps”
• ramachandran gave optimal algorithm (runtime not clear)
• randomization gives linear.
• fails for Dijkstra.
• Why? Because cannot contract.