## Buckets

Explot *indirect addresssing* in RAM model.

review shortest path algorithm.

In shortest paths, often have edge lengths small integers (say max $C$).

Observe heap behavior:

- heap min increasing (monotone property)
- max $C$ distinct values
- (because don't insert $k+C$ until delete $k$).

Idea: lots of things have same value. Keep in buckets.

How to exploit?

- standard heaps of buckets. $O(m\log C)$ (slow) or $O(m+n\log C)$ with Fib (messy).
- Dial's algorithm: $O(m+nC)$.
- in fact, $O(m+D)$ for max distance $D$

space?

- use array of size $C+1$
- wrap around

Can we improve this?

- where are we wasting a lot of effort?
- scanning over big empty regions
- can we arrange to leap whole regions?
- some kind of summary structure?

2-level buckets.

- make blocks of size $b$
- add layer on top: $nC/b$ block summaries
- in each summary, keep count of items in block
- insert updates block and summary: $O(1)$
- ditto decrease key
- delete min scans summaries, then scans one block
- over whole algorithm, $nC/b$ scanning summaries
- also, scan one block per delete min: $b$ work
- over $n$ delete mins, work $nb+nC/b$
- balance, optimize with $b=\sqrt{C}$

- result: SP in $O(m+n\sqrt{C})$
- as before “space trick” to keep array sizes to $C$
- if know $D$ (or binary search for it) make bucket size $\sqrt{D/n}$ and get $O(m+\sqrt{nD})$
- Generalize: 3 tiers?
- blocks, superblocks, and summary
- block size $C^{1/3}$
- runtime $O(m+nC^{1/3})$
- how far can this go? To $m+n$? no, because insert cost rises.

Tries.

- depth $k$ tree over arrays of size $\Delta$
- range $C$ (details of reduction from $nC$ omitted)
- expansion factor $\Delta=(C+1)^{1/k}$ (power of 2 simplifies)
- insert: $O(k)$ (also find, delete-non-min, decrease-key)
- delete-min: $O(k\Delta)=O(kC^{1/k})$ to find next element
- Shortest paths: $O(km+knC^{1/k})$
- Balance: $nC^{1/k}=m$ so $C=(m/n)^k$ so $k=\log(C)/\log(m/n)$
- Runtime: $m\log_{m/n}(C)$
- Space: $k\Delta=kC^{1/k}=km/n\le (m/n)\log C$ using circular array trick.

Problems: space and time

Idea: be lazy! (Denardo and Fox 1979)

- don't push items down trie until you must
- unique block (array) on each level
*active* - keep other stuff piled up in buckets in each level
- keep count of items in (buckets of) each level (not counting below)
- only expand bucket when needed (and update level counts)

Operations:

- Insert
- walk item down tree till stop in (inactive) bucket or bottom
- increment level count
- $O(k)$ work

- Decrease key
- Remove from current bucket
- put in proper bucket
- maybe descend (and revise level counts)
- new bucket must be a prior sibling since monotone
- so bucket exists
- and no higher levels need be touched

- amortization:
- items descend once per touch, but never rise,
- (critically relies on monotone property)
- so $k$ expansion per item
- alternatively, consider “potential” as “total heights of items”
- inserting at top adds $k$ potential
- then all descents are free

- Delete-min
- remove item, update bottom level count
- advance to new min
- rise to first nonempty level
- traverse to first nonempty bucket
- expand till find min

- time:
- pushdowns are accounted for in insert cost
- identifying min happens during pushdowns
- rise to nonempty bucket costs less than pushdowns from it
- so, cost to scan only one block

- cost $C^{1/k}$

- space to $kC^{1/k}$
- Times:
- $O(k)$ insert (charge expansions to insert)
- $O(1)$ decrease key
- $O(C^{1/k})$ delete-min

- paths runtime: $O(m+n(k+C^{1/k}))$, choose $k = 2\log C/\log \log C$: $O(m+n(\log C)/\log\log C)$
- great, even if e.g. $C=2^{32}$
- Further improvement: heap on top (HOT) queues get $O(m+n(\log C)^{1/3})$ time. Cherkassky, Goldberg, and Silverstein. SODA 97.
- Implementation experiments---good model for project

## VEB

Van Emde Boas, “Design and Implementation of an efficient priority queue” Math Syst. Th. 10 (1977)

Thorup, “On RAM priority queues” SODA 1996.

Idea

- idea: in bucket heaps, problem of finding next empty bucket was heap problem. Recurse!
- $b$-bit words
- $\log b$ running times
- thorup paper improves to $\log\log n$
- consequence for sorting.

Algorithm.

- array of buckets but fast lookup on top.
- queue $Q$ on $b$ bits is struct
- $Q.\min$ is current min,
*not*stored recursively - Array $Q.low[]$ of $\sqrt{U}$ queues on low order bits in bucket
- $Q.high$, vEB queue on high order bits of elements
*other than current min in queue*

- $Q.\min$ is current min,
- Insert $x$:
- if $x < Q.\min$, swap
- now insert $x$ in recursive structs
- expand $x=(x_h,x_l)$ high and low half words
- If $Q.low[x_h]$ nonempty, then insert $x_l$ in it
- else, make new queue holding $x_l$ at $Q.low[x_h]$, and insert $x_h$ in $Q.high$
- note two inserts, but one to an empty queue, so constant time

- Delete-min:
- need to replace $Q.\min$
- Look in $Q.high.\min$. if null, queue is empty.
- else, gives first nonempty bucket $x_h$
- Delete min from $Q.low[x_h]$ to finish finding $Q.\min$
- If results in empty queue, Delete-min from $Q.high$ to remove that bucket from consideration
- Note two delete mins, but second only happens when first was constant time.

Problem: space

- need constant time hash table.
- non-trivial complexity theory,
- but can manage with randomization or slight time loss.