\documentclass{article} \usepackage{me} \setlength{\parindent}{0pt}
This material takes 1 hour.

## Persistent Data Structures

Sarnak and Tarjan, "Planar Point Location using persistent trees", Communications of the ACM 29 (1986) 669--679

"Making Data Structures Persistent" by Driscoll, Sarnak, Sleator and Tarjan Journal of Computer and System Sciences 38(1) 1989

Idea: be able to query and/or modify past versions of data structure.

• ephemeral: changes to struct destroy all past info
• partial persistence: changes to most recent version, query to all past versions
• full persistence: queries and changes to all past versions (creates “multiple worlds” situtation)

Goal: general technique that can be applied to any data structure.

## Persistent Trees

Many data structures

• consist of fixed-size nodes conencted by pointers
• accessed from some root entry point
• data structure ops traverse pointers
• and modify node fields/pointers
• runtime determined by number of traverse/modify steps

Our goal: use simulation to transform any ephemeral data structure into a persistent one

• measure of success: time and space cost for each primitive traversal and primitive modification step.
• An exposed data structure operation will do some number of primitive steps
• we will add persistence to primitive steps, so any data structure can be made persistent
• we'll measure cost of simulation in slowdown and storage
• We'll limit to trees.
• But ideas generalize

Key principles:

• Lazy: don't work till you must
• If you must work, use your work to “simplify” data structure too
• force user (adversary) to spend lots of time to make you work
• analysis via potential function measuring “complexity” of structure.
• user has to do lots of changes to raise potential,
• so you can spread cost of complex ops over many insertions.
• potential says how much work adversary has done that you haven't had to deal with yet.
• You can charge your work against that work.
• another perspective: procrastinate. if you don't do the work, might never need to.

Lazy approach:

• During change, do minimum, i.e. nothing.
• Indeed, for only change, don't even need to remember change!
• but we also want lookups, so record changes
• Then apply relevant changes when a lookup happens
• So, amortized cost 1.

Problem?

• issue with second and further lookups
• so, do need to incorporate

Fat nodes method:

• replace each (single-valued) field of data structure by list of all values taken, sorted by time.
• requires $O(1)$ space per data change (unavoidable if keep old data)
• to lookup data field, need to search based on time.
• store values in binary tree
• checking/changing a data field takes $O(\log m)$ time after $m$ updates
• multiplicative slowdown of $O(\log m)$ in structure access.
• big number from data structures perspective

Path copying:

• can only reach node by traversing pointers starting from root
• changes to a node only visible to ancestors in pointer structure
• when change a node, copy it and ancestors (back to root of data structure
• keep list of roots sorted by update time
• $O(\log m)$ time to find right root (or const, if time is integers) (additive slowdown)
• same access time as original structure
• additive instead of multiplicative $O(\log m)$.
• modification time and space usage equals number of ancestors: possibly huge!
• especially if structure not balanced!

Combined Solution (trees only):

• Path copying is too aggressive
• how can we be lazy about path copying?
• fat nodes bad, but “pleasingly plump” ok.
• use fat nodes, but when get too fat, make new path copy
• in each node, store 1 extra time-stamped field
• if full, overrides one of standard fields for any accesses later than stamped time.
• access rule
• standard access, just check for overrides while following pointers
• constant factor increase in access time.
• update rule:
• when need to change/copy pointer, use extra field if available.
• otherwise, make new copy of node with new info, and recursively modify parent.
• Idea: lazy path copying
• copy only as much as you must

Analysis idea: adversary has to do many changes to make copying expensive.

• analysis: potential function $\phi$ measuring amount of “deferred work”.
• amortized$_i=$ real$_i+\phi_i-\phi_{i-1}$
• then $\sum a_i=\sum r_i+\phi_n-\phi_0$
• upper bounds real cost if $\phi_n \ge \phi_0$.
• sufficient that $\phi_n \ge 0$ and $\phi_0$ fixed

Analysis

• live node: pointed at by current root.
• potential function: number of full live nodes.
• copying a node is free (new copy not full, pays for copy space/time)
• pay for filling an extra pointer (do only once, since can stop at that point).
• amortized space per update: $O(1)$.

## Intro

Talk about splay tree inserts and deletes

• by splaying target to root
• split and join
• need to define/account for potential as create nodes and merge trees

Make point about all at once

Remind about $O(1)$ persistent pointer changes

## Application: planar point location.

• Different model
• algebraic operations are $O(1)$
• e.g. computing line intersections or lengths
• even when these return irrationals!
• sweep precision under the rug.
• planar subdivision
• division of plane into polygons
• query: “what polygon contains this point”
• define complexity of input?
• edges of polygons form $n$ segments/edges
• segments meet only at ends/vertices
• numerous special-purpose solutions
• One solution: dimension reduction
• vertical line through each vertex
• divides into slabs
• in slab, segments maintain one vertical ordering
• find query point slab by binary search
• build binary search tree for slab with “above-below” queries
• $n$ binary search trees, size $O(n^2)$, time $O(n^2\log n)$
• observation: trees all very similar
• think of $x$ axis as time, slabs as “epochs”
• at end of epoch, “delete” segments that end, “insert” those that start.
• over all time, only $n$ inserts, $n$ deletes.
• must be able to query over all times

Persistent sorted sets:

• find$(x,s,t)$ find (largest key below) $x$ in set $s$ at time $t$
• insert$(i,s,t)$ insert $i$ in $s$ at time $t$
• delete$(i,s,t)$.

## Method: persistent red-black trees

Red black trees

• aggressive rebalancers
• amortized cost $O(1)$ to change a field.
• store red/black bit in each node
• use red/black bit to rebalance.
• depth $O(\log n)$

• search: standard binary tree search; $O(1)$ slowdown for persistence
• update: causes changes in red/black fields on path to item, $O(1)$ rotations.
• result: $(\log n)$ space per insert/delete
• geometry does $O(n)$ changes, so $O(n\log n)$ space.
• $O(\log n)$ query time.

Improvement:

• red-black bits used only for updates
• only need current version of red-black bits
• don't persist old bit versions: just overwrite
• only persistent updates needed are for $O(1)$ rotations
• so $O(1)$ space per update
• so $O(n)$ space overall.

Result: $O(n)$ space, $O(\log n)$ query time for planar point location.

Extensions:

• method extends to arbitrary pointer-based structures.
• $O(1)$ cost per update for any pointer-based structure with any constant indegree and constant number of fields.
• use one extra pointer per field in node
• and one pointer per indegree
• full persistence with same bounds.