Sarnak and Tarjan, "Planar Point Location using persistent trees", Communications of the ACM 29 (1986) 669–679
"Making Data Structures Persistent" by Driscoll, Sarnak, Sleator and Tarjan Journal of Computer and System Sciences 38(1) 1989
Idea: be able to query and/or modify past versions of data structure.
ephemeral: changes to struct destroy all past info
partial persistence: changes to most recent version, query to all past versions
full persistence: queries and changes to all past versions (creates “multiple worlds” situtation)
Goal: general technique that can be applied to any data structure.
Many data structures
consist of fixed-size nodes conencted by pointers
accessed from some root entry point
data structure ops traverse pointers
and modify node fields/pointers
runtime determined by number of traverse/modify steps
Our goal: use simulation to transform any ephemeral data structure into a persistent one
measure of success: time and space cost for each primitive traversal and primitive modification step.
An exposed data structure operation will do some number of primitive steps
we will add persistence to primitive steps, so any data structure can be made persistent
we’ll measure cost of simulation in slowdown and storage
We’ll limit to trees.
But ideas generalize
Key principles:
Lazy: don’t work till you must
If you must work, use your work to “simplify” data structure too
force user (adversary) to spend lots of time to make you work
analysis via potential function measuring “complexity” of structure.
user has to do lots of changes to raise potential,
so you can spread cost of complex ops over many insertions.
potential says how much work adversary has done that you haven’t had to deal with yet.
You can charge your work against that work.
another perspective: procrastinate. if you don’t do the work, might never need to.
Lazy approach:
During change, do minimum, i.e. nothing.
Indeed, for only change, don’t even need to remember change!
but we also want lookups, so record changes
Then apply relevant changes when a lookup happens
So, amortized cost 1.
Problem?
issue with second and further lookups
so, do need to incorporate
Full copy bad.
Fat nodes method:
replace each (single-valued) field of data structure by list of all values taken, sorted by time.
requires \(O(1)\) space per data change (unavoidable if keep old data)
to lookup data field, need to search based on time.
store values in binary tree
checking/changing a data field takes \(O(\log m)\) time after \(m\) updates
multiplicative slowdown of \(O(\log m)\) in structure access.
big number from data structures perspective
Path copying:
can only reach node by traversing pointers starting from root
changes to a node only visible to ancestors in pointer structure
when change a node, copy it and ancestors (back to root of data structure
keep list of roots sorted by update time
\(O(\log m)\) time to find right root (or const, if time is integers) (additive slowdown)
same access time as original structure
additive instead of multiplicative \(O(\log m)\).
modification time and space usage equals number of ancestors: possibly huge!
especially if structure not balanced!
Combined Solution (trees only):
Path copying is too aggressive
how can we be lazy about path copying?
fat nodes bad, but “pleasingly plump” ok.
use fat nodes, but when get too fat, make new path copy
in each node, store 1 extra time-stamped field
if full, overrides one of standard fields for any accesses later than stamped time.
access rule
standard access, just check for overrides while following pointers
constant factor increase in access time.
update rule:
when need to change/copy pointer, use extra field if available.
otherwise, make new copy of node with new info, and recursively modify parent.
Idea: lazy path copying
copy only as much as you must
Analysis idea: adversary has to do many changes to make copying expensive.
analysis: potential function \(\phi\) measuring amount of “deferred work”.
amortized\(_i=\) real\(_i+\phi_i-\phi_{i-1}\)
then \(\sum a_i=\sum r_i+\phi_n-\phi_0\)
upper bounds real cost if \(\phi_n \ge \phi_0\).
sufficient that \(\phi_n \ge 0\) and \(\phi_0\) fixed
Analysis
What represents work you need to do later?
full nodes (that may need copying)
that may still be modified (current values)
live node: pointed at by current root.
potential function: number of full live nodes.
copying a node is free (new copy not full, pays for copy space/time)
pay for filling an extra pointer (do only once, since can stop at that point).
amortized space per update: \(O(1)\).
a computational geometry foreshadow
Different model
algebraic operations are \(O(1)\)
e.g. computing line intersections or lengths
even when these return irrationals!
sweep precision under the rug.
planar subdivision
division of plane into polygons
query: “what polygon contains this point”
define complexity of input?
edges of polygons form \(n\) segments/edges
segments meet only at ends/vertices
numerous special-purpose solutions
One solution: dimension reduction
vertical line through each vertex
divides into slabs
in slab, segments maintain one vertical ordering
find query point slab by binary search
build binary search tree for slab with “above-below” queries
\(n\) binary search trees, size \(O(n^2)\), time \(O(n^2\log n)\)
observation: trees all very similar
think of \(x\) axis as time, slabs as “epochs”
at end of epoch, “delete” segments that end, “insert” those that start.
over all time, only \(n\) inserts, \(n\) deletes.
must be able to query over all times
Persistent sorted sets:
find\((x,s,t)\) find (largest key below) \(x\) in set \(s\) at time \(t\)
insert\((i,s,t)\) insert \(i\) in \(s\) at time \(t\)
delete\((i,s,t)\).
Red black trees
aggressive rebalancers
amortized cost \(O(1)\) to change a field.
store red/black bit in each node
use red/black bit to rebalance.
depth \(O(\log n)\)
Add persistence
updates only in “present”
search: standard binary tree search; \(O(1)\) slowdown for persistence
update: causes changes in red/black fields on path to item, \(O(1)\) rotations.
result: \((\log n)\) space per insert/delete
geometry does \(O(n)\) changes, so \(O(n\log n)\) space.
\(O(\log n)\) query time.
Improvement:
red-black bits used only for updates
only need current version of red-black bits
don’t persist old bit versions: just overwrite
only persistent updates needed are for \(O(1)\) rotations
so \(O(1)\) space per update
so \(O(n)\) space overall.
Result: \(O(n)\) space, \(O(\log n)\) query time for planar point location.
Extensions:
method extends to arbitrary pointer-based structures.
\(O(1)\) cost per update for any pointer-based structure with any constant indegree and constant number of fields.
use one extra pointer per field in node
and one pointer per indegree
full persistence with same bounds.