6.854 Lecture Notes

\begin{center} Lecture 3: Splay Trees \end{center}

Binary Search Trees

Store a set of $n$ (distinct) keys.
Operations: Search, Insert, and Delete.
In general: Very bad worst-case performance ($\Theta(n)$ time per operation! - as bad as linear search).
Solution: Keep your BST (approx.) balanced.
LOTS of variants: AVL trees, red-black trees, scape-goat trees, 2-3 trees, etc.
Deliver $O(\log n)$ time per operation.
But:
- Require maintaining of extra state + invariants.
- Can get complicated to implement.

Splay Trees

Sleator and Tarjan, “Self Adjusting Binary Search Trees” JACM 32(3) 1985

No need to maintain any extra state.
Extremely elegant and simple -- only one non-trivial operation needs to be implemented.
Delivers $O(\log n)$ time per operation performance too. But only in amortized (as opposed to worst-case) sense!
Is actually optimal in a much more meaningful way (will explain later).

Key design principle: Be lazy! That is, don't work until you must.

Don't try to keep the tree always balanced.
Re-balance only when convenient
$\Rightarrow$ Use the work of searches to re-balance “along the way”.
Need a potential function $\Phi$ to facilitate the analysis.
- $\Phi$ aims to keep track of the amount of work we “saved” by procrastinating.
- Will charge this “saved” work against the “bursts” of work we actually will need to do from time to time. (As we are mostly procrastinating, when we finally have to do some work, it will be a lot of work!).
- Cool feature: our data structure does not need to actually keep track of the potential. This potential is implicitly captured by the complexity/”messiness” of our data structure (resulting from our procrastination).
Two important intuitions: path shortening and re-balancing.

Path shortening:

If executing a Search took too long -- fix it.
$\Rightarrow$ When you go down a long path, make it shorter on your way back!
Intuition: shortening of the path releases potential (= makes the data structure more “orderly”). This pays for long traversal (and for work done to shorten it afterwards).
Note inserting an item makes paths longer -- source of potential for later shortening.
Problem: shortening our search path might makes other paths longer!
Need to make sure that this shortening is helpful overall.

Re-balancing (via Rotations):

Why are paths in a balanced tree short? What is going on when a path is long?
When my search descends to smaller child, happy -- reduced the number of descendant by at least a half.
$\Rightarrow$ Can happen at most $O(\log n)$ times.
But what if we descend to the larger child?
Tree balanced $\Rightarrow$ Larger child has at most, say, $2/3$ of all the descendants.
$\Rightarrow$ The number of descendants reduced by at least one third.
$\Rightarrow$ Can happen at most $O(\log n)$ times, happy again!
So, tree balanced $\Rightarrow$ can divide well (and thus conquer wins us more).
Root of lack of balance/long search paths: “fat” children.
But: Fat children have a lot of potential! (They got fat because of our procrastination.)
We can use their potential to take care of them.
Simple idea: Single-rotate to root (will be cheap to access next time!).
Doesn't work---might pull up a leaf but leave whole fat subtree behind to be descended again and again.
Better idea: Use more sophisticated rotations to halve the depth of all nodes on the search path.
$\Rightarrow$ This draws up the whole fat subtree.
Note: we don't actually destroy fat children here -- we just promote them.
Unfortunate: analysis is black magic. No idea how discovered. (A possible heuristic provided in one of the homework problems.)

Key operation: Splaying a node $x$:

Moves $x$ to root via a sequence of rotations.
Goal: To shorten the original root-$x$ path.
Already know: single rotations don't work---leave original path unchanged.
Instead, use double rotations (see the figures) + possibly one single rotation at the very end.
Ideally: Splaying a path should halve its length.

Figures of rotations

Zig-zag -- same as two single rotations on $x$:

       z                x
      / \              / \
     y   D            y   z
    / \       ====>  / \ / \ 
   A   x            A  B C  D
      / \
     B   C

Zig-zig -- single rotate $y$, then $x$:

       z                x
      / \              / \
     y   D            A   y
    / \       ====>      / \
   x   C                B   z
  / \                      / \   
 A   B                    C   D

The remaining two “mirror” cases (zag-zig and zag-zag) completely analogous.

Good exercise: Contrast zig-zig to two single rotations ($x$ then $x$ again):

       z                x
      / \              / \
     y   D            A   z
    / \       ====>      / \
   x   C                y   D
  / \                  / \   
 A   B                B   C

Why is zig-zig better? Not obvious; perhaps because shifts stuff “farther right”?

Implementing $Search(x)$ operation:

Walk down tree to find item $x$.
Splay $x$.
That's it!

(Will talk about $Insert(x)$ and $Delete(x)$ later.)

Analysis

Assume that each node $x$ has a positive weight $w_x$
- These weights are just for the analysis -- they don't affect the implementation.
- For now: can think of these weights being all equal to $1$. But later different choices of weights will give us very interesting conclusions.
Define size $s(x)$ of a node $x$ is total weight of subtree nodes, i.e. “number of nodes” (if all weights equal to $1$).
Define rank $r(x)=\lg s(x)$.
- Intuitively: $r(x) \approx$ “best height of subtree at $x$”.
- Remember that this is log base $2$ -- will be important to get the constants right.

Our potential function: $\Phi=\sum r(x)$.

Some intuition:

Recall: Having fat subtree deep down is bad---makes us search deep.
$\Rightarrow$ This should be reflected by large potential.
$\Rightarrow$ And it is: because that fat subtree contributes to all ranks above it.
Also: we should make that large potential of fat subtrees vanish to pay for the search that encountered them.
$\Rightarrow$ Want to bring that fat subtree up the tree
(Note: that for a single fat subtree even single rotations would achieve this, but we want to do it for all fat subtrees that we encounter -- and these can be nested (think of a tree being a path). Hence the need for more sophisticated rotations.)

The Access Lemma

Key technical fact: \begin{center} Access Lemma: Amortized time to splay a node $x$ given root $t$ is at most $$3(r(t)-r(x))+1 = O(\log(s(t)/s(x))).$$ (Recall: Amortized cost of an operation $=$ real cost $+ \ \Delta \Phi$. ) \end{center}

Note: $x$ is the root of the tree after the splay operation and the rank of the root is always the same $\log W$, where $W=\sum_x w_x$ is the total weight of the nodes.

$\Rightarrow$ So, rank $r'(x)$ after the splay is equal to $r(t)$.

$\Rightarrow$ $r(t)-r(x)=r'(x)-r(x)$ -- the amortized cost of a splay is within a constant of its change in rank.

Intuition:

If all nodes are in “right place” then $r(t)-r(x)$ is the “ideal” depth of $x$.
If $x$ has high rank (i.e., is fat), we expect it to be high in the tree.
If this is not the case, i.e., if $x$ is deep in the tree, then:
- $x$ has to contribute a lot to the potential.
  $\Rightarrow$ $\Phi$ is large.
- Splaying $x$ will bring it to the root.
  $\Rightarrow$ The large amount of potential released as a result will make the amortized cost be still small. (That is, real cost will be large but $\Delta \Phi$ will be very negative and thus account for this overhead.)

One immediate consequence of Access Lemma:

Let all weights $w_x$ be equal to $1$.
$\Rightarrow$ Total weight $W$ is equal to $n$.
$\Rightarrow$ Rank of the root is $\log W$ and rank of every node is at least $1$.
By Access Lemma, the amortized time to (find and) splay a node $x$ is at most \[ O(\log W/w_x) = O(\log n). \]

Important detail: To make the above analysis work also for unsuccessful Search(x) operations, we need to splay the last node we visited in that search.

Proof of the Access Lemma

Will analyze one double rotation at a time.
Specifically: if $s'$ and $r'$ are the sizes and ranks after that rotation then the amortized cost of it is at most $3(r'(x)-r(x))$.
Summing up across all rotations during a splay and telescoping them will give the lemma. (Remember: $r(t)$ is equal to the final rank of $x$ after the splay.) Also, $+1$ corresponds to the extra single rotation that might be needed at the end.
{Will analyze only zig-zig (harder case) -- zig-zag is in the paper.}

r r'

z x / \ / \ y D A y / \ ====> / \ x C B z / \ / \ A B C D

Observe:

Only nodes $x,y,z$ change ranks.
Real cost equal to $2$.
$\Delta \Phi = r'(x)+r'(y)+r'(z)-r(x)-r(y)-r(z)$.
Rank of $x$ can only increase and the rank of $z$ can only decrease. I.e., $r'(x)\geq r(x)$ and $r'(z)\leq r(z)$.

Intuition:

Assume that $r'(y)\approx r(x)$ (this is not always the case).
If $x$ rank increases a lot, then $r'(x)-r(x)$ very large.
$\Rightarrow$ $3(r'(x)-r(x)) >> (r'(x)-r(x))+2$, i.e. tripled potential in our desired bound “absorbs” real cost into the change of potential.
Trouble: if rank of $x$ unchanged (recall: it can't decrease).
But that means $x$ was fat to begin with (subtree $A$ had to have most weight of the nodes).
$\Rightarrow$ Bringing $x$ up releases a lot of potential as the ranks of $y$, $z$ decrease a lot (since they no longer contain $x$).
$\Rightarrow$ That decrease in potential “cancels” real rotation cost.

Actual Math (requires more care):

The amortized cost of the operation: \[ 2+\Delta \Phi = 2+r'(x)+r'(y) + r'(z)-r(x)-r(y)-r(z). \]
Let us simplify things first: $$ \begin{align*} 2+r'(x)+r'(y) +& r'(z)-r(x)-r(y)-r(z)\\ &= 2+r'(y)+r'(z)-r(x)-r(y)& \text{ since $r'(x)=r(z)$}\\ &\le 2+r'(x)+r'(z)-r(x)-r(x)\\ &=2 +(r'(z)-r(x))+ (r'(x)-r(x)) \end{align*} $$
The last term is already a part of the bound we want to establish
So just need to show that $2+r'(z)-r(x) \le 2(r'(x)-r(x))$, i.e., that $r'(z)+r(x)-2r'(x) \le -2$.
Now, note that $r'(z)-r'(x) + r(x)-r'(x) = \lg s'(z)/s'(x) + \lg s(x)/s'(x)$
Also: $s'(z) + s(x) \le s'(x)$ since the subtree of $z$ after rotations and the subtree of $x$ before rotations are disjoint, and $x$ is the root at the end.
$\Rightarrow$ We have that \[ r'(z)+r(x)-2r'(x) =\lg s'(z)/s'(x) + \lg s(x)/s'(x)\le \log a + \log(1-a)=\log a(1-a) \] for some $0 < a < 1$.
$\Rightarrow$ This function is concave and thus maximized for $a=\frac{1}{2}$.
$\Rightarrow$ yields $r'(z)+r(x)-2r'(x) \leq -2$ as desired. (recall we are using $\log_2$)

Bounding Overall Change of Potential

Important Caveat: The Access Lemma bounds only the amortized cost of each operation.

$\Rightarrow$ If there is a sequence of $m$ operations, its total real cost is at most \[ O(m \log n) - \left(\Phi_m - \Phi_0\right)=O(m \log n) + \Phi_0 - \Phi_m, \] where $\Phi_m$ is the potential of our data structure at the end of processing the whole sequence and $\Phi_0$ is its initial potential.

Key question: How large can $\Phi_0-\Phi_m$ be?

Observe: $\Phi_0 \leq n\log W$ (corresponds to “everyone being the root of the whole tree”.)
Also, $\Phi_m \geq \sum_x \log w_x$ (corresponds to the fact that each node has itself in its subtree).
As a result: $\Phi_0-\Phi_m\leq n\log W - \sum_x w_x = \sum_x \log W/w_i$.

Note:

Even if we do only one operation we incur this large (but fixed) additive cost.
This additive cost has a natural interpretation though: it's the (upper bound on the) cost of splaying each item once. (Recall: after splaying $x$ is the root, so by Access Lemma the total amortized cost is $O(r'(x)-r(x))=O(\log W/w_i)$.)
$\Rightarrow$ Our amortized cost bound is valid if each item is splayed/requested at least once in the request sequence.

Also: Note that potential of empty tree is $0$.

$\Rightarrow$ If we start with empty tree, no problem (but for that we need inserts...)

Applications of Access Lemma

By applying the above analysis with different weights we can derive a lot of interesting facts about performance of splay trees. (Key fact to keep in mind: the behavior of the splay tree is completely independent of our choice of the weights!)

Balance theorem: Total cost for $m$ operations is $O((m+n)\log n)$ (as good as any balanced tree!)
- Set $w_x=1/n$, for each node $x$.
- Overall potential drop $\leq \sum_x \log W/w_x = \log n$.
- Amortized cost of $Search(x)$: $1+3\log n=O(\log n)$.
However, the performance guarantee that splay tree offer go way beyond the above mere “$O(\log n)$ time per operation” guarantee that “traditional” balanced BSTs give. Some important example below!
Static Optimality: Total cost for $m$ operations is as good as of any fixed tree. (Including trees that were designed while knowing what the request sequence will be!)
- If node $x$ accessed $p_i m$ times, set $w_x=p_x$.
- $W=1$
- Overall potential drop $O( \sum_x \log 1/p_x)$.
- Amortized cost of $Search(x)$: $3(1-\log p_x)+1 = O(1+\log 1/p_x)$.
- Total cost of the whole sequence of requests: \[ \sum_x p_x m \cdot O\left(1+\log 1/p_x\right) + O\left( \sum_x \log 1/p_x\right)=O\left(m \sum_x \log 1/p_x\right). \] Matches the lower bound for static access (\'a la Huffman coding)!
Static Finger Optimality: Total cost for $m$ operations is as good as of any fixed tree in which additionally we start our search from any fixed node $f$ (instead of the root).
- Set $w_x=1/(1+|x-f|)^2$, for each node $x$, where $|x-f|$ is the distance from $x$ to the finger $f$.
- $\sum w_i \le 2 \sum 1/k^2 = O(1)$.
- Overall potential drop $O(n\log n)$.
- Amortized cost of $Search(x)$: $O(\log |x-f|)$.
- In fact: Can also show that splay trees are optimal for dynamic finger too, i.e., when the next search is started from the previously found node (instead of the root). But the proof is very hard (and requires much more than just the Access Lemma).
Working Set/Cache Optimality:
- At access $j$ to node $x_j$, let $t_j$ be number of distinct nodes accessed since that node $x_j$ was previously accessed. Then time is: \[ O(n\log n+\sum \log t_j). \] Note: $O(\log t_j)$ is the time we would expect it takes to find element $x_j$ even if our tree contained only the $t_j$ nodes that were accessed since previous access to $x_j$.
Best-of-the-Above Optimality: Cost is at most as good as the best one of the choices stemming from the above optimality conditions.
Dynamic Optimality Conjecture: One of the most outstanding open problems in data structures is the Dynamic Optimality Conjecture. This conjecture states that the performance of splay tree matches (up to constant) the performance of the best dynamic BST. That is, performance of a BST that can perform rotations between requests (each rotation having a cost of $1$) and such that these rotations are chosen optimally with the full knowledge of the whole request sequence.
This conjecture is still open. The best result on this front is due to Demaine et al. (FOCS 2004) and presents a different (much more complicated) BSTs, called Tango trees, that have performance that is within $O(\log \log n)$ of that of the optimal dynamic BST.
Still, it is generally believe that Dynamic Optimality Conjecture is true and splay tree indeed is the ultimate BST construction. See a survey “In Pursuit of the Dynamic Optimality Conjecture” by John Iacono about this fascinating conjecture.

Update Operations

We have two remaining operations we need to implement: $Insert(x)$ and $Delete(x)$. There are two ways to implement them:

Insert/delete the item as in a standard BST and then just splay the inserted element/parent of deleted element.
- Splaying ensures that our work is “paid for” by the resulting potential change.
- But showing that this is the case requires a little bit of calculations.
Alternatively, consider the following operations:
- $Split(x)$ splits the BST into two BST, one containing all the elements smaller or equal to $x$ and one containing all the elements larger than $x$.
  $\Rightarrow$ Can be achieved by simply splaying $x$ and thus takes amortized $O(\log n)$ time (corresponds to taking all weights be equal to $1$ in the analysis).
- $Join$ takes two BSTs $S$ and $T$ such that all elements in $S$ are not larger than all elements in $T$ and creates a single BST contains elements of both $S$ and $T$.
  $\Rightarrow$ Can be achieved by splaying first the largest element $x$ of $S$ (note that then this element becomes a root with no right child) and adding the root of $T$ as the right child of $x$.
  $\Rightarrow$ Joining increases only the potential of the new root, and only by at most $O(\log n)$.
  $\Rightarrow$ Thus, again, the whole operation takes $O(\log n)$ amortized time (if we take all $w_x$ equal to $1$ in the analysis).
Note: For most balanced trees such $Split(x)/Join$ operations do not take only $O(\log n)$ time!
To implement $Insert(x)$:
- Perform $Split(x')$, where $x'$ is the largest element not larger than $x$. (Finding $x'$ can be easily amortized into the subsequent splay.)
- Create a new BSTs with $x$ as a root and the two BSTs obtained from splitting be its subtrees.
  $\Rightarrow$ Total potential increased by $O(\log n)$, i.e., the increase of potential of the root. (Remember: we are performing analysis with all weights being $1$ here.)
  $\Rightarrow$ Total amortized cost is $O(\log n)$.
To implement $Delete(x)$:
- Perform $Split(x)$
  $\Rightarrow$ $x$ will be the root of one of the resulting BSTs and will have no right child.
- Remove $x$ and perform $Join$ on the resulting two BSTs.
- $O(\log n)$ amortized cost, as desired.

Possible Heuristics

Top down splaying.
Choose to splay only when path is “long”. (Real cost too large so need to amortize). Drawback: must know weights.
Can choose to stop splaying after a while. Good for random access frequencies.