\documentclass[11pt]{article}
\usepackage{latexsym}
\usepackage{amsmath}
\usepackage{amssymb}
\usepackage{amsthm}
\usepackage{epsfig}
\usepackage{psfig}
\usepackage[all]{xy}
\newcommand{\handout}[5]{
\noindent
\begin{center}
\framebox{
\vbox{
\hbox to 5.78in { {\bf 6.851: Advanced Data Structures } \hfill #2 }
\vspace{4mm}
\hbox to 5.78in { {\Large \hfill #5 \hfill} }
\vspace{2mm}
\hbox to 5.78in { {\em #3 \hfill #4} }
}
}
\end{center}
\vspace*{4mm}
}
\newcommand{\lecture}[4]{\handout{#1}{#2}{#3}{Scribe: #4}{Lecture #1}}
\newtheorem{theorem}{Theorem}
\newtheorem{corollary}[theorem]{Corollary}
\newtheorem{lemma}[theorem]{Lemma}
\newtheorem{observation}[theorem]{Observation}
\newtheorem{proposition}[theorem]{Proposition}
\newtheorem{definition}[theorem]{Definition}
\newtheorem{claim}[theorem]{Claim}
\newtheorem{fact}[theorem]{Fact}
\newtheorem{assumption}[theorem]{Assumption}
% 1-inch margins, from fullpage.sty by H.Partl, Version 2, Dec. 15, 1988.
\topmargin 0pt
\advance \topmargin by -\headheight
\advance \topmargin by -\headsep
\textheight 8.9in
\oddsidemargin 0pt
\evensidemargin \oddsidemargin
\marginparwidth 0.5in
\textwidth 6.5in
\parindent 0in
\parskip 1.5ex
%\renewcommand{\baselinestretch}{1.25}
\renewcommand\emph[1]{{\bf #1}}
\begin{document}
\lecture{14 --- March 30, 2010}{Spring 2010}{Prof.\ Andr\'{e} Schulz}{Rishi Gupta}
\section{Overview}
In the last lecture we covered the Separator Decomposition, which rearranges any tree into a balanced tree, and the ART/leaf-trimming Decomposition, which is a tree decomposition used to solve the marked ancestor problem and decremental connectivity problem.
In this lecture we talk about solutions to the static and dynamic \emph{dictionary problem}. Our goal is to store a small set $S$ of $n$ keys drawn from a large universe $U$, with perhaps some information associated to each key. We want to find a compact data structure with low pre-processing time that will support Query($x$) in the static case and Query($x$), Insert($x$), and Delete($x$) in the dynamic case. Query($x$) checks if $x\in S$, and returns any information associated with $x$.
In particular, {\bf FKS hashing} achieves $O(1)$ worst-case query time with $O(n)$ expected space and takes $O(n)$ construction time for the static dictionary problem. {\bf Cuckoo Hashing} achieves $O(n)$ space, $O(1)$ worst-case query and deletion time, and $O(1)$ amortized insertion time for the dynamic dictionary problem.
\section{Hashing with Chaining}
Our first attempt at a dictionary data structure is {\it hashing with chaining}. We find a \emph{hash function} $h:U\rightarrow\{1,2,\dots,m\}$, and maintain a table $T$ with $m$ rows, where $T[j] = $ a linked list (or chain) of $\{x\in S : h(x) = j\}$.
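As a concrete illustration, hashing with chaining might be sketched in Python as follows. The class and method names are our own, and the toy modular hash stands in for a proper universal hash function:

```python
# A minimal sketch of hashing with chaining (names are illustrative).
class ChainedHashTable:
    def __init__(self, m, h):
        self.m = m                        # number of rows in the table T
        self.h = h                        # hash function h : U -> {0, ..., m-1}
        self.T = [[] for _ in range(m)]   # T[j] is the chain for row j

    def insert(self, x):
        if not self.query(x):
            self.T[self.h(x)].append(x)

    def query(self, x):
        # Walk the chain at row h(x); the cost is the chain's length.
        return x in self.T[self.h(x)]

# A toy modular hash standing in for a universal one.
t = ChainedHashTable(7, lambda x: x % 7)
for key in (10, 3, 17):
    t.insert(key)
```

Here all three keys land in the same row (each is $3 \bmod 7$), so queries for them walk a single chain of length three.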
Key lookup takes $\frac{\sum_j |T[j]|^2}{2\sum_j |T[j]|}$ expected time (the average position within a chain), so we want to make the chains as equal in length as possible. We also generally want $h$ to be easy to evaluate. Luckily, such hash functions exist\footnote{It's worth noting that a random function from $U$ to $\{1,\dots,m\}$ would actually have O(1) expected query time (provided $n=O(m)$). However, such a random function would be infeasibly inefficient to store and evaluate.}:
\subsection{Universal Hashing}
\def\H{\mathcal{H}}
\begin{definition}
A class of functions $\mathcal{H}$ is $c$-universal if and only if for all $x\ne y$, $$|\{h\in\H : h(x)=h(y)\}| \le c\cdot \frac{|\H|}{m}.$$
\end{definition}
In particular, if $h$ is picked uniformly at random from $\H$, then $Pr[h(x)=h(y)] \le \frac{c}{m}$.
We assume for the rest of this section that $m =O(n)$.
Define $Z_x := |\{ y\in S : h(x)=h(y)\}|$, namely, the collisions with $x$ in the set of elements we want to store. We can also write this as $Z_x = \sum_y \delta_{h(x),h(y)}$, where $\delta$ is the Kronecker delta ($\delta_{a,b} = 1$ if $a=b$ and $0$ otherwise). The average query time is simply $E[Z_x]$, and we have
$$ E[Z_x] = E\left[\sum_y \delta_{h(x),h(y)}\right] = \sum_y E[\delta_{h(x),h(y)}] = 1+ \sum_{y\ne x} Pr[h(x)=h(y)] \le 1 + \frac{nc}{m} = O(1).$$
All that remains then is to find a class of such $c$-universal functions. An example of a 2-universal class is
$$ \H = \{h_a(x) = (ax\ (\text{mod }p))\ (\text{mod }m) \}_{0<a<p}, $$ where $p$ is a prime with $p>m$.
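A hypothetical Python instantiation of this modular family follows; the Mersenne prime $p = 2^{31}-1$ is an illustrative choice, assumed larger than $m$ and than the keys:

```python
import random

P = 2**31 - 1  # a prime p > m (illustrative choice; keys assumed < p)

def make_hash(m, a=None):
    """Draw a member h_a(x) = (a*x mod p) mod m of the family, 0 < a < p."""
    if a is None:
        a = random.randrange(1, P)
    return lambda x: ((a * x) % P) % m

h = make_hash(100)   # one uniformly random member of the family
```

Picking `a` at random corresponds to drawing $h$ uniformly from $\H$, which is what the collision bound above requires.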
The drawback here is that the worst case (i.e.\ the longest chain) is $\Theta(\log n /\log\log n)$ \cite{gonnet}. We can actually do a bit better by using $d\ge 2$ hash functions and inserting each item into the shortest of the $d$ candidate lists, after which the worst-case lookup time becomes $\Theta(\log\log n/\log d)$ \cite{mitzi}.
\subsection{Perfect Hashing}
If we don't care about linear space, we can build a hash function with {\it no} collisions in the static case.
Let
$$Z(h) = \sum_{\substack{x,y\in S \\ x<y}} \delta_{h(x),h(y)}$$
be the number of colliding pairs of keys under $h$. If $h$ is drawn from a $c$-universal class, then
$$E[Z(h)] = \sum_{\substack{x,y\in S \\ x<y}} Pr[h(x)=h(y)] \le \binom{n}{2}\frac{c}{m} < \frac{cn^2}{2m}.$$
Taking $m = cn^2$ gives $E[Z(h)] < 1/2$, so by Markov's inequality $Pr[Z(h)=0] > 1/2$. This means that we can pick a collision-free $h$ in O(1) trials.
This is great, except of course that $m = cn^2$ is a lot of space.
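The resampling procedure can be sketched as follows, reusing the modular family from before (the prime and the factor of 2 in the table size are illustrative constants):

```python
import random

P = 2**31 - 1  # illustrative prime, assumed larger than all keys

def perfect_hash(S):
    """Resample from the modular family until S has no collisions.
    With m proportional to n**2, the expected number of resampling
    rounds is O(1)."""
    n = len(S)
    m = 2 * n * n
    while True:
        a = random.randrange(1, P)
        h = lambda x, a=a: ((a * x) % P) % m
        if len({h(x) for x in S}) == n:   # collision-free on S?
            return h, m

S = [3, 1, 4, 15, 9]
h, m = perfect_hash(S)
```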
\section{FKS -- Fredman, Koml\'{o}s, Szemer\'{e}di (1984) \cite{fks}}
{\it FKS hashing} is a two-layered hashing solution to the static dictionary problem that achieves $O(1)$ worst-case query time in $O(n)$ expected space, and takes $O(n)$ time to build.
The main idea is to hash to a small table $T$ with collisions, and have every cell $T_i$ of $T$ be a collision-free hash table on the elements that map to $T_i$.
If $S$ is the original set, we let $S = S_1 \dot{\cup} S_2 \dot{\cup} \cdots \dot{\cup} S_m$, where $S_i = \{ x : h(x)=i \}$ (note that $\dot{\cup}$ denotes disjoint union). Using perfect hashing, we can find a collision-free hash function $h_i$ from $S_i$ into a table of size $O(|S_i|^2)$ in a constant expected number of trials. To answer a query, we compute $h(x) = i$ and then look up $h_i(x)$.
The expected size of the data structure is
$$ E\left(\sum |S_i|^2\right) = E\left(\sum |S_i|\right) + 2\cdot E\left(\sum\binom{|S_i|}{2}\right) = n + 2\cdot E[Z(h)] = n + O\left(\frac{n^2}{m}\right).$$
If we let $m=O(n)$, we have expected space $O(n)$ as desired, and since the creation of each $T_i$ takes constant time, the total construction time is $O(n)$.
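The two-level construction can be sketched as follows. The helper names are ours, and the modular family again stands in for a genuinely universal one; a full implementation would also resample the top-level $h$ if $\sum |S_i|^2$ came out too large, which this sketch omits:

```python
import random

P = 2**31 - 1  # illustrative prime, assumed larger than all keys

def _rand_hash(m):
    a = random.randrange(1, P)
    return lambda x, a=a: ((a * x) % P) % m

def fks_build(S):
    """Two-level FKS sketch: a top-level table with m = n rows that may
    collide, then a collision-free table of size ~2|S_i|^2 per row."""
    n = len(S)
    h = _rand_hash(n)
    buckets = [[] for _ in range(n)]
    for x in S:
        buckets[h(x)].append(x)
    tables = []
    for Si in buckets:
        mi = max(1, 2 * len(Si) ** 2)
        while True:                        # expected O(1) retries per row
            hi = _rand_hash(mi)
            Ti = [None] * mi
            for x in Si:
                if Ti[hi(x)] is not None:
                    break                  # collision: resample hi
                Ti[hi(x)] = x
            else:
                break                      # this row is collision-free
        tables.append((hi, Ti))
    return h, tables

def fks_query(h, tables, x):
    hi, Ti = tables[h(x)]
    return Ti[hi(x)] == x

h, tables = fks_build([2, 7, 18, 28])
```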
\section{More Universal Hashing}
\begin{definition}
A class $\H$ of hash functions is \emph{$(c,k)$-universal} if for all distinct $x_1,x_2,\dots,x_k$ and for all $a_1,a_2,\dots,a_k$,
$$Pr(h(x_1)=a_1 \wedge h(x_2)=a_2 \wedge \cdots \wedge h(x_k)=a_k) \le \frac{c}{m^k},$$
where the probability is taken over a uniformly random choice of $h\in \H$.
\end{definition}
Note that $c$-universal is the same as $(c,1)$-universal.
Siegel (1989) \cite{siegel} showed that a $(c,\log n)$-universal class of hash functions exists, where each function has $O(1)$ evaluation time and $O(\log n)$ storage (i.e.\ each hash function can be described in $O(\log n)$ bits), which we will need in the next section.
\section{Cuckoo Hashing -- Pagh and Rodler (2001) \cite{cuckoo}}
{\it Cuckoo hashing} is inspired by the Cuckoo bird, which lays its eggs in other birds' nests, bumping out the eggs that are originally there.
Cuckoo hashing solves the dynamic dictionary problem, achieving $O(1)$ worst-case time for queries and deletes, and $O(1)$ expected time for inserts.
Let $f$ and $g$ be $(c,6\log n)$-universal hash functions. As usual, $f$ and $g$ map to a table $T$ with $m$ rows. We implement the operations as follows:
\begin{itemize}
\item {\em Query(x)} -- Check $T[f(x)]$ and $T[g(x)]$ for $x$.
\item {\em Delete(x)} -- Query $x$ and delete if found.
\item {\em Insert(x)} -- If $T[f(x)]$ is empty, we put $x$ in $T[f(x)]$ and are done.
Otherwise say $y$ is originally in $T[f(x)]$. We put $x$ in $T[f(x)]$ as before, and bump $y$ to whichever of $T[f(y)]$ and $T[g(y)]$ it didn't just get bumped from. If that new location is empty, we are done. Otherwise, we place $y$ there anyway and repeat the process, moving the newly bumped element $z$ to whichever of $T[f(z)]$ and $T[g(z)]$ doesn't now contain $y$.
We continue in this manner until we're either done or reach a hard cap of bumping $6\log n$ elements. Once we've bumped $6\log n$ elements we pick a new pair of hash functions $f$ and $g$ and rehash every element in the table.
\end{itemize}
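The operations above can be sketched in Python as follows. The modular hash functions are illustrative stand-ins for the $(c,6\log n)$-universal functions the analysis assumes, and the class name is our own:

```python
import math, random

P = 2**31 - 1  # illustrative prime; stands in for truly universal f, g

class CuckooTable:
    """Sketch of cuckoo hashing: every x lives at T[f(x)] or T[g(x)]."""
    def __init__(self, m):
        self.m = m                 # the notes take m = 4n
        self.T = [None] * m
        self._new_hashes()

    def _new_hashes(self):
        a = random.randrange(1, P)
        b = random.randrange(1, P)
        self.f = lambda x, a=a: ((a * x) % P) % self.m
        self.g = lambda x, b=b: ((b * x) % P) % self.m

    def query(self, x):
        return self.T[self.f(x)] == x or self.T[self.g(x)] == x

    def delete(self, x):
        for pos in (self.f(x), self.g(x)):
            if self.T[pos] == x:
                self.T[pos] = None

    def insert(self, x):
        if self.query(x):
            return
        cap = 6 * max(1, math.ceil(math.log2(self.m)))  # bump cap ~ 6 log n
        pos = self.f(x)
        for _ in range(cap):
            if self.T[pos] is None:
                self.T[pos] = x
                return
            x, self.T[pos] = self.T[pos], x   # bump the occupant
            # send the bumped element to its *other* candidate slot
            pos = self.g(x) if pos == self.f(x) else self.f(x)
        # Hit the bump cap: pick fresh hash functions and rehash everything.
        items = [y for y in self.T if y is not None] + [x]
        self.T = [None] * self.m
        self._new_hashes()
        for y in items:
            self.insert(y)

t = CuckooTable(16)
for key in (5, 9, 23, 42):
    t.insert(key)
```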
Note that at all times we maintain the invariant that each element $x$ is either at $T[f(x)]$ or $T[g(x)]$, which makes it easy to show correctness. The time analysis is harder.
It is clear that query and delete are $O(1)$ operations. The reason Insert($x$) is not horribly slow is that the number of items that get bumped is generally very small, and we rehash the entire table very rarely when $m$ is large enough. We take $m=4n$.
Since we only ever look at at most $6\log n$ elements, we can treat $f$ and $g$ as random functions. Let $x=x_1$ be the inserted element, and $x_2,x_3,\dots$ be the sequence of bumped elements in order. It is convenient to visualize the process on the {\it cuckoo graph}, which has vertices $1,2,\dots,m$ and an edge $(f(x),g(x))$ for each $x\in S$. Inserting a new element can then be visualized as a walk on this graph. There are 3 patterns in which the elements can be bumped.
\begin{itemize}
\item {\em Case 1} Items $x_1,x_2,\dots,x_k$ are all distinct. The bump pattern looks something like\footnote{Diagrams courtesy of Pramook Khungurn, Lec 1 scribe notes from when the class was taught (as 6.897) in 2005}
\begin{figure}[h]
\begin{displaymath}
\xymatrix{
\ar@/^/[dr]^{x_1} & & & & & & & & \\
& \bullet \ar[r]^{x_2} & \bullet \ar[r]^{x_3} & \bullet \ar[r]^{x_4} & \bullet \ar[r]^{x_5} & \bullet \ar[r]^{x_6} & \bullet \ar[r]^{x_7} & \bullet\\
}
\end{displaymath}
\end{figure}
The probability that at least one item (i.e.\ $x_2$) gets bumped is
$$ Pr(T[f(x)] \text{ is occupied}) = Pr(\exists\, y : f(x)=g(y) \vee f(x) = f(y)) < \frac{2n}{m} = \frac{1}{2}.$$
The probability that at least 2 items get bumped is the probability the first item gets bumped ($<1/2$, from above) times the probability the second item gets bumped (also $<1/2$, by the same logic). By induction, we can show that the probability that at least $t$ elements get bumped is $<2^{-t}$, so the expected running time ignoring rehashing is $< \sum_t t2^{-t} = O(1)$. The probability of a full rehash in this case is $< 2^{-6\log n} = O(n^{-6})$.
\item {\em Case 2} The sequence $x_1,x_2,\dots, x_k$ at some point bumps an element $x_i$ that has already been bumped, and $x_i,x_{i-1},\dots,x_1$ get bumped in turn after which the sequence of bumping continues as in the diagram below. In this case we assume that after $x_1$ gets bumped all the bumped elements are new and distinct.
\begin{figure}[h]
\begin{displaymath}
\xymatrix{
\ar@/^/[dr]^{x_1} & & & & \bullet \ar[d]_{x_8} & \bullet \ar[l]_{x_7}\\
& \bullet \ar[r]^{x_2} \ar@/_1pc/@{..>}[dd]_{x_1} & \bullet
\ar[r]^{x_3} \ar@/^1pc/@{-->}[l]^{x_2} & \bullet \ar[r]^{x_4}
\ar@/^1pc/@{-->}[l]^{x_3} & \bullet \ar[r]^{x_5}
\ar@/^1pc/@{-->}[l]^{x_4}
& \bullet \ar[u]_{x_6}\\
& & & & &\\
& \bullet \ar@{..>}[r]^{x_9} & \bullet \ar@{..>}[r]^{x_{10}} & \bullet \ar@{..>}[r]^{x_{11}} & \bullet \ar@{..>}[r]^{x_{12}} & \bullet \\
}
\end{displaymath}
\end{figure}
The length of the sequence ($k$) is at most 3 times $\max\{\# \text{solid arrows}, \# \text{dashed arrows}\}$ in the diagram above, which is expected to be $O(1)$ by Case 1. Similarly, the probability of a full rehash is $O(2^{\frac{-6\log n}{3}}) = O(n^{-2}).$
\item {\em Case 3} Same as Case 2, except that the dotted lines again bump something that has been bumped before (diagram on next page).
\begin{figure}[h]
\begin{displaymath}
\xymatrix{
\ar@/^/[dr]^{x_1} & & & & \bullet \ar[d]_{x_8} & \bullet \ar[l]_{x_7}\\
& \bullet \ar[r]^{x_2} \ar@/_1pc/@{..>}[dd]_{x_1} & \bullet
\ar[r]^{x_3} \ar@/^1pc/@{-->}[l]^{x_2} & \bullet \ar[r]^{x_4}
\ar@/^1pc/@{-->}[l]^{x_3} & \bullet \ar[r]^{x_5}
\ar@/^1pc/@{-->}[l]^{x_4}
& \bullet \ar[u]_{x_6}\\
& & & & & \\
& \bullet \ar@{..>}[r]^{x_9} & \bullet \ar@{..>}[r]^{x_{10}} & \bullet \ar@{..>}[d]^{x_{11}} & & \\
& & \bullet \ar@{..>}[u]^{x_{13}} & \bullet \ar@{..>}[l]^{x_{12}} & & \\
}
\end{displaymath}
\end{figure}
In this case, the cost is $O(\log n)$ bumps plus the cost of a rehash. We compute the probability that Case 3 happens via a counting argument. The number of Case 3 configurations involving $t$ distinct $x_i$ given some $x_1$ is ($\le n^{t-1}$ choices for the other $x_i)\cdot(