\documentclass[11pt]{article}
\usepackage{latexsym}
\usepackage{amsmath}
\usepackage{amssymb}
\usepackage{amsthm}
\usepackage{epsfig}

\newcommand{\eps}{\varepsilon}


\newcommand{\handout}[5]{
  \noindent
  \begin{center}
  \framebox{
    \vbox{
      \hbox to 5.78in { {\bf 6.897: Advanced Data Structures } \hfill #2 }
      \vspace{4mm}
      \hbox to 5.78in { {\Large \hfill #5  \hfill} }
      \vspace{2mm}
      \hbox to 5.78in { {\em #3 \hfill #4} }
    }
  }
  \end{center}
  \vspace*{4mm}
}

\newcommand{\lecture}[4]{\handout{#1}{#2}{#3}{Scribe: #4}{Lecture #1}}

\newtheorem{theorem}{Theorem}
\newtheorem{corollary}[theorem]{Corollary}
\newtheorem{lemma}[theorem]{Lemma}
\newtheorem{observation}[theorem]{Observation}
\newtheorem{proposition}[theorem]{Proposition}
\newtheorem{definition}[theorem]{Definition}
\newtheorem{claim}[theorem]{Claim}
\newtheorem{fact}[theorem]{Fact}
\newtheorem{assumption}[theorem]{Assumption}

% 1-inch margins, from fullpage.sty by H.Partl, Version 2, Dec. 15, 1988.
\topmargin 0pt
\advance \topmargin by -\headheight
\advance \topmargin by -\headsep
\textheight 8.9in
\oddsidemargin 0pt
\evensidemargin \oddsidemargin
\marginparwidth 0.5in
\textwidth 6.5in

\parindent 0in
\parskip 1.5ex
%\renewcommand{\baselinestretch}{1.25}

\begin{document}

\lecture{13 --- March 17, 2005}{Spring 2005}
	{Prof.\ Erik Demaine}{Brian Jacokes}

\section{Overview}

The sorting problem, in which we wish to sort $n$ elements according
to a given ordering, has a tight $\Theta(n\lg{n})$ bound in the
comparison model.  In the integer sorting problem, the elements are
$w$-bit integers.  Under this assumption, and working in the word
RAM model, several improved results are known for sorting:

\begin{itemize}
\item Counting sort: $O(n+u)=O(n+2^w)$

\item Radix sort: $O(n \cdot \frac{w}{\lg{n}})$

\item van Emde Boas: $O(n\lg{w})$. For $w=\lg^{O(1)}n$, this is
  $O(n\lg\lg{n})$.

\item Andersson, Hagerup, Nilsson, and Raman \cite{ahnr}: $O(n)$ for
  $w=\Omega(\lg^{2+\eps}n)$. Combined with the previous result for
  small $w$, this gives sorting in $O(n\lg\lg n)$ time.

\item Kirkpatrick and Reisch \cite{kr}: $O(n \lg
  \frac{w}{\lg{n}})$. This is $o(n\lg\lg{n})$ for $w =
  \lg^{1+o(1)}n$. You are asked to prove this result in problem 7.

\item Han \cite{h}: $O(n\lg\lg{n})$ deterministic and on the AC$^0$ RAM

\item Han and Thorup \cite{ht}: $O(n\sqrt{\lg\lg{n}})$
  randomized. Actually, one can achieve $O(n \sqrt{\lg\frac{w}{\lg
  n}})$, improving the result of \cite{kr}.
\end{itemize}

We will prove the result in \cite{ahnr}.  Combining this result with
van Emde Boas gives an $O(n\lg\lg{n})$ upper bound for all values of
$w$.  It is also worth noting that the hardness of integer sorting is
concentrated in a narrow interval for $w$, between $\lg^{1+\eps}
n$ and $\lg^2 n$. At the ends of the interval, there is a relatively
quick fall-off in the running time until it becomes linear.
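
As a quick check of the $O(n\lg\lg n)$ bound for all $w$, one can
split on the word size. This is only a sketch: it takes the threshold
$\lg^{2+\eps} n$ from \cite{ahnr} as stated above and ignores constant
factors in the threshold.
\[
T(n,w) =
\begin{cases}
O(n\lg w) = O\big(n\lg(\lg^{2+\eps} n)\big) = O(n\lg\lg n)
  & \text{if } w < \lg^{2+\eps} n,\\
O(n) & \text{if } w \ge \lg^{2+\eps} n.
\end{cases}
\]
The first case uses van Emde Boas and the second uses signature sort;
since $\eps$ is a constant, both cases are $O(n\lg\lg n)$.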


\section{Signature Sort}

The \emph{signature sort} in \cite{ahnr} allows us to sort in $O(n)$
expected time for $w \ge (\lg^{2+\eps}n) \lg\lg{n}$, which is at most
$\lg^{2+\eps'} n$ for any constant $\eps' > \eps$ and sufficiently
large $n$.  We break each integer into $\lg^{\eps}n$ equal-sized
chunks and encode each chunk in $O(\lg{n})$ bits with a universal
hash function.  Concatenating the hash codes of an integer's chunks
gives its \emph{signature}, so we obtain $n$ signatures of
$b=O(\lg^{1+\eps}n)$ bits each.  With high probability, distinct
chunks receive distinct hash codes.
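
The parameters can be checked with a short calculation. This is only a
sketch: the hash range $n^{c}$ (that is, $c\lg n$ bits per hash code,
for a constant $c$) and the collision probability of at most $1/n^{c}$
per pair are assumptions made here for illustration, standing in for
the exact universal family used in \cite{ahnr}. Each signature
consists of $\lg^{\eps} n$ hash codes, so
\[
b = \lg^{\eps} n \cdot c\lg n = O(\lg^{1+\eps} n).
\]
There are at most $N = n\lg^{\eps} n$ distinct chunks in total, so by
a union bound over pairs,
\[
\Pr[\text{some two distinct chunks collide}]
  \le \binom{N}{2} \cdot \frac{1}{n^{c}}
  \le \frac{n^{2}\lg^{2\eps} n}{2n^{c}},
\]
which tends to $0$ for any constant $c \ge 3$.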

The general strategy of the algorithm is as follows:

\begin{itemize}
\item sort the signatures in linear time. This is possible because
  they are significantly smaller than a word. We develop \emph{packed
  sorting}, which takes $O(n)$ time to sort $n$ integers of $b$ bits
  each, given a word size of $w=\Omega(b\lg{n}\lg\lg{n})$.

\item build a compressed trie over the signatures. This can also be
  done in linear time.

\item recursively sort the original letters (the chunks themselves,
  not their hash codes) of the compressed trie. We are recursing on
  $O(n)$ letters, which have $\lg^{\eps} n$ times fewer bits than the
  original integers. After $O(\frac{1}{\eps})$ levels of recursion, we
  can use packed sorting to solve the problem in linear time.

\item find the order of the original integers based on the order of
  the letters of the compressed trie.
\end{itemize}

We defer packed sorting to Section~\ref{sec:packed}, since it is the
most technical component of the algorithm. In Section~\ref{sec:trie},
we describe how to build the compressed trie based on the list of
sorted signatures. In Section~\ref{sec:reorder}, we describe how to
reconstruct the original sorted order, based on the order of the
letters and the compressed trie.



\subsection{Constructing a Compressed Trie} \label{sec:trie}

We would like to build a tree out of these signatures in a manner
similar to van Emde Boas.  Each edge represents a chunk value, so the
tree has height $\lg^{\eps}n$, and the $n$ integers corresponding to
the signatures are stored in the leaves. Such a simple trie has $O(n
\lg^\eps n)$ complexity (used edges), so we cannot build it in linear
time.  Instead, we will only keep track of branching nodes, and we
compress nonbranching paths; see Figure~\ref{fig:trie}. This is called
a compressed trie; because there are $n$ leaves, it has at most $n-1$
internal nodes and $2n-2$ edges. To navigate the compressed trie, we
store for each node its \emph{effective depth} in the fully expanded
trie, and store the value of the first chunk along each compressed
edge.
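
The size bound follows from a standard counting argument, sketched
here under the assumption that every internal node, including the
root, is branching. Let $I$ be the number of internal nodes and $E$
the number of edges (both symbols are introduced here for
illustration). A tree with $n+I$ nodes has $n+I-1$ edges, and every
internal node has at least two child edges, so
\[
E = n + I - 1 \quad\text{and}\quad E \ge 2I,
\]
which gives $I \le n-1$ and therefore $E \le 2n-2$.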

\begin{figure}
  \centering
  \scalebox{.7}{\includegraphics{diagram1.eps}}

\caption{\textbf{(a)} A tree of signatures.  Each edge represents the
  hash value of a chunk, and the leaves are possible signature
  values. \textbf{(b)} A compressed tree of signatures.  Non-branching
  nodes are discarded, as are chunk values which leave non-branching
  nodes.}
\label{fig:trie}
\end{figure}


To build this tree in $O(n)$ time, we use an idea similar to the
Cartesian tree construction algorithm of Gabow, Bentley and Tarjan
\cite{gbt}. We do not describe the Cartesian tree here, but
reformulate the algorithm in the context of our problem.

We build the compressed trie by inserting signatures in sorted order.
We insert the first signature as a single edge below the root. To
insert signature $i+1$, we take its XOR with signature $i$ and find
the most significant set bit in the result; the chunk containing this
bit is the first chunk in which the two signatures differ, so it
determines the depth of their branching node. Starting from the leaf
of signature $i$ in the tree, we walk upward until we reach that
depth, which may require breaking an edge to create the branching
node; refer to Figure~\ref{fig:trieins}. We then insert a new edge
containing the differing suffix of signature $i+1$. The length of the
walk to the branching node is within 1 of the decrease in the length
of the rightmost path in the tree. On the other hand, inserting a
suffix can only increase the rightmost path by one. Thus, we can
charge the length of the walk minus 1 to the decrease in the length of
the rightmost path, and we obtain a total time bound of $O(n)$.
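
The charging argument can also be written as a potential-function
calculation; the symbols $\Phi_i$ and $\ell_i$ are notation introduced
here for illustration. Let $\Phi_i$ be the number of nodes on the
rightmost path after signature $i$ has been inserted, and let
$\ell_{i+1}$ be the length of the upward walk performed when inserting
signature $i+1$. The two observations above say that
\[
\Phi_{i+1} \le \Phi_i - (\ell_{i+1} - 1) + 1 .
\]
Rearranging and summing over all insertions, the potentials telescope:
\[
\sum_{i=1}^{n-1} \ell_{i+1}
  \le \sum_{i=1}^{n-1} \big(\Phi_i - \Phi_{i+1} + 2\big)
  = \Phi_1 - \Phi_n + 2(n-1) \le 2n,
\]
using $\Phi_1 \le 2$ and $\Phi_n \ge 0$. Each insertion does $O(1)$
work besides the walk, so the whole construction takes $O(n)$ time.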

\begin{figure}
  \centering
  \scalebox{.7}{\includegraphics{diagram2.eps}}

\caption{Inserting a new signature X.  \textbf{(a)} If there is
  already a branching node at the longest common prefix of X and its
  predecessor, X is simply added as one of its children. \textbf{(b)}
  A new node is added along an edge if a branching node does not exist
  at the correct position.}
\label{fig:trieins}
\end{figure}



\subsection{Reordering Children} \label{sec:reorder}

We must now recover the sorted integers based on the trie of
signatures. Because we hashed the integers chunk by chunk, integers
with common prefixes also have signatures with common prefixes, so
(assuming no hash collisions) the trie has the correct shape. Only the
order of the children at each node may be wrong, since hashing does
not preserve order. Thus, we only need to reorder the children of each
node by the order of the original letters. Then, an inorder traversal
of the tree gives the integers in sorted order.

There are $2n-2=O(n)$ edges in the tree, each with a chunk of
$\frac{w}{\lg^\eps n}$ bits.  We attach to each chunk $O(\lg n)$ bits
of auxiliary data: the index of the parent node, and the index
representing the edge itself. This small auxiliary data can be carried
around without difficulty (e.g.~it can be appended to the chunk, and
considered a part of the number). We can now recursively sort these
$O(n)$ chunks. The recursion bottoms out when the chunks contain
$O(\frac{w}{\lg n \lg\lg n})$ bits, at which point packed sorting
applies. This happens after at most $\frac{1}{\eps}+1$ levels of
recursion, each of which takes $O(n)$ time, so the total bound remains
$O(n)$ because $\eps$ is a constant independent of $n$.
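
To see where the bound on the recursion depth comes from, here is a
short calculation. It is a sketch that ignores constant factors and
the $O(\lg n)$ auxiliary bits added at each level. After $j$ levels of
recursion the chunks have $w / \lg^{j\eps} n$ bits, so packed sorting
applies as soon as
\[
\frac{w}{\lg^{j\eps} n} \le \frac{w}{\lg n \lg\lg n}
\quad\Longleftrightarrow\quad
\lg^{j\eps} n \ge \lg n \lg\lg n .
\]
For $j = \frac{1}{\eps} + 1$ we have $\lg^{j\eps} n = \lg^{1+\eps} n =
\lg n \cdot \lg^{\eps} n$, and $\lg^{\eps} n \ge \lg\lg n$ for
sufficiently large $n$, so the condition holds.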

We now use a stable radix sort to sort the chunks by their parent node
index, so that all of the information about a node's children is
contiguous in the list of chunks.  Because the radix sort is stable,
the children of each node remain sorted by chunk value.  A simple scan
of the node and edge indices now allows us to reorder the children of
each node by chunk value in $O(n)$ time.  An inorder traversal of the
resulting tree gives us the ordering of the leaves.



\section{Packed Sorting} \label{sec:packed}

Packed sorting, due to Albers and Hagerup~\cite{ah}, can sort $n$
integers of $b$ bits in $O(n)$ time, given a word size of $w \ge
2(b+1) \lg{n}\lg\lg{n}$. We can therefore pack $\lg{n}\lg\lg{n}$
elements into the low half of one word in memory. Each integer is
preceded by one zero bit, so it occupies a field of $b+1$ bits, and
the $w/2$ bits in the high half of the word are left zero; see
Figure~\ref{fig:pack}.

\begin{figure}
  \centering
  \scalebox{.7}{\includegraphics{diagram3.eps}}
\caption{Packing $b$-bit integers into a $w$-bit word.}
\label{fig:pack}
\end{figure}
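
As a small illustration of the layout (the values $b=3$ and $k=4$ and
the elements themselves are chosen here purely as a toy example), the
elements $5, 2, 7, 1$ are packed into fields of $b+1 = 4$ bits, each
with a leading zero bit:
\[
\underbrace{\mathtt{0\,101}}_{5}\ \
\underbrace{\mathtt{0\,010}}_{2}\ \
\underbrace{\mathtt{0\,111}}_{7}\ \
\underbrace{\mathtt{0\,001}}_{1} .
\]
These $k(b+1) = 16$ bits occupy the low half of the word, so a word of
$w \ge 2k(b+1) = 32$ bits suffices for this toy instance.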


We use an adapted version of mergesort to sort the elements. We have
four main operations that allow us to do this:

\begin{enumerate}
\item Merge a pair of sorted words, each with $k \le \lg{n}\lg\lg{n}$
  elements, into one sorted word with $2k$ elements.  In
  Section~\ref{sec:merge}, we show how to do this in $O(\lg{k})$ time.

\item Merge sort $k \le \lg{n}\lg\lg{n}$ elements, yielding a packed
  word with elements in order. Using (1) for the merge operation, this
  takes time $T(k)=2T(\frac{k}{2})+O(\lg{k})$.  Using the master
  theorem or drawing the recursion tree shows the leaves dominate the
  running time, so $T(k) = O(k)$.

\item Merge two sorted lists of $r$ words, each word containing
  $k=\lg{n}\lg\lg{n}$ sorted elements, into one sorted list of $2r$
  sorted words.  We do this by removing the first word of each list
  and merging them using (1). The first half of the resulting word can
  be output, since its $k$ elements are necessarily the smallest of
  all those remaining. We then mask the second half of the word, which
  contains the larger $k$ elements. This word is placed at the
  beginning of the list which formerly contained the maximum element
  in the word, maintaining the sortedness of the lists. We take $O(\lg
  k)$ time to output a word, so the merge operation takes total time
  $O(r\lg{k})$.

\item Merge sort with (3) as the merge operation and (2) as the base
  case, yielding a recurrence of $T(n) = 2T(\frac{n}{2}) +
  O(\frac{n}{k} \lg{k})$, where $k=\lg n\lg\lg n$. There are $\lg
  \frac{n}{k} = O(\lg n)$ internal levels in the recursion tree, each
  taking total time $O(\frac{n}{k}\lg{k}) = O(\frac{n}{\lg n})$. So
  the internal levels contribute a total cost of $O(n)$. The
  $\frac{n}{k}$ leaves each take $O(k)$ time, so the total cost of the
  leaves is also $O(n)$.
\end{enumerate}
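
Both recurrences can be unrolled explicitly. In the sketch below, the
constant $c$ is a placeholder introduced here for the constant hidden
in the $O(\lg k)$ cost of merging two words. For operation (2), level
$i$ of the recursion tree has $2^i$ subproblems of size $k/2^i$, so
the internal levels cost
\[
\sum_{i=0}^{\lg k - 1} 2^i \cdot c \lg\frac{k}{2^i}
  = c \sum_{j=1}^{\lg k} \frac{k}{2^j} \cdot j
  \le c\,k \sum_{j \ge 1} \frac{j}{2^j} = 2ck ,
\]
where we substituted $j = \lg k - i$; adding $O(1)$ for each of the
$k$ leaves gives $T(k) = O(k)$. For operation (4), with
$k = \lg n\lg\lg n$ we have $\lg k = \lg\lg n + \lg\lg\lg n \le
2\lg\lg n$, so the internal levels cost at most
\[
\lg\frac{n}{k} \cdot c\,\frac{n}{k}\lg k
  \le \lg n \cdot c\,n \cdot \frac{2\lg\lg n}{\lg n\lg\lg n} = 2cn .
\]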


\subsection{Merging Words} \label{sec:merge}

We use bitonic sorting networks and bit tricks to merge two words
together.  A \emph{bitonic sequence} is one for which a cyclic shift
will result in a sequence which increases monotonically and then
decreases monotonically; see Figure~\ref{fig:bitonic}.  A bitonic
sequence can be sorted by putting all pairs $A[i]$ and
$A[i+\frac{n}{2}]$ in the correct order for
$i=1,2,\ldots,\frac{n}{2}$, and then recursively sorting the first and
second halves of the data. Each step uses $\frac{n}{2}$ comparisons
and potential swaps, and the recursion has depth $O(\lg n)$. A proof
of correctness can be found in \cite[Chapter 27]{clrs}.

\begin{figure}
  \centering
  \includegraphics{diagram4.eps}
\caption{\textbf{(a)} A bitonic sequence. \textbf{(b)} A cyclic shift
  of (a) is also a bitonic sequence.}
\label{fig:bitonic}
\end{figure}
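
As a worked example (the input values are chosen here for
illustration), take the bitonic sequence obtained from the sorted
lists $1, 4, 6, 7$ and $2, 3, 5, 8$ by reversing the second list and
appending it to the first. Each row below shows the sequence after one
round of compare-and-swap operations, with a bar separating the blocks
that are processed independently from then on:
\[
\begin{array}{ll}
\text{input (bitonic)} & 1\ \ 4\ \ 6\ \ 7\ \ 8\ \ 5\ \ 3\ \ 2\\
\text{compare at distance } 4 & 1\ \ 4\ \ 3\ \ 2 \ \mid\ 8\ \ 5\ \ 6\ \ 7\\
\text{compare at distance } 2 & 1\ \ 2 \ \mid\ 3\ \ 4 \ \mid\ 6\ \ 5 \ \mid\ 8\ \ 7\\
\text{compare at distance } 1 & 1\ \ 2\ \ 3\ \ 4\ \ 5\ \ 6\ \ 7\ \ 8
\end{array}
\]
After the first round every element of the left block is at most every
element of the right block, and both blocks are again bitonic, which
is what allows the recursion to proceed on the two halves separately.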

We can use a bitonic sort to merge two sorted words of $k$ elements
each.  We first reverse the second word and then concatenate the two
words, leaving a bitonic sequence.  Recall that each element occupies
a field of $b+1$ bits.  Reversing a word can be done by masking out
the leftmost $\frac{k}{2}$ elements and shifting them right by
$\frac{k}{2}(b+1)$ bits, and similarly masking out the rightmost
$\frac{k}{2}$ elements and shifting them left by $\frac{k}{2}(b+1)$
bits.  Taking the OR of the two resulting words gives a word with the
left and right halves of the original word swapped.  We now
recursively reverse the left and right halves of the word \emph{in
parallel}, so that each level of recursion takes $O(1)$ time.  After
$\lg{k}$ levels of recursion we reach the base case where there is
only one element to be reversed, so the total time to reverse a word
of $k$ elements is $O(\lg{k})$.  The two words may now be concatenated
by shifting the first word left by $k(b+1)$ bits and taking its OR
with the second word. See Figure~\ref{fig:reverse}.

\begin{figure}
  \centering
  \scalebox{.7}{\includegraphics{diagram5.eps}}

\caption{\textbf{(a)} The first recursion in reversing a
  list. \textbf{(b)} Next, the two halves are each divided into left
  and right halves which are swapped and recursively reversed.}
\label{fig:reverse}
\end{figure}
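
For a word of $k=4$ elements $a\,b\,c\,d$, the reversal can be traced
as follows. This is a sketch under the assumption that each element
occupies a field of $b+1$ bits ($b$ data bits plus the extra zero
bit); the masks $M_2$ and $M_1$ are names introduced here, where $M_2$
selects the two leftmost fields and $M_1$ selects the left field of
each adjacent pair.
\begin{align*}
x_0 &= a\;b\;c\;d,\\
x_1 &= \big((x_0 \mathbin{\&} M_2) \gg 2(b+1)\big) \lor
       \big((x_0 \ll 2(b+1)) \mathbin{\&} M_2\big) = c\;d\;a\;b,\\
x_2 &= \big((x_1 \mathbin{\&} M_1) \gg (b+1)\big) \lor
       \big((x_1 \ll (b+1)) \mathbin{\&} M_1\big) = d\;c\;b\;a.
\end{align*}
Masking after the left shift discards the fields that were pushed out
of their block, so each of the $\lg k = 2$ levels uses a constant
number of word operations.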


All that remains is to run the bitonic sorting algorithm on the
elements in our new word.  To do so, we must divide the elements in
two halves and swap corresponding pairs of elements which are out of
order.  Then we can recurse on the first and second halves in
parallel, performing $\lg{k}$ total recursions.  Thus we need a
constant-time operation which will perform the desired swapping.

Recall that we left an extra 0 bit before each element when we packed
them into a word, so each element occupies a field of $b+1$ bits.  We
mask the left half of the elements and set this extra bit to 1 for
each element, then mask the right half of the elements and shift them
left so that each is aligned with the corresponding element of the
left half.  If we subtract the second word from the first, the extra
bit of a field remains 1 if and only if the element in the
corresponding position of the left half is at least as large as the
element in the right half; no borrow crosses a field boundary, because
every field of the first word is larger than the corresponding field
of the second.  We then mask the extra bits, shift the resulting word
right by $b$ bits, and subtract the shifted word from the unshifted
one.  Each field whose extra bit was set now holds $b$ ones, so the
result is a word which will mask all the elements of the right half
which belong in the left half and vice versa.  Similarly, negating
this word will mask all elements which belong in their current
position.  Simple shifts and OR operations will then complete one
round of compare-and-swap; after all rounds, we obtain the desired
result, a word containing $2k$ sorted elements. See
Figure~\ref{fig:swaps}.

\begin{figure}
  \centering
  \scalebox{.7}{\includegraphics{diagram6.eps}}

\caption{Sorting pairs of corresponding elements in the left and right
  halves of a word.  Extra bits are set in the left half, the right
  half of the word is shifted and subtracted, and a mask is created
  from the result.  The large elements are then masked out of both
  halves using this mask and its negation.  In a similar process, the
  small elements are found, and the two are finally appended
  together.}
\label{fig:swaps}
\end{figure}
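
The comparison step can be traced on a small instance. This is a
sketch under stated assumptions: $b=3$, each field has $b+1=4$ bits
with the extra bit as its most significant bit, and the right half has
already been shifted into alignment with the left half. The names $X$,
$Y$, $T$, $M$ and $X'$ are introduced here for illustration. The left
elements are $5, 2$ and the right elements are $3, 6$.
\begin{align*}
X &= \mathtt{1101\ 1010} && \text{left half, extra bits set}\\
Y &= \mathtt{0011\ 0110} && \text{right half, aligned}\\
X - Y &= \mathtt{1010\ 0100} && \text{extra bit survives iff left} \ge \text{right}\\
T &= (X-Y) \mathbin{\&} \mathtt{1000\ 1000} = \mathtt{1000\ 0000} && \text{keep only the extra bits}\\
M &= T - (T \gg 3) = \mathtt{0111\ 0000} && \text{$b$ ones in each selected field}
\end{align*}
Writing $X' = \mathtt{0101\ 0010}$ for the left half without the extra
bits, the smaller and larger element of each pair are then
\begin{align*}
(Y \mathbin{\&} M) \lor (X' \mathbin{\&} \lnot M)
  &= \mathtt{0011\ 0010} && \text{that is, } 3, 2,\\
(X' \mathbin{\&} M) \lor (Y \mathbin{\&} \lnot M)
  &= \mathtt{0101\ 0110} && \text{that is, } 5, 6.
\end{align*}
Every step is a single word operation, so all pairs in the word are
compared and swapped in $O(1)$ time.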

We therefore have a constant-time operation which performs one round
of compare-and-swap from bitonic sorting.  Recursively sorting both
halves in parallel yields $\lg(2k) = O(\lg{k})$ levels of recursion,
leading to the $O(\lg{k})$ time for operation (1).  Packed sorting in
$O(n)$ time immediately follows, as does our $O(n)$ time result for
$w=\Omega(\lg^{2+\eps}n)$.



\bibliographystyle{alpha}
\begin{thebibliography}{77}

\bibitem{ah}
S. Albers, T. Hagerup:
\emph{Improved Parallel Integer Sorting without Concurrent Writing},
Inf. Comput. 136(1): 25--51, 1997.

\bibitem{ahnr}
A. Andersson, T. Hagerup, S. Nilsson, R. Raman:
\emph{Sorting in Linear Time?},
J. Comput. Syst. Sci. 57(1): 74--93, 1998.

\bibitem{clrs}
T.H. Cormen, C.E. Leiserson, R.L. Rivest, C. Stein:
\emph{Introduction to Algorithms}, Second Edition,
The MIT Press and McGraw-Hill Book Company, 2001.

\bibitem{gbt}
H.N. Gabow, J.L. Bentley, R.E. Tarjan:
\emph{Scaling and Related Techniques for Geometry Problems},
STOC 1984: 135--143.

\bibitem{h}
Y. Han:
\emph{Deterministic Sorting in $O(n\log\log{n})$ Time and Linear Space},
J. Algorithms 50(1): 96--105, 2004.

\bibitem{ht}
Y. Han, M. Thorup:
\emph{Integer Sorting in $O(n\sqrt{\log\log{n}})$
  Expected Time and Linear Space},
FOCS 2002: 135--144.

\bibitem{kr}
D.G. Kirkpatrick, S. Reisch:
\emph{Upper Bounds for Sorting Integers on Random Access Machines},
Theor. Comput. Sci. 28: 263--276, 1984.

\end{thebibliography}

\end{document}
