6.897: Advanced Data Structures (Spring 2003)
[Home]
[General motivation]
[Requirements]
[Projects]
[Scribe notes]
[Erik's notes]
[Email archive]
[Accessibility]
General Motivation
The field of data structures is similar in spirit to algorithms: what's the
fastest way to solve problem X? But data structures take a different tact,
focusing on ways to organize a typically long-lived corpus of data so that
queries can be answered really fast. A typical example are web search engines,
which store sophisticated indexes that support a variety of queries (e.g., what
documents contain word X?) quickly. In contrast to most algorithmic problems,
"linear time" is too slow--you can't afford to scan through the entire web
to answer a single query--and the goal is typically logarithmic or even
constant time per query. Also, space tends to become a more important issue,
because the data corpus is usually big, and you don't really want a data
structure that is larger (or even as big as) the data itself.
To me, data structures are particularly exciting because they are so universal
and powerful. Almost every algorithmic problem I've encountered has at least
a small data structural component, and knowledge about data structures and
the literature has made faster algorithms possible.
Here are a couple of examples of very recent and cool data structures, so that
you can judge whether these results excite you and you want to take or sit in
the class:
- Text indexing and web searching [research over the past several years].
A classic data structure in this area allows you to preprocess text in
linear time so that you can search for an arbitrary substring in time
proportional to the length of that substring. So searching for one word
is really really fast. A recent improvement to this result is that the
data structure can be made small, proportional to the text size; and if the
text is compressible, the data structure can be equally compressed.
- Sorting [past several years]. We all learn sorting is Theta(n lg n), but
that's just in the comparison model. A more typical model in the advanced
data structures world is that you are sorting integers. In this context,
the best known sorting algorithm runs in O(n sqrt(lg lg n)) time (!), and
it's even conceivable that sorting is possible in O(n) time!
- Sorting and priority queues [Thorup 2002]. You probably know that any
priority queue data structure can be turned into a sorting algorithm.
Did you know that the reverse is true, too? Any sorting algorithm that
runs in O(n * f(n)) time can be converted into a priority queue data
structure with O(f(n)) time per operation.
- Cache-oblivious data structures [several results in the past 2 years].
Any problem (e.g., sorting, priority queues) are different when you're
dealing with disk instead of main memory, or you care about cache
performance. A new wave of algorithms and data structures seems to capture
most of these issues in a theoretically clean way, and there have been
several optimal results along these lines.