6.897: Advanced Data Structures (Spring 2003)

Prof. Erik Demaine

[Home] [General motivation] [Requirements] [Projects] [Scribe notes] [Erik's notes] [Email archive] [Accessibility]

General Motivation

The field of data structures is similar in spirit to algorithms: what's the fastest way to solve problem X? But data structures take a different tact, focusing on ways to organize a typically long-lived corpus of data so that queries can be answered really fast. A typical example are web search engines, which store sophisticated indexes that support a variety of queries (e.g., what documents contain word X?) quickly. In contrast to most algorithmic problems, "linear time" is too slow--you can't afford to scan through the entire web to answer a single query--and the goal is typically logarithmic or even constant time per query. Also, space tends to become a more important issue, because the data corpus is usually big, and you don't really want a data structure that is larger (or even as big as) the data itself.

To me, data structures are particularly exciting because they are so universal and powerful. Almost every algorithmic problem I've encountered has at least a small data structural component, and knowledge about data structures and the literature has made faster algorithms possible.

Here are a couple of examples of very recent and cool data structures, so that you can judge whether these results excite you and you want to take or sit in the class:

Text indexing and web searching [research over the past several years]. A classic data structure in this area allows you to preprocess text in linear time so that you can search for an arbitrary substring in time proportional to the length of that substring. So searching for one word is really really fast. A recent improvement to this result is that the data structure can be made small, proportional to the text size; and if the text is compressible, the data structure can be equally compressed.
Sorting [past several years]. We all learn sorting is Theta(n lg n), but that's just in the comparison model. A more typical model in the advanced data structures world is that you are sorting integers. In this context, the best known sorting algorithm runs in O(n sqrt(lg lg n)) time (!), and it's even conceivable that sorting is possible in O(n) time!
Sorting and priority queues [Thorup 2002]. You probably know that any priority queue data structure can be turned into a sorting algorithm. Did you know that the reverse is true, too? Any sorting algorithm that runs in O(n * f(n)) time can be converted into a priority queue data structure with O(f(n)) time per operation.
Cache-oblivious data structures [several results in the past 2 years]. Any problem (e.g., sorting, priority queues) are different when you're dealing with disk instead of main memory, or you care about cache performance. A new wave of algorithms and data structures seems to capture most of these issues in a theoretically clean way, and there have been several optimal results along these lines.