6.897: Advanced Data Structures (Spring 2003)

Prof. Erik Demaine

Mailing List Archive

Below is a copy of every email message sent to the class mailing list, 6897-students@theory.lcs.mit.edu, in case you didn't receive one or misplaced it.

  1. Thu, 9 Jan 2003 14:59:24 -- 6.897 scheduled
  2. Thu, 23 Jan 2003 08:59:59 -- First class on February 10
  3. Thu, 23 Jan 2003 09:47:24 -- Re: First class on February 10
  4. Sun, 9 Feb 2003 19:13:22 -- L1: van Emde Boas
  5. Tue, 11 Feb 2003 23:02:05 -- Class MOVED tomorrow
  6. Tue, 11 Feb 2003 23:08:47 -- Class MOVED tomorrow
  7. Wed, 12 Feb 2003 13:30:28 -- Tuesday is Monday; scribing
  8. Mon, 17 Feb 2003 19:00:15 -- Tomorrow's lecture; more on scribing
  9. Tue, 18 Feb 2003 06:47:48 -- Class canceled; enjoy snow!
  10. Sun, 23 Feb 2003 22:24:07 -- Tomorrow's lecture; more on scribing; notes
  11. Wed, 26 Feb 2003 09:05:29 -- Class today; projects & ideas; progress
  12. Sun, 2 Mar 2003 18:21:03 -- Tomorrow's lecture; choosing scribe dates & project
  13. Wed, 5 Mar 2003 09:43:03 -- Today's lecture; please sign up to scribe
  14. Mon, 10 Mar 2003 09:03:29 -- Class today
  15. Wed, 12 Mar 2003 09:57:56 -- Class today; project reminder
  16. Mon, 31 Mar 2003 08:23:36 -- Today's lecture: strings
  17. Wed, 2 Apr 2003 09:45:44 -- Today's lecture: LCAs & building suffix trees
  18. Mon, 7 Apr 2003 08:53:20 -- Today's lecture: succinct data structures
  19. Wed, 9 Apr 2003 09:36:32 -- Today's lecture: compression
  20. Sun, 13 Apr 2003 23:32:03 -- Tomorrow's class
  21. Tue, 15 Apr 2003 09:01:31 -- Scribe notes up for L1,2,3,4,6,9,10
  22. Tue, 15 Apr 2003 22:58:21 -- Class tomorrow
  23. Sun, 20 Apr 2003 11:47:50 -- Cache-oblivious lecture notes
  24. Wed, 23 Apr 2003 10:31:00 -- Class today
  25. Mon, 28 Apr 2003 10:40:42 -- Today's class: cache-oblivious sorting & priority queues
  26. Mon, 28 Apr 2003 16:05:34 -- Presentations
  27. Wed, 30 Apr 2003 08:35:38 -- Today's lecture: tree layout
  28. Mon, 5 May 2003 10:01:28 -- Today's class: fault tolerance
  29. Tue, 6 May 2003 16:27:53 -- Class presentations tomorrow
  30. Tue, 6 May 2003 21:59:42 -- Slides
  31. Sun, 11 May 2003 21:34:54 -- Class presentations tomorrow (round II)
  32. Wed, 14 May 2003 08:45:11 -- Final round of presentations today

From: Erik Demaine <edemaine@mit.edu>
To: 6.897 students <6897-students@theory.lcs.mit.edu>
Date: Thu, 9 Jan 2003 14:59:24 -0500 (EST)
Subject: 6.897 scheduled

Thanks for your interest in 6.897 (Advanced Data Structures)!

Based on all your responses, the scheduled time for the class is
Mondays and Wednesdays at 11:00am-12:30pm.
I'll let you know once the room is scheduled.

Cheers,
Erik
-- 
Erik Demaine  |  edemaine@mit.edu  |  http://theory.lcs.mit.edu/~edemaine/

From: Erik Demaine <edemaine@mit.edu>
To: 6.897 students <6897-students@theory.lcs.mit.edu>
Date: Thu, 23 Jan 2003 08:59:59 -0500 (EST)
Subject: First class on February 10

Dear Students,

The first lecture of 6.897 (Advanced Data Structures) will be on February 10,
because I'm at a conference the first week of the semester.

The room is still being booked; I'll keep you posted.

Erik
-- 
Erik Demaine  |  edemaine@mit.edu  |  http://theory.lcs.mit.edu/~edemaine/

From: Erik Demaine <edemaine@mit.edu>
To: 6.897 students <6897-students@theory.lcs.mit.edu>
Date: Thu, 23 Jan 2003 09:47:24 -0500 (EST)
Subject: Re: First class on February 10

On Thu, 23 Jan 2003, Erik Demaine wrote:

> The room is still being booked; I'll keep you posted.

We have 26-168.

Erik
-- 
Erik Demaine  |  edemaine@mit.edu  |  http://theory.lcs.mit.edu/~edemaine/

From: Erik Demaine <edemaine@mit.edu>
To: 6.897 students <6897-students@theory.lcs.mit.edu>
Date: Sun, 9 Feb 2003 19:13:22 -0500 (EST)
Subject: L1: van Emde Boas

Tomorrow will be the first lecture in 6.897 (Advanced Data Structures),
at 11-12:30 in 26-168.  I'll start out with a bit of administrivia,
but will quickly delve into the fun stuff.

Both lectures this week will be about "search structures in a fixed universe".
The problem is simple: you want to maintain a dynamic set of integers between
1 and u, and you want to be able to insert, delete, and find successors and
predecessors.  Unlike hash tables, you don't just want to tell whether a query
element is in the set, but rather you want to know, if your query element isn't
in the set, what are the two closest elements (next smallest and next largest).
In other words, an unsuccessful search should report where that element would
fit if it were there.
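As a baseline for this interface (this is just binary search on a sorted list, not the structure we'll build in lecture), successor and predecessor queries look like this in Python:

```python
import bisect

def successor(sorted_elems, x):
    """Smallest element > x, or None if x is at least the maximum."""
    i = bisect.bisect_right(sorted_elems, x)
    return sorted_elems[i] if i < len(sorted_elems) else None

def predecessor(sorted_elems, x):
    """Largest element < x, or None if x is at most the minimum."""
    i = bisect.bisect_left(sorted_elems, x)
    return sorted_elems[i - 1] if i > 0 else None

s = [2, 5, 9, 14]
successor(s, 6)    # 9: where 6 would fit, on the high side
predecessor(s, 6)  # 5: where 6 would fit, on the low side
```

Each query here costs O(lg n); the point of this week's lectures is to beat that by exploiting the fixed universe.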

We'll start out with one of my overall favorite data structures,
designed by Peter van Emde Boas in 1975.  This structure supports all
operations in O(lg lg u) time.  If u is subexponential in n, this is a big
improvement over the usual O(lg n) binary search trees.

Some of you may have seen the van Emde Boas data structure before, but
I'll cover it cleaner and in more depth than usual, and it should be
a lot of fun.
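For the curious, here is a rough Python sketch of the recursive structure, supporting only insert and successor (no delete), with clusters created lazily in a dictionary. This is just to fix ideas; the lecture covers the real thing, including the space issues, in depth:

```python
class VEB:
    """Sketch of a van Emde Boas tree over {0, ..., u-1}, u a power of two.
    Insert and successor each make one real recursive call on a universe
    of size sqrt(u), giving O(lg lg u) time."""

    def __init__(self, u):
        self.u = u
        self.min = self.max = None
        if u > 2:
            k = u.bit_length() - 1      # u = 2^k
            self.lo = k // 2            # bits in the low half of a key
            self.clusters = {}          # high half -> VEB over the low half
            self.summary = None         # VEB over the set of nonempty clusters

    def insert(self, x):
        if self.min is None:
            self.min = self.max = x
            return
        if x == self.min or x == self.max:
            return
        if x < self.min:                # min lives outside the clusters,
            self.min, x = x, self.min   # so swap and push the old min down
        if x > self.max:
            self.max = x
        if self.u > 2:
            h, l = x >> self.lo, x & ((1 << self.lo) - 1)
            if h not in self.clusters:
                # summary.insert recurses, but then the cluster insert below
                # hits the empty-structure base case: only one real recursion.
                self.clusters[h] = VEB(1 << self.lo)
                if self.summary is None:
                    self.summary = VEB(self.u >> self.lo)
                self.summary.insert(h)
            self.clusters[h].insert(l)

    def successor(self, x):
        """Smallest element > x, or None."""
        if self.min is not None and x < self.min:
            return self.min
        if self.u == 2:
            return 1 if x == 0 and self.max == 1 else None
        h, l = x >> self.lo, x & ((1 << self.lo) - 1)
        c = self.clusters.get(h)
        if c is not None and c.max is not None and l < c.max:
            return (h << self.lo) | c.successor(l)      # stay in one cluster
        nh = self.summary.successor(h) if self.summary else None
        if nh is None:
            return None
        return (nh << self.lo) | self.clusters[nh].min  # min of next cluster

v = VEB(16)
for x in (3, 7, 12):
    v.insert(x)
v.successor(3)   # 7
```

The key trick, which we'll see on Monday, is that min is stored outside the recursion, so exactly one of the two recursive calls in insert does real work.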

See you then,
Erik
-- 
Erik Demaine  |  edemaine@mit.edu  |  http://theory.lcs.mit.edu/~edemaine/

From: Erik Demaine <edemaine@mit.edu>
To: 6.897 students <6897-students@theory.lcs.mit.edu>
Date: Tue, 11 Feb 2003 23:02:05 -0500 (EST)
Subject: Class MOVED tomorrow

We got a (somewhat) larger lecture room for the rest of the semester:
24-121.  It's on the second floor of building 24, which you can get to e.g.
by going above the big lecture room in building 34 (34-100?).

Tomorrow's lecture will be about more fun topics related to van Emde Boas,
the O(lg lg u) structure for insert/delete/successor:
  - Recap + Delete
  - Alternative view of the same structure as a (stratified) tree
  - Reducing the space to O(n) instead of Theta(u)
  - y-fast trees, a simple alternative that uses randomization + indirection
  - O(1)-time (!) insert/delete/successor in the "RAMBO" model

I'll probably cover a couple of tools along the way, because they'll be
necessary in some of these structures and at various times later on:
  - dynamic perfect hashing
  - some basic bit tricks:
    finding the least-significant 1 bit in a (lg n)-bit word
  - the "transdichotomous" model of computation
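As a teaser for the bit-tricks item: isolating the least-significant 1 bit is a one-liner (x & -x), and even its *index* can be found in constant time with a multiply and a table lookup. Here is the classical de Bruijn-sequence version for 32-bit words, in Python for concreteness (the lecture's version is phrased for the transdichotomous RAM):

```python
def lowest_one(x):
    """Value of the least-significant 1 bit: a two's-complement trick."""
    return x & -x                       # e.g. lowest_one(0b101000) == 0b1000

# Index of that bit in constant time: multiplying the isolated bit by a
# de Bruijn constant puts a distinct 5-bit pattern in the top bits.
DEBRUIJN32 = 0x077CB531
INDEX32 = [0, 1, 28, 2, 29, 14, 24, 3, 30, 22, 20, 15, 25, 17, 4, 8,
           31, 27, 13, 23, 21, 19, 16, 7, 26, 12, 18, 6, 11, 5, 10, 9]

def lsb_index(x):
    """Index of the least-significant 1 bit of x (0 < x < 2**32)."""
    return INDEX32[((lowest_one(x) * DEBRUIJN32) & 0xFFFFFFFF) >> 27]

lsb_index(40)   # 3, since 40 = 0b101000
```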

See you then!
Erik
-- 
Erik Demaine  |  edemaine@mit.edu  |  http://theory.lcs.mit.edu/~edemaine/

From: Erik Demaine <edemaine@mit.edu>
To: 6.897 students <6897-students@theory.lcs.mit.edu>
Date: Tue, 11 Feb 2003 23:08:47 -0500 (Eastern Standard Time)
Subject: Class MOVED tomorrow

We got a (somewhat) larger lecture room for the rest of the semester:
24-121.  It's on the second floor of building 24, which you can get to e.g.
by going above the big lecture room in building 34 (34-100?).

Tomorrow's lecture will be about more fun topics related to van Emde Boas,
the O(lg lg u) structure for insert/delete/successor:
  - Recap + Delete
  - Alternative view of the same structure as a (stratified) tree
  - Reducing the space to O(n) instead of Theta(u)
  - y-fast trees, a simple alternative that uses randomization + indirection
  - O(1)-time (!) insert/delete/successor in the "RAMBO" model

I'll probably cover a couple of tools along the way, because they'll be
necessary in some of these structures and at various times later on:
  - dynamic perfect hashing
  - some basic bit tricks:
    finding the least-significant 1 bit in a (lg n)-bit word
  - the "transdichotomous" model of computation

See you then!
Erik
-- 
Erik Demaine  |  edemaine@mit.edu  |  http://theory.lcs.mit.edu/~edemaine/

From: Erik Demaine <edemaine@mit.edu>
To: 6.897 students <6897-students@theory.lcs.mit.edu>
Date: Wed, 12 Feb 2003 13:30:28 -0500 (EST)
Subject: Tuesday is Monday; scribing

Dear Students,

Next Monday (Feb. 17) is a holiday, and instead Tuesday (Feb. 18) plays the
role of Monday classes.  So your Tuesday classes are cancelled, and your
Monday classes (like 6.897) move to Tuesday.  (It's Tuesday's special week.)

Scribes, let's try to keep a fast pace.  I'd recommend scribing the same day
as the class if you can, because then everything's still fresh.  But as an
upper bound, you should send me the notes by one week after the lecture
you're scribing.

Erik
-- 
Erik Demaine  |  edemaine@mit.edu  |  http://theory.lcs.mit.edu/~edemaine/

From: Erik Demaine <edemaine@mit.edu>
To: 6.897 students <6897-students@theory.lcs.mit.edu>
Date: Mon, 17 Feb 2003 19:00:15 -0500 (EST)
Subject: Tomorrow's lecture; more on scribing

Assuming MIT classes aren't cancelled because of snow,
there will be a 6.897 lecture tomorrow (Tuesday Feb. 18) at 11am in 24-121.
Topics:

  - We'll first quickly finish up the van Emde Boas / fixed-universe successor
    topic by presenting a *constant-time* solution on a slightly different
    machine model, the RAMBO.  If you're a fan of Sylvester Stallone, this
    may be exactly the action-packed lecture that you haven't been waiting for.
    The solution is nice & simple, given the intuition we have so far.

  - Along the way, we'll see our first "bit tricks": how to compute the
    least-significant 1-bit in a word of memory in constant time,
    on a (natural) transdichotomous RAM.  This tool is really useful &
    practical; we'll use it all the time, as do various systems such as
    the Linux kernel.

  - Then we cover *fusion trees*, which solve a closely related problem,
    again successor queries, basically what normal search trees do,
    but without the fixed-universe assumption.  I'll briefly mention why
    Theta(lg n) is the best you can do in the comparison model (or something
    more general called the decision-tree model).  Then I'll show how fusion
    trees do better--Theta(sqrt(lg n))--on the transdichotomous RAM.
    This is very slick.  Along the way, we see a bunch more bit tricks,
    and even use van Emde Boas as a subroutine!

------------------

I am a little behind on processing scribe notes.  Meanwhile, I've prepared
some general guidelines for scribing and what your notes should cover and how.
See
        http://theory.lcs.mit.edu/classes/6.897/spring03/requirements.html
Students enrolled in the class should read these over so that you know what's
involved (because I was pretty vague about this before).
Also, to move things along faster, I've changed the deadlines: you should now
send me the first draft of your scribe notes by *2 days* after the lecture.

Those who have scribed so far:
after class, let's set up a meeting to discuss what you have.

See you tomorrow!
Erik
-- 
Erik Demaine  |  edemaine@mit.edu  |  http://theory.lcs.mit.edu/~edemaine/

From: Erik Demaine <edemaine@mit.edu>
To: 6.897 students <6897-students@theory.lcs.mit.edu>
Date: Tue, 18 Feb 2003 06:47:48 -0500 (Eastern Standard Time)
Subject: Class canceled; enjoy snow!

OK, I checked 617-253-SNOW, and MIT classes are canceled today.
So Tuesday returns to being Tuesday, and 6.897 is canceled too.
The fusion-tree lecture will be delayed to tomorrow (Wednesday),
back to our usual schedule.

Enjoy the feet of snow!
Erik
-- 
Erik Demaine  |  edemaine@mit.edu  |  http://theory.lcs.mit.edu/~edemaine/

From: Erik Demaine <edemaine@mit.edu>
To: 6.897 students <6897-students@theory.lcs.mit.edu>
Date: Sun, 23 Feb 2003 22:24:07 -0500 (EST)
Subject: Tomorrow's lecture; more on scribing; notes

Tomorrow's 6.897 lecture (Monday Feb. 24 at 11am in 24-121) will continue on
fusion trees.  (We only scratched the surface on Wednesday, so you needn't
worry if you missed the last lecture.)

    Fusion trees solve a closely related problem to van Emde Boas,
    again successor queries, basically what normal search trees do,
    but without the fixed-universe assumption.  I'll show how fusion
    trees do better than lg n--Theta(sqrt(lg n))--on the transdichotomous RAM.
    This is very slick.  Along the way, we see a bunch more bit tricks,
    and even use van Emde Boas as a subroutine!

--------

I've posted a bit more information on the "requirements" webpage about
scribe notes:
  - A LaTeX template which sets the style.
  - A sample set of scribe notes from another, related class.

--------

I've started two webpages with two different sets of lecture notes:

  - Scribe notes: this is where scribe notes from students will go,
    though currently none are complete (still working on it)

  - Erik's notes: my handwritten notes that I use to lecture.
    The idea is that these will help students catch up if they miss class,
    and if anyone wants to look back at recent lectures for which scribe
    notes aren't yet available.  However, these notes are rough and
    incomplete, and do not serve as replacements to scribe notes or
    to attending lecture.

http://theory.lcs.mit.edu/classes/6.897/spring03/

See you tomorrow!
Erik
-- 
Erik Demaine  |  edemaine@mit.edu  |  http://theory.lcs.mit.edu/~edemaine/

From: Erik Demaine <edemaine@mit.edu>
To: 6.897 students <6897-students@theory.lcs.mit.edu>
Date: Wed, 26 Feb 2003 09:05:29 -0500 (Eastern Standard Time)
Subject: Class today; projects & ideas; progress

In today's class, we'll

  - Finish up fusion trees by making them dynamic and faster.

  - Start a rather different topic: self-organizing data structures.
    Here we'll mainly work in the comparison model, and try to exploit
    properties of the input distribution without actually knowing it.

    We'll start with the simplest version of this problem, searching in a
    linked list, which is already very interesting, and in which there are
    some open problems and some very recent progress.
    You may have seen Move-To-Front and related heuristics; we'll analyze those
    and compare them to several notions of the "optimal" strategy.
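If you haven't seen Move-To-Front before, it fits in a few lines; here is a toy version (my Python, using a plain list in place of a linked list) under the standard cost model where the search pays the element's position and the move to the front is free:

```python
def mtf_access(lst, x):
    """One access under the Move-To-Front heuristic.  The linear search
    pays the element's current position; moving x to the front is free in
    the standard model, since we're already holding the element."""
    i = lst.index(x)               # search cost: i + 1 comparisons
    lst.insert(0, lst.pop(i))      # self-organize: move x to the front
    return i + 1

lst = ['a', 'b', 'c', 'd']
costs = [mtf_access(lst, x) for x in "cccd"]
# costs == [3, 1, 1, 4]: once 'c' reaches the front, repeats cost 1
```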

My written notes for this lecture (#5) are up:
        http://theory.lcs.mit.edu/classes/6.897/spring03/erik_notes/

------------------------------------------------------------------------------

I've started a new webpage whose main purpose is to collect ideas for open
problems to work on for projects:
        http://theory.lcs.mit.edu/classes/6.897/spring03/projects.html

I'll continue to add to this page as I think up or hear more problems.
They vary in difficulty.  You do not have to be taking the class
for credit to work on these problems, but please let me know if you
spend a nontrivial amount of time on a problem, so that I can add you to
the list of "interested students" for that problem and we can all
collaborate instead of working independently.

------------------------------------------------------------------------------

There has already been progress on the first problem, which I posed in the last
lecture: speeding up planar point location in a strip through fusion trees.
Ian Eslick and I came up with a couple of ideas which seem like they might
work, but give us a few days to write it down (or get stuck again).

If you don't mind, I'll use the class mailing list as a discussion list for
open problems, so that multiple students can contribute.
(Once we get the strip case, we'll need to reduce the space usage, and then
 adapt it to the general planar map case; so there is more to be done.)

http://theory.lcs.mit.edu/classes/6.897/spring03/

Erik
-- 
Erik Demaine  |  edemaine@mit.edu  |  http://theory.lcs.mit.edu/~edemaine/

From: Erik Demaine <edemaine@mit.edu>
To: 6.897 students <6897-students@theory.lcs.mit.edu>
Date: Sun, 2 Mar 2003 18:21:03 -0500 (EST)
Subject: Tomorrow's lecture; choosing scribe dates & project

Tomorrow I'll continue on with self-organizing data structures, specifically
linear search:

  - I'll prove that Move-To-Front is not only statically optimal but
    *dynamically* optimal--as good as any dynamic list, up to a factor of 2.
    However, this is specific to Sleator & Tarjan's cost model.

  - I'll mention some open problems relating to improving the factor of 2
    in the randomized context.  ("Randomized self-organizing linear search"
    on the projects page.)

  - Then I'll prove that an omniscient algorithm Order-By-Next-Request
    is exponentially faster than Move-To-Front or any other online algorithm,
    for a particular request sequence (list the elements in order).

  - (If there's time,) I'll prove that the cost of Order-By-Next-Request
    is within a constant factor of the entropy of the "distribution" induced by
    the request sequence, which is pretty impressive for linear search.
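For intuition, the "entropy of the distribution induced by the request sequence" is easy to compute; a quick illustration (my Python, not from the lecture):

```python
from collections import Counter
from math import log2

def empirical_entropy(requests):
    """Entropy of the empirical distribution induced by a request
    sequence: H = sum_x p_x * lg(1/p_x), where p_x = (count of x) / n."""
    n = len(requests)
    return sum(c / n * log2(n / c) for c in Counter(requests).values())

empirical_entropy("abcd")   # uniform over 4 elements: exactly 2 bits
empirical_entropy("aaab")   # 0.75*lg(4/3) + 0.25*lg 4, about 0.811 bits
```

The claim is that Order-By-Next-Request pays only O(H) amortized per access, which for skewed sequences is far below lg n.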

I also added a new open problem about this last algorithm; see
"Offline self-organizing linear search" on the projects page.

My written notes for this lecture (#6) are now on the webpage.

------------------------------------------------------------------------------

To make sure everyone gets a turn at scribing, I'd like to switch from
ad-hoc volunteers to assigned scribing.  I think we have more students than
lectures, so we can start doubling up, two scribes per lecture.
So, please check out
        http://theory.lcs.mit.edu/classes/6.897/spring03/scribe_notes/
and let me know a few dates during which you'll definitely be here
and you'd like to sign up for scribing.
(Sorry, I can't predict what topics will be covered when at this point.)

------------------------------------------------------------------------------

In particular, some advance warning that I'll be away March 13-26,
though I will be in email contact.  Because of spring break, this only hits
two lectures--March 17 & 19--which I'm tentatively canceling.

During this time window, March 13-26, you should pick a project, so that we
can discuss it when I return (either at the end of spring break, around
March 27, or the following week).  At that point, we'll be roughly halfway
through the semester, so you should be started on your project.

http://theory.lcs.mit.edu/classes/6.897/spring03/

Erik
-- 
Erik Demaine  |  edemaine@mit.edu  |  http://theory.lcs.mit.edu/~edemaine/

From: Erik Demaine <edemaine@mit.edu>
To: 6.897 students <6897-students@theory.lcs.mit.edu>
Date: Wed, 5 Mar 2003 09:43:03 -0500 (EST)
Subject: Today's lecture; please sign up to scribe

Today's lecture will finish up self-organizing lists by showing that
Munro's Order-By-Next-Request strategy is not only better than all online
algorithms, but also achieves, amortized, the "entropy" bound, which is the
best possible for any comparison-based search structure in the stochastic model.

Then we'll move on to trees, first optimal binary search trees
(which play the role of "static OPT" but are much more interesting).
Then we'll cover self-organizing trees and splay trees, how they work,
and describe some of the bounds known about them.

------------------------------------------------------------------------------

I've updated the scribe page to sign up people who have sent me email so far
(which is not very many).  Please send me email about dates you can scribe.

Thanks,
Erik
-- 
Erik Demaine  |  edemaine@mit.edu  |  http://theory.lcs.mit.edu/~edemaine/

From: Erik Demaine <edemaine@mit.edu>
To: 6.897 students <6897-students@theory.lcs.mit.edu>
Date: Mon, 10 Mar 2003 09:03:29 -0500 (Eastern Standard Time)
Subject: Class today

This week's lectures are about recent work attacking the
Dynamic Splay Conjecture.

Today we'll cover something called the "unified structure", which has
a property called the "unified property", both developed by Iacono in 2001.
The unified property is stronger than both the dynamic finger property and
the working set property (both obtained by splay trees), and in this sense
simultaneously captures both spatial and temporal locality.
Roughly, it says that it's cheap to access an element that's pretty close
(in rank) to an element you've accessed recently.
It's open whether splay trees also have the unified property.

Along the way, we'll see a very simple way to get a "working set structure"
with the working set property.  The general idea in this data structure--
exponentially increasing substructures--is used in many other data structures.

See you there,
Erik
-- 
Erik Demaine  |  edemaine@mit.edu  |  http://theory.lcs.mit.edu/~edemaine/

From: Erik Demaine <edemaine@mit.edu>
To: 6.897 students <6897-students@theory.lcs.mit.edu>
Date: Wed, 12 Mar 2003 09:57:56 -0500 (EST)
Subject: Class today; project reminder

Today's class will be the last about results related to splay trees.
We'll finish up the (rather complicated) unified structure.
Then I'll describe some other recent and cool results (without proofs)
and conjectures, leading to a couple more interesting (and relatively
unstudied) open problems related to the dynamic splay conjecture.

A reminder that I am going on a trip starting this afternoon,
and there are no classes next week.

When I get back (during spring break), I expect you all to have thought
seriously about what you'd like to work on for your project.
Then we can meet at the end of spring break, or the following week.
I'll be in email contact throughout, so feel free to email me your ideas.

It's important that you start working on your project now, so that there will
be time for you to present your project in the final two weeks of the semester.
Take a look at the open problems on the projects page; I add to it as I get
new ideas.  And if you are interested in a problem that another student has
marked "interested", please talk to each other to avoid overlap and
encourage collaboration.

    http://theory.lcs.mit.edu/classes/6.897/spring03/projects.html

Erik
-- 
Erik Demaine  |  edemaine@mit.edu  |  http://theory.lcs.mit.edu/~edemaine/

From: Erik Demaine <edemaine@mit.edu>
To: 6.897 students <6897-students@theory.lcs.mit.edu>
Date: Mon, 31 Mar 2003 08:23:36 -0500 (Eastern Standard Time)
Subject: Today's lecture: strings

Welcome back from spring break.

Today's lecture will be about string data structures:

  - I'll briefly mention what's known about normal substring searching.
    (Basically, time is linear in the text + substring.)

  - Then we'll see how this can be vastly improved if we can preprocess
    the text in which we want to do substring searches, using a *suffix tree*.
    (Basically, time becomes linear in just the substring.)

  - We'll see how to build suffix trees in linear time.
    (It's already cool that they take linear space--linear time is trickier!)

  - I'll talk about Muthukrishnan's recent related result about
    *document retrieval*: finding the set of (distinct) documents that contain
    a substring, even faster than normal suffix trees.

  - This will lead us into range minimum queries, which can be solved using
    least common ancestors.

  - If there's time, I'll show how to do (static) least common ancestors
    in a tree in constant time, linear space, and linear preprocessing.
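To make the preprocess-then-query tradeoff concrete, here is a toy suffix *array* (a close cousin of the suffix tree), in Python. The naive O(n^2 lg n) construction below is nowhere near the linear-time bound we'll cover, and queries cost O(|pattern| * lg |text|) rather than the suffix tree's O(|pattern|), but it already shows queries becoming nearly independent of the text length:

```python
def suffix_array(text):
    """Start positions of all suffixes, sorted lexicographically.
    Naive construction, for illustration only; lecture covers how to
    do the equivalent suffix tree in linear time."""
    return sorted(range(len(text)), key=lambda i: text[i:])

def contains(text, sa, pattern):
    """Substring query by binary search over the sorted suffixes."""
    lo, hi = 0, len(sa)
    while lo < hi:                      # find first suffix >= pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + len(pattern)] < pattern:
            lo = mid + 1
        else:
            hi = mid
    return lo < len(sa) and text[sa[lo]:sa[lo] + len(pattern)] == pattern

sa = suffix_array("banana")             # [5, 3, 1, 0, 4, 2]
contains("banana", sa, "nan")           # True
contains("banana", sa, "xyz")           # False
```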

See you soon,
Erik
-- 
Erik Demaine  |  edemaine@mit.edu  |  http://theory.lcs.mit.edu/~edemaine/

From: Erik Demaine <edemaine@mit.edu>
To: 6.897 students <6897-students@theory.lcs.mit.edu>
Date: Wed, 2 Apr 2003 09:45:44 -0500 (EST)
Subject: Today's lecture: LCAs & building suffix trees

Today's lecture will start by solving the (static) LCA problem
in constant time per query after linear preprocessing.
As we saw last class, this result has several implications for range minimum
queries, document retrieval, suffix trees, and various string operations.

Then we'll cover how to build suffix trees in linear time (once you've sorted
the alphabet).  This is quite a sophisticated algorithm, and uses a lot
of cool techniques that appear in many string algorithms.

See you there,
Erik
-- 
Erik Demaine  |  edemaine@mit.edu  |  http://theory.lcs.mit.edu/~edemaine/

From: Erik Demaine <edemaine@mit.edu>
To: 6.897 students <6897-students@theory.lcs.mit.edu>
Date: Mon, 7 Apr 2003 08:53:20 -0400 (Eastern Daylight Time)
Subject: Today's lecture: succinct data structures

Today's class will be about "succinct data structures", typically static
data structures that use the information-theoretic optimum amount of space
up to lower-order terms, while still supporting the needed navigational
operations quickly.

In particular, we'll cover a couple of encodings of binary trees/tries, which
use 2 n + o(n) bits of space, instead of the usual Theta(n) *words* of space
occupied by child and/or parent pointers.  These encodings still support
left-child/right-child/parent navigation operations in constant time.  The
second encoding will also support subtree size, and computing the index of a
particular external node, making it usable as a binary suffix trie.

Along the way, we'll cover o(n)-size structures for computing the rank of a
bit in a bitstring, i.e., the number of 1's to the left of that bit.
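The idea behind the rank structure is just precomputed cumulative counts. Here is a single-level Python sketch (the o(n)-bit structure from class uses two levels of blocks plus table lookup; this one level conveys the idea but is not o(n) bits):

```python
class RankBitVector:
    """rank(i) = number of 1 bits strictly to the left of position i.
    One level of cumulative counts, one per b positions, so a query
    scans at most b bits of the raw bitstring."""

    def __init__(self, bits, b=64):
        self.bits, self.b = bits, b
        self.pre = [0]                       # pre[j] = ones in bits[:j*b]
        for s in range(0, len(bits), b):
            self.pre.append(self.pre[-1] + bits[s:s + b].count(1))

    def rank(self, i):
        j = i // self.b                      # whole blocks, then a short scan
        return self.pre[j] + self.bits[j * self.b : i].count(1)

bv = RankBitVector([1, 0, 1, 1, 0, 1], b=2)
bv.rank(5)   # 3 ones among positions 0..4
```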

Erik
-- 
Erik Demaine  |  edemaine@mit.edu  |  http://theory.lcs.mit.edu/~edemaine/

From: Erik Demaine <edemaine@mit.edu>
To: 6.897 students <6897-students@theory.lcs.mit.edu>
Date: Wed, 9 Apr 2003 09:36:32 -0400 (Eastern Daylight Time)
Subject: Today's lecture: compression

Today's class will be about compression.  If I can squeeze it all in (ahem),
I'll cover the following topics:

  - Huffman codes (to the extent you don't already know them).
    These are optimal prefix codes, and are within an additive constant of
    entropy, but can be far away from entropy in the multiplicative sense.

  - Arithmetic coding.  Allows us to mix together codes from adjacent letters
    to get entropy plus an additive constant over the whole string, instead of
    letter by letter.  For a long string, we basically get entropy.

  - Higher-order entropy.  Suppose you consider the probability of a letter
    occurring given k previous letters of context.  You can generalize entropy
    to capture this notion, and it gives a lower bound on compression schemes
    that use only k previous letters.

  - Burrows-Wheeler transform.
    One of the more recent & cool ideas for compression.
    I'll describe the algorithm, why it's so cool, and tell you about some
    brand new analysis that suggests why it's so good, connecting it to
    higher-order entropy.
    Along the way, we'll use linear-time suffix-tree construction,
    move-to-front, and arithmetic encoding!
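To make the last pipeline concrete, here is a toy of its first two stages (my Python): the BWT built naively by sorting rotations (quadratic; the suffix-based route from lecture makes it linear), followed by Move-To-Front turning the BWT's clustered letters into small numbers ready for an entropy coder:

```python
def bwt(s):
    """Burrows-Wheeler transform: last column of the sorted rotations.
    A unique smallest sentinel makes the transform invertible."""
    s += "\0"
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(r[-1] for r in rotations)

def mtf_encode(s, alphabet):
    """Move-To-Front coding: recently seen letters get small codes, so
    the BWT's runs of equal letters become runs of 0s and small ints."""
    table = list(alphabet)
    out = []
    for ch in s:
        i = table.index(ch)
        out.append(i)
        table.insert(0, table.pop(i))
    return out

b = bwt("banana")                # 'annb\x00aa': like letters clustered
codes = mtf_encode(b, "\0abn")   # small ints, ready for arithmetic coding
```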

My apologies for being behind on (a) responding about project proposals and
(b) posting scribe notes.  I will get on top of these soon!

http://theory.lcs.mit.edu/classes/6.897/spring03/

Erik
-- 
Erik Demaine  |  edemaine@mit.edu  |  http://theory.lcs.mit.edu/~edemaine/

From: Erik Demaine <edemaine@mit.edu>
To: 6.897 students <6897-students@theory.lcs.mit.edu>
Date: Sun, 13 Apr 2003 23:32:03 -0400 (EDT)
Subject: Tomorrow's class

Tomorrow's class will start moving in the direction of external-memory
and cache-efficient algorithms.

We'll spend most of our time talking about the "ordered-file maintenance"
problem, which asks you to maintain a bunch of elements in order in an array,
subject to being able to insert and delete in the middle of the array.
The naive solution does this in O(n) time per update, but it turns out that
you can do it in O(lg^2 n) time per update!

We'll see how this can solve the related problem of order queries (which
element comes first?) in a linked list in O(1) time per update.
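The reduction to O(1) order queries boils down to giving list elements integer labels that respect the list order. Here is a toy Python version (hypothetical names, mine): labels are spaced far apart, and when a gap runs out we just relabel everything in O(n), whereas the ordered-file structure from lecture keeps updates polylogarithmic (the list.index scan below is likewise only for brevity):

```python
class OrderMaintenance:
    """Order queries on a dynamic list via integer labels."""
    GAP = 1 << 16

    def __init__(self):
        self.labels = {}
        self.order = []                   # kept only so relabeling is easy

    def insert_after(self, prev, x):
        """Insert x after prev (prev=None means at the front)."""
        i = 0 if prev is None else self.order.index(prev) + 1
        self.order.insert(i, x)
        left = self.labels[self.order[i - 1]] if i > 0 else 0
        right = (self.labels[self.order[i + 1]]
                 if i + 1 < len(self.order) else left + 2 * self.GAP)
        if right - left < 2:              # no room: spread all labels out
            for j, y in enumerate(self.order):
                self.labels[y] = (j + 1) * self.GAP
        else:
            self.labels[x] = (left + right) // 2

    def comes_before(self, a, b):
        """The O(1) order query: just compare labels."""
        return self.labels[a] < self.labels[b]

om = OrderMaintenance()
om.insert_after(None, 'a')
om.insert_after('a', 'c')
om.insert_after('a', 'b')
om.comes_before('a', 'b')   # True
```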

Then, as time allows, I'll start discussing the external-memory model
and the cache-oblivious model, which we'll cover in some depth during
the next few classes.


I have (finally) started putting up scribe notes; more to follow!
        http://theory.lcs.mit.edu/classes/6.897/spring03/scribe_notes/

Erik
-- 
Erik Demaine  |  edemaine@mit.edu  |  http://theory.lcs.mit.edu/~edemaine/

From: Erik Demaine <edemaine@mit.edu>
To: 6.897 students <6897-students@theory.lcs.mit.edu>
Date: Tue, 15 Apr 2003 09:01:31 -0400 (EDT)
Subject: Scribe notes up for L1,2,3,4,6,9,10

All scribe notes for which I have the final form from students (1,2,3,4,6,9,10)
are now posted.  Authors of these notes should look over the posted version
to confirm they look OK.  Everyone should let me know if they spot any bugs or
have other comments on the notes.

http://theory.lcs.mit.edu/classes/6.897/spring03/scribe_notes/

Erik
-- 
Erik Demaine  |  edemaine@mit.edu  |  http://theory.lcs.mit.edu/~edemaine/

From: Erik Demaine <edemaine@mit.edu>
To: 6.897 students <6897-students@theory.lcs.mit.edu>
Date: Tue, 15 Apr 2003 22:58:21 -0400 (EDT)
Subject: Class tomorrow

Tomorrow's lecture will be more about the external-memory and cache-oblivious
models.  We'll cover the following topics:

  - Specifics and justification of cache-oblivious model, in particular,
    block-replacement strategies and cache associativity.

  - The familiar "linked list" problem, but now in external memory.
    We can do inserts and deletes in O(1) memory transfers, and traversals
    of K elements in the list using O(1+K/B) memory transfers.

  - The familiar linked list problem, but now in the cache-oblivious model.

    - We can obtain the same bounds as above if they are both allowed to be
      amortized; in this case, the traversals "self-adjust" the data structure
      to fix what the updates broke.

    - Using the ordered-file-maintenance data structure, we can obtain the
      traversal bound worst-case, with an update cost of O((lg^2 N)/B).

    - Allowing the traversal bound to slide a little, we can reduce the
      update cost to O((lg lg N)^(2+epsilon) / B).

Erik
-- 
Erik Demaine  |  edemaine@mit.edu  |  http://theory.lcs.mit.edu/~edemaine/

From: Erik Demaine <edemaine@mit.edu>
To: 6.897 students <6897-students@theory.lcs.mit.edu>
Date: Sun, 20 Apr 2003 11:47:50 -0400 (Eastern Daylight Time)
Subject: Cache-oblivious lecture notes

I've just posted a survey paper about cache-oblivious algorithms and data
structures, which likely covers everything we will cover in class, and more,
including references to the latest work in the area.  These will hopefully
serve as a useful reference in addition to my lecture notes and scribe notes.

        http://theory.lcs.mit.edu/~edemaine/papers/BRICS2002/

In return, I'd appreciate any comments you have on these notes
(especially any bugs you spot), before they go to press in a volume of
Lecture Notes in Computer Science.  Thanks!


Also, a reminder that tomorrow (Monday) is a holiday, so there is no class.

http://theory.lcs.mit.edu/classes/6.897/spring03/

Erik
-- 
Erik Demaine  |  edemaine@mit.edu  |  http://theory.lcs.mit.edu/~edemaine/

From: Erik Demaine <edemaine@mit.edu>
To: 6.897 students <6897-students@theory.lcs.mit.edu>
Date: Wed, 23 Apr 2003 10:31:00 -0400 (EDT)
Subject: Class today

Today's class will continue on the cache-oblivious theme:

  - In general, we'll discuss divide-and-conquer as a tool for
    cache-oblivious design.
  - As a simple example, we'll see how to do cache-oblivious median/selection.
  - We'll see how to do "binary search" by way of a static search tree.
    (Normal binary search is *not* optimal.)
  - Building on this structure, we'll see how to do cache-oblivious B-trees.
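The heart of the static search tree is the recursive van Emde Boas *layout*: cut the tree at half its height, lay out the top subtree, then each bottom subtree, all recursively. A root-to-leaf walk then touches O(log_B n) blocks simultaneously for every block size B. Here is a Python sketch (mine, not from lecture) that computes the layout order of a complete tree in heap numbering:

```python
def veb_layout(h):
    """Layout order of the 2^h - 1 nodes of a complete binary tree of
    height h, nodes named by 1-based heap indices (children of i are
    2i and 2i+1)."""
    if h == 1:
        return [1]
    top, bottom = h // 2, h - h // 2
    order = veb_layout(top)                      # top subtree first
    for r in range(1 << top, 1 << (top + 1)):    # roots of bottom subtrees
        for j in veb_layout(bottom):             # local heap index j at
            d = j.bit_length() - 1               # depth d maps to global
            order.append(r * (1 << d) + j - (1 << d))
    return order

veb_layout(3)   # [1, 2, 4, 5, 3, 6, 7]
veb_layout(4)   # [1, 2, 3, 4, 8, 9, 5, 10, 11, 6, 12, 13, 7, 14, 15]
```

Note how for height 4 the layout is: the top two levels {1,2,3}, then each of the four bottom subtrees {r, 2r, 2r+1} contiguously.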

Erik
-- 
Erik Demaine  |  edemaine@mit.edu  |  http://theory.lcs.mit.edu/~edemaine/

From: Erik Demaine <edemaine@mit.edu>
To: 6.897 students <6897-students@theory.lcs.mit.edu>
Date: Mon, 28 Apr 2003 10:40:42 -0400 (EDT)
Subject: Today's class: cache-oblivious sorting & priority queues

Today's class will be about cache-oblivious sorting & priority queues.
We'll start with an algorithm called Funnelsort, and then cover a dynamic
version called Funnelheap.  Both of these algorithms use a common assumption
about B and M, called the tall-cache assumption: M = Omega(B^(1+gamma))
for some gamma > 0.  The result is that each element can be handled in
O((1/B) log_(M/B) (N/B)) amortized memory transfers, which is optimal.
Notice that this amortized bound is strictly less than 1!
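To see just how far below 1 the bound sits, plug in some plausible (purely illustrative) numbers:

```python
from math import log

def transfers_per_element(N, B, M):
    """(1/B) * log_{M/B}(N/B): amortized memory transfers per element
    for cache-oblivious sorting, under the tall-cache assumption."""
    return (1 / B) * log(N / B) / log(M / B)

# e.g. N = 2^30 elements, B = 2^10 per block, M = 2^20 cache:
transfers_per_element(2**30, 2**10, 2**20)   # 2/1024, about 0.002
```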


A reminder about projects.  Project presentations will be during the last 3
lectures, May 7, 12, and 14.  Because there are many projects, you should plan
to give one presentation for an entire group project, ideally divided among the
members of the group.  Please let me know if you are willing to present during
the earlier days--May 7 or 12.

Also, the project paper is due on the day of the last class, May 14.
This deadline is imposed by MIT: because this class does not have a final exam,
the project must not conflict with final exams in other classes.
This deadline is fast approaching!

Erik
-- 
Erik Demaine  |  edemaine@mit.edu  |  http://theory.lcs.mit.edu/~edemaine/

From: Erik Demaine <edemaine@mit.edu>
To: 6.897 students <6897-students@theory.lcs.mit.edu>
Date: Mon, 28 Apr 2003 16:05:34 -0400 (EDT)
Subject: Presentations

Several people have asked me what equipment will be available for the
project presentations.  I will make available both an overhead projector
and a data projector, so you can give your talk with either overhead
transparencies or a laptop.  If you want, you can email me PowerPoint or PDF
slides and use my laptop, but be warned that PowerPoint presentations are not
always portable (you should Include Fonts to minimize the risk).
There won't be time for blackboard talks, so these are your options.

I'll post who's speaking when, as they sign up, to the following webpage.

    http://theory.lcs.mit.edu/classes/6.897/spring03/scribe_notes/

Keep in mind that you needn't have as many results if you are presenting
earlier (although it is only a matter of a week).  All presentations must
describe a good, clear problem and why it's interesting.

Erik
-- 
Erik Demaine  |  edemaine@mit.edu  |  http://theory.lcs.mit.edu/~edemaine/

From: Erik Demaine <edemaine@mit.edu>
To: 6.897 students <6897-students@theory.lcs.mit.edu>
Date: Wed, 30 Apr 2003 08:35:38 -0400 (EDT)
Subject: Today's lecture: tree layout

Today's lecture will mix two topics we've seen:
tries (as arise in string matching, etc.) and memory hierarchies
(external memory & cache oblivious).  Specifically, we'll look at the
rather fundamental problem of *tree layout*: given a tree of fixed topology
(e.g., a trie), how should it be stored in memory so as to minimize the
number of memory transfers required to follow a root-to-leaf path?
This is a static data structure problem.

There are two main subproblems here:
  - How do we minimize the maximum cost over all root-to-leaf paths?
  - How do we minimize the expected cost, given a probability distribution
    over leaf accesses?
Both problems can be solved optimally in polynomial time in the external
memory model, and furthermore can be solved optimally up to constant factors
in the cache-oblivious model.
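
We won't see the general algorithms until lecture, but for the special case of a complete binary tree, the classic cache-oblivious answer is the recursive van Emde Boas layout. A small sketch (heap-numbered nodes; the code and names are mine, for illustration only):

```python
def veb_order(root, height):
    """Van Emde Boas layout order of a complete binary tree of the
    given height, rooted at heap index `root` (children of node i are
    2i and 2i+1).  Cutting the tree at its middle level and laying out
    the top piece followed by each bottom piece recursively keeps any
    root-to-leaf path within O(log_B N) blocks, for every B at once."""
    if height == 1:
        return [root]
    top = height // 2      # height of the top recursive piece
    bottom = height - top  # height of each bottom piece
    order = veb_order(root, top)
    # The bottom pieces are rooted at the 2^top nodes at depth `top`.
    for i in range(2 ** top):
        order += veb_order(root * 2 ** top + i, bottom)
    return order

print(veb_order(1, 3))  # [1, 2, 4, 5, 3, 6, 7]
```

For height 3, the layout is the top triangle [1] followed by the two bottom triangles [2, 4, 5] and [3, 6, 7].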

Erik
-- 
Erik Demaine  |  edemaine@mit.edu  |  http://theory.lcs.mit.edu/~edemaine/

From: Erik Demaine <edemaine@mit.edu>
To: 6.897 students <6897-students@theory.lcs.mit.edu>
Date: Mon, 5 May 2003 10:01:28 -0400 (Eastern Daylight Time)
Subject: Today's class: fault tolerance

Today's class (which is the last one where I'll be the lecturer)
is about fault-tolerant data structures on a pointer machine.
Most data structures we see are not tolerant to memory faults.
For example, a linked list could be completely lost if the first node became
faulty.  The idea in fault-tolerant data structures is to contain the damage
caused by faults as much as possible, without any replication.  Specifically,
up to a reasonable limit on faults, we can guarantee that f faults can cause
only O(f) nodes to be lost, and the data structure can be recovered to a
consistent state that preserves the desired properties of the structure.
Such a result is possible for (at least) stacks, linked lists, and trees;
I'll focus on the first two.

Along the way, we'll get to use data structures for list labeling.
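
To make the motivation concrete, here is a toy Python illustration (not the lecture's construction) of the fragility described above: a single corrupted pointer in a plain linked list loses every node after it.

```python
class Node:
    def __init__(self, value, nxt=None):
        self.value = value
        self.next = nxt

def reachable(head):
    """Values still reachable by following next pointers from head."""
    out = []
    while head is not None:
        out.append(head.value)
        head = head.next
    return out

# Build the list 1 -> 2 -> 3 -> 4 -> 5.
nodes = [Node(i) for i in range(1, 6)]
for a, b in zip(nodes, nodes[1:]):
    a.next = b

# One fault wiping node 2's pointer field severs the list:
nodes[1].next = None
print(reachable(nodes[0]))  # [1, 2] -- nodes 3, 4, 5 are all lost
```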

See you there,
Erik

P.S. If you have not yet signed up for a presentation or are not listed on
        http://theory.lcs.mit.edu/classes/6.897/spring03/scribe_notes/
     then send me email now!
-- 
Erik Demaine  |  edemaine@mit.edu  |  http://theory.lcs.mit.edu/~edemaine/

From: Erik Demaine <edemaine@mit.edu>
To: 6.897 students <6897-students@theory.lcs.mit.edu>
Date: Tue, 6 May 2003 16:27:53 -0400 (EDT)
Subject: Class presentations tomorrow

I'd like to start the remaining classes right at 11am instead of the usual
11:05, to make a little more room in the (already tight) presentation schedule.
I hope this isn't a problem for anyone.

Tomorrow we'll have the following 6 groups presenting:

    * XML-related data structures -- David Malan
    * Partial-sum data structures -- Mihai Patrascu
    * RAMBO data structures -- Jonathan Brunsman and Sonia Garg
    * RAMBO data structures -- Loizos Michael
    * Offline linear search -- Ilya Baran, Denis Cebikins, and Lev Teytelman
    * Nearest-neighbor data structures -- Ray Jones 

Each presentation will be at most 15 minutes, including questions & switchover
time.  So try to keep the body of your presentation to at most ~12 minutes.

In each of the two classes next week, there are 7 presentations, so each
presentation gets less than 13 minutes including questions & switchover,
and the body should be at most ~10 minutes.

Please think hard about how to fit the problem description, motivation,
intuition, and high-level descriptions of ideas/results into such a short
presentation!

Erik
-- 
Erik Demaine  |  edemaine@mit.edu  |  http://theory.lcs.mit.edu/~edemaine/

From: Erik Demaine <edemaine@mit.edu>
To: 6.897 students <6897-students@theory.lcs.mit.edu>
Date: Tue, 6 May 2003 21:59:42 -0400 (EDT)
Subject: Slides

Dear Students,

As part of the class requirements, please email me a copy of your presentation
(unless it is not electronic, in which case please give me a photocopy).
If you can email/bring me a copy *before* your presentation,
that would be ideal.

Erik
-- 
Erik Demaine  |  edemaine@mit.edu  |  http://theory.lcs.mit.edu/~edemaine/

From: Erik Demaine <edemaine@mit.edu>
To: 6.897 students <6897-students@theory.lcs.mit.edu>
Date: Sun, 11 May 2003 21:34:54 -0400 (Eastern Daylight Time)
Subject: Class presentations tomorrow (round II)

A reminder that classes this week will start at exactly 11am (not 11:05am),
to give a few more minutes for presentations.

Tomorrow we'll have the following 7 groups presenting in this order:

    * Amorphous data structures -- Jake Beal
    * Financial data structures -- William Kwong
    * Distributed data structures -- Ben Leong
    * Distributed data structures -- Rui Fan and Sayan Mitra
    * Higher-dimensional RAMBO data structures -- Chris Collier
    * Higher-dimensional van Emde Boas -- Sid Sen and Shantonu Sen
    * String data structures -- Alexandr Andoni and Cristian Cadar

As I mentioned before, each presentation is less than 13 minutes including
questions & switchover, so the body should be at most ~10 minutes.

Please send me your slides if you haven't already.

See you there!
Erik
-- 
Erik Demaine  |  edemaine@mit.edu  |  http://theory.lcs.mit.edu/~edemaine/

From: Erik Demaine <edemaine@mit.edu>
To: 6.897 students <6897-students@theory.lcs.mit.edu>
Date: Wed, 14 May 2003 08:45:11 -0400 (EDT)
Subject: Final round of presentations today

Today's speakers will be

    * Minimum spanning hypertrees -- Percy Liang
    * Nearest-neighbor data structures -- Gautam Jayaraman
    * Dynamic text indexing -- Hans Robertson
    * Proximate point location -- Jeff Lindy
    * Fast and small text indexing -- Lance Anderson and Wayland Ni
    * Faster hashing -- Vladimir Kiriansky
    * Faster planar point location -- Glenn Eguchi and Hana Kim
    * Faster planar point location -- Alan Leung

As in the previous round, each presentation is less than 13 minutes including
questions & switchover, so the body should be at most ~10 minutes.

However, today we have one last-minute signup for a presentation, so we will
run over the allotted time slot of 11-12:30.  Those of you who can stay,
please do; but I understand if you aren't able.

Please send me your slides if you haven't already.

Erik
-- 
Erik Demaine  |  edemaine@mit.edu  |  http://theory.lcs.mit.edu/~edemaine/