Course Goals and Summary:
This course will explore the recent theoretical literature on distributed
algorithms, emphasizing algorithms for highly ``dynamic'' settings such
as the Internet or mobile computing systems. In such environments, processes
and channels may fail and recover, and may exhibit variations in their
timing behavior. Participants in the algorithms may join and leave the
system (as well as fail) during the course of computation. Algorithms must
adapt to such changes, for example, reconfiguring on-the-fly.
The traditional theory of distributed algorithms deals mostly with fairly
``static'' settings, e.g., fixed networks or shared memory systems with
a known set of participants. This theory must be ``stretched'' to cover
the newer, more dynamic settings. New algorithms are needed to cope with
the new possibilities. New lower bounds and other impossibility results
will arise, based on new uncertainties. All of this work will require new
models, and new metrics for evaluating performance and fault-tolerance.
The papers to be covered in the course represent the current state-of-the-art
in the development of such a theory. Clearly, much more remains to be done.
When we read the papers, we will consider what role their ideas can play
in this theory, and what new work they suggest. This should help interested
students to find particular questions that they can pursue for independent
research.
Problems of special interest will include:
(1) Communication, including various kinds of multicast (e.g., causal,
totally-ordered, atomic), gossiping protocols, and view-oriented group
communication.
(2) Distributed data management, including weak and strong consistency
guarantees, replicated data management, and quorum-based computing. Other
problems we will consider will include failure detection, reaching consensus,
resource allocation, performing distributed work, clock synchronization,
and namespace management.
Since this will be a theory course, we will focus on algorithms for
which it is possible to make definite, precise claims about what they do.
These claims will include correctness, performance, and fault-tolerance
results. We will also consider impossibility results, especially those
that express inherent limitations on what can be accomplished in dynamic
settings. Some not-so-theoretical papers with interesting systems ideas
will also be covered; for these, we will discuss possibilities for future
related theoretical work. |