Massachusetts Institute of Technology
6.852 Distributed Algorithms
Prof. Nancy Lynch September 10, 2015

6.852 Supplementary Reading List, Version 1


1. Books and monographs

[1] Hagit Attiya and Jennifer Welch. Distributed Computing: Fundamentals, Simulations, and Advanced Topics. John Wiley and Sons, Inc., 2004. Second Edition.

[2] Shlomi Dolev. Self-Stabilization. MIT Press, 2000.

[3] David Peleg. Distributed Computing: A Locality-Sensitive Approach. SIAM, 2000.

Morgan & Claypool publishers has a monograph series on various topics in distributed computing theory. These are downloadable for free from any MIT web site. Some interesting ones are listed below, but you should browse the entire list of titles in the series from the Morgan & Claypool web site, www.morganclaypool.com.

[4] Hagit Attiya and Faith Ellen. Impossibility Results for Distributed Computing. Synthesis Lectures on Distributed Computing Theory, Morgan & Claypool Publishers, 2014. .pdf

[5] Leonid Barenboim and Michael Elkin. Distributed Graph Coloring: Fundamentals and Recent Developments. Synthesis Lectures on Distributed Computing Theory, Morgan & Claypool Publishers, 2013. .pdf

[6] Michel Raynal. Fault-Tolerant Agreement in Synchronous Message-passing Systems. Synthesis Lectures on Distributed Computing Theory, Morgan Claypool Publishers, 2010. .pdf

[7] Michel Raynal. Communication and Agreement Abstractions for Fault-tolerant Asynchronous Distributed Systems. Synthesis Lectures on Distributed Computing Theory, Morgan Claypool Publishers, 2010. .pdf

[8] Dilsun K. Kaynar, Nancy Lynch, Roberto Segala, and Frits Vaandrager. The Theory of Timed I/O Automata, Second Edition. Synthesis Lectures on Distributed Computing Theory, Morgan Claypool Publishers, 2010. .pdf

[9] Jennifer L. Welch and Jennifer E. Walter. Link Reversal Algorithms. Synthesis Lectures on Distributed Computing Theory, Morgan & Claypool Publishers, 2011. .pdf

[10] Chryssis Georgiou and Alexander A. Shvartsman. Cooperative Task-Oriented Computing: Algorithms and Complexity. Synthesis Lectures on Distributed Computing Theory, Morgan & Claypool Publishers, 2011. .pdf

[11] Marko Vukolic. Quorum Systems: With Applications to Storage and Consensus. Synthesis Lectures on Distributed Computing Theory, Morgan & Claypool Publishers, 2012. .pdf

2. Dijkstra Prize papers

In each of the years 2000-2015, a prize has been awarded to a research paper that has had a strong impact on research in the area of distributed computing. We will study the key contributions of many of these papers this semester. In case you want to read the original papers for yourselves, here is the complete list:

[12] Leslie Lamport. Time, clocks, and the ordering of events in a distributed system. Communications of the ACM, 21(7):558-565, July, 1978. .pdf

[13] Michael J. Fischer, Nancy A. Lynch, and Michael S. Paterson. Impossibility of distributed consensus with one faulty process. Journal of the ACM, 32(2):374-382, April, 1985. .pdf

[14] Edsger W. Dijkstra. Self-stabilizing systems in spite of distributed control. Communications of the ACM, 17(11):643--644, November, 1974. .pdf

[15] Maurice Herlihy. Wait-free synchronization. ACM Transactions on Programming Languages and Systems, 13(1):124--149, January, 1991. .pdf

[16] Robert G. Gallager, Pierre A. Humblet, and Phillip M. Spira. A distributed algorithm for minimum-weight spanning trees. ACM Transactions on Programming Language Systems, vol. 5, pages 66--77, 1983. .pdf

[17] Marshall Pease, Robert Shostak, and Leslie Lamport. Reaching agreement in the presence of faults. Journal of the ACM, 27(2):228-234, April, 1980. .pdf

[18] John M. Mellor-Crummey and Michael L. Scott. Algorithms for scalable synchronization on shared-memory multiprocessors. ACM Transactions on Computer Systems, 9(1):21-65, February, 1991. .pdf

[19] Cynthia Dwork, Nancy Lynch, and Larry Stockmeyer. Consensus in the presence of partial synchrony. Journal of the ACM, 35(2):288--323, April 1988. .pdf

[20] Baruch Awerbuch and David Peleg. Sparse partitions. In Proceedings of the 31st Annual Symposium on Foundations of Computer Science (FOCS), pages 503-513, October 1990. .pdf

[21] Joseph Halpern and Yoram Moses. Knowledge and common knowledge in a distributed environment. Journal of the ACM, 37(3):549-587, July 1990. .pdf

[22] Tushar Deepak Chandra and Sam Toueg Unreliable failure detectors for reliable distributed systems. Journal of the ACM, 43(2):225-267, 1996. .pdf

[23] Tushar D. Chandra, Vassos Hadzilacos, and Sam Toueg. The weakest failure detector for solving consensus. Journal of the ACM, 43(4):685-722, 1996. .pdf

[24] Hagit Attiya, Amotz Bar-Noy, and Danny Dolev. Sharing memory robustly in message-passing systems. Journal of the ACM, 42(1):124-142, 1995. .pdf

[25] Maurice Herlihy and J. Eliot B. Moss. Transactional memory: architectural support for lock-free data structures. Proceedings of the 20th Annual International Symposium on Computer Architecture, pages 289-300, May 1993. .pdf

[26] Nir Shavit and Dan Touitou. Software transactional memory. Distributed Computing, 10(2):99-116, February 1997. .pdf

[27] Nati Linial. Locality in distributed graph algorithms. SIAM Journal of Computing, 21(1):193-201, February 1992. .pdf

[28] Kanianthra Mani Chandy and Leslie Lamport. Distributed Snapshots: Determining Global States of Distributed Systems. In ACM Transactions on Computer Systems, 3(1):63-75, 1985. .pdf

[29] Michael Ben-Or. Another Advantage of Free Choice: Completely Asynchronous Agreement Protocols. In Proceedings of the Second ACM Symposium on Principles of Distributed Computing, pages 27-30, 1983. .pdf

[30] Michael O. Rabin. Randomized Byzantine Generals. In Proceedings of the Twenty-Fourth IEEE Annual Symposium on Foundations of Computer Science, pages 403-409, 1983. .pdf

3. Synchronous networks

[31] David Peleg. Distributed Computing: A Locality-Sensitive Approach. SIAM, 2000.

[32] Mohsen Ghaffari and Stephan Holzer. 6.S899 Distributed Graph Algorithms: Fall 2014. Course Notes

[33] Robert G. Gallager, Pierre A. Humblet, and Phillip M. Spira. A distributed algorithm for minimum-weight spanning trees. ACM Transactions on Programming Language Systems, vol. 5, pages 66--77, 1983. .pdf

[34] Michael Luby. A simple parallel algorithm for the maximal independent set problem. SIAM Journal of Computing, 15(4):1036-1053, November, 1986. .pdf

[35] Yves Metivier, John Michael Robson, Nasser Saheb-Djahromi and Akka Zemmari. An optimal bit complexity randomized distributed MIS algorithm. Distributed Computing, 23(5-6):331-340, April, 2011. .pdf

[36] Shay Kutten and David Peleg. Fast distributed construction of k-dominating sets and applications. In Proceedings of the 14th Annual ACM Symposium on Principles of Distributed Computing, pages 238-251, 1995. .pdf

[37] Marshall Pease, Robert Shostak, and Leslie Lamport. Reaching agreement in the presence of faults. Journal of the ACM, 27(2):228-234, April, 1980. .pdf

[38] Marcos Kawazoe Aguilera and Sam Toueg. A simple bivalency proof that $t$-resilient consensus requires t+1 rounds. Information Processing Letters, 71(3-4):155-158, August 1999. .pdf

[39] Idit Keidar and Sergio Rajsbaum. A simple proof of the uniform consensus synchronous lower bound. In Information Processing Letters (IPL), 85(1), pages 47-52, January 2003. .pdf

4. Asynchronous networks without failures

[40] Nancy Lynch and Mark Tuttle. An Introduction to Input/Output Automata. CWI-Quarterly, 2(3):219--246, September 1989. .pdf

[41] Robert G. Gallager, Pierre A. Humblet, and Phillip M. Spira. A distributed algorithm for minimum-weight spanning trees. ACM Transactions on Programming Language Systems, vol. 5, pages 66--77, 1983. .pdf

[42] Awerbuch, Baruch. Complexity of network synchronization. Journal of the ACM, 32(4):804-823, 1985. .pdf

[43] Leslie Lamport. Time, clocks, and the ordering of events in a distributed system. Communications of the ACM, 21(7):558-565, July, 1978. .pdf

[44] Friedemann Mattern. Virtual time and global states of distributed systems. In Michel Cosnard et~al., editors, Parallel and Distributed Algorithms: Proceedings of the International Workshop on Parallel and Distributed Algorithms, (Chateau de Bonas, Gers, France, October, 1988), pages 215--226. North Holland, 1989. Reprinted in: Z. Yang, T.A. Marsland (Eds.), "Global States and Time in Distributed Systems", IEEE, 1994, pages 123-133. .pdf

[45] Kanianthra Mani Chandy and Leslie Lamport. Distributed Snapshots: Determining Global States of Distributed Systems. In ACM Transactions on Computer Systems, 3(1):63-75, 1985. .pdf

5. Asynchronous shared memory without failures

[46] Edsger W. Dijkstra. Solution of a problem in concurrent programming control. Communications of the ACM 8(9), 1965. .pdf

[47] Edsger W. Dijkstra. Hierarchical ordering of sequential processes. Acta informatica 1(2):115-138, 1971. .pdf

6. Asynchronous shared memory with failures

[48] Michael C. Loui and Hosame H. Abu-Amara. Memory requirements for agreement among unreliable asynchronous processes. Advances in Computing Research, 4:163-183, 1987. MIT Library link

[49] Maurice Herlihy. Wait-free synchronization. ACM Transactions on Programming Languages and Systems, 13(1):124--149, January 1991. .pdf

[50] Prasad Jayanti. Robust wait-free hierarchies. Journal of the ACM, 44(4): 592-614, 1997. .pdf

[51] Soma Chaudhuri. More choices allow more faults: set consensus problems in totally asynchronous systems. Information and Computation, 105(1):132-158, July 1993. .pdf

[52] Maurice Herlihy and Nir Shavit. The topological structure of asynchronous computability. Journal of the ACM, 46(6):858-923, 1999. .pdf

[53] Michael E. Saks and Fotios Zaharoglu. Wait-free k-set agreement is impossible: The topology of public knowledge. SIAM Journal of Computing, 29(5):1449-1483, 2000. .pdf

[54] Tushar Deepak Chandra, Vassos Hadzilacos, Prasad Jayanti, and Sam Toueg. Generalized irreducibility of consensus and the equivalence of $t$-resilient and wait-free implementations of consensus. SIAM Journal on Computing, 34(2):333-357, 2004. .pdf

[55] Elizabeth Borowsky, Eli Gafni, Nancy Lynch, and Sergio Rajsbaum. The BG distributed simulation algorithm. Distributed Computing, 14:127-146, 2001. .pdf

[56] Paul Attie, Rachid Guerraoui, Petr Kouznetsov, Nancy Lynch, and Sergio Rajsbaum. The impossibility of boosting distributed service resilience. Information and Computation, 209(6):927-950, June 2011. .pdf

[57] Hagit Attiya, Amotz Bar-Noy, and Danny Dolev. Sharing memory robustly in message-passing systems. Journal of the ACM, 42(1):124-142, 1995. .pdf

7. Asynchronous networks with failures

[58] Michael J. Fischer, Nancy A. Lynch, and Michael S. Paterson. Impossibility of distributed consensus with one faulty process. Journal of the ACM, 32(2):374-382, April, 1985. .pdf

[59] Leslie Lamport. The part-time parliament. ACM Transactions on Computer Systems , 16(2):133--169, May 1998. Also Research Report 49, Digital Equipment Corporation Systems Research Center, Palo Alto, CA, September 1989. .pdf and .pdf

[60] Tushar Deepak Chandra and Sam Toueg Unreliable failure detectors for reliable distributed systems. Journal of the ACM, 43(2):225-267, 1996. .pdf

[61] Tuchar D. Chandra, Vassos Hadzilacos, and Sam Toueg. The weakest failure detector for solving consensus. Journal of the ACM, 43(4):685-722, 1996. .pdf.

[62] Alejandro Cornejo, Nancy Lynch, and Srikanth Sastry. Asynchronous Failure Detectors. Technical Report, MIT-CSAIL-TR-2013-025, Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, October 2013. This report supersedes MIT-CSAIL-TR-2013-002. .pdf

[63] Scott M. Pike, Yantao Song, and Srikanth Sastry. Wait-Free Dining Under Eventual Weak Exclusion. In S. Rao, M. Chatterjee, P. Jayanti, C. Siva Ram Murthy, S. Kumar Saha, editors, Distributed Computing and Networking, 9th International Conference (ICDCN 2008), volume 4904, of Lecture Notes in Computer Science, pages 135-146, 2008. Springer. .pdf

[64] Srikanth Sastry, Scott M. Pike, and Jennifer L. Welch. The Weakest Failure Detector for Wait-Free Dining under Eventual Weak Exclusion. Proceedings of the 21st ACM Symposium on Parallelism in Algorithms and Architectures (SPAA), Calgary, Canada, pages 111-120, August 2009. .pdf

[65] Nancy Lynch and Srikanth Sastry. Consensus using Asynchronous Failure Detectors. Technical Report MIT-CSAIL-TR-2015-006, Computer Science and Artificial Intelligence Laboratory, March 2015. .pdf

8. Self-stabilization

[66] Shlomi Dolev. Self-Stabilization. MIT Press, 2000.

[67] Edsger W. Dijkstra. Self-stabilizing systems in spite of distributed control. Communications of the ACM, 17(11):643--644, November 1974. .pdf

[68] George Varghese. Self-Stabilization by Local Checking and Correction. Ph.D. Thesis, Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA 02139, October 1992. Also, MIT/LCS/TR-583. .pdf

9. Dynamic distributed algorithms

[69] Seth Gilbert, Nancy Lynch, and Alex Shvartsman. RAMBO: A robust, reconfigurable atomic memory service for dynamic networks. Distributed Computing, 23(4):225-272, December 2010. .pdf. This is a slightly corrected version of the journal version, .pdf. The changes are explained in the following errata sheet .pdf

[70] Shlomi Dolev, Seth Gilbert, Nancy A. Lynch, Alex A. Shvartsman, and Jennifer L. Welch. GeoQuorums: Implementing Atomic Memory in Mobile Ad Hoc Networks. Distributed Computing, 18(2):125-155, November 2005. Special Issue DISC 2003. .pdf

[71] Tina Nolte. Virtual Stationary Timed Automata for Mobile Networks. Ph.D. Thesis, Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA, February, 2009. .pdf

[72] Fabian Kuhn, Nancy Lynch and Rotem Oshman. Distributed computation in dynamic networks. In Proceedings of the 42nd ACM Symposium on Theory of Computing (STOC 2010), pages 513-522,Cambridge, MA, June 2010. Also, Technical Report MIT-CSAIL-TR-2009-058, MIT Computer Science and Artificial Intelligence Laboratory, Cambridge, MA, November 2009. .pdf and .pdf

[73] Alejandro Cornejo, Anna Dornhaus, Nancy Lynch, and Radhika Nagpal. Task allocation in ant colonies. Distributed Computing, 2014. 46-60. .pdf

[74] Christoph Lenzen, Nancy Lynch, Calvin Newport, and Tsvetomira Radeva. Trade-offs between Selection Complexity and Performance when Searching the Plane without Communication. In Proceedings of the 33nd Annual ACM Symposium on Principles of Distributed Computing, pages 252-261, 2014. .pdf

[75] Mohsen Ghaffari, Cameron Musco, Tsvetomira Radeva, and Nancy Lynch. Distributed House-Hunting in Ant Colonies. In Proceedings of 34th Annual ACM Symposium on Principles of Distributed Computing, pages 57-66, 2015. .pdf