Massachusetts Institute of Technology
Department of Electrical Engineering and Computer Science

6.826 — Principles of Computer Systems

Handout 25 April 23, 1997

________________________________________________________________________

Reliable Messages

Note: This web page was converted automatically from a Word original. There may be problems with the formatting and the pictures. To see the intended form, look for the Postscript (.ps), Acrobat (.pdf), or Word (.doc) versions.

 

The attached paper on reliable messages is Chapter 10 from the book Distributed Systems: Architecture and Implementation, edited by Sape Mullender, Addison-Wesley, 1993. It contains a careful and complete treatment of protocols for ensuring that a message is delivered at most once, and that if there are no serious failures it is delivered exactly once and its delivery is properly acknowledged.

 

Chapter 10

Reliable Messages and
Connection Establishment

Butler W. Lampson

 

10.1 Introduction

Given an unreliable network, we would like to reliably deliver messages from a sender to a receiver. This is the function of the transport layer of the ISO seven-layer cake. It uses the network layer, which provides unreliable message delivery, as a channel for communication between the sender and the receiver.

Ideally we would like to ensure that

• messages are delivered in the order they are sent,

• every message sent is delivered exactly once, and

• an acknowledgement is returned for each delivered message.

Unfortunately, it’s expensive to achieve the second and third goals in spite of crashes and an unreliable network. In particular, it’s not possible to achieve them without making some change to stable state (state that survives a crash) every time a message is received. Why? When we receive a message after a crash, we have to be able to tell whether it has already been delivered. But if delivering the message doesn’t change any state that survives the crash, then we can’t tell.

So if we want a cheap deliver operation which doesn’t require writing stable state, we have to choose between delivering some messages more than once and losing some messages entirely when the receiver crashes. If the effect of a message is idempotent, of course, then duplications are harmless and we will choose the first alternative. But this is rare, and the latter choice is usually the lesser of two evils. It is called ‘at-most-once’ message delivery. Usually the sender also wants an acknowledgement that the message has been delivered, or in case the receiver crashes, an indication that it might have been lost. At-most-once messages with acknowledgements are called ‘reliable’ messages.

There are various ways to implement reliable messages. An implementation is called a ‘protocol’, and we will look at several of them. All are based on the idea of tagging a message with an identifier and transmitting it repeatedly to overcome the unreliability of the channel. The receiver keeps a stock of good identifiers that it has never accepted before; when it sees a message tagged with a good identifier, it accepts it, delivers it, and removes that identifier from the good set. Otherwise, the receiver just discards the message, perhaps after acknowledging it. In order for the sender to be sure that its message will be delivered rather than discarded, it must tag the message with a good identifer.

What makes the implementations tricky is that we expect to lose some state when there is a crash. In particular, the receiver will be keeping track of at least some of its good identifiers in volatile variables, so these identifiers will become bad at the crash. But the sender doesn’t know about the crash, so it will go on using the bad identifiers and thus send messages that the receiver will reject. Different protocols use different methods to keep the sender and the receiver more or less in sync about what identifiers to use.

In practice reliable messages are most often implemented in the form of ‘connections’. The idea is that a connection is ‘established’, any amount of information is sent on the connection, and then the connection is ‘closed’. You can think of this as the sending of a single large message, or as sending the first message using one of the protocols we discuss, and then sending later messages with increasing sequence numbers. Usually connections are full-duplex, so that either end can send independently, and it is often cheaper to establish both directions at the same time. We ignore all these complications in order to concentrate on the essential logic of the protocols.

What we mean by a crash is not simply a failure and restart of a node. In practice, protocols for reliable messages have limits, called ‘timeouts’, on the length of time for which they will wait to deliver a message or get an ack. We model the expiration of a timeout as a crash: the protocol abandons its normal operation and reports failure, even though in general it’s possible that the message in fact has been or will be delivered.

We begin by writing a careful specification S for reliable messages. Then we present a ‘lower-level’ spec D in which the non-determinism ;associated with losing messages when there is a crash is moved to a place that is more convenient for implementations. We explain why D implements S but don’t give a proof, since that requires techniques beyond the scope of this chapter. With this groundwork, we present a generic protocol G and a proof that it implements D. Then we describe two protocols that are used in practice, the handshake protocol H and the clock-based protocol C, and show how both implement G. Finally, we explain how to modify our protocols to work with finite sets of message identifiers, and summarize our results.

The goals of this chapter are to:

• Give a simple, clear, and precise specification of reliable message delivery in the presence of crashes.

• Explain the standard handshake protocol for reliable messages that is used in TCP, ISO TP4, and many other widespread communication systems, as well as a newer clock-based protocol.

• Show that both protocols can be best understood as special cases of a simpler, more general protocol for using identifiers to tag messages and acknowledgements for reliable delivery.

• Use the method of abstraction functions and invariants to help in understanding these three subtle concurrent and fault-tolerant algorithms, and in the process present all the hard parts of correctness proofs for all of them.

• Take advantage of the generic protocol to simplify the analysis and the arguments.

10.1.1 Methods

We use the definition of ‘implements’ and the abstraction function proof method explained in Chapter 3. Here is a brief summary of this material.

Suppose that X and Y are state machines with named transitions called actions; think of X as a specification and Y as an implementation. We partition the actions of X and Y into external and internal actions. A behavior of a machine M is a sequence of actions that M can take starting in an initial state, and an external behavior of M is the subsequence of a behavior that contains only the external actions. We say Y implements X iff every external behavior of Y is an external behavior of X. This expresses the idea that what it means for Y to implement X is that from the outside you don’t see Y doing anything that X couldn’t do.

The set of all external behaviors is a rather complicated object and difficult to reason about. Fortunately, there is a general method for proving that Y implements X without reasoning explicitly about behaviors in each case. It works as follows. First, define an abstraction function f from the state of Y to the state of X. Then show that Y simulates X:

1. f maps an initial state of Y to an initial state of X.

2. For each Y-action and each reachable state y there is a sequence of X-actions (perhaps empty) that is the same externally, such that the following diagram commutes.