Thursday, February 26, 2004
Each ADT should provide its own notion of what it means for two instances of that ADT to be equivalent. In general, you should never use the == operator to compare two object references, unless you are explictly trying to see if they are the exact same object in memory.
The equals() method specified in Object should be used to test for object equivalence. Each ADT you specify should override the equals() method, if needed, to provide a notion of equivalence for that ADT.
public boolean equals(Object obj)
The equals() method has the following specified requirements for non-null references:
x.equals(null) is always false.
x.equals(x) is always true.
x.equals(y) it must be that y.equals(x)
x.equals(y) and y.equals(z) then
it must be that x.equals(z)
x.equals(y) return the same
value unless one of the objects is mutated.
Note that this is quite an intentionally incomplete definition, and so the following meets the spec of being an equivalence relationship in Java, even though it does not seem particularly useful:
public boolean equals (Object obj) {
return (obj instanceof ThisClass);
}
In 6.170 we will be concerned with two notions of equivalence for objects:
Question: Which type of equivalence does Object.equals() implement?
Answer: Behavioural.
Question: Which type of equivalence does String.equals() implement?
Answer: Both.
It is worth noting that for immutable types, both observational and behavioural equivalence are the same, since there are no mutators. Consequently, only when we are implementing a mutable ADT do we have to choose between these two notions of equivalence.
Liskov recommends always using the equals method to provide a notion of behavioural equivalence for an ADT, while using an additional similar method to provide the ADT's notion of observational equivalence. In general, the Java API does not follow this guideline for equals and there is no similar method defined in Object.
The goal of hash tables is to let us store objects in such a way that we can perform (expected) constant time lookups later on. We start with an array of buckets. Note that it takes constant time to access any arbitrary bucket. A hash function is used to mathematically map an object to a bucket. The hash-function is a mapping from an object (such as a String) to a bucket number. When an object is being stored, it is mapped to the appropriate bucket and stored there. When we are looking up an object, we only need to look at the bucket identified by that object's hash. It is normal for more than one object to hash to the same bucket. This can be solved, by having a linked list for each bucket in the hash-table. If a good enough hash function is chosen these lists will be small for each bucket and the time to look up an object will be roughly constant.
Java avoids the theoretical issues associated with forming well-balanced hash-tables. The hashCode() method defined in Object lets any hash-table implementation map an object to a number. The hash-table code can then use an internal mapping to create the actual hash (i.e. bucket index) from this number.
Java really just has one main requirement for hashCode() implementations: whenever two objects are equal() they must have the same hashCode(). In general, whenever you override equals(), you will also need to override hashCode().
StringBuffer), other mutable ADT's use
observational equivalence(e.g. Date). In general, you should
look at the specs of the class you are using before relying on either
type of behaviour.
Question: What does the following print:
List arrayList = new ArrayList();
List linkedList = new LinkedList();
arrayList.add("a");
linkedList.add("A".toLowerCase());
arrayList.add("b");
linkedList.add("B".toLowerCase());
System.out.println(arrayList.equals(linkedList));
System.out.println(linkedList.equals(arrayList));
Answer: true
true
The classes in the Java collections framework consistently use the
notion of observational equivalence. This is why the
LinkedList above is equal to the
ArrayList. Furthermore, note that the references in the two
lists are not == to each other.
The Java implementations of the List interface can hold mutable objects without any complications. However, Set's are a special case. According to the specifications, the behaviour of Set implementations can be completely arbitrary if an element contained in the set is mutated.
Question: Can you guess what the following does?
Set set = new HashSet();
List list = new ArrayList();
list.add("6.170");
list.add("6.004");
list.add("6.003");
set.add(list);
set.contains(list); // what does this return?
list.add("6.033");
set.contains(list); // what does this return?
set.add(list);
System.out.println("set has " + set.size() + " elements");
Iterator iterator = set.iterator();
while (iterator.hasNext()) {
System.out.println(iterator.next());
}
Answer: The first contains() returns true,
the second returns false and the code prints out the following:
set has 2 elements [6.170 6.004 6.003 6.033] [6.170 6.004 6.003 6.004]
Why exactly does this fail? You need to know some details of the implementations of both ArrayList and HashSet to figure it out.
ArrayList.hashCode(): The returned hash-code is based on the hashCode()'s of all the elements in the list. The implementation in AbstractList actually iterates over the entire list adding up the hashcodes.
HashSet.contains(obj): Uses obj.hashCode() to map to a bucket in its internal hashtable. If it sees an object equal to obj, this method returns true, otherwise it returns false. Doesn't look anywhere else in the data structure.
HashSet.add(obj): Maps the object to a bucket and adds it to the bucket if it isn't already in that bucket.
It is the changed hashCode() value that are the problem. If we had added a StringBuffer to the set instead, mutated it, and added it again, the set would have worked fine, since StringBuffer uses Object.hashCode().
All of our work so far on specifications and Abstract Data Types has been motivated by our desire to break down our programs into modules that can be manipulated and reasoned about more easily. Some of the most important benefits of modularity (code reuse, division of labor, modular reasoning) cannot be realized if the modules are closely coupled, i.e. they depend on the implementation details of other modules in the system. When two modules depend on each other, directly or indirectly, they are said to be coupled. We want to minimize the degree of coupling in our system.
Specifications are an important tool to decouple code. With well-defined specifications, modules do not depend on the implementation details of each other, only on the details of the specifications at the module boundaries. Other forms of dependencies are still possible. In 6.170 we use Module Dependency Diagrams (MDD's) to illustrate common dependency relationships between a program's modules. Such diagrams often help us reason about how to improve our design to make our modules less closely coupled.
The following figure shows a partial MDD for the poker game code from
problem set 1. This MDD is not complete since it does not contain any
references to Java API classes (such as ArrayList and
ImageIcon) and only contains the classes for whose source
code we provided you.
Most of the MDD's we will see in 6.170 will be for Java code, showing the dependancies between actual Java classes, but it is possible to produce more general MDDs. For instance, the MDD below is the package-level MDD for the poker game of PS1. Each box represents a package in this diagram.
Question: Why is there a weak dependency between the player and playingcards packages?
Answer: No class in player ever calls a method
for Card (or any other class in
playingcards). However, ComputerPlayer picks
the three lowest ranked cards (using the Comparable
specs) and discards them (using the methods in Hand). The
weak dependence is due to a dependence on the existence of
Card but not on any of the specifications of
Card.
The elements of a MDD are:
Comparable is an
example of a specification.
Card's relationship
with Comparable is an example of this.
Additional information about MDD's can be found in Ch. 13 of the Liskov text and the 6.170 lecture notes.
Apart from well-defined specifications, techniques such as reliance on partial specifications (such as interfaces) and callbacks can be used to reduce the decouple modules from each other.
The situation is not so simple for an entire program, which may contain many classes. How can the state for an entire program be modeled? One way is to use an object model.
An object model consists of a graph and a textual description. The graph contains nodes and edges, the nodes representing the kinds of objects in the program, and the edges representing relations between them. Specifically, each node is a named set of instances. In the model, we draw a box for each node and a directed arrow for each edge.
The textual description contains definitions of the nodes and relations, and also definitions of derived relations and additional constraints.
Relations should indicate their multiplicity. A relation maps to a set; it maps from a single item in the source set (node) to a set of items in the target set (node). The relation's multiplicity defines how many items are in the set. This is indicated in the diagram by annotating relation arrows with the symbols *, +, ?, or !:
These symbols must appear on both ends of the arrow. Thus, the symbol on the source end (m) indicates how many source items a particular target item maps to, and the symbol on the target end (n) indicates how many target items a particular source item maps to.
Example:
Nodes are:
Question: How would you change the relation to indicate that an employee can work for more than one company.
Answer: Change the '!' to a '+'.
Exercise
Consider, for example, a manufacturing scheduling application. The text description for nodes and relations is as follows.
Nodes would include:
Answer:
An operation consumes zero or more resources as the operation begins, and "produces"
one or more resources as the operation ends. For example, a dough mixing operation might
consume flour, milk, and sugar resources, and might produce a cookie dough resource. It
might also require a mixing machine, and a human operator for the operation. These are
also modelled as resources; the mixing machine and human operator are "consumed" at the
start of the operation, and then "produced" at the finish.
Question: Fill in the relation annotations in the above diagram.
Answer:
Thus, an operation always has one start time, one finish time, and one duration. It can have zero or more input resource requirements, and one or more output resource requirements (it should have at least one because it is supposed to produce something).
Additional constraints can be added to the text description. For example, the duration will typically depend on the type of operation. Details about what kinds of resources can fulfill a resource requirement will also depend on the type of operation. There may be sequencing constraints between operations (the finish time for DoughMixing should be less than the start time for Baking).
Additional information about object models can be found in Ch. 12 of the Liskov text and the 6.170 lecture notes.
The object models above are abstract representations of the problem space, where we have broken down the problem into sets of objects and the relations we expect between them. More concrete object models are also possible, where the model is in terms of the actual Java classes and the relations that connect those classes.
Exercise: Construct the concrete object model for Card and Deck.
Notice the textual constraint for Deck since we could
not represent this information using only our object model notation.