To implement a data abstraction we select a representation for the instances of the type, the "rep", and then implement operations in terms of that rep.
A rep is a not just a data structure, but also includes a set of conventions about that structure. The Rep Invariant describes the set of allowed values of the data structure, and the Abstraction Function shows how the structure is to be interpreted.
Keep in mind, both notions are for use by the implementer of the class, not the users, and so should not be included in any of the specifications.
/**
* A TinySet is a mutable set of integers drawn from the range [0..7]
* @specfield elements : subset of { n : integer | 0 <= n <= 7 }
*/
class TinySet {
private int bits;
// Rep Invariant:
// 0 <= bits <= 255
// Abstraction Function: a = AF(t)
// a.elements = { n | ((t.bits & (1 << n)) != 0) }
/**
* @effects Creates a new, empty TinySet (this.elements_post = {})
**/
public TinySet() {
this.bits = 0;
}
/**
* @requires 0 <= n <= 7
* @effects this.elements_post = this.elements U { n }
*/
public void add(int n) {
bits = bits | (1 << n);
}
/**
* @requires 0 <= n <= 7
* @effects this.elements_post = this.elements - { n }
*/
public void remove(int n) {
bits = bits & !(1 << n);
}
/**
* @requires 0 <= n <= 7
* @returns true iff n in this.elements
*/
public boolean contains(int n) {
return (bits & (1 << n)) != 0;
}
/**
* @requires other != null
* @modifies this
* @effects this.elements_post = this.elements ^ other.elements
**/
public void intersect(TinySet other) {
bits = bits & other.bits;
}
/**
* @requires other != null
* @modifies this
* @effects this.elements_post = this.elements U other.elements
**/
public void union(TinySet other) {
bits = bits | other.bits;
}
}
Since the number of elements in the set is small, we can represent the set as a list of bits, where a bit is on exactly when its corresponding element is in the set.
Since we have eight elements (0-7), we have 2^8 = 256 possible sets, which we number 0-255. We give a rep invariant that the value of the bits integer is 0-255, since these are the only values which are needed. We could consider allowing any int (and then just look at eight bits of it), but we should be better safe than sorry.
Now, say we look at some running program and find that the bits field is set to 86. What set is being represented? To answer this, we use the abstraction function, which says:
// a.elements = { n | ((t.bits & (1 << n)) != 0) }
Here, bits is 86 == 0x56 == 01010110, so the values of n where the
n'th bit of bits is true are: 2, 3, 5, 7 (the primes less than eight).
(Note that 86 satisfies the rep invariant; if it didn't, the AF would
not be applicable.)
From this example, you can see how a representation invariant gives a boolean condition which can be evaluated on some concrete state, and how an abstraction function defines the abstract value of an ADT using its internal representation.
What we really want to do is be able to reason about the implementation of a type. These are techniques for "getting it right" which can, and will save you effort down the road.
The rep invariant, together with the abstraction function, can help you perform modular reasoning. That is, you can consider the correctness of one method independent of the other methods. This helps you narrow the scope of what you must consider, and thus can make the task much simpler.
The rep invariant can also help you find identify errant behavior earlier, which always makes debugging problems easier.
The abstraction function helps you understand the value you are representing. If you are going to write a method which performs some operation, it will usually be easier to think about what abstract value you want as the result, and then to consider how your implementation can create that value - a question which necessarily uses the abstraction function to map a concrete state to an abstract value.
Is the mapping from concrete to abstract values always 1:1? No! In the TinySet example above it is 1:1, but this is not true in general. For example, a Set implemented via a Vector can have many concrete representations map to the same abstract value (e.g. [1 2] and [2 1] are both {1 2}). However, note that every concrete value (which satisfies the rep invariant) must map to exactly one abstract value.
Does the RI have to hold at private methods? Not necessarily; we only care that the RI is true upon return to calling code. For example, a private helper method may be used to operate on the rep when the invariant is false.
Do you have to consider observer methods in this induction? Yes, since some obeservers may have invisible side effects (a.k.a. benevolent side effects).
Is this approach sound? It's close, but not perfect, due to the chance of rep exposure.
If we return mutable objects to the outside world, they can be mutated without concern for our invariants. (Therefore: Often copy before return).
If we take in mutable objects, their value might change while we are containing them, thus causing an invariant to break. (Therefore: Often copy before storing).
If we return a built-in Iterator, it might have a built-in remove method (e.g. iterator() in Vector, or HashSet, or HashMap, or ...).
One solution to prevent returned objects from being modified would be to use methods such as unmodifiableCollection or unmodifiableList from the java.util.Collections class. These methods return a new List which is backed by the given argument, but which does not allow mutations (not even though the Iterator).
Why can't you use these wrappers to "protect" objects which you receive as arguments and then store?
Answer:
Even if you have a wrapped ("safe") version of an object, another piece of code might have a reference to the object itself (with no wrapper), and thus could modify the value.
Note that not all cases of returning or accepting mutable objects will lead to rep exposure. For example, if you were to write an ImmVector class, you could very well specify that the meaning of an element of the vector is its "reference" and not its "abstract value". Then, even if the contained objects were mutated, the abstract value of your ImmVector would still be the same.
How do we check the rep invariant?
The book differs from current 6.170 practice. Here is what we suggest:
1) Write a method with this signature: public void checkRep(). In the body of checkRep, test all of the rep invariants. If all invariants hold, do nothing. If an invariant is broken, throw some unchecked exception to indicate failure. The junit.framework.Assert class can make this method especially easy to write.
2) Place calls to checkRep (i) at the exit of every public constructor and (ii) at the entry and exit point of every public method.
In this way, you are asserting that the rep invariant holds whenever control enters any method of the ADT, and also whenever any method is about to return to the user of the class. If some bit of code violates the rep invariant, the error will pop up immediately in a predictable way, instead of at some undetermined point in the future.
Additionally, test drivers may want to call checkRep while performing tests. That way, even if the implementation is not calling checkRep at every point (e.g. for performance reasons), during testing we always want to verify the RI.
public class RatNum {
private int numer;
private int denom;
// ...
}
What were the AF and RI for this representation?
For a RatNum 'r':
AF(r) = / when (r.denom == 0) : NaN
\ when (r.denom != 0) : r.numer / r.denom
RI(r) = (r.denom >= 0) &&
(r.denom > 0) ==> (gcd(r.numer, r.denom) == 1)
What benefits did the second clause (reduced form) buy us?
What abstract values do the following concrete values represent?
Lessons:
/**
* @specfield cost : real // cost of traversing this Path
* @specfield elements : sequence // the nodes in this Path, from start to end
**/
public interface Path {
// ...
}
public class WeightedNodePath implements Path {
private final WeightedNode node;
private final WeightedNodePath path;
private final int cost;
// ...
}
Note that WNP inherits two specification fields from the Path interface: a real number called 'cost', and a sequence called 'elements'. There is a 'cost' field in the WNP implementation, but there is no 'elements' field in the implementation. What is going on?!?
The abstract state of the 'elements' sequence is not being stored "directly" in the WNP. Instead, we can define an abstraction function which maps the concrete state (concrete fields: node, path, cost) of a given WNP to it's abstract state (abstract fields: elements, cost).
Write an abstraction function which takes a WNP 'w' and produces it's abstract state 'a'. That is, write definitions for the abstract fields 'a.cost' and 'a.elements' in terms of the concrete fields 'w.cost', 'w.node', and 'w.path'.
One answer:
a.cost = w.cost
a.elements = F(w) / when (w.path == null) : [w.node]
\ when (w.path != null) : [w.node] + F(w.path)
What benefits did this representation yield?
Note that the specfield cost (of type real) is mapped to the implementation field cost (of type int). This is legal because, while the Path interface uses a real number for the cost, a WNP can only exist with integer cost (since WeightedNodes have a integer cost).
A Card represents a normal playing card. That is, it can have one of four suits {C, H, D, S} and one of thirteen ranks {A, 2, 3, 4, 5, 6, 7, 8, 9, 10, J, Q, K}. Therefore there, are 52 possible Cards. A CardSet is simply a set whose elements are Cards.
Choose at least two representations for both types, and write a rep invariant and abstraction function for each representation. What are advantages or disadvantages of your rep choices? (You should also define a few operations (methods) for each class, since the benefits and drawbacks of a rep choice often depend on what methods you define.)
Example answers:
/** Immutable type */
public class Card {
/** @returns number 0..51 which uniqely identifies this card */
public int ordinal();
// ...
}
/** Mutable type */
public class HashCardSet {
private Set mySet = new HashSet();
// AF(c) = { mySet.elements }
// RI(c) = (mySet != null) &&
// (for all elements e in mySet :
// (e != null) &&
// (e instanceOf Card)
// )
// ...
}
/** Mutable type */
public class BitCardSet {
private int[] bitSet = new int[] { 0, 0 };
// AF(c) = { card(n) | (c.bitSet[n & 1] & (1 << (n / 2)) != 0) }
// where card(n) represents the card of ordinal n
// RI(c) = (c.bitSet != null) &&
// (c.bitSet.length == 2) &&
// (0 <= c.bitSet[0] < 2^26-1) &&
// (0 <= c.bitSet[1] < 2^26-1)
// ...
}
HashCardSet advantages
HashCardSet disadvantages
BitCardSet advantages
BitCardSet disadvantages
public class RatPoly {
private RatTermVec terms;
// ...
}
The AF/RI for RatPoly are:
RI(p) =
p.terms != null &&
forall i such that (0 <= i < length(p), C(p,i) != 0 &&
forall i such that (0 <= i < length(p), E(p,i) >= 0 &&
forall i such that (0 <= i < length(p) - 1), E(p,i) > E(p, i+1)
AF(p) =
Sum, from i=0 to length(p)-1, of C(p,i)*x^E(p,i)
Which parts of the RI are core to the integrity of the rep, and could not be removed? Of the rest, why are they added? What operations do they make easier or harder?
Example answer:
It's hard to draw the line on what we mean by "core". What really matters is to talk about what is gained or lost by a certain definition of AF/RI.Some would say that "terms != null" is core, since a null field is meaningless. Others might argue that it could represent a special value -- the zero polynomial.
Many would say that "E(p,i) >= 0" is core, since our RatPoly class only talks about positive exponents. However, you could define your AF to say that negative terms are ignored. This would be considered bad practice, however.
To most, "C(p,i) != 0" is not core to the RI, but a choice by the implementer. The math operations (add, sub, mul, div, neg) would all still work without this, but operations such as degree, unparse, or equals would have special cases to check for zero coefficients.
The terms being sorted and with unique exponents makes many method implementations easier, and while not "core" to defining what reps might make sense in abstract terms, is quite justified to improve ease of implementation.
Example answer:
RatPolyStack is similar to WeightedNodePath.
Similarities:
- Both use a cons-like structure.
- Both store information which could be derived from a function over the elements (cost, size).
Differences:
- Cons is a separate class in RPS, but is part of WNP.
- The reason for cons structure is different. In RPS, it's due to locality of use (you only work with the top few elements on the stack). In WNP, it's to share rep when possible (to be memory-efficient).
Aside: Why is Cons a separate class in RPS, but not so in WNP? Could either one be written either way? Yes, it could be written either, but things make the most sense the way they've been done. In WNP, the "rest" (cdr) makes sense as an entity in and of itself. Indeed, most paths are created from some initial path, which can just be stored into the cdr. However, do all of the RPS operations (div, push, etc.) apply to the "rest" of the stack? Not really, so it makes sense to define the car-cdr structure externally.
/** Mutable type */
public class Matrix {
/** @effects creates a new (m x n) matrix where all the cells are null */
public Matrix(int m, int n);
/** @effects sets the (i,j)'th element of this to be o **/
public void set(int i, int j, Object o);
/** @effects returns the (i,j)'th element of this **/
public Object get(int i, int j);
// ...
}
What representations might you use? What if you knew about certain
usage properties, such as that most of the matrix contains nulls?
Example answer:
You could do the obvious thing:
private Object[][] elements;
This would be fairly easy to understand. The RI would include the statement that the second-level arrays are all the same size.
Getting a bit more complecated, you could use:
private Object[] elements;
private int columns;
Here, to get to (i, j) you lookup (columns * j + i). The RI would include the statement that elements.length is a multiple of columns.
Both of the previous solutions take up O(m*n) memory. However, if you know that the matrix was sparse, you can do better.
One example would be:
private HashMap elements;
The keys of the map would be some immutable type which represented the pair (i,j). This could even be java.lang.Integer, if you map a pair to an int as in the previous example. This would use memory only as cells are filled, but would take up more space per cell. The get and set methods are still constant-time.
Another rep would be like Object[][], but would only use cells when needed. This rep would start with a list of row-entries. Each entry would know what row it is, and would contain column entries. Each column entry would contain a column number and a value. You would consider using a linked-list rep for keeping the sequences of entries, to allow for efficient insertion and deletion. You could write this rep out as ((r1 (c1 5) (c7 3)) (r5 (c3 2))).
You could switch {row, column} in the above if you have usage properties indicating that values usually group by column instead. Also, some spare matrix implementations will have a master linked list by both column and by row, which would allow for efficient iteration in any direction.