Contents:
In this recitation, we discuss the implementation of an Abstract Data Type (ADT). Data abstraction is a type of abstraction used to hide (abstract away) the details of data representation, and is a core concept in object-oriented programming. Data abstraction is also a specification mechanism -- a way to think about programs and designs.
To implement a data abstraction, we select a representation for the instances of the data type, the "rep", and then implement operations in terms of that rep.
A rep is a not just a data structure, but also includes a set of conventions about that structure. The Rep Invariant describes the set of allowed values of the data structure, and the Abstraction Function shows how the structure is to be interpreted.
IMPORTANT NOTE: Keep in mind, both notions are for use by the implementer of the class, not the clients, and so they should not be included in any of the public specifications (the Javadoc).
private final int numer;
private final int denom;
/** @effects If d = 0, constructs a new RatNum = NaN. Else
constructs a new RatNum = (n / d).
*/
public RatNum(int n, int d) {
// special case for zero denominator; gcd(n,d) requires d != 0
if (d == 0) {
numer = n;
denom = 0;
} else {
// reduce ratio to lowest terms
int g = gcd(n,d);
n = n / g;
d = d / g;
if (d < 0) {
numer = -n;
denom = -d;
} else {
numer = n;
denom = d;
}
}
checkRep();
}
Recall that this implementation of rational numbers requires that the numbers are stored in reduced form. Therefore, creating RatNum(4,2) and RatNum(8,4) will both store 2 in the numer field and 1 in the denom field. This lowest terms requirement should be communicated through the Representation Invariant for the RatNum class.
Here is one possible answer for the AF and RI of RatNum:
Abstraction Function - AF(r):
A RatNum r is NaN if r.denom = 0, (r.numer / r.denom) otherwise.
(An abstraction function explains what the state of the fields in a
RatNum represents. In this case, a rational number can be
understood as the result of dividing two integers, or not-a-number
if we would be dividing by zero.)
Representation Invariant - RI(r):
(r.denom >= 0) &&
(r.denom > 0 ==> there does not exist integer i > 1 such that
r.numer mod i = 0 and r.denom mod i = 0;)
In other words,
* r.denom is always positive.
* r.numer/r.denom is in reduced form (assuming r.denom is not zero).
What abstract values do the following concrete values represent?
The main disadvantage of an immutable class is efficiency. This is because an instance cannot be mutated via a mutator method; instead a new instance of the same type must be created using a producer method. However, for many ADTs, the benefits of being able to write more reliable code using immutable classes outweigh the drawback of performance efficiency.
We strongly suggest that, whenever possible, you should make your classes immutable (See Item 13: Favor Immutability on P.63 of Effective Java for details). If you ever need to make a class mutable, try to restrict the ways in which it can be mutated. Examples of ADTs that are usually better-implemented as mutable classes are collections such as sets or lists.
Here is how you can ensure that your class is immutable (adapted from P.63 of Effective Java):What we really want to do is be able to reason about the implementation of a type. These are techniques for "getting it right" which can, and will save you effort down the road.
The rep invariant, together with the abstraction function, can help you perform modular reasoning. That is, you can consider the correctness of one method independent of the other methods. This helps you narrow the scope of what you must consider, and thus can make the task much simpler.
The rep invariant can also help you identify errant behavior earlier, which makes debugging problems easier.
The abstraction function helps you understand the value you are representing. If you are going to write a method which performs some operation, it will usually be easier to think about what abstract value you want as the result, and then to consider how your implementation can create that value - a question which necessarily uses the abstraction function to map a concrete state to an abstract value.
Is the mapping from concrete to abstract values one-to-one? No! In the RatNum example, it is not one-to-one since any concrete instance with r.denom = 0 will map to the abstract NaN. As another example, a Set implemented via a Vector can have many concrete representations map to the same abstract value (e.g. [1 2] and [2 1] are both {1 2}). However, note that every concrete value (which satisfies the rep invariant) must map to exactly one abstract value.
Does the RI have to hold at private methods? Not necessarily; we only care that the RI is true when the program returns to the caller of a class's (public) methods. For example, a private helper method may be called by a public method to operate on the rep. and temporarily make the invariant false, so long as the invariant is true again by the time the public method finishes.
It's a great idea to check the rep invariant at run time whenever possible. You should do this especially when it's inexpensive, and especially if you're developing and debugging your class.
How do we check the rep invariant?
In this way, you are asserting that the rep invariant holds whenever control enters any method of the ADT, and also whenever any method is about to return to the user of the class. If some bit of code violates the rep invariant, the error will pop up immediately in a predictable way, instead of at some undetermined point in the future.
Data encapsulation allows you to change your class's implementation without requiring clients of your class to change their code. Let's take a look at the ImmutableDate class, an Abstract Data Type which allows us to store and access calendar dates.
public class ImmutableDate {
// Internal representation
private int month;
private int day;
private int year;
public ImmutableDate(int month, int day, int year) {
this.month = month;
this.day = day;
this.year = year;
}
public int getMonth() {
return month;
}
public int getDay() {
return day;
}
public int getYear() {
return year;
}
}
This is a pretty straightforward class. To add a little twist, let's say that ImmutableDate can only store dates that are in 1980 or later. Can we write a Representation Invariant for this class? How about an Abstraction Function?
AF(d) = d is the specific date month/day/year
RI(d) = (d.month >= 1) && (d.month <= 12) &&
(d.year >= 1980) && (d.day is valid, based on what d.month is ...)
Why do we have that getMonth() method? Why not just make month a public final variable and let everyone work with it directly? After all, the class is still immutable. One of the dangers of doing this is that it exposes your internal representation. Now everyone knows that you're using three ints to represent the month, day, and year. This may not seem like a huge problem, since obviously a date is useless unless you can retrieve the month, day, and year. But having public variables like this insidiously locks you into one specific implementation for ImmutableDate.
Let's say today you wrote ImmutableDate using the public final variables. Because this date class is amazingly useful and novel, you gave the code to all your friends. Now, let's say next week you discover that Unix stores dates as the number of seconds since 1/1/1970. If you wanted to change your internal representation to "The UNIX Way", you'd run into problems. All your friends are using those public final variables, so you can't just get rid of them. By choosing to expose your internal representation of the date, you've locked yourself into one particular implementation, and now it's nearly impossible to change how you store a date.
On the other hand, you could've "hidden" your implementation using the getMonth() method. Sure, it looked like a trivial method when you stored the month as an int, but what happens when you want to switch to the Unix format? You can remove the month, day, and year variables, replacing them with private int seconds. Then, you can just go through and change your getMonth() method to convert seconds into the right month and output the appropriate int. You've managed to change your internal representation, and your friends won't even notice!
Representation exposure (a.k.a. 'rep. exposure') occurs when an
implementor of a class inadvertently allows outside parties to gain
access to the internal data structures of the class. One of the
powers of data abstraction is the ability to encapsulate data in a
black box and only have clients manipulate the data via methods
defined by the specification. Rep. exposure breaks this data
encapsulation, thus allowing outside parties to destroy the
representation invariants that are crucial for the class to function
properly as an ADT. Rep. exposure is dangerous and can happen in
subtle ways. Let's now see an example using the Period class:
import java.util.Date; // A mutable class with these methods:
// Date()
// - Allocates a Date object and initializes it so that it represents
// the time at which it was allocated, measured to the nearest
// millisecond.
// Date(long date)
// - Allocates a Date object and initializes it to represent the
// specified number of milliseconds since the standard base time known
// as "the epoch", namely January 1, 1970, 00:00:00 GMT.
// long getTime()
// - Returns the number of milliseconds since January 1, 1970, 00:00:00
// GMT represented by this Date object.
// void setTime(long time)
// - Sets this Date object to represent a point in time that is time
// milliseconds after January 1, 1970 00:00:00 GMT.
/**
* The Period class is an immutable ADT that represents a time
* interval between two points in time.
*
* @specfield start : date
* @specfield end : date
* @specfield duration : integer
*/
public final class Period {
private final Date start;
private final Date end;
private final long duration;
// RI(r) = ???
//
// AF(r) = ???
//
/**
* @effects returns a new Period instance
* @throws IllegalArgumentException if start is after end.
* @throws NullPointerException if start or end is null.
*/
public Period(Date start, Date end) {
if (start.compareTo(end) > 0) {
throw new IllegalArgumentException(start + " after " + end);
}
this.start = start;
this.end = end;
this.duration = end.getTime() - start.getTime();
checkRep();
}
/**
* @return the starting date of this period.
*/
public Date start() {
checkRep();
return start;
}
/**
* @return the ending date of this period.
*/
public Date end() {
checkRep();
return end;
}
/**
* @return the duration of this period in milliseconds.
*/
public long duration() {
checkRep();
return duration;
}
private void checkRep() {
if (start == null || end == null)
throw new RuntimeException("endpoint of interval is null.");
if (start.getTime() > end.getTime())
throw new RuntimeException("Illegal state.");
}
}
RI(r) = (r.end.getTime() >= r.start.getTime()) &&
((r.end.getTime() - r.start.getTime()) == r.duration)
AF(r) = A Period r represents the time interval from r.start to r.end
with duration r.duration.
Date start = new Date();
Date end = new Date(start.getTime() + 1000);
Period p = new Period(start, end);
end.setTime(start.getTime() - 1000);
System.out.println("Period p has duration of " + p.duration() + " ms.");
//repaired constructor - make defensive copies of parameters
public Period(Date start, Date end) {
this.start = new Date(start.getTime());
this.end = new Date(end.getTime());
if (this.start.compareTo(this.end) > 0)
throw new IllegalArgumentException(start + " after " + end);
checkRep();
}
//second attack on the internals of a Period instance
Date start = new Date();
Date end = new Date(start.getTime() + 1000);
Period p = new Period(start, end);
p.end().setTime(start.getTime() - 1000);
System.out.println("Period p has duration of " + p.duration() + " ms.");
//Repaired accessors - make defensive copies of external fields
public Date start() {
return new Date(start.getTime());
}
public Date end() {
return new Date(end.getTime());
}
One solution to prevent returned objects from being modified would be to use methods such as unmodifiableCollection or unmodifiableList from the java.util.Collections class. These methods return a new List which is backed by the given argument, but which does not allow mutations (not even though the Iterator).
Note that not all cases of returning or accepting mutable objects will lead to rep exposure. For example, if you were to write an ImmVector class, you could very well specify that the meaning of an element of the vector is its "reference" and not its "abstract value". Then, even if the contained objects were mutated, the abstract value of your ImmVector would still be the same.