Contents:

Abstract Data Types

In this recitation, we discuss the implementation of an Abstract Data Type (ADT). Data abstraction is a type of abstraction used to hide (abstract away) the details of data representation, and is a core concept in object-oriented programming. Data abstraction is also a specification mechanism -- a way to think about programs and designs.

To implement a data abstraction, we select a representation for the instances of the data type, the "rep", and then implement operations in terms of that rep.

A rep is a not just a data structure, but also includes a set of conventions about that structure. The Rep Invariant describes the set of allowed values of the data structure, and the Abstraction Function shows how the structure is to be interpreted.

IMPORTANT NOTE: Keep in mind, both notions are for use by the implementer of the class, not the clients, and so they should not be included in any of the public specifications (the Javadoc).

Example: RatNum

    private final int numer;
    private final int denom;

    /** @effects If d = 0, constructs a new RatNum = NaN.  Else
        constructs a new RatNum = (n / d).
    */
    public RatNum(int n, int d) {
        // special case for zero denominator; gcd(n,d) requires d != 0
        if (d == 0) {
            numer = n;
            denom = 0;

        } else {

            // reduce ratio to lowest terms
            int g = gcd(n,d);
            n = n / g;
            d = d / g;

            if (d < 0) {
                numer = -n;
                denom = -d;
            } else {
                numer = n;
                denom = d;
            }
        }
        checkRep();
    }

Recall that this implementation of rational numbers requires that the numbers are stored in reduced form. Therefore, creating RatNum(4,2) and RatNum(8,4) will both store 2 in the numer field and 1 in the denom field. This lowest terms requirement should be communicated through the Representation Invariant for the RatNum class.

Here is one possible answer for the AF and RI of RatNum:


Abstraction Function - AF(r):
   A RatNum r is NaN if r.denom = 0, (r.numer / r.denom) otherwise.
   (An abstraction function explains what the state of the fields in a
   RatNum represents.  In this case, a rational number can be
   understood as the result of dividing two integers, or not-a-number
   if we would be dividing by zero.)

Representation Invariant - RI(r):
  (r.denom >= 0) &&
  (r.denom > 0 ==> there does not exist integer i > 1 such that
                   r.numer mod i = 0 and r.denom mod i = 0;)
  In other words,
    * r.denom is always positive.
    * r.numer/r.denom is in reduced form (assuming r.denom is not zero).

What abstract values do the following concrete values represent?

Lessons about ADTs

Mutable vs. immutable classes

An ADT can either be implemented as a mutable or an immutable class. A mutable class is one that allows instances to be modified after creation; an immutable class is one that does not allow mutation after creation. Immutable classes are usually easier to reason about and less error-prone because the entire state of an instance is known at the time it is created. Instances of immutable classes can be shared freely without worrying about whether one client who has a reference to an instance may modify it 'behind the back' of another client who has a reference to the same instance.

The main disadvantage of an immutable class is efficiency. This is because an instance cannot be mutated via a mutator method; instead a new instance of the same type must be created using a producer method. However, for many ADTs, the benefits of being able to write more reliable code using immutable classes outweigh the drawback of performance efficiency.

We strongly suggest that, whenever possible, you should make your classes immutable (See Item 13: Favor Immutability on P.63 of Effective Java for details). If you ever need to make a class mutable, try to restrict the ways in which it can be mutated. Examples of ADTs that are usually better-implemented as mutable classes are collections such as sets or lists.

Here is how you can ensure that your class is immutable (adapted from P.63 of Effective Java):

Use of Abstraction Functions & Rep. Invariants

What we really want to do is be able to reason about the implementation of a type. These are techniques for "getting it right" which can, and will save you effort down the road.

The rep invariant, together with the abstraction function, can help you perform modular reasoning. That is, you can consider the correctness of one method independent of the other methods. This helps you narrow the scope of what you must consider, and thus can make the task much simpler.

The rep invariant can also help you identify errant behavior earlier, which makes debugging problems easier.

The abstraction function helps you understand the value you are representing. If you are going to write a method which performs some operation, it will usually be easier to think about what abstract value you want as the result, and then to consider how your implementation can create that value - a question which necessarily uses the abstraction function to map a concrete state to an abstract value.

Is the mapping from concrete to abstract values one-to-one? No! In the RatNum example, it is not one-to-one since any concrete instance with r.denom = 0 will map to the abstract NaN. As another example, a Set implemented via a Vector can have many concrete representations map to the same abstract value (e.g. [1 2] and [2 1] are both {1 2}). However, note that every concrete value (which satisfies the rep invariant) must map to exactly one abstract value.

Does the RI have to hold at private methods? Not necessarily; we only care that the RI is true when the program returns to the caller of a class's (public) methods. For example, a private helper method may be called by a public method to operate on the rep. and temporarily make the invariant false, so long as the invariant is true again by the time the public method finishes.

Checking the Rep Invariant

It's a great idea to check the rep invariant at run time whenever possible. You should do this especially when it's inexpensive, and especially if you're developing and debugging your class.

How do we check the rep invariant?

  1. Write a method with this signature: private void checkRep(). In the body of checkRep(), test all of the rep invariants. If all invariants hold, do nothing. If an invariant is violated, throw some unchecked exception to indicate failure. The junit.framework.Assert class can make this method especially easy to write.
  2. Place calls to checkRep() at the exit of every public constructor and at the entry and exit points of every public method.

In this way, you are asserting that the rep invariant holds whenever control enters any method of the ADT, and also whenever any method is about to return to the user of the class. If some bit of code violates the rep invariant, the error will pop up immediately in a predictable way, instead of at some undetermined point in the future.

Data encapsulation

Data encapsulation allows you to change your class's implementation without requiring clients of your class to change their code. Let's take a look at the ImmutableDate class, an Abstract Data Type which allows us to store and access calendar dates.

public class ImmutableDate {
    // Internal representation
    private int month;
    private int day;
    private int year;

    public ImmutableDate(int month, int day, int year) {
        this.month = month;
        this.day = day;
        this.year = year;
    }

    public int getMonth() {
        return month;
    }

    public int getDay() {
        return day;
    }

    public int getYear() {
        return year;
    }
}

This is a pretty straightforward class. To add a little twist, let's say that ImmutableDate can only store dates that are in 1980 or later. Can we write a Representation Invariant for this class? How about an Abstraction Function?

AF(d) = d is the specific date month/day/year

RI(d) = (d.month >= 1) && (d.month <= 12) &&
        (d.year >= 1980) && (d.day is valid, based on what d.month is ...)

Why do we have that getMonth() method? Why not just make month a public final variable and let everyone work with it directly? After all, the class is still immutable. One of the dangers of doing this is that it exposes your internal representation. Now everyone knows that you're using three ints to represent the month, day, and year. This may not seem like a huge problem, since obviously a date is useless unless you can retrieve the month, day, and year. But having public variables like this insidiously locks you into one specific implementation for ImmutableDate.

Let's say today you wrote ImmutableDate using the public final variables. Because this date class is amazingly useful and novel, you gave the code to all your friends. Now, let's say next week you discover that Unix stores dates as the number of seconds since 1/1/1970. If you wanted to change your internal representation to "The UNIX Way", you'd run into problems. All your friends are using those public final variables, so you can't just get rid of them. By choosing to expose your internal representation of the date, you've locked yourself into one particular implementation, and now it's nearly impossible to change how you store a date.

On the other hand, you could've "hidden" your implementation using the getMonth() method. Sure, it looked like a trivial method when you stored the month as an int, but what happens when you want to switch to the Unix format? You can remove the month, day, and year variables, replacing them with private int seconds. Then, you can just go through and change your getMonth() method to convert seconds into the right month and output the appropriate int. You've managed to change your internal representation, and your friends won't even notice!

Rep. Exposure

Representation exposure (a.k.a. 'rep. exposure') occurs when an implementor of a class inadvertently allows outside parties to gain access to the internal data structures of the class. One of the powers of data abstraction is the ability to encapsulate data in a black box and only have clients manipulate the data via methods defined by the specification. Rep. exposure breaks this data encapsulation, thus allowing outside parties to destroy the representation invariants that are crucial for the class to function properly as an ADT. Rep. exposure is dangerous and can happen in subtle ways. Let's now see an example using the Period class:

import java.util.Date; // A mutable class with these methods:

// Date()
// - Allocates a Date object and initializes it so that it represents
// the time at which it was allocated, measured to the nearest
// millisecond.

// Date(long date)
// - Allocates a Date object and initializes it to represent the
// specified number of milliseconds since the standard base time known
// as "the epoch", namely January 1, 1970, 00:00:00 GMT.

// long getTime()
// - Returns the number of milliseconds since January 1, 1970, 00:00:00
// GMT represented by this Date object.

// void setTime(long time)
// - Sets this Date object to represent a point in time that is time
// milliseconds after January 1, 1970 00:00:00 GMT.

/**
 * The Period class is an immutable ADT that represents a time
 * interval between two points in time.
 *
 * @specfield start :    date
 * @specfield end   :    date
 * @specfield duration : integer
 */
public final class Period {
  private final Date start;
  private final Date end;
  private final long duration;

  // RI(r) = ???
  //
  // AF(r) = ???
  //
  
  /**
   * @effects returns a new Period instance
   * @throws IllegalArgumentException if start is after end.
   * @throws NullPointerException if start or end is null.
   */
  public Period(Date start, Date end) {
    if (start.compareTo(end) > 0) {
      throw new IllegalArgumentException(start + " after " + end);
    }
    this.start = start;
    this.end = end;
    this.duration = end.getTime() - start.getTime();
    checkRep();
  }

  /**
   * @return the starting date of this period.
   */
  public Date start() {
    checkRep();
    return start;
  }

  /**
   * @return the ending date of this period.
   */
  public Date end() {
    checkRep();
    return end;
  }

  /**
   * @return the duration of this period in milliseconds.
   */
  public long duration() {
    checkRep();
    return duration;
  }

  private void checkRep() {
    if (start == null || end == null)
      throw new RuntimeException("endpoint of interval is null.");
    if (start.getTime() > end.getTime())
      throw new RuntimeException("Illegal state.");
  }
}
Question:
What is a possible Abstraction Function and Representation Invariant for this class?
Answer:
RI(r) = (r.end.getTime() >= r.start.getTime()) &&
        ((r.end.getTime() - r.start.getTime()) == r.duration)

AF(r) = A Period r represents the time interval from r.start to r.end
         with duration r.duration.
Question:
What will the following code print out?
    Date start = new Date();
    Date end   = new Date(start.getTime() + 1000);
    Period p   = new Period(start, end);
    end.setTime(start.getTime() - 1000);
    System.out.println("Period p has duration of " + p.duration() + " ms.");
Answer:
The checkRep() throws a RuntimeException. The problem here is that the Period constructor fails to make defensive copies of its parameters.
Question:
How can you fix the constructor?
Answer:
//repaired constructor - make defensive copies of parameters
public Period(Date start, Date end) {
  this.start = new Date(start.getTime());
  this.end   = new Date(end.getTime());
  
  if (this.start.compareTo(this.end) > 0)
    throw new IllegalArgumentException(start + " after " + end);
  
  checkRep();
}
Question:
Consider the following code; what will be printed out?
//second attack on the internals of a Period instance
Date start = new Date();
Date end   = new Date(start.getTime() + 1000);
Period p   = new Period(start, end);
p.end().setTime(start.getTime() - 1000);
System.out.println("Period p has duration of " + p.duration() + " ms.");
Answer:
The checkRep() fails and throws a RuntimeException. The Period accessors fail to make defensive copies of mutable internal fields before returning them. Notice that start and end are declared as final, but that only means that the object references cannot be mutated; the objects referred to by those references can still be mutated, as shown by this example.
Question:
How can you fix this problem?
Answer:
//Repaired accessors - make defensive copies of external fields
public Date start() {
  return new Date(start.getTime());
}

public Date end() {
  return new Date(end.getTime());
}
Question:
Why don't we need to worry about rep. exposure for duration()?
Answer:
Because duration() returns a primitive value, not an object reference, modifying the returned value has no impact on the copy that resides inside the Period object. This is one important distinction between primitives and objects in Java.

Lessons about avoiding rep. exposure

One solution to prevent returned objects from being modified would be to use methods such as unmodifiableCollection or unmodifiableList from the java.util.Collections class. These methods return a new List which is backed by the given argument, but which does not allow mutations (not even though the Iterator).

Question:
Why can't you use these wrappers to "protect" objects which you receive as arguments and then store?
Answer:
Even if you have a wrapped ("safe") version of an object, another piece of code might have a reference to the object itself (with no wrapper), and thus could modify the value.

Note that not all cases of returning or accepting mutable objects will lead to rep exposure. For example, if you were to write an ImmVector class, you could very well specify that the meaning of an element of the vector is its "reference" and not its "abstract value". Then, even if the contained objects were mutated, the abstract value of your ImmVector would still be the same.