6.170 Laboratory in Software Engineering
Spring 2001
Problem Set 3: Designing and Implementing Graph
Due: Thursday, March 1, 2001 at 5pm

Handout P3

Quick links:

Contents:

We recommend that you read the entire problem set before you begin work.


Purpose

This assignment gives you the opportunity to design your own specification for an abstract data type. You will also implement your specification and test your implementation. You will also implement and test an algorithm which uses your abstract data type.

Introduction

In order to give useful directions, the MapQuick system needs a means of finding the shortest path between two points. To do so, MapQuick requires abstractions to represent information about streets, the connectivity between streets, and possible paths of travel. While the next problem set will deal with creating data structures to represent the streets themselves, this problem set will focus on developing the ability to find the shortest path between two points.

To do so, MapQuick will need a graph abstraction to represent connectivity, a path abstraction to describe the cost of traveling a particular course, and a path-finding algorithm to search for the shortest possible path.

While we do not specify the names, method signatures, or specifications of your graph or path-finding abstractions, we do specify Path and provide you with a file format for your testing driver. The testing driver will execute a script of commands and output results to standard out.

Graphs

MapQuick represents street connectivity with a directed graph in which streets are nodes and intersections between streets are represented by edges between them in the graph. You will, however, implement a generic graph representation which can be used for other purposes.

A directed graph consists of a set of nodes some of which may be connected by edges. In a directed graph, the edges have direction: node B might have an edge to node C, but node C is not necessarily connected to node B. For our purposes there cannot be more than one edge from a given node to another given node, but there can be an edge from A to B and an edge from B to A.

A simple directed graph
A simple directed graph with four nodes.

The children of node B are the nodes to which there is an edge from B. In the example above the children of B are A and C. Similarly, the parents of B are the nodes from which there is an edge to B. In the example above B only has one parent, A.

Path

When searching for best means of travel between two points, MapQuick will attempt to minimize the distance traveled. A generic shortest path algorithm, on the other hand, will attempt to minimize some arbitrary cost function of the route traveled. Since the Graph abstraction is generic and can support any type of node, you need an abstraction to represent the cost of some path through the graph. The Path interface will provide an abstraction for both representing a given path and for obtaining the cost of that path.

Path is a Java interface which only specifies a set of required methods, but cannot be directly instantiated and provides no code to classes which implement it. This allows for searching graphs containing any type of node so long as an appropriate implementation of Path is available.

Finding Shortest Paths

Most shortest path algorithms are written for graphs in which the weights (costs) are associated with edges; an example is Dijkstra's algorithm, which you can find in any algorithms textbook (for instance, Introduction to Algorithms by Cormen, Leiserson, and Rivest).

Since the graphs we are dealing with are node-weighted, we provide a shortest path algorithm for you to use which can handle node weighted graphs. Here is a modified version of Dijkstra's algorithm that finds the shortest path from node start to node goal in a node-weighted graph. This algorithm is called a "greedy" algorithm because it never needs to recompute or reconsider information: it simply uses the first path that it finds from start to a particular node.

The following modified version of Dijkstra's algorithm has been written up in a pseudocode fashion which, while not executable, should be easy to read, understand, and translate to Java. The notation [a,b,c] stands for the three-element sequence consisting of a, b, and c; here, we use it to represent a path. When applied to sequences, "+" means concatenation; for instance, [a,b] + [c,d] = [a,b,c,d]. Using parentheses on a map looks up the value of the argument in the map. For instance somemap(a) evaluates to the value that a is mapped to in somemap. This value could be set as somemap(a) = foo.

    // Return a shortest path from any element of starts to any
    // element of goals in a node-weighted graph.
    Algorithm node-weighted-shortest-path(starts, goals) {

      for each start in starts
        if start in goals
          return [start]

      // maps nodes -> paths
      Map paths = { forall start in starts | (start, [start]) }

      // The priority queue contains nodes with priority equal to the cost
      // of the shortest path to reach that node.  Initially it contains 
      // the start nodes
      PriorityQueue active = starts
      Set finished = { }

      while active is non-empty do {
        // y is the element of active with shortest path
        y = active.extractMin()
        ypath = paths(y)

        // iterate over edges (y, z) in y.edges
        for each child z of y {
          zpath = ypath + [z]
          if (z in goals) {
            return zpath
          }
          if (z not in finished) and (z not in active) {
            paths(z) = zpath
            insert z in active
          }
        }
        insert y in finished
      }
      // execution reaches this point only if active becomes empty
      return No Path Exists
    }

You should read the note about a bug in this algorithm.

This algorithm requires a priority queue, a collection data structure that supports inserting elements and extracting the highest-priority element. (The highest-priority element has the numerically smallest priority; here, we use path costs as priorities, so shorter paths have higher priority.) We provide the class PriorityQueue which efficiently implements this data structure.

Explanation of the algorithm

The algorithm keeps track of two disjoint collections of nodes. One collection contains "finished" nodes for which we know the shortest path from start, and we have already examined all edges that leave the node. The other set contains "active" nodes for which we know the shortest path from start, but we have not yet considered paths that extend it. The paths map associates each active node with it the shortest path from start to that node.

A key property of this algorithm is that if x is a finished node, and y is not a finished node (either it is active, or we don't know the shortest path yet), then the length of the shortest path from start to x is no more than the length of the shortest path from start to y.

The algorithm starts with a set of active nodes starts, each element of which is associated with a path consisting of just that node. When any of the nodes contained in goals is reached, the algorithm terminates and returns the associated path with that element of goals.

The basic step of the algorithm (which is repeated over and over) is to find, in the set of active nodes, the one for which the associated shortest path has lowest total weight. Call this node y. Now consider every edge that leaves y; say that edge goes to z. If z is finished or active, ignore the (y, z) edge, because we already know of a shorter path to z than the one that goes through y. Otherwise, add z to the collection of active nodes, and associate with it the path zpath which extends the path associated to y by adding z at the end. zpath is the shortest path from start to z. After all the edges leaving y have been examined, move y from the active collection to the finished collection.

Shortest path example

Consider the following graph:

Small example graph

The nodes a, b, c, d, and e have weights 2, 1, 3, 1, and 4, respectively. We want to find the shortest path from a to e.

Initially:
   paths = { (a, [a]) }
   active = { a }
   finished = { }

Remove a from active; its path has cost 2.
Consider each neighbor of a:
   add b to active
   set paths(b) = [a,b]
   add c to active
   set paths(c) = [a,c]
Add a to finished.

Now:
   paths = { (a, [a]), (b, [a,b]), (c, [a,c]) }    active = { b, c }    finished = { a }

Remove b from active; its path has cost 3, which is the lowest in active.
Consider each neighbor of b:
   do not add c to active, as c already appears in active
   add d to active
   set paths(d) = [a,b,d]
Add b to finished.

Now:
   paths = { (a, [a]), (b, [a,b]), (c, [a,c]), (d, [a,b,d]) }
   active = { c, d }
   finished = { a, b }

Remove d from active; its path has cost 4, which is the lowest in active.
Consider each neighbor of d:
   e is the goal node, so we are finished; return [a,b,d,e]

Test file format

The testing script format is a simple text file with one command listed per line. Each line consists of words separated by white space. The first word on each line is a command name. The remaining words are arguments to the command and have an interpretation which is dependent on the particular command they follow. Lines which have a hash (#) as their first character are considered comment lines and should be echoed to the output when running the test script. Lines which are blank should cause a blank line being printed to the output.

The testing script manipulates graphs of WeightedNode elements. Each WeightedNode has a name and a cost. After a WeightedNode is created, it is referred to by name later on in the testing file. We have also provided a WeightedNodePath which implements the Path interface for sequences of WeightedNodes. When the testing driver outputs a node, it should do so by displaying that node's name. The name for each Graph is also a String. The testing driver must manage the mapping of names to graphs in addition to the mappings from names to nodes. This could be accomplished by maintaining a HashMap in the testing driver mapping Strings to Graphs and another for mapping names to WeightedNodes. To simplify the parsing of the file, names of nodes and graphs must contain only alphanumeric characters.

The following is a list of the valid commands, and the meaning of their arguments. Each command has an associated output which should be printed to standard out when the command is exectued.

CreateGraph graphName
Creates a new graph which shall be named graphName. The graph is initially empty (has no nodes and no edges). The command's output is

created graph graphName

CreateNode nodeName cost
Creates a new node named by the string nodeName with the cost specified by the non-negative integer cost. The node can be referred to by name for the rest of the file. The command's output is

created node nodeName with cost cost

AddNode graphName nodeName
Adds a node represented by the string nodeName to the graph named graphName. The command's output is

added node nodeName to graphName

AddEdge graphName parentNode childNode
Creates an edge in graph graphName from parentNode to childNode. The command's output is

added edge from parentNode to childNode in graphNode

ListNodes graphName
This command has no effect on the graph. Its output starts with

graphName contains:
and is followed, on the same line, by a space-separated list of the names of all nodes in the graph. The nodes should appear in alphabetical order. There is a single space between the colon and the first node name.

ListChildren graphName parentNode
This command has no effect on the graph. Its output starts with

the children of parentNode in graphName are: 
and is followed, on the same line, by a space-separated list of the names of nodes to which there is an edge from parentNode. The nodes should appear in alphabetical order. There is a single space between the colon and the first node name.

FindPath graphName from1 [from2 [from3 ... ] ] -> to1 [ to2 [ to3 ... ] ]
The -> symbol is separated from node names by whitespace on both sides.
This command has no effect on the graph. It finds and outputs the shortest path amongst all of the possible paths from any of the from nodes to any of the to nodes. The output starts with

shortest path in graphName: 
and is followed, on the same line, by a space-separated list of nodes in the order of traversal from fromX -> toX If no path exists the output should be
shortest path in graphName:no path found

Given the following as input for the testing driver:

CreateNode n1 5
CreateNode n3 1
CreateGraph A
AddNode A n1
CreateNode n2 10
AddNode A n2 
CreateGraph B
ListNodes B
AddNode A n3
AddEdge A n3 n1 
AddNode B n1
AddNode B n2
AddEdge B n2 n1
AddEdge A n1 n3
AddEdge A n1 n2
ListNodes A
ListChildren A n1
AddEdge A n3 n3
ListChildren A n3
FindPath A n3 -> n2
a correct implementation would output the following:
created node n1 with cost 5
created node n3 with cost 1
created graph A
added node n1 to A
created node n2 with cost 10
added node n2 to A
created graph B
B contains: 
added node n3 to A
added edge from n3 to n1 in A
added node n1 to B
added node n2 to B
added edge from n2 to n1 in B
added edge from n1 to n3 in A
added edge from n1 to n2 in A
A contains: n1 n2 n3
the children of n1 in A are: n2 n3
added edge from n3 to n3 in A
the children of n3 in A are: n1 n3
shortest path in A: n3 n1 n2
The behavior of the testing driver on malformed input files is not defined; you may assume the input files are well-formed.

Since it is possible that your specification throws exceptions for certain inputs, the testing file format provides for one additional feature. If the command in the input file results in an exception, the command, rather than outputting its normal string, should output "Exception: <toString value of exception>". This exception-output behavior is already implemented in the executeCommand method of the provided PS3TestDriver starter file.

A sample testing file is provided in /mit/6.170/www/psets/ps3/example1.test and the example above has been placed in /mit/6.170/www/psets/ps3/example2.test. The expected output in each case is in /mit/6.170/www/psets/ps3/example1.expected and /mit/6.170/www/psets/ps3/example2.expected respectively.


Problems

Problem 1: Writing a Specification for Graph [15 points]

You should begin by considering what sorts of methods and operations you feel you should include in your graph abstraction. The Testing Script Format may be used as a guide to your choice of which operations to support. Write a specification for methods to perform these operations including requires/effects/modifies clauses. Your Graph object must support storing generic nodes of type Object. Write your specification in your Graph.java file using the Javadoc notation that we have been using for staff provided specifications. Turn your specifications in with your code from Problem 2 in Graph.java. Your graph object itself should not manage, nor be aware of costs or weights of nodes; for this problem set the nodes themselves will store costs.

You will find suggestions on how to go about choosing operations for your Graph abstract data type in the course text in sections 5.1 and 5.1.1 (pp. 79-83).

Please include in your turnin a brief description of why you included the operations you did and why you feel they are a sufficient interface to a graph. This should be placed in problem1.txt.

Problem 2: Implementing Graph [15 points]

Write an implementation of your directed graph data type. Consider when doing so that your Graph abstraction will be used by the your path-finding code. Since you will eventually be creating and operating on very large sized graphs it is important that your Graph implementation be scalable. Since the path-finding algorithm must frequently look up the children for a given node, this operation should be performed quickly even for large graph sizes. This operation should be able to be performed in constant time. Your graph building operations should also be reasonably efficient.

Also include a brief textual description of your representation and explain why you chose it. Compare it to at least one other representation that you considered. You should place this discussion in a file called problem2.txt.

Remember that the Graph data type should be an abstract general purpose abstraction of a directed graph. It should not directly store cost information within it. To represent a directed graph with costs associated with each node, it is the node's responsibility to store the cost. The Graph abstraction should not depend upon the node objects being of any particular runtime type and as such should therefore only use operations defined in Object.

Problem 3: Writing a Testbed for Graph [12 points]

In order to allow us to verify your implementation, you will have to be able to read and execute commands from a script of graph commands as described above under Testing Script Format, using WeightedNodes as the elements of the Graph. You will also write your test cases in this file format.

The testing driver should read input from standard in using the format described under Testing Script Format and print its results to standard out. In order to help you parsing the input file format, we have provided a skeleton implementation of the parser. You should fill in this file with your own code. Please be sure to use the output field in the tester to print out the desired output.

Your test cases should be written in the specified machine readable format (with comments). You may place them in multiple testing files but each file should begin with the letters "p3" and have the extension ".test" The expected output for your test cases should be a file with the name name, but with the extension ".expected". For example you may have a file name "p3foo.test" with expected output in "p3foo.expected".

Include with your test cases documentation explaining your testing strategy and an argument for why you feel that they are sufficient, but not excessive. Place this writeup in problem3.txt. Be sure to include all of your tests and your testing driver in your Manifest file. Note: For the test for this problem your driver does need not yet implement the FindPath command.

Problem 4: Finding Shortest Paths [20 points]

Now create a specification for a PathFinder class containing a method that finds the shortest path between two sets of nodes in a graph. Your path-finding code should be able to take a list of possible sources and a list of possible destination nodes and find the shortest possible path that starts from any of the sources and leads to any of the destinations.

Your path finder abstraction should support searching Graphs with arbitrary node types. It should obtain a cost metric through the Path interface.

In order to keep your PathFinder implementation from being dependent on a particular node type you may find it useful to pass in source Paths rather than source nodes.

Implement your specification.

Problem 5: Writing a Testbed for PathFinder [10 points]

The testing framework used to test Graph is also used to test PathFinder. Extend the testing framework to support the FindPath command. In order to test the functionality of PathFinder you should create a set of test cases, similar to what you did in Problem 3, in a files called ps3pathfinder.test and provide there expected output in ps3pathfinder.expected. Also include an explanation your testing strategy and an argument for why you feel that your tests are sufficient, but not excessive. Place this writeup in problem5.txt.

Problem 6: WeightedNodePath Representation [3 points]

Look at the implementation of WeightedNodePath. In a file named problem6.txt, describe why one might prefer the representation that was chosen over possible alternatives. You should describe at least one alternate representation. You should consider the representation within the context of its expected use in PathFinder.

Returnin [25 points]

After you receive your corrected problem set back from your TA, you will need to resubmit a corrected version of the code as described in the Turnin document. You will have until March 23 at 5pm to to returnin your code. The returnin script will be enabled starting a week in advance, on March 16.

What to Turnin

Your electronic submission should include at least:
Graph.java
PathFinder.java
[.java source for Test Driver]
problem1.txt
problem2.txt
problem3.txt
problem5.txt
problem6.txt
[test cases for p3]
[expected output for p3 tests]
[test cases for p5]
[expected output for p5 tests]

You should also turnin a hardcopy of each of these files.


Provided classes

The following classes are all provided for you, in compiled form, by the staff:

With source code also provided:
ps3.Path
ps3.WeightedNode
ps3.WeightedNodePath

With source code not provided:
ps3.PriorityQueue


Hints

You should write your specifications for both your graph class and your path-finding method first. After writing the spec you should begin to write test cases based on the spec and only after that begin the implementations.

While the testing driver requires listing nodes in alphabetical order, it is not necessary for your graph implementation to store or return values in alphabetical order. You can have your testing driver sort the values it receives from your graph before outputting them. You may find java.util.Collections.sort to be useful.

In order to compare the output of your testing driver to either the expected output we provide, or to your own expected output program, you may find the unix diff command to be useful. The diff command can compare two input files and report what is different between them. To use diff you should first save the output of running your test to a file. You can do this by adding, after your normal command line, a greater than (>) character followed by the name of a file to save the output to. You can then run diff by typing the command (substituting the appropriate file names):

diff expected-output-file actual-output-file
If diff exits and prints nothing to the screen, then the two files are identical. If they are different, however, diff will print to the screen a summary of the changes. If you have a wide terminal you might find:
diff -y expected-output-file actual-output-file
which displays the two files side by side to be more readable.

Just like you can use the greater than symbol to send the output of a program to a file you can use the less than symbol (<) to redirect the contents of a file to standard in. Therefore if you wanted to run the provided testing skeleton reading input from a file name mytest.test and writing it to mytest.out you could run:

java ps3.PS3TestDriver < mytest.test > mytest.out


Algorithm Fix

The algorithm given above will work under most circumstances, but will fail under certain conditions. If you solve this problem set using the algorithm as stated above, you will not have points deducted for the cases which fail. That is, we won't test your implementation under conditions which cause the algorithm to fail.

However, we provide a fix to the algorithm below, for students who wish to have an implementation which will be correct for all inputs. Again, students need not make this change for ps3; we are providing the fix for purposes of later reference, thoroughness, and correctness.

The algorithm is the same as before, except for the changes marked in bold. The change is that the test for reaching a goal is performed after extraction from the queue, instead of while iterating over child nodes.

    // Return a shortest path from any element of starts to any
    // element of goals in a node-weighted graph.
    Algorithm node-weighted-shortest-path(starts, goals) {

    // [ test for start nodes in goals removed ]

      // maps nodes -> paths
      Map paths = { forall start in starts | (start, [start]) }

      // The priority queue contains nodes with priority equal to the cost
      // of the shortest path to reach that node.  Initially it contains 
      // the start nodes
      PriorityQueue active = starts
      Set finished = { }

      while active is non-empty do {
        // y is the element of active with shortest path
        y = active.extractMin()
        ypath = paths(y)

        // [goal test added here]
        if (y in goals) {
          return ypath
        }

        // iterate over edges (y, z) in y.edges
        for each child z of y {
          zpath = ypath + [z]
          // [goal test removed from here]
          if (z not in finished) and (z not in active) {
            paths(z) = zpath

        ...


Change Log

This section details the changes made since the handout was posted to the website. You may use the version number at the bottom of your hardcopy to determine what has changed since you printed this handout.

revision 1.68: Added due date for returnin.
revision 1.67: Add a section describing an optional fix to the path-finding algorithm.
revision 1.65: Example command line in final hint now accounts for PS3TestDriver being in package ps3.
revision 1.64: Added this changelog.
revision 1.62: Move hint #2 about passing in Path instances to PathFinder into the actual problem statement.
revision 1.61: Emphasize that Graph is to be a generic directed graph.
revision 1.60: Don't require that the PathFinder method is static.
revision 1.58: Remove the mention of NoSuchPathException from the comment at the top of the path finding pseudo code. Students are not required to use an exception.
revision 1.57: Add section detailing what classes are provided by the staff.
revision 1.55: Clarify problem 4 to remind students to implement PathFinder.
revision 1.53: Clarify the handling of failing to find a path between two nodes.
revision 1.52: Fix iterator bug in WeightedNodePath implementation.


Back to the Problem sets page.
For problems or questions regarding this page, contact: 6.170-webmaster@mit.edu.
$Id: ps3.html,v 1.71 2001/02/28 22:33:31 mernst Exp $