6.170 Laboratory in Software Engineering
Spring 2001
Problem Set 6: MapQuick
Due: Thursday, April 5, 2001 at 5pm

Handout P6

Quick links:

Contents:


Introduction

In this problem set you will assemble the modules you developed in problem sets 2 through 5, as well as some additional modules, to build MapQuick, a MapQuest-like system for obtaining directions between street addresses. MapQuick obtains its data from Tiger databases provided by the U.S. Census bureau. It has both textual and graphical user interfaces, plus a programmatic interface, which enables it to be used by other programs.

You should go about this problem set by imagining that it involves developing MapQuick from scratch. Most places in this document that we say "MapQuick does suchandsuch." we mean "When you finish, MapQuick will do suchandsuch." You will hand in a self-contained project that includes a problem analysis, a design, module specifications, commented code, testing artifacts, and user documentation. You have already done much of this work. Even if you include work done previously, which we encourage you to do, you may want to improve on it, given the expertise and understanding you have acquired during the semester. For example, if you chose representations of abstract data types that are not efficient enough, you may need to rework them. The representation you chose for Graph, for example, may not scale to large graphs.

You are not required to use the modules you developed previously. If you like, you can start from a clean slate and develop MapQuick from scratch. We strongly caution you, however, that this is a risky path to take, and that if you are talented enough to complete the assignment from scratch in about two weeks, you may do better to assemble a version from what you have first, and only then, if it works, consider a different design. If you choose to do this, we urge you to discuss your new design with your TA; getting feedback on your ideas is essential to doing quality work and avoiding mistakes.

You have about two weeks for this problem set, but you should not underestimate the amount of work involved. It is essential that you pace yourself well. The teaching staff can to advise you if you run into difficulties, but not if you wait until the last minute to ask for help. See the suggested schedule below. You should meet with your TA a few days after the problem set is released to discuss your design and workplan.


The Problem

MapQuick takes as input two addresses -- a starting address and a destination address -- each consisting of a building number, street name, and zipcode (e.g., 305 Memorial Dr. 02139). If there is a path from the start to the destination, MapQuick gives directions; if not, it reports that no path between the two addresses exists. MapQuick does not account for one-way streets, because such information is not available in the Tiger database.

On startup, MapQuick reads a collection of database files containing the map information, and optionally a set of zipcodes to which searches will be confined. The course staff provides files in three directories (all subdirectories of /mit/6.170/tigerdb), giving several different databases; including tiny (a database for testing), small (a larger database that includes Cambridge), and medium (which includes most of the Boston metropolitan area, including Cambridge and downtown Boston). The staff also provides a ps6.StreetSegReader class that reads a tiger database and provides an Iterator of StreetSegments. See the detailed performance requirements.

If you finish early, you may want to implement some optional features that a real-world implementation of this system might have. You should not, however, consider these until you have a fully working system.

Text-Based Interface

Class ps6.TextUI implements a text-based interface (also known as a command line interface or CLI). When invoked from the command line as
java ps6.TextUI database-directory zipcode1 zipcode2 ... zipcodeN

with the first argument giving the database directory, and additional arguments giving zero or more zipcodes, the main method loads all the database files from the specified directory with the file extension .zip, filtering out all street segments that are not in one of the given zipcodes. If no zipcodes are given as arguments, there is no filtering, and all street segments are included. (Some street segments in the database are missing zipcodes; these should always be included. There may be other issues in the exact definition of filtering that you should resolve in your problem analysis.) If the attempt to load fails (say, because there are no .zip files in the given directory or one of the .zip files is not a valid Tiger database), MapQuick prints

Database error

and immediately exits. Otherwise, it enters a user interaction loop.

The user interaction loop queries the user for two addresses, outputs directions for traveling between the two, and then repeats. The address queries have the following format:

starting number? user-input
starting street? user-input
starting zipcode? user-input
destination number? user-input
destination street? user-input
destination zipcode? user-input

If the starting number is 0, then the program exits immediately, without requesting a starting street, starting zipcode, or any destination information.

All six inputs should be read before any other checking is performed.

If the starting zipcode does not appear in the database, the directions output should be

No such zipcode: number street zipcode

Otherwise, if the starting street does not appear in the specified zipcode, the directions output should be

No such street: number street zipcode

Otherwise, if the starting number does not appear on the street in the specified zipcode, the output should be

No such number: number street zipcode

Otherwise, the second zipcode, street, and number should be checked, with error output as specified above.

If there is no way to get from the first address to the second, the directions output should be

No path from number street zipcode to number street zipcode

Otherwise, the directions output should be as follows:

Start at number street zipcode
directions-line
...
number street zipcode is on your side
Trip length: n.n miles

The format of each directions-line (of which there is one or more) is as specified by
ElementaryRoute.directions(heading). The turn specified in the first directions line is always "left" or "right", assuming that travel begins at the starting address facing the street; likewise, the penultimate line states that the destination is on the left or the right. The trip length should be given to the nearest tenth of a mile.

If ps6.TextUI is invoked as specified above, it should not produce any other output to System.out, System.err, or elsewhere. You may implement other behaviors, such as debugging output or more detailed directions, by interpreting command-line flags that turn on such behavior, by providing other classes with main methods, by turning off all debugging output before submitting your solution, or by some other mechanism.

GUI

The command java ps6.GraphicUI launches MapQuick's graphical user interface. You may interpret additional command-line parameters, but the system must run when invoked without any additional command-line parameters.

You may design your own GUI. It must must support selecting a Tiger database directory, entering zipcodes for filtering, entering addresses, and displaying directions. Here is an example of a very simple GUI front-end for MapQuick:

an ugly sample GUI

The design of the graphical interface is relatively unconstrained. We advise that you build a minimal user interface and elaborate it only if you have the time and inclination.

We encourage you to use Swing (which you will use for the Gizmoball final project), but you may use AWT instead if you prefer. To get you started, we provide two sample Java programs that create windows and accept input: see HWFrame.java and HWFrameBetter.java, both in the /mit/6.170/www/psets/ps6/assignment/ directory. You are not required to use the better window-layout technique used by HWFrameBetter.java, but you are encouraged to, and you might find such techniques useful in your final project.

Programmatic Interface

In this problem set, the choice of internal components (classes and interfaces) and their specifications is up to you. However, in order to allow us to test your system, we require that it provide a programmatic interface: a set of classes that allows MapQuick to be executed from another program. Providing such an interface is essential anyway for cleanly separating the user interface from the rest of the system, and it allows you to do your own testing more easily.

See the specifications of the required programmatic interface.

Optional Features

After you have completely implemented, tested, and documented the required functionality, you might wish to implement some of these features (or others of your own devising): If any of these features are added, they cannot affect the behavior of the ps6.TextUI class as specified above. Therefore, in order to demonstrate them, you could

Performance requirements

Your implementation should have reasonable performance. We specify the performance requirements here in terms of some locations whose addresses are given in the section on Database Contents below.

On a SPARC Ultra-10, as configured in the Athena clusters and using a 128-megabyte Java heap, your implementation should meet the following performance requirements:

02139 02141 02215
Fenway Park -> Lobby 7 : 20 seconds

02139 02138 02141 02215 02115
Tower Records (Newbury) -> Fenway Park : 15 seconds
Fenway Park -> Newbury Comics (Newbury) : 30 seconds
Museum of Fine Arts -> Tower Records (Newbury) : 30 seconds
Newbury Comics (Newbury) -> Fenway Park : 40 seconds
Fenway Park -> Lobby 7 : 40 seconds
Tower Records (Newbury) -> Lobby 7 : 90 seconds

02139 02138 02141 02215
Fenway Park -> John Harvards : 5 minutes

02139 02138 02141 02215 02134
Newbury Comics (Everett) -> Fenway Park : 15 minutes

These are conservative numbers. A program that is carefully designed and uses good representations of the datatypes you built in problem sets 2 to 4 should run an order of magnitude faster. Don't be surprised if a query is fast in one direction but slow in another. Directions from Fenway Park to John Harvards (with the above zipcode filtering) take an order of magnitude less time to find in one implementation than the same trip in reverse.

If you find that your implementation performs badly, don't start optimizing all over the place. If you do so, you're unlikely to improve the performance dramatically, and you'll probably introduce bugs into your code. Instead, you should figure out where the bottlenecks are, and develop a strategy for handling them. Read our advice on representations below. Remember, after you implement improvements to your code, make sure to run regression tests.


What to hand in

You should hand in a hard copy of your documented development following the format described in the handout Documenting a Software System. Your code is not part of this document; you should not turn in a complete printout of your code unless requested to by your TA.

In your documentation's appendix on file formats, you should not be concerned about the format of the Tiger database files, but you should specify any formats you design (as part of an optional feature, for example).

The documentation should also include an appendix that records how your implementation performs on the sample queries listed above.

The files from which the hard copy was printed should be turned in using the turnin6170 command. Your turn-in should include a file named README that describes the contents of the files listed in your Manifest. Your code should be appropriately commented.

Grading

Since it straddles two weeks, the problem set is worth more than the earlier problem sets. To help you allocate your effort, here is a rough breakdown of how this problem set will be graded. Problem analysis, design, implementation, and testing will each account for roughly a quarter of the grade. On any of these, your grade will depend on the quality of your ideas as you present them. Documentation will not be graded separately, but work that is poorly documented will not be given much credit. This policy reflects good practice in industry and research: however good your ideas are, they are worth little if you cannot convey them to others.

Inefficient implementations that prevent your MapQuick from achieving reasonable performance will receive only partial credit.

Work on optional features will gain special credit. Such credit will be held in reserve and applied when we compute your final grade after we have ranked the students in the class. This will ensure that students who do not choose to work on optional features will not be penalized in any way, but students who do are rewarded appropriately.


Hints

Strategy

One possible approach to designing and implementing MapQuick is:

Efficiency

Java Heap Size

You will probably need to increase the Java heap size in order to load the entire medium database and handle queries on it. Increasing the maximum Java heap size can be done in the 1.3 JDK with the command line argument -Xmxsize. We recommend a 128-megabyte heap for MapQuick; thus the argument would be -Xmx128M. Run java -X for more information on this and other non-standard options to the Java virtual machine.

Choice of Representations

The representation of an abstract type can dramatically affect performance and scalability. Even with correct implementations of the modules you built in the earlier problem sets, you may be unable to assemble a MapQuick program that can load the entire database and respond to queries in a reasonable time. Here are some hints to help you identify where you might have problems, and how to overcome them.

Hash Functions

Objects that are used as keys in hash tables need reasonable hash functions to prevent collisions. In the worst case (for example, a hash function that always returns the same value), lookups in a hash table degenerate from unit cost to linear cost. Advice on constructing reasonable hash functions can be found in the Horton Java text or the Java Q and A.

In short, there are three properties which a good hash function should have:

  1. It must obey the hash code invariant. a.equals(b) ==> a.hashCode()==b.hashCode().
  2. It is nice if it has good coverage. a.hashCode()==b.hashCode() usually means a.equals(b).
  3. It is important that hashCode be fast. You may want to consider storing the hash code of an object in a private field so your hashCode method can simply return it.

Fractional Segments

Your MapQuick program as submitted must assume that the entire length of both the start and finish street segments are traversed.

This is not a realistic assumption, and you may optionally account for traversing only part of a segment. Consider traveling from the Apartment to the Coffeehouse:

A graphic demonstrating need for precision at start and end street segments

If your path length included all (or none) of the segments of Penny Lane and Broadway between Abbey Road and Side St, then you may incorrectly report the route that goes left onto Penny Lane, right onto Side St, and right onto Broadway, rather than the shorter route which goes right onto Penny Lane, left onto Abbey Road, and left onto Broadway.

Database Contents

The staff-provided StreetSegReader class reads street segments from Tiger database files. Every database directory contains two auxiliary files, segments.txt.gz and killfile.txt, that you may find helpful.

StreetSegReader

We have provided a class, StreetSegReader, for loading a set of StreetSegments from a file for your use on this problem set. Please read the specification for StreetSegReader carefully if you choose to use it in this problem set.

The specification of StreetSegReader refers to a Tiger Database Directory. We have provided a collection of directories holding Tiger Databases for you to use, which are located on the 6.170 locker in the /mit/6.170/tigerdb/ directory. Some of the databases are very large; thus, we have organized some of the directories by size (large, medium, small, smaller, tiny). This way you can experiment using a small database and then work your way up to larger ones. Directory /mit/6.170/tigerdb/arealocal/ holds database files for the Boston and Cambridge area.

Some properties of the collections of GeoSegments you have worked with in the past will not hold with the StreetSegments produced from the Tiger Database. For example, StreetSegments often share the same name (as opposed to having one GeoSegment named MassAve1, the next MassAve2, and so on). Many StreetSegments in the Tiger Database don't even have a name associated with them! (In that case, the name will be the empty string, which you should print as "unnamed street"; users cannot look up addresses on such a street.) So be careful in what you assume when working with the Tiger Database.

Segment list

Every database directory contains a file segments.txt.gz that lists the street segments present and their number ranges and zipcodes. You may find this useful in generating queries. Note: there can be many lines with the same streetName and zipcodes. Since this file is meant to be more of assistance to you than necessary for the problem set, only use it if it helps.

Each line of segments.txt.gz takes the form.

streetName [leftZip] [rightZip] [lowNum-highNum]
where the items in square brackets are optional. A sample line is:
Robinson Rd 02631 02631 1-19

Do not attempt to use segments.txt to load the database; that is what the StreetSegReader is for. The segments.txt file is only a partial view of the database. It has no information about how segments are connected, or of what their start and end points or lengths are.

Here are some addresses of popular locations that you may want to test in queries (they are also listed in addresses.txt:

w20 (Stratton Student Center) : 84 Massachusetts Ave 02139
Lobby 7 : 77 Massachusetts Ave 02139
LCS : 545 Technology Sq 02139
MIT Museum : 265 Massachusetts Ave 02139
The Hong Kong : 1236 Massachusetts Ave 02138
John Harvards Brew House : 33 Dunster St 02138
Loews Theatre (Harvard) : 10 Church St 02138
CambridgeSide Galleria Mall : 100 Cambridgeside Pl 02141
Ximian : 101 Rogers St 02142
Newbury Comics (Everett) : 38 Everett St 02134
Loews Cheri : 50 Dalton St 02115
Newbury Comics (Newbury) : 332 Newbury St 02115
Tower Records (Newbury) : 360 Newbury St 02115
Museum of Fine Arts : 465 Huntington Ave 02115
Fenway Park : 4 Yawkey Way 02215
Nickelodeon Theatre : 34 Cummington St 02215

Duplicates list

Each directory in /mit/6.170/tigerdb contains a killfile.txt file which may be helpful in identifying sets of StreetSegments which have the same p1 and p2, but which are not equal. Lines starting with WARNING indicate segments which meet these warning criteria. Running tests which use any of the warning streets in killfile.txt may lead to results which are not consistent over time.

Multiple shortest paths

It is possible, though rare, for there to be multiple shortest paths between two locations. This can happen for two separate reasons:

Your code is required to return a shortest path, but which shortest path it returns is not specified. The test cases used for validate6170 and for grading your solution have unique shortest paths, so you should not need to worry about this issue.

However, you may find that running your own route-finding queries on different JVMs may produce different paths (of the same length), because of differences in the implementation of iterators on the different JVMs. If this happens (and only if this happens), you should investigate to determine whether the two paths are really of equal length, or there is a bug in your implementation which causes the different results.

Here are three approaches to debugging such a problem:

Miscellaneous

There are two ways to leave an address -- by initially turning right or by initially turning left. Similarly, there are two ways to approach an address -- with the destination on your right or the destination on your left. Thus, even for a search from one address to another address, you will probably find that the easiest way to handle this is to pass two start nodes and two end nodes to DirectionsFinder.getDirections.

The start and destination addresses may appear on the same street; they may even appear on the same street segment. MapQuick should still produce correct output. In this case, there would be only one directions-line (as described in the directions section).

The StreetSegReader provides no one-way street information, and it only produces a single StreetSegment object for each piece of street in the database. You may need to create, and insert in the street graph, the reverse of each StreetSegment produced by the StreetSegReader.

Newer versions of the Java platform (such as those used on Athena) have a relaxed specification for the behavior of floating point operations. This should not affect you in most cases (it is one reasons you were told to use ints to represent latitude and longitude), but can possibly cause your code to arrive at different numeric results depending on the platform you are using. It can also, in certain cases, cause your program to arrive at different results over a single run of your code. To avoid all of these problems use the strictfp keyword in front of class in all classes which perform floating point arithmetic; for instance,

public strictfp class GeoSegment {
  ...
}

Validate

The validate6170 ps6 command uses a public test suite that checks your solution against the following queries on the tiny database:

From: 52 Wauwinet Rd 02554
To: 64 Wauwinet Rd 02554

Start at 52 Wauwinet Rd 02554
Turn left onto Wauwinet Rd and go 0.2 miles.
64 Wauwinet Rd 02554 is on your left
Trip length: 0.2 miles

From: 32 Wauwinet Rd 02554
To: 64 Wauwinet Rd 02554

Start at 32 Wauwinet Rd 02554
Turn left onto Wauwinet Rd and go 0.7 miles.
64 Wauwinet Rd 02554 is on your left
Trip length: 0.7 miles

From: 44 Millbrook Rd 02554
To: 200 Madaket Rd 02554

Start at 44 Millbrook Rd 02554
Turn left onto Millbrook Rd and go 0.7 miles.
Turn slight left onto Milbrook Rd and go 0.2 miles.
Turn left onto Madaket Rd and go 3.1 miles.
200 Madaket Rd 02554 is on your left
Trip length: 4.0 miles

From: 111 Somerset Rd 02554
To: 48 Eel Point Rd 02554

Start at 111 Somerset Rd 02554
Turn left onto Somerset Rd and go 0.4 miles.
Turn left onto Sumerset Ln and go 0.5 miles.
Turn left onto Hummock Pond Rd and go 0.2 miles.
Turn right onto Millbrook Rd and go 1.0 miles.
Turn slight left onto Milbrook Rd and go 0.2 miles.
Turn left onto Madaket Rd and go 0.5 miles.
Turn slight right onto Eel Point Rd and go 1.0 miles.
48 Eel Point Rd 02554 is on your left
Trip length: 3.8 miles

Schedule

This problem set has many facets; to complete it on time, you must set a strict schedule for yourself. Here is one possible schedule, with milestones in bold.
Monday March 19 Problem set released.
Read problem set, and resolve any confusions or issues.
Start analysis of problem with OMs
Start designing module structure of entire system using MDDs
Start specifying additional components.
Wednesday March 21 Problem analysis complete and in final form.
Draft of module structure and specs of new components complete.

Meet with TA to discuss your design and schedule.
Start implementing new components for backend; test on tiny Tiger database.
Experiment with larger databases to pinpoint "hotspots" in code which may require implementation changes.
Friday March 23 Initial implementation complete.
Start work on text-based interface and test drivers.
Make implementation changes for performance; test on small Tiger database.
Monday March 26 Spring Break Starts
Friday March 30 Spring Break Ends
Monday April 2 Entire system with programmatic and text-based interfaces tested and documented.
Start design and implementation of graphical user interface.
Run experiments on medium database.
Wednesday April 4 Entire system with all interfaces tested and documented.
Polish and proofread documentation.
Thursday April 5 Project submitted.

Q & A

This section lists clarifications and answers to common questions about the problem set.

Thursday, March 29, 2001

Q: Can I change the specifications for code I wrote in previous problem sets, even if the specifications were given to us?

A: Yes you may. As alluded to in the the introduction, all parts of the design and specifictions are up to you, except for the limited programmatic interface and textual interface specified above. Be careful, however: to successfully complete ps4 returnin, your ps4 modules must still meet ps4's specifications. This requirement also covers any depended-on components, such as GeoSegment and GeoPoint for StreetSegment. One you have succeded on ps4 returnin, however, you may change the specifcations as you see fit. If you so so, you must clearly describe and justify any changes in your ps6 submission.

Q: Does StreetSegReader provide both a segment and its reverse, or just the segment itself?

A: The StreetSegReader provides only the segment itself, so you will have to create the reverse yourself if you want to use it.

Q: How do I know what cities are in each database?

A: See the README file for tigerdb directories at /mit/6.170/tigerdb/README

Q: Aren't there two answers for the first test used by validate6170 ps6? The start and end address are on the same segment!

A: No, there is only one correct answer.

Q: The performance requirements seem to indicate that certain queries might take up to 15 minutes, yet my implementation answers all queries in under 2 seconds. Am I doing something wrong?

A: No, it is quite possible that a good implementation will answer all queries in only a few seconds.

Sunday, April 1, 2001

Q: Do I need to test the textual and graphical user interfaces?

A: Yes. However, as 6.170 did not cover the testing of GUIs (which often involves test harness that simulates mouse clicks), you do not need to turn in any specific tests for your GUI, besides a brief description of what you did to test it. You should turn in complete tests of your textual UI and your programmatic interface. You can test the textual UI by piping a file into your program (from the command line) or by using a FileReader in place of System.in.

Tuesday, April 3, 2001

Q: Is zero (0) a valid street number (for example, as given to the programmatic interface).

A: Yes.

Q: I'm not working on Athena; how can I make sure StreetSegReader will work?

A: Make sure you have all of the latest files from /mit/6.170/lib/, especially the ps4 and ps6 jar files. Also, make sure you have recent killfiles (from /mit/6.170/tigerdb/tiny/, for example).


Change Log

This section details the changes made since the handout was posted to the website. You may use the version number at the bottom of your hardcopy to determine what has changed since you printed this handout.

revision 1.39: Clarify reference to TestMapQuick.
revision 1.38: Added questions to the the Q & A section. (Updates 1.33 - 1.38)
revision 1.32: Updated the information on the duplicates list; the mentions of duplicates.txt.gz have been changed to killfile.txt.
revision 1.31: Tighened the specification for the Directions interface to say that the sequence of strings must not contain newlines (as opposed to may not contain newlines). Updated the validate source code to match the expected behvaior of the tighted specification. Added more test cases to ValidateQueries.java and implemented more tester funcationality in PS6TestCase.java.
revision 1.30: Rewrote the code used for validate6170 ps6 to be higher quality, so that students may use the framework for their own testing, if they choose.
revision 1.28: Further clarified that fractional street segments should not be implemented in the default configuration.
revision 1.27: Corrected contradiction about input when exiting text-based interface: all six inputs should not be read if the first is 0.
Clarified that fractional street segments are optional (and should not be implemented in the default configuration).
Clarified text about segments.txt.gz and duplicates.txt.gz files.
revision 1.26: Fixed link to validate source.
revision 1.24: Fractional street segments are now optional. hashCode blurb is more substantial. Some simple formatting errors fixed.
revision 1.23: Problem set released.


Back to the Problem sets page.
For problems or questions regarding this page, contact: 6.170-webmaster@mit.edu.
$Id: ps6.html,v 1.39 2001/04/03 19:52:53 mernst Exp $