Handout P6

Quick links:

Contents:

Errata

Getting Started

Please be warned that although you have about two weeks for this problem set, you should not underestimate the amount of work involved. It is essential that you pace yourself. The teaching staff can assist you if you run into difficulties, but not if you wait until the last minute to ask for help.

As usual, the source code that you need can be found by updating your psets module from cvs.

Introduction

In this problem set you will assemble the modules you developed in problem sets 2 through 4, as well as some additional modules, to build MapQuick, a Google Maps-like system for obtaining directions between street addresses. MapQuick obtains its data from Tiger databases provided by the U.S. Census bureau. It has both textual and graphical user interfaces, plus a programmatic interface, which enables it to be used by other programs.

Although you will be reusing code from other problem sets, you should go about your design as if you were developing MapQuick from scratch, then reuse as appropriate.

You are not required to use the modules you developed previously. If you would like to, you may start from a clean slate and develop MapQuick from scratch. We strongly caution you, however, that this is a risky path to take, and that if you are talented enough to complete the assignment from scratch in about two weeks, you may do better to assemble a version from what you have first, and only then, if it works, consider a different design. If you choose to do this, we urge you to discuss your new design with your TA; getting feedback on your ideas is essential to doing quality work and avoiding mistakes.

Even though you have done most of the work needed for the MapQuick system in previous problem sets, you may still wish to improve on it, given the expertise and understanding you have acquired during the semester. If you chose representations of abstract data types that are not efficient enough, you may need to rework them. The representation you chose for Graph, for example, may not scale to large graphs. To help you decide what you might need to change and get you started on this problem set in a timely manner, we ask you to measure the order of growth in time for several operations on data structures from psets 2-4 to be handed in by Friday, Mar 24. Please see the profiling exercise.

In this document we will say "MapQuick does such and such" as shorthand for "When you finish, MapQuick will do such and such." You will hand in a self-contained project that includes a problem analysis, a design, module specifications, commented code, testing artifacts, and user documentation. The details for these items can be found in the handout Documenting a Software System and your documentation should follow the format suggested there. Please note that the suggested number of pages for each section in this document are for the documentation for the Final Project. The documentation for this problem set should be approximately half the suggested length, i.e., about 5 pages. IMPORTANT NOTE: Requirements analysis, design, implementation, and testing will account for a significant proportion of your grade. For any of these parts, your grade will depend on the quality of your ideas as you present them. This policy reflects good practice in industry and research: however good your ideas are, they are worth little if you cannot convey them to others. You will not receive a good grade for this problem set if your documentation is found wanting, even if your system works perfectly.

The Problem

MapQuick takes as input two addresses — a starting address and a destination address — each consisting of an address number, street name, and zipcode (e.g., 305 Memorial Dr. 02139). If there is a path from the start to the destination, MapQuick gives directions; if not, it reports that no path between the two addresses exists. MapQuick does not account for one-way streets, because such information is not available in the Tiger database. The database is also somewhat old, so don't be surprised if there are occasional discrepancies between it and reality.

On startup, MapQuick reads a collection of database files containing the map information, and optionally a set of zipcodes to which searches will be confined. The course staff provides files in three directories (all subdirectories of /mit/6.170/tigerdb), giving several different databases; including tiny (a database for testing), small (a larger database that includes Cambridge), and medium (which includes most of the Boston metropolitan area, including Cambridge and downtown Boston). There are other databases available; look in the /mit/6.170/tigerdb directory to find them. The staff also provides a StreetSegReader class that reads a Tiger database and provides an Iterator of StreetSegments.

If you finish early, you may wish to implement some optional features that a real-world implementation of this system might have. You should not, however, consider these until you have a fully working system.

Text-Based Interface

The class ps6.TextUI implements a text-based interface (also known as a command line interface or CLI). When invoked from the command line as:

java ps6.TextUI database-directory zipcode1 zipcode2 ... zipcodeN

with the first argument giving the database directory, and additional arguments giving zero or more zipcodes, the main method loads all the database files from the specified directory with the file extension .zip, filtering out all street segments that are not in one of the given zipcodes. If no zipcodes are given as arguments, there is no filtering, and all street segments are included. (Some street segments in the database are missing (both) zipcodes; these should always be included. See specs for DirectionsFinder. There may be other issues in the exact definition of filtering that you should resolve in your problem analysis). If the attempt to load fails (say, because there are no .zip files in the given directory), MapQuick prints:

Database error

and immediately exits. Otherwise, it enters a user interaction loop.

The user interaction loop queries the user for two addresses, outputs directions for traveling between those addresses, and then repeats. The address queries have the following format:

starting number? user-input
starting street? user-input
starting zipcode? user-input
destination number? user-input
destination street? user-input
destination zipcode? user-input
walking or driving [w/d]? user-input

If the starting number is -1, then the program exits immediately, without requesting a starting street, starting zipcode, or any destination information.

IMPORTANT NOTE: You should not cause your program to exit by invoking System.exit(). Instead, you should cause your program to terminate by breaking out of the interactive loop with return or break. If you fail to do so and invoke System.exit(), some of your JUnit tests will fail to run.

All seven inputs should be read before any other checking is performed.

If the starting zipcode does not appear in the database, the directions output should be:

No such zipcode: number street zipcode

Otherwise, if the starting street does not appear in the specified zipcode, the directions output should be:

No such street: number street zipcode

Otherwise, if the starting number does not appear on the street in the specified zipcode, the output should be:

No such number: number street zipcode

Otherwise, the second zipcode, street, and number should be checked, with error output as specified above.

If the input for type of directions is not 'w' or 'd', the output should be:

Invalid direction type: direction-type

If there is no way to get from the first address to the second, the directions output should be:

No path from number street zipcode to number street zipcode

Essentially, the text UI outputs the first (and only the first) of these error messages that applies if there is an error.

Otherwise, the directions output should be as follows:

For driving directions:

Start at number street zipcode
directions-line
...
number street zipcode is on your left/right
Trip length: n.n miles

For walking directions:

Start at number street zipcode
directions-line
...
number street zipcode is on your left/right
Trip time: n minutes

The format of each directions-line (of which there is one or more) is as specified by DrivingRouteFormatter or WalkingRouteFormatter depending on the direction type the user has chosen. The turn specified in the first directions line is always either a "left" or a "right" (not "slight left", etc.), assuming that travel begins at the starting address facing the street; likewise, the last line will state that the destination is on the left or the right as you are traveling towards the destination on the prescribed path. The trip length should be given to the nearest tenth of a mile. When reporting trip time, you should round the time to the nearest minute. For walking, assume a speed of 20 minutes per mile.

Please note that the case where both the starting and the destination addresses are on the same street segment warrants special attention for both the first and last directions line. We do not specify the appropriate directions for the special case when the start and end addresses have identical fractional distances along the segment and you are free to print anything reasonable.

If ps6.TextUI is invoked as specified above, it should not produce any other output to System.out, System.err, or elsewhere. You may implement other behaviors, such as debugging output or more detailed directions, by interpreting command-line flags that turn on such behavior, by providing other classes with main methods, by turning off all debugging output before submitting your solution, or by some other mechanism. IMPORTANT: The default behavior of TextUI needs to conform to the above specification exactly in order to pass automated testing.

We have provided a partially implemented TextUI for you to complete and also provided you with two files textui.test and textui.expected for checking your output format. To test your TextUI, change directory to /mit/YOUR_USERNAME/6.170/psets/ps6 and run:

 
java ps6.TextUI /mit/6.170/tigerdb/tiny < textui.test > textui.actual 

Your goal is to have textui.actual printed in the same format as textui.expected.

Graphical User Interface

You were asked to mock-up a graphical user interface for MapQuick in Problem Set 5. You will implement your GUI (taking into account your TA's feedback and comments) in this problem set. Your GUI should be named ps6/GraphicUI.java, so that the command java ps6.GraphicUI (with the appropriate classpath) launches MapQuick's graphical user interface. You may interpret additional command-line parameters, but the system must run when invoked without any additional command-line parameters.

It is possible that you might have designed a very fancy and comprehensive user interface in Problem Set 5. Although it would be good for you to implement the GUI (or perhaps an improved version after taking into account comments from your TA), you are allowed to implement a simpler interface. Your GUI must support at least the following minimal set of features:

  1. Selecting a Tiger database directory;
  2. Entering zipcodes for filtering;
  3. Entering addresses;
  4. Selecting direction type; and
  5. Displaying directions.

You can get full credit for this problem set by implementing and documenting these features well.

Then, you can implement your full GUI from Problem Set 5 and other optional features, if you have spare time after you have this minimal set working correctly.

If you have never done any GUI programming and are not familiar with Java graphics tools, don't panic! The Swing Lab explains how to create graphical windows using various layout managers. The lab also contains instructions for writing key listeners and mouse listeners to make your user interface more compelling and efficient to use.

Please use only components from the Swing widget set for this assignment. You are allowed to use NetBeans (or some other visual GUI builder) to auto-generate the code for your GUI, but you should state clearly in your documentation what GUI builder was used, and we will not officially support these visual GUI builders. Also, please check the GUI builder source file(s) (for example, the NetBeans .form file) into CVS for your TA's reference.

Programmatic Interface

In this problem set, the choice of internal components (classes and interfaces) and their specifications is up to you. However, in order to allow us to test your system, we require that it provide a programmatic interface: a set of classes that allows MapQuick to be executed from another program. Providing such an interface is essential anyway for cleanly separating the user interface from the rest of the system, and it allows you to do your own testing more easily.

See the specifications of the required programmatic interface.

You may change the specs for any of the classes that you defined in the previous problem sets, for example, in your Graph ADT (Problem Set 3) or RoutePath (Problem Set 4). However, you may not modify the specifications of the fully specified classes in Problem Set 2, i.e., RouteFormatter, DrivingRouteFormatter and WalkingRouteFormatter.

Optional Features

After you have completely implemented, tested, and documented the required functionality, you might wish to implement some of these features (or others of your own devising):

If any of these features are added, they cannot affect the behavior of the ps6.TextUI class as specified above. Therefore, in order to demonstrate them, you could:

Work on optional features will gain special credit. Such credit will be held in reserve and applied when we compute your final grade after we have ranked the students in the class. This will ensure that students who do not choose to work on optional features will not be penalized in any way, but students who do are rewarded appropriately. Please remember to document the optional features that you implement clearly. If you should implement a feature that is not visually obvious, it is possible that your TA might miss it and not award you credit for it.

Reading the Tiger Database

StreetSegReader

The staff-provided StreetSegReader class can be used to read street segments from Tiger database directories. Every database directory contains two auxiliary files, segments.txt.gz and killfile.txt, that you may find helpful. Please read the specifications for StreetSegReader carefully.

The specification of StreetSegReader refers to a Tiger Database Directory. We have provided a collection of directories holding Tiger Databases for you to use, which are located on the 6.170 locker in the /mit/6.170/tigerdb/ directory. Some of the databases are very large; thus, we have organized some of the directories by size (large, medium, small, smaller, tiny). This way you can experiment using a small database and then work your way up to larger ones. The directory /mit/6.170/tigerdb/arealocal/ holds database files for the Boston and Cambridge area.

Some properties of the collections of GeoSegments you have worked with in the past will not hold with the StreetSegments produced from the Tiger Database. For example, StreetSegments often share the same name (as opposed to having one GeoSegment named MassAve1, the next MassAve2, and so on). Many StreetSegments in the Tiger Database don't even have a name associated with them! (In that case, the name will be the empty string, StreetSegReader will automatically assign the String "(unnamed street)" to it as the street name; users cannot look up addresses on such a street.) So be careful in what you assume when working with the Tiger Database. Also, you want to avoid tests that involve segments in the Duplicates List.

Please note that street numbers are often not unique for a given street. For example, there are at least two 77s on Massachusetts Avenue in /mit/6.170/tigerdb/mit. Hence, you will need to take the zipcode into account when you check if a street segment contains an address to make sure that you get the right segment.

Segment list

Every database directory contains a file segments.txt.gz that lists the street segments present and their number ranges and zipcodes. You may find this useful in generating queries and to help you with debugging. Note: there can be many lines with the same streetName and zipcodes. Since this file is meant to be more of assistance to you than necessary for the problem set, only use it if it helps.

Each line of segments.txt.gz takes the form:

streetName startPt endPt length [leftZip] [rightZip] [leftNumbers] [rightNumbers]

where the items in square brackets are optional.

Do not attempt to use segments.txt to load the database; that is what the StreetSegReader is for. These files are rather large, so a quick way to filter out the StreetSegments of interest is to use the grep command. For example, if you want a listing of the streets with the name "Prospect St" in /mit/6.170/tigerdb/tiny, you can try:

zcat /mit/6.170/tigerdb/tiny/segments.txt.gz | grep "Prospect"

(The correct version of zcat is only available on Linux Athena machines. Users of other machines can use gunzip to temporarily create flat text files.) This command returns the following results:

Prospect St     (41.278841,-70.105022)  (41.27939,-70.106266)   0.1             02554           2-10
Prospect St     (41.278773,-70.104749)  (41.278841,-70.105022)  0.0             02554           12-14
Prospect St     (41.278544,-70.104021)  (41.278773,-70.104749)  0.0     02554           17-19
Prospect St     (41.278407,-70.10393)   (41.278544,-70.104021)  0.0     02554   02554   21      16-22
Prospect St     (41.27779,-70.102959)   (41.278407,-70.10393)   0.1     02554   02554   23-33   24
Prospect St     (41.277698,-70.102716)  (41.27779,-70.102959)   0.0             02554           26-32
Prospect St     (41.277058,-70.101624)  (41.277205,-70.101875)  0.0
S Prospect St   (41.275686,-70.100471)  (41.277058,-70.101624)  0.1
S Prospect St   (41.274977,-70.099591)  (41.275686,-70.100471)  0.1     02554   02554   55-65   58-70
S Prospect St   (41.274679,-70.099197)  (41.274977,-70.099591)  0.0
Prospect St     (41.277205,-70.101875)  (41.277698,-70.102716)  0.1

Here are some addresses of popular locations that you may want to test in queries (they are also listed in addresses.txt:

  w20 (Stratton Student Center): 84    Massachusetts Ave    02139
  Lobby 7                      : 77    Massachusetts Ave    02139
  MIT CSAIL                    : 32    Vassar St            02139
  MIT Museum                   : 265   Massachusetts Ave    02139
  The Hong Kong                : 1236  Massachusetts Ave    02138
  John Harvard's Brew House    : 33    Dunster St           02138
  Loews Theatre (Harvard)      : 10    Church St            02138
  CambridgeSide Galleria Mall  : 100   Cambridgeside Pl     02141
  Ximian                       : 101   Rogers St            02142
  Newbury Comics (Everett)     : 38    Everett St           02134
  Loews Cheri                  : 50    Dalton St            02115
  Newbury Comics (Newbury)     : 332   Newbury St           02115
  Tower Records (Newbury)      : 360   Newbury St           02115
  Museum of Fine Arts          : 465   Huntington Ave       02115
  Fenway Park                  : 4     Yawkey Way           02215
  Nickelodeon Theatre          : 34    Cummington St        02215

Duplicates list

Each directory in /mit/6.170/tigerdb contains a killfile.txt file which may be helpful in identifying sets of StreetSegments which have the same end points, but which are not equal. Lines starting with WARNING indicate segments which meet these warning criteria. Running tests which use any of the warning streets in killfile.txt may lead to results which are not consistent over time. The StreetSegmentReader will also (by default) implicitly remove StreetSegments marked KILL in the killfile.

Testing

Validate

The ant validate command uses your test suite, in SpecificationTests.java . We supply you with the following tests, which use the tiny database:

From: 52 Wauwinet Rd 02554
To: 64 Wauwinet Rd 02554

Start at 52 Wauwinet Rd 02554
Turn left onto Wauwinet Rd and go 0.2 miles.
64 Wauwinet Rd 02554 is on your left
Trip length: 0.2 miles

From: 32 Wauwinet Rd 02554
To: 64 Wauwinet Rd 02554

Start at 32 Wauwinet Rd 02554
Turn left onto Wauwinet Rd and walk for 13 minutes.
64 Wauwinet Rd 02554 is on your left
Trip time: 13 minutes

From: 44 Millbrook Rd 02554
To: 200 Madaket Rd 02554

Start at 44 Millbrook Rd 02554
Turn left onto Millbrook Rd and go 0.7 miles.
Turn slight left onto Milbrook Rd and go 0.2 miles.
Turn left onto Madaket Rd and go 3.1 miles.
200 Madaket Rd 02554 is on your left
Trip length: 4.0 miles

From: 111 Somerset Rd 02554
To: 48 Eel Point Rd 02554

Start at 111 Somerset Rd 02554
Turn left onto Somerset Rd and go 0.4 miles.
Turn left onto Sumerset Ln and go 0.5 miles.
Turn left onto Hummock Pond Rd and go 0.2 miles.
Turn right onto Millbrook Rd and go 1.0 miles.
Turn slight left onto Milbrook Rd and go 0.2 miles.
Turn left onto Madaket Rd and go 0.5 miles.
Turn slight right onto Eel Point Rd and go 1.0 miles.
48 Eel Point Rd 02554 is on your left
Trip length: 3.8 miles

The fact that we supply these few tests does not mean you should not supply more of your own tests in SpecificationTests, however.

Testing Strategy

Testing MapQuick will involve both unit testing for the new modules and also integration testing to confirm that all the modules work together to produce the correct results, i.e., return the correct directions. You already have some experience with both unit testing and integration testing from previous problem sets. Unit testing involves writing JUnit tests that verify the correctness of the methods in each module and you should do this systematically.

For integration testing, your main goal is to check that MapQuick returns the correct directions for the route between two end points. You would want to be systematic about this and divide the testing domain into subdomains; for example, you may wish to test routes that involve end points on the same segment, end points on different segments and also testing the system on end points that are of increasing distances apart, etc.

The main difficulty that you would encounter in coming up with test cases is that it is not easy to determine what the correct answers should be. One way to construct test cases is to use information from the segments.txt.gz files for simple cases involving short routes. For longer routes, it would probably be easier to construct test cases based on landmarks that are familiar to you by using the database mit or arealocal. To help you ascertain whether your program returns the correct directions, you may also wish to refer to directions provided by Google Maps. Please be warned however that the Tiger database that we are using for this problem set is somewhat outdated (the census department updates it once a decade) and the road information available may be somewhat outdated compared to what you find on MapQuest or Google Maps.

To test TextUI, you can either use the provided testing infrastructure or create test files similar to textui.test (so that you don't have to type the same input commands into TextUI repeatedly) and run them from the command line.

To use our testing infrastructure, you can model TestMapQuick after PublicTest. Then, create a class that stores your test queries similar to ValidateQueries (or make it an inner class in TestMapQuick) for both your programmatic interface and the text user interface. Add your test cases by creating appropriate TestRecords.

Performance

Though 6.170 does not focus on performance in software engineering much, it is important to be aware of performance (especially order of growth) issues as you make your design. To this end we have prepared a preliminary exercise to this pset to help you start thinking about how your design affects the run time of your programs.

Profiling Exercise

DUE: Friday, Mar. 24 before lecture begins

For the preliminary turnin, you will empirically measure the order of growth of several important operations for building your graph and finding paths on it.

Please measure and plot the order of growth (in time) of:

You may find the provided TimeProfiler.java class to be useful. It is a simple abstraction that attempts to model a stopwatch in your code. You can tell the profiler to start timing and stop timing at any point, and it can print you some detailed statistics using toString().

Alternately, you can use the Unix command time to measure the time it takes to execute any other Unix command. For example:

% time sleep 10
0.000u 0.001s 0:10.00 0.0%      0+0k 0+0io 0pf+0w

This tells you that the computer took 10 seconds of real time, 0 seconds of user-level execution time, and 0.001 seconds of system-level execution time to complete the command. You may also feel free to use any third-party Java profiling tools you have access to—just tell us about them; we're always looking for ways to improve this course.

Please turn in clear and labeled graphs of the runtimes of these operations using your codebase to your TA. You must turn these graphs in electronically in PDF format. The graphs should have enough data to see the asymptotic order of growth of each operation, and get a rough estimate of the constants involved. (A common problem is to include too few datapoints!) Please include with your graphs what kind of hardware and measurement tools you used to get your results.

Requirements

Your implementation should have reasonable performance. We specify the performance requirements here in terms of some locations whose addresses are given in the section on Database Contents below.

On an HP Compaq Pentium 4 (3.2 GHz), as configured in the Athena clusters and using a 128-megabyte Java heap, your implementation should meet the following performance requirements (on other faster platforms, you should do even better; these are only the minimal requirements):

02139 02141 02215
Fenway Park -> Lobby 7 : 4 seconds

02139 02138 02141 02215 02115
Museum of Fine Arts -> Tower Records (Newbury) : 5 seconds
Newbury Comics (Newbury) -> Fenway Park : 5 seconds
Fenway Park -> Lobby 7 : 20 seconds
Tower Records (Newbury) -> Lobby 7 : 8 seconds

02139 02138 02141 02215
Fenway Park -> John Harvard's : 20 seconds

02139 02138 02141 02215 02134
Newbury Comics (Everett) -> Fenway Park : 30 seconds

These are conservative numbers. A program that is carefully designed and uses good representations of the datatypes you built in problem sets 2 to 4 should run an order of magnitude faster. Don't be surprised if a query is fast in one direction but slow in another. Directions from Fenway Park to John Harvard's (with the above zipcode filtering) take an order of magnitude less time to find in one implementation than the same trip in reverse. For your convenience, we have included files with these searches in the same format as the textui.test in the tests directory.

If you find that your implementation performs badly, don't start optimizing all over the place. If you do so, you're unlikely to improve the performance dramatically, and you'll probably introduce bugs into your code. Instead, you should figure out where the bottlenecks are, and develop a strategy for handling them. Read our advice on representations below. Remember, after you implement improvements to your code, make sure to run regression tests.

Hints

Strategy

One possible approach to designing and implementing MapQuick is:

Documentation is an important component of this problem set, so you should be documenting your work as you go along. Please do not to leave all the documentation work to the end; if you do so, there is a risk that you may not to be able to find sufficient time for it.

Graph Construction

The main functionality of StreetSegReader is to provide you with an Iterator of StreetSegments. You will have to determine on the connectivity between StreetSegments based on the coordinates of their end points. It is quite common for several StreetSegments to meet at the same point (i.e. at intersections).

The StreetSegReader provides no one-way street information, and it only produces a single StreetSegment object for each segment of a street in the database. Also, because your implementation of Graph in Problem Set 3 is directed, you may need to generate a reverse segment for each segment returned by StreetSegReader when you create the network graph.

Efficiency

In general, a focus on correctness is far more important than a focus on performance, especially when only minor efficiency improvements are at stake. However, dramatic, order-of-magnitude performance improvements may be worth making even before your code is fully debugged, because they may make debugging faster and easier, thus contributing to the correctness goal. For example, if your code is taking minutes to load the tiny database (or your own databases) -- something you may do a lot during testing -- then your testing will be slow and unpleasant. Reducing that time to a fraction of a second will make your debugging more effective.

Choice of Representations

The representation of an abstract type can dramatically affect performance and scalability. Even with correct implementations of the modules you built in the earlier problem sets, you may be unable to assemble a MapQuick program that can load the entire database and respond to queries in a reasonable time. Here are some hints to help you identify where you might have problems, and how to overcome them.

Hash Functions

Objects that are used as keys in hash tables need reasonable hash functions to prevent collisions. In the worst case (for example, a hash function that always returns the same value), lookups in a hash table degenerate from unit cost to linear cost. Advice on constructing reasonable hash functions can be found in the Introduction to Algorithms (CLRS) textbook or the Effective Java text.

In short, there are three properties which a good hash function should have:

  1. It must obey the hash code invariant. a.equals(b) ==> a.hashCode()==b.hashCode().
  2. It is nice if it has good coverage. a.hashCode()==b.hashCode() usually means a.equals(b).
  3. It is important that hashCode be fast. You may want to consider storing the hash code of an object in a private field so your hashCode method can simply return it.

Make sure that hashCode() for GeoPoint, GeoSegment and StreetSegment are implemented efficiently.

Profiling MapQuick

Loading the Tiger databases into a Graph abstraction can be a major bottleneck in your MapQuick systems. First of all, the databases are fairly large files and file I/O is very slow compared to other computer tasks. Also, many of your components from previous problem sets need to be implemented in an efficient way in order to build the Graph quickly. Lastly, an efficient algorithm for building the Graph is not trivial, although you should have already thought of one from Problem Set 5. If your program seems to run too slowly and you have no idea what is causing it, you may wish to profile your program to determine the bottleneck.

It also might be useful to use TimeProfiler or another tool of your choice to figure out three pieces of information:

The first piece of data gives you a general idea of how much time your program spends on file I/O. However, the process DOES depend on your StreetSegment constructor, which also depends on your StreetNumberSet constructor.

The second statistic allows you to figure out how much of the time is spent on building your Graph. There may be lots of reasons for this to be a slow process. You might want to profile sub-parts of the system to deduce whether the slowdown comes from your Graph methods or your building algorithm.

The memory usage is important because poor design choices from earlier problem sets may be using too much memory for practical use.

Turning off checkRep()

If your implementation seems to take forever to load the tiny database, consider turning off (commenting out or short-circuiting with an if statement) checkRep() in your Graph implementation. However, please ensure that Graph is mostly debugged before you do so.

Java Heap Size

You will probably need to increase the Java heap size in order to load the entire medium database and handle queries on it. Increasing the maximum Java heap size can be done in the 1.5 JDK with the command line argument -Xmxsize. We recommend a 128-megabyte heap for MapQuick; thus the argument would be -Xmx128M. Run java -X for more information on this and other non-standard options to the Java virtual machine.

To run your program with a larger than default heap size in Eclipse, instead of simply right-clicking on GraphicUI and choosing Run As -> Java Application, choose Run As -> Run.... In the pop-up, click on the Arguments tab and type in -Xmxsize into the VM Arguments textArea.

Fractional Segments

Your MapQuick program as submitted for grading must assume that the entire length of both the start and finish street segments are traversed.

This is not a realistic assumption, and you may optionally account for traversing only part of a segment (for extra credit). Consider traveling from the Apartment to the Coffeehouse:

A graphic demonstrating need for precision at start and end street segments

If your path length included all (or none) of the segments of Penny Lane and Broadway between Abbey Road and Side St, then you may incorrectly report the route that goes left onto Penny Lane, right onto Side St, and right onto Broadway, rather than the shorter route which goes right onto Penny Lane, left onto Abbey Road, and left onto Broadway.

If you do implement fractional segments, make sure that it is not turned on by default. Otherwise, because the auto-grader currently assumes that fractional segments are not implemented, your program will fail all the auto-graded tests. Instead, you should explain in your documentation how your TA can activate and test your fractional segment feature.

Multiple shortest paths

It is possible, though rare, for there to be multiple shortest paths between two locations. This can happen for two separate reasons:

Your code is required to return a shortest path, but which shortest path it returns is not specified. The test cases used for validate and for grading your solution have unique shortest paths, so you should not need to worry about this issue.

However, you may find that running your own route-finding queries on different JVMs may produce different paths (of the same length), because of differences in the implementation of iterators on the different JVMs. If this happens (and only if this happens), you should investigate to determine whether the two paths are really of equal length, or there is a bug in your implementation which causes the different results.

Here are three approaches to debugging such a problem:

Miscellaneous

There are two ways to leave an address: by initially turning right, or by initially turning left. Similarly, there are two ways to approach an address: with the destination on your right, or the destination on your left. So, when DirectionsFinder.getDirections uses the PathFinder you created in Problem Set 3 to search for the shortest path, the start and goal sets should each contain both the start and end segments and their corresponding reverse segments.

You should remove trailing white spaces from the user input (tip: String.trim() will do this). You can also use String.replaceAll( "\n", "") to strip newline characters, if necessary, to meet the specifications for Directions. Although you are not allowed to change the interface for RouteFormatter, you can convert the String returned by RouteFormatter into lines of directions using String.split("\n").

Zipcode filtering is meant to be easy: you have a set of zipcodes from user input and StreetSegmentReader provides you with an Iterator of StreetSegments. For each segment, you use it to build your graph if its zipcode is in the zipcode set; if not, you ignore it. Another possible approach is to create a new ZipcodeFilter (which implements ps6.StreetSegmentFilter) and modify StreetSegReader so that it returns an iterator of only the segments you need. The latter is likely to be significantly more complicated.

The start and destination addresses may appear on the same street; they may even appear on the same street segment. MapQuick should still produce correct output. In the second case, there would be only one directions-line (as described in the directions section). The trip does not contain any U-turns and the trip distance is the entire length of the street segment. Keep in mind that MapQuick assumes buildings are evenly spaced apart along a street segment and think about when you might need to use orderStatistic from StreetNumberSet.

Newer versions of the Java platform (such as those used on Athena) have a relaxed specification for the behavior of floating point operations. This should not affect you in most cases (it is one reasons you were told to use ints to represent latitude and longitude), but can possibly cause your code to arrive at different numeric results depending on the platform you are using. It can also, in certain cases, cause your program to arrive at different results over a single run of your code. To avoid all of these problems use the strictfp keyword in front of class in all classes which perform floating point arithmetic; for instance:

public strictfp class GeoSegment {
  ...
}

Grading Policy

Unlike previous problem sets, this problem set is not divided into separate questions. Although this problem set will be graded out of 100 points, it will be weighted to be worth approximately 50% more than Problem Sets 1 to 5. The distribution of credit for this problem set is as follows:

It should be clear from this distribution that the emphasis of this problem set is design, testing and documentation, not coding. You should refer to the handout Documenting a Software System and make sure that your write-up covers all the sections mentioned therein.

Even if your program has errors, you may still get a lot of credit for this problem set if your work is well-documented; on the other hand, even if your code works perfectly, you may be awarded less than half the total points if you fail to document your work. To get full credit for "Testing", you need to do more than submit a bunch of JUnit tests. You will need to discuss how your tests were selected and why they are sufficient. Your job is to convince your TA that no important tests were omitted, and that the system will really operate as desired once it passes all your tests.

Feedback

Please answer the following questions in a file named feedback.txt in your doc/ directory:

  1. How many hours did you spend on the following components of this problem set?
  2. In retrospect, what could you have done better to reduce the time you spent solving this problem set?
  3. What could the 6.170 staff have done better to improve your learning experience in this problem set?

What to Turn In

Profiling Exercise

By Fri, Mar. 24, submit electronically your graphs for the profiling exercise.

Main Turnin

Please hand in documentation about your development following the format described in the handout Documenting a Software System. This documentation should be turned in electronically as MapQuick.html along with your code.

In your documentation's appendix on file formats, you should not be concerned about the format of the Tiger database files, but you should specify any formats you design (as part of an optional feature, for example).

The documentation should also include an appendix that records how your implementation performs on the sample queries listed above.

Your turn-in should include a file named README that briefly describes the contents of all the files that you turn in. As usual, all your code should be appropriately commented.

The following files should be checked into CVS before you run ant validate: