6.170 Laboratory in Software Engineering
Spring 2004
Problem Set 6: Traffick

Checkpoint Due: Friday, March 19, 2004
Problem Set Due: Tuesday, April 6, 2004 
at 9 pm

Handout P6

Quick links:

Contents:


Getting Started

Please read the recommended schedule before you begin.

Many students in previous terms became too focused on minor performance tweaking and optimization. Some optimizations are useful but most are not! Please read the section on when to optimize before attempting performance related modifications.

You are provided specifications in HTML. The HTML specifications are good for viewing or printing with a Web browser, and they contain special formatting and hyperlinks, just like the Java API Specification. You can edit the starter Java files to complete your assignment without having to retype the declarations.

Make sure you have followed the instructions on the tools handout relevant to directory setup before you begin development. Furthermore, be sure you have read the problem set procedure handout before you begin.

You can get the source files for problem set 6 by using CVS. Follow the problem set checkout instructions and checkout the ps6 module from your repository.

Your code MUST work on Athena, so if you choose to work on a non-Athena machine, please take a few minutes before submitting your solution to ensure that it runs correctly on Athena. Once everything is in order, read the Problem Set Submission instructions for how to run a final check of your code and turn it in.

Introduction

In this problem set you will assemble the modules you developed in problem sets 3 through 5, as well as some additional modules, to build Traffick, a traffic simulator for Boston and Cambridge.  Traffick obtains information about the city street network from Tiger databases provided by the U.S. Census bureau. It has a graphical user interface that displays the running simulation, plus a programmatic interface that enables traffic flow statistics to gathered by other programs.

You should go about this problem set by imagining that it involves developing Traffick from scratch. Most places in this document that we say "Traffick does such-and-such." we mean "When you finish, Traffick will do such-and-such." You will hand in a self-contained project that includes a problem analysis, a design, module specifications, commented code, testing artifacts, and user documentation. You have already done much of this work. Even if you include work done previously, which we encourage you to do, you may want to improve on it, given the expertise and understanding you have acquired during the semester. For example, if you chose representations of abstract data types that are not efficient enough, you may need to rework them. The representation you chose for BipartiteGraph, for example, may not scale well to large graphs.

You are not required to use the modules you developed previously. If you like, you can start from a clean slate and develop Traffick from scratch. We strongly caution you, however, that this is a risky path to take.  Even if you are talented enough to complete the assignment from scratch in two weeks, you may do better to assemble a version from what you have first, and only then, if it works, consider a different design. If you choose to do this, we urge you to discuss your new design with your TA; getting feedback on your ideas is essential to doing quality work and avoiding mistakes.

You have slightly over two weeks (counting spring break) for this problem set, but you should not underestimate the amount of work involved. It is essential that you pace yourself. The teaching staff can advise you if you run into difficulties, but not if you wait until the last minute to ask for help. See the suggested schedule below. To help you get started, you are required to individually meet with a TA by Friday to discuss your design and work-plan.


The Problem

Traffick takes a road system and a set of traffic sources and simulates the flow of traffic along the roads in the system.  Monitors can be installed at various points in the system in order to report statistics about traffic flow.  Traffick has a graphical user interface that displays the running simulation, as well as a programmatic interface that can be used for testing.

Road System

On startup, Traffick reads a collection of database files containing a description of the road system.  The description of a road system includes:

The staff provides the class ps6.reader.RoadSystemReader that reads a database and provides access to the information it represents: RoadSegments, intersections, and speed limits. Please read its specification carefully before you use it.

The specification of RoadSystemReader refers to a Tiger database directory. We have provided a collection of directories holding Tiger databases for you to use.  In your CVS checkout, you will find three databases (all subdirectories of ps6/traffickdb):

There are other databases available.  Look in the /mit/6.170/traffickdb directory to find them.

Some properties of the collections of RoadSegments you have worked with in the past will not hold with the RoadSegments produced from a Tiger Database. For example, RoadSegments in the Tiger database often share the same name (as opposed to having one RoadSegment named MassAve1, the next MassAve2, and so on). Many RoadSegments in the Tiger Database don't even have a name associated with them! In that case, the name will be the empty string.  So be careful what you assume when working with the Tiger Database.

Graph Building

You will need to build a graph of lanes and intersections from the streets in the Tiger database.  Here's an algorithm for doing that efficiently.

// create Lanes
for each RoadSegment s returned by RoadSystemReader,
create a lane for s
add it to the graph

// create Intersections
let pointToIntersection be a Map (initially empty) that maps from a GeoPoint to the intersection at that point
for each intersection i returned by RoadSystemReader,
create an intersection for i
add it to the graph
add it to the pointToIntersection map

// connect Lanes and Intersections to each other
for each lane in the graph,
for each of the lane's end points,
look up the point in the pointToIntersection map
if an intersection already exists at the endpoint,
connect the lane to the intersection (as input or output depending on which lane endpoint it is)
if no intersection exists there,
create a cloverleaf
add it to the graph
add it to the pointToIntersection map
connect the cloverleaf to the lane

Note that the RoadSystemReader currently ignores the issue of one-way streets. It assumes that all RoadSegments in the database allow bidirectional travel, and its iterator returns two lane descriptors for each segment: one for the segment, and another for its reverse.

Checkpoint: by this Friday, you must show a TA that your code can load a road system database.  See below for more about the checkpoint.

Traffic Sources

A traffic source produces new cars and injects them into the road system.  All the cars produced by a given traffic source follow the same route through the road system.  Traffic sources should be modeled as filters with no inputs and one output, which should be the first lane on the route.

Each source has an injection delay, measured in seconds, which is the time interval between the appearance of new cars from the traffic source.  If the source is ready to inject a new car, but the output lane doesn't allow the car to enter it, the source must wait to inject the car until it has a chance.  Because of this potential wait, the injection delay is technically a minimum delay.  Traffic on the lane may not allow the source to inject at its desired rate.  The injection delay is measured between actual injections, so once the source successfully injects a car, it must wait for the specified interval before trying to inject another.

A source may also have an optional limit, which is the maximum number of cars it injects.  The default limit is infinity.  After injecting its limit, the source has no more effect for the remainder of the simulation.

All cars created by the same source not only share the same route but also the same color.  They may share the same name, or have similar names generated from a common root (e.g. "MIT 1", "MIT 2", ...)

Traffic sources are specified in an XML file format. The staff provides a class ps6.reader.TrafficSourceReader that reads this format.  Please read its specification carefully before you use it.

Traffic Monitors

A traffic monitor watches the flow of cars passing a certain point in the road system, and reports statistics about traffic flow.  Traffick allows a monitor to be placed on either a lane or an intersection.  A lane monitor reports the number of cars that have exited the lane (either because they turned onto another lane, or because they finished their routes), while an intersection monitor reports the number of cars that have passed through the intersection.  Monitors may be added or removed at any time during a simulation, and a particular point in the road system may have any number of monitors placed on it.  When a monitor is placed in the simulation, it starts counting from 0 at the moment it is placed.

Programmatic Interface

In this problem set, the choice of internal components (classes and interfaces) and their specifications is up to you. However, in order to allow us to test your system, we require that it provide a programmatic interface: a class that allows Traffick to be executed from another program. Providing such an interface is essential anyway for cleanly separating the user interface from the rest of the system, and it allows you to do your own testing more easily.

You should create a class ps6.Traffick that provides the required programmatic interface.  See the specifications of ps6.Traffick. We provide a skeletal implementation of ps6.Traffick for you to fill in.

You should write a TraffickTest class that tests your implementation in whatever way you feel is most appropriate. TraffickTest must implement junit.framework.Test and be runnable via the usual JUnit mechanisms.

Graphical User Interface

Part of the required interface for ps6.Traffick is a main method that launches Traffick's graphical user interface. This method makes it possible to run the Traffick class using either the java command or Eclipse's Run dialog.  Your main method may interpret its command-line parameters if desired, but the system must also run when invoked without any command-line parameters.

The GUI launched by main must support:

Here is an example of a very simple GUI front-end for Traffick:

an ugly sample GUI

The design of the graphical interface is relatively unconstrained. We advise that you build a minimal user interface and elaborate it only if you have the time and inclination.

We encourage you to use Swing, which you will use for the Antichess/Gizmoball final project.  If you have never done any GUI programming and are not familiar with Java graphics tools, don't panic! The Swing Lab tutorial, written by the 6.170 staff, explains how to create graphical windows using various layout managers, how to paint into a window, and how to handle keyboard and mouse input.

The staff provides a graphics library, ps6.graphics.TraffickPalette, that makes it easier to draw RoadSegments, intersections, and cars, and to determine which RoadSegment or intersection the user clicked on.  You may use this graphics library if you wish, but you don't have to.

Optional Features

If you finish early, you may wish to implement some optional features that a real-world implementation of this system might have. You should not, however, consider these until you have a fully working system: If any of these features are added, they cannot affect the behavior of the programmatic interface specified above. Therefore, in order to demonstrate them, you could:

Performance Requirements

Your implementation should have reasonable performance.  On an Athena Linux box, as configured in the Athena clusters and using a 256-megabyte Java heap, your implementation should meet the following performance requirements:

Note that wall clock time is the time that you spend waiting for your program to run the simulation, while simulated time is the sum of all the time slices the simulation executed.

These are conservative numbers. A program that is carefully designed and uses good representations of the datatypes you built in problem sets 3 to 5 should run an order of magnitude faster.

If you find that your implementation performs badly, don't start optimizing all over the place. If you do so, you're unlikely to improve the performance dramatically, and you'll probably introduce bugs into your code. Instead, you should figure out where the bottlenecks are, and develop a strategy for handling them. Read our advice on representations below. Remember, after you implement improvements to your code, make sure to run your regression tests.  (The Eclipse Run dialog makes it easy to run all the JUnit tests in a project.  Use Run >> Run..., make a new JUnit test, select All Tests in Project, Source Folder, or Package, and click Search to select the project.  On the command line, "ant test" has the same effect.)

Profile Your Traffick Simulator

Loading the Tiger databases into a BipartiteGraph abstraction can be a major bottleneck in your Traffick system. First of all, the databases are fairly large files and file I/O is very slow compared to other computer tasks. Also, many of your components from previous problem sets need to be implemented in an efficient way in order to build the graph quickly.

You may find the provided TimeProfiler.java class to be useful. It is a simple abstraction that attempts to model a stopwatch in your code. You can tell the profiler to start timing and stop timing at any point, and it can print you some statistics using toString().

For determining how much memory your program uses, you can use the Unix top command or Windows Task Manager as your Traffick loads the Tiger database.

Load the small database and use the profiling techniques to figure out three pieces of information:

The first piece of data gives you a general idea of how much time your program spends on file I/O.

The second statistic allows you to figure out how much of the time is spent on building your graph. There may be lots of reasons for this to be a slow process. You might want to profile sub-parts of the system to deduce whether the slowdown comes from your graph methods or your building algorithm.

The memory usage is important because poor design choices from earlier problem sets may be using too much memory for practical use. Remember, you can increase the amount of memory Java uses by setting the -Xmx option, as described in the Hints section. You should be able to load the small database in at most 256 Megabytes of memory (many designs use much less).

When to Optimize

"Premature optimization is the root of all evil." —Donald Knuth.

You should consider optimizing your program only after you have a working version and only if that version is too slow. Do not optimize as you go. If your program does not operate correctly, it does not matter how fast it runs. Do not mistake optimization for debugging. In fact, optimized programs are usually harder to understand and to debug because they break abstraction barriers in the name of efficiency. If you do decide to optimize, it is helpful to comment optimizations in your code and to keep the original, unoptimized versions backed up.

Finally, you should always run tests before and after you optimize. Run your normal test suite (for correctness) to make sure you didn't introduce any new bugs. Don't become too attached to an algorithm or assume that a program will always run faster with fewer lines of code. Run performance tests to objectively measure if your optimizations actually resulted in a net speed or size improvement.


What To Hand In

Meeting with TA

Sometime during the first week of the assignment, you must meet with a TA to get checked off.  This meeting will factor into your final grade.  Here's what you'll need to do to get checked off:
You must get checked off by Friday, March 19.  TAs will be available in the W20 cluster on Thursday and Friday afternoons to do checkoffs.

Turnin

Please hand in documentation about your development following the format described in the handout Documenting a Software System.  We expect the following sections at a minimum.  (More details about what goes into each of these sections can be found in Documenting a Software System.  Note that the page ranges in that handout are aimed at group project writeups.  For Traffick, you should shoot for the page length shown below instead.) Your documentation should be turned in as hardcopy.  You can find your TA to hand it in, or you can hand it in by 9 pm to any TA in the W20 cluster.

Remember to check your code into CVS. If you made any changes to PS3, PS4, or PS5 code, make sure to commit those changes too, into your PS3/4/5 directory.

Grading

To help you allocate your effort, here is a rough breakdown of how this problem set will be graded:

25% requirements analysis
25% design
25% implementation
25% testing

On any of these parts, your grade will depend on the quality of your ideas as you present them. Documentation will not be graded separately, but work that is poorly documented will not be given much credit. This policy reflects good practice in industry and research: however good your ideas are, they are worth little if you cannot convey them to others.

Highly inefficient implementations that prevent your Traffick from achieving reasonable performance will receive only partial credit.

On the other hand, no extra credit will be given to implementations that are highly optimized beyond the minimum performance requirements.

Work on optional features will gain special credit. Such credit will be held in reserve and applied when we compute your final grade after we have ranked the students in the class. This will ensure that students who do not choose to work on optional features will not be penalized in any way, but students who do are rewarded appropriately.



Schedule

This problem set has many facets; to complete it on time, you must set a strict schedule for yourself. Here is one possible schedule, with milestones in bold.

Tuesday March 16 Problem set released.
Read problem set, and resolve any confusions or issues.
Implement graph building of road systems.
Start specifying additional components (traffic sources, monitors).
Thursday March 18 Problem analysis complete and in final form.
Test graph building on sample databases.
Start implementing new components for backend (traffic sources, monitors).
Experiment with larger databases to pinpoint "hotspots" in code which may require implementation changes.
Friday, March 19
Required checkpoint.
Meet with a TA on Thursday or Friday to demonstrate your graph-loading code.
Discuss your design, test cases, and schedule.
Monday March 29 Initial implementation complete.
All bugs from ps3-ps5 fixed.
You should have already met with a TA to discuss design and possible areas for improvement.
Start work on programmatic interface and test drivers.
Make implementation changes for performance; test on nantucket database.
Wednesday March 31
Entire system with programmatic interface tested and documented.
Start design and implementation of graphical user interface.
Run experiments on larger databases.
Sunday April 4
Entire system with all interfaces tested and documented.
Polish and proofread documentation.
Tuesday April 6
Project submitted.
Hand in your documentation as hardcopy by 9 pm to a TA in the W20 cluster.
Commit your code to CVS by 9 pm.


Hints

Efficiency

Java Heap Size

You will probably need to increase the Java heap size in order to load the entire boston-cambridge database and handle queries on it. Increasing the maximum Java heap size can be done in Java 1.4 with the command line argument -Xmxsize. We recommend a 256-megabyte heap for Traffick; thus the argument would be -Xmx256M. In Eclipse, you specify these options in the VM Args section of the Run dialog.  For more information on this and other non-standard options to the Java virtual machine, run java -X from the command line.

Choice of Representations

The representation of an abstract type can dramatically affect performance and scalability. Even with correct implementations of the modules you built in the earlier problem sets, you may be unable to assemble a Traffick program that can load the entire database and respond to queries in a reasonable time. Here are some hints to help you identify where you might have problems, and how to overcome them.

Hash Functions

Objects that are used as keys in hash tables need reasonable hash functions to prevent collisions. In the worst case (for example, a hash function that always returns the same value), lookups in a hash table degenerate from unit cost to linear cost. Advice on constructing reasonable hash functions can be found in the Introduction to Algorithms (CLR) textbook or the Effective Java text.

In short, there are three properties which a good hash function should have:

  1. It must obey the hash code invariant. a.equals(b) ==> a.hashCode()==b.hashCode().
  2. It is nice if it has good coverage. a.hashCode()==b.hashCode() usually means a.equals(b).
  3. It is important that hashCode be fast. You may want to consider storing the hash code of an object in a private field so your hashCode method can simply return it.

Miscellaneous

Newer versions of the Java platform (such as those used on Athena) have a relaxed specification for the behavior of floating point operations. This should not affect you in most cases (it is one reason you were told to use ints to represent latitude and longitude), but can possibly cause your code to arrive at different numeric results depending on the platform you are using. It can also, in certain cases, cause your program to arrive at different results over a single run of your code. To avoid all of these problems use the strictfp keyword in front of class in all classes which perform floating point arithmetic; for instance,

public strictfp class RoadSegment {
...
}


Q & A

This section lists clarifications and answers to common questions about the problem set.

Q: The graph-building pseudocode says that I need to create two lanes for each RoadSegment returned by the RoadSystemReader, but the RoadSystemReader's documentation says it already produces two segments for each street. Which is right?
A: RoadSystemReader is right: it returns two segments for each street, one for each direction of traffic. The pseudocode above has been changed to reflect that.
 
Q: May I use a graphics toolkit other than Swing for ps6, such as SWT?
A: No, you must use Swing for ps6. If you would like to use something else for your final project, then talk to your final project TA when the time comes.
 
Q: Did you mean to write makeSimulation() in the overview for Traffick instead of makeTraffick()?
A: Yes, yes we did.
 
Q: How do we make our application not look like it was written in Java?
A: Swing has a number of different "skins" for the different platforms, including Mac and Windows. To get your application to use the most appropriate skin for the platform on which it is running, add the following block of code to your main GUI class:
try {
	String lookAndFeelClass =
		UIManager.getSystemLookAndFeelClassName();
		UIManager.setLookAndFeel(lookAndFeelClass);
} catch (Exception e) {
	// if an exception is thrown,
	// then the default (metallic) look and feel will be used
	// it is pretty unlikely that this will happen
	e.printStackTrace();
}
This will improve the usability of your application, as it improves external consistency with the platform on which it is running, making it more familiar to users.
 


Errata


Back to the Problem sets page.
For problems or questions regarding this page, contact: 6.170-webmaster@mit.edu.
$Id: ps6.html,v 1.54 2004/04/04 16:39:14 mbolin Exp $