| 6.170 | Laboratory in Software Engineering Spring 2004 Problem Set 6: Traffick Checkpoint Due: Friday, March 19, 2004 Problem Set Due: Tuesday, April 6, 2004 at 9 pm |
Quick links:
Please read the recommended schedule before you begin.
Many students in previous terms became too focused on minor performance tweaking and optimization. Some optimizations are useful but most are not! Please read the section on when to optimize before attempting performance related modifications.
You are provided specifications in
HTML.
The HTML specifications are good for viewing or printing with a Web
browser, and they contain special formatting and hyperlinks, just like
the
Java API Specification. You can edit the starter Java files to complete your assignment without
having to retype the declarations.
Make sure you have followed the instructions on the tools handout relevant to directory setup before you begin development. Furthermore, be sure you have read the problem set procedure handout before you begin.
You can get the source files for problem set 6 by using CVS. Follow
the problem set checkout
instructions and checkout the ps6 module from your
repository.
Your code MUST work on Athena, so if you choose to work on a non-Athena machine, please take a few minutes before submitting your solution to ensure that it runs correctly on Athena. Once everything is in order, read the Problem Set Submission instructions for how to run a final check of your code and turn it in.
You should go about this problem set by imagining that it involves developing Traffick from scratch. Most places in this document that we say "Traffick does such-and-such." we mean "When you finish, Traffick will do such-and-such." You will hand in a self-contained project that includes a problem analysis, a design, module specifications, commented code, testing artifacts, and user documentation. You have already done much of this work. Even if you include work done previously, which we encourage you to do, you may want to improve on it, given the expertise and understanding you have acquired during the semester. For example, if you chose representations of abstract data types that are not efficient enough, you may need to rework them. The representation you chose for BipartiteGraph, for example, may not scale well to large graphs.
You are not required to use the modules you developed previously. If you like, you can start from a clean slate and develop Traffick from scratch. We strongly caution you, however, that this is a risky path to take. Even if you are talented enough to complete the assignment from scratch in two weeks, you may do better to assemble a version from what you have first, and only then, if it works, consider a different design. If you choose to do this, we urge you to discuss your new design with your TA; getting feedback on your ideas is essential to doing quality work and avoiding mistakes.
You have slightly over two weeks (counting spring break) for this problem set, but you should not underestimate the amount of work involved. It is essential that you pace yourself. The teaching staff can advise you if you run into difficulties, but not if you wait until the last minute to ask for help. See the suggested schedule below. To help you get started, you are required to individually meet with a TA by Friday to discuss your design and work-plan.
On startup, Traffick reads a collection of database files containing
a description of the road system. The description of a road
system includes:
The specification of RoadSystemReader refers to a
Tiger database directory. We have provided a collection of
directories holding Tiger databases for you to use. In your CVS
checkout, you will find three databases (all
subdirectories of ps6/traffickdb):
Some properties of the collections of RoadSegments you have worked with in the past will not hold with the RoadSegments produced from a Tiger Database. For example, RoadSegments in the Tiger database often share the same name (as opposed to having one RoadSegment named MassAve1, the next MassAve2, and so on). Many RoadSegments in the Tiger Database don't even have a name associated with them! In that case, the name will be the empty string. So be careful what you assume when working with the Tiger Database.
You will need to build a graph of lanes and intersections from the
streets
in the Tiger database. Here's an algorithm for doing that
efficiently.
// create Lanes
for each RoadSegment s returned by RoadSystemReader,
create a lane for s
add it to the graph
// create Intersections
let pointToIntersection be a Map (initially empty) that maps from a GeoPoint to the intersection at that point
for each intersection i returned by RoadSystemReader,
create an intersection for i
add it to the graph
add it to the pointToIntersection map
// connect Lanes and Intersections to each other
for each lane in the graph,
for each of the lane's end points,
look up the point in the pointToIntersection map
if an intersection already exists at the endpoint,
connect the lane to the intersection (as input or output depending on which lane endpoint it is)
if no intersection exists there,
create a cloverleaf
add it to the graph
add it to the pointToIntersection map
connect the cloverleaf to the lane
Note that the RoadSystemReader currently ignores the issue of one-way streets. It assumes that all RoadSegments in the database allow bidirectional travel, and its iterator returns two lane descriptors for each segment: one for the segment, and another for its reverse.
Checkpoint: by this Friday, you must show a TA that your code can load a road system database. See below for more about the checkpoint.A traffic source produces new cars and injects them into the road system. All the cars produced by a given traffic source follow the same route through the road system. Traffic sources should be modeled as filters with no inputs and one output, which should be the first lane on the route.
Each source has an injection
delay, measured in seconds, which is the time interval
between the appearance of new cars from the traffic source. If
the source is ready to inject a new car, but the output lane doesn't
allow the car to enter it, the source must wait to inject the car until
it has a chance. Because of this potential wait, the injection
delay is technically a minimum
delay. Traffic on the lane may not allow the source to inject at
its desired rate. The injection delay is measured
between actual injections, so
once the source successfully injects a car, it must wait for the
specified interval before trying to inject another.
A source may
also have an optional limit,
which is
the maximum number of cars it injects. The default limit is
infinity. After injecting its limit, the source has no more
effect for the remainder of the simulation.
All cars created by the same source not only share the same route
but also the same color. They may share the same name, or have
similar names generated from a common root (e.g. "MIT 1", "MIT 2", ...)
Traffic sources are specified in an XML file format.
The staff provides
a class ps6.reader.TrafficSourceReader
that
reads this format. Please read its specification carefully before
you use it.
The GUI launched by main
must support:
The design of the graphical interface is relatively unconstrained. We advise that you build a minimal user interface and elaborate it only if you have the time and inclination.
We encourage you to use Swing,
which you will use for the Antichess/Gizmoball final project.
If you have never done any GUI programming and are not familiar with
Java graphics tools, don't panic!
The Swing Lab tutorial, written by
the 6.170 staff, explains how to create graphical windows using various
layout managers, how to paint into a window, and how to handle keyboard
and mouse input.
The staff provides a graphics library, ps6.graphics.TraffickPalette, that makes it easier to draw RoadSegments, intersections, and cars, and to determine which RoadSegment or intersection the user clicked on. You may use this graphics library if you wish, but you don't have to.
Your implementation should have reasonable performance. On an Athena Linux box, as configured in the Athena clusters and using a 256-megabyte Java heap, your implementation should meet the following performance requirements:
Note that wall clock time is the time that you spend waiting for your program to run the simulation, while simulated time is the sum of all the time slices the simulation executed.
These are conservative numbers. A program that is carefully designed and uses good representations of the datatypes you built in problem sets 3 to 5 should run an order of magnitude faster.
If you find that your implementation performs badly, don't start
optimizing
all over the place. If you do so, you're unlikely to improve the
performance
dramatically, and you'll probably introduce bugs into your code.
Instead, you
should figure out where the bottlenecks are, and develop a strategy for
handling
them. Read our advice on representations below.
Remember, after you implement improvements to your code, make sure to
run your regression
tests. (The Eclipse Run dialog makes it easy to run all the JUnit
tests in a project. Use Run >> Run..., make a new JUnit
test, select All Tests in Project, Source Folder, or Package, and click
Search to select the project. On the command line, "ant test" has
the same effect.)
Loading the Tiger databases into a BipartiteGraph abstraction can be a
major
bottleneck in your Traffick system. First of all, the databases are
fairly large files and file I/O is very slow compared to other computer
tasks. Also, many of your components from previous problem sets need to
be implemented in an efficient way in order to build the graph quickly.
You may find the provided TimeProfiler.java class to be useful. It is a simple abstraction that attempts to model a stopwatch in your code. You can tell the profiler to start timing and stop timing at any point, and it can print you some statistics using toString().
For determining how much memory your program uses, you can use the Unix top command or Windows Task Manager as your Traffick loads the Tiger database.
Load the small database and use the profiling techniques to figure out three pieces of information:
The first piece of data gives you a general idea of how much time your program spends on file I/O.
The second statistic allows you to figure out how much of the time is spent on building your graph. There may be lots of reasons for this to be a slow process. You might want to profile sub-parts of the system to deduce whether the slowdown comes from your graph methods or your building algorithm.
The memory usage is important because poor design choices from
earlier problem sets may be using too much memory for practical use.
Remember, you can increase the amount of memory Java uses by setting
the -Xmx option, as described in the Hints
section. You should be
able to load the small database in at most 256 Megabytes of memory
(many designs use much less).
"Premature optimization is the root of all evil." —Donald Knuth.
You should consider optimizing your program only after you have a working version and only if that version is too slow. Do not optimize as you go. If your program does not operate correctly, it does not matter how fast it runs. Do not mistake optimization for debugging. In fact, optimized programs are usually harder to understand and to debug because they break abstraction barriers in the name of efficiency. If you do decide to optimize, it is helpful to comment optimizations in your code and to keep the original, unoptimized versions backed up.
Finally, you should always run tests before and after you optimize. Run your normal test suite (for correctness) to make sure you didn't introduce any new bugs. Don't become too attached to an algorithm or assume that a program will always run faster with fewer lines of code. Run performance tests to objectively measure if your optimizations actually resulted in a net speed or size improvement.
Remember to check your code into CVS. If you made any changes to
PS3, PS4, or PS5 code, make sure to
commit those changes too, into your PS3/4/5 directory.
To help you allocate your effort, here is a rough
breakdown of how this problem set will be graded:
On any of these parts, your grade will depend on the quality of your ideas as you present them. Documentation will not be graded separately, but work that is poorly documented will not be given much credit. This policy reflects good practice in industry and research: however good your ideas are, they are worth little if you cannot convey them to others.
Highly inefficient implementations that prevent your Traffick from achieving reasonable performance will receive only partial credit.
On the other hand, no extra credit will be given to implementations that are highly optimized beyond the minimum performance requirements.
Work on optional features will gain special credit. Such credit will be held in reserve and applied when we compute your final grade after we have ranked the students in the class. This will ensure that students who do not choose to work on optional features will not be penalized in any way, but students who do are rewarded appropriately.
This problem set has many facets; to complete it on time, you must set a strict schedule for yourself. Here is one possible schedule, with milestones in bold.
| Tuesday March 16 | Problem set released. Read problem set, and resolve any confusions or issues. Implement graph building of road systems. Start specifying additional components (traffic sources, monitors). |
| Thursday March 18 | Problem analysis complete and in final form. Test graph building on sample databases. Start implementing new components for backend (traffic sources, monitors). Experiment with larger databases to pinpoint "hotspots" in code which may require implementation changes. |
| Friday, March 19 |
Required
checkpoint. Meet with a TA on Thursday or Friday to demonstrate your graph-loading code. Discuss your design, test cases, and schedule. |
| Monday March 29 | Initial implementation complete. All bugs from ps3-ps5 fixed. You should have already met with a TA to discuss design and possible areas for improvement. Start work on programmatic interface and test drivers. Make implementation changes for performance; test on nantucket database. |
| Wednesday March 31 |
Entire system with programmatic interface
tested and documented. Start design and implementation of graphical user interface. Run experiments on larger databases. |
| Sunday April 4 |
Entire system with all interfaces tested and documented. Polish and proofread documentation. |
| Tuesday April 6 |
Project submitted. Hand in your documentation as hardcopy by 9 pm to a TA in the W20 cluster. Commit your code to CVS by 9 pm. |
You will probably need to increase the Java heap size in order to load the entire boston-cambridge database and handle queries on it. Increasing the maximum Java heap size can be done in Java 1.4 with the command line argument -Xmxsize. We recommend a 256-megabyte heap for Traffick; thus the argument would be -Xmx256M. In Eclipse, you specify these options in the VM Args section of the Run dialog. For more information on this and other non-standard options to the Java virtual machine, run java -X from the command line.
The representation of an abstract type can dramatically affect performance and scalability. Even with correct implementations of the modules you built in the earlier problem sets, you may be unable to assemble a Traffick program that can load the entire database and respond to queries in a reasonable time. Here are some hints to help you identify where you might have problems, and how to overcome them.
Objects that are used as keys in hash tables need reasonable hash functions to prevent collisions. In the worst case (for example, a hash function that always returns the same value), lookups in a hash table degenerate from unit cost to linear cost. Advice on constructing reasonable hash functions can be found in the Introduction to Algorithms (CLR) textbook or the Effective Java text.
In short, there are three properties which a good hash function should have:
Newer versions of the Java platform (such as
those used on Athena) have a relaxed specification for the behavior of
floating point operations. This should not affect you in most cases (it
is
one reason you were told to use ints to represent latitude
and longitude),
but can possibly cause your code to arrive at
different numeric results depending on the platform you are using. It
can also, in certain cases, cause your program to arrive at different
results over a single run of your code. To avoid all of these
problems use the strictfp keyword in front of
class in all classes which perform floating point
arithmetic; for instance,
public strictfp class RoadSegment {
...
}
This section lists clarifications and answers to common questions about the problem set.
| Q: | The graph-building pseudocode says that I need to create two lanes for each RoadSegment returned by the RoadSystemReader, but the RoadSystemReader's documentation says it already produces two segments for each street. Which is right? |
| A: | RoadSystemReader is right: it returns two segments for each street, one for each direction of traffic. The pseudocode above has been changed to reflect that. |
| Q: | May I use a graphics toolkit other than Swing for ps6, such as SWT? |
| A: | No, you must use Swing for ps6. If you would like to use something else for your final project, then talk to your final project TA when the time comes. |
| Q: | Did you mean to write makeSimulation() in the overview for Traffick instead of makeTraffick()? |
| A: | Yes, yes we did. |
| Q: | How do we make our application not look like it was written in Java? |
| A: |
Swing has a number of different "skins" for the different platforms,
including Mac and Windows. To get your application to use the
most appropriate skin for the platform on which it is running,
add the following block of code to your main GUI class:
try {
String lookAndFeelClass =
UIManager.getSystemLookAndFeelClassName();
UIManager.setLookAndFeel(lookAndFeelClass);
} catch (Exception e) {
// if an exception is thrown,
// then the default (metallic) look and feel will be used
// it is pretty unlikely that this will happen
e.printStackTrace();
}
This will improve the usability of your application,
as it improves external consistency with the platform
on which it is running, making it more familiar to users.
|
closest = rs.getP1();The second such line should be changed to:
closest = rs.getP2(); // this is line number 186 if you haven't edited this classWe were going to try to secretly update everyone's code using CVS, but that seemed dangerous, so please just take a minute to make the fix yourself. We apologize for our error.