6.338 2nd Progress Report:
Parallelizing Incremental Bayesian Segmentation
Joseph Hastings <jrh@mit.edu>
(in collaboration with Siddhartha Sen <sidsen@mit.edu>)
Completed Work
We have managed to port the
IBS algorithm from its original state (Java, LISP, and Perl)
to C++. This took more time than expected because of the added complexity of
reading and parsing the time-series data file in C++. In addition, there were a
number of bugs related to memory allocation and leaks that had to be resolved.
When the ported code was finally debugged, we were able to run the training
phase of the algorithm on time-series data representing the standard system
call transitions that occur while sending mail. We made various attempts to
profile the training code and figure out where the bottlenecks were, so that we
would know where to focus our parallelization efforts.
We identified the check_out_process method (in particular, the
compute_marginal_likelihood method that it calls) as the bottleneck in the
training code. We have brainstormed ideas and sketched out our plans for
parallelizing this section of code using both MPI and Cilk, as outlined in our
project proposal.
Problems/Setbacks
We encountered a number of
problems and unexpected issues while performing the tasks described above.
These include the following:
- Time-series data file parsing was more complicated in C++ (as compared
  to Java) than we expected
- A large number of memory allocation/leakage errors had to be dealt with
- Profiling the training code was difficult, given our lack of experience
  with UNIX profiling in general; several methods were attempted, including
  timing simulated calls to targeted methods and using the UNIX profil
  system call (which is a nightmare to interpret by itself)
- Data distribution will be a key factor in determining the speedup gained
  through parallelization; we believe it will be worthwhile to have all
  nodes build their own processes lists from data being read and
  distributed by the main node
- In order to convert the IBS code to Cilk, it has to be written in C
  first; this will make data file parsing even more complicated
- Cilk uses shared memory to handle global variables (e.g. the processes
  list), so we will have to test the performance of the Cilk version of our
  training code on a shared-memory multiprocessor machine
Work to be Done
Mainly, what remains to be
done is the actual parallelization of the IBS code using MPI and Cilk. The
former method will take more time to implement because of the manual
distribution of data and the communication that has to be arbitrated via MPI
calls. Converting the training code to Cilk should not be difficult once we
have converted it to C; we also don’t have to worry about
distributing data for the global processes list
because the Cilk code will be run on a shared-memory multiprocessor machine
(and the Cilk runtime system will make the designated global variables
accessible to all nodes).
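The manual distribution planned for the MPI version can be sketched as below: the main node (rank 0) reads the time-series data and broadcasts it, after which every node builds its own processes list from its copy and works only on the candidates assigned to its rank. The data-reading and work-splitting here are placeholders, not our real parser or scoring code, and this sketch must be launched under an MPI runtime (e.g. mpirun).

```c
/* Minimal sketch of the planned MPI data distribution. */
#include <mpi.h>
#include <stdio.h>

#define MAX_EVENTS 1024

int main(int argc, char** argv) {
    int rank, nprocs;
    int events[MAX_EVENTS];
    int n_events = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    if (rank == 0) {
        /* Placeholder for reading and parsing the time-series data file. */
        for (n_events = 0; n_events < 16; ++n_events)
            events[n_events] = n_events % 4;
    }

    /* Main node distributes the raw data to every node. */
    MPI_Bcast(&n_events, 1, MPI_INT, 0, MPI_COMM_WORLD);
    MPI_Bcast(events, n_events, MPI_INT, 0, MPI_COMM_WORLD);

    /* Each node builds its own view of the data and handles only the
     * candidates assigned to its rank (round-robin split shown here). */
    int local_work = 0;
    for (int i = rank; i < n_events; i += nprocs)
        local_work += events[i];

    printf("rank %d processed %d units of work\n", rank, local_work);
    MPI_Finalize();
    return 0;
}
```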
After we have implemented both MPI and Cilk versions of the training code, we will run them on the input data files and compare their performance with each other and the original (serial) versions written in Java and C++.