6.338 2nd Progress Report:
Parallelizing Incremental Bayesian Segmentation
Joseph Hastings <jrh@mit.edu>
(in collaboration with Siddhartha Sen <sidsen@mit.edu>)
Completed Work
We have managed to port the
IBS algorithm from its original state (Java, LISP, and Perl)
to C++. This took more time than expected because of the added complexity of
reading and parsing the time-series data file in C++. In addition, there were a
number of bugs related to memory allocation and leaks that had to be resolved.
When the ported code was finally debugged, we were able to run the training
phase of the algorithm on time-series data representing the standard system
call transitions that occur while sending mail. We made various attempts to
profile the training code and figure out where the bottlenecks were, so that we
would know where to focus our parallelization efforts.
We identified the check_out_process method (in particular, the
compute_marginal_likelihood method that it calls) as the bottleneck in the
training code. We have brainstormed ideas and sketched out our plans for
parallelizing this section of code using both MPI and Cilk, as outlined in our
project proposal.
Problems/Setbacks
We encountered a number of
problems and unexpected issues while performing the tasks described above.
These include the following:
- Time-series data file parsing was more complicated in C++ (as compared
  to Java) than we expected
- A large number of memory allocation/leakage errors had to be dealt with
- Profiling the training code was difficult, given our lack of experience
  with UNIX profiling in general; several methods were attempted, including
  timing simulated calls to targeted methods and using the UNIX profil
  system call (which is a nightmare to interpret by itself)
- Data distribution will be a key factor in determining the speedup gained
  through parallelization; we believe it will be worthwhile to have all
  nodes build their own processes lists from data being read and
  distributed by the main node
- In order to convert the IBS code to Cilk, it has to be written in C
  first; this will make data file parsing even more complicated
- Cilk uses shared memory to handle global variables (e.g. the processes
  list), so we will have to test the performance of the Cilk version of our
  training code on a shared-memory multiprocessor machine
Work to be Done
Mainly, what remains to be
done is the actual parallelization of the IBS code using MPI and Cilk. The
former method will take more time to implement because of the manual
distribution of data and the communication that has to be arbitrated via MPI
calls. Converting the training code to Cilk should not be difficult once we
have converted it to C; we also don’t have to worry about
distributing data for the global processes list
because the Cilk code will be run on a shared-memory multiprocessor machine
(and the Cilk runtime system will make the designated global variables
accessible to all nodes).
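The manual distribution planned for the MPI version can be sketched as below: the main node (rank 0) reads the time-series data and broadcasts it, after which every node builds its own processes list from its copy and works only on the candidates assigned to its rank. The data-reading and work-splitting here are placeholders, not our real parser or scoring code, and this sketch must be launched under an MPI runtime (e.g. mpirun).

```c
/* Minimal sketch of the planned MPI data distribution. */
#include <mpi.h>
#include <stdio.h>

#define MAX_EVENTS 1024

int main(int argc, char** argv) {
    int rank, nprocs;
    int events[MAX_EVENTS];
    int n_events = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    if (rank == 0) {
        /* Placeholder for reading and parsing the time-series data file. */
        for (n_events = 0; n_events < 16; ++n_events)
            events[n_events] = n_events % 4;
    }

    /* Main node distributes the raw data to every node. */
    MPI_Bcast(&n_events, 1, MPI_INT, 0, MPI_COMM_WORLD);
    MPI_Bcast(events, n_events, MPI_INT, 0, MPI_COMM_WORLD);

    /* Each node builds its own view of the data and handles only the
     * candidates assigned to its rank (round-robin split shown here). */
    int local_work = 0;
    for (int i = rank; i < n_events; i += nprocs)
        local_work += events[i];

    printf("rank %d processed %d units of work\n", rank, local_work);
    MPI_Finalize();
    return 0;
}
```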
After we have implemented both MPI and Cilk versions of the training code, we will run them on the input data files and compare their performance with each other and the original (serial) versions written in Java and C++.