The Problem: You are given an nxn matrix A and asked to
compute all its powers (up to l). That is, you want A, A^2, ...,
A^l. You are to implement parallel solutions to this problem in MPI
and in OpenMP and measure their performance. Check the on-line notes and readings for more information on parallel prefix if you need a review.
NOTE: If your last name begins with a letter from A-M, do the MPI version first; otherwise, do the OpenMP version first. Also be sure to run the script mentioned above if you are participating in the supercomputing productivity survey.
The boilerplate code assumes that an input file (specifying the size and contents of the matrix whose powers are being computed) and an integer specifying the highest power are provided on the command line. The input file format is simply a number indicating the number of dimensions of the matrix followed by the matrix contents. The boilerplate code produces an output log (in e.g. mpi-dimension-power-num-processors.out, so running the MPI code on a 4x4 matrix to the 8th power on 2 processors would yield mpi-4-8-2.out) containing the elapsed time for the guts of the computation and the highest power. (Note: Although your code is only outputting the highest power, it will be very clear to us if the intermediate powers are not computed when we examine your source.)
We have provided code to allocate, read and print matrices,
represented as double arrays. We have also provided preprocessor
macros to get and set their contents. You are free to use or ignore
these. You are free to use the naive O(n^3) algorithm for doing serial matrix multiplication, or come up with something more clever instead.
For each n, produce clear plots (you may decide how to present the results) showing how performance of your OpenMP and MPI implementations scales as a function of power and number of processors. Note that the more the power dwarfs the number of processors, the more the initial O(n/p) term will dominate the running time. Clearly explain any salient performance trends you see.
Note that your OpenMP tests will probably be very quick, since you can only really choose 2 or 4 processes. Be sure to exercise the full capacity of the cluster for your MPI tests, however.