Homework 1: Parallel Matrix Power (via Prefix)

Due Date: 3/15 by 11:59PM

Notes:
All files needed are available here.

The Problem: You are given an nxn matrix A and asked to compute all its powers (up to l). That is, you want A, A^2, ..., A^l. You are to implement parallel solutions to this problem in MPI and in OpenMP and measure their performance. Check the on-line notes and readings for more information on parallel prefix if you need a review.

Question 1 - Basic Implementation

Fill in PrefixMPI.c and PrefixOpenMP.c such that they function correctly on the 5x5 identity matrix and the 5x5 swap matrix. Verify this by running them to compute the 100th powers of each.

NOTE: If your last name begins with a letter from A-M, do the MPI version first; otherwise, do the OpenMP version first. Also be sure to run the script mentioned above if you are participating in the supercomputing productivity survey.

The boilerplate code assumes that an input file (specifying the size and contents of the matrix whose powers are being computed) and an integer specifying the highest power are provided on the command line. The input file format is simply a number indicating the number of dimensions of the matrix followed by the matrix contents. The boilerplate code produces an output log (in e.g. mpi-dimension-power-num-processors.out, so running the MPI code on a 4x4 matrix to the 8th power on 2 processors would yield mpi-4-8-2.out) containing the elapsed time for the guts of the computation and the highest power. (Note: Although your code is only outputting the highest power, it will be very clear to us if the intermediate powers are not computed when we examine your source.)

We have provided code to allocate, read and print matrices, represented as double arrays. We have also provided preprocessor macros to get and set their contents. You are free to use or ignore these. You are free to use the naive O(n^3) algorithm for doing serial matrix multiplication, or come up with something more clever instead.

Question 2 - Performance Measurements

Construct test matrices (e.g. via randn and save from matlab) of sizes n=50, n=100 and n=500. Name these test-50.mat, test-100.mat and test-500.mat respectively. Also produce their 10th powers in files test-n-10.mat via a trusted program.

For each n, produce clear plots (you may decide how to present the results) showing how performance of your OpenMP and MPI implementations scales as a function of power and number of processors. Note that the more the power dwarfs the number of processors, the more the initial O(n/p) term will dominate the running time. Clearly explain any salient performance trends you see.

Note that your OpenMP tests will probably be very quick, since you can only really choose 2 or 4 processes. Be sure to exercise the full capacity of the cluster for your MPI tests, however.