6.338 Final Project Writeup

This project's goal was to allow Beowulf clusters at LCS and Course 12 to establish a connection with each other over the data grid and run MPI applications across the two.

Background

Globus (http://www.globus.org/) is a project to enable the development of computational grids. Its main focus is security and the ability to authorize specific services between clusters. Authorized clusters can run applications across other clusters on the grid. An alternate model, called the Data Grid, allows users to run their own computations locally on data that is distributed across clusters on the data grid.

Building/Installing

Globus comes with many different components whose purpose is to manage processing resources and data across clusters. Also needed is the source code for the development kit so that we can compile and install applications that run on top of Globus.

Requesting Certificates

A "web of trust" is created using the Globus organization as a certificate authority. This includes both certificates for clusters and certificates for individual users.

Running applications across clusters

In most cases there is no shared filesystem between clusters, so Globus relies on Resource Specification Language (RSL) scripts to execute commands across clusters. At the most basic level, RSL scripts let one execute commands on a remote machine, much like rsh, but with support for Globus authentication.

The Problems

MPICH-G2 operates and compiles with the aid of the Globus SDK source code. In most models of the way we use MPI, we imagine a scenario in which the root node distributes the data via MPI messages to the child nodes, receives back the results, and either completes the computation or redistributes the intermediate results for further refinement by the child nodes. This model is consistent with the way we have structured the Beowulf cluster itself.
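The root-distributes / children-compute / root-gathers pattern described above can be sketched as follows. This is plain Python standing in for MPI calls (scatter/gather here only mimic the roles of MPI_Scatter and MPI_Gather); all function names are hypothetical and purely illustrative of the message flow, not of the MPICH-G2 API.

```python
# Illustrative sketch of the master/worker model: the root splits
# the data, each child computes a partial result on its chunk, and
# the root gathers and combines the results. Hypothetical names.

def scatter(data, n_workers):
    """Split the root's data into one chunk per child node."""
    chunk = (len(data) + n_workers - 1) // n_workers
    return [data[i * chunk:(i + 1) * chunk] for i in range(n_workers)]

def worker(chunk):
    """Each child node computes a partial result on its chunk."""
    return sum(x * x for x in chunk)

def root(data, n_workers=4):
    """Root distributes data, gathers partials, finishes the reduction."""
    partials = [worker(c) for c in scatter(data, n_workers)]
    return sum(partials)  # the "gather" plus final combination step

print(root(list(range(10))))  # sum of squares of 0..9
```

In real MPI the worker calls run on separate nodes and the scatter/gather steps are messages between ranks, which is exactly why every node must be directly reachable, as discussed below.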
There is a frontend which we log into, and "children" behind the frontend with internal IP addresses. MPI, despite many poor programming practices (of which I myself am a prime offender), is not, at its base, a hierarchical message-passing system. In fact, it depends on passing messages directly between all nodes on a level playing field. Thus, a node in one Beowulf cluster must be able to pass MPI messages directly to any node in any other cluster. Neither Globus nor MPICH-G2 has reached the level of maturity necessary to support clusters that use IP masquerading; that would require messages to "tunnel" through the cluster frontends to the internal nodes. It is possible, by tunneling through individual ports, to reach nodes hidden behind a firewall, but this depends on each node within the cluster having its own unique IP address.

While my dream is to see an MIT computational grid which students can "plug into" and exploit the resources of, doing so will create an unforeseen side effect. As it becomes more popular to hook up older machines into Beowulf clusters, making those clusters Grid-accessible means every node will require its own IP address, increasing the rate of consumption of scarce IP addresses. Either the MPICH-G2 project or Globus must solve this problem via proxies, or creation of the Grid -- whose goal is to create an environment in which we can relieve and distribute the load on our computational resources -- will result in the overburdening and heavier taxation of our IP resources.

-Dean