
6.338 Project Proposal:

A Resource Metering System (RMS) for Grid Services

Siddhartha Sen <sidsen@mit.edu>

Background

Grid computing has emerged as a field of distributed computing that addresses the so-called "Grid problem", defined as flexible, secure, coordinated resource sharing among dynamic collections of individuals, institutions, and resources [1]. Foster et al. discuss the motivation behind Grid technologies and present an open Grid architecture in [1], which they formalize in [2] as the Open Grid Services Architecture (OGSA). The OGSA defines a uniform exposed service semantics, the Grid service, which is essentially a Web service that conforms to a set of WSDL interfaces and behaviors defining how clients interact with it.

Computational Grid environments are generally discussed at the granularity of virtual organizations (VOs), which are dynamic ensembles of resources, services, and people [2]. VOs are largely defined by the Grid services they operate and share. These services, in turn, tend to be highly decentralized and distributed over a variety of software, hardware, and human resources. This distribution places a burden on application developers, who must deliver quality of service (QoS) across a collection of resources with heterogeneous and dynamic characteristics. Measuring QoS, whether in terms of resource management or otherwise, is impossible without accurate resource metrics.

Motivation

For this reason, it is important to specify a mechanism for collecting resource metering information, or metrics, for tasks performed by Grid services. A task here is the response of a Grid service to a request issued by a client, and is analogous to a unit of work as defined in the DMTF's Common Information Model (CIM) Metrics white paper [4]. There is also a need to specify a means of publishing, or exposing, resource metrics to other Grid services within the same VO (or in other VOs, as the case may be). The availability of these metrics could benefit both service providers and clients. For example, a given VO may have one or more management services that perform internal accounting of the resources consumed (CPU time, memory, etc.) for each unit of work performed by its other Grid services. This data can then be used to generate performance reports for each service and to determine whether any resources need to be reallocated to achieve the desired QoS. The VO may also have other management services that use the same resource metrics to provide useful feedback to clients, for example by informing a client of the amount of computing time a particular request took and generating a corresponding billing report.
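As an illustration of the per-task accounting described above, the following is a minimal Python sketch under stated assumptions, not a proposed design. The names `TaskMetrics`, `meter`, and `cpu_by_client` are hypothetical and are not taken from OGSA, the CIM Metrics Model, or any Grid toolkit. CPU time is approximated with the process-wide counter `time.process_time()`, which would over-count if the service handled several requests concurrently.

```python
import time
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskMetrics:
    """Resources consumed by one unit of work (hypothetical record layout)."""
    service: str         # name of the Grid service that handled the request
    client: str          # identity of the requesting client
    cpu_seconds: float   # CPU time consumed while handling the request
    wall_seconds: float  # elapsed wall-clock time for the request


def meter(service, client, task, *args):
    """Run `task(*args)` and return (result, TaskMetrics) for this unit of work.

    Assumption: the whole process is dedicated to this one task while it runs,
    so the process-wide CPU counter is a fair estimate of the task's CPU use.
    """
    cpu_start = time.process_time()
    wall_start = time.perf_counter()
    result = task(*args)
    metrics = TaskMetrics(
        service=service,
        client=client,
        cpu_seconds=time.process_time() - cpu_start,
        wall_seconds=time.perf_counter() - wall_start,
    )
    return result, metrics


def cpu_by_client(records):
    """Sum CPU seconds per client, e.g. as the input to a billing report."""
    totals = defaultdict(float)
    for record in records:
        totals[record.client] += record.cpu_seconds
    return dict(totals)
```

A management service could collect one such record per request and pass the accumulated list to `cpu_by_client` when preparing client feedback. The same records could equally be grouped by `service` to produce per-service performance reports.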

Proposal

Currently, many standards and specifications exist or have been proposed to address different aspects of the resource metering problem, such as APIs that applications can use to report transaction response times, or data models for representing managed resources. This project will attempt to unify these standards and technologies by presenting a complete Resource Metering System (RMS) for Grid services operating in a VO. The RMS provides a mechanism for collecting, publishing, and analyzing resource metrics for each unit of work performed by a Grid service. The value of designing a top-down solution like this one is threefold: 1) we can see how the different standards for resource management can be used to solve a real-world problem in Grid computing environments; 2) we can identify problems or inadequacies in the standards that are exposed when they are combined; and 3) we can use the design as the basis for a real implementation that can be tested in existing VOs.

References

[1] Foster, I., Kesselman, C. and Tuecke, S. The Anatomy of the Grid: Enabling Scalable Virtual Organizations. International Journal of High Performance Computing Applications, 15 (3). 200-222. 2001. Available online: http://www.globus.org/research/papers/anatomy.pdf.

[2] Foster, I., Kesselman, C., Nick, J. and Tuecke, S. The Physiology of the Grid: An Open Grid Services Architecture for Distributed Systems Integration. Globus Project, 2002. Available online: http://www.globus.org/research/papers/ogsa.pdf.

[3] Definition of Architecture, Technical Plan and Evaluation Criteria for Scheduling, Resource Management, Security and Job Description. DataGrid WP1: Workload Management. Available online: http://server11.infn.it/workload-grid/docs/wp1-pm9.pdf.

[4] Common Information Model (CIM) Metrics Model, Version 2.6. Distributed Management Task Force, Inc. (DMTF), 2002. Available online: http://www.dmtf.org/standards/documents/CIM/DSP0141.pdf.