6.338 Project Proposal:
A Resource Metering System (RMS) for Grid Services
Siddhartha Sen <sidsen@mit.edu>
Background
Grid computing has emerged as a new field in distributed computing that
addresses the so-called "Grid problem", which is defined as flexible, secure,
coordinated resource
sharing among dynamic collections of individuals, institutions, and resources
[1]. Foster et al. discuss the motivation behind Grid technologies and present
an open Grid architecture in [1], which they formalize in [2] as the Open Grid
Services Architecture (OGSA). The OGSA defines uniform exposed service
semantics in the form of the Grid service, which is essentially a Web service that
conforms to a set of (WSDL) interfaces and behaviors that define how clients
interact with it.
Computational grid environments are generally discussed at the granularity of virtual
organizations (VOs), or dynamic ensembles of resources, services, and
people [2]. VOs are largely defined by the Grid services they operate and
share. These services, in turn, tend to be highly decentralized and distributed
over a variety of software, hardware, and human resources. The nature of this
distribution places a burden on application developers to deliver qualities of
service (QoS) across a collection of resources with heterogeneous and dynamic
characteristics. Measuring QoS, whether in terms of resource management
or otherwise, is impossible without accurate resource metrics.
Motivation
Because QoS cannot be measured without accurate resource metrics, it is
important to specify a mechanism for collecting resource metering information,
or metrics, for tasks being performed by Grid services. A task here is
analogous to a unit of work, as defined in the DMTF’s Common Information Model
(CIM) Metrics white paper [4], and is the response of a
Grid service to a request issued by a client. There is also a need to specify a
means for publishing or exposing resource metrics to other Grid services within
the same VO (or in other VOs, as the case may be). The availability of these
metrics could benefit both service providers and clients. For
example, a given VO may have one or more management services that perform
internal accounting of the resources consumed (CPU time, memory, etc.) for each
unit of work performed by the other (Grid) services. This data can then be used
to generate performance reports for each service and determine whether any
resources need to be reallocated to achieve the desired QoS. The VO may also
have other management services that use the same resource metrics to provide
useful feedback to clients, such as informing a client of the amount of
computing time a particular request took and generating a corresponding billing
report.
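The accounting and billing scenario above can be made concrete with a small sketch. It is only an illustration under stated assumptions: the record layout (UnitOfWorkMetrics), the function names, the sample services and clients, and the flat per-CPU-second billing rate are all hypothetical and are not taken from OGSA or the CIM Metrics Model.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class UnitOfWorkMetrics:
    """Resources consumed by one unit of work (hypothetical record layout)."""
    service: str           # Grid service that handled the request
    client: str            # client that issued the request
    cpu_seconds: float     # CPU time consumed
    peak_memory_mb: float  # peak memory used


def performance_report(records):
    """Per-service totals that a management service might use for accounting."""
    report = defaultdict(
        lambda: {"units": 0, "cpu_seconds": 0.0, "peak_memory_mb": 0.0}
    )
    for r in records:
        entry = report[r.service]
        entry["units"] += 1
        entry["cpu_seconds"] += r.cpu_seconds
        entry["peak_memory_mb"] = max(entry["peak_memory_mb"], r.peak_memory_mb)
    return dict(report)


def billing_report(records, rate_per_cpu_second):
    """Charge per client, assuming a flat rate per CPU second."""
    charges = defaultdict(float)
    for r in records:
        charges[r.client] += r.cpu_seconds * rate_per_cpu_second
    return dict(charges)


# Made-up sample data, for illustration only.
records = [
    UnitOfWorkMetrics("render", "alice", 2.0, 128.0),
    UnitOfWorkMetrics("render", "bob", 4.0, 256.0),
    UnitOfWorkMetrics("index", "alice", 1.0, 64.0),
]
perf = performance_report(records)
bill = billing_report(records, rate_per_cpu_second=0.5)
```

Here the same list of records feeds both the provider-facing report (perf) and the client-facing report (bill), which mirrors the idea that one set of metrics can serve both audiences.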
Proposal
Many standards and specifications exist or have been proposed for addressing
different aspects of the resource metering problem, such as APIs that
applications can use to report transaction response times and data models for
representing managed resources.
This project will attempt to unify the different standards and technologies by
presenting a complete Resource Metering System (RMS) for Grid services
operating in a VO. The RMS provides a mechanism for collecting, publishing, and
analyzing resource metrics for each unit of work performed by a Grid service.
The value in designing a top-down solution like this one is threefold: 1) we
can see how the different standards for resource management can be used to
solve a real-world problem in Grid computing environments; 2) we can identify
problems or inadequacies in the standards that are exposed when they are
combined; and 3) we can use the design as the basis for a real
implementation that can be tested in existing VOs.
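As a rough illustration of the collect-and-publish part of such a system, the sketch below meters a single unit of work and hands the result to subscribers. It is a minimal in-process stand-in, not the proposed design: MetricsPublisher, metered, and the "example-service" name are hypothetical, and a real RMS would publish through Grid service interfaces rather than Python callbacks.

```python
import time
from contextlib import contextmanager


class MetricsPublisher:
    """Hypothetical in-process stand-in for a metrics publishing interface."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, callback):
        # Register a consumer, e.g. an accounting or billing service.
        self._subscribers.append(callback)

    def publish(self, metrics):
        # Deliver one unit-of-work record to every subscriber.
        for callback in self._subscribers:
            callback(metrics)


@contextmanager
def metered(service, publisher):
    """Measure one unit of work and publish its metrics when it finishes."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    try:
        yield
    finally:
        publisher.publish({
            "service": service,
            "wall_seconds": time.perf_counter() - wall_start,
            "cpu_seconds": time.process_time() - cpu_start,
        })


publisher = MetricsPublisher()
collected = []
publisher.subscribe(collected.append)

# The body of the with-block stands in for a Grid service handling a request.
with metered("example-service", publisher):
    sum(i * i for i in range(100_000))
```

Keeping measurement (metered) separate from delivery (MetricsPublisher) means that analysis services can be added as subscribers without changing the code that does the metering.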
References
[1] Foster,
[2] Foster,
[3] Definition of Architecture, Technical Plan and Evaluation Criteria for
Scheduling, Resource Management, Security and Job Description. DataGrid WP1:
Workload Management. Available online:
http://server11.infn.it/workload-grid/docs/wp1-pm9.pdf
[4] Common Information Model (CIM) Metrics Model, Version 2.6. Distributed
Management Task Force, Inc. (DMTF), 2002. Available online:
http://www.dmtf.org/standards/documents/CIM/DSP0141.pdf