
Showing papers by Carl Kesselman published in 2002


01 Jan 2002
TL;DR: This presentation complements an earlier foundational article, “The Anatomy of the Grid,” by describing how Grid mechanisms can implement a service-oriented architecture, explaining how Grid functionality can be incorporated into a Web services framework, and illustrating how the architecture can be applied within commercial computing as a basis for distributed system integration.
Abstract: In both e-business and e-science, we often need to integrate services across distributed, heterogeneous, dynamic “virtual organizations” formed from the disparate resources within a single enterprise and/or from external resource sharing and service provider relationships. This integration can be technically challenging because of the need to achieve various qualities of service when running on top of different native platforms. We present an Open Grid Services Architecture that addresses these challenges. Building on concepts and technologies from the Grid and Web services communities, this architecture defines a uniform exposed service semantics (the Grid service); defines standard mechanisms for creating, naming, and discovering transient Grid service instances; provides location transparency and multiple protocol bindings for service instances; and supports integration with underlying native platform facilities. The Open Grid Services Architecture also defines, in terms of Web Services Description Language (WSDL) interfaces and associated conventions, mechanisms required for creating and composing sophisticated distributed systems, including lifetime management, change management, and notification. Service bindings can support reliable invocation, authentication, authorization, and delegation, if required. Our presentation complements an earlier foundational article, “The Anatomy of the Grid,” by describing how Grid mechanisms can implement a service-oriented architecture, explaining how Grid functionality can be incorporated into a Web services framework, and illustrating how our architecture can be applied within commercial computing as a basis for distributed system integration—within and across organizational domains. This is a DRAFT document and continues to be revised. The latest version can be found at http://www.globus.org/research/papers/ogsa.pdf. 
Please send comments to foster@mcs.anl.gov, carl@isi.edu, jnick@us.ibm.com, tuecke@mcs.anl.gov.

3,455 citations


Journal Article
TL;DR: In this paper, the authors focus on the nature of the services that respond to protocol messages and propose a set of services that can be aggregated in various ways to meet the needs of virtual organizations, which themselves can be defined by the services they operate and share.
Abstract: Increasingly, computing addresses collaboration, data sharing, and interaction modes that involve distributed resources, resulting in an increased focus on the interconnection of systems both within and across enterprises. These evolutionary pressures have led to the development of Grid technologies. The authors' work focuses on the nature of the services that respond to protocol messages. Grid provides an extensible set of services that can be aggregated in various ways to meet the needs of virtual organizations, which themselves can be defined in part by the services they operate and share.

1,816 citations


Proceedings Article
05 Jun 2002
TL;DR: This approach allows resource providers to delegate some of the authority for maintaining fine-grained access control policies to communities, while still maintaining ultimate control over their resources.
Abstract: In "grids" and "collaboratories", we find distributed communities of resource providers and resource consumers, within which often complex and dynamic policies govern who can use which resources for which purpose. We propose a new approach to the representation, maintenance and enforcement of such policies that provides a scalable mechanism for specifying and enforcing these policies. Our approach allows resource providers to delegate some of the authority for maintaining fine-grained access control policies to communities, while still maintaining ultimate control over their resources. We also describe a prototype implementation of this approach and an application in a data management context.
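As a rough illustration of the delegation idea above, the sketch below allows a request only when both the resource provider's grant to the community and the community's grant to the individual user permit it, so a community can refine, but never exceed, what the provider allowed. The rights representation, function names, users, and paths are hypothetical; the prototype described in the paper and its policy language may work differently.

```python
from typing import Dict, Set, Tuple

# Hypothetical representation: a right is an (action, resource) pair.
Right = Tuple[str, str]


def effective_rights(provider_grants: Set[Right],
                     community_grants: Dict[str, Set[Right]],
                     user: str) -> Set[Right]:
    """Rights a user can exercise: the community can only pass on rights
    that the resource provider granted to the community as a whole."""
    return provider_grants & community_grants.get(user, set())


def is_authorized(provider_grants: Set[Right],
                  community_grants: Dict[str, Set[Right]],
                  user: str, action: str, resource: str) -> bool:
    return (action, resource) in effective_rights(provider_grants,
                                                  community_grants, user)


# Coarse-grained policy set by the resource provider for the whole community.
provider = {("read", "/data/run1"), ("read", "/data/run2"),
            ("write", "/data/run2")}
# Fine-grained policy maintained by the community for its members.
community = {
    "alice": {("read", "/data/run1"), ("write", "/data/run1")},
    "bob": {("write", "/data/run2")},
}

print(is_authorized(provider, community, "alice", "write", "/data/run1"))  # False
```

Alice's community-level permission to write /data/run1 has no effect because the provider never delegated that right, which is how the provider retains ultimate control in this model.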

665 citations


Journal Article
01 May 2002
TL;DR: Describes two Data Grid services: GridFTP, a high-speed transport service that extends the popular FTP protocol with new features required for Data Grid applications, such as striping and partial file access; and a replica management service that integrates a replica catalog with GridFTP transfers to provide for the creation, registration, location, and management of dataset replicas.
Abstract: An emerging class of data-intensive applications involves the geographically dispersed extraction of complex scientific information from very large collections of measured or computed data. Such applications arise, for example, in experimental physics, where the data in question is generated by accelerators, and in simulation science, where the data is generated by supercomputers. So-called Data Grids provide essential infrastructure for such applications, much as the Internet provides essential services for applications such as e-mail and the Web. We describe here two services that we believe are fundamental to any Data Grid: reliable, high-speed transport and replica management. Our high-speed transport service, GridFTP, extends the popular FTP protocol with new features required for Data Grid applications, such as striping and partial file access. Our replica management service integrates a replica catalog with GridFTP transfers to provide for the creation, registration, location, and management of dataset replicas. We present the design of both services and also preliminary performance results. Our implementations exploit security and other services provided by the Globus Toolkit.
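To make "striping" and "partial file access" concrete, the sketch below splits a file into contiguous byte ranges, one per stripe, and reads a single range with a seek. The even split and the function names are assumptions for illustration only; GridFTP's actual striped data layout and its protocol commands are not shown here.

```python
from typing import List, Tuple


def stripe_ranges(file_size: int, num_stripes: int) -> List[Tuple[int, int]]:
    """Split bytes [0, file_size) into num_stripes contiguous (offset, length)
    ranges whose lengths differ by at most one byte."""
    if num_stripes <= 0:
        raise ValueError("num_stripes must be positive")
    base, extra = divmod(file_size, num_stripes)
    ranges, offset = [], 0
    for i in range(num_stripes):
        length = base + (1 if i < extra else 0)  # spread the remainder
        ranges.append((offset, length))
        offset += length
    return ranges


def read_partial(path: str, offset: int, length: int) -> bytes:
    """Partial file access: read only bytes [offset, offset + length)."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)


# Each (offset, length) pair could be assigned to a different server or stream.
print(stripe_ranges(10, 3))  # [(0, 4), (4, 3), (7, 3)]
```

Because each range is independent, the same `read_partial` call serves both cases: a striped transfer issues one call per range in parallel, while a client that needs only part of a large dataset issues a single call.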

633 citations


Proceedings Article
16 Nov 2002
TL;DR: Describes a parameterized architectural framework, named Giggle (for GIGa-scale Global Location Engine), within which a wide range of replica location services (RLSs) can be defined, and presents initial performance results for an RLS prototype, demonstrating that RLS systems can be constructed that meet performance goals.
Abstract: In wide area computing systems, it is often desirable to create remote read-only copies (replicas) of files. Replication can be used to reduce access latency, improve data locality, and/or increase robustness, scalability and performance for distributed applications. We define a replica location service (RLS) as a system that maintains and provides access to information about the physical locations of copies. An RLS typically functions as one component of a data grid architecture. This paper makes the following contributions. First, we characterize RLS requirements. Next, we describe a parameterized architectural framework, which we name Giggle (for GIGa-scale Global Location Engine), within which a wide range of RLSs can be defined. We define several concrete instantiations of this framework with different performance characteristics. Finally, we present initial performance results for an RLS prototype, demonstrating that RLS systems can be constructed that meet performance goals.
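In the Giggle framework, Local Replica Catalogs (LRCs) map logical file names to physical locations at individual sites, and Replica Location Indexes (RLIs) record which LRCs hold mappings for a given logical name. The in-memory sketch below shows the resulting two-step lookup. Class names, the update method, and the example URLs are hypothetical, and the RLI here keeps direct references to the catalogs for brevity, whereas a real client would contact each LRC over the network.

```python
from collections import defaultdict
from typing import Dict, Set


class LocalReplicaCatalog:
    """Per-site catalog: logical file name -> physical file names (PFNs)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.mappings: Dict[str, Set[str]] = defaultdict(set)

    def add(self, lfn: str, pfn: str) -> None:
        self.mappings[lfn].add(pfn)

    def lookup(self, lfn: str) -> Set[str]:
        return set(self.mappings.get(lfn, set()))


class ReplicaLocationIndex:
    """Index: logical file name -> names of LRCs holding mappings for it."""

    def __init__(self) -> None:
        self.index: Dict[str, Set[str]] = defaultdict(set)
        self.catalogs: Dict[str, LocalReplicaCatalog] = {}

    def update_from(self, lrc: LocalReplicaCatalog) -> None:
        # Giggle propagates this information via periodic soft-state updates;
        # this sketch simply copies the LRC's current set of logical names.
        self.catalogs[lrc.name] = lrc
        for lfn in lrc.mappings:
            self.index[lfn].add(lrc.name)

    def locate(self, lfn: str) -> Set[str]:
        """Step 1: ask the index which LRCs know lfn. Step 2: ask each LRC."""
        pfns: Set[str] = set()
        for name in self.index.get(lfn, set()):
            pfns |= self.catalogs[name].lookup(lfn)
        return pfns


lrc_a = LocalReplicaCatalog("siteA")
lrc_a.add("lfn://run1", "gsiftp://a.example.org/run1")
lrc_b = LocalReplicaCatalog("siteB")
lrc_b.add("lfn://run1", "gsiftp://b.example.org/run1")
lrc_b.add("lfn://run2", "gsiftp://b.example.org/run2")
rli = ReplicaLocationIndex()
rli.update_from(lrc_a)
rli.update_from(lrc_b)
print(sorted(rli.locate("lfn://run1")))
# ['gsiftp://a.example.org/run1', 'gsiftp://b.example.org/run1']
```

Separating the index from the site catalogs is what lets the framework scale: sites update their own LRCs locally, and only the much smaller "which LRC knows this name" information has to be shared.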

440 citations


Book Chapter
24 Jul 2002
TL;DR: A resource management model is defined that distinguishes three kinds of resource-independent service level agreements (SLAs), formalizing agreements to deliver capability, perform activities, and bind activities to capabilities, respectively.
Abstract: A fundamental problem in distributed computing is to map activities such as computation or data transfer onto resources that meet requirements for performance, cost, security, or other quality of service metrics. The creation of such mappings requires negotiation between applications and resources to discover, reserve, acquire, configure, and monitor resources. Current resource management approaches tend to specialize for specific resource classes, and address coordination across resources only in a limited fashion. We present a new approach that overcomes these difficulties. We define a resource management model that distinguishes three kinds of resource-independent service level agreements (SLAs), formalizing agreements to deliver capability, perform activities, and bind activities to capabilities, respectively. We also define a Service Negotiation and Acquisition Protocol (SNAP) that supports reliable management of remote SLAs. Finally, we explain how SNAP can be deployed within the context of the Globus Toolkit.
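The three agreement kinds can be pictured as separate records, which the SNAP work refers to as resource (RSLA), task (TSLA), and binding (BSLA) agreements. In the sketch below, a binding agreement only links a task agreement to a resource agreement whose promised capability covers the task's needs. Reducing "capability" to a CPU count, and the field and function names, are simplifying assumptions; SNAP's negotiation messages are not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RSLA:
    """Resource SLA: a promise to deliver a capability (here, a CPU count)."""
    sla_id: str
    cpus: int


@dataclass(frozen=True)
class TSLA:
    """Task SLA: a promise to perform an activity with stated requirements."""
    sla_id: str
    task: str
    cpus_needed: int


@dataclass(frozen=True)
class BSLA:
    """Binding SLA: associates the task of a TSLA with the capability of an RSLA."""
    sla_id: str
    task_sla: str
    resource_sla: str


def bind(bsla_id: str, tsla: TSLA, rsla: RSLA) -> BSLA:
    # Refuse a binding that the promised capability cannot satisfy.
    if tsla.cpus_needed > rsla.cpus:
        raise ValueError("promised capability is insufficient for the task")
    return BSLA(bsla_id, tsla.sla_id, rsla.sla_id)


reservation = RSLA("r1", cpus=64)
job = TSLA("t1", task="simulate", cpus_needed=32)
print(bind("b1", job, reservation))
# BSLA(sla_id='b1', task_sla='t1', resource_sla='r1')
```

Keeping the three agreements separate means a capability can be reserved before the task is known, and the binding can be created (or renegotiated) later without touching either of the other agreements.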

426 citations


01 Jan 2002
TL;DR: Provides the technical details of OGSA: a full specification of the behaviors and Web Services Description Language (WSDL) interfaces that define a Grid service.
Abstract: Building on both Grid and Web services technologies, the Open Grid Services Architecture (OGSA) defines mechanisms for creating, managing, and exchanging information among entities called Grid services. Succinctly, a Grid service is a Web service that conforms to a set of conventions (interfaces and behaviors) that define how a client interacts with a Grid service. These conventions, and other OGSA mechanisms associated with Grid service creation and discovery, provide for the controlled, fault-resilient, and secure management of the distributed and often long-lived state that is commonly required in advanced distributed applications. In a separate document, we have presented in detail the motivation, requirements, structure, and applications that underlie OGSA. Here we focus on technical details, providing a full specification of the behaviors and Web Services Description Language (WSDL) interfaces that define a Grid service.
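A common way to bound the lifetime of such state, and one of the mechanisms OGSA lists, is soft-state lifetime management: each transient service instance carries a termination time that interested clients must keep extending, and instances that are no longer kept alive are reclaimed. The sketch below illustrates the idea with an injectable clock so it can be run deterministically. The class, method names, and handles are hypothetical and do not reproduce the WSDL operations defined in the specification.

```python
from typing import Callable, Dict, List


class InstanceHost:
    """Hypothetical host for transient service instances with soft-state lifetimes."""

    def __init__(self, clock: Callable[[], float]) -> None:
        self.clock = clock
        self.termination: Dict[str, float] = {}  # handle -> termination time

    def create(self, handle: str, lifetime: float) -> None:
        self.termination[handle] = self.clock() + lifetime

    def extend(self, handle: str, lifetime: float) -> None:
        # Keepalive: interested clients periodically push the termination time forward.
        self.termination[handle] = max(self.termination[handle],
                                       self.clock() + lifetime)

    def destroy(self, handle: str) -> None:
        # Explicit destruction before the termination time is reached.
        self.termination.pop(handle, None)

    def sweep(self) -> List[str]:
        """Reclaim every instance whose termination time has passed."""
        now = self.clock()
        expired = [h for h, t in self.termination.items() if t <= now]
        for h in expired:
            del self.termination[h]
        return expired


now = [0.0]                      # fake clock for a deterministic example
host = InstanceHost(lambda: now[0])
host.create("job-1", 30)
host.create("job-2", 30)
now[0] = 20
host.extend("job-1", 30)         # job-1 now terminates at t=50
now[0] = 31
print(host.sweep())              # ['job-2']
```

The fault-resilience argument is that no client has to remember to clean up: if a client crashes, it simply stops extending the lifetime, and the instance is eventually reclaimed.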

221 citations


Proceedings Article
01 Jan 2002
TL;DR: The design of a Metadata Catalog Service (MCS) is presented that provides a mechanism for storing and accessing descriptive metadata and allows users to query for data items based on desired attributes and a scalability study of the service is presented.
Abstract: Advances in computational, storage and network technologies as well as middleware such as the Globus Toolkit allow scientists to expand the sophistication and scope of data-intensive applications. These applications produce and analyze terabytes and petabytes of data that are distributed in millions of files or objects. To manage these large data sets efficiently, metadata or descriptive information about the data needs to be managed. There are various types of metadata, and it is likely that a range of metadata services will exist in Grid environments that are specialized for particular types of metadata cataloguing and discovery. In this paper, we present the design of a Metadata Catalog Service (MCS) that provides a mechanism for storing and accessing descriptive metadata and allows users to query for data items based on desired attributes. We describe our experience in using the MCS with several applications and present a scalability study of the service.
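The core MCS behavior described above, storing descriptive attributes for data items and returning the items that match a set of desired attributes, can be sketched in memory as follows. This is an assumption-laden stand-in: the class and method names and the example attributes are hypothetical, and the real MCS is a networked service whose interface is not reproduced here.

```python
from typing import Any, Dict, List


class MetadataCatalog:
    """Minimal in-memory stand-in for a metadata catalog (hypothetical API)."""

    def __init__(self) -> None:
        self._items: Dict[str, Dict[str, Any]] = {}

    def add(self, logical_name: str, **attributes: Any) -> None:
        # Store or update descriptive attributes for a logical data item.
        self._items.setdefault(logical_name, {}).update(attributes)

    def get(self, logical_name: str) -> Dict[str, Any]:
        return dict(self._items[logical_name])

    def query(self, **constraints: Any) -> List[str]:
        """Logical names whose attributes match every constraint exactly."""
        return sorted(
            name for name, attrs in self._items.items()
            if all(k in attrs and attrs[k] == v for k, v in constraints.items())
        )


mcs = MetadataCatalog()
mcs.add("run1.dat", detector="H1", year=2002)
mcs.add("run2.dat", detector="L1", year=2002)
mcs.add("run3.dat", detector="H1", year=2001)
print(mcs.query(detector="H1", year=2002))  # ['run1.dat']
```

The query returns logical names rather than file locations; in a Data Grid those names would then be resolved to physical replicas by a separate replica location service.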

177 citations


Proceedings Article
24 Jul 2002
TL;DR: The initial design and prototype of a virtual data Grid for LIGO, which is being built to observe the gravitational waves predicted by general relativity, is described.
Abstract: Many physics experiments today generate large volumes of data. That data is then processed in a variety of ways to understand fundamental physical phenomena. The goal of the NSF-funded GriPhyN (Grid Physics Network) project is to enable scientists to seamlessly access data, whether it is raw experimental data or a data product resulting from further processing. GriPhyN provides a new degree of transparency in how data-handling and processing capabilities are integrated to deliver data products to end-users or applications, so that requests for such products are easily mapped into computation and/or data access at multiple locations. GriPhyN refers to the set of all data products available to the user as virtual data. Among the physics applications participating in the project is the Laser Interferometer Gravitational-wave Observatory (LIGO), which is being built to observe the gravitational waves predicted by general relativity. We describe our initial design and prototype of a virtual data Grid for LIGO.
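The essence of virtual data is that a request names a data product without saying whether it already exists: if it has been materialized, the request becomes a data access; otherwise the recorded derivation is run, recursively materializing its inputs first. The sketch below shows only that decision, on a single machine. The class, the recipe format, and the product names are hypothetical, and GriPhyN's mapping of work onto multiple locations is not modeled.

```python
from typing import Callable, Dict, List, Tuple


class VirtualDataCatalog:
    """Hypothetical catalog of materialized data and recipes for derived data."""

    def __init__(self) -> None:
        self.materialized: Dict[str, str] = {}  # product name -> location
        self.recipes: Dict[str, Tuple[Callable[..., str], List[str]]] = {}
        self.executed: List[str] = []  # derivations actually run, in order

    def add_raw(self, name: str, location: str) -> None:
        self.materialized[name] = location

    def add_recipe(self, name: str, transform: Callable[..., str],
                   inputs: List[str]) -> None:
        self.recipes[name] = (transform, inputs)

    def request(self, name: str) -> str:
        """Return a location for `name`, deriving it (recursively) if needed."""
        if name in self.materialized:
            return self.materialized[name]      # data access only
        transform, inputs = self.recipes[name]
        input_locations = [self.request(i) for i in inputs]
        location = transform(*input_locations)  # computation
        self.materialized[name] = location
        self.executed.append(name)
        return location


vdc = VirtualDataCatalog()
vdc.add_raw("strain", "/raw/strain")
vdc.add_recipe("filtered", lambda src: src + ".filtered", ["strain"])
vdc.add_recipe("spectrum", lambda src: src + ".psd", ["filtered"])
print(vdc.request("spectrum"))  # /raw/strain.filtered.psd
```

A second request for "spectrum" returns the stored location without re-running anything, which is the transparency the abstract describes: the user asks for a product, and the system decides between computation and data access.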

150 citations


Proceedings Article
29 Jan 2002
TL;DR: These components leverage the substantial body of “Grid” services and protocols developed within the Globus project and by its collaborators, and are being used in a number of data-intensive application projects.
Abstract: We describe work being performed in the Globus project to develop enabling protocols and services for distributed data-intensive science. These services include:
* High-performance, secure data transfer protocols based on FTP, plus a range of libraries and tools that use these protocols
* Replica catalog services supporting the creation and location of file replicas in distributed systems
These components leverage the substantial body of “Grid” services and protocols developed within the Globus project and by its collaborators, and are being used in a number of data-intensive application projects.

90 citations


01 Jan 2002
TL;DR: Grid computing has made great progress in the last few years, but it is becoming increasingly necessary to develop higher-level services that can automate the currently manual process of finding resources and scheduling computations and data movements, while providing an adequate level of performance and reliability.
Abstract: Grid computing has made great progress in the last few years. The basic mechanisms for accessing remote resources have been developed as part of the Globus Toolkit and are now widely deployed and used. Among these mechanisms are:
* Information services, which allow for the discovery and monitoring of resources. The information provided can be used to find the available resources and select those most appropriate for the task.
* Security services, which allow users and resources to mutually authenticate and allow resources to authorize users based on local policies.
* Resource management, which allows for the scheduling of jobs on particular resources.
* Data management services, which enable users and applications to manage large, distributed, and replicated data sets. Some of these services deal with locating particular data sets, others with efficiently moving large amounts of data across wide area networks.
With the above mechanisms, one can manually find out about resources and schedule the desired computations and data movements. However, this process is time-consuming and potentially complex. As a result, it is becoming increasingly necessary to develop higher-level services that can automate the process and provide an adequate level of performance and reliability.
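The manual process described above (consult the information service, pick a resource, arrange any data movement) is what a higher-level service would automate. The sketch below is a deliberately simple, hypothetical planner: it picks the least-loaded site with enough free CPUs and decides whether the job's input must first be copied from another site. The record fields, the load-based policy, and the replica map are assumptions, not a description of any Globus service.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class SiteInfo:
    """Hypothetical record as an information service might report it."""
    name: str
    free_cpus: int
    load: float


def plan(sites: List[SiteInfo], cpus_needed: int,
         replicas: Dict[str, Set[str]],
         lfn: str) -> Optional[Tuple[str, Optional[str]]]:
    """Choose where to run a job and where to fetch its input from.

    Returns (compute_site, source_site); source_site is None when the input
    already has a replica at the compute site. Returns None if no site has
    enough free CPUs or no replica of the input exists.
    """
    candidates = [s for s in sites if s.free_cpus >= cpus_needed]
    holders = replicas.get(lfn, set())
    if not candidates or not holders:
        return None
    # Resource selection: the least-loaded site that satisfies the requirement.
    site = min(candidates, key=lambda s: s.load)
    if site.name in holders:
        return site.name, None          # no data movement needed
    return site.name, min(holders)      # deterministic choice of source


sites = [SiteInfo("A", 8, 0.9), SiteInfo("B", 64, 0.5), SiteInfo("C", 128, 0.2)]
replicas = {"input.dat": {"A", "B"}}
print(plan(sites, 32, replicas, "input.dat"))  # ('C', 'A')
```

A realistic service would weigh transfer cost against load, handle failures, and retry; the point here is only that the selection and data-placement steps a user performs by hand can be expressed as a policy over information-service and replica data.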

01 Jan 2002
TL;DR: The Service Negotiation and Acquisition Protocol (SNAP) instantiates a generalized resource management model in which resource interactions are mapped onto a well-defined set of symmetric, resource-independent service level agreements.
Abstract: A fundamental problem with distributed applications is how to map activities such as computation or data transfer onto a set of resources that will meet the application’s requirements for performance, cost, security, or other quality of service metrics. An application or client must engage in a multi-phase negotiation process with resource managers, as it discovers, reserves, acquires, configures, monitors, and potentially renegotiates resource access. We present a generalized resource management model in which resource interactions are mapped onto a well-defined set of symmetric and resource-independent service level agreements. We instantiate this model in the Service Negotiation and Acquisition Protocol (SNAP), which provides integrated support for lifetime management and at-most-once creation semantics for SLAs. The result is a resource management framework for distributed systems that we believe is more powerful and general than current approaches. We explain how SNAP can be deployed within the context of the Globus Toolkit.
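At-most-once creation matters because a client whose request or reply is lost cannot tell whether the SLA was created, and will naturally retry. One standard way to obtain the guarantee, assumed here rather than taken from the SNAP specification, is for the client to attach a unique identifier to each creation request so the manager can recognize a retry. The class and field names below are hypothetical.

```python
import uuid
from typing import Any, Dict


class SLAManager:
    """Hypothetical SLA manager with at-most-once creation per request ID."""

    def __init__(self) -> None:
        self.slas: Dict[str, Dict[str, Any]] = {}
        self.created = 0  # how many SLAs were actually created

    def create(self, request_id: str, terms: Dict[str, Any]) -> Dict[str, Any]:
        existing = self.slas.get(request_id)
        if existing is not None:
            # A retry of a request already processed: do not create again.
            if existing["terms"] != terms:
                raise ValueError("request ID reused with different terms")
            return existing
        sla = {"id": request_id, "terms": dict(terms)}
        self.slas[request_id] = sla
        self.created += 1
        return sla


mgr = SLAManager()
rid = str(uuid.uuid4())  # chosen by the client before the first attempt
first = mgr.create(rid, {"cpus": 16})
retry = mgr.create(rid, {"cpus": 16})  # e.g. after the first reply was lost
print(first is retry, mgr.created)  # True 1
```

Because the identifier is fixed before the first attempt, retrying is always safe: at most one SLA exists per identifier, however many times the request is delivered.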

Book Chapter
09 Jun 2002
TL;DR: This talk introduces the Grid concept, illustrates it with application examples from a range of scientific disciplines, and explores potential roles for Semantic Web technologies in Grid services, identifying the most promising ones.
Abstract: Grids are an emerging computational infrastructure that enables resource sharing and coordinated problem solving across dynamic, distributed collaborations that have come to be known as virtual organizations. Unlike the Web, which primarily focuses on the sharing of information, the Grid provides a range of fundamental mechanisms for sharing diverse types of resources, such as computers, storage, data, software, and scientific instruments. In this talk, I will introduce the Grid concept and illustrate it with application examples from a range of scientific disciplines. It is likely that technology being developed for the Semantic Web will have important roles to play in Grid services; I will explore some of these potential applications of Semantic Web technologies, identifying those that I think offer the most promise.

Book Chapter
01 Jan 2002
TL;DR: We describe the requirements that we believe such mechanisms must satisfy and discuss the importance of defining a compact set of Intergrid protocols that ensure interoperability among different Grid systems.
Abstract: "Grid computing" has emerged as an important new field, distinguished from conventional distributed computing by its primary focus on shared access to very large resource pools, innovative applications, and, in some cases, a high-performance orientation. In this article, we define this new field. First, we review the "Grid problem," which we define as flexible, secure, coordinated resource sharing among dynamic collections of individuals, institutions, and resources, which we refer to in what follows as virtual organizations. In such settings, we encounter issues such as unambiguous authentication and authorization, resource access and resource discovery, and other challenges. It is precisely this class of problems that Grid technologies address. Next, we present a scalable and open Grid architecture in which protocols, services, application programming interfaces, and software development kits are categorized according to their roles in enabling resource sharing. We describe the requirements that we believe such mechanisms must satisfy, and we discuss the importance of defining a compact set of Intergrid protocols that ensure interoperability among different Grid systems. Finally, we discuss how Grid technologies relate to other contemporary technologies, such as enterprise integration, application service providers, storage service providers, and peer-to-peer computing. We believe that Grid concepts and technologies not only complement these other approaches but can also enhance them overall.
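The architecture summarized above classifies components by their role. In the English original of this article, the layers are Fabric, Connectivity, Resource, Collective, and Application, and components in a layer may build on capabilities provided by any lower layer. The sketch below encodes that ordering together with a few example placements from that article. The dictionary and function names are hypothetical, and the check deliberately treats only strictly lower layers as building blocks.

```python
from enum import IntEnum


class Layer(IntEnum):
    """Layers of the Grid architecture in "The Anatomy of the Grid"."""
    FABRIC = 1        # local resources: computers, storage, networks, sensors
    CONNECTIVITY = 2  # core communication and authentication protocols
    RESOURCE = 3      # secure access to and control of individual resources
    COLLECTIVE = 4    # coordination across many resources (directories, co-allocation)
    APPLICATION = 5   # user applications operating within a virtual organization


# Illustrative placement of a few Globus components.
COMPONENTS = {
    "GSI": Layer.CONNECTIVITY,
    "GRAM": Layer.RESOURCE,
    "GridFTP": Layer.RESOURCE,
    "replica management": Layer.COLLECTIVE,
}


def can_build_on(user: str, provider: str) -> bool:
    """True only when `provider` sits in a strictly lower layer than `user`."""
    return COMPONENTS[provider] < COMPONENTS[user]


print(can_build_on("GridFTP", "GSI"))  # True
```

Using an `IntEnum` makes the layering an ordering that can be compared directly, which is all the "build on lower layers" rule needs.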

Journal Article
TL;DR: The Grid Physics Network (GriPhyN) is an NSF-funded project that aims to realize the concepts of Virtual Data and Virtual Data Grids, where Virtual Data is a concept that unifies the view of data, whether raw or derived.
Abstract: Many physics experiments today generate large volumes of data. That data is then processed in many ways to understand fundamental physical phenomena. Virtual Data is a concept that unifies the view of the data whether it is raw or derived. It provides a new degree of transparency in how data-handling and processing capabilities are integrated to deliver data products to end-users or applications, so that requests for such products are easily mapped into computation and/or data access at multiple locations. GriPhyN (Grid Physics Network) is an NSF-funded project that aims to realize the concepts of Virtual Data. Among the physics applications participating in the project is the Laser Interferometer Gravitational-wave Observatory (LIGO), which is being built to observe the gravitational waves predicted by general relativity. LIGO will produce large amounts of data, which are expected to reach hundreds of petabytes over the next decade. Large communities of scientists, distributed around the world, need to access parts of these datasets and perform efficient analysis on them. It is expected that the raw and processed data will be distributed among various national centers, university computing centers, and individual workstations. In this paper we describe some of the challenges associated with building Virtual Data Grids for experiments such as LIGO.