scispace - formally typeset
Search or ask a question
Topic

Data grid

About: Data grid is a research topic. Over the lifetime, 2887 publications have been published within this topic receiving 50400 citations.


Papers
More filters
Journal ArticleDOI
01 Aug 2001
TL;DR: The authors present an extensible and open Grid architecture, in which protocols, services, application programming interfaces, and software development kits are categorized according to their roles in enabling resource sharing.
Abstract: "Grid" computing has emerged as an important new field, distinguished from conventional distributed computing by its focus on large-scale resource sharing, innovative applications, and, in some cases, high performance orientation. In this article, the authors define this new field. First, they review the "Grid problem," which is defined as flexible, secure, coordinated resource sharing among dynamic collections of individuals, institutions, and resources--what is referred to as virtual organizations. In such settings, unique authentication, authorization, resource access, resource discovery, and other challenges are encountered. It is this class of problem that is addressed by Grid technologies. Next, the authors present an extensible and open Grid architecture, in which protocols, services, application programming interfaces, and software development kits are categorized according to their roles in enabling resource sharing. The authors describe requirements that they believe any such mechanisms must satisfy and discuss the importance of defining a compact set of intergrid protocols to enable interoperability among different Grid systems. Finally, the authors discuss how Grid technologies relate to other contemporary technologies, including enterprise integration, application service provider, storage service provider, and peer-to-peer computing. They maintain that Grid concepts and technologies complement and have much to contribute to these other approaches.

6,716 citations

Book
01 Jan 2003
TL;DR: The Grid Computing: Features contributions from the major players in the field Covers all aspects of grid technology from motivation to applications provided an extensive state-of-the-art guide in grid computing as mentioned in this paper.
Abstract: From the Publisher: Grid computing is applying the resources of many computers in a network to a single problem at the same time Grid computing appears to be a promising trend for three reasons: (1) Its ability to make more cost-effective use of a given amount of computer resources, (2) As a way to solve problems that can't be approached without an enormous amount of computing power (3) Because it suggests that the resources of many computers can be cooperatively and perhaps synergistically harnessed and managed as a collaboration toward a common objective. A number of corporations, professional groups, university consortiums, and other groups have developed or are developing frameworks and software for managing grid computing projects. The European Community (EU) is sponsoring a project for a grid for high-energy physics, earth observation, and biology applications. In the United States, the National Technology Grid is prototyping a computational grid for infrastructure and an access grid for people. Sun Microsystems offers Grid Engine software. Described as a distributed resource management tool, Grid Engine allows engineers at companies like Sony and Synopsys to pool the computer cycles on up to 80 workstations at a time. "the Grid" is a very hot topic generating broad interest from research and industry (e.g. IBM, Platform, Avaki, Entropia, Sun, HP) Grid architecture enables very popular e-Science projects like the Genome project which demand global interaction and networking In recent surveys over 500f Chief Information Officers are expected to use Grid technology this year Grid Computing: Features contributions from the major players in the field Covers all aspects of grid technology from motivation to applications Provides an extensive state-of-the-art guide in grid computing This is essential reading for researchers in Computing and Engineering, physicists, statisticians, engineers and mathematicians and IT policy makers.

1,401 citations

Journal ArticleDOI
TL;DR: In this paper, the authors introduce design principles for a data management architecture called the data grid, and describe two basic services that are fundamental to the design of a data grid: storage systems and metadata management.

1,198 citations

Proceedings ArticleDOI
24 Jul 2002
TL;DR: The Chimera virtual data system is developed, which combines avirtual data catalog for representing data derivation procedures and derived data, with a virtual data language interpreter that translates user requests into data definition and query operations on the database.
Abstract: A lot of scientific data is not obtained from measurements but rather derived from other data by the application of computational procedures. We hypothesize that explicit representation of these procedures can enable documentation of data provenance, discovery of available methods, and on-demand data generation (so-called "virtual data"). To explore this idea, we have developed the Chimera virtual data system, which combines a virtual data catalog for representing data derivation procedures and derived data, with a virtual data language interpreter that translates user requests into data definition and query operations on the database. We couple the Chimera system with distributed "data grid" services to enable on-demand execution of computation schedules constructed from database queries. We have applied this system to two challenge problems, the reconstruction of simulated collision event data from a high-energy physics experiment, and searching digital sky survey data for galactic clusters, with promising results.

665 citations

Journal ArticleDOI
01 May 2002
TL;DR: A high-speed transport service that extends the popular FTP protocol with new features required for Data Grid applications, such as striping and partial file access and a replica management service that integrates a replica catalog with GridFTP transfers to provide for the creation, registration, location, and management of dataset replicas.
Abstract: An emerging class of data-intensive applications involve the geographically dispersed extraction of complex scientific information from very large collections of measured or computed data. Such applications arise, for example, in experimental physics, where the data in question is generated by accelerators, and in simulation science, where the data is generated by supercomputers. So-called Data Grids provide essential infrastructure for such applications, much as the Internet provides essential services for applications such as e-mail and the Web. We describe here two services that we believe are fundamental to any Data Grid: reliable, high-speed transport and replica management. Our high-speed transport service, GridFTP, extends the popular FTP protocol with new features required for Data Grid applications, such as striping and partial file access. Our replica management service integrates a replica catalog with GridFTP transfers to provide for the creation, registration, location, and management of dataset replicas. We present the design of both services and also preliminary performance results. Our implementations exploit security and other services provided by the Globus Toolkit.

633 citations


Network Information
Related Topics (5)
Server
79.5K papers, 1.4M citations
85% related
Cloud computing
156.4K papers, 1.9M citations
82% related
Cluster analysis
146.5K papers, 2.9M citations
80% related
Network packet
159.7K papers, 2.2M citations
80% related
Quality of service
77.1K papers, 996.6K citations
79% related
Performance
Metrics
No. of papers in the topic in previous years
YearPapers
20232
20226
20216
202020
201921
201827