
Showing papers on "Data access published in 2002"


Proceedings ArticleDOI
07 Aug 2002
TL;DR: A novel paradigm for data management in which a third party service provider hosts "database as a service", providing its customers with seamless mechanisms to create, store, and access their databases at the host site is explored.
Abstract: We explore a novel paradigm for data management in which a third party service provider hosts "database as a service", providing its customers with seamless mechanisms to create, store, and access their databases at the host site. Such a model alleviates the need for organizations to purchase expensive hardware and software, deal with software upgrades, and hire professionals for administrative and maintenance tasks which are taken over by the service provider. We have developed and deployed a database service on the Internet, called NetDB2, which is in constant use. In a sense, a data management model supported by NetDB2 provides an effective mechanism for organizations to purchase data management as a service, thereby freeing them to concentrate on their core businesses. Among the primary challenges introduced by "database as a service" are the additional overhead of remote access to data, an infrastructure to guarantee data privacy, and user interface design for such a service. These issues are investigated. We identify data privacy as a particularly vital problem and propose alternative solutions based on data encryption. The paper is meant as a challenge for the database community to explore a rich set of research issues that arise in developing such a service.

707 citations


Proceedings ArticleDOI
24 Jul 2002
TL;DR: This work develops a family of algorithms and uses simulation studies to evaluate various combinations of them; the results suggest that while it is necessary to consider the impact of replication, it is not always necessary to couple data movement and computation scheduling.
Abstract: In high-energy physics, bioinformatics, and other disciplines, we encounter applications involving numerous, loosely coupled jobs that both access and generate large data sets. So-called Data Grids seek to harness geographically distributed resources for such large-scale data-intensive problems. Yet effective scheduling in such environments is challenging, due to a need to address a variety of metrics and constraints while dealing with multiple, potentially independent sources of jobs and a large number of storage, compute, and network resources. We describe a scheduling framework that addresses these problems. Within this framework, data movement operations may be either tightly bound to job scheduling decisions or, alternatively, performed by a decoupled, asynchronous process on the basis of observed data access patterns and load. We develop a family of algorithms and use simulation studies to evaluate various combinations. Our results suggest that while it is necessary to consider the impact of replication, it is not always necessary to couple data movement and computation scheduling. Instead, these two activities can be addressed separately, thus significantly simplifying the design and implementation.

504 citations


Patent
14 Mar 2002
TL;DR: In this article, a schema-based service for Internet access to per-user services data is proposed, where access to data is based on each user's identity and each user manipulates (e.g., reads or writes) data in the logical document by data access requests through defined methods.
Abstract: A schema-based service for Internet access to per-user services data, wherein access to data is based on each user's identity. The service includes a schema that defines rules and a structure for each user's data, and also includes methods that provide access to the data in a defined way. The services schema thus corresponds to a logical document containing the data for each user. The user manipulates (e.g., reads or writes) data in the logical document by data access requests through defined methods. In one implementation, the services schemas are arranged as XML documents, and the services provide methods that control access to the data based on the requesting user's identification, defined role and scope for that role. In this way, data can be accessed by its owner, and shared to an extent determined by the owner.

430 citations


Journal Article
TL;DR: In this paper, a unified view of data handling in sensor networks, incorporating long-term storage, multi-resolution data access and spatio-temporal pattern mining, is presented.
Abstract: An important class of networked systems is emerging that involve very large numbers of small, low-power, wireless devices. These systems offer the ability to sense the environment densely, offering unprecedented opportunities for many scientific disciplines to observe the physical world. In this paper, we argue that a data handling architecture for these devices should incorporate their extreme resource constraints - energy, storage and processing - and spatio-temporal interpretation of the physical world in the design, cost model, and metrics of evaluation. We describe DIMENSIONS, a system that provides a unified view of data handling in sensor networks, incorporating long-term storage, multi-resolution data access and spatio-temporal pattern mining.

325 citations


Patent
30 Aug 2002
TL;DR: Techniques are presented for determining storage locations for data in a heterogeneous storage environment based upon storage policies configured for the storage environment. The data is stored in storage locations that enable efficient data access while optimizing the use of available storage resources with minimum human intervention.
Abstract: Automated techniques for storing data in a data storage environment. Techniques are provided for determining storage locations for data in a heterogeneous storage environment based upon storage policies configured for the storage environment. The data is stored in storage locations that enable efficient data access while optimizing the use of available storage resources with minimum human intervention.

269 citations


Patent
14 May 2002
TL;DR: In this article, the authors present a system and method whereby a local application (1-3) may interface with a single API (1-3) and be automatically connected to the appropriate source of terminal location information.
Abstract: In one embodiment of the invention there is provided a system and method whereby a local application (1-3) may interface with a single API (1-3) and be automatically connected to the appropriate source of terminal location information. In another embodiment of the invention there is provided a system and method whereby a remote application (1-3) and/or web service may interface with a single API (1-3) and be automatically connected to the appropriate source of terminal location information. In another embodiment of the invention there is provided a system and method whereby a user can specify his privacy preferences to one database and be assured that his preferences would be adhered to by all location providing sources, thereby allowing the user to exert direct control over which applications (1-1) and web services have access to data concerning the location of his mobile.

255 citations


BookDOI
01 Jan 2002
TL;DR: Covers DTSE in programmable architectures, related compiler work on data transfer and storage management, and automated data reuse exploration techniques.
Abstract: 1. DTSE in Programmable Architectures. 2. Related Compiler Work on Data Transfer and Storage Management. 3. Global Loop Transformations. 4. System-Level Storage Requirements Estimation. 5. Automated Data Reuse Exploration Techniques. 6. Storage Cycle Budget Distribution. 7. Cache Optimization. 8. Demonstrator Designs. 9. Conclusions and Future Work. References. Bibliography. Index.

226 citations


Patent
14 Jan 2002
TL;DR: In this article, a method and device for protecting a network by monitoring both incoming and outgoing data traffic on multiple ports of the network, and preventing transmission of unauthorized data across the ports is presented.
Abstract: A method and device for protecting a network by monitoring both incoming and outgoing data traffic (52) on multiple ports (56) of the network, and preventing transmission of unauthorized data across the ports (56). The monitoring system (50) is provided in a non-promiscuous mode and automatically denies access to data packets from a specific source based upon an associated rules table. The monitoring system (50) processes copies of the data packets resulting in minimal loss of throughput. The monitoring system (50) is also highly adaptable and provides for dynamic writing and issuing of firewall rules by updating the rule table (54). Information regarding the data packets (52) is captured, sorted and cataloged to determine attack profiles and unauthorized data packets.

223 citations


Patent
03 Jul 2002
TL;DR: The system employs a secure peer network between data sources regardless of their location, enabling data access devices to retrieve or submit data from any Internet-enabled device at any location.
Abstract: A system for accessing data from any location and any device including those behind firewalls, proxy servers, address translations and other devices, while securing the data and network. The access may be by voice or wireless connection and the data may be PIM data such as calendaring or scheduling information or email. The system employs a secure peer network between data sources regardless of their location enabling data access devices to retrieve or submit data from any Internet enabled device from any location. Messages are tunneled to HTML that passes through firewalls. A Queue Manager in the EPN Server software creates a unique queue for data source which can only be accessed by the data source. The user with a browser enabled device can then access the EPN Server by providing the necessary credentials, such as user id and password, and can then access the data in the data sources for which the user is permissioned. The data source maintains a non-persistent connection through a polling algorithm and services the request in the queue.

220 citations


Proceedings ArticleDOI
23 Oct 2002
TL;DR: This work introduces a set of replication management services and protocols that offer high data availability, low bandwidth consumption, increased fault tolerance, and improved scalability of the overall system.
Abstract: Data grids provide geographically distributed resources for large-scale data-intensive applications that generate large data sets. However, ensuring efficient and fast access to such huge and widely distributed data is hindered by the high latencies of the Internet. To address these problems we introduce a set of replication management services and protocols that offer high data availability, low bandwidth consumption, increased fault tolerance, and improved scalability of the overall system. Replication decisions are made based on a cost model that evaluates data access costs and performance gains of creating each replica. The estimation of costs and gains is based on factors such as run-time accumulated read/write statistics, response time, bandwidth, and replica size. To address scalability, replicas are organized in a combination of hierarchical and flat topologies that represent propagation graphs that minimize inter-replica communication costs. To evaluate our model we use the network simulator NS to study the impact of replication. Our results prove that replication improves the performance of data access on the data grid, and that the gain increases with the size of data used.

216 citations
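
The cost model in the abstract above weighs the gain of a new replica against its cost, using read/write statistics, response time, bandwidth, and replica size. The listing gives no formulas, so the following is only a minimal sketch of that kind of decision rule: the `AccessStats` fields, the two formulas, and the assumption that every write re-sends the whole data set are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class AccessStats:
    """Run-time statistics for one data set at one site (all fields hypothetical)."""
    reads: int                 # read requests observed from this site
    writes: int                # updates that would have to reach a replica here
    remote_response_s: float   # observed response time of a remote read, in seconds
    local_response_s: float    # expected response time of a local read, in seconds
    replica_size_mb: float     # size of the data set, in megabytes
    bandwidth_mb_s: float      # bandwidth available for transfers to this site


def replication_gain(s: AccessStats) -> float:
    """Seconds saved by serving the observed reads locally instead of remotely."""
    return s.reads * (s.remote_response_s - s.local_response_s)


def replication_cost(s: AccessStats) -> float:
    """Seconds spent creating the replica and propagating each write to it.

    Illustrative assumption: every write re-sends the whole data set.
    """
    transfer_s = s.replica_size_mb / s.bandwidth_mb_s
    return transfer_s * (1 + s.writes)


def should_replicate(s: AccessStats) -> bool:
    """Create a replica only when the estimated gain exceeds the estimated cost."""
    return replication_gain(s) > replication_cost(s)
```

Under these assumptions a read-heavy data set is worth replicating, while a write-heavy one is not, because update propagation dominates the cost.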


Patent
03 Jul 2002
TL;DR: The system employs a secure peer network between data sources regardless of their location, enabling data access devices to retrieve or submit data from any Internet-enabled device at any location.
Abstract: A system for accessing data from any location and any device including those behind firewalls, proxy servers, address translations and other devices, while securing the data and network. The system employs a secure peer network between data sources regardless of their location enabling data access devices to retrieve or submit data from any Internet enabled device from any location. Messages are tunneled to HTML that passes through firewalls. A Queue Manager in the EPN Server software creates a unique queue for data source which can only be accessed by the data source. The user with a browser enabled device can then access the EPN Server by providing the necessary credentials, such as user id and password, and can then access the data in the data sources for which the user is permissioned. The data source maintains a non-persistent connection through a polling algorithm and services the request in the queue.

Journal ArticleDOI
TL;DR: Gridella improves the highly chaotic and inefficient Gnutella infrastructure with directed search and advanced concepts, thus enhancing efficiency and providing a model for further analysis and research.
Abstract: The authors present Gridella, a Gnutella-compatible P2P system. Gridella is based on the Peer-Grid (P-Grid) approach, which draws on research in distributed and cooperative information systems to provide a decentralized, scalable data access structure. Gridella improves the highly chaotic and inefficient Gnutella infrastructure with directed search and advanced concepts, thus enhancing efficiency and providing a model for further analysis and research.

Patent
15 Aug 2002
TL;DR: A system is presented that generates source code for the calculations and data transformations embodied in spreadsheets, makes them available through a programming interface, and conforms to the grammar and syntax of a target software development language.
Abstract: A system including spreadsheet sheets, makes calculations and data transformations, which is available through a programming interface, and conforms to the grammar and syntax of a target software development language is presented. The system includes an Object Model with Data Structures representing entities involved in spreadsheets. The system includes a Parser and Code Generator that extracts data from a body of spreadsheet data, instantiates instances of Data Structures of the Object Model to represent the spreadsheet data, parses the data and formulas contained in the cells of the spreadsheets, iterates through the instantiated instances of the Data Structures, and generates source code that performs the calculations and data transformations embodied in the spreadsheet data. The system includes a Calculation Engine with software base classes that implement the common structural and data access features of spreadsheet data, and further implement the operations of common spreadsheet functions and operators.

Patent
13 Mar 2002
TL;DR: In this article, a method is proposed for operating an electronic device adapted to be electronically coupled to at least one microprocessor-based device and to prevent unauthorized access to data exchanged between the at least one microprocessor-based device and other microprocessor-based devices.
Abstract: In accordance with a first aspect, a method for operating an electronic device adapted to be electronically coupled to at least one microprocessor based device and prevent unauthorized access to data exchanged between the at least one microprocessor based device and other microprocessor based devices, the method including: in a first mode, establishing a secure point-to-point communications session with another like device and receiving security data from the other like device, the security data being associated with an intended recipient microprocessor based device; and, in a second mode, receiving the data from an originating one of the at least one microprocessor based devices, encrypting the data using at least the received security data and sending the encrypted data to the originating microprocessor based device. In accordance with a second aspect, a method for exchanging data between a plurality of suitable microprocessor based devices over a computer network so as to frustrate unauthorized access to the data, the method including: identifying at least first and second recipients for the data to be exchanged; identifying first security data associated with the first recipient and second security data associated with the second recipient; and, encrypting the data using the first and second security data.

Patent
John Garney
08 Aug 2002
TL;DR: A write-back mechanism, which may employ security, is used with destructive-read memory to enforce usage restrictions on acquired data, such as an expiration date, usage count limit, or data access fee.
Abstract: A destructive-read memory is one in which the process of reading the memory causes the contents of the memory to be destroyed. Such a memory may be used in devices that are intended to acquire data that may have associated usage restrictions, such as an expiration date, usage count limit, or data access fee for the acquired data. Typically, to enforce usage restrictions, and protect against theft, complex and often costly security techniques are applied to acquired data. With destructive-read memory, complex and costly security is not required for stored data. In one embodiment, a write-back mechanism, which may employ security, is responsible for enforcing usage restrictions. If the write-back mechanism determines continued access to acquired data is allowed, then it writes back the data as it is destructively read from the memory.
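
The interplay between a destructive read and a write-back mechanism can be illustrated in software. This is a minimal sketch only: the abstract describes a hardware memory, and the class names, the single-cell model, and the choice of a usage-count limit as the enforced restriction are assumptions made for the example.

```python
class DestructiveReadMemory:
    """A single memory cell whose contents are destroyed by reading."""

    def __init__(self):
        self._cell = None

    def write(self, data):
        self._cell = data

    def read(self):
        # Reading returns the stored value and erases it in the same step.
        data, self._cell = self._cell, None
        return data


class UsageLimitedReader:
    """Hypothetical write-back mechanism enforcing a usage-count limit."""

    def __init__(self, memory, max_reads):
        if max_reads < 1:
            raise ValueError("max_reads must be at least 1")
        self._memory = memory
        self._remaining = max_reads

    def read(self):
        data = self._memory.read()
        if data is None:
            return None  # nothing stored, or the limit was already used up
        self._remaining -= 1
        if self._remaining > 0:
            # Continued access is allowed, so restore what the read destroyed.
            self._memory.write(data)
        return data
```

Once the limit is reached the reader simply does not write the data back, so the restriction is enforced by omission rather than by a separate erase step.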

Book ChapterDOI
20 Aug 2002
TL;DR: This paper proposes a solution called C-SDA (Chip-Secured Data Access), which enforces data confidentiality and controls personal privileges thanks to a client-based security component acting as a mediator between a client and an encrypted database.
Abstract: The democratization of ubiquitous computing (access data anywhere, anytime, anyhow), the increasing connection of corporate databases to the Internet, and today's natural resort to Web-hosting companies strongly emphasize the need for data confidentiality. Database servers arouse users' suspicion because no one can fully trust traditional security mechanisms against more and more frequent and malicious attacks, and no one can be fully confident in an invisible DBA administering confidential data. This paper gives an in-depth analysis of existing security solutions and concludes on the intrinsic weakness of the traditional server-based approach to preserve data confidentiality. With this statement in mind, we propose a solution called C-SDA (Chip-Secured Data Access), which enforces data confidentiality and controls personal privileges thanks to a client-based security component acting as a mediator between a client and an encrypted database. This component is embedded in a smartcard to prevent any tampering. This cooperation of hardware and software security components constitutes a strong guarantee against attacks threatening personal as well as business data.

Proceedings ArticleDOI
24 Jul 2002
TL;DR: The initial design and prototype of a virtual data Grid for LIGO, which is being built to observe the gravitational waves predicted by general relativity, is described.
Abstract: Many Physics experiments today generate large volumes of data. That data is then processed in a variety of ways in order to achieve the understanding of fundamental physical phenomena. The goal of the NSF-funded GriPhyN project (Grid Physics Network) is to enable scientists to seamlessly access data whether it is raw experimental data or a data product which is a result of further processing. GriPhyN provides a new degree of transparency in how data-handling and processing capabilities are integrated to deliver data products to end-users or applications, so that requests for such products are easily mapped into computation and/or data access at multiple locations. GriPhyN refers to the set of all data products available to the user as virtual data. Among the physics applications participating in the project is the Laser Interferometer Gravitational-wave Observatory (LIGO), which is being built to observe the gravitational waves predicted by general relativity. We describe our initial design and prototype of a virtual data Grid for LIGO.

Patent
23 Dec 2002
TL;DR: In this article, a method and device are described which provide a security interface, preferably for a mobile device, providing user-selectable non-secure data that is displayed without the need for a password.
Abstract: A method and device are described which provide a security interface, preferably for a mobile device. The security interface provides user-selectable non-secure data that is displayed without the need for a password. The non-secure data is preferably updated on a regular basis, and can be obtained from different sources, as selected by a user. The secure data can be accessed after successful authentication, such as a positive password verification. Additional non-secure data, related to the displayed non-secure data, can preferably be accessed, with or without a need for a password. An indication can be provided to inform a user that secure data has been updated, without the need to access such secure data. The security interface is preferably enabled after a predetermined timeout period. The interface allows the device to operate in three data access states: a controlled access state; a verification state; and a full access state.
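
The three data access states named in the abstract above can be modelled as a small state machine. This is a sketch under assumptions: the abstract names the states but not the transitions, so the transition rules, the `SecurityInterface` class, and the caller-supplied `check_password` function are all hypothetical.

```python
from enum import Enum, auto


class AccessState(Enum):
    CONTROLLED = auto()    # only non-secure data is shown
    VERIFICATION = auto()  # waiting for the user to authenticate
    FULL = auto()          # secure data is accessible


class SecurityInterface:
    """Illustrative state machine; the transitions are assumed, not from the patent."""

    def __init__(self, check_password):
        self._check_password = check_password  # caller-supplied verifier
        self.state = AccessState.CONTROLLED

    def request_secure_data(self):
        # Asking for secure data from the controlled state starts verification.
        if self.state is AccessState.CONTROLLED:
            self.state = AccessState.VERIFICATION

    def submit_password(self, password):
        # A password is only meaningful while verification is pending.
        if self.state is not AccessState.VERIFICATION:
            return False
        ok = self._check_password(password)
        self.state = AccessState.FULL if ok else AccessState.CONTROLLED
        return ok

    def timeout(self):
        # After the timeout period the interface falls back to controlled access.
        self.state = AccessState.CONTROLLED
```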

Patent
30 Jul 2002
TL;DR: A schema-based device service provides centralized access to per-user device data, wherein access to the device data is based on each user's identity; it also includes methods that provide access to the data in a defined way.
Abstract: A schema-based device service that provides centralized access to per-user device data, wherein access to the device data is based on each user's identity. The device service includes a schema that defines rules and a structure for each user's data, and also includes methods that provide access to the data in a defined way. The device schema thus corresponds to a logical document containing the data for each user. A service such as a notification/alerts service accesses data in the logical document by data access requests through defined methods, such as in order to customize or modify a notification for a device based on the device characteristics. In one implementation, the device schemas are arranged as XML documents, and the services provide methods that control access to the data based on the requesting user's identification, defined role and scope for that role.

Patent
Adam Yeh, Abhijit Kundu
28 Jun 2002
TL;DR: In this paper, a system and method for a reporting information service using metadata to communicate with databases and a user interface is presented, which includes software in a data access component, a report component, and an interface component for populating, maintaining and dispatching reports responsive to user requests for the reports via metadata.
Abstract: A system and method for a reporting information service using metadata to communicate with databases and a user interface. The invention includes software in a data access component, a report component, and a user interface component for populating, maintaining, and dispatching reports responsive to user requests for the reports via metadata. The data access component provides a logical view of data in a database via data access metadata. The report component populates, maintains, and dispatches reports via report metadata characterizing the reports. The user interface component renders the report dispatched from the report component via user interface metadata specifying rendering attributes for the report.

Patent
26 Aug 2002
TL;DR: In this article, a single search-oriented uniform user interface is provided to restore data irrespective of the storage location of the data and access to data regardless of the location and type (e.g., archived, backup, or otherwise) of data is enabled.
Abstract: Techniques for restoring data in a heterogeneous storage environment (fig. 3). The data to be restored may be identified based upon user-specified contents and/or attributes of the data (302). The data to be restored may be identified from backup data, archived data, and other types of data (304). A single search-oriented uniform user interface is provided to restore data irrespective of the storage location of the data. Access to data regardless of the location and type (e.g., archived, backup, or otherwise) of the data is enabled.

Leanne P. Guy, Peter Z. Kunszt, Erwin Laure, Heinz Stockinger, Kurt Stockinger
01 Jan 2002
TL;DR: The architecture and design of a Replica Management System, called Reptor, is described within the context of the EU Data Grid project, and a prototype implementation is currently under development.
Abstract: Providing fast, reliable and transparent access to data to all users within a community is one of the most crucial functions of data management in a Grid environment. User communities are typically large and highly geographically distributed. The volume of data that they wish to access is of the order of petabytes and may also be distributed. It is infeasible for all users to access a single instance of all data. One solution is that of data replication. Identical replicas of data are generated and stored at various globally distributed sites. Replication can reduce data access latency and increase the performance and robustness of distributed applications. The existence of multiple instances of data however introduces additional issues. Replicas must be kept consistent, they must be locatable, and their lifetime must be managed. These and other issues necessitate a high level system for replica management in Data Grids. This paper describes the architecture and design of a Replica Management System, called Reptor, within the context of the EU Data Grid project. A prototype implementation is currently under development.

Patent
10 Jul 2002
TL;DR: In this article, the authors present an approach to manage digital content and digital rights policies associated with one or more users in a dynamic repository, where the user's digital rights policy indicates the level of access a user has to digital content in the repository.
Abstract: A dynamic repository (either storing digital data content or pointers to stored digital data content) works in conjunction with a plurality of interfaces to manage digital content and digital rights policies associated with one or more users. Digital rights policies are unique to each user and such policies define access to digital content in the repository. The user's digital rights policy indicates the level of access a user has to digital content in the repository (e.g., the policy could indicate that the user has authorized access to a particular file for a period of seven days). The interfaces linked with the content repository are used to access and manipulate the digital data content (based upon each user's digital rights policy) and the digital rights policies stored in the content repository. The interfaces include: (a) one or more authentication interfaces for authenticating users, (b) one or more digital rights management (DRM) interfaces allowing users to add, delete, or edit the digital rights policies, (c) one or more data access interfaces allowing users to selectively access digital data content as defined by their individual digital rights policy, (d) one or more browsing interfaces allowing users to selectively browse said digital data content, or (e) one or more content manipulation interfaces allowing said users to add, delete, or edit said digital data content.

Patent
12 Aug 2002
TL;DR: In this article, the authors present a stand-alone security system for Web-based and IVR-based self-service functions, with five primary facets: (1) control of access to secured information, (2) enabling access to users having indirect and direct relationships with the sponsor organization, (3) distribution of security administration from a central information technology resource to users of the security system, (4) support for integration into different environments, and (5) support for system integrators.
Abstract: A stand-alone security system controlling access to secured information and self-service functionality for a sponsor organization, usable for Web-based and IVR-based self-service functions, having five primary facets: (1) control of access to secured information, (2) enabling access to users having indirect and direct relationships with the sponsor organization, (3) distribution of security administration from a central information technology resource to users of the security system, (4) support for integration into different environments, and (5) support for system integrators. Key components of access control include (1) association of a userID with one specific person, (2) identification of keys to data in back-end systems and association of those keys with the system users, (3) definition of pieces (segments) of an organization so that permissions are granted based on the pieces, (4) definition of user roles based on the functionality to which he has been given permission, (5) a single sign-on for a user with multiple reasons to use the system, and (6) support for direct and indirect assignment of business functions.

Book ChapterDOI
24 Jun 2002
TL;DR: This paper presents the view of the important research issues in location management, which include modeling of location information, uncertainty management, spatio-temporal data access languages, indexing and scalability issues, data mining, location dissemination, privacy and security, location fusion and synchronization.
Abstract: Miniaturization of computing devices, and advances in wireless communication and sensor technology are some of the forces that are propagating computing from the stationary desktop to the mobile outdoors. Some important classes of new applications that will be enabled by this revolutionary development include location-based services, tourist services, mobile electronic commerce, and digital battlefield. Some existing application classes that will benefit from the development include transportation and air traffic control, weather forecasting, emergency response, mobile resource management, and mobile workforce. Location management, i.e. the management of transient location information, is an enabling technology for all these applications. Location management is also a fundamental component of other technologies such as fly-through visualization, context awareness, augmented reality, cellular communication, and dynamic resource discovery. In this paper we present our view of the important research issues in location management. These include modeling of location information, uncertainty management, spatio-temporal data access languages, indexing and scalability issues, data mining (including traffic and location prediction), location dissemination, privacy and security, location fusion and synchronization.

Patent
01 Mar 2002
TL;DR: A messaging data structure (700) for accessing data in an identity-centric manner is described, where an identity may be a user, a group of users, or an organization.
Abstract: A messaging data structure (700) for accessing data in an identity-centric manner. An identity may be a user, a group of users, or an organization. Instead of data being maintained on an application-by-application basis, the data associated with a particular identity is stored by one or more data services accessible by many applications. The data is stored in accordance with a schema that is recognized by a number of different applications and the data service. The messaging data structure (700) includes fields that identify the target data object to be operated upon using an identity field (701), a schema field (703), and an instance identifier field (704). In addition, the desired operation (707) is specified. Thus, the target data object is operated on in an identity-centric manner.
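
The fields listed in the abstract above (identity, schema, instance identifier, and desired operation) map naturally onto a small record type. This is a minimal sketch, assuming an in-memory store: the `Message` and `DataService` names, the `payload` field, and the two operation names `"read"` and `"write"` are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass
from typing import Any, Dict, Tuple


@dataclass(frozen=True)
class Message:
    identity: str      # user, group of users, or organization owning the data
    schema: str        # schema the data is stored under
    instance_id: str   # which instance of that schema is targeted
    operation: str     # desired operation; "read" or "write" in this sketch
    payload: Any = None  # data to store on a write (hypothetical field)


class DataService:
    """Toy data service shared by many applications, keyed by identity."""

    def __init__(self):
        self._store: Dict[Tuple[str, str, str], Any] = {}

    def handle(self, msg: Message):
        # The target data object is addressed by identity, schema, and instance.
        key = (msg.identity, msg.schema, msg.instance_id)
        if msg.operation == "write":
            self._store[key] = msg.payload
            return None
        if msg.operation == "read":
            return self._store.get(key)
        raise ValueError(f"unsupported operation: {msg.operation}")
```

Because the key starts with the identity rather than with an application name, any application that recognizes the schema can address the same data object.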

Proceedings ArticleDOI
19 Jun 2002
TL;DR: This work presents a QoS management scheme to support guarantees on deadline miss ratio and data freshness (temporal consistency) even in the presence of unpredictable workloads and data access patterns.
Abstract: The demand for real-time database services has been increasing recently. Examples include sensor data fusion, decision support, Web information services, and online trading. In these applications, it is desirable to execute transactions within their deadlines using temporally consistent data. Due to the high service demand, real-time databases can be overloaded. As a result, many transactions may miss their deadlines, or data temporal consistency constraints can be violated. To address these problems, we present a QoS management scheme to support guarantees on deadline miss ratio and data freshness (temporal consistency) even in the presence of unpredictable workloads and data access patterns. Using our approach, admitted user transactions can be processed in time using fresh data. A simulation study shows that our QoS-sensitive approach can achieve a significant performance improvement, in terms of deadline miss ratio and data freshness, compared to several baseline approaches. Furthermore, our approach shows a comparable performance to the theoretical oracle that is privileged by a complete future knowledge of data accesses.

Patent
07 Mar 2002
TL;DR: A storage system that provides preferable data access performance by performing controls that consider database management system (DBMS) execution information or database process priorities, acquiring static configuration information of the DBMS by means of a DBMS information acquisition and communication program, a DBMS information communication section, and a host information setting program.
Abstract: A storage system for providing a preferable data access performance by performing controls considering database management system (DBMS) execution information or database process priorities, by acquiring static configurational information of a DBMS by means of a DBMS information acquisition and communication program, a DBMS information communication section, and a host information setting program; acquiring DBMS execution information by means of a query plan acquisition program, the DBMS information communication section, and a process performance management program; acquiring information on priorities of database processes given by the process performance management program; and storing them in disk I/O management information with process priorities, DBMS execution information, and DBMS data information, in which a cache control section in a storage system control program controls a data cache by referring to the above information.

Book ChapterDOI
18 Nov 2002
TL;DR: This paper argues: (i) that DQP can be important in the Grid, as a means of providing high-level, declarative languages for integrating data access and analysis; and (ii) that the Grid provides resource management facilities that are useful to developers of DQP systems.
Abstract: Distributed query processing (DQP) has been widely used in data intensive applications where data of relevance to users is stored in multiple locations. This paper argues: (i) that DQP can be important in the Grid, as a means of providing high-level, declarative languages for integrating data access and analysis; and (ii) that the Grid provides resource management facilities that are useful to developers of DQP systems. As well as discussing and illustrating how DQP technologies can be deployed within the Grid, the paper describes a prototype implementation of a DQP system running over Globus.

Proceedings ArticleDOI
TL;DR: The Sloan Digital Sky Survey strategies for data publication, data access, curation, and preservation are described, with the observation that published scientific data needs to be available forever, which gives rise to the data pyramid of versions and to data inflation, where the derived data volumes explode.
Abstract: Science projects are data publishers. The scale and complexity of current and future science data changes the nature of the publication process. Publication is becoming a major project component. At a minimum, a project must preserve the ephemeral data it gathers. Derived data can be reconstructed from metadata, but metadata is ephemeral. Longer term, a project should expect some archive to preserve the data. We observe that published scientific data needs to be available forever -- this gives rise to the data pyramid of versions and to data inflation where the derived data volumes explode. As an example, this article describes the Sloan Digital Sky Survey (SDSS) strategies for data publication, data access, curation, and preservation.