Topic

Data access

About: Data access is a research topic. Over its lifetime, 13,141 publications have been published within this topic, receiving 172,859 citations.


Papers
Journal Article
TL;DR: Describes the Materials Application Programming Interface (API), a simple, flexible and efficient interface for programmatically querying and interacting with the Materials Project database, based on the REpresentational State Transfer (REST) pattern for the web.

333 citations

Journal Article
TL;DR: Presents XJoin, a non-blocking join operator with a small memory footprint that allows many such operators to be active in parallel, and shows that it is an effective solution for providing fast query responses to users even in the presence of slow and bursty remote sources.
Abstract: Wide-area distribution raises significant performance problems for traditional query processing techniques as data access becomes less predictable due to link congestion, load imbalances, and temporary outages. Pipelined query execution is a promising approach to coping with unpredictability in such environments as it allows scheduling to adjust to the arrival properties of the data. We have developed a non-blocking join operator, called XJoin, which has a small memory footprint, allowing many such operators to be active in parallel. XJoin is optimized to produce initial results quickly and can hide intermittent delays in data arrival by reactively scheduling background processing. We show that XJoin is an effective solution for providing fast query responses to users even in the presence of slow and bursty remote sources.

1 Wide-Area Query Processing

The explosive growth of the Internet and the World Wide Web has made tremendous amounts of data available on-line. Emerging standards such as XML, combined with wrapper technologies, address semantic challenges by providing relational-style interfaces to remote data. Beyond the issues of structure and semantics, however, there remain significant technical obstacles to building responsive, usable query processing systems for wide-area environments. A key performance issue that arises in such environments is response-time unpredictability. Data access over wide-area networks involves a large number of remote data sources, intermediate sites, and communications links, all of which are vulnerable to overloading, congestion, and failures. Such problems can cause significant and unpredictable delays in the access of information from remote sources. These delays, in turn, cause traditional distributed query processing strategies to break down, resulting in unresponsive and hence unusable systems.

In previous work [AFTU96] we identified three classes of delays that can affect the responsiveness of query processing: 1) initial delay, in which there is a longer than expected wait until the first tuple arrives from a remote source; 2) slow delivery, in which data arrive at a fairly constant but slower than expected rate; and 3) bursty arrival, in which data arrive in a fluctuating manner. With traditional query processing techniques, query execution can become blocked even if only one of the accessed data sources experiences such delays.

Copyright 2000 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE. Bulletin of the IEEE Computer Society Technical Committee on Data Engineering.

This work was partially supported by the NSF under grant IRI-94-09575, by the Office of Naval Research under contract number N66001-97-C8539 (DARPA order number F475), by a Siemens Faculty Development Award, and by an IBM Partnership Award.
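The non-blocking, pipelined behaviour described in the XJoin abstract can be illustrated with a minimal Python sketch. This is not XJoin itself: it shows only an in-memory symmetric hash join, in which each arriving tuple is stored and immediately probed against the other input, so results are emitted as soon as matches exist and a stalled source never blocks the other. The disk spilling and reactive background scheduling mentioned in the abstract are omitted, and the function name, the "left"/"right" tags and the row layout are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, Iterator, Tuple

Row = Dict[str, Any]


def symmetric_hash_join(
    arrivals: Iterable[Tuple[str, Row]], key: str
) -> Iterator[Tuple[Row, Row]]:
    """Join two inputs whose tuples arrive interleaved in arbitrary order.

    `arrivals` yields (side, row) pairs, where side is "left" or "right"
    (hypothetical tags standing in for two remote sources).
    """
    # One in-memory hash table per input, keyed on the join attribute.
    tables = {"left": defaultdict(list), "right": defaultdict(list)}
    for side, row in arrivals:
        other = "right" if side == "left" else "left"
        k = row[key]
        # Store the new tuple so later arrivals on the other side can find it.
        tables[side][k].append(row)
        # Probe the other side immediately and emit any matches right away.
        for match in tables[other].get(k, []):
            yield (row, match) if side == "left" else (match, row)


if __name__ == "__main__":
    stream = [
        ("left", {"id": 1, "a": "x"}),
        ("right", {"id": 1, "b": "y"}),  # first result is produced here
        ("right", {"id": 2, "b": "z"}),
        ("left", {"id": 2, "a": "w"}),   # second result is produced here
    ]
    for pair in symmetric_hash_join(stream, "id"):
        print(pair)
```

Because output depends only on which tuples have arrived so far, initial delay, slow delivery or bursty arrival on one input postpones only the results that need that input's tuples.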

332 citations

Patent
21 Nov 1997
TL;DR: A system and method for managing client authorization to access remote data repositories through a middle-tier server such as a web server. Client access to a remote data repository is intercepted by the middle-tier server, which is searched for stored credentials permitting that access; if found, the stored credentials are used to authenticate access without further interaction with the client system.
Abstract: A system and method for managing client authorization to access remote data repositories through a middle tier server such as a web server. Client remote data repository access is intercepted by the middle tier server and the server is searched for stored credentials permitting client access to the remote data repository. If found, the stored credentials are used to authenticate access without further interaction with the client system. If no stored credentials are found, the server requests credentials from the client and passes them to the remote data repository for validation. Validated credentials are stored by the server for future use and indexed by a client identifier. Permitted remote data repository access is stored with the validated credentials. Access to a mounted remote file system is not permitted without authorization even if the remote file system would not otherwise require authorization.
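The credential flow described in the patent abstract above can be sketched as follows. This is a simplified illustration under stated assumptions, not the patented implementation: the class name `MiddleTierServer`, the two callbacks, and the use of a (client identifier, repository) pair as the store key are all hypothetical choices, and real credentials would need secure storage.

```python
from typing import Callable, Dict, Tuple

Credentials = Tuple[str, str]  # hypothetical (user, secret) pair


class MiddleTierServer:
    """Intercepts repository access and reuses stored, validated credentials."""

    def __init__(
        self,
        validate: Callable[[str, Credentials], bool],
        prompt_client: Callable[[str, str], Credentials],
    ) -> None:
        self._validate = validate      # asks the remote repository to validate
        self._prompt = prompt_client   # asks the client for credentials
        # Validated credentials, indexed by client identifier together with
        # the repository they permit access to.
        self._store: Dict[Tuple[str, str], Credentials] = {}

    def access(self, client_id: str, repository: str) -> bool:
        key = (client_id, repository)
        stored = self._store.get(key)
        if stored is not None:
            # Stored credentials found: authenticate without involving the client.
            return self._validate(repository, stored)
        # No stored credentials: request them from the client and pass them
        # to the remote repository for validation.
        supplied = self._prompt(client_id, repository)
        if self._validate(repository, supplied):
            self._store[key] = supplied  # keep only validated credentials
            return True
        return False
```

In this sketch a client is prompted at most once per repository as long as the first attempt validates; failed attempts store nothing, so the next request prompts again.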

330 citations

Journal Article
TL;DR: CAMERA collects and links metadata relevant to environmental metagenome data sets with annotation in a semantically-aware environment allowing users to write expressive semantic queries against the database.
Abstract: The Community Cyberinfrastructure for Advanced Microbial Ecology Research and Analysis (CAMERA, http://camera.calit2.net/) is a database and associated computational infrastructure that provides a single system for depositing, locating, analyzing, visualizing and sharing data about microbial biology through an advanced web-based analysis portal. CAMERA collects and links metadata relevant to environmental metagenome data sets with annotation in a semantically-aware environment allowing users to write expressive semantic queries against the database. To meet the needs of the research community, users are able to query metadata categories such as habitat, sample type, time, location and other environmental physicochemical parameters. CAMERA is compliant with the standards promulgated by the Genomic Standards Consortium (GSC), and sustains a role within the GSC in extending standards for content and format of the metagenomic data and metadata and its submission to the CAMERA repository. To ensure wide, ready access to data and annotation, CAMERA also provides data submission tools to allow researchers to share and forward data to other metagenomics sites and community data archives such as GenBank. It has multiple interfaces for easy submission of large or complex data sets, and supports pre-registration of samples for sequencing. CAMERA integrates a growing list of tools and viewers for querying, analyzing, annotating and comparing metagenome and genome data.

329 citations

Proceedings Article
07 Mar 2004
TL;DR: A hybrid approach (HybridCache) is proposed, which can further improve the performance by taking advantage of CacheData and CachePath while avoiding their weaknesses, and can significantly reduce the query delay and message complexity when compared to other caching schemes.
Abstract: Most research in ad hoc networks focuses on routing, and not much work has been done on data access. A common technique used to improve the performance of data access is caching. Cooperative caching, which allows the sharing and coordination of cached data among multiple nodes, can further exploit the potential of caching techniques. Due to the mobility and resource constraints of ad hoc networks, cooperative caching techniques designed for wired networks may not be applicable to ad hoc networks. In this paper, we design and evaluate cooperative caching techniques to efficiently support data access in ad hoc networks. We first propose two schemes: CacheData, which caches the data, and CachePath, which caches the data path. After analyzing the performance of those two schemes, we propose a hybrid approach (HybridCache) which can further improve the performance by taking advantage of CacheData and CachePath while avoiding their weaknesses. Simulation results show that the proposed schemes can significantly reduce the query delay and message complexity when compared to other caching schemes.

327 citations


Network Information
Related Topics (5)
Software: 130.5K papers, 2M citations, 86% related
Cloud computing: 156.4K papers, 1.9M citations, 86% related
Cluster analysis: 146.5K papers, 2.9M citations, 85% related
The Internet: 213.2K papers, 3.8M citations, 85% related
Information system: 107.5K papers, 1.8M citations, 83% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    51
2022    125
2021    403
2020    721
2019    906
2018    816