Topic

Data management

About: Data management is a research topic. Over its lifetime, 31,574 publications have been published within this topic, receiving 424,326 citations.


Papers
Proceedings Article
01 Jan 2003
TL;DR: This paper introduces a scalable P2P framework for distributed data management applications using mutant query plans: XML serializations of algebraic query plan graphs that can include verbatim XML data, references to resource locations (URLs), and more.
Abstract: Peer-to-peer (P2P) architectures are commonly used for file-sharing applications. The reasons for P2P’s popularity in file sharing (fault tolerance, scalability, and ease of deployment) also make it a good model for distributed data management. In this paper, we introduce a scalable P2P framework for distributed data management applications using mutant query plans: XML serializations of algebraic query plan graphs that can include verbatim XML data, references to resource locations (URLs), and

88 citations

Patent
16 Jul 2003
TL;DR: An Associative Data Management and Knowledge Operating System uses a Data Instance centric architecture in which Data Instances are typically atomic and each Data Instance can be at the center with all its associations.
Abstract: Associative Data Management and Knowledge Operating System using a Data Instance centric architecture, where Data Instances are typically atomic. Each Data Instance can be at the center with all its associations. The base structures encapsulate the Data Instances and can generally be identical in form and function, and application independent. Encapsulated references can include references to all other directly related, independently encapsulated Data Instances. The encapsulated references can be both unique identifiers for each and every associated Data Instance and also logical indexes that encode the abstracted location of each Data Instance, making it possible to both identify and locate any Data Instance using the same reference key.

88 citations

Proceedings ArticleDOI
07 Apr 1997
TL;DR: The goal is to design cooperative strategies between server and client to provide access to information in such a way as to minimize energy expenditure by clients.
Abstract: Mobile computing has the potential for managing information globally. Data management issues in mobile computing have received some attention in recent times, and the design of adaptive broadcast protocols has been posed as an important problem. Such protocols are employed by database servers to decide on the content of broadcasts dynamically, in response to client mobility and demand patterns. In this paper we design such protocols and also propose efficient retrieval strategies that may be employed by clients to download information from broadcasts. The goal is to design cooperative strategies between server and client to provide access to information in such a way as to minimize energy expenditure by clients. We evaluate the performance of our protocols analytically.

88 citations

Proceedings ArticleDOI
24 Jun 2008
TL;DR: Performance results from both micro-benchmarks and a large-scale astronomy application demonstrate that the proposed data diffusion approach improves performance relative to alternative approaches and provides improved scalability, as aggregated I/O bandwidth scales linearly with the number of data cache nodes.
Abstract: Data-intensive applications often require exploratory analysis of large datasets. If analysis is performed on distributed resources, data locality can be crucial to high throughput and performance. We propose a "data diffusion" approach that acquires compute and storage resources dynamically, replicates data in response to demand, and schedules computations close to data. As demand increases, more resources are acquired, thus allowing faster response to subsequent requests that refer to the same data; when demand drops, resources are released. This approach can provide the benefits of dedicated hardware without the associated high costs, depending on workload and resource characteristics. The approach is reminiscent of cooperative caching, web-caching, and peer-to-peer storage systems, but addresses different application demands. Other data-aware scheduling approaches assume dedicated resources, which can be expensive and/or inefficient if load varies significantly. To explore the feasibility of the data diffusion approach, we have extended the Falkon resource provisioning and task scheduling system to support data caching and data-aware scheduling. Performance results from both micro-benchmarks and a large-scale astronomy application demonstrate that our approach improves performance relative to alternative approaches and provides improved scalability, as aggregated I/O bandwidth scales linearly with the number of data cache nodes.

88 citations
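
The scheduling idea in the abstract above (run computations close to data, and replicate data when demand rises) can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the Falkon implementation: the class `DataAwareScheduler`, the `replicate_threshold` parameter, and the load-difference rule are all hypothetical, and dynamic acquisition and release of nodes is not modeled.

```python
class DataAwareScheduler:
    """Toy data-aware scheduler (hypothetical; not the Falkon API)."""

    def __init__(self, nodes, replicate_threshold=2):
        self.cache = {n: set() for n in nodes}  # node -> cached dataset ids
        self.load = {n: 0 for n in nodes}       # node -> running task count
        # Assumed rule: replicate when the best cache holder is this many
        # tasks busier than the least-loaded node overall.
        self.replicate_threshold = replicate_threshold

    def _least_loaded(self, nodes):
        # Ties are broken by node name so the choice is deterministic.
        return min(nodes, key=lambda n: (self.load[n], n))

    def schedule(self, dataset_id):
        """Pick a node for a task; return (node, cache_hit)."""
        holders = [n for n in self.cache if dataset_id in self.cache[n]]
        idle = self._least_loaded(self.cache)
        if holders:
            node = self._least_loaded(holders)
            # Demand on the holders is high: place the task elsewhere,
            # which copies the dataset to that node's cache.
            if self.load[node] - self.load[idle] >= self.replicate_threshold:
                node = idle
        else:
            node = idle
        hit = dataset_id in self.cache[node]
        self.cache[node].add(dataset_id)
        self.load[node] += 1
        return node, hit

    def finish(self, node):
        """Mark one task on `node` as completed."""
        self.load[node] -= 1


scheduler = DataAwareScheduler(["a", "b"], replicate_threshold=2)
print(scheduler.schedule("x"))  # ('a', False): first access, data fetched
print(scheduler.schedule("x"))  # ('a', True): scheduled next to cached data
print(scheduler.schedule("x"))  # ('b', False): 'a' is busy, data replicated
print(scheduler.schedule("x"))  # ('b', True): the new replica serves the task
```

In this toy model the number of nodes holding a dataset grows only when the existing holders are busy, which mirrors the abstract's description of replication "in response to demand".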

Proceedings Article
01 Sep 2014
TL;DR: The challenges and architectural considerations required to achieve this functionality, including the abilities to span online and offline systems, to adaptively adjust model materialization strategies, and to exploit inherent statistical properties such as model error tolerance, all while operating at "Big Data" scale are described.
Abstract: To enable complex data-intensive applications such as personalized recommendations, targeted advertising, and intelligent services, the data management community has focused heavily on the design of systems to train complex models on large datasets. Unfortunately, the design of these systems largely ignores a critical component of the overall analytics process: the serving and management of models at scale. In this work, we present Velox, a new component of the Berkeley Data Analytics Stack. Velox is a data management system for facilitating the next steps in real-world, large-scale analytics pipelines: online model management, maintenance, and serving. Velox provides end-user applications and services with a low-latency, intuitive interface to models, transforming the raw statistical models currently trained using existing offline large-scale compute frameworks into full-blown, end-to-end data products capable of targeting advertisements, recommending products, and personalizing web content. To provide up-to-date results for these complex models, Velox also facilitates lightweight online model maintenance and selection (i.e., dynamic weighting). In this paper, we describe the challenges and architectural considerations required to achieve this functionality, including the abilities to span online and offline systems, to adaptively adjust model materialization strategies, and to exploit inherent statistical properties such as model error tolerance, all while operating at “Big Data” scale.

88 citations


Network Information
Related Topics (5)
Information system: 107.5K papers, 1.8M citations, 90% related
Software: 130.5K papers, 2M citations, 88% related
Cluster analysis: 146.5K papers, 2.9M citations, 83% related
The Internet: 213.2K papers, 3.8M citations, 82% related
Cloud computing: 156.4K papers, 1.9M citations, 81% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    218
2022    485
2021    959
2020    1,435
2019    1,745
2018    1,719