Topic
Data management
About: Data management is a research topic. Over its lifetime, 31,574 publications have been published within this topic, receiving 424,326 citations.
Papers
01 Jan 2003
TL;DR: This paper introduces a scalable P2P framework for distributed data management applications using mutant query plans: XML serializations of algebraic query plan graphs that can include verbatim XML data, references to resource locations (URLs), and more.
Abstract: Peer-to-peer (P2P) architectures are commonly used for file-sharing applications. The reasons for P2P’s popularity in file sharing (fault tolerance, scalability, and ease of deployment) also make it a good model for distributed data management. In this paper, we introduce a scalable P2P framework for distributed data management applications using mutant query plans: XML serializations of algebraic query plan graphs that can include verbatim XML data, references to resource locations (URLs), and
88 citations
16 Jul 2003
TL;DR: An Associative Data Management and Knowledge Operating System built on a Data Instance centric architecture, in which Data Instances are typically atomic and each Data Instance can sit at the center of all its associations.
Abstract: Associative Data Management and Knowledge Operating System using a Data Instance centric architecture, where Data Instances are typically atomic. Each Data Instance can be at the center with all its associations. The base structures encapsulate the Data Instances and can generally be identical in form and function, and application independent. Encapsulated references can include references to all other directly related, independently encapsulated Data Instances. The encapsulated references can be both unique identifiers for each and every associated Data Instance and also logical indexes that encode the abstracted location of each Data Instance, making it possible to both identify and locate any Data Instance using the same reference key.
88 citations
07 Apr 1997
TL;DR: The goal is to design cooperative strategies between server and client to provide access to information in such a way as to minimize energy expenditure by clients.
Abstract: Mobile computing has the potential for managing information globally. Data management issues in mobile computing have received some attention in recent times, and the design of adaptive broadcast protocols has been posed as an important problem. Such protocols are employed by database servers to decide on the content of broadcasts dynamically, in response to client mobility and demand patterns. In this paper we design such protocols and also propose efficient retrieval strategies that may be employed by clients to download information from broadcasts. The goal is to design cooperative strategies between server and client to provide access to information in such a way as to minimize energy expenditure by clients. We evaluate the performance of our protocols analytically.
88 citations
24 Jun 2008
TL;DR: Performance results from both micro-benchmarks and a large scale astronomy application demonstrate that the proposed data diffusion approach improves performance relative to alternative approaches, as well as provides improved scalability as aggregated I/O bandwidth scales linearly with the number of data cache nodes.
Abstract: Data-intensive applications often require exploratory analysis of large datasets. If analysis is performed on distributed resources, data locality can be crucial to high throughput and performance. We propose a "data diffusion" approach that acquires compute and storage resources dynamically, replicates data in response to demand, and schedules computations close to data. As demand increases, more resources are acquired, thus allowing faster response to subsequent requests that refer to the same data; when demand drops, resources are released. This approach can provide the benefits of dedicated hardware without the associated high costs, depending on workload and resource characteristics. The approach is reminiscent of cooperative caching, web-caching, and peer-to-peer storage systems, but addresses different application demands. Other data-aware scheduling approaches assume dedicated resources, which can be expensive and/or inefficient if load varies significantly. To explore the feasibility of the data diffusion approach, we have extended the Falkon resource provisioning and task scheduling system to support data caching and data-aware scheduling. Performance results from both micro-benchmarks and a large scale astronomy application demonstrate that our approach improves performance relative to alternative approaches, as well as provides improved scalability as aggregated I/O bandwidth scales linearly with the number of data cache nodes.
88 citations
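As a rough illustration of the data-aware scheduling idea in the abstract above (send a task to a node that already caches its data, and replicate the data to another node when the holders are busy), here is a minimal sketch. It is not the Falkon implementation; the class name `DiffusionScheduler`, the `busy_limit` parameter, and the load-counting policy are all assumptions made for illustration.

```python
from collections import defaultdict


class DiffusionScheduler:
    """Toy data-aware scheduler (hypothetical, not Falkon's actual design)."""

    def __init__(self, nodes, busy_limit=2):
        self.load = {n: 0 for n in nodes}  # running tasks per node
        self.cache = defaultdict(set)      # data item -> nodes holding a copy
        self.busy_limit = busy_limit       # assumed per-node capacity

    def schedule(self, item):
        # Prefer nodes that already cache the item and still have capacity.
        free_holders = [n for n in self.cache[item]
                        if self.load[n] < self.busy_limit]
        # If every holder is busy (or none exists), consider all nodes.
        candidates = free_holders or list(self.load)
        # Pick the least-loaded candidate; ties are broken by node name.
        node = min(candidates, key=lambda n: (self.load[n], n))
        hit = node in self.cache[item]     # was the data already local?
        self.load[node] += 1
        self.cache[item].add(node)         # on a miss, the node now caches it
        return node, hit

    def finish(self, node):
        # Called when a task completes, freeing capacity on the node.
        self.load[node] -= 1
```

In this sketch, repeated requests for the same item stay on one node until it reaches `busy_limit`, at which point the item is replicated to the least-loaded node. That loosely mirrors the "replicates data in response to demand" behavior described in the abstract; releasing resources when demand drops is not modeled.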
01 Sep 2014
TL;DR: The challenges and architectural considerations required to achieve this functionality, including the abilities to span online and offline systems, to adaptively adjust model materialization strategies, and to exploit inherent statistical properties such as model error tolerance, all while operating at "Big Data" scale are described.
Abstract: To enable complex data-intensive applications such as personalized recommendations, targeted advertising, and intelligent services, the data management community has focused heavily on the design of systems to train complex models on large datasets. Unfortunately, the design of these systems largely ignores a critical component of the overall analytics process: the serving and management of models at scale. In this work, we present Velox, a new component of the Berkeley Data Analytics Stack. Velox is a data management system for facilitating the next steps in real-world, large-scale analytics pipelines: online model management, maintenance, and serving. Velox provides end-user applications and services with a low-latency, intuitive interface to models, transforming the raw statistical models currently trained using existing offline large-scale compute frameworks into full-blown, end-to-end data products capable of targeting advertisements, recommending products, and personalizing web content. To provide up-to-date results for these complex models, Velox also facilitates lightweight online model maintenance and selection (i.e., dynamic weighting). In this paper, we describe the challenges and architectural considerations required to achieve this functionality, including the abilities to span online and offline systems, to adaptively adjust model materialization strategies, and to exploit inherent statistical properties such as model error tolerance, all while operating at “Big Data” scale.
88 citations