scispace - formally typeset
Search or ask a question
Topic

Data management

About: Data management is a research topic. Over the lifetime, 31574 publications have been published within this topic receiving 424326 citations.


Papers
More filters
Proceedings ArticleDOI
13 Nov 2010
TL;DR: This paper ran experiments using three typical workflow applications on Amazon's EC2, and investigated some of the ways in which data can be managed for workflows in the cloud.
Abstract: Efficient data management is a key component in achieving good performance for scientific workflows in distributed environments. Workflow applications typically communicate data between tasks using files. When tasks are distributed, these files are either transferred from one computational node to another, or accessed through a shared storage system. In grids and clusters, workflow data is often stored on network and parallel file systems. In this paper we investigate some of the ways in which data can be managed for workflows in the cloud. We ran experiments using three typical workflow applications on Amazon's EC2. We discuss the various storage and file systems we used, describe the issues and problems we encountered deploying them on EC2, and analyze the resulting performance and cost of the workflows.

125 citations

Journal ArticleDOI
01 Aug 2019
TL;DR: This tutorial considers how data lakes are introducing new problems including dataset discovery and how they are changing the requirements for classic problems including data extraction, data cleaning, data integration, data versioning, and metadata management.
Abstract: The ubiquity of data lakes has created fascinating new challenges for data management research. In this tutorial, we review the state-of-the-art in data management for data lakes. We consider how data lakes are introducing new problems including dataset discovery and how they are changing the requirements for classic problems including data extraction, data cleaning, data integration, data versioning, and metadata management.

125 citations

Journal ArticleDOI
TL;DR: The standard files developed, and which were used in a software package termed DSSAT V3, have recently been upgraded by a consortium of experimenters and modellers (the International Consortium for Agricultural Systems Applications; ICASA), which constitute an advance in the potential for good documentation and storage of agronomic data.

125 citations

Patent
11 Oct 2010
TL;DR: In this paper, a host driver embedded in an application server connects an application and its data to a cluster and captures real-time data transactions, preferably in the form of an event journal that is provided to the data management system.
Abstract: A “forward” delta data management technique uses a “sparse” index associated with a delta file to achieve both delta management efficiency and to eliminate read latency while accessing history data. The invention may be implemented advantageously in a data management system that provides real-time data services to data sources associated with a set of application host servers. A host driver embedded in an application server connects an application and its data to a cluster. The host driver captures real-time data transactions, preferably in the form of an event journal that is provided to the data management system. In particular, the driver functions to translate traditional file/database/block I/O into a continuous, application-aware, output data stream. A given application-aware data stream is processed through a multi-stage data reduction process to produce a compact data representation from which an “any point-in-time” reconstruction of the original data can be made.

125 citations


Network Information
Related Topics (5)
Information system
107.5K papers, 1.8M citations
90% related
Software
130.5K papers, 2M citations
88% related
Cluster analysis
146.5K papers, 2.9M citations
83% related
The Internet
213.2K papers, 3.8M citations
82% related
Cloud computing
156.4K papers, 1.9M citations
81% related
Performance
Metrics
No. of papers in the topic in previous years
YearPapers
2023218
2022485
2021959
20201,435
20191,745
20181,719