Topic

Data management

About: Data management is a research topic. Over its lifetime, 31,574 publications have been published within this topic, receiving 424,326 citations.


Papers
Journal ArticleDOI
01 Aug 2019
TL;DR: This tutorial considers how data lakes are introducing new problems including dataset discovery and how they are changing the requirements for classic problems including data extraction, data cleaning, data integration, data versioning, and metadata management.
Abstract: The ubiquity of data lakes has created fascinating new challenges for data management research. In this tutorial, we review the state-of-the-art in data management for data lakes. We consider how data lakes are introducing new problems including dataset discovery and how they are changing the requirements for classic problems including data extraction, data cleaning, data integration, data versioning, and metadata management.

125 citations

Journal ArticleDOI
TL;DR: The standard files that were developed and used in a software package termed DSSAT V3 have recently been upgraded by a consortium of experimenters and modellers (the International Consortium for Agricultural Systems Applications; ICASA), and the upgraded files constitute an advance in the potential for good documentation and storage of agronomic data.

125 citations

Patent
11 Oct 2010
TL;DR: A host driver embedded in an application server connects an application and its data to a cluster and captures real-time data transactions, preferably in the form of an event journal that is provided to the data management system.
Abstract: A “forward” delta data management technique uses a “sparse” index associated with a delta file to achieve both delta management efficiency and to eliminate read latency while accessing history data. The invention may be implemented advantageously in a data management system that provides real-time data services to data sources associated with a set of application host servers. A host driver embedded in an application server connects an application and its data to a cluster. The host driver captures real-time data transactions, preferably in the form of an event journal that is provided to the data management system. In particular, the driver functions to translate traditional file/database/block I/O into a continuous, application-aware, output data stream. A given application-aware data stream is processed through a multi-stage data reduction process to produce a compact data representation from which an “any point-in-time” reconstruction of the original data can be made.

125 citations

Journal ArticleDOI
TL;DR: The results of a survey conducted by the working groups of the DataONE project are used to present a new understanding of challenges to the development of global data collections and preservation by systematically examining the determinants of the researchers' likelihood to openly publish research data.

125 citations


Network Information
Related Topics (5)
Information system: 107.5K papers, 1.8M citations, 90% related
Software: 130.5K papers, 2M citations, 88% related
Cluster analysis: 146.5K papers, 2.9M citations, 83% related
The Internet: 213.2K papers, 3.8M citations, 82% related
Cloud computing: 156.4K papers, 1.9M citations, 81% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    231
2022    507
2021    981
2020    1,475
2019    1,799
2018    1,756