Topic

Data management

About: Data management is a research topic. Over the lifetime, 31574 publications have been published within this topic receiving 424326 citations.


Papers
BookDOI
TL;DR: This proceedings volume on scientific data management collects papers on the end-user experience, indexing and physical design, application experience, workflow, query processing, similarity search, mining, and spatial data.
Abstract:
Invited Presentation: The Scientific Data Management Center: Providing Technologies for Large Scale Scientific Exploration.
Improving the End-User Experience: Query Recommendations for Interactive Database Exploration; Scientific Mashups: Runtime-Configurable Data Product Ensembles; View Discovery in OLAP Databases through Statistical Combinatorial Optimization; Designing a Geo-scientific Request Language - A Database Approach; SEEDEEP: A System for Exploring and Querying Scientific Deep Web Data Sources; Expressing OLAP Preferences.
Indexing, Physical Design, and Energy: Energy Smart Management of Scientific Data; Data Parallel Bin-Based Indexing for Answering Queries on Multi-core Architectures; Finding Regions of Interest in Large Scientific Datasets; Adaptive Physical Design for Curated Archives; MLR-Index: An Index Structure for Fast and Scalable Similarity Search in High Dimensions.
Application Experience: B-Fabric: An Open Source Life Sciences Data Management System; Design and Implementation of Metadata System in PetaShare; Covariant Evolutionary Event Analysis for Base Interaction Prediction Using a Relational Database Management System for RNA.
Invited Presentation: What Makes Scientific Workflows Scientific?
Workflow: Enabling Ad Hoc Queries over Low-Level Scientific Data Sets; Exploring Scientific Workflow Provenance Using Hybrid Queries over Nested Data and Lineage Graphs; Data Integration with the DaltOn Framework - A Case Study; Experiment Line: Software Reuse in Scientific Workflows; Tracking Files in the Kepler Provenance Framework; BioBrowsing: Making the Most of the Data Available in Entrez; Using Workflow Medleys to Streamline Exploratory Tasks.
Query Processing: Experiences on Processing Spatial Data with MapReduce; Optimization and Execution of Complex Scientific Queries over Uncorrelated Experimental Data; Comprehensive Optimization of Declarative Sensor Network Queries; Efficient Evaluation of Generalized Tree-Pattern Queries with Same-Path Constraints; Mode Aware Stream Query Processing; Evaluating Reachability Queries over Path Collections.
Similarity Search: Easing the Dimensionality Curse by Stretching Metric Spaces; Probabilistic Similarity Search for Uncertain Time Series; Reverse k-Nearest Neighbor Search Based on Aggregate Point Access Methods; Finding Structural Similarity in Time Series Data Using Bag-of-Patterns Representation.
Keynote Address: Cloud Computing for Science.
Mining: Classification with Unknown Classes; HSM: Heterogeneous Subspace Mining in High Dimensional Data; Split-Order Distance for Clustering and Classification Hierarchies; Combining Multiple Interrelated Streams for Incremental Clustering; Improving Relation Extraction by Exploiting Properties of the Target Relation; Cor-Split: Defending Privacy in Data Re-publication from Historical Correlations and Compromised Tuples; A Bipartite Graph Framework for Summarizing High-Dimensional Binary, Categorical and Numeric Data.
Spatial Data: Region Extraction and Verification for Spatial and Spatio-temporal Databases; Identifying the Most Endangered Objects from Spatial Datasets; Constraint-Based Learning of Distance Functions for Object Trajectories.

210 citations

Posted Content
TL;DR: In this paper, a framework for managing and sharing electronic medical records (EMRs) for cancer patient care is proposed, which can significantly reduce the turnaround time for EMR sharing, improve decision making for medical care, and reduce the overall cost.
Abstract: Electronic medical records (EMRs) are critical, highly sensitive private information in healthcare, and need to be frequently shared among peers. Blockchain provides a shared, immutable and transparent history of all the transactions to build applications with trust, accountability and transparency. This provides a unique opportunity to develop a secure and trustable EMR data management and sharing system using blockchain. In this paper, we present our perspectives on blockchain-based healthcare data management, in particular, for EMR data sharing between healthcare providers and for research studies. We propose a framework for managing and sharing EMR data for cancer patient care. In collaboration with Stony Brook University Hospital, we implemented our framework in a prototype that ensures privacy, security, availability, and fine-grained access control over EMR data. The proposed work can significantly reduce the turnaround time for EMR sharing, improve decision making for medical care, and reduce the overall cost.

208 citations

Proceedings ArticleDOI
06 Jun 2010
TL;DR: Ricardo is part of the eXtreme Analytics Platform (XAP) project at the IBM Almaden Research Center, and rests on a decomposition of data-analysis algorithms into parts executed by the R statistical analysis system and parts handled by the Hadoop data management system.
Abstract: Many modern enterprises are collecting data at the most detailed level possible, creating data repositories ranging from terabytes to petabytes in size. The ability to apply sophisticated statistical analysis methods to this data is becoming essential for marketplace competitiveness. This need to perform deep analysis over huge data repositories poses a significant challenge to existing statistical software and data management systems. On the one hand, statistical software provides rich functionality for data analysis and modeling, but can handle only limited amounts of data; e.g., popular packages like R and SPSS operate entirely in main memory. On the other hand, data management systems - such as MapReduce-based systems - can scale to petabytes of data, but provide insufficient analytical functionality. We report our experiences in building Ricardo, a scalable platform for deep analytics. Ricardo is part of the eXtreme Analytics Platform (XAP) project at the IBM Almaden Research Center, and rests on a decomposition of data-analysis algorithms into parts executed by the R statistical analysis system and parts handled by the Hadoop data management system. This decomposition attempts to minimize the transfer of data across system boundaries. Ricardo contrasts with previous approaches, which try to get along with only one type of system, and allows analysts to work on huge datasets from within a popular, well supported, and powerful analysis environment. Because our approach avoids the need to re-implement either statistical or data-management functionality, it can be used to solve complex problems right now.

207 citations
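
The decomposition described in the Ricardo abstract can be illustrated with a minimal sketch. This is not Ricardo's API and uses neither R nor Hadoop. It assumes a simple least-squares line fit as the analysis task, and the names `map_partition`, `combine`, and `fit_line` are hypothetical. The large-data side reduces each partition to five sums, so only a small aggregate crosses the boundary to the statistical side.

```python
from functools import reduce


def map_partition(rows):
    """Data-management side: summarise one partition of (x, y) rows as five sums."""
    n = sx = sy = sxx = sxy = 0.0
    for x, y in rows:
        n += 1
        sx += x
        sy += y
        sxx += x * x
        sxy += x * y
    return (n, sx, sy, sxx, sxy)


def combine(a, b):
    """Reduce step: add two partition summaries element-wise."""
    return tuple(u + v for u, v in zip(a, b))


def fit_line(stats):
    """Statistical side: solve y = intercept + slope * x from the small aggregate."""
    n, sx, sy, sxx, sxy = stats
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return intercept, slope


# Hypothetical data split across two partitions; every point lies on y = 2x + 1.
partitions = [
    [(0.0, 1.0), (1.0, 3.0)],
    [(2.0, 5.0), (3.0, 7.0)],
]
stats = reduce(combine, map(map_partition, partitions))
intercept, slope = fit_line(stats)
print(intercept, slope)
```

Only the five-number summary is passed to `fit_line`, which mirrors the abstract's stated goal of minimizing data transfer across system boundaries.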

01 Jan 2005
TL;DR: The main aspect of the taxonomy categorizes provenance systems based on why they record provenance, what they describe, how they represent and store provenance, and ways to disseminate it. The synthesis can help those building scientific and business metadata-management systems to understand existing provenance system designs.
Abstract: Data management is growing in complexity as large-scale applications take advantage of the loosely coupled resources brought together by grid middleware and by abundant storage capacity. Metadata describing the data products used in and generated by these applications is essential to disambiguate the data and enable reuse. Data provenance, one kind of metadata, pertains to the derivation history of a data product starting from its original sources. The provenance of data products generated by complex transformations such as workflows is of considerable value to scientists. From it, one can ascertain the quality of the data based on its ancestral data and derivations, track back sources of errors, allow automated re-enactment of derivations to update the data, and provide attribution of data sources. Provenance is also essential to the business domain where it can be used to drill down to the source of data in a data warehouse, track the creation of intellectual property, and provide an audit trail for regulatory purposes. In this paper we create a taxonomy of data provenance techniques, and apply the classification to current research efforts in the field. The main aspect of our taxonomy categorizes provenance systems based on why they record provenance, what they describe, how they represent and store provenance, and ways to disseminate it. Our synthesis can help those building scientific and business metadata-management systems to understand existing provenance system designs. The survey culminates with an identification of open research problems in the field.

206 citations
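
The notion of provenance as a derivation history, as defined in the abstract above, can be sketched as a walk over a derivation graph. This is an illustrative assumption, not a design taken from the survey. The `derived_from` mapping, the file names, and the `lineage` helper are all hypothetical.

```python
def lineage(product, derived_from):
    """Return every upstream product of `product`, nearest first (breadth-first)."""
    seen = set()
    order = []
    queue = list(derived_from.get(product, ()))
    while queue:
        item = queue.pop(0)
        if item in seen:
            continue
        seen.add(item)
        order.append(item)
        # Follow the item's own inputs, if it was itself derived.
        queue.extend(derived_from.get(item, ()))
    return order


# Hypothetical derivation records: each product maps to the inputs it was built from.
derived_from = {
    "plot.png": ["summary.csv"],
    "summary.csv": ["clean.csv"],
    "clean.csv": ["raw_a.csv", "raw_b.csv"],
}

ancestors = lineage("plot.png", derived_from)
# Original sources are the ancestors that have no recorded derivation of their own.
sources = [p for p in ancestors if p not in derived_from]
print(ancestors)
print(sources)
```

Tracing back from a product to its sources in this way is the basis for the uses the abstract lists, such as locating the origin of an error or attributing data sources.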

Journal ArticleDOI
TL;DR: The vision of a worldwide sensor Web is close to becoming a reality with the rapidly increasing number of large-scale sensor network deployments.
Abstract: Harvesting the benefits of a sensor-rich world presents many data management challenges. Recent advances in research and industry aim to address these challenges. With the rapidly increasing number of large-scale sensor network deployments, the vision of a worldwide sensor Web is close to becoming a reality.

205 citations


Network Information
Related Topics (5)
Information system: 107.5K papers, 1.8M citations, 90% related
Software: 130.5K papers, 2M citations, 88% related
Cluster analysis: 146.5K papers, 2.9M citations, 83% related
The Internet: 213.2K papers, 3.8M citations, 82% related
Cloud computing: 156.4K papers, 1.9M citations, 81% related
Performance Metrics
No. of papers in the topic in previous years
Year    Papers
2023    218
2022    485
2021    959
2020    1,435
2019    1,745
2018    1,719