Topic

Data management

About: Data management is a research topic. Over its lifetime, 31,574 publications have been published within this topic, receiving 424,326 citations.

Papers

Book•DOI•

Scientific and Statistical Database Management

Marianne Winslett

01 Jan 2009-Lecture Notes in Computer Science

TL;DR: In this article, the authors present a system for exploring and querying scientific Deep Web data sources based on MapReduce for the purpose of finding regions of interest in large scientific data sets.

Abstract: Invited Presentation.- The Scientific Data Management Center: Providing Technologies for Large Scale Scientific Exploration.- Improving the End-User Experience.- Query Recommendations for Interactive Database Exploration.- Scientific Mashups: Runtime-Configurable Data Product Ensembles.- View Discovery in OLAP Databases through Statistical Combinatorial Optimization.- Designing a Geo-scientific Request Language - A Database Approach.- SEEDEEP: A System for Exploring and Querying Scientific Deep Web Data Sources.- Expressing OLAP Preferences.- Indexing, Physical Design, and Energy.- Energy Smart Management of Scientific Data.- Data Parallel Bin-Based Indexing for Answering Queries on Multi-core Architectures.- Finding Regions of Interest in Large Scientific Datasets.- Adaptive Physical Design for Curated Archives.- MLR-Index: An Index Structure for Fast and Scalable Similarity Search in High Dimensions.- Application Experience.- B-Fabric: An Open Source Life Sciences Data Management System.- Design and Implementation of Metadata System in PetaShare.- Covariant Evolutionary Event Analysis for Base Interaction Prediction Using a Relational Database Management System for RNA.- Invited Presentation.- What Makes Scientific Workflows Scientific?.- Workflow.- Enabling Ad Hoc Queries over Low-Level Scientific Data Sets.- Exploring Scientific Workflow Provenance Using Hybrid Queries over Nested Data and Lineage Graphs.- Data Integration with the DaltOn Framework - A Case Study.- Experiment Line: Software Reuse in Scientific Workflows.- Tracking Files in the Kepler Provenance Framework.- BioBrowsing: Making the Most of the Data Available in Entrez.- Using Workflow Medleys to Streamline Exploratory Tasks.- Query Processing.- Experiences on Processing Spatial Data with MapReduce.- Optimization and Execution of Complex Scientific Queries over Uncorrelated Experimental Data.- Comprehensive Optimization of Declarative Sensor Network Queries.- Efficient Evaluation of Generalized Tree-Pattern Queries with Same-Path Constraints.- Mode Aware Stream Query Processing.- Evaluating Reachability Queries over Path Collections.- Similarity Search.- Easing the Dimensionality Curse by Stretching Metric Spaces.- Probabilistic Similarity Search for Uncertain Time Series.- Reverse k-Nearest Neighbor Search Based on Aggregate Point Access Methods.- Finding Structural Similarity in Time Series Data Using Bag-of-Patterns Representation.- Keynote Address.- Cloud Computing for Science.- Mining.- Classification with Unknown Classes.- HSM: Heterogeneous Subspace Mining in High Dimensional Data.- Split-Order Distance for Clustering and Classification Hierarchies.- Combining Multiple Interrelated Streams for Incremental Clustering.- Improving Relation Extraction by Exploiting Properties of the Target Relation.- Cor-Split: Defending Privacy in Data Re-publication from Historical Correlations and Compromised Tuples.- A Bipartite Graph Framework for Summarizing High-Dimensional Binary, Categorical and Numeric Data.- Spatial Data.- Region Extraction and Verification for Spatial and Spatio-temporal Databases.- Identifying the Most Endangered Objects from Spatial Datasets.- Constraint-Based Learning of Distance Functions for Object Trajectories.

210 citations

Posted Content•

Secure and Trustable Electronic Medical Records Sharing using Blockchain

Alevtina Dubovitskaya¹, Zhigang Xu², Samuel Ryu², Michael Schumacher¹, Fusheng Wang²

University of Applied Sciences Western Switzerland¹, Stony Brook University²

02 Aug 2017-arXiv: Computers and Society

TL;DR: In this paper, a framework for managing and sharing electronic medical records (EMRs) for cancer patient care is proposed, which can significantly reduce the turnaround time for EMR sharing, improve decision making for medical care, and reduce the overall cost.

Abstract: Electronic medical records (EMRs) are critical, highly sensitive private information in healthcare, and need to be frequently shared among peers. Blockchain provides a shared, immutable and transparent history of all the transactions to build applications with trust, accountability and transparency. This provides a unique opportunity to develop a secure and trustable EMR data management and sharing system using blockchain. In this paper, we present our perspectives on blockchain-based healthcare data management, in particular, for EMR data sharing between healthcare providers and for research studies. We propose a framework for managing and sharing EMR data for cancer patient care. In collaboration with Stony Brook University Hospital, we implemented our framework in a prototype that ensures privacy, security, availability, and fine-grained access control over EMR data. The proposed work can significantly reduce the turnaround time for EMR sharing, improve decision making for medical care, and reduce the overall cost.

208 citations

Proceedings Article•DOI•

Ricardo: integrating R and Hadoop

Sudipto Das¹, Yannis Sismanis², Kevin Scott Beyer², Rainer Gemulla², Peter J. Haas², John McPherson²

University of California, Santa Barbara¹, IBM²

06 Jun 2010

TL;DR: Ricardo is part of the eXtreme Analytics Platform (XAP) project at the IBM Almaden Research Center, and rests on a decomposition of data-analysis algorithms into parts executed by the R statistical analysis system and parts handled by the Hadoop data management system.

Abstract: Many modern enterprises are collecting data at the most detailed level possible, creating data repositories ranging from terabytes to petabytes in size. The ability to apply sophisticated statistical analysis methods to this data is becoming essential for marketplace competitiveness. This need to perform deep analysis over huge data repositories poses a significant challenge to existing statistical software and data management systems. On the one hand, statistical software provides rich functionality for data analysis and modeling, but can handle only limited amounts of data; e.g., popular packages like R and SPSS operate entirely in main memory. On the other hand, data management systems - such as MapReduce-based systems - can scale to petabytes of data, but provide insufficient analytical functionality. We report our experiences in building Ricardo, a scalable platform for deep analytics. Ricardo is part of the eXtreme Analytics Platform (XAP) project at the IBM Almaden Research Center, and rests on a decomposition of data-analysis algorithms into parts executed by the R statistical analysis system and parts handled by the Hadoop data management system. This decomposition attempts to minimize the transfer of data across system boundaries. Ricardo contrasts with previous approaches, which try to get along with only one type of system, and allows analysts to work on huge datasets from within a popular, well supported, and powerful analysis environment. Because our approach avoids the need to re-implement either statistical or data-management functionality, it can be used to solve complex problems right now.

207 citations

A survey of data provenance techniques

Yogesh Simmhan, Beth Plale, Dennis Gannon¹

Indiana University¹

01 Jan 2005

TL;DR: The main aspect of the taxonomy categorizes provenance systems based on why they record provenance, what they describe, how they represent and store provenance, and ways to disseminate it. The synthesis can help those building scientific and business metadata-management systems to understand existing provenance system designs.

Abstract: Data management is growing in complexity as large-scale applications take advantage of the loosely coupled resources brought together by grid middleware and by abundant storage capacity. Metadata describing the data products used in and generated by these applications is essential to disambiguate the data and enable reuse. Data provenance, one kind of metadata, pertains to the derivation history of a data product starting from its original sources. The provenance of data products generated by complex transformations such as workflows is of considerable value to scientists. From it, one can ascertain the quality of the data based on its ancestral data and derivations, track back sources of errors, allow automated re-enactment of derivations to update data, and provide attribution of data sources. Provenance is also essential to the business domain where it can be used to drill down to the source of data in a data warehouse, track the creation of intellectual property, and provide an audit trail for regulatory purposes. In this paper we create a taxonomy of data provenance techniques, and apply the classification to current research efforts in the field. The main aspect of our taxonomy categorizes provenance systems based on why they record provenance, what they describe, how they represent and store provenance, and ways to disseminate it. Our synthesis can help those building scientific and business metadata-management systems to understand existing provenance system designs. The survey culminates with an identification of open research problems in the field.

206 citations

Journal Article•DOI•

Data Management in the Worldwide Sensor Web

Magdalena Balazinska¹, Amol Deshpande², Michael J. Franklin³, Phillip B. Gibbons⁴, Jim Gray⁵, Suman Nath⁵, Mark Hansen⁶, M. Liebhold, Alexander S. Szalay⁷, V. Tao⁵

University of Washington¹, University of Maryland, College Park², University of California, Berkeley³, Intel⁴, Microsoft⁵, University of California, Los Angeles⁶, Johns Hopkins University⁷

01 Apr 2007-IEEE Pervasive Computing

TL;DR: The vision of a worldwide sensor Web is close to becoming a reality with the rapidly increasing number of large-scale sensor network deployments.

Abstract: Harvesting the benefits of a sensor-rich world presents many data management challenges. Recent advances in research and industry aim to address these challenges. With the rapidly increasing number of large-scale sensor network deployments, the vision of a worldwide sensor Web is close to becoming a reality.

205 citations

Performance Metrics

Papers: 32,259

Citations: 465,338

No. of papers in the topic in previous years
Year	Papers
2023	218
2022	485
2021	959
2020	1,435
2019	1,745
2018	1,719