Topic

Data management

About: Data management is a research topic. Over its lifetime, 31,574 publications have been published within this topic, receiving 424,326 citations.


Papers
Journal ArticleDOI
TL;DR: LORIS is a complete solution that has been thoroughly tested through a full 10 year life cycle of a multi-center longitudinal project and is now supporting numerous international neurodevelopment and neurodegeneration research projects.
Abstract: LORIS (Longitudinal Online Research and Imaging System) is a modular and extensible web-based data management system that integrates all aspects of a multi-center study: from heterogeneous data acquisition (imaging, clinical, behavior, genetics) to storage, processing and ultimately dissemination. It provides a secure, user-friendly, and streamlined platform to automate the flow of clinical trials and complex multi-center studies. A subject-centric internal organization allows researchers to capture and subsequently extract all information, longitudinal or cross-sectional, from any subset of the study cohort. Extensive error-checking and quality control procedures, security, data management, data querying and administrative functions provide LORIS with a triple capability: (i) continuous project coordination and monitoring of data acquisition, (ii) data storage/cleaning/querying, and (iii) interfacing with arbitrary external data processing “pipelines”. LORIS is a complete solution that has been thoroughly tested through the full life cycle of a multi-center longitudinal project and is now supporting numerous neurodevelopment and neurodegeneration research projects internationally.
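The subject-centric organization described above can be sketched with a toy relational schema (the table and column names below are illustrative assumptions, not LORIS's actual data model): every acquisition, whatever its modality, hangs off a subject and a visit, so longitudinal and cross-sectional extractions both reduce to joins on those keys.

```python
# Minimal sketch of a subject-centric study database (illustrative schema,
# not the actual LORIS data model). Every record hangs off a subject and a
# visit, so longitudinal or cross-sectional extraction is a join on those keys.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE subject (subject_id TEXT PRIMARY KEY, site TEXT, cohort TEXT);
CREATE TABLE visit   (visit_id INTEGER PRIMARY KEY, subject_id TEXT,
                      label TEXT, visit_date TEXT,
                      FOREIGN KEY (subject_id) REFERENCES subject(subject_id));
CREATE TABLE measure (visit_id INTEGER, instrument TEXT, field TEXT, value TEXT,
                      FOREIGN KEY (visit_id) REFERENCES visit(visit_id));
""")
con.executemany("INSERT INTO subject VALUES (?,?,?)",
                [("S001", "SiteA", "control"), ("S002", "SiteA", "patient")])
con.executemany("INSERT INTO visit VALUES (?,?,?,?)",
                [(1, "S001", "baseline", "2015-01-10"),
                 (2, "S001", "12mo",     "2016-01-12"),
                 (3, "S002", "baseline", "2015-02-03")])
con.executemany("INSERT INTO measure VALUES (?,?,?,?)",
                [(1, "MRI", "qc_status", "pass"),
                 (2, "MRI", "qc_status", "pass"),
                 (3, "behavioral", "score", "27")])

# Longitudinal slice: every measurement for one subject, ordered by visit.
print(con.execute("""
    SELECT v.label, m.instrument, m.field, m.value
    FROM visit v JOIN measure m USING (visit_id)
    WHERE v.subject_id = 'S001'
    ORDER BY v.visit_date
""").fetchall())

# Cross-sectional slice: one time point across the whole cohort.
print(con.execute("""
    SELECT s.subject_id, m.instrument, m.value
    FROM subject s JOIN visit v USING (subject_id)
                   JOIN measure m USING (visit_id)
    WHERE v.label = 'baseline'
""").fetchall())
```

The same pattern extends to any number of instruments: modality-specific detail lives in the measure rows, while cohort membership and visit structure stay in the subject and visit tables.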

146 citations

Proceedings ArticleDOI
18 Jun 2014
TL;DR: A principled approach is introduced to provide explanations for answers to SQL queries based on intervention: the removal of tuples from the database that significantly affect the query answers.
Abstract: As a consequence of the popularity of big data, many users with a variety of backgrounds seek to extract high level information from datasets collected from various sources and combined using data integration techniques. A major challenge for research in data management is to develop tools to assist users in explaining observed query outputs. In this paper we introduce a principled approach to provide explanations for answers to SQL queries based on intervention: removal of tuples from the database that significantly affect the query answers. We provide a formal definition of intervention in the presence of multiple relations which can interact with each other through foreign keys. First we give a set of recursive rules to compute the intervention for any given explanation in polynomial time (data complexity). Then we give simple and efficient algorithms based on SQL queries that can compute the top-K explanations by using standard database management systems under certain conditions. We evaluate the quality and performance of our approach by experiments on real datasets.
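The intervention idea can be illustrated with a deliberately brute-force toy (a sketch only; the paper's contribution is the recursive rules and SQL-based algorithms that avoid exactly this kind of exhaustive re-evaluation): remove a candidate tuple, re-run the query, and rank candidates by how much the answer moves.

```python
# Toy sketch of intervention-based explanation (illustrative only; the paper
# uses recursive rules and SQL rewritings, not this brute-force loop).
# Question: which single tuple, if removed, changes the aggregate the most?
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, region TEXT, amount REAL)")
con.executemany("INSERT INTO sales VALUES (?,?,?)",
                [(1, "EU", 10.0), (2, "EU", 12.0), (3, "EU", 95.0),
                 (4, "US", 11.0), (5, "US", 13.0)])

query = "SELECT AVG(amount) FROM sales WHERE region = 'EU'"
baseline = con.execute(query).fetchone()[0]

explanations = []
for tid, region, amount in con.execute("SELECT id, region, amount FROM sales").fetchall():
    con.execute("DELETE FROM sales WHERE id = ?", (tid,))                   # intervene
    answer = con.execute(query).fetchone()[0]
    con.execute("INSERT INTO sales VALUES (?,?,?)", (tid, region, amount))  # undo
    if answer is not None:
        explanations.append((abs(baseline - answer), tid))

# Rank candidate explanations by how much removing them shifts the answer.
for delta, tid in sorted(explanations, reverse=True)[:3]:
    print(f"removing tuple {tid} shifts the EU average by {delta:.2f}")
```

On this toy data the outlier row (id 3, amount 95.0) dominates the ranking, which is the kind of single-tuple explanation a top-K ranking is meant to surface.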

146 citations

Journal ArticleDOI
28 Sep 2010-PLOS ONE
TL;DR: The LONI Pipeline features include distributed grid-enabled infrastructure, virtualized execution environment, efficient integration, data provenance, validation and distribution of new computational tools, automated data format conversion, and an intuitive graphical user interface.
Abstract: Modern computational neuroscience employs diverse software tools and multidisciplinary expertise to analyze heterogeneous brain data. The classical problems of gathering meaningful data, fitting specific models, and discovering appropriate analysis and visualization tools give way to a new class of computational challenges—management of large and incongruous data, integration and interoperability of computational resources, and data provenance. We designed, implemented and validated a new paradigm for addressing these challenges in the neuroimaging field. Our solution is based on the LONI Pipeline environment [3], [4], a graphical workflow environment for constructing and executing complex data processing protocols. We developed study-design, database and visual language programming functionalities within the LONI Pipeline that enable the construction of complete, elaborate and robust graphical workflows for analyzing neuroimaging and other data. These workflows facilitate open sharing and communication of data and metadata, concrete processing protocols, result validation, and study replication among different investigators and research groups. The LONI Pipeline features include distributed grid-enabled infrastructure, virtualized execution environment, efficient integration, data provenance, validation and distribution of new computational tools, automated data format conversion, and an intuitive graphical user interface. We demonstrate the new LONI Pipeline features using large scale neuroimaging studies based on data from the International Consortium for Brain Mapping [5] and the Alzheimer's Disease Neuroimaging Initiative [6]. User guides, forums, instructions and downloads of the LONI Pipeline environment are available at http://pipeline.loni.ucla.edu.
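At its core, a graphical workflow of this kind is a directed acyclic graph of processing modules executed in dependency order, with provenance recorded for every output. The sketch below is a hypothetical stand-in (the function names and structure are invented for illustration and are not the LONI Pipeline's actual API or module library):

```python
# Toy workflow runner (hypothetical, not the LONI Pipeline API): modules are
# nodes in a DAG; the runner executes them in dependency order and records
# provenance (step, tool, inputs, output) for every produced value.
from graphlib import TopologicalSorter

def skull_strip(image):            # stand-ins for real neuroimaging tools
    return f"stripped({image})"

def segment(stripped):
    return f"segmented({stripped})"

def volume_stats(segmented):
    return f"stats({segmented})"

# node -> (function, list of upstream nodes)
workflow = {
    "strip":   (skull_strip,  []),
    "segment": (segment,      ["strip"]),
    "stats":   (volume_stats, ["segment"]),
}

inputs = {"strip": "subject01_T1w.nii"}
results, provenance = {}, []

order = TopologicalSorter({n: deps for n, (_, deps) in workflow.items()}).static_order()
for node in order:
    func, deps = workflow[node]
    args = [results[d] for d in deps] or [inputs[node]]
    results[node] = func(*args)
    provenance.append({"step": node, "tool": func.__name__,
                       "inputs": args, "output": results[node]})

for record in provenance:
    print(record)
```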

145 citations

Proceedings ArticleDOI
29 Oct 2012
TL;DR: This work presents Deco, a database system for declarative crowdsourcing, and describes Deco's data model, query language, and the Deco query processor which uses a novel push-pull hybrid execution model to respect theDeco semantics while coping with the unique combination of latency, monetary cost, and uncertainty introduced in the crowdsourcing environment.
Abstract: Crowdsourcing enables programmers to incorporate "human computation" as a building block in algorithms that cannot be fully automated, such as text analysis and image recognition. Similarly, humans can be used as a building block in data-intensive applications--providing, comparing, and verifying data used by applications. Building upon the decades-long success of declarative approaches to conventional data management, we use a similar approach for data-intensive applications that incorporate humans. Specifically, declarative queries are posed over stored relational data as well as data computed on-demand from the crowd, and the underlying system orchestrates the computation of query answers. We present Deco, a database system for declarative crowdsourcing. We describe Deco's data model, query language, and our prototype. Deco's data model was designed to be general (it can be instantiated to other proposed models), flexible (it allows methods for data cleansing and external access to be plugged in), and principled (it has a precisely-defined semantics). Syntactically, Deco's query language is a simple extension to SQL. Based on Deco's data model, we define a precise semantics for arbitrary queries involving both stored data and data obtained from the crowd. We then describe the Deco query processor which uses a novel push-pull hybrid execution model to respect the Deco semantics while coping with the unique combination of latency, monetary cost, and uncertainty introduced in the crowdsourcing environment. Finally, we experimentally explore the query processing alternatives provided by Deco using our current prototype.
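The interplay of stored and crowd-obtained data can be sketched as follows (an illustrative toy only; the attribute names, cost model, and fetch-on-demand logic are assumptions and not Deco's actual data model, query language, or push-pull query processor):

```python
# Illustrative sketch of declarative access over stored + crowd data
# (hypothetical, not Deco's actual model or query processor). Stored tuples
# are used when available; missing attribute values are resolved on demand
# by a simulated crowd call that carries a monetary cost.
STORED = [
    {"restaurant": "Luigi's",  "city": "Palo Alto", "cuisine": "Italian"},
    {"restaurant": "Tamarine", "city": "Palo Alto", "cuisine": None},  # unknown
]

CROWD_COST_PER_TASK = 0.05   # assumed price per human microtask
spent = 0.0

def ask_crowd(restaurant, attribute):
    """Simulate posting a microtask and collecting an answer."""
    global spent
    spent += CROWD_COST_PER_TASK
    fake_answers = {("Tamarine", "cuisine"): "Vietnamese"}
    return fake_answers.get((restaurant, attribute))

def query(city, cuisine):
    """Declarative-style question: restaurants in `city` serving `cuisine`."""
    answers = []
    for row in STORED:
        if row["city"] != city:
            continue
        value = row["cuisine"]
        if value is None:                       # pull from the crowd on demand
            value = ask_crowd(row["restaurant"], "cuisine")
        if value == cuisine:
            answers.append(row["restaurant"])
    return answers

print(query("Palo Alto", "Vietnamese"))   # ['Tamarine']
print(f"crowd spend: ${spent:.2f}")       # $0.05
```

What this toy ignores is the hard part described in the abstract: balancing latency, monetary cost, and uncertainty in human answers when deciding how aggressively to solicit crowd input.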

145 citations

Patent
26 Feb 2002
TL;DR: A secure database stores risk management information that is accessible through authorized network access, and a graphics interface generates graphic data of that information in response to the authorized access.
Abstract: A graphical and interactive interface system manages risk management information. A secure database stores risk management information that is accessible by authorized access through a network. A graphics interface generates graphic data of the risk management information in response to the authorized access. One or more workflow process terminals connect to the database over the network to provide updates to the risk management information. Summary reporting and statistical processing functionalities facilitate the predictive accuracy of the system by permitting a user to compare relevant system inputs when selecting data, in order to provide recommendations to customers for adjusting insurance policies in accordance with risk management practices.

144 citations


Network Information
Related Topics (5)
Information system: 107.5K papers, 1.8M citations, 90% related
Software: 130.5K papers, 2M citations, 88% related
Cluster analysis: 146.5K papers, 2.9M citations, 83% related
The Internet: 213.2K papers, 3.8M citations, 82% related
Cloud computing: 156.4K papers, 1.9M citations, 81% related
Performance Metrics
No. of papers in the topic in previous years

Year    Papers
2023    218
2022    485
2021    959
2020    1,435
2019    1,745
2018    1,719