Topic

Data management

About: Data management is a research topic. Over the lifetime, 31574 publications have been published within this topic receiving 424326 citations.

Papers

Journal Article•DOI•

Introduction to a system for distributed databases (SDD-1)

James B. Rothnie, Philip A. Bernstein, S. Fox, Nathan Goodman, M. Hammer, Terry Landers, Christopher L. Reeve, David W. Shipman, Eugene Wong

01 Mar 1980-ACM Transactions on Database Systems

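TL;DR: This paper presents an overview of the SDD-1 design and its solutions to distributed concurrency control, distributed query processing, resiliency to component failure, and distributed directory management.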

Abstract: The declining cost of computer hardware and the increasing data processing needs of geographically dispersed organizations have led to substantial interest in distributed data management. SDD-1 is a distributed database management system currently being developed by Computer Corporation of America. Users interact with SDD-1 precisely as if it were a nondistributed database system because SDD-1 handles all issues arising from the distribution of data. These issues include distributed concurrency control, distributed query processing, resiliency to component failure, and distributed directory management. This paper presents an overview of the SDD-1 design and its solutions to the above problems. This paper is the first of a series of companion papers on SDD-1 (Bernstein and Shipman [2], Bernstein et al. [4], and Hammer and Shipman [14]).

253 citations

Journal Article•DOI•

A Brief Introduction on Big Data 5Vs Characteristics and Hadoop Technology

Ishwarappa¹, J. Anuradha²

College of Engineering, Pune¹, VIT University²

01 Jan 2015-Procedia Computer Science

TL;DR: This paper presents the 5Vs characteristics of big data and surveys the scalable database tools and techniques used to handle it.

253 citations

Journal Article•DOI•

NGSUtils: a software suite for analyzing and manipulating next-generation sequencing datasets.

Marcus R. Breese¹, Yunlong Liu¹

Indiana University¹

15 Feb 2013-Bioinformatics

TL;DR: NGSUtils is a suite of software tools for manipulating data common to next-generation sequencing experiments, such as FASTQ, BED and BAM format files; the tools provide a stable and modular platform for data management and analysis.

Abstract: Summary: NGSUtils is a suite of software tools for manipulating data common to next-generation sequencing experiments, such as FASTQ, BED and BAM format files. These tools provide a stable and modular platform for data management and analysis. Availability and implementation: NGSUtils is available under a BSD license and works on Mac OS X and Linux systems. Python 2.6+ and virtualenv are required. More information and source code may be obtained from the website: http://ngsutils.org. Contact: yunliu@iupui.edu. Supplemental information: Supplementary data are available at Bioinformatics online.

253 citations

Journal Article•DOI•

Big Data and Data Science Methods for Management Research

Gerard George¹, Ernst C. Osinga¹, Dovev Lavie², Brent A. Scott³

Singapore Management University¹, Technion – Israel Institute of Technology², Michigan State University³

16 Aug 2016-Academy of Management Journal

TL;DR: This editorial addresses both the collection and handling of big data and the analytical tools provided by data science for management scholars, and provides a primer or a “starter kit” for potential data science applications in management research.

Abstract: The recent advent of remote sensing, mobile technologies, novel transaction systems, and high-performance computing offers opportunities to understand trends, behaviors, and actions in a manner that has not been previously possible. Researchers can thus leverage “big data” that are generated from a plurality of sources including mobile transactions, wearable technologies, social media, ambient networks, and business transactions. An earlier Academy of Management Journal (AMJ) editorial explored the potential implications for data science in management research and highlighted questions for management scholarship as well as the attendant challenges of data sharing and privacy (George, Haas, & Pentland, 2014). This nascent field is evolving rapidly and at a speed that leaves scholars and practitioners alike attempting to make sense of the emergent opportunities that big data hold. With the promise of big data come questions about the analytical value and thus relevance of these data for theory development, including concerns over the context-specific relevance, its reliability and its validity. To address this challenge, data science is emerging as an interdisciplinary field that combines statistics, data mining, machine learning, and analytics to understand and explain how we can generate analytical insights and prediction models from structured and unstructured big data. Data science emphasizes the systematic study of the organization, properties, and analysis of data and their role in inference, including our confidence in the inference (Dhar, 2013). Whereas both big data and data science terms are often used interchangeably, “big data” refer to large and varied data that can be collected and managed, whereas “data science” develops models that capture, visualize, and analyze the underlying patterns in the data. In this editorial, we address both the collection and handling of big data and the analytical tools provided by data science for management scholars.
At the current time, practitioners suggest that data science applications tackle the three core elements of big data: volume, velocity, and variety (McAfee & Brynjolfsson, 2012; Zikopoulos & Eaton, 2011). “Volume” represents the sheer size of the dataset due to the aggregation of a large number of variables and an even larger set of observations for each variable. “Velocity” reflects the speed at which these data are collected and analyzed, whether in real time or near real time from sensors, sales transactions, social media posts, and sentiment data for breaking news and social trends. “Variety” in big data comes from the plurality of structured and unstructured data sources such as text, videos, networks, and graphics among others. The combinations of volume, velocity, and variety reveal the complex task of generating knowledge from big data, which often runs into millions of observations, and deriving theoretical contributions from such data. In this editorial, we provide a primer or a “starter kit” for potential data science applications in management research. We do so with a caveat that emerging fields outdate and improve upon methodologies while often supplanting them with new applications. Nevertheless, this primer can guide management scholars who wish to use data science techniques to reach better answers to existing questions or explore completely new research questions.

251 citations

Journal Issue•DOI•

Scientific workflow management and the Kepler system: Research Articles

Bertram Ludäscher¹, Ilkay Altintas¹, Chad Berkley², Dan Higgins², Efrat Jaeger¹, Matthew B. Jones², Edward A. Lee³, Jing Tao¹, Yang Zhao³

San Diego Supercomputer Center¹, University of California, Santa Barbara², University of California, Berkeley³

15 Aug 2006-Concurrency and Computation: Practice and Experience

TL;DR: This paper describes characteristics of and requirements for scientific workflows as identified in a number of application projects, along with some key features of Kepler and its underlying Ptolemy II system, planned extensions, and areas of future research.

Abstract: Many scientific disciplines are now data and information driven, and new scientific knowledge is often gained by scientists putting together data analysis and knowledge discovery ‘pipelines’. A related trend is that more and more scientific communities realize the benefits of sharing their data and computational services, and are thus contributing to a distributed data and computational community infrastructure (a.k.a. ‘the Grid’). However, this infrastructure is only a means to an end and ideally scientists should not be too concerned with its existence. The goal is for scientists to focus on development and use of what we call scientific workflows. These are networks of analytical steps that may involve, e.g., database access and querying steps, data analysis and mining steps, and many other steps including computationally intensive jobs on high-performance cluster computers. In this paper we describe characteristics of and requirements for scientific workflows as identified in a number of our application projects. We then elaborate on Kepler, a particular scientific workflow system, currently under development across a number of scientific data management projects. We describe some key features of Kepler and its underlying Ptolemy II system, planned extensions, and areas of future research. Kepler is a community-driven, open source project, and we always welcome related projects and new contributors to join. Copyright © 2005 John Wiley & Sons, Ltd.

250 citations

Performance Metrics

Papers: 32,259

Citations: 465,338

No. of papers in the topic in previous years
Year	Papers
2023	218
2022	485
2021	959
2020	1,435
2019	1,745
2018	1,719