Topic

Distributed database

About: Distributed database is a research topic. Over its lifetime, 11,788 publications have been published within this topic, receiving 210,562 citations.

Papers

Proceedings Article

The Hadoop Distributed File System

Konstantin Shvachko¹, Hairong Kuang¹, Sanjay Radia¹, Robert J. Chansler¹

Yahoo!¹

03 May 2010

TL;DR: The architecture of HDFS is described and experience using HDFS to manage 25 petabytes of enterprise data at Yahoo! is reported on.

Abstract: The Hadoop Distributed File System (HDFS) is designed to store very large data sets reliably, and to stream those data sets at high bandwidth to user applications. In a large cluster, thousands of servers both host directly attached storage and execute user application tasks. By distributing storage and computation across many servers, the resource can grow with demand while remaining economical at every size. We describe the architecture of HDFS and report on experience using HDFS to manage 25 petabytes of enterprise data at Yahoo!.

5,005 citations

Book

Principles of Distributed Database Systems

M. Tamer Özsu¹, Patrick Valduriez²

University of Alberta¹, French Institute for Research in Computer Science and Automation²

01 Aug 1990

TL;DR: This third edition of a classic textbook can be used to teach at the senior undergraduate and graduate levels and concentrates on fundamental theories as well as techniques and algorithms in distributed data management.

Abstract: This third edition of a classic textbook can be used to teach at the senior undergraduate and graduate levels. The material concentrates on fundamental theories as well as techniques and algorithms. The advent of the Internet and the World Wide Web and, more recently, the emergence of cloud computing and streaming data applications have forced a renewal of interest in distributed and parallel data management while, at the same time, requiring a rethinking of some of the traditional techniques. This book covers the breadth and depth of this re-emerging field. The coverage consists of two parts. The first part discusses the fundamental principles of distributed data management and includes distribution design, data integration, distributed query processing and optimization, distributed transaction management, and replication. The second part focuses on more advanced topics and includes discussion of parallel database systems, distributed object management, peer-to-peer data management, web data management, data stream systems, and cloud computing. New in this edition: new chapters covering database replication, database integration, multidatabase query processing, peer-to-peer data management, and web data management; coverage of emerging topics such as data streams and cloud computing; and extensive revisions and updates based on years of class testing and feedback. Ancillary teaching materials are available.

2,395 citations

Journal Article

Federated database systems for managing distributed, heterogeneous, and autonomous databases

Amit P. Sheth, James A. Larson¹

Intel¹

01 Sep 1990, ACM Computing Surveys

TL;DR: The authors define a reference architecture for distributed database management systems from system and schema viewpoints, show how various FDBS architectures can be developed, and define a methodology for developing one of the popular FDBS architectures.

Abstract: A federated database system (FDBS) is a collection of cooperating database systems that are autonomous and possibly heterogeneous. In this paper, we define a reference architecture for distributed database management systems from system and schema viewpoints and show how various FDBS architectures can be developed. We then define a methodology for developing one of the popular architectures of an FDBS. Finally, we discuss critical issues related to developing and operating an FDBS.

2,376 citations

Journal Article

Virtual time

David Jefferson¹

University of Southern California¹

01 Jul 1985, ACM Transactions on Programming Languages and Systems

TL;DR: Virtual time is a new paradigm for organizing and synchronizing distributed systems which can be applied to such problems as distributed discrete event simulation and distributed database concurrency control.

Abstract: Virtual time is a new paradigm for organizing and synchronizing distributed systems which can be applied to such problems as distributed discrete event simulation and distributed database concurrency control. Virtual time provides a flexible abstraction of real time in much the same way that virtual memory provides an abstraction of real memory. It is implemented using the Time Warp mechanism, a synchronization protocol distinguished by its reliance on lookahead-rollback, and by its implementation of rollback via antimessages.

2,280 citations

Journal Article

Network Coding for Distributed Storage Systems

Alexandros G. Dimakis¹, P. B. Godfrey², Yunnan Wu³, Martin J. Wainwright⁴, Kannan Ramchandran⁴

University of Southern California¹, University of Illinois at Urbana–Champaign², Microsoft³, University of California, Berkeley⁴

01 Sep 2010, IEEE Transactions on Information Theory

TL;DR: It is shown that there is a fundamental tradeoff between storage and repair bandwidth, which is theoretically characterized using flow arguments on an appropriately constructed graph, and regenerating codes are introduced that can achieve any point in this optimal tradeoff.

Abstract: Distributed storage systems provide reliable access to data through redundancy spread over individually unreliable nodes. Application scenarios include data centers, peer-to-peer storage systems, and storage in wireless networks. Storing data using an erasure code, in fragments spread across nodes, requires less redundancy than simple replication for the same level of reliability. However, since fragments must be periodically replaced as nodes fail, a key question is how to generate encoded fragments in a distributed way while transferring as little data as possible across the network. For an erasure coded system, a common practice to repair from a single node failure is for a new node to reconstruct the whole encoded data object to generate just one encoded block. We show that this procedure is sub-optimal. We introduce the notion of regenerating codes, which allow a new node to communicate functions of the stored data from the surviving nodes. We show that regenerating codes can significantly reduce the repair bandwidth. Further, we show that there is a fundamental tradeoff between storage and repair bandwidth which we theoretically characterize using flow arguments on an appropriately constructed graph. By invoking constructive results in network coding, we introduce regenerating codes that can achieve any point in this optimal tradeoff.

1,919 citations

Metrics

Papers: 11,902

Citations: 226,193

No. of papers in the topic in previous years
Year	Papers
2023	39
2022	74
2021	351
2020	430
2019	480
2018	536