Conference

Operating Systems Design and Implementation

About: Operating Systems Design and Implementation is an academic conference. It publishes mainly in the areas of file systems and servers. Over its lifetime, the conference has published 516 papers, which have received 137,542 citations.

Topics: File system, Server, Scheduling (computing), Cache, Scalability

Papers

Journal Article•DOI•

MapReduce: simplified data processing on large clusters

Jeffrey Dean¹, Sanjay Ghemawat¹•Institutions (1)

Google¹

06 Dec 2004

TL;DR: This paper presents MapReduce, a programming model and an associated implementation for processing and generating large data sets; the implementation runs on a large cluster of commodity machines and is highly scalable.

Abstract: MapReduce is a programming model and an associated implementation for processing and generating large data sets. Users specify a map function that processes a key/value pair to generate a set of intermediate key/value pairs, and a reduce function that merges all intermediate values associated with the same intermediate key. Many real world tasks are expressible in this model, as shown in the paper. Programs written in this functional style are automatically parallelized and executed on a large cluster of commodity machines. The run-time system takes care of the details of partitioning the input data, scheduling the program's execution across a set of machines, handling machine failures, and managing the required inter-machine communication. This allows programmers without any experience with parallel and distributed systems to easily utilize the resources of a large distributed system. Our implementation of MapReduce runs on a large cluster of commodity machines and is highly scalable: a typical MapReduce computation processes many terabytes of data on thousands of machines. Programmers find the system easy to use: hundreds of MapReduce programs have been implemented and upwards of one thousand MapReduce jobs are executed on Google's clusters every day.
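
The map/reduce split described in the abstract can be illustrated with a small single-process sketch in Python. This is not Google's implementation: it runs on one machine with no partitioning, scheduling, or failure handling, and the names `run_mapreduce`, `map_words`, and `reduce_counts` are hypothetical, chosen only for this word-count illustration.

```python
from collections import defaultdict

def map_words(doc_name, text):
    """Map: take one input key/value pair, emit intermediate (word, 1) pairs."""
    for word in text.split():
        yield word.lower(), 1

def reduce_counts(word, counts):
    """Reduce: merge all intermediate values that share the same key."""
    return sum(counts)

def run_mapreduce(inputs, map_fn, reduce_fn):
    """Toy sequential driver (an assumption of this sketch, not the real runtime)."""
    # Group intermediate values by intermediate key.
    intermediate = defaultdict(list)
    for key, value in inputs:
        for ikey, ivalue in map_fn(key, value):
            intermediate[ikey].append(ivalue)
    # Apply the reduce function once per intermediate key.
    return {ikey: reduce_fn(ikey, values) for ikey, values in intermediate.items()}

docs = [("a.txt", "the quick fox"), ("b.txt", "the lazy dog the end")]
word_counts = run_mapreduce(docs, map_words, reduce_counts)
```

In a real deployment the grouping step is what the run-time system distributes across machines; here it is a single in-memory dictionary.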

20,309 citations

Proceedings Article•DOI•

TensorFlow: a system for large-scale machine learning

Martín Abadi¹, Paul Barham¹, Jianmin Chen¹, Zhifeng Chen¹, Andy Davis¹, Jeffrey Dean¹, Matthieu Devin¹, Sanjay Ghemawat¹, Geoffrey Irving¹, Michael Isard¹, Manjunath Kudlur¹, Josh Levenberg¹, Rajat Monga¹, Sherry Moore¹, Derek G. Murray¹, Benoit Steiner¹, Paul A. Tucker¹, Vijay K. Vasudevan¹, Pete Warden¹, Martin Wicke¹, Yuan Yu¹, Xiaoqiang Zheng¹•Institutions (1)

Google¹

02 Nov 2016

TL;DR: TensorFlow is a machine learning system that operates at large scale and in heterogeneous environments, using dataflow graphs to represent computation, shared state, and the operations that mutate that state.

Abstract: TensorFlow is a machine learning system that operates at large scale and in heterogeneous environments. TensorFlow uses dataflow graphs to represent computation, shared state, and the operations that mutate that state. It maps the nodes of a dataflow graph across many machines in a cluster, and within a machine across multiple computational devices, including multicore CPUs, general-purpose GPUs, and custom-designed ASICs known as Tensor Processing Units (TPUs). This architecture gives flexibility to the application developer: whereas in previous "parameter server" designs the management of shared state is built into the system, TensorFlow enables developers to experiment with novel optimizations and training algorithms. TensorFlow supports a variety of applications, with a focus on training and inference on deep neural networks. Several Google services use TensorFlow in production, we have released it as an open-source project, and it has become widely used for machine learning research. In this paper, we describe the TensorFlow dataflow model and demonstrate the compelling performance that TensorFlow achieves for several real-world applications.
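
To make the idea of a dataflow graph that holds both computation and mutable state more concrete, here is a toy sketch in plain Python. It deliberately does not use the TensorFlow library or its API: the classes `Variable`, `Placeholder`, `Op`, and `AssignSub` are hypothetical stand-ins invented for this illustration, and the sketch ignores placement of nodes across machines and devices.

```python
class Variable:
    """Graph node holding mutable shared state (hypothetical, not tf.Variable)."""
    def __init__(self, value):
        self.value = value
    def run(self, feeds):
        return self.value

class Placeholder:
    """Graph node whose value is supplied each time the graph is run."""
    def __init__(self, name):
        self.name = name
    def run(self, feeds):
        return feeds[self.name]

class Op:
    """Stateless node: applies fn to the outputs of its input nodes."""
    def __init__(self, fn, *inputs):
        self.fn, self.inputs = fn, inputs
    def run(self, feeds):
        return self.fn(*(node.run(feeds) for node in self.inputs))

class AssignSub:
    """Stateful node: running it mutates a Variable in place."""
    def __init__(self, var, delta):
        self.var, self.delta = var, delta
    def run(self, feeds):
        self.var.value -= self.delta.run(feeds)
        return self.var.value

# Build the graph once: fit w so that w * x approximates y.
w = Variable(0.0)
x, y = Placeholder("x"), Placeholder("y")
error = Op(lambda a, b: a - b, Op(lambda a, b: a * b, w, x), y)
gradient = Op(lambda e, xv: e * xv, error, x)
train_step = AssignSub(w, Op(lambda g: 0.1 * g, gradient))

# Run the same graph repeatedly; the state in w persists across runs.
for _ in range(100):
    train_step.run({"x": 1.0, "y": 2.0})
```

The point of the sketch is that the update rule is itself a node in the graph, so a developer can swap in a different training algorithm by building a different graph rather than changing the runtime.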

10,913 citations

Proceedings Article•

Bigtable: A Distributed Storage System for Structured Data (Awarded Best Paper!).

Fay W. Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Michael Burrows, Tushar Deepak Chandra, Andrew Fikes, Robert Gruber

01 Jan 2006

TL;DR: Bigtable is a distributed storage system for managing structured data, designed to scale to petabytes of data across thousands of commodity servers; Google projects that store data in it include web indexing, Google Earth, and Google Finance.

Abstract: Bigtable is a distributed storage system for managing structured data that is designed to scale to a very large size: petabytes of data across thousands of commodity servers. Many projects at Google store data in Bigtable, including web indexing, Google Earth, and Google Finance. These applications place very different demands on Bigtable, both in terms of data size (from URLs to web pages to satellite imagery) and latency requirements (from backend bulk processing to real-time data serving). Despite these varied demands, Bigtable has successfully provided a flexible, high-performance solution for all of these Google products. In this article, we describe the simple data model provided by Bigtable, which gives clients dynamic control over data layout and format, and we describe the design and implementation of Bigtable.

4,843 citations

Proceedings Article•DOI•

Practical Byzantine fault tolerance

Miguel Castro¹, Barbara Liskov¹•Institutions (1)

Massachusetts Institute of Technology¹

22 Feb 1999

TL;DR: This paper describes a new replication algorithm that tolerates Byzantine faults, works in asynchronous environments like the Internet, and incorporates several important optimizations that improve the response time of previous algorithms by more than an order of magnitude.

Abstract: This paper describes a new replication algorithm that is able to tolerate Byzantine faults. We believe that Byzantine-fault-tolerant algorithms will be increasingly important in the future because malicious attacks and software errors are increasingly common and can cause faulty nodes to exhibit arbitrary behavior. Whereas previous algorithms assumed a synchronous system or were too slow to be used in practice, the algorithm described in this paper is practical: it works in asynchronous environments like the Internet and incorporates several important optimizations that improve the response time of previous algorithms by more than an order of magnitude. We implemented a Byzantine-fault-tolerant NFS service using our algorithm and measured its performance. The results show that our service is only 3% slower than a standard unreplicated NFS.

3,562 citations

Journal Article•DOI•

TAG: a Tiny AGgregation service for Ad-Hoc sensor networks

Samuel Madden¹, Michael J. Franklin¹, Joseph M. Hellerstein¹, Wei Hong²•Institutions (2)

University of California, Berkeley¹, Intel²

09 Dec 2002

TL;DR: This work presents the Tiny AGgregation (TAG) service for aggregation in low-power, distributed, wireless environments, and discusses a variety of optimizations for improving the performance and fault tolerance of the basic solution.

Abstract: We present the Tiny AGgregation (TAG) service for aggregation in low-power, distributed, wireless environments. TAG allows users to express simple, declarative queries and have them distributed and executed efficiently in networks of low-power, wireless sensors. We discuss various generic properties of aggregates, and show how those properties affect the performance of our in-network approach. We include a performance study demonstrating the advantages of our approach over traditional centralized, out-of-network methods, and discuss a variety of optimizations for improving the performance and fault tolerance of the basic solution.
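
As a rough illustration of in-network aggregation, the Python sketch below computes an average over a small tree of sensors by passing compact partial states toward the root instead of shipping every raw reading to a central point. It assumes the aggregate can be split into an initializer, a merging function, and an evaluator; the `Sensor` class, the tree shape, and the readings are all hypothetical, and radio communication, epochs, and fault tolerance are left out.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Sensor:
    """Hypothetical sensor node in a routing tree rooted at the base station."""
    reading: float
    children: List["Sensor"] = field(default_factory=list)

# AVERAGE expressed as a partial state (sum, count) so it can be merged in-network.
def init(reading: float) -> Tuple[float, int]:
    return (reading, 1)

def merge(a: Tuple[float, int], b: Tuple[float, int]) -> Tuple[float, int]:
    return (a[0] + b[0], a[1] + b[1])

def evaluate(state: Tuple[float, int]) -> float:
    return state[0] / state[1]

def partial_state(node: Sensor) -> Tuple[float, int]:
    """Each node merges its own reading with its children's partial states
    and forwards a single fixed-size record to its parent."""
    state = init(node.reading)
    for child in node.children:
        state = merge(state, partial_state(child))
    return state

middle = Sensor(24.0, [Sensor(20.0), Sensor(22.0)])
root = Sensor(30.0, [middle])
root_state = partial_state(root)
average = evaluate(root_state)
```

Each node sends one record regardless of how many descendants it has, which is where the saving over a centralized, out-of-network approach would come from in this simplified picture.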

3,166 citations

Performance

Metrics

Papers: 516

Citations: 137,542

No. of papers from the Conference in previous years
Year	Papers
2021	25
2020	70
2019	5
2018	47
2017	1
2016	54