Journal ArticleDOI

Scale and performance in a distributed file system

TL;DR: Observations of a prototype implementation are presented, changes in the areas of cache validation, server process structure, name translation, and low-level storage representation are motivated, and Andrew's ability to scale gracefully is quantitatively demonstrated.
Abstract: The Andrew File System is a location-transparent distributed file system that will eventually span more than 5000 workstations at Carnegie Mellon University. Large scale affects performance and complicates system operation. In this paper we present observations of a prototype implementation, motivate changes in the areas of cache validation, server process structure, name translation, and low-level storage representation, and quantitatively demonstrate Andrew's ability to scale gracefully. We establish the importance of whole-file transfer and caching in Andrew by comparing its performance with that of Sun Microsystems' NFS file system. We also show how the aggregation of files into volumes improves the operability of the system.
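
To make the key mechanism concrete, here is a minimal Python sketch of whole-file caching with callback-based validation, the scheme this paper motivates: the server promises to notify each caching client before its copy goes stale, so an open on a warm cache needs no server traffic at all. Class names are illustrative, not drawn from the AFS source.

```python
# Sketch: whole-file caching with callback-based cache validation.
class Server:
    def __init__(self):
        self.files = {}        # path -> content
        self.callbacks = {}    # path -> set of clients holding a callback

    def fetch(self, path, client):
        # Whole-file transfer plus a callback promise for this client.
        self.callbacks.setdefault(path, set()).add(client)
        return self.files[path]

    def store(self, path, content, writer):
        self.files[path] = content
        # Break callbacks: every other caching client is invalidated.
        for client in self.callbacks.pop(path, set()):
            if client is not writer:
                client.invalidate(path)
        self.callbacks[path] = {writer}

class Client:
    def __init__(self, server):
        self.server = server
        self.cache = {}        # path -> content, valid while callback holds

    def open(self, path):
        # A cache hit needs no server interaction at all.
        if path not in self.cache:
            self.cache[path] = self.server.fetch(path, self)
        return self.cache[path]

    def write(self, path, content):
        self.cache[path] = content
        self.server.store(path, content, self)   # store-on-close semantics

    def invalidate(self, path):
        self.cache.pop(path, None)

srv = Server()
a, b = Client(srv), Client(srv)
a.write("/afs/doc", "v1")
assert b.open("/afs/doc") == "v1"
a.write("/afs/doc", "v2")          # breaks b's callback
assert b.open("/afs/doc") == "v2"  # b refetches on its next open
```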


Citations
Journal ArticleDOI
19 Oct 2003
TL;DR: This paper presents file system interface extensions designed to support distributed applications, discusses many aspects of the design, and reports measurements from both micro-benchmarks and real-world use.
Abstract: We have designed and implemented the Google File System, a scalable distributed file system for large distributed data-intensive applications. It provides fault tolerance while running on inexpensive commodity hardware, and it delivers high aggregate performance to a large number of clients. While sharing many of the same goals as previous distributed file systems, our design has been driven by observations of our application workloads and technological environment, both current and anticipated, that reflect a marked departure from some earlier file system assumptions. This has led us to reexamine traditional choices and explore radically different design points. The file system has successfully met our storage needs. It is widely deployed within Google as the storage platform for the generation and processing of data used by our service as well as research and development efforts that require large data sets. The largest cluster to date provides hundreds of terabytes of storage across thousands of disks on over a thousand machines, and it is concurrently accessed by hundreds of clients. In this paper, we present file system interface extensions designed to support distributed applications, discuss many aspects of our design, and report measurements from both micro-benchmarks and real-world use.
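
As an illustration of the architecture the abstract describes, the hypothetical sketch below separates metadata (a single master mapping files to chunk handles and replica locations) from data (chunkservers), keeping the master off the data path on reads. The 64 MB chunk size follows the paper; all class and function names are invented for illustration.

```python
# Sketch: GFS-style read path with metadata/data separation.
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB chunks, as in the paper

class Master:
    def __init__(self):
        self.chunk_table = {}  # (path, chunk index) -> (handle, [replica addrs])

    def lookup(self, path, offset):
        return self.chunk_table[(path, offset // CHUNK_SIZE)]

class ChunkServer:
    def __init__(self):
        self.chunks = {}       # chunk handle -> bytes

    def read(self, handle, start, length):
        return self.chunks[handle][start:start + length]

def gfs_read(master, servers, path, offset, length):
    # 1. Ask the master which chunk holds the offset and where replicas live.
    handle, replicas = master.lookup(path, offset)
    # 2. Read directly from a replica; the master never touches file data.
    return servers[replicas[0]].read(handle, offset % CHUNK_SIZE, length)

master, cs = Master(), ChunkServer()
cs.chunks["h1"] = b"hello, gfs"
master.chunk_table[("/logs/web.0", 0)] = ("h1", ["cs-0"])
print(gfs_read(master, {"cs-0": cs}, "/logs/web.0", 0, 5))  # b'hello'
```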

5,429 citations

Proceedings ArticleDOI
22 Feb 1999
TL;DR: A new replication algorithm that is able to tolerate Byzantine faults that works in asynchronous environments like the Internet and incorporates several important optimizations that improve the response time of previous algorithms by more than an order of magnitude.
Abstract: This paper describes a new replication algorithm that is able to tolerate Byzantine faults. We believe that Byzantine-fault-tolerant algorithms will be increasingly important in the future because malicious attacks and software errors are increasingly common and can cause faulty nodes to exhibit arbitrary behavior. Whereas previous algorithms assumed a synchronous system or were too slow to be used in practice, the algorithm described in this paper is practical: it works in asynchronous environments like the Internet and incorporates several important optimizations that improve the response time of previous algorithms by more than an order of magnitude. We implemented a Byzantine-fault-tolerant NFS service using our algorithm and measured its performance. The results show that our service is only 3% slower than a standard unreplicated NFS.
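
The quorum arithmetic behind such algorithms fits in a few lines. Assuming the standard bound of n = 3f + 1 replicas to tolerate f Byzantine faults, a client accepts a result once f + 1 replicas return matching replies, since at least one of those must be correct. The sketch below shows only this client-side rule, not the paper's full protocol:

```python
# Sketch: client-side reply voting under the 3f + 1 replication bound.
from collections import Counter

def replicas_needed(f):
    # Tolerating f Byzantine replicas requires n >= 3f + 1 in total.
    return 3 * f + 1

def accept_reply(replies, f):
    """Return the value vouched for by at least f + 1 matching replies,
    or None if no value has enough support yet."""
    value, votes = Counter(replies).most_common(1)[0]
    return value if votes >= f + 1 else None

f = 1                                        # tolerate one faulty replica
assert replicas_needed(f) == 4
print(accept_reply(["ok", "ok", "err"], f))  # 'ok': f + 1 matching replies
print(accept_reply(["ok", "err"], f))        # None: no value has f + 1 votes yet
```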

3,562 citations

Journal ArticleDOI
12 Nov 2000
TL;DR: OceanStore's monitoring of usage patterns allows adaptation to regional outages and denial-of-service attacks; monitoring also enhances performance through proactive movement of data.
Abstract: OceanStore is a utility infrastructure designed to span the globe and provide continuous access to persistent information. Since this infrastructure is comprised of untrusted servers, data is protected through redundancy and cryptographic techniques. To improve performance, data is allowed to be cached anywhere, anytime. Additionally, monitoring of usage patterns allows adaptation to regional outages and denial of service attacks; monitoring also enhances performance through pro-active movement of data. A prototype implementation is currently under development.
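
One way data can be protected on untrusted servers, sketched below, is to make it self-verifying: name a block by the hash of its contents, so any replica can serve it and any tampering is detectable on receipt. This illustrates only the verification idea; the actual design also relies on erasure coding, versioning, and routing, and the names here are illustrative.

```python
# Sketch: self-verifying, content-addressed blocks from untrusted servers.
import hashlib

def block_id(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def fetch_verified(store: dict, bid: str) -> bytes:
    data = store[bid]              # the server is untrusted...
    if block_id(data) != bid:      # ...so recompute the hash locally
        raise ValueError("server returned tampered data")
    return data

store = {}
data = b"persistent object v1"
store[block_id(data)] = data
assert fetch_verified(store, block_id(data)) == data

store[block_id(data)] = b"malicious rewrite"   # a faulty or hostile server
try:
    fetch_verified(store, block_id(data))
except ValueError as e:
    print(e)                                   # tampering detected
```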

3,376 citations

Journal ArticleDOI
TL;DR: The background and state-of-the-art of big data are reviewed, covering related technologies as well as representative applications such as enterprise management, the Internet of Things, online social networks, medical applications, collective intelligence, and the smart grid.
Abstract: In this paper, we review the background and state-of-the-art of big data. We first introduce the general background of big data and review related technologies, such as cloud computing, the Internet of Things, data centers, and Hadoop. We then focus on the four phases of the big data value chain, i.e., data generation, data acquisition, data storage, and data analysis. For each phase, we introduce the general background, discuss the technical challenges, and review the latest advances. We finally examine several representative applications of big data, including enterprise management, the Internet of Things, online social networks, medical applications, collective intelligence, and the smart grid. These discussions aim to give readers a comprehensive overview and big picture of this exciting area. The survey concludes with a discussion of open problems and future directions.

2,303 citations

Journal ArticleDOI
TL;DR: A new replication algorithm, BFT, is described that can be used to build highly available systems that tolerate Byzantine faults and is used to implement the first Byzantine-fault-tolerant NFS file system, BFS.
Abstract: Our growing reliance on online services accessible on the Internet demands highly available systems that provide correct service without interruptions. Software bugs, operator mistakes, and malicious attacks are a major cause of service interruptions and they can cause arbitrary behavior, that is, Byzantine faults. This article describes a new replication algorithm, BFT, that can be used to build highly available systems that tolerate Byzantine faults. BFT can be used in practice to implement real services: it performs well, it is safe in asynchronous environments such as the Internet, it incorporates mechanisms to defend against Byzantine-faulty clients, and it recovers replicas proactively. The recovery mechanism allows the algorithm to tolerate any number of faults over the lifetime of the system provided fewer than 1/3 of the replicas become faulty within a small window of vulnerability. BFT has been implemented as a generic program library with a simple interface. We used the library to implement the first Byzantine-fault-tolerant NFS file system, BFS. The BFT library and BFS perform well because the library incorporates several important optimizations, the most important of which is the use of symmetric cryptography to authenticate messages. The performance results show that BFS performs 2% faster to 24% slower than production implementations of the NFS protocol that are not replicated. This supports our claim that the BFT library can be used to build practical systems that tolerate Byzantine faults.
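
The optimization the abstract singles out, authenticating messages with symmetric cryptography instead of public-key signatures, can be sketched as an authenticator: a vector of per-receiver HMACs computed with pairwise shared keys, so each receiver checks only its own entry. The keys and message below are illustrative:

```python
# Sketch: MAC-vector authenticators in place of public-key signatures.
import hmac, hashlib

keys = {                       # pairwise secrets: (sender, receiver) -> key
    ("client", "replica-0"): b"k0",
    ("client", "replica-1"): b"k1",
}

def make_authenticator(sender, message, receivers):
    # One HMAC per receiver, each under the pairwise shared key.
    return {
        r: hmac.new(keys[(sender, r)], message, hashlib.sha256).digest()
        for r in receivers
    }

def verify(sender, receiver, message, auth):
    expected = hmac.new(keys[(sender, receiver)], message, hashlib.sha256).digest()
    return hmac.compare_digest(auth[receiver], expected)

msg = b"WRITE /f 42"
auth = make_authenticator("client", msg, ["replica-0", "replica-1"])
assert verify("client", "replica-0", msg, auth)
assert not verify("client", "replica-1", b"WRITE /f 99", auth)  # altered message
```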

2,190 citations


Cites background from "Scale and performance in a distributed file system"

  • ...They ran two well-known file system benchmarks: the modified Andrew benchmark [Ousterhout 1990; Howard et al. 1988] and PostMark [Katcher 1997]....


References
Book
30 Jan 2009
TL;DR: What constitutes a distributed operating system and how it is distinguished from a computer network are discussed, and several examples of current research projects are examined in some detail.
Abstract: Distributed operating systems have many aspects in common with centralized ones, but they also differ in certain ways. This paper is intended as an introduction to distributed operating systems, and especially to current university research about them. After a discussion of what constitutes a distributed operating system and how it is distinguished from a computer network, various key design issues are discussed. Then several examples of current research projects are examined in some detail, namely, the Cambridge Distributed Computing System, Amoeba, V, and Eden.

1,327 citations

Journal ArticleDOI
TL;DR: The origins of Andrew are traced, its goals and strategies are discussed, and an overview of the current status of its implementation and usage is given.
Abstract: The Information Technology Center (ITC), a collaborative effort between IBM and Carnegie-Mellon University, is in the process of creating Andrew, a prototype computing and communication system for universities. This article traces the origins of Andrew, discusses its goals and strategies, and gives an overview of the current status of its implementation and usage.

701 citations



Journal ArticleDOI
01 Dec 1985
TL;DR: The UNIX 4.2BSD file system is analyzed by recording activity in trace files and writing programs to analyze the traces, and a trace-driven simulator is used to predict the performance of disk-block caches.
Abstract: We analyzed the UNIX 4.2BSD file system by recording activity in trace files and writing programs to analyze the traces. The trace analysis shows that the average file system bandwidth needed per user is low (a few hundred bytes per second). Most of the files accessed are short, are open a short time, and are accessed sequentially. Most new information is deleted or overwritten within a few minutes of its creation. We wrote a simulator that uses the traces to predict the performance of caches for disk blocks. The moderate-sized caches used in UNIX reduce disk traffic by about 50%, but larger caches (several megabytes) can achieve much greater reductions, eliminating 90% or more of all disk traffic. With those large caches, large block sizes (16 kbytes or more) result in the fewest disk accesses.
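
The trace-driven methodology is easy to reproduce in miniature. The toy simulator below (illustrative, not the authors' code) replays a synthetic block-reference trace against an LRU cache of a given size and reports the miss ratio, i.e., the fraction of references that still reach the disk:

```python
# Sketch: trace-driven simulation of an LRU disk-block cache.
import random
from collections import OrderedDict

def simulate(trace, cache_blocks):
    cache, misses = OrderedDict(), 0
    for block in trace:
        if block in cache:
            cache.move_to_end(block)        # hit: mark most recently used
        else:
            misses += 1                     # miss: a real disk access
            cache[block] = True
            if len(cache) > cache_blocks:
                cache.popitem(last=False)   # evict least recently used
    return misses / len(trace)

# Synthetic trace with locality: a small hot set gets most references.
random.seed(1)
trace = [random.randrange(20) if random.random() < 0.8 else random.randrange(1000)
         for _ in range(20_000)]
for size in (10, 50, 200):
    print(f"{size:4d} blocks -> miss ratio {simulate(trace, size):.2f}")
```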

535 citations


"Scale and performance in a distribu..." refers background in this paper

  • ...The study by Ousterhout et al. [4] has shown that most files in a 4.2BSD environment are read in their entirety....


Journal ArticleDOI
10 Oct 1983
TL;DR: The complete system architecture is outlined, and extensive experience in its use is summarized.
Abstract: LOCUS is a distributed operating system which supports transparent access to data through a network-wide filesystem, permits automatic replication of storage, supports transparent distributed process execution, supplies a number of high-reliability functions such as nested transactions, and is upward compatible with Unix. Partitioned operation of subnets and their dynamic merge is also supported. The system has been operational for about two years at UCLA and extensive experience in its use has been obtained. The complete system architecture is outlined in this paper, and that experience is summarized.

473 citations


"Scale and performance in a distribu..." refers background in this paper

  • ...A number of distributed file systems such as Locus [12], IBIS [11], and the Newcastle Connection [1] have been described in the research literature and surveyed by Svobodova [10]....


Journal ArticleDOI
01 Dec 1985
TL;DR: This paper presents the design and rationale of a distributed file system for a network of more than 5000 personal computer workstations, with careful attention paid to the goals of location transparency, user mobility and compatibility with existing operating system interfaces.
Abstract: This paper presents the design and rationale of a distributed file system for a network of more than 5000 personal computer workstations. While scale has been the dominant design influence, careful attention has also been paid to the goals of location transparency, user mobility and compatibility with existing operating system interfaces. Security is an important design consideration, and the mechanisms for it do not assume that the workstations or the network are secure. Caching of entire files at workstations is a key element in this design. A prototype of this system has been built and is in use by a user community of about 400 individuals. A refined implementation that will scale more gracefully and provide better performance is close to completion.
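
In the prototype this earlier paper describes, a workstation caches entire files but validates freshness with the server on each open; the follow-on work replaces this per-open check with callbacks. A hypothetical sketch of that validate-on-open flow, with illustrative names and a version counter standing in for the real timestamps:

```python
# Sketch: whole-file caching with validate-on-open (the prototype's approach).
class Server:
    def __init__(self):
        self.files = {}     # path -> (version, content)

    def version_of(self, path):
        return self.files[path][0]

    def fetch(self, path):
        return self.files[path]

class Workstation:
    def __init__(self, server):
        self.server = server
        self.cache = {}     # path -> (version, content)

    def open(self, path):
        cached = self.cache.get(path)
        # One validation RPC on every open: this per-open server traffic is
        # what the revised design later eliminates with callbacks.
        if cached is None or cached[0] != self.server.version_of(path):
            cached = self.server.fetch(path)    # whole-file transfer
            self.cache[path] = cached
        return cached[1]

srv = Server()
srv.files["/a"] = (1, "draft")
ws = Workstation(srv)
assert ws.open("/a") == "draft"
srv.files["/a"] = (2, "final")      # updated elsewhere
assert ws.open("/a") == "final"     # staleness caught by the open-time check
```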

298 citations


"Scale and performance in a distribu..." refers background in this paper

  • ...description of this file system has been presented in an earlier paper [6]....
