Author

David A. Nichols

Bio: David A. Nichols is an academic researcher from Carnegie Mellon University. The author has contributed to research in topics: Network File System & Self-certifying File System. The author has an h-index of 5 and has co-authored 7 publications receiving 2716 citations.

Papers
Journal ArticleDOI
TL;DR: Observations of a prototype implementation are presented, changes in the areas of cache validation, server process structure, name translation, and low-level storage representation are motivated, and Andrew's ability to scale gracefully is quantitatively demonstrated.
Abstract: The Andrew File System is a location-transparent distributed file system that will eventually span more than 5000 workstations at Carnegie Mellon University. Large scale affects performance and complicates system operation. In this paper we present observations of a prototype implementation, motivate changes in the areas of cache validation, server process structure, name translation, and low-level storage representation, and quantitatively demonstrate Andrew's ability to scale gracefully. We establish the importance of whole-file transfer and caching in Andrew by comparing its performance with that of Sun Microsystems' NFS file system. We also show how the aggregation of files into volumes improves the operability of the system.

1,604 citations

Journal ArticleDOI
01 Nov 1987
TL;DR: This paper examines the consequences of the design decision to transfer whole files between servers and workstations rather than some smaller unit such as records or blocks, as almost all other distributed file systems do, and compares the whole file transfer strategy with that of a block-oriented file system, Sun Microsystems' NFS.
Abstract: Andrew is a distributed computing environment being developed in a joint project by Carnegie Mellon University and IBM. One of the major components of Andrew is a distributed file system which constitutes the underlying mechanism for sharing information. The goals of the Andrew file system are to support growth up to at least 7000 workstations (one for each student, faculty member, and staff at Carnegie Mellon) while providing users, application programs, and system administrators with the amenities of a shared file system. A fundamental result of our concern with scale is the design decision to transfer whole files between servers and workstations rather than some smaller unit such as records or blocks, as almost all other distributed file systems do. This paper examines the consequences of this and other design decisions and features that bear on the scalability of Andrew. Large scale affects a distributed system in two ways: it degrades performance and it complicates administration and day-to-day operation. This paper addresses both concerns and shows that the mechanisms we have incorporated cope with them successfully. We start with the initial prototype of the system, what we learned from it, and how we changed the system to improve performance. We compare its performance with that of a block-oriented file system, Sun Microsystems' NFS, in order to evaluate the whole file transfer strategy. We then turn to operability, and finish with issues related peripherally to scale and with the ways the present design could be enhanced.

663 citations

Journal ArticleDOI
01 Dec 1985
TL;DR: This paper presents the design and rationale of a distributed file system for a network of more than 5000 personal computer workstations, with careful attention paid to the goals of location transparency, user mobility and compatibility with existing operating system interfaces.
Abstract: This paper presents the design and rationale of a distributed file system for a network of more than 5000 personal computer workstations. While scale has been the dominant design influence, careful attention has also been paid to the goals of location transparency, user mobility and compatibility with existing operating system interfaces. Security is an important design consideration, and the mechanisms for it do not assume that the workstations or the network are secure. Caching of entire files at workstations is a key element in this design. A prototype of this system has been built and is in use by a user community of about 400 individuals. A refined implementation that will scale more gracefully and provide better performance is close to completion.

298 citations

Journal ArticleDOI
01 Nov 1987
TL;DR: An application of the Butler system known as gypsy servers, which allow network server programs to be run on idle workstations instead of using dedicated server machines, is described.
Abstract: The Butler system is a set of programs running on Andrew workstations at CMU that give users access to idle workstations. Current Andrew users use the system over 300 times per day. This paper describes the implementation of the Butler system and tells of our experience in using it. In addition, it describes an application of the system known as gypsy servers, which allow network server programs to be run on idle workstations instead of using dedicated server machines.

151 citations

01 Jan 1989
TL;DR: The second part of the thesis examines the performance of a particular file system, the Andrew File System (AFS), developed at CMU and examines the effects of proposed changes to the system, such as the use of encryption during transmission of file data.
Abstract: The recent move to workstation-based computing environments has introduced a new point in the design space of multiprocessors: a loosely-coupled collection of workstations using a network file system for shared memory. One problem with such a system is managing the available workstations and making them available to clients on demand. The Butler system has been running at CMU for three years and is used hundreds of times daily to allow students and faculty to use idle workstations. I discovered that the system is used far more for interactive programs than expected. Surprisingly, security attacks involving the Butler system have been quite rare, despite the large student population among its users. A natural class of UNIX applications that can take advantage of idle workstations includes programs consisting of multiple processes communicating via a shared file system. With such applications, the file system becomes a bottleneck for performance. The second part of the thesis examines the performance of a particular file system, the Andrew File System (AFS), developed at CMU. The major tool for the AFS performance analysis is a discrete-event simulation of the file server and its client workstations. The simulation's accuracy is verified by comparison with experiments run on the file system. Experiments show that the model's parameters can be used to construct a simple linear equation model of the server. While this model is not accurate under conditions when resources are nearing exhaustion, it is useful for a wide range of normal operation. Using the simulation, I estimate the effects of various parameters on AFS performance, such as network latency, CPU speed, and disk seek time. In addition, I examine the effects of proposed changes to the system, such as the use of encryption during transmission of file data. The simulation provides a number of insights about the operation of AFS. These include the fact that AFS is very CPU-limited, that it achieves respectable performance while using relatively slow communications primitives, and that it can handle a wide range of workloads without thrashing. The conclusions give more general observations about AFS and the process of constructing its simulator.

12 citations


Cited by
Journal ArticleDOI
19 Oct 2003
TL;DR: This paper presents file system interface extensions designed to support distributed applications, discusses many aspects of the design, and reports measurements from both micro-benchmarks and real world use.
Abstract: We have designed and implemented the Google File System, a scalable distributed file system for large distributed data-intensive applications. It provides fault tolerance while running on inexpensive commodity hardware, and it delivers high aggregate performance to a large number of clients. While sharing many of the same goals as previous distributed file systems, our design has been driven by observations of our application workloads and technological environment, both current and anticipated, that reflect a marked departure from some earlier file system assumptions. This has led us to reexamine traditional choices and explore radically different design points. The file system has successfully met our storage needs. It is widely deployed within Google as the storage platform for the generation and processing of data used by our service as well as research and development efforts that require large data sets. The largest cluster to date provides hundreds of terabytes of storage across thousands of disks on over a thousand machines, and it is concurrently accessed by hundreds of clients. In this paper, we present file system interface extensions designed to support distributed applications, discuss many aspects of our design, and report measurements from both micro-benchmarks and real world use.

5,429 citations

Proceedings ArticleDOI
22 Feb 1999
TL;DR: A new replication algorithm that tolerates Byzantine faults is described; it works in asynchronous environments like the Internet and incorporates several important optimizations that improve the response time of previous algorithms by more than an order of magnitude.
Abstract: This paper describes a new replication algorithm that is able to tolerate Byzantine faults. We believe that Byzantine-fault-tolerant algorithms will be increasingly important in the future because malicious attacks and software errors are increasingly common and can cause faulty nodes to exhibit arbitrary behavior. Whereas previous algorithms assumed a synchronous system or were too slow to be used in practice, the algorithm described in this paper is practical: it works in asynchronous environments like the Internet and incorporates several important optimizations that improve the response time of previous algorithms by more than an order of magnitude. We implemented a Byzantine-fault-tolerant NFS service using our algorithm and measured its performance. The results show that our service is only 3% slower than a standard unreplicated NFS.

3,562 citations

Journal ArticleDOI
12 Nov 2000
TL;DR: OceanStore's monitoring of usage patterns allows adaptation to regional outages and denial of service attacks; monitoring also enhances performance through pro-active movement of data.
Abstract: OceanStore is a utility infrastructure designed to span the globe and provide continuous access to persistent information. Since this infrastructure is comprised of untrusted servers, data is protected through redundancy and cryptographic techniques. To improve performance, data is allowed to be cached anywhere, anytime. Additionally, monitoring of usage patterns allows adaptation to regional outages and denial of service attacks; monitoring also enhances performance through pro-active movement of data. A prototype implementation is currently under development.

3,376 citations

Proceedings ArticleDOI
13 Jun 1988
TL;DR: The design, implementation, and performance of the Condor scheduling system, which operates in a workstation environment, are presented and a performance profile of the system is presented that is based on data accumulated from 23 stations during one month.
Abstract: The design, implementation, and performance of the Condor scheduling system, which operates in a workstation environment, are presented. The system aims to maximize the utilization of workstations with as little interference as possible between the jobs it schedules and the activities of the people who own workstations. It identifies idle workstations and schedules background jobs on them. When the owner of a workstation resumes activity at a station, Condor checkpoints the remote job running on the station and transfers it to another workstation. The system guarantees that the job will eventually complete, and that very little, if any, work will be performed more than once. A performance profile of the system is presented that is based on data accumulated from 23 stations during one month.

2,570 citations

Journal ArticleDOI
TL;DR: The background and state of the art of big data are reviewed, along with related technologies and representative applications including enterprise management, Internet of Things, online social networks, medical applications, collective intelligence, and smart grid.
Abstract: In this paper, we review the background and state-of-the-art of big data. We first introduce the general background of big data and review related technologies, such as cloud computing, Internet of Things, data centers, and Hadoop. We then focus on the four phases of the value chain of big data, i.e., data generation, data acquisition, data storage, and data analysis. For each phase, we introduce the general background, discuss the technical challenges, and review the latest advances. We finally examine several representative applications of big data, including enterprise management, Internet of Things, online social networks, medical applications, collective intelligence, and smart grid. These discussions aim to provide a comprehensive overview and big-picture to readers of this exciting area. This survey is concluded with a discussion of open problems and future directions.

2,303 citations