SciSpace (formerly Typeset)
Author

David F. Nagle

Other affiliations: University of Michigan
Bio: David F. Nagle is an academic researcher from Carnegie Mellon University. The author has contributed to research on topics including Scalability and Mass storage. The author has an h-index of 23 and has co-authored 41 publications receiving 2,607 citations. Previous affiliations of David F. Nagle include the University of Michigan.

Papers
Journal ArticleDOI
01 Oct 1998
TL;DR: Measurements of the prototype NASD system show that these services can be cost-effectively integrated into a next-generation disk drive ASIC, and show scalable bandwidth for NASD-specialized filesystems.
Abstract: This paper describes the Network-Attached Secure Disk (NASD) storage architecture, prototype implementations of NASD drives, array management for our architecture, and three filesystems built on our prototype. NASD provides scalable storage bandwidth without the cost of servers used primarily for transferring data from peripheral networks (e.g. SCSI) to client networks (e.g. ethernet). Increasing dataset sizes, new attachment technologies, the convergence of peripheral and interprocessor switched networks, and the increased availability of on-drive transistors motivate and enable this new architecture. NASD is based on four main principles: direct transfer to clients, secure interfaces via cryptographic support, asynchronous non-critical-path oversight, and variably-sized data objects. Measurements of our prototype system show that these services can be cost-effectively integrated into a next-generation disk drive ASIC. End-to-end measurements of our prototype drive and filesystems suggest that NASD can support conventional distributed filesystems without performance degradation. More importantly, we show scalable bandwidth for NASD-specialized filesystems. Using a parallel data mining application, NASD drives deliver a linear scaling of 6.2 MB/s per client-drive pair, tested with up to eight pairs in our lab.
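To illustrate the direct-transfer and cryptographic-capability principles, here is a minimal Python sketch. It is not the paper's wire protocol; the class names (FileManager, NasdDrive), the shared key, and the HMAC-based capability format are all illustrative assumptions. The point it shows is that the file manager stays off the data path: it only signs capabilities, and clients then read from the drive directly.

```python
import hmac, hashlib

# Illustrative sketch of NASD-style capability-based access (hypothetical names,
# not the paper's actual protocol). The file manager handles policy off the
# critical path; the drive checks capabilities and serves data to clients directly.

DRIVE_KEY = b"shared-secret-between-manager-and-drive"  # assumed key distribution

class FileManager:
    """Grants signed capabilities; never touches the data path."""
    def grant_capability(self, object_id: int, rights: str) -> dict:
        token = f"{object_id}:{rights}".encode()
        mac = hmac.new(DRIVE_KEY, token, hashlib.sha256).hexdigest()
        return {"object_id": object_id, "rights": rights, "mac": mac}

class NasdDrive:
    """Stores variably-sized objects and verifies capabilities itself."""
    def __init__(self):
        self.objects = {7: b"hello from the drive"}

    def read(self, cap: dict) -> bytes:
        token = f"{cap['object_id']}:{cap['rights']}".encode()
        expected = hmac.new(DRIVE_KEY, token, hashlib.sha256).hexdigest()
        if not hmac.compare_digest(expected, cap["mac"]) or "r" not in cap["rights"]:
            raise PermissionError("invalid capability")
        return self.objects[cap["object_id"]]

manager, drive = FileManager(), NasdDrive()
cap = manager.grant_capability(object_id=7, rights="r")  # control path
print(drive.read(cap))                                   # data path: client <-> drive
```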

424 citations

Proceedings ArticleDOI
01 Jun 1997
TL;DR: An analytic model and replay experiments suggest that NetSCSI can reduce file server load during a burst of NFS or AFS activity by about 30%, and that with the NASD architecture, server load can be reduced by a factor of up to five for AFS and up to ten for NFS.
Abstract: By providing direct data transfer between storage and client, network-attached storage devices have the potential to improve scalability for existing distributed file systems (by removing the server as a bottleneck) and bandwidth for new parallel and distributed file systems (through network striping and more efficient data paths). Together, these advantages influence a large enough fraction of the storage market to make commodity network-attached storage feasible. Realizing the technology's full potential requires careful consideration across a wide range of file system, networking and security issues. This paper contrasts two network-attached storage architectures---(1) Networked SCSI disks (NetSCSI) are network-attached storage devices with minimal changes from the familiar SCSI interface, while (2) Network-Attached Secure Disks (NASD) are drives that support independent client access to drive object services. To estimate the potential performance benefits of these architectures, we develop an analytic model and perform trace-driven replay experiments based on AFS and NFS traces. Our results suggest that NetSCSI can reduce file server load during a burst of NFS or AFS activity by about 30%. With the NASD architecture, server load (during burst activity) can be reduced by a factor of up to five for AFS and up to ten for NFS.
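A back-of-the-envelope sketch of the intuition behind the analytic comparison, with made-up per-request costs rather than the paper's model: the server's work per request splits into control overhead and data movement, NetSCSI removes one store-and-forward copy, and NASD removes the server from the data path entirely.

```python
# Toy server-load model (hypothetical costs, not the paper's analytic model).
CONTROL_COST = 1.0   # assumed server CPU units of control work per request
COPY_COST = 2.0      # assumed cost of moving the data through the server once

def server_load(requests: int, copies_through_server: int) -> float:
    """Total server work when each request pushes `copies_through_server` copies."""
    return requests * (CONTROL_COST + copies_through_server * COPY_COST)

n = 1000
sad = server_load(n, copies_through_server=2)      # classic server-attached disks
netscsi = server_load(n, copies_through_server=1)  # disk sends data over the network itself
nasd = server_load(n, copies_through_server=0)     # data bypasses the server entirely

print(f"NetSCSI load relative to SAD: {netscsi / sad:.2f}")
print(f"NASD load relative to SAD:    {nasd / sad:.2f}")
```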

312 citations

Journal ArticleDOI
TL;DR: This work proposes using an active disk storage device that combines on-drive processing and memory with software downloadability to allow disks to execute application-level functions directly at the device.
Abstract: As processor performance increases and memory cost decreases, system intelligence continues to move away from the CPU and into peripherals. Storage system designers use this trend toward excess computing power to perform more complex processing and optimizations inside storage devices. To date, such optimizations take place at relatively low levels of the storage protocol. Trends in storage density, mechanics, and electronics eliminate the hardware bottleneck and put pressure on interconnects and hosts to move data more efficiently. We propose using an active disk storage device that combines on-drive processing and memory with software downloadability to allow disks to execute application-level functions directly at the device. Moving portions of an application's processing to a storage device significantly reduces data traffic and leverages the parallelism already present in large systems, dramatically reducing the execution time for many basic data mining tasks.
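As a sketch of the active-disk idea in the abstract (the ActiveDisk class and its execute method are hypothetical, not a real drive API): the host downloads a small application-level function to the drive, and only the matching records cross the interconnect instead of the whole dataset.

```python
from typing import Callable, Iterable

class ActiveDisk:
    """Hypothetical active disk: runs a downloaded filter on its own CPU/memory."""
    def __init__(self, records: Iterable[bytes]):
        self.records = list(records)

    def execute(self, downloaded_fn: Callable[[bytes], bool]) -> list[bytes]:
        # The application-level function executes at the device, so only
        # matching records are shipped back to the host.
        return [r for r in self.records if downloaded_fn(r)]

# Host side: instead of reading 1,000,000 records, download a filter and
# receive only the few that match.
disk = ActiveDisk(f"record-{i}".encode() for i in range(1_000_000))
matches = disk.execute(lambda rec: rec.endswith(b"-424242"))
print(len(matches), "record(s) shipped to the host instead of 1,000,000")
```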

254 citations

Journal ArticleDOI
TL;DR: This paper explores ways of reducing FP power consumption by minimizing the bitwidth representation of FP data, showing that up to a 66% reduction in multiplier energy/operation can be achieved in the FP unit by this bitwidth reduction technique without sacrificing any program accuracy.
Abstract: Low-power systems often find the power cost of floating-point (FP) hardware prohibitively expensive. This paper explores ways of reducing FP power consumption by minimizing the bitwidth representation of FP data. Analysis of several FP programs that manipulate low-resolution human sensory data shows that these programs suffer no loss of accuracy even with a significant reduction in bitwidth. Most FP programs in our benchmark suite maintain the same output even when the mantissa bitwidth is reduced by half. This FP bitwidth reduction can deliver a significant power saving through the use of a variable bitwidth FP unit. Our results show that up to 66% reduction in multiplier energy/operation can be achieved in the FP unit by this bitwidth reduction technique without sacrificing any program accuracy.
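A minimal software emulation of the mantissa-bitwidth reduction described above (the paper proposes a variable-bitwidth FP unit in hardware; this sketch only mimics the numerical effect on IEEE-754 single precision by zeroing the low mantissa bits):

```python
import struct

def truncate_mantissa(x: float, bits: int) -> float:
    """Keep only the top `bits` of the 23-bit float32 mantissa, zeroing the rest."""
    raw = struct.unpack(">I", struct.pack(">f", x))[0]      # float32 bit pattern
    mask = ~((1 << (23 - bits)) - 1) & 0xFFFFFFFF           # clear low mantissa bits
    return struct.unpack(">f", struct.pack(">I", raw & mask))[0]

# Halving the mantissa (23 -> 11 bits) illustrates the kind of reduction the
# paper reports as harmless for low-resolution sensory data.
x = 0.123456789
print(x, truncate_mantissa(x, 11))
```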

168 citations

Proceedings ArticleDOI
22 Oct 2000
TL;DR: Freeblock scheduling, as described in this paper, is a new approach to utilizing more of a disk's potential media bandwidth: rotational latency periods are filled with useful media transfers, with no effect on foreground response times.
Abstract: Freeblock scheduling is a new approach to utilizing more of a disk's potential media bandwidth. By filling rotational latency periods with useful media transfers, 20-50% of a never-idle disk's bandwidth can often be provided to background applications with no effect on foreground response times. This paper describes freeblock scheduling and demonstrates its value with simulation studies of two concrete applications: segment cleaning and data mining. Free segment cleaning often allows an LFS file system to maintain its ideal write performance when cleaning overheads would otherwise reduce performance by up to a factor of three. Free data mining can achieve over 47 full disk scans per day on an active transaction processing system, with no effect on its disk performance.
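A toy sketch of the freeblock-scheduling decision (a simplified model with assumed timing constants, not the paper's scheduler): during the rotational latency before the next foreground request's target sector arrives under the head, background blocks are transferred only if they fit in that window, so foreground response time is unchanged.

```python
ROTATIONAL_LATENCY_MS = 4.0          # assumed idle time until the target sector arrives
BACKGROUND_XFER_MS_PER_BLOCK = 0.5   # assumed cost per background block transferred

def free_blocks_available(rotational_latency_ms: float) -> int:
    """How many background blocks fit into the otherwise-wasted rotation time?"""
    return int(rotational_latency_ms // BACKGROUND_XFER_MS_PER_BLOCK)

pending_background = 100
serviced = min(pending_background, free_blocks_available(ROTATIONAL_LATENCY_MS))
print(f"{serviced} background blocks transferred for free during this rotation")
```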

166 citations


Cited by
Journal ArticleDOI
Jeffrey Dean, Sanjay Ghemawat
06 Dec 2004
TL;DR: This paper presents MapReduce, a programming model and an associated implementation for processing and generating large data sets, which runs on a large cluster of commodity machines and is highly scalable.
Abstract: MapReduce is a programming model and an associated implementation for processing and generating large data sets. Users specify a map function that processes a key/value pair to generate a set of intermediate key/value pairs, and a reduce function that merges all intermediate values associated with the same intermediate key. Many real world tasks are expressible in this model, as shown in the paper. Programs written in this functional style are automatically parallelized and executed on a large cluster of commodity machines. The run-time system takes care of the details of partitioning the input data, scheduling the program's execution across a set of machines, handling machine failures, and managing the required inter-machine communication. This allows programmers without any experience with parallel and distributed systems to easily utilize the resources of a large distributed system. Our implementation of MapReduce runs on a large cluster of commodity machines and is highly scalable: a typical MapReduce computation processes many terabytes of data on thousands of machines. Programmers find the system easy to use: hundreds of MapReduce programs have been implemented and upwards of one thousand MapReduce jobs are executed on Google's clusters every day.
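A minimal single-process sketch of the programming model described in the abstract, using the classic word-count example. Only the user-visible map and reduce functions reflect the model; the grouping step stands in for the shuffle that the real runtime performs across many machines.

```python
from collections import defaultdict
from itertools import chain

def map_fn(document: str):
    """User-supplied map: emit intermediate key/value pairs."""
    for word in document.split():
        yield word, 1

def reduce_fn(word: str, counts: list) -> int:
    """User-supplied reduce: merge all values associated with one key."""
    return sum(counts)

documents = ["the quick brown fox", "the lazy dog", "the fox"]

# "Shuffle": group intermediate values by key, as the runtime system would do
# across machines before invoking the reduce tasks.
grouped = defaultdict(list)
for key, value in chain.from_iterable(map_fn(d) for d in documents):
    grouped[key].append(value)

result = {word: reduce_fn(word, counts) for word, counts in grouped.items()}
print(result)   # {'the': 3, 'quick': 1, 'brown': 1, 'fox': 2, 'lazy': 1, 'dog': 1}
```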

20,309 citations

Journal ArticleDOI
Jeffrey Dean, Sanjay Ghemawat
TL;DR: This presentation explains how the underlying runtime system automatically parallelizes the computation across large-scale clusters of machines, handles machine failures, and schedules inter-machine communication to make efficient use of the network and disks.
Abstract: MapReduce is a programming model and an associated implementation for processing and generating large datasets that is amenable to a broad variety of real-world tasks. Users specify the computation in terms of a map and a reduce function, and the underlying runtime system automatically parallelizes the computation across large-scale clusters of machines, handles machine failures, and schedules inter-machine communication to make efficient use of the network and disks. Programmers find the system easy to use: more than ten thousand distinct MapReduce programs have been implemented internally at Google over the past four years, and an average of one hundred thousand MapReduce jobs are executed on Google's clusters every day, processing a total of more than twenty petabytes of data per day.

17,663 citations

Journal ArticleDOI
19 Oct 2003
TL;DR: This paper presents file system interface extensions designed to support distributed applications, discusses many aspects of the design, and reports measurements from both micro-benchmarks and real world use.
Abstract: We have designed and implemented the Google File System, a scalable distributed file system for large distributed data-intensive applications. It provides fault tolerance while running on inexpensive commodity hardware, and it delivers high aggregate performance to a large number of clients. While sharing many of the same goals as previous distributed file systems, our design has been driven by observations of our application workloads and technological environment, both current and anticipated, that reflect a marked departure from some earlier file system assumptions. This has led us to reexamine traditional choices and explore radically different design points. The file system has successfully met our storage needs. It is widely deployed within Google as the storage platform for the generation and processing of data used by our service as well as research and development efforts that require large data sets. The largest cluster to date provides hundreds of terabytes of storage across thousands of disks on over a thousand machines, and it is concurrently accessed by hundreds of clients. In this paper, we present file system interface extensions designed to support distributed applications, discuss many aspects of our design, and report measurements from both micro-benchmarks and real world use.
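A simplified illustration of the division of labor the abstract implies (not GFS's actual protocol; the server names, chunk size, and replica count here are assumptions for the sketch): files are split into large fixed-size chunks replicated across many chunkservers, and the metadata that maps chunks to locations is small enough to keep separate from the bulk data path.

```python
import random

CHUNK_SIZE = 64 * 1024 * 1024                          # assumed large fixed-size chunks
CHUNKSERVERS = [f"chunkserver-{i}" for i in range(1000)]
REPLICAS = 3                                           # assumed replication factor

def place_chunks(file_name: str, file_size: int) -> dict:
    """Metadata only: map each chunk index of a file to its replica locations."""
    n_chunks = (file_size + CHUNK_SIZE - 1) // CHUNK_SIZE
    return {i: random.sample(CHUNKSERVERS, REPLICAS) for i in range(n_chunks)}

# A 10 TB file spreads across the whole cluster; clients fetch chunk data from
# the chunkservers while only the small placement map is centrally managed.
layout = place_chunks("webcrawl.dat", file_size=10 * 2**40)
print(len(layout), "chunks spread over", len(CHUNKSERVERS), "chunkservers")
```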

5,429 citations

Proceedings ArticleDOI
06 Nov 2006
TL;DR: Performance measurements under a variety of workloads show that Ceph has excellent I/O performance and scalable metadata management, supporting more than 250,000 metadata operations per second.
Abstract: We have developed Ceph, a distributed file system that provides excellent performance, reliability, and scalability. Ceph maximizes the separation between data and metadata management by replacing allocation tables with a pseudo-random data distribution function (CRUSH) designed for heterogeneous and dynamic clusters of unreliable object storage devices (OSDs). We leverage device intelligence by distributing data replication, failure detection and recovery to semi-autonomous OSDs running a specialized local object file system. A dynamic distributed metadata cluster provides extremely efficient metadata management and seamlessly adapts to a wide range of general purpose and scientific computing file system workloads. Performance measurements under a variety of workloads show that Ceph has excellent I/O performance and scalable metadata management, supporting more than 250,000 metadata operations per second.
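A toy stand-in for the idea behind CRUSH as described above, not the real algorithm: replica locations are computed from the object name by a deterministic pseudo-random function, so no allocation table needs to be stored, replicated, or consulted. The OSD names and hashing scheme are illustrative assumptions.

```python
import hashlib

OSDS = [f"osd.{i}" for i in range(12)]   # hypothetical cluster of object storage devices

def place(object_name: str, replicas: int = 3) -> list:
    """Deterministically map an object to `replicas` distinct OSDs via hashing."""
    chosen, i = [], 0
    while len(chosen) < replicas:
        digest = hashlib.sha256(f"{object_name}:{i}".encode()).hexdigest()
        osd = OSDS[int(digest, 16) % len(OSDS)]
        if osd not in chosen:            # reject duplicates and retry with a new draw
            chosen.append(osd)
        i += 1
    return chosen

# Any client or OSD can recompute the same placement with no table lookup.
print(place("object-1234-0000"))
```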

1,621 citations

Patent
21 Dec 1998
TL;DR: In this article, the data is divided into segments and each segment is distributed randomly on one of several storage units, independent of the storage units on which other segments of the media data are stored.
Abstract: Multiple applications request data from multiple storage units over a computer network. The data is divided into segments and each segment is distributed randomly on one of several storage units, independent of the storage units on which other segments of the media data are stored. At least one additional copy of each segment also is distributed randomly over the storage units, such that each segment is stored on at least two storage units. This random distribution of multiple copies of segments of data improves both scalability and reliability. When an application requests a selected segment of data, the request is processed by the storage unit with the shortest queue of requests. Random fluctuations in the load applied by multiple applications on multiple storage units are balanced nearly equally over all of the storage units. This combination of techniques results in a system which can transfer multiple, independent high-bandwidth streams of data in a scalable manner in both directions between multiple applications and multiple storage units.
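A sketch of the scheme the abstract describes (illustrative, not the patent's implementation; unit names and queue structures are assumptions): each segment is placed on two randomly chosen storage units, and a read is directed to whichever replica holder currently has the shorter request queue, which balances load across units.

```python
import random
from collections import deque

UNITS = {name: deque() for name in ("unit-A", "unit-B", "unit-C", "unit-D")}

def distribute(num_segments: int, copies: int = 2) -> dict:
    """Place each segment on `copies` distinct, randomly chosen storage units."""
    return {seg: random.sample(list(UNITS), copies) for seg in range(num_segments)}

def request(segment: int, placement: dict) -> str:
    """Send the request to the replica holder with the shortest queue."""
    unit = min(placement[segment], key=lambda u: len(UNITS[u]))
    UNITS[unit].append(segment)
    return unit

placement = distribute(1000)
for seg in random.sample(range(1000), 50):
    request(seg, placement)
print({u: len(q) for u, q in UNITS.items()})   # queue lengths stay roughly balanced
```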

1,427 citations