Author
Frank B. Schmuck
Bio: Frank B. Schmuck is a researcher at IBM. His work centers on topics such as file systems and stub files. He has an h-index of 40 and has co-authored 120 publications receiving 5,981 citations.
Papers
IBM
TL;DR: GPFS is IBM's parallel, shared-disk file system for cluster computers, available on the RS/6000 SP parallel supercomputer and on Linux clusters; this paper describes GPFS and discusses how distributed locking and recovery techniques were extended to scale to large clusters.
Abstract: GPFS is IBM's parallel, shared-disk file system for cluster computers, available on the RS/6000 SP parallel supercomputer and on Linux clusters. GPFS is used on many of the largest supercomputers in the world. GPFS was built on many of the ideas that were developed in the academic community over the last several years, particularly distributed locking and recovery technology. To date it has been a matter of conjecture how well these ideas scale. We have had the opportunity to test those limits in the context of a product that runs on the largest systems in existence. While in many cases existing ideas scaled well, new approaches were necessary in many key areas. This paper describes GPFS, and discusses how distributed locking and recovery techniques were extended to scale to large clusters.
1,434 citations
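The distributed locking described above is token based: a node asks a central token manager for a byte-range token once, and can then grant further locks within that range locally, so lock traffic does not grow with every file operation. Below is a minimal Python sketch of that idea under invented names (TokenServer, Node); it is an illustration, not GPFS code.

```python
class TokenServer:
    """Grants byte-range tokens; a real token manager would also persist state for recovery."""
    def __init__(self):
        self.grants = []  # (holder_node, start, end)

    def acquire(self, node, start, end):
        # Revoke overlapping tokens held by other nodes. In a real system this is an
        # asynchronous revoke message, and the holder flushes dirty data before yielding.
        kept = []
        for holder, s, e in self.grants:
            if holder is not node and s < end and start < e:
                holder.tokens.remove((s, e))
            else:
                kept.append((holder, s, e))
        self.grants = kept
        self.grants.append((node, start, end))
        return (start, end)


class Node:
    """A cluster node caches the byte-range tokens it holds and locks locally under them."""
    def __init__(self, name, server):
        self.name, self.server, self.tokens = name, server, []

    def write(self, start, end):
        if not any(s <= start and end <= e for (s, e) in self.tokens):
            self.tokens.append(self.server.acquire(self, start, end))
        # ... the actual write proceeds under a purely local lock on (start, end) ...


server = TokenServer()
a, b = Node("a", server), Node("b", server)
a.write(0, 4096)      # one round trip to the token server
a.write(1024, 2048)   # covered by the cached token: no server traffic
b.write(0, 512)       # conflicting range: the server revokes node a's token
```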
IBM
TL;DR: In this article, a first snapshot of a first set of source files in a file system is generated; stored in each inode are a first identifier associated with the first set of files and a second identifier associated with the time of the first snapshot.
Abstract: A system, method and computer readable medium for providing a snapshot of a subset of a file system. A first snapshot of a first set of source files in a file system is generated. The first snapshot includes an inode corresponding to each source file in the first set of files. Stored in each inode is a first identifier associated with the first set of files and a second identifier associated with the time of the first snapshot. Next, a second snapshot of a second set of source files is taken. The second snapshot includes an inode corresponding to each source file in the second set of files. Stored in each inode are a first identifier and a second identifier. Subsequent snapshots are taken every first period and every second period for the first set of files and the second set of files, respectively.
283 citations
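A minimal sketch of the two-identifier scheme in the abstract above: each inode touched by a snapshot records which set of source files it belongs to and when that set was snapshotted, so different file sets can be snapshotted on independent schedules. The names (Inode, snapshot) are hypothetical, not the patented implementation.

```python
import time
from dataclasses import dataclass

@dataclass
class Inode:
    path: str
    snap_set_id: int | None = None   # first identifier: which set of source files
    snap_time: float | None = None   # second identifier: time of that set's snapshot

def snapshot(file_set_id: int, inodes: list[Inode]) -> list[Inode]:
    """Snapshot one set of source files by stamping both identifiers into each inode."""
    now = time.time()
    snap_inodes = []
    for ino in inodes:
        ino.snap_set_id = file_set_id
        ino.snap_time = now
        snap_inodes.append(Inode(ino.path, file_set_id, now))  # inode owned by the snapshot
    return snap_inodes

# Two independent file sets; in the abstract each set is re-snapshotted on its own period,
# here each is simply snapshotted once for illustration.
set1 = [Inode("/data/a"), Inode("/data/b")]
set2 = [Inode("/logs/x")]
snap1, snap2 = snapshot(1, set1), snapshot(2, set2)
```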
IBM
TL;DR: In this paper, a computer system is described in which a shared-disk file system runs on multiple computers, each with its own instance of an operating system, coupled for parallel data-sharing access to files residing on network-attached shared disks.
Abstract: A computer system having a shared disk file system running on multiple computers each having their own instance of an operating system and being coupled for parallel data sharing access to files residing on network attached shared disks. A metadata node manages file metadata for parallel read and write actions. Metadata tokens are used for controlled access to the metadata and initial selection and changing of the metadata node.
237 citations
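The abstract above designates one node per file as the metadata node and uses tokens to select and change that node. The Python sketch below illustrates the idea under invented names (MetadataNodeRegistry, ClusterNode); it is a simplified illustration, not the patented design.

```python
class MetadataNodeRegistry:
    """Hands out the per-file metanode token; first requester wins, and the role can move."""
    def __init__(self):
        self.metanode = {}           # file -> node currently holding the metanode token

    def metanode_for(self, file, requesting_node):
        return self.metanode.setdefault(file, requesting_node)

    def release(self, file, node):
        if self.metanode.get(file) is node:
            del self.metanode[file]  # the next requester becomes the new metadata node


class ClusterNode:
    def __init__(self, name, registry):
        self.name, self.registry = name, registry
        self.pending = {}            # file -> locally accumulated metadata (here: file size)

    def write(self, file, new_size):
        metanode = self.registry.metanode_for(file, self)
        if metanode is self:
            self.pending[file] = max(self.pending.get(file, 0), new_size)
        else:
            # Other writers forward their metadata updates to the current metadata node.
            metanode.pending[file] = max(metanode.pending.get(file, 0), new_size)


reg = MetadataNodeRegistry()
n1, n2 = ClusterNode("n1", reg), ClusterNode("n2", reg)
n1.write("/shared/f", 4096)   # n1 becomes the metadata node for /shared/f
n2.write("/shared/f", 8192)   # n2 forwards its size update to n1
```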
IBM
TL;DR: In this article, the authors describe a parallel file system for a shared-disk environment that uses a scalable directory service, improvements to caching, and cache-performance balance pools for multiple accesses; a metadata node manages file metadata, and locking techniques reduce the overhead of a token manager, which is also used in file system recovery if a computer participating in the management of the shared disks becomes unavailable or fails.
Abstract: A computer system having a shared parallel disk file system running on a network of multiple computers, each with its own instance of an operating system, and with a protocol that makes disks appear to be locally attached to each file system. This parallel file system in a shared-disk environment uses a scalable directory service, improvements to caching, and cache-performance balance pools for multiple accesses. A metadata node manages file metadata, and locking techniques reduce the overhead of a token manager, which is also used in file system recovery if a computer participating in the management of shared disks becomes unavailable or fails. Synchronous and asynchronous takeover of a metadata node occurs to correct metadata that was under modification and to appoint a new computer node as the metadata node for that file. Locks are not constantly required to allocate new blocks on behalf of a user. Hash buckets are used, and each hash bucket is stored in a sparse file at an offset of i*s, where i is the hash bucket number and s is the hash bucket size. A directory starts out as an empty file whose size increases as records are inserted until it needs to be split; upon the first split an additional bucket is written, increasing the file size from s to 2*s. A lookup operation computes the hash value of the key being looked up and a hash tree depth equal to log2(file size / hash bucket size); the same computations are performed for an insert operation.
227 citations
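The hash-bucket layout in the abstract above is concrete enough to sketch directly: bucket i lives at byte offset i*s in a sparse directory file, and the hash tree depth is log2(file size / bucket size). The helpers below use hypothetical names and an assumed bucket size; a real file system would also use a stable on-disk hash function rather than Python's built-in hash().

```python
import math

BUCKET_SIZE = 4096          # s: size of one hash bucket (an assumed value)

def bucket_offset(i: int, s: int = BUCKET_SIZE) -> int:
    """Bucket i is stored in the sparse directory file at offset i*s."""
    return i * s

def hash_tree_depth(file_size: int, s: int = BUCKET_SIZE) -> int:
    """Depth = log2(file_size / bucket_size); a directory of one bucket has depth 0."""
    return 0 if file_size <= s else int(math.log2(file_size // s))

def lookup_bucket(key: str, file_size: int, s: int = BUCKET_SIZE) -> int:
    """Pick the bucket for a key: use the low `depth` bits of the key's hash value."""
    depth = hash_tree_depth(file_size, s)
    return hash(key) & ((1 << depth) - 1)

# After the first split the file grows from s to 2*s and lookups use one hash bit;
# after the next split, two bits, and so on.
for size in (BUCKET_SIZE, 2 * BUCKET_SIZE, 4 * BUCKET_SIZE):
    print(size, hash_tree_depth(size), bucket_offset(lookup_bucket("foo", size)))
```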
IBM
TL;DR: In this paper, a system, method and computer readable medium for deferring copy-on-write of a snapshot is disclosed, which includes the generation of a snapshot of a source file.
Abstract: A system, method and computer readable medium for deferring copy-on-write of a snapshot is disclosed. The method includes the generation of a snapshot of a source file. Upon modification of a first data block referenced by the source file, the first data block is referenced by the snapshot and a second data block is allocated for the source file. Then, a first variable associated with the source file is set to a value indicating an incomplete source file data block and a second variable associated with the source file is set to a value indicating the valid portion of the second data block. Any portion of the second data block that is overwritten is considered valid. The second data block is then modified and the second variable is changed to reflect the modification. Upon reception of a read request, the corresponding portion of the second data block is retrieved.
224 citations
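A minimal sketch of the deferred copy-on-write idea from the abstract above: on the first write to a snapshotted block, the old block is handed to the snapshot, a new block is allocated for the source file, and two variables record that the new block is incomplete and which portion of it is valid; reads fall back to the snapshot's copy for anything not yet overwritten. The class below is an invented illustration, not the patented implementation.

```python
class DeferredCowBlock:
    def __init__(self, old_data: bytes):
        self.snapshot_block = old_data                # first block, now owned by the snapshot
        self.new_block = bytearray(len(old_data))     # second block, allocated for the source
        self.incomplete = True                        # first variable: block not yet fully copied
        self.valid = set()                            # second variable: valid offsets in new block

    def write(self, offset: int, data: bytes):
        self.new_block[offset:offset + len(data)] = data
        self.valid.update(range(offset, offset + len(data)))
        if len(self.valid) == len(self.new_block):
            self.incomplete = False                   # whole block copied; no more fallback needed

    def read(self, offset: int, length: int) -> bytes:
        # Serve each byte from the new block where valid, otherwise from the snapshot's copy.
        return bytes(
            self.new_block[i] if (not self.incomplete or i in self.valid) else self.snapshot_block[i]
            for i in range(offset, offset + length)
        )


blk = DeferredCowBlock(b"old old old old!")
blk.write(0, b"NEW")
assert blk.read(0, 7) == b"NEW old"   # overwritten portion from the new block, rest deferred
```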
Cited by
19 Oct 2003
TL;DR: This paper presents file system interface extensions designed to support distributed applications, discusses many aspects of the design, and reports measurements from both micro-benchmarks and real world use.
Abstract: We have designed and implemented the Google File System, a scalable distributed file system for large distributed data-intensive applications. It provides fault tolerance while running on inexpensive commodity hardware, and it delivers high aggregate performance to a large number of clients. While sharing many of the same goals as previous distributed file systems, our design has been driven by observations of our application workloads and technological environment, both current and anticipated, that reflect a marked departure from some earlier file system assumptions. This has led us to reexamine traditional choices and explore radically different design points. The file system has successfully met our storage needs. It is widely deployed within Google as the storage platform for the generation and processing of data used by our service as well as research and development efforts that require large data sets. The largest cluster to date provides hundreds of terabytes of storage across thousands of disks on over a thousand machines, and it is concurrently accessed by hundreds of clients. In this paper, we present file system interface extensions designed to support distributed applications, discuss many aspects of our design, and report measurements from both micro-benchmarks and real world use.
5,429 citations
17 Aug 2008
TL;DR: This paper shows how to leverage largely commodity Ethernet switches to support the full aggregate bandwidth of clusters consisting of tens of thousands of elements and argues that appropriately architected and interconnected commodity switches may deliver more performance at less cost than available from today's higher-end solutions.
Abstract: Today's data centers may contain tens of thousands of computers with significant aggregate bandwidth requirements. The network architecture typically consists of a tree of routing and switching elements with progressively more specialized and expensive equipment moving up the network hierarchy. Unfortunately, even when deploying the highest-end IP switches/routers, resulting topologies may only support 50% of the aggregate bandwidth available at the edge of the network, while still incurring tremendous cost. Non-uniform bandwidth among data center nodes complicates application design and limits overall system performance. In this paper, we show how to leverage largely commodity Ethernet switches to support the full aggregate bandwidth of clusters consisting of tens of thousands of elements. Similar to how clusters of commodity computers have largely replaced more specialized SMPs and MPPs, we argue that appropriately architected and interconnected commodity switches may deliver more performance at less cost than available from today's higher-end solutions. Our approach requires no modifications to the end host network interface, operating system, or applications; critically, it is fully backward compatible with Ethernet, IP, and TCP.
3,549 citations
06 Nov 2006
TL;DR: Performance measurements under a variety of workloads show that Ceph has excellent I/O performance and scalable metadata management, supporting more than 250,000 metadata operations per second.
Abstract: We have developed Ceph, a distributed file system that provides excellent performance, reliability, and scalability. Ceph maximizes the separation between data and metadata management by replacing allocation tables with a pseudo-random data distribution function (CRUSH) designed for heterogeneous and dynamic clusters of unreliable object storage devices (OSDs). We leverage device intelligence by distributing data replication, failure detection and recovery to semi-autonomous OSDs running a specialized local object file system. A dynamic distributed metadata cluster provides extremely efficient metadata management and seamlessly adapts to a wide range of general purpose and scientific computing file system workloads. Performance measurements under a variety of workloads show that Ceph has excellent I/O performance and scalable metadata management, supporting more than 250,000 metadata operations per second.
1,621 citations
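The pseudo-random data distribution function mentioned in the abstract above lets any client compute where an object lives without consulting an allocation table. The sketch below illustrates that property using simple rendezvous (highest-random-weight) hashing; it is not the actual CRUSH algorithm, and all names are assumptions.

```python
import hashlib

def placement(object_name: str, osds: list[str], replicas: int = 3) -> list[str]:
    """Deterministically map an object to `replicas` OSDs by hashing (object, osd) pairs."""
    def score(osd: str) -> int:
        digest = hashlib.sha256(f"{object_name}:{osd}".encode()).digest()
        return int.from_bytes(digest[:8], "big")
    # Every client computes the same ranking, so no central lookup table is needed.
    return sorted(osds, key=score, reverse=True)[:replicas]

osds = [f"osd.{i}" for i in range(8)]
print(placement("volume1/object42", osds))   # same answer on every client
```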
11 Jan 2011
TL;DR: In this article, an intelligent automated assistant system engages with the user in an integrated, conversational manner using natural language dialog, and invokes external services when appropriate to obtain information or perform various actions.
Abstract: An intelligent automated assistant system engages with the user in an integrated, conversational manner using natural language dialog, and invokes external services when appropriate to obtain information or perform various actions. The system can be implemented using any of a number of different platforms, such as the web, email, smartphone, and the like, or any combination thereof. In one embodiment, the system is based on sets of interrelated domains and tasks, and employs additional functionality powered by external services with which the system can interact.
1,462 citations