Author

Frank Dabek

Other affiliations: Google

Bio: Frank Dabek is an academic researcher from the Massachusetts Institute of Technology. The author has contributed to research on topics including distributed hash tables and Chord (peer-to-peer). The author has an h-index of 17 and has co-authored 18 publications that have received 9,287 citations. Previous affiliations of Frank Dabek include Google.

Papers

Journal Article•DOI•

Chord: a scalable peer-to-peer lookup protocol for Internet applications

Ion Stoica¹, Robert Morris², David Liben-Nowell², David R. Karger², M. Frans Kaashoek², Frank Dabek², Hari Balakrishnan²

University of California, Berkeley¹, Massachusetts Institute of Technology²

01 Feb 2003, IEEE/ACM Transactions on Networking

TL;DR: Results from theoretical analysis and simulations show that Chord is scalable: Communication cost and the state maintained by each node scale logarithmically with the number of Chord nodes.

Abstract: A fundamental problem that confronts peer-to-peer applications is the efficient location of the node that stores a desired data item. This paper presents Chord, a distributed lookup protocol that addresses this problem. Chord provides support for just one operation: given a key, it maps the key onto a node. Data location can be easily implemented on top of Chord by associating a key with each data item, and storing the key/data pair at the node to which the key maps. Chord adapts efficiently as nodes join and leave the system, and can answer queries even if the system is continuously changing. Results from theoretical analysis and simulations show that Chord is scalable: Communication cost and the state maintained by each node scale logarithmically with the number of Chord nodes.
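
The single operation named in the abstract, mapping a key onto a node, can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes the successor rule of consistent hashing: a key belongs to the first node whose identifier is equal to or follows the key's identifier on a circular identifier space. It also resolves the answer from a global sorted list rather than routing in a logarithmic number of hops as Chord does. The names `chord_id`, `successor`, and `ID_BITS` are hypothetical.

```python
import hashlib
from bisect import bisect_left

ID_BITS = 16  # size of the identifier space in bits (illustrative assumption)


def chord_id(name: str, bits: int = ID_BITS) -> int:
    """Hash a node address or key name onto the circular identifier space."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** bits)


def successor(node_ids, key_id):
    """Return the first node identifier equal to or following key_id.

    Wraps around to the smallest identifier when key_id is past the last node.
    """
    ring = sorted(node_ids)
    index = bisect_left(ring, key_id)
    return ring[index % len(ring)]


# Usage: three nodes on the ring; each key maps to its successor node.
nodes = [10, 50, 200]
owner = successor(nodes, 60)  # the node with identifier 200 stores this key
```

Storing a key/data pair at `successor(nodes, chord_id(key))` gives the data-location layer the abstract describes.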

3,518 citations

Proceedings Article•DOI•

Wide-area cooperative storage with CFS

Frank Dabek¹, M. Frans Kaashoek¹, David R. Karger¹, Robert Morris¹, Ion Stoica²

Massachusetts Institute of Technology¹, University of California, Berkeley²

21 Oct 2001

TL;DR: The Cooperative File System is a new peer-to-peer read-only storage system that provides provable guarantees for the efficiency, robustness, and load-balance of file storage and retrieval with a completely decentralized architecture that can scale to large systems.

Abstract: The Cooperative File System (CFS) is a new peer-to-peer read-only storage system that provides provable guarantees for the efficiency, robustness, and load-balance of file storage and retrieval. CFS does this with a completely decentralized architecture that can scale to large systems. CFS servers provide a distributed hash table (DHash) for block storage. CFS clients interpret DHash blocks as a file system. DHash distributes and caches blocks at a fine granularity to achieve load balance, uses replication for robustness, and decreases latency with server selection. DHash finds blocks using the Chord location protocol, which operates in time logarithmic in the number of servers. CFS is implemented using the SFS file system toolkit and runs on Linux, OpenBSD, and FreeBSD. Experience on a globally deployed prototype shows that CFS delivers data to clients as fast as FTP. Controlled tests show that CFS is scalable: with 4,096 servers, looking up a block of data involves contacting only seven servers. The tests also demonstrate nearly perfect robustness and unimpaired performance even when as many as half the servers fail.

1,733 citations

Proceedings Article•DOI•

Vivaldi: a decentralized network coordinate system

Frank Dabek¹, Russ Cox¹, M. Frans Kaashoek¹, Robert Morris¹

Massachusetts Institute of Technology¹

30 Aug 2004

TL;DR: Vivaldi is a simple, light-weight algorithm that assigns synthetic coordinates to hosts such that the distance between the coordinates of two hosts accurately predicts the communication latency between the hosts.

Abstract: Large-scale Internet applications can benefit from an ability to predict round-trip times to other hosts without having to contact them first. Explicit measurements are often unattractive because the cost of measurement can outweigh the benefits of exploiting proximity information. Vivaldi is a simple, light-weight algorithm that assigns synthetic coordinates to hosts such that the distance between the coordinates of two hosts accurately predicts the communication latency between the hosts. Vivaldi is fully distributed, requiring no fixed network infrastructure and no distinguished hosts. It is also efficient: a new host can compute good coordinates for itself after collecting latency information from only a few other hosts. Because it requires little communication, Vivaldi can piggy-back on the communication patterns of the application using it and scale to a large number of hosts. An evaluation of Vivaldi using a simulated network whose latencies are based on measurements among 1740 Internet hosts shows that a 2-dimensional Euclidean model with height vectors embeds these hosts with low error (the median relative error in round-trip time prediction is 11 percent).
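
How a coordinate with a height predicts latency can be illustrated with a short sketch. It assumes the height-vector distance is the Euclidean distance between the two planar coordinates plus both hosts' heights. The `relative_error` definition (absolute error divided by the measured round-trip time) is an assumption made for illustration, and the function names are hypothetical. The sketch covers prediction only, not the algorithm that computes the coordinates.

```python
import math


def predicted_rtt(host_a, host_b):
    """Predict round-trip time from two (coordinate, height) pairs.

    Each host is ((x, y), height). The prediction is the Euclidean distance
    between the 2-dimensional coordinates plus both heights.
    """
    (ax, ay), a_height = host_a
    (bx, by), b_height = host_b
    return math.hypot(ax - bx, ay - by) + a_height + b_height


def relative_error(predicted, measured):
    """Absolute prediction error as a fraction of the measured RTT (assumed definition)."""
    return abs(predicted - measured) / measured


# Usage: two hosts 5 units apart in the plane, with heights 5 and 2.
estimate = predicted_rtt(((0.0, 0.0), 5.0), ((3.0, 4.0), 2.0))
error = relative_error(estimate, 10.0)
```

Because the prediction needs only the two hosts' stored coordinates, a host can rank candidate peers by expected latency without contacting them first.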

1,233 citations

Book Chapter•DOI•

Towards a Common API for Structured Peer-to-Peer Overlays

Frank Dabek¹, Ben Y. Zhao², Peter Druschel³, John Kubiatowicz², Ion Stoica²

Massachusetts Institute of Technology¹, University of California², Rice University³

21 Feb 2003

TL;DR: Describes an ongoing effort to define common APIs for structured peer-to-peer overlays and the key abstractions that can be built on them, with the aim of facilitating independent innovation in overlay protocols, services, and applications, allowing direct experimental comparisons, and encouraging application development by third parties.

Abstract: In this paper, we describe an ongoing effort to define common APIs for structured peer-to-peer overlays and the key abstractions that can be built on them. In doing so, we hope to facilitate independent innovation in overlay protocols, services, and applications, to allow direct experimental comparisons, and to encourage application development by third parties. We provide a snapshot of our efforts and discuss open problems in an effort to solicit feedback from the research community.

578 citations

Proceedings Article•DOI•

Large-scale incremental processing using distributed transactions and notifications

Daniel Peng¹, Frank Dabek¹

Google¹

04 Oct 2010

TL;DR: Percolator, a system for incrementally processing updates to a large data set, was built and deployed to create the Google web search index; the resulting indexing system processes the same number of documents per day while reducing the average age of documents in Google search results by 50%.

Abstract: Updating an index of the web as documents are crawled requires continuously transforming a large repository of existing documents as new documents arrive. This task is one example of a class of data processing tasks that transform a large repository of data via small, independent mutations. These tasks lie in a gap between the capabilities of existing infrastructure. Databases do not meet the storage or throughput requirements of these tasks: Google's indexing system stores tens of petabytes of data and processes billions of updates per day on thousands of machines. MapReduce and other batch-processing systems cannot process small updates individually as they rely on creating large batches for efficiency. We have built Percolator, a system for incrementally processing updates to a large data set, and deployed it to create the Google web search index. By replacing a batch-based indexing system with an indexing system based on incremental processing using Percolator, we process the same number of documents per day, while reducing the average age of documents in Google search results by 50%.

530 citations

Cited by

Proceedings Article•DOI•

Chord: A scalable peer-to-peer lookup service for internet applications

Ion Stoica¹, Robert Morris², David R. Karger², M. Frans Kaashoek², Hari Balakrishnan²

University of California, Berkeley¹, Massachusetts Institute of Technology²

27 Aug 2001

TL;DR: Results from theoretical analysis, simulations, and experiments show that Chord is scalable, with communication cost and the state maintained by each node scaling logarithmically with the number of Chord nodes.

Abstract: A fundamental problem that confronts peer-to-peer applications is to efficiently locate the node that stores a particular data item. This paper presents Chord, a distributed lookup protocol that addresses this problem. Chord provides support for just one operation: given a key, it maps the key onto a node. Data location can be easily implemented on top of Chord by associating a key with each data item, and storing the key/data item pair at the node to which the key maps. Chord adapts efficiently as nodes join and leave the system, and can answer queries even if the system is continuously changing. Results from theoretical analysis, simulations, and experiments show that Chord is scalable, with communication cost and the state maintained by each node scaling logarithmically with the number of Chord nodes.

10,286 citations

Book Chapter•DOI•

Pastry: Scalable, Decentralized Object Location, and Routing for Large-Scale Peer-to-Peer Systems

Antony Rowstron¹, Peter Druschel²

Microsoft¹, Rice University²

12 Nov 2001, Lecture Notes in Computer Science

TL;DR: Pastry is a scalable, distributed object location and routing substrate for wide-area peer-to-peer applications that performs application-level routing and object location in a potentially very large overlay network of nodes connected via the Internet.

Abstract: This paper presents the design and evaluation of Pastry, a scalable, distributed object location and routing substrate for wide-area peer-to-peer applications. Pastry performs application-level routing and object location in a potentially very large overlay network of nodes connected via the Internet. It can be used to support a variety of peer-to-peer applications, including global data storage, data sharing, group communication and naming. Each node in the Pastry network has a unique identifier (nodeId). When presented with a message and a key, a Pastry node efficiently routes the message to the node with a nodeId that is numerically closest to the key, among all currently live Pastry nodes. Each Pastry node keeps track of its immediate neighbors in the nodeId space, and notifies applications of new node arrivals, node failures and recoveries. Pastry takes into account network locality; it seeks to minimize the distance messages travel, according to a scalar proximity metric like the number of IP routing hops. Pastry is completely decentralized, scalable, and self-organizing; it automatically adapts to the arrival, departure and failure of nodes. Experimental results obtained with a prototype implementation on an emulated network of up to 100,000 nodes confirm Pastry's scalability and efficiency, its ability to self-organize and adapt to node failures, and its good network locality properties.

7,423 citations

Book Chapter•DOI•

The Sybil Attack

John R. Douceur¹

Microsoft¹

07 Mar 2002

TL;DR: It is shown that, without a logically centralized authority, Sybil attacks are always possible except under extreme and unrealistic assumptions of resource parity and coordination among entities.

Abstract: Large-scale peer-to-peer systems face security threats from faulty or hostile remote computing elements. To resist these threats, many such systems employ redundancy. However, if a single faulty entity can present multiple identities, it can control a substantial fraction of the system, thereby undermining this redundancy. One approach to preventing these "Sybil attacks" is to have a trusted agency certify identities. This paper shows that, without a logically centralized authority, Sybil attacks are always possible except under extreme and unrealistic assumptions of resource parity and coordination among entities.

4,816 citations

Proceedings Article•

Resilient distributed datasets: a fault-tolerant abstraction for in-memory cluster computing

Matei Zaharia¹, Mosharaf Chowdhury¹, Tathagata Das¹, Ankur Dave¹, Justin Ma¹, Murphy McCauley¹, Michael J. Franklin¹, Scott Shenker¹, Ion Stoica¹

University of California, Berkeley¹

25 Apr 2012

TL;DR: Presents Resilient Distributed Datasets (RDDs), a distributed memory abstraction that lets programmers perform in-memory computations on large clusters in a fault-tolerant manner; RDDs are implemented in a system called Spark, which is evaluated through a variety of user applications and benchmarks.

Abstract: We present Resilient Distributed Datasets (RDDs), a distributed memory abstraction that lets programmers perform in-memory computations on large clusters in a fault-tolerant manner. RDDs are motivated by two types of applications that current computing frameworks handle inefficiently: iterative algorithms and interactive data mining tools. In both cases, keeping data in memory can improve performance by an order of magnitude. To achieve fault tolerance efficiently, RDDs provide a restricted form of shared memory, based on coarse-grained transformations rather than fine-grained updates to shared state. However, we show that RDDs are expressive enough to capture a wide class of computations, including recent specialized programming models for iterative jobs, such as Pregel, and new applications that these models do not capture. We have implemented RDDs in a system called Spark, which we evaluate through a variety of user applications and benchmarks.

4,151 citations

Report•DOI•

Tor: the second-generation onion router

Roger Dingledine, Nick Mathewson, Paul Syverson¹

United States Naval Research Laboratory¹

13 Aug 2004

TL;DR: This second-generation Onion Routing system addresses limitations in the original design by adding perfect forward secrecy, congestion control, directory servers, integrity checking, configurable exit policies, and a practical design for location-hidden services via rendezvous points.

Abstract: We present Tor, a circuit-based low-latency anonymous communication service. This second-generation Onion Routing system addresses limitations in the original design by adding perfect forward secrecy, congestion control, directory servers, integrity checking, configurable exit policies, and a practical design for location-hidden services via rendezvous points. Tor works on the real-world Internet, requires no special privileges or kernel modifications, requires little synchronization or coordination between nodes, and provides a reasonable tradeoff between anonymity, usability, and efficiency. We briefly describe our experiences with an international network of more than 30 nodes. We close with a list of open problems in anonymous communication.

3,960 citations
