Author

Burton H. Bloom

Bio: Burton H. Bloom is an academic researcher. The author has contributed to research in topics: Primary clustering & SWIFFT. The author has an h-index of 1 and has co-authored 1 publication receiving 6,998 citations.

Papers
Journal ArticleDOI
Space/Time Trade-offs in Hash Coding with Allowable Errors
TL;DR: Analysis of the paradigm problem demonstrates that allowing a small number of test messages to be falsely identified as members of the given set will permit a much smaller hash area to be used without increasing reject time.
Abstract: In this paper trade-offs among certain computational factors in hash coding are analyzed. The paradigm problem considered is that of testing a series of messages one-by-one for membership in a given set of messages. Two new hash-coding methods are examined and compared with a particular conventional hash-coding method. The computational factors considered are the size of the hash area (space), the time required to identify a message as a nonmember of the given set (reject time), and an allowable error frequency. The new methods are intended to reduce the amount of space required to contain the hash-coded information from that associated with conventional methods. The reduction in space is accomplished by exploiting the possibility that a small fraction of errors of commission may be tolerable in some applications, in particular, applications in which a large amount of data is involved and a core-resident hash area is consequently not feasible using conventional methods. In such applications, it is envisaged that overall performance could be improved by using a smaller core-resident hash area in conjunction with the new methods and, when necessary, by using some secondary and perhaps time-consuming test to “catch” the small fraction of errors associated with the new methods. An example is discussed which illustrates possible areas of application for the new methods. Analysis of the paradigm problem demonstrates that allowing a small number of test messages to be falsely identified as members of the given set will permit a much smaller hash area to be used without increasing reject time.
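What this abstract describes is the structure now known as a Bloom filter: the paper's second method stores the whole set in a single bit array addressed by several independent hash transformations of each message. The sketch below is a minimal illustration of that idea, not the paper's own code; the class name, the salted-SHA-256 derivation of bit positions, and the parameter values are assumptions for the example. With n messages stored in m bits under k hash functions, the expected error-of-commission (false positive) rate is approximately (1 - e^(-kn/m))^k, which is the space/error trade-off the analysis quantifies; membership tests never produce false negatives.

```python
# Minimal Bloom filter sketch (illustrative; hash choice and names are
# assumptions, not from the paper). "Space" is the m-bit hash area;
# "reject time" is the k probes needed to rule a message out.
import hashlib

class BloomFilter:
    def __init__(self, m_bits: int, k_hashes: int):
        self.m = m_bits                       # size of the hash area in bits
        self.k = k_hashes                     # hash transformations per message
        self.bits = bytearray((m_bits + 7) // 8)

    def _positions(self, message: bytes):
        # Derive k pseudo-independent bit positions from salted SHA-256.
        for salt in range(self.k):
            digest = hashlib.sha256(salt.to_bytes(4, "big") + message).digest()
            yield int.from_bytes(digest, "big") % self.m

    def add(self, message: bytes) -> None:
        for pos in self._positions(message):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, message: bytes) -> bool:
        # False: definitely a nonmember (fast reject, no secondary test needed).
        # True: member, or a tolerated error of commission (false positive).
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(message))

bf = BloomFilter(m_bits=8 * 1024, k_hashes=5)
bf.add(b"member message")
assert bf.might_contain(b"member message")      # never a false negative
print(bf.might_contain(b"other message"))       # usually False, rarely True
```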

7,390 citations


Cited by
Proceedings Article
Bigtable: A Distributed Storage System for Structured Data
01 Jan 2006
TL;DR: Bigtable is a distributed storage system for managing structured data, designed to scale to a very large size: petabytes of data across thousands of commodity servers; many projects at Google, including web indexing, Google Earth, and Google Finance, store data in it.
Abstract: Bigtable is a distributed storage system for managing structured data that is designed to scale to a very large size: petabytes of data across thousands of commodity servers. Many projects at Google store data in Bigtable, including web indexing, Google Earth, and Google Finance. These applications place very different demands on Bigtable, both in terms of data size (from URLs to web pages to satellite imagery) and latency requirements (from backend bulk processing to real-time data serving). Despite these varied demands, Bigtable has successfully provided a flexible, high-performance solution for all of these Google products. In this article, we describe the simple data model provided by Bigtable, which gives clients dynamic control over data layout and format, and we describe the design and implementation of Bigtable.
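The "simple data model" mentioned above is, in the paper's own terms, a sparse, distributed, persistent multidimensional sorted map indexed by (row key, column key, timestamp). The toy single-process sketch below illustrates just that map shape, echoing the paper's webtable example; the class and method names are hypothetical, and real Bigtable adds column families, tablets, SSTables, and garbage-collection policies.

```python
# Toy sketch of Bigtable's data model: (row, column, timestamp) -> bytes,
# with multiple timestamped versions per cell, newest first. Illustrative
# only; omits column families, tablets, and distribution entirely.
import time

class ToyTable:
    def __init__(self):
        # (row key, column key) -> list of (timestamp, value), newest first
        self._cells = {}

    def put(self, row: str, column: str, value: bytes, ts: float | None = None):
        versions = self._cells.setdefault((row, column), [])
        versions.append((time.time() if ts is None else ts, value))
        versions.sort(key=lambda tv: tv[0], reverse=True)

    def get(self, row: str, column: str, n_versions: int = 1):
        return self._cells.get((row, column), [])[:n_versions]

t = ToyTable()
# Row keys are reversed URLs in the paper's webtable example.
t.put("com.cnn.www", "contents:", b"<html>...</html>")
t.put("com.cnn.www", "anchor:cnnsi.com", b"CNN")
print(t.get("com.cnn.www", "contents:"))
```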

4,843 citations

Epidemic Routing for Partially-Connected Ad Hoc Networks
Amin Vahdat
01 Jan 2000
TL;DR: This work introduces Epidemic Routing, in which random pair-wise exchanges of messages among mobile hosts ensure eventual message delivery; it achieves eventual delivery of 100% of messages with reasonable aggregate resource consumption in a number of interesting scenarios.
Abstract: Mobile ad hoc routing protocols allow nodes with wireless adaptors to communicate with one another without any pre-existing network infrastructure. Existing ad hoc routing protocols, while robust to rapidly changing network topology, assume the presence of a connected path from source to destination. Given power limitations, the advent of short-range wireless networks, and the wide physical conditions over which ad hoc networks must be deployed, in some scenarios it is likely that this assumption is invalid. In this work, we develop techniques to deliver messages in the case where there is never a connected path from source to destination or when a network partition exists at the time a message is originated. To this end, we introduce Epidemic Routing, where random pair-wise exchanges of messages among mobile hosts ensure eventual message delivery. The goals of Epidemic Routing are to: i) maximize message delivery rate, ii) minimize message latency, and iii) minimize the total resources consumed in message delivery. Through an implementation in the Monarch simulator, we show that Epidemic Routing achieves eventual delivery of 100% of messages with reasonable aggregate resource consumption in a number of interesting scenarios.
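The mechanism is simple enough to show as a toy simulation: a copy of the message spreads on each random pair-wise encounter, so it eventually reaches the destination even though no connected source-to-destination path ever exists. Everything below (host count, uniform encounter model, function names) is an illustrative assumption, not the paper's Monarch-based evaluation.

```python
# Toy Epidemic Routing simulation: on each random encounter, a host that
# carries the message infects one that does not. Illustrative assumptions:
# 30 hosts, uniformly random pair-wise meetings, a single message.
import random

def encounters_until_delivery(n_hosts: int = 30,
                              max_encounters: int = 10_000,
                              seed: int = 1):
    rng = random.Random(seed)
    carrying = {0}                                # host 0 originates the message
    destination = n_hosts - 1
    for encounter in range(1, max_encounters + 1):
        a, b = rng.sample(range(n_hosts), 2)      # random pair-wise meeting
        if (a in carrying) != (b in carrying):    # exactly one side has it
            carrying.update((a, b))               # exchange: both now carry it
        if destination in carrying:
            return encounter
    return None                                   # undelivered within the budget

print("delivered after", encounters_until_delivery(), "encounters")
```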

4,355 citations

Journal ArticleDOI
OceanStore: An Architecture for Global-Scale Persistent Storage
12 Nov 2000
TL;DR: OceanStore monitors usage patterns to adapt to regional outages and denial-of-service attacks; monitoring also enhances performance through pro-active movement of data.
Abstract: OceanStore is a utility infrastructure designed to span the globe and provide continuous access to persistent information. Since this infrastructure is comprised of untrusted servers, data is protected through redundancy and cryptographic techniques. To improve performance, data is allowed to be cached anywhere, anytime. Additionally, monitoring of usage patterns allows adaptation to regional outages and denial of service attacks; monitoring also enhances performance through pro-active movement of data. A prototype implementation is currently under development.
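One way to make "redundancy and cryptographic techniques" on untrusted servers concrete is content-addressed, self-verifying storage: store (already encrypted) data on several replicas under the hash of its bytes, so a reader can detect any tampered copy. The sketch below is a loose illustration under that assumption; OceanStore's real mechanisms (erasure coding, versioning, Byzantine update protocols) go well beyond it, and every name here is hypothetical.

```python
# Self-verifying replicated storage sketch: the object ID is the SHA-256
# of the ciphertext, so a copy from any untrusted replica can be checked.
import hashlib

servers = [dict() for _ in range(3)]        # three untrusted replicas

def put(ciphertext: bytes) -> str:
    object_id = hashlib.sha256(ciphertext).hexdigest()
    for replica in servers:                 # redundancy: write everywhere
        replica[object_id] = ciphertext
    return object_id

def get(object_id: str) -> bytes:
    for replica in servers:                 # any intact replica suffices
        data = replica.get(object_id)
        # A tampered or missing copy fails the hash check and is skipped.
        if data is not None and hashlib.sha256(data).hexdigest() == object_id:
            return data
    raise KeyError("no intact replica holds " + object_id)

oid = put(b"client-side encrypted bytes")
assert get(oid) == b"client-side encrypted bytes"
```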

3,376 citations

Journal ArticleDOI
Bigtable: A Distributed Storage System for Structured Data
TL;DR: The simple data model provided by Bigtable is described, which gives clients dynamic control over data layout and format, and the design and implementation of Bigtable are described.
Abstract: Bigtable is a distributed storage system for managing structured data that is designed to scale to a very large size: petabytes of data across thousands of commodity servers. Many projects at Google store data in Bigtable, including web indexing, Google Earth, and Google Finance. These applications place very different demands on Bigtable, both in terms of data size (from URLs to web pages to satellite imagery) and latency requirements (from backend bulk processing to real-time data serving). Despite these varied demands, Bigtable has successfully provided a flexible, high-performance solution for all of these Google products. In this article, we describe the simple data model provided by Bigtable, which gives clients dynamic control over data layout and format, and we describe the design and implementation of Bigtable.

3,259 citations

Book
The SPIN Model Checker: Primer and Reference Manual
01 May 2011
TL;DR: The SPIN Model Checker is used both for teaching software verification techniques and for validating large-scale applications; it has been estimated that up to three-quarters of the $400 billion spent annually to hire programmers in the United States is ultimately spent on debugging.
Abstract: The SPIN Model Checker is used both for teaching software verification techniques and for validating large-scale applications. The growing number of users has created a need for a more comprehensive user guide and a standard reference manual that describes the most recent version of the tool. This book fills that need. SPIN is used in over 40 countries. The official SPIN web site, spinroot.com, receives between 2,500 and 3,000 hits per day. It has been estimated that up to three-quarters of the $400 billion spent annually to hire programmers in the United States is ultimately spent on debugging.

2,530 citations