Proceedings Article

Optimum algorithms for two random sampling problems

Jeffrey Scott Vitter
- pp 65-75
TLDR
Several fast new algorithms are presented for sampling n records at random from a file containing N records. One problem deals with sampling when N is known, and the other considers the case when N is unknown.
Abstract
Several fast new algorithms are presented for sampling n records at random from a file containing N records. The first problem we solve deals with sampling when N is known, and the second problem considers the case when N is unknown. The two main results in this paper are Algorithms D and Z. Algorithm D solves the first problem by doing the sampling with a small constant amount of space and in O(n) time, on the average; roughly n uniform random variates are generated, and approximately n exponentiation operations are performed during the sampling. The sample is selected sequentially and online; it answers an open problem in [Knuth 81]. Algorithm Z solves the second problem by doing the sampling using O(n) space, roughly n ln(N/n) uniform random variates and O(n(1 + log(N/n))) time, on the average. Both algorithms are time- and space-optimum and are short and easy to implement.
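
To make the two problems concrete, the following is a minimal Python sketch of the classical baseline methods, not of Algorithms D and Z themselves. It assumes the standard selection-sampling scan for known N and the standard reservoir scan for unknown N; the function names `selection_sample` and `reservoir_sample` are illustrative and do not come from the paper. Both baselines draw about one random variate per record examined, whereas the abstract states that Algorithm D needs roughly n variates and Algorithm Z roughly n ln(N/n).

```python
import random
from typing import Iterable, List, Optional, Sequence, TypeVar

T = TypeVar("T")


def selection_sample(records: Sequence[T], n: int,
                     rng: Optional[random.Random] = None) -> List[T]:
    """Known N: one sequential pass, output in file order.

    Each record is chosen with probability
    (records still needed) / (records still unread).
    """
    rng = rng or random.Random()
    total = len(records)  # N is known up front
    if not 0 <= n <= total:
        raise ValueError("need 0 <= n <= N")
    sample: List[T] = []
    needed = n
    for index, record in enumerate(records):
        remaining = total - index  # unread records, including this one
        if rng.random() * remaining < needed:
            sample.append(record)
            needed -= 1
            if needed == 0:
                break  # sample is complete; stop reading
    return sample


def reservoir_sample(stream: Iterable[T], n: int,
                     rng: Optional[random.Random] = None) -> List[T]:
    """Unknown N: keep a reservoir of n candidates in O(n) space.

    The t-th record (t > n) replaces a uniformly chosen reservoir
    slot with probability n / t.
    """
    rng = rng or random.Random()
    reservoir: List[T] = []
    for seen, record in enumerate(stream, start=1):
        if seen <= n:
            reservoir.append(record)  # the first n records fill the reservoir
        else:
            slot = rng.randrange(seen)  # uniform integer in [0, seen)
            if slot < n:
                reservoir[slot] = record
    return reservoir
```

For example, `selection_sample(range(1000), 10)` returns ten distinct values in increasing order, and `reservoir_sample(iter(range(1000)), 10)` returns ten distinct values without ever needing the length of the stream. The savings claimed in the abstract come from avoiding the per-record variate that both loops above generate.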


Citations
Journal Article

Random sampling with a reservoir

TL;DR: Theoretical and empirical results indicate that Algorithm Z outperforms current methods by a significant margin, and an efficient Pascal-like implementation is given that incorporates these modifications and that is suitable for general use.
Book Chapter

Frequency Estimation of Internet Packet Streams with Limited Space

TL;DR: In this article, the authors consider a router on the Internet analyzing the statistical properties of a TCP/IP packet stream and present an algorithm that deterministically finds (in particular) all categories having a frequency above 1/(m+1) using m counters, which is best possible in the worst case.
Journal Article

New applications of random sampling in computational geometry

TL;DR: This paper gives several new demonstrations of the usefulness of random sampling techniques in computational geometry by creating a search structure for arrangements of hyperplanes by sampling the hyperplanes and using information from the resulting arrangement to divide and conquer.
Journal Article

Faster methods for random sampling

TL;DR: The main result of this paper is the design and analysis of Algorithm D, which does the sampling in O(n) time, on the average; roughly n uniform random variates are generated, and approximately n exponentiation operations are performed during the sampling.
Patent

Method and apparatus determining and using hash functions and hash values

David Cossock
TL;DR: In this paper, the hash functions are created using linear arithmetic and 4-byte machine register operations and thus can be created very quickly, and can be used to determine and use nearly uniform independent hash functions.
References
Journal Article

Random sampling with a reservoir

TL;DR: Theoretical and empirical results indicate that Algorithm Z outperforms current methods by a significant margin, and an efficient Pascal-like implementation is given that incorporates these modifications and that is suitable for general use.
Journal Article

Development of Sampling Plans by Using Sequential (Item by Item) Selection Techniques and Digital Computers

TL;DR: Sequential selection techniques are presented which can be utilized to determine items for inclusion in samples as if these items were selected by classical, non-sequential, sampling plans.
Journal Article

Faster methods for random sampling

TL;DR: The main result of this paper is the design and analysis of Algorithm D, which does the sampling in O(n) time, on the average; roughly n uniform random variates are generated, and approximately n exponentiation operations are performed during the sampling.
Journal Article

The Design and Analysis of BucketSort for Bubble Memory Secondary Storage

TL;DR: A hypothetical Bucket-Sort implementation that uses bubble memory is described and a new software marking technique is introduced that reduces the effective time for an associative search.
Journal Article

A note on sampling a tape-file

TL;DR: A means of obtaining a random sample of n integers r1, r2, …, rn selected from the N integers 1, 2, …, N is available.