Journal ArticleDOI

Average case analysis of heap building by repeated insertion

02 Jan 1991-Journal of Algorithms (Academic Press, Inc.)-Vol. 12, Iss: 1, pp 126-153
TL;DR: It is shown that the average number of swaps required to construct a heap on n keys by Williams’ method of repeated insertion is (ω + o(1))n, where the constant ω is about 1.3.
About: This article was published in the Journal of Algorithms on 1991-01-02 and is currently open access. It has received 19 citations to date. The article focuses on the topic: Heap (data structure).
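
The result concerns Williams’ original construction: keys are inserted one at a time, and each new key is sifted up toward the root until the heap order is restored, with every exchange counting as one swap. The short Python sketch below is not from the paper; it assumes a 1-indexed array and a min-heap convention purely for illustration, and it lets one estimate the swap count empirically.

    import random

    def williams_build(keys):
        """Build a binary heap by repeated insertion (Williams' method) and
        count the swaps performed by the sift-up steps.
        Illustrative sketch: 1-indexed array, min-heap convention."""
        heap = [None]              # index 0 unused; node i has parent i // 2
        swaps = 0
        for key in keys:
            heap.append(key)       # the new key starts at the next leaf position
            i = len(heap) - 1
            # Sift up: swap with the parent while the new key is smaller.
            while i > 1 and heap[i] < heap[i // 2]:
                heap[i], heap[i // 2] = heap[i // 2], heap[i]
                swaps += 1
                i //= 2
        return heap, swaps

    if __name__ == "__main__":
        n = 100_000
        _, swaps = williams_build([random.random() for _ in range(n)])
        # The paper's result: for random inputs, swaps / n tends on average
        # to a constant of roughly 1.3.
        print(f"n = {n}, swaps/n = {swaps / n:.3f}")

For uniformly random keys, the ratio swaps/n printed above should concentrate near the constant ω ≈ 1.3 established in the paper.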

Summary (1 min read)

1 Introduction

  • The heap is a much used and much studied data structure (for example, see Knuth [K]).
  • The key observation of this section, Observation 2.1, is the cornerstone of their analysis.

2 Qualitative results for expected times

  • The authors give some results which will be used later to establish bounds on ω.
  • Another is to assume that the keys are n independent random variables, each uniformly distributed on the interval [0,1].
  • The authors have now completed part (a) of the proof of Theorem 1.3.
  • Let R(n) denote the rank of the nth key among the first n keys.

5 Probability bounds

  • In this section the authors shall prove part (c) of Theorem 1.3.
  • The authors follow roughly the route taken in section 2 for investigating E[Wn].
  • Finally, the authors may use the half of the lemma already proved to handle the sum above, and one further application of inequality (5.1) to handle the last term.

6 Repeated insertion into equi-probable heaps

  • The authors consider the simplified approximation of the average case behaviour of Williams’ heap construction in which they repeatedly insert a uniform random key into a uniform random heap.
  • The main result of this section is the following.
  • The proof of part (c) will then follow from Hoeffding’s inequality (5.1).
  • Proof: Continuing to mimic previous notation, the authors define $\tilde{x}_k$ as the average of the expected insertion costs along the kth level $L_k$, namely $\sum_{t=2^k}^{2^{k+1}-1} \tilde{I}(t)/2^k$. (See also Lemma 2.6.)

7 Concluding remarks

  • The authors have described fairly precisely the average case behaviour of Williams’ method of constructing a heap by repeated insertion.
  • The most interesting related open problem is the average case analysis of heapsort (with any of its trickledown variants).
  • While some empirical data (e.g. see [K]) and partial theoretical results (e.g. [C]) are known, the problem of determining average numbers of comparisons is open.
  • The following series of tables shows the average expected values of the levels of a heap built by Williams’ algorithm, for perfect heaps of sizes 3 to 4095.


Citations
05 Mar 2013
TL;DR: For many applications, a randomized algorithm is either the simplest or the fastest algorithm available, and sometimes both. As discussed by the authors, this book introduces the basic concepts in the design and analysis of randomized algorithms and provides a comprehensive and representative selection of the algorithms that might be used in each area to which they can be applied.
Abstract: For many applications, a randomized algorithm is either the simplest or the fastest algorithm available, and sometimes both. This book introduces the basic concepts in the design and analysis of randomized algorithms. The first part of the text presents basic tools such as probability theory and probabilistic analysis that are frequently used in algorithmic applications. Algorithmic examples are also given to illustrate the use of each tool in a concrete setting. In the second part of the book, each chapter focuses on an important area to which randomized algorithms can be applied, providing a comprehensive and representative selection of the algorithms that might be used in each of these areas. Although written primarily as a text for advanced undergraduates and graduate students, this book should also prove invaluable as a reference for professionals and researchers.

785 citations

Book ChapterDOI
TL;DR: The experiments show that for the problem of searching many exact patterns in a fixed input string, the lazy top-down construction is often faster and more space efficient than other methods.
Abstract: We present an efficient implementation of a write-only top-down construction for suffix trees. Our implementation is based on a new, space-efficient representation of suffix trees which requires only 12 bytes per input character in the worst case, and 8.5 bytes per input character on average for a collection of files of different type. We show how to efficiently implement the lazy evaluation of suffix trees such that a subtree is not evaluated before it is traversed for the first time. Our experiments show that for the problem of searching many exact patterns in a fixed input string, the lazy top-down construction is often faster and more space efficient than other methods.

104 citations

Book ChapterDOI
TL;DR: Analysis of the behaviour of three methods for constructing a binary heap shows that, under reasonable assumptions, repeated insertion and layerwise construction both incur at most cN/B cache misses, whereas repeated merging, as programmed by Floyd, can incur more than (dN log2 B)/B cache misses.
Abstract: The behaviour of three methods for constructing a binary heap is studied. The methods considered are the original one proposed by Williams [1964], in which elements are repeatedly inserted into a single heap; the improvement by Floyd [1964], in which small heaps are repeatedly merged to bigger heaps; and a recent method proposed, e. g., by Fadel et al. [1999] in which a heap is built layerwise. Both the worst-case number of instructions and that of cache misses are analysed. It is well-known that Floyd's method has the best instruction count. Let N denote the size of the heap to be constructed, B the number of elements that fit into a cache line, and let c and d be some positive constants. Our analysis shows that, under reasonable assumptions, repeated insertion and layerwise construction both incur at most cN/B cache misses, whereas repeated merging, as programmed by Floyd, can incur more than (dN log2 B)/B cache misses. However, for a memory-tuned version of repeated merging the number of cache misses incurred is close to the optimal bound N/B.

17 citations


Cites background from "Average case analysis of heap build..."

  • ...There are two reasons for this: (1) In the average case the number of instructions executed by Williams' program is linear [Hayward and McDiarmid 1991], which is guaranteed in the worst case by Floyd's program....

    [...]

Book
24 Sep 1999
TL;DR: Experiments with List Ranking for Explicit Multi-Threaded Instruction Parallelism and Evaluation of an Algorithm for the Transversal Hypergraph Problem.
Abstract: Invited Lectures.- Selecting Problems for Algorithm Evaluation.- BSP Algorithms - "Write Once, Run Anywhere".- Ten Years of LEDA: Some Thoughts.- Contributed Papers.- Computing the K Shortest Paths: A New Algorithm and an Experimental Comparison.- Efficient Implementation of Lazy Suffix Trees.- Experiments with List Ranking for Explicit Multi-Threaded (XMT) Instruction Parallelism.- Finding Minimum Congestion Spanning Trees.- Evaluation of an Algorithm for the Transversal Hypergraph Problem.- Construction Heuristics and Domination Analysis for the Asymmetric TSP.- Counting in Mobile Networks: Theory and Experimentation.- Dijkstra's Algorithm On-Line: An Empirical Case Study from Public Railroad Transport.- Implementation and Experimental Evaluation of Graph Connectivity Algorithms Using LEDA.- On-Line Zone Construction in Arrangements of Lines in the Plane.- The Design and Implementation of Planar Maps in CGAL.- An Easy to Use Implementation of Linear Perturbations within Cupgal.- Analysing Cache Effects in Distribution Sorting.- Fast Regular Expression Search.- An Experimental Evaluation of Hybrid Data Structures for Searching.- LEDA-SM: Extending LEDA to Secondary Memory.- A Priority Queue Transform.- Implementation Issues and Experimental Study of a Wavelength Routing Algorithm for Irregular All-Optical Networks.- Estimating Large Distances in Phylogenetic Reconstruction.- The Performance of Concurrent Red-Black Tree Algorithms.- Performance Engineering Case Study: Heap Construction.- A Fast and Simple Local Search for Graph Coloring.- BALL: Biochemical Algorithms Library.- An Experimental Study of Priority Queues in External Memory.

17 citations

Book ChapterDOI
01 Jan 2002
TL;DR: A number of general (not restricting to special subsequences) asymptotic results are presented that give insight on the difficulties encountered in the asymptotic study of the number of heaps of a given size and of the cost of heap construction.
Abstract: Heaps constitute a well-known data structure allowing the implementation of an efficient O(n log n) sorting algorithm as well as the design of fast priority queues. Although heaps have been known for long, their combinatorial properties are still partially worked out: exact summation formulae have been stated, but most of the asymptotic behaviors are still unknown. In this paper, we present a number of general (not restricting to special subsequences) asymptotic results that give insight on the difficulties encountered in the asymptotic study of the number of heaps of a given size and of the cost of heap construction. In particular, we exhibit the influence of arithmetic functions in the apparently chaotic behavior of these quantities and study their extremal and average properties. It is also shown that the distribution function of the cost of heap construction using Floyd’s algorithm and other variants is asymptotically normal.

14 citations


Cites background or methods from "Average case analysis of heap build..."

  • ...In particular, this rule applies to the heap construction algorithms in [3, 19, 12, 29], the basic ideas of improvement being more or less due to Floyd....

    [...]

  • ...The average case analysis of its behavior is more difficult; see [2, 8, 12]....

    [...]

References
Book
01 Jan 1968

17,939 citations


"Average case analysis of heap build..." refers methods in this paper

  • ...Knuth [K] (see also Doberkat [Do]) shows that the expected number of comparisons for the basic method is about 1....

    [...]

  • ...The heap is a much used and much studied data structure (for example, see Knuth [K])....

    [...]

  • ...Floyd’s method of heap building (see Floyd [F] or Knuth [K]) involves repeatedly merging small heaps to form bigger heaps....

    [...]

  • ...see [K]) and partial theoretical results (e....

    [...]

Book ChapterDOI
TL;DR: In this article, upper bounds are derived for the probability that the sum S of n independent random variables exceeds its mean ES by a positive number nt; these bounds are then used to obtain analogous inequalities for certain sums of dependent random variables such as U statistics.
Abstract: Upper bounds are derived for the probability that the sum S of n independent random variables exceeds its mean ES by a positive number nt. It is assumed that the range of each summand of S is bounded or bounded above. The bounds for Pr {S – ES ≥ nt} depend only on the endpoints of the ranges of the summands and the mean, or the mean and the variance of S. These results are then used to obtain analogous inequalities for certain sums of dependent random variables such as U statistics and the sum of a random sample without replacement from a finite population.
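
For reference, a standard statement of the bound sketched in this abstract, which is of the type the paper invokes as inequality (5.1) (the exact form and labelling used in the paper may differ), is:

    % Hoeffding's inequality (standard form; stated here for reference).
    % X_1, ..., X_n independent with a_i <= X_i <= b_i, and S = X_1 + ... + X_n.
    \[
      \Pr\{\, S - \mathrm{E}[S] \ge nt \,\}
      \;\le\;
      \exp\!\left( - \frac{2 n^{2} t^{2}}{\sum_{i=1}^{n} (b_i - a_i)^{2}} \right),
      \qquad t > 0 .
    \]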

8,655 citations

Book
01 Jan 1973
TL;DR: The first revision of this third volume is a survey of classical computer techniques for sorting and searching that extends the treatment of data structures to consider both large and small databases and internal and external memories.
Abstract: The first revision of this third volume is a survey of classical computer techniques for sorting and searching. It extends the treatment of data structures in Volume 1 to consider both large and small databases and internal and external memories.

1,716 citations

Frequently Asked Questions (10)
Q1. What are the contributions mentioned in the paper "Average case analysis of heap building by repeated insertion" ?

The heap is a much used and much studied data structure ( for example, see Knuth [ K ] ). 

By storing the probability arrays P(n, i, ·) only for nodes i on IP+(n), it is possible to compute E[Wn] in only O(n lg n) space but O(n³ lg n) time.

Observe that during the insertion of some key with R(n) = k, a swap takes place at node i of the insertion path if and only if k ≤ A_{n−1}[i].

When bubbling up the nodes in L_k, the expected number of swaps along the two links incident with the root is $\sum_{j=2^k}^{2^{k+1}-1} 1/j > \ln 2$.
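
A one-line justification of the stated lower bound, included here for completeness: each term 1/j exceeds the integral of 1/x over [j, j+1], so

    \[
      \sum_{j=2^{k}}^{2^{k+1}-1} \frac{1}{j}
      \;>\; \int_{2^{k}}^{2^{k+1}} \frac{dx}{x}
      \;=\; \ln 2 .
    \]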

The rank of a number x in a set of numbers is its placement in the ordered set; thus the smallest number has rank 1, the next smallest rank 2, etc. Let A_n[i] be the rank of A[i] among A[1, ..., n] after exactly n keys have been inserted.

Heap size 31
level   avg. exp. val.
0       0.031250000000000
1       0.092036756202444
2       0.205202826266408
3       0.401927398975604
4       0.703027874420290

Here and for the rest of this section, array indices are understood to be integers; any fraction x/y used as an array index is actually ⌊x/y⌋.

Hence for $j \ge k - k_2 + 1$,
\[
  \mathrm{Prob}\{\, 2^{-j} B_{j,k} > 2^{j-k+2} + 2^{-s+1} \,\}
  \;\le\; (2 \lg \lg k)\, \exp\!\left( -\frac{2^{k}}{k^{2} (\lg k)^{8}} \right).
\]
Now $\Sigma_2 \le \sum_{j=k-k_2+1}^{k-k_1} 2^{-j} B_{j,k}$, so
\[
  \mathrm{Prob}\{\, \Sigma_2 > 2^{-k_1+3} + (k_2 - k_1) 2^{-s+1} \,\}
  \;\le\; (k_2 - k_1)(2 \lg \lg k)\, \exp\!\left( -\frac{2^{k}}{k^{2} (\lg k)^{8}} \right).
\]

For each node t in level L_k (k > 0) let IP(t) = {⌊t/2^j⌋ : j = 1, ..., k} be the set of nodes on the insertion path from node t to the root.
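
The following tiny Python helper (an illustration, not code from the paper) makes the definition concrete: repeatedly halving the index of a node enumerates exactly the ancestors ⌊t/2^j⌋ on the path from node t up to the root.

    def insertion_path(t):
        """Return IP(t) = [floor(t / 2**j) for j = 1, ..., k] for a node t on
        level k of a heap stored with the root at index 1. Illustrative only."""
        path = []
        while t > 1:
            t //= 2            # the parent of node t is floor(t / 2)
            path.append(t)
        return path

    # Example: node 11 lies on level 3; its insertion path is [5, 2, 1].
    assert insertion_path(11) == [5, 2, 1]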

This method requires (2+o(1))n comparisons in the worst case to build a heap with n keys, which is less than Williams’ method takes on average.
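
For comparison, here is a minimal Python sketch of Floyd's bottom-up construction referred to in this answer (sift every internal node down, starting from the last one); the 1-indexed array and min-heap convention are assumptions made for illustration, not the paper's presentation.

    def floyd_build(keys):
        """Build a binary heap bottom-up (Floyd's method): sift down each
        internal node, from the last internal node back to the root.
        Illustrative sketch: 1-indexed array, min-heap convention."""
        heap = [None] + list(keys)     # node i has children 2i and 2i + 1
        n = len(heap) - 1
        for i in range(n // 2, 0, -1):
            j = i
            # Sift heap[j] down until neither child is smaller.
            while 2 * j <= n:
                c = 2 * j
                if c + 1 <= n and heap[c + 1] < heap[c]:
                    c += 1             # choose the smaller child
                if heap[j] <= heap[c]:
                    break
                heap[j], heap[c] = heap[c], heap[j]
                j = c
        return heap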