Journal ArticleDOI

Average case selection

01 Apr 1989-Journal of the ACM (ACM)-Vol. 36, Iss: 2, pp 270-279
TL;DR: It is shown that n + k - O(1) comparisons are necessary, on average, to find the kth smallest of n numbers, and this lower bound matches the behavior of the technique of Floyd and Rivest to within a lower-order term.
Abstract: It is shown that n + k - O(1) comparisons are necessary, on average, to find the kth smallest of n numbers (k ≤ n/2). This lower bound matches the behavior of the technique of Floyd and Rivest to within a lower-order term. 7n/4 + o(n) comparisons, on average, are shown to be necessary and sufficient to find the maximum and median of a set. An upper bound of 9n/4 + o(n) and a lower bound of 2n - o(n) are shown for the max-min-median problem.
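
For intuition, the Floyd–Rivest technique referenced above draws pivots from a small random sample so that the kth element is bracketed with high probability. The following is a minimal, illustrative Python sketch of that sampling idea; the sample and gap sizes are conventional choices, not the paper's tuned parameters, and the comparison count is not optimized to the n + k + o(n) bound.

```python
import random

def select(a, k):
    """Return the k-th smallest (1-indexed) element of a.

    Sketch of the Floyd-Rivest sampling idea: draw a random sample,
    pick two order statistics of the sample that bracket the target
    rank with high probability, partition around them, and recurse
    on the (usually short) middle block.
    """
    a = list(a)
    while len(a) > 30:
        n = len(a)
        s = int(n ** (2 / 3))                 # sample size, a conventional choice
        sample = sorted(random.sample(a, s))
        g = int(s ** 0.5)                     # safety gap around the scaled rank
        lo = sample[max(0, k * s // n - g)]
        hi = sample[min(s - 1, k * s // n + g)]
        below = [x for x in a if x < lo]
        middle = [x for x in a if lo <= x <= hi]
        if len(middle) == n:                  # degenerate split: fall back to sorting
            break
        if k <= len(below):
            a = below
        elif k <= len(below) + len(middle):
            k -= len(below)
            a = middle
        else:
            k -= len(below) + len(middle)
            a = [x for x in a if x > hi]
    return sorted(a)[k - 1]
```
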
Citations
Proceedings ArticleDOI
TL;DR: The VAMSplit R-tree provided better overall performance than all competing structures the authors tested for main memory and secondary memory applications, with modest improvements relative to optimized k-d tree variants.
Abstract: Efficient indexing support is essential to allow content-based image and video databases using similarity-based retrieval to scale to large databases (tens of thousands up to millions of images). In this paper, we take an in-depth look at this problem. One of the major difficulties in solving this problem is the high dimension (6-100) of the feature vectors that are used to represent objects. We provide an overview of the work in computational geometry on this problem and highlight the results we found most useful in practice, including the use of approximate nearest-neighbor algorithms. We also present a variant of the optimized k-d tree we call the VAM k-d tree, and provide algorithms to create an optimized R-tree we call the VAMSplit R-tree. We found that the VAMSplit R-tree provided better overall performance than all competing structures we tested for main memory and secondary memory applications. We observed large improvements in performance relative to the R*-tree and SS-tree in secondary memory applications, and modest improvements relative to optimized k-d tree variants.
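
For context, a k-d tree answers nearest-neighbor queries by branch-and-bound on one coordinate per level. The sketch below is a plain median-split k-d tree, not the VAM k-d tree or VAMSplit R-tree the paper introduces, and it ignores the secondary-memory issues the paper is actually about.

```python
def build_kdtree(points, depth=0):
    """Build a median-split k-d tree over a list of equal-length tuples."""
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return {
        "point": points[mid],
        "axis": axis,
        "left": build_kdtree(points[:mid], depth + 1),
        "right": build_kdtree(points[mid + 1:], depth + 1),
    }

def nearest(node, query, best=None):
    """Exact nearest-neighbor search; best is (squared distance, point)."""
    if node is None:
        return best
    p, axis = node["point"], node["axis"]
    dist = sum((a - b) ** 2 for a, b in zip(p, query))
    if best is None or dist < best[0]:
        best = (dist, p)
    diff = query[axis] - p[axis]
    near, far = (node["left"], node["right"]) if diff < 0 else (node["right"], node["left"])
    best = nearest(near, query, best)
    if diff ** 2 < best[0]:       # descend the far side only if it can hold a closer point
        best = nearest(far, query, best)
    return best
```

A query is then nearest(build_kdtree(points), q), which returns the squared distance and the closest stored point.
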

227 citations

Journal Article
TL;DR: The authors showed that the median of a set containing $n$ elements can always be found using at most $c \cdot n$ comparisons, where $c < 2.95$.
Abstract: Improving a long-standing result of Schönhage, Paterson, and Pippenger [J. Comput. System Sci., 13 (1976), pp. 184--199], we show that the median of a set containing $n$ elements can always be found using at most $c \cdot n$ comparisons, where $c < 2.95$.

94 citations

Proceedings ArticleDOI
24 Jun 1996
TL;DR: This work sorts n general keys using a partitioning scheme that achieves both efficiency and insensitivity to data skew, and gives a precise worst-case estimate of the maximum imbalance that can occur.
Abstract: We present new BSP algorithms for deterministic sorting and randomized median finding. We sort n general keys by using a partitioning scheme that achieves the requirements of efficiency (one-optimality) and insensitivity against data skew (the accuracy of the splitting keys depends solely on the step distance, which can be adapted to meet the worst-case requirements of our application). Although we employ sampling in order to realize efficiency, we can give a precise worst-case estimate of the maximum imbalance which might occur. We also investigate optimal randomized BSP algorithms for the problem of finding the median of n elements that require, with high probability, 3n/(2p) + o(n/p) comparisons, for a wide range of values of n and p. Experimental results for the two algorithms are also presented.
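
To illustrate the sample-based partitioning step, here is a generic regular-sampling sketch with a hypothetical oversampling constant c; it is not the paper's one-optimal BSP scheme and carries none of its precise imbalance guarantee.

```python
import bisect

def sample_splitters(keys, p, c=8):
    """Choose p-1 splitters from a regular sample of the keys.

    Hypothetical oversampling constant c: a larger c yields a sorted
    sample of about c*p elements and hence more accurate splitters.
    Assumes len(keys) >= c * p.
    """
    stride = max(1, len(keys) // (c * p))
    sample = sorted(keys[::stride])
    step = max(1, len(sample) // p)
    return [sample[min(i * step, len(sample) - 1)] for i in range(1, p)]

def partition(keys, splitters):
    """Route each key to the bucket its splitter interval defines."""
    buckets = [[] for _ in range(len(splitters) + 1)]
    for x in keys:
        buckets[bisect.bisect_right(splitters, x)].append(x)
    return buckets
```
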

74 citations


Additional excerpts

  • ...The algorithm of [8] requires, with high probability, computation and communication time (1 + o(1))(2Tseq/p) and O(g·Tseq/(p log n)) respectively, for all values of n and p such that p = n^c, 0 < c < 1, where Tseq is the time required by sequential sorting....


Journal ArticleDOI
TL;DR: The first nontrivial lower bounds on time-space trade-offs for the selection problem are established, and deterministic lower bounds for I/O-efficient algorithms are obtained as well.
Abstract: We establish the first nontrivial lower bounds on time-space trade-offs for the selection problem. We prove that any comparison-based randomized algorithm for finding the median requires Ω(n log log_S n) expected time in the RAM model (or more generally in the comparison branching program model), if we have S bits of extra space besides the read-only input array. This bound is tight for all S > log n, and remains true even if the array is given in a random order. Our result thus answers a 16-year-old question of Munro and Raman [1996], and also complements recent lower bounds that are restricted to sequential access, as in the multipass streaming model [Chakrabarti et al. 2008b]. We also prove that any comparison-based, deterministic, multipass streaming algorithm for finding the median requires Ω(n log*(n/s) + n log_s n) worst-case time (in scanning plus comparisons), if we have s cells of space. This bound is also tight for all s > log² n. We get deterministic lower bounds for I/O-efficient algorithms as well. The proofs in this article are self-contained and do not rely on communication complexity techniques.
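
To make the trade-off regime concrete, here is a minimal sketch at its extreme end: a comparison-based multipass algorithm that finds the median with O(1) words of extra space by rescanning the input, at the cost of many passes (up to one per distinct value; the algorithms the lower bound addresses use s cells of space to cut the number of passes). read_stream is a hypothetical callable returning a fresh iterator over the input on each call.

```python
def median_low_space(read_stream, n):
    """Find the lower median using O(1) extra space and many passes.

    Each pass counts elements <= lo and finds the smallest element
    strictly greater than lo; when the target rank k falls inside
    that pivot's block of equal elements, the pivot is the median.
    """
    k = (n + 1) // 2                       # rank of the lower median
    lo = float("-inf")
    while True:
        pivot, below, equal = None, 0, 0
        for x in read_stream():
            if x <= lo:
                below += 1
            elif pivot is None or x < pivot:
                pivot, equal = x, 1
            elif x == pivot:
                equal += 1
        if below < k <= below + equal:
            return pivot
        lo = pivot
```
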

56 citations

Journal ArticleDOI
TL;DR: A simple and robust algorithm for compressive sensing (CS) signal reconstruction based on the weighted median (WM) operator achieves better performance than state-of-the-art methods under different noise distributions; as the distribution tails become heavier, the performance gain increases significantly.
Abstract: In this paper, we propose a simple and robust algorithm for compressive sensing (CS) signal reconstruction based on the weighted median (WM) operator. The proposed approach addresses the reconstruction problem by solving a l0-regularized least absolute deviation (l0-LAD) regression problem with a tunable regularization parameter, being suitable for applications where the underlying contamination follows a statistical model with heavier-than-Gaussian tails. The solution to this regularized LAD regression problem is efficiently computed, under a coordinate descent framework, by an iterative algorithm that comprises two stages. In the first stage, an estimation of the sparse signal is found by recasting the reconstruction problem as a parameter location estimation for each entry in the sparse vector leading to the minimization of a sum of weighted absolute deviations. The solution to this one-dimensional minimization problem turns out to be the WM operator acting on a shifted-and-scaled version of the measurement samples with weights taken from the entries in the measurement matrix. The resultant estimated value is then passed to a second stage that identifies whether the corresponding entry is relevant or not. This stage is achieved by a hard threshold operator with adaptable thresholding parameter that is suitably tuned as the algorithm progresses. This two-stage operation, WM operator followed by a hard threshold operator, adds the desired robustness to the estimation of the sparse signal and, at the same time, ensures the sparsity of the solution. Extensive simulations demonstrate the reconstruction capability of the proposed approach under different noise models. We compare the performance of the proposed approach to those yielded by state-of-the-art CS reconstruction algorithms showing that our approach achieves a better performance for different noise distributions. In particular, as the distribution tails become heavier the performance gain achieved by the proposed approach increases significantly.
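
The core primitive here, the weighted median, is the value minimizing a weight-penalized sum of absolute deviations. A minimal sorting-based sketch follows (O(n log n); the linear-time Quickselect-style computation the paper cites from [45]-[47] partitions by cumulative weight instead of fully sorting).

```python
def weighted_median(values, weights):
    """Return a value b minimizing sum_i weights[i] * |b - values[i]|:
    the first value, in sorted order, whose cumulative weight reaches
    half of the total weight."""
    total = sum(weights)
    acc = 0.0
    for x, w in sorted(zip(values, weights)):
        acc += w
        if acc >= total / 2:
            return x
```
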

54 citations


Cites methods from "Average case selection"

  • ...Appendix A presents a pseudocode of an efficient implementation [45] to compute the WM operator based on a fast computation of the sample median [46], [47]....


  • ...This can be achieved by extending the concepts used in the QuickSelect algorithm [46], [47], [62] for the median operator to the weighted median operator leading to a complexity of order for the WM computation [45]....


  • ...Notice that the computation of the sample median can be performed in time using a Quickselect algorithm like the one introduced in [47] leading to an overall computation time of [45]....

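The excerpts above compute the sample median with Quickselect; below is a minimal random-pivot Quickselect sketch (expected linear time, not the specific implementation of [45]-[47]). The median of a list a is quickselect(a, (len(a) + 1) // 2).

```python
import random

def quickselect(a, k):
    """Return the k-th smallest (1-indexed) element in expected O(n) time
    by partitioning around a random pivot and keeping one side."""
    a = list(a)
    while True:
        pivot = random.choice(a)
        below = [x for x in a if x < pivot]
        equal_count = sum(1 for x in a if x == pivot)
        if k <= len(below):
            a = below
        elif k <= len(below) + equal_count:
            return pivot
        else:
            k -= len(below) + equal_count
            a = [x for x in a if x > pivot]
```
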

References
Journal ArticleDOI
TL;DR: The number of comparisons required to select the i-th smallest of n numbers is shown to be at most a linear function of n by analysis of a new selection algorithm, PICK.

1,384 citations


"Average case selection" refers background in this paper

  • ...In 1971, at essentially the same time as they were involved in developing a worst-case linear-time selection algorithm [2], Floyd and Rivest discovered a very practical selection algorithm requiring n + k + o(n) comparisons on average [3] (an improvement on the original linear scheme of [4])....

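For contrast with the average-case scheme above, the worst-case linear-time algorithm of [2] (PICK) chooses its pivot as a median of group medians, which guarantees a constant fraction of the input is discarded in every round. A minimal median-of-medians sketch, purely illustrative; the paper's comparison counts are far more carefully optimized.

```python
def select_worst_case_linear(a, k):
    """Return the k-th smallest (1-indexed) element in worst-case O(n) time."""
    a = list(a)
    if len(a) <= 25:
        return sorted(a)[k - 1]
    # median of each group of 5, then the median of those medians as pivot
    groups = [a[i:i + 5] for i in range(0, len(a), 5)]
    medians = [sorted(g)[len(g) // 2] for g in groups]
    pivot = select_worst_case_linear(medians, (len(medians) + 1) // 2)
    below = [x for x in a if x < pivot]
    equal_count = sum(1 for x in a if x == pivot)
    if k <= len(below):
        return select_worst_case_linear(below, k)
    if k <= len(below) + equal_count:
        return pivot
    return select_worst_case_linear([x for x in a if x > pivot],
                                    k - len(below) - equal_count)
```
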

Journal ArticleDOI

446 citations

Journal ArticleDOI
TL;DR: A new selection algorithm is presented which is shown to be very efficient on the average, both theoretically and practically.
Abstract: A new selection algorithm is presented which is shown to be very efficient on the average, both theoretically and practically. The number of comparisons used to select the ith smallest of n numbers is n + min(i,n-i) + o(n). A lower bound within 9 percent of the above formula is also derived.

319 citations

Journal ArticleDOI
TL;DR: The procedures RANGESUM, RANGESUB, RANGEMPY, and RANGEDVD provide the fundamental operations of range (interval) arithmetic, each relying on a non-local, machine-dependent adjustment procedure to keep the computed interval end-points safe.
Abstract: [ALGOL 60 procedures for range (interval) arithmetic.] The term "range number" was used by P. S. Dwyer, Linear Computations (Wiley, 1951); machine procedures for range arithmetic were developed about 1958 by Ramon Moore ("Automatic Error Analysis in Digital Computation," LMSD Report 48421, Lockheed Missiles and Space Division, Palo Alto, 1959). If a ≤ x ≤ b and c ≤ y ≤ d, RANGESUM yields an interval [e, f] such that e ≤ x + y ≤ f. Because of machine operation (truncation or rounding) the machine sums a + c and b + d may not provide safe end-points of the output interval, so RANGESUM requires a non-local, machine-dependent real procedure ADJUSTSUM, accompanied by a procedure CORRECTION giving an upper bound on the magnitude of representation error, to widen each end-point safely. The procedures RANGESUB, RANGEMPY, and RANGEDVD provide the remaining fundamental operations: RANGEMPY and RANGEDVD enumerate the sign cases of the end-points and call the analogous procedures ADJUSTPROD and ADJUSTQUOT, and if the range divisor includes zero the program exits to the non-local label "zerodvsr". RANGESQR gives an interval within which the square of a range number must lie, and RNGSUMC, RNGSUBC, RNGMPYC, and RNGDVDC provide range arithmetic with complex range arguments, i.e. the real and imaginary parts are range numbers.
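
In a modern language the same outward-rounding idea can be written compactly. A minimal sketch, assuming Python 3.9+ for math.nextafter, which here plays the role of the machine-dependent ADJUSTSUM/ADJUSTPROD procedures:

```python
import math

def range_sum(a, b, c, d):
    """RANGESUM sketch: if a <= x <= b and c <= y <= d, return (e, f)
    with e <= x + y <= f. Each end-point is nudged outward by one ulp
    to absorb floating-point rounding error."""
    e = math.nextafter(a + c, -math.inf)   # round the lower end down
    f = math.nextafter(b + d, math.inf)    # round the upper end up
    return e, f

def range_mpy(a, b, c, d):
    """RANGEMPY sketch: the product interval is bounded by the extreme
    cross-products of the end-points (the ALGOL original enumerates
    sign cases instead, to save multiplications)."""
    products = [a * c, a * d, b * c, b * d]
    return (math.nextafter(min(products), -math.inf),
            math.nextafter(max(products), math.inf))
```
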

179 citations

Journal ArticleDOI
TL;DR: A technique for proving min-max norms of sorting algorithms is given and one new algorithm for finding the minimum and maximum elements of a set with fewest comparisons is proved optimal with this technique.
Abstract: A technique for proving min-max norms of sorting algorithms is given. One new algorithm for finding the minimum and maximum elements of a set with fewest comparisons is proved optimal with this technique.
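
The optimal max-min algorithm this reference proves optimal processes elements in pairs, comparing pair-winners only against the running maximum and pair-losers only against the running minimum, for about 3n/2 comparisons instead of 2n. A minimal sketch:

```python
def min_and_max(a):
    """Return (min, max) of a non-empty iterable using ~3n/2 comparisons."""
    it = iter(a)
    lo = hi = next(it)
    for x in it:
        try:
            y = next(it)
        except StopIteration:
            y = x                          # odd leftover: pair it with itself
        small, big = (x, y) if x <= y else (y, x)
        if small < lo:
            lo = small
        if big > hi:
            hi = big
    return lo, hi
```
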

57 citations


"Average case selection" refers result in this paper

  • ...PROOF. The theorem is the average-case analogue of the well-known (and almost identical) bounds that Pohl [5] proved for the worst case....
