
Showing papers by "Rolf Fagerberg published in 2008"


Journal ArticleDOI
TL;DR: A carefully implemented cache-oblivious sorting algorithm which can be faster than the best Quicksort implementation the authors were able to find, for input sizes well within the limits of RAM, and which is at least as fast as the recent cache-aware implementations included in the test.
Abstract: This paper is an algorithmic engineering study of cache-oblivious sorting. We investigate by empirical methods a number of implementation issues and parameter choices for the cache-oblivious sorting algorithm Lazy Funnelsort and compare the final algorithm with Quicksort, the established standard for comparison-based sorting, as well as with recent cache-aware proposals. The main result is a carefully implemented cache-oblivious sorting algorithm, which, our experiments show, can be faster than the best Quicksort implementation we are able to find for input sizes well within the limits of RAM. It is also at least as fast as the recent cache-aware implementations included in the test. On disk, the difference is even more pronounced regarding Quicksort and the cache-aware algorithms, whereas the algorithm is slower than a careful implementation of multiway Mergesort, such as TPIE.
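Lazy Funnelsort itself is intricate (it relies on recursively laid-out k-way funnel mergers), but the cache-oblivious principle it follows can be illustrated with a much simpler hedged sketch: plain recursive mergesort, the textbook example of an algorithm that never references the block size B or the memory size M yet still reuses cached subarrays at every scale of the recursion. This is emphatically not the paper's tuned implementation, only an illustration of the design style; the function name is hypothetical.

```python
# Illustrative sketch only -- NOT the paper's Lazy Funnelsort.
# A plain recursive mergesort is cache-oblivious: it makes no reference to
# B or M, and its merges are sequential scans, the access pattern that an
# I/O analysis rewards at every level of the memory hierarchy.

def co_mergesort(a):
    """Return a sorted copy of a via cache-oblivious divide and conquer."""
    if len(a) <= 1:
        return list(a)
    mid = len(a) // 2
    left = co_mergesort(a[:mid])
    right = co_mergesort(a[mid:])
    # Merge by two sequential scans of the sorted halves.
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i]); i += 1
        else:
            out.append(right[j]); j += 1
    out.extend(left[i:]); out.extend(right[j:])
    return out
```

Analyzed in the I/O-model, binary mergesort of this shape incurs O((N/B) log2(N/M)) memory transfers; the point of Funnelsort is to improve the logarithm's base to M/B (under a tall-cache assumption), matching the optimal sorting bound.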

59 citations


Journal ArticleDOI
TL;DR: It is demonstrated empirically that the actual running time of Quicksort is adaptive with respect to the presortedness measure Inv, and it is proved that the number of element swaps performed by the randomized version of Quicksort is adaptive with respect to the same measure.
Abstract: Quicksort was first introduced in 1961 by Hoare. Many variants have been developed, the best of which are among the fastest generic sorting algorithms available, as testified by the choice of Quicksort as the default sorting algorithm in most programming libraries. Some sorting algorithms are adaptive, i.e., they have a complexity analysis that is better for inputs that are nearly sorted according to some specified measure of presortedness. Quicksort is not among these, as it uses Ω(n log n) comparisons even for sorted inputs. However, in this paper, we demonstrate empirically that the actual running time of Quicksort is adaptive with respect to the presortedness measure Inv. Differences close to a factor of two are observed between instances with low and high Inv value. We then show that for the randomized version of Quicksort, the number of element swaps performed is provably adaptive with respect to the measure Inv. More precisely, we prove that randomized Quicksort performs expected O(n(1 + log(1 + Inv/n))) element swaps, where Inv denotes the number of inversions in the input sequence. This result provides a theoretical explanation for the observed behavior and gives new insights into the behavior of Quicksort. We also give some empirical results on the adaptive behavior of Heapsort and Mergesort.
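The quantities in the bound above are easy to reproduce in a small experiment: count inversions (Inv) with a merge-based counter, and count element swaps in a randomized Hoare-partition Quicksort. The sketch below is an illustration under those standard definitions, not the paper's experimental code; all function names are hypothetical.

```python
# Hedged sketch: measure Inv and the swap count of randomized Quicksort.
# Not the paper's code -- a minimal illustration of the quantities it studies.
import random

def inversions(a):
    """Count Inv, the number of inversions, via a merge-based O(n log n) count."""
    def rec(xs):
        if len(xs) <= 1:
            return xs, 0
        mid = len(xs) // 2
        left, inv_l = rec(xs[:mid])
        right, inv_r = rec(xs[mid:])
        merged, inv, i, j = [], inv_l + inv_r, 0, 0
        while i < len(left) and j < len(right):
            if left[i] <= right[j]:
                merged.append(left[i]); i += 1
            else:
                inv += len(left) - i   # all remaining left elements exceed right[j]
                merged.append(right[j]); j += 1
        merged.extend(left[i:]); merged.extend(right[j:])
        return merged, inv
    return rec(list(a))[1]

def quicksort_swaps(a, rng=random.Random(0)):
    """Sort a in place with randomized Quicksort (Hoare-style partition);
    return the number of element swaps performed."""
    swaps = 0
    def sort(lo, hi):                    # half-open range [lo, hi)
        nonlocal swaps
        if hi - lo <= 1:
            return
        p = a[rng.randrange(lo, hi)]     # random pivot value
        i, j = lo, hi - 1
        while i <= j:
            while a[i] < p: i += 1
            while a[j] > p: j -= 1
            if i <= j:
                if i != j:
                    a[i], a[j] = a[j], a[i]
                    swaps += 1
                i += 1; j -= 1
        sort(lo, j + 1); sort(i, hi)
    sort(0, len(a))
    return swaps
```

On an already sorted input with distinct keys (Inv = 0) this variant performs zero swaps, consistent with the O(n(1 + log(1 + Inv/n))) bound predicting few swaps for low-Inv inputs, while a random permutation (Inv ≈ n²/4) drives the swap count up toward the O(n log n) regime.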

27 citations


Journal ArticleDOI
TL;DR: Two algorithms for calculating the quartet distance between all pairs of trees in a set of binary evolutionary trees on a common set of species are presented; by exploiting common substructure among the trees, they perform significantly better on large sets of trees than distinct pairwise distance calculations.
Abstract: We present two algorithms for calculating the quartet distance between all pairs of trees in a set of binary evolutionary trees on a common set of species. The algorithms exploit common substructure among the trees to speed up the pairwise distance calculations, thus performing significantly better on large sets of trees compared to performing distinct pairwise distance calculations, as we illustrate experimentally, where we see a speedup factor of around 130 in the best case.
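For concreteness, recall what the quartet distance measures: each set of four species induces, in a binary tree, exactly one of three resolved topologies (ab|cd, ac|bd, ad|bc), and the distance between two trees is the number of quartets on which they disagree. The sketch below is a naive brute-force baseline over all quartets, resolving each via the four-point condition on unit-length path distances; it is not one of the paper's two algorithms, which precisely avoid this per-pair cost. Trees are hypothetical nested tuples with distinct, non-integer leaf labels.

```python
# Naive quartet-distance baseline (brute force over all 4-subsets of leaves).
# NOT the paper's algorithms -- those exploit shared substructure to go faster.
from itertools import combinations
from collections import deque

def adjacency(tree):
    """Turn a nested-tuple binary tree into an adjacency dict.
    Internal nodes get fresh integer ids; leaves keep their (non-int) labels."""
    adj = {}
    counter = iter(range(10**9))
    def build(node):
        if not isinstance(node, tuple):        # leaf label
            adj.setdefault(node, [])
            return node
        me = next(counter)
        adj.setdefault(me, [])
        for child in node:
            c = build(child)
            adj[me].append(c)
            adj[c].append(me)
        return me
    build(tree)
    return adj

def leaf_dists(adj, leaves):
    """All-pairs path lengths between leaves, via one BFS per leaf."""
    dist = {}
    for s in leaves:
        d = {s: 0}
        q = deque([s])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if v not in d:
                    d[v] = d[u] + 1
                    q.append(v)
        for t in leaves:
            dist[s, t] = d[t]
    return dist

def quartet_topology(dist, a, b, c, d):
    """Four-point condition: in a binary tree the pairing with the strictly
    smallest sum of within-pair distances is the induced quartet split."""
    sums = {
        frozenset([frozenset([a, b]), frozenset([c, d])]): dist[a, b] + dist[c, d],
        frozenset([frozenset([a, c]), frozenset([b, d])]): dist[a, c] + dist[b, d],
        frozenset([frozenset([a, d]), frozenset([b, c])]): dist[a, d] + dist[b, c],
    }
    return min(sums, key=sums.get)

def quartet_distance(t1, t2):
    """Count quartets on which the two binary trees disagree."""
    a1, a2 = adjacency(t1), adjacency(t2)
    leaves = [v for v in a1 if not isinstance(v, int)]
    d1, d2 = leaf_dists(a1, leaves), leaf_dists(a2, leaves)
    return sum(quartet_topology(d1, *q) != quartet_topology(d2, *q)
               for q in combinations(leaves, 4))
```

This baseline enumerates all Θ(n⁴) quartets per tree pair; the speedups reported in the paper come from doing much better than repeating such pairwise work across a whole set of trees.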

5 citations


01 Jan 2008
TL;DR: A two-level I/O-model, consisting of a fast memory of size M and a slower memory of infinite size, with data transferred between the levels in blocks of consecutive elements, is presented as a way to better account for the effects of the memory hierarchy.
Abstract: The memory system of contemporary computers consists of a hierarchy of memory levels, with each level acting as a cache for the next; a typical hierarchy consists of registers, level 1 cache, level 2 cache, level 3 cache, main memory, and disk (Fig. 1). One characteristic of the hierarchy is that the memory levels get larger and slower the further they get from the processor, with the access time increasing most dramatically between RAM and disk. Another characteristic is that data is moved between levels in blocks of consecutive elements. As a consequence of the differences in access time between the levels, the cost of a memory access depends heavily on the lowest memory level currently holding the element accessed. Hence, the memory access pattern of an algorithm has a major influence on its practical running time. Unfortunately, the RAM model (Fig. 2) traditionally used to design and analyze algorithms is not capable of capturing this, as it assumes that all memory accesses take equal time. To better account for the effects of the memory hierarchy, a number of computational models have been proposed. The simplest and most successful is the two-level I/O-model introduced by Aggarwal and Vitter [3] (Fig. 3). In this model a two-level memory hierarchy is assumed, consisting of a fast memory of size M and a slower memory of infinite size, with data transferred between the levels in blocks of B consecutive elements. Computation can only be performed on data in the fast memory, and algorithms are assumed to have complete control over transfers of blocks between the two levels. Such a block transfer is denoted a memory transfer. The complexity measure is the number of memory transfers performed. The strength of the I/O-model is that it captures part of the memory hierarchy while being sufficiently simple to make design and analysis of algorithms feasible.
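The model's two canonical complexities are the scanning bound, Θ(N/B) transfers for reading N consecutive elements, and the Aggarwal–Vitter sorting bound, Θ((N/B) log_{M/B}(N/B)) transfers. A hedged back-of-the-envelope calculator for these bounds (function names hypothetical, ceilings chosen for illustration):

```python
# Sketch: evaluate the standard I/O-model bounds for given N, M, B.
import math

def scan_ios(N, B):
    """Scanning bound: N consecutive elements cost ceil(N/B) memory transfers."""
    return math.ceil(N / B)

def sort_ios(N, M, B):
    """Aggarwal-Vitter sorting bound Theta((N/B) * log_{M/B}(N/B)),
    rounded up per factor as a rough transfer-count estimate."""
    n_blocks = math.ceil(N / B)
    return n_blocks * max(1, math.ceil(math.log(n_blocks, M / B)))
```

For example, with a gigabyte-scale input, the M/B base of the logarithm keeps the sorting cost within a small constant factor of a handful of full scans, which is exactly why multiway (rather than binary) merging dominates external-memory sorting practice.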
Over the last two decades, a large body of results for the I/O-model has been produced, covering most areas of algorithmics. For an overview, see the surveys [5, 32, 34–36]. More elaborate models of multilevel memory have been proposed (see e.g., [34] for an overview), but these models have been less successful than the I/O-model, mainly because of their complexity, which makes the analysis of algorithms harder. All these models, including the I/O-model, assume that the characteristics of the memory hierarchy (the level and block sizes) are known.

2 citations


01 Jan 2008
TL;DR: The subject of the present chapter is efficient dictionary structures for the cache-oblivious model, in which an algorithm is formulated in the RAM model but analyzed in the I/O-model, with the analysis valid for any values of B and M.
Abstract: Computers contain a hierarchy of memory levels, with vastly differing access times. Hence, the time for a memory access depends strongly on the innermost level containing the data accessed. In analysis of algorithms, the standard RAM (or von Neumann) model cannot capture this effect, and external memory models were introduced to better model the situation. The most widely used of these models is the two-level I/O-model [4], also called the external memory model or the disk access model. The I/O-model approximates the memory hierarchy by modeling two levels, with the inner level having size M, the outer level having infinite size, and transfers between the levels taking place in blocks of B consecutive elements. The cost of an algorithm is the number of such transfers it makes. The cache-oblivious model, introduced by Frigo et al. [26], elegantly generalizes the I/O-model to a multilevel memory model by a simple device: the algorithm is not allowed to know the value of B and M. More precisely, a cache-oblivious algorithm is an algorithm formulated in the RAM model, but analyzed in the I/O-model, with an analysis valid for any value of B and M. Cache replacement is assumed to take place automatically by an optimal off-line cache replacement strategy. Since the analysis holds for any B and M, it holds for all levels simultaneously (for a detailed version of this statement, see [26]). The subject of the present chapter is that of efficient dictionary structures for the cache-oblivious model.
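A classic ingredient of cache-oblivious dictionaries is the van Emde Boas layout (due to Prokop): store a complete binary search tree by recursively laying out the top half-height subtree first, then each bottom subtree, so that a root-to-leaf search touches only O(log_B N) blocks for any B, versus the Θ(log(N/B)) of binary search on a sorted array. The hedged sketch below only computes that layout order for 2^h − 1 sorted keys; the height-split convention (here, bottom gets the larger half) varies between presentations, and the function name is hypothetical.

```python
# Sketch: van Emde Boas (recursive) layout of a complete BST over sorted keys.
# Illustrative only; real cache-oblivious dictionaries add search and updates.

def veb_layout(keys):
    """Return the keys of the complete BST over sorted `keys` (len = 2^h - 1)
    in van Emde Boas order: top half-height subtree first, then each bottom
    subtree, recursively."""
    n = len(keys)
    if n <= 1:
        return list(keys)
    assert (n + 1) & n == 0, "size must be 2^h - 1"
    h = n.bit_length()            # tree height, since n = 2^h - 1
    top_h = h // 2                # top subtree height (bottom gets the rest)
    bot_n = (1 << (h - top_h)) - 1   # size of each bottom subtree
    step = bot_n + 1
    # In sorted order the array is: bottom_0, top-key, bottom_1, top-key, ...
    top_keys = [keys[i] for i in range(bot_n, n, step)]
    bottoms = [keys[i:i + bot_n] for i in range(0, n, step)]
    out = veb_layout(top_keys)
    for b in bottoms:
        out += veb_layout(b)
    return out
```

For 7 keys the order comes out as root, then the left subtree in vEB order, then the right: indices [3, 1, 0, 2, 5, 4, 6]. The payoff is that every recursion level of size about √N fits in a block once √N ≤ B, without the structure ever knowing B.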

1 citation