
Showing papers in "ACM Journal of Experimental Algorithmics" in 2000


Journal ArticleDOI
TL;DR: This study considers the scenario of a central information server in the realm of public railroad transport on wide-area networks and empirically analyses speed-up techniques for Dijkstra's algorithm, based on the timetable data of all German trains and on a "snapshot" of half a million customer queries.
Abstract: Traffic information systems are among the most prominent real-world applications of Dijkstra's algorithm for shortest paths. We consider the scenario of a central information server in the realm of public railroad transport on wide-area networks. Such a system has to process a large number of on-line queries for optimal travel connections in real time. In practice, this problem is usually solved by heuristic variations of Dijkstra's algorithm, which do not guarantee an optimal result. We report results from a pilot study, in which we focused on the travel time as the only optimization criterion. In this study, various speed-up techniques for Dijkstra's algorithm were analysed empirically. This analysis was based on the timetable data of all German trains and on a "snapshot" of half a million customer queries.
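
As a point of reference for the speed-up techniques discussed above, a minimal Dijkstra variant with the simplest of them, stopping as soon as the target node is settled, might look like the sketch below (graph representation and names are illustrative, not the study's code):

```cpp
#include <functional>
#include <limits>
#include <queue>
#include <utility>
#include <vector>

// Illustrative sketch: plain Dijkstra with "goal stopping", i.e. the search
// terminates as soon as the target node is settled.  Edge weights stand for
// travel times; all names are made up for this example.
struct Edge { int to; int minutes; };
using Graph = std::vector<std::vector<Edge>>;   // adjacency lists

constexpr int kUnreachable = std::numeric_limits<int>::max();

int shortest_travel_time(const Graph& g, int source, int target) {
    std::vector<int> dist(g.size(), kUnreachable);
    using QItem = std::pair<int, int>;          // (tentative distance, node)
    std::priority_queue<QItem, std::vector<QItem>, std::greater<QItem>> pq;
    dist[source] = 0;
    pq.push({0, source});
    while (!pq.empty()) {
        auto [d, u] = pq.top();
        pq.pop();
        if (d > dist[u]) continue;              // stale queue entry
        if (u == target) return d;              // goal stopping: target settled
        for (const Edge& e : g[u]) {
            if (dist[u] + e.minutes < dist[e.to]) {
                dist[e.to] = dist[u] + e.minutes;
                pq.push({dist[e.to], e.to});
            }
        }
    }
    return kUnreachable;                        // target not reachable
}
```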

201 citations


Journal ArticleDOI
TL;DR: A new automaton that recognizes suffixes of patterns with classes of characters is introduced; the resulting BNDM algorithm is well suited to computational biology applications, since it is the fastest algorithm for searching DNA sequences and flexible searching is an important problem in that area.
Abstract: The most important features of a string matching algorithm are its efficiency and its flexibility. Efficiency has traditionally received more attention, while flexibility in the search pattern is becoming a more and more important issue. Most classical string matching algorithms are aimed at quickly finding an exact pattern in a text, with Knuth-Morris-Pratt (KMP) and the Boyer-Moore (BM) family being the most famous ones. A recent development uses deterministic "suffix automata" to design new optimal string matching algorithms, e.g. BDM and TurboBDM. Flexibility has been addressed quite separately by the use of "bit-parallelism", which simulates automata in their nondeterministic form by using bits and exploiting the intrinsic parallelism inside the computer word, e.g. the Shift-Or algorithm. These algorithms can be extended to handle classes of characters and errors in the pattern and/or in the text, their drawback being their inability to skip text characters. In this paper we merge bit-parallelism and suffix automata, so that a nondeterministic suffix automaton is simulated using bit-parallelism. The resulting algorithm, called BNDM, obtains the best of both worlds. It is much simpler to implement than BDM and nearly as simple as Shift-Or. It inherits from Shift-Or the ability to handle flexible patterns and from BDM the ability to skip characters. BNDM is 30%-40% faster than BDM and up to 7 times faster than Shift-Or. When compared to the fastest existing algorithms on exact patterns (which belong to the BM family), BNDM is from 20% slower to 3 times faster, depending on the alphabet size. With respect to flexible pattern searching, BNDM is by far the fastest technique to deal with classes of characters and is competitive when searching allowing errors. In particular, BNDM is well suited to computational biology applications, since it is the fastest algorithm for searching DNA sequences and flexible searching is an important problem in that area. As a theoretical development related to flexible pattern matching, we introduce a new automaton to recognize suffixes of patterns with classes of characters. To the best of our knowledge, this automaton has not been studied before.
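
The BNDM search loop itself is compact. A sketch for plain exact patterns of length at most the machine word, following the published outline of the algorithm but written here only as an illustration, could look like this:

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Illustrative BNDM sketch for exact patterns with m <= 64: a nondeterministic
// suffix automaton of the reversed pattern is simulated with bit-parallelism,
// which lets the search skip characters like BDM while staying Shift-Or simple.
std::vector<std::size_t> bndm_search(const std::string& text, const std::string& pat) {
    std::vector<std::size_t> occ;
    const std::size_t m = pat.size(), n = text.size();
    if (m == 0 || m > 64 || m > n) return occ;

    uint64_t B[256] = {0};                       // B[c]: positions of c, reversed pattern
    for (std::size_t i = 0; i < m; ++i)
        B[static_cast<unsigned char>(pat[i])] |= uint64_t{1} << (m - 1 - i);

    const uint64_t accept = uint64_t{1} << (m - 1);
    std::size_t pos = 0;
    while (pos + m <= n) {
        std::size_t j = m, last = m;
        uint64_t D = (m == 64) ? ~uint64_t{0} : ((uint64_t{1} << m) - 1);
        while (D != 0) {                         // scan the window right to left
            D &= B[static_cast<unsigned char>(text[pos + j - 1])];
            --j;
            if (D & accept) {
                if (j > 0) last = j;             // a pattern prefix ends here
                else { occ.push_back(pos); break; }  // whole pattern matched
            }
            D <<= 1;
        }
        pos += last;                             // safe shift: characters are skipped
    }
    return occ;
}
```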

182 citations


Journal ArticleDOI
TL;DR: In this paper, the authors develop data structures for dynamic closest pair problems with arbitrary distance functions that do not necessarily come from any geometric structure on the objects, apply these data structures to hierarchical clustering, greedy matching, and TSP heuristics, and discuss other potential applications in machine learning, Gröbner bases, and local improvement algorithms.
Abstract: We develop data structures for dynamic closest pair problems with arbitrary distance functions that do not necessarily come from any geometric structure on the objects. Based on a technique previously used by the author for Euclidean closest pairs, we show how to insert and delete objects from an n-object set, maintaining the closest pair, in O(n log² n) time per update and O(n) space. With quadratic space, we can instead use a quadtree-like structure to achieve an optimal time bound, O(n) per update. We apply these data structures to hierarchical clustering, greedy matching, and TSP heuristics, and discuss other potential applications in machine learning, Gröbner bases, and local improvement algorithms for partition and placement problems. Experiments show our new methods to be faster in practice than previously used heuristics.
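
The paper's structures (a conga-line-like structure and a quadtree-like variant) are involved; purely to fix the interface under discussion, a quadratic-time brute-force baseline supporting the same operations with an arbitrary user-supplied distance function might look like this (all names are illustrative):

```cpp
#include <cstddef>
#include <functional>
#include <limits>
#include <utility>
#include <vector>

// Brute-force baseline (NOT the paper's data structure): maintains an object
// set under insert/erase with an arbitrary distance function and recomputes
// the closest pair on demand in O(n^2) distance evaluations.  Useful only as
// a correctness reference for the asymptotically better structures.
template <typename Object>
class BruteForceClosestPair {
public:
    using Distance = std::function<double(const Object&, const Object&)>;
    explicit BruteForceClosestPair(Distance d) : dist_(std::move(d)) {}

    void insert(const Object& x) { objects_.push_back(x); }

    void erase(std::size_t index) {               // erase by position, O(1)
        objects_[index] = objects_.back();
        objects_.pop_back();
    }

    // Returns indices of the current closest pair and their distance.
    std::pair<std::pair<std::size_t, std::size_t>, double> closest_pair() const {
        double best = std::numeric_limits<double>::infinity();
        std::pair<std::size_t, std::size_t> arg{0, 0};
        for (std::size_t i = 0; i + 1 < objects_.size(); ++i)
            for (std::size_t j = i + 1; j < objects_.size(); ++j)
                if (double d = dist_(objects_[i], objects_[j]); d < best) {
                    best = d;
                    arg = {i, j};
                }
        return {arg, best};
    }

private:
    Distance dist_;
    std::vector<Object> objects_;
};
```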

118 citations


Journal ArticleDOI
Peter Sanders1
TL;DR: In this article, a fast priority queue for external memory and cached memory that is based on k-way merging is proposed, which is at least two times faster than an optimized implementation of binary heaps and 4-ary heaps for large inputs.
Abstract: The cache hierarchy prevalent in today's high performance processors has to be taken into account in order to design algorithms that perform well in practice. This paper advocates the adaptation of external memory algorithms to this purpose. This idea and the practical issues involved are exemplified by engineering a fast priority queue suited to external memory and cached memory that is based on k-way merging. It improves previous external memory algorithms by constant factors crucial for transferring it to cached memory. Running in the cache hierarchy of a workstation, the algorithm is at least two times faster than an optimized implementation of binary heaps and 4-ary heaps for large inputs.
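
The core of such a queue is k-way merging of sorted sequences. A minimal k-way merge driven by a binary heap over the run heads, shown only to illustrate the idea and not Sanders' tuned sequence-heap code, could be sketched as follows:

```cpp
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Minimal k-way merge: repeatedly extract the smallest head element among k
// sorted runs.  Cache-efficient priority queues refine this idea with loser
// trees, buffering and multi-level merging; only the core step is shown here.
std::vector<int> k_way_merge(const std::vector<std::vector<int>>& runs) {
    using Entry = std::pair<int, std::size_t>;   // (current value, run index)
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> heads;
    std::vector<std::size_t> cursor(runs.size(), 0);

    for (std::size_t r = 0; r < runs.size(); ++r)
        if (!runs[r].empty()) heads.push({runs[r][0], r});

    std::vector<int> out;
    while (!heads.empty()) {
        auto [value, r] = heads.top();
        heads.pop();
        out.push_back(value);
        if (++cursor[r] < runs[r].size())        // advance within the same run
            heads.push({runs[r][cursor[r]], r});
    }
    return out;
}
```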

86 citations


Journal ArticleDOI
TL;DR: This study restructures the mergesort and quicksort algorithms further by integrating tiling, padding, and buffering techniques and by repartitioning the data set to reduce other types of cache misses.
Abstract: Memory hierarchy considerations during sorting algorithm design and implementation play an important role in significantly improving execution performance. Existing algorithms mainly attempt to reduce capacity misses on direct-mapped caches. To reduce other types of cache misses that occur in the more common set-associative caches and the TLB, we restructure the mergesort and quicksort algorithms further by integrating tiling, padding, and buffering techniques and by repartitioning the data set. Our study shows that substantial performance improvements can be obtained using our new methods.
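
One of the techniques mentioned, tiling, amounts to sorting cache-sized blocks first and merging them afterwards. A much reduced sketch of that idea, with block size and helper names chosen for illustration rather than taken from the paper, is:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Illustrative "tiled" mergesort: sort blocks small enough to stay in cache,
// then merge neighbouring sorted blocks pass by pass.  The paper additionally
// uses padding and buffering to avoid conflict and TLB misses (not shown).
void tiled_mergesort(std::vector<int>& a, std::size_t tile = 1 << 14) {
    // Phase 1: sort each cache-sized tile independently.
    for (std::size_t lo = 0; lo < a.size(); lo += tile)
        std::sort(a.begin() + lo, a.begin() + std::min(lo + tile, a.size()));

    // Phase 2: merge runs of doubling width until one sorted run remains.
    std::vector<int> buf(a.size());
    for (std::size_t width = tile; width < a.size(); width *= 2) {
        for (std::size_t lo = 0; lo < a.size(); lo += 2 * width) {
            std::size_t mid = std::min(lo + width, a.size());
            std::size_t hi  = std::min(lo + 2 * width, a.size());
            std::merge(a.begin() + lo, a.begin() + mid,
                       a.begin() + mid, a.begin() + hi, buf.begin() + lo);
        }
        std::copy(buf.begin(), buf.end(), a.begin());
    }
}
```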

51 citations


Journal ArticleDOI
TL;DR: A detailed software architecture is presented that allows flexible, efficient and accurate assessment of the practical implications of new move-based algorithms and partitioning formulations; the authors also discuss the current level of sophistication in implementation know-how and experimental evaluation.
Abstract: We summarize the techniques of implementing move-based hypergraph partitioning heuristics and evaluating their performance in the context of VLSI design applications. Our first contribution is a detailed software architecture, consisting of seven reusable components, that allows flexible, efficient and accurate assessment of the practical implications of new move-based algorithms and partitioning formulations. Our second contribution is an assessment of the modern context for hypergraph partitioning research for VLSI design applications. In particular, we discuss the current level of sophistication in implementation know-how and experimental evaluation, and we note how requirements for real-world partitioners - if used as motivation for research - should affect the evaluation of prospective contributions. Two "implicit decisions" in the implementation of the Fiduccia-Mattheyses heuristic are used to illustrate the difficulty of achieving meaningful experimental evaluation of new algorithmic ideas.
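
A recurring implementation detail in Fiduccia-Mattheyses-style partitioners is computing the cut of a two-way partition and the gain of moving a single cell. A small self-contained sketch of those two computations, with a data layout that is illustrative rather than one of the paper's seven components, follows:

```cpp
#include <cstddef>
#include <vector>

// Illustrative two-way hypergraph partition bookkeeping in the style of FM
// implementations: nets are lists of cells, part[c] is the side (0/1) of cell c.
struct Hypergraph {
    std::size_t num_cells;
    std::vector<std::vector<std::size_t>> nets;   // each net = list of cell ids
};

// Cut size: number of nets with at least one cell on each side.
std::size_t cut_size(const Hypergraph& h, const std::vector<int>& part) {
    std::size_t cut = 0;
    for (const auto& net : h.nets) {
        std::size_t on_side0 = 0, on_side1 = 0;
        for (std::size_t c : net) (part[c] == 0 ? on_side0 : on_side1)++;
        if (on_side0 > 0 && on_side1 > 0) ++cut;
    }
    return cut;
}

// FM gain of moving `cell` to the other side: +1 for each incident net that
// would leave the cut, -1 for each incident net that would enter it.
int move_gain(const Hypergraph& h, const std::vector<int>& part, std::size_t cell) {
    int gain = 0;
    for (const auto& net : h.nets) {
        bool contains = false;
        std::size_t from = 0, to = 0;             // cells on cell's side / other side
        for (std::size_t c : net) {
            if (c == cell) contains = true;
            (part[c] == part[cell] ? from : to)++;
        }
        if (!contains) continue;
        if (from == 1) ++gain;                    // cell is alone on its side
        if (to == 0) --gain;                      // net is currently uncut
    }
    return gain;
}
```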

46 citations


Journal ArticleDOI
Eyal Flato1, Dan Halperin1, Iddo Hanniel1, Oren Nechushtan1, Eti Ezra1 
TL;DR: The planar map package of CGAL--a Computational Geometry Algorithms Library is described and the two main classes of the design--planar maps and topological maps that enable the convenient separation between geometry and topology are introduced.
Abstract: Planar maps are fundamental structures in computational geometry. They are used to represent the subdivision of the plane into regions and have numerous applications. We describe the planar map package of CGAL--a Computational Geometry Algorithms Library. We discuss its modular design and implementation. In particular we introduce the two main classes of the design--planar maps and topological maps--that enable the convenient separation between geometry and topology. The modular design is implemented using a generic programming approach. By switching a template parameter, the geometric traits class, one can use the same code for planar maps of different objects such as line segments or circular arcs. More flexibility is achieved by choosing a point location algorithm out of three implemented algorithms or plugging in an algorithm implemented by the user. The user of the planar maps package can benefit both from its flexibility and robustness. We present several examples of geometric traits classes and point location algorithms which demonstrate how the general package can be adapted to specific needs.
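
The generic-programming mechanism described, swapping the geometry by switching a traits template parameter, can be illustrated outside CGAL. The toy traits sketch below shows the mechanism only; the class and function names are not the CGAL interface:

```cpp
#include <cmath>
#include <iostream>
#include <iterator>

// Toy illustration of the "geometric traits" idea (NOT the CGAL interface):
// the same generic algorithm works for any curve type supplied by a traits
// class that provides the required types and operations.
struct SegmentTraits {
    struct Point { double x, y; };
    struct Curve { Point a, b; };                           // line segment
    static double length(const Curve& c) {
        return std::hypot(c.b.x - c.a.x, c.b.y - c.a.y);
    }
};

struct CircularArcTraits {
    struct Point { double x, y; };
    struct Curve { Point center; double radius, angle; };  // arc of given angle
    static double length(const Curve& c) { return c.radius * c.angle; }
};

// A "generic algorithm" parameterised by the traits: total length of a chain.
template <typename Traits, typename Iterator>
double chain_length(Iterator first, Iterator last) {
    double total = 0.0;
    for (; first != last; ++first) total += Traits::length(*first);
    return total;
}

int main() {
    SegmentTraits::Curve segs[] = {{{0, 0}, {3, 4}}, {{3, 4}, {3, 5}}};
    CircularArcTraits::Curve arcs[] = {{{0, 0}, 1.0, 3.141592653589793}};
    std::cout << chain_length<SegmentTraits>(std::begin(segs), std::end(segs)) << "\n"    // 6
              << chain_length<CircularArcTraits>(std::begin(arcs), std::end(arcs)) << "\n"; // pi
}
```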

46 citations


Journal ArticleDOI
TL;DR: An approximate analysis of distribution sorting with uniform keys is presented, which predicts the expected cache misses of Flashsort1 quite well; it is also shown that the integer distribution sorting algorithm MSB radix sort performs well on both uniform integer and uniform floating-point keys.
Abstract: We study cache effects in distribution sorting algorithms for sorting keys drawn independently at random from a uniform distribution ('uniform keys'). We note that the performance of a recently-published distribution sorting algorithm, Flashsort1, which sorts n uniform floating-point keys in O(n) expected time, does not scale well with the input size due to poor cache utilisation. We present an approximate analysis for distribution sorting uniform keys which, as validated by simulation results, predicts the expected cache misses of Flashsort1 quite well. Using this analysis, we design a multiple-pass variant of Flashsort1 which outperforms Flashsort1 and comparison-based algorithms on uniform floating-point keys for moderate to large values of n. Using experimental results we also show that the integer distribution sorting algorithm MSB radix sort performs well on both uniform integer and uniform floating-point keys.
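
MSB radix sort, the integer distribution sort that performed well in these experiments, distributes keys on their most significant digit and recurses within each bucket. A compact sketch for 32-bit unsigned keys, with an 8-bit digit chosen here purely for illustration, is:

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative MSB radix sort for 32-bit unsigned keys with 8-bit digits:
// distribute on the most significant byte, then recurse into each bucket on
// the next byte; small buckets fall back to std::sort.  The cache behaviour
// of exactly this kind of distribution pass is what the paper analyses.
void msb_radix_sort(std::vector<uint32_t>& a, std::size_t lo, std::size_t hi, int shift) {
    if (hi - lo < 64 || shift < 0) {                   // small bucket: fallback
        std::sort(a.begin() + lo, a.begin() + hi);
        return;
    }
    std::size_t count[256] = {0};
    for (std::size_t i = lo; i < hi; ++i) ++count[(a[i] >> shift) & 0xFF];

    std::size_t start[257];                            // bucket boundaries
    start[0] = lo;
    for (int d = 0; d < 256; ++d) start[d + 1] = start[d] + count[d];

    std::vector<uint32_t> buf(a.begin() + lo, a.begin() + hi);   // distribution pass
    std::size_t next[256];
    std::copy(start, start + 256, next);
    for (uint32_t key : buf) a[next[(key >> shift) & 0xFF]++] = key;

    for (int d = 0; d < 256; ++d)                      // recurse into each bucket
        if (count[d] > 1) msb_radix_sort(a, start[d], start[d + 1], shift - 8);
}

void msb_radix_sort(std::vector<uint32_t>& a) { msb_radix_sort(a, 0, a.size(), 24); }
```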

27 citations


Journal ArticleDOI
TL;DR: The varied spectrum of experimental results gives a good picture of the features of these priority queues and should be helpful to anyone interested in using such data structures on very large data sets.
Abstract: In this paper we compare the performance of eight different priority queue implementations: four of them are explicitly designed to work in an external-memory setting, the others are standard internal-memory queues available in the LEDA library [Mehlhorn and Näher 1999]. Two of the external-memory priority queues are obtained by engineering known internal-memory priority queues with the aim of achieving effective performance on external storage devices (i.e., Radix heaps [Ahuja et al. 1990] and array heaps [Thorup 1996]). Our experimental framework includes some simple tests, like random sequences of insert or delete-minimum operations, as well as more advanced tests consisting of intermixed sequences of update operations and "application driven" update sequences originated by simulations of Dijkstra's algorithm on large graph instances. Our varied spectrum of experimental results gives a good picture of the features of these priority queues and should be helpful to anyone interested in the use of such data structures on very large data sets.
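
The kind of microbenchmark described, a long random sequence of intermixed insert and delete-minimum operations, is easy to reproduce for any queue with the usual interface. A minimal driver, timing a standard-library heap purely as an example subject rather than any of the queues compared in the paper, might look like this:

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <iostream>
#include <queue>
#include <random>
#include <vector>

// Minimal version of the "random intermixed insert / delete-minimum" test:
// time a long random operation sequence against one priority queue
// implementation (here std::priority_queue, purely as an example subject).
int main() {
    constexpr std::size_t kOps = 5'000'000;
    std::mt19937_64 rng(42);
    std::priority_queue<uint64_t, std::vector<uint64_t>, std::greater<uint64_t>> pq;

    auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < kOps; ++i) {
        // Roughly two thirds inserts, one third delete-minimums (when non-empty).
        if (pq.empty() || rng() % 3 != 0) pq.push(rng());
        else pq.pop();
    }
    auto stop = std::chrono::steady_clock::now();
    std::chrono::duration<double> elapsed = stop - start;
    std::cout << kOps << " operations in " << elapsed.count() << " s, "
              << pq.size() << " keys left in the queue\n";
}
```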

26 citations


Journal ArticleDOI
TL;DR: The experiments confirm that a fractional jump-start speeds up the algorithm; they indicate that a variant based on pairing heaps is slightly superior to a k-heap variant, and that scaling of large b-values is not necessary.
Abstract: We present an experimental study of an implementation of weighted perfect b-matching based on the primal-dual blossom algorithm. Although this problem is well understood in theory and efficient algorithms are known, little experience with implementations is available. In this paper several algorithmic variants are compared on synthetic and application problem data of very sparse graphs. This study was motivated by the practical need for an efficient b-matching solver for the latter application, namely as a subroutine in our approach to a mesh refinement problem in computer-aided design (CAD). Linear regression and operation counting are used to analyze code variants. The experiments confirm that a fractional jump-start speeds up the algorithm; they indicate that a variant based on pairing heaps is slightly superior to a k-heap variant and that scaling of large b-values is not necessary, whereas a delayed blossom shrinking heuristic significantly improves running times only for graphs with average degree two. The fastest variant of our implementation appears to be highly superior to a code by Miller and Pekny (1995).

24 citations


Journal ArticleDOI
TL;DR: A polynomial-time algorithm is presented, and an experimental study shows that two heuristics with complexity linear in k are also much faster than the exact algorithm in practice.
Abstract: Given a weighted graph G = (V, E), a positive integer k, and a penalty function w_p, we want to find k spanning trees on G, not necessarily disjoint, of minimum total weight, such that the weight of each edge is subject to a penalty given by w_p if it belongs to more than one tree. The objective function to be minimized is Σ_{e∈E} W_e(i_e), where i_e is the number of times edge e appears in the solution and W_e(i_e) = i_e · w_p(e, i_e) is the aggregate cost of using edge e i_e times. For the case when W_e is weakly convex, which should have wide application in congestion problems, we present a polynomial time algorithm; the algorithm's complexity is quadratic in k. We also present two heuristics with complexity linear in k. In an experimental study we show that these heuristics are also much faster than the exact algorithm in practice. These experiments cover a diverse combination of input families (four), values of k (up to 1000), and penalty functions (two). In most inputs tested the solutions given by the heuristics were within 1% of optimal or much better, especially for large k. The worst quality observed was 3.2% of optimal.
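
One natural heuristic in the linear-in-k spirit described above builds the k trees one after another, charging each edge the marginal cost of one more use. The Kruskal-based sketch below is our own illustrative formulation and not necessarily either of the paper's heuristics:

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <numeric>
#include <vector>

// Illustrative greedy heuristic (our own formulation): build the k trees one
// at a time with Kruskal, charging an edge already used i times the marginal
// cost W_e(i+1) - W_e(i), where W_e(i) = i * wp(e, i).  Assumes G is connected.
struct Edge { std::size_t u, v; };

struct UnionFind {
    std::vector<std::size_t> parent;
    explicit UnionFind(std::size_t n) : parent(n) { std::iota(parent.begin(), parent.end(), 0); }
    std::size_t find(std::size_t x) { return parent[x] == x ? x : parent[x] = find(parent[x]); }
    bool unite(std::size_t a, std::size_t b) {
        a = find(a); b = find(b);
        if (a == b) return false;
        parent[a] = b;
        return true;
    }
};

// wp(edge index, multiplicity) is the user-supplied penalty function; the
// chosen edge indices of each of the k trees are returned.
std::vector<std::vector<std::size_t>> greedy_k_trees(
        std::size_t n, const std::vector<Edge>& edges, std::size_t k,
        const std::function<double(std::size_t, std::size_t)>& wp) {
    std::vector<std::size_t> uses(edges.size(), 0);        // i_e so far
    auto aggregate = [&](std::size_t e, std::size_t i) { return i * wp(e, i); };
    std::vector<std::vector<std::size_t>> trees(k);

    for (std::size_t t = 0; t < k; ++t) {
        auto marginal = [&](std::size_t e) {
            return aggregate(e, uses[e] + 1) - aggregate(e, uses[e]);
        };
        std::vector<std::size_t> order(edges.size());
        std::iota(order.begin(), order.end(), 0);
        std::sort(order.begin(), order.end(),
                  [&](std::size_t a, std::size_t b) { return marginal(a) < marginal(b); });

        UnionFind uf(n);                                    // Kruskal on marginal costs
        for (std::size_t e : order) {
            if (uf.unite(edges[e].u, edges[e].v)) {
                trees[t].push_back(e);
                ++uses[e];
                if (trees[t].size() + 1 == n) break;        // spanning tree complete
            }
        }
    }
    return trees;
}
```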

Journal ArticleDOI
TL;DR: Looking at random two-dimensional Euclidean instances and the large instances from TSPLIB, the authors ran experiments to evaluate several strategies for picking among the violated constraints; they found some guidance about which constraints to prefer, which resulted in modest gains, but were unable to get large improvements in performance.
Abstract: Given an instance of the Traveling Salesman Problem (TSP), a reasonable way to get a lower bound on the optimal answer is to solve a linear programming relaxation of an integer programming formulation of the problem. These linear programs typically have an exponential number of constraints, but in theory they can be solved efficiently with the ellipsoid method as long as we have an algorithm that can take a solution and either declare it feasible or find a violated constraint. In practice, it is often the case that many constraints are violated, which raises the question of how to choose among them so as to improve performance. For the simplest TSP formulation it is possible to efficiently find all the violated constraints, which gives us a good chance to try to answer this question empirically. Looking at random two-dimensional Euclidean instances and the large instances from TSPLIB, we ran experiments to evaluate several strategies for picking among the violated constraints. We found some information about which constraints to prefer, which resulted in modest gains, but were unable to get large improvements in performance.
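
For the subtour elimination constraints of the basic formulation, one cheap separation check inspects the connected components of the support graph of the current fractional solution: every component other than the full vertex set yields a violated constraint. The sketch below shows only this partial check (a complete separation routine would use minimum cuts) and is not the code used in the experiments:

```cpp
#include <cstddef>
#include <vector>

// Cheap partial separation for subtour elimination constraints: build the
// support graph of the current fractional solution (edges with x_e > 0) and
// return its connected components.  Every component S other than the full
// vertex set violates x(delta(S)) >= 2, since no LP weight crosses its
// boundary.  (Violated constraints with a connected support graph require a
// minimum-cut computation instead.)
struct LpEdge { std::size_t u, v; double x; };

std::vector<std::vector<std::size_t>> support_components(
        std::size_t n, const std::vector<LpEdge>& edges, double eps = 1e-9) {
    std::vector<std::vector<std::size_t>> adj(n);
    for (const LpEdge& e : edges)
        if (e.x > eps) { adj[e.u].push_back(e.v); adj[e.v].push_back(e.u); }

    std::vector<int> comp(n, -1);
    std::vector<std::vector<std::size_t>> components;
    for (std::size_t s = 0; s < n; ++s) {
        if (comp[s] != -1) continue;
        components.emplace_back();
        std::vector<std::size_t> stack{s};            // iterative DFS from s
        comp[s] = static_cast<int>(components.size()) - 1;
        while (!stack.empty()) {
            std::size_t u = stack.back();
            stack.pop_back();
            components.back().push_back(u);
            for (std::size_t v : adj[u])
                if (comp[v] == -1) { comp[v] = comp[u]; stack.push_back(v); }
        }
    }
    return components;   // more than one component => violated subtour constraints
}
```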

Journal ArticleDOI
TL;DR: An analysis of the behaviour of three methods for constructing a binary heap on a computer with a hierarchical memory shows that, under reasonable assumptions, repeated insertion and layerwise construction both incur at most cN/B cache misses, whereas repeated merging, as programmed by Floyd, can incur more than (dN log₂ B)/B cache misses.
Abstract: The behaviour of three methods for constructing a binary heap on a computer with a hierarchical memory is studied. The methods considered are the original one proposed by Williams [1964], in which elements are repeatedly inserted into a single heap; the improvement by Floyd [1964], in which small heaps are repeatedly merged to bigger heaps; and a recent method proposed, e.g., by Fadel et al. [1999] in which a heap is built layerwise. Both the worst-case number of instructions and that of cache misses are analysed. It is well known that Floyd's method has the best instruction count. Let N denote the size of the heap to be constructed, B the number of elements that fit into a cache line, and let c and d be some positive constants. Our analysis shows that, under reasonable assumptions, repeated insertion and layerwise construction both incur at most cN/B cache misses, whereas repeated merging, as programmed by Floyd, can incur more than (dN log₂ B)/B cache misses. However, for our memory-tuned versions of repeated insertion and repeated merging the number of cache misses incurred is close to the optimal bound N/B. In addition to these theoretical findings, we communicate many practical experiences which we hope will be valuable for others doing experimental algorithmic work.
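
The two classical construction methods are short enough to state directly. A plain array-based sketch of repeated insertion (Williams) and repeated merging (Floyd) is given below; the memory-tuned versions studied in the paper reorganise the access pattern and are not shown:

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Min-heap stored in an array; the children of node i sit at 2i+1 and 2i+2.

static void sift_up(std::vector<int>& h, std::size_t i) {
    while (i > 0 && h[i] < h[(i - 1) / 2]) {
        std::swap(h[i], h[(i - 1) / 2]);
        i = (i - 1) / 2;
    }
}

static void sift_down(std::vector<int>& h, std::size_t i, std::size_t n) {
    for (;;) {
        std::size_t smallest = i, l = 2 * i + 1, r = 2 * i + 2;
        if (l < n && h[l] < h[smallest]) smallest = l;
        if (r < n && h[r] < h[smallest]) smallest = r;
        if (smallest == i) return;
        std::swap(h[i], h[smallest]);
        i = smallest;
    }
}

// Williams [1964]: repeated insertion, O(N log N) instructions in the worst
// case, but the access pattern stays near the top of the heap.
std::vector<int> build_by_insertion(const std::vector<int>& a) {
    std::vector<int> h;
    h.reserve(a.size());
    for (int x : a) { h.push_back(x); sift_up(h, h.size() - 1); }
    return h;
}

// Floyd [1964]: repeated merging of subheaps bottom-up, O(N) instructions but
// a scattered access pattern that can cause many more cache misses.
std::vector<int> build_by_merging(std::vector<int> h) {
    for (std::size_t i = h.size() / 2; i-- > 0;) sift_down(h, i, h.size());
    return h;
}
```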

Journal ArticleDOI
TL;DR: BALL is designed and implemented, the first object-oriented application framework for rapid prototyping in Molecular Modeling, and provides fundamental components for import/export of data in various file formats, Molecular Mechanics simulations, three-dimensional visualization, and more complex ones like a numerical solver for the Poisson-Boltzmann equation.
Abstract: In the next century, virtual laboratories will play a key role in biotechnology. Computer experiments will not only replace some of the time-consuming and expensive real-world experiments, but they will also provide insights that cannot be obtained using "wet" experiments. The field that deals with the modeling of atoms, molecules, and their reactions is called Molecular Modeling. The advent of Life Sciences gave rise to numerous new developments in this area. However, the implementation of new simulation tools is extremely time-consuming. This is mainly due to the large amount of supporting code that is required in addition to the code necessary to implement the new idea. The only way to reduce the development time is to reuse reliable code, preferably using object-oriented approaches. We have designed and implemented BALL, the first object-oriented application framework for rapid prototyping in Molecular Modeling. By the use of the composite design pattern and polymorphism we were able to model the multitude of complex biochemical concepts in a well-structured and comprehensible class hierarchy, the BALL kernel classes. The isomorphism between the biochemical structures and the kernel classes leads to an intuitive interface. Since BALL was designed for rapid software prototyping, ease of use, extensibility, and robustness were our principal design goals. Besides the kernel classes, BALL provides fundamental components for import/export of data in various file formats, Molecular Mechanics simulations, three-dimensional visualization, and more complex ones like a numerical solver for the Poisson-Boltzmann equation.
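
The composite pattern mentioned above, with atoms as leaves and larger structures as composites answering the same interface, can be sketched independently of BALL. The class names below are illustrative and are not BALL's kernel classes:

```cpp
#include <iostream>
#include <memory>
#include <string>
#include <vector>

// Toy composite-pattern sketch in the spirit described (NOT BALL's kernel
// classes): atoms are leaves, composites (residues, molecules, ...) hold
// children, and both answer the same interface, e.g. total mass.
class StructureNode {
public:
    virtual ~StructureNode() = default;
    virtual double mass() const = 0;
};

class Atom : public StructureNode {
public:
    Atom(std::string element, double mass) : element_(std::move(element)), mass_(mass) {}
    double mass() const override { return mass_; }
private:
    std::string element_;
    double mass_;
};

class Composite : public StructureNode {
public:
    explicit Composite(std::string name) : name_(std::move(name)) {}
    void add(std::unique_ptr<StructureNode> child) { children_.push_back(std::move(child)); }
    double mass() const override {                       // recurse polymorphically
        double total = 0.0;
        for (const auto& c : children_) total += c->mass();
        return total;
    }
private:
    std::string name_;
    std::vector<std::unique_ptr<StructureNode>> children_;
};

int main() {
    auto water = std::make_unique<Composite>("H2O");
    water->add(std::make_unique<Atom>("O", 15.999));
    water->add(std::make_unique<Atom>("H", 1.008));
    water->add(std::make_unique<Atom>("H", 1.008));
    std::cout << "mass = " << water->mass() << "\n";     // 18.015
}
```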

Journal ArticleDOI
TL;DR: Good speedups for much smaller inputs are possible; this finding rests in part on a new variant of a 1984 algorithm, called the No-Cut algorithm. The study provides an interesting example where experimental research and theoretical analysis complement one another.
Abstract: Algorithms for the problem of list ranking are empirically studied with respect to the Explicit Multi-Threaded (XMT) platform for instruction-level parallelism (ILP). The main goal of this study is to understand the differences between XMT and more traditional parallel computing implementation platforms/models as they pertain to the well-studied list ranking problem. The two main findings are: (i) good speedups for much smaller inputs are possible and (ii) in part, the first finding is based on a new variant of a 1984 algorithm, called the No-Cut algorithm. The paper incorporates analytic (non-asymptotic) performance analysis into experimental performance analysis for relatively small inputs. This provides an interesting example where experimental research and theoretical analysis complement one another. Explicit Multi-Threading (XMT) is a fine-grained computation framework introduced in our SPAA'98 paper. Building on some key ideas of parallel computing, XMT covers the spectrum from algorithms through architecture to implementation; the main implementation-related innovation in XMT was the incorporation of low-overhead hardware and software mechanisms (for more effective fine-grained parallelism). The reader is referred to that paper for detail on these mechanisms. The XMT platform aims at faster single-task completion time by way of ILP.
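
The underlying list ranking problem, computing each node's distance to the end of a linked list, is usually explained via pointer jumping. The sketch below is a sequential simulation of Wyllie-style pointer-jumping rounds, shown as background only; it is neither the No-Cut variant nor XMT code:

```cpp
#include <cstddef>
#include <vector>

// Sequential simulation of Wyllie-style pointer-jumping list ranking (a
// textbook baseline, not the paper's No-Cut variant): after O(log n)
// synchronous rounds every node knows its distance to the end of the list.
// next[i] == i marks the last node.
std::vector<std::size_t> list_rank(std::vector<std::size_t> next) {
    const std::size_t n = next.size();
    std::vector<std::size_t> rank(n);
    for (std::size_t i = 0; i < n; ++i) rank[i] = (next[i] == i) ? 0 : 1;

    bool changed = true;
    while (changed) {                                   // each round halves chain lengths
        changed = false;
        std::vector<std::size_t> new_rank = rank, new_next = next;
        for (std::size_t i = 0; i < n; ++i) {           // conceptually one parallel step
            if (next[i] != next[next[i]]) {             // the jump still moves forward
                new_rank[i] = rank[i] + rank[next[i]];
                new_next[i] = next[next[i]];
                changed = true;
            }
        }
        rank.swap(new_rank);
        next.swap(new_next);
    }
    return rank;                                        // rank[i] = #links from i to the end
}
```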

Journal ArticleDOI
Tetsuo Shibuya1
TL;DR: New algorithms that compute the set of shortest paths efficiently by using the A* algorithm are proposed.
Abstract: Computation of all the shortest paths between multiple sources and multiple destinations on various networks is required in many problems, such as the traveling salesperson problem (TSP) and the vehicle routing problem (VRP). This paper proposes new algorithms that compute the set of shortest paths efficiently by using the A* algorithm. The efficiency and properties of these algorithms are examined by using the results of experiments on an actual road network.
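
On a road network with geometric coordinates, the usual admissible heuristic for A* is the straight-line distance to the target. A plain single-pair A* baseline of this kind (the paper's contribution lies in adapting such searches to many sources and destinations, which is not shown here) can be sketched as:

```cpp
#include <cmath>
#include <functional>
#include <limits>
#include <queue>
#include <utility>
#include <vector>

// Plain single-pair A* with a Euclidean lower bound on a road-like graph
// (edge weights = road lengths, so straight-line distance is admissible and
// consistent).  Names and data layout are illustrative only.
struct Node { double x, y; };
struct Arc  { int to; double length; };
using RoadGraph = std::vector<std::vector<Arc>>;

double a_star(const RoadGraph& g, const std::vector<Node>& coord, int s, int t) {
    const double inf = std::numeric_limits<double>::infinity();
    auto h = [&](int v) {                     // heuristic: straight-line distance to t
        return std::hypot(coord[v].x - coord[t].x, coord[v].y - coord[t].y);
    };
    std::vector<double> dist(g.size(), inf);
    std::vector<bool> settled(g.size(), false);
    using QItem = std::pair<double, int>;     // (dist + heuristic, node)
    std::priority_queue<QItem, std::vector<QItem>, std::greater<QItem>> open;
    dist[s] = 0.0;
    open.push({h(s), s});
    while (!open.empty()) {
        const int u = open.top().second;
        open.pop();
        if (settled[u]) continue;             // outdated queue entry
        settled[u] = true;
        if (u == t) return dist[t];           // target settled: distance is final
        for (const Arc& a : g[u]) {
            if (!settled[a.to] && dist[u] + a.length < dist[a.to]) {
                dist[a.to] = dist[u] + a.length;
                open.push({dist[a.to] + h(a.to), a.to});
            }
        }
    }
    return inf;                               // t is unreachable from s
}
```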

Journal Article
TL;DR: Experimental results indicate that the leaf-correspondence method generally leads to a faster double-ended priority queue than either total or dual correspondence.
Abstract: We describe three general methods--total, dual, and leaf correspondence--that may be used to derive efficient double-ended priority queues from single-ended priority queues. These methods are illustrated by developing double-ended priority queues based on the classical heap. Experimental results indicate that the leaf-correspondence method generally leads to a faster double-ended priority queue than either total or dual correspondence. On randomly generated test sets, however, the splay tree outperforms the tested correspondence-based double-ended priority queues.
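
For comparison purposes, the simplest double-ended priority queue of all is a balanced search tree, which is the role the splay tree plays in the reported experiments. The std::multiset baseline below is shown only to fix the operations being compared; it is not one of the correspondence-based structures:

```cpp
#include <cassert>
#include <iterator>
#include <set>

// Baseline double-ended priority queue (not one of the correspondence-based
// structures): a balanced search tree supports insert, find-min, find-max,
// delete-min and delete-max in O(log n) time each.
template <typename T>
class TreeDepq {
public:
    void insert(const T& x) { s_.insert(x); }
    const T& min() const { return *s_.begin(); }
    const T& max() const { return *s_.rbegin(); }
    void delete_min() { s_.erase(s_.begin()); }
    void delete_max() { s_.erase(std::prev(s_.end())); }
    bool empty() const { return s_.empty(); }
private:
    std::multiset<T> s_;
};

int main() {
    TreeDepq<int> q;
    const int keys[] = {5, 1, 9, 3};
    for (int x : keys) q.insert(x);
    assert(q.min() == 1 && q.max() == 9);
    q.delete_min();
    q.delete_max();
    assert(q.min() == 3 && q.max() == 5);
}
```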