Author

Gerth Stølting Brodal

Bio: Gerth Stølting Brodal is an academic researcher at Aarhus University. The author has contributed to research on the topics Data structure and Priority queue. The author has an h-index of 39 and has co-authored 166 publications receiving 4420 citations. Previous affiliations of Gerth Stølting Brodal include the National Research Foundation of South Africa and the Max Planck Society.


Papers
Journal Article
26 Sep 2013-Biology
TL;DR: Reviews a decade of algorithmic improvements for computing triplet and quartet distances between trees more efficiently, using two strategies: one based on dynamic programming, the other based on coloring leaves in one tree and updating a hierarchical decomposition of the other.
Abstract: Distance measures between trees are useful for comparing trees in a systematic manner, and several different distance measures have been proposed. The triplet and quartet distances, for rooted and unrooted trees, respectively, are defined as the number of subsets of three or four leaves, respectively, where the topologies of the induced subtrees differ. These distances can trivially be computed by explicitly enumerating all sets of three or four leaves and testing if the topologies are different, but this leads to time complexities at least of the order n^3 or n^4 just for enumerating the sets. The different topologies can be counted implicitly, however, and in this paper, we review a series of algorithmic improvements that have been used during the last decade to develop more efficient algorithms by exploiting two different strategies for this: one based on dynamic programming and another based on coloring leaves in one tree and updating a hierarchical decomposition of the other.

18 citations
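
The trivial enumeration that the abstract above describes can be sketched as follows. This is only the cubic-time baseline, not one of the faster algorithms the paper reviews. The tree encoding (nested tuples for internal nodes, any non-tuple value for a leaf) and all function names are assumptions made for illustration.

```python
from itertools import combinations

def root_paths(tree):
    """Map each leaf label to the tuple of internal-node ids on its root-to-leaf path."""
    paths = {}
    counter = [0]

    def walk(node, prefix):
        if isinstance(node, tuple):      # assumed encoding: tuple = internal node
            counter[0] += 1
            here = prefix + (counter[0],)
            for child in node:
                walk(child, here)
        else:                            # assumed encoding: anything else = leaf label
            paths[node] = prefix

    walk(tree, ())
    return paths

def lca_depth(p, q):
    """Depth of the lowest common ancestor = length of the common path prefix."""
    depth = 0
    for x, y in zip(p, q):
        if x != y:
            break
        depth += 1
    return depth

def triplet_topology(paths, a, b, c):
    """Return the pair of leaves grouped together, or None if the triplet is unresolved."""
    ab = lca_depth(paths[a], paths[b])
    ac = lca_depth(paths[a], paths[c])
    bc = lca_depth(paths[b], paths[c])
    if ab == ac == bc:
        return None
    deepest = max(ab, ac, bc)
    if ab == deepest:
        return frozenset((a, b))
    if ac == deepest:
        return frozenset((a, c))
    return frozenset((b, c))

def triplet_distance(t1, t2):
    """Count leaf triples whose induced topologies differ; enumerates all n^3-order triples."""
    p1, p2 = root_paths(t1), root_paths(t2)
    if p1.keys() != p2.keys():
        raise ValueError("both trees must have the same leaf set")
    return sum(
        triplet_topology(p1, *t) != triplet_topology(p2, *t)
        for t in combinations(sorted(p1), 3)
    )

t1 = ((("a", "b"), "c"), "d")
t2 = ((("a", "c"), "b"), "d")
print(triplet_distance(t1, t2))
```

In this example only the triple {a, b, c} is resolved differently in the two trees, so the distance is 1.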

Book Chapter
15 Aug 2005
TL;DR: It is proved that a sorting algorithm using O(dn log n) comparisons performs Ω(n log_d n) branch mispredictions, and it is shown that Multiway MergeSort achieves this tradeoff by adopting a multiway merger with a low number of branch mispredictions.
Abstract: Branch mispredictions are an important factor affecting the running time in practice. In this paper we consider tradeoffs between the number of branch mispredictions and the number of comparisons for sorting algorithms in the comparison model. We prove that a sorting algorithm using O(dn log n) comparisons performs Ω(n log_d n) branch mispredictions. We show that Multiway MergeSort achieves this tradeoff by adopting a multiway merger with a low number of branch mispredictions. For adaptive sorting algorithms we similarly obtain that an algorithm performing O(dn(1 + log(1 + Inv/n))) comparisons must perform Ω(n log_d(1 + Inv/n)) branch mispredictions, where Inv is the number of inversions in the input. This tradeoff can be achieved by GenericSort by Estivill-Castro and Wood by adopting a multiway division protocol and a multiway merging algorithm with a low number of branch mispredictions.

18 citations
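
The adaptive bounds in the abstract above are stated in terms of Inv, the number of inversions in the input. As a minimal sketch of that quantity only (not of GenericSort or of the multiway merger from the paper), inversions can be counted with an ordinary two-way merge sort. The function name is an assumption made for illustration.

```python
def count_inversions(seq):
    """Return (sorted copy, number of pairs i < j with seq[i] > seq[j])."""
    n = len(seq)
    if n <= 1:
        return list(seq), 0
    mid = n // 2
    left, inv_left = count_inversions(seq[:mid])
    right, inv_right = count_inversions(seq[mid:])
    merged = []
    inv = inv_left + inv_right
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            # right[j] precedes every remaining element of left:
            # each of those pairs is one inversion.
            merged.append(right[j])
            j += 1
            inv += len(left) - i
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged, inv

data = [3, 1, 2]
_, inv = count_inversions(data)
print(inv, 1 + inv / len(data))  # Inv and the term 1 + Inv/n used in the bounds
```

A sorted input has Inv = 0, so the term 1 + Inv/n equals 1, while a reversed input of length n has Inv = n(n-1)/2.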

01 Jan 2004
TL;DR: Chapter outline (University of Aarhus): finger searching and its applications, including arbitrary merging order, list splitting, and adaptive merging and sorting.
Abstract: University of Aarhus. 11.1 Finger Searching; 11.2 Dynamic Finger Search Trees; 11.3 Level Linked (2,4)-Trees; 11.4 Randomized Finger Search Trees (Treaps, Skip Lists); 11.5 Applications (Optimal Merging and Set Operations, Arbitrary Merging Order, List Splitting, Adaptive Merging and Sorting).

18 citations

Book Chapter
10 Sep 2012
TL;DR: Using a novel technique--the discrepancy properties of Fibonacci lattices--an indexing data structure for 2D-RMQs is given that uses O(N/c) bits additional space with O(c log c (log log c)^2) query time, for any parameter c, 4 ≤ c ≤ N.
Abstract: Given a matrix of size N, two dimensional range minimum queries (2D-RMQs) ask for the position of the minimum element in a rectangular range within the matrix. We study trade-offs between the query time and the additional space used by indexing data structures that support 2D-RMQs. Using a novel technique--the discrepancy properties of Fibonacci lattices--we give an indexing data structure for 2D-RMQs that uses O(N/c) bits additional space with O(c log c (log log c)^2) query time, for any parameter c, 4 ≤ c ≤ N. Also, when the entries of the input matrix are from {0,1}, we show that the query time can be improved to O(c log c) with the same space usage.

18 citations
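
To make the query in the abstract above concrete, the following is a minimal sketch of the index-free baseline: it scans the whole query rectangle, so it uses no additional space but takes time proportional to the rectangle's area. It is not the Fibonacci-lattice structure from the paper. The function name, the list-of-lists matrix, and the inclusive corner coordinates are assumptions made for illustration.

```python
def rmq_2d(matrix, r1, c1, r2, c2):
    """Position (row, col) of the minimum in rows r1..r2 and columns c1..c2, inclusive."""
    best = (r1, c1)
    for r in range(r1, r2 + 1):
        for c in range(c1, c2 + 1):
            # Strict comparison keeps the first minimum in row-major order on ties.
            if matrix[r][c] < matrix[best[0]][best[1]]:
                best = (r, c)
    return best

m = [
    [5, 2, 9],
    [7, 1, 3],
    [4, 8, 6],
]
print(rmq_2d(m, 0, 0, 2, 2))  # whole matrix
print(rmq_2d(m, 0, 2, 2, 2))  # last column only
```

An indexing structure of the kind the abstract studies trades extra bits of space for avoiding this full scan.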

Journal Article
TL;DR: In this article, an output-dependent expected running time of O((m + nℓ) log log σ + Sort) and O(m) space was given for the 3-letter alphabet case.

17 citations


Cited by
Journal Article
TL;DR: This article reviews the terminology used for phylogenetic networks and covers both split networks and reticulate networks, how they are defined, and how they can be interpreted and outlines the beginnings of a comprehensive statistical framework for applying split network methods.
Abstract: The evolutionary history of a set of taxa is usually represented by a phylogenetic tree, and this model has greatly facilitated the discussion and testing of hypotheses. However, it is well known that more complex evolutionary scenarios are poorly described by such models. Further, even when evolution proceeds in a tree-like manner, analysis of the data may not be best served by using methods that enforce a tree structure but rather by a richer visualization of the data to evaluate its properties, at least as an essential first step. Thus, phylogenetic networks should be employed when reticulate events such as hybridization, horizontal gene transfer, recombination, or gene duplication and loss are believed to be involved, and, even in the absence of such events, phylogenetic networks have a useful role to play. This article reviews the terminology used for phylogenetic networks and covers both split networks and reticulate networks, how they are defined, and how they can be interpreted. Additionally, the article outlines the beginnings of a comprehensive statistical framework for applying split network methods. We show how split networks can represent confidence sets of trees and introduce a conservative statistical test for whether the conflicting signal in a network is treelike. Finally, this article describes a new program, SplitsTree4, an interactive and comprehensive tool for inferring different types of phylogenetic networks from sequences, distances, and trees.

7,273 citations

Journal Article
TL;DR: FastTree is a method for constructing large phylogenies and for estimating their reliability; instead of storing a distance matrix, it stores sequence profiles of internal nodes in the tree, uses these profiles to implement Neighbor-Joining, and uses heuristics to quickly identify candidate joins.
Abstract: Gene families are growing rapidly, but standard methods for inferring phylogenies do not scale to alignments with over 10,000 sequences. We present FastTree, a method for constructing large phylogenies and for estimating their reliability. Instead of storing a distance matrix, FastTree stores sequence profiles of internal nodes in the tree. FastTree uses these profiles to implement Neighbor-Joining and uses heuristics to quickly identify candidate joins. FastTree then uses nearest neighbor interchanges to reduce the length of the tree. For an alignment with N sequences, L sites, and a different characters, a distance matrix requires O(N^2) space and O(N^2 L) time, but FastTree requires just O(NLa + N sqrt(N)) memory and O(N sqrt(N) log(N) La) time. To estimate the tree's reliability, FastTree uses local bootstrapping, which gives another 100-fold speedup over a distance matrix. For example, FastTree computed a tree and support values for 158,022 distinct 16S ribosomal RNAs in 17 h and 2.4 GB of memory. Just computing pairwise Jukes–Cantor distances and storing them, without inferring a tree or bootstrapping, would require 17 h and 50 GB of memory. In simulations, FastTree was slightly more accurate than Neighbor-Joining, BIONJ, or FastME; on genuine alignments, FastTree's topologies had higher likelihoods. FastTree is available at http://microbesonline.org/fasttree.

3,500 citations
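
The 50 GB figure in the abstract above can be roughly reproduced with back-of-the-envelope arithmetic. The sketch below assumes that only the upper triangle of the distance matrix is stored and that each distance takes 4 bytes. Both are assumptions made for illustration, since the abstract does not state the storage layout, and the function name is hypothetical.

```python
def distance_matrix_bytes(n_sequences, bytes_per_entry=4):
    """Bytes needed to store one distance per unordered pair of sequences.

    Assumes upper-triangle storage and 4-byte entries by default; neither
    assumption comes from the abstract.
    """
    pairs = n_sequences * (n_sequences - 1) // 2
    return pairs * bytes_per_entry

n = 158_022  # number of 16S ribosomal RNA sequences quoted in the abstract
gigabytes = distance_matrix_bytes(n) / 1e9
print(f"{gigabytes:.1f} GB")  # about 49.9 GB under the stated assumptions
```

Under these assumptions the quadratic term alone accounts for the quoted memory, which is the cost FastTree avoids by storing profiles instead of a distance matrix.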

Journal Article
TL;DR: FastTree uses sequence profiles of internal nodes in the tree to implement neighbor-joining and uses heuristics to quickly identify candidate joins, then uses nearest-neighbor interchanges to reduce the length of the tree.
Abstract: Gene families are growing rapidly, but standard methods for inferring phylogenies do not scale to alignments with over 10,000 sequences. We present FastTree, a method for constructing large phylogenies and for estimating their reliability. Instead of storing a distance matrix, FastTree stores sequence profiles of internal nodes in the tree. FastTree uses these profiles to implement neighbor-joining and uses heuristics to quickly identify candidate joins. FastTree then uses nearest-neighbor interchanges to reduce the length of the tree. For an alignment with N sequences, L sites, and a different characters, a distance matrix requires O(N^2) space and O(N^2 L) time, but FastTree requires just O( NLa + N sqrt(N) ) memory and O( N sqrt(N) log(N) L a ) time. To estimate the tree's reliability, FastTree uses local bootstrapping, which gives another 100-fold speedup over a distance matrix. For example, FastTree computed a tree and support values for 158,022 distinct 16S ribosomal RNAs in 17 hours and 2.4 gigabytes of memory. Just computing pairwise Jukes-Cantor distances and storing them, without inferring a tree or bootstrapping, would require 17 hours and 50 gigabytes of memory. In simulations, FastTree was slightly more accurate than neighbor joining, BIONJ, or FastME; on genuine alignments, FastTree's topologies had higher likelihoods. FastTree is available at http://microbesonline.org/fasttree.

2,436 citations

01 Jan 2007
TL;DR: This paper provides a brief introduction to the key elements of BOLD, discusses their functional capabilities, and concludes by examining computational resources and future prospects.
Abstract: The Barcode of Life Data System (BOLD) is an informatics workbench aiding the acquisition, storage, analysis and publication of DNA barcode records. By assembling molecular, morphological and distributional data, it bridges a traditional bioinformatics chasm. BOLD is freely available to any researcher with interests in DNA barcoding. By providing specialized services, it aids the assembly of records that meet the standards needed to gain BARCODE designation in the global sequence databases. Because of its web-based delivery and flexible data security model, it is also well positioned to support projects that involve broad research alliances. This paper provides a brief introduction to the key elements of BOLD, discusses their functional capabilities, and concludes by examining computational resources and future prospects.

1,859 citations