scispace - formally typeset
Search or ask a question
Author

Gerth Stølting Brodal

Bio: Gerth Stølting Brodal is an academic researcher from Aarhus University. The author has contributed to research in topics: Data structure & Priority queue. The author has an hindex of 39, co-authored 166 publications receiving 4420 citations. Previous affiliations of Gerth Stølting Brodal include National Research Foundation of South Africa & Max Planck Society.


Papers
More filters
Journal ArticleDOI
TL;DR: It is shown that every algorithm enabled to access A during the query and using a data structure of size O(N/c) bits requires Ω(c) query time, for any c where 1≤c≤N, and the lower bound holds for arrays of any dimension.
Abstract: The two dimensional range minimum query problem is to preprocess a static m by n matrix (two dimensional array) A of size N=m⋅n, such that subsequent queries, asking for the position of the minimum element in a rectangular range within A, can be answered efficiently. We study the trade-off between the space and query time of the problem. We show that every algorithm enabled to access A during the query and using a data structure of size O(N/c) bits requires Ω(c) query time, for any c where 1≤c≤N. This lower bound holds for arrays of any dimension. In particular, for the one dimensional version of the problem, the lower bound is tight up to a constant factor. In two dimensions, we complement the lower bound with an indexing data structure of size O(N/c) bits which can be preprocessed in O(N) time to support O(clog 2 c) query time. For c=O(1), this is the first O(1) query time algorithm using a data structure of optimal size O(N) bits. For the case where queries can not probe A, we give a data structure of size O(N⋅min {m,log n}) bits with O(1) query time, assuming m≤n. This leaves a gap to the space lower bound of Ω(Nlog m) bits for this version of the problem.

55 citations

Book ChapterDOI
05 Dec 2009
TL;DR: A data structure in the RAM model is presented supporting queries that given two indices i ≤ j and an integer k report the k smallest elements in the subarray A[i..j] in sorted order in optimal O(k) time.
Abstract: We study the following one-dimensional range reporting problem: On an array A of n elements, support queries that given two indices i ≤ j and an integer k report the k smallest elements in the subarray A[i..j] in sorted order. We present a data structure in the RAM model supporting such queries in optimal O(k) time. The structure uses O(n) words of space and can be constructed in O(n logn) time. The data structure can be extended to solve the online version of the problem, where the elements in A[i..j] are reported one-by-one in sorted order, in O(1) worst-case time per element. The problem is motivated by (and is a generalization of) a problem with applications in search engines: On a tree where leaves have associated rank values, report the highest ranked leaves in a given subtree. Finally, the problem studied generalizes the classic range minimum query (RMQ) problem on arrays.

54 citations

Proceedings Article
08 Jul 1998

54 citations

Proceedings ArticleDOI
22 Jan 2006
TL;DR: In this article, a static cache-oblivious dictionary structure for string prefix queries is presented, which performs prefix queries in O(logBn + |P|/B) I/Os, where n is the number of leaves in the trie, P is the query string, and B is the block size.
Abstract: We present static cache-oblivious dictionary structures for strings which provide analogues of tries and suffix trees in the cache-oblivious model. Our construction takes as input either a set of strings to store, a single string for which all suffixes are to be stored, a trie, a compressed trie, or a suffix tree, and creates a cache-oblivious data structure which performs prefix queries in O(logBn + |P|/B) I/Os, where n is the number of leaves in the trie, P is the query string, and B is the block size. This query cost is optimal for unbounded alphabets. The data structure uses linear space.

53 citations

01 Jan 1996
TL;DR: In this article, the problem of answering d-queries was considered, where a d-query is to report if there exists a string in the set within Hamming distance d of α.
Abstract: Given a set of n binary strings of length m each. We consider the problem of answering d-queries. Given a binary query string α of length m, a d-query is to report if there exists a string in the set within Hamming distance d of α.

53 citations


Cited by
More filters
Journal ArticleDOI
TL;DR: This article reviews the terminology used for phylogenetic networks and covers both split networks and reticulate networks, how they are defined, and how they can be interpreted and outlines the beginnings of a comprehensive statistical framework for applying split network methods.
Abstract: The evolutionary history of a set of taxa is usually represented by a phylogenetic tree, and this model has greatly facilitated the discussion and testing of hypotheses. However, it is well known that more complex evolutionary scenarios are poorly described by such models. Further, even when evolution proceeds in a tree-like manner, analysis of the data may not be best served by using methods that enforce a tree structure but rather by a richer visualization of the data to evaluate its properties, at least as an essential first step. Thus, phylogenetic networks should be employed when reticulate events such as hybridization, horizontal gene transfer, recombination, or gene duplication and loss are believed to be involved, and, even in the absence of such events, phylogenetic networks have a useful role to play. This article reviews the terminology used for phylogenetic networks and covers both split networks and reticulate networks, how they are defined, and how they can be interpreted. Additionally, the article outlines the beginnings of a comprehensive statistical framework for applying split network methods. We show how split networks can represent confidence sets of trees and introduce a conservative statistical test for whether the conflicting signal in a network is treelike. Finally, this article describes a new program, SplitsTree4, an interactive and comprehensive tool for inferring different types of phylogenetic networks from sequences, distances, and trees.

7,273 citations

Journal ArticleDOI
TL;DR: FastTree is a method for constructing large phylogenies and for estimating their reliability, instead of storing a distance matrix, that uses sequence profiles of internal nodes in the tree to implement Neighbor-Joining and uses heuristics to quickly identify candidate joins.
Abstract: Gene families are growing rapidly, but standard methods for inferring phylogenies do not scale to alignments with over 10,000 sequences. We present FastTree, a method for constructing large phylogenies and for estimating their reliability. Instead of storing a distance matrix, FastTree stores sequence profiles of internal nodes in the tree. FastTree uses these profiles to implement Neighbor-Joining and uses heuristics to quickly identify candidate joins. FastTree then uses nearest neighbor interchanges to reduce the length of the tree. For an alignment with N sequences, L sites, and a different characters, a distance matrix requires O(N2) space and O(N2L) time, but FastTree requires just O(NLa + N) memory and O(Nlog (N)La) time. To estimate the tree's reliability, FastTree uses local bootstrapping, which gives another 100-fold speedup over a distance matrix. For example, FastTree computed a tree and support values for 158,022 distinct 16S ribosomal RNAs in 17 h and 2.4 GB of memory. Just computing pairwise Jukes–Cantor distances and storing them, without inferring a tree or bootstrapping, would require 17 h and 50 GB of memory. In simulations, FastTree was slightly more accurate than Neighbor-Joining, BIONJ, or FastME; on genuine alignments, FastTree's topologies had higher likelihoods. FastTree is available at http://microbesonline.org/fasttree.

3,500 citations

Journal Article
TL;DR: FastTree as mentioned in this paper uses sequence profiles of internal nodes in the tree to implement neighbor-joining and uses heuristics to quickly identify candidate joins, then uses nearest-neighbor interchanges to reduce the length of the tree.
Abstract: Gene families are growing rapidly, but standard methods for inferring phylogenies do not scale to alignments with over 10,000 sequences. We present FastTree, a method for constructing large phylogenies and for estimating their reliability. Instead of storing a distance matrix, FastTree stores sequence profiles of internal nodes in the tree. FastTree uses these profiles to implement neighbor-joining and uses heuristics to quickly identify candidate joins. FastTree then uses nearest-neighbor interchanges to reduce the length of the tree. For an alignment with N sequences, L sites, and a different characters, a distance matrix requires O(N^2) space and O(N^2 L) time, but FastTree requires just O( NLa + N sqrt(N) ) memory and O( N sqrt(N) log(N) L a ) time. To estimate the tree's reliability, FastTree uses local bootstrapping, which gives another 100-fold speedup over a distance matrix. For example, FastTree computed a tree and support values for 158,022 distinct 16S ribosomal RNAs in 17 hours and 2.4 gigabytes of memory. Just computing pairwise Jukes-Cantor distances and storing them, without inferring a tree or bootstrapping, would require 17 hours and 50 gigabytes of memory. In simulations, FastTree was slightly more accurate than neighbor joining, BIONJ, or FastME; on genuine alignments, FastTree's topologies had higher likelihoods. FastTree is available at http://microbesonline.org/fasttree.

2,436 citations

01 Jan 2007
TL;DR: This paper provides a brief introduction to the key elements of BOLD, discusses their functional capabilities, and concludes by examining computational resources and future prospects.
Abstract: The Barcode of Life Data System ( BOLD ) is an informatics workbench aiding the acquisition, storage, analysis and publication of DNA barcode records. By assembling molecular, morphological and distributional data, it bridges a traditional bioinformatics chasm. BOLD is freely available to any researcher with interests in DNA barcoding. By providing specialized services, it aids the assembly of records that meet the standards needed to gain BARCODE designation in the global sequence databases. Because of its web-based delivery and flexible data security model, it is also well positioned to support projects that involve broad research alliances. This paper provides a brief introduction to the key elements of BOLD , discusses their functional capabilities, and concludes by examining computational resources and future prospects.

1,859 citations