Author

Norbert Zeh

Other affiliations: Max Planck Society, Carleton University, Carleton College
Bio: Norbert Zeh is an academic researcher from Dalhousie University. The author has contributed to research in topics: Planar graph & Data structure. The author has an h-index of 24 and has co-authored 112 publications receiving 1650 citations. Previous affiliations of Norbert Zeh include Max Planck Society & Carleton University.


Papers
Journal Article
TL;DR: New fixed-parameter algorithms for computing maximum agreement forests of pairs of rooted binary phylogenetic trees substantially outperform the best previous algorithms; the algorithm for maximum acyclic agreement forests is the first depth-bounded search algorithm for that problem.
Abstract: We present new and improved fixed-parameter algorithms for computing maximum agreement forests of pairs of rooted binary phylogenetic trees. The size of such a forest for two trees corresponds to their subtree prune-and-regraft distance and, if the agreement forest is acyclic, to their hybridization number. These distance measures are essential tools for understanding reticulate evolution. Our algorithm for computing maximum acyclic agreement forests is the first depth-bounded search algorithm for this problem. Our algorithms substantially outperform the best previous algorithms for these problems.

97 citations

Journal Article
TL;DR: This work successfully constructed an SPR supertree from a phylogenomic dataset of 40,631 gene trees that covered 244 genomes representing several major bacterial phyla, and allowed direct inference of highways of gene transfer between bacterial classes and genera.
Abstract: Supertree methods reconcile a set of phylogenetic trees into a single structure that is often interpreted as a branching history of species. A key challenge is combining conflicting evolutionary histories that are due to artifacts of phylogenetic reconstruction and phenomena such as lateral gene transfer (LGT). Many supertree approaches use optimality criteria that do not reflect underlying processes, have known biases, and may be unduly influenced by LGT. We present the first method to construct supertrees by using the subtree prune-and-regraft (SPR) distance as an optimality criterion. Although calculating the rooted SPR distance between a pair of trees is NP-hard, our new maximum agreement forest-based methods can reconcile trees with hundreds of taxa and > 50 transfers in fractions of a second, which enables repeated calculations during the course of an iterative search. Our approach can accommodate trees in which uncertain relationships have been collapsed to multifurcating nodes. Using a series of benchmark datasets simulated under plausible rates of LGT, we show that SPR supertrees are more similar to correct species histories than supertrees based on parsimony or Robinson-Foulds distance criteria. We successfully constructed an SPR supertree from a phylogenomic dataset of 40,631 gene trees that covered 244 genomes representing several major bacterial phyla. Our SPR-based approach also allowed direct inference of highways of gene transfer between bacterial classes and genera. A small number of these highways connect genera in different phyla and can highlight specific genes implicated in long-distance LGT. (Lateral gene transfer; matrix representation with parsimony; phylogenomics; prokaryotic phylogeny; Robinson-Foulds; subtree prune-and-regraft; supertrees.)

92 citations

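Journal Article
TL;DR: Efficient algorithms for geometric bottleneck shortest path queries are presented, based on Euclidean minimum spanning trees, spanners, and the Delaunay triangulation. As a result of independent interest, any two points p and q are joined by a path in the Delaunay triangulation whose length is at most 2π/(3 cos(π/6)) times the Euclidean distance |pq| and all of whose edges have length at most |pq|.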
Abstract: In a geometric bottleneck shortest path problem, we are given a set S of n points in the plane, and want to answer queries of the following type: given two points p and q of S and a real number L, compute (or approximate) a shortest path between p and q in the subgraph of the complete graph on S consisting of all edges whose lengths are less than or equal to L. We present efficient algorithms for answering several query problems of this type. Our solutions are based on Euclidean minimum spanning trees, spanners, and the Delaunay triangulation. A result of independent interest is the following. For any two points p and q of S, there is a path between p and q in the Delaunay triangulation, whose length is less than or equal to 2π/(3 cos(π/6)) times the Euclidean distance |pq| between p and q, and all of whose edges have length at most |pq|.
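
The query defined in this abstract can be made concrete with a naive baseline. The sketch below is not the paper's method: it simply runs Dijkstra's algorithm on the complete graph over S while skipping every edge longer than L. The function name `bottleneck_shortest_path` and the list-of-tuples input format are assumptions made for illustration.

```python
import heapq
import math


def bottleneck_shortest_path(points, p, q, L):
    """Length of a shortest p-q path that uses only edges of length <= L.

    points: list of (x, y) tuples; p and q are indices into that list.
    Returns math.inf when q cannot be reached under the threshold.
    Naive baseline (hypothetical helper, not the paper's data structure):
    Dijkstra on the implicit complete graph, O(n^2 log n) time per query.
    """
    dist = [math.inf] * len(points)
    dist[p] = 0.0
    heap = [(0.0, p)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry, a shorter path to u was already found
        if u == q:
            return d
        for v in range(len(points)):
            if v == u:
                continue
            w = math.dist(points[u], points[v])
            # Only edges no longer than the bottleneck L belong to the subgraph.
            if w <= L and d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(heap, (dist[v], v))
    return math.inf
```

With the points (0, 0), (1, 0), (2, 0), (2, 1) and a query from the first point to the last, L = 1 forces the three-hop path of length 3, while a large L permits the direct edge of length √5. The algorithms in the paper aim to answer such queries far more efficiently than this per-query scan.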

59 citations

Book Chapter
20 May 2010
TL;DR: New branching rules reduce the running times of FPT algorithms for computing a rooted maximum agreement forest and a maximum acyclic agreement forest of a pair of phylogenetic trees from O(3^k n) and O(3^k n log n) to O(2.42^k n) and O(2.42^k n log n), respectively.
Abstract: We improve on earlier FPT algorithms for computing a rooted maximum agreement forest (MAF) or a maximum acyclic agreement forest (MAAF) of a pair of phylogenetic trees. Their sizes give the subtree-prune-and-regraft (SPR) distance and the hybridization number of the trees, respectively. We introduce new branching rules that reduce the running time of the algorithms from O(3^k n) and O(3^k n log n) to O(2.42^k n) and O(2.42^k n log n), respectively. In practice, the speedup may be much more than predicted by the worst-case analysis. We confirm this intuition experimentally by computing MAFs for simulated trees and trees inferred from protein sequence data. We show that our algorithm is orders of magnitude faster and can handle much larger trees and SPR distances than the best previous methods, treeSAT and sprdist.
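
To get a feel for what lowering the exponential base from 3 to 2.42 means, the sketch below compares the two worst-case bounds for a given parameter k. The polynomial factors are the same in the old and new bounds, so only the ratio of the exponential terms is computed. The function names are hypothetical, and the numbers describe worst-case bounds only, not measured running times.

```python
def search_tree_bound(base, k):
    """Exponential part base**k of a bounded-search-tree running time."""
    return base ** k


def bound_ratio(k, old_base=3.0, new_base=2.42):
    """How many times smaller the new worst-case bound is than the old one.

    The bases 3 and 2.42 come from the abstract; everything else here is
    an illustrative assumption.
    """
    return search_tree_bound(old_base, k) / search_tree_bound(new_base, k)
```

Calling `bound_ratio(k)` for growing k shows the gap widening geometrically, by a factor of 3/2.42 per unit of k. The abstract notes that the speedup observed in practice may be much larger than this worst-case analysis predicts.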

59 citations

Journal Article
01 Mar 2003
TL;DR: A data structure is constructed for answering shortest path queries on planar graphs that uses O(N^(3/2)/B) blocks of external memory and allows a shortest path query to be answered in O((√N + K)/(DB)) I/Os, where K is the number of vertices on the reported path, B is the block size, and D is the number of parallel disks.
Abstract: We present results related to satisfying shortest path queries on a planar graph stored in external memory. Let N denote the number of vertices in the graph and sort(N) denote the number of input/output (I/O) operations required to sort an array of length N: (1) We describe a blocking for rooted trees to support bottom-up traversals of these trees in O(K/B) I/Os, where K is the length of the traversed path. The space required to store the tree is O(N/B) blocks, where N is the number of vertices of the tree and B is the block size. (2) We give an algorithm for computing a 2/3-separator of size O(√N) for a given embedded planar graph. Our algorithm takes O(sort(N)) I/Os, provided that a breadth-first spanning tree is given. (3) We give an algorithm for triangulating embedded planar graphs in O(sort(N)) I/Os. We use these results to construct a data structure for answering shortest path queries on planar graphs. The data structure uses O(N^(3/2)/B) blocks of external memory and allows for a shortest path query to be answered in O((√N + K)/(DB)) I/Os, where K is the number of vertices on the reported path and D is the number of parallel disks.

58 citations


Cited by
Report
01 May 2014
TL;DR: This work presents GraphChi, a disk-based system for computing efficiently on graphs with billions of edges, and builds on its Parallel Sliding Windows algorithm to propose a new data structure, Partitioned Adjacency Lists, which is used to design an online graph database, GraphChi-DB.
Abstract: Current systems for graph computation require a distributed computing cluster to handle very large real-world problems, such as analysis on social networks or the web graph. While distributed computational resources have become more accessible, developing distributed graph algorithms still remains challenging, especially to non-experts. In this work, we present GraphChi, a disk-based system for computing efficiently on graphs with billions of edges. By using a well-known method to break large graphs into small parts, and a novel Parallel Sliding Windows algorithm, GraphChi is able to execute several advanced data mining, graph mining and machine learning algorithms on very large graphs, using just a single consumer-level computer. We show, through experiments and theoretical analysis, that GraphChi performs well on both SSDs and rotational hard drives. We build on the basis of Parallel Sliding Windows to propose a new data structure Partitioned Adjacency Lists, which we use to design an online graph database GraphChi-DB. We demonstrate that, on a single PC, GraphChi-DB can process over one hundred thousand graph updates per second, while simultaneously performing computation. GraphChi-DB compares favorably to existing graph databases, particularly on data that is much larger than the available memory. We evaluate our work both experimentally and theoretically. Based on the Parallel Sliding Windows algorithm, we propose new I/O efficient algorithms for solving fundamental graph problems. We also propose a novel algorithm for simulating billions of random walks in parallel on a single computer. By repeating experiments reported for existing distributed systems we show that with only a fraction of the resources, GraphChi can solve the same problems in a very reasonable time. Our work makes large-scale graph computation available to anyone with a modern PC.

907 citations

Book Chapter
Eric V. Denardo
01 Jan 2011
TL;DR: This chapter shows how the simplex method simplifies when it is applied to a class of optimization problems known as “network flow models,” and that when such a model has integer-valued data, the simplex method finds an optimal solution that is integer-valued.
Abstract: In this chapter, you will see how the simplex method simplifies when it is applied to a class of optimization problems that are known as “network flow models.” You will also see that if a network flow model has “integer-valued data,” the simplex method finds an optimal solution that is integer-valued.

828 citations

Journal Article
TL;DR: This article surveys the state of the art in the design and analysis of external memory algorithms and data structures, where the goal is to exploit locality in order to reduce the I/O costs.
Abstract: Data sets in large applications are often too massive to fit completely inside the computer's internal memory. The resulting input/output communication (or I/O) between fast internal memory and slower external memory (such as disks) can be a major performance bottleneck. In this article we survey the state of the art in the design and analysis of external memory (or EM) algorithms and data structures, where the goal is to exploit locality in order to reduce the I/O costs. We consider a variety of EM paradigms for solving batched and online problems efficiently in external memory. For the batched problem of sorting and related problems such as permuting and fast Fourier transform, the key paradigms include distribution and merging. The paradigm of disk striping offers an elegant way to use multiple disks in parallel. For sorting, however, disk striping can be nonoptimal with respect to I/O, so to gain further improvements we discuss distribution and merging techniques for using the disks independently. We also consider useful techniques for batched EM problems involving matrices (such as matrix multiplication and transposition), geometric data (such as finding intersections and constructing convex hulls), and graphs (such as list ranking, connected components, topological sorting, and shortest paths). In the online domain, canonical EM applications include dictionary lookup and range searching. The two important classes of indexed data structures are based upon extendible hashing and B-trees. The paradigms of filtering and bootstrapping provide a convenient means in online data structures to make effective use of the data accessed from disk. We also reexamine some of the above EM problems in slightly different settings, such as when the data items are moving, when the data items are variable-length (e.g., text strings), or when the allocated amount of internal memory can change dynamically.
Programming tools and environments are available for simplifying the EM programming task. During the course of the survey, we report on some experiments in the domain of spatial databases using the TPIE system (transparent parallel I/O programming environment). The newly developed EM algorithms and data structures that incorporate the paradigms we discuss are significantly faster than methods currently used in practice.

751 citations

Journal Article
TL;DR: Analysis of the genomes of 30 Lachnospiraceae isolates demonstrates that adaptation to an ecological niche and acquisition of defining functional roles within a microbiome can arise through a combination of both habitat-specific gene loss and LGT.
Abstract: Several bacterial families are known to be highly abundant within the human microbiome, but their ecological roles and evolutionary histories have yet to be investigated in depth. One such family, Lachnospiraceae (phylum Firmicutes, class Clostridia) is abundant in the digestive tracts of many mammals and relatively rare elsewhere. Members of this family have been linked to obesity and protection from colon cancer in humans, mainly due to the association of many species within the group with the production of butyric acid, a substance that is important for both microbial and host epithelial cell growth. We examined the genomes of 30 Lachnospiraceae isolates to better understand the origin of butyric acid capabilities and other ecological adaptations within this group. Butyric acid production-related genes were detected in fewer than half of the examined genomes with the distribution of this function likely arising in part from lateral gene transfer (LGT). An investigation of environment-specific functional signatures indicated that human gut-associated Lachnospiraceae possess genes for endospore formation, whereas other members of this family lack key sporulation-associated genes, an observation supported by analysis of metagenomes from the human gut, oral cavity, and bovine rumen. Our analysis demonstrates that adaptation to an ecological niche and acquisition of defining functional roles within a microbiome can arise through a combination of both habitat-specific gene loss and LGT.

542 citations