Author

Mareike Fischer

Bio: Mareike Fischer is an academic researcher from the University of Greifswald. The author has contributed to research on the topics of phylogenetic trees and maximum parsimony. The author has an h-index of 11 and has co-authored 88 publications receiving 404 citations. Previous affiliations of Mareike Fischer include the University of Veterinary Medicine Vienna and the University of Canterbury.


Papers
Journal Article
TL;DR: This article proposes the Maximum Parsimony (MP) distance between two phylogenetic trees, shows that it is a metric and a lower bound to the well-known Subtree Prune and Regraft (SPR) distance, shows that computing it requires considering only characters that are convex on one of the trees, and proves several additional structural properties of the distance.
Abstract: Within the field of phylogenetics there is great interest in distance measures to quantify the dissimilarity of two trees. Here, based on an idea of Bruen and Bryant, we propose and analyze a new distance measure: the Maximum Parsimony (MP) distance. This is based on the difference of the parsimony scores of a single character on both trees under consideration, and the goal is to find the character which maximizes this difference. In this article we show that this new distance is a metric and provides a lower bound to the well-known Subtree Prune and Regraft (SPR) distance. We also show that to compute the MP distance it is sufficient to consider only characters that are convex on one of the trees, and prove several additional structural properties of the distance. On the complexity side, we prove that calculating the MP distance is in general NP-hard, and identify an interesting island of tractability in which the distance can be calculated in polynomial time.
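The definition in this abstract (the largest difference in parsimony score that any single character attains between the two trees) can be illustrated with a small brute-force sketch. It is not the authors' method: it assumes rooted binary trees written as nested 2-tuples with string leaf labels, scores each character with Fitch's algorithm (which is not named in the abstract), and enumerates every character, so it is only usable on very small trees. The names `fitch_score`, `leaves` and `mp_distance_bruteforce` are hypothetical.

```python
from itertools import product


def fitch_score(tree, character):
    """Parsimony score of `character` (dict: leaf -> state) on a rooted
    binary tree given as nested 2-tuples, computed with Fitch's algorithm."""
    def visit(node):
        if not isinstance(node, tuple):          # leaf: its own state, no change
            return {character[node]}, 0
        (left_set, left_cost) = visit(node[0])
        (right_set, right_cost) = visit(node[1])
        common = left_set & right_set
        if common:                               # children agree: no extra change
            return common, left_cost + right_cost
        return left_set | right_set, left_cost + right_cost + 1

    return visit(tree)[1]


def leaves(tree):
    """List the leaf labels of a nested-tuple tree."""
    if not isinstance(tree, tuple):
        return [tree]
    return leaves(tree[0]) + leaves(tree[1])


def mp_distance_bruteforce(tree1, tree2):
    """Maximum over all characters of |score on tree1 - score on tree2|.
    Exponential in the number of taxa; for illustration only."""
    taxa = sorted(leaves(tree1))
    if taxa != sorted(leaves(tree2)):
        raise ValueError("both trees must have the same leaf set")
    best = 0
    # A character never needs more states than there are taxa.
    for states in product(range(len(taxa)), repeat=len(taxa)):
        character = dict(zip(taxa, states))
        diff = abs(fitch_score(tree1, character) - fitch_score(tree2, character))
        best = max(best, diff)
    return best


t1 = (("a", "b"), ("c", "d"))
t2 = (("a", "c"), ("b", "d"))
print(mp_distance_bruteforce(t1, t2))
```

For these two four-taxon trees, the character assigning one state to a and b and another to c and d needs one change on the first tree and two on the second, which is the kind of character the maximisation is looking for.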

31 citations

Journal Article
TL;DR: In this paper, an idealised form of the problem was analyzed, where the terminal edges of a symmetric four-taxon tree are some factor (λ) times the length of the interior edge.

31 citations

Journal Article
TL;DR: Two different definitions of maximum parsimony on networks, “hardwired” and “softwired,” are discussed and the complexity of computing them given a network topology and a character is examined, showing that both the hardwired and the softwired parsimony scores can be computed efficiently using integer linear programming.
Abstract: Phylogenetic networks are used to display the relationship among different species whose evolution is not treelike, which is the case, for instance, in the presence of hybridization events or horizontal gene transfers. Tree inference methods such as maximum parsimony need to be modified in order to be applicable to networks. In this paper, we discuss two different definitions of maximum parsimony on networks, “hardwired” and “softwired,” and examine the complexity of computing them given a network topology and a character. By exploiting a link with the problem Multiterminal Cut, we show that computing the hardwired parsimony score for 2-state characters is polynomial-time solvable, while for characters with more states this problem becomes NP-hard but is still approximable and fixed-parameter tractable in the parsimony score. On the other hand we show that, for the softwired definition, obtaining even weak approximation guarantees is already difficult for binary characters and restricted network topologies, and fixed-parameter tractable algorithms in the parsimony score are unlikely. On the positive side we show that computing the softwired parsimony score is fixed-parameter tractable in the level of the network, a natural parameter describing how tangled reticulate activity is in the network. Finally, we show that both the hardwired and the softwired parsimony scores can be computed efficiently using integer linear programming. The software has been made freely available.
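The abstract does not spell out the two definitions. As a hedged illustration of the hardwired one, the sketch below assumes the usual reading: the hardwired score is the minimum, over all assignments of states to the internal nodes, of the number of network edges whose endpoints carry different states (which is what links it to Multiterminal Cut). It is a brute-force enumeration for tiny networks, not the integer linear program or any algorithm from the paper, and the function name and example network are hypothetical.

```python
from itertools import product


def hardwired_score(edges, leaf_states, states):
    """Brute-force hardwired parsimony score (assumed definition: minimum
    number of edges with differing endpoint states over all labellings of
    the internal nodes). `edges` is a list of (parent, child) pairs."""
    nodes = sorted({v for edge in edges for v in edge})
    internal = [v for v in nodes if v not in leaf_states]
    best = None
    for assignment in product(states, repeat=len(internal)):
        labelling = dict(leaf_states)
        labelling.update(zip(internal, assignment))
        cost = sum(1 for u, v in edges if labelling[u] != labelling[v])
        if best is None or cost < best:
            best = cost
    return best


# Hypothetical network: h is a reticulation node with parents u and v.
network = [("r", "u"), ("r", "v"), ("u", "a"), ("u", "h"),
           ("v", "h"), ("v", "b"), ("h", "c")]
print(hardwired_score(network, {"a": 0, "b": 1, "c": 0}, states=[0, 1]))
```

The enumeration is exponential in the number of internal nodes, so it only serves to make the definition concrete on toy inputs.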

28 citations

Posted Content
TL;DR: This work analyses an idealised form of the problem in which the terminal edges of a symmetric four-taxon tree are some factor (λ) times the length of the interior edge, and determines an order λ² lower bound on the growth rate for the sequence length required to resolve the tree.
Abstract: In evolutionary biology, genetic sequences carry with them a trace of the underlying tree that describes their evolution from a common ancestral sequence. The question of how many sequence sites are required to recover this evolutionary relationship accurately depends on the model of sequence evolution, the substitution rate, divergence times and the method used to infer phylogenetic history. A particularly challenging problem for phylogenetic methods arises when a rapid divergence event occurred in the distant past. We analyse an idealised form of this problem in which the terminal edges of a symmetric four-taxon tree are some factor ($p$) times the length of the interior edge. We determine an order $p^2$ lower bound on the growth rate for the sequence length required to resolve the tree (independent of any particular branch length). We also show that this rate of sequence length growth can be achieved by existing methods (including the simple 'maximum parsimony' method), and compare these order $p^2$ bounds with an order $p$ growth rate for a model that describes low-homoplasy evolution. In the final section, we provide a generic bound on the sequence length requirement for a more general class of Markov processes.

26 citations

Journal Article
TL;DR: In this paper, it was shown that computing the MP distance on two binary phylogenetic trees is NP-hard, even if only two states are available, and a simple Integer Linear Program (ILP) formulation was given that computes the distance exactly for small trees, and for larger trees when only a small number of character states are available.
Abstract: Within the field of phylogenetics there is great interest in distance measures to quantify the dissimilarity of two trees. Recently, a new distance measure has been proposed: the Maximum Parsimony (MP) distance. This is based on the difference of the parsimony scores of a single character on both trees under consideration, and the goal is to find the character which maximizes this difference. Here we show that computation of MP distance on two binary phylogenetic trees is NP-hard. This is a highly nontrivial extension of an earlier NP-hardness proof for two multifurcating phylogenetic trees, and it is particularly relevant given the prominence of binary trees in the phylogenetics literature. As a corollary to the main hardness result we show that computation of MP distance is also hard on binary trees if the number of states available is bounded. In fact, via a different reduction we show that it is hard even if only two states are available. Finally, as a first response to this hardness we give a simple Integer Linear Program (ILP) formulation which is capable of computing the MP distance exactly for small trees (and for larger trees when only a small number of character states are available) and which is used to computationally verify several auxiliary results required by the hardness proofs.

21 citations


Cited by
Journal Article

3,734 citations

Journal Article
TL;DR: This work examines how incomplete lineage sorting, phylogenetic signal of individual loci, and missing data affect the absolute and the relative accuracy of species tree estimation methods and shows how these properties affect methods' responses to gene filtering strategies.
Abstract: With the increasing availability of whole genome data, many species trees are being constructed from hundreds to thousands of loci. Although concatenation analysis using maximum likelihood is a standard approach for estimating species trees, it does not account for gene tree heterogeneity, which can occur due to many biological processes, such as incomplete lineage sorting. Coalescent species tree estimation methods, many of which are statistically consistent in the presence of incomplete lineage sorting, include Bayesian methods that coestimate the gene trees and the species tree, summary methods that compute the species tree by combining estimated gene trees, and site-based methods that infer the species tree from site patterns in the alignments of different loci. Due to concerns that poor quality loci will reduce the accuracy of estimated species trees, many recent phylogenomic studies have removed or filtered genes on the basis of phylogenetic signal and/or missing data prior to inferring species trees; little is known about the performance of species tree estimation methods when gene filtering is performed. We examine how incomplete lineage sorting, phylogenetic signal of individual loci, and missing data affect the absolute and the relative accuracy of species tree estimation methods and show how these properties affect methods' responses to gene filtering strategies. In particular, summary methods (ASTRAL-II, ASTRID, and MP-EST), a site-based coalescent method (SVDquartets within PAUP*), and an unpartitioned concatenation analysis using maximum likelihood (RAxML) were evaluated on a heterogeneous collection of simulated multilocus data sets, and the following trends were observed. 
Filtering genes based on gene tree estimation error improved the accuracy of the summary methods when levels of incomplete lineage sorting were low to moderate but did not benefit the summary methods under higher levels of incomplete lineage sorting, unless gene tree estimation error was also extremely high (a model condition with few replicates). Neither SVDquartets nor concatenation analysis using RAxML benefited from filtering genes on the basis of gene tree estimation error. Finally, filtering genes based on missing data was either neutral (i.e., did not impact accuracy) or else reduced the accuracy of all five methods. By providing insight into the consequences of gene filtering, we offer recommendations for estimating species trees in the presence of incomplete lineage sorting and reconcile seemingly conflicting observations made in prior studies regarding the impact of gene filtering.

167 citations

Journal Article
TL;DR: A Monte Carlo approach to estimating the power to resolve internodes, together with a nearly equivalent but faster deterministic calculation, is developed and implemented; the predicted power of resolution for the loci analyzed is consistent with the historic use of the genes in phylogenetics.
Abstract: A principal objective for phylogenetic experimental design is to predict the power of a data set to resolve nodes in a phylogenetic tree. However, proactively assessing the potential for phylogenetic noise compared with signal in a candidate data set has been a formidable challenge. Understanding the impact of collection of additional sequence data to resolve recalcitrant internodes at diverse historical times will facilitate increasingly accurate and cost-effective phylogenetic research. Here, we derive theory based on the fundamental unit of the phylogenetic tree, the quartet, that applies estimates of the state space and the rates of evolution of characters in a data set to predict phylogenetic signal and phylogenetic noise and therefore to predict the power to resolve internodes. We develop and implement a Monte Carlo approach to estimating power to resolve as well as deriving a nearly equivalent faster deterministic calculation. These approaches are applied to describe the distribution of potential signal, polytomy, or noise for two example data sets, one recent (cytochrome c oxidase I and 28S ribosomal rRNA sequences from Diplazontinae parasitoid wasps) and one deep (eight nuclear genes and a phylogenomic sequence for diverse microbial eukaryotes including Stramenopiles, Alveolata, and Rhizaria). The predicted power of resolution for the loci analyzed is consistent with the historic use of the genes in phylogenetics.

128 citations

Proceedings Article
19 Jun 2016
TL;DR: A power 2.5 separation between bounded-error randomized and quantum query complexity for a total Boolean function is shown, refuting the widely believed conjecture that the best such separation could only be quadratic (from Grover's algorithm).
Abstract: We show a power 2.5 separation between bounded-error randomized and quantum query complexity for a total Boolean function, refuting the widely believed conjecture that the best such separation could only be quadratic (from Grover's algorithm). We also present a total function with a power 4 separation between quantum query complexity and approximate polynomial degree, showing severe limitations on the power of the polynomial method. Finally, we exhibit a total function with a quadratic gap between quantum query complexity and certificate complexity, which is optimal (up to log factors). These separations are shown using a new, general technique that we call the cheat sheet technique, which builds upon the techniques of Ambainis et al. [STOC 2016]. The technique is based on a generic transformation that converts any (possibly partial) function into a new total function with desirable properties for showing separations. The framework also allows many known separations, including some recent breakthrough results of Ambainis et al. [STOC 2016], to be shown in a unified manner.

91 citations