Book Chapter

A Linear-Time Majority Tree Algorithm

TL;DR: A randomized linear-time algorithm is given for computing the majority rule consensus tree, which is widely used for summarizing a set of phylogenetic trees, usually as a post-processing step in constructing a phylogeny.
Abstract: We give a randomized linear-time algorithm for computing the majority rule consensus tree. The majority rule tree is widely used for summarizing a set of phylogenetic trees, which is usually a post-processing step in constructing a phylogeny. We are implementing the algorithm as part of an interactive visualization system for exploring distributions of trees, where speed is a serious concern for real-time interaction. The linear running time is achieved by using succinct representation of the subtrees and efficient methods for the final tree reconstruction.

Summary

1 Introduction

  • With the recent explosion in the amount of genomic data available, and exponential increases in computing power, biologists are now able to consider larger scale problems in phylogeny: that is, the construction of evolutionary trees on hundreds or thousands of taxa, and ultimately of the entire “Tree of Life” which would include millions of taxa.
  • Large sets of trees arise given any kind of input data on the taxa (e.g. gene sequence, gene order, character) and whatever optimization criterion is used to select the “best” tree.
  • Maximum likelihood estimation, also computationally hard, generally produces trees with unique scores.
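  • The authors' visualization system is designed to support both kinds of projects: analyses with maximum parsimony or maximum likelihood on sets of about 500 taxa, and newer methods aimed at very large phylogenies.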

1.1 Notation

  • Without loss of generality, the authors assume the input trees are rooted at the branch connecting a distinguished taxon s0, known as the outgroup, to the rest of the tree.
  • Consider a node i in an input tree Tj. Removing the branch from i towards the root divides Tj into the subtree below i and the remainder of the tree (including s0).
  • The induced bipartition of the taxa set into two subsets identifies the combinatorial type of node i.
  • The majority rule tree, or Ml tree, includes nodes for exactly those bipartitions which occur in more than half of the input trees, or more generally in more than some fraction l of the input trees.
  • While this example shows binary trees, the algorithm also works for input trees with polytomies (internal nodes of degree greater than three).
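The definition of the majority rule tree can be checked with a deliberately naive sketch. It stores each S(B) as an explicit Python frozenset (so it is far from linear time) and encodes trees as nested tuples rooted at the outgroup branch; the encoding, the function names and the example trees are illustrative assumptions, not the authors' code.

```python
from collections import Counter

def subtree_sets(node, out):
    """Post-order walk: return S(node) and append it to `out`."""
    if isinstance(node, str):                    # leaf: a single taxon
        taxa = frozenset([node])
    else:                                        # internal node: union of its children
        taxa = frozenset().union(*(subtree_sets(child, out) for child in node))
    out.append(taxa)
    return taxa

def majority_bipartitions(trees, outgroup, fraction=0.5):
    """Return every S(B) (outgroup excluded) seen in more than fraction * t trees."""
    counts = Counter()
    for tree in trees:
        found = []
        subtree_sets(tree, found)
        # Each bipartition counts once per tree; sets containing s0 are not S(B).
        counts.update({s for s in found if outgroup not in s})
    return {s for s, c in counts.items() if c > fraction * len(trees)}

# Hypothetical example: three trees rooted at the branch to the outgroup "s0".
T1 = ("s0", ((("s1", "s2"), "s3"), "s4"))
T2 = ("s0", (("s1", "s2"), ("s3", "s4")))
T3 = ("s0", ((("s1", "s3"), "s2"), "s4"))
majority = majority_bipartitions([T1, T2, T3], "s0")
```

Here `majority` holds the four singletons plus {s1, s2}, {s1, s2, s3} and {s1, s2, s3, s4}; {s3, s4} and {s1, s3} occur in only one tree each and are dropped.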

1.2 Prior Work

  • The authors' algorithm follows the same intuitive scheme as most previous algorithms.
  • In the first stage, the authors read through the input trees and count the occurrences of each bipartition, storing the counts in a table.
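  • PHYLIP represents each bipartition as a bit-string of n bits, one per taxon; this requires n/w machine words per node (w is the number of bits in a machine word) and accounts for the (n/w) factor in its running-time bound.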
  • If the authors assume that the size of a machine word is O(lg x), so that for instance they can compare two bipartitions in O(1) time, then they say that Day’s algorithm achieves an optimal O(tn) running time.
  • Majority trees are also computed by PAUP [17], using an unknown (to us) algorithm.

2 Majority Rule Tree Algorithm

  • The authors' algorithm has two main stages: scanning the trees to find the majority bipartitions (details in Section 2.1) and then constructing the majority rule tree from these bipartitions (details in Section 2.2).
  • It ends by checking the output tree for errors due to (very unlikely) bad random choices.
  • Figure 3 contains pseudo-code for the algorithm.

2.1 Finding Majority Bipartitions

  • In the first stage of the algorithm, the authors traverse each input tree in post-order, determining each bipartition as they complete the traversal of its subtree.
  • To handle collisions, the authors use a standard strategy called chaining: instead of storing a count at each table address, they store a linked list of counts, one for each bipartition which has hashed to that address.
  • Similarly, if B1 and B2 are bipartitions corresponding to leaves, the authors can detect the double collision immediately by checking that the two taxa match before incrementing the count.
  • A similar statement of course holds for h2, and when B has more than two children.
  • The authors can use this fact to compute the hash code recursively during the postorder traversal.
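A minimal sketch of this first pass, assuming (as the bullets above suggest, but as a hypothetical choice made here) that h1(B) and h2(B) are sums of random per-taxon values modulo m1 and m2, so a node's hash codes are the sums of its children's codes. The record layout in each chain (h2, cardinality, leaf taxon, count) and all names are illustrative, not the authors' data structure.

```python
import random

class DoubleCollision(Exception):
    """Two different bipartitions matched on both h1 and h2 (caught early)."""

def count_bipartitions(trees, taxa, outgroup, m1, m2, seed=None):
    """First pass: hash every S(B) during a post-order walk and count it.

    Hypothetical hash: h(B) = sum of random per-taxon values modulo m, so a
    node's code is the sum of its children's codes, for any number of children.
    """
    rng = random.Random(seed)
    a1 = {x: rng.randrange(m1) for x in taxa}
    a2 = {x: rng.randrange(m2) for x in taxa}
    table = [[] for _ in range(m1)]   # chaining: a list of records per h1 address

    def record(h1, h2, size, taxon):
        for rec in table[h1]:
            if rec["h2"] == h2:
                # Same (h1, h2): the same bipartition unless size or leaf taxon differ.
                if rec["size"] != size or rec["taxon"] != taxon:
                    raise DoubleCollision((h1, h2))
                rec["count"] += 1
                return
        table[h1].append({"h2": h2, "size": size, "taxon": taxon, "count": 1})

    def visit(node):
        """Return (h1, h2, cardinality, contains_outgroup) for S(node)."""
        if isinstance(node, str):
            h1, h2, size, has_out, taxon = a1[node], a2[node], 1, node == outgroup, node
        else:
            h1 = h2 = size = 0
            has_out, taxon = False, None
            for child in node:
                c1, c2, cs, co = visit(child)
                h1, h2, size = (h1 + c1) % m1, (h2 + c2) % m2, size + cs
                has_out = has_out or co
        if not has_out:               # only subtrees away from s0 define S(B)
            record(h1, h2, size, taxon)
        return h1, h2, size, has_out

    for tree in trees:
        visit(tree)
    return table, a1, a2

T1 = ("s0", ((("s1", "s2"), "s3"), "s4"))
T2 = ("s0", (("s1", "s2"), ("s3", "s4")))
T3 = ("s0", ((("s1", "s3"), "s2"), "s4"))
table, a1, a2 = count_bipartitions([T1, T2, T3], ["s0", "s1", "s2", "s3", "s4"],
                                   "s0", m1=1009, m2=2**31 - 1, seed=1)
```

Storing the cardinality and, for leaves, the taxon in each record is what lets collisions between leaves or between bipartitions of different sizes be caught while counting; only same-size collisions are left for the final check.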

2.2 Constructing the Majority Tree

  • Once the authors have all the counts in the table they are ready to compute the majority rule consensus tree.
  • The counts identify the majority bipartitions: those that appear in more than lt trees.
  • For any majority bipartition B, its parent in the majority tree is the majority bipartition of smallest cardinality whose taxon set properly contains S(B).
  • When the authors are done, each node B in the output tree, interior or leaf, points to the majority node of smallest cardinality that was its ancestor in any one of the input trees.
  • Assuming there was no double collision, Facts 2, 3, and 4 imply that the output tree is the correct majority rule consensus tree.
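The pointer rule above can be sketched as follows. This version again works on explicit frozensets instead of the paper's succinct representation, so it illustrates the rule rather than the linear bound; it assumes the majority set has already been computed (hard-coded here for hypothetical example trees). The majority node containing every non-outgroup taxon is the root and receives no parent.

```python
def annotate(node):
    """Return (S(node), annotated children) for a nested-tuple tree."""
    if isinstance(node, str):
        return frozenset([node]), []
    kids = [annotate(child) for child in node]
    return frozenset().union(*(kid[0] for kid in kids)), kids

def majority_parents(trees, majority):
    """Point each majority node at its smallest majority ancestor in any input tree."""
    parent = {}

    def descend(ann, nearest):
        taxa, kids = ann
        if taxa in majority:
            if nearest is not None and (taxa not in parent
                                        or len(nearest) < len(parent[taxa])):
                parent[taxa] = nearest
            nearest = taxa            # nearest majority ancestor for this node's subtree
        for kid in kids:
            descend(kid, nearest)

    for tree in trees:
        descend(annotate(tree), None)
    return parent

T1 = ("s0", ((("s1", "s2"), "s3"), "s4"))
T2 = ("s0", (("s1", "s2"), ("s3", "s4")))
T3 = ("s0", ((("s1", "s3"), "s2"), "s4"))
majority = {frozenset(s) for s in ({"s1"}, {"s2"}, {"s3"}, {"s4"}, {"s1", "s2"},
                                   {"s1", "s2", "s3"}, {"s1", "s2", "s3", "s4"})}
parents = majority_parents([T1, T2, T3], majority)
```

Because majority bipartitions are pairwise nested or disjoint, the ancestors of a node are totally ordered by containment, so keeping the smallest candidate seen in any tree yields the true parent.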

2.3 Final Check

  • After constructing the majority rule tree, the authors check it against the hash table in order to detect any occurrence of the final remaining case of a double collision, when two bipartitions B1, B2 of the same cardinality k > 1 have the same value for both h1 and h2.
  • Recall that, if B1, B2 are singletons or have different cardinalities, double collisions would already have been detected when putting the data into the hash table.
  • To check the tree, the authors do a post-order traversal of the completed majority rule tree, recursively computing the cardinality of the bipartition at each node, and checking that these cardinalities match those in the corresponding records in the hash table.
  • Whenever B1 or B2 was encountered during the first stage of the algorithm, the count for their shared hash-table record B was incremented.
  • Notice that since B1, B2 have the same cardinality, one cannot be the ancestor of the other; so the two sets S(B1), S(B2) are disjoint.
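A sketch of the final check, assuming the same hypothetical additive hashes as in the first-pass sketch. Here `stored_size` is a stand-in for the cardinalities kept in the hash-table records, keyed by (h1, h2); the function walks the finished majority tree (the part below the outgroup branch) in post-order and reports whether every recomputed cardinality matches.

```python
def check_majority_tree(tree, a1, a2, m1, m2, stored_size):
    """Return False if any node's recomputed cardinality disagrees with the table.

    A mismatch signals a double collision that was not caught while counting.
    """
    ok = True

    def visit(node):
        nonlocal ok
        if isinstance(node, str):
            h1, h2, size = a1[node] % m1, a2[node] % m2, 1
        else:
            h1 = h2 = size = 0
            for child in node:
                c1, c2, cs = visit(child)
                h1, h2, size = (h1 + c1) % m1, (h2 + c2) % m2, size + cs
        if stored_size.get((h1, h2)) != size:
            ok = False
        return h1, h2, size

    visit(tree)
    return ok
```

The traversal touches each node of the output tree once, which matches the O(n) cost claimed for the check.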

2.4 Analysis Summary

  • The majority rule consensus tree algorithm runs in O(tn) time.
  • It does two traversals of the input set, and every time it visits a node it does a constant number of operations, each of which requires constant expected time (again, assuming that w = O(lg x)).
  • The final check of the majority tree takes O(n) time.
  • The probability that any double collision occurs is at most 1/c, where c is the constant such that m2 > ctn.
  • Thus the probability that the algorithm succeeds on its first try is at least 1 − 1/c, the probability that r attempts will be required decreases exponentially with r, and the expected number of attempts is less than two (for c > 2).
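The expected-attempts bound is a geometric-distribution argument. Writing p for the probability that one attempt succeeds, and assuming each attempt draws fresh random hash functions independently of earlier attempts:

```latex
p \ge 1 - \frac{1}{c}, \qquad
\Pr[\text{exactly } r \text{ attempts}] = (1-p)^{r-1}\, p \le c^{-(r-1)}, \qquad
\mathbb{E}[\text{attempts}] = \sum_{r \ge 1} r\,(1-p)^{r-1}\, p = \frac{1}{p} \le \frac{c}{c-1}.
```

Since c/(c − 1) < 2 exactly when c > 2, any such constant gives fewer than two expected attempts.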

3 Weighted Trees

  • The majority rule tree has an interesting characterization as the median of the set of input trees, which is useful for extending the definition to weighted trees.
  • Now consider the median weight of each bipartition over all input trees, including those that do not contain the bipartition.
  • Note that it is simple, although space-consuming, to compute this median weight for each majority bipartition in O(nt) time.
  • In the second pass through the set of input trees, the authors store the weights for each majority edge in a linked list, as they are encountered.
  • Since there are O(n) majority bipartitions and t trees the number of weights stored is O(nt).
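A small sketch of the median computation over the stored weight lists. It assumes, as an illustration not stated in the summary, that a tree lacking the bipartition contributes weight 0; `weight_lists` is a hypothetical mapping from a majority bipartition to the weights recorded in the second pass. `statistics.median` sorts its input, so this costs O(t log t) per bipartition; a linear-time selection routine would be needed to match the O(nt) bound.

```python
from statistics import median

def median_weights(weight_lists, t):
    """Median edge weight of each majority bipartition over all t input trees.

    weight_lists maps a bipartition key to the weights seen in trees containing
    it; trees without the bipartition are counted as weight 0 (an assumption).
    """
    return {key: median(ws + [0.0] * (t - len(ws)))
            for key, ws in weight_lists.items()}
```

For example, a bipartition with weights [4.0, 6.0] in two of four trees gets median 2.0, the average of the middle values 0.0 and 4.0.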

4 Implementation

  • The authors' majority rule consensus tree algorithm is implemented as part of their treeset visualization system, which in turn is implemented within Mesquite [10].
  • Mesquite is a framework for phylogenetic analysis written by Wayne and David Maddison, available for download at their Web site [10].
  • Mesquite is organized as a system of cooperating modules.
  • The authors' visualization system has been implemented in such a module, TreeSetVisualization, the first published version of which can be downloaded from their webpage [1].
  • The majority tree implementation will be part of the next version of the module.

5 Acknowledgments

  • The first author was also supported by an Alfred P. Sloan Foundation Research Fellowship.
  • The authors thank Jeff Klingner for the tree set visualization module and Wayne and David Maddison for Mesquite, and for encouraging us to consider the majority tree.
  • The second and third authors would like to thank the Department of Computer Sciences and the Center for Computational Biology and Bioinformatics at the University of Texas, and the Computer Science Department at the University of California, Davis for hosting them for several visits during 2002 and 2003.



A Linear-Time Majority Tree Algorithm
Nina Amenta (1), Frederick Clarke (2), and Katherine St. John (2,3)
(1) Computer Science Department, University of California, 2063 Engineering II, One Shields Ave, Davis, CA 95616. amenta@cs.ucdavis.edu
(2) Dept. of Mathematics & Computer Science, Lehman College – City University of New York, Bronx, NY 12581. fclarke72@aol.com, stjohn@lehman.cuny.edu
(3) Department of Computer Science, CUNY Graduate Center, New York, NY 10016
1 Introduction
Making sense of large quantities of data is a fundamental challenge in computational
biology in general and phylogenetics in particular. With the recent
explosion in the amount of genomic data available, and exponential increases in
computing power, biologists are now able to consider larger scale problems in
phylogeny: that is, the construction of evolutionary trees on hundreds or thousands
of taxa, and ultimately of the entire “Tree of Life” which would include
millions of taxa. One difficulty with this program is that most programs used
for phylogeny reconstruction [8,9,17] are based upon heuristics for NP-hard optimization
problems, and instead of producing a single optimal tree they generally
output hundreds or thousands of likely candidates for the optimal tree. The usual
way this large volume of data is summarized is with a consensus tree.
A consensus tree for a set of input trees is a single tree which includes features
on which all or most of the input trees agree. There are several kinds of consensus
trees. The simplest is the strict consensus tree, which includes only nodes that
appear in all of the input trees. A node here is identified by the set of taxa in the
subtree rooted at the node; the roots of two subtrees with different topologies,
but on the same subset of taxa, are considered the same node. For some sets
of input trees, the strict consensus tree works well, but for others, it produces
a tree with very few interior (non-terminal) nodes, since if a node is missing in
even one input tree it is not in the strict consensus.

G. Benson and R. Page (Eds.): WABI 2003, LNBI 2812, pp. 216–227, 2003.
© Springer-Verlag Berlin Heidelberg 2003

Fig. 1. The tree visualization module in Mesquite. The window on the left shows a
projection of the distribution of trees. The user interactively selects subsets of trees
with the mouse, and, in response, the consensus tree of the subset is computed on-the-fly
and displayed in the window on the right. Two selected subsets and their majority
trees are shown.

The majority rule consensus
tree includes all nodes that appear in a majority of input trees, rather than all
of them. The majority rule tree is interesting for a much broader range of inputs
than the strict consensus tree. Other kinds of consensus tree, such as Adams
consensus, are also used (see [3], §6.2, for an excellent overview of consensus
methods). The maximum agreement subtree, which includes a maximal subset
of taxa for which the subtrees induced by the input trees agree, gives meaningful
results in some cases in which the majority rule tree does not, but the best
algorithm has an $O(tn^3 + n^d)$ running time [7] (where d is the maximum outdegree
of the trees), which is not as practical for large trees as the majority rule tree.
Much recent work has been done on the related question of combining trees on
overlapping, but not identical, sets of taxa ([2,13,14,15,16]).
In this paper, we present a randomized algorithm to compute the majority
rule consensus tree, where the expected running time is linear both in the number
t of trees and in the number n of taxa. Earlier algorithms were quadratic in n,
which will be problematic for larger phylogenies. Our O(tn) expected running
time is optimal, since just reading a set of t trees on n taxa requires $\Omega(tn)$
time. The expectation in the running time is over random choices made during
the course of the algorithm, independent of the input; thus, on any input, the
running time is linear with high probability.
We were motivated to find an efficient algorithm for the majority rule tree,
because we wanted to compute it on-the-fly in an interactive visualization application
[1]. The goal of the visualization system is to give the user a more sensitive
description of the distribution of a set of trees than can be presented with a single
consensus tree. Figure 1 shows a screen shot. The window on the left shows
a representation of the distribution of trees, where each point corresponds to a
tree. The user interactively selects subsets of trees and, in response, the consensus
tree of the subset is computed on-the-fly and displayed. This package is built
as a module within Mesquite [10], a framework for phylogenetic computation by
Wayne and David Maddison. See Section 4 for more details.
Our original version of the visualization system computed only strict consensus
trees. We found in our prototype implementation that a simple $O(tn^2)$
algorithm for the strict consensus tree was unacceptably slow for real-time interaction,
and we implemented instead the O(tn) strict consensus algorithm of
Day [6]. This inspired our search for a linear-time majority tree algorithm.
Having an algorithm which is efficient in t is essential, and most earlier algorithms
focus on this. Large sets of trees arise given any kind of input data on
the taxa (e.g. gene sequence, gene order, character) and whatever optimization
criterion is used to select the “best” tree. The heuristic searches used for maximizing
parsimony often return large sets of trees with equal parsimony scores.
Maximum likelihood estimation, also computationally hard, generally produces
trees with unique scores. While technically one of these is the optimal tree, there
are many others for which the likelihood is only negligibly sub-optimal. So, the
output of the computation is again more accurately represented by a consensus
tree.
Handling larger sets of taxa is also becoming increasingly important. Maximum
parsimony and maximum likelihood have been used on sets of about 500
taxa, while researchers are exploring other methods, including genetic algorithms
and super-tree methods, for constructing very large phylogenies, with the ultimate
goal of estimating the entire “Tree of Life”. Our visualization system is
designed to support both kinds of projects. It is also important for the visualization
application to have an algorithm which is efficient when n > t, so that
when a user selects a small subset of trees on many taxa some efficiency can be
realized.
1.1 Notation
Let S represent a set of taxa, with $|S| = n$. Let $\mathcal{T} = \{T_1, T_2, \ldots, T_t\}$ be the
input set of trees, each with n leaves labeled by S, with $|\mathcal{T}| = t$.
Without loss of generality, we assume the input trees are rooted at the branch
connecting a distinguished taxon $s_0$, known as the outgroup, to the rest of the
tree. If $\mathcal{T}$ is given as unrooted trees, or trees rooted arbitrarily, we choose an
arbitrary taxon as $s_0$ and use it to root (or re-root) the trees.
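Re-rooting at the outgroup can be sketched as a walk away from $s_0$. The input format is an assumption for illustration: an unrooted tree given as an adjacency map in which leaves are taxon names and internal nodes carry hypothetical labels (such as "u", "v") that do not appear in the output. The result uses the nested-tuple form (outgroup, subtree).

```python
def root_at_outgroup(adj, outgroup):
    """Root an unrooted tree at the branch joining `outgroup` to the rest.

    adj maps every node (taxa and hypothetical internal labels) to its
    neighbours. Returns (outgroup, subtree) with the subtree as nested tuples.
    """
    (anchor,) = adj[outgroup]          # a leaf has exactly one neighbour

    def build(node, came_from):
        children = [n for n in adj[node] if n != came_from]
        if not children:
            return node                # a leaf: keep its taxon label
        return tuple(build(c, node) for c in children)

    return (outgroup, build(anchor, outgroup))
```

Any taxon can serve as $s_0$; picking a different one changes the rooting but not the unrooted topology.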

A Linear-Time Majority Tree Algorithm 219
s
0
s
1
s
2
s
3
s
4
s
0
s
1
s
2
s
3
s
4
s
0
s
1
s
2
s
3
s
4
s
0
s
1
s
2
s
3
s
4
T
1
T
2
T
3
Majority rule
consensus tree
Fig. 2. Three input trees, rooted at the branch connecting s
0
,andtheirmajoritytree
(for a > 1/2 majority). The input trees need not be binary.
Consider a node i in an input tree $T_j$. Removing the branch from i towards
the root divides $T_j$ into the subtree below i and the remainder of the tree (including
$s_0$). The induced bipartition of the taxa set into two subsets identifies
the combinatorial type of node i. We can represent the bipartition by the subset
of taxa which does not include $s_0$; that is, by the taxa at the leaves of the subtree
rooted at i. If B is the bipartition, this set is S(B). We will say that the
cardinality of B, and of i, is the cardinality of S(B). For example, in Figure 2,
$s_1 s_2 \mid s_0 s_3 s_4 s_5$ is a bipartition of tree $T_1$ and
$S(s_1 s_2 \mid s_0 s_3 s_4 s_5) = \{s_1, s_2\}$. The cardinality of this bipartition is 2.
The majority rule tree, or $M_l$ tree, includes nodes for exactly those bipartitions
which occur in more than half of the input trees, or more generally in
more than some fraction l of the input trees. Margush and McMorris [11] showed
that this set of bipartitions does indeed constitute a tree for any $1/2 < l \le 1$.
McMorris, Meronk and Neumann [12] called this family of trees the $M_l$ trees
(e.g. the $M_1$ tree is the strict consensus tree); we shall call them all generically
majority rule trees, regardless of the size of the majority.
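Why the majority bipartitions always form a tree can be seen from a counting step: two bipartitions that each occur in more than half of the t trees must occur together in at least one tree, and the sets S(B) from a single tree are always nested or disjoint. A family of pairwise nested-or-disjoint sets is exactly what a rooted tree can represent. The checker below is an illustrative sketch of that compatibility condition, not the proof of Margush and McMorris.

```python
from itertools import combinations

def compatible(a, b):
    """Two clusters can sit in one rooted tree iff they are nested or disjoint."""
    return a <= b or b <= a or not (a & b)

def forms_tree(clusters):
    """True when every pair of clusters is compatible."""
    return all(compatible(a, b) for a, b in combinations(clusters, 2))
```

For instance, {s1, s2, s3} and {s3, s4} overlap without nesting, so no single rooted tree contains both, and by the counting step they cannot both be majority bipartitions.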
See Figure 2 for a simple example. While this example shows binary trees, the
algorithm also works for input trees with polytomies (internal nodes of degree
greater than three).
1.2 Prior Work
Our algorithm follows the same intuitive scheme as most previous algorithms.
In the first stage, we read through the input trees and count the occurrences
of each bipartition, storing the counts in a table. Then, in the second stage, we
create nodes for the bipartitions that occur in a majority of input trees - the
majority nodes - and “hook them together” into a tree.
An algorithm along these lines is implemented in PHYLIP [8] by Felsenstein
et al. The overall running time as implemented seems to be $O((n/w)(tn + x \lg x + n^2))$,
where x is the number of bipartitions found (O(tn) in the worst case, but
often O(n)), and w is the number of bits in a machine word. The bipartition B
of each of the tn input nodes is represented as a bit-string: a string of n bits, one
per taxon, with a one for every taxon in S(B) and a zero for every taxon not
in S(B). This requires n/w machine words per node, and accounts for the (n/w)
factor in the bound. The first term is for counting the bipartitions. The x lg x
term is for sorting the bipartitions by the number of times each appears; it could
be eliminated if the code were intended only to compute majority trees. The $n^2$
term is the running time for the subroutine for hooking together the majority
nodes. For each majority node, every other majority node is tested to see if it is
its parent, each in n/w time.
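The bit-string representation and the quadratic hooking step described above can be sketched as follows. This is a reconstruction from the description, not PHYLIP's code: Python integers stand in for arrays of n/w machine words, and the taxon-to-bit `index` map is a hypothetical input.

```python
def to_bits(taxa_subset, index):
    """Encode S(B) as an n-bit integer: bit index[x] is 1 iff taxon x is in S(B)."""
    bits = 0
    for x in taxa_subset:
        bits |= 1 << index[x]
    return bits

def hook_quadratic(majority_bits):
    """For each majority node, test every other one as a potential parent.

    The parent is the smallest proper superset. Each subset test is a bitwise
    AND, which costs O(n/w) word operations in a fixed-width implementation.
    """
    parent = {}
    for b in majority_bits:
        best = None
        for p in majority_bits:
            if p != b and (b & p) == b:                # p is a proper superset of b
                if best is None or bin(p).count("1") < bin(best).count("1"):
                    best = p
        if best is not None:
            parent[b] = best
    return parent
```

With O(n) majority nodes the double loop performs O(n^2) subset tests, which is where the $n^2$ term in the bound comes from.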
For the strict consensus tree, Day’s deterministic algorithm uses a clever
O((lg x)/w) representation for bipartitions. If we assume that the size of a machine
word is O(lg x), so that for instance we can compare two bipartitions in
O(1) time, then we say that Day’s algorithm achieves an optimal O(tn) running
time. Day’s algorithm does not seem to generalize to other $M_l$ trees, however.
Wareham, in his undergraduate thesis at the Memorial University of Newfoundland
with Day [18], developed an $O(n^2 + t^2 n)$ algorithm, which only uses O(n)
space. It uses Day’s data structure to test each bipartition encountered separately
against all of the other input trees. Majority trees are also computed by
PAUP [17], using an unknown (to us) algorithm.
Our algorithm follows the same general scheme, but we introduce a new
representation for each bipartition of size $O((\lg x)/w) = O(1)$, giving an O(tn)
algorithm for the first counting step, and we also give an O(tn) algorithm for
hooking together the majority nodes.
2 Majority Rule Tree Algorithm
Our algorithm has two main stages: scanning the trees to find the majority
bipartitions (details in Section 2.1) and then constructing the majority rule tree
from these bipartitions (details in Section 2.2). It ends by checking the output
tree for errors due to (very unlikely) bad random choices. Figure 3 contains
pseudo-code for the algorithm.
2.1 Finding Majority Bipartitions
In the first stage of the algorithm, we traverse each input tree in post-order,
determining each bipartition as we complete the traversal of its subtree. We count
the number of times each bipartition occurs, storing the counts in a table. With
the record containing the count, we also store the cardinality of the bipartition,
which turns out to be needed as well.
A first thought might be to use the bit-string representation of a bipartition
as an address into the table of counts, but this would be very space-inefficient:
there are at most O(tn) distinct bipartitions, but $2^n$
possible bit-strings. A better
idea, used in our algorithm and in PHYLIP, is to store the counts in a hash-table.


References
Book
01 Jan 1990
TL;DR: The updated new edition of the classic Introduction to Algorithms is intended primarily for use in undergraduate or graduate courses in algorithms or data structures and presents a rich variety of algorithms and covers them in considerable depth while making their design and analysis accessible to all levels of readers.
Abstract: From the Publisher: The updated new edition of the classic Introduction to Algorithms is intended primarily for use in undergraduate or graduate courses in algorithms or data structures. Like the first edition, this text can also be used for self-study by technical professionals since it discusses engineering issues in algorithm design as well as the mathematical aspects. In its new edition, Introduction to Algorithms continues to provide a comprehensive introduction to the modern study of algorithms. The revision has been updated to reflect changes in the years since the book's original publication. New chapters on the role of algorithms in computing and on probabilistic analysis and randomized algorithms have been included. Sections throughout the book have been rewritten for increased clarity, and material has been added wherever a fuller explanation has seemed useful or new information warrants expanded coverage. As in the classic first edition, this new edition of Introduction to Algorithms presents a rich variety of algorithms and covers them in considerable depth while making their design and analysis accessible to all levels of readers. Further, the algorithms are presented in pseudocode to make the book easily accessible to students from all programming language backgrounds. Each chapter presents an algorithm, a design technique, an application area, or a related topic. The chapters are not dependent on one another, so the instructor can organize his or her use of the book in the way that best suits the course's needs. Additionally, the new edition offers a 25% increase over the first edition in the number of problems, giving the book 155 problems and over 900 exercises that reinforce the concepts the students are learning.

21,651 citations

Journal Article
TL;DR: The program MRBAYES performs Bayesian inference of phylogeny using a variant of Markov chain Monte Carlo, and an executable is available at http://brahms.rochester.edu/software.html.
Abstract: Summary: The program MRBAYES performs Bayesian inference of phylogeny using a variant of Markov chain Monte Carlo. Availability: MRBAYES, including the source code, documentation, sample data files, and an executable, is available at http://brahms.biology.rochester.edu/software.html.

20,627 citations



Journal Article

16,851 citations

Frequently Asked Questions (1)
Q1. What contributions have the authors mentioned in the paper "A linear-time majority tree algorithm" ?

Amenta, Clarke, and St. John present a randomized algorithm for computing the majority rule consensus tree whose expected running time is linear both in the number t of trees and in the number n of taxa.