Journal Article

The neighbor-joining method: a new method for reconstructing phylogenetic trees.

01 Jul 1987-Molecular Biology and Evolution (Oxford University Press)-Vol. 4, Iss: 4, pp 406-425
TL;DR: The neighbor-joining method and Sattath and Tversky's method are shown to be generally better than the other methods for reconstructing phylogenetic trees from evolutionary distance data.
Abstract: A new method called the neighbor-joining method is proposed for reconstructing phylogenetic trees from evolutionary distance data. The principle of this method is to find pairs of operational taxonomic units (OTUs [= neighbors]) that minimize the total branch length at each stage of clustering of OTUs starting with a starlike tree. The branch lengths as well as the topology of a parsimonious tree can quickly be obtained by using this method. Using computer simulation, we studied the efficiency of this method in obtaining the correct unrooted tree in comparison with that of five other tree-making methods: the unweighted pair group method of analysis, Farris's method, Sattath and Tversky's method, Li's method, and Tateno et al.'s modified Farris method. The new, neighbor-joining method and Sattath and Tversky's method are shown to be generally better than the other methods.
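
As a rough illustration of the procedure the abstract describes, the following Python sketch repeatedly joins the pair of OTUs that minimizes the total branch length, starting from a star tree. It is not the authors' code: the pair-selection rule is written in the algebraically equivalent Q-criterion form later given by Studier and Keppler (1988), and the function name `neighbor_joining`, the `node0`, `node1`, ... labels for internal nodes, and the edge-list output format are assumptions made for this example.

```python
from itertools import combinations


def neighbor_joining(names, matrix):
    """Build an unrooted tree from a symmetric distance matrix.

    names  : list of OTU labels (must not look like 'node0', 'node1', ...)
    matrix : square list of lists, matrix[a][b] = distance between OTUs a and b
    Returns a list of (internal_node, neighbor, branch_length) edges.
    """
    # Store pairwise distances keyed by unordered pair of labels.
    d = {frozenset((names[a], names[b])): matrix[a][b]
         for a in range(len(names)) for b in range(a + 1, len(names))}
    nodes = list(names)
    edges = []
    next_id = 0

    while len(nodes) > 2:
        n = len(nodes)
        # r[i] = sum of distances from i to every other active node.
        r = {i: sum(d[frozenset((i, k))] for k in nodes if k != i)
             for i in nodes}
        # Pick the pair minimizing Q(i, j) = (n - 2) * d_ij - r_i - r_j,
        # which selects the same pair as minimizing the total branch length.
        i, j = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * d[frozenset(p)] - r[p[0]] - r[p[1]])
        dij = d[frozenset((i, j))]
        # Branch lengths from the new internal node to i and to j.
        li = 0.5 * dij + (r[i] - r[j]) / (2 * (n - 2))
        lj = dij - li
        u = f"node{next_id}"
        next_id += 1
        edges.append((u, i, li))
        edges.append((u, j, lj))
        # Distances from the new node to every remaining node.
        for k in nodes:
            if k not in (i, j):
                d[frozenset((u, k))] = 0.5 * (
                    d[frozenset((i, k))] + d[frozenset((j, k))] - dij)
        nodes = [k for k in nodes if k not in (i, j)] + [u]

    # Two nodes remain: connect them with their current distance.
    a, b = nodes
    edges.append((a, b, d[frozenset((a, b))]))
    return edges


if __name__ == "__main__":
    otus = ["a", "b", "c", "d", "e"]
    dist = [[0, 5, 9, 9, 8],
            [5, 0, 10, 10, 9],
            [9, 10, 0, 8, 7],
            [9, 10, 8, 0, 3],
            [8, 9, 7, 3, 0]]
    for edge in neighbor_joining(otus, dist):
        print(edge)
```

The sketch recomputes all row sums at every stage, which keeps it short at the cost of speed, and ties between equally good pairs are broken by input order.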

Citations
Journal Article
TL;DR: The sensitivity of the commonly used progressive multiple sequence alignment method has been greatly improved and modifications are incorporated into a new program, CLUSTAL W, which is freely available.
Abstract: The sensitivity of the commonly used progressive multiple sequence alignment method has been greatly improved for the alignment of divergent protein sequences. Firstly, individual weights are assigned to each sequence in a partial alignment in order to down-weight near-duplicate sequences and up-weight the most divergent ones. Secondly, amino acid substitution matrices are varied at different alignment stages according to the divergence of the sequences to be aligned. Thirdly, residue-specific gap penalties and locally reduced gap penalties in hydrophilic regions encourage new gaps in potential loop regions rather than regular secondary structure. Fourthly, positions in early alignments where gaps have been opened receive locally reduced gap penalties to encourage the opening up of new gaps at these positions. These modifications are incorporated into a new program, CLUSTAL W, which is freely available.

63,427 citations

Journal Article
TL;DR: The newest addition in MEGA5 is a collection of maximum likelihood (ML) analyses for inferring evolutionary trees, selecting best-fit substitution models, inferring ancestral states and sequences, and estimating evolutionary rates site-by-site.
Abstract: Comparative analysis of molecular sequence data is essential for reconstructing the evolutionary histories of species and inferring the nature and extent of selective forces shaping the evolution of genes and species. Here, we announce the release of Molecular Evolutionary Genetics Analysis version 5 (MEGA5), which is a user-friendly software for mining online databases, building sequence alignments and phylogenetic trees, and using methods of evolutionary bioinformatics in basic biology, biomedicine, and evolution. The newest addition in MEGA5 is a collection of maximum likelihood (ML) analyses for inferring evolutionary trees, selecting best-fit substitution models (nucleotide or amino acid), inferring ancestral states and sequences (along with probabilities), and estimating evolutionary rates site-by-site. In computer simulation analyses, ML tree inference algorithms in MEGA5 compared favorably with other software packages in terms of computational efficiency and the accuracy of the estimates of phylogenetic trees, substitution parameters, and rate variation among sites. The MEGA user interface has now been enhanced to be activity driven to make it easier for the use of both beginners and experienced scientists. This version of MEGA is intended for the Windows platform, and it has been configured for effective use on Mac OS X and Linux desktops. It is available free of charge from http://www.megasoftware.net.

39,110 citations


Cites methods from "The neighbor-joining method: a new ..."

  • ...MEGA5 automatically infers the evolutionary tree by the Neighbor-Joining (NJ) algorithm that uses a matrix of pairwise distances estimated under the Jones–Thornton–Taylor (JTT) model for amino acid sequences or the Tamura and Nei (1993) model for nucleotide sequences (Saitou and Nei 1987; Jones et al. 1992; Tamura and Nei 1993; Tamura et al. 2004)....

    [...]

  • ...The initial tree for the ML search can be supplied by the user (Newick format) or generated automatically by applying NJ and BIONJ algorithms to a matrix of pairwise distances estimated using a maximum composite likelihood approach for nucleotide sequences and a JTT model for amino acid sequences (Saitou and Nei 1987; Jones et al. 1992; Gascuel 1997; Tamura et al. 2004)....

    [...]

Journal Article
TL;DR: MUSCLE is a new computer program for creating multiple alignments of protein sequences that includes fast distance estimation using kmer counting, progressive alignment using a new profile function the authors call the log-expectation score, and refinement using tree-dependent restricted partitioning.
Abstract: We describe MUSCLE, a new computer program for creating multiple alignments of protein sequences. Elements of the algorithm include fast distance estimation using kmer counting, progressive alignment using a new profile function we call the log-expectation score, and refinement using tree-dependent restricted partitioning. The speed and accuracy of MUSCLE are compared with T-Coffee, MAFFT and CLUSTALW on four test sets of reference alignments: BAliBASE, SABmark, SMART and a new benchmark, PREFAB. MUSCLE achieves the highest, or joint highest, rank in accuracy on each of these sets. Without refinement, MUSCLE achieves average accuracy statistically indistinguishable from T-Coffee and MAFFT, and is the fastest of the tested methods for large numbers of sequences, aligning 5000 sequences of average length 350 in 7 min on a current desktop computer. The MUSCLE program, source code and PREFAB test data are freely available at http://www.drive5.com/muscle.
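
The k-mer counting idea mentioned in the abstract can be sketched as follows. This is a simplified illustration and not MUSCLE's actual distance measure: the function names, the default k = 3, and the normalization by the number of k-mers in the shorter sequence are all assumptions chosen for the example.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every overlapping substring of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_distance(x, y, k=3):
    """Alignment-free distance: 1 minus the fraction of shared k-mers.

    Shared k-mers are counted with multiplicity (multiset intersection)
    and normalized by the k-mer count of the shorter sequence.
    """
    denom = min(len(x), len(y)) - k + 1
    if denom <= 0:
        raise ValueError("both sequences must have at least k residues")
    shared = sum((kmer_counts(x, k) & kmer_counts(y, k)).values())
    return 1.0 - shared / denom


if __name__ == "__main__":
    print(kmer_distance("ACDEFG", "ACDXFG"))
```

Because no alignment is computed, a measure of this kind can be evaluated for every pair of sequences cheaply, and the resulting matrix can then be passed to a clustering method.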

37,524 citations


Cites methods from "The neighbor-joining method: a new ..."

  • ...Distance matrices are clustered using UPGMA (11), which we find to give slightly improved results over neighbor-joining (12), despite the expectation that neighbor-joining will give a more reliable estimate of the evolutionary tree....

    [...]

Journal Article
TL;DR: The latest version of the Molecular Evolutionary Genetics Analysis (Mega) software, which contains many sophisticated methods and tools for phylogenomics and phylomedicine, has been optimized for use on 64-bit computing systems for analyzing larger datasets.
Abstract: We present the latest version of the Molecular Evolutionary Genetics Analysis (Mega) software, which contains many sophisticated methods and tools for phylogenomics and phylomedicine. In this major upgrade, Mega has been optimized for use on 64-bit computing systems for analyzing larger datasets. Researchers can now explore and analyze tens of thousands of sequences in Mega. The new version also provides an advanced wizard for building timetrees and includes a new functionality to automatically predict gene duplication events in gene family trees. The 64-bit Mega is made available in two interfaces: graphical and command line. The graphical user interface (GUI) is a native Microsoft Windows application that can also be used on Mac OS X. The command line Mega is available as native applications for Windows, Linux, and Mac OS X. They are intended for use in high-throughput and scripted analysis. Both versions are available from www.megasoftware.net free of charge.

33,048 citations


Cites methods from "The neighbor-joining method: a new ..."

  • ...For the Neighbor-Joining (NJ) method (Saitou and Nei 1987), memory usage increased at a polynomial rate as the number of sequences was increased....

    [...]

Journal Article
TL;DR: Version 4 of MEGA software expands on the existing facilities for editing DNA sequence data from autosequencers, mining Web-databases, performing automatic and manual sequence alignment, analyzing sequence alignments to estimate evolutionary distances, inferring phylogenetic trees, and testing evolutionary hypotheses.
Abstract: We announce the release of the fourth version of MEGA software, which expands on the existing facilities for editing DNA sequence data from autosequencers, mining Web-databases, performing automatic and manual sequence alignment, analyzing sequence alignments to estimate evolutionary distances, inferring phylogenetic trees, and testing evolutionary hypotheses. Version 4 includes a unique facility to generate captions, written in figure legend format, in order to provide natural language descriptions of the models and methods used in the analyses. This facility aims to promote a better understanding of the underlying assumptions used in analyses, and of the results generated. Another new feature is the Maximum Composite Likelihood (MCL) method for estimating evolutionary distances between all pairs of sequences simultaneously, with and without incorporating rate variation among sites and substitution pattern heterogeneities among lineages. This MCL method also can be used to estimate transition/transversion bias and nucleotide substitution pattern without knowledge of the phylogenetic tree. This new version is a native 32-bit Windows application with multi-threading and multi-user supports, and it is also available to run in a Linux desktop environment (via the Wine compatibility layer) and on Intel-based Macintosh computers under the Parallels program. The current version of MEGA is available free of charge at (http://www.megasoftware.net).

29,021 citations


Cites methods from "The neighbor-joining method: a new ..."

  • ...the Neighbor-Joining method (Saitou and Nei 1987), as the use of the MCL distances leads to a much higher accuracy (Tamura, Nei, and Kumar 2004)....

    [...]

References
Journal Article
TL;DR: A method of generating all such minimum mutation fits is described, which is the assignment which permits representation of the data in a minimum number of symbols, which seems compelling in its own right.
Abstract: Summary: A number of objects, such as species, lie at the ends of a known evolutionary tree. A variable taking a finite number of possible values is specified on this set of objects. How can the values of the variable be estimated for the ancestors of the objects? One way is to assign to the ancestors those values which have the minimum number of mutations (or changes) in going from ancestors to their immediate descendants. In this paper, a method of generating all such minimum mutation fits is described. An evolutionary model for a set of objects is a family tree of possibly hypothetical ancestors through which each object may be traced back to the same primordial ancestor. Evolutionary models are used in the classification of plant and animal life, languages, motor cars, cultures, religions. The construction of the family tree is a difficult problem requiring synthesis of many types of knowledge. Suppose that the family tree is given, and that a variable V (such as number of limbs, for animals) is given for the set of objects (such as species, or families) at the ends of the tree. What values will V take for the hypothetical ancestors? A complete answer to this question is a probability distribution over the set of all possible values that the ancestors might take. A more modest answer is to assign values of V to the ancestors in such a way that the minimum number of changes in V occur, between ancestors and their immediate descendants. This "minimum mutation" fit is most likely under some reasonable probability models, but seems compelling in its own right. It is the assignment which permits representation of the data in a minimum number of symbols. Camin and Sokal [1965] consider the problem of finding an evolutionary tree when each variable has an ordered set of values, and mutation can only take place from a lower to a higher value. Estabrook [1968] extends this structure on the values of the variable to be a partial order with tree structure: for each variable, an evolutionary tree is known connecting the values. In both of these formulations, the minimum mutation fit to a given tree is not a serious problem. The optimal value for an ancestor is always the most primitive value in its descendants. Cavalli-Sforza and Edwards [1967] consider minimum mutation fits [...]
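
To make the "minimum mutation" idea concrete, here is a small Python sketch that counts the fewest changes needed on a given tree. It is a simplified Fitch-style bottom-up pass restricted to rooted binary trees, not the procedure of this paper, which generates all minimum mutation fits. The nested-tuple tree encoding and the function name `min_mutations` are assumptions made for the example.

```python
def min_mutations(tree, states):
    """Return (candidate_values, min_changes) for the subtree `tree`.

    tree   : a leaf name (str) or a 2-tuple (left_subtree, right_subtree)
    states : dict mapping each leaf name to its observed value
    candidate_values is the set of values the subtree's root can take
    in some assignment achieving min_changes within that subtree.
    """
    if isinstance(tree, str):
        # A leaf is fixed at its observed value and needs no change.
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = min_mutations(left, states)
    right_set, right_cost = min_mutations(right, states)
    common = left_set & right_set
    if common:
        # Children can agree on a value: no extra change at this node.
        return common, left_cost + right_cost
    # Children disagree: one change is unavoidable on a child branch.
    return left_set | right_set, left_cost + right_cost + 1


if __name__ == "__main__":
    tree = (("w", "x"), ("y", "z"))
    values = {"w": 4, "x": 4, "y": 2, "z": 2}
    print(min_mutations(tree, values))
```

The returned set lists the values available at the root without exceeding the minimum count. Recovering every optimal assignment for the internal nodes would need an additional top-down pass, which is omitted here.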

253 citations


"The neighbor-joining method: a new ..." refers methods in this paper

  • ...However, since the algorithm turns out to be very similar to that of Hartigan (1973), we shall not present it here....

    [...]

Journal Article
TL;DR: The probability of obtaining the correct tree (topology) from nucleotide sequence data is evaluated using models of evolutionary trees that are close to the tree of mitochondrial DNAs from human, chimpanzee, gorilla, orangutan, and gibbon.
Abstract: A mathematical theory for computing the probabilities of various nucleotide configurations among related species is developed, and the probability of obtaining the correct tree (topology) from nucleotide sequence data is evaluated using models of evolutionary trees that are close to the tree of mitochondrial DNAs from human, chimpanzee, gorilla, orangutan, and gibbon. Special attention is given to the number of nucleotides required to resolve the branching order among the three most closely related organisms (human, chimpanzee, and gorilla). If the extent of DNA divergence is close to that obtained by Brown et al. for mitochondrial DNA and if sequence data are available only for the three most closely related organisms, the number of nucleotides (m*) required to obtain the correct tree with a probability of 95% is about 4700. If sequence data for two outgroup species (orangutan and gibbon) are available, m* becomes about 2600–2700 when the transformed distance, distance-Wagner, maximum parsimony, or compatibility method is used. In the unweighted pair-group method, m* is not affected by the availability of data from outgroup species. When these five different tree-making methods, as well as Fitch and Margoliash's method, are applied to the mitochondrial DNA data (1834 bp) obtained by Brown et al. and by Hixson and Brown, they all give the same phylogenetic tree, in which human and chimpanzee are most closely related. However, the trees considered here are “gene trees,” and to obtain the correct “species tree,” sequence data for several independent loci must be used.

201 citations

Journal Article
TL;DR: The present method appears to be preferable to the UPG method for analysis of data from populations that have not differentiated much and an application of the present method to gene frequency data from some Amerindian populations gives a tree topology far more reasonable than that obtained by the UPG method.
Abstract: A simple method is proposed for constructing phylogenetic trees from distance matrices. The procedure for constructing tree topologies is similar to that of the unweighted pair-group method (UPG method) but makes corrections for unequal rates of evolution among lineages. The procedure for estimating branch lengths is the same as that of the Fitch and Margoliash method (F-M method) except that it allows no negative branch lengths. The performance of the present procedure for the construction of tree topologies is compared with that of the UPG method, the F-M method, Farris' method, and the modified Farris method by using Tateno's simulation outputs for nucleotide sequence divergence and his results for the performances of the latter four methods [Tateno, Y. (1978) Dissertation (Univ. Texas, Houston, TX)]. In this limited comparison, the present method performs considerably better than the UPG method and the F-M method and about equally well as the last two methods. The present method appears to be preferable to the UPG method for analysis of data from populations that have not differentiated much. Indeed, an application of the present method to gene frequency data from some Amerindian populations gives a tree topology far more reasonable than that obtained by the UPG method.

170 citations

Book Chapter
01 Jan 1977
TL;DR: I shall devote most of my discussion to attempts to elucidate what appear to me to be the most fundamental principles of phenetic taxonomy and to obviate the purely terminological aspects of the debate through an evaluation of both phenetic and non-phenetic taxonomic methods on the basis of these principles.
Abstract: I consider the general subject of phenetic classification to possess two major subdivisions. The first is the matter of definition: what is meant by phenetic classification? The second is the matter of motivation: on what grounds do pheneticists advocate their particular methods for constructing classifications? The question of motivation can be looked at in two ways. First, what principles are invoked by pheneticists in selecting the methods which they advocate; and second, what drawbacks do pheneticists ascribe to the methods of classification proposed by other schools of taxonomy? The definition of phenetic taxonomy is necessarily purely a matter of convention, and I shall therefore consider it only in enough detail to avoid ambiguity. The motivations of phenetic taxonomy are of much greater importance, for they touch on the long-standing debate among taxonomists of the phenetic, phylogenetic, and evolutionary schools concerning the proper basis upon which to select classificatory methods. This debate has been perpetuated at least in part by the tendency of some reviewers (for example, Mayr, 1974; Sokal, 1975) to criticize the principles of other schools of taxonomy on a superficial, terminological level. I shall devote most of my discussion to attempts to elucidate what appear to me to be the most fundamental principles of phenetic taxonomy and to obviate the purely terminological aspects of the debate through an evaluation of both phenetic and non-phenetic taxonomic methods on the basis of these principles.

127 citations


"The neighbor-joining method: a new ..." refers methods in this paper

  • ...Some examples are the distance Wagner (DW) method (Farris 1972), modified Farris (MF) methods (Tateno et al. 1982; Faith 1985), and the neighborliness methods of Sattath and Tversky (ST method; 1977) and Fitch (1981)....

    [...]

  • ...…simulation, we studied the efficiency of this method in obtaining the correct unrooted tree in comparison with that of five other tree-making methods: the unweighted pair group method of analysis, Farris’s method, Sattath and Tversky’s method, Li’s method, and Tateno et al.’s modified Farris…...

    [...]