The neighbor-joining method: a new method for reconstructing phylogenetic trees.

doi:10.1093/OXFORDJOURNALS.MOLBEV.A040454

Journal Article•DOI•

Naruya Saitou¹, Masatoshi Nei¹

University of Texas Health Science Center at Houston¹

01 Jul 1987-Molecular Biology and Evolution (Oxford University Press)-Vol. 4, Iss: 4, pp 406-425

TL;DR: The neighbor-joining method and Sattath and Tversky's method are shown to be generally better than the other methods for reconstructing phylogenetic trees from evolutionary distance data.

Abstract: A new method called the neighbor-joining method is proposed for reconstructing phylogenetic trees from evolutionary distance data. The principle of this method is to find pairs of operational taxonomic units (OTUs [= neighbors]) that minimize the total branch length at each stage of clustering of OTUs starting with a starlike tree. The branch lengths as well as the topology of a parsimonious tree can quickly be obtained by using this method. Using computer simulation, we studied the efficiency of this method in obtaining the correct unrooted tree in comparison with that of five other tree-making methods: the unweighted pair group method of analysis, Farris's method, Sattath and Tversky's method, Li's method, and Tateno et al.'s modified Farris method. The new, neighbor-joining method and Sattath and Tversky's method are shown to be generally better than the other methods.
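
The clustering principle described in the abstract can be sketched in code. The Python below is a minimal illustration under stated assumptions, not the authors' program. It selects each pair with the commonly used Q-criterion, (n − 2)·d(i, j) − r(i) − r(j), which ranks pairs in the same order as the total branch length of the resulting tree, and it uses the common textbook rule for reducing the distance matrix. The function name `neighbor_joining`, the internal labels `node0`, `node1`, ..., and the five-OTU distance matrix are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def neighbor_joining(names, dist):
    """Build an unrooted tree from a symmetric distance matrix.

    names: list of OTU labels.
    dist:  dict of dicts, dist[a][b] = distance between a and b.
    Returns a list of (node, neighbor, branch_length) edges.
    """
    d = {a: dict(dist[a]) for a in names}  # working copy of the matrix
    nodes = list(names)
    edges = []
    next_id = 0  # counter for hypothetical internal node labels

    while len(nodes) > 2:
        n = len(nodes)
        # r[a] is the sum of distances from a to every other active node.
        r = {a: sum(d[a][b] for b in nodes if b != a) for a in nodes}

        # Pick the pair with the smallest Q value; this is the pair whose
        # joining gives the smallest total branch length at this stage.
        i, j = min(
            combinations(nodes, 2),
            key=lambda p: (n - 2) * d[p[0]][p[1]] - r[p[0]] - r[p[1]],
        )

        new = f"node{next_id}"
        next_id += 1

        # Branch lengths from i and j to the new internal node.
        li = 0.5 * d[i][j] + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[i][j] - li
        edges.append((i, new, li))
        edges.append((j, new, lj))

        # Distances from the new node to each remaining node.
        d[new] = {}
        for k in nodes:
            if k not in (i, j):
                d[new][k] = d[k][new] = 0.5 * (d[i][k] + d[j][k] - d[i][j])

        nodes = [k for k in nodes if k not in (i, j)] + [new]

    # Two nodes remain; connect them with their remaining distance.
    a, b = nodes
    edges.append((a, b, d[a][b]))
    return edges


# Hypothetical distances between five OTUs (illustrative values only).
names = ["a", "b", "c", "d", "e"]
pairs = {
    ("a", "b"): 5, ("a", "c"): 9, ("a", "d"): 9, ("a", "e"): 8,
    ("b", "c"): 10, ("b", "d"): 10, ("b", "e"): 9,
    ("c", "d"): 8, ("c", "e"): 7, ("d", "e"): 3,
}
D = {x: {} for x in names}
for (x, y), value in pairs.items():
    D[x][y] = D[y][x] = value

for node, neighbor, length in neighbor_joining(names, D):
    print(node, neighbor, length)
```

Ties between equally good pairs are broken here by iteration order, which is an arbitrary choice of this sketch; a production implementation would also need to handle negative branch lengths and larger inputs more carefully.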

Citations

Journal Article•DOI•

Clustal w: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice

Julie D. Thompson, Desmond G. Higgins, Toby J. Gibson

11 Nov 1994-Nucleic Acids Research

TL;DR: The sensitivity of the commonly used progressive multiple sequence alignment method has been greatly improved and modifications are incorporated into a new program, CLUSTAL W, which is freely available.

Abstract: The sensitivity of the commonly used progressive multiple sequence alignment method has been greatly improved for the alignment of divergent protein sequences. Firstly, individual weights are assigned to each sequence in a partial alignment in order to down-weight near-duplicate sequences and up-weight the most divergent ones. Secondly, amino acid substitution matrices are varied at different alignment stages according to the divergence of the sequences to be aligned. Thirdly, residue-specific gap penalties and locally reduced gap penalties in hydrophilic regions encourage new gaps in potential loop regions rather than regular secondary structure. Fourthly, positions in early alignments where gaps have been opened receive locally reduced gap penalties to encourage the opening up of new gaps at these positions. These modifications are incorporated into a new program, CLUSTAL W which is freely available.

63,427 citations

Journal Article•DOI•

MEGA5: Molecular Evolutionary Genetics Analysis using Maximum Likelihood, Evolutionary Distance, and Maximum Parsimony Methods

Koichiro Tamura¹, Daniel S. Peterson², Nicholas Peterson², Glen Stecher², Masatoshi Nei³, Sudhir Kumar²

Tokyo Metropolitan University¹, Arizona State University², Pennsylvania State University³

01 Oct 2011-Molecular Biology and Evolution

TL;DR: The newest addition in MEGA5 is a collection of maximum likelihood (ML) analyses for inferring evolutionary trees, selecting best-fit substitution models, inferring ancestral states and sequences, and estimating evolutionary rates site-by-site.

Abstract: Comparative analysis of molecular sequence data is essential for reconstructing the evolutionary histories of species and inferring the nature and extent of selective forces shaping the evolution of genes and species. Here, we announce the release of Molecular Evolutionary Genetics Analysis version 5 (MEGA5), which is a user-friendly software for mining online databases, building sequence alignments and phylogenetic trees, and using methods of evolutionary bioinformatics in basic biology, biomedicine, and evolution. The newest addition in MEGA5 is a collection of maximum likelihood (ML) analyses for inferring evolutionary trees, selecting best-fit substitution models (nucleotide or amino acid), inferring ancestral states and sequences (along with probabilities), and estimating evolutionary rates site-by-site. In computer simulation analyses, ML tree inference algorithms in MEGA5 compared favorably with other software packages in terms of computational efficiency and the accuracy of the estimates of phylogenetic trees, substitution parameters, and rate variation among sites. The MEGA user interface has now been enhanced to be activity driven to make it easier for the use of both beginners and experienced scientists. This version of MEGA is intended for the Windows platform, and it has been configured for effective use on Mac OS X and Linux desktops. It is available free of charge from http://www.megasoftware.net.

39,110 citations

Cites methods from "The neighbor-joining method: a new ..."

...MEGA5 automatically infers the evolutionary tree by the Neighbor-Joining (NJ) algorithm that uses a matrix of pairwise distances estimated under the Jones–Thornton–Taylor (JTT) model for amino acid sequences or the Tamura and Nei (1993) model for nucleotide sequences (Saitou and Nei 1987; Jones et al. 1992; Tamura and Nei 1993; Tamura et al. 2004)....
...The initial tree for the ML search can be supplied by the user (Newick format) or generated automatically by applying NJ and BIONJ algorithms to a matrix of pairwise distances estimated using a maximum composite likelihood approach for nucleotide sequences and a JTT model for amino acid sequences (Saitou and Nei 1987; Jones et al. 1992; Gascuel 1997; Tamura et al. 2004)....
[...]

Journal Article•DOI•

MUSCLE: multiple sequence alignment with high accuracy and high throughput

Robert C. Edgar

01 Mar 2004-Nucleic Acids Research

TL;DR: MUSCLE is a new computer program for creating multiple alignments of protein sequences that includes fast distance estimation using kmer counting, progressive alignment using a new profile function the authors call the log-expectation score, and refinement using tree-dependent restricted partitioning.

Abstract: We describe MUSCLE, a new computer program for creating multiple alignments of protein sequences. Elements of the algorithm include fast distance estimation using kmer counting, progressive alignment using a new profile function we call the log-expectation score, and refinement using tree-dependent restricted partitioning. The speed and accuracy of MUSCLE are compared with T-Coffee, MAFFT and CLUSTALW on four test sets of reference alignments: BAliBASE, SABmark, SMART and a new benchmark, PREFAB. MUSCLE achieves the highest, or joint highest, rank in accuracy on each of these sets. Without refinement, MUSCLE achieves average accuracy statistically indistinguishable from T-Coffee and MAFFT, and is the fastest of the tested methods for large numbers of sequences, aligning 5000 sequences of average length 350 in 7 min on a current desktop computer. The MUSCLE program, source code and PREFAB test data are freely available at http://www.drive5.com/muscle.

37,524 citations

Cites methods from "The neighbor-joining method: a new ..."

...Distance matrices are clustered using UPGMA (11), which we find to give slightly improved results over neighbor-joining (12), despite the expectation that neighbor-joining will give a more reliable estimate of the evolutionary tree....
[...]

Journal Article•DOI•

MEGA7: Molecular Evolutionary Genetics Analysis version 7.0 for bigger datasets

Sudhir Kumar¹, Glen Stecher², Koichiro Tamura³

King Abdulaziz University¹, Temple University², Tokyo Metropolitan University³

22 Mar 2016-Molecular Biology and Evolution

TL;DR: The latest version of the Molecular Evolutionary Genetics Analysis (Mega) software, which contains many sophisticated methods and tools for phylogenomics and phylomedicine, has been optimized for use on 64-bit computing systems for analyzing larger datasets.

Abstract: We present the latest version of the Molecular Evolutionary Genetics Analysis (Mega) software, which contains many sophisticated methods and tools for phylogenomics and phylomedicine. In this major upgrade, Mega has been optimized for use on 64-bit computing systems for analyzing larger datasets. Researchers can now explore and analyze tens of thousands of sequences in Mega. The new version also provides an advanced wizard for building timetrees and includes a new functionality to automatically predict gene duplication events in gene family trees. The 64-bit Mega is made available in two interfaces: graphical and command line. The graphical user interface (GUI) is a native Microsoft Windows application that can also be used on Mac OS X. The command line Mega is available as native applications for Windows, Linux, and Mac OS X. They are intended for use in high-throughput and scripted analysis. Both versions are available from www.megasoftware.net free of charge.

33,048 citations

Cites methods from "The neighbor-joining method: a new ..."

...For the Neighbor-Joining (NJ) method (Saitou and Nei 1987), memory usage increased at a polynomial rate as the number of sequences was increased....
[...]

Journal Article•DOI•

MEGA4: Molecular Evolutionary Genetics Analysis (MEGA) Software Version 4.0

Koichiro Tamura¹, Joel T. Dudley¹, Masatoshi Nei², Sudhir Kumar¹

Arizona State University¹, Pennsylvania State University²

01 Aug 2007-Molecular Biology and Evolution

TL;DR: Version 4 of MEGA software expands on the existing facilities for editing DNA sequence data from autosequencers, mining Web-databases, performing automatic and manual sequence alignment, analyzing sequence alignments to estimate evolutionary distances, inferring phylogenetic trees, and testing evolutionary hypotheses.

Abstract: We announce the release of the fourth version of MEGA software, which expands on the existing facilities for editing DNA sequence data from autosequencers, mining Web-databases, performing automatic and manual sequence alignment, analyzing sequence alignments to estimate evolutionary distances, inferring phylogenetic trees, and testing evolutionary hypotheses. Version 4 includes a unique facility to generate captions, written in figure legend format, in order to provide natural language descriptions of the models and methods used in the analyses. This facility aims to promote a better understanding of the underlying assumptions used in analyses, and of the results generated. Another new feature is the Maximum Composite Likelihood (MCL) method for estimating evolutionary distances between all pairs of sequences simultaneously, with and without incorporating rate variation among sites and substitution pattern heterogeneities among lineages. This MCL method also can be used to estimate transition/transversion bias and nucleotide substitution pattern without knowledge of the phylogenetic tree. This new version is a native 32-bit Windows application with multi-threading and multi-user supports, and it is also available to run in a Linux desktop environment (via the Wine compatibility layer) and on Intel-based Macintosh computers under the Parallels program. The current version of MEGA is available free of charge at (http://www.megasoftware.net).

29,021 citations

Cites methods from "The neighbor-joining method: a new ..."

...the Neighbor-Joining method (Saitou and Nei 1987), as the use of the MCL distances leads to a much higher accuracy (Tamura, Nei, and Kumar 2004)....
[...]

References

Book•DOI•

Major Patterns in Vertebrate Evolution

W. Frank Blair, Max K. Hecht, Peter Charles Goody, Bessie M. Hecht

01 Jan 1977-Copeia

377 citations

Journal Article•DOI•

Minimum mutation fits to a given tree

J. A. Hartigan

01 Mar 1973-Biometrics

TL;DR: A method is described for generating all minimum mutation fits to a given tree, that is, all assignments of values to the ancestors that require the minimum number of changes between ancestors and their immediate descendants.

Abstract: A number of objects, such as species, lie at the ends of a known evolutionary tree. A variable taking a finite number of possible values is specified on this set of objects. How can the values of the variable be estimated for the ancestors of the objects? One way is to assign to the ancestors those values which have the minimum number of mutations (or changes) in going from ancestors to their immediate descendants. In this paper, a method of generating all such minimum mutation fits is described. An evolutionary model for a set of objects is a family tree of possibly hypothetical ancestors through which each object may be traced back to the same primordial ancestor. Evolutionary models are used in the classification of plant and animal life, languages, motor cars, cultures, religions. The construction of the family tree is a difficult problem requiring synthesis of many types of knowledge. Suppose that the family tree is given, and that a variable V (such as number of limbs, for animals) is given for the set of objects (such as species, or families) at the ends of the tree. What values will V take for the hypothetical ancestors? A complete answer to this question is a probability distribution over the set of all possible values that the ancestors might take. A more modest answer is to assign values of V to the ancestors in such a way that the minimum number of changes in V occur, between ancestors and their immediate descendants. This "minimum mutation" fit is most likely under some reasonable probability models, but seems compelling in its own right. It is the assignment which permits representation of the data in a minimum number of symbols. Camin and Sokal [1965] consider the problem of finding an evolutionary tree when each variable has an ordered set of values, and mutation can only take place from a lower to a higher value. Estabrook [1968] extends this structure on the values of the variable to be a partial order with tree structure; for each variable, an evolutionary tree is known connecting the values. In both of these formulations, the minimum mutation fit to a given tree is not a serious problem. The optimal value for an ancestor is always the most primitive value in its descendants. Cavalli-Sforza and Edwards [1967] consider minimum mutation fits ...
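
The idea of a minimum mutation fit can be illustrated with a short sketch. The Python below is not Hartigan's method of generating all fits; it is a Fitch-style bottom-up pass, swapped in as a simpler stand-in, that only counts the minimum number of changes for one unordered variable on a strictly binary, rooted tree. The function name `min_mutations`, the nested-tuple tree encoding, and the example states are all hypothetical.

```python
def min_mutations(tree, leaf_state):
    """Count the minimum number of changes of one variable on a given tree.

    tree:       a leaf name (str) or a 2-tuple (left_subtree, right_subtree).
    leaf_state: dict mapping each leaf name to its observed value.
    Returns (candidate_values_at_this_node, minimum_changes_below_it).
    """
    if isinstance(tree, str):
        # A leaf has exactly one possible value and needs no changes.
        return {leaf_state[tree]}, 0

    left, right = tree
    left_values, left_cost = min_mutations(left, leaf_state)
    right_values, right_cost = min_mutations(right, leaf_state)

    shared = left_values & right_values
    if shared:
        # Both children can agree on a value, so no extra change is needed.
        return shared, left_cost + right_cost
    # The children disagree: one change is unavoidable on one of the branches.
    return left_values | right_values, left_cost + right_cost + 1


# Hypothetical tree ((A,B),C),(D,E) with a two-valued variable at the tips.
example_tree = ((("A", "B"), "C"), ("D", "E"))
example_states = {"A": 1, "B": 0, "C": 1, "D": 0, "E": 0}
root_values, changes = min_mutations(example_tree, example_states)
print(root_values, changes)
```

The returned set lists the values the root may take in at least one minimum fit; recovering every complete assignment to all ancestors, as the paper describes, would need an additional top-down pass that this sketch omits.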

253 citations

"The neighbor-joining method: a new ..." refers methods in this paper

...However, since the algorithm turns out to be very similar to that of Hartigan (1973), we shall not present it here....
[...]

Journal Article•DOI•

The number of nucleotides required to determine the branching order of three species, with special reference to the human-chimpanzee-gorilla divergence.

Naruya Saitou¹, Masatoshi Nei¹

University of Texas Health Science Center at Houston¹

01 Jan 1986-Journal of Molecular Evolution

TL;DR: The probability of obtaining the correct tree (topology) from nucleotide sequence data is evaluated using models of evolutionary trees that are close to the tree of mitochondrial DNAs from human, chimpanzee, gorilla, orangutan, and gibbon.

Abstract: A mathematical theory for computing the probabilities of various nucleotide configurations among related species is developed, and the probability of obtaining the correct tree (topology) from nucleotide sequence data is evaluated using models of evolutionary trees that are close to the tree of mitochondrial DNAs from human, chimpanzee, gorilla, orangutan, and gibbon. Special attention is given to the number of nucleotides required to resolve the branching order among the three most closely related organisms (human, chimpanzee, and gorilla). If the extent of DNA divergence is close to that obtained by Brown et al. for mitochondrial DNA and if sequence data are available only for the three most closely related organisms, the number of nucleotides (m*) required to obtain the correct tree with a probability of 95% is about 4700. If sequence data for two outgroup species (orangutan and gibbon) are available, m* becomes about 2600–2700 when the transformed distance, distance-Wagner, maximum parsimony, or compatibility method is used. In the unweighted pair-group method, m* is not affected by the availability of data from outgroup species. When these five different tree-making methods, as well as Fitch and Margoliash's method, are applied to the mitochondrial DNA data (1834 bp) obtained by Brown et al. and by Hixson and Brown, they all give the same phylogenetic tree, in which human and chimpanzee are most closely related. However, the trees considered here are “gene trees,” and to obtain the correct “species tree,” sequence data for several independent loci must be used.

201 citations

Journal Article•DOI•

Simple method for constructing phylogenetic trees from distance matrices.

Wen-Hsiung Li

01 Feb 1981-Proceedings of the National Academy of Sciences of the United States of America

TL;DR: The present method appears to be preferable to the UPG method for analysis of data from populations that have not differentiated much, and an application of the present method to gene frequency data from some Amerindian populations gives a tree topology far more reasonable than that obtained by the UPG method.

Abstract: A simple method is proposed for constructing phylogenetic trees from distance matrices. The procedure for constructing tree topologies is similar to that of the unweighted pair-group method (UPG method) but makes corrections for unequal rates of evolution among lineages. The procedure for estimating branch lengths is the same as that of the Fitch and Margoliash method (F-M method) except that it allows no negative branch lengths. The performance of the present procedure for the construction of tree topologies is compared with that of the UPG method, the F-M method, Farris' method, and the modified Farris method by using Tateno's simulation outputs for nucleotide sequence divergence and his results for the performances of the latter four methods [Tateno, Y. (1978) Dissertation (Univ. Texas, Houston, TX)]. In this limited comparison, the present method performs considerably better than the UPG method and the F-M method and about as well as the last two methods. The present method appears to be preferable to the UPG method for analysis of data from populations that have not differentiated much. Indeed, an application of the present method to gene frequency data from some Amerindian populations gives a tree topology far more reasonable than that obtained by the UPG method.

170 citations

Book Chapter•DOI•

On the Phenetic Approach to Vertebrate Classification

James S. Farris¹

State University of New York System¹

01 Jan 1977

TL;DR: I shall devote most of my discussion to attempts to elucidate what appear to me to be the most fundamental principles of phenetic taxonomy and to obviate the purely terminological aspects of the debate through an evaluation of both phenetic and non-phenetic taxonomic methods on the basis of these principles.

Abstract: I consider the general subject of phenetic classification to possess two major subdivisions. The first is the matter of definition: what is meant by phenetic classification? The second is the matter of motivation: on what grounds do pheneticists advocate their particular methods for constructing classifications? The question of motivation can be looked at in two ways. First, what principles are invoked by pheneticists in selecting the methods which they advocate; and second, what drawbacks do pheneticists ascribe to the methods of classification proposed by other schools of taxonomy? The definition of phenetic taxonomy is necessarily purely a matter of convention, and I shall therefore consider it only in enough detail to avoid ambiguity. The motivations of phenetic taxonomy are of much greater importance, for they touch on the long-standing debate among taxonomists of the phenetic, phylogenetic, and evolutionary schools concerning the proper basis upon which to select classificatory methods. This debate has been perpetuated at least in part by the tendency of some reviewers (for example, Mayr, 1974; Sokal, 1975) to criticize the principles of other schools of taxonomy on a superficial, terminological level. I shall devote most of my discussion to attempts to elucidate what appear to me to be the most fundamental principles of phenetic taxonomy and to obviate the purely terminological aspects of the debate through an evaluation of both phenetic and non-phenetic taxonomic methods on the basis of these principles.

127 citations

"The neighbor-joining method: a new ..." refers methods in this paper

...Some examples are the distance Wagner (DW) method (Farris 1972), modified Farris (MF) methods (Tateno et al. 1982; Faith 1985), and the neighborliness methods of Sattath and Tversky (ST method; 1977) and Fitch (1981)....
[...]
...…simulation, we studied the efficiency of this method in obtaining the correct unrooted tree in comparison with that of five other tree-making methods: the unweighted pair group method of analysis, Farris's method, Sattath and Tversky's method, Li's method, and Tateno et al.'s modified Farris…...
[...]