A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences.

doi:10.1007/BF01731581

Journal Article•DOI•

Motoo Kimura¹•Institutions (1)

National Institute of Genetics¹

01 Dec 1980-Journal of Molecular Evolution (J Mol Evol)-Vol. 16, Iss: 2, pp 111-120

TL;DR: Some examples were worked out using reported globin sequences to show that synonymous substitutions occur at much higher rates than amino acid-altering substitutions in evolution.

Abstract: Some simple formulae were obtained which enable us to estimate evolutionary distances in terms of the number of nucleotide substitutions (and also the evolutionary rates, when the divergence times are known). In comparing a pair of nucleotide sequences, we distinguish two types of differences: if homologous sites are occupied by different nucleotide bases but both are purines or both are pyrimidines, the difference is called type I (or "transition" type), while if one of the two is a purine and the other is a pyrimidine, the difference is called type II (or "transversion" type). Letting P and Q be, respectively, the fractions of nucleotide sites showing type I and type II differences between the two sequences compared, the evolutionary distance per site is $K = -\frac{1}{2}\ln\{(1 - 2P - Q)\sqrt{1 - 2Q}\}$. The evolutionary rate per year is then given by $k = K/(2T)$, where T is the time since the divergence of the two sequences. If only the third codon positions are compared, the synonymous component of the evolutionary base substitutions per site is estimated by $K'_S = -\frac{1}{2}\ln(1 - 2P - Q)$. Also, formulae for standard errors were obtained. Some examples were worked out using reported globin sequences to show that synonymous substitutions occur at much higher rates than amino acid-altering substitutions in evolution.
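
The formulae in the abstract can be illustrated with a short Python sketch. It is not code from the paper. The function names, the choice to skip sites containing gaps or ambiguous symbols, and the example sequences are all assumptions made for illustration. The distance uses the two-parameter form from Kimura (1980), in which the factor sqrt(1 - 2Q) appears inside the logarithm.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def count_differences(seq1, seq2):
    """Return (P, Q): the fractions of compared sites showing type I
    (transition) and type II (transversion) differences.

    Assumption: the sequences are already aligned; sites where either
    sequence has a symbol other than A, C, G, T are skipped.
    """
    if len(seq1) != len(seq2):
        raise ValueError("aligned sequences must have equal length")
    sites = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # gap or ambiguous base: not compared
        sites += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1      # both purines or both pyrimidines
        else:
            transversions += 1    # one purine, one pyrimidine
    if sites == 0:
        raise ValueError("no comparable sites")
    return transitions / sites, transversions / sites


def k2p_distance(P, Q):
    """Evolutionary distance per site, K = -(1/2) ln{(1-2P-Q) sqrt(1-2Q)}."""
    a, b = 1.0 - 2.0 * P - Q, 1.0 - 2.0 * Q
    if a <= 0.0 or b <= 0.0:
        raise ValueError("sequences too divergent: logarithm undefined")
    return -0.5 * math.log(a * math.sqrt(b))


def third_position_synonymous_distance(P, Q):
    """Synonymous component at third codon positions, K'_S = -(1/2) ln(1-2P-Q)."""
    a = 1.0 - 2.0 * P - Q
    if a <= 0.0:
        raise ValueError("sequences too divergent: logarithm undefined")
    return -0.5 * math.log(a)


def rate_per_year(K, T):
    """Evolutionary rate per site per year, k = K / (2T), T = divergence time in years."""
    return K / (2.0 * T)


# Hypothetical toy alignment: one transition (A/G) and one transversion (A/C).
P, Q = count_differences("AAAAAAAAAA", "GAAAACAAAA")
K = k2p_distance(P, Q)
print(P, Q, K, rate_per_year(K, 1e8))
```

The division by 2T reflects that substitutions accumulate along both lineages after the two sequences diverge. The guards on the logarithm arguments are a design choice of this sketch: for highly divergent sequences the estimate is undefined, so the functions raise an error instead of returning a value.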

Citations

Journal Article•DOI•

CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice

Julie D. Thompson, Desmond G. Higgins, Toby J. Gibson

11 Nov 1994-Nucleic Acids Research

TL;DR: The sensitivity of the commonly used progressive multiple sequence alignment method has been greatly improved and modifications are incorporated into a new program, CLUSTAL W, which is freely available.

Abstract: The sensitivity of the commonly used progressive multiple sequence alignment method has been greatly improved for the alignment of divergent protein sequences. Firstly, individual weights are assigned to each sequence in a partial alignment in order to down-weight near-duplicate sequences and up-weight the most divergent ones. Secondly, amino acid substitution matrices are varied at different alignment stages according to the divergence of the sequences to be aligned. Thirdly, residue-specific gap penalties and locally reduced gap penalties in hydrophilic regions encourage new gaps in potential loop regions rather than regular secondary structure. Fourthly, positions in early alignments where gaps have been opened receive locally reduced gap penalties to encourage the opening up of new gaps at these positions. These modifications are incorporated into a new program, CLUSTAL W, which is freely available.

63,427 citations

Journal Article•DOI•

MODELTEST: testing the model of DNA substitution.

David Posada¹, Keith A. Crandall•Institutions (1)

Brigham Young University¹

01 Jan 1998-Bioinformatics

TL;DR: The program MODELTEST uses log likelihood scores to establish the model of DNA evolution that best fits the data.

Abstract: Summary: The program MODELTEST uses log likelihood scores to establish the model of DNA evolution that best fits the data. Availability: The MODELTEST package, including the source code and some documentation is available at http://bioag.byu.edu/zoology/crandall―lab/modeltest.html. Contact: dp47@email.byu.edu.

20,105 citations

Journal Article•DOI•

A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood.

Stéphane Guindon¹, Olivier Gascuel¹•Institutions (1)

Centre national de la recherche scientifique¹

01 Oct 2003-Systematic Biology

TL;DR: This work has used extensive and realistic computer simulations to show that the topological accuracy of this new method is at least as high as that of the existing maximum-likelihood programs and much higher than the performance of distance-based and parsimony approaches.

Abstract: The increase in the number of large data sets and the complexity of current probabilistic sequence evolution models necessitates fast and reliable phylogeny reconstruction methods. We describe a new approach, based on the maximum-likelihood principle, which clearly satisfies these requirements. The core of this method is a simple hill-climbing algorithm that adjusts tree topology and branch lengths simultaneously. This algorithm starts from an initial tree built by a fast distance-based method and modifies this tree to improve its likelihood at each iteration. Due to this simultaneous adjustment of the topology and branch lengths, only a few iterations are sufficient to reach an optimum. We used extensive and realistic computer simulations to show that the topological accuracy of this new method is at least as high as that of the existing maximum-likelihood programs and much higher than the performance of distance-based and parsimony approaches. The reduction of computing time is dramatic in comparison with other maximum-likelihood packages, while the likelihood maximization ability tends to be higher. For example, only 12 min were required on a standard personal computer to analyze a data set consisting of 500 rbcL sequences with 1,428 base pairs from plant plastids, thus reaching a speed of the same order as some popular distance-based and parsimony algorithms. This new method is implemented in the PHYML program, which is freely available on our web page: http://www.lirmm.fr/w3ifa/MAAS/. (Algorithm; computer simulations; maximum likelihood; phylogeny; rbcL; RDPII project.)

The size of homologous sequence data sets has increased dramatically in recent years, and many of these data sets now involve several hundreds of taxa. Moreover, current probabilistic sequence evolution models (Swofford et al., 1996; Page and Holmes, 1998), notably those including rate variation among sites (Uzzell and Corbin, 1971; Jin and Nei, 1990; Yang, 1996), require an increasing number of calculations. Therefore, the speed of phylogeny reconstruction methods is becoming a significant requirement and good compromises between speed and accuracy must be found.

The maximum likelihood (ML) approach is especially accurate for building molecular phylogenies. Felsenstein (1981) brought this framework to nucleotide-based phylogenetic inference, and it was later also applied to amino acid sequences (Kishino et al., 1990). Several variants were proposed, most notably the Bayesian methods (Rannala and Yang 1996; and see below), and the discrete Fourier analysis of Hendy et al. (1994), for example. Numerous computer studies (Huelsenbeck and Hillis, 1993; Kuhner and Felsenstein, 1994; Huelsenbeck, 1995; Rosenberg and Kumar, 2001; Ranwez and Gascuel, 2002) have shown that ML programs can recover the correct tree from simulated data sets more frequently than other methods can. Another important advantage of the ML approach is the ability to compare different trees and evolutionary models within a statistical framework (see Whelan et al., 2001, for a review). However, like all optimality criterion-based phylogenetic reconstruction approaches, ML is hampered by computational difficulties, making it impossible to obtain the optimal tree with certainty from even moderate data sets (Swofford et al., 1996). Therefore, all practical methods rely on heuristics that obtain near-optimal trees in reasonable computing time. Moreover, the computation problem is especially difficult with ML, because the tree likelihood not only depends on the tree topology but also on numerical parameters, including branch lengths. Even computing the optimal values of these parameters on a single tree is not an easy task, particularly because of possible local optima (Chor et al., 2000).

The usual heuristic method, implemented in the popular PHYLIP (Felsenstein, 1993) and PAUP* (Swofford, 1999) packages, is based on hill climbing. It combines stepwise insertion of taxa in a growing tree and topological rearrangement. For each possible insertion position and rearrangement, the branch lengths of the resulting tree are optimized and the tree likelihood is computed. When the rearrangement improves the current tree or when the position insertion is the best among all possible positions, the corresponding tree becomes the new current tree. Simple rearrangements are used during tree growing, namely "nearest neighbor interchanges" (see below), while more intense rearrangements can be used once all taxa have been inserted. The procedure stops when no rearrangement improves the current best tree. Despite significant decreases in computing times, notably in fastDNAml (Olsen et al., 1994), this heuristic becomes impracticable with several hundreds of taxa. This is mainly due to the two-level strategy, which separates branch lengths and tree topology optimization. Indeed, most calculations are done to optimize the branch lengths and evaluate the likelihood of trees that are finally rejected.

New methods have thus been proposed. Strimmer and von Haeseler (1996) and others have assembled four-taxon (quartet) trees inferred by ML, in order to reconstruct a complete tree. However, the results of this approach have not been very satisfactory to date (Ranwez and Gascuel, 2001). Ota and Li (2000, 2001) described

16,261 citations

Cites methods from "A simple method for estimating evol..."

...The current version implements several models of nucleotide sequence evolution: JC69 (Jukes and Cantor, 1969), F81 (Felsenstein, 1981), K2P (Kimura, 1980), F84 (Felsenstein, 1993), HKY (Hasegawa et al., 1985) and TN93 (Tamura and Nei, 1993)....
[...]
...Sequences 500 base pairs (bp) in length were generated from these phylogenies using Seq-Gen (Rambaut and Grassly, 1997) under the Kimura two-parameter (K2P) model (Kimura, 1980), with a transition/transversion ratio of 2.0....
[...]

Journal Article•DOI•

Arlequin (version 3.0): An integrated software package for population genetics data analysis

Laurent Excoffier¹, Guillaume Laval¹, Stefan W. Schneider¹•Institutions (1)

University of Bern¹

01 Jan 2005-Evolutionary Bioinformatics

TL;DR: Arlequin ver 3.0 is a software package integrating several basic and advanced methods for population genetics data analysis, such as the computation of standard genetic diversity indices, the estimation of allele and haplotype frequencies, tests of departure from linkage equilibrium, departure from selective neutrality and demographic equilibrium, estimation of parameters from past population expansions, and thorough analyses of population subdivision under the AMOVA framework.

Abstract: Arlequin ver 3.0 is a software package integrating several basic and advanced methods for population genetics data analysis, such as the computation of standard genetic diversity indices, the estimation of allele and haplotype frequencies, tests of departure from linkage equilibrium, departure from selective neutrality and demographic equilibrium, estimation of parameters from past population expansions, and thorough analyses of population subdivision under the AMOVA framework. Arlequin 3 introduces a completely new graphical interface written in C++, a more robust semantic analysis of input files, and two new methods: a Bayesian estimation of gametic phase from multi-locus genotypes, and an estimation of the parameters of an instantaneous spatial expansion from DNA sequence polymorphism. Arlequin can handle several data types such as DNA sequences, microsatellite data, or standard multi-locus genotypes. A Windows version of the software is freely available at http://cmpg.unibe.ch/software/arlequin3.

14,271 citations

Journal Article•DOI•

MEGA3: Integrated software for Molecular Evolutionary Genetics Analysis and sequence alignment

Sudhir Kumar¹, Koichiro Tamura², Masatoshi Nei³•Institutions (3)

Biodesign Institute¹, Tokyo Metropolitan University², Pennsylvania State University³

01 Jun 2004-Briefings in Bioinformatics

TL;DR: An overview of the statistical methods, computational tools, and visual exploration modules for data input and the results obtainable in MEGA is provided.

Abstract: With its theoretical basis firmly established in molecular evolutionary and population genetics, the comparative DNA and protein sequence analysis plays a central role in reconstructing the evolutionary histories of species and multigene families, estimating rates of molecular evolution, and inferring the nature and extent of selective forces shaping the evolution of genes and genomes. The scope of these investigations has now expanded greatly owing to the development of high-throughput sequencing techniques and novel statistical and computational methods. These methods require easy-to-use computer programs. One such effort has been to produce Molecular Evolutionary Genetics Analysis (MEGA) software, with its focus on facilitating the exploration and analysis of the DNA and protein sequence variation from an evolutionary perspective. Currently in its third major release, MEGA3 contains facilities for automatic and manual sequence alignment, web-based mining of databases, inference of the phylogenetic trees, estimation of evolutionary distances and testing evolutionary hypotheses. This paper provides an overview of the statistical methods, computational tools, and visual exploration modules for data input and the results obtainable in MEGA.

12,124 citations

References

Book Chapter•DOI•

CHAPTER 24 – Evolution of Protein Molecules

Thomas H. Jukes¹•Institutions (1)

University of California, Berkeley¹

01 Jan 1969

10,262 citations

Journal Article•DOI•

Evolutionary Rate at the Molecular Level

Motoo Kimura¹•Institutions (1)

National Institute of Genetics¹

17 Feb 1968-Nature

TL;DR: Calculating the rate of evolution in terms of nucleotide substitutions seems to give a value so high that many of the mutations involved must be neutral ones.

Abstract: Calculating the rate of evolution in terms of nucleotide substitutions seems to give a value so high that many of the mutations involved must be neutral ones.

3,297 citations

Journal Article•DOI•

Non-Darwinian Evolution

Jack Lester King, Thomas H. Jukes

16 May 1969-Science

TL;DR: Non-Darwinian evolution of protein and DNA is discussed, comparing the expectations of evolution models for protein and amino acid changes.

Abstract: Non-Darwinian evolution of protein and DNA, comparing expectations of evolution models for protein and amino acid changes.

1,480 citations

Journal Article•DOI•

Preponderance of synonymous changes as evidence for the neutral theory of molecular evolution

Motoo Kimura¹•Institutions (1)

National Institute of Genetics¹

19 May 1977-Nature

TL;DR: This paper shows that comparative studies of messenger RNA (mRNA) sequences yield reliable estimates of the evolutionary rates (in terms of mutant substitutions) at the third positions of the codon, and that the estimates conform remarkably well with the framework of the neutral theory.

Abstract: According to the neutral mutation–random drift hypothesis of molecular evolution and polymorphism¹,², most mutant substitutions detected through comparative studies of homologous proteins (and the nucleotide sequences) are the results of random fixation of selectively neutral or nearly neutral mutations. This is in sharp contrast to the orthodox neo-Darwinian view that practically all mutant substitutions occurring within species in the course of evolution are caused by positive Darwinian selection³⁻⁵. This paper shows that by comparative studies of messenger RNA (mRNA) sequences reliable estimates can be obtained of the evolutionary rates (in terms of mutant substitutions) at the third positions of the codon, and that the estimates conform remarkably well with the framework of the neutral theory.

585 citations

Journal Article•DOI•

On Some Principles Governing Molecular Evolution

Motoo Kimura, Tomoko Ohta

01 Jul 1974-Proceedings of the National Academy of Sciences of the United States of America

TL;DR: Five principles governing molecular evolution were deduced from the accumulated evidence on molecular evolution and theoretical considerations of the population dynamics of mutant substitutions.

Abstract: The following five principles were deduced from the accumulated evidence on molecular evolution and theoretical considerations of the population dynamics of mutant substitutions: (i) for each protein, the rate of evolution in terms of amino acid substitutions is approximately constant per site per year for various lines, as long as the function and tertiary structure of the molecule remain essentially unaltered; (ii) functionally less important molecules or parts of a molecule evolve (in terms of mutant substitutions) faster than more important ones; (iii) those mutant substitutions that disrupt less the existing structure and function of a molecule (conservative substitutions) occur more frequently in evolution than more disruptive ones; (iv) gene duplication must always precede the emergence of a gene having a new function; (v) selective elimination of definitely deleterious mutants and random fixation of selectively neutral or very slightly deleterious mutants occur far more frequently in evolution than positive Darwinian selection of definitely advantageous mutants.

467 citations