
MEGA7: Molecular Evolutionary Genetics Analysis version 7.0 for bigger datasets

22 Mar 2016-Molecular Biology and Evolution (Oxford University Press)-Vol. 33, Iss: 7, pp 1870-1874


MEGA7: Molecular Evolutionary Genetics Analysis Version 7.0 for Bigger Datasets
Sudhir Kumar,1,2,3 Glen Stecher,1 and Koichiro Tamura*,4,5
1 Institute for Genomics and Evolutionary Medicine, Temple University
2 Department of Biology, Temple University
3 Center for Excellence in Genome Medicine and Research, King Abdulaziz University, Jeddah, Saudi Arabia
4 Research Center for Genomics and Bioinformatics, Tokyo Metropolitan University, Hachioji, Tokyo, Japan
5 Department of Biological Sciences, Tokyo Metropolitan University, Hachioji, Tokyo, Japan
*Corresponding author: E-mail: ktamura@tmu.ac.jp
Associate editor: Joel Dudley
Abstract
We present the latest version of the Molecular Evolutionary Genetics Analysis (MEGA) software, which contains many sophisticated methods and tools for phylogenomics and phylomedicine. In this major upgrade, MEGA has been optimized for use on 64-bit computing systems for analyzing larger datasets. Researchers can now explore and analyze tens of thousands of sequences in MEGA. The new version also provides an advanced wizard for building timetrees and includes a new functionality to automatically predict gene duplication events in gene family trees. The 64-bit MEGA is made available in two interfaces: graphical and command line. The graphical user interface (GUI) is a native Microsoft Windows application that can also be used on Mac OS X. The command line MEGA is available as native applications for Windows, Linux, and Mac OS X. They are intended for use in high-throughput and scripted analysis. Both versions are available from www.megasoftware.net free of charge.
Key words: gene families, timetree, software, evolution.
Molecular Evolutionary Genetics Analysis (MEGA) software is now being applied to increasingly bigger datasets (Kumar et al. 1994; Tamura et al. 2013). This necessitated technological advancement of the computation core and the user interface of MEGA. Researchers also need to conduct high-throughput and scripted analyses on their operating system of choice, which requires that MEGA be available in a native cross-platform implementation. We have advanced the MEGA software suite to address these needs of researchers performing comparative analyses of DNA and protein sequences in increasingly larger datasets.
Addressing the Need to Analyze Bigger Datasets
Contemporary personal computers and workstations pack much greater computing power and system memory than ever before. It is now common to have many gigabytes of memory with a 64-bit architecture and an operating system to match. To harness this power in evolutionary analyses, we have advanced the MEGA source code to fully utilize 64-bit computing resources and memory in data handling, file processing, and evolutionary analytics. MEGA’s internal data structures have been upgraded, and the refactored source code has been tested extensively using automated test harnesses.
We benchmarked 64-bit MEGA7 performance using 16S ribosomal RNA sequence alignments obtained from the SILVA rRNA database project (Quast et al. 2013; Yilmaz et al. 2014) with thousands of sites and increasingly greater numbers of sequences (as many as 10,000). Figure 1 shows that their computational analysis requires large amounts of memory and computing power. For the Neighbor-Joining (NJ) method (Saitou and Nei 1987), memory usage increased at a polynomial rate as the number of sequences was increased. The peak memory usage was 1.7 GB for the full dataset of 10,000 rRNA sequences (fig. 1B). For the Maximum Likelihood (ML) analyses, memory usage increased linearly and the peak memory usage was 18.6 GB (fig. 1D). The time to complete the computation (fig. 1A and C) showed a polynomial trend for NJ and a linear trend for ML. ML required an order of magnitude greater time and memory. We also benchmarked MEGA7 for datasets with increasing numbers of sites. Computational time and peak memory showed a linear trend. In addition, we compared the memory and time needs for 32- and 64-bit versions (MEGA6 and MEGA7, respectively), and found no significant difference for NJ and ML analyses. This is primarily because both MEGA6 and MEGA7 use 8-byte floating point data types. However, the 32-bit MEGA6 could only carry out ML analysis for fewer than 3,000 sequences of the same length. Therefore, MEGA7 is a significant upgrade that does not incur any discernible computational or resource penalty.
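The memory figures above can be put in perspective with some back-of-envelope arithmetic. The sketch below is illustrative only and does not describe MEGA's internals. It assumes that NJ holds at least one full square matrix of 8-byte pairwise distances, and that ML memory grows linearly through the origin up to the reported 18.6 GB at 10,000 sequences. The function names and the 4 GiB address-space ceiling used for the 32-bit case are likewise assumptions made for this example.

```python
GIB = 2 ** 30
ADDRESS_SPACE_32BIT = 2 ** 32  # theoretical ceiling for a 32-bit process (assumption)


def distance_matrix_bytes(num_sequences, bytes_per_value=8, full_square=True):
    """Bytes needed to store pairwise distances for num_sequences taxa."""
    n = num_sequences
    cells = n * n if full_square else n * (n - 1) // 2
    return cells * bytes_per_value


def max_sequences_linear(reported_bytes, reported_sequences, limit_bytes):
    """Largest dataset that fits in limit_bytes if memory grows linearly
    through the origin (a simplifying assumption, not a measured fit)."""
    bytes_per_sequence = reported_bytes / reported_sequences
    return int(limit_bytes / bytes_per_sequence)


if __name__ == "__main__":
    for n in (1_000, 3_000, 10_000):
        print(f"NJ, {n:>6} sequences: {distance_matrix_bytes(n) / GIB:.3f} GiB per matrix")
    ml_limit = max_sequences_linear(18.6e9, 10_000, ADDRESS_SPACE_32BIT)
    print(f"ML, 32-bit ceiling reached near {ml_limit} sequences")
```

Under these assumptions, a single distance matrix for 10,000 sequences occupies a little under half of the reported 1.7 GB peak, and the linear ML extrapolation reaches the 32-bit ceiling at a dataset size below 3,000 sequences, in line with the limit reported for MEGA6.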
Brief communication
© The Author 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
All rights reserved. For permissions, please e-mail: journals.permissions@oup.com
Mol. Biol. Evol. 33(7):1870–1874 doi:10.1093/molbev/msw054 Advance Access publication March 22, 2016
Upgrading the Tree Explorer
The ability to construct a phylogenetic tree of >10,000 sequences required a major upgrade of the Tree Explorer as well, because it needed to display very large trees. This was accomplished by replacing the native Windows scroll box with a custom virtual scroll box, which increased the number of taxa that can be displayed in the Tree Explorer window from 4,000 in MEGA6 to greater than 100,000 sequences in MEGA7. This is made possible by our new adaptive approach to rendering the tree, which ensures the best display quality and exploration performance. To display a tree, we first evaluate whether the tree can be rendered as a device-dependent bitmap (DDB), which depends on the power of the available graphics processing unit. If successful, the tree image is stored in video memory, which enhances performance. For example, in a computer equipped with a GeForce GT 640 graphics card, Tree Explorer successfully rendered trees with more than 100,000 sequences and responded quickly to user scrolling and display changes. When a DDB cannot be generated, Tree Explorer renders the tree as a device-independent bitmap. Because of the extensive system memory requirements, we automatically choose a pixel format that maximizes the number of sequences displayed. Basically, the pixel format dictates the number of colors used: 24 (2^24 colors), 18, 8, 4, or 1 bit (monochrome) per pixel. Memory needs scale in proportion to the number of bits used per pixel.
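The trade-off between color depth and memory can be illustrated with a small sketch. It is not MEGA's rendering code: it simply assumes that a bitmap needs width × height × bits-per-pixel bits, ignores any row padding, and picks the deepest of the listed pixel formats that fits a given memory budget. The function names and the budget parameter are hypothetical.

```python
# Pixel depths listed in the text, from richest to monochrome.
PIXEL_FORMATS_BITS = (24, 18, 8, 4, 1)


def bitmap_bytes(width_px, height_px, bits_per_pixel):
    """Approximate bitmap size in bytes (row padding ignored, an assumption)."""
    total_bits = width_px * height_px * bits_per_pixel
    return (total_bits + 7) // 8  # round up to whole bytes


def choose_pixel_format(width_px, height_px, budget_bytes):
    """Return the deepest pixel format that fits the budget, or None."""
    for bits in PIXEL_FORMATS_BITS:
        if bitmap_bytes(width_px, height_px, bits) <= budget_bytes:
            return bits
    return None


if __name__ == "__main__":
    # A hypothetical tree image: 2,000 pixels wide, 15 pixels per taxon row.
    for taxa in (10_000, 100_000, 1_000_000):
        bits = choose_pixel_format(2_000, 15 * taxa, budget_bytes=2 * 2 ** 30)
        print(f"{taxa:>9} taxa -> {bits} bits per pixel")
```

Because memory is proportional to bits per pixel, dropping from 24 bits to 1 bit lets an image with 24 times as many pixels fit in the same budget, which is why a lower color depth allows more sequences to be displayed.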
Cross-Platform MEGA-CC for High-Throughput and
Scripted Analyses
We have now refactored MEGA’s computation core (CC, Kumar et al. 2012) so that it can be compiled natively for Linux, Windows, and Mac OS X systems in order to avoid the need for emulation or virtualization. This required porting the computation core source code to a cross-platform programming language and replacing all the Microsoft Windows system API calls. For instance, the App Linker system, which integrates the MUSCLE (Edgar 2004) sequence alignment application with MEGA, relied heavily on the Windows API for inter-process communication and was refactored extensively. In order to configure analyses in MEGA7-CC, we have chosen to continue requiring an analysis options file (called .mao file) that specifies all the input parameters to the command-line driven MEGA-CC application; see figure 1 in Kumar et al. (2012). To generate this control file, we provide native prototyper applications (MEGA-PROTO) for Windows, Linux, and Mac OS X. MEGA-PROTO obviates the need to learn a large number of commands, and, thus, avoids a steep learning curve and potential mistakes for inter-dependent options. It also enables us to deliver exactly the same experience and options for those who will use both GUI and CC versions of MEGA7.
FIG. 1. Time and memory requirements for phylogenetic analyses using the NJ method (A, B) and the ML analysis (C, D). For NJ analysis, we used the Tamura–Nei (1993) model, uniform rates of evolution among sites, and the pairwise deletion option to deal with the missing data. Time usage increases polynomially with the number of sequences (third degree polynomial, R^2 = 1), as does the peak memory used (R^2 = 1) (A, B). The same model and parameters were used for ML tree inference, where the time taken and the memory needs increased linearly with the number of sequences. For ML analysis, the SPR (Subtree–Pruning–Regrafting) heuristic was used for tree searching and all 5,287 sites in the sequence alignment were included. All the analyses were performed on a Dell Optiplex 9010 computer with an Intel Core-i7-3770 3.4 GHz processor, 20 GB of RAM, NVidia GeForce GT 640 graphics card, and a 64-bit Windows 7 Enterprise operating system.
Marking Gene Duplication Events in Gene Family Trees
We have added a new functionality in MEGA to mark tree nodes where gene duplications are predicted to occur. This system works with or without a species tree. If a species tree is provided, then we mark gene duplications following the algorithm of Zmasek and Eddy (2001). This algorithm posits the smallest number of gene duplications in the tree such that the minimum number of unobserved genes, due to losses or partial sampling, is invoked. When no species tree is provided, then all internal nodes in the tree that contain one or more common species in the two descendant clades are marked as gene duplication events. This algorithm provides a minimum number of duplication events, because many duplication nodes will remain undetected when the gene sampling is incomplete. Nevertheless, it is useful for cases where species trees are not well established.
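The rule used when no species tree is available (mark a node when its two descendant clades share at least one species) can be sketched in a few lines. This is a minimal illustration, not MEGA's implementation. The nested-tuple tree representation, the `species_of` mapping, and the gene labels are assumptions made for the example.

```python
def mark_duplications(tree, species_of):
    """Return the internal nodes whose two child clades share a species.

    tree: a rooted binary tree written as nested 2-tuples with gene labels
          (strings) at the tips.
    species_of: dict mapping each gene label to its species name.
    """
    duplications = []

    def visit(node):
        if isinstance(node, str):           # tip: a single sampled gene
            return {species_of[node]}
        left, right = node
        left_species = visit(left)
        right_species = visit(right)
        if left_species & right_species:    # shared species => duplication
            duplications.append(node)
        return left_species | right_species

    visit(tree)
    return duplications


if __name__ == "__main__":
    species_of = {
        "human_A": "human", "mouse_A": "mouse",
        "human_B": "human", "mouse_B": "mouse",
        "fish_A": "fish",
    }
    gene_tree = ((("human_A", "mouse_A"), ("human_B", "mouse_B")), "fish_A")
    for node in mark_duplications(gene_tree, species_of):
        print("duplication at:", node)
```

In this toy tree only the node joining the A and B paralogs is marked, because both of its child clades contain human and mouse. If the B copies had not been sampled, no node would be marked, which illustrates why the count is a minimum.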
Realizing that the root of the gene family tree is not always obvious, MEGA runs the above analysis by automatically rooting the tree on each branch and selecting a root such that the number of gene duplications inferred is minimized. This is done only when the user does not specify a root explicitly. A Gene Duplication Wizard (fig. 2) walks the user through all the necessary steps for this analysis. Results are displayed in the Tree Explorer (fig. 3), which marks gene duplications with blue solid diamonds. When a species tree is provided, speciation events are marked with open red diamonds. Results can also be exported to Newick formatted text files where gene duplications and speciation events are labeled using comments in square brackets. In the future, we plan to extend this system with the capability to automatically retrieve species trees from external databases, including the NCBI Taxonomy (http://www.ncbi.nlm.nih.gov/guide/taxonomy/) and the timetree of life (Hedges et al. 2015).
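The root search described above can be sketched as a loop over branches: root the unrooted gene tree on each branch in turn, count the nodes whose two descendant clades share a species, and keep a root with the smallest count. The code below is an illustrative sketch rather than MEGA's implementation. The adjacency-list representation, the node names, and the tie-breaking rule (the first branch in sorted order) are assumptions.

```python
def count_duplications(adj, species_of, u, v):
    """Count duplication nodes when the tree is rooted on the branch u-v."""
    count = 0

    def species_below(node, parent):
        nonlocal count
        children = [n for n in adj[node] if n != parent]
        if not children:                       # tip: a single sampled gene
            return {species_of[node]}
        left, right = (species_below(child, node) for child in children)
        if left & right:                       # shared species => duplication
            count += 1
        return left | right

    side_u = species_below(u, v)
    side_v = species_below(v, u)
    if side_u & side_v:                        # the new root node itself
        count += 1
    return count


def best_root(adj, species_of):
    """Return (minimum duplication count, branch) over all possible rootings."""
    edges = sorted({tuple(sorted((a, b))) for a in adj for b in adj[a]})
    return min((count_duplications(adj, species_of, u, v), (u, v)) for u, v in edges)


if __name__ == "__main__":
    # Unrooted binary gene tree: internal nodes x, y, z each have three neighbors.
    adj = {
        "x": ["human_A", "mouse_A", "y"],
        "z": ["human_B", "mouse_B", "y"],
        "y": ["x", "z", "fish_A"],
        "human_A": ["x"], "mouse_A": ["x"],
        "human_B": ["z"], "mouse_B": ["z"],
        "fish_A": ["y"],
    }
    species_of = {"human_A": "human", "mouse_A": "mouse", "human_B": "human",
                  "mouse_B": "mouse", "fish_A": "fish"}
    print(best_root(adj, species_of))
```

Several branches may yield the same minimum, so the rooting is not necessarily unique. The text does not say how such ties are resolved, and this sketch does not attempt to reproduce MEGA's choice.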
FIG. 2. The Gene Duplication Wizard (A) to guide users through the process of searching gene duplication events in a gene family tree. In the first step, the user loads a gene tree from a Newick formatted text file. Second, species associated with sequences are specified using a graphical interface. In the third step, the user has the option to load a trusted species tree from a Newick file, in which case it will be possible to identify all duplication events in the gene tree. Fourth, the user has the option to specify the root of the gene tree in a graphical interface. If the user provides a trusted species tree, then they must designate the root of that tree. Finally, the user launches the analysis and the results are displayed in the Tree Explorer window (see fig. 3).
Timetree System Updates
We have now upgraded the Timetree Wizard (similar to the wizard shown in fig. 2), which guides researchers through a multi-step process of building a molecular phylogeny scaled to time using a sequence alignment and a phylogenetic tree topology. This wizard accepts Newick formatted tree files, assists users in defining the outgroup(s) on which the tree will be rooted, and allows users to set divergence time calibration constraints. Setting time constraints in order to calibrate the final timetree is optional in the RelTime method (Tamura et al. 2012), so MEGA7 does not require that calibration constraints be available and it does not assume a molecular clock. If no calibrations are used, MEGA7 will produce relative divergence times for nodes, which are useful for determining the ordering and spacing of divergence events in species and gene family trees. However, users can obtain absolute divergence time estimates for each node by providing calibrations with minimum and/or maximum constraints (Tamura et al. 2013). It is important to note that MEGA7 does not use calibrations that are present in the clade containing the outgroup(s), because that would require an assumption of equal rates of evolution between the ingroup and outgroup sequences, which cannot be tested. For this reason, timetrees displayed in the Tree Explorer have the outgroup cluster compressed and grayed out by default to promote correct scientific analysis and interpretation.
Data Coverage Display by Node
In the Tree Explorer, users will be able to display another set of numbers at internal tree nodes that correspond to the proportion of positions in the alignment where there is at least one sequence with an unambiguous nucleotide or amino acid in both descendant lineages; see figure 5 in Filipski et al. (2014). This metric is referred to as minimum data coverage and is useful in exposing nodes in the tree that lack sufficient data to make reliable phylogenetic inferences. For example, when the minimum data coverage is zero for a node, then the time elapsed on the branch connecting this node with its descendant node will always be zero, because zero substitutions will be mapped to that branch (Filipski et al. 2014). This means that divergence times for such nodes would be underestimated. Such branches will also have very low statistical confidence when inferring the phylogenetic tree. So, it is always good to examine this metric for all nodes in the tree.
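The coverage value for a single node can be computed directly from the definition above: count the alignment positions at which each of the two descendant lineages has at least one sequence with an unambiguous character, and divide by the alignment length. The sketch below is illustrative and is not taken from MEGA. The dictionary-based alignment, the function name, and the choice of A, C, G, and T as the unambiguous nucleotide states are assumptions.

```python
UNAMBIGUOUS_DNA = frozenset("ACGT")  # assumed set of unambiguous states


def data_coverage(alignment, left_taxa, right_taxa, valid=UNAMBIGUOUS_DNA):
    """Proportion of sites with usable data in both descendant lineages.

    alignment: dict mapping taxon name to an aligned sequence string.
    left_taxa, right_taxa: taxon names in the two descendant lineages.
    """
    num_sites = len(next(iter(alignment.values())))
    covered = 0
    for site in range(num_sites):
        left_ok = any(alignment[t][site].upper() in valid for t in left_taxa)
        right_ok = any(alignment[t][site].upper() in valid for t in right_taxa)
        if left_ok and right_ok:
            covered += 1
    return covered / num_sites


if __name__ == "__main__":
    alignment = {
        "a": "ACGT-N",
        "b": "A-GTNN",
        "c": "--GTAC",
        "d": "N-GT-C",
    }
    coverage = data_coverage(alignment, ["a", "b"], ["c", "d"])
    print(f"coverage = {coverage:.3f}")
```

In this toy alignment only the third and fourth columns have an unambiguous base on both sides of the node, so two of six sites are covered. A node with a value of zero would be exactly the case flagged in the text, where no substitutions can be mapped to the branch.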
Conclusions
We have made many major upgrades to MEGA’s infrastructure and added a number of new functionalities that will enable researchers to conduct additional analyses with greater ease. These upgrades make the seventh version of MEGA more versatile than previous versions. For Microsoft Windows, the 64-bit MEGA is made available with a Graphical User Interface and as a command line program intended for use in high-throughput and scripted analysis. Both versions are available from www.megasoftware.net free of charge. The command line version of MEGA7 is now also available as native applications for Linux and Mac OS X. The GUI version of MEGA7 is also available for Mac OS X, where we provide an installation that automatically configures the use of Wine for compatibility with Mac OS X. Since Wine only supports 32-bit software, we provide 32-bit MEGA7 GUI for Mac OS X. However, Mac and Linux users can run the 64-bit Windows version of MEGA7 GUI using virtual machine environments, including VMWare, Parallels, or Crossover. Alternatively, 64-bit MEGA-CC along with MEGA-PROTO can be used as they run natively on Windows, Mac OS X, and Linux.
Acknowledgments
We thank Charlotte Konikoff and Mike Suleski for extensively testing MEGA7. Many other laboratory members and beta testers provided invaluable feedback and bug reports. We thank Julie Marin for help in assembling the rRNA data analyzed. This study was supported in part by research grants from National Institutes of Health (HG002096-12 to S.K.) and Japan Society for the Promotion of Science (JSPS) grants-in-aid for scientific research (24370033) to K.T.
FIG. 3. Tree Explorer window in which gene duplications are marked with closed blue diamonds and speciation events, if a trusted species tree is provided, are identified by open red diamonds (see fig. 2 legend for more information).
References
Edgar RC. 2004. MUSCLE: a multiple sequence alignment method with reduced time and space complexity. BMC Bioinformatics 5:113.
Filipski A, Murillo O, Freydenzon A, Tamura K, Kumar S. 2014. Prospects for building large timetrees using molecular data with incomplete gene coverage among species. Mol Biol Evol. 31:2542–2550.
Hedges SB, Marin J, Suleski M, Paymer M, Kumar S. 2015. Tree of life reveals clock-like speciation and diversification. Mol Biol Evol. 32:835–845.
Kumar S, Stecher G, Peterson D, Tamura K. 2012. MEGA-CC: computing core of molecular evolutionary genetics analysis program for automated and iterative data analysis. Bioinformatics 28:2685–2686.
Kumar S, Tamura K, Nei M. 1994. MEGA: molecular evolutionary genetics analysis software for microcomputers. Comput Appl Biosci. 10:189–191.
Quast C, Pruesse E, Yilmaz P, Gerken J, Schweer T, Yarza P, Peplies J, Glöckner FO. 2013. The SILVA ribosomal RNA gene database project: improved data processing and web-based tools. Nucleic Acids Res. 41:D590–D596.
Saitou N, Nei M. 1987. The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol Biol Evol. 4:406–425.
Tamura K, Battistuzzi FU, Billing-Ross P, Murillo O, Filipski A, Kumar S. 2012. Estimating divergence times in large molecular phylogenies. Proc Natl Acad Sci U S A. 109:19333–19338.
Tamura K, Nei M. 1993. Estimation of the number of nucleotide substitutions in the control region of mitochondrial-DNA in humans and chimpanzees. Mol Biol Evol. 10:512–526.
Tamura K, Stecher G, Peterson D, Filipski A, Kumar S. 2013. MEGA6: molecular evolutionary genetics analysis version 6.0. Mol Biol Evol. 30:2725–2729.
Yilmaz P, Parfrey LW, Yarza P, Gerken J, Pruesse E, Quast C, Schweer T, Peplies J, Ludwig W, Glöckner FO. 2014. The SILVA and “All-species Living Tree Project (LTP)” taxonomic frameworks. Nucleic Acids Res. 42:D643–D648.
Zmasek CM, Eddy SR. 2001. A simple algorithm to infer gene duplication and speciation events on a gene tree. Bioinformatics 17:821–828.