RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies.

doi:10.1093/BIOINFORMATICS/BTU033

Journal Article•DOI•

Alexandros Stamatakis¹•Institutions (1)

Heidelberg Institute for Theoretical Studies¹

01 May 2014-Bioinformatics (Oxford University Press)-Vol. 30, Iss: 9, pp 1312-1313

TL;DR: This work presents some of the most notable new features and extensions of RAxML, such as a substantial extension of substitution models and supported data types, the introduction of SSE3, AVX and AVX2 vector intrinsics, techniques for reducing the memory requirements of the code and a plethora of operations for conducting post-analyses on sets of trees.

Abstract: Motivation: Phylogenies are increasingly used in all fields of medical and biological research. Moreover, because of the next-generation sequencing revolution, datasets used for conducting phylogenetic analyses grow at an unprecedented pace. RAxML (Randomized Axelerated Maximum Likelihood) is a popular program for phylogenetic analyses of large datasets under maximum likelihood. Since the last RAxML paper in 2006, it has been continuously maintained and extended to accommodate the increasingly growing input datasets and to serve the needs of the user community. Results: I present some of the most notable new features and extensions of RAxML, such as a substantial extension of substitution models and supported data types, the introduction of SSE3, AVX and AVX2 vector intrinsics, techniques for reducing the memory requirements of the code and a plethora of operations for conducting postanalyses on sets of trees. In addition, an up-to-date 50-page user manual covering all new RAxML options is available. Availability and implementation: The code is available under GNU

Citations

Journal Article•DOI•

Identifying SARS-CoV-2-related coronaviruses in Malayan pangolins.

Tommy Tsan-Yuk Lam¹, Na Jia, Ya-Wei Zhang, Marcus Ho-Hin Shum¹, Jia-Fu Jiang, Hongbo Zhu¹, Yigang Tong², Yongxia Shi, Xue-Bing Ni¹, Yunshi Liao¹, Wen-Juan Li², Bao-Gui Jiang, Wei Wei³, Ting-Ting Yuan, Kui Zheng, Xiao-Ming Cui, Jie Li, Guangqian Pei, Xin Qiang, William Yiu-Man Cheung¹, Lian-Feng Li³, Fang-Fang Sun, Si Qin, Jicheng Huang, Gabriel M. Leung¹, Edward C. Holmes⁴, Yan-Ling Hu³, Yi Guan¹, Wu-Chun Cao•Institutions (4)

University of Hong Kong¹, Beijing University of Chemical Technology², Guangxi Medical University³, University of Sydney⁴

26 Mar 2020-Nature

TL;DR: The discovery of multiple lineages of pangolin coronavirus and their similarity to SARS-CoV-2 suggests that pangolins should be considered as possible hosts in the emergence of new coronaviruses and should be removed from wet markets to prevent zoonotic transmission.

Abstract: The ongoing outbreak of viral pneumonia in China and across the world is associated with a new coronavirus, SARS-CoV-2. This outbreak has been tentatively associated with a seafood market in Wuhan, China, where the sale of wild animals may be the source of zoonotic infection. Although bats are probable reservoir hosts for SARS-CoV-2, the identity of any intermediate host that may have facilitated transfer to humans is unknown. Here we report the identification of SARS-CoV-2-related coronaviruses in Malayan pangolins (Manis javanica) seized in anti-smuggling operations in southern China. Metagenomic sequencing identified pangolin-associated coronaviruses that belong to two sub-lineages of SARS-CoV-2-related coronaviruses, including one that exhibits strong similarity in the receptor-binding domain to SARS-CoV-2. The discovery of multiple lineages of pangolin coronavirus and their similarity to SARS-CoV-2 suggests that pangolins should be considered as possible hosts in the emergence of new coronaviruses and should be removed from wet markets to prevent zoonotic transmission.

1,461 citations

Journal Article•DOI•

On the origin and continuing evolution of SARS-CoV-2

Xiaolu Tang¹, Changcheng Wu¹, Xiang Li², Yuhe Song², Yuhe Song³, Xinmin Yao¹, Xinkai Wu¹, Yuange Duan¹, Hong Zhang¹, Yirong Wang¹, Zhaohui Qian⁴, Jie Cui², Jian Lu¹•Institutions (4)

Peking University¹, Chinese Academy of Sciences², Shanghai University³, Peking Union Medical College⁴

03 Mar 2020-National Science Review

TL;DR: The results suggest that the development of new variations in functional sites in the receptor-binding domain (RBD) of the spike seen in SARS-CoV-2 and viruses from pangolin SARSr-CoVs are likely caused by natural selection besides recombination.

Abstract: The SARS-CoV-2 epidemic started in late December 2019 in Wuhan, China, and has since impacted a large portion of China and raised major global concern. Herein, we investigated the extent of molecular divergence between SARS-CoV-2 and other related coronaviruses. Although we found only 4% variability in genomic nucleotides between SARS-CoV-2 and a bat SARS-related coronavirus (SARSr-CoV; RaTG13), the difference at neutral sites was 17%, suggesting the divergence between the two viruses is much larger than previously estimated. Our results suggest that the development of new variations in functional sites in the receptor-binding domain (RBD) of the spike seen in SARS-CoV-2 and viruses from pangolin SARSr-CoVs are likely caused by natural selection besides recombination. Population genetic analyses of 103 SARS-CoV-2 genomes indicated that these viruses had two major lineages (designated L and S) that are well defined by two different SNPs that show nearly complete linkage across the viral strains sequenced to date. We found that the L lineage was more prevalent than the S lineage within the limited patient samples we examined. The implication of these evolutionary changes on disease etiology remains unclear. These findings strongly underscore the urgent need for further comprehensive studies that combine viral genomic data with epidemiological studies of coronavirus disease 2019 (COVID-19).

1,369 citations

Cites methods from "RAxML version 8: a tool for phyloge..."

...12 [47] was used to build the maximum likelihood phylogenetic tree of 103 aligned SARS-CoV-2...
[...]

Posted Content•DOI•

OrthoFinder: phylogenetic orthology inference for comparative genomics

David M. Emms¹, Steven L. Kelly¹•Institutions (1)

University of Oxford¹

24 Apr 2019-bioRxiv

TL;DR: This extends OrthoFinder’s high accuracy orthogroup inference to provide phylogenetic inference of orthologs, rooted gene trees, gene duplication events, the rooted species tree, and comparative genomic statistics.

Abstract: Here, we present a major advance of the OrthoFinder method. This extends OrthoFinder’s high accuracy orthogroup inference to provide phylogenetic inference of orthologs, rooted gene trees, gene duplication events, the rooted species tree, and comparative genomic statistics. Each output is benchmarked on appropriate real or simulated datasets and, where comparable methods exist, OrthoFinder is equivalent to or outperforms these methods. Furthermore, OrthoFinder is the most accurate ortholog inference method on the Quest for Orthologs benchmark test. Finally, OrthoFinder’s comprehensive phylogenetic analysis is achieved with equivalent speed and scalability to the fastest, score-based heuristic methods. OrthoFinder is available at https://github.com/davidemms/OrthoFinder.

1,366 citations

Journal Article•DOI•

ASTRAL-III: polynomial time species tree reconstruction from partially resolved gene trees.

Chao Zhang¹, Maryam Rabiee¹, Erfan Sayyari¹, Siavash Mirarab¹•Institutions (1)

University of California, San Diego¹

08 May 2018-BMC Bioinformatics

TL;DR: ASTRAL-III is a faster version of the ASTRAL method for phylogenetic reconstruction and can scale up to 10,000 species and removes low support branches from gene trees, resulting in improved accuracy.

Abstract: Evolutionary histories can be discordant across the genome, and such discordances need to be considered in reconstructing the species phylogeny. ASTRAL is one of the leading methods for inferring species trees from gene trees while accounting for gene tree discordance. ASTRAL uses dynamic programming to search for the tree that shares the maximum number of quartet topologies with input gene trees, restricting itself to a predefined set of bipartitions. We introduce ASTRAL-III, which substantially improves the running time of ASTRAL-II and guarantees polynomial running time as a function of both the number of species (n) and the number of genes (k). ASTRAL-III limits the bipartition constraint set (X) to grow at most linearly with n and k. Moreover, it handles polytomies more efficiently than ASTRAL-II, exploits similarities between gene trees better, and uses several techniques to avoid searching parts of the search space that are mathematically guaranteed not to include the optimal tree. The asymptotic running time of ASTRAL-III in the presence of polytomies is $O\left((nk)^{1.726} D\right)$, where $D = O(nk)$ is the sum of degrees of all unique nodes in input trees. The running time improvements enable us to test whether contracting low support branches in gene trees improves the accuracy by reducing noise. In extensive simulations, we show that removing branches with very low support (e.g., below 10%) improves accuracy while overly aggressive filtering is harmful. We observe on a biological avian phylogenomic dataset of 14K genes that contracting low support branches greatly improves results. ASTRAL-III is a faster version of the ASTRAL method for phylogenetic reconstruction and can scale up to 10,000 species. With ASTRAL-III, low support branches can be removed, resulting in improved accuracy.
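
The quartet criterion described in this abstract (the number of quartet topologies a candidate species tree shares with the input gene trees) can be illustrated with a brute-force count. The Python sketch below is only an illustration under stated assumptions: trees are given as lists of non-trivial bipartitions, the function names and the five-taxon toy trees are hypothetical, and the enumeration over all four-taxon subsets is not the dynamic programming that ASTRAL itself uses.

```python
from itertools import combinations


def quartet_topology(splits, quartet):
    """Return the topology (a pair of pairs) that `splits` induces on four taxa,
    or None if no bipartition resolves the quartet."""
    a, b, c, d = quartet
    pairings = (((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c)))
    for pair, rest in pairings:
        for side in splits:
            pair_in = [x in side for x in pair]
            rest_in = [x in side for x in rest]
            # The bipartition must put one pair entirely on each side.
            if (all(pair_in) and not any(rest_in)) or (all(rest_in) and not any(pair_in)):
                return frozenset((frozenset(pair), frozenset(rest)))
    return None


def quartet_score(species_splits, gene_trees, taxa):
    """Count quartet topologies shared between the species tree and each gene tree."""
    score = 0
    for quartet in combinations(sorted(taxa), 4):
        topology = quartet_topology(species_splits, quartet)
        if topology is None:
            continue  # unresolved in the species tree, contributes nothing
        score += sum(1 for g in gene_trees if quartet_topology(g, quartet) == topology)
    return score


# Toy data (hypothetical): unrooted trees on five taxa, as non-trivial bipartitions.
taxa = "ABCDE"
species_tree = [{"A", "B"}, {"D", "E"}]   # ((A,B),C,(D,E))
gene_tree_1 = [{"A", "B"}, {"D", "E"}]    # same topology
gene_tree_2 = [{"A", "C"}, {"D", "E"}]    # ((A,C),B,(D,E))

print(quartet_score(species_tree, [gene_tree_1, gene_tree_2], taxa))
```

This brute-force count grows with the fourth power of the number of taxa, which is why it serves only as a definition of the score and not as a practical search procedure.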

1,261 citations

Cites methods from "RAxML version 8: a tool for phyloge..."

...The gene tree should leave the relationship between identical sequences unresolved (FastTree [32] automatically does it and RAxML, which outputs an arbitrary resolution, warns the user about the input)....
[...]
...Gene trees are estimated using RAxML [41] with 200 replicates of bootstrapping....
[...]

Journal Article•DOI•

Real-time, portable genome sequencing for Ebola surveillance

Joshua Quick¹, Nicholas J. Loman¹, Sophie Duraffour², Jared T. Simpson³, Jared T. Simpson⁴, Ettore Severi⁵, Ettore Severi⁶, Lauren A. Cowley, Joseph Akoi Bore², Raymond Koundouno², Gytis Dudas⁷, Amy Mikhail, Nobila Ouedraogo⁸, Babak Afrough, Amadou Bah⁹, Jonathan H.J. Baum², Beate Becker-Ziaja², Jan Peter Boettcher⁸, Mar Cabeza-Cabrerizo², Álvaro Camino-Sánchez², Lisa L. Carter¹⁰, Juliane Doerrbecker², Theresa Enkirch¹¹, Isabel García-Dorival¹², Nicole Hetzelt⁸, Julia Hinzmann⁸, Tobias Holm², Liana E. Kafetzopoulou¹³, Liana E. Kafetzopoulou⁵, Michel Koropogui, Abigael Kosgey¹⁴, Eeva Kuisma⁵, Christopher H. Logue⁵, Antonio Mazzarelli, Sarah Meisel², Marc Mertens¹⁵, Janine Michel⁸, Didier Ngabo, Katja Nitzsche², Elisa Pallasch², Livia Victoria Patrono², Jasmine Portmann, Johanna Repits¹⁶, Natasha Y. Rickett¹², Andreas Sachse⁸, Katrin Singethan¹⁷, Inês Vitoriano, Rahel L. Yemanaberhan², Elsa Gayle Zekeng¹², Trina Racine¹⁸, Alexander Bello¹⁸, Amadou A. Sall¹⁹, Ousmane Faye¹⁹, Oumar Faye¹⁹, N’Faly Magassouba, Cecelia V. Williams²⁰, Victoria Amburgey²⁰, Linda Winona²⁰, Emily Davis²¹, Jon Gerlach²¹, Frank Washington²¹, Vanessa Monteil, Marine Jourdain, Marion Bererd, Alimou Camara, Hermann Somlare, Abdoulaye Camara, Marianne Gerard, Guillaume Bado, Bernard Baillet, Déborah Delaune, Koumpingnin Yacouba Nebie²², Abdoulaye Diarra²², Yacouba Savane²², Raymond Pallawo²², Giovanna Jaramillo Gutierrez²³, Natacha Milhano⁶, Natacha Milhano²⁴, Isabelle Roger²², Christopher Williams, Facinet Yattara, Kuiama Lewandowski, James E. Taylor, Phillip A. Rachwal²⁵, Daniel J. Turner, Georgios Pollakis¹², Julian A. Hiscox¹², David A. Matthews, Matthew K. O'Shea, Andrew Johnston, Duncan W. Wilson, Emma Hutley, Erasmus Smit⁵, Antonino Di Caro, Roman Wölfel²⁶, Kilian Stoecker²⁶, Erna Fleischmann²⁶, Martin Gabriel², Simon A. Weller²⁵, Lamine Koivogui, Boubacar Diallo²², Sakoba Keita, Andrew Rambaut²⁷, Andrew Rambaut⁷, Pierre Formenty²², Stephan Günther², Miles W. Carroll•Institutions (27)

University of Birmingham¹, Bernhard Nocht Institute for Tropical Medicine², Ontario Institute for Cancer Research³, University of Toronto⁴, Public Health England⁵, European Centre for Disease Prevention and Control⁶, University of Edinburgh⁷, Robert Koch Institute⁸, Swiss Tropical and Public Health Institute⁹, University College London¹⁰, Paul Ehrlich Institute¹¹, University of Liverpool¹², Rega Institute for Medical Research¹³, Kenya Medical Research Institute¹⁴, Friedrich Loeffler Institute¹⁵, Janssen-Cilag¹⁶, Technische Universität München¹⁷, Public Health Agency of Canada¹⁸, Pasteur Institute¹⁹, Sandia National Laboratories²⁰, MRIGlobal²¹, World Health Organization²², University of London²³, Norwegian Institute of Public Health²⁴, Defence Science and Technology Laboratory²⁵, Bundeswehr Institute of Microbiology²⁶, National Institutes of Health²⁷

11 Feb 2016-Nature

TL;DR: This paper presents sequence data and analysis of 142 EBOV samples collected during the period March to October 2015 and shows that real-time genomic surveillance is possible in resource-limited settings and can be established rapidly to monitor outbreaks.

Abstract: A nanopore DNA sequencer is used for real-time genomic surveillance of the Ebola virus epidemic in the field in Guinea; the authors demonstrate that it is possible to pack a genomic surveillance laboratory in a suitcase and transport it to the field for on-site virus sequencing, generating results within 24 hours of sample collection. This paper reports the use of nanopore DNA sequencers (known as MinIONs) for real-time genomic surveillance of the Ebola virus epidemic, in the field in Guinea. The authors demonstrate that it is possible to pack a genomic surveillance laboratory in a suitcase and transport it to the field for on-site virus sequencing, generating results within 24 hours of sample collection. The Ebola virus disease epidemic in West Africa is the largest on record, responsible for over 28,599 cases and more than 11,299 deaths. Genome sequencing in viral outbreaks is desirable to characterize the infectious agent and determine its evolutionary rate. Genome sequencing also allows the identification of signatures of host adaptation, identification and monitoring of diagnostic targets, and characterization of responses to vaccines and treatments. The Ebola virus (EBOV) genome substitution rate in the Makona strain has been estimated at between 0.87 × 10⁻³ and 1.42 × 10⁻³ mutations per site per year. This is equivalent to 16–27 mutations in each genome, meaning that sequences diverge rapidly enough to identify distinct sub-lineages during a prolonged epidemic. Genome sequencing provides a high-resolution view of pathogen evolution and is increasingly sought after for outbreak surveillance. Sequence data may be used to guide control measures, but only if the results are generated quickly enough to inform interventions. Genomic surveillance during the epidemic has been sporadic owing to a lack of local sequencing capacity coupled with practical difficulties transporting samples to remote sequencing facilities. To address this problem, here we devise a genomic surveillance system that utilizes a novel nanopore DNA sequencing instrument. In April 2015 this system was transported in standard airline luggage to Guinea and used for real-time genomic surveillance of the ongoing epidemic. We present sequence data and analysis of 142 EBOV samples collected during the period March to October 2015. We were able to generate results less than 24 h after receiving an Ebola-positive sample, with the sequencing process taking as little as 15–60 min. We show that real-time genomic surveillance is possible in resource-limited settings and can be established rapidly to monitor outbreaks.
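
The conversion in this abstract from a per-site substitution rate to "16–27 mutations in each genome" can be checked with a short calculation. The Python sketch below assumes a genome length of 18,959 nucleotides (roughly 19 kb); that figure is not given in the abstract and is an assumption made here for illustration, as are the function and constant names.

```python
# Assumption (not stated in the abstract): the EBOV genome is about 19 kb;
# 18,959 nucleotides is used here as an approximate length.
GENOME_LENGTH_NT = 18_959


def mutations_per_genome_per_year(rate_per_site_per_year, genome_length=GENOME_LENGTH_NT):
    """Expected number of substitutions accumulated per genome in one year."""
    return rate_per_site_per_year * genome_length


low = mutations_per_genome_per_year(0.87e-3)   # lower rate estimate from the abstract
high = mutations_per_genome_per_year(1.42e-3)  # upper rate estimate from the abstract

print(f"{low:.1f} to {high:.1f} substitutions per genome per year")
```

Rounded to whole numbers, the two products fall at the ends of the 16–27 range quoted in the abstract, which suggests the published figure is a per-year count over the full genome.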

1,187 citations


References

Journal Article•DOI•

RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models

Alexandros Stamatakis¹•Institutions (1)

École Polytechnique Fédérale de Lausanne¹

01 Oct 2006-Bioinformatics

TL;DR: RAxML-VI-HPC (randomized axelerated maximum likelihood for high performance computing) is a sequential and parallel program for inference of large phylogenies with maximum likelihood (ML) that has been used to compute ML trees on two of the largest alignments to date.

Abstract: Summary: RAxML-VI-HPC (randomized axelerated maximum likelihood for high performance computing) is a sequential and parallel program for inference of large phylogenies with maximum likelihood (ML). Low-level technical optimizations, a modification of the search algorithm, and the use of the GTR+CAT approximation as replacement for GTR+Γ yield a program that is between 2.7 and 52 times faster than the previous version of RAxML. A large-scale performance comparison with GARLI, PHYML, IQPNNI and MrBayes on real data containing 1000 up to 6722 taxa shows that RAxML requires at least 5.6 times less main memory and yields better trees in similar times than the best competing program (GARLI) on datasets up to 2500 taxa. On datasets ≥4000 taxa it also runs 2–3 times faster than GARLI. RAxML has been parallelized with MPI to conduct parallel multiple bootstraps and inferences on distinct starting trees. The program has been used to compute ML trees on two of the largest alignments to date containing 25 057 (1463 bp) and 2182 (51 089 bp) taxa, respectively. Availability: icwww.epfl.ch/~stamatak Contact: Alexandros.Stamatakis@epfl.ch Supplementary information: Supplementary data are available at Bioinformatics online.

14,847 citations

"RAxML version 8: a tool for phyloge..." refers background or methods in this paper

...Since the last RAxML paper (Stamatakis, 2006), it has been continuously maintained and extended to accommodate the increasingly growing input datasets and to serve the needs of the user community....
[...]
...RAxML (Randomized Axelerated Maximum Likelihood) is a popular program for phylogenetic analysis of large datasets under maximum likelihood....
[...]
...RAxML (Randomized Axelerated Maximum Likelihood) is a popular program for phylogenetic analyses of large datasets under maximum likelihood....
[...]

Journal Article•DOI•

New Algorithms and Methods to Estimate Maximum-Likelihood Phylogenies: Assessing the Performance of PhyML 3.0

Stéphane Guindon, Jean-François Dufayard, Vincent Lefort, Maria Anisimova, Wim Hordijk, Olivier Gascuel

29 Mar 2010-Systematic Biology

TL;DR: A new algorithm to search the tree space with user-defined intensity using subtree pruning and regrafting topological moves and a new test to assess the support of the data for internal branches of a phylogeny are introduced.

Abstract: PhyML is a phylogeny software based on the maximum-likelihood principle. Early PhyML versions used a fast algorithm performing nearest neighbor interchanges to improve a reasonable starting tree topology. Since the original publication (Guindon S., Gascuel O. 2003. A simple, fast and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst. Biol. 52:696-704), PhyML has been widely used (>2500 citations in ISI Web of Science) because of its simplicity and a fair compromise between accuracy and speed. In the meantime, research around PhyML has continued, and this article describes the new algorithms and methods implemented in the program. First, we introduce a new algorithm to search the tree space with user-defined intensity using subtree pruning and regrafting topological moves. The parsimony criterion is used here to filter out the least promising topology modifications with respect to the likelihood function. The analysis of a large collection of real nucleotide and amino acid data sets of various sizes demonstrates the good performance of this method. Second, we describe a new test to assess the support of the data for internal branches of a phylogeny. This approach extends the recently proposed approximate likelihood-ratio test and relies on a nonparametric, Shimodaira-Hasegawa-like procedure. A detailed analysis of real alignments sheds light on the links between this new approach and the more classical nonparametric bootstrap method. Overall, our tests show that the last version (3.0) of PhyML is fast, accurate, stable, and ready to use. A Web server and binary files are available from http://www.atgc-montpellier.fr/phyml/.

14,385 citations

"RAxML version 8: a tool for phyloge..." refers background in this paper

...Since the last RAxML paper (Stamatakis, 2006), it has been continuously maintained and extended to accommodate the increasingly growing input datasets and to serve the needs of the user community....
[...]

Journal Article•DOI•

A Rapid Bootstrap Algorithm for the RAxML Web Servers

Alexandros Stamatakis¹, Paul Hoover², Jacques Rougemont³•Institutions (3)

Ludwig Maximilian University of Munich¹, San Diego Supercomputer Center², École Polytechnique Fédérale de Lausanne³

01 Oct 2008-Systematic Biology

TL;DR: This work developed, implemented, and thoroughly tested rapid bootstrap heuristics in RAxML (Randomized Axelerated Maximum Likelihood) that are more than an order of magnitude faster than current algorithms and can contribute to resolving the computational bottleneck and improve current methodology in phylogenetic analyses.

Abstract: Despite recent advances achieved by application of high-performance computing methods and novel algorithmic techniques to maximum likelihood (ML)-based inference programs, the major computational bottleneck still consists in the computation of bootstrap support values. Conducting a probably insufficient number of 100 bootstrap (BS) analyses with current ML programs on large datasets—either with respect to the number of taxa or base pairs—can easily require a month of run time. Therefore, we have developed, implemented, and thoroughly tested rapid bootstrap heuristics in RAxML (Randomized Axelerated Maximum Likelihood) that are more than an order of magnitude faster than current algorithms. These new heuristics can contribute to resolving the computational bottleneck and improve current methodology in phylogenetic analyses. Computational experiments to assess the performance and relative accuracy of these heuristics were conducted on 22 diverse DNA and AA (amino acid), single gene as well as multigene, real-world alignments containing 125 up to 7764 sequences. The standard BS (SBS) and rapid BS (RBS) values drawn on the best-scoring ML tree are highly correlated and show almost identical average support values. The weighted RF (Robinson-Foulds) distance between SBS- and RBS-based consensus trees was smaller than 6% in all cases (average 4%). More importantly, RBS inferences are between 8 and 20 times faster (average 14.73) than SBS analyses with RAxML and between 18 and 495 times faster than BS analyses with competing programs, such as PHYML or GARLI. Moreover, this performance improvement increases with alignment size. Finally, we have set up two freely accessible Web servers for this significantly improved version of RAxML that provide access to the 200-CPU cluster of the Vital-IT unit at the Swiss Institute of Bioinformatics and the 128-CPU cluster of the CIPRES project at the San Diego Supercomputer Center. 
These Web servers offer the possibility to conduct large-scale phylogenetic inferences to a large part of the community that does not have access to, or the expertise to use, high-performance computing resources. (Maximum likelihood; phylogenetic inference; rapid bootstrap; RAxML; support values.)
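
The Robinson-Foulds (RF) distance used in this abstract to compare consensus trees counts the bipartitions present in one tree but not the other. The Python sketch below shows only the plain, unweighted variant; the abstract reports a weighted RF distance, which also takes support values into account and is not reproduced here. The function names and the five-taxon toy trees are illustrative assumptions and are not part of RAxML.

```python
def canonical_splits(sides, taxa):
    """Normalize each bipartition to the side that excludes a fixed reference taxon,
    so that a split and its complement compare as equal."""
    taxa = frozenset(taxa)
    reference = min(taxa)
    result = set()
    for side in sides:
        side = frozenset(side)
        if reference in side:
            side = taxa - side
        result.add(side)
    return result


def rf_distance(sides_a, sides_b, taxa):
    """Unweighted RF distance: size of the symmetric difference of the split sets."""
    return len(canonical_splits(sides_a, taxa) ^ canonical_splits(sides_b, taxa))


def normalized_rf(sides_a, sides_b, taxa):
    """RF distance divided by 2(n - 3), its maximum for unrooted binary trees on n taxa."""
    return rf_distance(sides_a, sides_b, taxa) / (2 * (len(taxa) - 3))


# Toy data (hypothetical): unrooted trees on five taxa, as non-trivial bipartitions.
taxa = "ABCDE"
tree_1 = [{"A", "B"}, {"D", "E"}]   # ((A,B),C,(D,E))
tree_2 = [{"A", "C"}, {"D", "E"}]   # ((A,C),B,(D,E))

print(rf_distance(tree_1, tree_2, taxa))     # 2
print(normalized_rf(tree_1, tree_2, taxa))   # 0.5
```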

6,585 citations

"RAxML version 8: a tool for phyloge..." refers background in this paper

...Its major strength is a fast maximum likelihood tree search algorithm that returns trees with good likelihood scores....
[...]

Journal Article•DOI•

Ultrafast Approximation for Phylogenetic Bootstrap

Bui Quang Minh¹, Minh Anh Thi Nguyen², Arndt von Haeseler¹•Institutions (2)

Medical University of Vienna¹, University of Groningen²

01 May 2013-Molecular Biology and Evolution

TL;DR: This work proposes an ultrafast bootstrap approximation approach (UFBoot) to compute the support of phylogenetic groups in maximum likelihood (ML) based trees and offers an efficient and easy-to-use software to perform the UFBoot analysis with ML tree inference.

Abstract: Nonparametric bootstrap has been a widely used tool in phylogenetic analysis to assess the clade support of phylogenetic trees. However, with the rapidly growing amount of data, this task remains a computational bottleneck. Recently, approximation methods such as the RAxML rapid bootstrap (RBS) and the Shimodaira-Hasegawa-like approximate likelihood ratio test have been introduced to speed up the bootstrap. Here, we suggest an ultrafast bootstrap approximation approach (UFBoot) to compute the support of phylogenetic groups in maximum likelihood (ML) based trees. To achieve this, we combine the resampling estimated log-likelihood method with a simple but effective collection scheme of candidate trees. We also propose a stopping rule that assesses the convergence of branch support values to automatically determine when to stop collecting candidate trees. UFBoot achieves a median speed up of 3.1 (range: 0.66-33.3) to 10.2 (range: 1.32-41.4) compared with RAxML RBS for real DNA and amino acid alignments, respectively. Moreover, our extensive simulations show that UFBoot is robust against moderate model violations and the support values obtained appear to be relatively unbiased compared with the conservative standard bootstrap. This provides a more direct interpretation of the bootstrap support. We offer an efficient and easy-to-use software (available at http://www.cibiv.at/software/iqtree) to perform the UFBoot analysis with ML tree inference.
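
The resampling estimated log-likelihood (RELL) step named in this abstract can be sketched as follows: per-site log-likelihoods are computed once for each candidate tree, sites are then resampled with replacement, and each replicate is credited to the tree with the highest resampled total. The Python sketch below covers only this resampling idea with made-up numbers; the candidate-tree collection scheme and the stopping rule that UFBoot adds are not shown, and the function name and toy values are assumptions for illustration.

```python
import random


def rell_support(site_loglik, n_replicates=1000, seed=0):
    """Fraction of bootstrap replicates in which each tree has the best total
    log-likelihood. `site_loglik[t][s]` is the log-likelihood of site s under tree t."""
    rng = random.Random(seed)
    n_trees = len(site_loglik)
    n_sites = len(site_loglik[0])
    wins = [0] * n_trees
    for _ in range(n_replicates):
        # Resample site indices with replacement; no likelihood is recomputed.
        sample = [rng.randrange(n_sites) for _ in range(n_sites)]
        totals = [sum(row[i] for i in sample) for row in site_loglik]
        best = max(range(n_trees), key=totals.__getitem__)
        wins[best] += 1
    return [w / n_replicates for w in wins]


# Toy per-site log-likelihoods (hypothetical values) for two candidate trees.
toy = [
    [-1.2, -0.8, -1.5, -0.9, -1.1, -1.0],
    [-1.0, -1.1, -1.3, -1.2, -0.9, -1.4],
]
support = rell_support(toy, n_replicates=500, seed=42)
print(support)
```

Because only sums over stored per-site values are recomputed, each replicate is far cheaper than a full tree search on a resampled alignment, which is the source of the speed-up the abstract describes.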

2,469 citations

"RAxML version 8: a tool for phyloge..." refers background in this paper

...In the following, I will present some of the most notable new features and extensions of RAxML....
[...]

Journal Article•DOI•

A Likelihood Approach to Estimating Phylogeny from Discrete Morphological Character Data

Paul O. Lewis¹•Institutions (1)

University of Connecticut¹

01 Nov 2001-Systematic Biology

TL;DR: Several new avenues of research are opened by an explicitly model-based approach to phylogenetic analysis of discrete morphological data, including combined-data likelihood analyses (morphology + sequence data), likelihood ratio tests, and Bayesian analyses.

Abstract: Evolutionary biologists have adopted simple likelihood models for purposes of estimating ancestral states and evaluating character independence on specified phylogenies; however, for purposes of estimating phylogenies by using discrete morphological data, maximum parsimony remains the only option. This paper explores the possibility of using standard, well-behaved Markov models for estimating morphological phylogenies (including branch lengths) under the likelihood criterion. An important modification of standard Markov models involves making the likelihood conditional on characters being variable, because constant characters are absent in morphological data sets. Without this modification, branch lengths are often overestimated, resulting in potentially serious biases in tree topology selection. Several new avenues of research are opened by an explicitly model-based approach to phylogenetic analysis of discrete morphological data, including combined-data likelihood analyses (morphology + sequence data), likelihood ratio tests, and Bayesian analyses. (Discrete morphological character; Markov model; maximum likelihood; phylogeny.) The increased availability of nucleotide and protein sequences from a diversity of both organisms and genes has stimulated the development of stochastic models describing evolutionary change in molecular sequences over time. Such models are not only useful for estimating molecular evolutionary parameters of interest but also important as the basis for phylogenetic inference using the method of maximum likelihood (ML) and Bayesian inference. ML provides a very general framework for estimation and has been extensively applied in diverse fields of science (Casella and Berger, 1990); however, the popularity of ML in phylogenetic inference has lagged behind that of other optimality criteria (such as maximum parsimony), primarily because of its much greater computational cost for evaluating any given candidate tree. Recent developments on the algorithmic aspects of ML inference as applied to phylogeny reconstruction (Olsen et al., 1994; Lewis, 1998; Salter and Pearl, 2001; Swofford, 2001) have succeeded in reducing this computational cost substantially, and ML phylogeny estimates involving hundreds of terminal taxa are now entering the realm of feasibility. Bayesian methods (based on a likelihood foundation) offer the prospect of obtaining meaningful nodal support measures without the unreasonable computational burden imposed by existing methods such as bootstrapping (Rannala and Yang, 1996; Yang and Rannala, 1997; Larget and Simon, 1999;

2,351 citations