Journal ArticleDOI

Building Phylogenetic Trees from Molecular Data with MEGA

01 May 2013-Molecular Biology and Evolution (Oxford University Press)-Vol. 30, Iss: 5, pp 1229-1235
TL;DR: A step-by-step protocol is presented in sufficient detail to allow a novice to start with a sequence of interest and to build a publication-quality tree illustrating the evolution of an appropriate set of homologs of that sequence.
Abstract: Phylogenetic analysis is sometimes regarded as being an intimidating, complex process that requires expertise and years of experience. In fact, it is a fairly straightforward process that can be learned quickly and applied effectively. This Protocol describes the several steps required to produce a phylogenetic tree from molecular data for novices. In the example illustrated here, the program MEGA is used to implement all those steps, thereby eliminating the need to learn several programs, and to deal with multiple file formats from one step to another (Tamura K, Peterson D, Peterson N, Stecher G, Nei M, Kumar S. 2011. MEGA5: molecular evolutionary genetics analysis using maximum likelihood, evolutionary distance, and maximum parsimony methods. Mol Biol Evol. 28:2731‐2739). The first step, identification of a set of homologous sequences and downloading those sequences, is implemented by MEGA’s own browser built on top of the Google Chrome toolkit. For the second step, alignment of those sequences, MEGA offers two different algorithms: ClustalW and MUSCLE. For the third step, construction of a phylogenetic tree from the aligned sequences, MEGA offers many different methods. Here we illustrate the maximum likelihood method, beginning with MEGA’s Models feature, which permits selecting the most suitable substitution model. Finally, MEGA provides a powerful and flexible interface for the final step, actually drawing the tree for publication. Here a step-by-step protocol is presented in sufficient detail to allow a novice to start with a sequence of interest and to build a publication-quality tree illustrating the evolution of an appropriate set of homologs of that sequence. MEGA is available for use on PCs and Macs from www.megasoftware.net.


Citations
Journal ArticleDOI
08 Jan 2015-Nature
TL;DR: It is determined that Clostridium scindens, a bile acid 7α-dehydroxylating intestinal bacterium, is associated with resistance to C. difficile infection and, upon administration, enhances resistance to infection in a secondary bile acid dependent fashion.
Abstract: The gastrointestinal tracts of mammals are colonized by hundreds of microbial species that contribute to health, including colonization resistance against intestinal pathogens. Many antibiotics destroy intestinal microbial communities and increase susceptibility to intestinal pathogens. Among these, Clostridium difficile, a major cause of antibiotic-induced diarrhoea, greatly increases morbidity and mortality in hospitalized patients. Which intestinal bacteria provide resistance to C. difficile infection and their in vivo inhibitory mechanisms remain unclear. Here we correlate loss of specific bacterial taxa with development of infection, by treating mice with different antibiotics that result in distinct microbiota changes and lead to varied susceptibility to C. difficile. Mathematical modelling augmented by analyses of the microbiota of hospitalized patients identifies resistance-associated bacteria common to mice and humans. Using these platforms, we determine that Clostridium scindens, a bile acid 7α-dehydroxylating intestinal bacterium, is associated with resistance to C. difficile infection and, upon administration, enhances resistance to infection in a secondary bile acid dependent fashion. Using a workflow involving mouse models, clinical studies, metagenomic analyses, and mathematical modelling, we identify a probiotic candidate that corrects a clinically relevant microbiome deficiency. These findings have implications for the rational design of targeted antimicrobials as well as microbiome-based diagnostics and therapeutics for individuals at risk of C. difficile infection.

1,413 citations

Journal ArticleDOI
TL;DR: It is hypothesized that the direct progenitor of SARS-CoV may have originated after sequential recombination events between the precursors of these SARSr-CoVs, and the work highlights the necessity of preparedness for future emergence of SARS-like diseases.
Abstract: A large number of SARS-related coronaviruses (SARSr-CoV) have been detected in horseshoe bats since 2005 in different areas of China. However, these bat SARSr-CoVs show sequence differences from SARS coronavirus (SARS-CoV) in different genes (S, ORF8, ORF3, etc) and are considered unlikely to represent the direct progenitor of SARS-CoV. Herein, we report the findings of our 5-year surveillance of SARSr-CoVs in a cave inhabited by multiple species of horseshoe bats in Yunnan Province, China. The full-length genomes of 11 newly discovered SARSr-CoV strains, together with our previous findings, reveals that the SARSr-CoVs circulating in this single location are highly diverse in the S gene, ORF3 and ORF8. Importantly, strains with high genetic similarity to SARS-CoV in the hypervariable N-terminal domain (NTD) and receptor-binding domain (RBD) of the S1 gene, the ORF3 and ORF8 region, respectively, were all discovered in this cave. In addition, we report the first discovery of bat SARSr-CoVs highly similar to human SARS-CoV in ORF3b and in the split ORF8a and 8b. Moreover, SARSr-CoV strains from this cave were more closely related to SARS-CoV in the non-structural protein genes ORF1a and 1b compared with those detected elsewhere. Recombination analysis shows evidence of frequent recombination events within the S gene and around the ORF8 between these SARSr-CoVs. We hypothesize that the direct progenitor of SARS-CoV may have originated after sequential recombination events between the precursors of these SARSr-CoVs. Cell entry studies demonstrated that three newly identified SARSr-CoVs with different S protein sequences are all able to use human ACE2 as the receptor, further exhibiting the close relationship between strains in this cave and SARS-CoV. This work provides new insights into the origin and evolution of SARS-CoV and highlights the necessity of preparedness for future emergence of SARS-like diseases.

801 citations

Journal ArticleDOI
TL;DR: The results showed that many of the endophytic strains produced GA and have moderate to high phosphate solubilization capacities, and when inoculated into P. sativum L. plants grown in soil under soluble phosphate limiting conditions, the endophytes that produced medium-high levels of GA displayed beneficial plant growth promotion effects.
Abstract: The use of plant growth promoting bacterial inoculants as live microbial biofertilisers provides a promising alternative to chemical fertilisers and pesticides. Inorganic phosphate solubilisation is one of the major mechanisms of plant growth promotion by plant associated bacteria. This involves bacteria releasing organic acids into the soil which solubilise the phosphate complexes converting them into ortho-phosphate which is available for plant up-take and utilisation. The study presented here describes the ability of endophytic bacterial isolates to produce gluconic acid, solubilise insoluble phosphate and stimulate the growth of Pea plants (Pisum sativum). This study also describes the genetic systems within three of these endophyte isolates thought to be responsible for their effective phosphate solubilising abilities. The results showed that many of the endophytic isolates produced gluconic acid (14-169 mM) and have moderate to high phosphate solubilisation capacities (~ 400-1300 mg L-1). When inoculated to Pea plants grown in sand/soil under soluble phosphate limiting conditions, the endophyte isolates that produced medium to high levels of gluconic acid also displayed enhanced plant growth promotion effects.

558 citations


Cites methods from "Building Phylogenetic Trees from Mo..."

  • ...Protein alignments were performed using the ClustalW (Thompson et al., 1994) function, using an adjusted multiple alignment gap opening penalty of three and a gap extension penalty of 1.8 (Hall, 2013)....


Journal ArticleDOI
28 May 2013-eLife
TL;DR: It is proposed that HERB-1 and US-1 emerged from a metapopulation that was established in the early 1800s outside of the species' center of diversity, which replaced it outside of Mexico in the 20th century.
Abstract: Phytophthora infestans, the cause of potato late blight, is infamous for having triggered the Irish Great Famine in the 1840s. Until the late 1970s, P. infestans diversity outside of its Mexican center of origin was low, and one scenario held that a single strain, US-1, had dominated the global population for 150 years; this was later challenged based on DNA analysis of historical herbarium specimens. We have compared the genomes of 11 herbarium and 15 modern strains. We conclude that the 19th century epidemic was caused by a unique genotype, HERB-1, that persisted for over 50 years. HERB-1 is distinct from all examined modern strains, but it is a close relative of US-1, which replaced it outside of Mexico in the 20th century. We propose that HERB-1 and US-1 emerged from a metapopulation that was established in the early 1800s outside of the species' center of diversity. DOI: http://dx.doi.org/10.7554/eLife.00731.001

346 citations

Journal ArticleDOI
24 Mar 2016-PLOS ONE
TL;DR: Evidence is provided that PGPR inoculation, namely, B. pumilus S1r1 can biologically fix atmospheric N2 and provide an alternative technique, besides plant breeding, to delay N remobilisation in maize plant for higher ear yield with reduced fertiliser-N input.
Abstract: Plant growth-promoting rhizobacteria (PGPR) may provide a biological alternative to fix atmospheric N2 and delay N remobilisation in maize plant to increase crop yield, based on an understanding that plant-N remobilisation is directly correlated to its plant senescence. Thus, four PGPR strains were selected from a series of bacterial strains isolated from maize roots at two locations in Malaysia. The PGPR strains were screened in vitro for their biochemical plant growth-promoting (PGP) abilities and plant growth promotion assays. These strains were identified as Klebsiella sp. Br1, Klebsiella pneumoniae Fr1, Bacillus pumilus S1r1 and Acinetobacter sp. S3r2 and a reference strain used was Bacillus subtilis UPMB10. All the PGPR strains were tested positive for N2 fixation, phosphate solubilisation and auxin production by in vitro tests. In a greenhouse experiment with reduced fertiliser-N input (a third of recommended fertiliser-N rate), the N2 fixation abilities of PGPR in association with maize were determined by 15N isotope dilution technique at two harvests, namely, prior to anthesis (D50) and ear harvest (D65). The results indicated that dry biomass of top, root and ear, total N content and bacterial colonisations in non-rhizosphere, rhizosphere and endosphere of maize roots were influenced by PGPR inoculation. In particular, the plants inoculated with B. pumilus S1r1 generally outperformed those with the other treatments. They produced the highest N2 fixing capacity of 30.5% (262 mg N2 fixed plant−1) and 25.5% (304 mg N2 fixed plant−1) of the total N requirement of maize top at D50 and D65, respectively. N remobilisation and plant senescence in maize were delayed by PGPR inoculation, which is an indicative of greater grain production. This is indicated by significant interactions between PGPR strains and time of harvests for parameters on N uptake and at. % 15Ne of tassel. The phenomenon is also supported by the lower N content in tassels of maize treated with PGPR, namely, B. pumilus S1r1, K. pneumoniae Fr1, B. subtilis UPMB10 and Acinetobacter sp. S3r2 at D65 harvest. This study provides evidence that PGPR inoculation, namely, B. pumilus S1r1 can biologically fix atmospheric N2 and provide an alternative technique, besides plant breeding, to delay N remobilisation in maize plant for higher ear yield (up to 30.9%) with reduced fertiliser-N input.

238 citations


Cites methods from "Building Phylogenetic Trees from Mo..."

  • ...The tree was constructed with Mega version 5 software package [23] by using the maximum likelihood method from distance calculated by the method of Kimura two-parameter model with a discrete Gamma distribution [24]....
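
The excerpt above names the Kimura two-parameter (K2P) model. As a rough illustration of what that model measures, the sketch below computes the plain K2P pairwise distance between two aligned DNA sequences from the proportions of transitions (P) and transversions (Q), using d = -1/2 ln((1 - 2P - Q) sqrt(1 - 2Q)). It is not MEGA's implementation, it omits the discrete Gamma rate correction mentioned in the excerpt, and the function name `k2p_distance` is hypothetical.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Plain Kimura two-parameter distance between two aligned DNA sequences.

    Assumption: positions containing gaps or ambiguous bases are skipped.
    No Gamma correction is applied.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")

    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguity codes
        compared += 1
        if a == b:
            continue
        same_class = ({a, b} <= PURINES) or ({a, b} <= PYRIMIDINES)
        if same_class:
            transitions += 1      # A<->G or C<->T
        else:
            transversions += 1    # purine <-> pyrimidine

    if compared == 0:
        raise ValueError("no comparable positions")

    p = transitions / compared
    q = transversions / compared
    term1 = 1 - 2 * p - q
    term2 = 1 - 2 * q
    if term1 <= 0 or term2 <= 0:
        raise ValueError("sequences too divergent for the K2P formula")
    return -0.5 * math.log(term1 * math.sqrt(term2))


if __name__ == "__main__":
    # One transition (A->G) in ten compared sites.
    print(k2p_distance("ACGTACGTAC", "GCGTACGTAC"))
```

The distance grows faster than the raw proportion of differing sites because the formula corrects for multiple substitutions at the same position.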


References
Journal ArticleDOI
TL;DR: A new criterion for triggering the extension of word hits, combined with a new heuristic for generating gapped alignments, yields a gapped BLAST program that runs at approximately three times the speed of the original.
Abstract: The BLAST programs are widely used tools for searching protein and DNA databases for sequence similarities. For protein comparisons, a variety of definitional, algorithmic and statistical refinements described here permits the execution time of the BLAST programs to be decreased substantially while enhancing their sensitivity to weak similarities. A new criterion for triggering the extension of word hits, combined with a new heuristic for generating gapped alignments, yields a gapped BLAST program that runs at approximately three times the speed of the original. In addition, a method is introduced for automatically combining statistically significant alignments produced by BLAST into a position-specific score matrix, and searching the database using this matrix. The resulting Position-Specific Iterated BLAST (PSI-BLAST) program runs at approximately the same speed per iteration as gapped BLAST, but in many cases is much more sensitive to weak but biologically relevant sequence similarities. PSI-BLAST is used to uncover several new and interesting members of the BRCT superfamily.

70,111 citations

Journal ArticleDOI
TL;DR: The sensitivity of the commonly used progressive multiple sequence alignment method has been greatly improved and modifications are incorporated into a new program, CLUSTAL W, which is freely available.
Abstract: The sensitivity of the commonly used progressive multiple sequence alignment method has been greatly improved for the alignment of divergent protein sequences. Firstly, individual weights are assigned to each sequence in a partial alignment in order to down-weight near-duplicate sequences and up-weight the most divergent ones. Secondly, amino acid substitution matrices are varied at different alignment stages according to the divergence of the sequences to be aligned. Thirdly, residue-specific gap penalties and locally reduced gap penalties in hydrophilic regions encourage new gaps in potential loop regions rather than regular secondary structure. Fourthly, positions in early alignments where gaps have been opened receive locally reduced gap penalties to encourage the opening up of new gaps at these positions. These modifications are incorporated into a new program, CLUSTAL W which is freely available.

63,427 citations


"Building Phylogenetic Trees from Mo..." refers methods in this paper

  • ...Two alignment methods are provided: ClustalW (Thompson et al. 1994) and MUSCLE (Edgar 2004a, 2004b)....


  • ...For the second step, alignment of those sequences, MEGA offers two different algorithms: ClustalW and MUSCLE....


  • ...For ClustalW, the default settings are fine for DNA, but for proteins, I recommend changing the Multiple Alignment Gap Opening penalty to 3 and the Multiple Alignment Gap Extension penalty to 1.8....
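
To give a feel for what the gap opening and gap extension values in the last excerpt control, here is a minimal sketch of a simple affine gap penalty applied to one row of an alignment. It assumes the convention that a gap of length n costs opening + n × extension; ClustalW itself adjusts gap penalties by position and residue, so this is a simplified illustration and not its actual scoring. The function names are hypothetical, and the defaults are the protein values recommended in the excerpt (3 and 1.8).

```python
import re


def affine_gap_penalty(gap_length, gap_open=3.0, gap_extend=1.8):
    """Penalty for a single gap under a simple affine model.

    Assumed convention: cost = gap_open + gap_extend * gap_length.
    """
    if gap_length < 0:
        raise ValueError("gap_length must be non-negative")
    if gap_length == 0:
        return 0.0
    return gap_open + gap_extend * gap_length


def total_gap_penalty(aligned_row, gap_open=3.0, gap_extend=1.8):
    """Sum the affine penalties of every run of '-' in one aligned sequence."""
    runs = re.findall(r"-+", aligned_row)
    return sum(affine_gap_penalty(len(run), gap_open, gap_extend) for run in runs)


if __name__ == "__main__":
    # Two gaps: one of length 2 and one of length 1.
    print(total_gap_penalty("AC--GT-A"))
```

Under this model a lower opening penalty makes it cheaper to start new gaps, while the extension penalty controls how costly long gaps become.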


Journal ArticleDOI
TL;DR: The newest addition in MEGA5 is a collection of maximum likelihood (ML) analyses for inferring evolutionary trees, selecting best-fit substitution models, inferring ancestral states and sequences, and estimating evolutionary rates site-by-site.
Abstract: Comparative analysis of molecular sequence data is essential for reconstructing the evolutionary histories of species and inferring the nature and extent of selective forces shaping the evolution of genes and species. Here, we announce the release of Molecular Evolutionary Genetics Analysis version 5 (MEGA5), which is a user-friendly software for mining online databases, building sequence alignments and phylogenetic trees, and using methods of evolutionary bioinformatics in basic biology, biomedicine, and evolution. The newest addition in MEGA5 is a collection of maximum likelihood (ML) analyses for inferring evolutionary trees, selecting best-fit substitution models (nucleotide or amino acid), inferring ancestral states and sequences (along with probabilities), and estimating evolutionary rates site-by-site. In computer simulation analyses, ML tree inference algorithms in MEGA5 compared favorably with other software packages in terms of computational efficiency and the accuracy of the estimates of phylogenetic trees, substitution parameters, and rate variation among sites. The MEGA user interface has now been enhanced to be activity driven to make it easier for the use of both beginners and experienced scientists. This version of MEGA is intended for the Windows platform, and it has been configured for effective use on Mac OS X and Linux desktops. It is available free of charge from http://www.megasoftware.net.

39,110 citations


"Building Phylogenetic Trees from Mo..." refers background in this paper

  • ...MEGA5 (Tamura et al. 2011) is an integrated program that carries out all four steps in a single environment, with a single user interface eliminating the need for interconverting file formats....


  • ...Here a step-by-step protocol is presented in sufficient detail to allow a novice to start with a sequence of interest and to build a publication-quality tree illustrating the evolution of an appropriate set of homologs of that sequence....


Journal ArticleDOI
TL;DR: MUSCLE is a new computer program for creating multiple alignments of protein sequences that includes fast distance estimation using kmer counting, progressive alignment using a new profile function the authors call the log-expectation score, and refinement using tree-dependent restricted partitioning.
Abstract: We describe MUSCLE, a new computer program for creating multiple alignments of protein sequences. Elements of the algorithm include fast distance estimation using kmer counting, progressive alignment using a new profile function we call the log-expectation score, and refinement using tree-dependent restricted partitioning. The speed and accuracy of MUSCLE are compared with T-Coffee, MAFFT and CLUSTALW on four test sets of reference alignments: BAliBASE, SABmark, SMART and a new benchmark, PREFAB. MUSCLE achieves the highest, or joint highest, rank in accuracy on each of these sets. Without refinement, MUSCLE achieves average accuracy statistically indistinguishable from T-Coffee and MAFFT, and is the fastest of the tested methods for large numbers of sequences, aligning 5000 sequences of average length 350 in 7 min on a current desktop computer. The MUSCLE program, source code and PREFAB test data are freely available at http://www.drive5.com/muscle.

37,524 citations


"Building Phylogenetic Trees from Mo..." refers methods in this paper

  • ...For MUSCLE, I recommend that you accept the default settings....


  • ...Two alignment methods are provided: ClustalW (Thompson et al. 1994) and MUSCLE (Edgar 2004a, 2004b)....


  • ...Either can be used, but in general MUSCLE is preferable....


  • ...For the second step, alignment of those sequences, MEGA offers two different algorithms: ClustalW and MUSCLE....


  • ...In the tool bar, near the top of the window, Clustal alignment is symbolized by the W button, and MUSCLE by an arm with clenched fist to “show a muscle.”...


Journal ArticleDOI
TL;DR: A new algorithm to search the tree space with user-defined intensity using subtree pruning and regrafting topological moves and a new test to assess the support of the data for internal branches of a phylogeny are introduced.
Abstract: PhyML is a phylogeny software based on the maximum-likelihood principle. Early PhyML versions used a fast algorithm performing nearest neighbor interchanges to improve a reasonable starting tree topology. Since the original publication (Guindon S., Gascuel O. 2003. A simple, fast and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst. Biol. 52:696-704), PhyML has been widely used (>2500 citations in ISI Web of Science) because of its simplicity and a fair compromise between accuracy and speed. In the meantime, research around PhyML has continued, and this article describes the new algorithms and methods implemented in the program. First, we introduce a new algorithm to search the tree space with user-defined intensity using subtree pruning and regrafting topological moves. The parsimony criterion is used here to filter out the least promising topology modifications with respect to the likelihood function. The analysis of a large collection of real nucleotide and amino acid data sets of various sizes demonstrates the good performance of this method. Second, we describe a new test to assess the support of the data for internal branches of a phylogeny. This approach extends the recently proposed approximate likelihood-ratio test and relies on a nonparametric, Shimodaira-Hasegawa-like procedure. A detailed analysis of real alignments sheds light on the links between this new approach and the more classical nonparametric bootstrap method. Overall, our tests show that the last version (3.0) of PhyML is fast, accurate, stable, and ready to use. A Web server and binary files are available from http://www.atgc-montpellier.fr/phyml/.

14,385 citations


"Building Phylogenetic Trees from Mo..." refers methods in this paper

  • ...Step 3.6: Alternatives to MEGA5 for Estimating the Tree PhyML (http://www.atgc-montpellier.fr/phyml/binaries.php) (Guindon et al. 2010) is another program that estimates ML trees, and it can also be used over the web http://www.atgc-montpellier.fr/phyml/....


Trending Questions (1)
How to construct phylogeny trees with DNA sequences using MEGA?

The paper provides a step-by-step protocol for constructing phylogenetic trees from DNA sequences using MEGA software. It covers steps such as sequence identification, alignment, and tree estimation using the maximum likelihood method.