Journal Article

Genome Sequence of the Pea Aphid Acyrthosiphon pisum

Stephen Richards, Richard A. Gibbs, Nicole M. Gerardo, Nancy A. Moran, and 220 more authors (58 institutions)
01 Jan 2010-PLOS Biology (Public Library of Science)-Vol. 8, Iss: 2, pp 1-24
TL;DR: The genome of the pea aphid shows remarkable levels of gene duplication and equally remarkable gene absences that shed light on aspects of aphid biology, most especially its symbiosis with Buchnera.
Abstract: Aphids are important agricultural pests and also biological models for studies of insect-plant interactions, symbiosis, virus vectoring, and the developmental causes of extreme phenotypic plasticity. Here we present the 464 Mb draft genome assembly of the pea aphid Acyrthosiphon pisum. This first published whole genome sequence of a basal hemimetabolous insect provides an outgroup to the multiple published genomes of holometabolous insects. Pea aphids are host-plant specialists, they can reproduce both sexually and asexually, and they have coevolved with an obligate bacterial symbiont. Here we highlight findings from whole genome analysis that may be related to these unusual biological features. These findings include discovery of extensive gene duplication in more than 2000 gene families as well as loss of evolutionarily conserved genes. Gene family expansions relative to other published genomes include genes involved in chromatin modification, miRNA synthesis, and sugar transport. Gene losses include genes central to the IMD immune pathway, selenoprotein utilization, purine salvage, and the entire urea cycle. The pea aphid genome reveals that only a limited number of genes have been acquired from bacteria; thus the reduced gene count of Buchnera does not reflect gene transfer to the host genome. The inventory of metabolic genes in the pea aphid genome suggests that there is extensive metabolite exchange between the aphid and Buchnera, including sharing of amino acid biosynthesis between the aphid and Buchnera. The pea aphid genome provides a foundation for post-genomic studies of fundamental biological questions and applied agricultural problems.


Citations
Journal Article
Bernhard Misof, Shanlin Liu, Karen Meusemann, Ralph S. Peters, Alexander Donath, Christoph Mayer, Paul B. Frandsen, Jessica L. Ware, Tomas Flouri, Rolf G. Beutel, Oliver Niehuis, Malte Petersen, Fernando Izquierdo-Carrasco, Torsten Wappler, Jes Rust, Andre J. Aberer, Ulrike Aspöck, Horst Aspöck, Daniela Bartel, Alexander Blanke, Simon Berger, Alexander Böhm, Thomas R. Buckley, Brett Calcott, Junqing Chen, Frank Friedrich, Makiko Fukui, Mari Fujita, Carola Greve, Peter Grobe, Shengchang Gu, Ying Huang, Lars S. Jermiin, Akito Y. Kawahara, Lars Krogmann, Martin Kubiak, Robert Lanfear, Harald Letsch, Yiyuan Li, Zhenyu Li, Jiguang Li, Haorong Lu, Ryuichiro Machida, Yuta Mashimo, Pashalia Kapli, Duane D. McKenna, Guanliang Meng, Yasutaka Nakagaki, José Luis Navarrete-Heredia, Michael Ott, Yanxiang Ou, Günther Pass, Lars Podsiadlowski, Hans Pohl, Björn M. von Reumont, Kai Schütte, Kaoru Sekiya, Shota Shimizu, Adam Slipinski, Alexandros Stamatakis, Wenhui Song, Xu Su, Nikolaus U. Szucsich, Meihua Tan, Xuemei Tan, Min Tang, Jingbo Tang, Gerald Timelthaler, Shigekazu Tomizuka, Michelle D. Trautwein, Xiaoli Tong, Toshiki Uchifune, Manfred Walzl, Brian M. Wiegmann, Jeanne Wilbrandt, Benjamin Wipfler, Thomas K. F. Wong, Qiong Wu, Gengxiong Wu, Yinlong Xie, Shenzhou Yang, Qing Yang, David K. Yeates, Kazunori Yoshizawa, Qing Zhang, Rui Zhang, Wenwei Zhang, Yunhui Zhang, Jing Zhao, Chengran Zhou, Lili Zhou, Tanja Ziesmann, Shijie Zou, Yingrui Li, Xun Xu, Yong Zhang, Huanming Yang, Jian Wang, Jun Wang, Karl M. Kjer, Xin Zhou
07 Nov 2014-Science
TL;DR: The phylogeny of all major insect lineages reveals how and when insects diversified and provides a comprehensive reliable scaffold for future comparative analyses of evolutionary innovations among insects.
Abstract: Insects are the most speciose group of animals, but the phylogenetic relationships of many major lineages remain unresolved. We inferred the phylogeny of insects from 1478 protein-coding genes. Phylogenomic analyses of nucleotide and amino acid sequences, with site-specific nucleotide or domain-specific amino acid substitution models, produced statistically robust and congruent results resolving previously controversial phylogenetic relationships. We dated the origin of insects to the Early Ordovician [~479 million years ago (Ma)], of insect flight to the Early Devonian (~406 Ma), of major extant lineages to the Mississippian (~345 Ma), and the major diversification of holometabolous insects to the Early Cretaceous. Our phylogenomic study provides a comprehensive reliable scaffold for future comparative analyses of evolutionary innovations among insects.

1,998 citations

Journal Article
TL;DR: Gut bacteria of other insects have also been shown to contribute to nutrition, protection from parasites and pathogens, modulation of immune responses, and communication, and the extent of these roles is still unclear and awaits further studies.
Abstract: Insect guts present distinctive environments for microbial colonization, and bacteria in the gut potentially provide many beneficial services to their hosts. Insects display a wide range in degree of dependence on gut bacteria for basic functions. Most insect guts contain relatively few microbial species as compared to mammalian guts, but some insects harbor large gut communities of specialized bacteria. Others are colonized only opportunistically and sparsely by bacteria common in other environments. Insect digestive tracts vary extensively in morphology and physicochemical properties, factors that greatly influence microbial community structure. One obstacle to the evolution of intimate associations with gut microorganisms is the lack of dependable transmission routes between host individuals. Here, social insects, such as termites, ants, and bees, are exceptions: social interactions provide opportunities for transfer of gut bacteria, and some of the most distinctive and consistent gut communities, with specialized beneficial functions in nutrition and protection, have been found in social insect species. Still, gut bacteria of other insects have also been shown to contribute to nutrition, protection from parasites and pathogens, modulation of immune responses, and communication. The extent of these roles is still unclear and awaits further studies.

1,633 citations

Journal Article
TL;DR: The Environment for Tree Exploration v3 is presented, featuring numerous improvements in the underlying library of methods, and providing a novel set of standalone tools to perform common tasks in comparative genomics and phylogenetics.
Abstract: The Environment for Tree Exploration (ETE) is a computational framework that simplifies the reconstruction, analysis, and visualization of phylogenetic trees and multiple sequence alignments. Here, we present ETE v3, featuring numerous improvements in the underlying library of methods, and providing a novel set of standalone tools to perform common tasks in comparative genomics and phylogenetics. The new features include (i) building gene-based and supermatrix-based phylogenies using a single command, (ii) testing and visualizing evolutionary models, (iii) calculating distances between trees of different size or including duplications, and (iv) providing seamless integration with the NCBI taxonomy database. ETE is freely available at http://etetoolkit.org.

1,452 citations


Cites methods from "Genome Sequence of the Pea Aphid Ac..."

  • ...2010), ETE has been widely used as a computational framework to perform numerous phylogenomic analyses, including characterizing newly sequenced genomes (Richards et al. 2010; Wang et al. 2014), extracting information from large sets of phylogenetic trees (Derelle and Lang 2012; Chiapello et al....

Journal Article
TL;DR: Since 2006, numerous cases of bacterial symbionts with extraordinarily small genomes have been reported, pointing to highly degenerate genomes that retain only the most essential functions, often including a considerable fraction of genes that serve the hosts.
Abstract: Since 2006, numerous cases of bacterial symbionts with extraordinarily small genomes have been reported. These organisms represent independent lineages from diverse bacterial groups. They have diminutive gene sets that rival some mitochondria and chloroplasts in terms of gene numbers and lack genes that are considered to be essential in other bacteria. These symbionts have numerous features in common, such as extraordinarily fast protein evolution and a high abundance of chaperones. Together, these features point to highly degenerate genomes that retain only the most essential functions, often including a considerable fraction of genes that serve the hosts. These discoveries have implications for the concept of minimal genomes, the origins of cellular organelles, and studies of symbiosis and host-associated microbiota.

1,184 citations

Journal Article
TL;DR: This first report of the whole genome sequence of A. cerana provides resources for comparative sociogenomics, especially in the field of social insect communication, to contribute to a better understanding of the complex behaviors and natural biology of the Asian honey bee and to anticipate its future evolutionary trajectory.
Abstract: The honey bee is an important model system for increasing understanding of molecular and neural mechanisms underlying social behaviors relevant to the agricultural industry and basic science. The western honey bee, Apis mellifera, has served as a model species, and its genome sequence has been published. In contrast, the genome of the Asian honey bee, Apis cerana, has not yet been sequenced. A. cerana has been raised in Asian countries for thousands of years and has brought considerable economic benefits to the apicultural industry. A. cerana has divergent biological traits compared to A. mellifera and it has played a key role in maintaining biodiversity in eastern and southern Asia. Here we report the first whole genome sequence of A. cerana. Using de novo assembly methods, we produced a 238 Mbp draft of the A. cerana genome and generated 10,651 genes. A. cerana-specific genes were analyzed to better understand the novel characteristics of this honey bee species. Seventy-two percent of the A. cerana-specific genes had more than one GO term, and 1,696 enzymes were categorized into 125 pathways. Genes involved in chemoreception and immunity were carefully identified and compared to those from other sequenced insect models. These included 10 gustatory receptors, 119 odorant receptors, 10 ionotropic receptors, and 160 immune-related genes. This first report of the whole genome sequence of A. cerana provides resources for comparative sociogenomics, especially in the field of social insect communication. These important tools will contribute to a better understanding of the complex behaviors and natural biology of the Asian honey bee and to anticipate its future evolutionary trajectory.

895 citations

References
Journal Article
TL;DR: A new criterion for triggering the extension of word hits, combined with a new heuristic for generating gapped alignments, yields a gapped BLAST program that runs at approximately three times the speed of the original.
Abstract: The BLAST programs are widely used tools for searching protein and DNA databases for sequence similarities. For protein comparisons, a variety of definitional, algorithmic and statistical refinements described here permits the execution time of the BLAST programs to be decreased substantially while enhancing their sensitivity to weak similarities. A new criterion for triggering the extension of word hits, combined with a new heuristic for generating gapped alignments, yields a gapped BLAST program that runs at approximately three times the speed of the original. In addition, a method is introduced for automatically combining statistically significant alignments produced by BLAST into a position-specific score matrix, and searching the database using this matrix. The resulting Position-Specific Iterated BLAST (PSI-BLAST) program runs at approximately the same speed per iteration as gapped BLAST, but in many cases is much more sensitive to weak but biologically relevant sequence similarities. PSI-BLAST is used to uncover several new and interesting members of the BRCT superfamily.

70,111 citations


"Genome Sequence of the Pea Aphid Ac..." refers methods in this paper

  • ...We then built one consensus per group with the MAFFT [118] multiple sequence alignment program and classified each consensus (1) according to BLASTER matches using TBLASTX and BLASTX [114] with the entire Repbase Update databank [119] and (2) according to the presence of structural features such as terminal repeats (TIR, LTR, and polyA or SSR tails)....

  • ...In the first part of the pipeline, consensus TEs were predicted ab initio by first searching for repeats with BLASTER for an all-by-all BLASTN [114] genome comparison and then results grouped using three clustering methods—GROUPER [115], RECON [116], and PILER [117]—with default parameters....
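
The structural features mentioned in these excerpts (terminal repeats and polyA tails) can be illustrated with a minimal Python sketch. It is not the pipeline's own classifier: the function name `terminal_repeat_type` and the window length `k` are hypothetical, and the check requires exact matches, whereas real tools tolerate mismatches and variable repeat lengths.

```python
# Illustrative sketch only: exact-match checks on the first and last k bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def terminal_repeat_type(seq, k=8):
    """Classify the ends of a consensus sequence by comparing its first and last k bases.

    Returns "direct" (LTR-like), "inverted" (TIR-like), "polyA", or None.
    """
    seq = seq.upper()
    if len(seq) < 2 * k:
        return None  # too short for two non-overlapping terminal windows
    head, tail = seq[:k], seq[-k:]
    if head == tail:
        return "direct"    # same orientation at both ends
    if reverse_complement(tail) == head:
        return "inverted"  # ends are reverse complements of each other
    if tail == "A" * k:
        return "polyA"     # run of A at the 3' end
    return None
```

With `k=4`, the sequence `GGCATTTTTTTTTGCC` is reported as `inverted`, because the reverse complement of its last four bases equals its first four.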

Journal Article
TL;DR: This work has used extensive and realistic computer simulations to show that the topological accuracy of this new method is at least as high as that of the existing maximum-likelihood programs and much higher than the performance of distance-based and parsimony approaches.
Abstract: The increase in the number of large data sets and the complexity of current probabilistic sequence evolution models necessitates fast and reliable phylogeny reconstruction methods. We describe a new approach, based on the maximum-likelihood principle, which clearly satisfies these requirements. The core of this method is a simple hill-climbing algorithm that adjusts tree topology and branch lengths simultaneously. This algorithm starts from an initial tree built by a fast distance-based method and modifies this tree to improve its likelihood at each iteration. Due to this simultaneous adjustment of the topology and branch lengths, only a few iterations are sufficient to reach an optimum. We used extensive and realistic computer simulations to show that the topological accuracy of this new method is at least as high as that of the existing maximum-likelihood programs and much higher than the performance of distance-based and parsimony approaches. The reduction of computing time is dramatic in comparison with other maximum-likelihood packages, while the likelihood maximization ability tends to be higher. For example, only 12 min were required on a standard personal computer to analyze a data set consisting of 500 rbcL sequences with 1,428 base pairs from plant plastids, thus reaching a speed of the same order as some popular distance-based and parsimony algorithms. This new method is implemented in the PHYML program, which is freely available on our web page: http://www.lirmm.fr/w3ifa/MAAS/. (Algorithm; computer simulations; maximum likelihood; phylogeny; rbcL; RDPII project.) The size of homologous sequence data sets has increased dramatically in recent years, and many of these data sets now involve several hundreds of taxa.
Moreover, current probabilistic sequence evolution models (Swofford et al., 1996; Page and Holmes, 1998), notably those including rate variation among sites (Uzzell and Corbin, 1971; Jin and Nei, 1990; Yang, 1996), require an increasing number of calculations. Therefore, the speed of phylogeny reconstruction methods is becoming a significant requirement and good compromises between speed and accuracy must be found. The maximum likelihood (ML) approach is especially accurate for building molecular phylogenies. Felsenstein (1981) brought this framework to nucleotide-based phylogenetic inference, and it was later also applied to amino acid sequences (Kishino et al., 1990). Several variants were proposed, most notably the Bayesian methods (Rannala and Yang 1996; and see below), and the discrete Fourier analysis of Hendy et al. (1994), for example. Numerous computer studies (Huelsenbeck and Hillis, 1993; Kuhner and Felsenstein, 1994; Huelsenbeck, 1995; Rosenberg and Kumar, 2001; Ranwez and Gascuel, 2002) have shown that ML programs can recover the correct tree from simulated data sets more frequently than other methods can. Another important advantage of the ML approach is the ability to compare different trees and evolutionary models within a statistical framework (see Whelan et al., 2001, for a review). However, like all optimality criterion-based phylogenetic reconstruction approaches, ML is hampered by computational difficulties, making it impossible to obtain the optimal tree with certainty from even moderate data sets (Swofford et al., 1996). Therefore, all practical methods rely on heuristics that obtain near-optimal trees in reasonable computing time. Moreover, the computation problem is especially difficult with ML, because the tree likelihood not only depends on the tree topology but also on numerical parameters, including branch lengths.
Even computing the optimal values of these parameters on a single tree is not an easy task, particularly because of possible local optima (Chor et al., 2000). The usual heuristic method, implemented in the popular PHYLIP (Felsenstein, 1993) and PAUP* (Swofford, 1999) packages, is based on hill climbing. It combines stepwise insertion of taxa in a growing tree and topological rearrangement. For each possible insertion position and rearrangement, the branch lengths of the resulting tree are optimized and the tree likelihood is computed. When the rearrangement improves the current tree or when the position insertion is the best among all possible positions, the corresponding tree becomes the new current tree. Simple rearrangements are used during tree growing, namely "nearest neighbor interchanges" (see below), while more intense rearrangements can be used once all taxa have been inserted. The procedure stops when no rearrangement improves the current best tree. Despite significant decreases in computing times, notably in fastDNAml (Olsen et al., 1994), this heuristic becomes impracticable with several hundreds of taxa. This is mainly due to the two-level strategy, which separates branch lengths and tree topology optimization. Indeed, most calculations are done to optimize the branch lengths and evaluate the likelihood of trees that are finally rejected. New methods have thus been proposed. Strimmer and von Haeseler (1996) and others have assembled four-taxon (quartet) trees inferred by ML, in order to reconstruct a complete tree. However, the results of this approach have not been very satisfactory to date (Ranwez and Gascuel, 2001). Ota and Li (2000, 2001) described

16,261 citations


"Genome Sequence of the Pea Aphid Ac..." refers methods in this paper

  • ...4 [105], using JTT as an evolutionary model and assuming a discrete gamma-distribution model with four rate categories and invariant sites, where the gamma shape parameter and the fraction of invariant sites were estimated from the data....

Journal Article
TL;DR: A simplified scoring system is proposed that performs well for reducing CPU time and increasing the accuracy of alignments even for sequences having large insertions or extensions as well as distantly related sequences of similar length.
Abstract: A multiple sequence alignment program, MAFFT, has been developed. The CPU time is drastically reduced as compared with existing methods. MAFFT includes two novel techniques. (i) Homologous regions are rapidly identified by the fast Fourier transform (FFT), in which an amino acid sequence is converted to a sequence composed of volume and polarity values of each amino acid residue. (ii) We propose a simplified scoring system that performs well for reducing CPU time and increasing the accuracy of alignments even for sequences having large insertions or extensions as well as distantly related sequences of similar length. Two different heuristics, the progressive method (FFT-NS-2) and the iterative refinement method (FFT-NS-i), are implemented in MAFFT. The performances of FFT-NS-2 and FFT-NS-i were compared with other methods by computer simulations and benchmark tests; the CPU time of FFT-NS-2 is drastically reduced as compared with CLUSTALW with comparable accuracy. FFT-NS-i is over 100 times faster than T-COFFEE, when the number of input sequences exceeds 60, without sacrificing the accuracy.

12,003 citations


"Genome Sequence of the Pea Aphid Ac..." refers methods in this paper

  • ...We then built one consensus per group with the MAFFT [118] multiple sequence alignment program and classified each consensus (1) according to BLASTER matches using TBLASTX and BLASTX [114] with the entire Repbase Update databank [119] and (2) according to the presence of structural features such as terminal repeats (TIR, LTR, and polyA or SSR tails)....

Journal Article
TL;DR: This letter extends the heuristic homology algorithm of Needleman & Wunsch (1970) to find a pair of segments, one from each of two long sequences, such that there is no other pair of segments with greater similarity (homology).

10,262 citations
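
The local-alignment idea summarized in the TL;DR above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the letter's exact formulation: the scores (`match=3`, `mismatch=-3`) and the linear gap penalty (`gap=-2`) are illustrative defaults chosen for brevity, and the function name `smith_waterman` is simply a label for this sketch.

```python
def smith_waterman(a, b, match=3, mismatch=-3, gap=-2):
    """Return (best_score, aligned_a, aligned_b) for the best local alignment.

    Uses a linear gap penalty; all scoring values are illustrative assumptions.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] = best score of a local alignment ending at a[i-1], b[j-1]
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(
                0,                    # start a new local alignment here
                H[i - 1][j - 1] + s,  # align a[i-1] with b[j-1]
                H[i - 1][j] + gap,    # gap in b
                H[i][j - 1] + gap,    # gap in a
            )
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the highest-scoring cell until a zero is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))
```

Resetting negative scores to zero is what makes the alignment local: a poorly matching prefix never drags down a well-matching segment that follows it. Calling `smith_waterman("GGTTGACTA", "TGTTACGG")` returns the score together with the two gapped segments.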


"Genome Sequence of the Pea Aphid Ac..." refers methods in this paper

  • ...For each protein encoded in the pea aphid genome, a Smith-Waterman [106] search (e-val 10⁻³) was performed against the above mentioned proteomes....

Journal Article
TL;DR: A computerized method is presented that reduces to a certain extent the necessity of manually editing multiple alignments, makes the automation of phylogenetic analysis of large data sets feasible, and facilitates the reproduction of the final alignment by other researchers.
Abstract: The use of some multiple-sequence alignments in phylogenetic analysis, particularly those that are not very well conserved, requires the elimination of poorly aligned positions and divergent regions, since they may not be homologous or may have been saturated by multiple substitutions. A computerized method that eliminates such positions and at the same time tries to minimize the loss of informative sites is presented here. The method is based on the selection of blocks of positions that fulfill a simple set of requirements with respect to the number of contiguous conserved positions, lack of gaps, and high conservation of flanking positions, making the final alignment more suitable for phylogenetic analysis. To illustrate the efficiency of this method, alignments of 10 mitochondrial proteins from several completely sequenced mitochondrial genomes belonging to diverse eukaryotes were used as examples. The percentages of removed positions were higher in the most divergent alignments. After removing divergent segments, the amino acid composition of the different sequences was more uniform, and pairwise distances became much smaller. Phylogenetic trees show that topologies can be different after removing conserved blocks, particularly when there are several poorly resolved nodes. Strong support was found for the grouping of animals and fungi but not for the position of more basal eukaryotes. The use of a computerized method such as the one presented here reduces to a certain extent the necessity of manually editing multiple alignments, makes the automation of phylogenetic analysis of large data sets feasible, and facilitates the reproduction of the final alignment by other researchers.

8,757 citations
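
The block-selection idea in the abstract above (keep runs of well-conserved, gap-free alignment columns and drop the rest) can be illustrated with a simplified Python sketch. It is not the published method's actual rule set: the thresholds `min_identity` and `min_block` and the function names are hypothetical, and the sketch ignores the flanking-position criterion the abstract mentions.

```python
from collections import Counter


def conserved_blocks(alignment, min_identity=0.5, min_block=3):
    """Return half-open (start, end) column ranges of conserved, gap-free blocks.

    A column qualifies if it contains no gap and its most common residue
    reaches `min_identity`; only runs of at least `min_block` qualifying
    columns are kept. Both thresholds are illustrative assumptions.
    """
    ncols = len(alignment[0])
    if any(len(seq) != ncols for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    keep = []
    for c in range(ncols):
        column = [seq[c] for seq in alignment]
        if "-" in column:
            keep.append(False)
            continue
        top_count = Counter(column).most_common(1)[0][1]
        keep.append(top_count / len(column) >= min_identity)

    blocks, start = [], None
    for c, ok in enumerate(keep + [False]):  # sentinel closes the last run
        if ok and start is None:
            start = c
        elif not ok and start is not None:
            if c - start >= min_block:
                blocks.append((start, c))
            start = None
    return blocks


def trim_alignment(alignment, blocks):
    """Concatenate the selected blocks for every sequence."""
    return ["".join(seq[s:e] for s, e in blocks) for seq in alignment]
```

Raising `min_block` makes the filter stricter, since short conserved stretches surrounded by gaps or divergent columns are discarded along with them.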