
Showing papers by "Richard Durbin" published in 2021


Journal ArticleDOI
Arang Rhie1, Shane A. McCarthy2, Shane A. McCarthy3, Olivier Fedrigo4, Joana Damas5, Giulio Formenti4, Sergey Koren1, Marcela Uliano-Silva6, William Chow2, Arkarachai Fungtammasan, J. H. Kim7, Chul Hee Lee7, Byung June Ko7, Mark Chaisson8, Gregory Gedman4, Lindsey J. Cantin4, Françoise Thibaud-Nissen1, Leanne Haggerty9, Iliana Bista2, Iliana Bista3, Michelle Smith2, Bettina Haase4, Jacquelyn Mountcastle4, Sylke Winkler10, Sylke Winkler11, Sadye Paez4, Jason T. Howard, Sonja C. Vernes10, Sonja C. Vernes12, Sonja C. Vernes13, Tanya M. Lama14, Frank Grützner15, Wesley C. Warren16, Christopher N. Balakrishnan17, Dave W Burt18, Jimin George19, Matthew T. Biegler4, David Iorns, Andrew Digby, Daryl Eason, Bruce C. Robertson20, Taylor Edwards21, Mark Wilkinson22, George F. Turner23, Axel Meyer24, Andreas F. Kautt24, Andreas F. Kautt25, Paolo Franchini24, H. William Detrich26, Hannes Svardal27, Hannes Svardal28, Maximilian Wagner29, Gavin J. P. Naylor30, Martin Pippel10, Milan Malinsky2, Milan Malinsky31, Mark Mooney, Maria Simbirsky, Brett T. Hannigan, Trevor Pesout32, Marlys L. Houck33, Ann C Misuraca33, Sarah B. Kingan34, Richard Hall34, Zev N. Kronenberg34, Ivan Sović34, Christopher Dunn34, Zemin Ning2, Alex Hastie, Joyce V. Lee, Siddarth Selvaraj, Richard E. Green32, Nicholas H. Putnam, Ivo Gut35, Jay Ghurye36, Erik Garrison32, Ying Sims2, Joanna Collins2, Sarah Pelan2, James Torrance2, Alan Tracey2, Jonathan Wood2, Robel E. Dagnew8, Dengfeng Guan37, Dengfeng Guan3, Sarah E. London38, David F. Clayton19, Claudio V. Mello39, Samantha R. Friedrich39, Peter V. Lovell39, Ekaterina Osipova10, Farooq O. Al-Ajli40, Farooq O. Al-Ajli41, Simona Secomandi42, Heebal Kim7, Constantina Theofanopoulou4, Michael Hiller43, Yang Zhou, Robert S. Harris44, Kateryna D. Makova44, Paul Medvedev44, Jinna Hoffman1, Patrick Masterson1, Karen Clark1, Fergal J. Martin9, Kevin L. Howe9, Paul Flicek9, Brian P. 
Walenz1, Woori Kwak, Hiram Clawson32, Mark Diekhans32, Luis R Nassar32, Benedict Paten32, Robert H. S. Kraus10, Robert H. S. Kraus24, Andrew J. Crawford45, M. Thomas P. Gilbert46, M. Thomas P. Gilbert47, Guojie Zhang, Byrappa Venkatesh48, Robert W. Murphy49, Klaus-Peter Koepfli50, Beth Shapiro51, Beth Shapiro32, Warren E. Johnson50, Warren E. Johnson52, Federica Di Palma53, Tomas Marques-Bonet, Emma C. Teeling54, Tandy Warnow55, Jennifer A. Marshall Graves56, Oliver A. Ryder33, Oliver A. Ryder57, David Haussler32, Stephen J. O'Brien58, Jonas Korlach34, Harris A. Lewin5, Kerstin Howe2, Eugene W. Myers11, Eugene W. Myers10, Richard Durbin3, Richard Durbin2, Adam M. Phillippy1, Erich D. Jarvis4, Erich D. Jarvis51 
National Institutes of Health1, Wellcome Trust Sanger Institute2, University of Cambridge3, Rockefeller University4, University of California, Davis5, Leibniz Association6, Seoul National University7, University of Southern California8, European Bioinformatics Institute9, Max Planck Society10, Dresden University of Technology11, Radboud University Nijmegen12, University of St Andrews13, University of Massachusetts Amherst14, University of Adelaide15, University of Missouri16, East Carolina University17, University of Queensland18, Clemson University19, University of Otago20, University of Arizona21, Natural History Museum22, Bangor University23, University of Konstanz24, Harvard University25, Northeastern University26, University of Antwerp27, National Museum of Natural History28, University of Graz29, University of Florida30, University of Basel31, University of California, Santa Cruz32, Zoological Society of San Diego33, Pacific Biosciences34, Pompeu Fabra University35, University of Maryland, College Park36, Harbin Institute of Technology37, University of Chicago38, Oregon Health & Science University39, Qatar Airways40, Monash University Malaysia Campus41, University of Milan42, Goethe University Frankfurt43, Pennsylvania State University44, University of Los Andes45, University of Copenhagen46, Norwegian University of Science and Technology47, Agency for Science, Technology and Research48, Royal Ontario Museum49, Smithsonian Institution50, Howard Hughes Medical Institute51, Walter Reed Army Institute of Research52, University of East Anglia53, University College Dublin54, University of Illinois at Urbana–Champaign55, La Trobe University56, University of California, San Diego57, Nova Southeastern University58
28 Apr 2021-Nature
TL;DR: The Vertebrate Genomes Project (VGP) is an international effort to generate high-quality, complete reference genomes for all of the roughly 70,000 extant vertebrate species and to help enable a new era of discovery across the life sciences.
Abstract: High-quality and complete reference genome assemblies are fundamental for the application of genomics to biology, disease, and biodiversity conservation. However, such assemblies are available for only a few non-microbial species1-4. To address this issue, the international Genome 10K (G10K) consortium5,6 has worked over a five-year period to evaluate and develop cost-effective methods for assembling highly accurate and nearly complete reference genomes. Here we present lessons learned from generating assemblies for 16 species that represent six major vertebrate lineages. We confirm that long-read sequencing technologies are essential for maximizing genome quality, and that unresolved complex repeats and haplotype heterozygosity are major sources of assembly error when not handled correctly. Our assemblies correct substantial errors, add missing sequence in some of the best historical reference genomes, and reveal biological discoveries. These include the identification of many false gene duplications, increases in gene sizes, chromosome rearrangements that are specific to lineages, a repeated independent chromosome breakpoint in bat genomes, and a canonical GC-rich pattern in protein-coding genes and their regulatory regions. Adopting these lessons, we have embarked on the Vertebrate Genomes Project (VGP), an international effort to generate high-quality, complete reference genomes for all of the roughly 70,000 extant vertebrate species and to help to enable a new era of discovery across the life sciences.

647 citations


Posted ContentDOI
Sergey Nurk1, Sergey Koren1, Arang Rhie1, Rautiainen M1, Andrey Bzikadze2, Alla Mikheenko3, Mitchell R. Vollger4, Nicolas Altemose5, Lev Uralsky, Ariel Gershman6, Sergey Aganezov6, Hoyt Sj7, Mark Diekhans8, Glennis A. Logsdon4, Michael Alonge6, Stylianos E. Antonarakis9, Borchers M10, Gerry Bouffard1, Shelise Brooks1, Caldas Gv5, Hwei-Ling Cheng11, Chen-Shan Chin, William Chow12, de Lima Lg10, Philip C. Dishuck4, Richard Durbin13, Tatiana Dvorkina3, Ian T. Fiddes, Giulio Formenti14, Robert S. Fulton15, Arkarachai Fungtammasan, Erik Garrison16, P. G. S. Grady7, Tina A. Graves-Lindsay15, Ira M. Hall17, Nancy F. Hansen1, Gabrielle A. Hartley7, Marina Haukness8, Kerstin Howe12, Michael W. Hunkapiller18, Chirag Jain1, Miten Jain8, Erich D. Jarvis14, Peter Kerpedjiev, Melanie Kirsche6, Mikhail Kolmogorov2, Jonas Korlach18, Milinn Kremitzki15, Huiyan Li11, Valerie Maduro1, Tobias Marschall19, Ann McCartney1, Jennifer McDaniel20, Danny E. Miller4, Jim C. Mullikin1, Eugene W. Myers21, Nathan D. Olson20, Benedict Paten8, Paul Peluso18, Pavel A. Pevzner2, David Porubsky4, Tamara A. Potapova10, Evgeny I. Rogaev, Jill A. Rosenfeld, Steven L. Salzberg6, Valerie A. Schneider1, Fritz J. Sedlazeck22, Kishwar Shafin8, Colin J. Shew23, Alaina Shumate6, Ying Sims12, Smit Afa24, Daniela C. Soto23, Ivan Sović18, Jessica M. Storer24, Aaron M. Streets5, Beth A. Sullivan25, Françoise Thibaud-Nissen1, James Torrance12, Justin Wagner20, Brian P. Walenz1, Aaron M. Wenger18, Wood Jmd12, Chunlin Xiao1, Stephanie M Yan6, Alice Young1, Samantha Zarate6, Urvashi Surti26, Rajiv C. McCoy6, Megan Y. Dennis23, Ivan Alexandrov27, Ivan Alexandrov3, Jennifer L. Gerton10, Rachel J. O’Neill7, Winston Timp6, Justin M. Zook20, Michael C. Schatz6, Evan E. Eichler4, Karen H. Miga8, Adam M. Phillippy1 
27 May 2021-bioRxiv
TL;DR: The T2T-CHM13 reference includes gapless assemblies for all 22 autosomes plus Chromosome X, corrects numerous errors, and introduces nearly 200 million bp of novel sequence containing 2,226 paralogous gene copies, 115 of which are predicted to be protein coding.
Abstract: In 2001, Celera Genomics and the International Human Genome Sequencing Consortium published their initial drafts of the human genome, which revolutionized the field of genomics. While these drafts and the updates that followed effectively covered the euchromatic fraction of the genome, the heterochromatin and many other complex regions were left unfinished or erroneous. Addressing this remaining 8% of the genome, the Telomere-to-Telomere (T2T) Consortium has finished the first truly complete 3.055 billion base pair (bp) sequence of a human genome, representing the largest improvement to the human reference genome since its initial release. The new T2T-CHM13 reference includes gapless assemblies for all 22 autosomes plus Chromosome X, corrects numerous errors, and introduces nearly 200 million bp of novel sequence containing 2,226 paralogous gene copies, 115 of which are predicted to be protein coding. The newly completed regions include all centromeric satellite arrays and the short arms of all five acrocentric chromosomes, unlocking these complex regions of the genome to variational and functional studies for the first time.

108 citations


Journal ArticleDOI
TL;DR: mitoVGP is a fully automated pipeline for similarity-based identification of mitochondrial reads and de novo assembly of mitochondrial genomes that incorporates both long (> 10 kbp, PacBio or Nanopore) and short (100–300 bp, Illumina) reads, leading to successful complete mitogenome assemblies of 100 vertebrate species of the VGP.
Abstract: Modern sequencing technologies should make the assembly of the relatively small mitochondrial genomes an easy undertaking. However, few tools exist that address mitochondrial assembly directly. As part of the Vertebrate Genomes Project (VGP) we develop mitoVGP, a fully automated pipeline for similarity-based identification of mitochondrial reads and de novo assembly of mitochondrial genomes that incorporates both long (> 10 kbp, PacBio or Nanopore) and short (100–300 bp, Illumina) reads. Our pipeline leads to successful complete mitogenome assemblies of 100 vertebrate species of the VGP. We observe that tissue type and library size selection have considerable impact on mitogenome sequencing and assembly. Comparing our assemblies to purportedly complete reference mitogenomes based on short-read sequencing, we identify errors, missing sequences, and incomplete genes in those references, particularly in repetitive regions. Our assemblies also identify novel gene region duplications. The presence of repeats and duplications in over half of the species herein assembled indicates that their occurrence is a principle of mitochondrial structure rather than an exception, shedding new light on mitochondrial genome evolution and organization. Our results indicate that even in the “simple” case of vertebrate mitogenomes the completeness of many currently available reference sequences can be further improved, and caution should be exercised before claiming the complete assembly of a mitogenome, particularly from short reads alone.

48 citations


Journal ArticleDOI
20 Oct 2021-Nature
TL;DR: A large-scale metagenomic analysis of plant and mammal environmental DNA reveals complex ecological changes across the circumpolar region over the past 50,000 years, as biota responded to changing climates, culminating in the postglacial extinction of large mammals and emergence of modern ecosystems.
Abstract: During the last glacial–interglacial cycle, Arctic biotas experienced substantial climatic changes, yet the nature, extent and rate of their responses are not fully understood1–8. Here we report a large-scale environmental DNA metagenomic study of ancient plant and mammal communities, analysing 535 permafrost and lake sediment samples from across the Arctic spanning the past 50,000 years. Furthermore, we present 1,541 contemporary plant genome assemblies that were generated as reference sequences. Our study provides several insights into the long-term dynamics of the Arctic biota at the circumpolar and regional scales. Our key findings include: (1) a relatively homogeneous steppe–tundra flora dominated the Arctic during the Last Glacial Maximum, followed by regional divergence of vegetation during the Holocene epoch; (2) certain grazing animals consistently co-occurred in space and time; (3) humans appear to have been a minor factor in driving animal distributions; (4) higher effective precipitation, as well as an increase in the proportion of wetland plants, show negative effects on animal diversity; (5) the persistence of the steppe–tundra vegetation in northern Siberia enabled the late survival of several now-extinct megafauna species, including the woolly mammoth until 3.9 ± 0.2 thousand years ago (ka) and the woolly rhinoceros until 9.8 ± 0.2 ka; and (6) phylogenetic analysis of mammoth environmental DNA reveals a previously unsampled mitochondrial lineage. Our findings highlight the power of ancient environmental metagenomics analyses to advance understanding of population histories and long-term ecological dynamics.

44 citations


Journal ArticleDOI
TL;DR: In this article, a high-quality chromosome-scale genome assembly of the Black Soldier Fly (BSF), generated using Pacific Biosciences, 10X Genomics linked-read and high-throughput chromosome conformation capture sequencing technology, is presented.
Abstract: Hermetia illucens L. (Diptera: Stratiomyidae), the Black Soldier Fly (BSF), is an increasingly important species for bioconversion of organic material into animal feed. We generated a high-quality chromosome-scale genome assembly of the BSF using Pacific Biosciences, 10X Genomics linked-read and high-throughput chromosome conformation capture sequencing technology. Scaffolding the final assembly with Hi-C data produced a highly contiguous 1.01 Gb genome with 99.75% of scaffolds assembled into pseudochromosomes representing seven chromosomes with 16.01 Mb contig and 180.46 Mb scaffold N50 values. The highly complete genome obtained a Benchmarking Universal Single-Copy Orthologs (BUSCO) completeness of 98.6%. We masked 67.32% of the genome as repetitive sequences and annotated a total of 16,478 protein-coding genes using the BRAKER2 pipeline. We analyzed an established lab population to investigate the genomic variation and architecture of the BSF, revealing six autosomes and an X chromosome. Additionally, we estimated the inbreeding coefficient (1.9%) of the lab population by assessing runs of homozygosity. This provided evidence for inbreeding events, including long runs of homozygosity on chromosome 5. The release of this novel chromosome-scale BSF genome assembly will provide an improved resource for further genomic studies, functional characterization of genes of interest and genetic modification of this economically important species.
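The abstract above reports contig and scaffold N50 values. As a reminder of what that statistic measures, here is a minimal Python sketch of the usual definition: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembly length. The function name `n50` and the toy lengths are illustrative assumptions, not data or code from the paper.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths."""
    ordered = sorted(lengths, reverse=True)  # longest sequences first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first sequence that pushes the running sum to half of the
        # total assembly length defines the N50.
        if running >= half_total:
            return length
    raise ValueError("lengths must be non-empty")


# Hypothetical toy contig lengths (arbitrary units), for illustration only.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))  # prints 70
```

In this toy case the total length is 300, and the two longest sequences (80 and 70) already reach 150, so the N50 is 70.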

25 citations


Journal ArticleDOI
TL;DR: In this article, the authors reported the retrieval of low-coverage (0.03×) environmental genomes from American black bear (Ursus americanus) and a 0.04× environmental genome of the extinct giant short-faced bear (Arctodus simus) from cave sediment samples from northern Mexico dated to 16-14 thousand calibrated years before present (cal kyr BP).

21 citations


Posted ContentDOI
09 Apr 2021-bioRxiv
TL;DR: The Vertebrate Genomes Project (VGP) has been producing assemblies with an emphasis on being as complete and error-free as possible, utilizing long reads, long-range scaffolding data, new assembly algorithms, and manual curation.
Abstract: Many genome assemblies have been found to be incomplete and contain misassemblies. The Vertebrate Genomes Project (VGP) has been producing assemblies with an emphasis on being as complete and error-free as possible, utilizing long reads, long-range scaffolding data, new assembly algorithms, and manual curation. Here we evaluate these new vertebrate genome assemblies relative to the previous references for the same species, including a mammal (platypus), two birds (zebra finch, Anna’s hummingbird), and a fish (climbing perch). We found that 3 to 11% of genomic sequence was entirely missing in the previous reference assemblies, which included nearly entire GC-rich and repeat-rich microchromosomes with high gene density. Genome-wide, between 25 and 60% of the genes were either completely or partially missing in the previous assemblies, and this was in part due to a bias in GC-rich 5’-proximal promoters and 5’ exon regions. Our findings reveal novel regulatory landscapes and protein coding sequences that have been greatly underestimated in previous assemblies and are now present in the VGP assemblies.

16 citations


Journal ArticleDOI
TL;DR: The ability to separately analyse genomic-scale DNA sequences of closely related species co-preserved in environmental samples is demonstrated, which brings the use of ancient eDNA into the era of population genomics and phylogenetics.
Abstract: Analysis of ancient environmental DNA (eDNA) has revolutionized our ability to describe biological communities in space and time, by allowing for parallel sequencing of DNA from all trophic levels. However, because environmental samples contain sparse and fragmented data from multiple individuals, and often contain closely related species, the field of ancient eDNA has so far been limited to organellar genomes in its contribution to population and phylogenetic studies. This is in contrast to data from fossils where full-genome studies are routine, despite these being rare and their destruction for sequencing undesirable. Here, we report the retrieval of three low coverage (0.03x) genomes from American black bear (Ursus americanus) and a 0.04x genome of an extinct giant short-faced bear (Arctodus simus) from cave sediment samples from northern Mexico dated to 16-14 thousand calibrated years before present (cal kyr BP), which we contextualize with a new high coverage (26x) and two lower coverage giant short-faced bear genomes from ~22-30 cal kyr BP old Yukon fossils. We show that the Late Pleistocene black bear population in Mexico is ancestrally related to the present day eastern American black bear population, and that the extinct giant short-faced bears present in Mexico were deeply divergent from the earlier Beringian population. Our findings demonstrate the ability to separately analyse genomic-scale DNA sequences of closely related species co-preserved in environmental samples, which brings the use of ancient eDNA into the era of population genomics and phylogenetics.

14 citations


Journal ArticleDOI
13 May 2021
TL;DR: A genome assembly from an individual female Salmo trutta (the brown trout; Chordata; Actinopteri; Salmoniformes; Salmonidae) is presented, and gene annotation on Ensembl has identified 43,935 protein-coding genes.
Abstract: We present a genome assembly from an individual female Salmo trutta (the brown trout; Chordata; Actinopteri; Salmoniformes; Salmonidae). The genome sequence is 2.37 gigabases in span. The majority of the assembly is scaffolded into 40 chromosomal pseudomolecules. Gene annotation of this assembly on Ensembl has identified 43,935 protein coding genes.

11 citations


Journal ArticleDOI
TL;DR: In this paper, the authors present a comparative genome-wide methylome and transcriptome study, focussing on liver and muscle tissues in phenotypically divergent cichlid species.
Abstract: Epigenetic variation modulates gene expression and can be heritable. However, knowledge of the contribution of epigenetic divergence to adaptive diversification in nature remains limited. The massive evolutionary radiation of Lake Malawi cichlid fishes displaying extensive phenotypic diversity despite extremely low sequence divergence is an excellent system to study the epigenomic contribution to adaptation. Here, we present a comparative genome-wide methylome and transcriptome study, focussing on liver and muscle tissues in phenotypically divergent cichlid species. In both tissues we find substantial methylome divergence among species. Differentially methylated regions (DMR), enriched in evolutionary young transposons, are associated with transcription changes of ecologically-relevant genes related to energy expenditure and lipid metabolism, pointing to a link between dietary ecology and methylome divergence. Unexpectedly, half of all species-specific DMRs are shared across tissues and are enriched in developmental genes, likely reflecting distinct epigenetic developmental programmes. Our study reveals substantial methylome divergence in closely-related cichlid fishes and represents a resource to study the role of epigenetics in species diversification. The Lake Malawi cichlid fishes are an example of extreme vertebrate radiation; however, there is very little sequence divergence among the species. Here the authors present a comparative genome-wide methylome study to suggest DNA methylation played a major role in the extensive phenotypic diversity amongst these fishes.

11 citations


Journal ArticleDOI
TL;DR: The Earth BioGenome Project (EBP) is an audacious endeavor to obtain whole genome sequences of representatives from all eukaryotic species on earth, and it also faces complicated ethical, legal, and social issues.
Abstract: The Earth BioGenome Project (EBP) is an audacious endeavor to obtain whole genome sequences of representatives from all eukaryotic species on earth. In addition to the Project’s technical and organizational challenges, it also faces complicated ethical, legal, and social issues. This paper, from members of the EBP’s Ethical, Legal, and Social Issues (ELSI) Committee, catalogs these ELSI concerns arising from EBP. While we do not, and cannot, provide simple, overarching solutions for all of the issues raised here, we conclude our Perspective by beginning to chart a path forward for EBP’s work.

Posted ContentDOI
06 Aug 2021-bioRxiv
TL;DR: In this paper, the authors looked for sex-associated loci in full genome data from 647 individuals of Astatotilapia calliptera from Lake Masoko, a small isolated crater lake in Tanzania, which contains two distinct ecomorphs of the species.
Abstract: African cichlid fishes not only exhibit remarkably high rates of speciation but also have some of the fastest evolving sex determination systems in vertebrates. However, little is known empirically in cichlids about the genetic mechanisms generating new sex-determining variants, what forces dictate their fate, the demographic scales at which they evolve, and whether they are related to speciation. To address these questions, we looked for sex-associated loci in full genome data from 647 individuals of Astatotilapia calliptera from Lake Masoko, a small isolated crater lake in Tanzania, which contains two distinct ecomorphs of the species. We identified three separate XY systems on recombining chromosomes. Two Y alleles derive from mutations that increase expression of the gonadal soma-derived factor gene (gsdf) on chromosome 7; the first is a tandem duplication of the entire gene observed throughout much of the Lake Malawi haplochromine cichlid radiation to which A. calliptera belongs, and the second is a 5 kb insertion directly upstream of gsdf. Both the latter variant and another 700 bp insertion on chromosome 19 responsible for the third Y allele arose from transposable element insertions. Males belonging to the Masoko deep-water benthic ecomorph are determined exclusively by the gsdf duplication, whereas all three Y alleles are used in the Masoko littoral ecomorph, in which they appear to act antagonistically among males with different amounts of benthic admixture. This antagonism in the face of ongoing admixture may be important for sustaining multifactorial sex determination in Lake Masoko. In addition to identifying the molecular basis of three coexisting sex determining alleles, these results demonstrate that genetic interactions between Y alleles and genetic background can potentially affect fitness and adaptive evolution.

Journal ArticleDOI
TL;DR: Pin_hic is an efficient Hi-C based scaffolding tool that can be useful for building chromosome-scale assemblies; it can generate much more continuous scaffolds while achieving a higher or comparable accuracy.
Abstract: Background: Efficient and effective genome scaffolding tools are still in high demand for generating reference-quality assemblies. While long read data itself is unlikely to create a chromosome-scale assembly for most eukaryotic species, the inexpensive Hi-C sequencing technology, capable of capturing the chromosomal profile of a genome, is now widely used to complete the task. However, the existing Hi-C based scaffolding tools either require an a priori chromosome number as input, or lack the ability to build highly continuous scaffolds. Results: We design and develop a novel Hi-C based scaffolding tool, pin_hic, which takes advantage of contact information from Hi-C reads to construct a scaffolding graph iteratively based on N-best neighbors of contigs. Subsequent to scaffolding, it identifies potential misjoins and breaks them to maintain scaffolding accuracy. Through our tests on three long read based de novo assemblies from three different species, we demonstrate that pin_hic is more efficient than current state-of-the-art tools, and it can generate much more continuous scaffolds, while achieving a higher or comparable accuracy. Conclusions: Pin_hic is an efficient Hi-C based scaffolding tool, which can be useful for building chromosome-scale assemblies. As many sequencing projects have been launched in recent years, we believe pin_hic has the potential to be applied in these projects and make a meaningful contribution.
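The abstract above mentions building a scaffolding graph from the N-best neighbors of each contig. The Python sketch below illustrates that general idea only; it is not the pin_hic implementation. The function name, the use of raw link counts as edge weights, and the rule that an edge is kept only when each contig ranks the other among its N best are all assumptions made for illustration, as are the toy contact counts.

```python
from collections import defaultdict


def n_best_neighbor_edges(contacts, n_best=2):
    """Keep edges whose endpoints rank each other among their n_best neighbors.

    contacts: dict mapping (contig_a, contig_b) -> Hi-C link count
              (hypothetical input format, assumed for this sketch).
    Returns a set of frozenset({contig_a, contig_b}) edges.
    """
    neighbors = defaultdict(list)
    for (a, b), count in contacts.items():
        neighbors[a].append((count, b))
        neighbors[b].append((count, a))

    # For every contig, remember the names of its n_best strongest neighbors.
    top = {
        contig: {name for _, name in sorted(links, reverse=True)[:n_best]}
        for contig, links in neighbors.items()
    }

    # Assumed filtering rule: an edge survives only if the choice is mutual.
    return {
        frozenset((a, b))
        for (a, b) in contacts
        if b in top[a] and a in top[b]
    }


# Hypothetical toy link counts between four contigs.
toy_contacts = {
    ("c1", "c2"): 120,
    ("c2", "c3"): 95,
    ("c1", "c3"): 8,
    ("c3", "c4"): 60,
    ("c1", "c4"): 3,
}
edges = n_best_neighbor_edges(toy_contacts, n_best=2)
print(sorted(tuple(sorted(edge)) for edge in edges))
# prints [('c1', 'c2'), ('c2', 'c3'), ('c3', 'c4')]
```

With these toy counts the weak c1-c3 and c1-c4 links are dropped and the remaining edges form a single chain, which is the kind of path a scaffolder would then order and orient. A real tool would also normalize counts (for example by contig length) and iterate, which this sketch omits.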

Posted ContentDOI
30 Jul 2021-bioRxiv
TL;DR: In this article, the extent and functional relevance of DNA methylome divergence between two Astatotilapia calliptera ecomorphs in crater Lake Masoko, southern Tanzania, was investigated.
Abstract: Epigenetic variation can alter transcription and promote phenotypic divergence between populations facing different environmental challenges. Here we assess the epigenetic basis of diversification during the early stages of speciation. We focus on the extent and functional relevance of DNA methylome divergence between two Astatotilapia calliptera ecomorphs in crater Lake Masoko, southern Tanzania. We report extensive genome-wide methylome divergence between populations linked to key biological processes, including transcriptional activity of ecologically-relevant genes. These include genes involved in steroid metabolism, haemoglobin composition and erythropoiesis, consistent with divergent habitat occupancy of the ecomorphs. Using a common garden experiment, we found that global methylation profiles are rapidly remodelled across generations, but ecomorph-specific differences can be inherited. Collectively, our study suggests an epigenetic contribution to early stages of vertebrate speciation. One-sentence summary: Inheritance and plasticity of epigenetic divergence characterise early stages of speciation in an incipient cichlid species of an African crater lake.

Journal ArticleDOI
29 Jun 2021
TL;DR: A genome assembly based on an individual female Aphantopus hyperantus (the ringlet butterfly; Arthropoda; Insecta; Lepidoptera, Nymphalidae), scaffolded using data from a second, unrelated specimen is presented.
Abstract: We present a genome assembly based on an individual female Aphantopus hyperantus, also known as Maniola hyperantus (the ringlet butterfly; Arthropoda; Insecta; Lepidoptera, Nymphalidae), scaffolded using data from a second, unrelated specimen. The genome sequence is 411 megabases in span. The majority of the assembly is scaffolded into 29 chromosomal pseudomolecules, including the Z sex chromosome.

Posted ContentDOI
30 Apr 2021-bioRxiv
TL;DR: In this article, the authors assembled de novo the genome of a wild Betta splendens and whole-genome sequenced multiple individuals across five species within the B. splendens species complex, including wild populations and domesticated ornamental betta.
Abstract: Siamese fighting fish, commonly known as betta, are among the world’s most popular and morphologically diverse pet fish, but the genetic processes leading to their domestication and phenotypic diversification are largely unknown. We assembled de novo the genome of a wild Betta splendens and whole-genome sequenced multiple individuals across five species within the B. splendens species complex, including wild populations and domesticated ornamental betta. Given our estimate of the mutation rate from pedigrees, our analyses suggest that betta were domesticated at least 1,000 years ago, centuries earlier than previously thought. Ornamental betta individuals have variable contributions from other Betta species and have also introgressed into wild populations of those species. We identify dmrt1 as the main sex determination gene in ornamental betta but not in wild B. splendens, and find evidence for recent directional selection at the X-allele of the locus. Furthermore, we find genes with signatures of recent, strong selection that have large effects on color in specific parts of the body, or the shape of individual fins, and are almost all unlinked. Our results demonstrate how simple genetic architectures paired with anatomical modularity can lead to vast phenotypic diversity generated during animal domestication, and set the stage for using betta as a modern system for evolutionary genetics. One-Sentence Summary: Genomic analyses reveal betta fish were domesticated more than 1,000 years ago and the genes that changed in the process.

Posted ContentDOI
04 Feb 2021-bioRxiv
TL;DR: In this article, the authors compared human induced pluripotent stem cells (hiPSCs) derived from different tissues, skin and blood, in the same individual.
Abstract: Human induced pluripotent stem cells (hiPSCs) are an established patient-specific model system where opportunities are emerging for cell-based therapies. We contrast hiPSCs derived from different tissues, skin and blood, in the same individual. We show extensive single-nucleotide mutagenesis in all hiPSC lines, although fibroblast-derived hiPSCs (F-hiPSCs) are particularly heavily mutagenized by ultraviolet (UV)-related damage. We utilize genome sequencing data on 454 F-hiPSCs and 44 blood-derived hiPSCs (B-hiPSCs) to gain further insights. Across 324 whole-genome sequenced (WGS) F-hiPSCs derived by the Human Induced Pluripotent Stem Cell Initiative (HipSci), UV-related damage is present in ~72% of cell lines, sometimes causing substantial mutagenesis (range 0.25-15 per Mb). Furthermore, we find remarkable genomic heterogeneity between independent F-hiPSC clones derived from the same reprogramming process in the same donor, due to oligoclonal populations within fibroblasts. Combining WGS and exome-sequencing data of 452 HipSci F-hiPSCs, we identify 272 predicted pathogenic mutations in cancer-related genes, of which 21 genes were hit recurrently three or more times, involving 77 (17%) lines. Notably, 151 of 272 mutations were present in starting fibroblast populations, suggesting that more than half of putative driver events in F-hiPSCs were acquired in vivo. In contrast, B-hiPSCs reprogrammed from erythroblasts show lower levels of genome-wide mutations (range 0.28-1.4 per Mb), no UV damage, but a strikingly high prevalence of acquired BCOR mutations of ~57%, indicative of strong selection pressure. All hiPSCs had otherwise stable, diploid genomes on karyotypic pre-screening, highlighting how copy-number-based approaches do not have the required resolution to detect widespread nucleotide mutagenesis. This work strongly suggests that models for cell-based therapies require detailed nucleotide-resolution characterization prior to clinical application.

Journal ArticleDOI
14 May 2021
TL;DR: A genome assembly from an individual female Aquila chrysaetos chrysaetos (the European golden eagle; Chordata; Aves; Accipitridae) is presented.
Abstract: We present a genome assembly from an individual female Aquila chrysaetos chrysaetos (the European golden eagle; Chordata; Aves; Accipitridae). The genome sequence is 1.23 gigabases in span. The majority of the assembly is scaffolded into 28 chromosomal pseudomolecules, including the W and Z sex chromosomes.

Posted ContentDOI
24 Jul 2021-bioRxiv
TL;DR: In this article, a comparative genome-wide methylome and transcriptome study of Lake Malawi cichlid fishes is presented, focussing on liver and muscle tissues in phenotypically divergent species.
Abstract: Epigenetic variation modulates gene expression and can be heritable. However, knowledge of the contribution of epigenetic divergence to adaptive diversification in nature remains limited. The massive evolutionary radiation of Lake Malawi cichlid fishes displaying extensive phenotypic diversity despite extremely low sequence divergence is an excellent system to study the epigenomic contribution to adaptation. Here, we present the first comparative genome-wide methylome and transcriptome study, focussing on liver and muscle tissues in phenotypically divergent cichlid species. In both tissues we find substantial methylome divergence among species. Differentially methylated regions (DMR), enriched in evolutionary young transposons, are associated with transcription changes of ecologically-relevant genes related to energy expenditure and lipid metabolism, pointing to a link between dietary ecology and methylome divergence. Unexpectedly, half of all species-specific DMRs are shared across tissues and are enriched in developmental genes, likely reflecting distinct epigenetic developmental programmes. Our study reveals substantial methylome divergence in closely-related cichlid fishes and represents a valuable resource to study the role of epigenetics in species diversification.