Showing papers by "Richard Durbin" published in 2020


Journal ArticleDOI
TL;DR: A novel tool, purge_dups, is presented that uses sequence similarity and read depth to automatically identify and remove both haplotigs and heterozygous overlaps; it can reduce heterozygous duplication and increase assembly continuity while maintaining completeness of the primary assembly.
Abstract: Motivation: Rapid development in long-read sequencing and scaffolding technologies is accelerating the production of reference-quality assemblies for large eukaryotic genomes. However, haplotype divergence in regions of high heterozygosity often results in assemblers creating two copies rather than one copy of a region, leading to breaks in contiguity and compromising downstream steps such as gene annotation. Several tools have been developed to resolve this problem. However, they either focus only on removing contained duplicate regions, also known as haplotigs, or fail to use all the relevant information and hence make errors. Results: Here we present a novel tool, purge_dups, that uses sequence similarity and read depth to automatically identify and remove both haplotigs and heterozygous overlaps. In comparison with current tools, we demonstrate that purge_dups can reduce heterozygous duplication and increase assembly continuity while maintaining completeness of the primary assembly. Moreover, purge_dups is fully automatic and can easily be integrated into assembly pipelines. Availability and implementation: The source code is written in C and is available at https://github.com/dfguan/purge_dups. Supplementary information: Supplementary data are available at Bioinformatics online.

728 citations


Posted ContentDOI
Arang Rhie1, Shane A. McCarthy2, Olivier Fedrigo3, Joana Damas4, Giulio Formenti3, Sergey Koren1, Marcela Uliano-Silva2, William Chow2, Arkarachai Fungtammasan, Gregory Gedman3, Lindsey J. Cantin3, Françoise Thibaud-Nissen1, Leanne Haggerty5, Chul Hee Lee6, Byung June Ko6, J. H. Kim6, Iliana Bista2, Michelle Smith2, Bettina Haase3, Jacquelyn Mountcastle3, Sylke Winkler7, Sadye Paez3, Jason T. Howard8, Sonja C. Vernes7, Tanya M. Lama9, Frank Grützner10, Wesley C. Warren11, Christopher N. Balakrishnan12, Dave W Burt13, Jimin George14, Matthew T. Biegler3, David Iorns15, Andrew Digby, Daryl Eason, Taylor Edwards16, Mark Wilkinson17, George F. Turner18, Axel Meyer19, Andreas F. Kautt19, Paolo Franchini19, H. William Detrich20, Hannes Svardal21, Maximilian Wagner22, Gavin J. P. Naylor23, Martin Pippel7, Milan Malinsky2, Mark Mooney, Maria Simbirsky, Brett T. Hannigan, Trevor Pesout24, Marlys L. Houck, Ann C Misuraca, Sarah B. Kingan25, Richard Hall25, Zev N. Kronenberg25, Jonas Korlach25, Ivan Sović25, Christopher Dunn25, Zemin Ning2, Alex Hastie, Joyce V. Lee, Siddarth Selvaraj, Richard E. Green24, Nicholas H. Putnam, Jay Ghurye26, Erik Garrison24, Ying Sims2, Joanna Collins2, Sarah Pelan2, James Torrance2, Alan Tracey2, Jonathan Wood2, Dengfeng Guan27, Sarah E. London28, David F. Clayton14, Claudio V. Mello29, Samantha R. Friedrich29, Peter V. Lovell29, Ekaterina Osipova7, Farooq O. Al-Ajli30, Simona Secomandi31, Heebal Kim6, Constantina Theofanopoulou3, Yang Zhou32, Robert S. Harris33, Kateryna D. Makova33, Paul Medvedev33, Jinna Hoffman1, Patrick Masterson1, Karen Clark1, Fergal J. Martin5, Kevin L. Howe5, Paul Flicek5, Brian P. Walenz1, Woori Kwak, Hiram Clawson24, Mark Diekhans24, Luis R Nassar24, Benedict Paten24, Robert H. S. Kraus19, Harris A. Lewin4, Andrew J. Crawford34, M. Thomas P. Gilbert32, Guojie Zhang32, Byrappa Venkatesh35, Robert W. Murphy36, Klaus-Peter Koepfli37, Beth Shapiro24, Warren E. Johnson37, Federica Di Palma38, Tomas Marques-Bonet39, Emma C. Teeling40, Tandy Warnow41, Jennifer A. Marshall Graves42, Oliver A. Ryder43, David Haussler24, Stephen J. O'Brien44, Kerstin Howe2, Eugene W. Myers45, Richard Durbin2, Adam M. Phillippy1, Erich D. Jarvis3
23 May 2020-bioRxiv
TL;DR: The Vertebrate Genomes Project (VGP) is launched, an effort to generate high-quality, complete reference genomes for all ~70,000 extant vertebrate species and help enable a new era of discovery across the life sciences.
Abstract: High-quality and complete reference genome assemblies are fundamental for the application of genomics to biology, disease, and biodiversity conservation. However, such assemblies are only available for a few non-microbial species. To address this issue, the international Genome 10K (G10K) consortium has worked over a five-year period to evaluate and develop cost-effective methods for assembling the most accurate and complete reference genomes to date. Here we summarize these developments, introduce a set of quality standards, and present lessons learned from sequencing and assembling 16 species representing major vertebrate lineages (mammals, birds, reptiles, amphibians, teleost fishes and cartilaginous fishes). We confirm that long-read sequencing technologies are essential for maximizing genome quality and that unresolved complex repeats and haplotype heterozygosity are major sources of error in assemblies. Our new assemblies identify and correct substantial errors in some of the best historical reference genomes. Adopting these lessons, we have embarked on the Vertebrate Genomes Project (VGP), an effort to generate high-quality, complete reference genomes for all ~70,000 extant vertebrate species and help enable a new era of discovery across the life sciences.

567 citations


Journal ArticleDOI
20 Mar 2020-Science
TL;DR: The authors' study adds data about African, Oceanian, and Amerindian populations and indicates that diversity tends to result from differences at the single-nucleotide level rather than copy number variation.
Abstract: Genome sequences from diverse human groups are needed to understand the structure of genetic variation in our species and the history of, and relationships between, different populations. We present 929 high-coverage genome sequences from 54 diverse human populations, 26 of which are physically phased using linked-read sequencing. Analyses of these genomes reveal an excess of previously undocumented common genetic variation private to southern Africa, central Africa, Oceania, and the Americas, but an absence of such variants fixed between major geographical regions. We also find deep and gradual population separations within Africa, contrasting population size histories between hunter-gatherer and agriculturalist groups in the past 10,000 years, and a contrast between single Neanderthal but multiple Denisovan source populations contributing to present-day human populations.

415 citations


Journal ArticleDOI
TL;DR: Souporcell is developed, a method to cluster cells using the genetic variants detected within the scRNA-seq reads, which achieves high accuracy on genotype clustering, doublet detection and ambient RNA estimation, as demonstrated across a range of challenging scenarios.
Abstract: Methods to deconvolve single-cell RNA-sequencing (scRNA-seq) data are necessary for samples containing a mixture of genotypes, whether they are natural or experimentally combined. Multiplexing across donors is a popular experimental design that can avoid batch effects, reduce costs and improve doublet detection. By using variants detected in scRNA-seq reads, it is possible to assign cells to their donor of origin and identify cross-genotype doublets that may have highly similar transcriptional profiles, precluding detection by transcriptional profile. More subtle cross-genotype variant contamination can be used to estimate the amount of ambient RNA. Ambient RNA is caused by cell lysis before droplet partitioning and is an important confounder of scRNA-seq analysis. Here we develop souporcell, a method to cluster cells using the genetic variants detected within the scRNA-seq reads. We show that it achieves high accuracy on genotype clustering, doublet detection and ambient RNA estimation, as demonstrated across a range of challenging scenarios.

179 citations


Journal ArticleDOI
TL;DR: The results reinforce the role of ancestral hybridization in explosive diversification by demonstrating its significance in one of the largest recent vertebrate adaptive radiations.
Abstract: The adaptive radiation of cichlid fishes in East African Lake Malawi encompasses over 500 species that are believed to have evolved within the last 800,000 years from a common founder population. It has been proposed that hybridization between ancestral lineages can provide the genetic raw material to fuel such exceptionally high diversification rates, and evidence for this has recently been presented for the Lake Victoria region cichlid superflock. Here, we report that Lake Malawi cichlid genomes also show evidence of hybridization between two lineages that split 3-4 Ma, today represented by Lake Victoria cichlids and the riverine Astatotilapia sp. "ruaha blue." The two ancestries in Malawi cichlid genomes are present in large blocks of several kilobases, but there is little variation in this pattern between Malawi cichlid species, suggesting that the large-scale mosaic structure of the genomes was largely established prior to the radiation. Nevertheless, tens of thousands of polymorphic variants apparently derived from the hybridization are interspersed in the genomes. These loci show a striking excess of differentiation across ecological subgroups in the Lake Malawi cichlid assemblage, and parental alleles sort differentially into benthic and pelagic Malawi cichlid lineages, consistent with strong differential selection on these loci during species divergence. Furthermore, these loci are enriched for genes involved in immune response and vision, including opsin genes previously identified as important for speciation. Our results reinforce the role of ancestral hybridization in explosive diversification by demonstrating its significance in one of the largest recent vertebrate adaptive radiations.

84 citations


Journal ArticleDOI
TL;DR: A new method for efficiently computing the expected SFS and linear functionals of it, for demographies described by general directed acyclic graphs is presented, which can scale to more populations than previously possible for complex demographic histories including admixture.
Abstract: The sample frequency spectrum (SFS), or histogram of allele counts, is an important summary statistic in evolutionary biology, and is often used to infer the history of population size changes, mig...

79 citations
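
To make the definition in the abstract above concrete, here is a minimal Python sketch that tabulates a sample frequency spectrum from a tiny, made-up set of haplotypes. The function name, the 0/1 encoding (0 = ancestral, 1 = derived) and the data are illustrative assumptions. This computes an observed SFS only; it is not the paper's method, which concerns the expected SFS under a demographic model.

```python
from collections import Counter


def sample_frequency_spectrum(haplotypes):
    """Return the observed SFS for a list of equal-length 0/1 haplotypes.

    Entry i-1 of the result is the number of sites at which exactly i of the
    n sampled chromosomes carry the derived allele, for i = 1 .. n-1.
    Sites with 0 or n derived copies are not segregating and are ignored.
    """
    n = len(haplotypes)
    num_sites = len(haplotypes[0])
    # Count derived alleles at each site, then histogram those counts.
    derived_counts = Counter(
        sum(h[site] for h in haplotypes) for site in range(num_sites)
    )
    return [derived_counts.get(i, 0) for i in range(1, n)]


# Hypothetical data: 4 chromosomes, 5 sites.
example_haplotypes = [
    [0, 1, 0, 1, 1],
    [0, 1, 0, 0, 1],
    [1, 0, 0, 0, 1],
    [0, 0, 0, 0, 1],
]
example_sfs = sample_frequency_spectrum(example_haplotypes)
```

In this toy input two sites are singletons, one is a doubleton, and the remaining two sites are fixed (all ancestral or all derived), so they do not contribute to the spectrum.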


Journal ArticleDOI
TL;DR: It is demonstrated that aligning aDNA sequences to variation graphs effectively mitigates the impact of reference bias when analyzing aDNA, while retaining mapping sensitivity and allowing detection of variation, in particular indel variation, that was previously missed.
Abstract: During the last decade, the analysis of ancient DNA (aDNA) sequence has become a powerful tool for the study of past human populations. However, the degraded nature of aDNA means that aDNA molecules are short and frequently mutated by post-mortem chemical modifications. These features decrease read mapping accuracy and increase reference bias, in which reads containing non-reference alleles are less likely to be mapped than those containing reference alleles. Alternative approaches have been developed to replace the linear reference with a variation graph which includes known alternative variants at each genetic locus. Here, we evaluate the use of variation graph software vg to avoid reference bias for aDNA and compare with existing methods. We use vg to align simulated and real aDNA samples to a variation graph containing 1000 Genome Project variants and compare with the same data aligned with bwa to the human linear reference genome. Using vg leads to a balanced allelic representation at polymorphic sites, effectively removing reference bias, and more sensitive variant detection in comparison with bwa, especially for insertions and deletions (indels). Alternative approaches that use relaxed bwa parameter settings or filter bwa alignments can also reduce bias but can have lower sensitivity than vg, particularly for indels. Our findings demonstrate that aligning aDNA sequences to variation graphs effectively mitigates the impact of reference bias when analyzing aDNA, while retaining mapping sensitivity and allowing detection of variation, in particular indel variation, that was previously missed.

27 citations
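
The reference bias described in the abstract above can be illustrated with a simple allele-balance calculation. The sketch below is an assumption-laden illustration, not the metric used in the paper: it takes hypothetical read counts at sites known to be heterozygous and reports the fraction of reads supporting the reference allele, which would be close to 0.5 in the absence of bias.

```python
def reference_allele_fraction(site_counts):
    """Fraction of reads carrying the reference allele across heterozygous sites.

    site_counts: iterable of (ref_reads, alt_reads) pairs, one per site.
    A value well above 0.5 suggests that reads with non-reference alleles
    are being lost during mapping.
    """
    site_counts = list(site_counts)
    ref_total = sum(ref for ref, _ in site_counts)
    alt_total = sum(alt for _, alt in site_counts)
    total = ref_total + alt_total
    if total == 0:
        raise ValueError("no reads cover the supplied sites")
    return ref_total / total


# Hypothetical counts at three heterozygous sites.
example_counts = [(6, 4), (7, 3), (5, 5)]
example_fraction = reference_allele_fraction(example_counts)
```

With these made-up counts, 18 of 30 reads carry the reference allele, a fraction of 0.6.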


Journal ArticleDOI
TL;DR: A novel pedigree sequence graph based approach to diploid assembly using accurate Illumina data and long-read Pacific Biosciences data from all related individuals is presented, thereby generalizing previous work on single individuals.
Abstract: Motivation: Reconstructing high-quality haplotype-resolved assemblies for related individuals has important applications in Mendelian diseases and population genomics. Through major genomics sequencing efforts such as the Personal Genome Project, the Vertebrate Genome Project (VGP) and the Genome in a Bottle project (GIAB), a variety of sequencing datasets from trios of diploid genomes are becoming available. Current trio assembly approaches are not designed to incorporate long- and short-read data from mother-father-child trios, and therefore require relatively high coverages of costly long-read data to produce high-quality assemblies. Thus, building a trio-aware assembler capable of producing accurate and chromosomal-scale diploid genomes of all individuals in a pedigree, while being cost-effective in terms of sequencing costs, is a pressing need of the genomics community. Results: We present a novel pedigree sequence graph based approach to diploid assembly using accurate Illumina data and long-read Pacific Biosciences (PacBio) data from all related individuals, thereby generalizing our previous work on single individuals. We demonstrate the effectiveness of our pedigree approach on a simulated trio of pseudo-diploid yeast genomes with different heterozygosity rates, and real data from human chromosome. We show that we require as little as 30× coverage Illumina data and 15× PacBio data from each individual in a trio to generate chromosomal-scale phased assemblies. Additionally, we show that we can detect and phase variants from generated phased assemblies. Availability and implementation: https://github.com/shilpagarg/WHdenovo.

22 citations


Journal ArticleDOI
TL;DR: This assembly is one of the highest quality genomes available for Lepidoptera, supporting trio binning as a potent strategy for assembling heterozygous genomes, and provides genomic insights into the geographic population structure of A. plantaginis.
Abstract: Background: Diploid genome assembly is typically impeded by heterozygosity because it introduces errors when haplotypes are collapsed into a consensus sequence. Trio binning offers an innovative solution that exploits heterozygosity for assembly. Short, parental reads are used to assign parental origin to long reads from their F1 offspring before assembly, enabling complete haplotype resolution. Trio binning could therefore provide an effective strategy for assembling highly heterozygous genomes, which are traditionally problematic, such as insect genomes. This includes the wood tiger moth (Arctia plantaginis), which is an evolutionary study system for warning colour polymorphism. Findings: We produced a high-quality, haplotype-resolved assembly for Arctia plantaginis through trio binning. We sequenced a same-species family (F1 heterozygosity ∼1.9%) and used parental Illumina reads to bin 99.98% of offspring Pacific Biosciences reads by parental origin, before assembling each haplotype separately and scaffolding with 10X linked reads. Both assemblies are contiguous (mean scaffold N50: 8.2 Mb) and complete (mean BUSCO completeness: 97.3%), with annotations and 31 chromosomes identified through karyotyping. We used the assembly to analyse genome-wide population structure and relationships between 40 wild resequenced individuals from 5 populations across Europe, revealing the Georgian population as the most genetically differentiated with the lowest genetic diversity. Conclusions: We present the first invertebrate genome to be assembled via trio binning. This assembly is one of the highest quality genomes available for Lepidoptera, supporting trio binning as a potent strategy for assembling heterozygous genomes. Using our assembly, we provide genomic insights into the geographic population structure of A. plantaginis.

18 citations
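
The binning step described in the abstract above (using short parental reads to assign parental origin to long offspring reads) can be sketched with parent-specific k-mers. This is a deliberately simplified toy, not the pipeline used in the paper: the function names are hypothetical, reverse complements and sequencing errors are ignored, and no normalisation for k-mer set size is applied.

```python
def kmers(sequence, k):
    """Return the set of all substrings of length k in the sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def parent_specific_kmers(mother_reads, father_reads, k):
    """Return (maternal_only, paternal_only) k-mer sets from parental short reads."""
    mother = set().union(*(kmers(read, k) for read in mother_reads))
    father = set().union(*(kmers(read, k) for read in father_reads))
    return mother - father, father - mother


def bin_read(read, maternal_only, paternal_only, k):
    """Assign an offspring read to the parent whose private k-mers it shares most."""
    read_kmers = kmers(read, k)
    maternal_hits = len(read_kmers & maternal_only)
    paternal_hits = len(read_kmers & paternal_only)
    if maternal_hits > paternal_hits:
        return "maternal"
    if paternal_hits > maternal_hits:
        return "paternal"
    return "unassigned"  # no informative k-mers, or a tie


# Hypothetical toy reads differing at one position (A vs T).
maternal_only, paternal_only = parent_specific_kmers(["ACGTAC"], ["ACGTTC"], k=4)
```

Reads that carry only k-mers shared by both parents stay unassigned, which is why higher heterozygosity gives the approach more signal to work with.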


Posted ContentDOI
15 Nov 2020-bioRxiv
TL;DR: A high-quality chromosome-scale genome assembly of the Black Soldier Fly is generated, revealing six autosomes and an X chromosome; this reference sequence will provide an essential tool for future genetic modifications and for functional and population genomics.
Abstract: Background: Hermetia illucens L. (Diptera: Stratiomyidae), the Black Soldier Fly (BSF), is an increasingly important mass reared entomological resource for bioconversion of organic material into animal feed. Results: We generated a high-quality chromosome-scale genome assembly of the BSF using Pacific Biosciences, 10X Genomics linked read and high-throughput chromosome conformation capture sequencing technology. Scaffolding the final assembly with Hi-C data produced a highly contiguous 1.01 Gb genome with 99.75% of scaffolds assembled into pseudo-chromosomes representing seven chromosomes with 16.01 Mb contig and 180.46 Mb scaffold N50 values. The highly complete genome obtained a BUSCO completeness of 98.6%. We masked 67.32% of the genome as repetitive sequences and annotated a total of 17,664 protein-coding genes using the BRAKER2 pipeline. We analysed an established lab population to investigate the genomic variation and architecture of the BSF revealing six autosomes and the identification of an X chromosome. Additionally, we estimated the inbreeding coefficient (1.9%) of a lab population by assessing runs of homozygosity. This revealed a plethora of inbreeding events including recent long runs of homozygosity on chromosome five. Conclusions: Release of this novel chromosome-scale BSF genome assembly will provide an improved platform for further genomic studies and functional characterisation of candidate regions of artificial selection. This reference sequence will provide an essential tool for future genetic modifications, functional and population genomics.

17 citations
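
The contig and scaffold N50 values quoted in the abstract above use the standard contiguity metric: the length L such that sequences of length L or longer together contain at least half of the total assembly. A minimal Python sketch follows; the function name and the example lengths are made up for illustration.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("at least one sequence length is required")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical scaffold lengths (arbitrary units), total 290.
example_lengths = [80, 70, 50, 40, 30, 20]
example_n50 = n50(example_lengths)
```

Here the two longest sequences (80 and 70) together reach 150, which is more than half of 290, so the N50 is 70.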


Journal ArticleDOI
24 Jun 2020
TL;DR: A genome assembly for Cottoperca gobio (channel bull blenny, (Günther, 1861)); Chordata; Actinopterygii (ray-finned fishes), a temperate water outgroup for Antarctic Notothenioids, is presented in this article.
Abstract: We present a genome assembly for Cottoperca gobio (channel bull blenny, (Günther, 1861)); Chordata; Actinopterygii (ray-finned fishes), a temperate water outgroup for Antarctic Notothenioids. The size of the genome assembly is 609 megabases, with the majority of the assembly scaffolded into 24 chromosomal pseudomolecules. Gene annotation on Ensembl of this assembly has identified 21,662 coding genes.

Posted ContentDOI
31 May 2020-bioRxiv
TL;DR: It is demonstrated that aligning aDNA sequences to variation graphs allows recovering a higher fraction of non-reference variation and effectively mitigates the impact of reference bias in population genetics analyses using aDNA, while retaining mapping sensitivity.
Abstract: During the last decade, the analysis of ancient DNA (aDNA) sequence has become a powerful tool for the study of past human populations. However, the degraded nature of aDNA means that aDNA molecules are short and frequently mutated by post-mortem chemical modifications. These features decrease read mapping accuracy and increase reference bias, in which reads containing non-reference alleles are less likely to be mapped than those containing reference alleles. Recently, alternative approaches for read mapping and genetic variation analysis have been developed that replace the linear reference by a variation graph which includes known alternative variants at each genetic locus. Here, we evaluate the use of variation graph software vg to avoid reference bias for ancient DNA and compare our approach to existing methods. We used vg to align simulated and real aDNA samples to a variation graph containing 1000 Genome Project variants, and compared these with the same data aligned with bwa to the human linear reference genome. We show that use of vg leads to a balanced allelic representation at polymorphic sites, effectively removing reference bias, and more sensitive variant detection in comparison with bwa, especially for insertions and deletions (indels). Alternative approaches that use relaxed bwa parameter settings or filter bwa alignments can also reduce bias, but can have lower sensitivity than vg, particularly for indels. Our findings demonstrate that aligning aDNA sequences to variation graphs effectively mitigates the impact of reference bias when analysing aDNA, while retaining mapping sensitivity and allowing detection of variation, in particular indel variation, that was previously missed.

Journal ArticleDOI
23 Oct 2020
TL;DR: In this article, the authors present 929 high-coverage genome sequences from 54 diverse human populations, 26 of which are physically phased using linked-read sequencing, revealing an excess of previously undocumented common genetic variation private to southern Africa, central Africa, Oceania, and the Americas, but an absence of such variants fixed between major geographical regions.
Abstract: Genome sequences from diverse human groups are needed to understand the structure of genetic variation in our species and the history of, and relationships between, different populations. We present 929 high-coverage genome sequences from 54 diverse human populations, 26 of which are physically phased using linked-read sequencing. Analyses of these genomes reveal an excess of previously undocumented common genetic variation private to southern Africa, central Africa, Oceania, and the Americas, but an absence of such variants fixed between major geographical regions. We also find deep and gradual population separations within Africa, contrasting population size histories between hunter-gatherer and agriculturalist groups in the past 10,000 years, and a contrast between single Neanderthal but multiple Denisovan source populations contributing to present-day human populations.

Posted ContentDOI
01 Jul 2020-bioRxiv
TL;DR: mitoVGP is a fully automated pipeline for similarity-based identification of mitochondrial reads and de novo assembly of mitochondrial genomes that incorporates both long (>10 kbp, PacBio or Nanopore) and short (100-300 bp, Illumina) reads.
Abstract: Modern sequencing technologies should make the assembly of the relatively small mitochondrial genomes an easy undertaking. However, few tools exist that address mitochondrial assembly directly. As part of the Vertebrate Genomes Project (VGP) we have developed mitoVGP, a fully automated pipeline for similarity-based identification of mitochondrial reads and de novo assembly of mitochondrial genomes that incorporates both long (>10 kbp, PacBio or Nanopore) and short (100-300 bp, Illumina) reads. Our pipeline led to successful complete mitogenome assemblies of 100 vertebrate species of the VGP. We have observed that tissue type and library size selection have considerable impact on mitogenome sequencing and assembly. Comparing our assemblies to purportedly complete reference mitogenomes based on short-read sequencing, we have identified errors, missing sequences, and incomplete genes in those references, particularly in repeat regions. Our assemblies have also identified novel gene region duplications, shedding new light on mitochondrial genome evolution and organization.

Posted ContentDOI
02 Mar 2020-bioRxiv
TL;DR: This assembly of the wood tiger moth genome is one of the highest quality genomes available for Lepidoptera, supporting trio binning as a potent strategy for assembling highly heterozygous genomes.
Abstract: Background: Diploid genome assembly is typically impeded by heterozygosity, as it introduces errors when haplotypes are collapsed into a consensus sequence. Trio binning offers an innovative solution which exploits heterozygosity for assembly. Short, parental reads are used to assign parental origin to long reads from their F1 offspring before assembly, enabling complete haplotype resolution. Trio binning could therefore provide an effective strategy for assembling highly heterozygous genomes which are traditionally problematic, such as insect genomes. This includes the wood tiger moth (Arctia plantaginis), which is an evolutionary study system for warning colour polymorphism. Findings: We produced a high-quality, haplotype-resolved assembly for Arctia plantaginis through trio binning. We sequenced a same-species family (F1 heterozygosity ∼1.9%) and used parental Illumina reads to bin 99.98% of offspring Pacific Biosciences reads by parental origin, before assembling each haplotype separately and scaffolding with 10X linked-reads. Both assemblies are highly contiguous (mean scaffold N50: 8.2 Mb) and complete (mean BUSCO completeness: 97.3%), with complete annotations and 31 chromosomes identified through karyotyping. We employed the assembly to analyse genome-wide population structure and relationships between 40 wild resequenced individuals from five populations across Europe, revealing the Georgian population as the most genetically differentiated with the lowest genetic diversity. Conclusions: We present the first invertebrate genome to be assembled via trio binning. This assembly is one of the highest quality genomes available for Lepidoptera, supporting trio binning as a potent strategy for assembling highly heterozygous genomes. Using this assembly, we provide genomic insights into geographic population structure of Arctia plantaginis.

Posted ContentDOI
20 Dec 2020-bioRxiv
TL;DR: An automated workflow is presented for jointly analysing ancient and present-day sequence data in a phylogenetic context, addressing the difficulty of combining ancient and modern samples given the highly degraded nature of aDNA data, post-mortem deamination and often low genomic coverage.
Abstract: During the last decade, large volumes of ancient DNA (aDNA) data have been generated as part of whole-genome shotgun and target capture sequencing studies. This includes sequences from non-recombining loci such as the mitochondrial or Y chromosomes. However, given the highly degraded nature of aDNA data, post-mortem deamination and often low genomic coverage, combining ancient and modern samples for phylogenetic analyses remains difficult. Without care, these factors can lead to incorrect placement. For the Y chromosomes, current standard methods focus on curated markers, but these contain only a subset of the total variation. Examining all polymorphic markers is particularly important for low coverage aDNA data because it substantially increases the number of overlapping sites between present-day and ancient individuals which may lead to higher resolution phylogenetic placement. We provide an automated workflow for jointly analysing ancient and present-day sequence data in a phylogenetic context. For each ancient sample, we effectively evaluate the number of ancestral and derived alleles present on each branch and use this information to place ancient lineages to their most likely position in the phylogeny. We provide both a parsimony approach and a highly optimised likelihood-based approach that assigns a posterior probability to each branch. To illustrate the application of this method, we have compiled and make available the largest public Y-chromosomal dataset to date (2,014 samples) which we used as a reference for phylogenetic placement. We process publicly available African ancient DNA Y-chromosome sequences and examine how patterns of Y-chromosomal diversity change across time and the relationship between ancient and present-day lineages. The same software can be used to place samples with large amounts of missing data into other large non-recombining phylogenies such as the mitochondrial tree.
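
The idea in the abstract above of counting ancestral and derived alleles on each branch can be illustrated with a toy parsimony-style scorer. The sketch below is a hypothetical simplification, not the authors' software: the tree encoding, the function name and the scoring rule (derived calls minus ancestral calls, summed along the path from the root) are all assumptions, and it ignores deamination damage and the likelihood-based approach mentioned in the abstract.

```python
def place_sample(parent, branch_sites, calls):
    """Return the node whose root-to-node path is best supported by the sample.

    parent: dict mapping each node to its parent (the root maps to None).
    branch_sites: dict mapping a node to the sites whose derived allele
        arose on the branch leading to that node.
    calls: dict mapping a site to True (derived) or False (ancestral);
        sites with missing data are simply absent.
    """
    def branch_score(node):
        sites = branch_sites.get(node, [])
        derived = sum(1 for s in sites if calls.get(s) is True)
        ancestral = sum(1 for s in sites if calls.get(s) is False)
        return derived - ancestral

    def path_score(node):
        score = 0
        while node is not None:
            score += branch_score(node)
            node = parent[node]
        return score

    return max(parent, key=path_score)


# Hypothetical five-node tree with one to two markers per branch.
toy_parent = {"root": None, "A": "root", "B": "root", "A1": "A", "A2": "A"}
toy_sites = {"A": ["s1", "s2"], "B": ["s3"], "A1": ["s4"], "A2": ["s5"]}
toy_calls = {"s1": True, "s3": False, "s4": True, "s5": False}  # s2 is missing
toy_placement = place_sample(toy_parent, toy_sites, toy_calls)
```

In the toy example the sample is derived at markers on branches A and A1 and ancestral on B and A2, so it is placed at A1. Missing sites such as s2 contribute nothing to any score, which mirrors the low-coverage setting the abstract describes.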

Journal ArticleDOI
13 Feb 2020
TL;DR: A genome assembly from an individual male Sciurus carolinensis (the eastern grey squirrel) is presented, with both X and Y sex chromosomes assembled.
Abstract: We present a genome assembly from an individual male Sciurus carolinensis (the eastern grey squirrel; Vertebrata; Mammalia; Eutheria; Rodentia; Sciuridae). The genome sequence is 2.82 gigabases in span. The majority of the assembly (92.3%) is scaffolded into 21 chromosomal-level scaffolds, with both X and Y sex chromosomes assembled.

Journal ArticleDOI
03 Feb 2020
TL;DR: A genome assembly from an individual male Sciurus vulgaris (the Eurasian red squirrel) is presented, with both X and Y sex chromosomes assembled.
Abstract: We present a genome assembly from an individual male Sciurus vulgaris (the Eurasian red squirrel; Vertebrata; Mammalia; Eutheria; Rodentia; Sciuridae). The genome sequence is 2.88 gigabases in span. The majority of the assembly is scaffolded into 21 chromosomal-level scaffolds, with both X and Y sex chromosomes assembled.


Journal ArticleDOI
19 Feb 2020
TL;DR: A genome assembly from an individual male Lutra lutra (the Eurasian river otter; Vertebrata; Mammalia; Eutheria; Carnivora; Mustelidae) is presented.
Abstract: We present a genome assembly from an individual male Lutra lutra (the Eurasian river otter; Vertebrata; Mammalia; Eutheria; Carnivora; Mustelidae). The genome sequence is 2.44 gigabases in span. The majority of the assembly is scaffolded into 20 chromosomal pseudomolecules, with both X and Y sex chromosomes assembled.

Posted ContentDOI
24 Nov 2020-bioRxiv
TL;DR: This study presents the first genome-wide methylome study in a large vertebrate evolutionary radiation, focussing on liver and muscle tissues in six genetically similar but eco-morphologically divergent cichlid fishes from Lake Malawi, and finds substantial methylome divergence in DNA sequences conserved between species, with differentially methylated regions (DMRs) significantly enriched in recently active transposable elements.
Abstract: Epigenetic variation modulates gene expression and can be heritable. However, knowledge of the contribution of epigenetic variation to diversification and speciation in nature remains limited. Here, we present the first genome-wide methylome study in a large vertebrate evolutionary radiation, focussing on liver and muscle tissues in six genetically similar but eco-morphologically divergent cichlid fishes from Lake Malawi. In both tissues we find substantial methylome divergence in DNA sequences conserved between species, and differentially methylated regions (DMRs) are significantly enriched in recently active transposable elements. DMRs in the liver are associated with transcription changes of genes with hepatic functions, pointing to a link between dietary ecology and methylome divergence. Unexpectedly, DMRs shared across adult tissues are enriched in genes involved in embryonic and developmental processes, suggesting roles in early embryogenesis. Our study provides initial evidence for DNA methylation contributing to phenotypic diversification of cichlids, and represents an important resource for further work.

Journal ArticleDOI
TL;DR: The goals of this conference were to reach a world class standard of science with a large number of contributions from within Africa, to initiate an exchange between African and international researchers and to identify challenges and opportunities for evolutionary genomics research in Africa.
Abstract: We report on the first meeting of SMBE in Africa. SMBE Malawi was initiated to bring together African and international researchers who use genetics or genomics to study natural systems impacted by human activities. The goals of this conference were 1) to reach a world-class standard of science with a large number of contributions from Africa, 2) to initiate exchange between African and international researchers, and 3) to identify challenges and opportunities for evolutionary genomics research in Africa. As reported, we think that we have achieved these goals and make suggestions on the way forward for African evolutionary genomics research.