Showing papers on "Genomics" published in 2022


Journal ArticleDOI
TL;DR: This review focused on the genomics, transmission, and effectiveness of vaccines against the Omicron variant, which will be helpful for further investigation of a new variant of SARS‐CoV‐2.
Abstract: Currently, severe acute respiratory syndrome coronavirus 2 (SARS‐CoV‐2) has spread worldwide as an Omicron variant. This variant is a heavily mutated virus and designated as a variant of concern by the World Health Organization (WHO). WHO cautioned that the Omicron variant of SARS‐CoV‐2 held a very high risk of infection, reigniting anxieties about the economy's recovery from the 2‐year pandemic. The extensively mutated Omicron variant is likely to spread internationally, posing a high risk of infection surges with serious repercussions in some areas. According to preliminary data, the Omicron variant of SARS‐CoV‐2 has a higher risk of reinfection. On the other hand, whether the current COVID‐19 vaccines could effectively resist the new strain is still under investigation. However, there is very limited information on the current situation of the Omicron variant, such as genomics, transmissibility, efficacy of vaccines, treatment, and management. This review focused on the genomics, transmission, and effectiveness of vaccines against the Omicron variant, which will be helpful for further investigation of a new variant of SARS‐CoV‐2.

504 citations


Journal ArticleDOI
TL;DR: The Human Pangenome Reference Consortium (HPRC) aims to create a more sophisticated and complete human reference genome with a graph-based, telomere-to-telomere representation of global genomic diversity.
Abstract: The human reference genome is the most widely used resource in human genetics and is due for a major update. Its current structure is a linear composite of merged haplotypes from more than 20 people, with a single individual comprising most of the sequence. It contains biases and errors within a framework that does not represent global human genomic variation. A high-quality reference with global representation of common variants, including single-nucleotide variants, structural variants and functional elements, is needed. The Human Pangenome Reference Consortium aims to create a more sophisticated and complete human reference genome with a graph-based, telomere-to-telomere representation of global genomic diversity. Here we leverage innovations in technology, study design and global partnerships with the goal of constructing the highest-possible quality human pangenome reference. Our goal is to improve data representation and streamline analyses to enable routine assembly of complete diploid genomes. With attention to ethical frameworks, the human pangenome reference will contain a more accurate and diverse representation of global genomic variation, improve gene-disease association studies across populations, expand the scope of genomics research to the most repetitive and polymorphic regions of the genome, and serve as the ultimate genetic resource for future biomedical research and precision medicine.

124 citations


Journal ArticleDOI
TL;DR: In this article, the authors describe factors that have contributed to the imbalance in representation of different populations in genomic studies, propose a roadmap to enhancing inclusion and ensuring equal health benefits of genomics advances, and highlight the importance of sincere, concerted global efforts toward genomic equity so that the benefits of genomic medicine are accessible to all.
Abstract: Two decades ago, the sequence of the first human genome was published. Since then, advances in genome technologies have resulted in whole-genome sequencing and microarray-based genotyping of millions of human genomes. However, genetic and genomic studies are predominantly based on populations of European ancestry. As a result, the potential benefits of genomic research—including better understanding of disease etiology, early detection and diagnosis, rational drug design and improved clinical care—may elude the many underrepresented populations. Here, we describe factors that have contributed to the imbalance in representation of different populations and, leveraging our experiences in setting up genomic studies in diverse global populations, we propose a roadmap to enhancing inclusion and ensuring equal health benefits of genomics advances. Our Perspective highlights the importance of sincere, concerted global efforts toward genomic equity to ensure the benefits of genomic medicine are accessible to all.

117 citations


Journal ArticleDOI
Giulio Formenti, Kathrin Theissinger, Carlos Fernandes, Iliana Bista, Aureliano Bombarely, Christoph Bleidorn, Claudio Ciofi, Angelica Crottini, José Alberto Godoy Godoy, Jacob Höglund, Joanna Malukiewicz, Alice Mouton, Rebekah A. Oomen, Sadye Paez, Per J. Palsbøll, Christophe Pampoulie, Hannes Svardal, Constantina Theofanopoulou, Jan de Vries, Ann-Marie Waldvogel, Guojie Zhang, Camila J. Mazzoni, Miklós Bálint, Fedor Čiampor, J. Hoglund, María José Ruiz-López, Goujie Zhang, Erich D. Jarvis, Sargis A. Aghayan, Tyler Alioto, Isabel Almudi, Nadir Alvarez, Paulo C. Alves, Isabel R. Amorim, Agostinho Antunes, Paula Arribas, Petr Baldrian, Paul R. Berg, Giorgio Bertorelle, Astrid Böhne, Andrea Bonisoli-Alquati, Ljudevit Luka Boštjančić, Bastien Boussau, Catherine Breton, Elena Buzan, Paula F. Campos, Carlos Carreras, Luis Filipe Castro, Luis J. Chueca, Elena Conti, Robert Cook-Deegan, Daniel Croll, Mónica V. Cunha, Frédéric Delsuc, Alice B. Dennis, Dimitar Dimitrov, Rui Faria, Adrien Favre, Olivier Fedrigo, Rosa Fernández, Gentile Francesco Ficetola, Jean-François Flot, Toni Gabaldón, Dolores R. Galea Agius, Guido Roberto Gallo, Alice Maria Giani, M. Thomas P. Gilbert, Tine Grebenc, Katerina Guschanski, Romain Guyot, Bernhard Hausdorf, Oliver Hawlitschek, Peter D. Heintzman, Berthold Heinze, Michael Hiller, Martin Husemann, Alessio Iannucci, Iker Irisarri, Kjetill S. Jakobsen, Sissel Jentoft, Peter Klinga, Agnieszka Kloch, Claudius F. Kratochwil, Henrik Kusche, Kara K S Layton, Jennifer A. Leonard, Emmanuelle Lerat, Gianni Liti, Tereza Manousaki, Tomas Marques-Bonet, Pável Matos-Maraví, Michael Matschiner, Florian Maumus, Ann M Mc Cartney, Shai Meiri, José Melo-Ferreira, Ximo Mengual, Michael T. Monaghan, Matteo Montagna, Robert W. Mysłajek, Marco T. Neiber, Violaine Nicolas, Marta Novo, Petar Ozretić, Ferran Palero, Lucian Pârvulescu, Marta Pascual, Octávio S. Paulo, Martina Pavlek, Cinta Pegueroles, Loïc Pellissier, Graziano Pesole, Craig R. Primmer, Ana Riesgo, Lukas Rüber, Diego Rubolini, Daniel Salvi, Ole Seehausen, Matthias Seidel, Simona Secomandi, Bruno Studer, Spyros Theodoridis, Marco Thines, Lara Urban, Anti Vasemägi, Adriana Vella, Noel Vella, Sonja C. Vernes, Cristiano Vernesi, David R. Vieites, Robert M. Waterhouse, Christopher W. Wheat, Gert Wörheide, Yannick Wurm, Gabrielle Zammit
TL;DR: This article discusses the large-scale generation of reference genomes representing global biodiversity, which are expected to enable comprehensive analyses of population and functional genomics and to advance conservation genomics.
Abstract: Progress in genome sequencing now enables the large-scale generation of reference genomes. Various international initiatives aim to generate reference genomes representing global biodiversity. These genomes provide unique insights into genomic diversity and architecture, thereby enabling comprehensive analyses of population and functional genomics, and are expected to revolutionize conservation genomics.

77 citations


Journal ArticleDOI
04 Feb 2022-Genetics
TL;DR: This article provides a broad overview of the current state of WormBase in terms of data type, curation workflows, analysis, and tools, including new advances for analysis of single-cell data, text mining and visualization, and the new community collaboration forum.
Abstract: WormBase (www.wormbase.org) is the central repository for the genetics and genomics of the nematode Caenorhabditis elegans. We provide the research community with data and tools to facilitate the use of C. elegans and related nematodes as model organisms for studying human health, development, and many aspects of fundamental biology. Throughout our 22-year history, we have continued to evolve to reflect progress and innovation in the science and technologies involved in the study of C. elegans. We strive to incorporate new data types and richer data sets, and to provide integrated displays and services that avail the knowledge generated by the published nematode genetics literature. Here, we provide a broad overview of the current state of WormBase in terms of data type, curation workflows, analysis, and tools, including exciting new advances for analysis of single-cell data, text mining and visualization, and the new community collaboration forum. Concurrently, we continue the integration and harmonization of infrastructure, processes, and tools with the Alliance of Genome Resources, of which WormBase is a founding member.

66 citations


Journal ArticleDOI
03 Jun 2022-Science
TL;DR: Microbe-seq is a high-throughput method that yields the genomes of individual microbes from complex microbial communities. The authors apply it to the human gut microbiome and develop a generalizable computational framework that combines sequencing reads from multiple microbes of the same species to generate a comprehensive list of reference genomes.
Abstract: Characterizing complex microbial communities with single-cell resolution has been a long-standing goal of microbiology. We present Microbe-seq, a high-throughput method that yields the genomes of individual microbes from complex microbial communities. We encapsulate individual microbes in droplets with microfluidics and liberate their DNA, which we then amplify, tag with droplet-specific barcodes, and sequence. We explore the human gut microbiome, sequencing more than 20,000 microbial single-amplified genomes (SAGs) from a single human donor and coassembling genomes of almost 100 bacterial species, including several with multiple subspecies strains. We use these genomes to probe microbial interactions, reconstructing the horizontal gene transfer (HGT) network and observing HGT between 92 species pairs; we also identify a significant in vivo host-phage association between crAssphage and one strain of Bacteroides vulgatus. Microbe-seq contributes high-throughput culture-free capabilities to investigate genomic blueprints of complex microbial communities with single-microbe resolution.
Description: Strain specific single-cell sequencing. Single-cell methods are the state of the art in biological research. Zheng et al. developed a high-throughput technique called Microbe-seq designed to analyze single bacterial cells from a microbiota. Microbe-seq uses microfluidics to separate individual bacterial cells within droplets and then extract, amplify, and barcode their DNA, which is then subject to pooled Illumina sequencing. The technique was tested by sequencing multiple human fecal samples to generate barcoded reads for thousands of single amplified genomes (SAGs) per sample. Pooling the SAGs corresponding to the same bacterial species allowed consensus assemblies of these genomes to provide insights into strain-level diversity and revealed a phage association and the limits on horizontal gene-transfer events between strains. (CA)
Summary: A microfluidic method was developed for high-throughput single-cell sequencing of human gut microbiota strains.
Introduction: The human gut microbiome is a complex ecosystem specific to each individual that comprises hundreds of microbial species. Different strains of the same species can impact health disparately in important ways, such as through antibiotic resistance and host-microbiome interactions. Consequently, consideration of microbes only at the species level without identifying their strains obscures important distinctions. The strain-level genomic structure of the gut microbiome has yet to be elucidated fully, even within a single person. Shotgun metagenomics broadly surveys the genomic content of microbial communities but in general cannot capture strain-level variations. Conversely, culture-based approaches and titer plate-based single-cell sequencing can yield strain-resolved genomes, but access only a limited number of microbial strains.
Rationale: We develop and validate Microbe-seq, a high-throughput single-cell sequencing method with strain resolution, and apply it to the human gut microbiome. Using an integrated microfluidic workflow, we encapsulate tens of thousands of microbes individually into droplets. Within each droplet, we lyse the microbe, perform whole-genome amplification, and tag the DNA with droplet-specific barcodes; we then pool the DNA from all droplets and sequence. In mammalian systems (the focus of most single-cell studies), high-quality reference genomes are available for the small number of species under investigation; by contrast, in complex communities of 100 or more microbial species, such as the human gut microbiome, reference genomes are a priori unknown. Therefore, we develop a generalizable computational framework that combines sequencing reads from multiple microbes of the same species to generate a comprehensive list of reference genomes. By comparing individual microbes from the same species, we identify whether multiple strains coexist and coassemble their strain-resolved genomes. The resulting collection of high-quality strain-resolved genomes from a broad range of microbial taxa enables the ability to probe, in unprecedented detail, the genomic structure of the microbial community.
Results: We apply Microbe-seq to seven gut microbiome samples collected from one human subject and acquire 21,914 single-amplified genomes (SAGs), which we coassemble into 76 species-level genomes, many from species that are difficult to culture. Ten of these species include multiple strains whose genomes we coassemble. We use these strain-resolved genomes to reconstruct the horizontal gene transfer (HGT) network of this microbiome; we find frequent exchange among Bacteroidetes species related to a mobile element carrying a Type-VI secretion system, which mediates inter-strain competition. Our droplet-based encapsulation also provides the opportunity to probe physical associations between individual microbes and colocalized bacteriophages. We find a significant host-phage association between crAssphage, the most abundant bacteriophage known in the human gut microbiome, and one particular strain of Bacteroides vulgatus.
Conclusion: We use Microbe-seq, combining microfluidic-droplet operation with tailored bioinformatic analysis, to achieve a strain-resolved survey of the genomic structure of a single person’s gut microbiome. Our methodology is general and immediately applicable to other complex microbial communities, such as the microbiomes in the soil and ocean. Applying our method to a broader human population and integrating Microbe-seq with other techniques, including functional screening, sorting, and long-read sequencing, could significantly enhance the understanding of the gut microbiome and its interaction with human health.
Figure: Microbe-seq overview. Cells encapsulated individually at high throughput into droplets are lysed and resulting DNA amplified and barcoded. Pooled DNA sequencing yields single amplified genomes, which are clustered and coassembled into reference genomes of ~100 species. For multistrain species, assigning SAGs to constituent strains through SNPs enables coassembly of strain-resolved genomes, used to elucidate the HGT network and host-phage associations.
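The barcode-then-pool idea described in this entry can be illustrated with a small Python sketch. It is not the authors' pipeline: the read representation (a `(barcode, sequence)` tuple), the function names, and the `toy_species` lookup are all hypothetical, and in the actual method the species grouping is inferred from the data rather than supplied up front.

```python
from collections import defaultdict


def group_reads_by_barcode(reads):
    """Collect reads that share a droplet barcode into one single-amplified genome (SAG)."""
    sags = defaultdict(list)
    for barcode, sequence in reads:
        sags[barcode].append(sequence)
    return dict(sags)


def pool_sags_by_species(sags, species_of):
    """Pool the reads of all SAGs assigned to the same species, ready for coassembly."""
    pools = defaultdict(list)
    for barcode, sequences in sags.items():
        pools[species_of[barcode]].extend(sequences)
    return dict(pools)


# Toy input: hypothetical barcodes and reads, for illustration only.
toy_reads = [
    ("AAAC", "ACGTAC"),
    ("GGTT", "TTGACA"),
    ("AAAC", "GTACCA"),
    ("CCAT", "TTGGCA"),
]
# Hypothetical stand-in for the clustering step that assigns SAGs to species.
toy_species = {"AAAC": "species_1", "GGTT": "species_2", "CCAT": "species_2"}

sags = group_reads_by_barcode(toy_reads)
pools = pool_sags_by_species(sags, toy_species)
print(len(sags), "SAGs pooled into", len(pools), "species")
```

Pooling before assembly is the point of the sketch: a single SAG typically covers only part of a genome, so combining reads from many cells of one species gives the assembler more to work with.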

60 citations



Journal ArticleDOI
TL;DR: In this paper, the authors report a high-quality and almost complete Col-0 genome assembly with two gaps (named Col-XJTU), built by combining Oxford Nanopore Technologies ultra-long reads, Pacific Biosciences high-fidelity long reads, and Hi-C data.

54 citations


Journal ArticleDOI
TL;DR: This paper introduces RagTag, a toolset for automating assembly scaffolding and patching, and establishes chromosome-scale reference genomes for the widely used tomato genotype M82 and for Sweet-100, a new rapid-cycling genotype developed to accelerate functional genomics and genome editing in tomato.
Abstract: Advancing crop genomics requires efficient genetic systems enabled by high-quality personalized genome assemblies. Here, we introduce RagTag, a toolset for automating assembly scaffolding and patching, and we establish chromosome-scale reference genomes for the widely used tomato genotype M82 along with Sweet-100, a new rapid-cycling genotype that we developed to accelerate functional genomics and genome editing in tomato. This work outlines strategies to rapidly expand genetic systems and genomic resources in other plant species.

52 citations


Journal ArticleDOI
TL;DR: In this paper, the authors focus on technologies that can be adopted if exome sequencing is unrevealing, and discuss the benefits of sequencing the whole genome and the additional benefit that may be offered by long-read technology, pan-genome reference, transcriptomics, metabolomics, proteomics, and methyl profiling.
Abstract: Rare diseases affect 30 million people in the USA and more than 300-400 million worldwide, often causing chronic illness, disability, and premature death. Traditional diagnostic techniques rely heavily on heuristic approaches, coupling clinical experience from prior rare disease presentations with the medical literature. A large number of rare disease patients remain undiagnosed for years and many even die without an accurate diagnosis. In recent years, gene panels, microarrays, and exome sequencing have helped to identify the molecular cause of such rare and undiagnosed diseases. These technologies have allowed diagnoses for a sizable proportion (25-35%) of undiagnosed patients, often with actionable findings. However, a large proportion of these patients remain undiagnosed. In this review, we focus on technologies that can be adopted if exome sequencing is unrevealing. We discuss the benefits of sequencing the whole genome and the additional benefit that may be offered by long-read technology, pan-genome reference, transcriptomics, metabolomics, proteomics, and methyl profiling. We highlight computational methods to help identify regionally distant patients with similar phenotypes or similar genetic mutations. Finally, we describe approaches to automate and accelerate genomic analysis. The strategies discussed here are intended to serve as a guide for clinicians and researchers in the next steps when encountering patients with non-diagnostic exomes.

49 citations


Journal ArticleDOI
TL;DR: This article reports the 3.1 Gb haplotype-resolved (at 99.6% precision), chromosome-scale assembly of the potato cultivar 'Otava', based on high-quality long reads, single-cell sequencing of 717 pollen genomes, and Hi-C data.
Abstract: Potato is the most widely produced tuber crop worldwide. However, reconstructing the four haplotypes of its autotetraploid genome remained an unsolved challenge. Here, we report the 3.1 Gb haplotype-resolved (at 99.6% precision), chromosome-scale assembly of the potato cultivar 'Otava' based on high-quality long reads, single-cell sequencing of 717 pollen genomes and Hi-C data. Unexpectedly, ~50% of the genome was identical-by-descent due to recent inbreeding, which was contrasted by highly abundant structural rearrangements involving ~20% of the genome. Among 38,214 genes, only 54% were present in all four haplotypes with an average of 3.2 copies per gene. Taking the leaf transcriptome as an example, 11% of the genes were differently expressed in at least one haplotype, where 25% of them were likely regulated through allele-specific DNA methylation. Our work sheds light on the recent breeding history of potato, the functional organization of its tetraploid genome and has the potential to strengthen the future of genomics-assisted breeding.

Journal ArticleDOI
TL;DR: chromoMap is an R package for constructing interactive visualizations of chromosomes or chromosomal regions and for mapping any chromosomal feature with known coordinates (e.g., protein-coding genes, transposable elements, non-coding RNAs, microsatellites).
Abstract: The recent advancements in high-throughput sequencing have resulted in the availability of annotated genomes, as well as of multi-omics data for many living organisms. This has increased the need for graphic tools that allow the concurrent visualization of genomes and feature-associated multi-omics data on single publication-ready plots. We present chromoMap, an R package, developed for the construction of interactive visualizations of chromosomes/chromosomal regions, mapping of any chromosomal feature with known coordinates (i.e., protein coding genes, transposable elements, non-coding RNAs, microsatellites, etc.), and chromosomal regional characteristics (i.e. genomic feature density, gene expression, DNA methylation, chromatin modifications, etc.) of organisms with a genome assembly. ChromoMap can also integrate multi-omics data (genomics, transcriptomics and epigenomics) in relation to their occurrence across chromosomes. ChromoMap takes tab-delimited files (BED like) or alternatively R objects to specify the genomic co-ordinates of the chromosomes and elements to annotate. Rendered chromosomes are composed of continuous windows of a given range, which, on hover, display detailed information about the elements annotated within that range. By adjusting parameters of a single function, users can generate a variety of plots that can either be saved as static image or as HTML documents. ChromoMap's flexibility allows for concurrent visualization of genomic data in each strand of a given chromosome, or of more than one homologous chromosome; allowing the comparison of multi-omic data between genotypes (e.g. species, varieties, etc.) or between homologous chromosomes of phased diploid/polyploid genomes. chromoMap is an extensive tool that can be potentially used in various bioinformatics analysis pipelines for genomic visualization of multi-omics data.

Journal ArticleDOI
TL;DR: The authors analyze the country of origin of all 444,829 human microbiome samples available from the world’s 3 largest genomic data repositories, including the Sequence Read Archive (SRA); the results demonstrate a critical need to ensure more global representation of participants in microbiome studies.
Abstract: The importance of sampling from globally representative populations has been well established in human genomics. In human microbiome research, however, we lack a full understanding of the global distribution of sampling in research studies. This information is crucial to better understand global patterns of microbiome-associated diseases and to extend the health benefits of this research to all populations. Here, we analyze the country of origin of all 444,829 human microbiome samples that are available from the world’s 3 largest genomic data repositories, including the Sequence Read Archive (SRA). The samples are from 2,592 studies of 19 body sites, including 220,017 samples of the gut microbiome. We show that more than 71% of samples with a known origin come from Europe, the United States, and Canada, including 46.8% from the US alone, despite the country representing only 4.3% of the global population. We also find that central and southern Asia is the most underrepresented region: Countries such as India, Pakistan, and Bangladesh account for more than a quarter of the world population but make up only 1.8% of human microbiome samples. These results demonstrate a critical need to ensure more global representation of participants in microbiome studies.
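The scale of the imbalance follows directly from the figures quoted in this abstract. The short Python sketch below divides each region's share of samples by its share of the world population; the helper name `representation_ratio` is introduced here for illustration, and because the abstract gives the South Asian population share only as "more than a quarter", the value computed for that region is an upper bound.

```python
def representation_ratio(sample_share, population_share):
    """Share of samples divided by share of world population (1.0 means proportional)."""
    return sample_share / population_share


# Figures quoted in the abstract (percentages).
us_ratio = representation_ratio(46.8, 4.3)

# "More than a quarter" of the world population is treated as exactly 25%,
# so this value is an upper bound on the true ratio.
south_asia_ratio_upper_bound = representation_ratio(1.8, 25.0)

# Fraction of all samples that come from the gut microbiome.
gut_fraction = 220_017 / 444_829

print(f"US: {us_ratio:.1f}x its population share")
print(f"India, Pakistan, Bangladesh: at most {south_asia_ratio_upper_bound:.3f}x")
print(f"Gut microbiome samples: {gut_fraction:.1%} of all samples")
```

On these numbers the US is sampled at roughly eleven times its population share, while the named South Asian countries are sampled at less than a tenth of theirs.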

Journal ArticleDOI
TL;DR: The authors developed a CRISPR interference chemical-genetics platform to titrate the expression of Mycobacterium tuberculosis (Mtb) genes and quantify bacterial fitness in the presence of different drugs.
Abstract: Mycobacterium tuberculosis (Mtb) infection is notoriously difficult to treat. Treatment efficacy is limited by Mtb's intrinsic drug resistance, as well as its ability to evolve acquired resistance to all antituberculars in clinical use. A deeper understanding of the bacterial pathways that influence drug efficacy could facilitate the development of more effective therapies, identify new mechanisms of acquired resistance, and reveal overlooked therapeutic opportunities. Here we developed a CRISPR interference chemical-genetics platform to titrate the expression of Mtb genes and quantify bacterial fitness in the presence of different drugs. We discovered diverse mechanisms of intrinsic drug resistance, unveiling hundreds of potential targets for synergistic drug combinations. Combining chemical genetics with comparative genomics of Mtb clinical isolates, we further identified several previously unknown mechanisms of acquired drug resistance, one of which is associated with a multidrug-resistant tuberculosis outbreak in South America. Lastly, we found that the intrinsic resistance factor whiB7 was inactivated in an entire Mtb sublineage endemic to Southeast Asia, presenting an opportunity to potentially repurpose the macrolide antibiotic clarithromycin to treat tuberculosis. This chemical-genetic map provides a rich resource to understand drug efficacy in Mtb and guide future tuberculosis drug development and treatment.

Journal ArticleDOI
TL;DR: The rapidly evolving spatial genomics landscape will enable generalized high-throughput genomic measurements and perturbations to be performed in the context of tissues, which will empower hypothesis generation and biological discovery and bridge the worlds of tissue biology and genomics.

Journal ArticleDOI
TL;DR: The California Conservation Genomics Project (CCGP) aims to identify geographic regions that are critical to long-term preservation of California biodiversity, prioritize those regions based on defensible genomic criteria, and provide foundational knowledge that informs management strategies at both the individual species and ecosystem levels.
Abstract: The California Conservation Genomics Project (CCGP) is a unique, critically important step forward in the use of comprehensive landscape genetic data to modernize natural resource management at a regional scale. We describe the CCGP, including all aspects of project administration, data collection, current progress, and future challenges. The CCGP will generate, analyze, and curate a single high-quality reference genome and 100-150 resequenced genomes for each of 153 species projects (representing 235 individual species) that span the ecological and phylogenetic breadth of California's marine, freshwater, and terrestrial ecosystems. The resulting portfolio of roughly 20,000 resequenced genomes will be analyzed with identical informatic and landscape genomic pipelines, providing a comprehensive overview of hotspots of within-species genomic diversity, potential and realized corridors connecting these hotspots, regions of reduced diversity requiring genetic rescue, and the distribution of variation critical for rapid climate adaptation. After two years of concerted effort, full funding ($12M USD) has been secured, species identified, and funds distributed to 68 laboratories and 114 investigators drawn from all 10 University of California campuses. The remaining phases of the CCGP include completion of data collection and analyses, and delivery of the resulting genomic data and inferences to state and federal regulatory agencies to help stabilize species declines. The aspirational goals of the CCGP are to identify geographic regions that are critical to long term preservation of California biodiversity, prioritize those regions based on defensible genomic criteria, and provide foundational knowledge that informs management strategies at both the individual species and ecosystem levels.
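The "roughly 20,000 resequenced genomes" figure can be checked against the per-project numbers given in this abstract. The Python sketch below only multiplies the quoted values; the variable names are illustrative.

```python
species_projects = 153                 # species projects, from the abstract
resequenced_per_project = (100, 150)   # range of resequenced genomes per project

low = species_projects * resequenced_per_project[0]
high = species_projects * resequenced_per_project[1]
midpoint = (low + high) // 2

# One high-quality reference genome is planned per species project.
reference_genomes = species_projects

print(f"Resequenced genomes: {low:,} to {high:,} (midpoint {midpoint:,})")
print(f"Reference genomes: {reference_genomes}")
```

The midpoint of the range sits close to the abstract's rounded total, so the stated portfolio size is consistent with the per-project plan.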

Journal ArticleDOI
TL;DR: This paper presents a high-quality reference genome of cultivated oat (Avena sativa) and of close relatives of its diploid and tetraploid progenitors, revealing the mosaic structure of the oat genome, tracing large-scale genomic reorganizations in the polyploidization history of oat, and illustrating a breeding barrier associated with the genome architecture of oat.
Abstract: Cultivated oat (Avena sativa L.) is an allohexaploid (AACCDD, 2n = 6x = 42) thought to have been domesticated more than 3,000 years ago while growing as a weed in wheat, emmer and barley fields in Anatolia [1,2]. Oat has a low carbon footprint, substantial health benefits and the potential to replace animal-based food products. However, the lack of a fully annotated reference genome has hampered efforts to deconvolute its complex evolutionary history and functional gene dynamics. Here we present a high-quality reference genome of A. sativa and close relatives of its diploid (Avena longiglumis, AA, 2n = 14) and tetraploid (Avena insularis, CCDD, 2n = 4x = 28) progenitors. We reveal the mosaic structure of the oat genome, trace large-scale genomic reorganizations in the polyploidization history of oat and illustrate a breeding barrier associated with the genome architecture of oat. We showcase detailed analyses of gene families implicated in human health and nutrition, which adds to the evidence supporting oat safety in gluten-free diets, and we perform mapping-by-sequencing of an agronomic trait related to water-use efficiency. This resource for the Avena genus will help to leverage knowledge from other cereal genomes, improve understanding of basic oat biology and accelerate genomics-assisted breeding and reanalysis of quantitative trait studies.

Journal ArticleDOI
TL;DR: ChromoMap as discussed by the authors is an R package for the construction of interactive visualizations of chromosomes/chromosomal regions, mapping of any chromosomal feature with known coordinates (i.e., protein coding genes, transposable elements, non-coding RNAs, microsatellites, etc.).
Abstract: The recent advancements in high-throughput sequencing have resulted in the availability of annotated genomes, as well as of multi-omics data for many living organisms. This has increased the need for graphic tools that allow the concurrent visualization of genomes and feature-associated multi-omics data on single publication-ready plots. We present chromoMap, an R package, developed for the construction of interactive visualizations of chromosomes/chromosomal regions, mapping of any chromosomal feature with known coordinates (i.e., protein coding genes, transposable elements, non-coding RNAs, microsatellites, etc.), and chromosomal regional characteristics (i.e. genomic feature density, gene expression, DNA methylation, chromatin modifications, etc.) of organisms with a genome assembly. ChromoMap can also integrate multi-omics data (genomics, transcriptomics and epigenomics) in relation to their occurrence across chromosomes. ChromoMap takes tab-delimited files (BED like) or alternatively R objects to specify the genomic co-ordinates of the chromosomes and elements to annotate. Rendered chromosomes are composed of continuous windows of a given range, which, on hover, display detailed information about the elements annotated within that range. By adjusting parameters of a single function, users can generate a variety of plots that can either be saved as static images or as HTML documents. ChromoMap's flexibility allows for concurrent visualization of genomic data in each strand of a given chromosome, or of more than one homologous chromosome; allowing the comparison of multi-omic data between genotypes (e.g. species, varieties, etc.) or between homologous chromosomes of phased diploid/polyploid genomes. chromoMap is an extensive tool that can be potentially used in various bioinformatics analysis pipelines for genomic visualization of multi-omics data.

Journal ArticleDOI
TL;DR: In this article, 32 high-quality reference genomes were generated for the well-known model species D. melanogaster, with a focus on identifying and analyzing transposable element variation, the most common type of structural variant.
Abstract: High quality reference genomes are crucial to understanding genome function, structure and evolution. The availability of reference genomes has allowed us to start inferring the role of genetic variation in biology, disease, and biodiversity conservation. However, analyses across organisms demonstrate that a single reference genome is not enough to capture the global genetic diversity present in populations. In this work, we generate 32 high-quality reference genomes for the well-known model species D. melanogaster and focus on the identification and analysis of transposable element variation as they are the most common type of structural variant. We show that integrating the genetic variation across natural populations from five climatic regions increases the number of detected insertions by 58%. Moreover, 26% to 57% of the insertions identified using long-reads were missed by short-reads methods. We also identify hundreds of transposable elements associated with gene expression variation and new TE variants likely to contribute to adaptive evolution in this species. Our results highlight the importance of incorporating the genetic variation present in natural populations to genomic studies, which is essential if we are to understand how genomes function and evolve.


Journal ArticleDOI
TL;DR: In this article, the authors argue for the importance of defining a biologically relevant baseline model tuned to the details of each new analysis, of skepticism and scrutiny in interpreting model fitting results, and of carefully defining addressable hypotheses and underlying uncertainties.
Abstract: The field of population genomics has grown rapidly in response to the recent advent of affordable, large-scale sequencing technologies. As opposed to the situation during the majority of the 20th century, in which the development of theoretical and statistical population genetic insights outpaced the generation of data to which they could be applied, genomic data are now being produced at a far greater rate than they can be meaningfully analyzed and interpreted. With this wealth of data has come a tendency to focus on fitting specific (and often rather idiosyncratic) models to data, at the expense of a careful exploration of the range of possible underlying evolutionary processes. For example, the approach of directly investigating models of adaptive evolution in each newly sequenced population or species often neglects the fact that a thorough characterization of ubiquitous nonadaptive processes is a prerequisite for accurate inference. We here describe the perils of these tendencies, present our consensus views on current best practices in population genomic data analysis, and highlight areas of statistical inference and theory that are in need of further attention. Thereby, we argue for the importance of defining a biologically relevant baseline model tuned to the details of each new analysis, of skepticism and scrutiny in interpreting model fitting results, and of carefully defining addressable hypotheses and underlying uncertainties.

BookDOI
10 Feb 2022
TL;DR: The third edition has been thoroughly revised to include advances in genomics and contains new chapters on population genomics, genetic monitoring, and conservation genetics in practice, as well as new sections on climate change, emerging diseases, metagenomics, and more.
Abstract: Loss of biodiversity is among the greatest problems facing the world today. Conservation and the Genomics of Populations gives a comprehensive overview of the essential background, concepts, and tools needed to understand how genetic information can be used to conserve species threatened with extinction, and to manage species of ecological or commercial importance. New molecular techniques, statistical methods, and computer programs, genetic principles, and methods are becoming increasingly useful in the conservation of biological diversity. Using a balance of data and theory, coupled with basic and applied research examples, this book examines genetic and phenotypic variation in natural populations, the principles and mechanisms of evolutionary change, the interpretation of genetic data from natural populations, and how these can be applied to conservation. The book includes examples from plants, animals, and microbes in wild and captive populations. This third edition has been thoroughly revised to include advances in genomics and contains new chapters on population genomics, genetic monitoring, and conservation genetics in practice, as well as new sections on climate change, emerging diseases, metagenomics, and more. More than one-third of the references in this edition were published after the previous edition. Each of the 24 chapters and the Appendix end with a Guest Box written by an expert who provides an example of the principles presented in the chapter from their own work. This book is essential for advanced undergraduate and graduate students of conservation genetics, natural resource management, and conservation biology, as well as professional conservation biologists and policy-makers working for wildlife and habitat management agencies. Much of the book will also interest nonprofessionals who are curious about the role of genetics in conservation and management of wild and captive populations.

Proceedings ArticleDOI
28 Feb 2022
TL;DR: GenStore is proposed, the first in-storage processing system designed for genome sequence analysis that greatly reduces both data movement and computational overheads of genome sequence analysis by exploiting low-cost and accurate in-storage filters.
Abstract: Read mapping is a fundamental step in many genomics applications. It is used to identify potential matches and differences between fragments (called reads) of a sequenced genome and an already known genome (called a reference genome). Read mapping is costly because it needs to perform approximate string matching (ASM) on large amounts of data. To address the computational challenges in genome analysis, many prior works propose various approaches such as accurate filters that select the reads within a dataset of genomic reads (called a read set) that must undergo expensive computation, efficient heuristics, and hardware acceleration. While effective at reducing the amount of expensive computation, all such approaches still require the costly movement of a large amount of data from storage to the rest of the system, which can significantly lower the end-to-end performance of read mapping in conventional and emerging genomics systems. We propose GenStore, the first in-storage processing system designed for genome sequence analysis that greatly reduces both data movement and computational overheads of genome sequence analysis by exploiting low-cost and accurate in-storage filters. GenStore leverages hardware/software co-design to address the challenges of in-storage processing, supporting reads with 1) different properties such as read lengths and error rates, which highly depend on the sequencing technology, and 2) different degrees of genetic variation compared to the reference genome, which highly depends on the genomes that are being compared. Through rigorous analysis of read mapping processes of reads with different properties and degrees of genetic variation, we meticulously design low-cost hardware accelerators and data/computation flows inside a NAND flash-based solid-state drive (SSD). 
Our evaluation using a wide range of real genomic datasets shows that GenStore, when implemented in three modern NAND flash-based SSDs, significantly improves the read mapping performance of state-of-the-art software (hardware) baselines by 2.07-6.05× (1.52-3.32×) for read sets with high similarity to the reference genome and 1.45-33.63× (2.70-19.2×) for read sets with low similarity to the reference genome.
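The filtering idea described in this abstract can be illustrated with a minimal sketch. It is not GenStore's design: it assumes, purely for illustration, that the filter drops reads that match the reference exactly, so that only the remaining reads would go on to approximate string matching. The names `build_kmer_index` and `filter_reads` are hypothetical, and everything runs in ordinary host memory rather than inside an SSD.

```python
from typing import Iterable, List, Set


def build_kmer_index(reference: str, read_len: int) -> Set[str]:
    """Collect every substring of length read_len found in the reference."""
    return {reference[i:i + read_len] for i in range(len(reference) - read_len + 1)}


def filter_reads(reads: Iterable[str], reference: str, read_len: int) -> List[str]:
    """Return only the reads that do NOT match the reference exactly.

    Exactly matching reads need no approximate string matching, so in this
    illustrative scheme they are filtered out before the expensive step.
    Reads of a different length are kept, since the index cannot judge them.
    """
    index = build_kmer_index(reference, read_len)
    return [r for r in reads if len(r) != read_len or r not in index]


# Toy data (hypothetical): three reads match the reference exactly, one does not.
reference = "ACGTACGTTAGC"
reads = ["ACGT", "CGTT", "ACGA", "TTAG"]
remaining = filter_reads(reads, reference, read_len=4)
print(remaining)  # ['ACGA']
```

In this toy case only one of the four reads would still need approximate matching. That is the kind of reduction in downstream work that a filter aims for.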

Journal ArticleDOI
TL;DR: In this paper, the authors discuss how developments in machine learning methodologies contributed to more accurate base calling and lower error rates, and how these methods enable new biological discoveries, and highlight challenges and future directions for computational approaches to extract the additional information provided by nanopore signal data.


Journal ArticleDOI
01 May 2022-Cell
TL;DR: In this paper, the authors summarize new discoveries made during the past decade in crop domestication and breeding, including the construction of crop genome maps and the functional characterization of numerous trait genes.

Journal ArticleDOI
TL;DR: This review summarizes the current status of the integration of high-throughput phenotyping and GWAS in plants and discusses the inherent challenges and future prospects.

Journal ArticleDOI
TL;DR: This paper presents a high-coverage whole-genome sequencing (WGS) dataset of 1,171 highly admixed elderly Brazilians from a census-based cohort, providing over 76 million variants, of which ~2 million are absent from large public databases.
Abstract: As whole-genome sequencing (WGS) becomes the gold standard tool for studying population genomics and medical applications, data on diverse non-European and admixed individuals are still scarce. Here, we present a high-coverage WGS dataset of 1,171 highly admixed elderly Brazilians from a census-based cohort, providing over 76 million variants, of which ~2 million are absent from large public databases. WGS enables identification of ~2,000 previously undescribed mobile element insertions, nearly 5 Mb of genomic segments absent from the human genome reference, and over 140 alleles from HLA genes absent from public resources. We reclassify and curate pathogenicity assertions for nearly four hundred variants in genes associated with dominantly-inherited Mendelian disorders and calculate the incidence for selected recessive disorders, demonstrating the clinical usefulness of the present study. Finally, we observe that whole-genome and HLA imputation could be significantly improved compared to available datasets since rare variation represents the largest proportion of input from WGS. These results demonstrate that even smaller sample sizes of underrepresented populations bring relevant data for genomic studies, especially when exploring analyses allowed only by WGS.