Showing papers on "Genomics" published in 2021


Journal Article
Daniel Taliun1, Daniel N. Harris2, Michael D. Kessler2, Jedidiah Carlson1, +202 more
Institutions (61)
10 Feb 2021-Nature
TL;DR: The Trans-Omics for Precision Medicine (TOPMed) programme aims to elucidate the genetic architecture and biology of heart, lung, blood and sleep disorders, with the ultimate goal of improving diagnosis, treatment and prevention of these diseases.
Abstract: The Trans-Omics for Precision Medicine (TOPMed) programme seeks to elucidate the genetic architecture and biology of heart, lung, blood and sleep disorders, with the ultimate goal of improving diagnosis, treatment and prevention of these diseases. The initial phases of the programme focused on whole-genome sequencing of individuals with rich phenotypic data and diverse backgrounds. Here we describe the TOPMed goals and design as well as the available resources and early insights obtained from the sequence data. The resources include a variant browser, a genotype imputation server, and genomic and phenotypic data that are available through dbGaP (Database of Genotypes and Phenotypes)1. In the first 53,831 TOPMed samples, we detected more than 400 million single-nucleotide and insertion or deletion variants after alignment with the reference genome. Additional previously undescribed variants were detected through assembly of unmapped reads and customized analysis in highly variable loci. Among the more than 400 million detected variants, 97% have frequencies of less than 1% and 46% are singletons that are present in only one individual (53% among unrelated individuals). These rare variants provide insights into mutational processes and recent human evolutionary history. The extensive catalogue of genetic variation in TOPMed studies provides unique opportunities for exploring the contributions of rare and noncoding sequence variants to phenotypic variation. Furthermore, combining TOPMed haplotypes with modern imputation methods improves the power and reach of genome-wide association studies to include variants down to a frequency of approximately 0.01%. The goals, resources and design of the NHLBI Trans-Omics for Precision Medicine (TOPMed) programme are described, and analyses of rare variants detected in the first 53,831 samples provide insights into mutational processes and recent human evolutionary history.

801 citations
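
The frequency figures quoted in the TOPMed abstract above (97% of variants below 1% frequency, 46% singletons) are summaries over per-variant allele counts. The sketch below shows how such fractions could be computed; it is not the TOPMed pipeline. The function name, the input format, and the simplification that a singleton is a variant with an alternate allele count of exactly 1 in diploid samples are all assumptions made for illustration.

```python
def summarize_allele_counts(allele_counts, n_samples):
    """Summarize rare-variant fractions from alternate allele counts.

    allele_counts: one alternate allele count per variant site (hypothetical input).
    n_samples: number of diploid individuals, so 2 * n_samples chromosomes.
    """
    n_chromosomes = 2 * n_samples
    # Ignore monomorphic sites: only sites where the alternate allele was seen.
    observed = [ac for ac in allele_counts if ac > 0]
    if not observed:
        raise ValueError("no variant sites with a non-zero allele count")

    # Assumption: a singleton is an alternate allele seen exactly once.
    singletons = sum(1 for ac in observed if ac == 1)
    # Rare here means allele frequency strictly below 1%.
    rare = sum(1 for ac in observed if ac / n_chromosomes < 0.01)

    return {
        "singleton_fraction": singletons / len(observed),
        "rare_fraction": rare / len(observed),
    }
```

For example, with 100 individuals and allele counts `[1, 1, 3, 50, 150]`, two of the five variants are singletons and the same two fall below 1% frequency.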


Journal Article
Arang Rhie1, Shane A. McCarthy2, Shane A. McCarthy3, Olivier Fedrigo4, Joana Damas5, Giulio Formenti4, Sergey Koren1, Marcela Uliano-Silva6, William Chow3, Arkarachai Fungtammasan, J. H. Kim7, Chul Hee Lee7, Byung June Ko7, Mark Chaisson8, Gregory Gedman4, Lindsey J. Cantin4, Françoise Thibaud-Nissen1, Leanne Haggerty9, Iliana Bista2, Iliana Bista3, Michelle Smith3, Bettina Haase4, Jacquelyn Mountcastle4, Sylke Winkler10, Sylke Winkler11, Sadye Paez4, Jason T. Howard, Sonja C. Vernes12, Sonja C. Vernes11, Sonja C. Vernes13, Tanya M. Lama14, Frank Grützner15, Wesley C. Warren16, Christopher N. Balakrishnan17, Dave W Burt18, Jimin George19, Matthew T. Biegler4, David Iorns, Andrew Digby, Daryl Eason, Bruce C. Robertson20, Taylor Edwards21, Mark Wilkinson22, George F. Turner23, Axel Meyer24, Andreas F. Kautt24, Andreas F. Kautt25, Paolo Franchini24, H. William Detrich26, Hannes Svardal27, Hannes Svardal28, Maximilian Wagner29, Gavin J. P. Naylor30, Martin Pippel11, Milan Malinsky31, Milan Malinsky3, Mark Mooney, Maria Simbirsky, Brett T. Hannigan, Trevor Pesout32, Marlys L. Houck33, Ann C Misuraca33, Sarah B. Kingan34, Richard Hall34, Zev N. Kronenberg34, Ivan Sović34, Christopher Dunn34, Zemin Ning3, Alex Hastie, Joyce V. Lee, Siddarth Selvaraj, Richard E. Green32, Nicholas H. Putnam, Ivo Gut35, Jay Ghurye36, Erik Garrison32, Ying Sims3, Joanna Collins3, Sarah Pelan3, James Torrance3, Alan Tracey3, Jonathan Wood3, Robel E. Dagnew8, Dengfeng Guan37, Dengfeng Guan2, Sarah E. London38, David F. Clayton19, Claudio V. Mello39, Samantha R. Friedrich39, Peter V. Lovell39, Ekaterina Osipova11, Farooq O. Al-Ajli40, Farooq O. Al-Ajli41, Simona Secomandi42, Heebal Kim7, Constantina Theofanopoulou4, Michael Hiller43, Yang Zhou, Robert S. Harris44, Kateryna D. Makova44, Paul Medvedev44, Jinna Hoffman1, Patrick Masterson1, Karen Clark1, Fergal J. Martin9, Kevin L. Howe9, Paul Flicek9, Brian P. 
Walenz1, Woori Kwak, Hiram Clawson32, Mark Diekhans32, Luis R Nassar32, Benedict Paten32, Robert H. S. Kraus11, Robert H. S. Kraus24, Andrew J. Crawford45, M. Thomas P. Gilbert46, M. Thomas P. Gilbert47, Guojie Zhang, Byrappa Venkatesh48, Robert W. Murphy49, Klaus-Peter Koepfli50, Beth Shapiro32, Beth Shapiro51, Warren E. Johnson52, Warren E. Johnson50, Federica Di Palma53, Tomas Marques-Bonet, Emma C. Teeling54, Tandy Warnow55, Jennifer A. Marshall Graves56, Oliver A. Ryder57, Oliver A. Ryder33, David Haussler32, Stephen J. O'Brien58, Jonas Korlach34, Harris A. Lewin5, Kerstin Howe3, Eugene W. Myers11, Eugene W. Myers10, Richard Durbin2, Richard Durbin3, Adam M. Phillippy1, Erich D. Jarvis51, Erich D. Jarvis4 
National Institutes of Health1, University of Cambridge2, Wellcome Trust Sanger Institute3, Rockefeller University4, University of California, Davis5, Leibniz Association6, Seoul National University7, University of Southern California8, European Bioinformatics Institute9, Dresden University of Technology10, Max Planck Society11, University of St Andrews12, Radboud University Nijmegen13, University of Massachusetts Amherst14, University of Adelaide15, University of Missouri16, East Carolina University17, University of Queensland18, Clemson University19, University of Otago20, University of Arizona21, Natural History Museum22, Bangor University23, University of Konstanz24, Harvard University25, Northeastern University26, University of Antwerp27, National Museum of Natural History28, University of Graz29, University of Florida30, University of Basel31, University of California, Santa Cruz32, Zoological Society of San Diego33, Pacific Biosciences34, Pompeu Fabra University35, University of Maryland, College Park36, Harbin Institute of Technology37, University of Chicago38, Oregon Health & Science University39, Monash University Malaysia Campus40, Qatar Airways41, University of Milan42, Goethe University Frankfurt43, Pennsylvania State University44, University of Los Andes45, University of Copenhagen46, Norwegian University of Science and Technology47, Agency for Science, Technology and Research48, Royal Ontario Museum49, Smithsonian Institution50, Howard Hughes Medical Institute51, Walter Reed Army Institute of Research52, University of East Anglia53, University College Dublin54, University of Illinois at Urbana–Champaign55, La Trobe University56, University of California, San Diego57, Nova Southeastern University58
28 Apr 2021-Nature
TL;DR: The Vertebrate Genomes Project (VGP) is an international effort to generate high-quality, complete reference genomes for all of the roughly 70,000 extant vertebrate species and to help enable a new era of discovery across the life sciences.
Abstract: High-quality and complete reference genome assemblies are fundamental for the application of genomics to biology, disease, and biodiversity conservation. However, such assemblies are available for only a few non-microbial species1-4. To address this issue, the international Genome 10K (G10K) consortium5,6 has worked over a five-year period to evaluate and develop cost-effective methods for assembling highly accurate and nearly complete reference genomes. Here we present lessons learned from generating assemblies for 16 species that represent six major vertebrate lineages. We confirm that long-read sequencing technologies are essential for maximizing genome quality, and that unresolved complex repeats and haplotype heterozygosity are major sources of assembly error when not handled correctly. Our assemblies correct substantial errors, add missing sequence in some of the best historical reference genomes, and reveal biological discoveries. These include the identification of many false gene duplications, increases in gene sizes, chromosome rearrangements that are specific to lineages, a repeated independent chromosome breakpoint in bat genomes, and a canonical GC-rich pattern in protein-coding genes and their regulatory regions. Adopting these lessons, we have embarked on the Vertebrate Genomes Project (VGP), an international effort to generate high-quality, complete reference genomes for all of the roughly 70,000 extant vertebrate species and to help to enable a new era of discovery across the life sciences.

647 citations



Journal Article
19 Jan 2021-Mbio
TL;DR: In this paper, the authors used a pipeline for single nucleotide variant calling in a metagenomic context to characterize minor SARS-CoV-2 alleles in wastewater, and detected viral genotypes that were also found within clinical genomes throughout California.
Abstract: Viral genome sequencing has guided our understanding of the spread and extent of genetic diversity of SARS-CoV-2 during the COVID-19 pandemic. SARS-CoV-2 viral genomes are usually sequenced from nasopharyngeal swabs of individual patients to track viral spread. Recently, RT-qPCR of municipal wastewater has been used to quantify the abundance of SARS-CoV-2 in several regions globally. However, metatranscriptomic sequencing of wastewater can be used to profile the viral genetic diversity across infected communities. Here, we sequenced RNA directly from sewage collected by municipal utility districts in the San Francisco Bay Area to generate complete and nearly complete SARS-CoV-2 genomes. The major consensus SARS-CoV-2 genotypes detected in the sewage were identical to clinical genomes from the region. Using a pipeline for single nucleotide variant calling in a metagenomic context, we characterized minor SARS-CoV-2 alleles in the wastewater and detected viral genotypes which were also found within clinical genomes throughout California. Observed wastewater variants were more similar to local California patient-derived genotypes than they were to those from other regions within the United States or globally. Additional variants detected in wastewater have only been identified in genomes from patients sampled outside California, indicating that wastewater sequencing can provide evidence for recent introductions of viral lineages before they are detected by local clinical sequencing. These results demonstrate that epidemiological surveillance through wastewater sequencing can aid in tracking exact viral strains in an epidemic context.

235 citations


Journal Article
07 Apr 2021-Nature
TL;DR: In this article, the activity-by-contact (ABC) model was applied to create enhancer–gene maps in 131 human cell types and tissues, and these maps were used to interpret the functions of GWAS variants.
Abstract: Genome-wide association studies (GWAS) have identified thousands of noncoding loci that are associated with human diseases and complex traits, each of which could reveal insights into the mechanisms of disease1. Many of the underlying causal variants may affect enhancers2,3, but we lack accurate maps of enhancers and their target genes to interpret such variants. We recently developed the activity-by-contact (ABC) model to predict which enhancers regulate which genes and validated the model using CRISPR perturbations in several cell types4. Here we apply this ABC model to create enhancer–gene maps in 131 human cell types and tissues, and use these maps to interpret the functions of GWAS variants. Across 72 diseases and complex traits, ABC links 5,036 GWAS signals to 2,249 unique genes, including a class of 577 genes that appear to influence multiple phenotypes through variants in enhancers that act in different cell types. In inflammatory bowel disease (IBD), causal variants are enriched in predicted enhancers by more than 20-fold in particular cell types such as dendritic cells, and ABC achieves higher precision than other regulatory methods at connecting noncoding variants to target genes. These variant-to-function maps reveal an enhancer that contains an IBD risk variant and that regulates the expression of PPIF to alter the membrane potential of mitochondria in macrophages. Our study reveals principles of genome regulation, identifies genes that affect IBD and provides a resource and generalizable strategy to connect risk variants of common diseases to their molecular and cellular functions. Mapping enhancer regulation across human cell types and tissues illuminates genome function and provides a resource to connect risk variants for common diseases to their molecular and cellular functions.

233 citations
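
The abstract above names the activity-by-contact (ABC) model but does not state its formula. As recalled from the earlier description of the model, the score of an element for a gene is that element's activity times its contact frequency with the gene's promoter, divided by the sum of the same product over all candidate elements near the gene. The sketch below illustrates that normalization only; the function name, the dictionary inputs and the example values are hypothetical, and the published method also specifies how activity and contact are measured, which is not covered here.

```python
def abc_scores(activity, contact):
    """Compute ABC-style scores for candidate elements of a single gene.

    activity: dict mapping element id to an activity estimate (hypothetical input).
    contact: dict mapping element id to contact frequency with the gene's promoter.
    Returns a dict mapping element id to its share of the total activity * contact.
    """
    if activity.keys() != contact.keys():
        raise ValueError("activity and contact must cover the same elements")

    # Activity-by-contact product for each candidate element.
    products = {elem: activity[elem] * contact[elem] for elem in activity}
    total = sum(products.values())
    if total == 0:
        # No element has any predicted effect on this gene.
        return {elem: 0.0 for elem in products}

    # Normalize so the scores for one gene sum to 1.
    return {elem: value / total for elem, value in products.items()}
```

With activities `{"e1": 4.0, "e2": 1.0}` and contacts `{"e1": 0.5, "e2": 2.0}`, both elements have a product of 2.0, so each receives a score of 0.5: a weak element in frequent contact can matter as much as a strong element in rare contact.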


Journal Article
07 Apr 2021-Nature
TL;DR: In this article, the authors used complementary long-read sequencing technologies to complete the linear assembly of human chromosome 8, including a 2.08-Mb centromeric α-satellite array, a 644-kb copy number polymorphism in the β-defensin gene cluster that is important for disease risk, and an 863-kb variable number tandem repeat at chromosome 8q21.2 that can function as a neocentromere.
Abstract: The complete assembly of each human chromosome is essential for understanding human biology and evolution1,2. Here we use complementary long-read sequencing technologies to complete the linear assembly of human chromosome 8. Our assembly resolves the sequence of five previously long-standing gaps, including a 2.08-Mb centromeric α-satellite array, a 644-kb copy number polymorphism in the β-defensin gene cluster that is important for disease risk, and an 863-kb variable number tandem repeat at chromosome 8q21.2 that can function as a neocentromere. We show that the centromeric α-satellite array is generally methylated except for a 73-kb hypomethylated region of diverse higher-order α-satellites enriched with CENP-A nucleosomes, consistent with the location of the kinetochore. In addition, we confirm the overall organization and methylation pattern of the centromere in a diploid human genome. Using a dual long-read sequencing approach, we complete high-quality draft assemblies of the orthologous centromere from chromosome 8 in chimpanzee, orangutan and macaque to reconstruct its evolutionary history. Comparative and phylogenetic analyses show that the higher-order α-satellite structure evolved in the great ape ancestor with a layered symmetry, in which more ancient higher-order repeats locate peripherally to monomeric α-satellites. We estimate that the mutation rate of centromeric satellite DNA is accelerated by more than 2.2-fold compared to the unique portions of the genome, and this acceleration extends into the flanking sequence. The complete assembly of human chromosome 8 resolves previous gaps and reveals hidden complex forms of genetic variation, enabling functional and evolutionary characterization of primate centromeres.

174 citations


Journal Article
01 Jan 2021-Genomics
TL;DR: The most critical findings related to the genetics of SARS-CoV-2 are reviewed, with a specific focus on genetic diversity and reported mutations, molecular-based diagnostic assays, the use of interfering RNA technology for the treatment of patients, and genetic-related vaccination strategies.

123 citations


Posted Content
Sergey Nurk1, Sergey Koren1, Arang Rhie1, Rautiainen M1, Andrey Bzikadze2, Alla Mikheenko3, Mitchell R. Vollger4, Nicolas Altemose5, Lev Uralsky, Ariel Gershman6, Sergey Aganezov6, Hoyt Sj7, Mark Diekhans8, Glennis A. Logsdon4, Michael Alonge6, Stylianos E. Antonarakis9, Borchers M10, Gerry Bouffard1, Shelise Brooks1, Caldas Gv5, Hwei-Ling Cheng11, Chen-Shan Chin, William Chow12, de Lima Lg10, Philip C. Dishuck4, Richard Durbin13, Tatiana Dvorkina3, Ian T. Fiddes, Giulio Formenti14, Robert S. Fulton15, Arkarachai Fungtammasan, Erik Garrison16, P. G. S. Grady7, Tina A. Graves-Lindsay15, Ira M. Hall17, Nancy F. Hansen1, Gabrielle A. Hartley7, Marina Haukness8, Kerstin Howe12, Michael W. Hunkapiller18, Chirag Jain1, Miten Jain8, Erich D. Jarvis14, Peter Kerpedjiev, Melanie Kirsche6, Mikhail Kolmogorov2, Jonas Korlach18, Milinn Kremitzki15, Huiyan Li11, Valerie Maduro1, Tobias Marschall19, Ann McCartney1, Jennifer McDaniel20, Danny E. Miller4, Jim C. Mullikin1, Eugene W. Myers21, Nathan D. Olson20, Benedict Paten8, Paul Peluso18, Pavel A. Pevzner2, David Porubsky4, Tamara A. Potapova10, Evgeny I. Rogaev, Jill A. Rosenfeld, Steven L. Salzberg6, Valerie A. Schneider1, Fritz J. Sedlazeck22, Kishwar Shafin8, Colin J. Shew23, Alaina Shumate6, Ying Sims12, Smit Afa24, Daniela C. Soto23, Ivan Sović18, Jessica M. Storer24, Aaron M. Streets5, Beth A. Sullivan25, Françoise Thibaud-Nissen1, James Torrance12, Justin Wagner20, Brian P. Walenz1, Aaron M. Wenger18, Wood Jmd12, Chunlin Xiao1, Stephanie M Yan6, Alice Young1, Samantha Zarate6, Urvashi Surti26, Rajiv C. McCoy6, Megan Y. Dennis23, Ivan Alexandrov27, Ivan Alexandrov3, Jennifer L. Gerton10, Rachel J. O’Neill7, Winston Timp6, Justin M. Zook20, Michael C. Schatz6, Evan E. Eichler4, Karen H. Miga8, Adam M. Phillippy1 
27 May 2021-bioRxiv
TL;DR: The T2T-CHM13 reference includes gapless assemblies for all 22 autosomes plus Chromosome X, corrects numerous errors, and introduces nearly 200 million bp of novel sequence containing 2,226 paralogous gene copies, 115 of which are predicted to be protein coding.
Abstract: In 2001, Celera Genomics and the International Human Genome Sequencing Consortium published their initial drafts of the human genome, which revolutionized the field of genomics. While these drafts and the updates that followed effectively covered the euchromatic fraction of the genome, the heterochromatin and many other complex regions were left unfinished or erroneous. Addressing this remaining 8% of the genome, the Telomere-to-Telomere (T2T) Consortium has finished the first truly complete 3.055 billion base pair (bp) sequence of a human genome, representing the largest improvement to the human reference genome since its initial release. The new T2T-CHM13 reference includes gapless assemblies for all 22 autosomes plus Chromosome X, corrects numerous errors, and introduces nearly 200 million bp of novel sequence containing 2,226 paralogous gene copies, 115 of which are predicted to be protein coding. The newly completed regions include all centromeric satellite arrays and the short arms of all five acrocentric chromosomes, unlocking these complex regions of the genome to variational and functional studies for the first time.

108 citations


Journal Article
TL;DR: The use of single-cell sequencing in cancer research has revolutionized our understanding of the biological characteristics and dynamics within cancer lesions, including the landscapes of malignant cells and immune cells, tumor heterogeneity, circulating tumor cells and the underlying mechanisms of tumor biological behaviors.
Abstract: Single-cell sequencing, including genomics, transcriptomics, epigenomics, proteomics and metabolomics sequencing, is a powerful tool to decipher the cellular and molecular landscape at a single-cell resolution, unlike bulk sequencing, which provides averaged data. The use of single-cell sequencing in cancer research has revolutionized our understanding of the biological characteristics and dynamics within cancer lesions. In this review, we summarize emerging single-cell sequencing technologies and recent cancer research progress obtained by single-cell sequencing, including information related to the landscapes of malignant cells and immune cells, tumor heterogeneity, circulating tumor cells and the underlying mechanisms of tumor biological behaviors. Overall, the prospects of single-cell sequencing in facilitating diagnosis, targeted therapy and prognostic prediction among a spectrum of tumors are bright. In the near future, advances in single-cell sequencing will undoubtedly improve our understanding of the biological characteristics of tumors and highlight potential precise therapeutic targets for patients.

103 citations



Journal Article
TL;DR: In this article, the authors review advances in crop genomics and how utilization of these tools is shifting in light of pan-genomes that are becoming available for many crop species.
Abstract: Crop genomics has seen dramatic advances in recent years due to improvements in sequencing technology, assembly methods, and computational resources. These advances have led to the development of new tools to facilitate crop improvement. The study of structural variation within species and the characterization of the pan-genome has revealed extensive genome content variation among individuals within a species that is paradigm shifting to crop genomics and improvement. Here, we review advances in crop genomics and how utilization of these tools is shifting in light of pan-genomes that are becoming available for many crop species.

Journal Article
17 Feb 2021-PLOS ONE
TL;DR: In this article, the authors used the COVIDSeq protocol, which involves multiplex-PCR, barcoding, and sequencing of samples for high-throughput detection and deciphering the genetic epidemiology of SARS-CoV-2.
Abstract: The rapid emergence of coronavirus disease 2019 (COVID-19) as a global pandemic affecting millions of individuals globally has necessitated sensitive and high-throughput approaches for the diagnosis, surveillance, and determining the genetic epidemiology of SARS-CoV-2. In the present study, we used the COVIDSeq protocol, which involves multiplex-PCR, barcoding, and sequencing of samples for high-throughput detection and deciphering the genetic epidemiology of SARS-CoV-2. We used the approach on 752 clinical samples in duplicates, amounting to a total of 1536 samples which could be sequenced on a single S4 sequencing flow cell on NovaSeq 6000. Our analysis suggests a high concordance between technical duplicates and a high concordance of detection of SARS-CoV-2 between the COVIDSeq as well as RT-PCR approaches. An in-depth analysis revealed a total of six samples in which COVIDSeq detected SARS-CoV-2 in high confidence which were negative in RT-PCR. Additionally, the assay could detect SARS-CoV-2 in 21 samples and 16 samples which were classified inconclusive and pan-sarbeco positive respectively suggesting that COVIDSeq could be used as a confirmatory test. The sequencing approach also enabled insights into the evolution and genetic epidemiology of the SARS-CoV-2 samples. The samples were classified into a total of 3 clades. This study reports two lineages B.1.112 and B.1.99 for the first time in India. This study also revealed 1,143 unique single nucleotide variants and added a total of 73 novel variants identified for the first time. To the best of our knowledge, this is the first report of the COVIDSeq approach for detection and genetic epidemiology of SARS-CoV-2. Our analysis suggests that COVIDSeq could be a potential high sensitivity assay for the detection of SARS-CoV-2, with an additional advantage of enabling the genetic epidemiology of SARS-CoV-2.

Journal Article
TL;DR: Unintended editing outcomes observed in CRISPR-Cas9-targeted human embryos are consistent with recent findings indicating complexity at on-target sites following CRISPR-Cas9 genome editing, and underscore the importance of further basic research to assess the safety of genome editing techniques in human embryos, which will inform debates about the potential clinical use of this technology.
Abstract: CRISPR-Cas9 genome editing is a promising technique for clinical applications, such as the correction of disease-associated alleles in somatic cells. The use of this approach has also been discussed in the context of heritable editing of the human germ line. However, studies assessing gene correction in early human embryos report low efficiency of mutation repair, high rates of mosaicism, and the possibility of unintended editing outcomes that may have pathologic consequences. We developed computational pipelines to assess single-cell genomics and transcriptomics datasets from OCT4 (POU5F1) CRISPR-Cas9-targeted and control human preimplantation embryos. This allowed us to evaluate on-target mutations that would be missed by more conventional genotyping techniques. We observed loss of heterozygosity in edited cells that spanned regions beyond the POU5F1 on-target locus, as well as segmental loss and gain of chromosome 6, on which the POU5F1 gene is located. Unintended genome editing outcomes were present in ∼16% of the human embryo cells analyzed and spanned 4-20 kb. Our observations are consistent with recent findings indicating complexity at on-target sites following CRISPR-Cas9 genome editing. Our work underscores the importance of further basic research to assess the safety of genome editing techniques in human embryos, which will inform debates about the potential clinical use of this technology.

Journal Article
TL;DR: The Aging Atlas database aims to provide a wide range of life science researchers with valuable resources that allow access to large-scale gene expression and regulation datasets created by various high-throughput omics technologies.
Abstract: Organismal aging is driven by interconnected molecular changes encompassing internal and extracellular factors. Combinational analysis of high-throughput ‘multi-omics’ datasets (gathering information from genomics, epigenomics, transcriptomics, proteomics, metabolomics and pharmacogenomics), at either populational or single-cell levels, can provide a multi-dimensional, integrated profile of the heterogeneous aging process with unprecedented throughput and detail. These new strategies allow for the exploration of the molecular profile and regulatory status of gene expression during aging, and in turn, facilitate the development of new aging interventions. With a continually growing volume of valuable aging-related data, it is necessary to establish an open and integrated database to support a wide spectrum of aging research. The Aging Atlas database aims to provide a wide range of life science researchers with valuable resources that allow access to large-scale gene expression and regulation datasets created by various high-throughput omics technologies. The current implementation includes five modules: transcriptomics (RNA-seq), single-cell transcriptomics (scRNA-seq), epigenomics (ChIP-seq), proteomics (protein–protein interaction), and pharmacogenomics (geroprotective compounds). Aging Atlas provides user-friendly functionalities to explore age-related changes in gene expression, as well as raw data download services. Aging Atlas is freely available at https://bigd.big.ac.cn/aging/index.

Journal Article
28 May 2021-Science
TL;DR: In this paper, the authors investigated genome folding across the eukaryotic tree of life and found two types of three-dimensional (3D) genome architectures at the chromosome scale, each of which appears and disappears repeatedly during eukaryotic evolution.
Abstract: We investigated genome folding across the eukaryotic tree of life. We find two types of three-dimensional (3D) genome architectures at the chromosome scale. Each type appears and disappears repeatedly during eukaryotic evolution. The type of genome architecture that an organism exhibits correlates with the absence of condensin II subunits. Moreover, condensin II depletion converts the architecture of the human genome to a state resembling that seen in organisms such as fungi or mosquitoes. In this state, centromeres cluster together at nucleoli, and heterochromatin domains merge. We propose a physical model in which lengthwise compaction of chromosomes by condensin II during mitosis determines chromosome-scale genome architecture, with effects that are retained during the subsequent interphase. This mechanism likely has been conserved since the last common ancestor of all eukaryotes.

Journal Article
21 Jan 2021
TL;DR: This Primer discusses differences in the methods commonly used to determine chromatin states in different cell types, including ATAC-seq and ChIP–seq; the authors envision that technological improvements including single-molecule, multi-omics and spatial methods will bring further insight into genome regulation.
Abstract: Chromatin accessibility, or the physical access to chromatinized DNA, is a widely studied characteristic of the eukaryotic genome. As active regulatory DNA elements are generally ‘accessible’, the genome-wide profiling of chromatin accessibility can be used to identify candidate regulatory genomic regions in a tissue or cell type. Multiple biochemical methods have been developed to profile chromatin accessibility, both in bulk and at the single-cell level. Depending on the method, enzymatic cleavage, transposition or DNA methyltransferases are used, followed by high-throughput sequencing, providing a view of genome-wide chromatin accessibility. In this Primer, we discuss these biochemical methods, as well as bioinformatics tools for analysing and interpreting the generated data, and insights into the key regulators underlying developmental, evolutionary and disease processes. We outline standards for data quality, reproducibility and deposition used by the genomics community. Although chromatin accessibility profiling is invaluable to study gene regulation, alone it provides only a partial view of this complex process. Orthogonal assays facilitate the interpretation of accessible regions with respect to enhancer–promoter proximity, functional transcription factor binding and regulatory function. We envision that technological improvements including single-molecule, multi-omics and spatial methods will bring further insight into the secrets of genome regulation. This Primer on chromatin accessibility profiling methods discusses differences in the methods commonly used to determine chromatin states in different cell types, including ATAC-seq and ChIP–seq. The authors summarize applications in different areas of research, from single cells to tissues and whole organisms.

Posted Content
19 Nov 2021-bioRxiv
TL;DR: RagTag, a toolset for automating assembly scaffolding and patching, is introduced, and chromosome-scale reference genomes are established for the widely used tomato genotype M82 along with Sweet-100, a rapid-cycling genotype developed to accelerate functional genomics and genome editing.
Abstract: Advancing crop genomics requires efficient genetic systems enabled by high-quality personalized genome assemblies. Here, we introduce RagTag, a toolset for automating assembly scaffolding and patching, and we establish chromosome-scale reference genomes for the widely used tomato genotype M82 along with Sweet-100, a rapid-cycling genotype that we developed to accelerate functional genomics and genome editing. This work outlines strategies to rapidly expand genetic systems and genomic resources in other plant species.

Journal Article
10 Nov 2021-Nature
TL;DR: In this article, whole-genome sequencing of 3,171 cultivated and 195 wild chickpea accessions was used to construct a chickpea pan-genome and to provide publicly available resources for chickpea genomics research and breeding.
Abstract: Zero hunger and good health could be realized by 2030 through effective conservation, characterization and utilization of germplasm resources1. So far, few chickpea (Cicer arietinum) germplasm accessions have been characterized at the genome sequence level2. Here we present a detailed map of variation in 3,171 cultivated and 195 wild accessions to provide publicly available resources for chickpea genomics research and breeding. We constructed a chickpea pan-genome to describe genomic diversity across cultivated chickpea and its wild progenitor accessions. A divergence tree using genes present in around 80% of individuals in one species allowed us to estimate the divergence of Cicer over the last 21 million years. Our analysis found chromosomal segments and genes that show signatures of selection during domestication, migration and improvement. The chromosomal locations of deleterious mutations responsible for limited genetic diversity and decreased fitness were identified in elite germplasm. We identified superior haplotypes for improvement-related traits in landraces that can be introgressed into elite breeding lines through haplotype-based breeding, and found targets for purging deleterious alleles through genomics-assisted breeding and/or gene editing. Finally, we propose three crop breeding strategies based on genomic prediction to enhance crop productivity for 16 traits while avoiding the erosion of genetic diversity through optimal contribution selection (OCS)-based pre-breeding. The predicted performance for 100-seed weight, an important yield-related trait, increased by up to 23% and 12% with OCS- and haplotype-based genomic approaches, respectively. Whole-genome sequencing of 3,171 cultivated and 195 wild chickpea accessions is used to construct a chickpea pan-genome, providing insight into chickpea evolution and enabling breeding strategies that could improve crop productivity.

Journal ArticleDOI
TL;DR: PhycoCosm provides integration of genome sequence and annotation for >100 algal genomes with available multi-omics data and interactive web-based tools to enable algal research in bioenergy and the environment, encouraging community engagement and data exchange, and fostering new sequencing projects that will further these research goals.
Abstract: Algae are a diverse, polyphyletic group of photosynthetic eukaryotes spanning nearly all eukaryotic lineages of life and collectively responsible for ∼50% of photosynthesis on Earth. Sequenced algal genomes, critical to understanding their complex biology, are growing in number and require efficient tools for analysis. PhycoCosm (https://phycocosm.jgi.doe.gov) is an algal multi-omics portal, developed by the US Department of Energy Joint Genome Institute to support analysis and distribution of algal genome sequences and other 'omics' data. PhycoCosm provides integration of genome sequence and annotation for >100 algal genomes with available multi-omics data and interactive web-based tools to enable algal research in bioenergy and the environment, encouraging community engagement and data exchange, and fostering new sequencing projects that will further these research goals.

Journal ArticleDOI
19 Jul 2021-eLife
TL;DR: The authors used Oxford Nanopore sequencing to build an open community resource of genome assemblies for 101 lines of 93 drosophilid species encompassing 14 species groups and 35 sub-groups.
Abstract: Over 100 years of studies in Drosophila melanogaster and related species in the genus Drosophila have facilitated key discoveries in genetics, genomics, and evolution. While high-quality genome assemblies exist for several species in this group, they only encompass a small fraction of the genus. Recent advances in long-read sequencing allow high-quality genome assemblies for tens or even hundreds of species to be efficiently generated. Here, we utilize Oxford Nanopore sequencing to build an open community resource of genome assemblies for 101 lines of 93 drosophilid species encompassing 14 species groups and 35 sub-groups. The genomes are highly contiguous and complete, with an average contig N50 of 10.5 Mb and greater than 97% BUSCO completeness in 97/101 assemblies. We show that Nanopore-based assemblies are highly accurate in coding regions, particularly with respect to coding insertions and deletions. These assemblies, along with a detailed laboratory protocol and assembly pipelines, are released as a public resource and will serve as a starting point for addressing broad questions of genetics, ecology, and evolution at the scale of hundreds of species.


Journal ArticleDOI
TL;DR: The past year is reviewed in the context of the phylogenetic analysis of variants isolated over the course of the pandemic in India and the importance of continued sequencing-based surveillance in the country is highlighted.
Abstract: Since its emergence as a pneumonia-like outbreak in the Chinese city of Wuhan in late 2019, the novel coronavirus disease COVID-19 has spread widely to become a global pandemic. The first case of COVID-19 in India was reported on 30 January 2020 and since then it has affected more than ten million people and resulted in around 150,000 deaths in the country. Over time, the viral genome has accumulated mutations as it passes through its human hosts, a common evolutionary mechanism found in all microorganisms. This has implications for disease surveillance and management, vaccines and therapeutics, and the emergence of reinfections. Sequencing the viral genome can help monitor these changes and provides an extraordinary opportunity to understand the genetic epidemiology and evolution of the virus as well as tracking its spread in a population. Here we review the past year in the context of the phylogenetic analysis of variants isolated over the course of the pandemic in India and highlight the importance of continued sequencing-based surveillance in the country.

Journal ArticleDOI
15 Feb 2021-eLife
TL;DR: This article describes UK Biobank GWAS results for three molecular traits (urate, IGF-1, and testosterone) with better-understood biology than most other complex traits.
Abstract: Genome-wide association studies (GWAS) have been used to study the genetic basis of a wide variety of complex diseases and other traits. We describe UK Biobank GWAS results for three molecular traits (urate, IGF-1, and testosterone) with better-understood biology than most other complex traits. We find that many of the most significant hits are readily interpretable. We observe huge enrichment of associations near genes involved in the relevant biosynthesis, transport, or signaling pathways. We show how GWAS data illuminate the biology of each trait, including differences in testosterone regulation between females and males. At the same time, even these molecular traits are highly polygenic, with many thousands of variants spread across the genome contributing to trait variance. In summary, for these three molecular traits we identify strong enrichment of signal in putative core gene sets, even while most of the SNP-based heritability is driven by a massively polygenic background.

Journal ArticleDOI
TL;DR: The authors in this article analyzed 4907 Circular Metagenome Assembled Genomes (cMAGs) of putative viruses from human gut microbiomes and identified nearly 600 genomes of crAss-like phages that account for nearly 87% of the DNA reads mapped to these cMAGs.
Abstract: CrAssphage is the most abundant human-associated virus and the founding member of a large group of bacteriophages, discovered in animal-associated and environmental metagenomes, that infect bacteria of the phylum Bacteroidetes. We analyze 4907 Circular Metagenome Assembled Genomes (cMAGs) of putative viruses from human gut microbiomes and identify nearly 600 genomes of crAss-like phages that account for nearly 87% of the DNA reads mapped to these cMAGs. Phylogenetic analysis of conserved genes demonstrates the monophyly of crAss-like phages, a putative virus order, and of 5 branches, potential families within that order, two of which have not been identified previously. The phage genomes in one of these families are almost twofold larger than the crAssphage genome (145-192 kilobases), with high density of self-splicing introns and inteins. Many crAss-like phages encode suppressor tRNAs that enable read-through of UGA or UAG stop-codons, mostly, in late phage genes. A distinct feature of the crAss-like phages is the recurrent switch of the phage DNA polymerase type between A and B families. Thus, comparative genomic analysis of the expanded assemblage of crAss-like phages reveals aspects of genome architecture and expression as well as phage biology that were not apparent from the previous work on phage genomics. Here, the authors analyze 4907 Circular Metagenome Assembled Genomes from human microbiomes and identify and characterize nearly 600 diverse genomes of crAss-like phages, finding two putative families with unusual genomic features, including high density of self-splicing introns and inteins.

Journal ArticleDOI
TL;DR: The findings suggest that the virion's genotype and phenotype in a specific population should be considered in developing diagnostic tools and treatment options.
Abstract: The ongoing pandemic caused by a novel coronavirus, Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2), affects thousands of people every day worldwide. Hence, drugs and vaccines effective against all variants of SARS-CoV-2 are crucial today. Mutations in the viral genome are common and may alter the encoded proteins, possibly resulting in varied effectiveness of detection tools and disease treatment. Thus, this study surveyed the SARS-CoV-2 genome and proteome and evaluated its mutation characteristics. Phylogenetic analyses of SARS-CoV-2 genes and proteins show three major clades and one minor clade (P6810S; ORF1ab). The overall frequency and densities of mutations in the genes and proteins of SARS-CoV-2 were observed. Nucleocapsid exhibited the highest mutation density among the structural proteins, while the spike D614G mutation was the most common, occurring mostly in genomes outside China and the United States. ORF8 protein had the highest mutation density across all geographical areas. Moreover, mutation hotspots neighboring and at the catalytic site of RNA-dependent RNA polymerase were found that might challenge the binding and effectiveness of remdesivir. Mutation coldspots that may serve as conserved diagnostic and therapeutic targets were found in ORF7b, ORF9b, and ORF14. These findings suggest that the virion's genotype and phenotype in a specific population should be considered in developing diagnostic tools and treatment options.

Journal ArticleDOI
TL;DR: This article highlights bulk, single-cell, and spatial RNA sequencing technologies, their characteristic features, and their suitable applications in precision oncology.
Abstract: RNA sequencing (RNAseq) can reveal gene fusions, splicing variants, and mutations/indels in addition to differential gene expression, thus providing a more complete genetic picture than DNA sequencing. This most widely used technology in the genomics toolbox has evolved from classic bulk RNA sequencing (RNAseq) through popular single-cell RNA sequencing (scRNAseq) to newly emerged spatial RNA sequencing (spRNAseq). Bulk RNAseq studies average global gene expression, scRNAseq investigates single-cell RNA biology in up to 20,000 individual cells simultaneously, while spRNAseq has the ability to dissect RNA activities spatially, representing the next generation of RNA sequencing. This article highlights these technologies, their characteristic features, and their suitable applications in precision oncology.

Journal ArticleDOI
TL;DR: Combining comparative genome analysis, multi-omics analysis, and metabolic gene-cluster analysis, a working model for MIA evolution, and a pangenome for MIA biosynthesis are proposed, which will help in establishing a sustainable supply of camptothecin.
Abstract: Plant genomes remain highly fragmented and are often characterized by hundreds to thousands of assembly gaps. Here, we report chromosome-level reference and phased genome assembly of Ophiorrhiza pumila, a camptothecin-producing medicinal plant, through an ordered multi-scaffolding and experimental validation approach. With 21 assembly gaps and a contig N50 of 18.49 Mb, the Ophiorrhiza genome is one of the most complete plant genomes assembled to date. We also report 273 nitrogen-containing metabolites, including diverse monoterpene indole alkaloids (MIAs). A comparative genomics approach identifies strictosidine biogenesis as the origin of MIA evolution. The emergence of strictosidine biosynthesis-catalyzing enzymes precedes downstream enzymes’ evolution post γ whole-genome triplication, which occurred approximately 110 Mya in O. pumila, and before the whole-genome duplication in Camptotheca acuminata identified here. Combining comparative genome analysis, multi-omics analysis, and metabolic gene-cluster analysis, we propose a working model for MIA evolution, and a pangenome for MIA biosynthesis, which will help in establishing a sustainable supply of camptothecin. Ophiorrhiza pumila is a medicinal plant that can produce the anti-cancer monoterpene indole alkaloid (MIA) camptothecin. Here, the authors report its genome assembly and propose a working model for MIA evolution and biosynthesis through comparative genomics, synteny, and metabolic gene cluster analyses.

Journal ArticleDOI
Daniel L. McCartney1, Josine L. Min2, Rebecca C Richmond2, Ake T. Lu3, Maria K. Sobczyk2, Gail Davies1, Linda Broer4, Xiuqing Guo5, Ayoung Jeong6, Ayoung Jeong7, Jeesun Jung8, Silva Kasela9, Seyma Katrinli10, Pei-Lun Kuo8, Pamela R. Matias-Garcia11, Pashupati P. Mishra, Marianne Nygaard12, Marianne Nygaard13, Teemu Palviainen14, Amit Patki15, Laura M. Raffield16, Scott M. Ratliff17, Tom G. Richardson2, Oliver Robinson18, Mette Soerensen13, Mette Soerensen12, Dianjianyi Sun19, Pei-Chien Tsai20, Pei-Chien Tsai21, Pei-Chien Tsai22, Matthijs D. van der Zee23, Matthijs D. van der Zee24, Rosie M. Walker1, Xiaochuan Wang25, Yunzhang Wang26, Rui Xia27, Zongli Xu8, Jie Yao5, Wei Zhao17, Adolfo Correa28, Eric Boerwinkle27, Pierre Antoine Dugué25, Pierre Antoine Dugué29, Pierre Antoine Dugué30, Peter Durda31, Hannah R Elliott2, Christian Gieger, Eco J. C. de Geus23, Eco J. C. de Geus24, Sarah E. Harris1, Gibran Hemani2, Medea Imboden7, Medea Imboden6, Mika Kähönen32, Sharon L.R. Kardia17, Jacob K. Kresovich8, Shengxu Li, Kathryn L. Lunetta33, Massimo Mangino34, Massimo Mangino22, Dan Mason35, Andrew M. McIntosh1, Jonas Mengel-From13, Jonas Mengel-From12, Ann Zenobia Moore8, Joanne M. Murabito33, Miina Ollikainen14, James S. Pankow36, Nancy L. Pedersen26, Annette Peters, Silvia Polidoro18, David J. Porteous1, Olli T. Raitakari37, Olli T. Raitakari38, Stephen S. Rich39, Dale P. Sandler8, Elina Sillanpää14, Elina Sillanpää40, Alicia K. Smith10, Melissa C. Southey25, Melissa C. Southey30, Melissa C. Southey29, Konstantin Strauch41, Konstantin Strauch42, Hemant K. Tiwari15, Toshiko Tanaka8, Therese Tillin, André G. Uitterlinden4, David Van Den Berg43, Jenny van Dongen24, Jenny van Dongen23, James G. Wilson28, James G. Wilson44, John Wright35, Idil Yet45, Idil Yet22, Donna K. Arnett46, Stefania Bandinelli, Jordana T. Bell22, Alexandra M. Binder3, Dorret I. Boomsma24, Dorret I. Boomsma23, Wei Chen47, Kaare Christensen12, Kaare Christensen13, Karen N. Conneely10, Paul Elliott18, Luigi Ferrucci8, Myriam Fornage27, Sara Hägg26, Caroline Hayward1, Marguerite R. Irvin15, Jaakko Kaprio14, Debbie A Lawlor2, Terho Lehtimäki, Falk W. Lohoff8, Lili Milani9, Roger L. Milne25, Roger L. Milne29, Roger L. Milne30, Nicole Probst-Hensch6, Nicole Probst-Hensch7, Alexander P. Reiner48, Beate Ritz3, Jerome I. Rotter5, Jennifer A. Smith17, Jack A. Taylor8, Joyce B. J. van Meurs4, Paolo Vineis18, Melanie Waldenberger, Ian J. Deary1, Caroline L Relton2, Steve Horvath3, Riccardo E. Marioni1
TL;DR: In this article, the authors identify 137 genome-wide significant loci, of which 113 are novel, from genome-wide association study (GWAS) meta-analyses of four epigenetic clocks and epigenetic surrogate markers for granulocyte proportions and plasminogen activator inhibitor 1 levels.
Abstract: Biological aging estimators derived from DNA methylation data are heritable and correlate with morbidity and mortality. Consequently, identification of genetic and environmental contributors to the variation in these measures in populations has become a major goal in the field. Leveraging DNA methylation and SNP data from more than 40,000 individuals, we identify 137 genome-wide significant loci, of which 113 are novel, from genome-wide association study (GWAS) meta-analyses of four epigenetic clocks and epigenetic surrogate markers for granulocyte proportions and plasminogen activator inhibitor 1 levels, respectively. We find evidence for shared genetic loci associated with the Horvath clock and expression of transcripts encoding genes linked to lipid metabolism and immune function. Notably, these loci are independent of those reported to regulate DNA methylation levels at constituent clock CpGs. A polygenic score for GrimAge acceleration showed strong associations with adiposity-related traits, educational attainment, parental longevity, and C-reactive protein levels. This study illuminates the genetic architecture underlying epigenetic aging and its shared genetic contributions with lifestyle factors and longevity.

Journal ArticleDOI
TL;DR: In this paper, the authors report the creation and annotation of a chromosome-level assembly for C. gigas, generated from long- and short-read sequence data and scaffolded into 10 pseudo-chromosomes using both Hi-C sequencing and a high-density linkage map.
Abstract: Background: The Pacific oyster (Crassostrea gigas) is a bivalve mollusc with vital roles in coastal ecosystems and aquaculture globally. While extensive genomic tools are available for C. gigas, highly contiguous reference genomes are required to support both fundamental and applied research. Herein we report the creation and annotation of a chromosome-level assembly for C. gigas. Findings: High-coverage long- and short-read sequence data generated on Pacific Biosciences and Illumina platforms were used to generate an initial assembly, which was then scaffolded into 10 pseudo-chromosomes using both Hi-C sequencing and a high-density linkage map. The assembly has a scaffold N50 of 58.4 Mb and a contig N50 of 1.8 Mb, representing a step advance on the previously published C. gigas assembly. Annotation based on Pacific Biosciences Iso-Seq and Illumina RNA-Seq resulted in identification of ∼30,000 putative protein-coding genes. Annotation of putative repeat elements highlighted an enrichment of Helitron rolling-circle transposable elements, suggesting their potential role in shaping the evolution of the C. gigas genome. Conclusions: This new chromosome-level assembly will be an enabling resource for genetics and genomics studies to support fundamental insight into bivalve biology, as well as for selective breeding of C. gigas in aquaculture.

Journal ArticleDOI
TL;DR: A review of the latest approaches for CRISPR-based functional genomics screens, including the adoption of single-cell transcriptomic readout and applications in characterizing the non-coding genome and mapping genetic interactions at scale, can be found in this paper.
Abstract: The past 25 years of genomics research first revealed which genes are encoded by the human genome and then a detailed catalogue of human genome variation associated with many diseases. Despite this, the function of many genes and gene regulatory elements remains poorly characterized, which limits our ability to apply these insights to human disease. The advent of new CRISPR functional genomics tools allows for scalable and multiplexable characterization of genes and gene regulatory elements encoded by the human genome. These approaches promise to reveal mechanisms of gene function and regulation, and to enable exploration of how genes work together to modulate complex phenotypes. In this Review, Przybyla and Gilbert describe the latest approaches for CRISPR-based functional genomics screens, including the adoption of single-cell transcriptomic read-outs and applications in characterizing the non-coding genome and mapping genetic interactions at scale.