Showing papers on "Genome" published in 2019

Journal Article•DOI•

Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype

Daehwan Kim¹, Joseph M. Paggi², Chanhee Park¹, Christopher Bennett¹, Steven L. Salzberg³•Institutions (3)

University of Texas Southwestern Medical Center¹, Stanford University², Johns Hopkins University³

01 Aug 2019-Nature Biotechnology

TL;DR: This work presents a method named HISAT2 (hierarchical indexing for spliced alignment of transcripts 2) that can align both DNA and RNA sequences using a graph Ferragina Manzini index, and uses it to represent and search an expanded model of the human reference genome.

Abstract: The human reference genome represents only a small number of individuals, which limits its usefulness for genotyping. We present a method named HISAT2 (hierarchical indexing for spliced alignment of transcripts 2) that can align both DNA and RNA sequences using a graph Ferragina Manzini index. We use HISAT2 to represent and search an expanded model of the human reference genome in which over 14.5 million genomic variants in combination with haplotypes are incorporated into the data structure used for searching and alignment. We benchmark HISAT2 using simulated and real datasets to demonstrate that our strategy of representing a population of genomes, together with a fast, memory-efficient search algorithm, provides more detailed and accurate variant analyses than other methods. We apply HISAT2 for HLA typing and DNA fingerprinting; both applications form part of the HISAT-genotype software that enables analysis of haplotype-resolved genes or genomic regions. HISAT-genotype outperforms other computational methods and matches or exceeds the performance of laboratory-based assays. A graph-based genome indexing scheme enables variant-aware alignment of sequences with very low memory requirements.

4,855 citations

Journal Article•DOI•

Search-and-replace genome editing without double-strand breaks or donor DNA

Andrew V. Anzalone¹,²,³, Peyton B. Randolph¹,²,³, Jessie Rose Davis¹,²,³, Alexander A. Sousa¹,²,³, Luke W. Koblan¹,²,³, Jonathan M. Levy¹,²,³, Peter J. Chen¹,²,³, Christine D. Wilson¹,²,³, Gregory A. Newby¹,²,³, Aditya Raguram¹,²,³, David R. Liu¹,²,³•Institutions (3)

Broad Institute¹, Harvard University², Howard Hughes Medical Institute³

21 Oct 2019-Nature

TL;DR: A new DNA-editing technique called prime editing offers improved versatility and efficiency with reduced byproducts compared with existing techniques, and shows potential for correcting disease-associated mutations.

Abstract: Most genetic variants that contribute to disease are challenging to correct efficiently and without excess byproducts. Here we describe prime editing, a versatile and precise genome editing method that directly writes new genetic information into a specified DNA site using a catalytically impaired Cas9 endonuclease fused to an engineered reverse transcriptase, programmed with a prime editing guide RNA (pegRNA) that both specifies the target site and encodes the desired edit. We performed more than 175 edits in human cells, including targeted insertions, deletions, and all 12 types of point mutation, without requiring double-strand breaks or donor DNA templates. We used prime editing in human cells to correct, efficiently and with few byproducts, the primary genetic causes of sickle cell disease (requiring a transversion in HBB) and Tay-Sachs disease (requiring a deletion in HEXA); to install a protective transversion in PRNP; and to insert various tags and epitopes precisely into target loci. Four human cell lines and primary post-mitotic mouse cortical neurons support prime editing with varying efficiencies. Prime editing shows higher or similar efficiency and fewer byproducts than homology-directed repair, has complementary strengths and weaknesses compared to base editing, and induces much lower off-target editing than Cas9 nuclease at known Cas9 off-target sites. Prime editing substantially expands the scope and capabilities of genome editing, and in principle could correct up to 89% of known genetic variants associated with human diseases.

2,260 citations

Journal Article•DOI•

GTDB-Tk: a toolkit to classify genomes with the Genome Taxonomy Database

Pierre-Alain Chaumeil¹, Aaron J. Mussig¹, Philip Hugenholtz¹, Donovan H. Parks¹•Institutions (1)

University of Queensland¹

15 Nov 2019-Bioinformatics

TL;DR: The accuracy of the GTDB-Tk taxonomic assignments is demonstrated by evaluating its performance on a phylogenetically diverse set of 10 156 bacterial and archaeal metagenome-assembled genomes.

Abstract: Summary: The Genome Taxonomy Database Toolkit (GTDB-Tk) provides objective taxonomic assignments for bacterial and archaeal genomes based on the GTDB. GTDB-Tk is computationally efficient and able to classify thousands of draft genomes in parallel. Here we demonstrate the accuracy of the GTDB-Tk taxonomic assignments by evaluating its performance on a phylogenetically diverse set of 10 156 bacterial and archaeal metagenome-assembled genomes.

2,053 citations

Journal Article•DOI•

Asymmetric paralog evolution between the “cryptic” gene Bmp16 and its well-studied sister genes Bmp2 and Bmp4

Nathalie Feiner¹, Fumio Motone², Axel Meyer¹, Shigehiro Kuraku¹•Institutions (2)

University of Konstanz¹, Kwansei Gakuin University²

28 Feb 2019-Scientific Reports

TL;DR: The phylogenetic analysis complemented with synteny analyses suggests that Bmp2, -4 and -16 are remnants of a gene quartet that originated during the two rounds of whole-genome duplication (2R-WGD) early in vertebrate evolution.

Abstract: The vertebrate gene repertoire is characterized by “cryptic” genes whose identification has been hampered by their absence from the genomes of well-studied species. One example is the Bmp16 gene, a paralog of the developmental key genes Bmp2 and -4. We focus on the Bmp2/4/16 group of genes to study the evolutionary dynamics following gen(om)e duplications with special emphasis on the poorly studied Bmp16 gene. We reveal the presence of Bmp16 in chondrichthyans in addition to previously reported teleost fishes and reptiles. Using comprehensive, vertebrate-wide gene sampling, our phylogenetic analysis complemented with synteny analyses suggests that Bmp2, -4 and -16 are remnants of a gene quartet that originated during the two rounds of whole-genome duplication (2R-WGD) early in vertebrate evolution. We confirm that Bmp16 genes were lost independently in at least three lineages (mammals, archelosaurs and amphibians) and report that they have elevated rates of sequence evolution. This finding agrees with their more “flexible” deployment during development; while Bmp16 has limited embryonic expression domains in the cloudy catshark, it is broadly expressed in the green anole lizard. Our study illustrates the dynamics of gene family evolution by integrating insights from sequence diversification, gene repertoire changes, and shuffling of expression domains.

1,376 citations

Journal Article•DOI•

TYGS is an automated high-throughput platform for state-of-the-art genome-based taxonomy

Jan P. Meier-Kolthoff¹, Markus Göker¹•Institutions (1)

Leibniz Association¹

16 May 2019-Nature Communications

TL;DR: This work introduces TYGS, the Type (Strain) Genome Server, a user-friendly high-throughput web server for genome-based prokaryote taxonomy and analysis, connected to a large, continuously growing database of genomic, taxonomic and nomenclatural information.

Abstract: Microbial taxonomy is increasingly influenced by genome-based computational methods. Yet such analyses can be complex and require expert knowledge. Here we introduce TYGS, the Type (Strain) Genome Server, a user-friendly high-throughput web server for genome-based prokaryote taxonomy, connected to a large, continuously growing database of genomic, taxonomic and nomenclatural information. It infers genome-scale phylogenies and state-of-the-art estimates for species and subspecies boundaries from user-defined and automatically determined closest type genome sequences. TYGS also provides comprehensive access to nomenclature, synonymy and associated taxonomic literature. Clinically important examples demonstrate how TYGS can yield new insights into microbial classification, such as evidence for a species-level separation of previously proposed subspecies of Salmonella enterica. TYGS is an integrated approach for the classification of microbes that unlocks novel scientific approaches to microbiologists worldwide and is particularly helpful for the rapidly expanding field of genome-based taxonomic descriptions of new genera, species or subspecies.

1,202 citations

Posted Content•DOI•

Variation across 141,456 human exomes and genomes reveals the spectrum of loss-of-function intolerance across human protein-coding genes

Konrad J. Karczewski¹,², Laurent C. Francioli¹,², Grace Tiao¹,², Beryl B. Cummings¹,², Jessica Alföldi¹,², Qingbo Wang¹,², Ryan L. Collins¹,², Kristen M. Laricchia¹,², Andrea Ganna¹,²,³, Daniel P. Birnbaum¹, Laura D. Gauthier¹, Harrison Brand¹,², Matthew Solomonson¹,², Nicholas A. Watts¹,², Daniel R. Rhodes⁴, Moriel Singer-Berk¹, Eleanor G. Seaby¹,², Jack A. Kosmicki¹,², Raymond K. Walters¹,², Katherine Tashman¹,², Yossi Farjoun¹, Eric Banks¹, Timothy Poterba¹,², Arcturus Wang¹,², Cotton Seed¹,², Nicola Whiffin¹,⁵, Jessica X. Chong⁶, Kaitlin E. Samocha⁷, Emma Pierce-Hoffman¹, Zachary Zappala¹,⁸, Anne H. O’Donnell-Luria¹,²,⁹, Eric Vallabh Minikel¹, Ben Weisburd¹, Monkol Lek¹,¹⁰, James S. Ware¹,⁵, Christopher Vittal¹,², Irina M. Armean¹,²,¹¹, Louis Bergelson¹, Kristian Cibulskis¹, Kristen M. Connolly¹, Miguel Covarrubias¹, Stacey Donnelly¹, Steven Ferriera¹, Stacey Gabriel¹, Jeff Gentry¹, Namrata Gupta¹, Thibault Jeandet¹, Diane Kaplan¹, Christopher Llanwarne¹, Ruchi Munshi¹, Sam Novod¹, Nikelle Petrillo¹, David Roazen¹, Valentin Ruano-Rubio¹, Andrea Saltzman¹, Molly Schleicher¹, Jose Soto¹, Kathleen Tibbetts¹, Charlotte Tolonen¹, Gordon Wade¹, Michael E. Talkowski¹,², Benjamin M. Neale¹,², Mark J. Daly¹, Daniel G. MacArthur¹,²•Institutions (11)

Broad Institute¹, Harvard University², University of Helsinki³, Queen Mary University of London⁴, National Institutes of Health⁵, University of Washington⁶, Wellcome Trust Sanger Institute⁷, Vertex Pharmaceuticals⁸, Boston Children's Hospital⁹, Yale University¹⁰, European Bioinformatics Institute¹¹

30 Jan 2019-bioRxiv

TL;DR: Using an improved human mutation rate model, this work classifies human protein-coding genes along a spectrum representing tolerance to inactivation, validates this classification using data from model organisms and engineered human cells, and shows that it can be used to improve gene discovery power for both common and rare diseases.

Abstract: Genetic variants that inactivate protein-coding genes are a powerful source of information about the phenotypic consequences of gene disruption: genes critical for an organism’s function will be depleted for such variants in natural populations, while non-essential genes will tolerate their accumulation. However, predicted loss-of-function (pLoF) variants are enriched for annotation errors, and tend to be found at extremely low frequencies, so their analysis requires careful variant annotation and very large sample sizes. Here, we describe the aggregation of 125,748 exomes and 15,708 genomes from human sequencing studies into the Genome Aggregation Database (gnomAD). We identify 443,769 high-confidence pLoF variants in this cohort after filtering for sequencing and annotation artifacts. Using an improved model of human mutation, we classify human protein-coding genes along a spectrum representing intolerance to inactivation, validate this classification using data from model organisms and engineered human cells, and show that it can be used to improve gene discovery power for both common and rare diseases.

1,128 citations

Journal Article•DOI•

VFDB 2019: a comparative pathogenomic platform with an interactive web interface.

Bo Liu¹, Dandan Zheng¹, Qi Jin¹, Lihong Chen¹, Jian Yang¹•Institutions (1)

Peking Union Medical College¹

08 Jan 2019-Nucleic Acids Research

TL;DR: An integrated and automatic pipeline, VFanalyzer, is introduced to VFDB to systematically identify known/potential VFs in complete/draft bacterial genomes through a context-based data refinement process for VFs encoded by gene clusters that can achieve relatively high specificity and sensitivity without manual curation.

Abstract: The virulence factor database (VFDB, http://www.mgc.ac.cn/VFs/) is devoted to providing the scientific community with a comprehensive warehouse and online platform for deciphering bacterial pathogenesis. The various combinations, organizations and expressions of virulence factors (VFs) are responsible for the diverse clinical symptoms of pathogen infections. Currently, whole-genome sequencing is widely used to decode potential novel or variant pathogens both in emergent outbreaks and in routine clinical practice. However, the efficient characterization of pathogenomic compositions remains a challenge for microbiologists or physicians with limited bioinformatics skills. Therefore, we introduced to VFDB an integrated and automatic pipeline, VFanalyzer, to systematically identify known/potential VFs in complete/draft bacterial genomes. VFanalyzer first constructs orthologous groups within the query genome and preanalyzed reference genomes from VFDB to avoid potential false positives due to paralogs. Then, it conducts iterative and exhaustive sequence similarity searches among the hierarchical prebuilt datasets of VFDB to accurately identify potential untypical/strain-specific VFs. Finally, via a context-based data refinement process for VFs encoded by gene clusters, VFanalyzer can achieve relatively high specificity and sensitivity without manual curation. In addition, a thoroughly optimized interactive web interface is introduced to present VFanalyzer reports in comparative pathogenomic style for easy online analysis.

1,008 citations

Journal Article•DOI•

Benefits and limitations of genome-wide association studies.

Vivian W.Y. Tam¹, Nikunj Patel¹, Michelle Turcotte¹, Yohan Bossé², Guillaume Paré¹, David Meyre¹,³•Institutions (3)

McMaster University¹, Laval University², University of Lorraine³

01 Aug 2019-Nature Reviews Genetics

TL;DR: This Review comprehensively assesses the benefits and limitations of GWAS in human populations and discusses the relevance of performing more GWAS, with a focus on the cardiometabolic field.

Abstract: Genome-wide association studies (GWAS) involve testing genetic variants across the genomes of many individuals to identify genotype–phenotype associations. GWAS have revolutionized the field of complex disease genetics over the past decade, providing numerous compelling associations for human complex traits and diseases. Despite clear successes in identifying novel disease susceptibility genes and biological pathways and in translating these findings into clinical care, GWAS have not been without controversy. Prominent criticisms include concerns that GWAS will eventually implicate the entire genome in disease predisposition and that most association signals reflect variants and genes with no direct biological relevance to disease. In this Review, we comprehensively assess the benefits and limitations of GWAS in human populations and discuss the relevance of performing more GWAS. Despite the success of human genome-wide association studies (GWAS) in associating genetic variants and complex diseases or traits, criticisms of the usefulness of this study design remain. This Review assesses the pros and cons of GWAS, with a focus on the cardiometabolic field.

1,002 citations

Journal Article•DOI•

OrganellarGenomeDRAW (OGDRAW) version 1.3.1: expanded toolkit for the graphical visualization of organellar genomes.

Stephan Greiner¹, Pascal Lehwark, Ralph Bock¹•Institutions (1)

Max Planck Society¹

02 Jul 2019-Nucleic Acids Research

TL;DR: A new version of OGDRAW equipped with a new front end enables the user to easily visualize large sets of organellar genomes spanning entire taxonomic clades.

Abstract: Organellar (plastid and mitochondrial) genomes play an important role in resolving phylogenetic relationships, and next-generation sequencing technologies have led to a burst in their availability. The ongoing massive sequencing efforts require software tools for routine assembly and annotation of organellar genomes as well as their display as physical maps. OrganellarGenomeDRAW (OGDRAW) has become the standard tool to draw graphical maps of plastid and mitochondrial genomes. Here, we present a new version of OGDRAW equipped with a new front end. Besides several new features, OGDRAW now has access to a local copy of the organelle genome database of the NCBI RefSeq project. Together with batch processing of (multi-)GenBank files, this enables the user to easily visualize large sets of organellar genomes spanning entire taxonomic clades. The new OGDRAW server can be accessed at https://chlorobox.mpimp-golm.mpg.de/OGDraw.html.

888 citations

Journal Article•DOI•

CHOPCHOP v3: expanding the CRISPR web toolbox beyond genome editing

Kornel Labun¹, Tessa G. Montague², Maximilian Krause¹, Yamila N. Torres Cleuren¹, Håkon Tjeldnes¹, Eivind Valen¹•Institutions (2)

University of Bergen¹, Columbia University²

02 Jul 2019-Nucleic Acids Research

TL;DR: This major update of CHOPCHOP introduces functionality for targeting RNA with Cas13, which includes support for alternative transcript isoforms and RNA accessibility predictions, and incorporates new DNA targeting modes, including CRISPR activation/repression, targeted enrichment of loci for long-read sequencing, and prediction of Cas9 repair outcomes.

Abstract: The CRISPR-Cas system is a powerful genome editing tool that functions in a diverse array of organisms and cell types. The technology was initially developed to induce targeted mutations in DNA, but CRISPR-Cas has now been adapted to target nucleic acids for a range of purposes. CHOPCHOP is a web tool for identifying CRISPR-Cas single guide RNA (sgRNA) targets. In this major update of CHOPCHOP, we expand our toolbox beyond knockouts. We introduce functionality for targeting RNA with Cas13, which includes support for alternative transcript isoforms and RNA accessibility predictions. We incorporate new DNA targeting modes, including CRISPR activation/repression, targeted enrichment of loci for long-read sequencing, and prediction of Cas9 repair outcomes. Finally, we expand our results page visualization to reveal alternative isoforms and downstream ATG sites, which will aid users in avoiding the expression of truncated proteins. The CHOPCHOP web tool now supports over 200 genomes and we have released a command-line script for running larger jobs and handling unsupported genomes. CHOPCHOP v3 can be found at https://chopchop.cbu.uib.no.
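As a rough illustration of what identifying sgRNA targets involves, the sketch below scans the forward strand of a DNA sequence for 20-nt protospacers followed by an NGG PAM, the motif recognized by SpCas9. It is not CHOPCHOP's algorithm: the function name `find_ngg_targets`, the fixed guide length, and the example sequence are assumptions made for illustration, and the sketch ignores the reverse strand, off-target scoring, and efficiency prediction.

```python
import re
from typing import List, Tuple


def find_ngg_targets(seq: str, guide_len: int = 20) -> List[Tuple[int, str, str]]:
    """Return (start, protospacer, PAM) for each forward-strand site followed by NGG.

    Hypothetical helper for illustration; not part of CHOPCHOP.
    """
    seq = seq.upper()
    # A lookahead is used so that overlapping candidate sites are all reported.
    pattern = re.compile("(?=([ACGT]{%d})([ACGT]GG))" % guide_len)
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(seq)]


# Made-up sequence: 20 bases, then a TGG PAM, then two trailing bases.
example = "A" * 20 + "TGG" + "CC"
hits = find_ngg_targets(example)
```

Each hit is a candidate only; a real design tool would go on to rank candidates by predicted efficiency and off-target risk.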

879 citations

Journal Article•DOI•

The ENCODE Blacklist: Identification of Problematic Regions of the Genome.

Haley M. Amemiya¹, Anshul Kundaje², Alan P. Boyle¹•Institutions (2)

University of Michigan¹, Stanford University²

27 Jun 2019-Scientific Reports

TL;DR: This work defines the ENCODE blacklist, a comprehensive set of regions in the human, mouse, worm, and fly genomes that have anomalous, unstructured, or high signal in next-generation sequencing experiments independent of cell line or experiment.

Abstract: Functional genomics assays based on high-throughput sequencing greatly expand our ability to understand the genome. Here, we define the ENCODE blacklist: a comprehensive set of regions in the human, mouse, worm, and fly genomes that have anomalous, unstructured, or high signal in next-generation sequencing experiments independent of cell line or experiment. The removal of the ENCODE blacklist is an essential quality measure when analyzing functional genomics data.
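Removing blacklisted regions usually amounts to dropping any called region that overlaps a blacklist interval. A minimal sketch of that filter follows, assuming BED-style 0-based half-open coordinates; the function names and all coordinates are hypothetical, and real analyses would typically read the published blacklist files and use an interval tool rather than this quadratic loop.

```python
from typing import List, Tuple

# (chromosome, start, end), 0-based half-open as in BED files.
Interval = Tuple[str, int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """Return True if two half-open intervals share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def remove_blacklisted(peaks: List[Interval], blacklist: List[Interval]) -> List[Interval]:
    """Keep only peaks that do not overlap any blacklisted region."""
    return [p for p in peaks if not any(overlaps(p, b) for b in blacklist)]


# Made-up coordinates, for illustration only.
blacklist = [("chr1", 1000, 2000)]
peaks = [("chr1", 500, 900), ("chr1", 1500, 1600), ("chr2", 1500, 1600)]
kept = remove_blacklisted(peaks, blacklist)
```

Here the second peak is dropped because it lies inside the blacklisted interval on chr1, while the peak at the same coordinates on chr2 is kept.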

Journal Article•DOI•

Structure of the SARS-CoV nsp12 polymerase bound to nsp7 and nsp8 co-factors.

Robert N. Kirchdoerfer¹, Andrew B. Ward¹•Institutions (1)

Scripps Research Institute¹

28 May 2019-Nature Communications

TL;DR: This structure illuminates the assembly of the coronavirus core RNA-synthesis machinery, provides key insights into nsp12 polymerase catalysis and fidelity and acts as a template for the design of novel antiviral therapeutics.

Abstract: Recent history is punctuated by the emergence of highly pathogenic coronaviruses such as SARS- and MERS-CoV into human circulation. Upon infecting host cells, coronaviruses assemble a multi-subunit RNA-synthesis complex of viral non-structural proteins (nsp) responsible for the replication and transcription of the viral genome. Here, we present the 3.1 Å resolution structure of the SARS-CoV nsp12 polymerase bound to its essential co-factors, nsp7 and nsp8, using single particle cryo-electron microscopy. nsp12 possesses an architecture common to all viral polymerases as well as a large N-terminal extension containing a kinase-like fold and is bound by two nsp8 co-factors. This structure illuminates the assembly of the coronavirus core RNA-synthesis machinery, provides key insights into nsp12 polymerase catalysis and fidelity and acts as a template for the design of novel antiviral therapeutics. The pathogenic human coronaviruses SARS- and MERS-CoV can cause severe respiratory disease. Here the authors present the 3.1 Å cryo-EM structure of the SARS-CoV RNA polymerase nsp12 bound to its essential co-factors nsp7 and nsp8, which is of interest for antiviral drug development.

Posted Content•DOI•

Sequencing of 53,831 diverse genomes from the NHLBI TOPMed Program

Daniel Taliun¹, Daniel N. Harris², Michael D. Kessler², Jedidiah Carlson³ +191 more•Institutions (61)

06 Mar 2019-bioRxiv

TL;DR: The nearly complete catalog of genetic variation in TOPMed studies provides unique opportunities for exploring the contributions of rare and non-coding sequence variants to phenotypic variation as well as resources and early insights from the sequence data.

Abstract: The Trans-Omics for Precision Medicine (TOPMed) program seeks to elucidate the genetic architecture and disease biology of heart, lung, blood, and sleep disorders, with the ultimate goal of improving diagnosis, treatment, and prevention. The initial phases of the program focus on whole genome sequencing of individuals with rich phenotypic data and diverse backgrounds. Here, we describe TOPMed goals and design as well as resources and early insights from the sequence data. The resources include a variant browser, a genotype imputation panel, and sharing of genomic and phenotypic data via dbGaP. In 53,581 TOPMed samples, >400 million single-nucleotide and insertion/deletion variants were detected by alignment with the reference genome. Additional novel variants are detectable through assembly of unmapped reads and customized analysis in highly variable loci. Among the >400 million variants detected, 97% have frequency

Journal Article•DOI•

The UCSC Genome Browser database: 2019 update.

Maximilian Haeussler¹, Ann S. Zweig¹, Cath Tyner¹, Matthew L. Speir¹, Kate R. Rosenbloom¹, Brian J. Raney¹, Christopher Lee¹, Brian T. Lee¹, Angie S. Hinrichs¹, Jairo Navarro Gonzalez¹, David Gibson¹, Mark Diekhans¹, Hiram Clawson¹, Jonathan Casper¹, Galt P. Barber¹, David Haussler¹, Robert M. Kuhn¹, W. James Kent¹•Institutions (1)

University of California, Santa Cruz¹

08 Jan 2019-Nucleic Acids Research

TL;DR: A new tool lets users interactively arrange existing graphing tracks into new groups, and a 30-way primate alignment on the human genome has been added to the UCSC Genome Browser.

Abstract: The UCSC Genome Browser (https://genome.ucsc.edu) is a graphical viewer for exploring genome annotations. For almost two decades, the Browser has provided visualization tools for genetics and molecular biology and continues to add new data and features. This year, we added a new tool that lets users interactively arrange existing graphing tracks into new groups. Other software additions include new formats for chromosome interactions, a ChIP-Seq peak display for track hubs and improved support for HGVS. On the annotation side, we have added gnomAD, TCGA expression, RefSeq Functional elements, GTEx eQTLs, CRISPR Guides, SNPpedia and created a 30-way primate alignment on the human genome. Nine assemblies now have RefSeq-mapped gene models.

Journal Article•DOI•

Multi-platform discovery of haplotype-resolved structural variation in human genomes

Mark Chaisson¹,², Ashley D. Sanders, Xuefang Zhao³,⁴, Ankit Malhotra, David Porubsky⁵,⁶, Tobias Rausch, Eugene J. Gardner⁷, Oscar L. Rodriguez⁸, Li Guo⁹, Ryan L. Collins⁴, Xian Fan¹⁰, Jia Wen¹¹, Robert E. Handsaker⁴,¹², Susan Fairley¹³, Zev N. Kronenberg², Xiangmeng Kong¹⁴, Fereydoun Hormozdiari¹⁵, Dillon Lee¹⁶, Aaron M. Wenger¹⁷, Alex Hastie, Danny Antaki¹⁸, Thomas Anantharaman, Peter A. Audano², Harrison Brand⁴, Stuart Cantsilieris², Han Cao, Eliza Cerveira, Chong Chen¹⁰, Xintong Chen⁷, Chen-Shan Chin¹⁷, Zechen Chong¹⁰, Nelson T. Chuang⁷, Christine C. Lambert¹⁷, Deanna M. Church, Laura Clarke¹³, Andrew Farrell¹⁶, Joey Flores¹⁹, Timur R. Galeev¹⁴, David U. Gorkin¹⁸,²⁰, Madhusudan Gujral¹⁸, Victor Guryev⁵, William Haynes Heaton, Jonas Korlach¹⁷, Sushant Kumar¹⁴, Jee Young Kwon²¹, Ernest T. Lam, Jong Eun Lee, Joyce V. Lee, Wan-Ping Lee, Sau Peng Lee, Shantao Li¹⁴, Patrick Marks, Karine A. Viaud-Martinez¹⁹, Sascha Meiers, Katherine M. Munson², Fabio C. P. Navarro¹⁴, Bradley J. Nelson², Conor Nodzak¹¹, Amina Noor¹⁸, Sofia Kyriazopoulou-Panagiotopoulou, Andy Wing Chun Pang, Yunjiang Qiu¹⁸,²⁰, Gabriel Rosanio¹⁸, Mallory Ryan, Adrian M. Stütz, Diana C.J. Spierings⁵, Alistair Ward¹⁶, Anne Marie E. Welch², Ming Xiao²², Wei Xu, Chengsheng Zhang, Qihui Zhu, Xiangqun Zheng-Bradley¹³, Ernesto Lowy¹³, Sergei Yakneen, Steven A. McCarroll⁴,¹², Goo Jun²³, Li Ding²⁴, Chong-Lek Koh²⁵, Bing Ren¹⁸,²⁰, Paul Flicek¹³, Ken Chen¹⁰, Mark Gerstein, Pui-Yan Kwok²⁶, Peter M. Lansdorp⁵,²⁷,²⁸, Gabor T. Marth¹⁶, Jonathan Sebat¹⁸, Xinghua Shi¹¹, Ali Bashir⁸, Kai Ye⁹, Scott E. Devine⁷, Michael E. Talkowski⁴,¹², Ryan E. Mills³, Tobias Marschall⁶, Jan O. Korbel¹³, Evan E. Eichler², Charles Lee²¹•Institutions (28)

University of Southern California¹, University of Washington², University of Michigan³, Harvard University⁴, University of Groningen⁵, Max Planck Society⁶, University of Maryland, Baltimore⁷, Icahn School of Medicine at Mount Sinai⁸, Xi'an Jiaotong University⁹, University of Texas MD Anderson Cancer Center¹⁰, University of North Carolina at Charlotte¹¹, Broad Institute¹², European Bioinformatics Institute¹³, Yale University¹⁴, University of California, Davis¹⁵, University of Utah¹⁶, Pacific Biosciences¹⁷, University of California, San Diego¹⁸, Illumina¹⁹, Ludwig Institute for Cancer Research²⁰, Ewha Womans University²¹, Drexel University²², University of Texas Health Science Center at Houston²³, Washington University in St. Louis²⁴, University of Malaya²⁵, University of California, San Francisco²⁶, University of British Columbia²⁷, BC Cancer Agency²⁸

16 Apr 2019-Nature Communications

TL;DR: A suite of long-read, short-read, strand-specific sequencing technologies, optical mapping, and variant discovery algorithms are applied to comprehensively analyze three trios to define the full spectrum of human genetic variation in a haplotype-resolved manner.

Abstract: The incomplete identification of structural variants (SVs) from whole-genome sequencing data limits studies of human genetic diversity and disease association. Here, we apply a suite of long-read, short-read, strand-specific sequencing technologies, optical mapping, and variant discovery algorithms to comprehensively analyze three trios to define the full spectrum of human genetic variation in a haplotype-resolved manner. We identify 818,054 indel variants (<50 bp) and 27,622 SVs (≥50 bp) per genome. We also discover 156 inversions per genome and 58 of the inversions intersect with the critical regions of recurrent microdeletion and microduplication syndromes. Taken together, our SV callsets represent a three to sevenfold increase in SV detection compared to most standard high-throughput sequencing studies, including those from the 1000 Genomes Project. The methods and the dataset presented serve as a gold standard for the scientific community allowing us to make recommendations for maximizing structural variation sensitivity for future genome sequencing studies.

Journal Article•DOI•

Mouse Genome Database (MGD) 2019.

Carol J. Bult, Judith A. Blake, Cynthia L. Smith, James A. Kadin, Joel E. Richardson

08 Jan 2019-Nucleic Acids Research

TL;DR: Significant enhancements to MGD are described, including two new graphical user interfaces: the Multi Genome Viewer for exploring the genomes of multiple mouse strains and the Phenotype-Gene Expression matrix which was developed in collaboration with the Gene Expression Database (GXD) and allows researchers to compare gene expression and phenotype annotations for mouse genes.

Abstract: The Mouse Genome Database (MGD; http://www.informatics.jax.org) is the community model organism genetic and genome resource for the laboratory mouse. MGD is the authoritative source for biological reference data sets related to mouse genes, gene functions, phenotypes, and mouse models of human disease. MGD is the primary outlet for official gene, allele and mouse strain nomenclature based on the guidelines set by the International Committee on Standardized Nomenclature for Mice. In this report we describe significant enhancements to MGD, including two new graphical user interfaces: (i) the Multi Genome Viewer for exploring the genomes of multiple mouse strains and (ii) the Phenotype-Gene Expression matrix which was developed in collaboration with the Gene Expression Database (GXD) and allows researchers to compare gene expression and phenotype annotations for mouse genes. Other recent improvements include enhanced efficiency of our literature curation processes and the incorporation of Transcriptional Start Site (TSS) annotations from RIKEN's FANTOM 5 initiative.

Journal Article•DOI•

Genome-wide cell-free DNA fragmentation in patients with cancer

Stephen Cristiano¹, Alessandro Leal¹, Jillian Phallen¹, Jacob Fiksel¹, Vilmos Adleff¹, Daniel C. Bruhm¹, Sarah Østrup Jensen², Jamie E. Medina¹, Carolyn Hruban¹, James R. White¹, Doreen N. Palsgrove¹, Noushin Niknafs¹, Valsamo Anagnostou¹, Patrick M. Forde¹, Jarushka Naidoo¹, Kristen A. Marrone¹, Julie R. Brahmer¹, Brian Woodward³, Hatim Husain³, Karlijn L. van Rooijen⁴, Mai Britt Worm Ørntoft², Anders Husted Madsen², Cornelis J.H. van de Velde⁵, Marcel Verheij⁶, Annemieke Cats⁶, Cornelis J. A. Punt⁷, Geraldine R. Vink⁴, Nicole C.T. van Grieken⁸, Miriam Koopman⁴, Remond J.A. Fijneman⁶, Julia S. Johansen⁹, Hans Jørgen Nielsen⁹, Gerrit A. Meijer⁶, Claus L. Andersen², Robert B. Scharpf¹, Victor E. Velculescu¹

Johns Hopkins University¹, Aarhus University², University of California, San Diego³, Utrecht University⁴, Leiden University⁵, Netherlands Cancer Institute⁶, University of Amsterdam⁷, VU University Medical Center⁸, University of Copenhagen⁹

20 Jun 2019-Nature

TL;DR: An approach to evaluate fragmentation patterns of cell-free DNA across the genome was developed, and found that profiles of healthy individuals reflected nucleosomal patterns of white blood cells, whereas patients with cancer had altered fragmentation profiles.

Abstract: Cell-free DNA in the blood provides a non-invasive diagnostic avenue for patients with cancer. However, characteristics of the origins and molecular features of cell-free DNA are poorly understood. Here we developed an approach to evaluate fragmentation patterns of cell-free DNA across the genome, and found that profiles of healthy individuals reflected nucleosomal patterns of white blood cells, whereas patients with cancer had altered fragmentation profiles. We used this method to analyse the fragmentation profiles of 236 patients with breast, colorectal, lung, ovarian, pancreatic, gastric or bile duct cancer and 245 healthy individuals. A machine learning model that incorporated genome-wide fragmentation features had sensitivities of detection ranging from 57% to more than 99% among the seven cancer types at 98% specificity, with an overall area under the curve value of 0.94. Fragmentation profiles could be used to identify the tissue of origin of the cancers to a limited number of sites in 75% of cases. Combining our approach with mutation-based cell-free DNA analyses detected 91% of patients with cancer. The results of these analyses highlight important properties of cell-free DNA and provide a proof-of-principle approach for the screening, early detection and monitoring of human cancer.
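
The abstract reports sensitivity "at 98% specificity". As a reading aid, the sketch below shows one common way such a figure can be computed from classifier scores: pick the score cutoff that keeps the chosen fraction of healthy individuals negative, then count the patients scoring above it. This is an illustration under stated assumptions, not the authors' pipeline; the function name, the score lists and the "higher score means more cancer-like" convention are all hypothetical.

```python
import math
from typing import Sequence, Tuple


def sensitivity_at_specificity(
    healthy_scores: Sequence[float],
    cancer_scores: Sequence[float],
    specificity: float = 0.98,
) -> Tuple[float, float]:
    """Return (sensitivity, threshold) at a fixed specificity.

    Assumption: a higher score means "more cancer-like", and a sample is
    called positive when its score is strictly above the threshold.
    """
    if not healthy_scores or not cancer_scores:
        raise ValueError("both score lists must be non-empty")
    if not 0.0 < specificity <= 1.0:
        raise ValueError("specificity must be in (0, 1]")

    ranked = sorted(healthy_scores)
    # Number of healthy samples that must fall at or below the cutoff.
    # The small epsilon guards against floating-point round-up.
    n_negative = math.ceil(specificity * len(ranked) - 1e-9)
    threshold = ranked[n_negative - 1]

    detected = sum(score > threshold for score in cancer_scores)
    return detected / len(cancer_scores), threshold


if __name__ == "__main__":
    # Hypothetical scores, for illustration only.
    healthy = [0.1, 0.2, 0.3, 0.4]
    cancer = [0.25, 0.35, 0.5, 0.9]
    print(sensitivity_at_specificity(healthy, cancer, specificity=0.75))
```

With tied scores the achieved specificity can exceed the requested value, because every healthy sample at the threshold is called negative.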

Journal Article•DOI•

Activity-by-contact model of enhancer-promoter regulation from thousands of CRISPR perturbations.

Charles P. Fulco¹,², Joseph Nasser¹, Thouis R. Jones¹, Glen Munson¹, Drew T. Bergman¹, Vidya Subramanian¹, Sharon R. Grossman¹,³, Rockwell Anyoha¹, Benjamin R. Doughty¹, Tejal A. Patwardhan¹, Tung T. Nguyen¹, Michael Kane¹, Elizabeth M. Perez¹, Neva C. Durand, Caleb A. Lareau¹, Elena K. Stamenova¹, Erez Lieberman Aiden, Eric S. Lander¹,²,³, Jesse M. Engreitz¹,²

Broad Institute¹, Harvard University², Massachusetts Institute of Technology³

03 Jul 2019-Nature Genetics

TL;DR: A simple activity-by-contact model substantially outperformed previous methods at predicting the complex connections in the CRISPR dataset and allows systematic mapping of enhancer–gene connections in a given cell type, on the basis of chromatin-state measurements.

Abstract: Enhancer elements in the human genome control how genes are expressed in specific cell types and harbor thousands of genetic variants that influence risk for common diseases. Yet, we still do not know how enhancers regulate specific genes, and we lack general rules to predict enhancer-gene connections across cell types. We developed an experimental approach, CRISPRi-FlowFISH, to perturb enhancers in the genome, and we applied it to test >3,500 potential enhancer-gene connections for 30 genes. We found that a simple activity-by-contact model substantially outperformed previous methods at predicting the complex connections in our CRISPR dataset. This activity-by-contact model allows us to construct genome-wide maps of enhancer-gene connections in a given cell type, on the basis of chromatin state measurements. Together, CRISPRi-FlowFISH and the activity-by-contact model provide a systematic approach to map and predict which enhancers regulate which genes, and will help to interpret the functions of the thousands of disease risk variants in the noncoding genome.
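
The abstract names the activity-by-contact model without stating a formula. The sketch below assumes the commonly cited form: an element's score for a gene is its activity multiplied by its contact frequency with the gene's promoter, divided by the sum of that product over all candidate elements near the gene. The function name, element labels and numbers are hypothetical, and how activity and contact are measured is outside the scope of this example.

```python
from typing import Dict


def abc_scores(activity: Dict[str, float], contact: Dict[str, float]) -> Dict[str, float]:
    """Activity-by-contact scores of candidate elements for ONE gene.

    activity[e]: assumed activity estimate of element e (hypothetical units)
    contact[e]:  assumed contact frequency between e and the gene's promoter
    Returns each element's share of the summed activity * contact.
    """
    if activity.keys() != contact.keys():
        raise ValueError("activity and contact must describe the same elements")

    products = {e: activity[e] * contact[e] for e in activity}
    total = sum(products.values())
    if total == 0:
        # No element has both activity and contact: every share is zero.
        return {e: 0.0 for e in products}
    return {e: p / total for e, p in products.items()}


if __name__ == "__main__":
    # Hypothetical values for three candidate elements of one gene.
    activity = {"e1": 2.0, "e2": 1.0, "e3": 1.0}
    contact = {"e1": 0.5, "e2": 1.0, "e3": 0.0}
    for element, score in abc_scores(activity, contact).items():
        print(element, round(score, 3))
```

Because the scores are shares of a total, they sum to 1 for a gene whenever at least one element has non-zero activity and contact, so a strong element lowers the relative score of its neighbours.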

Journal Article•DOI•

Pathogen-induced activation of disease-suppressive functions in the endophytic root microbiome

Víctor J. Carrión¹, Juan E. Pérez-Jaramillo², Viviane Cordovez¹, Vittorio Tracanna³, Mattias de Hollander, Daniel Ruiz-Buck, Lucas William Mendes⁴, Wilfred F. J. van IJcken⁵, Ruth Gomez-Exposito³, Somayah S. Elsayed¹, Prarthana Mohanraju³, Adini Q Arifah³, John van der Oost³, Joseph N. Paulson⁶, Rodrigo Mendes⁷, Gilles P. van Wezel¹, Marnix H. Medema³, Jos M. Raaijmakers¹

Leiden University¹, University of Antioquia², Wageningen University and Research Centre³, University of São Paulo⁴, Erasmus University Rotterdam⁵, Genentech⁶, Empresa Brasileira de Pesquisa Agropecuária⁷

01 Nov 2019-Science

TL;DR: The results highlight that endophytic root microbiomes harbor a wealth of as yet unknown functional traits that, in concert, can protect the plant inside out.

Abstract: Microorganisms living inside plants can promote plant growth and health, but their genomic and functional diversity remain largely elusive. Here, metagenomics and network inference show that fungal infection of plant roots enriched for Chitinophagaceae and Flavobacteriaceae in the root endosphere and for chitinase genes and various unknown biosynthetic gene clusters encoding the production of nonribosomal peptide synthetases (NRPSs) and polyketide synthases (PKSs). After strain-level genome reconstruction, a consortium of Chitinophaga and Flavobacterium was designed that consistently suppressed fungal root disease. Site-directed mutagenesis then revealed that a previously unidentified NRPS-PKS gene cluster from Flavobacterium was essential for disease suppression by the endophytic consortium. Our results highlight that endophytic root microbiomes harbor a wealth of as yet unknown functional traits that, in concert, can protect the plant inside out.

Journal Article•DOI•

CPGAVAS2, an integrated plastome sequence annotator and analyzer

Linchun Shi¹, Haimei Chen¹, Mei Jiang¹, Liqiang Wang¹, Xi Wu¹, Linfang Huang¹, Chang Liu¹

Peking Union Medical College¹

02 Jul 2019-Nucleic Acids Research

TL;DR: The results of two case studies show that CPGAVAS2 annotates better than several other servers, and will likely become an indispensable tool for plastome research.

Abstract: We previously developed a web server CPGAVAS for annotation, visualization and GenBank submission of plastome sequences. Here, we upgrade the server into CPGAVAS2 to address the following challenges: (i) inaccurate annotation in the reference sequence likely causing the propagation of errors; (ii) difficulty in the annotation of small exons of genes petB, petD and rps16 and trans-splicing gene rps12; (iii) lack of annotation for other genome features and their visualization, such as repeat elements; and (iv) lack of modules for diversity analysis of plastomes. In particular, CPGAVAS2 provides two reference datasets for plastome annotation. The first dataset contains 43 plastomes whose annotations have been validated or corrected by RNA-seq data. The second one contains 2544 plastomes curated with sequence alignment. Two new algorithms are also implemented to correctly annotate small exons and trans-splicing genes. Tandem and dispersed repeats are identified, and the results are displayed on a circular map together with the annotated genes. DNA-seq and RNA-seq data can be uploaded for identification of single-nucleotide polymorphism sites and RNA-editing sites. The results of two case studies show that CPGAVAS2 annotates better than several other servers. CPGAVAS2 will likely become an indispensable tool for plastome research and can be accessed from http://www.herbalgenomics.org/cpgavas2.

Journal Article•DOI•

Cistrome Data Browser: expanded datasets and new tools for gene regulatory analysis.

Rongbin Zheng¹, Changxin Wan¹, Shenglin Mei¹, Qian Qin¹, Qiu Wu¹, Hanfei Sun¹, Chen-Hao Chen², Myles Brown², Xiaoyan Zhang¹, Clifford A. Meyer², X. Shirley Liu¹,²

Tongji University¹, Harvard University²

08 Jan 2019-Nucleic Acids Research

TL;DR: The Cistrome DB has a new Toolkit module with several features that allow users to better utilize the large-scale ChIP-seq, DNase-seq and ATAC-seq data, and the new tools will greatly benefit the biomedical research community.

Abstract: The Cistrome Data Browser (DB) is a resource of human and mouse cis-regulatory information derived from ChIP-seq, DNase-seq and ATAC-seq chromatin profiling assays, which map the genome-wide locations of transcription factor binding sites, histone post-translational modifications and regions of chromatin accessible to endonuclease activity. Currently, the Cistrome DB contains approximately 47,000 human and mouse samples with about 24,000 newly collected datasets compared to the previous release two years ago. Furthermore, the Cistrome DB has a new Toolkit module with several features that allow users to better utilize the large-scale ChIP-seq, DNase-seq, and ATAC-seq data. First, users can query the factors which are likely to regulate a specific gene of interest. Second, the Cistrome DB Toolkit facilitates searches for factor binding, histone modifications, and chromatin accessibility in any given genomic interval shorter than 2Mb. Third, the Toolkit can determine the most similar ChIP-seq, DNase-seq, and ATAC-seq samples in terms of genomic interval overlaps with user-provided genomic interval sets. The Cistrome DB is a user-friendly, up-to-date, and well maintained resource, and the new tools will greatly benefit the biomedical research community. The database is freely available at http://cistrome.org/db, and the Toolkit is at http://dbtoolkit.cistrome.org.

Journal Article•DOI•

Gene duplication and evolution in recurring polyploidization-diploidization cycles in plants.

Xin Qiao¹, Qionghou Li¹, Hao Yin¹, Kaijie Qi¹, Leiting Li¹, Runze Wang¹, Shaoling Zhang¹, Andrew H. Paterson²

Nanjing Agricultural University¹, Plant Genome Mapping Laboratory²

21 Feb 2019-Genome Biology

TL;DR: A comprehensive landscape of different modes of gene duplication across the plant kingdom is identified by comparing 141 genomes, which provides a solid foundation for further investigation of the dynamic evolution of duplicate genes.

Abstract: The sharp increase in plant genome and transcriptome data provides valuable resources to investigate evolutionary consequences of gene duplication in a range of taxa, and unravel common principles underlying duplicate gene retention. We survey 141 sequenced plant genomes to elucidate consequences of gene and genome duplication, processes central to the evolution of biodiversity. We develop a pipeline named DupGen_finder to identify different modes of gene duplication in plants. Genes derived from whole-genome, tandem, proximal, transposed, or dispersed duplication differ in abundance, selection pressure, expression divergence, and gene conversion rate among genomes. The number of WGD-derived duplicate genes decreases exponentially with increasing age of duplication events; transposed duplication- and dispersed duplication-derived genes declined in parallel. In contrast, the frequency of tandem and proximal duplications showed no significant decrease over time, providing a continuous supply of variants available for adaptation to continuously changing environments. Moreover, tandem and proximal duplicates experienced stronger selective pressure than genes formed by other modes and evolved toward biased functional roles involved in plant self-defense. The rate of gene conversion among WGD-derived gene pairs declined over time, peaking shortly after polyploidization. To provide a platform for accessing duplicated gene pairs in different plants, we constructed the Plant Duplicate Gene Database. We identify a comprehensive landscape of different modes of gene duplication across the plant kingdom by comparing 141 genomes, which provides a solid foundation for further investigation of the dynamic evolution of duplicate genes.

Journal Article•DOI•

New insights from uncultivated genomes of the global human gut microbiome

Stephen Nayfach¹,², Zhou Jason Shi³, Rekha Seshadri¹,², Katherine S. Pollard, Nikos C. Kyrpides¹,²

United States Department of Energy¹, Lawrence Berkeley National Laboratory², Gladstone Institutes³

13 Mar 2019-Nature

TL;DR: Draft prokaryotic genomes from faecal metagenomes of diverse human populations enrich the understanding of the human gut microbiome by identifying over two thousand new species-level taxa that have numerous disease associations.

Abstract: The genome sequences of many species of the human gut microbiome remain unknown, largely owing to challenges in cultivating microorganisms under laboratory conditions. Here we address this problem by reconstructing 60,664 draft prokaryotic genomes from 3,810 faecal metagenomes, from geographically and phenotypically diverse humans. These genomes provide reference points for 2,058 newly identified species-level operational taxonomic units (OTUs), which represents a 50% increase over the previously known phylogenetic diversity of sequenced gut bacteria. On average, the newly identified OTUs comprise 33% of richness and 28% of species abundance per individual, and are enriched in humans from rural populations. A meta-analysis of clinical gut-microbiome studies pinpointed numerous disease associations for the newly identified OTUs, which have the potential to improve predictive models. Finally, our analysis revealed that uncultured gut species have undergone genome reduction that has resulted in the loss of certain biosynthetic pathways, which may offer clues for improving cultivation strategies in the future.

Journal Article•DOI•

Taxonomic assignment of uncultivated prokaryotic virus genomes is enabled by gene-sharing networks

Ho Bin Jang¹, Benjamin Bolduc¹, Olivier Zablocki¹, Jens H. Kuhn², Simon Roux³, Evelien M. Adriaenssens⁴,⁵, J. Rodney Brister², Andrew M. Kropinski⁶,⁷, Mart Krupovic⁸, Rob Lavigne⁹, Dann Turner¹⁰, Matthew B. Sullivan¹

Ohio State University¹, National Institutes of Health², United States Department of Energy³, Norwich Research Park⁴, University of Liverpool⁵, Ontario Veterinary College⁶, University of Guelph⁷, Pasteur Institute⁸, Katholieke Universiteit Leuven⁹, University of the West of England¹⁰

06 May 2019-Nature Biotechnology

TL;DR: This work presents vConTACT v.2.0, a network-based application utilizing whole genome gene-sharing profiles for virus taxonomy that integrates distance-based hierarchical clustering and confidence scores for all taxonomic predictions, and applies it to analyze 15,280 Global Ocean Virome genome fragments.

Abstract: Microbiomes from every environment contain a myriad of uncultivated archaeal and bacterial viruses, but studying these viruses is hampered by the lack of a universal, scalable taxonomic framework. We present vConTACT v.2.0, a network-based application utilizing whole genome gene-sharing profiles for virus taxonomy that integrates distance-based hierarchical clustering and confidence scores for all taxonomic predictions. We report near-identical (96%) replication of existing genus-level viral taxonomy assignments from the International Committee on Taxonomy of Viruses for National Center for Biotechnology Information virus RefSeq. Application of vConTACT v.2.0 to 1,364 previously unclassified viruses deposited in virus RefSeq as reference genomes produced automatic, high-confidence genus assignments for 820 of the 1,364. We applied vConTACT v.2.0 to analyze 15,280 Global Ocean Virome genome fragments and were able to provide taxonomic assignments for 31% of these data, which shows that our algorithm is scalable to very large metagenomic datasets. Our taxonomy tool can be automated and applied to metagenomes from any environment for virus classification.

Journal Article•DOI•

Heterochromatin drives compartmentalization of inverted and conventional nuclei

Martin Falk¹, Yana Feodorova²,³, Natalia Naumova⁴, Maxim Imakaev¹, Bryan R. Lajoie⁴,⁵, Heinrich Leonhardt³, Boris Joffe³, Job Dekker⁴, Geoffrey Fudenberg¹,⁶, Irina Solovei³, Leonid A. Mirny¹

Massachusetts Institute of Technology¹, Medical University Plovdiv², Ludwig Maximilian University of Munich³, University of Massachusetts Medical School⁴, Illumina⁵, University of California, San Francisco⁶

05 Jun 2019-Nature

TL;DR: Attractions between heterochromatic regions are essential for phase separation of the active and inactive genome in inverted and conventional nuclei, whereas chromatin–lamina interactions are necessary to build the conventional genomic architecture from these segregated phases.

Abstract: The nucleus of mammalian cells displays a distinct spatial segregation of active euchromatic and inactive heterochromatic regions of the genome. In conventional nuclei, microscopy shows that euchromatin is localized in the nuclear interior and heterochromatin at the nuclear periphery. Genome-wide chromosome conformation capture (Hi-C) analyses show this segregation as a plaid pattern of contact enrichment within euchromatin and heterochromatin compartments, and depletion between them. Many mechanisms for the formation of compartments have been proposed, such as attraction of heterochromatin to the nuclear lamina, preferential attraction of similar chromatin to each other, higher levels of chromatin mobility in active chromatin and transcription-related clustering of euchromatin. However, these hypotheses have remained inconclusive, owing to the difficulty of disentangling intra-chromatin and chromatin–lamina interactions in conventional nuclei. The marked reorganization of interphase chromosomes in the inverted nuclei of rods in nocturnal mammals provides an opportunity to elucidate the mechanisms that underlie spatial compartmentalization. Here we combine Hi-C analysis of inverted rod nuclei with microscopy and polymer simulations. We find that attractions between heterochromatic regions are crucial for establishing both compartmentalization and the concentric shells of pericentromeric heterochromatin, facultative heterochromatin and euchromatin in the inverted nucleus. When interactions between heterochromatin and the lamina are added, the same model recreates the conventional nuclear organization. In addition, our models allow us to rule out mechanisms of compartmentalization that involve strong euchromatin interactions. Together, our experiments and modelling suggest that attractions between heterochromatic regions are essential for the phase separation of the active and inactive genome in inverted and conventional nuclei, whereas interactions of the chromatin with the lamina are necessary to build the conventional architecture from these segregated phases.

Journal Article•DOI•

The role of 3D genome organization in development and cell differentiation

Hui Zheng¹, Wei Xie¹

Tsinghua University¹

01 Sep 2019-Nature Reviews Molecular Cell Biology

TL;DR: This Review discusses recent progress in understanding of the general principles of chromatin folding, its regulation and its functions in mammalian development, and discusses the dynamics of 3D chromatin and genome organization during gametogenesis, embryonic development, lineage commitment and stem cell differentiation.

Abstract: In eukaryotes, the genome does not exist as a linear molecule but instead is hierarchically packaged inside the nucleus. This complex genome organization includes multiscale structural units of chromosome territories, compartments, topologically associating domains, which are often demarcated by architectural proteins such as CTCF and cohesin, and chromatin loops. The 3D organization of chromatin modulates biological processes such as transcription, DNA replication, cell division and meiosis, which are crucial for cell differentiation and animal development. In this Review, we discuss recent progress in our understanding of the general principles of chromatin folding, its regulation and its functions in mammalian development. Specifically, we discuss the dynamics of 3D chromatin and genome organization during gametogenesis, embryonic development, lineage commitment and stem cell differentiation, and focus on the functions of chromatin architecture in transcription regulation. Finally, we discuss the role of 3D genome alterations in the aetiology of developmental disorders and human diseases.

Journal Article•DOI•

RNA-guided DNA insertion with CRISPR-associated transposases.

Jonathan Strecker, Alim Ladha, Zachary Gardner, Jonathan L. Schmid-Burgk, Kira S. Makarova¹, Eugene V. Koonin¹, Feng Zhang

National Institutes of Health¹

05 Jul 2019-Science

TL;DR: This work expands the understanding of the functional diversity of CRISPR-Cas systems and establishes a paradigm for precision DNA insertion.

Abstract: CRISPR-Cas nucleases are powerful tools for manipulating nucleic acids; however, targeted insertion of DNA remains a challenge, as it requires host cell repair machinery. Here we characterize a CRISPR-associated transposase from cyanobacteria Scytonema hofmanni (ShCAST) that consists of Tn7-like transposase subunits and the type V-K CRISPR effector (Cas12k). ShCAST catalyzes RNA-guided DNA transposition by unidirectionally inserting segments of DNA 60 to 66 base pairs downstream of the protospacer. ShCAST integrates DNA into targeted sites in the Escherichia coli genome with frequencies of up to 80% without positive selection. This work expands our understanding of the functional diversity of CRISPR-Cas systems and establishes a paradigm for precision DNA insertion.

Journal Article•DOI•

A Genome-wide Framework for Mapping Gene Regulation via Cellular Genetic Screens.

Molly Gasperini¹, Andrew J. Hill¹, José L. McFaline-Figueroa¹, Beth Martin¹, Seungsoo Kim¹, Melissa D. Zhang¹, Dana Jackson¹, Anh Leith¹, Jacob Schreiber¹, William Stafford Noble¹, Cole Trapnell¹, Nadav Ahituv², Jay Shendure¹,³

University of Washington¹, University of California, San Francisco², Howard Hughes Medical Institute³

10 Jan 2019-Cell

TL;DR: A multiplex, expression quantitative trait locus (eQTL)-inspired framework for mapping enhancer-gene pairs by introducing random combinations of CRISPR/Cas9-mediated perturbations to each of many cells, followed by single-cell RNA sequencing (RNA-seq).

Journal Article•DOI•

Principles of genome folding into topologically associating domains

Quentin Szabo¹, Frédéric Bantignies¹, Giacomo Cavalli¹

University of Montpellier¹

01 Apr 2019-Science Advances

TL;DR: The main features of topologically associating domains across species are depicted and the relation between chromatin structure, genome activity, and epigenome is discussed, highlighting mechanistic principles of TAD formation.

Abstract: Understanding the mechanisms that underlie chromosome folding within cell nuclei is essential to determine the relationship between genome structure and function. The recent application of "chromosome conformation capture" techniques has revealed that the genome of many species is organized into domains of preferential internal chromatin interactions called "topologically associating domains" (TADs). This chromosome folding has emerged as a key feature of higher-order genome organization and function through evolution. Although TADs have now been described in a wide range of organisms, they appear to have specific characteristics in terms of size, structure, and proteins involved in their formation. Here, we depict the main features of these domains across species and discuss the relation between chromatin structure, genome activity, and epigenome, highlighting mechanistic principles of TAD formation. We also consider the potential influence of TADs in genome evolution.

Journal Article•DOI•

A human gut bacterial genome and culture collection for improved metagenomic analyses.

Samuel C. Forster¹,²,³, Nitin Kumar², Blessing O. Anonye²,⁴, Alexandre Almeida²,⁵, Elisa Viciani², Mark D. Stares², Matthew Dunn², Tapoka T. Mkandawire², Ana Zhu², Yan Shao², Lindsay J. Pike², Thomas J. Louie⁶, Hilary P. Browne², Alex L. Mitchell⁵, B. Anne Neville², Robert D. Finn⁵, Trevor D. Lawley²

Hudson Institute of Medical Research¹, Wellcome Trust Sanger Institute², Monash University, Clayton campus³, University of Warwick⁴, European Bioinformatics Institute⁵, University of Calgary⁶

04 Feb 2019-Nature Biotechnology

TL;DR: The improved resource of gastrointestinal bacterial reference sequences circumvents dependence on de novo assembly of metagenomes and enables accurate and cost-effective shotgun metagenomic analyses of human gastrointestinal microbiota.

Abstract: Understanding gut microbiome functions requires cultivated bacteria for experimental validation and reference bacterial genome sequences to interpret metagenome datasets and guide functional analyses. We present the Human Gastrointestinal Bacteria Culture Collection (HBC), a comprehensive set of 737 whole-genome-sequenced bacterial isolates, representing 273 species (105 novel species) from 31 families found in the human gastrointestinal microbiota. The HBC increases the number of bacterial genomes derived from human gastrointestinal microbiota by 37%. The resulting global Human Gastrointestinal Bacteria Genome Collection (HGG) classifies 83% of genera by abundance across 13,490 shotgun-sequenced metagenomic samples, improves taxonomic classification by 61% compared to the Human Microbiome Project (HMP) genome collection and achieves subspecies-level classification for almost 50% of sequences. The improved resource of gastrointestinal bacterial reference sequences circumvents dependence on de novo assembly of metagenomes and enables accurate and cost-effective shotgun metagenomic analyses of human gastrointestinal microbiota.
