Showing papers in "Genome Biology in 2018"

Open Access

Journal Article•DOI•

SCANPY: large-scale single-cell gene expression data analysis

F. Alexander Wolf, Philipp Angerer, Fabian J. Theis¹•Institutions (1)

06 Feb 2018-Genome Biology

TL;DR: This work presents Scanpy, a scalable toolkit for analyzing single-cell gene expression data that includes methods for preprocessing, visualization, clustering, pseudotime and trajectory inference, differential expression testing, and simulation of gene regulatory networks, and AnnData, a generic class for handling annotated data matrices.

Abstract: Scanpy is a scalable toolkit for analyzing single-cell gene expression data. It includes methods for preprocessing, visualization, clustering, pseudotime and trajectory inference, differential expression testing, and simulation of gene regulatory networks. Its Python-based implementation efficiently deals with data sets of more than one million cells ( https://github.com/theislab/Scanpy ). Along with Scanpy, we present AnnData, a generic class for handling annotated data matrices ( https://github.com/theislab/anndata ).

3,343 citations

Journal Article•DOI•

RNA virus interference via CRISPR/Cas13a system in plants.

Rashid Aman¹, Zahir Ali¹, Haroon Butt¹, Ahmed Mahas¹, Fatimah R. Aljedaani¹, Muhammad Zhuhaib Khan¹, Shou-Wei Ding², Magdy M. Mahfouz¹•Institutions (2)

King Abdullah University of Science and Technology¹, University of California, Riverside²

04 Jan 2018-Genome Biology

TL;DR: The data indicate that CRISPR/Cas13a can be used for engineering interference against RNA viruses, providing a potential novel mechanism for RNA-guided immunity against RNA viruses and for other RNA manipulations in plants.

Abstract: CRISPR/Cas systems confer immunity against invading nucleic acids and phages in bacteria and archaea. CRISPR/Cas13a (known previously as C2c2) is a class 2 type VI-A ribonuclease capable of targeting and cleaving single-stranded RNA (ssRNA) molecules of the phage genome. Here, we employ CRISPR/Cas13a to engineer interference with an RNA virus, Turnip Mosaic Virus (TuMV), in plants. CRISPR/Cas13a produces interference against green fluorescent protein (GFP)-expressing TuMV in transient assays and stable overexpression lines of Nicotiana benthamiana. CRISPR RNA (crRNAs) targeting the HC-Pro and GFP sequences exhibit better interference than those targeting other regions such as coat protein (CP) sequence. Cas13a can also process pre-crRNAs into functional crRNAs. Our data indicate that CRISPR/Cas13a can be used for engineering interference against RNA viruses, providing a potential novel mechanism for RNA-guided immunity against RNA viruses and for other RNA manipulations in plants.

771 citations

Journal Article•DOI•

Ten things you should know about transposable elements

Guillaume Bourque¹, Kathleen H. Burns², Mary Gehring³, Vera Gorbunova⁴, Andrei Seluanov⁴, Molly Hammell⁵, Michael Imbeault⁶, Zsuzsanna Izsvák⁷, Henry L. Levin⁸, Todd S. Macfarlan⁸, Dixie L. Mager⁹, Cédric Feschotte¹⁰•Institutions (10)

McGill University¹, Johns Hopkins University School of Medicine², Massachusetts Institute of Technology³, University of Rochester⁴, Cold Spring Harbor Laboratory⁵, University of Cambridge⁶, Max Delbrück Center for Molecular Medicine⁷, National Institutes of Health⁸, University of British Columbia⁹, Cornell University¹⁰

19 Nov 2018-Genome Biology

TL;DR: The fundamental properties of TEs and their complex interactions with their cellular environment are introduced, which are crucial to understanding their impact and manifold consequences for organismal biology.

Abstract: Transposable elements (TEs) are major components of eukaryotic genomes. However, the extent of their impact on genome evolution, function, and disease remains a matter of intense interrogation. The rise of genomics and large-scale functional assays has shed new light on the multi-faceted activities of TEs and implies that they should no longer be marginalized. Here, we introduce the fundamental properties of TEs and their complex interactions with their cellular environment, which are crucial to understanding their impact and manifold consequences for organismal biology. While we draw examples primarily from mammalian systems, the core concepts outlined here are relevant to a broad range of organisms.

691 citations

Journal Article•DOI•

Cell Hashing with barcoded antibodies enables multiplexing and doublet detection for single cell genomics

Marlon Stoeckius¹, Shiwei Zheng, Brian Houck-Loomis¹, Stephanie Hao¹, Bertrand Z. Yeung, William M. Mauck, Peter Smibert¹, Rahul Satija•Institutions (1)

Harvard University¹

19 Dec 2018-Genome Biology

TL;DR: Cell Hashing is introduced, where oligo-tagged antibodies against ubiquitously expressed surface proteins uniquely label cells from distinct samples, which can be subsequently pooled and can robustly identify cross-sample multiplets.

Abstract: Despite rapid developments in single cell sequencing, sample-specific batch effects, detection of cell multiplets, and experimental costs remain outstanding challenges. Here, we introduce Cell Hashing, where oligo-tagged antibodies against ubiquitously expressed surface proteins uniquely label cells from distinct samples, which can be subsequently pooled. By sequencing these tags alongside the cellular transcriptome, we can assign each cell to its original sample, robustly identify cross-sample multiplets, and “super-load” commercial droplet-based systems for significant cost reduction. We validate our approach using a complementary genetic approach and demonstrate how hashing can generalize the benefits of single cell multiplexing to diverse samples and experimental designs.
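
The sample-assignment idea in this abstract can be illustrated with a minimal Python sketch. It assumes a fixed count threshold (`min_count`) for calling a hashtag "present" in a cell; that threshold, the function name, and the input layout are hypothetical choices made for illustration, and the abstract does not describe the classification procedure the authors actually use.

```python
def assign_cells(hto_counts, min_count=10):
    """Assign each cell to a sample from its hashtag (HTO) counts.

    hto_counts maps cell barcode -> {sample tag: read count}.
    A tag counts as present when its count reaches min_count
    (an illustrative fixed threshold, not the published method).
    """
    calls = {}
    for cell, counts in hto_counts.items():
        present = sorted(tag for tag, n in counts.items() if n >= min_count)
        if not present:
            calls[cell] = "negative"      # no hashtag detected
        elif len(present) == 1:
            calls[cell] = present[0]      # singlet from one sample
        else:
            calls[cell] = "multiplet"     # tags from two or more samples
    return calls


example = {
    "cell_1": {"sampleA": 120, "sampleB": 3},
    "cell_2": {"sampleA": 90, "sampleB": 75},
    "cell_3": {"sampleA": 1, "sampleB": 2},
}
print(assign_cells(example))
```

A cell carrying hashtags from two samples can only arise when cells from different samples share a droplet, which is why pooling hashed samples exposes cross-sample multiplets.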

608 citations

Journal Article•DOI•

HiGlass: web-based visual exploration and analysis of genome interaction maps

Peter Kerpedjiev¹, Nezar Abdennur², Fritz Lekschas¹, Chuck McCallum¹, Kasper Dinkla¹, Hendrik Strobelt¹, Jacob M. Luber¹, Scott Ouellette¹, Alaleh Azhir¹, Nikhil Kumar¹, Jeewon Hwang¹, Soohyun Lee¹, Burak H. Alver¹, Hanspeter Pfister¹, Leonid A. Mirny², Peter J. Park¹, Nils Gehlenborg¹•Institutions (2)

Harvard University¹, Massachusetts Institute of Technology²

24 Aug 2018-Genome Biology

TL;DR: HiGlass is presented, an open source visualization tool built on web technologies that provides a rich interface for rapid, multiplex, and multiscale navigation of 2D genomic maps alongside 1D genomic tracks, allowing users to combine various data types, synchronize multiple visualization modalities, and share fully customizable views with others.

Abstract: We present HiGlass, an open source visualization tool built on web technologies that provides a rich interface for rapid, multiplex, and multiscale navigation of 2D genomic maps alongside 1D genomic tracks, allowing users to combine various data types, synchronize multiple visualization modalities, and share fully customizable views with others. We demonstrate its utility in exploring different experimental conditions, comparing the results of analyses, and creating interactive snapshots to share with collaborators and the broader public. HiGlass is accessible online at http://higlass.io and is also available as a containerized application that can be run on any platform.

569 citations

Journal Article•DOI•

SCoPE-MS: mass spectrometry of single mammalian cells quantifies proteome heterogeneity during cell differentiation.

Bogdan Budnik¹, Ezra Levy², Guillaume Harmange², Nikolai Slavov²•Institutions (2)

Harvard University¹, Northeastern University²

22 Oct 2018-Genome Biology

TL;DR: This work develops Single Cell ProtEomics by Mass Spectrometry (SCoPE-MS), validates its ability to identify distinct human cancer cell types based on their proteomes, and uses it to quantify over a thousand proteins in differentiating mouse embryonic stem cells.

Abstract: Some exciting biological questions require quantifying thousands of proteins in single cells. To achieve this goal, we develop Single Cell ProtEomics by Mass Spectrometry (SCoPE-MS) and validate its ability to identify distinct human cancer cell types based on their proteomes. We use SCoPE-MS to quantify over a thousand proteins in differentiating mouse embryonic stem cells. The single-cell proteomes enable us to deconstruct cell populations and infer protein abundance relationships. Comparison between single-cell proteomes and transcriptomes indicates coordinated mRNA and protein covariation, yet many genes exhibit functionally concerted and distinct regulatory patterns at the mRNA and the protein level.

508 citations

Journal Article•DOI•

From squiggle to basepair: computational approaches for improving nanopore sequencing read accuracy.

Franka J. Rang¹, Wigard P. Kloosterman¹, Jeroen de Ridder¹•Institutions (1)

Utrecht University¹

13 Jul 2018-Genome Biology

TL;DR: Computational approaches determining the nanopore sequencing error rate are reviewed, and strategies for translation of raw sequencing data into base calls for detection of base modifications and for obtaining consensus sequences are outlined.

Abstract: Nanopore sequencing is a rapidly maturing technology delivering long reads in real time on a portable instrument at low cost. Not surprisingly, the community has rapidly taken up this new way of sequencing and has used it successfully for a variety of research applications. A major limitation of nanopore sequencing is its high error rate, which despite recent improvements to the nanopore chemistry and computational tools still ranges between 5% and 15%. Here, we review computational approaches determining the nanopore sequencing error rate. Furthermore, we outline strategies for translation of raw sequencing data into base calls for detection of base modifications and for obtaining consensus sequences.
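
To make the link between per-read error rate and consensus sequences concrete, here is a small Python sketch. It assumes the reads are already aligned to equal length and contain substitution errors only, which is a strong simplification; tools of the kind this review covers work on real alignments with insertions and deletions, and often on the raw signal. The function names are hypothetical.

```python
from collections import Counter


def error_rate(read, truth):
    """Fraction of positions where a pre-aligned read differs from the truth."""
    if len(read) != len(truth):
        raise ValueError("read and truth must be aligned to the same length")
    return sum(a != b for a, b in zip(read, truth)) / len(truth)


def consensus(aligned_reads):
    """Column-wise majority vote over equal-length aligned reads."""
    if not aligned_reads:
        return ""
    length = len(aligned_reads[0])
    if any(len(r) != length for r in aligned_reads):
        raise ValueError("all reads must have the same aligned length")
    # Take the most frequent base in every alignment column.
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*aligned_reads))


truth = "ACGTACGTAC"
reads = ["TCGTACGTAC", "ACGAACGTAC", "ACGTACGTAG"]  # one substitution each
print([error_rate(r, truth) for r in reads])  # each read is 10% wrong
print(consensus(reads) == truth)              # errors fall in different columns
```

Because the three errors sit in different columns, each is outvoted by the two correct reads; errors that recur at the same position across reads would not be removed this way.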

451 citations

Journal Article•DOI•

The 3D Genome Browser: a web-based browser for visualizing 3D genome organization and long-range chromatin interactions.

Yanli Wang¹, Fan Song¹, Bo Zhang¹, Lijun Zhang, Jie Xu, Da Kuang², Daofeng Li³, Mayank N. K. Choudhary³, Yun Li⁴, Ming Hu⁵, Ross C. Hardison¹, Ting Wang³, Feng Yue¹•Institutions (5)

Pennsylvania State University¹, University of Pennsylvania², Washington University in St. Louis³, University of North Carolina at Chapel Hill⁴, Cleveland Clinic⁵

04 Oct 2018-Genome Biology

TL;DR: The 3D Genome Browser is introduced, which provides multiple methods linking distal cis-regulatory elements with their potential target genes and a new binary data format for Hi-C data that reduces the file size by at least an order of magnitude and allows users to visualize chromatin interactions over millions of base pairs within seconds.

Abstract: Here, we introduce the 3D Genome Browser, http://3dgenome.org , which allows users to conveniently explore both their own and over 300 publicly available chromatin interaction data sets of different types. We design a new binary data format for Hi-C data that reduces the file size by at least an order of magnitude and allows users to visualize chromatin interactions over millions of base pairs within seconds. Our browser provides multiple methods linking distal cis-regulatory elements with their potential target genes. Users can seamlessly integrate thousands of other omics data sets to gain a comprehensive view of both regulatory landscape and 3D genome structure.

390 citations

Journal Article•DOI•

Expanded base editing in rice and wheat using a Cas9-adenosine deaminase fusion

Chao Li¹, Yuan Zong¹, Yanpeng Wang¹, Shuai Jin¹, Dingbo Zhang¹, Qianna Song¹, Rui Zhang¹, Caixia Gao¹•Institutions (1)

Chinese Academy of Sciences¹

29 May 2018-Genome Biology

TL;DR: A new plant adenine base editor based on an evolved tRNA adenosine deaminase fused to the nickase CRISPR/Cas9 is described, enabling A•T to G•C conversion at frequencies up to 7.5% in protoplasts and 59.1% in regenerated rice and wheat plants.

Abstract: Nucleotide base editors in plants have been limited to conversion of cytosine to thymine. Here, we describe a new plant adenine base editor based on an evolved tRNA adenosine deaminase fused to the nickase CRISPR/Cas9, enabling A•T to G•C conversion at frequencies up to 7.5% in protoplasts and 59.1% in regenerated rice and wheat plants. An endogenous gene is also successfully modified through introducing a gain-of-function point mutation to directly produce an herbicide-tolerant rice plant. With this new adenine base editing system, it is now possible to precisely edit all base pairs, thus expanding the toolset for precise editing in plants.

343 citations

Journal Article•DOI•

SUPPA2: fast, accurate, and uncertainty-aware differential splicing analysis across multiple conditions

Juan L. Trincado¹, Juan Carlos Entizne², Gerald Hysenaj³, Babita Singh¹, Miha Skalic¹, David Elliott³, Eduardo Eyras¹,⁴•Institutions (4)

Pompeu Fabra University¹, University of Dundee², Newcastle University³, Catalan Institution for Research and Advanced Studies⁴

23 Mar 2018-Genome Biology

TL;DR: This work uses SUPPA2 to identify novel Transformer2-regulated exons, novel microexons induced during differentiation of bipolar neurons, and novel intron retention events during erythroblast differentiation.

Abstract: Despite the many approaches to study differential splicing from RNA-seq, many challenges remain unsolved, including computing capacity and sequencing depth requirements. Here we present SUPPA2, a new method that addresses these challenges, and enables streamlined analysis across multiple conditions taking into account biological variability. Using experimental and simulated data, we show that SUPPA2 achieves higher accuracy compared to other methods, especially at low sequencing depth and short read length. We use SUPPA2 to identify novel Transformer2-regulated exons, novel microexons induced during differentiation of bipolar neurons, and novel intron retention events during erythroblast differentiation.

328 citations

Journal Article•DOI•

SKESA: strategic k-mer extension for scrupulous assemblies.

Alexandre Souvorov¹, Richa Agarwala¹, David Lipman¹•Institutions (1)

United States Department of Health and Human Services¹

04 Oct 2018-Genome Biology

TL;DR: Comparison with SPAdes and MegaHit shows that SKESA produces assemblies that have high sequence quality and contiguity, handles low-level contamination in reads, is fast, and produces an identical assembly for the same input when assembled multiple times with the same or different compute resources.

Abstract: SKESA is a DeBruijn graph-based de-novo assembler designed for assembling reads of microbial genomes sequenced using Illumina. Comparison with SPAdes and MegaHit shows that SKESA produces assemblies that have high sequence quality and contiguity, handles low-level contamination in reads, is fast, and produces an identical assembly for the same input when assembled multiple times with the same or different compute resources. SKESA has been used for assembling over 272,000 read sets in the Sequence Read Archive at NCBI and for real-time pathogen detection. Source code for SKESA is freely available at https://github.com/ncbi/SKESA/releases .

Journal Article•DOI•

DeepCRISPR: optimized CRISPR guide RNA design by deep learning

Guohui Chuai¹, Hanhui Ma², Jifang Yan¹, Ming Chen³, Nanfang Hong¹, Dongyu Xue¹, Chi Zhou¹, Chenyu Zhu¹, Chen Ke¹, Bin Duan¹, Feng Gu⁴, Sheng Qu¹, Deshuang Huang¹, Jia Wei³, Qi Liu¹•Institutions (4)

Tongji University¹, ShanghaiTech University², AstraZeneca³, Wenzhou Medical College⁴

26 Jun 2018-Genome Biology

TL;DR: DeepCRISPR is presented, a comprehensive computational platform to unify sgRNA on-target and off-target site prediction into one framework with deep learning, surpassing available state-of-the-art in silico tools.

Abstract: A major challenge for effective application of CRISPR systems is to accurately predict the single guide RNA (sgRNA) on-target knockout efficacy and off-target profile, which would facilitate the optimized design of sgRNAs with high sensitivity and specificity. Here we present DeepCRISPR, a comprehensive computational platform to unify sgRNA on-target and off-target site prediction into one framework with deep learning, surpassing available state-of-the-art in silico tools. In addition, DeepCRISPR fully automates the identification of sequence and epigenetic features that may affect sgRNA knockout efficacy in a data-driven manner. DeepCRISPR is available at http://www.deepcrispr.net/ .

Journal Article•DOI•

A novel FLI1 exonic circular RNA promotes metastasis in breast cancer by coordinately regulating TET1 and DNMT1

Naifei Chen¹, Gang Zhao¹, Xu Yan¹, Zheng Lv¹, Hongmei Yin¹, Shilin Zhang¹,², Wei Song¹, Xueli Li¹,², Lingyu Li¹, Zhonghua Du¹, Lin Jia¹,², Lei Zhou¹, Wei Li¹, Andrew R. Hoffman², Ji-Fan Hu¹,², Jiuwei Cui¹•Institutions (2)

Jilin University¹, Stanford University²

11 Dec 2018-Genome Biology

TL;DR: Data suggest that FECR1 circular RNA acts as an upstream regulator to control breast cancer tumor growth by coordinating the regulation of DNA methylating and demethylating enzymes.

Abstract: Friend leukemia virus integration 1 (FLI1), an ETS transcription factor family member, acts as an oncogenic driver in hematological malignancies and promotes tumor growth in solid tumors. However, little is known about the mechanisms underlying the activation of this proto-oncogene in tumors. Immunohistochemical staining showed that FLI1 is aberrantly overexpressed in advanced stage and metastatic breast cancers. Using a CRISPR Cas9-guided immunoprecipitation assay, we identify a circular RNA in the FLI1 promoter chromatin complex, consisting of FLI1 exons 4-2-3, referred to as FECR1. Overexpression of FECR1 enhances invasiveness of MDA-MB231 breast cancer cells. Notably, FECR1 utilizes a positive feedback mechanism to activate FLI1 by inducing DNA hypomethylation in CpG islands of the promoter. FECR1 binds to the FLI1 promoter in cis and recruits TET1, a demethylase that is actively involved in DNA demethylation. FECR1 also binds to and downregulates in trans DNMT1, a methyltransferase that is essential for the maintenance of DNA methylation. These data suggest that FECR1 circular RNA acts as an upstream regulator to control breast cancer tumor growth by coordinating the regulation of DNA methylating and demethylating enzymes. Thus, FLI1 drives tumor metastasis not only through the canonical oncoprotein pathway, but also by using epigenetic mechanisms mediated by its exonic circular RNA.

Journal Article•DOI•

Applications and potential of genome editing in crop improvement

Yi Zhang¹, Karen Massel², Ian D. Godwin², Caixia Gao³•Institutions (3)

Shandong Normal University¹, University of Queensland², Chinese Academy of Sciences³

30 Nov 2018-Genome Biology

TL;DR: The current applications of genome editing in plants are described, focusing on its potential for crop improvement in terms of adaptation, resilience, and end-use, and novel breakthroughs that are extending the potential of genome-edited crops and the possibilities of their commercialization are reviewed.

Abstract: Genome-editing tools provide advanced biotechnological techniques that enable the precise and efficient targeted modification of an organism’s genome. Genome-editing systems have been utilized in a wide variety of plant species to characterize gene functions and improve agricultural traits. We describe the current applications of genome editing in plants, focusing on its potential for crop improvement in terms of adaptation, resilience, and end-use. In addition, we review novel breakthroughs that are extending the potential of genome-edited crops and the possibilities of their commercialization. Future prospects for integrating this revolutionary technology with conventional and new-age crop breeding strategies are also discussed.

Journal Article•DOI•

CHESS: a new human gene catalog curated from thousands of large-scale RNA sequencing experiments reveals extensive transcriptional noise

Mihaela Pertea¹,², Alaina Shumate¹,², Geo Pertea¹, Ales Varabyou¹,², Florian P. Breitwieser¹, Yu Chi Chang², Anil K. Madugundu, Akhilesh Pandey¹,³, Steven L. Salzberg•Institutions (3)

Johns Hopkins University School of Medicine¹, Johns Hopkins University², Mayo Clinic³

28 Nov 2018-Genome Biology

TL;DR: The sequences from deep RNA sequencing experiments by the Genotype-Tissue Expression (GTEx) project are assembled to create a new catalog of human genes and transcripts, called CHESS, revealing a heretofore unappreciated amount of transcriptional noise in human cells.

Abstract: We assembled the sequences from deep RNA sequencing experiments by the Genotype-Tissue Expression (GTEx) project, to create a new catalog of human genes and transcripts, called CHESS. The new database contains 42,611 genes, of which 20,352 are potentially protein-coding and 22,259 are noncoding, and a total of 323,258 transcripts. These include 224 novel protein-coding genes and 116,156 novel transcripts. We detected over 30 million additional transcripts at more than 650,000 genomic loci, nearly all of which are likely nonfunctional, revealing a heretofore unappreciated amount of transcriptional noise in human cells. The CHESS database is available at http://ccb.jhu.edu/chess .

Journal Article•DOI•

KrakenUniq: confident and fast metagenomics classification using unique k-mer counts

Florian P. Breitwieser¹, Dannon Baker¹, Steven L. Salzberg¹•Institutions (1)

Johns Hopkins University¹

16 Nov 2018-Genome Biology

TL;DR: KrakenUniq is a novel metagenomics classifier that combines the fast k-mer-based classification of Kraken with an efficient algorithm for assessing the coverage of unique k-mers found in each species in a dataset by using the probabilistic cardinality estimator HyperLogLog.

Abstract: False-positive identifications are a significant problem in metagenomics classification. We present KrakenUniq, a novel metagenomics classifier that combines the fast k-mer-based classification of Kraken with an efficient algorithm for assessing the coverage of unique k-mers found in each species in a dataset. On various test datasets, KrakenUniq gives better recall and precision than other methods and effectively classifies and distinguishes pathogens with low abundance from false positives in infectious disease samples. By using the probabilistic cardinality estimator HyperLogLog, KrakenUniq runs as fast as Kraken and requires little additional memory. KrakenUniq is freely available at https://github.com/fbreitwieser/krakenuniq .
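
The reasoning behind unique k-mer counts can be shown with a toy Python sketch. It deliberately swaps the HyperLogLog estimator named in the abstract for exact Python sets, so it demonstrates only the signal being measured and not KrakenUniq's memory-efficient implementation. The two toy genomes, the k-mer length, and every function name are assumptions made for illustration.

```python
from collections import defaultdict


def kmers(seq, k):
    """Yield every substring of length k in seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_index(genomes, k):
    """Map each reference k-mer to the taxon it comes from (toy database)."""
    return {km: taxon for taxon, seq in genomes.items() for km in kmers(seq, k)}


def classify(reads, index, k):
    """Return {taxon: (reads assigned, distinct k-mers observed)}.

    Distinct k-mers are tracked with exact sets here; this stands in for
    a probabilistic cardinality estimator and is not KrakenUniq's code.
    """
    read_counts = defaultdict(int)
    seen = defaultdict(set)
    for read in reads:
        hits = defaultdict(int)
        for km in kmers(read, k):
            taxon = index.get(km)
            if taxon is not None:
                hits[taxon] += 1
                seen[taxon].add(km)
        if hits:
            read_counts[max(hits, key=hits.get)] += 1
    return {t: (read_counts[t], len(seen[t])) for t in seen}


genomes = {"A": "ACGTACGGTCA", "B": "TTTTGGGGCCCC"}
index = build_index(genomes, k=4)
reads = ["ACGTACG", "TACGGTCA", "GTACGGT", "TTTTG", "TTTTG", "TTTTG"]
print(classify(reads, index, k=4))
```

Both taxa receive three reads, but the reads for A cover eight distinct k-mers while the reads for B repeat the same two. A low distinct k-mer count relative to the read count is the kind of evidence that flags a likely false positive.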

Journal Article•DOI•

Ythdf2-mediated m6A mRNA clearance modulates neural development in mice

Miaomiao Li¹,², Xu Zhao¹,², Wei Wang³, Hailing Shi⁴,⁵, Qingfei Pan⁶, Zhike Lu⁴,⁵, Sonia Peña Perez¹, Rajikala Suganthan¹, Chuan He⁴,⁵, Magnar Bjørås¹,³, Arne Klungland¹,²•Institutions (6)

Oslo University Hospital¹, University of Oslo², Norwegian University of Science and Technology³, University of Chicago⁴, Howard Hughes Medical Institute⁵, St. Jude Children's Research Hospital⁶

31 May 2018-Genome Biology

TL;DR: It is demonstrated that neural stem/progenitor cell (NSPC) self-renewal and spatiotemporal generation of neurons and other cell types are severely impacted by the loss of Ythdf2 in embryonic neocortex.

Abstract: N6-methyladenosine (m6A) modification in mRNAs was recently shown to be dynamically regulated, indicating a pivotal role in multiple developmental processes. Most recently, it was shown that the Mettl3-Mettl14 writer complex of this mark is required for the temporal control of cortical neurogenesis. The m6A reader protein Ythdf2 promotes mRNA degradation by recognizing m6A and recruiting the mRNA decay machinery. We show that the conditional depletion of the m6A reader protein Ythdf2 in mice causes lethality at late embryonic developmental stages, with embryos characterized by compromised neural development. We demonstrate that neural stem/progenitor cell (NSPC) self-renewal and spatiotemporal generation of neurons and other cell types are severely impacted by the loss of Ythdf2 in embryonic neocortex. Combining in vivo and in vitro assays, we show that the proliferation and differentiation capabilities of NSPCs decrease significantly in Ythdf2−/− embryos. The Ythdf2−/− neurons are unable to produce normally functioning neurites, leading to failure in recovery upon reactive oxygen species stimulation. Consistently, expression of genes enriched in neural development pathways is significantly disturbed. Detailed analysis of the m6A-methylomes of Ythdf2−/− NSPCs shows that JAK-STAT cascade inhibitory genes contributing to neuroprotection and neurite outgrowth have increased expression and m6A enrichment. In agreement with the function of Ythdf2, delayed degradation of neuron differentiation-related m6A-containing mRNAs is seen in Ythdf2−/− NSPCs. We show that the m6A reader protein Ythdf2 modulates neural development by promoting m6A-dependent degradation of neural development-related mRNA targets.

Journal Article•DOI•

Interaction between the microbiome and TP53 in human lung cancer

K. Leigh Greathouse¹,², James R. White, Ashely J. Vargas², Valery Bliskovsky², Jessica Beck², Natalia von Muhlinen², Eric C. Polley³, Elise D. Bowman², Mohammed A. Khan², Ana I. Robles², Tomer Cooks², Bríd M. Ryan², Noah Padgett¹, Amiran Dzutsev, Giorgio Trinchieri, Marbin Pineda², Sven Bilke², Paul S. Meltzer², Alexis N. Hokenstad³, Tricia M. Stickrod³, Marina Walther-Antonio³, Joshua P. Earl⁴, Joshua Chang Mell⁴, Jarosław E. Król⁴, Sergey Balashov⁴, Archana S. Bhat⁴, Garth D. Ehrlich⁴, Alex M. Valm², Clayton Deming², Sean Conlan², Julia Oh, Julie A. Segre², Curtis C. Harris²•Institutions (4)

Baylor University¹, National Institutes of Health², Mayo Clinic³, Drexel University⁴

24 Aug 2018-Genome Biology

TL;DR: This comprehensive study demonstrates a lower alpha diversity in normal lung as compared to non-tumor adjacent or tumor tissue, and shows both microbiome-gene and microbiome-exposure interactions in squamous cell carcinoma lung cancer tissue.

Abstract: Lung cancer is the leading cancer diagnosis worldwide and the number one cause of cancer deaths. Exposure to cigarette smoke, the primary risk factor in lung cancer, reduces epithelial barrier integrity and increases susceptibility to infections. Herein, we hypothesize that somatic mutations together with cigarette smoke generate a dysbiotic microbiota that is associated with lung carcinogenesis. Using lung tissue from 33 controls and 143 cancer cases, we conduct 16S ribosomal RNA (rRNA) bacterial gene sequencing, with RNA-sequencing data from lung cancer cases in The Cancer Genome Atlas serving as the validation cohort. Overall, we demonstrate a lower alpha diversity in normal lung as compared to non-tumor adjacent or tumor tissue. In squamous cell carcinoma specifically, a separate group of taxa are identified, in which Acidovorax is enriched in smokers. Acidovorax temporans is identified within tumor sections by fluorescent in situ hybridization and confirmed by two separate 16S rRNA strategies. Further, these taxa, including Acidovorax, exhibit higher abundance among the subset of squamous cell carcinoma cases with TP53 mutations, an association not seen in adenocarcinomas. The results of this comprehensive study show both microbiome-gene and microbiome-exposure interactions in squamous cell carcinoma lung cancer tissue. Specifically, tumors harboring TP53 mutations, which can impair epithelial function, have a unique bacterial consortium that is higher in relative abundance in smoking-associated tumors of this type. Given the significant need for clinical diagnostic tools in lung cancer, this study may provide novel biomarkers for early detection.

Journal Article•DOI•

An optimized library for reference-based deconvolution of whole-blood biospecimens assayed using the Illumina HumanMethylationEPIC BeadArray.

Lucas A. Salas¹, Devin C. Koestler², Rondi A. Butler³, Helen M. Hansen⁴, John K. Wiencke⁴, Karl T. Kelsey³, Brock C. Christensen¹•Institutions (4)

Dartmouth College¹, University of Kansas², Brown University³, University of California, San Francisco⁴

29 May 2018-Genome Biology

TL;DR: Three approaches to select reference libraries for deconvoluting neutrophil, monocyte, B-lymphocyte, natural killer, and CD4+ and CD8+ T-cell fractions based on blood-derived DNA methylation signatures assayed using the Illumina HumanMethylationEPIC array are compared.

Abstract: Genome-wide methylation arrays are powerful tools for assessing cell composition of complex mixtures. We compare three approaches to select reference libraries for deconvoluting neutrophil, monocyte, B-lymphocyte, natural killer, and CD4+ and CD8+ T-cell fractions based on blood-derived DNA methylation signatures assayed using the Illumina HumanMethylationEPIC array. The IDOL algorithm identifies a library of 450 CpGs, resulting in an average R2 = 99.2 across cell types when applied to EPIC methylation data collected on artificial mixtures constructed from the above cell types. Of the 450 CpGs, 69% are unique to EPIC. This library has the potential to reduce unintended technical differences across array platforms.

Journal Article•DOI•

Impact of transposable elements on genome structure and evolution in bread wheat.

Thomas Wicker¹, Heidrun Gundlach, Manuel Spannagl, Cristobal Uauy², Philippa Borrill², Ricardo H. Ramirez-Gonzalez², Romain De Oliveira³, Klaus F. X. Mayer⁴, Etienne Paux³, Frédéric Choulet³•Institutions (4)

University of Zurich¹, Norwich Research Park², University of Auvergne³, Technische Universität München⁴

17 Aug 2018-Genome Biology

TL;DR: Even though the intergenic space is changed by the TE turnover, an unexpected preservation is observed between the A, B, and D subgenomes for features like TE family proportions, gene spacing, and TE enrichment near genes.

Abstract: Transposable elements (TEs) are major components of large plant genomes and main drivers of genome evolution. The most recent assembly of hexaploid bread wheat recovered the highly repetitive TE space in an almost complete chromosomal context and enabled a detailed view into the dynamics of TEs in the A, B, and D subgenomes. The overall TE content is very similar between the A, B, and D subgenomes, although we find no evidence for bursts of TE amplification after the polyploidization events. Despite the near-complete turnover of TEs since the subgenome lineages diverged from a common ancestor, 76% of TE families are still present in similar proportions in each subgenome. Moreover, spacing between syntenic genes is also conserved, even though syntenic TEs have been replaced by new insertions over time, suggesting that distances between genes, but not sequences, are under evolutionary constraints. The TE composition of the immediate gene vicinity differs from the core intergenic regions. We find the same TE families to be enriched or depleted near genes in all three subgenomes. Evaluations at the subfamily level of timed long terminal repeat-retrotransposon insertions highlight the independent evolution of the diploid A, B, and D lineages before polyploidization and cases of concerted proliferation in the AB tetraploid. Even though the intergenic space is changed by the TE turnover, an unexpected preservation is observed between the A, B, and D subgenomes for features like TE family proportions, gene spacing, and TE enrichment near genes.


Journal Article•DOI•

A large-scale whole-genome sequencing analysis reveals highly specific genome editing by both Cas9 and Cpf1 (Cas12a) nucleases in rice.


Xu Tang¹, Guanqing Liu², Jianping Zhou¹, Qiurong Ren¹, Qi You², Li Tian¹, Xuhui Xin¹, Zhaohui Zhong¹, Binglin Liu¹, Xuelian Zheng¹, Dengwei Zhang¹, Aimee Malzahn³, Zhiyun Gong², Yiping Qi³, Tao Zhang², Zhang Yong¹•Institutions (3)

University of Electronic Science and Technology of China¹, Yangzhou University², University of Maryland, College Park³

04 Jul 2018-Genome Biology

TL;DR: A comprehensive and rigorous analysis of WGS data across multiple sample types suggests both Cas9 and Cpf1 nucleases are very specific in generating targeted DNA modifications and off-targeting can be avoided by designing guide RNAs with high specificity.


Abstract: Targeting specificity has been a barrier to applying genome editing systems in functional genomics, precise medicine and plant breeding. In plants, only limited studies have used whole-genome sequencing (WGS) to test off-target effects of Cas9. The cause of numerous discovered mutations is still controversial. Furthermore, WGS-based off-target analysis of Cpf1 (Cas12a) has not been reported in any higher organism to date. We conduct a WGS analysis of 34 plants edited by Cas9 and 15 plants edited by Cpf1 in T0 and T1 generations along with 20 diverse control plants in rice. The sequencing depths range from 45× to 105× with read mapping rates above 96%. Our results clearly show that most mutations in edited plants are created by the tissue culture process, which causes approximately 102 to 148 single nucleotide variations (SNVs) and approximately 32 to 83 insertions/deletions (indels) per plant. Among 12 Cas9 single guide RNAs (sgRNAs) and three Cpf1 CRISPR RNAs (crRNAs) assessed by WGS, only one Cas9 sgRNA resulted in off-target mutations in T0 lines at sites predicted by computer programs. Moreover, we cannot find evidence for bona fide off-target mutations due to continued expression of Cas9 or Cpf1 with guide RNAs in T1 generation. Our comprehensive and rigorous analysis of WGS data across multiple sample types suggests both Cas9 and Cpf1 nucleases are very specific in generating targeted DNA modifications and off-targeting can be avoided by designing guide RNAs with high specificity.


Journal Article•DOI•

A novel long noncoding RNA HOXC-AS3 mediates tumorigenesis of gastric cancer by binding to YBX1


Erbao Zhang¹, Xuezhi He¹, Chongguo Zhang¹, Jun Su², Xiyi Lu¹, Xinxin Si³, Jinfei Chen¹, Dandan Yin², Liang Han⁴, Liang Han², Wei De¹•Institutions (4)

Nanjing Medical University¹, Southeast University², Huaihai Institute of Technology³, Xuzhou Medical College⁴

04 Oct 2018-Genome Biology

TL;DR: The data demonstrate that abnormal histone modification-activated HOXC-AS3 may play important roles in gastric cancer oncogenesis and may serve as a target for gastric cancer diagnosis and therapy.


Abstract: Recently, increasing evidence shows that long noncoding RNAs (lncRNAs) play a significant role in human tumorigenesis. However, the function of lncRNAs in human gastric cancer remains largely unknown. By using publicly available expression profiling data from gastric cancer and integrating bioinformatics analyses, we screen and identify a novel lncRNA, HOXC-AS3. HOXC-AS3 is significantly increased in gastric cancer tissues and is correlated with clinical outcomes of gastric cancer. In addition, HOXC-AS3 regulates cell proliferation and migration both in vitro and in vivo. RNA-seq analysis reveals that HOXC-AS3 knockdown preferentially affects genes that are linked to proliferation and migration. Mechanistically, we find that HOXC-AS3 is obviously activated by gain of H3K4me3 and H3K27ac, both in cells and in tissues. RNA pull-down mass spectrometry analysis identifies that YBX1 interacts with HOXC-AS3, and RNA-seq analysis finds a marked overlap in genes differentially expressed after YBX1 knockdown and those transcriptionally regulated by HOXC-AS3, suggesting that YBX1 participates in HOXC-AS3-mediated gene transcriptional regulation in the tumorigenesis of gastric cancer. Together, our data demonstrate that abnormal histone modification-activated HOXC-AS3 may play important roles in gastric cancer oncogenesis and may serve as a target for gastric cancer diagnosis and therapy.


Journal Article•DOI•

RNA m6A methylation participates in regulation of postnatal development of the mouse cerebellum


Chunhui Ma¹, Mengqi Chang¹, Hongyi Lv², Hongyi Lv³, Zhi-Wei Zhang¹, Weilong Zhang¹, Xue He¹, Gaolang Wu¹, Shunli Zhao¹, Yao Zhang¹, Di Wang¹, Xufei Teng², Xufei Teng³, C Liu¹, Qing Li¹, Arne Klungland⁴, Arne Klungland⁵, Yamei Niu¹, Shuhui Song³, Wei-Min Tong¹•Institutions (5)

Peking Union Medical College¹, Chinese Academy of Sciences², Beijing Institute of Genomics³, University of Oslo⁴, Oslo University Hospital⁵

31 May 2018-Genome Biology

TL;DR: Findings provide strong evidence that RNA m6A methylation is controlled in a precise spatiotemporal manner and participates in the regulation of postnatal development of the mouse cerebellum.


Abstract: N6-methyladenosine (m6A) is an important epitranscriptomic mark with high abundance in the brain. Recently, it has been found to be involved in the regulation of memory formation and mammalian cortical neurogenesis. However, while it is now established that m6A methylation occurs in a spatially restricted manner, its functions in specific brain regions still await elucidation. We identify widespread and dynamic RNA m6A methylation in the developing mouse cerebellum and further uncover distinct features of continuous and temporal-specific m6A methylation across the four postnatal developmental processes. Temporal-specific m6A peaks from P7 to P60 exhibit remarkable changes in their distribution patterns along the mRNA transcripts. We also show spatiotemporal-specific expression of m6A writers METTL3, METTL14, and WTAP and erasers ALKBH5 and FTO in the mouse cerebellum. Ectopic expression of METTL3 mediated by lentivirus infection leads to disorganized structure of both Purkinje and glial cells. In addition, under hypobaric hypoxia exposure, Alkbh5-deletion causes abnormal cell proliferation and differentiation in the cerebellum through disturbing the balance of RNA m6A methylation in different cell fate determination genes. Notably, nuclear export of the hypermethylated RNAs is enhanced in the cerebellum of Alkbh5-deficient mice exposed to hypobaric hypoxia. Together, our findings provide strong evidence that RNA m6A methylation is controlled in a precise spatiotemporal manner and participates in the regulation of postnatal development of the mouse cerebellum.


Journal Article•DOI•

Conservation of biodiversity in the genomics era


Megan A. Supple¹, Beth Shapiro¹•Institutions (1)

University of California, Santa Cruz¹

11 Sep 2018-Genome Biology

TL;DR: This review discusses how genome-scale data can inform species delineation in the face of admixture, facilitate evolution through the identification of adaptive alleles, and enhance evolutionary rescue based on genomic patterns of inbreeding.


Abstract: “Conservation genomics” encompasses the idea that genome-scale data will improve the capacity of resource managers to protect species. Although genetic approaches have long been used in conservation research, it has only recently become tractable to generate genome-wide data at a scale that is useful for conservation. In this Review, we discuss how genome-scale data can inform species delineation in the face of admixture, facilitate evolution through the identification of adaptive alleles, and enhance evolutionary rescue based on genomic patterns of inbreeding. As genomic approaches become more widely adopted in conservation, we expect that they will have a positive impact on management and policy decisions.


Journal Article•DOI•

Comparison of whole-genome bisulfite sequencing library preparation strategies identifies sources of biases affecting DNA methylation data


Nelly Olova¹, Nelly Olova², Felix Krueger¹, Simon Andrews¹, David Oxley¹, Rebecca V. Berrens¹, Miguel R. Branco³, Wolf Reik⁴, Wolf Reik¹, Wolf Reik⁵•Institutions (5)

Babraham Institute¹, University of Edinburgh², Queen Mary University of London³, University of Cambridge⁴, Wellcome Trust Sanger Institute⁵

15 Mar 2018-Genome Biology

TL;DR: It is shown that amplification-free library preparation is the least biased approach for WGBS, and in protocols with amplification, the choice of bisulfite conversion protocol or polymerase can significantly minimize artefacts.


Abstract: Whole-genome bisulfite sequencing (WGBS) is becoming an increasingly accessible technique, used widely for both fundamental and disease-oriented research. Library preparation methods benefit from a variety of available kits, polymerases and bisulfite conversion protocols. Although some steps in the procedure, such as PCR amplification, are known to introduce biases, a systematic evaluation of biases in WGBS strategies is missing. We perform a comparative analysis of several commonly used pre- and post-bisulfite WGBS library preparation protocols for their performance and quality of sequencing outputs. Our results show that bisulfite conversion per se is the main trigger of pronounced sequencing biases, and PCR amplification builds on these underlying artefacts. The majority of standard library preparation methods yield a significantly biased sequence output and overestimate global methylation. Importantly, both absolute and relative methylation levels at specific genomic regions vary substantially between methods, with clear implications for DNA methylation studies. We show that amplification-free library preparation is the least biased approach for WGBS. In protocols with amplification, the choice of bisulfite conversion protocol or polymerase can significantly minimize artefacts. To aid with the quality assessment of existing WGBS datasets, we have integrated a bias diagnostic tool in the Bismark package and offer several approaches for consideration during the preparation and analysis of WGBS datasets.


Journal Article•DOI•

Linking the International Wheat Genome Sequencing Consortium bread wheat reference genome sequence to wheat genetic and phenomic data.


Michael Alaux¹, Jane Rogers, Thomas Letellier¹, Raphael Flores¹, Françoise Alfama¹, Cyril Pommier¹, Nacer Mohellibi¹, Sophie Durand¹, Erik Kimmel¹, Célia Michotey¹, Claire Guerche¹, Mikaël Loaec¹, Mathilde Lainé¹, Delphine Steinbach¹, Frédéric Choulet², Hélène Rimbert², Philippe Leroy², Nicolas Guilhot², Jérôme Salse², Catherine Feuillet², Etienne Paux², Kellye Eversole, Anne-Françoise Adam-Blondon¹, Hadi Quesneville¹•Institutions (2)

Université Paris-Saclay¹, University of Auvergne²

17 Aug 2018-Genome Biology

TL;DR: The Wheat@URGI portal has been developed to provide the international community of researchers and breeders with access to the bread wheat reference genome sequence produced by the International Wheat Genome Sequencing Consortium.


Abstract: The Wheat@URGI portal has been developed to provide the international community of researchers and breeders with access to the bread wheat reference genome sequence produced by the International Wheat Genome Sequencing Consortium. Genome browsers, BLAST, and InterMine tools have been established for in-depth exploration of the genome sequence together with additional linked datasets including physical maps, sequence variations, gene expression, and genetic and phenomic data from other international collaborative projects already stored in the GnpIS information system. The portal provides enhanced search and browser features that will facilitate the deployment of the latest genomics resources in wheat improvement.


Journal Article•DOI•

Comparison of computational methods for the identification of topologically associating domains


Marie Zufferey¹, Marie Zufferey², Daniele Tavernari¹, Daniele Tavernari², Elisa Oricchio³, Giovanni Ciriello¹, Giovanni Ciriello²•Institutions (3)

University of Lausanne¹, Swiss Institute of Bioinformatics², École Polytechnique Fédérale de Lausanne³

10 Dec 2018-Genome Biology

TL;DR: This study provides a reference for the analysis of chromatin domains from Hi-C experiments and useful guidelines for choosing a suitable approach based on the experimental design, available data, and biological question of interest.


Abstract: Chromatin folding gives rise to structural elements among which are clusters of densely interacting DNA regions termed topologically associating domains (TADs). TADs have been characterized across multiple species, tissue types, and differentiation stages, sometimes in association with regulation of biological functions. The reliability and reproducibility of these findings are intrinsically related with the correct identification of these domains from high-throughput chromatin conformation capture (Hi-C) experiments. Here, we test and compare 22 computational methods to identify TADs across 20 different conditions. We find that TAD sizes and numbers vary significantly among callers and data resolutions, challenging the definition of an average TAD size, but strengthening the hypothesis that TADs are hierarchically organized domains, rather than disjoint structural elements. Performances of these methods differ based on data resolution and normalization strategy, but a core set of TAD callers consistently retrieve reproducible domains, even at low sequencing depths, that are enriched for TAD-associated biological features. This study provides a reference for the analysis of chromatin domains from Hi-C experiments and useful guidelines for choosing a suitable approach based on the experimental design, available data, and biological question of interest.


Journal Article•DOI•

Observation weights unlock bulk RNA-seq tools for zero inflation and single-cell applications.


Koen Van den Berge¹, Fanny Perraudeau², Charlotte Soneson³, Charlotte Soneson⁴, Michael I. Love⁵, Davide Risso⁶, Jean-Philippe Vert, Mark D. Robinson³, Mark D. Robinson⁴, Sandrine Dudoit², Lieven Clement¹•Institutions (6)

Ghent University¹, University of California, Berkeley², Swiss Institute of Bioinformatics³, University of Zurich⁴, University of North Carolina at Chapel Hill⁵, Cornell University⁶

26 Feb 2018-Genome Biology

TL;DR: A weighting strategy is introduced, based on a zero-inflated negative binomial model, that identifies excess zero counts and generates gene- and cell-specific weights to unlock bulk RNA-seq DE pipelines for zero-inflated data, boosting performance for scRNA-seq.


Abstract: Dropout events in single-cell RNA sequencing (scRNA-seq) cause many transcripts to go undetected and induce an excess of zero read counts, leading to power issues in differential expression (DE) analysis. This has triggered the development of bespoke scRNA-seq DE methods to cope with zero inflation. Recent evaluations, however, have shown that dedicated scRNA-seq tools provide no advantage compared to traditional bulk RNA-seq tools. We introduce a weighting strategy, based on a zero-inflated negative binomial model, that identifies excess zero counts and generates gene- and cell-specific weights to unlock bulk RNA-seq DE pipelines for zero-inflated data, boosting performance for scRNA-seq.


Journal Article•DOI•

A standardized framework for representation of ancestry data in genomics studies, with application to the NHGRI-EBI GWAS Catalog


Joannella Morales¹, Danielle Welter¹, Emily H. Bowler¹, Maria Cerezo¹, Laura W. Harris¹, Aoife McMahon¹, Peggy Hall², Heather Junkins², Annalisa Milano¹, Emma Hastings¹, Cinzia Malangone¹, Annalisa Buniello¹, Tony Burdett¹, Paul Flicek¹, Helen Parkinson¹, Fiona Cunningham¹, Lucia A. Hindorff², Jacqueline A. L. MacArthur¹•Institutions (2)

European Bioinformatics Institute¹, National Institutes of Health²

15 Feb 2018-Genome Biology

TL;DR: It is found that African and Hispanic or Latin American ancestry populations contribute a disproportionately high number of associations.


Abstract: The accurate description of ancestry is essential to interpret, access, and integrate human genomics data, and to ensure that these benefit individuals from all ancestral backgrounds. However, there are no established guidelines for the representation of ancestry information. Here we describe a framework for the accurate and standardized description of sample ancestry, and validate it by application to the NHGRI-EBI GWAS Catalog. We confirm known biases and gaps in diversity, and find that African and Hispanic or Latin American ancestry populations contribute a disproportionately high number of associations. It is our hope that widespread adoption of this framework will lead to improved analysis, interpretation, and integration of human genomics data.


Journal Article•DOI•

dropEst: pipeline for accurate estimation of molecular counts in droplet-based single-cell RNA-seq experiments


Viktor Petukhov¹, Viktor Petukhov², Jimin Guo², Ninib Baryawno², Nicolas Severe², David T. Scadden², Maria Samsonova¹, Peter V. Kharchenko²•Institutions (2)

Saint Petersburg State Polytechnic University¹, Harvard University²

19 Jun 2018-Genome Biology

TL;DR: A flexible pipeline for processing droplet-based transcriptome data that implements barcode corrections, classification of cell quality, and diagnostic information about the droplet libraries is described.


Abstract: Recent single-cell RNA-seq protocols based on droplet microfluidics use massively multiplexed barcoding to enable simultaneous measurements of transcriptomes for thousands of individual cells. The increasing complexity of such data creates challenges for subsequent computational processing and troubleshooting of these experiments, with few software options currently available. Here, we describe a flexible pipeline for processing droplet-based transcriptome data that implements barcode corrections, classification of cell quality, and diagnostic information about the droplet libraries. We introduce advanced methods for correcting composition bias and sequencing errors affecting cellular and molecular barcodes to provide more accurate estimates of molecular counts in individual cells.

