Author

Andreas J. Gruber

Bio: Andreas J. Gruber is an academic researcher from the University of Oxford. The author has contributed to research in the topics Polyadenylation and Untranslated region. The author has an h-index of 16 and has co-authored 31 publications receiving 1,138 citations. Previous affiliations of Andreas J. Gruber include the University of Basel and Dalhousie University.

Papers
Journal Article
TL;DR: A methodology that models gene expression or chromatin modifications in terms of genome-wide predictions of regulatory sites, fully automated as a web-based tool called ISMARA (Integrated System for Motif Activity Response Analysis), which consistently identifies known key regulators ab initio.
Abstract: Accurate reconstruction of the regulatory networks that control gene expression is one of the key current challenges in molecular biology. Although gene expression and chromatin state dynamics are ultimately encoded by constellations of binding sites recognized by regulators such as transcription factors (TFs) and microRNAs (miRNAs), our understanding of this regulatory code and its context-dependent read-out remains very limited. Given that there are thousands of potential regulators in mammals, it is not practical to use direct experimentation to identify which of these play a key role for a particular system of interest. We developed a methodology that models gene expression or chromatin modifications in terms of genome-wide predictions of regulatory sites, and completely automated it into a web-based tool called ISMARA (Integrated System for Motif Activity Response Analysis), located at http://ismara.unibas.ch. Given as input only gene expression or chromatin state data across a set of samples, ISMARA identifies the key TFs and miRNAs driving expression/chromatin changes and makes detailed predictions regarding their regulatory roles. These include predicted activities of the regulators across the samples, their genome-wide targets, enriched gene categories among the targets, and direct interactions between the regulators. Applying ISMARA to data sets from well-studied systems, we show that it consistently identifies known key regulators ab initio. We also present a number of novel predictions, including regulatory interactions in innate immunity, a master regulator of mucociliary differentiation, TFs consistently dysregulated in cancer, and TFs that mediate specific chromatin modifications.

279 citations
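The idea of modeling expression "in terms of genome-wide predictions of regulatory sites" can be illustrated with a linear motif-activity model of the kind used by MARA-style methods: the expression of promoter p in sample s is approximated as a promoter offset plus a sample offset plus the sum over motifs m of N_pm * A_ms, where N_pm is the number of predicted sites for motif m and A_ms is that motif's unknown activity in sample s. The Python sketch below is a minimal illustration under stated assumptions and is not ISMARA's implementation: the function name `infer_motif_activities`, the double-centering used to remove the two offsets, and the ridge penalty `lam` (standing in for a Gaussian prior on activities) are all hypothetical choices.

```python
import numpy as np


def infer_motif_activities(E, N, lam=1.0):
    """Hypothetical sketch: estimate motif activities from E ~ N @ A.

    E   : promoters x samples matrix (e.g. log expression).
    N   : promoters x motifs matrix of predicted binding-site counts.
    lam : ridge penalty (assumed; plays the role of a Gaussian prior).

    Returns a motifs x samples activity matrix, centered across samples,
    because per-motif constant activities are absorbed by the offsets.
    """
    # Centering each promoter across samples removes the promoter offset.
    E_c = E - E.mean(axis=1, keepdims=True)
    # Centering each sample across promoters removes the sample offset;
    # the site counts must then be centered across promoters as well.
    E_c = E_c - E_c.mean(axis=0, keepdims=True)
    N_c = N - N.mean(axis=0, keepdims=True)
    # Ridge solution A = (N^T N + lam * I)^-1 N^T E, solved without inversion.
    k = N_c.shape[1]
    return np.linalg.solve(N_c.T @ N_c + lam * np.eye(k), N_c.T @ E_c)
```

On synthetic data generated from the same linear model, the recovered activities should match the true ones after centering each motif's activities across samples; real inference additionally needs the site predictions, noise model, and significance estimates that the full method provides.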

Journal Article
TL;DR: The experimental and computational methods that have enabled the global mapping of mRNA and of long non-coding RNA 3ʹ ends, quantification of the resulting isoforms and the discovery of regulators of alternative cleavage and polyadenylation (APA) are reviewed.
Abstract: Most human genes have multiple sites at which RNA 3' end cleavage and polyadenylation can occur, enabling the expression of distinct transcript isoforms under different conditions. Novel methods to sequence RNA 3' ends have generated comprehensive catalogues of polyadenylation (poly(A)) sites; their analysis using innovative computational methods has revealed how poly(A) site choice is regulated by core RNA 3' end processing factors, such as cleavage factor I and cleavage and polyadenylation specificity factor, as well as by other RNA-binding proteins, particularly splicing factors. Here, we review the experimental and computational methods that have enabled the global mapping of mRNA and of long non-coding RNA 3' ends, quantification of the resulting isoforms and the discovery of regulators of alternative cleavage and polyadenylation (APA). We highlight the different types of APA-derived isoforms and their functional differences, and illustrate how APA contributes to human diseases, including cancer and haematological, immunological and neurological diseases.

250 citations

Journal Article
TL;DR: This study establishes an up-to-date, high-confidence catalog of 3' end processing sites and poly(A) signals, uncovers an important role of HNRNPC in regulating 3' end processing, and suggests that U-rich elements mediate interactions with multiple RBPs that regulate different stages in a transcript's life cycle.
Abstract: Alternative polyadenylation (APA) is a general mechanism of transcript diversification in mammals, which has been recently linked to proliferative states and cancer. Different 3' untranslated region (3' UTR) isoforms interact with different RNA-binding proteins (RBPs), which modify the stability, translation, and subcellular localization of the corresponding transcripts. Although the heterogeneity of pre-mRNA 3' end processing has been established with high-throughput approaches, the mechanisms that underlie systematic changes in 3' UTR lengths remain to be characterized. Through a uniform analysis of a large number of 3' end sequencing data sets, we have uncovered 18 signals, six of which are novel, whose positioning with respect to pre-mRNA cleavage sites indicates a role in pre-mRNA 3' end processing in both mouse and human. With 3' end sequencing we have demonstrated that heterogeneous nuclear ribonucleoprotein C (HNRNPC), which binds the poly(U) motif whose frequency also peaks in the vicinity of polyadenylation (poly(A)) sites, has a genome-wide effect on poly(A) site usage. HNRNPC-regulated 3' UTRs are enriched in ELAV-like RBP 1 (ELAVL1) binding sites and include those of the CD47 gene, which participate in the recently discovered mechanism of 3' UTR-dependent protein localization (UDPL). Our study thus establishes an up-to-date, high-confidence catalog of 3' end processing sites and poly(A) signals, and it uncovers an important role of HNRNPC in regulating 3' end processing. It further suggests that U-rich elements mediate interactions with multiple RBPs that regulate different stages in a transcript's life cycle.

177 citations

Journal Article
TL;DR: 3' UTR shortening in proliferating cells is conserved between human and mouse, but orthologous genes do not exhibit similar expression of alternative 3' UTR isoforms, which suggests that although 3' UTR shortening may lead to changes in the RNA-binding protein interactome, it has limited effects on protein output.
Abstract: Alternative polyadenylation is a cellular mechanism that generates mRNA isoforms differing in their 3' untranslated regions (3' UTRs). Changes in polyadenylation site usage have been described upon induction of proliferation in resting cells, but the underlying mechanism and functional significance of this phenomenon remain largely unknown. To understand the functional consequences of shortened 3' UTR isoforms in a physiological setting, we used 3' end sequencing and quantitative mass spectrometry to determine polyadenylation site usage, mRNA and protein levels in murine and human naive and activated T cells. Although 3' UTR shortening in proliferating cells is conserved between human and mouse, orthologous genes do not exhibit similar expression of alternative 3' UTR isoforms. We generally find that 3' UTR shortening is not accompanied by a corresponding change in mRNA and protein levels. This suggests that although 3' UTR shortening may lead to changes in the RNA-binding protein interactome, it has limited effects on protein output.

162 citations

Journal Article
TL;DR: It is found that many tools have good accuracy and yield better estimates of gene-level expression compared to commonly used count-based approaches, but they vary widely in memory and runtime requirements.
Abstract: Understanding the regulation of gene expression, including transcription start site usage, alternative splicing, and polyadenylation, requires accurate quantification of expression levels down to the level of individual transcript isoforms. To comparatively evaluate the accuracy of the many methods that have been proposed for estimating transcript isoform abundance from RNA sequencing data, we have used both synthetic data as well as an independent experimental method for quantifying the abundance of transcript ends at the genome-wide level. We found that many tools have good accuracy and yield better estimates of gene-level expression compared to commonly used count-based approaches, but they vary widely in memory and runtime requirements. Nucleotide composition and intron/exon structure have comparatively little influence on the accuracy of expression estimates, which correlates most strongly with transcript/gene expression levels. To facilitate the reproduction and further extension of our study, we provide datasets, source code, and an online analysis tool on a companion website, where developers can upload expression estimates obtained with their own tool to compare them to those inferred by the methods assessed here. As many methods for quantifying isoform abundance with comparable accuracy are available, a user’s choice will likely be determined by factors such as the memory and runtime requirements, as well as the availability of methods for downstream analyses. Sequencing-based methods to quantify the abundance of specific transcript regions could complement validation schemes based on synthetic data and quantitative PCR in future or ongoing assessments of RNA-seq analysis methods.

148 citations


Cited by
Journal Article
TL;DR: It is illustrated that differential isoform usage can inflate false discovery rates in differential expression analyses based on simple count matrices, and that offsets derived from transcript-level abundance estimates improve performance on simulated data, although the difference is relatively minor in several real data sets.
Abstract: High-throughput sequencing of cDNA (RNA-seq) is used extensively to characterize the transcriptome of cells. Many transcriptomic studies aim at comparing either abundance levels or the transcriptome composition between given conditions, and as a first step, the sequencing reads must be used as the basis for abundance quantification of transcriptomic features of interest, such as genes or transcripts. Various quantification approaches have been proposed, ranging from simple counting of reads that overlap given genomic regions to more complex estimation of underlying transcript abundances. In this paper, we show that gene-level abundance estimates and statistical inference offer advantages over transcript-level analyses, in terms of performance and interpretability. We also illustrate that the presence of differential isoform usage can lead to inflated false discovery rates in differential gene expression analyses on simple count matrices but that this can be addressed by incorporating offsets derived from transcript-level abundance estimates. We also show that the problem is relatively minor in several real data sets. Finally, we provide an R package (tximport) to help users integrate transcript-level abundance estimates from common quantification pipelines into count-based statistical inference engines.

2,420 citations
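The offsets mentioned in this abstract compensate for changes in a gene's average transcript length that arise when isoform usage differs between samples. The Python sketch below shows one way to compute them under stated assumptions and is not tximport's code: it computes each gene's abundance-weighted average effective transcript length per sample, analogous to the gene-level length matrix tximport returns, and converts it to a log-scale offset by dividing by the gene's geometric mean across samples, mirroring the edgeR recipe in the tximport vignette. The function names `gene_avg_length` and `length_offsets` are hypothetical, the zero-abundance fallback is an assumption, and the library-size component of the offset is omitted.

```python
import numpy as np


def gene_avg_length(tpm, eff_len, tx2gene):
    """Hypothetical sketch: abundance-weighted mean effective length per gene.

    tpm, eff_len : transcripts x samples arrays (abundance, effective length).
    tx2gene      : gene identifier for each transcript row.
    Returns (genes, lengths) where lengths has shape genes x samples.
    """
    genes = sorted(set(tx2gene))
    tx2gene = np.asarray(tx2gene)
    lengths = np.zeros((len(genes), tpm.shape[1]))
    for i, gene in enumerate(genes):
        rows = tx2gene == gene
        weights = tpm[rows]
        total = weights.sum(axis=0)
        weighted = (weights * eff_len[rows]).sum(axis=0)
        # Assumption: fall back to the unweighted mean when a gene has zero
        # abundance in a sample (tximport handles this case its own way).
        plain = eff_len[rows].mean(axis=0)
        safe_total = np.where(total > 0, total, 1.0)
        lengths[i] = np.where(total > 0, weighted / safe_total, plain)
    return genes, lengths


def length_offsets(lengths):
    """Log length relative to each gene's geometric mean across samples."""
    log_len = np.log(lengths)
    return log_len - log_len.mean(axis=1, keepdims=True)
```

For example, a gene whose short isoform dominates in one sample and whose long isoform dominates in another receives different average lengths in the two samples, so its offsets differ between them, whereas a single-isoform gene gets an offset of zero in every sample.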

01 Jan 2011
TL;DR: The sheer volume and scope of this flood of data pose a significant challenge to the development of efficient and intuitive visualization tools that can scale to very large data sets and flexibly integrate multiple data types, including clinical data.
Abstract: Rapid improvements in sequencing and array-based platforms are resulting in a flood of diverse genome-wide data, including data from exome and whole-genome sequencing, epigenetic surveys, expression profiling of coding and noncoding RNAs, single nucleotide polymorphism (SNP) and copy number profiling, and functional assays. Analysis of these large, diverse data sets holds the promise of a more comprehensive understanding of the genome and its relation to human disease. Experienced and knowledgeable human review is an essential component of this process, complementing computational approaches. This calls for efficient and intuitive visualization tools able to scale to very large data sets and to flexibly integrate multiple data types, including clinical data. However, the sheer volume and scope of data pose a significant challenge to the development of such tools.

2,187 citations

Journal Article
TL;DR: The R/Bioconductor package scater is developed to facilitate rigorous pre‐processing, quality control, normalization and visualization of scRNA‐seq data and provides a convenient, flexible workflow to process raw sequencing reads into a high‐quality expression dataset ready for downstream analysis.
Abstract: Single-cell RNA sequencing (scRNA-seq) is increasingly used to study gene expression at the level of individual cells. However, preparing raw sequence data for further analysis is not a straightforward process. Biases, artifacts and other sources of unwanted variation are present in the data, requiring substantial time and effort to be spent on pre-processing, quality control (QC) and normalization. We have developed the R/Bioconductor package scater to facilitate rigorous pre-processing, quality control, normalization and visualization of scRNA-seq data. The package provides a convenient, flexible workflow to process raw sequencing reads into a high-quality expression dataset ready for downstream analysis. scater provides a rich suite of plotting tools for single-cell data and a flexible data structure that is compatible with existing tools and can be used as infrastructure for future software development. The open-source code, along with installation instructions, vignettes and case studies, is available through Bioconductor at http://bioconductor.org/packages/scater. Contact: davis@ebi.ac.uk. Supplementary data are available at Bioinformatics online.

1,093 citations

Journal Article
TL;DR: The roles of APA in diverse cellular processes, including mRNA metabolism, protein diversification and protein localization, and more generally in gene regulation, are discussed, together with the molecular mechanisms underlying APA.
Abstract: Alternative polyadenylation (APA) is an RNA-processing mechanism that generates distinct 3' termini on mRNAs and other RNA polymerase II transcripts. It is widespread across all eukaryotic species and is recognized as a major mechanism of gene regulation. APA exhibits tissue specificity and is important for cell proliferation and differentiation. In this Review, we discuss the roles of APA in diverse cellular processes, including mRNA metabolism, protein diversification and protein localization, and more generally in gene regulation. We also discuss the molecular mechanisms underlying APA, such as variation in the concentration of core processing factors and RNA-binding proteins, as well as transcription-based regulation.

758 citations

01 Jan 2009
TL;DR: This review outlines the current understanding of miRNA target recognition in animals and discusses the widespread impact of miRNAs on both the expression and evolution of protein-coding genes.
Abstract: MicroRNAs (miRNAs) are endogenous ∼23 nt RNAs that play important gene-regulatory roles in animals and plants by pairing to the mRNAs of protein-coding genes to direct their posttranscriptional repression. This review outlines the current understanding of miRNA target recognition in animals and discusses the widespread impact of miRNAs on both the expression and evolution of protein-coding genes.

646 citations