Showing papers on "Gene expression profiling" published in 2013


Journal ArticleDOI
TL;DR: This work introduces Gene Set Variation Analysis (GSVA), a GSE method that estimates variation of pathway activity over a sample population in an unsupervised manner and constitutes a starting point to build pathway-centric models of biology.
Abstract: Gene set enrichment (GSE) analysis is a popular framework for condensing information from gene expression profiles into a pathway or signature summary. The strengths of this approach over single gene analysis include noise and dimension reduction, as well as greater biological interpretability. As molecular profiling experiments move beyond simple case-control studies, robust and flexible GSE methodologies are needed that can model pathway activity within highly heterogeneous data sets. To address this challenge, we introduce Gene Set Variation Analysis (GSVA), a GSE method that estimates variation of pathway activity over a sample population in an unsupervised manner. We demonstrate the robustness of GSVA in a comparison with current state of the art sample-wise enrichment methods. Further, we provide examples of its utility in differential pathway activity and survival analysis. Lastly, we show how GSVA works analogously with data from both microarray and RNA-seq experiments. GSVA provides increased power to detect subtle pathway activity changes over a sample population in comparison to corresponding methods. While GSE methods are generally regarded as end points of a bioinformatic analysis, GSVA constitutes a starting point to build pathway-centric models of biology. Moreover, GSVA contributes to the current need of GSE methods for RNA-seq data. GSVA is an open source software package for R which forms part of the Bioconductor project and can be downloaded at http://www.bioconductor.org.

6,125 citations
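
The idea of a sample-wise gene set score, as discussed in the GSVA abstract above, can be illustrated with a much simpler stand-in. The sketch below is not GSVA and does not use its R/Bioconductor API; it assumes a plain "combined z-score" summary (standardise each gene across samples, then sum the member genes and divide by the square root of the set size). The function names and the toy expression values are hypothetical.

```python
from math import sqrt
from statistics import mean, pstdev


def zscore_rows(expr):
    """Standardise each gene (one row per gene) across samples."""
    out = {}
    for gene, values in expr.items():
        mu, sd = mean(values), pstdev(values)
        # A gene with no variation carries no information: score it as 0.
        out[gene] = [(v - mu) / sd if sd > 0 else 0.0 for v in values]
    return out


def gene_set_scores(expr, gene_set):
    """Return one combined z-score per sample for the given gene set."""
    z = zscore_rows(expr)
    members = [g for g in gene_set if g in z]
    if not members:
        raise ValueError("no gene-set members found in expression data")
    n_samples = len(z[members[0]])
    k = len(members)
    return [sum(z[g][j] for g in members) / sqrt(k) for j in range(n_samples)]


# Toy data (hypothetical): three genes measured in four samples.
expr = {
    "A": [1.0, 2.0, 3.0, 4.0],
    "B": [2.0, 4.0, 6.0, 8.0],
    "C": [5.0, 5.0, 5.0, 5.0],
}
scores = gene_set_scores(expr, ["A", "B"])
print(scores)
```

The output is a single score per sample, which is the shape of result that downstream steps such as differential pathway activity or survival analysis would consume; the GSVA statistic itself is rank-based and more involved than this average.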


Journal ArticleDOI
TL;DR: A method that uses gene expression signatures to infer the fraction of stromal and immune cells in tumour samples and prediction accuracy is corroborated using 3,809 transcriptional profiles available elsewhere in the public domain.
Abstract: Infiltrating stromal and immune cells form the major fraction of normal cells in tumour tissue and not only perturb the tumour signal in molecular studies but also have an important role in cancer biology. Here we describe 'Estimation of STromal and Immune cells in MAlignant Tumours using Expression data' (ESTIMATE)--a method that uses gene expression signatures to infer the fraction of stromal and immune cells in tumour samples. ESTIMATE scores correlate with DNA copy number-based tumour purity across samples from 11 different tumour types, profiled on Agilent, Affymetrix platforms or based on RNA sequencing and available through The Cancer Genome Atlas. The prediction accuracy is further corroborated using 3,809 transcriptional profiles available elsewhere in the public domain. The ESTIMATE method allows consideration of tumour-associated normal cells in genomic and transcriptomic studies. An R-library is available on https://sourceforge.net/projects/estimateproject/.

4,651 citations


Journal ArticleDOI
10 Oct 2013-Cell
TL;DR: Correlative analyses confirm that the survival advantage of the proneural subtype is conferred by the G-CIMP phenotype, and MGMT DNA methylation may be a predictive biomarker for treatment response only in classical subtype GBM.

3,593 citations


Journal ArticleDOI
TL;DR: Variants associated with cholesterol metabolism and type 1 diabetes showed similar phenomena, indicating that large-scale eQTL mapping provides insight into the downstream effects of many trait-associated variants.
Abstract: Identifying the downstream effects of disease-associated SNPs is challenging. To help overcome this problem, we performed expression quantitative trait locus (eQTL) meta-analysis in non-transformed peripheral blood samples from 5,311 individuals with replication in 2,775 individuals. We identified and replicated trans eQTLs for 233 SNPs (reflecting 103 independent loci) that were previously associated with complex traits at genome-wide significance. Some of these SNPs affect multiple genes in trans that are known to be altered in individuals with disease: rs4917014, previously associated with systemic lupus erythematosus (SLE), altered gene expression of C1QB and five type I interferon response genes, both hallmarks of SLE. DeepSAGE RNA sequencing showed that rs4917014 strongly alters the 3' UTR levels of IKZF1 in cis, and chromatin immunoprecipitation and sequencing analysis of the trans-regulated genes implicated IKZF1 as the causal gene. Variants associated with cholesterol metabolism and type 1 diabetes showed similar phenomena, indicating that large-scale eQTL mapping provides insight into the downstream effects of many trait-associated variants.

1,627 citations


Journal ArticleDOI
TL;DR: It is found that EPI cells and primary hESC outgrowth have dramatically different transcriptomes, with 1,498 genes showing differential expression between them, and this work provides a comprehensive framework of the transcriptome landscapes of human early embryos and hESCs.
Abstract: Measuring gene expression in individual cells is crucial for understanding the gene regulatory network controlling human embryonic development. Here we apply single-cell RNA sequencing (RNA-Seq) analysis to 124 individual cells from human preimplantation embryos and human embryonic stem cells (hESCs) at different passages. The number of maternally expressed genes detected in our data set is 22,687, including 8,701 long noncoding RNAs (lncRNAs), which represents a significant increase from 9,735 maternal genes detected previously by cDNA microarray. We discovered 2,733 novel lncRNAs, many of which are expressed in specific developmental stages. To address the long-standing question whether gene expression signatures of human epiblast (EPI) and in vitro hESCs are the same, we found that EPI cells and primary hESC outgrowth have dramatically different transcriptomes, with 1,498 genes showing differential expression between them. This work provides a comprehensive framework of the transcriptome landscapes of human early embryos and hESCs.

1,362 citations


Journal ArticleDOI
19 May 2013-Nature
TL;DR: The authors used single-cell RNA-Seq to investigate heterogeneity in the response of bone marrow derived dendritic cells (BMDCs) to lipopolysaccharide (LPS) and found extensive, and previously unobserved, bimodal variation in mRNA abundance and splicing patterns.
Abstract: Recent molecular studies have revealed that, even when derived from a seemingly homogenous population, individual cells can exhibit substantial differences in gene expression, protein levels, and phenotypic output 1–5 , with important functional consequences 4,5 . Existing studies of cellular heterogeneity, however, have typically measured only a few pre-selected RNAs 1,2 or proteins 5,6 simultaneously because genomic profiling methods 3 could not be applied to single cells until very recently 7–10 . Here, we use single-cell RNA-Seq to investigate heterogeneity in the response of bone marrow derived dendritic cells (BMDCs) to lipopolysaccharide (LPS). We find extensive, and previously unobserved, bimodal variation in mRNA abundance and splicing patterns, which we validate by RNA-fluorescence in situ hybridization (RNA-FISH) for select transcripts. In particular, hundreds of key immune genes are bimodally expressed across cells, surprisingly even for genes that are very highly expressed at the population average. Moreover, splicing patterns demonstrate previously unobserved levels of heterogeneity between cells. Some of the observed bimodality can be attributed to closely related, yet distinct, known maturity states of BMDCs; other portions reflect differences in the usage of key regulatory circuits. For example, we identify a module of 137 highly variable, yet co-regulated, antiviral response genes. Using cells from knockout mice, we show that variability in this module may be propagated through an interferon feedback circuit involving the transcriptional regulators Stat2 and Irf7. Our study demonstrates the power and promise of single-cell genomics in uncovering functional diversity between cells and in deciphering cell states and circuits.

1,215 citations


Journal ArticleDOI
TL;DR: A new classification of CC into six molecular subtypes that arise through distinct biological pathways that improves the current disease stratification based on clinicopathological variables and common DNA markers is described.
Abstract: Background: Colon cancer (CC) pathological staging fails to accurately predict recurrence, and to date, no gene expression signature has proven reliable for prognosis stratification in clinical practice, perhaps because CC is a heterogeneous disease. The aim of this study was to establish a comprehensive molecular classification of CC based on mRNA expression profile analyses.

1,065 citations


Journal ArticleDOI
TL;DR: This protocol presents a state-of-the-art computational and statistical RNA-seq differential expression analysis workflow largely based on the free open-source R language and Bioconductor software and, in particular, on two widely used tools, DESeq and edgeR.
Abstract: RNA sequencing (RNA-seq) has been rapidly adopted for the profiling of transcriptomes in many areas of biology, including studies into gene regulation, development and disease. Of particular interest is the discovery of differentially expressed genes across different conditions (e.g., tissues, perturbations) while optionally adjusting for other systematic factors that affect the data-collection process. There are a number of subtle yet crucial aspects of these analyses, such as read counting, appropriate treatment of biological variability, quality control checks and appropriate setup of statistical modeling. Several variations have been presented in the literature, and there is a need for guidance on current best practices. This protocol presents a state-of-the-art computational and statistical RNA-seq differential expression analysis workflow largely based on the free open-source R language and Bioconductor software and, in particular, on two widely used tools, DESeq and edgeR. Hands-on time for typical small experiments (e.g., 4-10 samples) can be <1 h, with computation time <1 d using a standard desktop PC.

1,029 citations


Journal ArticleDOI
TL;DR: This integrated molecular analysis of clear-cell renal cell carcinoma unmasked new correlations between DNA methylation, gene mutation and/or gene expression and copy number profiles, enabling the stratification of clinical risks for patients with ccRCC.
Abstract: Clear-cell renal cell carcinoma (ccRCC) is the most prevalent kidney cancer and its molecular pathogenesis is incompletely understood. Here we report an integrated molecular study of ccRCC in which ≥100 ccRCC cases were fully analyzed by whole-genome and/or whole-exome and RNA sequencing as well as by array-based gene expression, copy number and/or methylation analyses. We identified a full spectrum of genetic lesions and analyzed gene expression and DNA methylation signatures and determined their impact on tumor behavior. Defective VHL-mediated proteolysis was a common feature of ccRCC, which was caused not only by VHL inactivation but also by new hotspot TCEB1 mutations, which abolished Elongin C-VHL binding, leading to HIF accumulation. Other newly identified pathways and components recurrently mutated in ccRCC included PI3K-AKT-mTOR signaling, the KEAP1-NRF2-CUL3 apparatus, DNA methylation, p53-related pathways and mRNA processing. This integrated molecular analysis unmasked new correlations between DNA methylation, gene mutation and/or gene expression and copy number profiles, enabling the stratification of clinical risks for patients with ccRCC.

938 citations


Journal ArticleDOI
TL;DR: Three subtypes have markedly better disease-free survival (DFS) after surgical resection, suggesting these patients might be spared from the adverse effects of chemotherapy when they have localized disease.
Abstract: Colorectal cancer (CRC) is a major cause of cancer mortality. Whereas some patients respond well to therapy, others do not, and thus more precise, individualized treatment strategies are needed. To that end, we analyzed gene expression profiles from 1,290 CRC tumors using consensus-based unsupervised clustering. The resultant clusters were then associated with therapeutic response data to the epidermal growth factor receptor–targeted drug cetuximab in 80 patients. The results of these studies define six clinically relevant CRC subtypes. Each subtype shares similarities to distinct cell types within the normal colon crypt and shows differing degrees of ‘stemness’ and Wnt signaling. Subtype-specific gene signatures are proposed to identify these subtypes. Three subtypes have markedly better disease-free survival (DFS) after surgical resection, suggesting these patients might be spared from the adverse effects of chemotherapy when they have localized disease. One of these three subtypes, identified by filamin A expression, does not respond to cetuximab but may respond to cMET receptor tyrosine kinase inhibitors in the metastatic setting. Two other subtypes, with poor and intermediate DFS, associate with improved response to the chemotherapy regimen FOLFIRI 1 in adjuvant or metastatic settings. Development of clinically deployable assays for these subtypes and of subtype-specific therapies may contribute to more effective management of this challenging disease. Previous studies have identified molecular subtypes of various human cancers by gene expression profiling 2–8 , including CRC subtypes 9,10 . However, these subtypes have not been associated with outcomes in patients treated with specific therapeutic interventions. Therefore, we sought to refine the approach of molecular classification of CRC by associating gene expression profiles of CRC tumors with corresponding clinical response to cetuximab.
We first used consensus-based non-negative matrix factorization (NMF) 11 to cluster two published gene expression data sets (GSE13294 (ref. 12) and GSE14333 (ref. 13)) derived from resected primary CRCs (core data sets, n = 445). These data were corrected for batch effects and merged using the distance-weighted discrimination method 5,14 before clustering. This analysis defined five distinct high-consensus molecular subtypes of CRC (Supplementary Fig. 1a–e and Supplementary Results and

847 citations


Journal ArticleDOI
29 Aug 2013-Nature
TL;DR: A comprehensive analysis of transcriptome dynamics from oocyte to morula in both human and mouse embryos, using single-cell RNA sequencing finds that each developmental stage can be delineated concisely by a small number of functional modules of co-expressed genes.
Abstract: Mammalian pre-implantation development is a complex process involving dramatic changes in the transcriptional architecture. We report here a comprehensive analysis of transcriptome dynamics from oocyte to morula in both human and mouse embryos, using single-cell RNA sequencing. Based on single-nucleotide variants in human blastomere messenger RNAs and paternal-specific single-nucleotide polymorphisms, we identify novel stage-specific monoallelic expression patterns for a significant portion of polymorphic gene transcripts (25 to 53%). By weighted gene co-expression network analysis, we find that each developmental stage can be delineated concisely by a small number of functional modules of co-expressed genes. This result indicates a sequential order of transcriptional changes in pathways of cell cycle, gene regulation, translation and metabolism, acting in a step-wise fashion from cleavage to morula. Cross-species comparisons with mouse pre-implantation embryos reveal that the majority of human stage-specific modules (7 out of 9) are notably preserved, but developmental specificity and timing differ between human and mouse. Furthermore, we identify conserved key members (or hub genes) of the human and mouse networks. These genes represent novel candidates that are likely to be key in driving mammalian pre-implantation development. Together, the results provide a valuable resource to dissect gene regulatory mechanisms underlying progressive development of early mammalian embryos.

Journal ArticleDOI
TL;DR: Focused analysis on genes involved in vitamin C metabolism showed that GalUR, encoding the rate-limiting enzyme of the galacturonate pathway, is significantly upregulated in orange fruit, and the recent expansion of this gene family may provide a genomic basis.
Abstract: Oranges are an important nutritional source for human health and have immense economic value. Here we present a comprehensive analysis of the draft genome of sweet orange (Citrus sinensis). The assembled sequence covers 87.3% of the estimated orange genome, which is relatively compact, as 20% is composed of repetitive elements. We predicted 29,445 protein-coding genes, half of which are in the heterozygous state. With additional sequencing of two more citrus species and comparative analyses of seven citrus genomes, we present evidence to suggest that sweet orange originated from a backcross hybrid between pummelo and mandarin. Focused analysis on genes involved in vitamin C metabolism showed that GalUR, encoding the rate-limiting enzyme of the galacturonate pathway, is significantly upregulated in orange fruit, and the recent expansion of this gene family may provide a genomic basis. This draft genome represents a valuable resource for understanding and improving many important citrus traits in the future.

Journal ArticleDOI
TL;DR: It is demonstrated that CRISPR-on can efficiently activate exogenous reporter genes in both human and mouse cells in a tunable manner and robust reporter gene activation in vivo can be achieved by injecting the system components into mouse zygotes.
Abstract: Technologies allowing for specific regulation of endogenous genes are valuable for the study of gene functions and have great potential in therapeutics. We created the CRISPR-on system, a two-component transcriptional activator consisting of a nuclease-dead Cas9 (dCas9) protein fused with a transcriptional activation domain and single guide RNAs (sgRNAs) with complementary sequence to gene promoters. We demonstrate that CRISPR-on can efficiently activate exogenous reporter genes in both human and mouse cells in a tunable manner. In addition, we show that robust reporter gene activation in vivo can be achieved by injecting the system components into mouse zygotes. Furthermore, we show that CRISPR-on can activate the endogenous IL1RN, SOX2, and OCT4 genes. The most efficient gene activation was achieved by clusters of 3-4 sgRNAs binding to the proximal promoters, suggesting their synergistic action in gene induction. Significantly, when sgRNAs targeting multiple genes were simultaneously introduced into cells, robust multiplexed endogenous gene activation was achieved. Genome-wide expression profiling demonstrated high specificity of the system.

Journal ArticleDOI
TL;DR: The coexpression of CD49b and LAG-3 enables the isolation of highly suppressive human Tr1 cells from in vitro anergized cultures and allows the tracking of Tr1 cells in the peripheral blood of subjects who developed tolerance after allogeneic hematopoietic stem cell transplantation.
Abstract: CD4(+) type 1 T regulatory (Tr1) cells are induced in the periphery and have a pivotal role in promoting and maintaining tolerance. The absence of surface markers that uniquely identify Tr1 cells has limited their study and clinical applications. By gene expression profiling of human Tr1 cell clones, we identified the surface markers CD49b and lymphocyte activation gene 3 (LAG-3) as being stably and selectively coexpressed on mouse and human Tr1 cells. We showed the specificity of these markers in mouse models of intestinal inflammation and helminth infection and in the peripheral blood of healthy volunteers. The coexpression of CD49b and LAG-3 enables the isolation of highly suppressive human Tr1 cells from in vitro anergized cultures and allows the tracking of Tr1 cells in the peripheral blood of subjects who developed tolerance after allogeneic hematopoietic stem cell transplantation. The use of these markers makes it feasible to track Tr1 cells in vivo and purify Tr1 cells for cell therapy to induce or restore tolerance in subjects with immune-mediated diseases.

Journal ArticleDOI
TL;DR: In situ sequencing of point mutations and multiplexed gene expression profiling in human breast cancer tissue sections is demonstrated and the method for parallel targeted analysis of short RNA fragments in morphologically preserved cells and tissue is developed.
Abstract: Tissue gene expression profiling is performed on homogenates or on populations of isolated single cells to resolve molecular states of different cell types. In both approaches, histological context is lost. We have developed an in situ sequencing method for parallel targeted analysis of short RNA fragments in morphologically preserved cells and tissue. We demonstrate in situ sequencing of point mutations and multiplexed gene expression profiling in human breast cancer tissue sections.

Journal ArticleDOI
TL;DR: The R package Piano is developed that collects a range of GSA methods into the same system, for the benefit of the end-user, and suggests to use a consensus scoring approach, based on multiple GSA runs, in combination with the directionality classes.
Abstract: Gene set analysis (GSA) is used to elucidate genome-wide data, in particular transcriptome data. A multitude of methods have been proposed for this step of the analysis, and many of them have been compared and evaluated. Unfortunately, there is no consolidated opinion regarding what methods should be preferred, and the variety of available GSA software and implementations pose a difficulty for the end-user who wants to try out different methods. To address this, we have developed the R package Piano that collects a range of GSA methods into the same system, for the benefit of the end-user. Further on we refine the GSA workflow by using modifications of the gene-level statistics. This enables us to divide the resulting gene set P-values into three classes, describing different aspects of gene expression directionality at gene set level. We use our fully implemented workflow to investigate the impact of the individual components of GSA by using microarray and RNA-seq data. The results show that the evaluated methods are globally similar and the major separation correlates well with our defined directionality classes. As a consequence of this, we suggest to use a consensus scoring approach, based on multiple GSA runs. In combination with the directionality classes, this constitutes a more thorough basis for an enriched biological interpretation.
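
The consensus scoring idea in the abstract above (combining several GSA runs into one ranking) can be sketched roughly as follows. This is not the Piano package's API or its exact procedure; it assumes a simple median-of-ranks aggregation, where each run ranks gene sets by p-value and the consensus score is the median rank across runs. The function names, gene set names, and p-values are hypothetical.

```python
from statistics import median


def rank_by_pvalue(pvalues):
    """Map gene set -> rank (1 = smallest p-value); ties share the average rank."""
    ordered = sorted(pvalues, key=pvalues.get)
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        # Extend j over the run of gene sets tied with position i.
        while j + 1 < len(ordered) and pvalues[ordered[j + 1]] == pvalues[ordered[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for name in ordered[i:j + 1]:
            ranks[name] = avg
        i = j + 1
    return ranks


def consensus_ranking(runs):
    """Order gene sets by their median rank across all GSA runs."""
    per_run = [rank_by_pvalue(r) for r in runs]
    # Only gene sets scored by every run can be compared fairly.
    shared = set.intersection(*(set(r) for r in per_run))
    scores = {name: median(r[name] for r in per_run) for name in shared}
    return sorted(scores.items(), key=lambda kv: (kv[1], kv[0]))


# Hypothetical p-values from three different GSA methods.
runs = [
    {"cell_cycle": 0.001, "apoptosis": 0.04, "translation": 0.30},
    {"cell_cycle": 0.010, "apoptosis": 0.02, "translation": 0.50},
    {"cell_cycle": 0.200, "apoptosis": 0.03, "translation": 0.60},
]
result = consensus_ranking(runs)
print(result)
```

Using the median rather than the mean keeps a single outlying run (here the third one, which ranks cell_cycle second) from dominating the consensus; in the workflow the abstract describes, such a ranking would be computed separately for each directionality class.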

Journal ArticleDOI
TL;DR: The Rice Annotation Project Database (RAP-DB, http://rapdb.dna.affrc.go.jp/) has been providing a comprehensive set of gene annotations for the genome sequence of rice, Oryza sativa (japonica group) cv. Nipponbare.
Abstract: The Rice Annotation Project Database (RAP-DB, http://rapdb.dna.affrc.go.jp/) has been providing a comprehensive set of gene annotations for the genome sequence of rice, Oryza sativa (japonica group) cv. Nipponbare. Since the first release in 2005, RAP-DB has been updated several times along with the genome assembly updates. Here, we present our newest RAP-DB based on the latest genome assembly, Os-Nipponbare-Reference-IRGSP-1.0 (IRGSP-1.0), which was released in 2011. We detected 37,869 loci by mapping transcript and protein sequences of 150 monocot species. To provide plant researchers with highly reliable and up to date rice gene annotations, we have been incorporating literature-based manually curated data, and 1,626 loci currently incorporate literature-based annotation data, including commonly used gene names or gene symbols. Transcriptional activities are shown at the nucleotide level by mapping RNA-Seq reads derived from 27 samples. We also mapped the Illumina reads of a Japanese leading japonica cultivar, Koshihikari, and a Chinese indica cultivar, Guangluai-4, to the genome and show alignments together with the single nucleotide polymorphisms (SNPs) and gene functional annotations through a newly developed browser, Short-Read Assembly Browser (S-RAB). We have developed two satellite databases, Plant Gene Family Database (PGFD) and Integrative Database of Cereal Gene Phylogeny (IDCGP), which display gene family and homologous gene relationships among diverse plant species. RAP-DB and the satellite databases offer simple and user-friendly web interfaces, enabling plant and genome researchers to access the data easily and facilitating a broad range of plant research topics.

Journal ArticleDOI
TL;DR: The results suggest that future clinical trials focused on TN disease should consider stratifying patients based upon BL versus non-BL gene expression profiles, which appears to be the main biological difference seen in patients with TN breast cancer.
Abstract: Triple-negative (TN) and basal-like (BL) breast cancer definitions have been used interchangeably to identify breast cancers that lack expression of the hormone receptors and overexpression and/or amplification of HER2. However, both classifications show substantial discordance rates when compared to each other. Here, we molecularly characterize TN tumors and BL tumors, comparing and contrasting the results in terms of common patterns and distinct patterns for each. In total, when testing 412 TN and 473 BL tumors, 21.4% and 31.5% were identified as non-BL and non-TN, respectively. TN tumors identified as luminal or HER2-enriched (HER2E) showed undistinguishable overall gene expression profiles when compared versus luminal or HER2E tumors that were not TN. Similar findings were observed within BL tumors regardless of their TN status, which suggests that molecular subtype is preserved regardless of individual marker results. Interestingly, most TN tumors identified as HER2E showed low HER2 expression and lacked HER2 amplification, despite the similar overall gene expression profiles to HER2E tumors that were clinically HER2-positive. Lastly, additional genomic classifications were examined within TN and BL cancers, most of which were highly concordant with tumor intrinsic subtype. These results suggest that future clinical trials focused on TN disease should consider stratifying patients based upon BL versus non-BL gene expression profiles, which appears to be the main biological difference seen in patients with TN breast cancer.

Journal ArticleDOI
TL;DR: This work identifies Sox4 as a master regulator of EMT by governing the expression of the epigenetic modifier Ezh2, encoding the Polycomb group histone methyltransferase that trimethylates histone 3 lysine 27 for gene repression.

Journal ArticleDOI
TL;DR: BETA is a software package that integrates ChIP-seq of TFs or chromatin regulators with differential gene expression data to infer direct target genes and identifies the motif of the factor and its collaborators, which might modulate the factor's activating or repressive function.
Abstract: The combination of ChIP-seq and transcriptome analysis is a compelling approach to unravel the regulation of gene expression. Several recently published methods combine transcription factor (TF) binding and gene expression for target prediction, but few of them provide an efficient software package for the community. Binding and expression target analysis (BETA) is a software package that integrates ChIP-seq of TFs or chromatin regulators with differential gene expression data to infer direct target genes. BETA has three functions: (i) to predict whether the factor has activating or repressive function; (ii) to infer the factor's target genes; and (iii) to identify the motif of the factor and its collaborators, which might modulate the factor's activating or repressive function. Here we describe the implementation and features of BETA to demonstrate its application to several data sets. BETA requires ~1 GB of RAM, and the procedure takes 20 min to complete. BETA is available open source at http://cistrome.org/BETA/.

Journal ArticleDOI
18 Sep 2013-Neuron
TL;DR: It is shown that neuronal Tet1 regulates normal DNA methylation levels, expression of activity-regulated genes, synaptic plasticity, and memory extinction.

Journal ArticleDOI
TL;DR: An integrative genomic analysis of ICC samples from a large series of patients identified a gene expression signature that was associated with reduced survival times of patients with ICC and was enriched in the proliferation class.

Journal ArticleDOI
TL;DR: This work profiled gene activity genome-wide in every organ, tissue, and cell type of Arabidopsis seeds from fertilization through maturity, offering the most comprehensive description of gene activity in seeds with high spatial and temporal resolution.
Abstract: Seeds are complex structures that consist of the embryo, endosperm, and seed-coat regions that are of different ontogenetic origins, and each region can be further divided into morphologically distinct subregions. Despite the importance of seeds for food, fiber, and fuel globally, little is known of the cellular processes that characterize each subregion or how these processes are integrated to permit the coordinated development of the seed. We profiled gene activity genome-wide in every organ, tissue, and cell type of Arabidopsis seeds from fertilization through maturity. The resulting mRNA datasets offer the most comprehensive description of gene activity in seeds with high spatial and temporal resolution, providing unique insights into the function of understudied seed regions. Global comparisons of mRNA populations reveal unexpected overlaps in the functional identities of seed subregions. Analyses of coexpressed gene sets suggest that processes that regulate seed size and filling are coordinated across several subregions. Predictions of gene regulatory networks based on the association of transcription factors with enriched DNA sequence motifs upstream of coexpressed genes identify regulators of seed development. These studies emphasize the utility of these datasets as an essential resource for the study of seed biology.

Journal ArticleDOI
15 Aug 2013-Blood
TL;DR: A whole-genome-sequencing-based perspective of DLBCL mutational complexity is provided by characterizing 40 de novo DLBCL cases and 13 cell lines and combining these data with DNA copy number analysis and RNA-seq from an extended cohort of 96 cases, which uncovered new gene targets of recurrent somatic point mutations and genes that are targeted by focal somatic deletions in this disease.

Journal ArticleDOI
TL;DR: This work developed a sequencing method called 3'-seq to quantitatively map the 3' ends of the transcriptome of diverse human tissues and isogenic transformation systems and found that cell type-specific gene expression is accomplished by two complementary programs.
Abstract: More than half of human genes use alternative cleavage and polyadenylation (ApA) to generate mRNA transcripts that differ in the lengths of their 3' untranslated regions (UTRs), thus altering the post-transcriptional fate of the message and likely the protein output. The extent of 3' UTR variation across tissues and the functional role of ApA remain poorly understood. We developed a sequencing method called 3'-seq to quantitatively map the 3' ends of the transcriptome of diverse human tissues and isogenic transformation systems. We found that cell type-specific gene expression is accomplished by two complementary programs. Tissue-restricted genes tend to have single 3' UTRs, whereas a majority of ubiquitously transcribed genes generate multiple 3' UTRs. During transformation and differentiation, single-UTR genes change their mRNA abundance levels, while multi-UTR genes mostly change 3' UTR isoform ratios to achieve tissue specificity. However, both regulation programs target genes that function in the same pathways and processes that characterize the new cell type. Instead of finding global shifts in 3' UTR length during transformation and differentiation, we identify tissue-specific groups of multi-UTR genes that change their 3' UTR ratios; these changes in 3' UTR length are largely independent from changes in mRNA abundance. Finally, tissue-specific usage of ApA sites appears to be a mechanism for changing the landscape targetable by ubiquitously expressed microRNAs.

Journal ArticleDOI
TL;DR: The proposed subtypes provide a novel perspective on the heterogeneity of CRC and should be further explored retrospectively on existing clinical trial datasets and, when sufficiently robust, be prospectively assessed for clinical relevance in terms of prognosis and treatment response predictive capacity.
Abstract: The recognition that colorectal cancer (CRC) is a heterogeneous disease in terms of clinical behaviour and response to therapy translates into an urgent need for robust molecular disease subclassifiers that can explain this heterogeneity beyond current parameters (MSI, KRAS, BRAF). Attempts to fill this gap are emerging. The Cancer Genome Atlas (TCGA) reported two main CRC groups, based on the incidence and spectrum of mutated genes, and another paper reported an EMT expression signature defined subgroup. We performed a prior free analysis of CRC heterogeneity on 1113 CRC gene expression profiles and confronted our findings to established molecular determinants and clinical, histopathological and survival data. Unsupervised clustering based on gene modules allowed us to distinguish at least five different gene expression CRC subtypes, which we call surface crypt-like, lower crypt-like, CIMP-H-like, mesenchymal and mixed. A gene set enrichment analysis combined with literature search of gene module members identified distinct biological motifs in different subtypes. The subtypes, which were not derived based on outcome, nonetheless showed differences in prognosis. Known gene copy number variations and mutations in key cancer-associated genes differed between subtypes, but the subtypes provided molecular information beyond that contained in these variables. Morphological features significantly differed between subtypes. The objective existence of the subtypes and their clinical and molecular characteristics were validated in an independent set of 720 CRC expression profiles. Our subtypes provide a novel perspective on the heterogeneity of CRC. The proposed subtypes should be further explored retrospectively on existing clinical trial datasets and, when sufficiently robust, be prospectively assessed for clinical relevance in terms of prognosis and treatment response predictive capacity.
Original microarray data were uploaded to the ArrayExpress database (http://www.ebi.ac.uk/arrayexpress/) under Accession Nos E-MTAB-990 and E-MTAB-1026.

Journal ArticleDOI
TL;DR: The emerging picture of mammalian transcription is complex with further refinement expected with the integration of epigenomic data generated by projects such as ENCODE.

Journal ArticleDOI
31 Jan 2013-Cell
TL;DR: eQTL analyses of 15 previously reported breast cancer risk loci resulted in the discovery of three variants that are significantly associated with transcript levels (false discovery rate [FDR] < 0.1), which provides a more comprehensive picture of gene expression determinants in breast cancer as well as insights into the underlying biology of breast cancer risk loci.
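
The summary above does not say how the FDR threshold was applied. As an illustration only, the sketch below assumes the widely used Benjamini-Hochberg procedure; the function names and the example p-values are hypothetical and are not taken from the paper.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that each adjusted
    # value is the minimum of p * m / rank over all ranks at or above it.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def significant_at(p_values, fdr=0.1):
    """Flag tests whose adjusted p-value falls below the FDR threshold."""
    return [q < fdr for q in benjamini_hochberg(p_values)]


# Made-up p-values for four variant-transcript tests.
example_p = [0.001, 0.02, 0.03, 0.5]
flags = significant_at(example_p, fdr=0.1)
```

With these invented inputs the first three tests pass the 0.1 threshold and the last does not; a real eQTL analysis would apply the correction across every variant-transcript pair tested.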

Journal ArticleDOI
TL;DR: It is suggested that matching patients to treatments based on transcriptional subtype will improve response rates, and inclusion of additional features from other profiling data types may provide additional benefit.
Abstract: Background: First-generation molecular profiles for human breast cancers have enabled the identification of features that can predict therapeutic response; however, little is known about how the various data types can best be combined to yield optimal predictors. Collections of breast cancer cell lines mirror many aspects of breast cancer molecular pathobiology, and measurements of their omic and biological therapeutic responses are well-suited for development of strategies to identify the most predictive molecular feature sets. Results: We used least squares-support vector machines and random forest algorithms to identify molecular features associated with responses of a collection of 70 breast cancer cell lines to 90 experimental or approved therapeutic agents. The datasets analyzed included measurements of copy number aberrations, mutations, gene and isoform expression, promoter methylation and protein expression. Transcriptional subtype contributed strongly to response predictors for 25% of compounds, and adding other molecular data types improved prediction for 65%. No single molecular dataset consistently out-performed the others, suggesting that therapeutic response is mediated at multiple levels in the genome. Response predictors were developed and applied to TCGA data, and were found to be present in subsets of those patient samples. Conclusions: These results suggest that matching patients to treatments based on transcriptional subtype will improve response rates, and inclusion of additional features from other profiling data types may provide additional benefit. Further, we suggest a systems biology strategy for guiding clinical trials so that patient cohorts most likely to respond to new therapies may be more efficiently identified.

Journal ArticleDOI
TL;DR: Using a novel NSCLC cohort together with a meta-analysis validation approach, a set of single genes with independent prognostic impact are identified and one of these genes, CADM1, was further established as an immunohistochemical marker with a potential application in clinical diagnostics.
Abstract: Purpose: Global gene expression profiling has been widely used in lung cancer research to identify clinically relevant molecular subtypes as well as to predict prognosis and therapy response. So fa ...