
Showing papers by "Timo Lassmann" published in 2017


Journal Article
09 Mar 2017-Nature
TL;DR: This work integrates multiple transcript collections to generate a comprehensive atlas of 27,919 human lncRNA genes with high-confidence 5′ ends and expression profiles across 1,829 samples from the major human primary cell types and tissues, identifying 19,175 potentially functional lncRNAs in the human genome.
Abstract: Long non-coding RNAs (lncRNAs) are largely heterogeneous and functionally uncharacterized. Here, using FANTOM5 cap analysis of gene expression (CAGE) data, we integrate multiple transcript collections to generate a comprehensive atlas of 27,919 human lncRNA genes with high-confidence 5' ends and expression profiles across 1,829 samples from the major human primary cell types and tissues. Genomic and epigenomic classification of these lncRNAs reveals that most intergenic lncRNAs originate from enhancers rather than from promoters. Incorporating genetic and expression data, we show that lncRNAs overlapping trait-associated single nucleotide polymorphisms are specifically expressed in cell types relevant to the traits, implicating these lncRNAs in multiple diseases. We further demonstrate that lncRNAs overlapping expression quantitative trait loci (eQTL)-associated single nucleotide polymorphisms of messenger RNAs are co-expressed with the corresponding messenger RNAs, suggesting their potential roles in transcriptional regulation. Combining these findings with conservation data, we identify 19,175 potentially functional lncRNAs in the human genome.

821 citations


Journal Article
TL;DR: An integrated expression atlas of miRNAs and their promoters by deep-sequencing 492 short RNA libraries, with matching Cap Analysis Gene Expression (CAGE) data, is created, establishing a foundation for detailed analysis of miRNA expression patterns and transcriptional control regions.
Abstract: MicroRNAs (miRNAs) are short non-coding RNAs with key roles in cellular regulation. As part of the fifth edition of the Functional Annotation of Mammalian Genome (FANTOM5) project, we created an integrated expression atlas of miRNAs and their promoters by deep-sequencing 492 short RNA (sRNA) libraries, with matching Cap Analysis Gene Expression (CAGE) data, from 396 human and 47 mouse RNA samples. Promoters were identified for 1,357 human and 804 mouse miRNAs and showed strong sequence conservation between species. We also found that primary and mature miRNA expression levels were correlated, allowing us to use the primary miRNA measurements as a proxy for mature miRNA levels in a total of 1,829 human and 1,029 mouse CAGE libraries. We thus provide a broad atlas of miRNA expression and promoters in primary mammalian cells, establishing a foundation for detailed analysis of miRNA expression patterns and transcriptional control regions.

406 citations


Journal Article
Shuhei Noguchi, Takahiro Arakawa, Shiro Fukuda, Masaaki Furuno, +182 more authors; Institutions (45)
TL;DR: In the FANTOM5 project, transcription initiation events across the human and mouse genomes were mapped at a single base-pair resolution and their frequencies were monitored by CAGE coupled with single-molecule sequencing to represent the consequence of transcriptional regulation in each analyzed state of mammalian cells.
Abstract: In the FANTOM5 project, transcription initiation events across the human and mouse genomes were mapped at a single base-pair resolution and their frequencies were monitored by CAGE (Cap Analysis of Gene Expression) coupled with single-molecule sequencing. Approximately three thousand samples, consisting of a variety of primary cells, tissues, cell lines, and time series samples during cell activation and development, were subjected to a uniform pipeline of CAGE data production. The analysis pipeline started by measuring RNA extracts to assess their quality, and continued to CAGE library production by using a robotic or a manual workflow, single molecule sequencing, and computational processing to generate frequencies of transcription initiation. The resulting data represent the consequence of transcriptional regulation in each analyzed state of mammalian cells. Non-overlapping peaks over the CAGE profiles, approximately 200,000 and 150,000 peaks for the human and mouse genomes respectively, were identified and annotated to provide precise locations of known promoters as well as novel ones, and to quantify their activities.

167 citations


Journal Article
TL;DR: IBD loci are strongly enriched for monocyte-specific genes, and at least 134 additional candidate genes associated with IBD susceptibility are identified from reanalysis of published GWA studies.
Abstract: The FANTOM5 consortium utilised cap analysis of gene expression (CAGE) to provide an unprecedented insight into transcriptional regulation in human cells and tissues. In the current study, we have used CAGE-based transcriptional profiling on an extended dense time course of the response of human monocyte-derived macrophages grown in macrophage colony-stimulating factor (CSF1) to bacterial lipopolysaccharide (LPS). We propose that this system provides a model for the differentiation and adaptation of monocytes entering the intestinal lamina propria. The response to LPS is shown to be a cascade of successive waves of transient gene expression extending over at least 48 hours, with hundreds of positive and negative regulatory loops. Promoter analysis using motif activity response analysis (MARA) identified some of the transcription factors likely to be responsible for the temporal profile of transcriptional activation. Each LPS-inducible locus was associated with multiple inducible enhancers, and in each case, transient eRNA transcription at multiple sites detected by CAGE preceded the appearance of promoter-associated transcripts. LPS-inducible long non-coding RNAs were commonly associated with clusters of inducible enhancers. We used these data to re-examine the hundreds of loci associated with susceptibility to inflammatory bowel disease (IBD) in genome-wide association studies. Loci associated with IBD were strongly and specifically (relative to rheumatoid arthritis and unrelated traits) enriched for promoters that were regulated in monocyte differentiation or activation. Amongst previously-identified IBD susceptibility loci, the vast majority contained at least one promoter that was regulated in CSF1-dependent monocyte-macrophage transitions and/or in response to LPS. 
On this basis, we concluded that IBD loci are strongly enriched for monocyte-specific genes, and identified at least 134 additional candidate genes associated with IBD susceptibility from reanalysis of published GWA studies. We propose that dysregulation of monocyte adaptation to the environment of the gastrointestinal mucosa is the key process leading to inflammatory bowel disease.

91 citations


Journal Article
TL;DR: This work provides evidence that precise TSS mapping in combination with Clustered Regularly Interspaced Short Palindromic Repeat (CRISPR)-on technology enables us, for the first time, to efficiently target endogenous avian genes for transcriptional activation.
Abstract: Cap Analysis of Gene Expression (CAGE) in combination with single-molecule sequencing technology allows precision mapping of transcription start sites (TSSs) and genome-wide capture of promoter activities in differentiated and steady state cell populations. Much less is known about whether TSS profiling can characterize diverse and non-steady state cell populations, such as the approximately 400 transitory and heterogeneous cell types that arise during ontogeny of vertebrate animals. To gain such insight, we used the chick model and performed CAGE-based TSS analysis on embryonic samples covering the full 3-week developmental period. In total, 31,863 robust TSS peaks (>1 tag per million [TPM]) were mapped to the latest chicken genome assembly, of which 34% to 46% were active in any given developmental stage. ZENBU, a web-based, open-source platform, was used for interactive data exploration. TSSs of genes critical for lineage differentiation could be precisely mapped and their activities tracked throughout development, suggesting that non-steady state and heterogeneous cell populations are amenable to CAGE-based transcriptional analysis. Our study also uncovered a large set of extremely stable housekeeping TSSs and many novel stage-specific ones. We furthermore demonstrated that TSS mapping could expedite motif-based promoter analysis for regulatory modules associated with stage-specific and housekeeping genes. Finally, using Brachyury as an example, we provide evidence that precise TSS mapping in combination with Clustered Regularly Interspaced Short Palindromic Repeat (CRISPR)-on technology enables us, for the first time, to efficiently target endogenous avian genes for transcriptional activation. Taken together, our results represent the first report of genome-wide TSS mapping in birds and the first systematic developmental TSS analysis in any amniote species (birds and mammals). 
By facilitating promoter-based molecular analysis and genetic manipulation, our work also underscores the value of avian models in unravelling the complex regulatory mechanism of cell lineage specification during amniote development.

32 citations


Journal Article
01 Jan 2017-Diabetes
TL;DR: The transcriptional events that occur during in vitro differentiation of human adipocytes were investigated and the findings linked to WAT phenotypes suggest a complex but highly coordinated regulation of adipogenesis.
Abstract: White adipose tissue (WAT) can develop into several phenotypes with different pathophysiological impact on type 2 diabetes. To better understand the adipogenic process, the transcriptional events that occur during in vitro differentiation of human adipocytes were investigated and the findings linked to WAT phenotypes. Single-molecule transcriptional profiling provided a detailed map of the expressional changes of genes, enhancers, and long noncoding RNAs, where different types of transcripts share common dynamics during differentiation. Common signatures include early downregulated, transient, and late induced transcripts, all of which are linked to distinct developmental processes during adipogenesis. Enhancers expressed during adipogenesis overlap significantly with genetic variants associated with WAT distribution. Transiently expressed and late induced genes are associated with hypertrophic WAT (few but large fat cells), a phenotype closely linked to insulin resistance and type 2 diabetes. Transcription factors that are expressed early or transiently affect differentiation and adipocyte function and are controlled by several well-known upstream regulators such as glucocorticosteroids, insulin, cAMP, and thyroid hormones. Taken together, our results suggest a complex but highly coordinated regulation of adipogenesis.

26 citations


Journal Article
TL;DR: This report identifies a robust list of 22 candidate driver genes that are epigenetically regulated in lung cancer; such genes may complement the known mutational drivers.
Abstract: Lung cancer is the leading cause of cancer-related deaths worldwide. The majority of cancer driver mutations have been identified; however, relevant epigenetic regulation involved in tumorigenesis has only been fragmentarily analyzed. Epigenetically regulated genes have a great theranostic potential, especially in tumors with no apparent driver mutations. Here, epigenetically regulated genes were identified in lung cancer by an integrative analysis of promoter-level expression profiles from Cap Analysis of Gene Expression (CAGE) of 16 non-small cell lung cancer (NSCLC) cell lines and 16 normal lung primary cell specimens with DNA methylation data of 69 NSCLC cell lines and 6 normal lung epithelial cells. A core set of 49 coding genes and 10 long noncoding RNAs (lncRNA), which are upregulated in NSCLC cell lines due to promoter hypomethylation, was uncovered. Twenty-two epigenetically regulated genes were validated (upregulated genes with hypomethylated promoters) in the adenocarcinoma and squamous cell cancer subtypes of lung cancer using The Cancer Genome Atlas data. Furthermore, it was demonstrated that multiple copies of the REP522 DNA repeat family are prominently upregulated due to hypomethylation in NSCLC cell lines, which leads to cancer-specific expression of lncRNAs, such as RP1-90G24.10, AL022344.4, and PCAT7. Finally, Myeloma Overexpressed (MYEOV) was identified as the most promising candidate. Functional studies demonstrated that MYEOV promotes cell proliferation, survival, and invasion. Moreover, high MYEOV expression levels were associated with poor prognosis. Implications: This report identifies a robust list of 22 candidate driver genes that are epigenetically regulated in lung cancer; such genes may complement the known mutational drivers. Visual Overview: http://mcr.aacrjournals.org/content/molcanres/15/10/1354/F1.large.jpg. Mol Cancer Res; 15(10); 1354-65. ©2017 AACR.

25 citations



Journal Article
01 Jan 2017-Leukemia
TL;DR: Molecular studies showed that ROM reduces expression of cytidine deaminase, an enzyme involved in ARAC deactivation, and enhances the DNA damage–response to ARAC, and have identified ROM as a promising therapeutic for MLL-rearranged iALL.
Abstract: To address the poor prognosis of mixed lineage leukemia (MLL)-rearranged infant acute lymphoblastic leukemia (iALL), we generated a panel of cell lines from primary patient samples and investigated cytotoxic responses to contemporary and novel Food and Drug Administration-approved chemotherapeutics. To characterize representation of primary disease within cell lines, molecular features were compared using RNA-sequencing and cytogenetics. High-throughput screening revealed variable efficacy of currently used drugs but identified consistent efficacy of three novel drug classes: proteasome inhibitors, histone deacetylase inhibitors and cyclin-dependent kinase inhibitors. Gene expression of drug targets was highly reproducible when comparing iALL cell lines to matched primary specimens. Histone deacetylase inhibitors, including romidepsin (ROM), enhanced the activity of a key component of iALL therapy, cytarabine (ARAC), in vitro, and combined administration of ROM and ARAC to xenografted mice further reduced leukemia burden. Molecular studies showed that ROM reduces expression of cytidine deaminase, an enzyme involved in ARAC deactivation, and enhances the DNA damage-response to ARAC. In conclusion, we present a valuable resource for drug discovery, including the first systematic analysis of transcriptome reproducibility in vitro, and have identified ROM as a promising therapeutic for MLL-rearranged iALL.

22 citations


Journal Article
TL;DR: The production and quality control of CAGEscan libraries from 56 FANTOM5 RNA sources are presented, which enhances the FANTOM5 expression atlas by providing experimental evidence associating core promoter regions with their cognate transcripts.
Abstract: The FANTOM5 expression atlas is a quantitative measurement of the activity of nearly 200,000 promoter regions across nearly 2,000 different human primary cells, tissue types and cell lines. Generation of this atlas was made possible by the use of CAGE, an experimental approach to localise transcription start sites at single-nucleotide resolution by sequencing the 5' ends of capped RNAs after their conversion to cDNAs. While 50% of CAGE-defined promoter regions could be confidently associated to adjacent transcriptional units, nearly 100,000 promoter regions remained gene-orphan. To address this, we used the CAGEscan method, in which random-primed 5'-cDNAs are paired-end sequenced. Pairs starting in the same region are assembled in transcript models called CAGEscan clusters. Here, we present the production and quality control of CAGEscan libraries from 56 FANTOM5 RNA sources, which enhances the FANTOM5 expression atlas by providing experimental evidence associating core promoter regions with their cognate transcripts.

15 citations


Journal Article
TL;DR: This work generated 13 profiles of transcription initiation activities in dog and rat aortic smooth muscle cells, mesenchymal stem cells and hepatocytes by employing CAGE technology combined with single molecule sequencing, and identified 28,497 and 23,147 CAGE peaks, or promoter regions, for rat and dog respectively, and associated them to known genes.
Abstract: The promoter landscape of several non-human model organisms is far from complete. As a part of FANTOM5 data collection, we generated 13 profiles of transcription initiation activities in dog and rat aortic smooth muscle cells, mesenchymal stem cells and hepatocytes by employing CAGE (Cap Analysis of Gene Expression) technology combined with single molecule sequencing. Our analyses show that the CAGE profiles recapitulate known transcription start sites (TSSs) consistently, in addition to uncovering novel TSSs. Our dataset can thus be used with high confidence to support gene annotation in dog and rat species. We identified 28,497 and 23,147 CAGE peaks, or promoter regions, for rat and dog respectively, and associated them with known genes. This approach could be seen as a standard method for improvement of existing gene models, as well as discovery of novel genes. Given that the FANTOM5 data collection includes matched cell types in human and mouse as well, these data would also be useful for cross-species studies.

Journal Article
TL;DR: FANTOM5 profiled 15 distinct anatomical regions of the aged macaque central nervous system using Cap Analysis of Gene Expression, a high-resolution, annotation-independent technology that allows monitoring of transcription initiation events with high accuracy.
Abstract: Rhesus macaque was the second non-human primate whose genome has been fully sequenced and is one of the most used model organisms to study human biology and disease, thanks to the close evolutionary relationship between the two species. But compared to human, where several previously unknown RNAs have been uncovered, the macaque transcriptome is less studied. Publicly available RNA expression resources for macaque are limited, even for brain, which is highly relevant to study human cognitive abilities. In an effort to complement those resources, FANTOM5 profiled 15 distinct anatomical regions of the aged macaque central nervous system using Cap Analysis of Gene Expression, a high-resolution, annotation-independent technology that allows monitoring of transcription initiation events with high accuracy. We identified 25,869 CAGE peaks, representing bona fide promoters. For each peak we provide detailed annotation, expanding the landscape of ‘known’ macaque genes, and we show concrete examples of how to use the resulting data. We believe these data represent a useful resource to understand the central nervous system in macaque.

Posted Content
11 Apr 2017-bioRxiv
TL;DR: The production and quality control of CAGEscan libraries from 56 FANTOM5 RNA sources are presented, which enhances the FANTOM5 expression atlas by providing experimental evidence associating core promoter regions with their cognate transcripts.
Abstract: The FANTOM5 expression atlas is a quantitative measurement of the activity of nearly 200,000 promoter regions across nearly 2,000 different human primary cells, tissue types and cell lines. Generation of this atlas was made possible by the use of CAGE, an experimental approach to localise transcription start sites at single-nucleotide resolution by sequencing the 5′ ends of capped RNAs after their conversion to cDNAs. While 50% of CAGE-defined promoter regions could be confidently associated to adjacent transcriptional units, nearly 100,000 promoter regions remained gene-orphan. To address this, we used the CAGEscan method, in which random-primed 5′-cDNAs are paired-end sequenced. Pairs starting in the same region are assembled in transcript models called CAGEscan clusters. Here, we present the production and quality control of CAGEscan libraries from 56 FANTOM5 RNA sources, which enhances the FANTOM5 expression atlas by providing experimental evidence associating core promoter regions with their cognate transcripts.