An atlas of active enhancers across human cell types and tissues

TL;DR: It is shown that enhancers share properties with CpG-poor messenger RNA promoters but produce bidirectional, exosome-sensitive, relatively short unspliced RNAs, the generation of which is strongly related to enhancer activity.
Abstract: Enhancers control the correct temporal and cell-type-specific activation of gene expression in multicellular eukaryotes. Knowing their properties, regulatory activity and targets is crucial to understand the regulation of differentiation and homeostasis. Here we use the FANTOM5 panel of samples, covering the majority of human tissues and cell types, to produce an atlas of active, in vivo-transcribed enhancers. We show that enhancers share properties with CpG-poor messenger RNA promoters but produce bidirectional, exosome-sensitive, relatively short unspliced RNAs, the generation of which is strongly related to enhancer activity. The atlas is used to compare regulatory programs between different cells at unprecedented depth, to identify disease-associated regulatory single nucleotide polymorphisms, and to classify cell-type-specific and ubiquitous enhancers. We further explore the utility of enhancer redundancy, which explains gene expression strength rather than expression patterns. The online FANTOM5 enhancer atlas represents a unique resource for studies on cell-type-specific enhancers and gene regulation.

Summary

INTRODUCTION

  • Precise regulation of gene expression in time and space is required for development, differentiation and homeostasis in higher organisms (ref. 1).
  • Enhancers were originally defined as remote elements that increase transcription independent of their orientation, position and distance to a promoter (ref. 3).
  • They were only recently found to initiate RNA polymerase II transcription, producing so-called eRNAs (ref. 4).
  • Genomic locations of enhancers used by cells can be detected by mapping of chromatin marks and transcription factor binding sites from chromatin immunoprecipitation (ChIP) assays and DNase I hypersensitive sites (DHSs) (reviewed in ref. 1), but there has been no systematic analysis of enhancer usage in the large variety of cell types and tissues present in the human body.
  • Using Cap Analysis of Gene Expression (CAGE; ref. 5), the authors show that enhancer activity can be detected through the presence of balanced bidirectional capped transcripts, enabling the identification of enhancers from small primary cell populations.

Bidirectional pairs of capped RNAs identify active enhancers

  • The FANTOM5 project has generated a CAGE-based transcription start site (TSS) atlas across a broad panel of primary cells, tissues, and cell lines covering the vast majority of human cell types (ref. 6).
  • Similar patterns were observed in other cell lines (Supplementary Fig. 2a).
  • While capped RNAs of protein-coding gene promoters were strongly biased towards the sense direction, similar levels of capped RNA in both directions were detected at enhancers (Fig. 1b, and Supplementary Fig. 2b, c).
  • Interestingly, the candidates were depleted of CpG islands (CGI) and repeats (with the exception of neural stem cells, see ref. 9).
  • While 67.4-73.9% of the CAGE-defined enhancers showed significant reporter activity, only 20-33.3% of the untranscribed candidate regulatory regions were active (Fig. 1c, and Supplementary Fig. 9a).

Enhancer TSSs share regulatory features with mRNA TSSs but produce short, exosome-sensitive RNAs

  • RNA-seq data from matching primary cells and tissues showed that ~95% of RNAs originating from enhancers were unspliced and typically short (median 346 nt), a striking difference from mRNAs (19% unspliced, median 56 nt) (Fig. 2a, and Supplementary Fig. 11a-c).
  • Unlike TSSs of mRNAs, which are enriched for predicted 5' splice sites but depleted of downstream polyadenylation (pA) signals (refs. 11,12), enhancers showed no evidence of associated downstream RNA processing motifs, and thus resemble antisense PROMoter uPstream Transcripts (PROMPTs; ref. 11) (Fig. 2b, and Supplementary Fig. 11d).
  • Furthermore, de novo motif analysis revealed sequence signatures in CAGE-defined enhancers closely resembling non-CGI promoters (Fig. 2d, and Supplementary Fig. 13b).
  • Indeed, siRNA-mediated depletion of the hMTR4 (SKIV2L2) co-factor of the exosome complex resulted in a median 3.14-fold increase of capped enhancer-RNA abundance (Fig. 2e, and Supplementary Fig. 14a, b), but only a negligible increase at mRNA TSSs.
  • The number of detectable bidirectional CAGE peaks increased 1.7-fold upon hMTR4 depletion, and novel enhancer candidates had on average similar, but weaker, chromatin modification signals compared to control HeLa cells (Supplementary Fig. 14e).

CAGE expression identifies cell-specific enhancer usage

  • To test whether CAGE expression can identify cell type-specific enhancer usage in vivo, ChIP-seq (H3K27ac and H3K4me1), DNA methylation and triplicate CAGE analyses were performed in five primary blood cell types, and compared to published DHS data (www.roadmapepigenomics.org, Supplementary Table 4).
  • CAGE-defined enhancers were strongly supported by proximal H3K4me1/H3K27ac peaks (71%) and DHSs (87%) from the same cell type.
  • Moreover, there was a clear correlation between CAGE, DNase I hypersensitivity, H3K4me1 and H3K27ac for CAGE-defined enhancers expressed in blood cells (Fig. 3a).
  • Accordingly, cell type-specific enhancer expression corresponds to cell type-specific histone modifications (Fig. 3b).
  • Thus, bidirectional CAGE pairs are robust predictors for cell type-specific enhancer activity.

An atlas of transcribed enhancers over human cells and tissues

  • The FANTOM5 CAGE library collection (ref. 6) enables the dissection of enhancer usage across cell types and tissues comprehensively sampled across the human body.
  • The results corroborate the functional relevance of these enhancers for tissue-specific gene expression and suggest that they are an important part of the regulatory programs of cellular differentiation and organogenesis.
  • The authors grouped the primary cell and tissue samples into larger, mutually exclusive cell type and organ/tissue groups, respectively, with similar function or morphology (Supplementary Tables 10 and 11).
  • From the data the authors draw several conclusions, including the following.
  • Facets in which the authors detect many enhancers typically also have a higher fraction of facet-specific enhancers (Supplementary Fig. 22c, d).

Expression clustering reveals ubiquitous enhancers

  • Compared to other enhancers, the ubiquitous (u-) enhancers are 8 times more likely to overlap CGIs and they are twice as conserved (Supplementary Fig. 26a-c).
  • U-enhancers overlap typical chromatin enhancer marks but have higher H3K4me3 signal (Supplementary Fig. 26d).
  • [...] (P<1.5e-8, Mann-Whitney U test), the transcripts remain predominantly (~78%) unspliced and significantly shorter (P<4.2e-18, Mann-Whitney U test) than mRNAs (Supplementary Figs 27-28), do not share exons with known genes, and are exosome-sensitive (Supplementary Fig. 14b).
  • Therefore, it is unlikely that these are novel mRNA promoters.
  • They are also highly enriched for P300 and cohesin ChIP-seq peaks (ref. 20) and RNAPII-mediated ChIA-PET signal (ref. 21) compared to other enhancers (Supplementary Fig. 26d).

Linking enhancer usage with TSS expression

  • Uniquely, FANTOM5 CAGE allows for direct comparison between transcriptional activity of the enhancer and of putative target gene TSSs across a diverse set of human cells.
  • Based on pair-wise expression correlation, nearly half (40%) of the inferred TSS-associated enhancers were linked with the nearest TSS, and 64% of enhancers have at least one correlated TSS within 500kb.
  • Several (10,260, 15.3%) associations are supported by ChIA-PET interaction data (ref. 21), and the supported fraction increases with the correlation threshold (Supplementary Fig. 29a).
  • The fraction of supported associations is 4.8-fold higher than that of associations predicted from DNase hypersensitivity correlations (ref. 10) (20.6% vs. 4.3%, at the same correlation threshold), indicating that transcription is a better predictor of regulatory targets than chromatin accessibility.
  • One hypothesis explaining the function of multiple enhancers driving the same expression pattern is that they might confer higher transcriptional output of a gene (refs. 25,26).
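
The enhancer-to-TSS linking described above (pair-wise expression correlation restricted to TSSs within 500 kb) can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the record layout, the function names `pearson` and `link_enhancer`, the use of Pearson correlation, and the 0.8 correlation cutoff are all assumptions; only the 500 kb window comes from the text.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length expression vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant profile carries no correlation signal
    return sxy / math.sqrt(sxx * syy)


def link_enhancer(enhancer, tss_list, max_distance=500_000, min_r=0.8):
    """Return (TSS name, r) pairs for nearby TSSs with correlated expression.

    max_distance follows the 500 kb window in the text; min_r is a
    hypothetical threshold chosen only for this example.
    """
    links = []
    for tss in tss_list:
        if tss["chrom"] != enhancer["chrom"]:
            continue
        if abs(tss["pos"] - enhancer["pos"]) > max_distance:
            continue
        r = pearson(enhancer["expr"], tss["expr"])
        if r >= min_r:
            links.append((tss["name"], r))
    return sorted(links, key=lambda item: -item[1])


# Hypothetical expression profiles across four samples.
enhancer = {"chrom": "chr1", "pos": 1_000_000, "expr": [0, 2, 4, 8]}
tss_list = [
    {"name": "TSS_A", "chrom": "chr1", "pos": 1_050_000, "expr": [1, 5, 9, 17]},
    {"name": "TSS_B", "chrom": "chr1", "pos": 1_200_000, "expr": [9, 3, 7, 1]},
    {"name": "TSS_C", "chrom": "chr1", "pos": 2_000_000, "expr": [0, 2, 4, 8]},
]
links = link_enhancer(enhancer, tss_list)
```

In this toy input only `TSS_A` is linked: `TSS_B` is anti-correlated and `TSS_C`, although perfectly correlated, lies outside the distance window.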

Disease-associated SNPs are enriched in enhancers

  • Many disease-associated SNPs are located outside of protein-coding exons and a large proportion of human genes display expression polymorphism (ref. 28).
  • Using the NHGRI GWAS catalog (ref. 29) and extending the compilation of lead SNPs with proxy SNPs in strong linkage disequilibrium (similar to refs. 30,31), the authors identified diseases/traits whose associated SNPs overlapped enhancers, promoters, exons and random regions significantly more than expected by chance (Fisher's Exact Test P<0.01, Supplementary Table 16).
  • For many traits where enriched disease-associated SNPs were within enhancers, enhancer activity was detected in pathologically relevant cell types (Fig. 6d, and Supplementary Figs 31 and 32).
  • Examples include Graves' disease-associated SNPs enriched in enhancers that are expressed predominantly in thyroid tissue, and chronic lymphocytic leukemia-associated SNPs enriched in enhancers expressed predominantly in lymphocytes.
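
The enrichment test named above (Fisher's Exact Test, P<0.01) can be illustrated with a small one-sided implementation on a 2x2 table. This is a sketch under stated assumptions: the function name `fisher_exact_greater`, the table layout and all counts are hypothetical, and the background SNP set actually used in the study is not described in this section.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns P(X >= a) under the hypergeometric null, where
      a = trait-associated SNPs inside enhancers
      b = trait-associated SNPs outside enhancers
      c = background SNPs inside enhancers
      d = background SNPs outside enhancers
    """
    row1 = a + b          # all trait-associated SNPs
    col1 = a + c          # all SNPs inside enhancers
    n = a + b + c + d     # all SNPs considered
    denom = comb(n, col1)
    upper = min(row1, col1)
    tail = sum(comb(row1, k) * comb(n - row1, col1 - k)
               for k in range(a, upper + 1))
    return tail / denom


# Hypothetical counts: 12 of 100 trait SNPs fall in enhancers,
# versus 30 of 900 background SNPs.
p_value = fisher_exact_greater(12, 88, 30, 870)
is_enriched = p_value < 0.01  # threshold quoted in the text
```

A one-sided test is used here because the question is over-representation; whether the original analysis was one- or two-sided is not stated in this section.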

CONCLUSIONS

  • The data presented here demonstrate that bidirectional capped RNAs, as measured by CAGE, are robust predictors of enhancer activity in a cell.
  • Transcription is only measured at a fraction of chromatin-defined enhancers and few untranscribed enhancers show potential enhancer activity.
  • Of course, given the relative instability of enhancer RNAs some chromatin-defined sites may be active but fall below the limits of detection of CAGE.
  • This view is not supported by the larger FANTOM5 dataset.
  • It has clear applications in human genetics, to narrow the search windows for functional association, and for the definition of regulatory networks that underpin the processes of cellular differentiation and organogenesis in human development.

An atlas of active enhancers across human cell types and
tissues
Author
Andersson, Robin, Gebhard, Claudia, Miguel-Escalada, Irene, Hoof, Ilka, Bornholdt, Jette,
Boyd, Mette, Chen, Yun, Zhao, Xiaobei, Schmidl, Christian, Suzuki, Takahiro, Ntini, Evgenia,
Arner, Erik, Valen, Eivind, Li, Kang, Schwarzfischer, Lucia, Glatz, Dagmar, Raithel, Johanna,
Lilje, Berit, Rapin, Nicolas, Bagger, Frederik Otzen, Jorgensen, Mette, Andersen, Peter
Refsing, Bertin, Nicolas, Rackham, Owen, Burroughs, A Maxwell, Baillie, J Kenneth, Ishizu,
Yuri, Shimizu, Yuri, Furuhata, Erina, Maeda, Shiori, Negishi, Yutaka, Mungall, Christopher
J, Meehan, Terrence F, Lassmann, Timo, Itoh, Masayoshi, Kawaji, Hideya, Kondo, Naoto,
Kawai, Jun, Lennartsson, Andreas, Daub, Carsten O, Heutink, Peter, Hume, David A, Jensen,
Torben Heick, Suzuki, Harukazu, Hayashizaki, Yoshihide, Mueller, Ferenc, Forrest, Alistair RR,
Carninci, Piero, Rehli, Michael, Sandelin, Albin
Published
2014
Journal Title
Nature
Version
Accepted Manuscript (AM)
DOI
https://doi.org/10.1038/nature12787
Copyright Statement
© 2014 Nature Publishing Group. This is the author-manuscript version of this paper.
Reproduced in accordance with the copyright policy of the publisher. Please refer to the journal
website for access to the definitive, published version.
Downloaded from
http://hdl.handle.net/10072/102456
Griffith Research Online
https://research-repository.griffith.edu.au

An atlas of active enhancers across human cell types and
tissues
Robin Andersson (1,#), Claudia Gebhard (2,#), Irene Miguel-Escalada (3), Ilka Hoof (1), Jette Bornholdt (1), Mette Boyd (1), Yun Chen (1), Xiaobei Zhao (1,4), Christian Schmidl (2), Takahiro Suzuki (5,6), Evgenia Ntini (7), Erik Arner (5,6), Eivind Valen (1,8), Kang Li (1), Lucia Schwarzfischer (2), Dagmar Glatz (2), Johanna Raithel (2), Berit Lilje (1), Nicolas Rapin (1,9), Frederik Otzen Bagger (1,9), Mette Jørgensen (1), Peter Refsing Andersen (7), Nicolas Bertin (5,6), Owen Rackham (5,6), A. Maxwell Burroughs (5,6), J. Kenneth Baillie (10), Yuri Ishizu (5,6), Yuri Shimizu (5,6), Erina Furuhata (5,6), Shiori Maeda (5,6), Yutaka Negishi (5,6), Christopher J. Mungall (11), Terrence F. Meehan (12), Timo Lassmann (5,6), Masayoshi Itoh (5,6,13), Hideya Kawaji (5,13), Naoto Kondo (5,13), Jun Kawai (5,13), Andreas Lennartsson (14), Carsten O. Daub (5,6,14), Peter Heutink (15), David A. Hume (10), Torben Heick Jensen (7), Harukazu Suzuki (5,6), Yoshihide Hayashizaki (5,13), Ferenc Müller (3), Alistair R.R. Forrest (5,6,*), Piero Carninci (5,6,*), Michael Rehli (2,#,*), and Albin Sandelin (1,#,*)
1 The Bioinformatics Centre, Department of Biology & Biotech Research and Innovation Centre, University of Copenhagen, Ole Maaloes Vej 5, DK-2200 Copenhagen, Denmark
2 Department of Internal Medicine III, University Hospital Regensburg, Franz-Josef-Strauss-Allee 11, 93042 Regensburg, Germany
3 School of Clinical and Experimental Medicine, College of Medical and Dental Sciences, University of Birmingham, Edgbaston, Birmingham B15 2TT, UK
4 Lineberger Comprehensive Cancer Center, University of North Carolina, Chapel Hill, NC 27599, USA
5 RIKEN OMICS Science Centre, RIKEN Yokohama Institute, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama City, Kanagawa, 230-0045, Japan
6 RIKEN Center for Life Science Technologies (Division of Genomic Technologies), RIKEN Yokohama Institute, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama City, Kanagawa, 230-0045, Japan
7 Centre for mRNP Biogenesis and Metabolism, Department of Molecular Biology and Genetics, C.F. Møllers Alle 3, Bldg. 1130, DK-8000 Aarhus, Denmark
8 Department of Molecular and Cellular Biology, Harvard University, USA
9 The Finsen Laboratory, Rigshospitalet and Danish Stem Cell Centre (DanStem), University of Copenhagen, Ole Maaloes Vej 5, DK-2200, Denmark
10 Roslin Institute, Edinburgh University, Easter Bush, Midlothian, EH25 9RG Scotland, UK
11 Genomics Division, Lawrence Berkeley National Laboratory, 1 Cyclotron Road MS 64-121, Berkeley, CA 94720, USA
12 EMBL Outstation - Hinxton, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD
13 RIKEN Preventive Medicine and Diagnosis Innovation Program, RIKEN Yokohama Institute, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama City, Kanagawa, 230-0045, Japan
14 Department of Biosciences and Nutrition, Karolinska Institutet, 14183 Huddinge, Stockholm, Sweden
15 Department of Clinical Genetics, VU University Medical Center, van der Boechorststraat 7, 1081 BT Amsterdam, Netherlands
# These authors contributed equally to this work.
* Correspondence should be addressed to ARRF (alistair.forrest@gmail.com), PC (carninci@riken.jp), MR (michael.rehli@ukr.de) or AS (albin@binf.ku.dk).
Author contributions
RA, IH, EA, EV, KL, YC, BL, XZ, MJ, HK, TM, TL, NB, OR, MB, KB, CM, NR, FOB, MR, AS made the computational analysis. JB, MB, TL, HK, NK, JK, HS, MI, CD, ARRF, PC, YH prepared and preprocessed CAGE and/or RNA-seq libraries. EN, PRA, THJ, JB, MB made the knockdown experiments followed by CAGE. CG, CS, LS, JR, DG, ME, MR made the blood cell ChIP experiments, methylation assays and in vitro blood cell validations. TS, CG, YI, YS, EF, SM, YN, ARRF, PC and HS made the HeLa/HepG2 in vitro validations. IME, RA, AS, FM designed and carried out zebrafish in vivo tests. RA, CG, IH, CS, EA, EV, FM, IME, PC, AF, AK, MB, JB, AL, CD, DH, PH, MR, AS interpreted results. RA, CG, IH, EV, IME, JB, FM, DAH, MR, AS wrote the paper with input from all authors.
Competing interests.
The authors declare no competing interests.
Published in final edited form as:
Nature. 2014 March 27; 507(7493): 455–461. doi:10.1038/nature12787.
SUMMARY
Enhancers control the correct temporal and cell type-specific activation of gene expression in
higher eukaryotes. Knowing their properties, regulatory activity and targets is crucial to
understand the regulation of differentiation and homeostasis. We use the FANTOM5 panel of
samples covering the majority of human tissues and cell types to produce an atlas of active, in vivo
transcribed enhancers. We show that enhancers share properties with CpG-poor mRNA promoters
but produce bidirectional, exosome-sensitive, relatively short unspliced RNAs, the generation of
which is strongly related to enhancer activity. The atlas is used to compare regulatory programs
between different cells at unprecedented depth, identify disease-associated regulatory single
nucleotide polymorphisms, and classify cell type-specific and ubiquitous enhancers. We further
explore the utility of enhancer redundancy, which explains gene expression strength rather than
expression patterns. The online FANTOM5 enhancer atlas represents a unique resource for studies
on cell type-specific enhancers and gene regulation.
INTRODUCTION
Precise regulation of gene expression in time and space is required for development, differentiation and homeostasis in higher organisms (ref. 1). Sequence elements within or near core promoter regions contribute to regulation (ref. 2), but promoter-distal regulatory regions like enhancers are essential in the control of cell type specificity (ref. 1). Enhancers were originally defined as remote elements that increase transcription independent of their orientation, position and distance to a promoter (ref. 3). They were only recently found to initiate RNA polymerase II (RNAPII) transcription, producing so-called eRNAs (ref. 4). Genomic locations of enhancers used by cells can be detected by mapping of chromatin marks and transcription factor binding sites from chromatin immunoprecipitation (ChIP) assays and DNase I hypersensitive sites (DHSs) (reviewed in ref. 1), but there has been no systematic analysis of enhancer usage in the large variety of cell types and tissues present in the human body.
Using Cap Analysis of Gene Expression (CAGE; ref. 5), we show that enhancer activity can be detected through the presence of balanced bidirectional capped transcripts, enabling the identification of enhancers from small primary cell populations. Based upon the FANTOM5 CAGE expression atlas encompassing 432 primary cell, 135 tissue and 241 cell line samples from human (ref. 6), we identify 43,011 enhancer candidates and characterize their activity across the majority of human cell types and tissues. The resulting catalogue of transcribed enhancers enables classification of ubiquitous and cell type-specific enhancers, modeling of physical interactions between multiple enhancers and TSSs, and identification of potential disease-associated regulatory single nucleotide polymorphisms (SNPs).
Andersson et al. Nature. Author manuscript; available in PMC 2017 January 05.
RESULTS
Bidirectional pairs of capped RNAs identify active enhancers
The FANTOM5 project has generated a CAGE-based transcription start site (TSS) atlas across a broad panel of primary cells, tissues, and cell lines covering the vast majority of human cell types (ref. 6). Within that dataset, well-studied enhancers often have CAGE peaks delineating nucleosome-deficient regions (NDRs) (Supplementary Fig. 1). To determine whether this is a general enhancer feature, FANTOM5 CAGE (Supplementary Table 1) was superimposed on active (H3K27ac-marked) enhancers defined by HeLa-S3 ENCODE ChIP-seq data (ref. 7). CAGE tags showed a bimodal distribution flanking the central P300 peak, with divergent transcription from the enhancer (Fig. 1a). Similar patterns were observed in other cell lines (Supplementary Fig. 2a). Enhancer-associated reverse and forward strand transcription initiation events were, on average, separated by 180 bp and corresponded to nucleosome boundaries (Supplementary Figs 3 and 4). As a class, active HeLa-S3 enhancers had 231-fold more CAGE tags than polycomb-repressed enhancers, suggesting that transcription is a marker for active usage. Indeed, ENCODE-predicted enhancers (ref. 7) with significant reporter activity (ref. 8) had greater CAGE expression levels than those lacking reporter activity (P<4e-22, Mann-Whitney U test). A lenient threshold on enhancer expression increased the validation rate of ENCODE enhancers from 27% to 57% (Supplementary Fig. 5).
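
The comparison of CAGE expression between reporter-active and reporter-inactive enhancers relies on the Mann-Whitney U test. The following is a minimal standard-library sketch of that test using the normal approximation with midranks and a tie correction; the function name `mann_whitney_u` and the expression values are hypothetical, and a statistics package would normally be used in practice.

```python
import math


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation).

    Returns (U statistic for sample x, approximate two-sided p-value).
    Tied values receive midranks and the variance is tie-corrected.
    Assumes the pooled values are not all identical.
    """
    n1, n2 = len(x), len(y)
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    tie_term = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2.0  # mean of 1-based ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = midrank
        t = j - i
        tie_term += t ** 3 - t
        i = j
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2.0
    n = n1 + n2
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u_x - mean_u) / math.sqrt(var_u)
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return u_x, p_two_sided


# Hypothetical CAGE expression values (arbitrary units).
reporter_active = [5.1, 7.3, 6.8, 9.0, 8.2, 4.9]
reporter_inactive = [1.2, 0.8, 2.5, 3.1, 0.5, 1.9]
u_stat, p_value = mann_whitney_u(reporter_active, reporter_inactive)
```

With samples this small an exact test would be preferable; the normal approximation is kept here only to keep the sketch short.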
While capped RNAs of protein-coding gene promoters were strongly biased towards the sense direction, similar levels of capped RNA in both directions were detected at enhancers (Fig. 1b, and Supplementary Fig. 2b, c). Thus, bidirectional capped RNAs are a signature feature of active enhancers. On this basis, we identified 43,011 enhancer candidates across 808 human CAGE libraries (see Supplementary Text and Supplementary Figs 6-8). Interestingly, the candidates were depleted of CpG islands (CGI) and repeats (with the exception of neural stem cells, see ref. 9).
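
One way to make "similar levels of capped RNA in both directions" concrete is a directionality score computed from forward- and reverse-strand CAGE tag counts around a candidate site. The sketch below is illustrative only: the score definition, the 0.8 cutoff, the minimum tag count and the function names are assumptions made for this example, not the procedure given in the Supplementary Text.

```python
def directionality(forward_tags, reverse_tags):
    """Return (F - R) / (F + R).

    -1 means reverse-strand only, +1 forward-strand only, 0 perfectly balanced.
    """
    total = forward_tags + reverse_tags
    if total == 0:
        raise ValueError("no CAGE tags in either direction")
    return (forward_tags - reverse_tags) / total


def is_balanced_bidirectional(forward_tags, reverse_tags,
                              max_abs_score=0.8, min_total=2):
    """Flag a site as enhancer-like when both strands contribute tags.

    max_abs_score and min_total are hypothetical thresholds.
    """
    total = forward_tags + reverse_tags
    if total < min_total:
        return False
    return abs(directionality(forward_tags, reverse_tags)) < max_abs_score


# Hypothetical tag counts.
enhancer_like = is_balanced_bidirectional(12, 9)    # similar on both strands
promoter_like = is_balanced_bidirectional(250, 4)   # strongly sense-biased
```

Under these assumed thresholds the first site is flagged as balanced, while the second, which resembles the sense-biased signal at protein-coding promoters, is not.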
To confirm the activity of newly-identified candidate enhancers, we randomly selected 46 strong, 41 moderate and 36 low activity enhancers (as defined by CAGE tag frequency) and examined their activity using enhancer reporter assays compared to randomly selected untranscribed loci with regulatory potential in HeLa-S3 cells: 15 DHSs (ref. 10), 26 ENCODE-predicted ‘strong enhancers’ (ref. 7) and 20 enhancers defined as in Fig. 1a (Supplementary Tables 2 and 3). While 67.4-73.9% of the CAGE-defined enhancers showed significant reporter activity, only 20-33.3% of the untranscribed candidate regulatory regions were active (Fig. 1c, and Supplementary Fig. 9a). The same trend was observed in HepG2 cells (Supplementary Fig. 10a, b). Corresponding promoter-less constructs showed that the enhancer transcription read-through is negligible (Supplementary Fig. 9b, c). Large fractions of CAGE-defined enhancers overlapped predicted ENCODE ‘strong enhancers’ or ‘TSS’ states (25% and 62%, respectively, for HeLa-S3), but there was no substantial difference in validation rates between these classes (Supplementary Fig. 10c, d). In summary, active CAGE-defined enhancers were much more likely to be validated in functional assays than untranscribed candidate enhancers defined by histone modifications or DHSs.
Enhancer TSSs share regulatory features with mRNA TSSs but produce short, exosome-sensitive RNAs
RNA-seq data from matching primary cells and tissues showed that ~95% of RNAs originating from enhancers were unspliced and typically short (median 346 nt), a striking difference from mRNAs (19% unspliced, median 56 nt) (Fig. 2a, and Supplementary Fig. 11a-c). Unlike TSSs of mRNAs, which are enriched for predicted 5’ splice sites but depleted of downstream polyadenylation (pA) signals (refs. 11,12), enhancers showed no evidence of associated downstream RNA processing motifs, and thus resemble antisense PROMoter uPstream Transcripts (PROMPTs; ref. 11) (Fig. 2b, and Supplementary Fig. 11d). Most CAGE-defined enhancers gave rise to nuclear (>80%) and non-polyadenylated (~90%) RNAs (ref. 13) (Supplementary Fig. 11e). Based on RNA-seq, few enhancer RNAs overlap exons of known protein-coding genes or lincRNAs (9 and 1 out of 4208 enhancers detected, respectively), suggesting that they are not a substantial source of alternative promoters for known genes (as in ref. 14).
TSS-associated, uncapped small RNAs (TSSa-RNAs), attributed to RNAPII protection and found immediately downstream of mRNA TSSs (refs. 15,16), were detectable in the same positions downstream of enhancer TSSs (Supplementary Fig. 12), indicating that RNAPII initiation at enhancer and mRNA TSSs is similar. Indeed, CAGE-defined enhancer TSSs resembled the proximal position-specific sequence patterns of non-CGI RefSeq TSSs (Fig. 2c, and Supplementary Fig. 13a). Furthermore, de novo motif analysis revealed sequence signatures in CAGE-defined enhancers closely resembling non-CGI promoters (Fig. 2d, and Supplementary Fig. 13b).
Because of the similarity with PROMPTs, we reasoned that capped enhancer RNAs might be rapidly degraded by the exosome. Indeed, siRNA-mediated depletion of the hMTR4 (SKIV2L2) co-factor of the exosome complex resulted in a median 3.14-fold increase of capped enhancer-RNA abundance (Fig. 2e, and Supplementary Fig. 14a, b), but only a negligible increase at mRNA TSSs. This increasing trend is similar to that of PROMPT regions upstream of TSSs, although the increase of enhancer RNAs was significantly higher (P<4.6e-67, Mann-Whitney U test, Fig. 2e, and Supplementary Fig. 14b, c). Thus, the bidirectional transcriptional activity observed at enhancers is also present at promoters, as suggested previously (ref. 17), but in promoters only the antisense RNA is degraded. Furthermore, the CAGE expression of enhancers in control and hMTR4-depleted cells was proportional (Supplementary Fig. 14d), suggesting that virtually all identified enhancers produce exosome-sensitive RNAs. The number of detectable bidirectional CAGE peaks increased 1.7-fold upon hMTR4 depletion and novel enhancer candidates had on average similar, but weaker, chromatin modification signals compared to control HeLa cells (Supplementary Fig. 14e).
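
A median fold change such as the 3.14-fold figure reported for hMTR4 depletion is the median of per-enhancer ratios between knockdown and control expression. The sketch below shows that calculation on made-up numbers; the function name `median_fold_change`, the pseudocount of 1 used to avoid division by zero, and the input values are assumptions, and the normalization applied in the study is not described in this section.

```python
from statistics import median


def median_fold_change(control, knockdown, pseudocount=1.0):
    """Median of per-enhancer (knockdown / control) expression ratios.

    control and knockdown are equal-length lists of normalized CAGE
    expression values, one entry per enhancer. The pseudocount is a
    hypothetical choice to keep enhancers with zero control signal.
    """
    if len(control) != len(knockdown):
        raise ValueError("expression lists must have the same length")
    ratios = [(k + pseudocount) / (c + pseudocount)
              for c, k in zip(control, knockdown)]
    return median(ratios)


# Hypothetical expression values for four enhancers.
control_expr = [1, 3, 0, 7]
knockdown_expr = [5, 11, 3, 15]
fold = median_fold_change(control_expr, knockdown_expr)
```

Using the median rather than the mean keeps a few very strongly stabilized enhancers from dominating the summary.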

Citations
More filters
Journal ArticleDOI
TL;DR: A new method is introduced, stratified LD score regression, for partitioning heritability from GWAS summary statistics while accounting for linked markers, which is computationally tractable at very large sample sizes and leverages genome-wide information.
Abstract: Recent work has demonstrated that some functional categories of the genome contribute disproportionately to the heritability of complex diseases. Here we analyze a broad set of functional elements, including cell type-specific elements, to estimate their polygenic contributions to heritability in genome-wide association studies (GWAS) of 17 complex diseases and traits with an average sample size of 73,599. To enable this analysis, we introduce a new method, stratified LD score regression, for partitioning heritability from GWAS summary statistics while accounting for linked markers. This new method is computationally tractable at very large sample sizes and leverages genome-wide information. Our findings include a large enrichment of heritability in conserved regions across many traits, a very large immunological disease-specific enrichment of heritability in FANTOM5 enhancers and many cell type-specific enrichments, including significant enrichment of central nervous system cell types in the heritability of body mass index, age at menarche, educational attainment and smoking behavior.

1,939 citations


Cites background from "An atlas of active enhancers across..."

  • ...The 24 main annotations include: coding, UTR, promoter, and intron [14, 17]; histone marks H3K4me1, H3K4me3, H3K9ac [3–5] and two versions of H3K27ac [18, 19]; open chromatin reflected by DNase I hypersensitivity Site (DHS) regions [5, 14]; combined chromHMM/Segway predictions [20], which make use of many ENCODE annotations to produce a single partition of the genome into seven underlying “chromatin states”; regions that are conserved in mammals [21, 22]; superenhancers, which are large clusters of highly active enhancers [19]; and enhancers with balanced bidirectional capped transcripts identified using cap analysis of gene expression in the FANTOM5 panel of samples, which we call FANTOM5 enhancers [23]....

    [...]

  • ...Second, FANTOM5 Enhancers [23] were extremely enriched in the three immunological diseases, with 0....

    [...]

  • ...Second, FANTOM5 Enhancers [23] were extremely enriched in the three immunological diseases, with 0.4% of SNPs explaining an estimated 15% of SNP-heritability on average across these three diseases (P = 10−4, 2×10−4, and 0.03 for Crohn’s disease, Ulcerative Colitis, and Rheumatoid arthritis, respectively), but showed no evidence of enrichment for non-immunological traits (Figure 5)....

    [...]

Journal ArticleDOI
27 Mar 2014-Nature
TL;DR: For example, the authors mapped transcription start sites (TSSs) and their usage in human and mouse primary cells, cell lines and tissues to produce a comprehensive overview of mammalian gene expression across the human body.
Abstract: Regulated transcription controls the diversity, developmental pathways and spatial organization of the hundreds of cell types that make up a mammal Using single-molecule cDNA sequencing, we mapped transcription start sites (TSSs) and their usage in human and mouse primary cells, cell lines and tissues to produce a comprehensive overview of mammalian gene expression across the human body We find that few genes are truly 'housekeeping', whereas many mammalian promoters are composite entities composed of several closely separated TSSs, with independent cell-type-specific expression profiles TSSs specific to different cell types evolve at different rates, whereas promoters of broadly expressed genes are the most conserved Promoter-based expression analysis reveals key transcription factors defining cell states and links them to binding-site motifs The functions of identified novel transcripts can be predicted by coexpression and sample ontology enrichment analyses The functional annotation of the mammalian genome 5 (FANTOM5) project provides comprehensive expression profiles and functional annotation of mammalian cell-type-specific transcriptomes with wide applications in biomedical research

1,715 citations

Journal ArticleDOI
TL;DR: A review of the mechanisms of lncRNA biogenesis, localization and functions in transcriptional, post-transcriptional and other modes of gene regulation, and their potential therapeutic applications is presented in this article.
Abstract: Evidence accumulated over the past decade shows that long non-coding RNAs (lncRNAs) are widely expressed and have key roles in gene regulation. Recent studies have begun to unravel how the biogenesis of lncRNAs is distinct from that of mRNAs and is linked with their specific subcellular localizations and functions. Depending on their localization and their specific interactions with DNA, RNA and proteins, lncRNAs can modulate chromatin function, regulate the assembly and function of membraneless nuclear bodies, alter the stability and translation of cytoplasmic mRNAs and interfere with signalling pathways. Many of these functions ultimately affect gene expression in diverse biological and physiopathological contexts, such as in neuronal disorders, immune responses and cancer. Tissue-specific and condition-specific expression patterns suggest that lncRNAs are potential biomarkers and provide a rationale to target them clinically. In this Review, we discuss the mechanisms of lncRNA biogenesis, localization and functions in transcriptional, post-transcriptional and other modes of gene regulation, and their potential therapeutic applications.

1,630 citations

Journal ArticleDOI
TL;DR: A genetic meta-analysis of depression found 269 associated genes that highlight several potential drug repositioning opportunities, and relationships with depression were found for neuroticism and smoking.
Abstract: Major depression is a debilitating psychiatric illness that is typically associated with low mood and anhedonia. Depression has a heritable component that has remained difficult to elucidate with current sample sizes due to the polygenic nature of the disorder. To maximize sample size, we meta-analyzed data on 807,553 individuals (246,363 cases and 561,190 controls) from the three largest genome-wide association studies of depression. We identified 102 independent variants, 269 genes, and 15 genesets associated with depression, including both genes and gene pathways associated with synaptic structure and neurotransmission. An enrichment analysis provided further evidence of the importance of prefrontal brain regions. In an independent replication sample of 1,306,354 individuals (414,055 cases and 892,299 controls), 87 of the 102 associated variants were significant after multiple testing correction. These findings advance our understanding of the complex genetic architecture of depression and provide several future avenues for understanding etiology and developing new treatment approaches.

1,312 citations

Journal ArticleDOI
02 Nov 2017-Nature
TL;DR: A genome-wide association study of breast cancer in 122,977 cases and 105,974 controls of European ancestry and 14,068 cases and 13,104 controls of East Asian ancestry finds that heritability of breast cancer due to all single-nucleotide polymorphisms in regulatory features was 2–5-fold enriched relative to the genome-wide average.
Abstract: Breast cancer risk is influenced by rare coding variants in susceptibility genes, such as BRCA1, and many common, mostly non-coding variants. However, much of the genetic contribution to breast cancer risk remains unknown. Here we report the results of a genome-wide association study of breast cancer in 122,977 cases and 105,974 controls of European ancestry and 14,068 cases and 13,104 controls of East Asian ancestry. We identified 65 new loci that are associated with overall breast cancer risk at P < 5 × 10⁻⁸. The majority of credible risk single-nucleotide polymorphisms in these loci fall in distal regulatory elements, and by integrating in silico data to predict target genes in breast cells at each locus, we demonstrate a strong overlap between candidate target genes and somatic driver genes in breast tumours. We also find that heritability of breast cancer due to all single-nucleotide polymorphisms in regulatory features was 2–5-fold enriched relative to the genome-wide average, with strong enrichment for particular transcription factor binding sites. These results provide further insight into genetic susceptibility to breast cancer and will improve the use of genetic risk scores for individualized screening and prevention.

1,014 citations

References
Journal ArticleDOI
TL;DR: By following this protocol, investigators are able to gain an in-depth understanding of the biological themes in lists of genes that are enriched in genome-scale studies.
Abstract: DAVID bioinformatics resources consists of an integrated biological knowledgebase and analytic tools aimed at systematically extracting biological meaning from large gene/protein lists. This protocol explains how to use DAVID, a high-throughput and integrated data-mining environment, to analyze gene lists derived from high-throughput genomic experiments. The procedure first requires uploading a gene list containing any number of common gene identifiers followed by analysis using one or more text and pathway-mining tools such as gene functional classification, functional annotation chart or clustering and functional annotation table. By following this protocol, investigators are able to gain an in-depth understanding of the biological themes in lists of genes that are enriched in genome-scale studies.

31,015 citations

Journal ArticleDOI
TL;DR: edgeR is a Bioconductor software package for examining differential expression of replicated count data; it uses an overdispersed Poisson model to account for both biological and technical variability, and empirical Bayes methods to moderate the degree of overdispersion across transcripts, improving the reliability of inference.
Abstract: Summary: It is expected that emerging digital gene expression (DGE) technologies will overtake microarray technologies in the near future for many functional genomics applications. One of the fundamental data analysis tasks, especially for gene expression studies, involves determining whether there is evidence that counts for a transcript or exon are significantly different across experimental conditions. edgeR is a Bioconductor software package for examining differential expression of replicated count data. An overdispersed Poisson model is used to account for both biological and technical variability. Empirical Bayes methods are used to moderate the degree of overdispersion across transcripts, improving the reliability of inference. The methodology can be used even with the most minimal levels of replication, provided at least one phenotype or experimental condition is replicated. The software may have other applications beyond sequencing data, such as proteome peptide count data. Availability: The package is freely available under the LGPL licence from the Bioconductor web site (http://bioconductor.org).

29,413 citations


"An atlas of active enhancers across..." refers to methods in this paper

  • ...These were grouped according to Hela-S3 expression tertiles: low (36), mid-level (41) and strong (46)....


Journal ArticleDOI
TL;DR: Bowtie extends previous Burrows-Wheeler techniques with a novel quality-aware backtracking algorithm that permits mismatches; multiple processor cores can be used simultaneously to achieve even greater alignment speeds.
Abstract: Bowtie is an ultrafast, memory-efficient alignment program for aligning short DNA sequence reads to large genomes. For the human genome, Burrows-Wheeler indexing allows Bowtie to align more than 25 million reads per CPU hour with a memory footprint of approximately 1.3 gigabytes. Bowtie extends previous Burrows-Wheeler techniques with a novel quality-aware backtracking algorithm that permits mismatches. Multiple processor cores can be used simultaneously to achieve even greater alignment speeds. Bowtie is open source (http://bowtie.cbcb.umd.edu).

20,335 citations

Journal ArticleDOI
TL;DR: A new software suite for the comparison, manipulation and annotation of genomic features in Browser Extensible Data (BED) and General Feature Format (GFF) format, which allows the user to compare large datasets (e.g. next-generation sequencing data) with both public and custom genome annotation tracks.
Abstract: Motivation: Testing for correlations between different sets of genomic features is a fundamental task in genomics research. However, searching for overlaps between features with existing web-based methods is complicated by the massive datasets that are routinely produced with current sequencing technologies. Fast and flexible tools are therefore required to ask complex questions of these data in an efficient manner. Results: This article introduces a new software suite for the comparison, manipulation and annotation of genomic features in Browser Extensible Data (BED) and General Feature Format (GFF) format. BEDTools also supports the comparison of sequence alignments in BAM format to both BED and GFF features. The tools are extremely efficient and allow the user to compare large datasets (e.g. next-generation sequencing data) with both public and custom genome annotation tracks. BEDTools can be combined with one another as well as with standard UNIX commands, thus facilitating routine genomics tasks as well as pipelines that can quickly answer intricate questions of large genomic datasets. Availability and implementation: BEDTools was written in C++. Source code and a comprehensive user manual are freely available at http://code.google.com/p/bedtools

18,858 citations

Journal ArticleDOI
TL;DR: In comparative timings, the new algorithms are considerably faster than competing methods and can handle large problems and can also deal efficiently with sparse features.
Abstract: We develop fast algorithms for estimation of generalized linear models with convex penalties. The models include linear regression, two-class logistic regression, and multinomial regression problems while the penalties include ℓ1 (the lasso), ℓ2 (ridge regression) and mixtures of the two (the elastic net). The algorithms use cyclical coordinate descent, computed along a regularization path. The methods can handle large problems and can also deal efficiently with sparse features. In comparative timings we find that the new algorithms are considerably faster than competing methods.

13,656 citations

Related Papers (5)
19 Feb 2015-Nature
Anshul Kundaje, Wouter Meuleman, Jason Ernst, Misha Bilenky, Angela Yen, Alireza Heravi-Moussavi, Pouya Kheradpour, Zhizhuo Zhang, Jianrong Wang, Michael J. Ziller, Viren Amin, John W. Whitaker, Matthew D. Schultz, Lucas D. Ward, Abhishek Sarkar, Gerald Quon, Richard Sandstrom, Matthew L. Eaton, Yi-Chieh Wu, Andreas R. Pfenning, Xinchen Wang, Melina Claussnitzer, Yaping Liu, Cristian Coarfa, R. Alan Harris, Noam Shoresh, Charles B. Epstein, Elizabeta Gjoneska, Danny Leung, Wei Xie, R. David Hawkins, Ryan Lister, Chibo Hong, Philippe Gascard, Andrew J. Mungall, Richard A. Moore, Eric Chuah, Angela Tam, Theresa K. Canfield, R. Scott Hansen, Rajinder Kaul, Peter J. Sabo, Mukul S. Bansal, Annaick Carles, Jesse R. Dixon, Kai How Farh, Soheil Feizi, Rosa Karlic, Ah Ram Kim, Ashwinikumar Kulkarni, Daofeng Li, Rebecca F. Lowdon, Ginell Elliott, Tim R. Mercer, Shane Neph, Vitor Onuchic, Paz Polak, Nisha Rajagopal, Pradipta R. Ray, Richard C Sallari, Kyle Siebenthall, Nicholas A Sinnott-Armstrong, Michael Stevens, Robert E. Thurman, Jie Wu, Bo Zhang, Xin Zhou, Arthur E. Beaudet, Laurie A. Boyer, Philip L. De Jager, Peggy J. Farnham, Susan J. Fisher, David Haussler, Steven J.M. Jones, Wei Li, Marco A. Marra, Michael T. McManus, Shamil R. Sunyaev, James A. Thomson, Thea D. Tlsty, Li-Huei Tsai, Wei Wang, Robert A. Waterland, Michael Q. Zhang, Lisa Helbling Chadwick, Bradley E. Bernstein, Joseph F. Costello, Joseph R. Ecker, Martin Hirst, Alexander Meissner, Aleksandar Milosavljevic, Bing Ren, John A. Stamatoyannopoulos, Ting Wang, Manolis Kellis
Frequently Asked Questions (18)
Q1. What contributions have the authors mentioned in the paper "An atlas of active enhancers across human cell types and tissues" ?

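In this paper, the authors use the FANTOM5 panel of samples, covering the majority of human tissues and cell types, to produce an atlas of active, in vivo-transcribed enhancers. They show that enhancers share properties with CpG-poor messenger RNA promoters but produce bidirectional, exosome-sensitive, relatively short unspliced RNAs, the generation of which is strongly related to enhancer activity. The atlas is used to compare regulatory programs between different cells, to identify disease-associated regulatory single nucleotide polymorphisms, and to classify cell-type-specific and ubiquitous enhancers.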

Features distinguishing enhancers from mRNA promoters are: i) enhancer RNAs are exosome-sensitive regardless of direction while (sense) mRNAs have a longer half-life than their antisense counterpart; ii) enhancer RNAs are short, unspliced, nuclear and non-polyadenylated and iii) enhancers have downstream pA and 5’ splice motif frequencies at genomic background level similar to antisense PROMPTs, while mRNAs are depleted of termination signals and enriched for 5’ splice sites 11,12. 

To confirm that candidate enhancers can drive tissue-specific gene expression in vivo, five evolutionarily conserved CAGE-defined human enhancers (including the POU3F2 and MEF2C-proximal enhancers identified above) were tested via Tol2-mediated transgenesis in zebrafish embryos. 

The numbers of robustly expressed enhancers and genes per sample were normalized to enhancers and genes per million mapped tags, using the total number of mapped CAGE tags in each sample, and then log-transformed. 
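The per-sample normalization described above can be sketched in a few lines of Python. This is only an illustration: the function name `per_million_log` is hypothetical, and the logarithm base (10 by default here) is an assumption, since the text says only that the values were log-transformed.

```python
import math


def per_million_log(feature_count: int, mapped_tags: int, base: float = 10.0) -> float:
    """Scale a per-sample feature count to 'per million mapped tags', then log-transform.

    feature_count: number of robustly expressed enhancers (or genes) in the sample.
    mapped_tags:   total number of mapped CAGE tags in the same sample.
    base:          logarithm base (an assumption; the source does not state it).
    """
    if feature_count <= 0 or mapped_tags <= 0:
        raise ValueError("counts must be positive for a log transform")
    per_million = feature_count / (mapped_tags / 1_000_000)
    return math.log(per_million) / math.log(base)


# Example: 2,000 expressed enhancers in a library of 4 million mapped tags
# corresponds to 500 enhancers per million tags before the log transform.
value = per_million_log(2000, 4_000_000)
```

Dividing by library size first makes samples of different sequencing depth comparable; the log transform then compresses the range for plotting.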

Amplification of target regions was followed by SAP treatment, reverse transcription and subsequent RNA base-specific cleavage (MassCLEAVE, San Diego, CA) as previously described58. 

The expression values of both flanking windows were normalized by converting tag counts to tags per million mapped reads (TPM) and further normalization between samples was done using the RLE normalization procedure in edgeR41. 
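As a rough illustration of the two steps named above, the sketch below converts a count matrix to TPM and computes median-of-ratios scaling factors in the spirit of RLE normalization. It is a simplified NumPy approximation under stated assumptions, not edgeR's own code: the function names are hypothetical, and edgeR applies further rescaling of the factors that is omitted here.

```python
import numpy as np


def to_tpm(counts):
    """Convert a features x samples count matrix to tags per million (per column)."""
    counts = np.asarray(counts, dtype=float)
    return counts / counts.sum(axis=0, keepdims=True) * 1e6


def rle_scaling_factors(counts):
    """Median-of-ratios scaling factors, one per sample (simplified RLE idea).

    Each sample is compared with a pseudo-reference built from the geometric
    mean of every feature across samples; the factor is the median ratio.
    Features with a zero count in any sample are skipped (an assumption made
    here to keep the logarithms finite).
    """
    counts = np.asarray(counts, dtype=float)
    keep = (counts > 0).all(axis=1)
    if not keep.any():
        raise ValueError("no feature is expressed in every sample")
    log_counts = np.log(counts[keep])
    log_geo_mean = log_counts.mean(axis=1, keepdims=True)
    return np.exp(np.median(log_counts - log_geo_mean, axis=0))


# Toy matrix: the second sample is sequenced twice as deeply as the first.
toy = np.array([[10, 20], [20, 40], [30, 60]])
tpm = to_tpm(toy)
factors = rle_scaling_factors(toy)
```

In the toy matrix the two columns have identical TPM values, and the ratio of the two scaling factors recovers the twofold depth difference.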

Genome browser tracks for enhancers with user-definable expression specificity-constraints can be generated at http://enhancer.binf.ku.dk. 

Examples include Graves’ disease-associated SNPs enriched in enhancers that are expressed predominantly in thyroid tissue, and similarly lymphocytes for chronic lymphocytic leukemia. 

Positional cross correlations were calculated between reverse and forward CAGE tag 5’ ends at ChIP-seq derived active HeLa-S3 and GM12878 enhancer center positions (as determined by P300 peaks) +/−300 bp (max lag 300) to identify their most likely separation. 
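The idea of finding the most likely separation between reverse- and forward-strand tag 5' ends can be sketched as a lagged Pearson correlation over a window around the enhancer centre. This is a minimal sketch, not the authors' implementation: the function name, the use of Pearson correlation per lag, and the single-spike toy profiles are all assumptions for illustration.

```python
import numpy as np


def best_strand_lag(forward, reverse, max_lag=300):
    """Return (lag, correlation) maximizing the correlation between strands.

    forward, reverse: per-position tag 5'-end counts over the same window.
    A positive lag means the forward-strand signal lies `lag` bp downstream
    of the reverse-strand signal.
    """
    forward = np.asarray(forward, dtype=float)
    reverse = np.asarray(reverse, dtype=float)
    if forward.shape != reverse.shape:
        raise ValueError("profiles must cover the same window")
    n = forward.size
    best_lag, best_corr = None, -np.inf
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            f, r = forward[lag:], reverse[:n - lag]
        else:
            f, r = forward[:n + lag], reverse[-lag:]
        # Skip lags where either slice is too short or constant (correlation undefined).
        if f.size < 2 or f.std() == 0 or r.std() == 0:
            continue
        corr = np.corrcoef(f, r)[0, 1]
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return best_lag, best_corr


# Toy window of 601 positions (centre +/- 300 bp): reverse-strand tags pile up
# at index 200 and forward-strand tags at index 380, i.e. 180 bp apart.
window = 601
rev = np.zeros(window)
fwd = np.zeros(window)
rev[200] = 7
fwd[380] = 5
lag, corr = best_strand_lag(fwd, rev, max_lag=300)
```

On real data the profiles would be aggregated over many enhancer centres, and the peak of the correlation curve would be read off rather than a single exact match.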

The resulting sub-clusters split enhancers into non-ubiquitous enhancers and ubiquitous enhancers (u-enhancers): 201 and 247 u-enhancers were defined by cell type and tissue facets, respectively (these two sets share 106 enhancers). 

In order to visualize the complexity and specialization of facets according to usage and specificity score of enhancers and genes, the authors counted the frequency of facet-used enhancers (significantly expressed in at least one contained sample) and gene promoters (≥ 1 TPM in at least one sample) with a specificity score in any of 20 bins distributed between 0 and 1. 
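The binning step above amounts to a 20-bin histogram over the interval from 0 to 1. A minimal sketch follows, assuming equal-width bins and reporting the fraction of features per bin; the function name is hypothetical and the example scores are made up purely for illustration.

```python
import numpy as np


def specificity_bin_frequencies(scores, n_bins=20):
    """Fraction of features whose specificity score falls in each of n_bins
    equal-width bins spanning [0, 1] (the last bin includes 1.0)."""
    scores = np.asarray(scores, dtype=float)
    if scores.size == 0:
        return np.zeros(n_bins)
    if scores.min() < 0 or scores.max() > 1:
        raise ValueError("specificity scores are expected to lie in [0, 1]")
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    counts, _ = np.histogram(scores, bins=edges)
    return counts / counts.sum()


# Illustrative (made-up) scores for the enhancers used in one facet.
example_scores = [0.0, 0.02, 0.52, 0.99, 1.0]
freqs = specificity_bin_frequencies(example_scores)
```

Computing one such frequency vector per facet, separately for enhancers and gene promoters, gives rows that can be compared directly across facets.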

For many traits where enriched disease-associated SNPs were within enhancers, enhancer activity was detected in pathologically relevant cell types (Fig. 6d, and Supplementary Figs 31 and 32). 

An enhancer was considered differentially expressed in a facet if it showed at least one significant pair-wise differential expression and overall positive standard linear statistics. 

These were further filtered to not overlap with their set of 43,011 predicted enhancers, which yielded 98,942 random genomic regions whose expression levels were quantified and normalized in the same manner as described for bidirectional loci (above). 
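The overlap filter described above can be illustrated with a small pure-Python sketch. It assumes BED-style half-open coordinates (chromosome, start, end) and uses a simple pairwise check per chromosome; the function names are hypothetical, and for sets of this size a dedicated interval tool or an indexed search would be the practical choice.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Region = Tuple[str, int, int]  # (chromosome, start, end), half-open like BED


def intervals_overlap(a_start: int, a_end: int, b_start: int, b_end: int) -> bool:
    """True if two half-open intervals share at least one base."""
    return a_start < b_end and b_start < a_end


def drop_overlapping(candidates: List[Region], enhancers: List[Region]) -> List[Region]:
    """Keep only candidate regions that overlap no predicted enhancer."""
    by_chrom: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for chrom, start, end in enhancers:
        by_chrom[chrom].append((start, end))
    kept = []
    for chrom, start, end in candidates:
        hits = by_chrom.get(chrom, [])
        if not any(intervals_overlap(start, end, s, e) for s, e in hits):
            kept.append((chrom, start, end))
    return kept


# Toy example: one predicted enhancer and three random candidate regions.
predicted = [("chr1", 100, 200)]
random_regions = [("chr1", 150, 250), ("chr1", 200, 300), ("chr2", 150, 250)]
background = drop_overlapping(random_regions, predicted)
```

Here the first candidate is dropped because it overlaps the enhancer, while the adjacent region starting at 200 and the region on another chromosome are kept.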

To summarize the features of u-enhancers, in terms of expression width and variance, in a single plot, the authors used the enhancers falling into the u-enhancer group from the tissue clustering. 

Examples where facets use a higher fraction of specific enhancers include immune cells, neurons, neural stem cells and hepatocytes amongst the cell type facets, and brain, blood, liver and testis amongst the organ/tissue facets. 

Reporter activity of ENCODE enhancers in relation to transcriptional status. The authors used published8 results on a massively parallel reporter assay measuring the activity of ENCODE-predicted enhancers in HepG2 and K562 cells. 

Motif enrichment was analyzed using HOMER36 version 3, a suite of tools for motif discovery and next-generation sequencing analysis (http://biowhat.ucsd.edu/homer/).