An atlas of active enhancers across human cell types and tissues
Summary (3 min read)
INTRODUCTION
- Precise regulation of gene expression in time and space is required for development, differentiation and homeostasis in higher organisms 1.
- Enhancers were originally defined as remote elements that increase transcription independent of their orientation, position and distance to a promoter 3.
- They were only recently found to initiate RNA polymerase II transcription, producing so-called eRNAs 4.
- Genomic locations of enhancers used by cells can be detected by mapping of chromatin marks and transcription factor binding sites from chromatin immunoprecipitation (ChIP) assays and DNase I hypersensitive sites (DHSs) (reviewed in ref. 1), but there has been no systematic analysis of enhancer usage in the large variety of cell types and tissues present in the human body.
- Using Cap Analysis of Gene Expression 5 (CAGE), the authors show that enhancer activity can be detected through the presence of balanced bidirectional capped transcripts, enabling the identification of enhancers from small primary cell populations.
Bidirectional pairs of capped RNAs identify active enhancers
- The FANTOM5 project has generated a CAGE-based transcription start site (TSS) atlas across a broad panel of primary cells, tissues, and cell lines covering the vast majority of human cell types 6.
- Similar patterns were observed in other cell lines (Supplementary Fig. 2a).
- While capped RNAs of protein-coding gene promoters were strongly biased towards the sense direction, similar levels of capped RNA in both directions were detected at enhancers (Fig. 1b, and Supplementary Fig. 2b, c).
- Interestingly, the candidates were depleted of CpG islands (CGI) and repeats (with the exception of neural stem cells, see ref. 9).
- While 67.4-73.9% of the CAGE-defined enhancers showed significant reporter activity, only 20-33.3% of the untranscribed candidate regulatory regions were active (Fig. 1c, and Supplementary Fig. 9a).
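The contrast between sense-biased promoters and balanced bidirectional enhancers can be illustrated with a simple directionality score. The sketch below is a minimal illustration, not the authors' pipeline: the function names, the tag counts and the `max_abs_score` cutoff are all assumptions chosen for the example.

```python
def directionality(forward_tags: float, reverse_tags: float) -> float:
    """Return (F - R) / (F + R): +1 is forward-only, -1 reverse-only, 0 balanced."""
    total = forward_tags + reverse_tags
    if total == 0:
        raise ValueError("no capped RNA signal at this locus")
    return (forward_tags - reverse_tags) / total


def is_balanced(forward_tags: float, reverse_tags: float,
                max_abs_score: float = 0.8) -> bool:
    """Call a locus 'balanced' when |score| is below an illustrative cutoff (assumed)."""
    return abs(directionality(forward_tags, reverse_tags)) < max_abs_score


# Hypothetical CAGE tag counts (forward strand, reverse strand) at two loci.
loci = {"promoter_like": (95, 5), "enhancer_like": (48, 52)}
for name, (fwd, rev) in loci.items():
    print(name, round(directionality(fwd, rev), 2), is_balanced(fwd, rev))
```

With these made-up counts, the promoter-like locus is strongly biased to one strand, while the enhancer-like locus scores close to zero.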
CAGE expression identifies cell-specific enhancer usage
- To test whether CAGE expression can identify cell type-specific enhancer usage in vivo, ChIP-seq (H3K27ac and H3K4me1), DNA methylation and triplicate CAGE analyses were performed in five primary blood cell types, and compared to published DHS data (www.roadmapepigenomics.org, Supplementary Table 4).
- CAGE-defined enhancers were strongly supported by proximal H3K4me1/H3K27ac peaks (71%) and DHSs (87%) from the same cell type.
- Moreover, there was a clear correlation between CAGE, DNase I hypersensitivity, H3K4me1 and H3K27ac for CAGE-defined enhancers expressed in blood cells (Fig. 3a).
- Accordingly, cell type-specific enhancer expression corresponds to cell type-specific histone modifications (Fig. 3b).
- Thus, bidirectional CAGE pairs are robust predictors for cell type-specific enhancer activity.
An atlas of transcribed enhancers over human cells and tissues
- The FANTOM5 CAGE library collection 6 enables the dissection of enhancer usage across cell types and tissues comprehensively sampled across the human body.
- The results corroborate the functional relevance of these enhancers for tissue-specific gene expression and suggest that they are an important part of the regulatory programs of cellular differentiation and organogenesis.
- The authors grouped the primary cell and tissue samples into larger, mutually exclusive cell type and organ/tissue groups, respectively, with similar function or morphology (Supplementary Tables 10 and 11).
- From the data the authors can draw several conclusions:
- Facets in which the authors detect many enhancers typically also have a higher fraction of facet-specific enhancers (Supplementary Fig. 22c, d).
Expression clustering reveals ubiquitous enhancers
- Compared to other enhancers, the ubiquitous (u-) enhancers are 8 times more likely to overlap CGIs and they are twice as conserved (Supplementary Fig. 26a-c).
- U-enhancers overlap typical chromatin enhancer marks but have higher H3K4me3 signal (Supplementary Fig. 26d).
- P<1.5e-8, Mann-Whitney U test), the transcripts remain predominantly (~78%) unspliced and significantly shorter (P<4.2e-18, Mann-Whitney U test) than mRNAs (Supplementary Fig. 27-28), do not share exons with known genes, and are exosome-sensitive (Supplementary Fig. 14b).
- Therefore, it is unlikely that these are novel mRNA promoters.
- They are also highly enriched for P300 and cohesin ChIP-seq peaks 20 and RNAPII-mediated ChIA-PET signal 21 compared to other enhancers (Supplementary Fig. 26d).
Linking enhancer usage with TSS expression
- Uniquely, FANTOM5 CAGE allows for direct comparison between transcriptional activity of the enhancer and of putative target gene TSSs across a diverse set of human cells.
- Based on pair-wise expression correlation, nearly half (40%) of the inferred TSS-associated enhancers were linked with the nearest TSS, and 64% of enhancers have at least one correlated TSS within 500kb.
- Several (10,260, 15.3%) associations are supported by ChIA-PET interaction data 21, and the supported fraction increases with the correlation threshold (Supplementary Fig. 29a).
- The fraction of supported associations is 4.8-fold higher than that of associations predicted from DNase hypersensitivity correlations 10 (20.6% vs. 4.3%, at the same correlation threshold), indicating that transcription is a better predictor of regulatory targets than chromatin accessibility.
- One hypothesis explaining the function of multiple enhancers driving the same expression pattern is that they might confer higher transcriptional output of a gene 25, 26.
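The pair-wise expression correlation used to link enhancers with TSSs within 500kb can be sketched as follows. This is a rough illustration under stated assumptions: Pearson correlation is used as the correlation measure, the `min_r` threshold is hypothetical, and all coordinates and expression vectors are invented toy data rather than FANTOM5 values.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a constant profile carries no correlation signal
    return cov / sqrt(vx * vy)


def link_enhancers(enhancers, tsss, max_dist=500_000, min_r=0.7):
    """Pair each enhancer with same-chromosome TSSs within max_dist whose
    expression across samples correlates at or above min_r (assumed cutoff)."""
    links = []
    for e_name, (e_chrom, e_pos, e_expr) in enhancers.items():
        for t_name, (t_chrom, t_pos, t_expr) in tsss.items():
            if e_chrom != t_chrom or abs(e_pos - t_pos) > max_dist:
                continue
            r = pearson(e_expr, t_expr)
            if r >= min_r:
                links.append((e_name, t_name, round(r, 3)))
    return links


# Toy data: name -> (chromosome, position, expression across four samples).
enhancers = {"e1": ("chr1", 1_000_000, [1, 2, 3, 4])}
tsss = {
    "tA": ("chr1", 1_200_000, [2, 4, 6, 8]),   # nearby and co-expressed
    "tB": ("chr1", 1_100_000, [4, 3, 2, 1]),   # nearby but anti-correlated
    "tC": ("chr1", 2_000_000, [1, 2, 3, 4]),   # co-expressed but too far away
}
links = link_enhancers(enhancers, tsss)
print(links)
```

Only the TSS that is both within the distance window and positively correlated is linked; raising `min_r` trades the number of associations for stricter support.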
Disease-associated SNPs are enriched in enhancers
- Many disease-associated SNPs are located outside of protein-coding exons and a large proportion of human genes display expression polymorphism 28.
- Using the NHGRI GWAS catalog 29 and extending the compilation of lead SNPs with proxy SNPs in strong linkage disequilibrium (similar to refs. 30,31), the authors identified diseases/traits whose associated SNPs overlapped enhancers, promoters, exons and random regions significantly more than expected by chance (Fisher's Exact Test P<0.01, Supplementary Table 16).
- For many traits where enriched disease-associated SNPs were within enhancers, enhancer activity was detected in pathologically relevant cell types (Fig. 6d, and Supplementary Figs 31 and 32).
- Examples include Graves' disease-associated SNPs enriched in enhancers that are expressed predominantly in thyroid tissue, and similarly lymphocytes for chronic lymphocytic leukemia.
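The enrichment test mentioned above (Fisher's Exact Test on SNP overlap) can be illustrated with a small standard-library implementation. This is a sketch under assumptions: the one-sided "greater" alternative, the layout of the 2x2 table and the counts are all chosen for illustration and are not taken from the paper.

```python
from math import comb


def fisher_exact_greater(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Assumed layout: rows are trait-associated vs. other SNPs, columns are
    inside vs. outside enhancers. Returns P(X >= a) under the hypergeometric
    null, i.e. the chance of seeing at least `a` trait SNPs inside enhancers.
    """
    row1 = a + b          # trait-associated SNPs
    col1 = a + c          # SNPs inside enhancers
    n = a + b + c + d     # all SNPs considered
    upper = min(row1, col1)
    tail = sum(comb(col1, k) * comb(n - col1, row1 - k)
               for k in range(a, upper + 1))
    return tail / comb(n, row1)


# Hypothetical counts: 12 of 40 trait SNPs fall in enhancers,
# versus 50 of 960 background SNPs.
p_value = fisher_exact_greater(12, 28, 50, 910)
print(p_value < 0.01)
```

A trait would pass the stated cutoff when the returned value is below 0.01; in practice a statistics library would normally be used instead of a hand-rolled tail sum.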
CONCLUSIONS
- The data presented here demonstrate that bidirectional capped RNAs, as measured by CAGE, are robust predictors of enhancer activity in a cell.
- Transcription is only measured at a fraction of chromatin-defined enhancers and few untranscribed enhancers show potential enhancer activity.
- Of course, given the relative instability of enhancer RNAs some chromatin-defined sites may be active but fall below the limits of detection of CAGE.
- This view is not supported by the larger FANTOM5 dataset.
- It has clear applications in human genetics, to narrow the search windows for functional association, and for the definition of regulatory networks that underpin the processes of cellular differentiation and organogenesis in human development.
Citations
1,939 citations
Cites background from "An atlas of active enhancers across..."
...The 24 main annotations include: coding, UTR, promoter, and intron [14, 17]; histone marks H3K4me1, H3K4me3, H3K9ac [3–5] and two versions of H3K27ac [18, 19]; open chromatin reflected by DNase I hypersensitivity Site (DHS) regions [5, 14]; combined chromHMM/Segway predictions [20], which make use of many ENCODE annotations to produce a single partition of the genome into seven underlying “chromatin states”; regions that are conserved in mammals [21, 22]; superenhancers, which are large clusters of highly active enhancers [19]; and enhancers with balanced bidirectional capped transcripts identified using cap analysis of gene expression in the FANTOM5 panel of samples, which we call FANTOM5 enhancers [23]....
[...]
...Second, FANTOM5 Enhancers [23] were extremely enriched in the three immunological diseases, with 0.4% of SNPs explaining an estimated 15% of SNP-heritability on average across these three diseases (P = 10−4, 2×10−4, and 0.03 for Crohn’s disease, Ulcerative Colitis, and Rheumatoid arthritis, respectively), but showed no evidence of enrichment for non-immunological traits (Figure 5)....
References
29,413 citations
"An atlas of active enhancers across..." refers methods in this paper
...These were grouped according to Hela-S3 expression tertiles: low (36), mid-level (41) and strong (46)....
Frequently Asked Questions (18)
Q2. What are the features distinguishing enhancers from mRNA promoters?
Features distinguishing enhancers from mRNA promoters are: i) enhancer RNAs are exosome-sensitive regardless of direction while (sense) mRNAs have a longer half-life than their antisense counterpart; ii) enhancer RNAs are short, unspliced, nuclear and non-polyadenylated and iii) enhancers have downstream pA and 5’ splice motif frequencies at genomic background level similar to antisense PROMPTs, while mRNAs are depleted of termination signals and enriched for 5’ splice sites 11,12.
Q3. How did the authors test the enhancers in zebrafish embryos?
To confirm that candidate enhancers can drive tissue-specific gene expression in vivo, five evolutionarily conserved CAGE-defined human enhancers (including the POU3F2 and MEF2C-proximal enhancers identified above) were tested via Tol2-mediated transgenesis in zebrafish embryos.
Q4. How many enhancers were normalized to the number of tags in each sample?
The numbers of robustly expressed enhancers and genes per sample were normalized to enhancers and genes per million mapped tags, using the total number of mapped CAGE tags in each sample, and then log-transformed.
Q5. What was the first step in the amplification of target regions?
Amplification of target regions was followed by SAP treatment, reverse transcription and subsequent RNA base-specific cleavage (MassCLEAVE, San Diego, CA) as previously described 58.
Q6. How was the expression of the flanking windows normalized?
The expression values of both flanking windows were normalized by converting tag counts to tags per million mapped reads (TPM) and further normalization between samples was done using the RLE normalization procedure in edgeR 41.
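The first step, converting tag counts to tags per million mapped reads, is simple arithmetic and can be sketched as below. The function name and the counts are illustrative assumptions, and the subsequent RLE normalization in edgeR is not reproduced here.

```python
def to_tpm(tag_counts: dict, total_mapped_tags: int) -> dict:
    """Scale raw CAGE tag counts to tags per million mapped tags (TPM)."""
    if total_mapped_tags <= 0:
        raise ValueError("total_mapped_tags must be positive")
    scale = 1_000_000 / total_mapped_tags
    return {region: count * scale for region, count in tag_counts.items()}


# Hypothetical counts in the two flanking windows of one enhancer,
# from a library with 4 million mapped tags.
counts = {"left_window": 12, "right_window": 8}
tpm = to_tpm(counts, total_mapped_tags=4_000_000)
print(tpm)
```

Dividing by library size makes windows comparable across samples of different sequencing depth before any between-sample normalization is applied.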
Q7. Where can genome browser tracks for enhancers be found?
Genome browser tracks for enhancers with user-definable expression specificity-constraints can be generated at http://enhancer.binf.ku.dk.
Q8. What are some examples of SNPs that are enriched in enhancers?
Examples include Graves’ disease-associated SNPs enriched in enhancers that are expressed predominantly in thyroid tissue, and similarly lymphocytes for chronic lymphocytic leukemia.
Q9. How many peaks were found at the GM12878 enhancer center positions?
Positional cross correlations were calculated between reverse and forward CAGE tag 5’ ends at ChIP-seq derived active HeLa-S3 and GM12878 enhancer center positions (as determined by P300 peaks) +/−300 bp (max lag 300) to identify their most likely separation.
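A positional cross-correlation of this kind can be sketched in plain Python. The version below is an unnormalized cross-correlation over per-base tag counts in a single window; the function names, the toy window and the tag counts are assumptions for illustration, and the authors' exact implementation is not given in this summary.

```python
def cross_correlation(forward, reverse, max_lag):
    """Return {lag: sum_i reverse[i] * forward[i + lag]} for |lag| <= max_lag.

    `forward` and `reverse` are equal-length lists of CAGE tag 5' end counts
    per base across a window centred on an enhancer midpoint.
    """
    n = len(forward)
    result = {}
    for lag in range(-max_lag, max_lag + 1):
        total = 0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                total += reverse[i] * forward[j]
        result[lag] = total
    return result


def best_separation(forward, reverse, max_lag):
    """Lag with the highest cross-correlation, i.e. the most likely separation."""
    cc = cross_correlation(forward, reverse, max_lag)
    return max(cc, key=cc.get)


# Toy 21 bp window: reverse-strand tags pile up at index 5,
# forward-strand tags at index 15.
reverse = [0] * 21
forward = [0] * 21
reverse[5] = 7
forward[15] = 9
separation = best_separation(forward, reverse, max_lag=10)
print(separation)
```

For the analysis described above, the window would span +/-300 bp around each enhancer centre with `max_lag=300`.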
Q10. How many enhancers were grouped in the facets?
The resulting sub-clusters broke up enhancers into 201 and 247 ubiquitous enhancers (u-enhancers) defined by cell type and tissue facets, respectively, (these sets intersect by 106 enhancers) and non-ubiquitous enhancers.
Q11. How many facets were used to visualize the complexity and specialization of facets?
In order to visualize the complexity and specialization of facets according to usage and specificity score of enhancers and genes, the authors counted the frequency of facet-used enhancers (significantly expressed in at least one contained sample) and gene promoters (≥ 1 TPM in at least one sample) with a specificity score in any of 20 bins distributed between 0 and 1.
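The binning step can be sketched as a fixed-width histogram over the unit interval. The specificity scores below are invented, and how the score itself is computed is not covered by this sketch.

```python
def bin_scores(scores, n_bins=20):
    """Count scores in n_bins equal-width bins on [0, 1]; 1.0 goes in the last bin."""
    counts = [0] * n_bins
    for s in scores:
        if not 0.0 <= s <= 1.0:
            raise ValueError(f"score out of range: {s}")
        index = min(int(s * n_bins), n_bins - 1)
        counts[index] += 1
    return counts


# Hypothetical specificity scores of the enhancers used in one facet.
facet_scores = [0.02, 0.03, 0.51, 0.97, 1.0]
histogram = bin_scores(facet_scores)
print(histogram)
```

Running this per facet, once for enhancers and once for gene promoters, would give the frequency profiles that are compared in the visualization.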
Q12. How many enhancers were found in pathologically relevant cells?
For many traits where enriched disease-associated SNPs were within enhancers, enhancer activity was detected in pathologically relevant cell types (Fig. 6d, and Supplementary Figs 31 and 32).
Q13. How was the expression of each enhancer evaluated?
Each enhancer was considered differentially expressed in a facet with at least one pair-wise significant differential expression and overall positive standard linear statistics.
Q14. How many enhancers were filtered to not overlap with the predicted set?
These were further filtered to not overlap with their set of 43,011 predicted enhancers, which yielded 98,942 random genomic regions whose expression levels were quantified and normalized in the same manner as described for bidirectional loci (above).
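Filtering candidate regions so that they do not overlap a predicted enhancer set amounts to an interval-overlap check. The sketch below assumes half-open, BED-style `(chrom, start, end)` tuples and uses invented coordinates; it is a simple quadratic scan per chromosome, not the tool the authors used.

```python
def overlaps(a, b):
    """True if two half-open (chrom, start, end) intervals share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def filter_non_overlapping(candidates, enhancers):
    """Keep only candidate regions that overlap no enhancer."""
    by_chrom = {}
    for enhancer in enhancers:
        by_chrom.setdefault(enhancer[0], []).append(enhancer)
    return [
        region for region in candidates
        if not any(overlaps(region, e) for e in by_chrom.get(region[0], []))
    ]


# Toy coordinates (hypothetical).
predicted_enhancers = [("chr1", 1000, 1400), ("chr2", 500, 900)]
random_regions = [
    ("chr1", 1300, 1700),   # overlaps the chr1 enhancer, dropped
    ("chr1", 1400, 1800),   # only touches its end (half-open), kept
    ("chr2", 100, 300),     # no overlap, kept
    ("chr3", 500, 900),     # no enhancers on this chromosome, kept
]
background = filter_non_overlapping(random_regions, predicted_enhancers)
print(background)
```

For sets the size of those described above, a sorted or tree-based interval index would be a more efficient choice than this scan.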
Q15. What is the way to summarize the features of u-enhancers?
To summarize the features of u-enhancers in terms of expression width and variance, identified in a single plot, the authors used those enhancers falling into u-enhancer group from the tissue clustering.
Q16. What are the common enhancers found in facets?
Facets that use a higher fraction of specific enhancers include immune cells, neurons, neural stem cells and hepatocytes amongst the cell type facets, and brain, blood, liver and testis amongst the organ/tissue facets.
Q17. What was the effect of the reporter assay on transcriptional status?
To relate the reporter activity of ENCODE enhancers to their transcriptional status, the authors used published results 8 from a massively parallel reporter assay measuring the activity of ENCODE-predicted enhancers in HepG2 and K562 cells.
Q18. What tools were used to analyze the motifs?
Motif enrichment was analyzed using HOMER 36 version 3, a suite of tools for motif discovery and next-generation sequencing analysis (http://biowhat.ucsd.edu/homer/).