
A promoter-level mammalian expression atlas

Alistair R. R. Forrest, +280 more
27 Mar 2014
Vol. 507, Iss. 7493, pp. 462-470
Abstract
Regulated transcription controls the diversity, developmental pathways and spatial organization of the hundreds of cell types that make up a mammal. Using single-molecule cDNA sequencing, we mapped transcription start sites (TSSs) and their usage in human and mouse primary cells, cell lines and tissues to produce a comprehensive overview of mammalian gene expression across the human body. We find that few genes are truly 'housekeeping', whereas many mammalian promoters are composite entities composed of several closely separated TSSs, with independent cell-type-specific expression profiles. TSSs specific to different cell types evolve at different rates, whereas promoters of broadly expressed genes are the most conserved. Promoter-based expression analysis reveals key transcription factors defining cell states and links them to binding-site motifs. The functions of identified novel transcripts can be predicted by coexpression and sample ontology enrichment analyses. The functional annotation of the mammalian genome 5 (FANTOM5) project provides comprehensive expression profiles and functional annotation of mammalian cell-type-specific transcriptomes with wide applications in biomedical research.


Author
Forrest, Alistair RR, Kawaji, Hideya, Rehli, Michael, Baillie, J Kenneth, de Hoon, Michiel
JL, Haberle, Vanja, Lassmann, Timo, Kulakovskiy, Ivan V, Lizio, Marina, Itoh, Masayoshi,
Andersson, Robin, Mungall, Christopher J, Meehan, Terrence F, Schmeier, Sebastian, Bertin,
Nicolas, Jorgensen, Mette, Dimont, Emmanuel, Arner, Erik, Schmidl, Christian, Schaefer, Ulf,
Medvedeva, Yulia A, Plessy, Charles, Vitezic, Morana, Severin, Jessica, Semple, Colin A,
Ishizu, Yuri, Young, Robert S, Francescatto, Margherita, Alam, Intikhab, Albanese, Davide,
Altschuler, Gabriel M, Arakawa, Takahiro, Archer, John AC, Arner, Peter, Babina, Magda,
Rennie, Sarah, Balwierz, Piotr J, Beckhouse, Anthony G, Pradhan-Bhatt, Swati, Blake, Judith
A, Blumenthal, Antje, Bodega, Beatrice, Bonetti, Alessandro, Briggs, James, Brombacher,
Frank, Burroughs, A Maxwell, Califano, Andrea, Cannistraci, Carlo V, Carbajo, Daniel, Chen,
Yun, Chierici, Marco, Ciani, Yari, Clevers, Hans C, Dalla, Emiliano, Davis, Carrie A, Detmar,
Michael, Diehl, Alexander D, Dohi, Taeko, Drablos, Finn, Edge, Albert SB, Edinger, Matthias,
Ekwall, Karl, Endoh, Mitsuhiro, Enomoto, Hideki, Fagiolini, Michela, Fairbairn, Lynsey, Fang,
Hai, Farach-Carson, Mary C, Faulkner, Geoffrey J, Favorov, Alexander V, Fisher, Malcolm E,
Frith, Martin C, Fujita, Rie, Fukuda, Shiro, Furlanello, Cesare, Furuno, Masaaki, Furusawa, Jun-
ichi, Geijtenbeek, Teunis B, Gibson, Andrew P, Gingeras, Thomas, Goldowitz, Daniel, Gough,
Julian, Guhl, Sven, Guler, Reto, Gustincich, Stefano, Ha, Thomas J, Hamaguchi, Masahide,
Hara, Mitsuko, Harbers, Matthias, Harshbarger, Jayson, Hasegawa, Akira, Hasegawa, Yuki,
Hashimoto, Takehiro, Herlyn, Meenhard, Hitchens, Kelly J, Sui, Shannan J Ho, Hofmann,
Oliver M, Hoof, Ilka, Hori, Fumi, Huminiecki, Lukasz, Iida, Kei, Ikawa, Tomokatsu, Jankovic,
Boris R, Jia, Hui, Joshi, Anagha, Jurman, Giuseppe, Kaczkowski, Bogumil, Kai, Chieko, Kaida,
Kaoru, Kaiho, Ai, Kajiyama, Kazuhiro, Kanamori-Katayama, Mutsumi, Kasianov, Artem S,
Kasukawa, Takeya, Katayama, Shintaro, Kato, Sachi, Kawaguchi, Shuji, Kawamoto, Hiroshi,
Kawamura, Yuki I, Kawashima, Tsugumi, Kempfle, Judith S, Kenna, Tony J, Kere, Juha,
Khachigian, Levon M, Kitamura, Toshio, Klinken, S Peter, Knox, Alan J, Kojima, Miki, Kojima,
Soichi, Kondo, Naoto, Koseki, Haruhiko, Koyasu, Shigeo, Krampitz, Sarah, Kubosaki, Atsutaka,
Kwon, Andrew T, Laros, Jeroen FJ, Lee, Weonju, Lennartsson, Andreas, Li, Kang, Lilje, Berit,
Lipovich, Leonard, Mackay-sim, Alan, Manabe, Ri-ichiroh, Mar, Jessica C, Marchand, Benoit,
Mathelier, Anthony, Mejhert, Niklas, Meynert, Alison, Mizuno, Yosuke, Morais, David A de
Lima, Morikawa, Hiromasa, Morimoto, Mitsuru, Moro, Kazuyo, Motakis, Efthymios, Motohashi,
Hozumi, Mummery, Christine L, Murata, Mitsuyoshi, Nagao-Sato, Sayaka, Nakachi, Yutaka,
Nakahara, Fumio, Nakamura, Toshiyuki, Nakamura, Yukio, Nakazato, Kenichi, Van Nimwegen,
Erik, Ninomiya, Noriko, Nishiyori, Hiromi, Noma, Shohei, Nozaki, Tadasuke, Ogishima, Soichi,
Ohkura, Naganari, Ohmiya, Hiroko, Ohno, Hiroshi, Ohshima, Mitsuhiro, Okada-Hatakeyama,
Mariko, Okazaki, Yasushi, Orlando, Valerio, Ovchinnikov, Dmitry A, Pain, Arnab, Passier,
Robert, Patrikakis, Margaret, Persson, Helena, Piazza, Silvano, Prendergast, James GD,
Rackham, Owen JL, Ramilowski, Jordan A, Rashid, Mamoon, Ravasi, Timothy, Rizzu, Patrizia,
Roncador, Marco, Roy, Sugata, Rye, Morten B, Saijyo, Eri, Sajantila, Antti, Saka, Akiko,
Sakaguchi, Shimon, Sakai, Mizuho, Sato, Hiroki, Satoh, Hironori, Savvi, Suzana, Saxena,
Alka, Schneider, Claudio, Schultes, Erik A, Schulze-Tanzil, Gundula G, Schwegmann, Anita,
Sengstag, Thierry, Sheng, Guojun, Shimoji, Hisashi, Shimoni, Yishai, Shin, Jay W, Simon,
Christophe, Sugiyama, Daisuke, Sugiyama, Takaaki, Suzuki, Masanori, Suzuki, Naoko,
Swoboda, Rolf K, 't Hoen, Peter AC, Tagami, Michihira, Takahashi, Naoko, Takai, Jun, Tanaka,
Hiroshi, Tatsukawa, Hideki, Tatum, Zuotian, Thompson, Mark, Toyoda, Hiroo, Toyoda, Tetsuro,
Valen, Eivind, van de Wetering, Marc, van den Berg, Linda M, Verardo, Roberto, Vijayan, Dipti,
Vorontsov, Ilya E, Wasserman, Wyeth W, Watanabe, Shoko, Wells, Christine A, Winteringham,
Louise N, Wolvetang, Ernst, Wood, Emily J, Yamaguchi, Yoko, Yamamoto, Masayuki, Yoneda,
Misako, Yonekura, Yohei, Yoshida, Shigehiro, Zabierowski, Susan E, Zhang, Peter G, Zhao,
Xiaobei, Zucchelli, Silvia, Summers, Kim M, Suzuki, Harukazu, Daub, Carsten O, Kawai, Jun,

Heutink, Peter, Hide, Winston, Freeman, Tom C, Lenhard, Boris, Bajic, Vladimir B, Taylor,
Martin S, Makeev, Vsevolod J, Sandelin, Albin, Hume, David A, Carninci, Piero, Hayashizaki,
Yoshihide
Published: 2014
Journal Title: Nature: International Journal of Weekly Science
Version: Accepted Manuscript (AM)
DOI: https://doi.org/10.1038/nature13182
Copyright Statement: © 2014 Nature Publishing Group. This is the author-manuscript version of this paper.
Reproduced in accordance with the copyright policy of the publisher. Please refer to the journal
website for access to the definitive, published version.
Downloaded from: http://hdl.handle.net/10072/102455
Griffith Research Online: https://research-repository.griffith.edu.au

The FANTOM Consortium and the RIKEN PMI and CLST (DGT)*
* Lists of participants and their affiliations appear at the end of the paper.
Abstract
Regulated transcription controls the diversity, developmental pathways and spatial organization of
the hundreds of cell types that make up a mammal. Using single-molecule cDNA sequencing, we
mapped transcription start sites (TSSs) and their usage in human and mouse primary cells, cell
lines and tissues to produce a comprehensive overview of mammalian gene expression across the
human body. We find that few genes are truly ‘housekeeping’, whereas many mammalian
promoters are composite entities composed of several closely separated TSSs, with independent
cell-type-specific expression profiles. TSSs specific to different cell types evolve at different rates,
whereas promoters of broadly expressed genes are the most conserved. Promoter-based expression
analysis reveals key transcription factors defining cell states and links them to binding-site motifs.
The functions of identified novel transcripts can be predicted by coexpression and sample
ontology enrichment analyses. The functional annotation of the mammalian genome 5
(FANTOM5) project provides comprehensive expression profiles and functional annotation of
mammalian cell-type-specific transcriptomes with wide applications in biomedical research.

Reprints and permissions information is available at www.nature.com/reprints.
Correspondence and requests for materials should be addressed to A.R.R.F. (alistair.forrest@gmail.com), P.C. (carninci@riken.jp) or
Y.H. (yosihide@gsc.riken.jp).
Supplementary Information is available in the online version of the paper.
Online Content: Any additional Methods, Extended Data display items and Source Data are available in the online version of the
paper; references unique to these sections appear only in the online paper.
Author Contributions: The core members of FANTOM5 phase 1 were Alistair R. R. Forrest, Hideya Kawaji, Michael Rehli, J.
Kenneth Baillie, Michiel J. L. de Hoon, Timo Lassmann, Masayoshi Itoh, Kim M. Summers, Harukazu Suzuki, Carsten O. Daub, Jun
Kawai, Peter Heutink, Winston Hide, Tom C. Freeman, Boris Lenhard, Vladimir B. Bajic, Martin S. Taylor, Vsevolod J. Makeev,
Albin Sandelin, David A. Hume, Piero Carninci and Yoshihide Hayashizaki. Samples were provided by: A. Blumenthal, A. Bonetti,
A. Mackay-sim, A. Sajantila, A. Saxena, A. Schwegmann, A.G.B., A.J.K., A.L., A.R.R.F., A.S.B.E., B.B., C. Schmidl, C. Schneider,
C.A.D., C.A.W., C.K., C.L.M., D.A.H., D.A.O., D.G., D.S., D.V., E.W., F.B., F.N., G.G.S., G.J.F., G.S., H. Kawamoto, H. Koseki, H.
Morikawa, H. Motohashi, H. Ohno, H. Sato, H. Satoh, H. Tanaka, H. Tatsukawa, H. Toyoda, H.C.C., H.E., J. Kere, J.B., J.F., J.K.B.,
J.S.K., J.T., J.W.S., K.E., K.J.H., K.M., K.M.S., L.F., L.M.K., L.M.vdB., L.N.W., M. Edinger, M. Endoh, M. Fagiolini, M.
Hamaguchi, M. Hara, M. Herlyn, M. Morimoto, M. Rehli, M. Yamamoto, M. Yoneda, M.B., M.C.F.C., M.D., M.E.F., M.O., M.O.H.,
M.P., M.vdW., N.M., N.O., N.T., P.A., P.G.Z., P.H., P.R., R.F., R.G., R.K.S., R.P., R.V., S. Guhl, S. Gustincich, S. Kojima, S.
Koyasu, S. Krampitz, S. Sakaguchi, S. Savvi, S.E.Z., S.O., S.P.B., S.P.K., S. Roy, S.Z., T. Kitamura, T. Nakamura, T. Nozaki, T.
Sugiyama, T.B.G., T.D., T.G., T.I., T.J.H., T.J.K., V.O., W.L., Y. Hasegawa, Y. Nakachi, Y. Nakamura, Y. Yamaguchi, Y. Yonekura,
Y.I., Y.I.K., Y.M. and Y.O. Analyses were carried out by: A. Mathelier, A. Meynert, A. Sandelin, A.C., A.D.D., A.P.G., A.H., A.J.,
A.M.B., A.P., A.R.R.F., A.S.K., A.T.K., A.V.F., B. Lenhard, B. Lilje, B.D., B.K., B.M., B.R.J., C. Schmidl, C. Schneider, C.A.S.,
C.F., C.J.M., C.O.D., C.P., C.V.C., D.A., D.A.M., D.C., E. Dalla, E. Dimont, E.A., E.A.S., E.J.W., E.M., E.V., Ev.N., F.D., G.J.,
G.J.F., G.M.A., H. Kawaji, H. Ohmiya, H. Shimoji, H.F., H.J., H.P., I.A., I.E.V., I.H., I.V.K., J.A.B., J.A.C.A., J.A.R., J.C.M.,
J.F.J.L., J.G., J.G.D.P., J.H., J.K.B., J.S., K. Kajiyama, K.I., K.L., L.H., L.L., M. Francescatto, M. Rashid, M. Rehli, M. Roncador, M.
Thompson, M.B.R., M.C., M.C.F., M.J., M.J.L.dH., M.L., M.S.T., M.V., N.B., O.J.L.R., O.M.H., P.A.C.tH., P.J.B., R.A., R.S.Y., S.
Katayama, S. Kawaguchi, S. Schmeier, S. Rennie, S.F., S.J.H.S., S.P., T. Sengstag, T.C.F., T.F.M., T.H., T.K., T.L., T.R., T.T., U.S.,
V.B.B., V.H., V.J.M., W.H., W.W.W., X.Z., Y. Chen, Y. Ciani, Y.A.M., Y.S., Z.T. Libraries were generated by: A. Kaiho, A.
Kubosaki, A. Saka, C. Simon, E.S., F.H., H.N., J. Kawai, K. Kaida, K.N., M. Furuno, M. Murata, M. Sakai, M. Tagami, M.I., M.K.,
M.K.K., N.K., N.N., N.S., P.C., R.M., S. Kato, S.N., S.N.-S., S.W., S.Y., T.A., T. Kawashima. The manuscript was written by
A.R.R.F. and D.A.H. with help from A. Sandelin, J.K.B., M. Rehli, H.K., M.J.L.dH., V.H., I.V.K., M.T. and K.M.S. with
contributions, edits and comments from all authors. The project was managed by Y. Hayashizaki, A.R.R.F., P.C., M.I., M.S., J.
Kawai, C.O.D., H. Suzuki, T.L. and N.K. The scientific coordinator was A.R.R.F. and the general organizer was Y. Hayashizaki.
Author Information: All CAGE data has been deposited at DDBJ DRA under accession number DRA000991.
The authors declare no competing financial interests. Readers are welcome to comment on the online version of the paper.
Published in final edited form as:
Nature. 2014 March 27; 507(7493): 462–470. doi:10.1038/nature13182.

The mammalian genome encodes the instructions to specify development from the zygote
through gastrulation, implantation and generation of the full set of organs necessary to
become an adult, to respond to environmental influences, and eventually to reproduce.
Although the genome information is the same in almost all cells of an individual, at least
400 distinct cell types (ref. 1) have their own regulatory repertoire of active and inactive genes.
Each cell type responds acutely to alterations in its environment with changes in gene
expression, and interacts with other cells to generate complex activities such as movement,
vision, memory and immune response.
Identities of cell types are determined by transcriptional cascades that start initially in the
fertilised egg. In each cell lineage, specific sets of transcription factors are induced or
repressed. These factors together provide proximal and distal regulatory inputs that are
integrated at transcription start sites (TSSs) to control the transcription of target genes. Most
genes have more than one TSS, and the regulatory inputs that determine TSS choice and
activity are diverse and complex (reviewed in ref. 2).
Unbiased annotation of the regulation, expression and function of mammalian genes
requires systematic sampling of the distinct mammalian cell types and methods that can
identify the set of TSSs and transcription factors that regulate their utilization. To this end,
the FANTOM5 project has performed cap analysis of gene expression (CAGE; ref. 3) across 975
human and 399 mouse samples, including primary cells, tissues and cancer cell lines, using
single-molecule sequencing (ref. 3; Fig. 1; see the full sample list in Supplementary Table 1).
CAGE libraries were sequenced to a median depth of 4 million mapped tags per sample
(Supplementary Methods) to produce a unique gene expression profile, focused specifically
on promoter utilization. CAGE has advantages over RNA-seq or microarrays for this
purpose, because it permits separate analysis of multiple promoters linked to the same
gene (ref. 13). Moreover, we show in an accompanying manuscript (ref. 4) that the data can be used to
locate active enhancers, and to provide numerous insights into cell-type-specific
transcriptional regulatory networks (see the FANTOM5 website http://fantom.gsc.riken.jp/5).
The data extend and complement the recently published ENCODE (ref. 5)
data, and microarray-based gene expression atlases (ref. 6), to provide a major resource for
functional genome annotation and for understanding the transcriptional networks
underpinning mammalian cellular differentiation.
The FANTOM5 promoter atlas
Single molecule CAGE profiles were generated across a collection of 573 human primary
cell samples (~ 3 donors for most cell types) and 128 mouse primary cell samples, covering
most mammalian cell steady states. This data set is complemented with profiles of 250
different cancer cell lines (all available through public repositories and representing 154
distinct cancer subtypes), 152 human post-mortem tissues and 271 mouse developmental
tissue samples (Fig. 1a; see the full sample list in Supplementary Table 1). To facilitate data
mining, all samples were annotated using structured ontologies (Cell Ontology (ref. 7), Uberon (ref. 8) and
Disease Ontology (ref. 9)). The results of all analyses are summarized in the FANTOM5 online
resource (http://fantom.gsc.riken.jp/5). We also developed two specialized tools for
exploration of the data. ZENBU, based on the genome browser concept, allows users to
interactively explore the relationship between genomic distribution of CAGE tags and
expression profiles (ref. 10). SSTAR, an interconnected semantic tool, allows users to explore the
relationships between genes, promoters, samples, transcription factors, transcription factor
binding sites and coexpressed sets of promoters. These and other ways to access the data are
described in more detail in Supplementary Note 1.
CAGE peak identification and thresholding
To identify CAGE peaks across the genome we developed decomposition-based peak
identification (DPI; described in Supplementary Methods; Extended Data Fig. 1). This
method first clusters CAGE tags based on proximity. For clusters wider than 49 base pairs
(bp) it attempts to decompose the signal into non-overlapping sub-regions with different
expression profiles using independent component analysis (ref. 11). Sample- and genome-wide, DPI
identified 3,492,729 peaks in human and 2,088,255 peaks in mouse. To minimize the
fraction of peaks (ref. 3) that map to internal exons (which could exist due to post-transcriptional
cleavage and recapping of RNAs; ref. 12), and enrich for TSSs, we applied tag evidence
thresholds to define robust and permissive subsets (described in more detail in
Supplementary Methods and summarized in Table 1). Specifically the robust threshold,
which is used for most of the analyses presented here, enriched for peaks at known 5′ ends
compared to known internal exons by twofold (that is, two-thirds of the peaks hitting known
full-length transcript models hit the 5′ end). A flow diagram showing the relationship
between samples, peaks, thresholding and subsets used in each analysis is provided in the
Supplementary Figure 1. Supporting evidence that the peaks are genuine TSSs, based upon
support from expressed sequence tags (ESTs), histone H3 lysine 4 trimethylation
(H3K4Me3) marks and DNase hypersensitive sites is provided in Supplementary Note 2.
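The first stage of DPI described above (proximity clustering, then selecting wide clusters for decomposition) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the consortium's implementation: the function names and the `max_gap` merge distance are hypothetical (the text does not give the distance used), cluster width is assumed to be counted inclusively, and the independent component analysis step itself is left out. Only the 49 bp width cutoff comes from the text.

```python
from typing import List, Tuple

Cluster = Tuple[int, int]  # (start, end) genomic positions, inclusive


def cluster_by_proximity(positions: List[int], max_gap: int) -> List[Cluster]:
    """Merge tag positions on one strand into clusters.

    A position joins the current cluster when it lies at most `max_gap` bp
    after the cluster's last position. `max_gap` is a hypothetical parameter;
    the source text does not state the value used by DPI.
    """
    clusters: List[Cluster] = []
    for pos in sorted(positions):
        if clusters and pos - clusters[-1][1] <= max_gap:
            clusters[-1] = (clusters[-1][0], pos)  # extend current cluster
        else:
            clusters.append((pos, pos))  # start a new cluster
    return clusters


def needs_decomposition(cluster: Cluster, width_cutoff: int = 49) -> bool:
    """True for clusters wider than the cutoff (49 bp in the text)."""
    start, end = cluster
    return (end - start + 1) > width_cutoff


# Toy CAGE tag start positions (made-up numbers, for illustration only).
tags = [100, 102, 105, 300, 310, 320, 330, 340, 350, 360, 1000]
clusters = cluster_by_proximity(tags, max_gap=20)  # 20 bp is an assumed value
wide = [c for c in clusters if needs_decomposition(c)]
print(clusters)
print(wide)
```

In this toy input only the middle cluster exceeds 49 bp, so it alone would be passed on to a decomposition step.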
Figure 1b illustrates the 266 bp spanning transcription initiation region of B4GALT1, where
6 independent robust peaks were identified by DPI, each with a unique regulatory pattern
(Fig. 1c). A total of 58% of human and 56% of mouse robust peaks occur in such composite
transcription initiation regions, defined as clusters of robust peaks within 100 bases of each
other. More than half of these contain peaks with statistically significant differences in
expression profiles (63% of human and 54% of mouse composite transcription initiation
regions; likelihood ratio test, false discovery rate (FDR) < 1%, Extended Data Fig. 1d).
Supplementary Tables 2 and 3 summarize public domain EST evidence that these
independent peaks contained within composite transcription initiation regions give rise to
long RNAs.
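The definition of a composite transcription initiation region given above (robust peaks within 100 bases of each other) can be illustrated with a short grouping routine. This is a sketch under assumptions: the function names are hypothetical, peaks are taken to be inclusive `(start, end)` intervals on one chromosome and strand, distance is assumed to be measured from the end of the region so far to the start of the next peak, and peaks are chained transitively. The text does not specify these details.

```python
from typing import List, Tuple

Peak = Tuple[int, int]  # (start, end), inclusive; same chromosome and strand


def group_peaks(peaks: List[Peak], max_dist: int = 100) -> List[List[Peak]]:
    """Chain peaks whose gap to the preceding group is at most `max_dist`.

    The 100-base distance comes from the text; how the distance is measured
    (here: next start minus furthest end seen so far) is an assumption.
    """
    groups: List[List[Peak]] = []
    group_end = 0
    for peak in sorted(peaks):
        if groups and peak[0] - group_end <= max_dist:
            groups[-1].append(peak)
            group_end = max(group_end, peak[1])
        else:
            groups.append([peak])
            group_end = peak[1]
    return groups


def composite_fraction(peaks: List[Peak], max_dist: int = 100) -> float:
    """Fraction of peaks that fall in groups containing two or more peaks."""
    if not peaks:
        return 0.0
    groups = group_peaks(peaks, max_dist)
    in_composite = sum(len(g) for g in groups if len(g) > 1)
    return in_composite / len(peaks)


# Toy peak coordinates (made-up numbers, for illustration only).
toy_peaks = [(10, 20), (60, 70), (150, 160), (500, 510), (900, 905)]
toy_groups = group_peaks(toy_peaks)
toy_fraction = composite_fraction(toy_peaks)
print(toy_groups)
print(toy_fraction)
```

Here the first three peaks chain into one composite region and the last two stand alone, so three of the five toy peaks count as lying in a composite region.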
Known gene coverage in FANTOM5
To provide annotation of the CAGE peaks, the distance between individual peaks and the 5′
ends of known full-length transcripts was determined and then peaks within 500 bases of the
Nature. Author manuscript; available in PMC 2015 August 08.

References
edgeR: a Bioconductor package for differential expression analysis of digital gene expression data.
An integrated encyclopedia of DNA elements in the human genome.
Independent Component Analysis (book).
The RNA helicase RIG-I has an essential function in double-stranded RNA-induced innate antiviral responses.
A gene atlas of the mouse and human protein-encoding transcriptomes.