Author
Martin Hemberg
Other affiliations: Boston Children's Hospital, Wellcome Trust/Cancer Research UK Gurdon Institute, Imperial College London
Bio: Martin Hemberg is an academic researcher from the Wellcome Trust Sanger Institute. The author has contributed to research in the topics Gene and Biology. The author has an h-index of 37 and has co-authored 102 publications receiving 10,274 citations. Previous affiliations of Martin Hemberg include Boston Children's Hospital and the Wellcome Trust/Cancer Research UK Gurdon Institute.
Topics: Gene, Biology, RNA, Promoter, Transcriptome
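The h-index quoted in the bio is the largest number h such that h of the author's papers each have at least h citations. The sketch below illustrates that definition only. The function name `h_index` is hypothetical, and the input is just the five citation counts listed on this page, not the author's full record of 102 publications, so the result is not expected to match the reported value of 37.

```python
def h_index(citations):
    """Return the largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank  # the top `rank` papers all have at least `rank` citations
        else:
            break
    return h


# Citation counts of the five papers listed on this page (a subset only).
top_five = [2177, 1391, 1120, 1087, 741]
subset_h = h_index(top_five)
print(subset_h)
```

Because every one of the five papers has far more than five citations, the subset is limited by the number of papers rather than by the citation counts.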
Papers
••
TL;DR: It is revealed that a widespread mechanism of enhancer activation involves RNAPII binding and eRNA synthesis, which occurs specifically at enhancers that are actively engaged in promoting mRNA synthesis.
Abstract: We used genome-wide sequencing methods to study stimulus-dependent enhancer function in mouse cortical neurons. We identified approximately 12,000 neuronal activity-regulated enhancers that are bound by the general transcriptional co-activator CBP in an activity-dependent manner. A function of CBP at enhancers may be to recruit RNA polymerase II (RNAPII), as we also observed activity-regulated RNAPII binding to thousands of enhancers. Notably, RNAPII at enhancers transcribes bi-directionally a novel class of enhancer RNAs (eRNAs) within enhancer domains defined by the presence of histone H3 monomethylated at lysine 4. The level of eRNA expression at neuronal enhancers positively correlates with the level of messenger RNA synthesis at nearby genes, suggesting that eRNA synthesis occurs specifically at enhancers that are actively engaged in promoting mRNA synthesis. These findings reveal that a widespread mechanism of enhancer activation involves RNAPII binding and eRNA synthesis.
2,177 citations
••
Affiliations: (1) Massachusetts Institute of Technology; (2) Howard Hughes Medical Institute; (3) Broad Institute; (4) Wellcome Trust Sanger Institute; (5) European Bioinformatics Institute; (6) University of Cambridge; (7) Harvard University; (8) Weizmann Institute of Science; (9) University of Zurich; (10) Laboratory of Molecular Biology; (11) Utrecht University; (12) École Polytechnique Fédérale de Lausanne; (13) University of Pennsylvania; (14) German Cancer Research Center; (15) Heidelberg University; (16) Ludwig Maximilian University of Munich; (17) John Radcliffe Hospital; (18) Newcastle University; (19) Stanford University; (20) University of Oxford; (21) University of California, San Francisco; (22) Allen Institute for Brain Science; (23) Karolinska Institutet; (24) Royal Institute of Technology; (25) Icahn School of Medicine at Mount Sinai; (26) University of Cape Town; (27) University Medical Center Groningen; (28) Radboud University Nijmegen; (29) Kettering University; (30) University of Edinburgh; (31) Babraham Institute; (32) New York University; (33) Netherlands Cancer Institute; (34) Ragon Institute of MGH, MIT and Harvard; (35) University of Texas Health Science Center at Houston; (36) Technische Universität München; (37) Technical University of Denmark; (38) University of California, Berkeley; (39) King's College London; (40) California Institute of Technology
TL;DR: An open comprehensive reference map of the molecular state of cells in healthy human tissues would propel the systematic study of physiological states, developmental trajectories, regulatory circuitry and interactions of cells, and also provide a framework for understanding cellular dysregulation in human disease.
Abstract: The recent advent of methods for high-throughput single-cell molecular profiling has catalyzed a growing sense in the scientific community that the time is ripe to complete the 150-year-old effort to identify all cell types in the human body. The Human Cell Atlas Project is an international collaborative effort that aims to define all human cell types in terms of distinctive molecular profiles (such as gene expression profiles) and to connect this information with classical cellular descriptions (such as location and morphology). An open comprehensive reference map of the molecular state of cells in healthy human tissues would propel the systematic study of physiological states, developmental trajectories, regulatory circuitry and interactions of cells, and also provide a framework for understanding cellular dysregulation in human disease. Here we describe the idea, its potential utility, early proofs-of-concept, and some design considerations for the Human Cell Atlas, including a commitment to open data, code, and community.
1,391 citations
••
TL;DR: It is demonstrated that SC3 is capable of identifying subclones from the transcriptomes of neoplastic cells collected from patients and achieves high accuracy and robustness by combining multiple clustering solutions through a consensus approach.
Abstract: Single-cell RNA-seq enables the quantitative characterization of cell types based on global transcriptome profiles. We present single-cell consensus clustering (SC3), a user-friendly tool for unsupervised clustering, which achieves high accuracy and robustness by combining multiple clustering solutions through a consensus approach (http://bioconductor.org/packages/SC3). We demonstrate that SC3 is capable of identifying subclones from the transcriptomes of neoplastic cells collected from patients.
1,120 citations
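The consensus idea in the SC3 abstract above, combining multiple clustering solutions into one, can be illustrated with a small toy example. This is a minimal sketch under stated assumptions, not the SC3 implementation (SC3 itself is an R/Bioconductor package). All function names are hypothetical, the individual solutions come from a plain k-means run with different random seeds, and the final grouping uses a simple threshold on co-clustering frequency, which is a simplification chosen here for brevity.

```python
import random


def kmeans(points, k, seed, iters=20):
    """Plain Lloyd's k-means on a list of equal-length tuples; one label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center (squared Euclidean distance).
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Move each center to the mean of its members; keep it if the cluster is empty.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def consensus_matrix(labelings):
    """Fraction of clustering runs in which each pair of points shares a cluster."""
    n = len(labelings[0])
    counts = [[0] * n for _ in range(n)]
    for labels in labelings:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[value / len(labelings) for value in row] for row in counts]


def consensus_clusters(matrix, threshold=0.5):
    """Group points connected by co-clustering frequency above the threshold."""
    n = len(matrix)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and matrix[i][j] > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


# Toy data (made up for illustration): two well-separated groups of three points.
toy_points = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
runs = [kmeans(toy_points, k=2, seed=s) for s in range(10)]
final_labels = consensus_clusters(consensus_matrix(runs))
print(final_labels)
```

The consensus matrix does not depend on how each run happens to number its clusters, because it records only whether two points were placed together. That is what allows solutions from different runs to be combined directly.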
••
TL;DR: Clonal heterogeneity of gene expression level is not due to independent noise in the expression of individual genes, but reflects metastable states of a slowly fluctuating transcriptome that is distinct in individual cells and may govern the reversible, stochastic priming of multipotent progenitor cells in cell fate decision.
Abstract: Phenotypic cell-to-cell variability within clonal populations may be a manifestation of 'gene expression noise', or it may reflect stable phenotypic variants. Such 'non-genetic cell individuality' can arise from the slow fluctuations of protein levels in mammalian cells. These fluctuations produce persistent cell individuality, thereby rendering a clonal population heterogeneous. However, it remains unknown whether this heterogeneity may account for the stochasticity of cell fate decisions in stem cells. Here we show that in clonal populations of mouse haematopoietic progenitor cells, spontaneous 'outlier' cells with either extremely high or low expression levels of the stem cell marker Sca-1 (also known as Ly6a; ref. 9) reconstitute the parental distribution of Sca-1 but do so only after more than one week. This slow relaxation is described by a Gaussian mixture model that incorporates noise-driven transitions between discrete subpopulations, suggesting hidden multi-stability within one cell type. Despite clonality, the Sca-1 outliers had distinct transcriptomes. Although their unique gene expression profiles eventually reverted to that of the median cells, revealing an attractor state, they lasted long enough to confer a greatly different proclivity for choosing either the erythroid or the myeloid lineage. Preference in lineage choice was associated with increased expression of lineage-specific transcription factors, such as a >200-fold increase in Gata1 (ref. 10) among the erythroid-prone cells, or a >15-fold increase in PU.1 (Sfpi1) (ref. 11) expression among myeloid-prone cells. Thus, clonal heterogeneity of gene expression level is not due to independent noise in the expression of individual genes, but reflects metastable states of a slowly fluctuating transcriptome that is distinct in individual cells and may govern the reversible, stochastic priming of multipotent progenitor cells in cell fate decision.
1,087 citations
••
TL;DR: This Review discusses the multiple algorithmic options for clustering scRNA-seq data, including various technical, biological and computational considerations.
Abstract: Single-cell RNA sequencing (scRNA-seq) allows researchers to collect large catalogues detailing the transcriptomes of individual cells. Unsupervised clustering is of central importance for the analysis of these data, as it is used to identify putative cell types. However, there are many challenges involved. We discuss why clustering is a challenging problem from a computational point of view and what aspects of the data make it challenging. We also consider the difficulties related to the biological interpretation and annotation of the identified clusters.
741 citations
Cited by
••
TL;DR: A strategy to "anchor" diverse datasets together, enabling us to integrate single-cell measurements not only across scRNA-seq technologies, but also across different modalities.
7,892 citations
••
TL;DR: An analytical strategy for integrating scRNA-seq data sets based on common sources of variation is introduced, enabling the identification of shared populations across data sets and downstream comparative analysis.
Abstract: Computational single-cell RNA-seq (scRNA-seq) methods have been successfully applied to experiments representing a single condition, technology, or species to discover and define cellular phenotypes. However, identifying subpopulations of cells that are present across multiple data sets remains challenging. Here, we introduce an analytical strategy for integrating scRNA-seq data sets based on common sources of variation, enabling the identification of shared populations across data sets and downstream comparative analysis. We apply this approach, implemented in our R toolkit Seurat (http://satijalab.org/seurat/), to align scRNA-seq data sets of peripheral blood mononuclear cells under resting and stimulated conditions, hematopoietic progenitors sequenced using two profiling technologies, and pancreatic cell 'atlases' generated from human and mouse islets. In each case, we learn distinct or transitional cell states jointly across data sets, while boosting statistical power through integrated analysis. Our approach facilitates general comparisons of scRNA-seq data sets, potentially deepening our understanding of how distinct cell states respond to perturbation, disease, and evolution.
7,741 citations
••
Affiliations: (1) Cold Spring Harbor Laboratory; (2) California Institute of Technology; (3) University of California, Irvine; (4) Florida State University College of Arts and Sciences; (5) Yale University; (6) Wellcome Trust Sanger Institute; (7) Norwegian University of Science and Technology; (8) Affymetrix; (9) University of North Carolina at Chapel Hill; (10) University of Lausanne; (11) University of Geneva; (12) Genome Institute of Singapore; (13) Stanford University; (14) Pompeu Fabra University
TL;DR: Evidence that three-quarters of the human genome is capable of being transcribed is reported, as well as observations about the range and levels of expression, localization, processing fates, regulatory regions and modifications of almost all currently annotated and thousands of previously unannotated RNAs that prompt a redefinition of the concept of a gene.
Abstract: Eukaryotic cells make many types of primary and processed RNAs that are found either in specific subcellular compartments or throughout the cells. A complete catalogue of these RNAs is not yet available and their characteristic subcellular localizations are also poorly understood. Because RNA represents the direct output of the genetic information encoded by genomes and a significant proportion of a cell's regulatory capabilities are focused on its synthesis, processing, transport, modification and translation, the generation of such a catalogue is crucial for understanding genome function. Here we report evidence that three-quarters of the human genome is capable of being transcribed, as well as observations about the range and levels of expression, localization, processing fates, regulatory regions and modifications of almost all currently annotated and thousands of previously unannotated RNAs. These observations, taken together, prompt a redefinition of the concept of a gene.
4,450 citations
01 Feb 2015
TL;DR: In this article, the authors describe the integrative analysis of 111 reference human epigenomes generated as part of the NIH Roadmap Epigenomics Consortium, profiled for histone modification patterns, DNA accessibility, DNA methylation and RNA expression.
Abstract: The reference human genome sequence set the stage for studies of genetic variation and its association with human disease, but epigenomic studies lack a similar reference. To address this need, the NIH Roadmap Epigenomics Consortium generated the largest collection so far of human epigenomes for primary cells and tissues. Here we describe the integrative analysis of 111 reference human epigenomes generated as part of the programme, profiled for histone modification patterns, DNA accessibility, DNA methylation and RNA expression. We establish global maps of regulatory elements, define regulatory modules of coordinated activity, and their likely activators and repressors. We show that disease- and trait-associated genetic variants are enriched in tissue-specific epigenomic marks, revealing biologically relevant cell types for diverse human traits, and providing a resource for interpreting the molecular basis of human disease. Our results demonstrate the central role of epigenomic information for understanding gene regulation, cellular differentiation and human disease.
4,409 citations
••
TL;DR: The most complete human lncRNA annotation to date is presented, produced by the GENCODE consortium within the framework of the ENCODE project and comprising 9277 manually annotated genes producing 14,880 transcripts, and expression correlation analysis indicates that lncRNAs show particularly striking positive correlation with the expression of antisense coding genes.
Abstract: The human genome contains many thousands of long noncoding RNAs (lncRNAs). While several studies have demonstrated compelling biological and disease roles for individual examples, analytical and experimental approaches to investigate these genes have been hampered by the lack of comprehensive lncRNA annotation. Here, we present and analyze the most complete human lncRNA annotation to date, produced by the GENCODE consortium within the framework of the ENCODE project and comprising 9277 manually annotated genes producing 14,880 transcripts. Our analyses indicate that lncRNAs are generated through pathways similar to that of protein-coding genes, with similar histone-modification profiles, splicing signals, and exon/intron lengths. In contrast to protein-coding genes, however, lncRNAs display a striking bias toward two-exon transcripts, they are predominantly localized in the chromatin and nucleus, and a fraction appear to be preferentially processed into small RNAs. They are under stronger selective pressure than neutrally evolving sequences, particularly in their promoter regions, which display levels of selection comparable to protein-coding genes. Importantly, about one-third seem to have arisen within the primate lineage. Comprehensive analysis of their expression in multiple human organs and brain regions shows that lncRNAs are generally expressed at lower levels than protein-coding genes, and display more tissue-specific expression patterns, with a large fraction of tissue-specific lncRNAs expressed in the brain. Expression correlation analysis indicates that lncRNAs show particularly striking positive correlation with the expression of antisense coding genes. This GENCODE annotation represents a valuable resource for future studies of lncRNAs.
4,291 citations