ChromHMM: automating chromatin-state discovery and characterization

Jason Ernst¹,², Manolis Kellis¹

Massachusetts Institute of Technology¹, University of California, Los Angeles²

01 Feb 2012

TL;DR: ChromHMM is an automated computational system for learning chromatin states, characterizing their biological functions and correlations with large-scale functional datasets, and visualizing the resulting genome-wide maps of chromatin-state annotations.

Abstract: Chromatin state annotation using combinations of chromatin modification patterns has emerged as a powerful approach for discovering regulatory regions and their cell-type-specific activity patterns, and for interpreting disease-association studies [1-5]. However, the computational challenge of learning chromatin state models from large numbers of chromatin modification datasets in multiple cell types still requires extensive bioinformatics expertise, making it inaccessible to the wider scientific community. To address this challenge, we have developed ChromHMM, an automated computational system for learning chromatin states, characterizing their biological functions and correlations with large-scale functional datasets, and visualizing the resulting genome-wide maps of chromatin state annotations.

Citations

Introducing the “Bioinformatics” special issue

장병탁, 김삼묘, 허철구

01 Aug 2000

TL;DR: Assessment of medical technology in the context of commercialization with Bioentrepreneur course, which addresses many issues unique to biomedical products.

Abstract: BIOE 402. Medical Technology Assessment. 2 or 3 hours. Bioentrepreneur course. Assessment of medical technology in the context of commercialization. Objectives, competition, market share, funding, pricing, manufacturing, growth, and intellectual property; many issues unique to biomedical products. Course Information: 2 undergraduate hours. 3 graduate hours. Prerequisite(s): Junior standing or above and consent of the instructor.

4,833 citations

Integrative analysis of 111 reference human epigenomes

Anshul Kundaje, Wouter Meuleman, Jason Ernst, Angela Yen, Pouya Kheradpour, Zhizhuo Zhang, Jianrong Wang, Lucas D. Ward, Abhishek Sarkar, Gerald Quon, Matthew L. Eaton, Yi-Chieh Wu, Andreas R. Pfenning, Xinchen Wang, Melina Claussnitzer, Yaping Liu, Mukul S. Bansal, Soheil Feizi-Khankandi, Ah Ram Kim, Richard C Sallari, Nicholas A Sinnott-Armstrong, Laurie A. Boyer, Elizabeta Gjoneska, Li-Huei Tsai, Manolis Kellis

01 Feb 2015

TL;DR: In this article, the authors describe the integrative analysis of 111 reference human epigenomes generated as part of the NIH Roadmap Epigenomics Consortium, profiled for histone modification patterns, DNA accessibility, DNA methylation and RNA expression.

Abstract: The reference human genome sequence set the stage for studies of genetic variation and its association with human disease, but epigenomic studies lack a similar reference. To address this need, the NIH Roadmap Epigenomics Consortium generated the largest collection so far of human epigenomes for primary cells and tissues. Here we describe the integrative analysis of 111 reference human epigenomes generated as part of the programme, profiled for histone modification patterns, DNA accessibility, DNA methylation and RNA expression. We establish global maps of regulatory elements, define regulatory modules of coordinated activity, and their likely activators and repressors. We show that disease- and trait-associated genetic variants are enriched in tissue-specific epigenomic marks, revealing biologically relevant cell types for diverse human traits, and providing a resource for interpreting the molecular basis of human disease. Our results demonstrate the central role of epigenomic information for understanding gene regulation, cellular differentiation and human disease.

4,409 citations

Journal Article

Discovery and characterization of chromatin states for systematic annotation of the human genome

Jason Ernst¹, Manolis Kellis¹

Massachusetts Institute of Technology¹

01 Jul 2010 - PubMed Central

TL;DR: In this article, a multivariate Hidden Markov Model was used to reveal chromatin states in human T cells, based on recurrent and spatially coherent combinations of chromatin marks.

Abstract: A plethora of epigenetic modifications have been described in the human genome and shown to play diverse roles in gene regulation, cellular differentiation and the onset of disease. Although individual modifications have been linked to the activity levels of various genetic functional elements, their combinatorial patterns are still unresolved and their potential for systematic de novo genome annotation remains untapped. Here, we use a multivariate Hidden Markov Model to reveal chromatin states in human T cells, based on recurrent and spatially coherent combinations of chromatin marks. We define 51 distinct chromatin states, including promoter-associated, transcription-associated, active intergenic, large-scale repressed and repeat-associated states. Each chromatin state shows specific enrichments in functional annotations, sequence motifs and specific experimentally observed characteristics, suggesting distinct biological roles. This approach provides a complementary functional annotation of the human genome that reveals the genome-wide locations of diverse classes of epigenetic function.

720 citations

Chromatin-state discovery and genome annotation with ChromHMM

Jason Ernst, Manolis Kellis¹

Broad Institute¹

01 Nov 2017

TL;DR: ChromHMM combines multiple genome-wide epigenomic maps, and uses combinatorial and spatial mark patterns to infer a complete annotation for each cell type, and provides an automated enrichment analysis of the resulting annotations to facilitate the functional interpretations of each chromatin state.

Abstract: Noncoding DNA regions have central roles in human biology, evolution, and disease. ChromHMM helps to annotate the noncoding genome using epigenomic information across one or multiple cell types. It combines multiple genome-wide epigenomic maps, and uses combinatorial and spatial mark patterns to infer a complete annotation for each cell type. ChromHMM learns chromatin-state signatures using a multivariate hidden Markov model (HMM) that explicitly models the combinatorial presence or absence of each mark. ChromHMM uses these signatures to generate a genome-wide annotation for each cell type by calculating the most probable state for each genomic segment. ChromHMM provides an automated enrichment analysis of the resulting annotations to facilitate the functional interpretations of each chromatin state. ChromHMM is distinguished by its modeling emphasis on combinations of marks, its tight integration with downstream functional enrichment analyses, its speed, and its ease of use. Chromatin states are learned, annotations are produced, and enrichments are computed within 1 day.
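
The modeling idea in this abstract (binary presence or absence of each mark, one hidden state per genomic segment, and a most-probable-state call) can be illustrated with a small sketch. This is not ChromHMM's own code: the function names, the two-state and two-mark setup, and every probability below are hypothetical, and the parameters are fixed by hand rather than learned. It assumes each mark is an independent Bernoulli variable given the state and uses a scaled forward-backward pass to pick the state with the highest posterior in each bin.

```python
from math import prod


def emission_prob(obs, mark_probs):
    """P(obs | state) when each binary mark is an independent Bernoulli."""
    return prod(p if present else 1.0 - p for present, p in zip(obs, mark_probs))


def posterior_decode(observations, init, trans, emit):
    """Return the index of the most probable state for every bin."""
    n_states, n_bins = len(init), len(observations)
    # b[t][k] = probability of the observed mark vector in bin t under state k.
    b = [[emission_prob(o, emit[k]) for k in range(n_states)] for o in observations]

    # Forward pass, rescaled at each bin to avoid numerical underflow.
    fwd = []
    for t in range(n_bins):
        if t == 0:
            raw = [init[k] * b[0][k] for k in range(n_states)]
        else:
            raw = [b[t][k] * sum(fwd[-1][j] * trans[j][k] for j in range(n_states))
                   for k in range(n_states)]
        total = sum(raw)
        fwd.append([v / total for v in raw])

    # Backward pass with the same kind of per-bin rescaling.
    bwd = [[1.0] * n_states for _ in range(n_bins)]
    for t in range(n_bins - 2, -1, -1):
        raw = [sum(trans[k][j] * b[t + 1][j] * bwd[t + 1][j] for j in range(n_states))
               for k in range(n_states)]
        total = sum(raw)
        bwd[t] = [v / total for v in raw]

    # The posterior is proportional to forward * backward; take the argmax per bin.
    return [max(range(n_states), key=lambda k: fwd[t][k] * bwd[t][k])
            for t in range(n_bins)]


# Hypothetical toy model: state 0 is "active" (both marks likely present),
# state 1 is "quiescent" (both marks rarely present).
emit = [[0.9, 0.8], [0.05, 0.05]]
trans = [[0.9, 0.1], [0.1, 0.9]]
init = [0.5, 0.5]
bins = [(1, 1), (1, 0), (0, 0), (0, 0)]
states = posterior_decode(bins, init, trans, emit)
```

With these made-up parameters the first two bins are assigned to the "active" state and the last two to the "quiescent" state. A real analysis would learn the emission and transition parameters from data instead of fixing them.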

364 citations

A comparative encyclopedia of DNA elements in the mouse genome

Feng Yue¹,², Yong Cheng³, Alessandra Breschi, Jeff Vierstra⁴, Weisheng Wu¹,⁵, Tyrone Ryba⁶,⁷, Richard Sandstrom⁴, Zhihai Ma³, Carrie A. Davis⁸, Benjamin D. Pope⁶, Yin Shen², Dmitri D. Pervouchine, Sarah Djebali, Robert E. Thurman⁴, Rajinder Kaul⁴, Eric Rynes⁴, Anthony Kirilusha⁹, Georgi K. Marinov⁹, Brian A. Williams⁹, Diane Trout⁹, Henry Amrhein⁹, Katherine I. Fisher-Aylor⁹, Igor Antoshechkin⁹, Gilberto DeSalvo⁹, Lei Hoon See⁸, Meagan Fastuca⁸, Jorg Drenkow⁸, Chris Zaleski⁸, Alexander Dobin⁸, Pablo Prieto, Julien Lagarde, Giovanni Bussotti, Andrea Tanzer¹⁰, Olgert Denas¹¹, Kanwei Li¹¹, M. A. Bender⁴,¹², Miaohua Zhang¹², Rachel Byron¹², Mark Groudine⁴,¹², David McCleary², Long Pham², Zhen Ye², Samantha Kuan², Lee Edsall², Yi-Chieh Wu¹³, Matthew D. Rasmussen¹³, Mukul S. Bansal¹³, Manolis Kellis¹³,¹⁴, Cheryl A. Keller¹, Christapher S. Morrissey¹, Tejaswini Mishra¹, Deepti Jain¹, Nergiz Dogan¹, Robert S. Harris¹, Philip Cayting³, Trupti Kawli³, Alan P. Boyle⁵,³, Ghia Euskirchen³, Anshul Kundaje³, Shin Lin³, Yiing Lin³, Camden Jansen¹⁵, Venkat S. Malladi³, Melissa S. Cline¹⁶, Drew T. Erickson³, Vanessa M. Kirkup¹⁶, Katrina Learned¹⁶, Cricket A. Sloan³, Kate R. Rosenbloom¹⁶, Beatriz Lacerda de Sousa¹⁷, Kathryn Beal, Miguel Pignatelli, Paul Flicek, Jin Lian¹⁸, Tamer Kahveci¹⁹, Dongwon Lee²⁰, W. James Kent¹⁶, Miguel Santos¹⁷, Javier Herrero²¹, Cedric Notredame, Audra K. Johnson⁴, Shinny Vong⁴, Kristen Lee⁴, Daniel Bates⁴, Fidencio Neri⁴, Morgan Diegel⁴, Theresa K. Canfield⁴, Peter J. Sabo⁴, Matthew S. Wilken⁴, Thomas A. Reh⁴, Erika Giste⁴, Anthony Shafer⁴, Tanya Kutyavin⁴, Eric Haugen⁴, Douglas Dunn⁴, Alex Reynolds⁴, Shane Neph⁴, Richard Humbert⁴, R. Scott Hansen⁴, Marella F. T. R. de Bruijn²², Licia Selleri²³, Alexander Y. Rudensky²⁴, Steven Z. Josefowicz²⁴, Robert M. Samstein²⁴, Evan E. Eichler⁴, Stuart H. Orkin²⁵, Dana N. Levasseur²⁶, Thalia Papayannopoulou⁴, Kai Hsin Chang⁴, Arthur I. Skoultchi²⁷, Srikanta Gosh²⁷, Christine M. Disteche⁴, Piper M. Treuting⁴, Yanli Wang¹, Mitchell J. Weiss, Gerd A. Blobel²⁸, Xiaoyi Cao², Sheng Zhong², Ting Wang²⁹, Peter J. Good³⁰, Rebecca F. Lowdon²⁹,³⁰, Leslie B. Adams³⁰,³¹, Xiao Qiao Zhou³⁰, Michael J. Pazin³⁰, Elise A. Feingold³⁰, Barbara J. Wold⁹, James Taylor¹¹, Ali Mortazavi¹⁵, Sherman M. Weissman¹⁸, John A. Stamatoyannopoulos⁴, Michael Snyder³, Roderic Guigó, Thomas R. Gingeras⁸, David M. Gilbert⁶, Ross C. Hardison¹, Michael A. Beer²⁰, Bing Ren²

Pennsylvania State University¹, University of California, San Diego², Stanford University³, University of Washington⁴, University of Michigan⁵, Florida State University⁶, New College of Florida⁷, Cold Spring Harbor Laboratory⁸, California Institute of Technology⁹, University of Vienna¹⁰, Emory University¹¹, Fred Hutchinson Cancer Research Center¹², Massachusetts Institute of Technology¹³, Broad Institute¹⁴, University of California, Irvine¹⁵, University of California, Santa Cruz¹⁶, University of California, San Francisco¹⁷, Yale University¹⁸, University of Florida¹⁹, Johns Hopkins University²⁰, University College London²¹, University of Oxford²², Cornell University²³, Memorial Sloan Kettering Cancer Center²⁴, Harvard University²⁵, University of Iowa²⁶, Yeshiva University²⁷, University of Pennsylvania²⁸, Washington University in St. Louis²⁹, National Institutes of Health³⁰, University of North Carolina at Chapel Hill³¹

01 Nov 2014

TL;DR: By comparing with the human genome, this work not only confirms substantial conservation in the newly annotated potential functional sequences, but also finds a large degree of divergence of sequences involved in transcriptional regulation, chromatin state and higher order chromatin organization.

Abstract: The laboratory mouse shares the majority of its protein-coding genes with humans, making it the premier model organism in biomedical research, yet the two mammals differ in significant ways. To gain greater insights into both shared and species-specific transcriptional and cellular regulatory programs in the mouse, the Mouse ENCODE Consortium has mapped transcription, DNase I hypersensitivity, transcription factor binding, chromatin modifications and replication domains throughout the mouse genome in diverse cell and tissue types. By comparing with the human genome, we not only confirm substantial conservation in the newly annotated potential functional sequences, but also find a large degree of divergence of sequences involved in transcriptional regulation, chromatin state and higher order chromatin organization. Our results illuminate the wide range of evolutionary forces acting on genes and their regulatory regions, and provide a general resource for research into mammalian biology and mechanisms of human diseases.

226 citations

References

Journal Article

The Human Genome Browser at UCSC

W. James Kent¹, Charles W. Sugnet¹, Terrence S. Furey¹, Krishna M. Roskin¹, Tom H. Pringle, Alan M. Zahler¹, David Haussler¹

University of California, Santa Cruz¹

01 Jun 2002 - Genome Research

TL;DR: A mature web tool for rapid and reliable display of any requested portion of the genome at any scale, together with several dozen aligned annotation tracks, is provided at http://genome.ucsc.edu.

Abstract: As vertebrate genome sequences near completion and research refocuses to their analysis, the issue of effective genome annotation display becomes critical. A mature web tool for rapid and reliable display of any requested portion of the genome at any scale, together with several dozen aligned annotation tracks, is provided at http://genome.ucsc.edu. This browser displays assembly contigs and gaps, mRNA and expressed sequence tag alignments, multiple gene predictions, cross-species homologies, single nucleotide polymorphisms, sequence-tagged sites, radiation hybrid data, transposon repeats, and more as a stack of coregistered tracks. Text and sequence-based searches provide quick and precise access to any region of specific interest. Secondary links from individual features lead to sequence details and supplementary off-site databases. One-half of the annotation tracks are computed at the University of California, Santa Cruz from publicly available sequence data; collaborators worldwide provide the rest. Users can stably add their own custom tracks to the browser for educational or research purposes. The conceptual and technical framework of the browser, its underlying MYSQL database, and overall use are described. The web site currently serves over 50,000 pages per day to over 3000 different users.

9,605 citations

Introducing the “Bioinformatics” special issue

장병탁, 김삼묘, 허철구

01 Aug 2000

TL;DR: Assessment of medical technology in the context of commercialization with Bioentrepreneur course, which addresses many issues unique to biomedical products.

4,833 citations

Integrative analysis of 111 reference human epigenomes

01 Feb 2015

4,409 citations

Journal Article

The dynamics and regulators of cell fate decisions are revealed by pseudotemporal ordering of single cells

Cole Trapnell¹, Davide Cacchiarelli¹,², Jonna Grimsby², Prapti Pokharel², Shuqiang Li³, Michael A. Morse²,¹, Niall J. Lennon², Kenneth J. Livak³, Tarjei S. Mikkelsen²,¹, John L. Rinn²,¹,⁴

Harvard University¹, Broad Institute², Fluidigm Corporation³, Beth Israel Deaconess Medical Center⁴

23 Mar 2014 - Nature Biotechnology

TL;DR: Monocle is described, an unsupervised algorithm that increases the temporal resolution of transcriptome dynamics using single-cell RNA-Seq data collected at multiple time points that revealed switch-like changes in expression of key regulatory factors, sequential waves of gene regulation, and expression of regulators that were not known to act in differentiation.

Abstract: Defining the transcriptional dynamics of a temporal process such as cell differentiation is challenging owing to the high variability in gene expression between individual cells. Time-series gene expression analyses of bulk cells have difficulty distinguishing early and late phases of a transcriptional cascade or identifying rare subpopulations of cells, and single-cell proteomic methods rely on a priori knowledge of key distinguishing markers. Here we describe Monocle, an unsupervised algorithm that increases the temporal resolution of transcriptome dynamics using single-cell RNA-Seq data collected at multiple time points. Applied to the differentiation of primary human myoblasts, Monocle revealed switch-like changes in expression of key regulatory factors, sequential waves of gene regulation, and expression of regulators that were not known to act in differentiation. We validated some of these predicted regulators in a loss-of-function screen. Monocle can in principle be used to recover single-cell gene expression kinetics from a wide array of cellular processes, including differentiation, proliferation and oncogenic transformation.

4,119 citations

Journal Article

Mapping and analysis of chromatin state dynamics in nine human cell types

Jason Ernst¹, Pouya Kheradpour¹,², Tarjei S. Mikkelsen¹, Noam Shoresh¹, Lucas D. Ward²,¹, Charles B. Epstein¹, Xiaolan Zhang¹, Li Wang¹, Robbyn Issner¹, Michael Coyne¹, Manching Ku³,⁴,¹, Timothy Durham¹, Manolis Kellis¹,², Bradley E. Bernstein³,⁴,¹

Broad Institute¹, Massachusetts Institute of Technology², Howard Hughes Medical Institute³, Harvard University⁴

05 May 2011 - Nature

TL;DR: This study presents a general framework for deciphering cis-regulatory connections and their roles in disease, and maps nine chromatin marks across nine cell types to systematically characterize regulatory elements, their cell-type specificities and their functional interactions.

Abstract: Chromatin profiling has emerged as a powerful means of genome annotation and detection of regulatory activity. The approach is especially well suited to the characterization of non-coding portions of the genome, which critically contribute to cellular phenotypes yet remain largely uncharted. Here we map nine chromatin marks across nine cell types to systematically characterize regulatory elements, their cell-type specificities and their functional interactions. Focusing on cell-type-specific patterns of promoters and enhancers, we define multicell activity profiles for chromatin state, gene expression, regulatory motif enrichment and regulator expression. We use correlations between these profiles to link enhancers to putative target genes, and predict the cell-type-specific activators and repressors that modulate them. The resulting annotations and regulatory predictions have implications for the interpretation of genome-wide association studies. Top-scoring disease single nucleotide polymorphisms are frequently positioned within enhancer elements specifically active in relevant cell types, and in some cases affect a motif instance for a predicted regulator, thus suggesting a mechanism for the association. Our study presents a general framework for deciphering cis-regulatory connections and their roles in disease.
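
The step of using correlations between multicell activity profiles to link enhancers to putative target genes can be sketched in a few lines. This is a deliberately simplified illustration, not the study's pipeline: the abstract describes several profile types, whereas this sketch compares only one enhancer activity vector against gene expression vectors, and the gene names, values, and the 0.8 cutoff are all hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (assumes non-zero variance)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def link_enhancer(enhancer_profile, gene_profiles, min_r=0.8):
    """Return (gene, r) pairs whose expression tracks the enhancer, best first."""
    scored = [(gene, pearson(enhancer_profile, expr))
              for gene, expr in gene_profiles.items()]
    return sorted((pair for pair in scored if pair[1] >= min_r),
                  key=lambda pair: -pair[1])


# Hypothetical profiles across five cell types: 1 means the enhancer state is
# called in that cell type, and the gene values are made-up expression levels.
enhancer = [1, 0, 0, 1, 0]
genes = {
    "GENE_A": [9.0, 1.0, 2.0, 8.0, 1.5],  # high where the enhancer is active
    "GENE_B": [3.0, 3.1, 2.9, 3.0, 3.2],  # roughly constant everywhere
}
links = link_enhancer(enhancer, genes)
```

Only `GENE_A` passes the cutoff here, because its expression rises in the same cell types where the enhancer is active. Any real application would also need to restrict candidates by genomic distance and assess significance, which this sketch leaves out.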

2,646 citations