ChromHMM: automating chromatin-state discovery and characterization

TL;DR: ChromHMM is an automated computational system for learning chromatin states, characterizing their biological functions and correlations with large-scale functional datasets, and visualizing the resulting genome-wide maps of chromatin state annotations.
Abstract: Chromatin state annotation using combinations of chromatin modification patterns has emerged as a powerful approach for discovering regulatory regions and their cell-type-specific activity patterns, and for interpreting disease-association studies (refs. 1-5). However, the computational challenge of learning chromatin state models from large numbers of chromatin modification datasets in multiple cell types still requires extensive bioinformatics expertise, making it inaccessible to the wider scientific community. To address this challenge, we have developed ChromHMM, an automated computational system for learning chromatin states, characterizing their biological functions and correlations with large-scale functional datasets, and visualizing the resulting genome-wide maps of chromatin state annotations.
Citations
01 Aug 2000
TL;DR: Assessment of medical technology in the context of commercialization with Bioentrepreneur course, which addresses many issues unique to biomedical products.
Abstract: BIOE 402. Medical Technology Assessment. 2 or 3 hours. Bioentrepreneur course. Assessment of medical technology in the context of commercialization. Objectives, competition, market share, funding, pricing, manufacturing, growth, and intellectual property; many issues unique to biomedical products. Course Information: 2 undergraduate hours. 3 graduate hours. Prerequisite(s): Junior standing or above and consent of the instructor.

4,833 citations

01 Feb 2015
TL;DR: In this article, the authors describe the integrative analysis of 111 reference human epigenomes generated as part of the NIH Roadmap Epigenomics Consortium, profiled for histone modification patterns, DNA accessibility, DNA methylation and RNA expression.
Abstract: The reference human genome sequence set the stage for studies of genetic variation and its association with human disease, but epigenomic studies lack a similar reference. To address this need, the NIH Roadmap Epigenomics Consortium generated the largest collection so far of human epigenomes for primary cells and tissues. Here we describe the integrative analysis of 111 reference human epigenomes generated as part of the programme, profiled for histone modification patterns, DNA accessibility, DNA methylation and RNA expression. We establish global maps of regulatory elements, define regulatory modules of coordinated activity, and their likely activators and repressors. We show that disease- and trait-associated genetic variants are enriched in tissue-specific epigenomic marks, revealing biologically relevant cell types for diverse human traits, and providing a resource for interpreting the molecular basis of human disease. Our results demonstrate the central role of epigenomic information for understanding gene regulation, cellular differentiation and human disease.

4,409 citations

Journal Article
TL;DR: In this article, a multivariate Hidden Markov Model was used to reveal chromatin states in human T cells, based on recurrent and spatially coherent combinations of chromatin marks.
Abstract: A plethora of epigenetic modifications have been described in the human genome and shown to play diverse roles in gene regulation, cellular differentiation and the onset of disease. Although individual modifications have been linked to the activity levels of various genetic functional elements, their combinatorial patterns are still unresolved and their potential for systematic de novo genome annotation remains untapped. Here, we use a multivariate Hidden Markov Model to reveal chromatin states in human T cells, based on recurrent and spatially coherent combinations of chromatin marks. We define 51 distinct chromatin states, including promoter-associated, transcription-associated, active intergenic, large-scale repressed and repeat-associated states. Each chromatin state shows specific enrichments in functional annotations, sequence motifs and specific experimentally observed characteristics, suggesting distinct biological roles. This approach provides a complementary functional annotation of the human genome that reveals the genome-wide locations of diverse classes of epigenetic function.

720 citations

01 Nov 2017
TL;DR: ChromHMM combines multiple genome-wide epigenomic maps, and uses combinatorial and spatial mark patterns to infer a complete annotation for each cell type, and provides an automated enrichment analysis of the resulting annotations to facilitate the functional interpretations of each chromatin state.
Abstract: Noncoding DNA regions have central roles in human biology, evolution, and disease. ChromHMM helps to annotate the noncoding genome using epigenomic information across one or multiple cell types. It combines multiple genome-wide epigenomic maps, and uses combinatorial and spatial mark patterns to infer a complete annotation for each cell type. ChromHMM learns chromatin-state signatures using a multivariate hidden Markov model (HMM) that explicitly models the combinatorial presence or absence of each mark. ChromHMM uses these signatures to generate a genome-wide annotation for each cell type by calculating the most probable state for each genomic segment. ChromHMM provides an automated enrichment analysis of the resulting annotations to facilitate the functional interpretations of each chromatin state. ChromHMM is distinguished by its modeling emphasis on combinations of marks, its tight integration with downstream functional enrichment analyses, its speed, and its ease of use. Chromatin states are learned, annotations are produced, and enrichments are computed within 1 d.
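To make the modeling idea in this abstract concrete, here is a minimal Python sketch of a multivariate HMM over binary marks. It is an illustration under stated assumptions, not ChromHMM's actual implementation: the function names (`emission_prob`, `posterior_decode`), the two toy states, and all probabilities are hypothetical. It assumes each mark is emitted as an independent Bernoulli variable given the state, and it takes "most probable state" to mean the state with the highest forward-backward posterior in each genomic bin.

```python
from math import prod

def emission_prob(observed, mark_probs):
    """P(observed marks | state), assuming independent Bernoulli marks."""
    return prod(p if x else 1.0 - p for x, p in zip(observed, mark_probs))

def posterior_decode(observations, init, trans, emit):
    """Return the index of the most probable state for each bin."""
    n_states, n = len(init), len(observations)
    # Emission likelihood of every bin under every state.
    e = [[emission_prob(obs, emit[k]) for k in range(n_states)]
         for obs in observations]

    # Forward pass, rescaled at each step to avoid underflow.
    fwd = [[init[k] * e[0][k] for k in range(n_states)]]
    total = sum(fwd[0])
    fwd[0] = [v / total for v in fwd[0]]
    for t in range(1, n):
        cur = [e[t][k] * sum(fwd[t - 1][j] * trans[j][k] for j in range(n_states))
               for k in range(n_states)]
        total = sum(cur)
        fwd.append([v / total for v in cur])

    # Backward pass, also rescaled (the scale cancels in the argmax).
    bwd = [[1.0] * n_states for _ in range(n)]
    for t in range(n - 2, -1, -1):
        cur = [sum(trans[k][j] * e[t + 1][j] * bwd[t + 1][j] for j in range(n_states))
               for k in range(n_states)]
        total = sum(cur)
        bwd[t] = [v / total for v in cur]

    # Posterior is proportional to forward * backward; pick the best state.
    return [max(range(n_states), key=lambda k: fwd[t][k] * bwd[t][k])
            for t in range(n)]

# Hypothetical toy model: state 0 is mark-rich, state 1 is mark-poor.
init = [0.5, 0.5]
trans = [[0.9, 0.1], [0.1, 0.9]]
emit = [[0.9, 0.8], [0.05, 0.1]]   # P(mark present | state), two marks
obs = [(1, 1), (1, 0), (0, 0), (0, 0)]  # presence/absence calls per bin

states = posterior_decode(obs, init, trans, emit)
print(states)  # prints [0, 0, 1, 1]
```

In this toy run the second bin carries only one of the two marks, yet it is still assigned to the mark-rich state because the transition probabilities favor staying in the same state as the neighboring bin. This is the sense in which the model uses spatial as well as combinatorial information.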

364 citations

Feng Yue, Yong Cheng, Alessandra Breschi, Jeff Vierstra, Weisheng Wu, Tyrone Ryba, Richard Sandstrom, Zhihai Ma, Carrie A. Davis, Benjamin D. Pope, Yin Shen, Dmitri D. Pervouchine, Sarah Djebali, Robert E. Thurman, Rajinder Kaul, Eric Rynes, Anthony Kirilusha, Georgi K. Marinov, Brian A. Williams, Diane Trout, Henry Amrhein, Katherine I. Fisher-Aylor, Igor Antoshechkin, Gilberto DeSalvo, Lei Hoon See, Meagan Fastuca, Jorg Drenkow, Chris Zaleski, Alexander Dobin, Pablo Prieto, Julien Lagarde, Giovanni Bussotti, Andrea Tanzer, Olgert Denas, Kanwei Li, M. A. Bender, Miaohua Zhang, Rachel Byron, Mark Groudine, David McCleary, Long Pham, Zhen Ye, Samantha Kuan, Lee Edsall, Yi-Chieh Wu, Matthew D. Rasmussen, Mukul S. Bansal, Manolis Kellis, Cheryl A. Keller, Christapher S. Morrissey, Tejaswini Mishra, Deepti Jain, Nergiz Dogan, Robert S. Harris, Philip Cayting, Trupti Kawli, Alan P. Boyle, Ghia Euskirchen, Anshul Kundaje, Shin Lin, Yiing Lin, Camden Jansen, Venkat S. Malladi, Melissa S. Cline, Drew T. Erickson, Vanessa M. Kirkup, Katrina Learned, Cricket A. Sloan, Kate R. Rosenbloom, Beatriz Lacerda de Sousa, Kathryn Beal, Miguel Pignatelli, Paul Flicek, Jin Lian, Tamer Kahveci, Dongwon Lee, W. James Kent, Miguel Santos, Javier Herrero, Cedric Notredame, Audra K. Johnson, Shinny Vong, Kristen Lee, Daniel Bates, Fidencio Neri, Morgan Diegel, Theresa K. Canfield, Peter J. Sabo, Matthew S. Wilken, Thomas A. Reh, Erika Giste, Anthony Shafer, Tanya Kutyavin, Eric Haugen, Douglas Dunn, Alex Reynolds, Shane Neph, Richard Humbert, R. Scott Hansen, Marella F. T. R. de Bruijn, Licia Selleri, Alexander Y. Rudensky, Steven Z. Josefowicz, Robert M. Samstein, Evan E. Eichler, Stuart H. Orkin, Dana N. Levasseur, Thalia Papayannopoulou, Kai Hsin Chang, Arthur I. Skoultchi, Srikanta Gosh, Christine M. Disteche, Piper M. Treuting, Yanli Wang, Mitchell J. Weiss, Gerd A. Blobel, Xiaoyi Cao, Sheng Zhong, Ting Wang, Peter J. Good, Rebecca F. Lowdon, Leslie B. Adams, Xiao Qiao Zhou, Michael J. Pazin, Elise A. Feingold, Barbara J. Wold, James Taylor, Ali Mortazavi, Sherman M. Weissman, John A. Stamatoyannopoulos, Michael Snyder, Roderic Guigó, Thomas R. Gingeras, David M. Gilbert, Ross C. Hardison, Michael A. Beer, Bing Ren
01 Nov 2014
TL;DR: By comparing with the human genome, this work not only confirms substantial conservation in the newly annotated potential functional sequences, but also finds a large degree of divergence of sequences involved in transcriptional regulation, chromatin state and higher order chromatin organization.
Abstract: The laboratory mouse shares the majority of its protein-coding genes with humans, making it the premier model organism in biomedical research, yet the two mammals differ in significant ways. To gain greater insights into both shared and species-specific transcriptional and cellular regulatory programs in the mouse, the Mouse ENCODE Consortium has mapped transcription, DNase I hypersensitivity, transcription factor binding, chromatin modifications and replication domains throughout the mouse genome in diverse cell and tissue types. By comparing with the human genome, we not only confirm substantial conservation in the newly annotated potential functional sequences, but also find a large degree of divergence of sequences involved in transcriptional regulation, chromatin state and higher order chromatin organization. Our results illuminate the wide range of evolutionary forces acting on genes and their regulatory regions, and provide a general resource for research into mammalian biology and mechanisms of human diseases.

226 citations

References
Journal ArticleDOI
TL;DR: A mature web tool for rapid and reliable display of any requested portion of the genome at any scale, together with several dozen aligned annotation tracks, is provided at http://genome.ucsc.edu.
Abstract: As vertebrate genome sequences near completion and research refocuses to their analysis, the issue of effective genome annotation display becomes critical. A mature web tool for rapid and reliable display of any requested portion of the genome at any scale, together with several dozen aligned annotation tracks, is provided at http://genome.ucsc.edu. This browser displays assembly contigs and gaps, mRNA and expressed sequence tag alignments, multiple gene predictions, cross-species homologies, single nucleotide polymorphisms, sequence-tagged sites, radiation hybrid data, transposon repeats, and more as a stack of coregistered tracks. Text and sequence-based searches provide quick and precise access to any region of specific interest. Secondary links from individual features lead to sequence details and supplementary off-site databases. One-half of the annotation tracks are computed at the University of California, Santa Cruz from publicly available sequence data; collaborators worldwide provide the rest. Users can stably add their own custom tracks to the browser for educational or research purposes. The conceptual and technical framework of the browser, its underlying MySQL database, and overall use are described. The web site currently serves over 50,000 pages per day to over 3000 different users.

9,605 citations

Journal ArticleDOI
TL;DR: Monocle is described, an unsupervised algorithm that increases the temporal resolution of transcriptome dynamics using single-cell RNA-Seq data collected at multiple time points that revealed switch-like changes in expression of key regulatory factors, sequential waves of gene regulation, and expression of regulators that were not known to act in differentiation.
Abstract: Defining the transcriptional dynamics of a temporal process such as cell differentiation is challenging owing to the high variability in gene expression between individual cells. Time-series gene expression analyses of bulk cells have difficulty distinguishing early and late phases of a transcriptional cascade or identifying rare subpopulations of cells, and single-cell proteomic methods rely on a priori knowledge of key distinguishing markers. Here we describe Monocle, an unsupervised algorithm that increases the temporal resolution of transcriptome dynamics using single-cell RNA-Seq data collected at multiple time points. Applied to the differentiation of primary human myoblasts, Monocle revealed switch-like changes in expression of key regulatory factors, sequential waves of gene regulation, and expression of regulators that were not known to act in differentiation. We validated some of these predicted regulators in a loss-of-function screen. Monocle can in principle be used to recover single-cell gene expression kinetics from a wide array of cellular processes, including differentiation, proliferation and oncogenic transformation.

4,119 citations

Journal ArticleDOI
05 May 2011-Nature
TL;DR: This study presents a general framework for deciphering cis-regulatory connections and their roles in disease, and maps nine chromatin marks across nine cell types to systematically characterize regulatory elements, their cell-type specificities and their functional interactions.
Abstract: Chromatin profiling has emerged as a powerful means of genome annotation and detection of regulatory activity. The approach is especially well suited to the characterization of non-coding portions of the genome, which critically contribute to cellular phenotypes yet remain largely uncharted. Here we map nine chromatin marks across nine cell types to systematically characterize regulatory elements, their cell-type specificities and their functional interactions. Focusing on cell-type-specific patterns of promoters and enhancers, we define multicell activity profiles for chromatin state, gene expression, regulatory motif enrichment and regulator expression. We use correlations between these profiles to link enhancers to putative target genes, and predict the cell-type-specific activators and repressors that modulate them. The resulting annotations and regulatory predictions have implications for the interpretation of genome-wide association studies. Top-scoring disease single nucleotide polymorphisms are frequently positioned within enhancer elements specifically active in relevant cell types, and in some cases affect a motif instance for a predicted regulator, thus suggesting a mechanism for the association. Our study presents a general framework for deciphering cis-regulatory connections and their roles in disease.

2,646 citations