Open Access Journal Article

Discovery of cell-type specific DNA motif grammar in cis-regulatory elements using Random Forest

TLDR
A bioinformatics method to systematically identify DNA motifs in TF binding sites across multiple cell types based on published ChIP-seq data supports the hypothesis that combinatorial TF motif patterns are cell-type specific.
Abstract
It has been observed that many transcription factors (TFs) bind to different genomic loci depending on the cell type in which they are expressed, even though an individual TF usually binds to the same core motif in different cell types. How a TF can bind to the genome in such a highly cell-type specific manner is a critical research question. One hypothesis is that a TF requires co-binding with different TFs in different cell types. If this is the case, it may be possible to observe different combinations of TF motifs (a motif grammar) at the TF binding sites in different cell types. In this study, we develop a bioinformatics method to systematically identify DNA motifs in TF binding sites across multiple cell types based on published ChIP-seq data, and address two questions: (1) can we build a machine learning classifier to predict cell-type specificity based on motif combinations alone, and (2) can we extract meaningful cell-type specific motif grammars from this classifier model? We present a Random Forest (RF) based approach to build a multi-class classifier that predicts the cell-type specificity of a TF binding site given its motif content. We applied this RF classifier to two published ChIP-seq datasets, for the TFs TCF7L2 and MAX, across multiple cell types. Using cross-validation, we show that motif combinations alone are indeed predictive of cell type. Furthermore, we present a rule mining approach to extract the most discriminatory rules from the RF classifier, which allows us to discover the underlying cell-type specific motif grammar. Our bioinformatics analysis supports the hypothesis that combinatorial TF motif patterns are cell-type specific.
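The motif-encoding and rule-mining steps described in the abstract can be illustrated with a small pure-Python sketch. It assumes each binding site is encoded as binary motif presence (scanning both strands with IUPAC consensus strings) and that a tree splitting on a binary feature sends motif-absent sites one way and motif-present sites the other. Each root-to-leaf path then reads as a conjunctive motif rule, scored here by its frequency (fraction of sites it covers) and error (fraction of covered sites from a different cell type). The consensus strings, toy sequences, helper names, and the hand-written tree are all hypothetical. The study's actual motif set, forest training, and rule-selection criteria are not reproduced here; in practice the trees would come from a trained Random Forest, for example scikit-learn's RandomForestClassifier.

```python
import re

# Illustrative IUPAC consensus motifs (assumptions, not the study's motif set).
MOTIFS = {"EBOX": "CACGTG", "GATA": "WGATAR", "TCF": "WWCAAAG"}

IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "[AG]", "Y": "[CT]",
         "W": "[AT]", "S": "[CG]", "K": "[GT]", "M": "[AC]", "N": "[ACGT]"}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")
# One compiled regex per motif, built by expanding each IUPAC code.
PATTERNS = {name: re.compile("".join(IUPAC[b] for b in cons))
            for name, cons in MOTIFS.items()}


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def motif_features(seq):
    """Binary presence (1/0) of each motif on either strand of one binding site."""
    fwd = seq.upper()
    rev = reverse_complement(fwd)
    return {name: int(bool(p.search(fwd) or p.search(rev)))
            for name, p in PATTERNS.items()}


def extract_rules(node, path=()):
    """Yield (conditions, label) for each root-to-leaf path of a binary-split tree."""
    if "label" in node:
        yield list(path), node["label"]
        return
    motif = node["motif"]
    yield from extract_rules(node["absent"], path + ((motif, 0),))
    yield from extract_rules(node["present"], path + ((motif, 1),))


def score_rule(conditions, label, rows, labels):
    """Return (frequency, error): the share of sites the rule covers, and the
    share of covered sites whose cell type differs from the rule's prediction."""
    covered = [y for x, y in zip(rows, labels)
               if all(x[m] == v for m, v in conditions)]
    if not covered:
        return 0.0, 1.0  # by convention, a rule covering nothing gets the worst error
    return len(covered) / len(labels), sum(y != label for y in covered) / len(covered)


# Toy labelled binding sites (made up for illustration only).
sites = [("TTCACGTGAAGATAAG", "cellA"), ("GCACGTGTTATCTG", "cellA"),
         ("AACAAAGTTCACGTGA", "cellB"), ("TTCAAAGCC", "cellB"),
         ("CACGTGAGATAG", "cellB")]
rows = [motif_features(s) for s, _ in sites]
labels = [y for _, y in sites]

# A hand-written tree standing in for one tree of a trained forest.
tree = {"motif": "EBOX", "absent": {"label": "cellB"},
        "present": {"motif": "GATA", "absent": {"label": "cellB"},
                    "present": {"label": "cellA"}}}

for conditions, label in extract_rules(tree):
    freq, err = score_rule(conditions, label, rows, labels)
    print(conditions, "->", label, f"freq={freq:.2f} err={err:.2f}")
```

With a real forest, rules from all trees would be pooled and ranked per cell type, keeping those with high frequency and low error; the second toy site also shows why both strands are scanned, since its GATA match occurs only on the reverse strand.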
