Open Access Journal Article

Discovering Motifs in Ranked Lists of DNA Sequences

TLDR
The statistical framework is implemented in DRIM (discovery of rank imbalanced motifs), a software tool that identifies sequence motifs in ranked lists of DNA sequences and is shown to be highly effective for identifying regulatory sequence elements in a variety of applications.
Abstract
Computational methods for discovering sequence elements that are enriched in a target set compared with a background set are fundamental in molecular biology research. One example is the discovery of transcription factor binding motifs inferred from ChIP–chip (chromatin immunoprecipitation on a microarray) measurements. Several major challenges in sequence motif discovery still require consideration: (i) the need for a principled approach to partitioning the data into target and background sets; (ii) the lack of rigorous models and of an exact p-value for measuring motif enrichment; (iii) the need for an appropriate framework for accounting for motif multiplicity; (iv) the tendency, in many of the existing methods, to report presumably significant motifs even when applied to randomly generated data. In this paper we present a statistical framework for discovering enriched sequence elements in ranked lists that resolves these four issues. We demonstrate the implementation of this framework in a software application, termed DRIM (discovery of rank imbalanced motifs), which identifies sequence motifs in lists of ranked DNA sequences. We applied DRIM to ChIP–chip and CpG methylation data and obtained the following results. (i) Identification of 50 novel putative transcription factor (TF) binding sites in yeast ChIP–chip data. The biological function of some of them was further investigated to gain new insights into transcription regulation networks in yeast. For example, our discoveries enable the elucidation of the network of the TF ARO80. Another finding concerns a systematic enhancement of TF binding to sequences containing CA repeats. (ii) Discovery of novel motifs in human cancer CpG methylation data. Remarkably, most of these motifs are similar to DNA sequence elements bound by the Polycomb complex that promotes histone methylation. Our findings thus support a model in which histone methylation and CpG methylation are mechanistically linked.
Overall, we demonstrate that the statistical framework embodied in the DRIM software tool is highly effective for identifying regulatory sequence elements in a variety of applications ranging from expression and ChIP–chip to CpG methylation data. DRIM is publicly available at http://bioinfo.cs.technion.ac.il/drim.
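The abstract does not spell out the enrichment statistic, so the following is only a minimal sketch of one threshold-free way to score a motif over a ranked list, assuming a minimum-hypergeometric (mHG) style score: each ranked sequence is labeled 1 if it contains the motif, the hypergeometric tail probability is computed at every possible cutoff between a "target" set (the top n sequences) and a "background" set (the rest), and the smallest tail is kept. Because that minimum is optimized over all cutoffs, it is not itself a p-value; the sketch obtains an exact p-value by counting lattice paths of 0/1 labelings that never reach a score at or below the observed one. The function names (`hg_tail`, `occurrence_vector`, `mhg`, `mhg_pvalue`), the exact-substring motif match, and the toy sequences are hypothetical and illustrative, not the DRIM code; correcting for the many candidate motifs tested (issue iii) is not shown.

```python
from fractions import Fraction
from math import comb

# Hypothetical helper names throughout; an illustrative sketch,
# not the DRIM implementation.


def hg_tail(N: int, B: int, n: int, b: int) -> Fraction:
    """Exact P(X >= b) for X ~ Hypergeometric(N items, B of them hits, n drawn)."""
    num = sum(comb(B, i) * comb(N - B, n - i) for i in range(b, min(n, B) + 1))
    return Fraction(num, comb(N, n))


def occurrence_vector(ranked_seqs: list[str], motif: str) -> list[int]:
    """1 if the motif occurs as an exact substring (a simplifying assumption), else 0."""
    motif = motif.upper()
    return [1 if motif in s.upper() else 0 for s in ranked_seqs]


def mhg(labels: list[int]) -> tuple[Fraction, int]:
    """Minimum hypergeometric tail over every top-n cutoff of a ranked 0/1 list.
    Returns (score, best cutoff n)."""
    N, B = len(labels), sum(labels)
    best, best_n, b = Fraction(1), 0, 0
    for n, hit in enumerate(labels, start=1):
        b += hit  # hits among the top n sequences
        p = hg_tail(N, B, n, b)
        if p < best:
            best, best_n = p, n
    return best, best_n


def mhg_pvalue(N: int, B: int, threshold: Fraction) -> Fraction:
    """Exact P(score <= threshold) when the B hits are placed uniformly at random
    among N ranks: 1 minus the fraction of 0/1 labelings whose prefix path never
    enters a cell (n, b) with hg_tail <= threshold."""
    f = [1] + [0] * B  # f[b]: surviving prefixes of the current length with b hits
    for n in range(1, N + 1):
        new = [0] * (B + 1)
        for b in range(min(n, B) + 1):
            ways = f[b] + (f[b - 1] if b > 0 else 0)
            if ways and hg_tail(N, B, n, b) <= threshold:
                ways = 0  # these prefixes already attain a score <= threshold
            new[b] = ways
        f = new
    return 1 - Fraction(f[B], comb(N, B))


# Toy ranked list (made up): the three top-ranked sequences contain a CA repeat.
ranked = ["ACACACGT", "TTCACAGG", "GCACATTA", "ATGCGTAA",
          "GGTTAACC", "TTGGCCAA", "AATTCCGG", "CGCGTATA"]
labels = occurrence_vector(ranked, "CACA")
score, cutoff = mhg(labels)
print(score, cutoff, mhg_pvalue(len(labels), sum(labels), score))  # prints: 1/56 3 1/56
```

Exact rational arithmetic (`fractions.Fraction`) is used so that ties at the threshold compare exactly; for long ranked lists a practical tool would need a faster floating-point or log-space version of the same calculation.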

