Journal Article

An experimental comparison of PMSprune and other algorithms for motif search

TL;DR: It is observed that both PMSprune and DME (an algorithm based on position-specific score matrices), in general, perform better than the 13 algorithms reported in Tompa et al.
Abstract: A comparative study of the various motif search algorithms is very important for several reasons. For example, we could identify the strengths and weaknesses of each. As a result, we might be able to devise hybrids that will perform better than the individual components. In this paper, we (either directly or indirectly) compare the performance of PMSprune (an algorithm based on the (l, d)-motif model) and several other algorithms in terms of seven measures and using well-established benchmarks. We have employed several benchmark datasets including the one used by Tompa et al. It is observed that both PMSprune and DME (an algorithm based on position-specific score matrices), in general, perform better than the 13 algorithms reported in Tompa et al. Subsequently, we have compared PMSprune and DME on other benchmark datasets including ChIP-Chip, ChIP-Seq and ABS. Between PMSprune and DME, PMSprune performs better than DME on six measures. DME performs better than PMSprune on one measure (namely, specificity).
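The abstract names specificity but does not list all seven measures. As a hedged illustration only, the sketch below computes three standard nucleotide-level scores (sensitivity, specificity, and positive predictive value) from sets of true and predicted binding-site positions. The function name `nucleotide_level_scores` and its parameters are hypothetical and are not taken from the paper.

```python
def nucleotide_level_scores(true_sites, predicted_sites, total_positions):
    """Compare predicted motif positions against known ones.

    true_sites / predicted_sites: iterables of nucleotide positions (ints).
    total_positions: number of nucleotide positions in the dataset.
    All names here are illustrative assumptions, not the paper's code.
    """
    true_sites, predicted_sites = set(true_sites), set(predicted_sites)
    tp = len(true_sites & predicted_sites)   # predicted and truly in a site
    fp = len(predicted_sites - true_sites)   # predicted but not in a site
    fn = len(true_sites - predicted_sites)   # in a site but missed
    tn = total_positions - tp - fp - fn      # correctly left unpredicted

    def ratio(num, den):
        # Avoid division by zero when a class is empty.
        return num / den if den else 0.0

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
    }


scores = nucleotide_level_scores(range(10, 20), range(15, 25), 100)
print(scores)
```

Because most positions in a promoter dataset lie outside any binding site, specificity tends to be high for any conservative predictor, which is one reason such comparisons report several measures side by side.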
Citations
Journal Article
TL;DR: A fast algorithm is proposed that can solve the well-known challenging instances of PMS: (21, 8) and (23, 9).
Abstract: Background: Motifs are patterns found in biological sequences that are vital for understanding gene function, human disease, drug design, etc. They are helpful in finding transcriptional regulatory elements, transcription factor binding sites, and so on. As a result, the problem of identifying motifs is crucial in biology.

64 citations

Journal Article
TL;DR: This paper shows how the optimal value of q is determined to achieve the best running time and presents a new efficient method to improve the performance of the exact algorithms for the motif finding problem.
Abstract: Background: Given a set of DNA sequences s1, ..., st, the (l, d) motif problem is to find an l-length motif sequence M, not necessarily occurring in any of the input sequences, such that for each sequence si, 1 ≤ i ≤ t, there is at least one l-length substring that differs from M in at most d positions. Many exact algorithms have been developed to solve the motif finding problem in the last three decades. However, the problem is still challenging and its solution is limited to small values of l and d.
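The (l, d) motif definition above can be made concrete with a brute-force sketch. This is not the algorithm of any paper listed here. It simply enumerates every candidate l-mer over the DNA alphabet and keeps those within Hamming distance d of some l-length substring of every input sequence, so it is only practical for very small l. The function names are hypothetical.

```python
from itertools import product


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def occurs_within(motif, seq, d):
    """True if some l-length substring of seq is within d mismatches of motif."""
    l = len(motif)
    return any(hamming(motif, seq[i:i + l]) <= d
               for i in range(len(seq) - l + 1))


def brute_force_motifs(seqs, l, d, alphabet="ACGT"):
    """Enumerate all |alphabet|**l candidates; keep those that occur
    (up to d mismatches) in every sequence. Exponential in l."""
    found = []
    for chars in product(alphabet, repeat=l):
        candidate = "".join(chars)
        if all(occurs_within(candidate, s, d) for s in seqs):
            found.append(candidate)
    return found


print(brute_force_motifs(["ACGTAC", "TTACGA", "GACGTT"], 3, 0))  # ['ACG']
# A motif need not appear verbatim in any input sequence:
print("AAA" in brute_force_motifs(["AAT", "ATA", "TAA"], 3, 1))  # True
```

The second call illustrates the "not necessarily occurring in any of the input sequences" clause: AAA is one mismatch away from each input but equals none of them.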

19 citations

Proceedings Article
01 Dec 2016
TL;DR: This paper presents qPMS10, an elegant and efficient randomized algorithm for solving PMS, and demonstrates that it outperforms the existing algorithms for solving PMS.
Abstract: Discovering patterns in biological sequences is very important to extract useful information from them. Motifs are crucial patterns that have numerous applications including the identification of transcription factors and their binding sites, composite regulatory patterns, similarity between families of proteins, etc. Several models of motifs have been proposed in the literature. The (l, d)-motif model is one of these that has been studied widely. The (l, d)-motif search problem is also known as Planted Motif Search (PMS). The general problem of PMS has been proven to be NP-hard. In this paper, we present an elegant as well as efficient randomized algorithm, named qPMS10, to solve PMS. Currently, the best known algorithm for solving PMS is qPMS9 and it can solve challenging (l, d)-motif instances up to (28, 12) and (30, 13). qPMS9 is a deterministic algorithm. We provide a performance comparison of qPMS10 with qPMS9 on standard benchmark datasets. Both theoretical and empirical analyses demonstrate that our randomized algorithm outperforms the existing algorithms for solving PMS. Besides, the random sampling techniques we employ in our algorithm can also be extended to solve other motif search problems including Simple Motif Search (SMS) and Edit-distance based Motif Search (EMS). Furthermore, our algorithm can be parallelized efficiently and has the potential of yielding great speedups on multi-core machines.

12 citations

Journal Article
TL;DR: These analyses provide the first genome-wide profiling of DNA hydroxymethylation of the frontal cortex of AD patients from China, emphasizing an important role of 5hmC in AD pathogenesis and highlighting both ethnicity-specific and overlapping changes of brain hydroxymethylome in Alzheimer's disease.
Abstract: 5-Methylcytosine (5mC), generated through the covalent addition of a methyl group to the fifth carbon of cytosine, is the most prevalent DNA modification in humans and functions as a critical player in the regulation of tissue and cell-specific gene expression. 5mC can be oxidized by ten-eleven translocation (TET) enzymes to 5-hydroxymethylcytosine (5hmC), which is enriched in the brain. Alzheimer's disease (AD) is the most common neurodegenerative disorder, and several studies using samples collected from Caucasian cohorts have found that epigenetics, particularly cytosine methylation, could play a role in the etiological process of AD. However, little research has been conducted using samples from other ethnic groups. Here we generated genome-wide profiles of both 5mC and 5hmC in human frontal cortex tissues from late-onset Chinese AD patients and cognitively normal controls. We identified both Chinese-specific differentially hydroxymethylated regions (DhMRs) and DhMRs that overlap with those of Caucasian cohorts. Pathway analyses revealed specific pathways enriched among Chinese-specific DhMRs, as well as the shared DhMRs with Caucasian cohorts. Furthermore, two important transcription factor-binding motifs, hypoxia-inducible factor 2α (HIF2α) and hypoxia-inducible factor 1α (HIF1α), were enriched in the DhMRs. Our analyses provide the first genome-wide profiling of DNA hydroxymethylation of the frontal cortex of AD patients from China, emphasizing an important role of 5hmC in AD pathogenesis and highlighting both ethnicity-specific and overlapping changes of brain hydroxymethylome in AD.

8 citations

References
Journal Article
TL;DR: The purpose of the current assessment is to provide some guidance to users regarding the accuracy of currently available tools in various settings, and to provide a benchmark of data sets for assessing future tools.
Abstract: The prediction of regulatory elements is a problem where computational methods offer great hope. Over the past few years, numerous tools have become available for this task. The purpose of the current assessment is twofold: to provide some guidance to users regarding the accuracy of currently available tools in various settings, and to provide a benchmark of data sets for assessing future tools.

1,324 citations

Journal Article
TL;DR: Whole-genome mRNA quantitation is tested on three extensively studied regulatory systems in the yeast Saccharomyces cerevisiae (galactose response, heat shock, and mating type); the mating-type data yielded all four relevant DNA motifs and most of the known a- and α-specific genes.
Abstract: Whole-genome mRNA quantitation can be used to identify the genes that are most responsive to environmental or genotypic change. By searching for mutually similar DNA elements among the upstream non-coding DNA sequences of these genes, we can identify candidate regulatory motifs and corresponding candidate sets of coregulated genes. We have tested this strategy by applying it to three extensively studied regulatory systems in the yeast Saccharomyces cerevisiae: galactose response, heat shock, and mating type. Galactose-response data yielded the known binding site of Gal4, and six of nine genes known to be induced by galactose. Heat shock data yielded the cell-cycle activation motif, which is known to mediate cell-cycle dependent activation, and a set of genes coding for all four nucleosomal proteins. Mating type α and a data yielded all of the four relevant DNA motifs and most of the known a- and α-specific genes.

1,030 citations


"An experimental comparison of PMSpr..." refers to methods in this paper

  • ...AlignACE [RHEC98] uses the whole genome RNA quantitation to identify genes that are most responsive to environmental or genotypic change....

Journal Article
14 Sep 2001-Science
TL;DR: In this article, the authors assembled data from Caenorhabditis elegans DNA microarray experiments involving many growth conditions, developmental stages, and varieties of mutants and visualized the co-regulated genes in a three-dimensional expression map that displays correlations of gene expression profiles as distances in two dimensions and gene density in the third dimension.
Abstract: We have assembled data from Caenorhabditis elegans DNA microarray experiments involving many growth conditions, developmental stages, and varieties of mutants. Co-regulated genes were grouped together and visualized in a three-dimensional expression map that displays correlations of gene expression profiles as distances in two dimensions and gene density in the third dimension. The gene expression map can be used as a gene discovery tool to identify genes that are co-regulated with known sets of genes (such as heat shock, growth control genes, germ line genes, and so forth) or to uncover previously unknown genetic functions (such as genomic instability in males and sperm caused by specific transposons).

690 citations

Proceedings Article
19 Aug 2000
TL;DR: This work complements existing statistical and machine learning approaches to this problem by a combinatorial approach that proved to be successful in identifying very subtle signals in DNA sequences.
Abstract: Signal finding (pattern discovery in unaligned DNA sequences) is a fundamental problem in both computer science and molecular biology with important applications in locating regulatory sites and drug target identification. Despite many studies, this problem is far from being resolved: most signals in DNA sequences are so complicated that we don't yet have good models or reliable algorithms for their recognition. We complement existing statistical and machine learning approaches to this problem by a combinatorial approach that proved to be successful in identifying very subtle signals.

591 citations


"An experimental comparison of PMSpr..." refers to background or methods in this paper

  • ...MITRA [EP02] is based on WINNOWER [PS00]....

  • ...In every step it keeps only one value of the best neighbor but there could be a set of ℓ-mers which could be the best neighbors for u. MITRA [EP02] is based on WINNOWER [PS00]....

  • ...WINNOWER uses a technique to generate Extendable Cliques....

  • ...Some examples of algorithms proposed for LDMP are Random Projection [BT01], MITRA [EP02], Winnower [PS00], Pattern Branching [PRP03], PMS1, PMS2, PMS3 [RBH05], PMSP [DBR06], CENSUS [ES03] and Voting [CL05]....

  • ...The run time of this algorithm is O(N^{2d+1}), where N = nm. SP-STAR uses a technique which eliminates more edges than WINNOWER and hence is faster....

Journal Article
TL;DR: A novel motif-discovery algorithm, PROJECTION, is introduced, designed to enhance the performance of existing motif finders using random projections of the input's substrings, and is robust to nonuniform background sequence distributions and scales to larger amounts of sequence than that specified in the original challenge.
Abstract: The DNA motif discovery problem abstracts the task of discovering short, conserved sites in genomic DNA. Pevzner and Sze recently described a precise combinatorial formulation of motif discovery that motivates the following algorithmic challenge: find twenty planted occurrences of a motif of length fifteen in roughly twelve kilobases of genomic sequence, where each occurrence of the motif differs from its consensus in four randomly chosen positions. Such "subtle" motifs, though statistically highly significant, expose a weakness in existing motif-finding algorithms, which typically fail to discover them. Pevzner and Sze introduced new algorithms to solve their (15,4)-motif challenge, but these methods do not scale efficiently to more difficult problems in the same family, such as the (14,4)-, (16,5)-, and (18,6)-motif problems. We introduce a novel motif-discovery algorithm, PROJECTION, designed to enhance the performance of existing motif finders using random projections of the input's substrings. Experiments on synthetic data demonstrate that PROJECTION remedies the weakness observed in existing algorithms, typically solving the difficult (14,4)-, (16,5)-, and (18,6)-motif problems. Our algorithm is robust to nonuniform background sequence distributions and scales to larger amounts of sequence than that specified in the original challenge. A probabilistic estimate suggests that related motif-finding problems that PROJECTION fails to solve are in all likelihood inherently intractable. We also test the performance of our algorithm on realistic biological examples, including transcription factor binding sites in eukaryotes and ribosome binding sites in prokaryotes.
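As a rough illustration of the random-projection idea described in this abstract, the sketch below hashes every l-mer of the input by the characters at k randomly chosen positions. Occurrences of a planted motif whose mutations all fall outside the chosen positions land in the same bucket, so unusually full buckets are candidate starting points for refinement. This covers only the bucketing step, under assumed names and parameters. The refinement stage and the choice of k and of the bucket-size threshold are not shown, and nothing here is taken from the PROJECTION implementation.

```python
import random
from collections import defaultdict


def projection_buckets(seqs, l, k, rng):
    """Bucket all l-mers by their characters at k random positions.

    rng is a random.Random instance, passed in so runs are reproducible.
    Returns (positions, buckets), where buckets maps a k-character key
    to a list of (sequence index, start offset) pairs.
    """
    positions = sorted(rng.sample(range(l), k))
    buckets = defaultdict(list)
    for seq_index, seq in enumerate(seqs):
        for start in range(len(seq) - l + 1):
            key = "".join(seq[start + p] for p in positions)
            buckets[key].append((seq_index, start))
    return positions, buckets


positions, buckets = projection_buckets(
    ["ACGTACGT", "TTACGTAA"], l=4, k=2, rng=random.Random(0)
)
# Report the fullest bucket as a candidate motif neighbourhood.
best_key = max(buckets, key=lambda key: len(buckets[key]))
print(positions, best_key, buckets[best_key])
```

In practice the projection is repeated with many independent position sets, since any single choice of positions may hit the mutated positions of most planted occurrences.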

517 citations