An experimental comparison of PMSprune and other algorithms for motif search

doi:10.1504/IJBRA.2014.065242

Journal Article•DOI•

Dolly Sharma¹, Sanguthevar Rajasekaran², Sudipta Pathak²•Institutions (2)

Amity University¹, University of Connecticut²

01 Oct 2014-International Journal of Bioinformatics Research and Applications (Inderscience Publishers Ltd)-Vol. 10, Iss: 6, pp 559-573

TL;DR: It is observed that both PMSprune and DME (an algorithm based on position-specific score matrices), in general, perform better than the 13 algorithms reported in Tompa et al.

Abstract: A comparative study of the various motif search algorithms is very important for several reasons. For example, we could identify the strengths and weaknesses of each. As a result, we might be able to devise hybrids that will perform better than the individual components. In this paper, we (either directly or indirectly) compare the performance of PMSprune (an algorithm based on the (l, d)-motif model) and several other algorithms in terms of seven measures and using well-established benchmarks. We have employed several benchmark datasets including the one used by Tompa et al. It is observed that both PMSprune and DME (an algorithm based on position-specific score matrices), in general, perform better than the 13 algorithms reported in Tompa et al. Subsequently, we have compared PMSprune and DME on other benchmark datasets including ChIP-Chip, ChIP-Seq and ABS. Between PMSprune and DME, PMSprune performs better than DME on six measures. DME performs better than PMSprune on one measure (namely, specificity).
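The abstract names specificity but does not list the other six measures. As a rough illustration only, the sketch below computes a few standard confusion-matrix statistics that are commonly used when scoring predicted binding sites against known ones. The function name `confusion_measures`, the choice of statistics, and the example counts are assumptions for illustration, not the paper's exact definitions.

```python
import math


def confusion_measures(tp, fp, tn, fn):
    """Return common accuracy statistics from confusion-matrix counts.

    Hypothetical helper: the paper's seven measures are not spelled out
    in the abstract, so these are standard textbook definitions.
    """
    def ratio(num, den):
        # Guard against empty denominators (e.g. no predictions at all).
        return num / den if den else 0.0

    cc_den = math.sqrt((tp + fn) * (tn + fp) * (tp + fp) * (tn + fn))
    return {
        "sensitivity": ratio(tp, tp + fn),        # fraction of true sites found
        "specificity": ratio(tn, tn + fp),        # fraction of background left alone
        "ppv": ratio(tp, tp + fp),                # fraction of predictions that are right
        "performance_coefficient": ratio(tp, tp + fn + fp),
        "correlation_coefficient": ratio(tp * tn - fn * fp, cc_den),
    }


if __name__ == "__main__":
    # Made-up counts, purely to show the call.
    for name, value in confusion_measures(tp=8, fp=2, tn=85, fn=5).items():
        print(f"{name}: {value:.3f}")
```

A tool can score high on specificity while missing many true sites, which is why a comparison across several measures is more informative than any single one.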

Citations

Journal Article•DOI•

PMS5: an efficient exact algorithm for the (ℓ, d)-motif finding problem

Hieu Dinh¹, Sanguthevar Rajasekaran¹, Vamsi Kundeti¹•Institutions (1)

University of Connecticut¹

24 Oct 2011-BMC Bioinformatics

TL;DR: A fast algorithm is proposed that can solve the well-known challenging instances of PMS: (21, 8) and (23, 9).

Abstract: Background Motifs are patterns found in biological sequences that are vital for understanding gene function, human disease, drug design, etc. They are helpful in finding transcriptional regulatory elements, transcription factor binding sites, and so on. As a result, the problem of identifying motifs is very crucial in biology.

64 citations

Journal Article•DOI•

A hybrid method for the exact planted (l, d) motif finding problem and its parallelization.

Mostafa M. Abbas¹, Mohamed Abouelhoda²,³, Hazem M. Bahig•Institutions (3)

Sinai University¹, Nile University², Cairo University³

13 Dec 2012-BMC Bioinformatics

TL;DR: This paper shows how the optimal value of q is determined to achieve the best running time and presents a new efficient method to improve the performance of the exact algorithms for the motif finding problem.

Abstract: Background Given a set of DNA sequences s1, ..., st, the (l, d) motif problem is to find an l-length motif sequence M, not necessarily occurring in any of the input sequences, such that each sequence si, 1 ≤ i ≤ t, contains at least one substring of length l that differs from M in at most d positions. Many exact algorithms have been developed to solve the motif finding problem in the last three decades. However, the problem is still challenging and its solution is limited to small values of l and d.
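To make the (l, d) motif definition concrete, here is a minimal brute-force sketch that enumerates every candidate string of length l and keeps those that occur in every input sequence with at most d mismatches. It is not the hybrid method of this paper or any PMS algorithm; the names `hamming`, `occurs_within`, and `brute_force_motifs` are hypothetical, and the exhaustive enumeration (4^l candidates for DNA) is only practical for very small l.

```python
from itertools import product


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def occurs_within(motif, seq, d):
    """True if some length-l substring of seq is within d mismatches of motif."""
    l = len(motif)
    return any(
        hamming(motif, seq[i:i + l]) <= d
        for i in range(len(seq) - l + 1)
    )


def brute_force_motifs(seqs, l, d, alphabet="ACGT"):
    """Return every length-l string that is an (l, d) motif of all seqs."""
    motifs = []
    for chars in product(alphabet, repeat=l):
        candidate = "".join(chars)
        if all(occurs_within(candidate, s, d) for s in seqs):
            motifs.append(candidate)
    return motifs


if __name__ == "__main__":
    # Toy input, chosen only to illustrate the definition.
    seqs = ["AACGT", "TCGTA", "GCGTC"]
    print(brute_force_motifs(seqs, l=3, d=0))
```

Because the candidate need not appear verbatim in any input, the search space is the full set of l-length strings, which is what makes larger values of l and d hard.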

19 citations

Proceedings Article•DOI•

qPMS10: A randomized algorithm for efficiently solving quorum Planted Motif Search problem

Peng Xiao¹, Soumitra Pal¹, Sanguthevar Rajasekaran¹•Institutions (1)

University of Connecticut¹

01 Dec 2016

TL;DR: An elegant and efficient randomized algorithm, named qPMS10, is presented to solve PMS, and it is demonstrated that the randomized algorithm outperforms the existing algorithms for solving PMS.

Abstract: Discovering patterns in biological sequences is very important to extract useful information from them. Motifs are crucial patterns that have numerous applications including the identification of transcription factors and their binding sites, composite regulatory patterns, similarity between families of proteins, etc. Several models of motifs have been proposed in the literature. The (l, d)-motif model is one of these that has been studied widely. The (l, d)-motif search problem is also known as Planted Motif Search (PMS). The general problem of PMS has been proven to be NP-hard. In this paper, we present an elegant as well as efficient randomized algorithm, named qPMS10, to solve PMS. Currently, the best known algorithm for solving PMS is qPMS9 and it can solve challenging (l, d)-motif instances up to (28, 12) and (30, 13). qPMS9 is a deterministic algorithm. We provide a performance comparison of qPMS10 with qPMS9 on standard benchmark datasets. Both theoretical and empirical analyses demonstrate that our randomized algorithm outperforms the existing algorithms for solving PMS. Besides, the random sampling techniques we employ in our algorithm can also be extended to solve other motif search problems including Simple Motif Search (SMS) and Edit-distance based Motif Search (EMS). Furthermore, our algorithm can be parallelized efficiently and has the potential of yielding great speedups on multi-core machines.

12 citations

Journal Article•DOI•

Ethnicity-specific and overlapping alterations of brain hydroxymethylome in Alzheimer's disease

Lixia Qin¹, Qian Xu¹, Ziyi Li², Li Chen³, Yujing Li², Nannan Yang¹, Zhenhua Liu¹, Jifeng Guo, Lu Shen, Emily G. Allen², Chao Chen¹, Chao Ma⁴, Hao Wu², Xiongwei Zhu⁵, Peng Jin², Beisha Tang•Institutions (5)

Central South University¹, Emory University², Indiana University³, Peking Union Medical College⁴, Case Western Reserve University⁵

01 Jan 2020-Human Molecular Genetics

TL;DR: These analyses provide the first genome-wide profiling of DNA hydroxymethylation of the frontal cortex of AD patients from China, emphasizing an important role of 5hmC in AD pathogenesis and highlighting both ethnicity-specific and overlapping changes of brain hydroxymethylome in Alzheimer's disease.

Abstract: 5-Methylcytosine (5mC), generated through the covalent addition of a methyl group to the fifth carbon of cytosine, is the most prevalent DNA modification in humans and functions as a critical player in the regulation of tissue and cell-specific gene expression. 5mC can be oxidized to 5-hydroxymethylcytosine (5hmC) by ten-eleven translocation (TET) enzymes, which is enriched in brain. Alzheimer's disease (AD) is the most common neurodegenerative disorder, and several studies using the samples collected from Caucasian cohorts have found that epigenetics, particularly cytosine methylation, could play a role in the etiological process of AD. However, little research has been conducted using the samples of other ethnic groups. Here we generated genome-wide profiles of both 5mC and 5hmC in human frontal cortex tissues from late-onset Chinese AD patients and cognitively normal controls. We identified both Chinese-specific and overlapping differentially hydroxymethylated regions (DhMRs) with Caucasian cohorts. Pathway analyses revealed specific pathways enriched among Chinese-specific DhMRs, as well as the shared DhMRs with Caucasian cohorts. Furthermore, two important transcription factor-binding motifs, hypoxia-inducible factor 2α (HIF2α) and hypoxia-inducible factor 1α (HIF1α), were enriched in the DhMRs. Our analyses provide the first genome-wide profiling of DNA hydroxymethylation of the frontal cortex of AD patients from China, emphasizing an important role of 5hmC in AD pathogenesis and highlighting both ethnicity-specific and overlapping changes of brain hydroxymethylome in AD.

8 citations

References

Journal Article•DOI•

Assessing computational tools for the discovery of transcription factor binding sites.

Martin Tompa¹, Nan Li¹, Timothy L. Bailey², George M. Church³, Bart De Moor⁴, Eleazar Eskin⁵, Alexander V. Favorov⁶, Martin C. Frith⁷, Yutao Fu⁷, W. James Kent⁸, Vsevolod J. Makeev⁶, Andrei A. Mironov⁹, William Stafford Noble¹, Giulio Pavesi¹⁰, Graziano Pesole¹⁰, Mireille Régnier, Nicolas Simonis¹¹, Saurabh Sinha¹², Gert Thijs⁴, Jacques van Helden¹¹, Mathias Vandenbogaert, Zhiping Weng⁷, Christopher T. Workman⁵, Chun Ye⁵, Zhou Zhu³•Institutions (12)

University of Washington¹, University of Queensland², Harvard University³, Katholieke Universiteit Leuven⁴, University of California, San Diego⁵, Engelhardt Institute of Molecular Biology⁶, Boston University⁷, University of California, Santa Cruz⁸, Moscow State University⁹, University of Milan¹⁰, Université libre de Bruxelles¹¹, Rockefeller University¹²

01 Jan 2005-Nature Biotechnology

TL;DR: The purpose of the current assessment is to provide some guidance to users regarding the accuracy of currently available tools in various settings, and to provide a benchmark of data sets for assessing future tools.

Abstract: The prediction of regulatory elements is a problem where computational methods offer great hope. Over the past few years, numerous tools have become available for this task. The purpose of the current assessment is twofold: to provide some guidance to users regarding the accuracy of currently available tools in various settings, and to provide a benchmark of data sets for assessing future tools.

1,324 citations

Journal Article•DOI•

Finding DNA regulatory motifs within unaligned noncoding sequences clustered by whole-genome mRNA quantitation

Frederick P. Roth¹, Jason D. Hughes¹, Preston W. Estep¹, George M. Church¹•Institutions (1)

Harvard University¹

01 Oct 1998-Nature Biotechnology

TL;DR: A strategy based on whole-genome mRNA quantitation is tested on three extensively studied regulatory systems in the yeast Saccharomyces cerevisiae (galactose response, heat shock, and mating type); the mating type data yielded all four relevant DNA motifs and most of the known a- and α-specific genes.

Abstract: Whole-genome mRNA quantitation can be used to identify the genes that are most responsive to environmental or genotypic change. By searching for mutually similar DNA elements among the upstream non-coding DNA sequences of these genes, we can identify candidate regulatory motifs and corresponding candidate sets of coregulated genes. We have tested this strategy by applying it to three extensively studied regulatory systems in the yeast Saccharomyces cerevisiae: galactose response, heat shock, and mating type. Galactose-response data yielded the known binding site of Gal4, and six of nine genes known to be induced by galactose. Heat shock data yielded the cell-cycle activation motif, which is known to mediate cell-cycle dependent activation, and a set of genes coding for all four nucleosomal proteins. Mating type α and a data yielded all of the four relevant DNA motifs and most of the known a- and α-specific genes.

1,030 citations

"An experimental comparison of PMSpr..." refers methods in this paper

...AlignACE [RHEC98] uses the whole genome RNA quantitation to identify genes that are most responsive to environmental or genotypic change....
[...]

Journal Article•DOI•

A Gene Expression Map for Caenorhabditis elegans

Stuart K. Kim¹, Jim Lund¹, Moni Kiraly¹, Kyle Duke¹, Min Jiang¹, Joshua M. Stuart¹, Andreas Eizinger¹, Brian N. Wylie², George S. Davidson²•Institutions (2)

Stanford University¹, Sandia National Laboratories²

14 Sep 2001-Science

TL;DR: In this article, the authors assembled data from Caenorhabditis elegans DNA microarray experiments involving many growth conditions, developmental stages, and varieties of mutants and visualized the co-regulated genes in a three-dimensional expression map that displays correlations of gene expression profiles as distances in two dimensions and gene density in the third dimension.

Abstract: We have assembled data from Caenorhabditis elegans DNA microarray experiments involving many growth conditions, developmental stages, and varieties of mutants. Co-regulated genes were grouped together and visualized in a three-dimensional expression map that displays correlations of gene expression profiles as distances in two dimensions and gene density in the third dimension. The gene expression map can be used as a gene discovery tool to identify genes that are co-regulated with known sets of genes (such as heat shock, growth control genes, germ line genes, and so forth) or to uncover previously unknown genetic functions (such as genomic instability in males and sperm caused by specific transposons).

690 citations

Proceedings Article•

Combinatorial Approaches to Finding Subtle Signals in DNA Sequences

Pavel A. Pevzner¹, Sing-Hoi Sze•Institutions (1)

University of Southern California¹

19 Aug 2000

TL;DR: This work complements existing statistical and machine learning approaches to this problem by a combinatorial approach that proved to be successful in identifying very subtle signals in DNA sequences.

Abstract: Signal finding (pattern discovery in unaligned DNA sequences) is a fundamental problem in both computer science and molecular biology with important applications in locating regulatory sites and drug target identification. Despite many studies, this problem is far from being resolved: most signals in DNA sequences are so complicated that we don't yet have good models or reliable algorithms for their recognition. We complement existing statistical and machine learning approaches to this problem by a combinatorial approach that proved to be successful in identifying very subtle signals.

591 citations

"An experimental comparison of PMSpr..." refers background or methods in this paper

...MITRA [EP02] is based on WINNOWER [PS00]....
[...]
...In every step it keeps only one value of the best neighbor but there could be a set of ℓ-mers which could be the best neighbors for u. MITRA [EP02] is based on WINNOWER [PS00]....
[...]
...WINNOWER uses a technique to generate Extendable Cliques....
[...]
...Some examples of algorithms proposed for LDMP are Random Projection [BT01], MITRA [EP02], Winnower [PS00], Pattern Branching [PRP03], PMS1, PMS2, PMS3 [RBH05], PMSP [DBR06], CENSUS [ES03] and Voting [CL05]....
[...]
...The run time of this algorithm is O(N^(2d+1)), where N = nm. SP-STAR uses a technique which eliminates more edges than WINNOWER and hence is faster....
[...]

Journal Article•DOI•

Finding motifs using random projections.

Jeremy Buhler¹, Martin Tompa¹•Institutions (1)

University of Washington¹

01 Jan 2002-Journal of Computational Biology

TL;DR: A novel motif-discovery algorithm, PROJECTION, is introduced, designed to enhance the performance of existing motif finders using random projections of the input's substrings, and is robust to nonuniform background sequence distributions and scales to larger amounts of sequence than that specified in the original challenge.

Abstract: The DNA motif discovery problem abstracts the task of discovering short, conserved sites in genomic DNA. Pevzner and Sze recently described a precise combinatorial formulation of motif discovery that motivates the following algorithmic challenge: find twenty planted occurrences of a motif of length fifteen in roughly twelve kilobases of genomic sequence, where each occurrence of the motif differs from its consensus in four randomly chosen positions. Such "subtle" motifs, though statistically highly significant, expose a weakness in existing motif-finding algorithms, which typically fail to discover them. Pevzner and Sze introduced new algorithms to solve their (15,4)-motif challenge, but these methods do not scale efficiently to more difficult problems in the same family, such as the (14,4)-, (16,5)-, and (18,6)-motif problems. We introduce a novel motif-discovery algorithm, PROJECTION, designed to enhance the performance of existing motif finders using random projections of the input's substrings. Experiments on synthetic data demonstrate that PROJECTION remedies the weakness observed in existing algorithms, typically solving the difficult (14,4)-, (16,5)-, and (18,6)-motif problems. Our algorithm is robust to nonuniform background sequence distributions and scales to larger amounts of sequence than that specified in the original challenge. A probabilistic estimate suggests that related motif-finding problems that PROJECTION fails to solve are in all likelihood inherently intractable. We also test the performance of our algorithm on realistic biological examples, including transcription factor binding sites in eukaryotes and ribosome binding sites in prokaryotes.
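The core idea of projecting substrings can be sketched as follows: pick k of the l positions at random, hash every l-mer by the characters at those positions, and look for buckets that collect unusually many l-mers. This is only an illustration of the bucketing step under assumed names (`projection_buckets`, the parameter `k`, the toy sequences); it omits the repeated trials and the refinement that a full motif finder would apply to enriched buckets.

```python
import random
from collections import defaultdict


def projection_buckets(seqs, l, k, rng):
    """Hash every l-mer of seqs by k randomly chosen positions.

    Returns the chosen positions and a dict mapping each projected key
    to the (sequence index, offset) pairs of the l-mers that produced it.
    Hypothetical helper written for illustration.
    """
    positions = sorted(rng.sample(range(l), k))
    buckets = defaultdict(list)
    for seq_index, seq in enumerate(seqs):
        for offset in range(len(seq) - l + 1):
            lmer = seq[offset:offset + l]
            key = "".join(lmer[p] for p in positions)
            buckets[key].append((seq_index, offset))
    return positions, buckets


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    seqs = ["ACGTACGT", "ACGTTTTT"]  # toy input
    positions, buckets = projection_buckets(seqs, l=4, k=2, rng=rng)
    largest = max(buckets, key=lambda key: len(buckets[key]))
    print(positions, largest, buckets[largest])
```

Two occurrences of a planted motif land in the same bucket whenever none of their mutated positions is among the k sampled ones, so k trades off bucket purity against the chance of a collision.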

517 citations