Journal Article

MEME Suite: tools for motif discovery and searching

TL;DR: The popular MEME motif discovery algorithm is now complemented by the GLAM2 algorithm which allows discovery of motifs containing gaps, and all of the motif-based tools are now implemented as web services via Opal.
Abstract: The MEME Suite web server provides a unified portal for online discovery and analysis of sequence motifs representing features such as DNA binding sites and protein interaction domains. The popular MEME motif discovery algorithm is now complemented by the GLAM2 algorithm which allows discovery of motifs containing gaps. Three sequence scanning algorithms—MAST, FIMO and GLAM2SCAN—allow scanning numerous DNA and protein sequence databases for motifs discovered by MEME and GLAM2. Transcription factor motifs (including those discovered using MEME) can be compared with motifs in many popular motif databases using the motif database scanning algorithm Tomtom. Transcription factor motifs can be further analyzed for putative function by association with Gene Ontology (GO) terms using the motif-GO term association tool GOMO. MEME output now contains sequence LOGOS for each discovered motif, as well as buttons to allow motifs to be conveniently submitted to the sequence and motif database scanning algorithms (MAST, FIMO and Tomtom), or to GOMO, for further analysis. GLAM2 output similarly contains buttons for further analysis using GLAM2SCAN and for rerunning GLAM2 with different parameters. All of the motif-based tools are now implemented as web services via Opal. Source code, binaries and a web server are freely available for noncommercial use at http://meme.nbcr.net.
Citations
Journal Article
TL;DR: The toolkit incorporates over 130 functions, which are designed to meet the increasing demand for big-data analyses, ranging from bulk sequence processing to interactive data visualization, and a new plotting engine developed to maximize their interactive ability.

5,173 citations

Journal Article
TL;DR: Find Individual Motif Occurrences (FIMO), a software tool for scanning DNA or protein sequences with motifs described as position-specific scoring matrices, and provides output in a variety of formats, including HTML, XML and several Santa Cruz Genome Browser formats.
Abstract: Summary: A motif is a short DNA or protein sequence that contributes to the biological function of the sequence in which it resides. Over the past several decades, many computational methods have been described for identifying, characterizing and searching with sequence motifs. Critical to nearly any motif-based sequence analysis pipeline is the ability to scan a sequence database for occurrences of a given motif described by a position-specific frequency matrix. Results: We describe Find Individual Motif Occurrences (FIMO), a software tool for scanning DNA or protein sequences with motifs described as position-specific scoring matrices. The program computes a log-likelihood ratio score for each position in a given sequence database, uses established dynamic programming methods to convert this score to a P-value and then applies false discovery rate analysis to estimate a q-value for each position in the given sequence. FIMO provides output in a variety of formats, including HTML, XML and several Santa Cruz Genome Browser formats. The program is efficient, allowing for the scanning of DNA sequences at a rate of 3.5 Mb/s on a single CPU. Availability and Implementation: FIMO is part of the MEME Suite software toolkit. A web server and source code are available at
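
The scoring step described in this abstract can be illustrated with a minimal Python sketch. It assumes a toy four-column motif and a uniform background; the function names `log_odds_matrix` and `scan`, the pseudocount value and the example sequence are illustrative assumptions, not FIMO's actual code or interface. The conversion of scores to P-values and q-values is omitted.

```python
import math

def log_odds_matrix(freqs, background, pseudocount=0.01):
    """Turn per-position letter frequencies into log2 likelihood-ratio scores."""
    matrix = []
    for column in freqs:
        total = sum(column.values()) + pseudocount * len(background)
        matrix.append({
            letter: math.log2(
                ((column.get(letter, 0.0) + pseudocount) / total) / background[letter]
            )
            for letter in background
        })
    return matrix

def scan(sequence, matrix):
    """Yield (start, score) for every window of motif width in the sequence."""
    width = len(matrix)
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        yield start, sum(col[letter] for col, letter in zip(matrix, window))

# Hypothetical toy data: a perfect "TATA" motif and a uniform DNA background.
background = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
freqs = [{"T": 1.0}, {"A": 1.0}, {"T": 1.0}, {"A": 1.0}]
pssm = log_odds_matrix(freqs, background)
best_start, best_score = max(scan("GGCTATAGGC", pssm), key=lambda hit: hit[1])
```

Each window's score is the sum of its column log-odds, so a window that matches the motif at every column scores highest. A real scanner would also handle the reverse strand and ambiguous letters.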

3,266 citations


Cites background from "MEME Suite: tools for motif discove..."

  • ...Received on December 17, 2010; revised on January 26, 2011; accepted on February 1, 2011...

Journal Article
17 Apr 2018-Immunity
TL;DR: An extensive immunogenomic analysis of more than 10,000 tumors comprising 33 diverse cancer types by utilizing data compiled by TCGA identifies six immune subtypes that encompass multiple cancer types and are hypothesized to define immune response patterns impacting prognosis.

3,246 citations

Journal Article
TL;DR: The capabilities of all the tools within the MEME Suite are described, advice on their best use is given and several case studies are provided to illustrate how to combine the results of various MEME Suite tools for successful motif-based analyses.
Abstract: The MEME Suite is a powerful, integrated set of web-based tools for studying sequence motifs in proteins, DNA and RNA. Such motifs encode many biological functions, and their detection and characterization is important in the study of molecular interactions in the cell, including the regulation of gene expression. Since the previous description of the MEME Suite in the 2009 Nucleic Acids Research Web Server Issue, we have added six new tools. Here we describe the capabilities of all the tools within the suite, give advice on their best use and provide several case studies to illustrate how to combine the results of various MEME Suite tools for successful motif-based analyses. The MEME Suite is freely available for academic use at http://meme-suite.org, and source code is also available for download and local installation.

1,971 citations


Cites methods from "MEME Suite: tools for motif discove..."

  • ...Six of these tools––DREME (3), MEME-ChIP (4), CentriMo (6), AME (7), SpaMo (8) and MCAST (12)––were developed or given web interfaces since the last publication describing the MEME Suite (15)....

  • ...Tools added since the MEME Suite web server was last described (15) are underlined....

Journal Article
TL;DR: It is becoming clear that a single WRKY transcription factor might be involved in regulating several seemingly disparate processes, and that members of the family play roles in both the repression and de-repression of important plant processes.

1,967 citations

References
Journal Article
TL;DR: The calculation of the q-value, the pFDR analogue of the p-value, is discussed; the q-value eliminates the need to set the error rate beforehand as is traditionally done, and the proposed approach can yield an increase of over eight times in power compared with the Benjamini–Hochberg FDR method.
Abstract: Summary. Multiple-hypothesis testing involves guarding against much more complicated errors than single-hypothesis testing. Whereas we typically control the type I error rate for a single-hypothesis test, a compound error rate is controlled for multiple-hypothesis tests. For example, controlling the false discovery rate (FDR) traditionally involves intricate sequential p-value rejection methods based on the observed data. Whereas a sequential p-value method fixes the error rate and estimates its corresponding rejection region, we propose the opposite approach: we fix the rejection region and then estimate its corresponding error rate. This new approach offers increased applicability, accuracy and power. We apply the methodology to both the positive false discovery rate (pFDR) and FDR, and provide evidence for its benefits. It is shown that pFDR is probably the quantity of interest over FDR. Also discussed is the calculation of the q-value, the pFDR analogue of the p-value, which eliminates the need to set the error rate beforehand as is traditionally done. Some simple numerical examples are presented that show that this new approach can yield an increase of over eight times in power compared with the Benjamini–Hochberg FDR method.
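
As a rough illustration of the fixed-rejection-region idea, the Python sketch below estimates q-values from a list of p-values. It is a simplified estimator under stated assumptions: the proportion of true null hypotheses is estimated at a single tuning value (`lam = 0.5`, chosen here for illustration), and the finite-sample pFDR correction is left out. The function name `q_values` and the example p-values are hypothetical.

```python
def q_values(p_values, lam=0.5):
    """Estimate a q-value for each p-value (simplified, single-lambda version)."""
    m = len(p_values)
    # Estimated proportion of true nulls: p-values above lam are assumed
    # to come mostly from null hypotheses.
    pi0 = min(1.0, sum(p > lam for p in p_values) / (m * (1.0 - lam)))
    order = sorted(range(m), key=lambda i: p_values[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down. For the rejection region [0, p],
    # the estimated FDR is pi0 * m * p / (number of p-values <= p).
    # The q-value is the smallest such estimate over all regions containing p.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pi0 * m * p_values[i] / rank)
        q[i] = running_min
    return q

example = [0.001, 0.01, 0.02, 0.6, 0.9]  # made-up p-values
estimated = q_values(example)
```

With these five made-up p-values the estimated null proportion is 0.8, and the q-values increase with the p-values, as the running minimum guarantees.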

5,414 citations

Proceedings Article
01 Jan 1994
TL;DR: The algorithm described in this paper discovers one or more motifs in a collection of DNA or protein sequences by using the technique of expectation maximization to fit a two-component finite mixture model to the set of sequences.
Abstract: The algorithm described in this paper discovers one or more motifs in a collection of DNA or protein sequences by using the technique of expectation maximization to fit a two-component finite mixture model to the set of sequences. Multiple motifs are found by fitting a mixture model to the data, probabilistically erasing the occurrences of the motif thus found, and repeating the process to find successive motifs. The algorithm requires only a set of unaligned sequences and a number specifying the width of the motifs as input. It returns a model of each motif and a threshold which together can be used as a Bayes-optimal classifier for searching for occurrences of the motif in other databases. The algorithm estimates how many times each motif occurs in each sequence in the dataset and outputs an alignment of the occurrences of the motif. The algorithm is capable of discovering several different motifs with differing numbers of occurrences in a single dataset.
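
The two-component mixture fit can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the published implementation: every overlapping window of the given width is treated as an independent sample, the motif component is seeded from one subsequence, and the probabilistic erasing, the classification threshold and the search for multiple motifs are all omitted. The names `em_motif`, `wmer_prob` and `consensus`, the seed weights, the pseudocount and the toy sequences are hypothetical.

```python
ALPHABET = "ACGT"

def wmer_prob(wmer, columns):
    """Probability of a window under per-position letter distributions."""
    p = 1.0
    for letter, column in zip(wmer, columns):
        p *= column[letter]
    return p

def em_motif(sequences, seed, iterations=20, mix=0.1, pseudo=0.1):
    width = len(seed)
    wmers = [s[i:i + width] for s in sequences for i in range(len(s) - width + 1)]
    # Motif component: start with most of the mass on the seed's letters.
    motif = [{a: (0.7 if a == c else 0.1) for a in ALPHABET} for c in seed]
    # Background component: one letter distribution shared by all positions.
    background = {a: 0.25 for a in ALPHABET}
    for _ in range(iterations):
        # E-step: posterior probability that each window is a motif occurrence.
        z = []
        for wmer in wmers:
            pm = mix * wmer_prob(wmer, motif)
            pb = (1.0 - mix) * wmer_prob(wmer, [background] * width)
            z.append(pm / (pm + pb))
        # M-step: re-estimate both components from expected letter counts.
        motif_counts = [{a: pseudo for a in ALPHABET} for _ in range(width)]
        bg_counts = {a: pseudo for a in ALPHABET}
        for wmer, zi in zip(wmers, z):
            for j, letter in enumerate(wmer):
                motif_counts[j][letter] += zi
                bg_counts[letter] += 1.0 - zi
        motif = [{a: col[a] / sum(col.values()) for a in ALPHABET} for col in motif_counts]
        bg_total = sum(bg_counts.values())
        background = {a: bg_counts[a] / bg_total for a in ALPHABET}
        mix = sum(z) / len(z)
    return motif, background, mix

def consensus(motif):
    """Most probable letter at each motif position."""
    return "".join(max(col, key=col.get) for col in motif)

# Toy input: "TGACGTCA" planted once in each made-up sequence.
sequences = ["ACGTTGACGTCAAT", "GGTGACGTCATTCA", "CATTGACGTCAGGA", "TTGACGTCACCGTA"]
motif, background, mix = em_motif(sequences, seed=sequences[0][4:12])
```

Calling `consensus(motif)` on the fitted model gives the most probable letter per column. Seeding from a window of the input keeps the run deterministic. A fuller treatment would try many seeds and keep the best-scoring model.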

4,978 citations


"MEME Suite: tools for motif discove..." refers methods in this paper

  • ...The MEME algorithm (2) has been widely used for the discovery of DNA and protein sequence motifs, and MEME continues to be the starting point for most analyses using the MEME Suite....

  • ...MEME (2) and GLAM2 (3) are tools for motif discovery, TOMTOM (4) searches for similar motifs in databases of known motifs, FIMO, GLAM2SCAN (3) and MAST (5) search for occurrences of motifs in sequence databases, and GOMO (6) provides associations between motifs and GO terms....

Journal Article
TL;DR: The freely accessible web server and its architecture are described, and ways to use MEME effectively to find new sequence patterns in biological sequences and analyze their significance are discussed.
Abstract: MEME (Multiple EM for Motif Elicitation) is one of the most widely used tools for searching for novel 'signals' in sets of biological sequences. Applications include the discovery of new transcription factor binding sites and protein domains. MEME works by searching for repeated, ungapped sequence patterns that occur in the DNA or protein sequences provided by the user. Users can perform MEME searches via the web server hosted by the National Biomedical Computation Resource (http://meme.nbcr.net) and several mirror sites. Through the same web server, users can also access the Motif Alignment and Search Tool to search sequence databases for matches to motifs encoded in several popular formats. By clicking on buttons in the MEME output, users can compare the motifs discovered in their input sequences with databases of known motifs, search sequence databases for matches to the motifs and display the motifs in various formats. This article describes the freely accessible web server and its architecture, and discusses ways to use MEME effectively to find new sequence patterns in biological sequences and analyze their significance.

2,216 citations


"MEME Suite: tools for motif discove..." refers background in this paper

  • ...It offers a significantly expanded set of programs for these tasks compared with the earlier web server (1)....

Journal Article
TL;DR: Kepler is a community-driven, open source scientific workflow system, currently under development across a number of scientific data management projects, that welcomes related projects and new contributors.
Abstract: Many scientific disciplines are now data and information driven, and new scientific knowledge is often gained by scientists putting together data analysis and knowledge discovery “pipelines”. A related trend is that more and more scientific communities realize the benefits of sharing their data and computational services, and are thus contributing to a distributed data and computational community infrastructure (a.k.a. “the Grid”). However, this infrastructure is only a means to an end and scientists ideally should be bothered little with its existence. The goal is for scientists to focus on development and use of what we call scientific workflows. These are networks of analytical steps that may involve, e.g., database access and querying steps, data analysis and mining steps, and many other steps including computationally intensive jobs on high performance cluster computers. In this paper we describe characteristics of and requirements for scientific workflows as identified in a number of our application projects. We then elaborate on Kepler, a particular scientific workflow system, currently under development across a number of scientific data management projects. We describe some key features of Kepler and its underlying Ptolemy II system, planned extensions, and areas of future research. Kepler is a community-driven, open source project, and we always welcome related projects and new contributors to join.

1,926 citations

Journal Article
TL;DR: A statistical measure of motif-motif similarity is defined, an algorithm is described, called Tomtom, for searching a database of motifs with a given query motif, and its effectiveness in finding similar motifs is demonstrated.
Abstract: A common question within the context of de novo motif discovery is whether a newly discovered, putative motif resembles any previously discovered motif in an existing database. To answer this question, we define a statistical measure of motif-motif similarity, and we describe an algorithm, called Tomtom, for searching a database of motifs with a given query motif. Experimental simulations demonstrate the accuracy of Tomtom's E values and its effectiveness in finding similar motifs.
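
The database search described here can be pictured with a simplified Python sketch. It scores every ungapped alignment of a query motif against each database motif by the mean Euclidean distance between aligned columns and ranks targets by their best alignment. The sketch does not compute the statistical significance that the abstract describes; without it, short overlaps can score well by chance. The names `column_distance`, `best_alignment`, `search` and `one_hot`, the minimum overlap of 3 and the toy motifs are assumptions for illustration.

```python
import math

ALPHABET = "ACGT"

def column_distance(p, q):
    """Euclidean distance between two letter-probability columns."""
    return math.sqrt(sum((p[a] - q[a]) ** 2 for a in ALPHABET))

def best_alignment(query, target, min_overlap=3):
    """Return (mean column distance, offset) of the best ungapped alignment."""
    min_overlap = min(min_overlap, len(query), len(target))
    best = None
    for offset in range(min_overlap - len(query), len(target) - min_overlap + 1):
        pairs = [(query[i], target[i + offset])
                 for i in range(len(query)) if 0 <= i + offset < len(target)]
        if len(pairs) < min_overlap:
            continue
        mean = sum(column_distance(p, q) for p, q in pairs) / len(pairs)
        if best is None or mean < best[0]:
            best = (mean, offset)
    return best

def search(query, database):
    """Rank database motifs (name -> list of columns) by best-alignment distance."""
    hits = []
    for name, columns in database.items():
        distance, offset = best_alignment(query, columns)
        hits.append((distance, name, offset))
    return sorted(hits)

def one_hot(word):
    """Build a near-deterministic toy motif from a consensus string."""
    return [{a: (0.97 if a == c else 0.01) for a in ALPHABET} for c in word]

# Made-up database: one motif containing the query, one unrelated motif.
database = {"m1": one_hot("GGTGACGTCAGG"), "m2": one_hot("AAAAAAAA")}
results = search(one_hot("TGACGTCA"), database)
```

Here `results` is sorted so that the closest database motif comes first, with the offset at which the query aligns best.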

1,603 citations


"MEME Suite: tools for motif discove..." refers background or methods in this paper

  • ...Transcription factor motifs (including those discovered using MEME) can be compared with motifs in many popular motif databases using the motif database scanning algorithm TOMTOM....

  • ...This type of analysis is done using TOMTOM....

  • ...MEME (2) and GLAM2 (3) are tools for motif discovery, TOMTOM (4) searches for similar motifs in databases of known motifs, FIMO, GLAM2SCAN (3) and MAST (5) search for occurrences of motifs in sequence databases, and GOMO (6) provides associations between motifs and GO terms....

  • ...TOMTOM not only provides a numeric score for the match between two motifs, but also provides an estimate of the statistical significance of the score....

  • ...TOMTOM (4) quantifies the similarity between two motifs, and can be used to search a database of known motifs for matches to motifs found by MEME....
