Journal Article

Discovering Motifs in Biological Sequences Using the Micron Automata Processor

TLDR
This paper proposes a novel algorithm for the $(l,d)$ motif search problem using streaming execution over a large set of non-deterministic finite automata (NFAs). The algorithm is designed to take advantage of the Micron Automata Processor, a new technology close to deployment that can execute many NFAs in parallel.
Abstract
Finding approximately conserved sequences, called motifs, across multiple DNA or protein sequences is an important problem in computational biology. In this paper, we consider the $(l,d)$ motif search problem of identifying one or more motifs of length $l$ present in at least $q$ of the $n$ given sequences, with each occurrence differing from the motif in at most $d$ substitutions. The problem is known to be NP-complete, and the largest solved instance reported to date is $(26,11)$. We propose a novel algorithm for the $(l,d)$ motif search problem using streaming execution over a large set of non-deterministic finite automata (NFAs). This solution is designed to take advantage of the Micron Automata Processor, a new technology close to deployment that can execute many NFAs in parallel. We demonstrate the capability for solving much larger instances of the $(l,d)$ motif search problem using the resources available within a single Automata Processor board, by estimating run-times for problem instances $(39,18)$ and $(40,17)$. The paper serves as a useful guide to solving problems using this new accelerator technology.
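
To make the problem statement concrete, the following is a minimal brute-force sketch of the $(l,d)$ motif search definition given in the abstract. It is not the paper's NFA-based algorithm and does not model the Automata Processor; it simply enumerates every candidate of length $l$ and counts the sequences that contain an occurrence within $d$ substitutions. The function names (`hamming`, `occurs_in`, `motif_search`) and the default DNA alphabet are illustrative assumptions, and the exhaustive enumeration is only feasible for very small $l$.

```python
from itertools import product


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def occurs_in(candidate: str, seq: str, d: int) -> bool:
    """True if some length-l window of seq is within d substitutions of candidate."""
    l = len(candidate)
    return any(
        hamming(candidate, seq[i:i + l]) <= d
        for i in range(len(seq) - l + 1)
    )


def motif_search(seqs, l, d, q, alphabet="ACGT"):
    """Return every length-l string that occurs (up to d substitutions)
    in at least q of the given sequences.

    Exhaustive over |alphabet|**l candidates: a definition-level sketch,
    not a practical solver.
    """
    motifs = []
    for letters in product(alphabet, repeat=l):
        candidate = "".join(letters)
        support = sum(occurs_in(candidate, s, d) for s in seqs)
        if support >= q:
            motifs.append(candidate)
    return motifs


if __name__ == "__main__":
    seqs = ["ACGTAC", "TTACGA", "GGACGT"]
    # Exact 3-mers (d = 0) shared by all three sequences.
    print(motif_search(seqs, l=3, d=0, q=3))
```

The candidate space grows as $|\Sigma|^l$, which is why instances near $(26,11)$ are out of reach for this kind of enumeration and motivate hardware-parallel approaches such as the one the paper proposes.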

