Open Access, Posted Content

SneakySnake: A Fast and Accurate Universal Genome Pre-Alignment Filter for CPUs, GPUs, and FPGAs

TL;DR
SneakySnake is introduced, a highly parallel and highly accurate pre-alignment filter that remarkably reduces the need for computationally costly sequence alignment and is efficient to implement on CPUs, GPUs, and FPGAs.
Abstract
Motivation: We introduce SneakySnake, a highly parallel and highly accurate pre-alignment filter that remarkably reduces the need for computationally costly sequence alignment. The key idea of SneakySnake is to reduce the approximate string matching (ASM) problem to the single net routing (SNR) problem in VLSI chip layout. In the SNR problem, we are interested in finding the optimal path that connects two terminals with the least routing cost on a special grid layout that contains obstacles. The SneakySnake algorithm quickly solves the SNR problem and uses the optimal path it finds to decide whether sequence alignment is necessary. Reducing the ASM problem to the SNR problem also makes SneakySnake efficient to implement on CPUs, GPUs, and FPGAs.

Results: SneakySnake significantly improves the accuracy of pre-alignment filtering by up to four orders of magnitude compared to the state-of-the-art pre-alignment filters, Shouji, GateKeeper, and SHD. For short sequences, SneakySnake accelerates Edlib (state-of-the-art implementation of Myers's bit-vector algorithm) and Parasail (state-of-the-art sequence aligner with a configurable scoring function) by up to 37.7x and 43.9x (>12x on average), respectively, with its CPU implementation, and by up to 413x and 689x (>400x on average), respectively, with FPGA and GPU acceleration. For long sequences, the CPU implementation of SneakySnake accelerates Parasail and KSW2 (sequence aligner of minimap2) by up to 979x (276.9x on average) and 91.7x (31.7x on average), respectively. As SneakySnake does not replace sequence alignment, users can still obtain all capabilities (e.g., configurable scoring functions) of the aligner of their choice, unlike existing acceleration efforts that sacrifice some aligner capabilities.

Availability: this https URL
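The routing idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation. It assumes two equal-length sequences, an edit threshold, and a grid of 2E+1 rows, one per diagonal shift, in which a mismatching or out-of-range cell is an obstacle. The function name `sneaky_snake_filter` and the parameter `max_edits` are hypothetical. The sketch greedily follows the longest obstacle-free run available from the current column, pays one obstacle to move on, and rejects the pair once the obstacle count exceeds the threshold.

```python
def sneaky_snake_filter(ref: str, query: str, max_edits: int) -> bool:
    """Hypothetical sketch of a greedy, routing-style pre-alignment filter.

    Returns True when the pair may be within `max_edits` edits (alignment is
    still needed) and False when the pair can be skipped.
    Assumption: both sequences have the same length.
    """
    if len(ref) != len(query):
        raise ValueError("this sketch assumes equal-length sequences")

    m = len(query)
    obstacles = 0
    col = 0
    while col < m:
        # Longest run of matches starting at `col` over all 2E+1 diagonals.
        longest = 0
        for shift in range(-max_edits, max_edits + 1):
            run = 0
            j = col
            while j < m:
                k = j + shift
                # Out-of-range cells are treated as obstacles (assumption).
                if 0 <= k < m and query[j] == ref[k]:
                    run += 1
                    j += 1
                else:
                    break
            longest = max(longest, run)

        col += longest
        if col < m:
            # The path is blocked on every diagonal: pass through one obstacle.
            obstacles += 1
            col += 1
            if obstacles > max_edits:
                return False  # too many obstacles, skip alignment
    return True  # few enough obstacles, run the real aligner


if __name__ == "__main__":
    print(sneaky_snake_filter("ACGTACGT", "ACGAACGT", 1))
    print(sneaky_snake_filter("AAAAAAAA", "CCCCCCCC", 2))
```

Under these assumptions the obstacle count acts as an optimistic estimate of the number of edits, so a pair that passes still has to be checked by the aligner of choice, while a rejected pair can be skipped.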


Citations
Proceedings Article

GenStore: a high-performance in-storage processing system for genome sequence analysis

TL;DR: GenStore is proposed, the first in-storage processing system designed for genome sequence analysis that greatly reduces both data movement and computational overheads of genome sequence analysis by exploiting low-cost and accurate in-storage filters.
Proceedings Article

SeGraM: a universal hardware accelerator for genomic sequence-to-graph and sequence-to-sequence mapping

TL;DR: This work proposes SeGraM, a universal algorithm/hardware co-designed genomic mapping accelerator that can effectively and efficiently support both sequence-to-graph mapping and sequence-to-sequence mapping, for both short and long reads.
Journal Article

GenStore: A High-Performance and Energy-Efficient In-Storage Computing System for Genome Sequence Analysis

TL;DR: Through rigorous analysis of read mapping processes of reads with different properties and degrees of genetic variation, this work meticulously designs low-cost hardware accelerators and data/computation flows inside a NAND flash-based solid-state drive (SSD).
Journal Article

From molecules to genomic variations: Accelerating genome analysis via intelligent algorithms and architectures

TL;DR: In this article, the authors describe the ongoing journey in significantly improving the performance, accuracy, and efficiency of genome analysis using intelligent algorithms and hardware architectures, and conclude with a foreshadowing of future challenges, benefits, and research directions triggered by the development of both very low-cost yet highly error-prone new sequencing technologies and specialized hardware chips for genomics.
References
Journal Article

A note on two problems in connexion with graphs

TL;DR: A tree is a graph with one and only one path between every two nodes, where at least one path exists between any two nodes and the length of each branch is given.
Journal Article

A global reference for human genetic variation.

Adam Auton, +517 more, 01 Oct 2015
TL;DR: The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations, and has reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping.
Journal Article

A general method applicable to the search for similarities in the amino acid sequence of two proteins

TL;DR: A computer adaptable method for finding similarities in the amino acid sequences of two proteins has been developed and it is possible to determine whether significant homology exists between the proteins to trace their possible evolutionary development.
Journal Article

A Formal Basis for the Heuristic Determination of Minimum Cost Paths

TL;DR: How heuristic information from the problem domain can be incorporated into a formal mathematical theory of graph searching is described and an optimality property of a class of search strategies is demonstrated.