Optimal gap-affine alignment in O(s) space
Santiago Marco-Sola, Jordan M. Eizenga, Andrea Guarracino, Benedict Paten, Erik Garrison, Miquel Moreto
TLDR
The bidirectional WFA algorithm (BiWFA), the first gap-affine algorithm capable of computing optimal alignments in O(s) memory while retaining WFA’s time complexity of O(ns), is presented.
Abstract:
Motivation: Pairwise sequence alignment remains a fundamental problem in computational biology and bioinformatics. Recent advances in genomics and sequencing technologies demand faster and scalable algorithms that can cope with the ever-increasing sequence lengths. Classical pairwise alignment algorithms based on dynamic programming are strongly limited by quadratic requirements in time and memory. The recently proposed wavefront alignment algorithm (WFA) introduced an efficient algorithm to perform exact gap-affine alignment in O(ns) time, where s is the optimal score and n is the sequence length. Notwithstanding these bounds, WFA’s O(s^2) memory requirements become computationally impractical for genome-scale alignments, leading to a need for further improvement.
Results: In this paper, we present the bidirectional WFA algorithm (BiWFA), the first gap-affine algorithm capable of computing optimal alignments in O(s) memory while retaining WFA’s time complexity of O(ns). As a result, this work improves the lowest known memory bound O(n) to compute gap-affine alignments. In practice, our implementation never requires more than a few hundred MBs aligning noisy Oxford Nanopore Technologies reads up to 1 Mbp long while maintaining competitive execution times.
Availability: All code is publicly available at https://github.com/smarco/BiWFA-paper
Contact: santiagomsola@gmail.com
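To make the memory bounds in the abstract concrete, the following is a minimal, score-only sketch of a gap-affine wavefront computation. It is not the authors' implementation and it is not BiWFA itself: it only illustrates why the optimal score s can be computed while keeping a constant number of wavefronts, each of size O(s). The bidirectional breakpoint search that recovers the full alignment is not shown. The function name `wfa_affine_score`, the penalty defaults (mismatch x=4, gap open o=6, gap extend e=2) and the dictionary-based layout are assumptions made for illustration.

```python
NEG_INF = float("-inf")  # marks an unreachable diagonal


def wfa_affine_score(pattern, text, x=4, o=6, e=2):
    """Score-only gap-affine wavefront sketch (hypothetical helper).

    Assumed penalties: match 0, mismatch x, gap of length L costs o + L*e,
    with x > 0 and e > 0. Returns the optimal global alignment score.
    """
    n, m = len(pattern), len(text)
    k_end = m - n                 # diagonal that contains the final cell
    window = max(x, o + e)        # oldest score any recurrence looks back to

    def valid(k, h):
        # h is the offset in text; h - k is the offset in pattern.
        return 0 <= h <= m and 0 <= h - k <= n

    def extend(k, h):
        # Slide along diagonal k while characters match (cost 0).
        v = h - k
        while h < m and v < n and pattern[v] == text[h]:
            h, v = h + 1, v + 1
        return h

    def get(wf, score, k):
        return wf.get(score, {}).get(k, NEG_INF)

    # M, I, D map score -> {diagonal: furthest-reaching text offset}.
    M, I, D = {0: {0: extend(0, 0)}}, {0: {}}, {0: {}}
    s = 0
    while M[s].get(k_end) != m:
        s += 1
        M[s], I[s], D[s] = {}, {}, {}
        for k in range(max(-n, -s), min(m, s) + 1):
            ins = max(get(M, s - o - e, k - 1), get(I, s - e, k - 1)) + 1
            dele = max(get(M, s - o - e, k + 1), get(D, s - e, k + 1))
            mis = get(M, s - x, k) + 1
            if valid(k, ins):
                I[s][k] = ins
            if valid(k, dele):
                D[s][k] = dele
            best = max((h for h in (ins, dele, mis) if valid(k, h)), default=None)
            if best is not None:
                M[s][k] = extend(k, best)
        # Only the last `window` scores are ever read again, so older
        # wavefronts can be dropped: memory stays O(s) instead of O(s^2).
        for wf in (M, I, D):
            wf.pop(s - window, None)
    return s
```

Under the assumed penalties, `wfa_affine_score("GATTACA", "GATCACA")` returns 4 (one mismatch) and `wfa_affine_score("GATTACA", "GATACA")` returns 8 (one gap of length 1). Recovering the alignment itself from a plain WFA requires keeping every wavefront, which is where the O(s^2) memory cost mentioned in the abstract comes from.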
Citations
Journal Article
A draft human pangenome reference
Wen-Wei Liao, Mobin Asri, Jana Ebler, Daniel Doerr, Marina Haukness, Glenn Hickey, Shuangjia Lu, Julian K. Lucas, Jean Marcel Maurice Monlong, Haley J. Abel, Silvia Buonaiuto, Xian Chang, Haoyu Cheng, Justin Jang Hann Chu, Vincenza Colonna, Jordan M. Eizenga, Xiaowen Feng, Christian Fischer, Robert S. Fulton, Shilpa Garg, Cristian Groza, Andrea Guarracino, William T. Harvey, Simon Heumos, Kerstin Howe, Miten Jain, Tsung-Yu Lu, Charles Markello, Fergal J. Martin, Matthew Mitchell, Katherine M. Munson, Moses N. Mwaniki, Adam M. Novak, Hugh E. Olsen, Trevor Pesout, David Porubsky, Pjotr Prins, Jonas Andreas Sibbesen, Chad Tomlinson, Flavia Villani, Mitchell R. Vollger, Guillaume Bourque, Mark Chaisson, Paul Flicek, Adam M. Phillippy, Justin M. Zook, Evan E. Eichler, David Haussler, Erich D. Jarvis, Karen H. Miga, Ting Wang, Erik Garrison, Tobias Marschall, Ira M. Hall, Heng Li, Benedict Paten
TL;DR: The pangenome reference contains 47 phased, diploid assemblies from a cohort of genetically diverse individuals and is more than 99% accurate at the structural and base pair levels.
Posted Content
A Draft Human Pangenome Reference
TL;DR: The Human Pangenome Reference Consortium (HPRC) presents a first draft human pangenome reference, which contains 47 phased, diploid assemblies from a cohort of genetically diverse individuals.
Journal Article
From molecules to genomic variations: Accelerating genome analysis via intelligent algorithms and architectures
Mohammed Alser, Joel Lindegger, Can Fırtına, Nour Almadhoun, Haiyu Mao, Gagandeep Singh, Juan Gómez-Luna, Onur Mutlu
TL;DR: In this article, the authors describe the ongoing journey in significantly improving the performance, accuracy, and efficiency of genome analysis using intelligent algorithms and hardware architectures, and conclude with a foreshadowing of future challenges, benefits, and research directions triggered by the development of both very low cost yet highly error prone new sequencing technologies and specialized hardware chips for genomics.
Journal Article
Recombination between heterologous human acrocentric chromosomes
Andrea Guarracino, Silvia Buonaiuto, Leonardo Gomes de Lima, Tamara A. Potapova, Arang Rhie, Sergey Koren, Boris Rubinstein, Christian Fischer, Jennifer L. Gerton, Adam M. Phillippy, Vincenza Colonna, Erik Garrison
TL;DR: The first complete assembly of a human genome, the Telomere-to-Telomere Consortium's CHM13 assembly (T2T-CHM13), provided a model of the homology among human acrocentric chromosomes.
Posted Content
Building pangenome graphs
Erik Garrison, Andrea Guarracino, Simon Heumos, Flavia Villani, Zhigui Bao, Lorenzo Tattini, Jörg Hagmann, Santiago Marco-Sola, David G. Ashbrook, Kaisa Thorell, Rachel Rusholme-Pilcher, Gianni Liti, Sven Nahnsen, Franklin L. Nobrega, Yi Wu, Hao Chen, Joep de Ligt, Peter H. Sudmant, Nicole Soranzo, Vincenza Colonna, Robert W. Williams, Pjotr Prins
TL;DR: PanGenome Graph Builder (PGGB) uses all-to-all whole-genome alignments and learned graph embeddings to build and iteratively refine a model in which one can identify variation, measure conservation, detect recombination events, and infer phylogenetic relationships.