Open Access · Posted Content
Data structures to represent a set of k-long DNA sequences
TL;DR: This survey gives a unified presentation and comparison of the data structures that have been proposed to store and query a k-mer set. It is intended to serve as a resource for researchers in the field and to make the area more accessible to researchers outside it.
Abstract:
The analysis of biological sequencing data has been one of the biggest applications of string algorithms. The approaches used in many such applications are based on the analysis of k-mers, which are short fixed-length strings present in a dataset. While these approaches are rather diverse, storing and querying a k-mer set has emerged as a shared underlying component. A set of k-mers has unique features and applications that, over the last ten years, have resulted in many specialized approaches for its representation. In this survey, we give a unified presentation and comparison of the data structures that have been proposed to store and query a k-mer set. We hope this survey will serve as a resource for researchers in the field as well as make the area more accessible to researchers outside the field.
Citations
SPAdes, a new genome assembly algorithm and its applications to single-cell sequencing (7th Annual SFAF Meeting, 2012)
TL;DR: SPAdes is a new assembler for both single-cell and standard (multicell) assembly, shown to improve on the recently released E+V-SC assembler and on the popular assemblers Velvet and SoapDeNovo (for multicell data).
Journal Article
Bifrost: highly parallel construction and indexing of colored and compacted de Bruijn graphs
Guillaume Holley, Páll Melsted
TL;DR: Bifrost is a parallel and memory-efficient algorithm that enables direct construction of the compacted de Bruijn graph without producing the intermediate uncompacted graph.
Posted Content
Bifrost – Highly parallel construction and indexing of colored and compacted de Bruijn graphs
Guillaume Holley, Páll Melsted
TL;DR: Bifrost is a new parallel and memory-efficient algorithm that enables direct construction of the compacted de Bruijn graph without producing the intermediate uncompacted de Bruijn graph. It makes full use of the dynamic index efficiency and proposes a graph coloring method that efficiently maps each k-mer of the graph to the set of genomes in which it occurs.
Journal Article
Error correction enables use of Oxford Nanopore technology for reference-free transcriptome analysis
Kristoffer Sahlin, Paul Medvedev
TL;DR: IsONcorrect jointly uses all isoforms from a gene during error correction, which allows it to correct reads at low sequencing depths, and achieves a median accuracy of 98.9-99.6%.
Journal Article
Data structures based on k-mers for querying large collections of sequencing data sets
TL;DR: An accessible survey of several computational approaches introduced to index and query large collections of data sets, based on representing data sets as sets of k-mers; it summarizes their performance and highlights their current strengths and limitations.
References
Journal Article
SPAdes: A New Genome Assembly Algorithm and Its Applications to Single-Cell Sequencing
Anton Bankevich, Sergey Nurk, Dmitry Antipov, Alexey Gurevich, Mikhail Dvorkin, Alexander S. Kulikov, Valery M. Lesin, Sergey I. Nikolenko, Son Pham, Andrey D. Prjibelski, Alexey V. Pyshkin, Alexander Sirotkin, Nikolay Vyahhi, Glenn Tesler, Max A. Alekseyev, Pavel A. Pevzner
TL;DR: SPAdes generates single-cell assemblies, providing information about genomes of uncultivatable bacteria that vastly exceeds what may be obtained via traditional metagenomics studies.
Book Chapter
Introduction to Algorithms
TL;DR: This chapter provides an overview of the fundamentals of algorithms and their links to self-organization, exploration, and exploitation.
Journal Article
Space/time trade-offs in hash coding with allowable errors
TL;DR: Analysis of the paradigm problem demonstrates that allowing a small number of test messages to be falsely identified as members of the given set will permit a much smaller hash area to be used without increasing reject time.
Journal Article
Kraken: ultrafast metagenomic sequence classification using exact alignments
TL;DR: Kraken is an ultrafast and highly accurate program for assigning taxonomic labels to metagenomic DNA sequences that achieves classification accuracy comparable to the fastest BLAST program.
Related Papers (5)
A fast, lock-free approach for efficient parallel counting of occurrences of k-mers
Guillaume Marçais, Carl Kingsford