Open Access, Posted Content

Data structures to represent a set of k-long DNA sequences

TLDR
This survey gives a unified presentation and comparison of the data structures that have been proposed to store and query a k-mer set. The authors hope it will serve as a resource for researchers in the field and make the area more accessible to researchers outside it.
Abstract
The analysis of biological sequencing data has been one of the biggest applications of string algorithms. The approaches used in many such applications are based on the analysis of k-mers, which are short fixed-length strings present in a dataset. While these approaches are rather diverse, storing and querying a k-mer set has emerged as a shared underlying component. A set of k-mers has unique features and applications that, over the last ten years, have resulted in many specialized approaches for its representation. In this survey, we give a unified presentation and comparison of the data structures that have been proposed to store and query a k-mer set. We hope this survey will serve as a resource for researchers in the field as well as make the area more accessible to researchers outside the field.
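To make "storing and querying a k-mer set" concrete, here is a minimal sketch of the simplest possible representation: a plain Python hash set. It is not one of the specialized structures the survey compares. The function names, the 2-bit code (A=0, C=1, G=2, T=3), the rule of identifying a k-mer with its reverse complement (keeping the lexicographically smaller "canonical" form), and skipping k-mers that contain non-ACGT symbols are all illustrative assumptions.

```python
# Minimal sketch of a k-mer set: a plain hash set of 2-bit-encoded canonical k-mers.
# Assumptions (illustrative, not from the survey): input over A/C/G/T, k-mers with
# other symbols are skipped, a k-mer is identified with its reverse complement,
# and all stored k-mers share one fixed k (the integer code does not record k).

VALID = frozenset("ACGT")
ENCODE = {"A": 0, "C": 1, "G": 2, "T": 3}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(kmer: str) -> str:
    """Return the reverse complement of a DNA string."""
    return kmer.translate(COMPLEMENT)[::-1]


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, reverse_complement(kmer))


def encode(kmer: str) -> int:
    """Pack a k-mer into an integer using 2 bits per base (A=0, C=1, G=2, T=3)."""
    value = 0
    for base in kmer:
        value = (value << 2) | ENCODE[base]
    return value


def build_kmer_set(sequences, k: int) -> set:
    """Collect the encoded canonical k-mers of every sequence."""
    result = set()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= VALID:  # skip k-mers containing N or other symbols
                result.add(encode(canonical(kmer)))
    return result


def contains(kmer_set: set, kmer: str) -> bool:
    """Membership query: is this k-mer (or its reverse complement) in the set?"""
    kmer = kmer.upper()
    if not set(kmer) <= VALID:
        return False
    return encode(canonical(kmer)) in kmer_set
```

For example, `build_kmer_set(["ACGTACGTTT"], 4)` holds 5 distinct canonical 4-mers, and `contains(s, "GTTT")` is true because its reverse complement `AAAC` was stored. Since A < C < G < T both alphabetically and in the 2-bit code, choosing the canonical form on strings or on encodings gives the same result. A general-purpose hash set like this can be memory-hungry for large k-mer sets, which is part of why specialized representations exist.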


Citations

SPAdes, a new genome assembly algorithm and its applications to single-cell sequencing (7th Annual SFAF Meeting, 2012)

Glenn Tesler
TL;DR: SPAdes is a new assembler for both single-cell and standard (multicell) assembly; the authors demonstrate that it improves on the recently released E+V-SC assembler and on the popular assemblers Velvet and SOAPdenovo (for multicell data).
Journal Article

Bifrost: highly parallel construction and indexing of colored and compacted de Bruijn graphs

TL;DR: A parallel and memory-efficient algorithm enabling the direct construction of the compacted de Bruijn graph without producing the intermediate uncompacted graph, called Bifrost.
Posted Content

Bifrost – Highly parallel construction and indexing of colored and compacted de Bruijn graphs

TL;DR: Bifrost is a new parallel and memory-efficient algorithm that constructs the compacted de Bruijn graph directly, without producing the intermediate uncompacted de Bruijn graph. It makes full use of the efficiency of its dynamic index and proposes a graph coloring method that efficiently maps each k-mer of the graph to the set of genomes in which it occurs.
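The two ideas in this summary, a de Bruijn graph over k-mers and a "coloring" that maps each k-mer to the genomes containing it, can be illustrated with a deliberately naive dictionary. This is a sketch under stated assumptions, not Bifrost's representation: it considers the forward strand only, performs no compaction, and the function names are hypothetical. Bifrost itself is designed to be far more memory-efficient than a dictionary of Python sets.

```python
from collections import defaultdict

# Naive colored k-mer map: each k-mer -> set of genome names ("colors") containing it.
# Assumptions: forward strand only (no reverse complements) and no compaction
# into unitigs; hypothetical helper names, not Bifrost's API.


def build_colored_kmers(genomes: dict, k: int) -> dict:
    """Map every k-mer of every genome to the set of genomes containing it."""
    colors = defaultdict(set)
    for name, seq in genomes.items():
        for i in range(len(seq) - k + 1):
            colors[seq[i:i + k]].add(name)
    return dict(colors)


def successors(colors: dict, kmer: str) -> list:
    """De Bruijn graph out-neighbors: stored k-mers that overlap kmer by k-1 bases."""
    suffix = kmer[1:]
    return [suffix + base for base in "ACGT" if suffix + base in colors]
```

With `genomes = {"g1": "ACGTA", "g2": "CGTAC"}` and `k = 3`, the k-mer `CGT` is colored `{"g1", "g2"}`, while `ACG` belongs only to `g1`. Here the graph's edges are implicit: they are recovered by querying the k-mer set for each of the four possible one-base extensions rather than being stored explicitly.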
Journal Article

Error correction enables use of Oxford Nanopore technology for reference-free transcriptome analysis

TL;DR: IsONcorrect jointly uses all isoforms from a gene during error correction, which allows it to correct reads at low sequencing depths, achieving a median accuracy of 98.9-99.6%.
Journal Article

Data structures based on k-mers for querying large collections of sequencing data sets.

TL;DR: An accessible survey of several computational approaches for indexing and querying large collections of data sets by representing each data set as a set of k-mers; it summarizes their performance and highlights their current strengths and limitations.
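As an illustration of "representing data sets as sets of k-mers", the sketch below answers a query by brute force. It scans every data set in the collection, which is exactly the cost dedicated indexes aim to avoid at scale. The threshold rule (a data set matches if at least a fraction `theta` of the query's distinct k-mers occur in it) is a common way to phrase such queries and is an assumption here, as are the function names.

```python
# Naive baseline for querying a collection of data sets, each stored as a k-mer set.
# Assumption (a common formulation, not taken from the cited survey): a data set
# matches if at least a fraction theta of the query's distinct k-mers occur in it.


def kmer_set(seq: str, k: int) -> set:
    """Return the set of distinct k-mers in seq (forward strand only)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def query_collection(collection: dict, query: str, k: int, theta: float) -> list:
    """Return sorted names of data sets sharing at least theta of the query's k-mers."""
    query_kmers = kmer_set(query, k)
    if not query_kmers:
        return []  # query shorter than k: nothing to match
    hits = []
    for name, kmers in collection.items():  # linear scan over every data set
        fraction = len(query_kmers & kmers) / len(query_kmers)
        if fraction >= theta:
            hits.append(name)
    return sorted(hits)
```

For instance, with `d1 = kmer_set("ACGTACGT", 3)` and `d2 = kmer_set("TTTTGGGG", 3)`, the query `"ACGTT"` has 3-mers `ACG`, `CGT`, `GTT`; two of the three occur in `d1` and none in `d2`, so a threshold of 0.5 returns `["d1"]`.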