Data structures to represent a set of k-long DNA sequences

Open Access, Posted Content

Rayan Chikhi, +2 more

- 29 Mar 2019 -

arXiv: Data Structures and Algorithms

TL;DR

This survey gives a unified presentation and comparison of the data structures that have been proposed to store and query a k-mer set. The authors hope it will serve as a resource for researchers in the field and make the area more accessible to researchers outside it.

Abstract:

The analysis of biological sequencing data has been one of the biggest applications of string algorithms. The approaches used in many such applications are based on the analysis of k-mers, which are short fixed-length strings present in a dataset. While these approaches are rather diverse, storing and querying a k-mer set has emerged as a shared underlying component. A set of k-mers has unique features and applications that, over the last ten years, have resulted in many specialized approaches for its representation. In this survey, we give a unified presentation and comparison of the data structures that have been proposed to store and query a k-mer set. We hope this survey will serve as a resource for researchers in the field as well as make the area more accessible to researchers outside the field.
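As a minimal sketch of what "storing and querying a k-mer set" means, the Python snippet below extracts every k-long substring from a few DNA strings, stores them in a built-in hash set, and answers membership queries. The function names (`kmers`, `build_kmer_set`) and the example reads are hypothetical and chosen only for illustration. A plain hash set is the naive baseline, not one of the specialized representations the survey compares.

```python
def kmers(seq, k):
    """Yield every k-long substring (k-mer) of seq, left to right."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_kmer_set(sequences, k):
    """Collect the distinct k-mers of all input sequences in a hash set."""
    kmer_set = set()
    for seq in sequences:
        kmer_set.update(kmers(seq, k))
    return kmer_set


# Hypothetical input: two short reads, with k = 4.
reads = ["ACGTAC", "GTACGG"]
kmer_set = build_kmer_set(reads, 4)

# Membership query: is a given 4-mer present in the set?
print("GTAC" in kmer_set)  # GTAC occurs in both reads
print("AAAA" in kmer_set)  # AAAA occurs in neither read
```

A hash set like this spends far more than the 2 bits per nucleotide that DNA needs at minimum, which is one reason more compact representations of k-mer sets are of interest.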

Citations

SPAdes, a new genome assembly algorithm and its applications to single-cell sequencing (7th Annual SFAF Meeting, 2012)

Glenn Tesler

TL;DR: SPAdes is a new assembler for both single-cell and standard (multicell) assembly; the authors demonstrate that it improves on the recently released E+V-SC assembler and, for multicell data, on the popular assemblers Velvet and SoapDeNovo.

Journal Article

Bifrost: highly parallel construction and indexing of colored and compacted de Bruijn graphs

Guillaume Holley, +1 more

- 17 Sep 2020 -

Genome Biology

TL;DR: Bifrost is a parallel and memory-efficient algorithm that enables the direct construction of the compacted de Bruijn graph without producing the intermediate uncompacted graph.

Posted Content

Bifrost – Highly parallel construction and indexing of colored and compacted de Bruijn graphs

Guillaume Holley, +1 more

- 08 Jul 2019 -

bioRxiv

TL;DR: Bifrost is a new parallel and memory-efficient algorithm that enables the direct construction of the compacted de Bruijn graph without producing the intermediate uncompacted de Bruijn graph. It makes full use of the dynamic index efficiency and proposes a graph coloring method that efficiently maps each k-mer of the graph to the set of genomes in which it occurs.

Journal Article

Error correction enables use of Oxford Nanopore technology for reference-free transcriptome analysis

Kristoffer Sahlin, +1 more

- 04 Jan 2021 -

Nature Communications

TL;DR: IsONcorrect jointly uses all isoforms from a gene during error correction, which allows it to correct reads at low sequencing depths; it achieves a median accuracy of 98.9-99.6%.

Journal Article

Data structures based on k-mers for querying large collections of sequencing data sets.

Camille Marchet, +5 more

- 01 Jan 2021 -

Genome Research

TL;DR: An accessible survey of several computational approaches introduced to index and query large collections of data sets by representing each data set as a set of k-mers; it summarizes their performance and highlights their current strengths and limitations.

References

SPAdes, a new genome assembly algorithm and its applications to single-cell sequencing (7th Annual SFAF Meeting, 2012)

Glenn Tesler

TL;DR: SPAdes is a new assembler for both single-cell and standard (multicell) assembly; the authors demonstrate that it improves on the recently released E+V-SC assembler and, for multicell data, on the popular assemblers Velvet and SoapDeNovo.

Book Chapter

Introduction to Algorithms

Xin-She Yang

TL;DR: This chapter provides an overview of the fundamentals of algorithms and their links to self-organization, exploration, and exploitation.

Journal Article

Space/time trade-offs in hash coding with allowable errors

Burton H. Bloom

- 01 Jul 1970 -

Communications of the ACM

TL;DR: Analysis of the paradigm problem demonstrates that allowing a small number of test messages to be falsely identified as members of the given set will permit a much smaller hash area to be used without increasing reject time.

Journal Article

Kraken: ultrafast metagenomic sequence classification using exact alignments

Derrick E. Wood, +3 more

- 03 Mar 2014 -

Genome Biology

TL;DR: Kraken is an ultrafast and highly accurate program for assigning taxonomic labels to metagenomic DNA sequences that achieves classification accuracy comparable to the fastest BLAST program.
