Journal Article

Error-Correcting Codes for Short Tandem Duplication and Edit Errors

01 Feb 2022, Vol. 68, Iss. 2, pp. 871-880
TLDR
This article proposes a code for correcting an arbitrary number of short tandem duplications and one edit error, where an edit error may be a substitution, deletion, or insertion. Compared to optimal codes correcting only duplications, the asymptotic cost of protecting against the additional edit is only 0.003 bits/symbol when the alphabet has size 4.
Abstract
Due to its high data density and longevity, DNA is considered a promising medium for satisfying ever-increasing data storage needs. However, the diversity of errors that occur in DNA sequences makes efficient error-correction a challenging task. This paper aims to address simultaneously correcting two types of errors, namely, short tandem duplication and edit errors, where an edit error may be a substitution, deletion, or insertion. We focus on tandem repeats of length at most 3 and design codes for correcting an arbitrary number of duplication errors and one edit error. Because an edited symbol can be duplicated many times (as part of substrings of various lengths), a single edit can affect an unbounded substring of the retrieved word. However, we show that with appropriate preprocessing, the effect may be limited to a substring of finite length, thus making efficient error-correction possible. We construct a code for correcting the aforementioned errors and provide lower bounds for its rate. Compared to optimal codes correcting only duplication errors, numerical results show that the asymptotic cost of protecting against an additional edit is only 0.003 bits/symbol when the alphabet has size 4, an important case corresponding to data storage in DNA.
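The error model in the abstract can be illustrated with a minimal sketch. The function names `tandem_duplicate` and `remove_short_repeats` are hypothetical and do not come from the paper, and the sketch does not reproduce the paper's preprocessing or code construction. It only shows what a tandem duplication of length at most 3 does to a word, and how a single substitution changes the word obtained after collapsing all short tandem repeats. It assumes that, for duplication lengths at most 3, the order in which repeats are collapsed does not change the result.

```python
def tandem_duplicate(word: str, start: int, length: int) -> str:
    """Insert a copy of word[start:start+length] right after the original."""
    if not 1 <= length <= 3:
        raise ValueError("short duplications have length 1, 2, or 3")
    if start < 0 or start + length > len(word):
        raise ValueError("substring out of range")
    block = word[start:start + length]
    return word[:start + length] + block + word[start + length:]


def remove_short_repeats(word: str, max_len: int = 3) -> str:
    """Repeatedly collapse tandem repeats xx -> x with len(x) <= max_len."""
    changed = True
    while changed:
        changed = False
        for k in range(1, max_len + 1):
            for i in range(len(word) - 2 * k + 1):
                if word[i:i + k] == word[i + k:i + 2 * k]:
                    # Drop the second copy of the repeated block.
                    word = word[:i + k] + word[i + 2 * k:]
                    changed = True
                    break
            if changed:
                break
    return word


original = "ACGT"
duplicated = tandem_duplicate(original, 1, 2)        # "ACGCGT"
# Duplications alone are undone by collapsing repeats.
print(remove_short_repeats(duplicated))              # "ACGT"

# Assumed illustration: one substitution (C -> A at index 3) after the duplication.
received = duplicated[:3] + "A" + duplicated[4:]     # "ACGAGT"
# The repeat is no longer visible, so collapsing leaves a different, longer word.
print(remove_short_repeats(received))                # "ACGAGT"
```

In this toy example a single substituted symbol prevents the duplicated block from being collapsed, so the effect of one edit is no longer confined to one position of the deduplicated word. This is the kind of interaction between duplications and an edit that the abstract describes.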


Citations
Proceedings Article

Correcting multiple short duplication and substitution errors

TL;DR: This paper proposes error-correcting codes for simultaneously correcting short (tandem) duplications and at most p substitutions, where a short duplication generates a copy of a substring of length ≤ 3 and inserts the copy immediately after the original substring.
Journal Article

Low-Redundancy Codes for Correcting Multiple Short-Duplication and Edit Errors

TL;DR: This paper proposes error-correcting codes for simultaneously correcting short (tandem) duplications and at most p edits (in addition to duplications) at the additional cost of roughly
Journal Article

Beyond Single-Deletion Correcting Codes: Substitutions and Transpositions

TL;DR: In this paper, the problem of designing low-redundancy codes in settings where one must correct deletions in conjunction with substitutions or adjacent transpositions was considered, and linear-time encodable and decodable codes with list size 2 for one deletion and one substitution with redundancy were constructed.
References
Journal Article

DNA-Based Storage: Trends and Methods

TL;DR: The analytic contribution of the work is the construction and design of sequences over discrete alphabets that avoid pre-specified address patterns, have balanced base content, and exhibit other relevant substring constraints.
Journal Article

Duplication-Correcting Codes for Data Storage in the DNA of Living Organisms

TL;DR: This paper provides error-correcting codes for errors caused by tandem duplications, which create a copy of a block of the sequence and insert it in a tandem manner, i.e., next to the original, and provides a full classification of the sets of lengths allowed in tandem duplication that result in a unique root for all sequences.
Journal Article

Asymptotically Optimal Codes Correcting Fixed-Length Duplication Errors in DNA Storage Systems

TL;DR: It is demonstrated that Levenshtein’s construction of binary codes correcting insertions of zeros is applicable also to channels with arbitrary alphabets and with errors of arbitrary (but fixed) length.
Journal Article

Replication slippage versus point mutation rates in short tandem repeats of the human genome

TL;DR: It was found that within the STRs with repeated units consisting of one, two or three nucleotides, point mutations occur approximately twice as frequently as one would expect on the basis of the 1.2% difference between the human and chimpanzee genomes.
Journal Article

Coding Over Sets for DNA Storage

TL;DR: This paper studies error-correcting codes for the storage of data in synthetic deoxyribonucleic acid (DNA), and proposes explicit code constructions that can correct errors in such a storage system and that can be encoded and decoded efficiently.