Journal Article
Error-Correcting Codes for Short Tandem Duplication and Edit Errors
TL;DR: In this article, a code for correcting short tandem duplication and edit errors was proposed, where an edit error may be a substitution, deletion, or insertion. Compared with optimal codes correcting only duplication errors, the asymptotic cost of protecting against an additional edit is only 0.003 bits/symbol for an alphabet of size 4.
Abstract:
Due to its high data density and longevity, DNA is considered a promising medium for satisfying ever-increasing data storage needs. However, the diversity of errors that occur in DNA sequences makes efficient error correction a challenging task. This paper aims to correct two types of errors simultaneously, namely, short tandem duplication and edit errors, where an edit error may be a substitution, deletion, or insertion. We focus on tandem repeats of length at most 3 and design codes for correcting an arbitrary number of duplication errors and one edit error. Because an edited symbol can be duplicated many times (as part of substrings of various lengths), a single edit can affect an unbounded substring of the retrieved word. However, we show that with appropriate preprocessing, the effect may be limited to a substring of finite length, thus making efficient error correction possible. We construct a code for correcting the aforementioned errors and provide lower bounds for its rate. Compared with optimal codes correcting only duplication errors, numerical results show that the asymptotic cost of protecting against an additional edit is only 0.003 bits/symbol when the alphabet has size 4, an important case corresponding to data storage in DNA.
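The abstract's central difficulty is that a single edit can be copied by later duplications. The minimal Python sketch below illustrates this by applying the channel operations directly. The helper names, the string representation over ACGT, and 0-based indexing are illustrative assumptions rather than the paper's notation. The sketch only applies errors and does not implement the proposed code.

```python
# Minimal sketch of the channel described in the abstract: short tandem
# duplications (length <= 3) plus a single edit (substitution, deletion,
# or insertion). Helper names and 0-based indexing are assumptions.

ALPHABET = "ACGT"  # assumption: alphabet of size 4, the DNA case


def _check_symbol(a: str) -> None:
    """Reject anything that is not a single alphabet symbol."""
    if len(a) != 1 or a not in ALPHABET:
        raise ValueError(f"{a!r} is not a symbol of {ALPHABET}")


def tandem_duplicate(s: str, i: int, k: int) -> str:
    """Insert a copy of s[i:i+k] immediately after the original (1 <= k <= 3)."""
    if not 1 <= k <= 3:
        raise ValueError("short tandem duplications have length 1, 2, or 3")
    if i < 0 or i + k > len(s):
        raise ValueError("the duplicated substring must lie inside s")
    return s[: i + k] + s[i : i + k] + s[i + k :]


def substitute(s: str, i: int, a: str) -> str:
    """Edit: replace the symbol at position i by a."""
    _check_symbol(a)
    if not 0 <= i < len(s):
        raise IndexError("substitution position out of range")
    return s[:i] + a + s[i + 1 :]


def delete(s: str, i: int) -> str:
    """Edit: remove the symbol at position i."""
    if not 0 <= i < len(s):
        raise IndexError("deletion position out of range")
    return s[:i] + s[i + 1 :]


def insert(s: str, i: int, a: str) -> str:
    """Edit: insert symbol a so that it ends up at position i."""
    _check_symbol(a)
    if not 0 <= i <= len(s):
        raise IndexError("insertion position out of range")
    return s[:i] + a + s[i:]


if __name__ == "__main__":
    x = "ACGTA"
    # Error-free branch: duplicate "CG" twice.
    clean = tandem_duplicate(tandem_duplicate(x, 1, 2), 1, 2)  # "ACGCGCGTA"
    # Noisy branch: substitute first, then apply the same duplications.
    noisy = substitute(x, 2, "T")  # "ACTTA"
    noisy = tandem_duplicate(tandem_duplicate(noisy, 1, 2), 1, 2)  # "ACTCTCTTA"
    diff = [j for j, (a, b) in enumerate(zip(clean, noisy)) if a != b]
    print(clean, noisy, diff)  # ACGCGCGTA ACTCTCTTA [2, 4, 6]
```

In this example, each duplication of the edited region adds another mismatch. This is the effect the abstract describes: without preprocessing, one edit can spread over a substring of unbounded length.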
Citations
Proceedings Article
Correcting multiple short duplication and substitution errors
TL;DR: In this paper, the authors proposed error-correcting codes for simultaneously correcting short (tandem) duplications and at most p substitutions, where a short duplication generates a copy of a substring of length ≤ 3 and inserts the copy immediately after the original substring.
Journal Article
Low-Redundancy Codes for Correcting Multiple Short-Duplication and Edit Errors
TL;DR: In this paper, the authors proposed error-correcting codes for simultaneously correcting short (tandem) duplications and at most p edits (in addition to duplications) at the additional cost of roughly
Journal Article
Beyond Single-Deletion Correcting Codes: Substitutions and Transpositions
TL;DR: In this paper, the problem of designing low-redundancy codes in settings where one must correct deletions in conjunction with substitutions or adjacent transpositions was considered, and linear-time encodable and decodable codes with list size 2 for one deletion and one substitution with redundancy were constructed.
References
Journal Article
DNA-Based Storage: Trends and Methods
S. M. Hossein Tabatabaei Yazdi, Han Mao Kiah, Eva Garcia-Ruiz, Jian Ma, Huimin Zhao, Olgica Milenkovic
TL;DR: The analytic contribution of the work is the construction and design of sequences over discrete alphabets that avoid pre-specified address patterns, have balanced base content, and exhibit other relevant substring constraints.
Journal Article
Duplication-Correcting Codes for Data Storage in the DNA of Living Organisms
TL;DR: This paper provides error-correcting codes for errors caused by tandem duplications, which create a copy of a block of the sequence and insert it in tandem, i.e., next to the original. It also provides a full classification of the sets of duplication lengths that result in a unique root for all sequences.
Journal Article
Asymptotically Optimal Codes Correcting Fixed-Length Duplication Errors in DNA Storage Systems
TL;DR: It is demonstrated that Levenshtein’s construction of binary codes correcting insertions of zeros is applicable also to channels with arbitrary alphabets and with errors of arbitrary (but fixed) length.
Journal Article
Replication slippage versus point mutation rates in short tandem repeats of the human genome
TL;DR: It was found that within the STRs with repeated units consisting of one, two or three nucleotides, point mutations occur approximately twice as frequently as one would expect on the basis of the 1.2% difference between the human and chimpanzee genomes.
Journal Article
Coding Over Sets for DNA Storage
TL;DR: This paper studies error-correcting codes for the storage of data in synthetic deoxyribonucleic acid (DNA) and proposes explicit code constructions that correct errors in such a storage system and can be encoded and decoded efficiently.
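Coding over sets is needed because a DNA pool returns an unordered collection of strands. For comparison, here is a minimal Python sketch of the simplest baseline, which prefixes each strand with its index so that the original order can be restored. This is not the construction proposed in the "Coding Over Sets for DNA Storage" entry. The chunk size, index length, and four-bases-per-byte mapping are illustrative assumptions.

```python
# Index-prefix baseline for storing data as an unordered set of strands.
# NOT the cited paper's construction; all parameters are assumptions.

BASES = "ACGT"  # base b encodes the quaternary digit BASES.index(b)


def to_bases(value: int, length: int) -> str:
    """Write value in base 4 using exactly `length` nucleotides (most significant first)."""
    digits = []
    for _ in range(length):
        value, r = divmod(value, 4)
        digits.append(BASES[r])
    if value:
        raise ValueError("value does not fit in the given number of bases")
    return "".join(reversed(digits))


def from_bases(s: str) -> int:
    """Inverse of to_bases."""
    v = 0
    for c in s:
        v = 4 * v + BASES.index(c)
    return v


def encode(data: bytes, chunk: int = 4, index_len: int = 4) -> set[str]:
    """Split data into chunks; each strand = index prefix + 4 bases per byte."""
    strands = set()
    for n, start in enumerate(range(0, len(data), chunk)):
        body = "".join(to_bases(b, 4) for b in data[start : start + chunk])
        strands.add(to_bases(n, index_len) + body)
    return strands


def decode(strands: set[str], index_len: int = 4) -> bytes:
    """Order strands by their index prefix, then map every 4 bases back to a byte."""
    out = bytearray()
    for s in sorted(strands, key=lambda t: from_bases(t[:index_len])):
        body = s[index_len:]
        out.extend(from_bases(body[j : j + 4]) for j in range(0, len(body), 4))
    return bytes(out)


if __name__ == "__main__":
    pool = encode(b"hello, DNA storage")  # a set: strand order is not kept
    print(decode(pool))  # b'hello, DNA storage'
```

The index costs index_len bases per strand. Reducing that kind of overhead is the sort of cost that dedicated set codes aim to lower.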
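The "Duplication-Correcting Codes for Data Storage in the DNA of Living Organisms" entry refers to the root of a sequence. Informally, the root is what remains after every tandem duplication is undone. The Python sketch below computes roots for duplication lengths up to 3. It also checks exhaustively, on short strings, that the root does not depend on the order in which duplications are undone. The helper names are hypothetical. The check only illustrates, and does not prove, the uniqueness result for lengths {1, 2, 3}, which we take as given from that line of work.

```python
from functools import lru_cache
from itertools import product


def undo_one(s: str, max_len: int = 3):
    """Yield every string obtained by undoing one tandem duplication xx -> x, |x| <= max_len."""
    for k in range(1, max_len + 1):
        for i in range(len(s) - 2 * k + 1):
            if s[i : i + k] == s[i + k : i + 2 * k]:
                yield s[: i + k] + s[i + 2 * k :]


def greedy_root(s: str, max_len: int = 3) -> str:
    """Undo duplications in first-found order until the string is duplication-free."""
    while (nxt := next(undo_one(s, max_len), None)) is not None:
        s = nxt
    return s


@lru_cache(maxsize=None)
def all_roots(s: str, max_len: int = 3) -> frozenset:
    """Every duplication-free string reachable from s by some order of undo steps."""
    children = set(undo_one(s, max_len))
    if not children:
        return frozenset({s})
    return frozenset().union(*(all_roots(c, max_len) for c in children))


def roots_are_unique(alphabet: str, max_n: int, max_len: int = 3) -> bool:
    """Exhaustively check that every string of length <= max_n has exactly one root."""
    return all(
        len(all_roots("".join(w), max_len)) == 1
        for n in range(1, max_n + 1)
        for w in product(alphabet, repeat=n)
    )


if __name__ == "__main__":
    print(greedy_root("ACGCGCGTA"))  # ACGTA
    print(roots_are_unique("ACG", 6))  # expected True, per the uniqueness result
```

Because the root is order-independent for these lengths, the cheap greedy_root gives the same answer as the exponential all_roots search. That makes deduplication a practical first decoding step when only duplications occur.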
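The "Asymptotically Optimal Codes Correcting Fixed-Length Duplication Errors" entry links duplications to Levenshtein's codes for insertions of zeros. One way to see this connection, used in the duplication-coding literature, is a difference transform. The first k symbols are kept, and each later symbol is replaced by its difference mod q from the symbol k positions earlier. A tandem duplication of length k then becomes an insertion of k consecutive zeros. The Python sketch below verifies this exhaustively on short sequences. The function names and the integer alphabet {0, 1, 2, 3} are assumptions, and the sketch does not reproduce the cited construction.

```python
from itertools import product

Q = 4  # assumption: alphabet {0, 1, 2, 3}, e.g. DNA bases mapped to integers


def k_diff(x: list[int], k: int, q: int = Q) -> list[int]:
    """Keep x[:k]; replace every later x[j] by (x[j] - x[j-k]) mod q."""
    return x[:k] + [(x[j] - x[j - k]) % q for j in range(k, len(x))]


def k_undiff(d: list[int], k: int, q: int = Q) -> list[int]:
    """Invert k_diff by accumulating differences k positions apart."""
    x = list(d[:k])
    for j in range(k, len(d)):
        x.append((d[j] + x[j - k]) % q)
    return x


def duplicate(x: list[int], i: int, k: int) -> list[int]:
    """Length-k tandem duplication: insert a copy of x[i:i+k] right after it."""
    return x[: i + k] + x[i : i + k] + x[i + k :]


def duplication_is_zero_insertion(n: int, k: int, q: int = Q) -> bool:
    """For every x of length n and every i, k_diff(duplicate(x)) = k_diff(x) with k zeros inserted."""
    for x in map(list, product(range(q), repeat=n)):
        d = k_diff(x, k, q)
        if k_undiff(d, k, q) != x:
            return False
        for i in range(n - k + 1):
            expected = d[: i + k] + [0] * k + d[i + k :]
            if k_diff(duplicate(x, i, k), k, q) != expected:
                return False
    return True


if __name__ == "__main__":
    x = [0, 1, 2, 3, 1]
    print(k_diff(x, 2))  # [0, 1, 2, 2, 3]
    print(k_diff(duplicate(x, 1, 2), 2))  # [0, 1, 2, 0, 0, 2, 3]
    print(all(duplication_is_zero_insertion(6, k) for k in (1, 2, 3)))  # True
```

Since the transform is invertible, a code that corrects insertions of zero blocks in the transformed domain yields, after k_undiff, a code that corrects fixed-length-k duplications in the original sequences.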