Showing papers by "William F. Smyth" published in 2013


Journal ArticleDOI
TL;DR: New, simple, easily computed, and widely applicable notions of string covering are proposed that provide an intuitive and useful characterisation of a string: the enhanced cover, the enhanced left cover, and the enhanced left seed.

35 citations


Journal ArticleDOI
TL;DR: The aim of this survey is to provide insight into the sequential algorithms that have been proposed to compute exact "regularities" in strings; that is, covers, seeds, repetitions, runs, and repeats.
Abstract: The aim of this survey is to provide insight into the sequential algorithms that have been proposed to compute exact "regularities" in strings; that is, covers (or quasiperiods), seeds, repetitions, runs (or maximal periodicities), and repeats. After outlining and evaluating the algorithms that have been proposed for their computation, I suggest possibly productive future directions of research.

34 citations


Journal ArticleDOI
TL;DR: According to Kane’s criteria for oligo design, BOND computes highly specific DNA oligonucleotides, for all the genes that admit unique probes, while running orders of magnitude faster than the existing programs.
Abstract: DNA microarrays have become ubiquitous in biological and medical research. The most difficult problem that needs to be solved is the design of DNA oligonucleotides that (i) are highly specific, that is, bind only to the intended target, (ii) cover the highest possible number of genes, that is, all genes that allow such unique regions, and (iii) are computed fast. None of the existing programs meet all these criteria. We introduce a new approach with our software program BOND (Basic OligoNucleotide Design). According to Kane’s criteria for oligo design, BOND computes highly specific DNA oligonucleotides, for all the genes that admit unique probes, while running orders of magnitude faster than the existing programs. The same approach enables us to introduce also an evaluation procedure that correctly measures the quality of the oligonucleotides. Extensive comparison is performed to prove our claims. BOND is flexible, easy to use, requires no additional software, and is freely available for non-commercial use from http://www.csd.uwo.ca/~ilie/BOND/ . We provide an improved solution to the important problem of oligonucleotide design, including a thorough evaluation of oligo design programs. We hope BOND will become a useful tool for researchers in biological and medical sciences by making the microarray procedures faster and more accurate.

26 citations


Book ChapterDOI
10 Jul 2013
TL;DR: This paper describes and evaluates algorithms for prefix table construction, some previously proposed and some designed by the authors, as well as new linear-time algorithms for transformations between π and the border array.
Abstract: The prefix table of a string x = x[1..n] is an array π = π[1..n] such that π[i] is the length of the longest substring beginning at i that equals a prefix of x. In this paper we describe and evaluate algorithms for prefix table construction, some previously proposed, others designed by us. We also describe and evaluate new linear-time algorithms for transformations between π and the border array.
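
To make the definition concrete, here is a minimal Python sketch of the two objects the abstract mentions. It is not taken from the paper: `prefix_table` is the widely known linear-time "Z-algorithm" style construction, and `border_array_from_prefix_table` is one standard way to derive the border array from π. The function names are hypothetical, and indices are 0-based (so `pi[0]` plays the role of π[1] = n in the abstract).

```python
def prefix_table(x: str) -> list[int]:
    """pi[i] = length of the longest substring starting at i that equals a prefix of x."""
    n = len(x)
    if n == 0:
        return []
    pi = [0] * n
    pi[0] = n
    # x[left:right] is the match with a prefix that extends furthest to the right so far.
    left = right = 0
    for i in range(1, n):
        if i < right:
            # Reuse the value already computed for the mirrored position i - left.
            pi[i] = min(right - i, pi[i - left])
        # Extend the match by explicit character comparisons.
        while i + pi[i] < n and x[pi[i]] == x[i + pi[i]]:
            pi[i] += 1
        if i + pi[i] > right:
            left, right = i, i + pi[i]
    return pi


def border_array_from_prefix_table(pi: list[int]) -> list[int]:
    """border[j] = length of the longest proper border of the prefix of length j + 1."""
    n = len(pi)
    border = [0] * n
    # A match of length pi[i] at i gives a border of length pi[i] ending at i + pi[i] - 1.
    # Scanning i downward lets smaller i (longer borders) overwrite larger i.
    for i in range(n - 1, 0, -1):
        if pi[i] > 0:
            border[i + pi[i] - 1] = pi[i]
    # A border of length b ending at j + 1 implies one of length b - 1 ending at j.
    for j in range(n - 2, -1, -1):
        border[j] = max(border[j], border[j + 1] - 1)
    return border
```

For example, `prefix_table("aabaab")` yields `[6, 1, 0, 3, 1, 0]`, from which the second function recovers the border array `[0, 1, 0, 1, 2, 3]`. Both passes run in time linear in the length of the string.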

18 citations


Journal ArticleDOI
TL;DR: In this article, the authors propose linear-time RAM algorithms for string comparison in V-order and for Lyndon-like factorization of a string into V-words, and introduce Hybrid Lyndon words as a generalization of standard Lyndon words, and hence propose extensions of factorization algorithms to other forms of order.

13 citations


01 Jan 2013
TL;DR: This paper proposes linear-time RAM algorithms for string comparison in V-order and for Lyndon-like factorization of a string into V-words and introduces Hybrid Lyndon words as a generalization of standard Lyndon words, and hence proposes extensions of factorization algorithms to other forms of order.
Abstract: In this paper we extend previous work on unique maximal factorization families (UMFFs) and a total (but non-lexicographic) ordering of strings called V-order. We present new combinatorial results for V-order, in particular concatenation under V-order. We propose linear-time RAM algorithms for string comparison in V-order and for Lyndon-like factorization of a string into V-words. This asymptotic efficiency thus matches that of the corresponding algorithms for lexicographical order. Finally, we introduce Hybrid Lyndon words as a generalization of standard Lyndon words, and hence propose extensions of factorization algorithms to other forms of order.

12 citations


Book ChapterDOI
01 Jan 2013
TL;DR: Generic RAM and PRAM algorithms are described for factoring words over sets of strings known as circ-UMFFs, generalizations of the well-known Lyndon words based on lexorder, whose properties were first studied in 1958 by Chen, Fox and Lyndon.
Abstract: In this paper we describe algorithms for factoring words over sets of strings known as circ-UMFFs, generalizations of the well-known Lyndon words based on lexorder, whose properties were first studied in 1958 by Chen, Fox and Lyndon. In 1983 Duval designed an elegant linear-time sequential (RAM) Lyndon factorization algorithm; a corresponding parallel (PRAM) algorithm was described in 1994 by Daykin, Iliopoulos and Smyth. In 2003 Daykin and Daykin introduced various circ-UMFFs, including one based on V-words and V-ordering; in 2011 linear string comparison and sequential factorization algorithms based on V-order were given by Daykin, Daykin and Smyth. Here we first describe generic RAM and PRAM algorithms for factoring a word over any circ-UMFF; then we show how to customize these generic algorithms to yield optimal parallel Lyndon-like V-word factorization.
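
As background for the sequential case, the following is a minimal Python sketch of the classical Lyndon factorization under ordinary lexicographic order, in the style usually attributed to Duval. It illustrates only the lexorder baseline that the abstract refers to, not the generic circ-UMFF or V-word algorithms of the paper; the function name `lyndon_factorization` is hypothetical.

```python
def lyndon_factorization(x: str) -> list[str]:
    """Split x into a non-increasing sequence of Lyndon words (lexicographic order)."""
    n = len(x)
    factors = []
    i = 0  # start of the part of x not yet factored
    while i < n:
        j, k = i + 1, i  # j scans ahead; k trails it by one period length
        while j < n and x[k] <= x[j]:
            if x[k] < x[j]:
                k = i   # x[i:j+1] is itself a Lyndon word; restart the comparison
            else:
                k += 1  # still repeating the current candidate factor
            j += 1
        # x[i:j] consists of repetitions of a Lyndon word of length j - k,
        # possibly followed by a proper prefix of it; emit the full repetitions.
        while i <= k:
            factors.append(x[i:i + j - k])
            i += j - k
    return factors
```

For instance, `lyndon_factorization("banana")` returns `["b", "an", "an", "a"]`. Each character is examined a bounded number of times, which gives the linear running time mentioned in the abstract.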

6 citations