Showing papers by "William F. Smyth" published in 2014


Journal ArticleDOI
TL;DR: The V-transform (V-BWT), a variant of the classic Burrows–Wheeler Transform, is introduced: whereas the original BWT uses lexicographic order, this work applies a distinct total ordering of strings called V-order.

21 citations


Book ChapterDOI
15 Oct 2014
TL;DR: This article introduces a new and simple data structure, the prefix table under Hamming distance, and presents two algorithms to compute it efficiently: one asymptotically fast; the other very fast on average and in practice.
Abstract: In this article, we introduce a new and simple data structure, the prefix table under Hamming distance, and present two algorithms to compute it efficiently: one asymptotically fast; the other very fast on average and in practice. Because the latter approach avoids the computation of global data structures, such as the suffix array and the longest common prefix array, it yields algorithms much faster in practice than existing methods. We show how this data structure can be used to solve two string problems of interest: (a) approximate string matching under Hamming distance; and (b) longest approximate overlap under Hamming distance. Analogously, we introduce the prefix table under edit distance, and present an efficient algorithm for its computation. In the process, we also define the border array under both distance measures, and provide an algorithm for conversion between prefix tables and border arrays.
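The abstract does not spell out the definition, so the following is only a minimal sketch under an assumed reading: entry i of the prefix table under Hamming distance is taken to be the length of the longest substring starting at position i that differs from the prefix of the same length in at most k positions. The function name `hamming_prefix_table` and the parameter `k` are hypothetical, positions are 0-based, and the loop is a naive quadratic scan, not either of the two algorithms the article presents.

```python
def hamming_prefix_table(x: str, k: int) -> list[int]:
    """Naive O(n^2) sketch (assumed definition, not the article's algorithm).

    table[i] is the length of the longest substring starting at i whose
    Hamming distance from the prefix of the same length is at most k.
    """
    n = len(x)
    table = []
    for i in range(n):
        mismatches = 0
        length = 0
        # Extend the match while the mismatch budget k is not exceeded.
        while i + length < n:
            if x[length] != x[i + length]:
                mismatches += 1
                if mismatches > k:
                    break
            length += 1
        table.append(length)
    return table


print(hamming_prefix_table("abab", 0))  # exact prefix table
print(hamming_prefix_table("abab", 1))  # one mismatch allowed
```

With k = 0 this reduces to the ordinary prefix table, which makes the naive version convenient as a reference when testing a faster implementation.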

8 citations


Proceedings Article
01 Jan 2014
TL;DR: In this article, a linear-time algorithm for computing the V-comparison of two finite strings was proposed, based on recording letter positions in increasing order, requiring only linked lists.
Abstract: In this paper we focus on a total (but non-lexicographic) ordering of strings called V-order. We devise a new linear-time algorithm for computing the V-comparison of two finite strings. In comparison with the previous algorithm in the literature, our algorithm is both conceptually simpler, based on recording letter positions in increasing order, and more straightforward to implement, requiring only linked lists.

6 citations


Proceedings Article
01 Jan 2014
TL;DR: A canonical factorization for any two squares that occur at the same position and satisfy some size restrictions is presented, which is believed to have application to problems such as the New Periodicity Lemma, Crochemore-Rytter Three Squares Lemma and ultimately the maximum-number-of-runs conjecture.
Abstract: We present a new combinatorial structure in a string: a canonical factorization for any two squares that occur at the same position and satisfy some size restrictions. We believe that this canonical factorization will have application to related problems such as the New Periodicity Lemma, Crochemore-Rytter Three Squares Lemma, and ultimately the maximum-number-of-runs conjecture.

5 citations


Journal ArticleDOI
TL;DR: This paper explores the possibility that repetitions (perhaps also other regularities in strings) can be computed in a manner commensurate with the size of the output.
Abstract: Combinatorics on words began more than a century ago with a demonstration that an infinitely long string with no repetitions could be constructed on an alphabet of only three letters. Computing all...

3 citations


Book ChapterDOI
13 Feb 2014
TL;DR: A new linear-time algorithm is devised for computing the V-comparison of two finite strings under the total ordering called V-order; compared with the previous algorithm, it is both conceptually simpler, based on recording letter positions in increasing order, and more straightforward to implement, requiring only linked lists.
Abstract: In this paper we focus on a total (but non-lexicographic) ordering of strings called V-order. We devise a new linear-time algorithm for computing the V-comparison of two finite strings. In comparison with the previous algorithm in the literature, our algorithm is both conceptually simpler, based on recording letter positions in increasing order, and more straightforward to implement, requiring only linked lists.

2 citations


Posted Content
TL;DR: It is shown using a graph model that every feasible array of integers is a prefix array of some (indeterminate or regular) string, and for regular strings corresponding to y, the model is used to provide a lower bound on the alphabet size.
Abstract: An integer array y = y[1..n] is said to be feasible if and only if y[1] = n and, for every i ∈ 2..n, i ≤ i+y[i] ≤ n+1. A string is said to be indeterminate if and only if at least one of its elements is a subset of cardinality greater than one of a given alphabet Σ; otherwise it is said to be regular. A feasible array y is said to be regular if and only if it is the prefix array of some regular string. We show using a graph model that every feasible array of integers is a prefix array of some (indeterminate or regular) string, and for regular strings corresponding to y, we use the model to provide a lower bound on the alphabet size. We show further that there is a 1-1 correspondence between labelled simple graphs and indeterminate strings, and we show how to determine the minimum alphabet size |Σ| of an indeterminate string x based on its associated graph Gx. Thus, in this sense, indeterminate strings are a more natural object of combinatorial interest than the strings on elements of Σ that have traditionally been studied.
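The feasibility condition stated in the abstract can be checked directly. The sketch below is illustrative only: the function name `is_feasible` is hypothetical, and the 1-based array y[1..n] of the abstract is stored as a 0-based Python list, so 1-based position i corresponds to list index i - 1.

```python
def is_feasible(y: list[int]) -> bool:
    """Check the abstract's condition: y[1] = n and, for every i in 2..n,
    i <= i + y[i] <= n + 1 (equivalently 0 <= y[i] <= n - i + 1)."""
    n = len(y)
    if n == 0 or y[0] != n:
        return False
    # List index j holds the 1-based entry y[j + 1], so the upper bound
    # n - i + 1 becomes n - j.
    return all(0 <= y[j] <= n - j for j in range(1, n))


print(is_feasible([4, 0, 2, 0]))  # prefix array of "abab"
print(is_feasible([4, 0, 3, 0]))  # third entry would run past the end
```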

1 citation


01 Jan 2014
TL;DR: In this article, it is shown how to construct a lexicographically least indeterminate string on a minimum alphabet whose prefix table π=y. The prefix graph P=Py is a labelled simple graph whose structure is determined by a feasible array y.
Abstract: An indeterminate string (or, more simply, just a string) x=x[1..n] on an alphabet Σ is a sequence of nonempty subsets of Σ. We say that x[i1] and x[i2] match (written x[i1]≈x[i2]) if and only if x[i1]∩x[i2]≠∅. A feasible array is an array y=y[1..n] of integers such that y[1]=n and for every i∈2..n, y[i]∈0..n-i+1. A prefix table of a string x is an array π=π[1..n] of integers such that, for every i∈1..n, π[i]=j if and only if x[i..i+j-1] is the longest substring at position i of x that matches a prefix of x. It is known from [6] that every feasible array is a prefix table of some indeterminate string. A prefix graph P=Py is a labelled simple graph whose structure is determined by a feasible array y. In this paper we show, given a feasible array y, how to use Py to construct a lexicographically least indeterminate string on a minimum alphabet whose prefix table π=y.
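The matching rule and the prefix table defined in the abstract can be illustrated with a naive computation. This is a sketch only: the name `indeterminate_prefix_table` is hypothetical, each position is modelled as a Python set of letters, two substrings are assumed to match when they match position by position, and the list is 0-based. It does not implement the paper's construction, which goes in the opposite direction (from a feasible array to a string).

```python
def indeterminate_prefix_table(x: list[set[str]]) -> list[int]:
    """Naive O(n^2) prefix table of an indeterminate string.

    Two positions match when their letter sets intersect; table[i] is the
    length of the longest substring at i that matches a prefix of x
    position by position.
    """
    n = len(x)
    table = []
    for i in range(n):
        j = 0
        # A nonempty intersection is truthy, an empty one is falsy.
        while i + j < n and x[j] & x[i + j]:
            j += 1
        table.append(j)
    return table


# Position 2 is the indeterminate letter {a, b}.
print(indeterminate_prefix_table([{"a"}, {"a", "b"}, {"b"}]))
# The regular string "aab" for comparison.
print(indeterminate_prefix_table([{"a"}, {"a"}, {"b"}]))
```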

1 citation


Posted Content
TL;DR: In this paper, a linear-time algorithm was proposed to compute the cover array of a regular string based on the prefix table of the regular string and then extended to indeterminate strings.
Abstract: An \emph{indeterminate string} $x = x[1..n]$ on an alphabet $\Sigma$ is a sequence of nonempty subsets of $\Sigma$; $x$ is said to be \emph{regular} if every subset is of size one. A proper substring $u$ of regular $x$ is said to be a \emph{cover} of $x$ iff for every $i \in 1..n$, an occurrence of $u$ in $x$ includes $x[i]$. The \emph{cover array} $\gamma = \gamma[1..n]$ of $x$ is an integer array such that $\gamma[i]$ is the length of the longest cover of $x[1..i]$. Fifteen years ago a complex, though nevertheless linear-time, algorithm was proposed to compute the cover array of regular $x$ based on prior computation of the border array of $x$. In this paper we first describe a linear-time algorithm to compute the cover array of regular string $x$ based on the prefix table of $x$. We then extend this result to indeterminate strings.
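To make the definitions of cover and cover array concrete, here is a brute-force sketch for regular strings. It is not the linear-time algorithm of the paper: the helper names `covers` and `cover_array` are hypothetical, entries are taken to be lengths (0 when a prefix has no proper cover), and the list is 0-based, so `gamma[i - 1]` corresponds to $\gamma[i]$.

```python
def covers(u: str, s: str) -> bool:
    """Return True if occurrences of u in s include every position of s."""
    m = len(u)
    covered_up_to = 0  # s[:covered_up_to] is covered by occurrences so far
    for start in range(len(s) - m + 1):
        if start > covered_up_to:
            return False  # position covered_up_to can no longer be covered
        if s.startswith(u, start):
            covered_up_to = start + m
    return covered_up_to == len(s)


def cover_array(x: str) -> list[int]:
    """Brute force: length of the longest proper cover of each prefix."""
    gamma = []
    for i in range(1, len(x) + 1):
        prefix = x[:i]
        longest = 0
        # A cover must include the first position, so it is a prefix;
        # try candidate lengths from longest to shortest.
        for c in range(i - 1, 0, -1):
            if covers(prefix[:c], prefix):
                longest = c
                break
        gamma.append(longest)
    return gamma


print(cover_array("ababa"))
print(cover_array("aaaa"))
```

A slow reference of this kind is mainly useful for checking a linear-time implementation on short inputs.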