Showing papers by "William F. Smyth" published in 2014

Open Access

Journal Article

A bijective variant of the Burrows–Wheeler Transform using V-order

Jacqueline W. Daykin¹,², William F. Smyth

King's College London¹, Royal Holloway, University of London²

24 Apr 2014, Theoretical Computer Science

TL;DR: The V-transform (V-BWT), a variant of the classic Burrows–Wheeler Transform, is introduced: whereas the original BWT uses lexicographic order, this work applies a distinct total ordering of strings called V-order.

21 citations
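For orientation, the classic transform that the V-BWT modifies can be sketched in a few lines of Python. This is a minimal, naive sketch (it materializes and sorts every rotation) of the ordinary lexicographic BWT without a sentinel, so it returns the row index of the input alongside the last column; the names `bwt` and `inverse_bwt` are hypothetical. It does not implement V-order or the paper's V-BWT, which replaces the lexicographic comparison and, being bijective, needs no such index.

```python
def bwt(s):
    """Classic (lexicographic) BWT without a sentinel: sort all rotations
    of s and return (last column, row index of s among the sorted rotations)."""
    n = len(s)
    rotations = sorted(s[i:] + s[:i] for i in range(n))
    last = "".join(r[-1] for r in rotations)
    return last, rotations.index(s)


def inverse_bwt(last, index):
    """Naive inversion: repeatedly prepend the last column and re-sort;
    after n rounds the table holds the sorted rotations."""
    n = len(last)
    table = [""] * n
    for _ in range(n):
        table = sorted(last[i] + table[i] for i in range(n))
    return table[index]
```

For example, `bwt("banana")` sorts the six rotations of `banana` and reads off their final letters; `inverse_bwt` recovers the input from that column and the index.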

Book Chapter

Fast and Simple Computations Using Prefix Tables Under Hamming and Edit Distance

Carl Barton¹, Costas S. Iliopoulos¹,², Solon P. Pissis¹, William F. Smyth³

King's College London¹, University of Western Australia², McMaster University³

15 Oct 2014

TL;DR: This article introduces a new and simple data structure, the prefix table under Hamming distance, and presents two algorithms to compute it efficiently: one asymptotically fast; the other very fast on average and in practice.

Abstract: In this article, we introduce a new and simple data structure, the prefix table under Hamming distance, and present two algorithms to compute it efficiently: one asymptotically fast; the other very fast on average and in practice. Because the latter approach avoids the computation of global data structures, such as the suffix array and the longest common prefix array, it yields algorithms much faster in practice than existing methods. We show how this data structure can be used to solve two string problems of interest: (a) approximate string matching under Hamming distance; and (b) longest approximate overlap under Hamming distance. Analogously, we introduce the prefix table under edit distance, and present an efficient algorithm for its computation. In the process, we also define the border array under both distance measures, and provide an algorithm for conversion between prefix tables and border arrays.

8 citations
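The definitions in this abstract can be illustrated with a naive Python sketch. Here `prefix_table_hamming(x, k)` is assumed to give, for each 0-based i, the length of the longest prefix of x[i:] that matches a prefix of x with at most k mismatches; it runs in O(n²) time and is neither of the paper's two algorithms. `hamming_occurrences` uses it for approximate string matching (problem (a)), and `border_from_prefix` converts an exact (k = 0) prefix table into a border array by a standard right-to-left propagation. All three names are hypothetical, and the conversion shown covers only the exact case, not the paper's distance-based border arrays.

```python
def prefix_table_hamming(x, k):
    """Naive O(n^2) prefix table under Hamming distance (assumed definition):
    pi[i] = length of the longest prefix of x[i:] matching a prefix of x
    with at most k mismatches."""
    n = len(x)
    pi = [0] * n
    for i in range(n):
        length, mismatches = 0, 0
        while i + length < n:
            if x[length] != x[i + length]:
                if mismatches == k:
                    break  # a (k+1)-th mismatch would end the match
                mismatches += 1
            length += 1
        pi[i] = length
    return pi


def hamming_occurrences(pattern, text, k):
    """0-based starts j where text[j:j+m] differs from pattern in <= k places."""
    m = len(pattern)
    pi = prefix_table_hamming(pattern + text, k)
    # At position m + j the first m symbols compared are pattern vs text[j:j+m].
    return [j for j in range(len(text) - m + 1) if pi[m + j] >= m]


def border_from_prefix(pi):
    """Exact (k = 0) case only: beta[j] = length of the longest proper border
    of x[:j+1], derived from the prefix table pi of x."""
    n = len(pi)
    beta = [0] * n
    for i in range(1, n):
        if pi[i] > 0:
            end = i + pi[i] - 1  # x[i:end+1] equals the prefix x[:pi[i]]
            beta[end] = max(beta[end], pi[i])
    for j in range(n - 2, -1, -1):
        # a border of length b ending at j+1 yields one of length b-1 ending at j
        beta[j] = max(beta[j], beta[j + 1] - 1)
    return beta
```

With k = 0 the first function reduces to the ordinary prefix table, which is why the same routine can feed `border_from_prefix`.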

Proceedings Article

Simple Linear Comparison of Strings in V-Order (Extended Abstract)

Ali Alatabbi¹, Jacqueline W. Daykin¹, M. Sohel Rahman², William F. Smyth³

King's College London¹, Bangladesh University of Engineering and Technology², McMaster University³

01 Jan 2014

TL;DR: In this article, a linear-time algorithm for computing the V-comparison of two finite strings is proposed; it is based on recording letter positions in increasing order and requires only linked lists.

Abstract: In this paper we focus on a total (but non-lexicographic) ordering of strings called V-order. We devise a new linear-time algorithm for computing the V-comparison of two finite strings. In comparison with the previous algorithm in the literature, our algorithm is both conceptually simpler, based on recording letter positions in increasing order, and more straightforward to implement, requiring only linked lists.

6 citations

Proceedings Article

Two squares canonical factorization

Haoyue Bai¹, Frantisek Franek¹, William F. Smyth¹

McMaster University¹

01 Jan 2014

TL;DR: A canonical factorization for any two squares that occur at the same position and satisfy some size restrictions is presented, which is believed to have application to problems such as the New Periodicity Lemma, Crochemore-Rytter Three Squares Lemma and ultimately the maximum-number-of-runs conjecture.

Abstract: We present a new combinatorial structure in a string: a canonical factorization for any two squares that occur at the same position and satisfy some size restrictions. We believe that this canonical factorization will have application to related problems such as the New Periodicity Lemma, the Crochemore-Rytter Three Squares Lemma, and ultimately the maximum-number-of-runs conjecture.

5 citations
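To make "two squares that occur at the same position" concrete, the naive Python sketch below lists every square uu that starts at a given 0-based position, by direct comparison of the two halves. The function name `squares_at` is hypothetical, and neither the paper's size restrictions nor its canonical factorization is reproduced here.

```python
def squares_at(s, i):
    """Return the half-lengths h such that s[i:i+2h] is a square uu with |u| = h."""
    return [h for h in range(1, (len(s) - i) // 2 + 1)
            if s[i:i + h] == s[i + h:i + 2 * h]]
```

For instance, in `aabaabaa` two squares start at position 0, `aa` (h = 1) and `aabaab` (h = 3): exactly the kind of pair the factorization concerns.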

Journal Article

Large-scale detection of repetitions

William F. Smyth¹

McMaster University¹

28 May 2014, Philosophical Transactions of the Royal Society A

TL;DR: This paper explores the possibility that repetitions (perhaps also other regularities in strings) can be computed in a manner commensurate with the size of the output.

Abstract: Combinatorics on words began more than a century ago with a demonstration that an infinitely long string with no repetitions could be constructed on an alphabet of only three letters. Computing all...

3 citations
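For contrast with the output-sensitive approach explored in this paper, the brute-force Python sketch below finds all runs (maximal repetitions of exponent at least 2, each reported with its minimal period) by trying every period p. It spends O(n²) time however few runs there are, which is the kind of cost an output-sensitive method would aim to avoid. The name `naive_runs` and the half-open `(start, end, period)` output format are assumptions of this sketch.

```python
def naive_runs(s):
    """All runs of s as half-open (start, end, minimal period), 0-based, O(n^2)."""
    n = len(s)
    seen = set()
    runs = []
    for p in range(1, n // 2 + 1):
        k = 0
        while k < n - p:
            if s[k] != s[k + p]:
                k += 1
                continue
            a = k
            while k < n - p and s[k] == s[k + p]:
                k += 1
            # s[a:k+p] has period p and cannot be extended with that period.
            # Periods are tried in increasing order; by the Fine-Wilf lemma a
            # larger period yielding the same interval is a multiple of the
            # minimal one, so the first period recorded for an interval wins.
            if k - a >= p and (a, k + p) not in seen:
                seen.add((a, k + p))
                runs.append((a, k + p, p))
    return sorted(runs)
```

On `aabaab`, for example, this reports the two runs `aa` (period 1) and the whole string with period 3.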

Book Chapter

Simple Linear Comparison of Strings in V-Order

Ali Alatabbi¹, J.W. Daykin¹,², M. Sohel Rahman¹,³, William F. Smyth⁴

King's College London¹, Royal Holloway, University of London², Bangladesh University of Engineering and Technology³, McMaster University⁴

13 Feb 2014

TL;DR: A new linear-time algorithm is devised for computing the V-comparison of two finite strings under a total ordering called V-order; it is both conceptually simpler, based on recording letter positions in increasing order, and more straightforward to implement, requiring only linked lists.

Abstract: In this paper we focus on a total (but non-lexicographic) ordering of strings called V-order. We devise a new linear-time algorithm for computing the V-comparison of two finite strings. In comparison with the previous algorithm in the literature, our algorithm is both conceptually simpler, based on recording letter positions in increasing order, and more straightforward to implement, requiring only linked lists.

2 citations

Posted Content

Indeterminate Strings, Prefix Arrays & Undirected Graphs

Manolis Christodoulakis¹, P.J. Ryan², William F. Smyth³, Shu Wang⁴

University of Cyprus¹, McMaster University², Murdoch University³, IBM⁴

12 Jun 2014, arXiv: Discrete Mathematics

TL;DR: It is shown, using a graph model, that every feasible integer array y is the prefix array of some (indeterminate or regular) string; for regular strings corresponding to y, the model is used to provide a lower bound on the alphabet size.

Abstract: An integer array y = y[1..n] is said to be feasible if and only if y[1] = n and, for every i ∈ 2..n, i ≤ i+y[i] ≤ n+1. A string is said to be indeterminate if and only if at least one of its elements is a subset of cardinality greater than one of a given alphabet Σ; otherwise it is said to be regular. A feasible array y is said to be regular if and only if it is the prefix array of some regular string. We show using a graph model that every feasible array of integers is a prefix array of some (indeterminate or regular) string, and for regular strings corresponding to y, we use the model to provide a lower bound on the alphabet size. We show further that there is a 1-1 correspondence between labelled simple graphs and indeterminate strings, and we show how to determine the minimum alphabet size |Σ| of an indeterminate string x based on its associated graph G_x. Thus, in this sense, indeterminate strings are a more natural object of combinatorial interest than the strings on elements of Σ that have traditionally been studied.

1 citation
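The two definitions at the start of this abstract can be checked directly. In the Python sketch below, an indeterminate string is assumed to be a list of non-empty sets, two positions match when their sets intersect, and indices are 0-based, so feasibility reads y[0] = n and 0 ≤ y[i] ≤ n − i. Because this matching is not transitive, the sketch computes the prefix array by naive comparison instead of a Z-algorithm-style shortcut that relies on transitivity of equality. The names `is_feasible` and `prefix_array` are hypothetical.

```python
def is_feasible(y):
    """0-based version of the abstract's condition: y[0] = n and,
    for i >= 1, 0 <= y[i] <= n - i."""
    n = len(y)
    return n > 0 and y[0] == n and all(0 <= y[i] <= n - i for i in range(1, n))


def prefix_array(x):
    """Naive prefix array of an indeterminate string x (a list of non-empty sets);
    positions match when their sets share at least one letter."""
    n = len(x)
    pi = [0] * n
    for i in range(n):
        length = 0
        while i + length < n and x[i + length] & x[length]:
            length += 1
        pi[i] = length
    return pi
```

A regular string is the special case in which every set is a singleton, so the same function also returns the ordinary prefix array.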

Inferring an indeterminate string from a prefix graph

Ali Alatabbi¹, M. Sohel Rahman², William F. Smyth³

King's College London¹, Bangladesh University², Murdoch University³

01 Jan 2014

TL;DR: In this article, it is shown how to construct, from a feasible array y, a lexicographically least indeterminate string on a minimum alphabet whose prefix table is π = y, using the prefix graph P = P_y, a labelled simple graph whose structure is determined by y.

Abstract: An indeterminate string (or, more simply, just a string) x = x[1..n] on an alphabet Σ is a sequence of nonempty subsets of Σ. We say that x[i₁] and x[i₂] match (written x[i₁] ≈ x[i₂]) if and only if x[i₁] ∩ x[i₂] ≠ ∅. A feasible array is an array y = y[1..n] of integers such that y[1] = n and, for every i ∈ 2..n, y[i] ∈ 0..n-i+1. A prefix table of a string x is an array π = π[1..n] of integers such that, for every i ∈ 1..n, π[i] = j if and only if x[i..i+j-1] is the longest substring at position i of x that matches a prefix of x. It is known from [6] that every feasible array is a prefix table of some indeterminate string. A prefix graph P = P_y is a labelled simple graph whose structure is determined by a feasible array y. In this paper we show, given a feasible array y, how to use P_y to construct a lexicographically least indeterminate string on a minimum alphabet whose prefix table π = y.

1 citation
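The definition of a prefix table implies local constraints that any string with prefix table y must satisfy. For each i ≥ 2 (1-based), positions i..i+y[i]-1 must match positions 1..y[i], and, if i + y[i] ≤ n, position i + y[i] must not match position y[i] + 1. The Python sketch below collects these required matches and mismatches as two edge sets on 0-based positions, for a feasible y, and checks a candidate indeterminate string (a list of non-empty sets) against them. This is a graph in the spirit of a prefix graph, but whether it coincides with the paper's exact definition of P_y is an assumption. The sketch also does not construct the lexicographically least string, and the names `prefix_constraints` and `satisfies` are hypothetical.

```python
def prefix_constraints(y):
    """Edge sets (hypothetical stand-in for a prefix graph) implied by a feasible
    0-based array y: pairs of positions that must match / must not match."""
    n = len(y)
    must_match, must_differ = set(), set()
    for i in range(1, n):
        ell = y[i]
        for k in range(ell):
            must_match.add((k, i + k))      # x[i+k] must match x[k]
        if i + ell < n:
            must_differ.add((ell, i + ell))  # the match must stop exactly here
    return must_match, must_differ


def satisfies(x, y):
    """True iff indeterminate string x (list of non-empty sets) has prefix table y."""
    if len(x) != len(y) or y[0] != len(y):
        return False
    must_match, must_differ = prefix_constraints(y)
    return (all(x[a] & x[b] for a, b in must_match)
            and not any(x[a] & x[b] for a, b in must_differ))
```

Because matching is position by position, a substring stops matching the prefix exactly at its first mismatch, which is why these constraints characterize the prefix table completely.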

Posted Content

Computing Covers Using Prefix Tables

Ali Alatabbi¹, M. Sohel Rahman², William F. Smyth³

King's College London¹, Bangladesh University of Engineering and Technology², University of Western Australia³

09 Dec 2014, arXiv: Data Structures and Algorithms

TL;DR: In this paper, a linear-time algorithm is described that computes the cover array of a regular string from its prefix table; the result is then extended to indeterminate strings.

Abstract: An \emph{indeterminate string} $x = x[1..n]$ on an alphabet $\Sigma$ is a sequence of nonempty subsets of $\Sigma$; $x$ is said to be \emph{regular} if every subset is of size one. A proper substring $u$ of regular $x$ is said to be a \emph{cover} of $x$ iff, for every $i \in 1..n$, some occurrence of $u$ in $x$ includes $x[i]$. The \emph{cover array} $\gamma = \gamma[1..n]$ of $x$ is an integer array such that $\gamma[i]$ is the length of the longest cover of $x[1..i]$. Fifteen years ago a complex, though nevertheless linear-time, algorithm was proposed to compute the cover array of regular $x$ based on prior computation of the border array of $x$. In this paper we first describe a linear-time algorithm to compute the cover array of regular string $x$ based on the prefix table of $x$. We then extend this result to indeterminate strings.
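The idea of working from the prefix table can be illustrated with a deliberately simple, cubic-time Python sketch for regular strings. It relies on the fact that any cover of x[1..i] must cover its first position and is therefore a prefix of it. So a candidate u = x[:L] occurs at 0-based position j inside x[:i] exactly when the prefix table (Z-array) of x gives pi[j] ≥ L and j + L ≤ i. The function names are hypothetical. This is not the paper's linear-time algorithm, and it does not handle indeterminate strings; a value of 0 in the output means "no proper cover".

```python
def prefix_table(x):
    """Z-array of a regular string: pi[i] = length of the longest common
    prefix of x and x[i:], with pi[0] = len(x)."""
    n = len(x)
    pi = [0] * n
    if n == 0:
        return pi
    pi[0] = n
    l = r = 0  # x[l:r] is the rightmost window known to match a prefix
    for i in range(1, n):
        if i < r:
            pi[i] = min(r - i, pi[i - l])
        while i + pi[i] < n and x[pi[i]] == x[i + pi[i]]:
            pi[i] += 1
        if i + pi[i] > r:
            l, r = i, i + pi[i]
    return pi


def covers_prefix(pi, i, L):
    """Does x[:L] cover x[:i]? Occurrences inside x[:i] are the j <= i - L with pi[j] >= L."""
    reach = 0  # positions [0, reach) are covered so far
    for j in range(i - L + 1):
        if pi[j] >= L:
            if j > reach:
                return False  # position `reach` lies in a gap between occurrences
            reach = j + L
    return reach == i


def cover_array(x):
    """gamma[i-1] = length of the longest proper cover of x[:i], or 0 if none."""
    pi = prefix_table(x)
    gamma = []
    for i in range(1, len(x) + 1):
        best = 0
        for L in range(i - 1, 0, -1):  # try longest candidates first
            if covers_prefix(pi, i, L):
                best = L
                break
        gamma.append(best)
    return gamma
```

For example, `abaaba` is covered by `aba`, whose occurrences at positions 0 and 3 together span the whole string.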