Author

Costas S. Iliopoulos

Bio: Costas S. Iliopoulos is an academic researcher at King's College London. The author has contributed to research in the topics String (computer science) and Pattern matching. The author has an h-index of 40 and has co-authored 432 publications receiving 6883 citations. Previous affiliations of Costas S. Iliopoulos include the University of Cambridge and Royal Holloway, University of London.


Papers
Book ChapterDOI
11 Oct 2010
TL;DR: This paper uses Lyndon words and introduces the Lyndon structure of runs as a useful tool when computing powers and presents an efficient algorithm for testing primitivity of factors of a string and computing their primitive roots.
Abstract: A breakthrough in the field of text algorithms was the discovery of the fact that the maximal number of runs in a string of length n is O(n) and that they can all be computed in O(n) time. We study some applications of this result. New simpler O(n) time algorithms are presented for a few classical string problems: computing all distinct kth string powers for a given k, in particular squares for k = 2, and finding all local periods in a given string of length n. Additionally, we present an efficient algorithm for testing primitivity of factors of a string and computing their primitive roots. Applications of runs, despite their importance, are underrepresented in existing literature (approximately one page in the paper of Kolpakov & Kucherov, 1999). In this paper we attempt to fill in this gap. We use Lyndon words and introduce the Lyndon structure of runs as a useful tool when computing powers. In problems related to periods we use some versions of the Manhattan skyline problem.
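As a rough illustration of the primitivity notion used in this abstract, the sketch below tests whether a whole string is primitive and computes its primitive root. It relies on the classical observation that the first occurrence of s inside s + s at a position greater than 0 gives the length of the primitive root. This is not the paper's algorithm, which answers such queries for factors of a string using runs; the function names `primitive_root` and `is_primitive` are hypothetical.

```python
def primitive_root(s: str) -> str:
    """Return the shortest string r such that s == r * k for some k >= 1."""
    if not s:
        return s
    # s occurs in s + s at a shift p with 0 < p < len(s) only when s is a
    # proper power; the smallest such shift is the primitive root length.
    # For a primitive s, the first occurrence after position 0 is at len(s).
    p = (s + s).find(s, 1)
    return s[:p]


def is_primitive(s: str) -> bool:
    """A non-empty string is primitive when it is its own primitive root."""
    return len(s) > 0 and primitive_root(s) == s


if __name__ == "__main__":
    for word in ["abab", "aba", "abcabcabc"]:
        print(word, primitive_root(word), is_primitive(word))
```

Applying this whole-string test to every factor separately would be far slower than the bounds discussed in the abstract; it is only meant to make the definitions concrete.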

439 citations

Book ChapterDOI
21 Aug 2006
TL;DR: An algorithm that uses finite automata to find the common motifs with gaps occurring in all strings belonging to a finite set S = {S1,S2,...,Sr}.
Abstract: We present an algorithm that uses finite automata to find the common motifs with gaps occurring in all strings belonging to a finite set S = {S1,S2,...,Sr}. In order to find these common motifs we must first identify the factors that exist in each string. Therefore the algorithm begins by constructing a factor automaton for each string Si. To find the common factors of all the strings, the algorithm needs to gather all the factors from the strings together in one data structure, and this is achieved by computing an automaton that accepts the union of the above-mentioned automata. Using this automaton we are able to create a new factor alphabet. Based on this factor alphabet, a finite automaton is created for each string Si that accepts sequences of all non-overlapping factors residing in that string. The intersection of the latter automata produces the finite automaton which accepts all the common subsequences with gaps over the factor alphabet that are present in all the strings of the set S = {S1,S2,...,Sr}. These common subsequences are the common motifs of the strings.
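The first stage described above, identifying the factors of each string and keeping those common to all of them, can be pictured with a deliberately naive sketch. It swaps the factor automata of the abstract for plain Python sets of substrings, so it shows only what is being computed, not how the paper computes it. The names `factors` and `common_factors` are hypothetical.

```python
def factors(s: str) -> set[str]:
    """All non-empty factors (substrings) of s, as an explicit set."""
    return {s[i:j] for i in range(len(s)) for j in range(i + 1, len(s) + 1)}


def common_factors(strings: list[str]) -> set[str]:
    """Factors that occur in every string of the input list."""
    if not strings:
        return set()
    common = factors(strings[0])
    for s in strings[1:]:
        # Set intersection stands in for combining the factor automata.
        common &= factors(s)
    return common


if __name__ == "__main__":
    print(sorted(common_factors(["abcab", "cabd", "xab"])))
```

An explicit set can hold a quadratic number of factors per string, which is one reason an automaton-based representation is attractive.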

308 citations

Journal Article
TL;DR: This paper studies cubic runs, in which the shortest period p satisfies 3p≤|v|, and shows an upper bound of 0.5 n on the maximal number of such runs in a string of length n, together with an infinite sequence of binary words giving a lower bound of 0.406 n.
Abstract: A run is an inclusion maximal occurrence in a string (as a subinterval) of a repetition v with a period p such that 2p≤|v|. The maximal number of runs in a string of length n has been thoroughly studied, and is known to be between 0.944 n and 1.029 n. In this paper we investigate cubic runs, in which the shortest period p satisfies 3p≤|v|. We show the upper bound of 0.5 n on the maximal number of such runs in a string of length n, and construct an infinite sequence of words over a binary alphabet for which the lower bound is 0.406 n.
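To make the definitions of runs and cubic runs concrete, here is a brute-force sketch that enumerates them directly from the definition. It is not the linear-time computation referred to in the literature and is suitable only for short strings; the names `smallest_period` and `runs` and the `exponent` parameter are assumptions made for illustration (`exponent=2` gives ordinary runs, `exponent=3` gives cubic runs).

```python
def smallest_period(w: str) -> int:
    """Smallest p >= 1 such that w[k] == w[k + p] for all valid k."""
    n = len(w)
    for p in range(1, n + 1):
        if all(w[k] == w[k + p] for k in range(n - p)):
            return p
    return n  # only reached for the empty string


def runs(s: str, exponent: int = 2) -> list[tuple[int, int, int]]:
    """Return sorted (start, end, period) triples, with end exclusive."""
    n = len(s)
    found = set()
    for p in range(1, n // exponent + 1):
        i = 0
        while i + p < n:
            if s[i] != s[i + p]:
                i += 1
                continue
            # Extend the maximal stretch on which s[k] == s[k + p].
            j = i
            while j + p < n and s[j] == s[j + p]:
                j += 1
            start, end = i, j + p  # s[start:end] has period p and is maximal
            if end - start >= exponent * p:
                # Report the repetition only under its shortest period.
                if smallest_period(s[start:end]) == p:
                    found.add((start, end, p))
            i = j + 1
    return sorted(found)


if __name__ == "__main__":
    print(runs("mississippi"))      # ordinary runs
    print(runs("mississippi", 3))   # cubic runs
```

On "mississippi" this reports the run "ississi" with period 3 and the three period-1 runs "ss", "ss", and "pp", none of which is long enough to be cubic.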

266 citations

Journal ArticleDOI
TL;DR: This paper presents a CRCW parallel RAM algorithm that constructs the suffix tree associated with a string of n symbols in O(log n) time with n processors, using Θ(n^2) space.
Abstract: Many string manipulations can be performed efficiently on suffix trees. In this paper a CRCW parallel RAM algorithm is presented that constructs the suffix tree associated with a string of n symbols in O(log n) time with n processors. The algorithm requires Θ(n^2) space. However, the space needed can be reduced to O(n^(1+ε)) for any 0 < ε ≤ 1, with a corresponding slow-down proportional to 1/ε. Efficient parallel procedures are also given for some string problems that can be solved with suffix trees.

152 citations

Journal ArticleDOI
TL;DR: The upper bounds derived on the computational complexity of the algorithms above improve the upper bounds given by Kannan and Bachem in [SIAM J. Comput., 8 (1979), pp. 499–507].
Abstract: An $O(s^5 M(s^2))$ algorithm for computing the canonical structure of a finite Abelian group represented by an integer matrix of size s (this is the Smith normal form of the matrix) is presented. Moreover, an $O(s^3 M(s^2))$ algorithm for computing the Hermite normal form of an integer matrix of size s is given. The upper bounds derived on the computational complexity of the algorithms above improve the upper bounds given by Kannan and Bachem in [SIAM J. Comput., 8 (1979), pp. 499–507] and Chou and Collins in [SIAM J. Comput., 11 (1982), pp. 687–708].

143 citations


Cited by
Journal ArticleDOI

08 Dec 2001-BMJ
TL;DR: There is, I think, something ethereal about i, the square root of minus one, which at first seemed an odd beast: an intruder hovering on the edge of reality.
Abstract: There is, I think, something ethereal about i —the square root of minus one. I remember first hearing about it at school. It seemed an odd beast at that time—an intruder hovering on the edge of reality. Usually familiarity dulls this sense of the bizarre, but in the case of i it was the reverse: over the years the sense of its surreal nature intensified. It seemed that it was impossible to write mathematics that described the real world in …

33,785 citations

Journal Article
TL;DR: In this survey I have collected everything I could find on graph labeling, a topic studied in over 1000 papers, many of which have appeared in journals that are not widely available.
Abstract: A graph labeling is an assignment of integers to the vertices or edges, or both, subject to certain conditions. Graph labelings were first introduced in the late 1960s. In the intervening years dozens of graph labeling techniques have been studied in over 1000 papers. Finding out what has been done for any particular kind of labeling and keeping up with new discoveries is difficult because of the sheer number of papers and because many of the papers have appeared in journals that are not widely available. In this survey I have collected everything I could find on graph labeling. For the convenience of the reader the survey includes a detailed table of contents and index.

2,367 citations

Journal ArticleDOI

1,380 citations

Journal ArticleDOI
TL;DR: This work developed REVEL (rare exome variant ensemble learner), an ensemble method for predicting the pathogenicity of missense variants on the basis of individual tools: MutPred, FATHMM, VEST, PolyPhen, SIFT, PROVEAN, MutationAssessor, MutationTaster, LRT, GERP, SiPhy, phyloP, and phastCons.
Abstract: The vast majority of coding variants are rare, and assessment of the contribution of rare variants to complex traits is hampered by low statistical power and limited functional data. Improved methods for predicting the pathogenicity of rare coding variants are needed to facilitate the discovery of disease variants from exome sequencing studies. We developed REVEL (rare exome variant ensemble learner), an ensemble method for predicting the pathogenicity of missense variants on the basis of individual tools: MutPred, FATHMM, VEST, PolyPhen, SIFT, PROVEAN, MutationAssessor, MutationTaster, LRT, GERP, SiPhy, phyloP, and phastCons. REVEL was trained with recently discovered pathogenic and rare neutral missense variants, excluding those previously used to train its constituent tools. When applied to two independent test sets, REVEL had the best overall performance (p < 10^−12) as compared to any individual tool and seven ensemble methods: MetaSVM, MetaLR, KGGSeq, Condel, CADD, DANN, and Eigen. Importantly, REVEL also had the best performance for distinguishing pathogenic from rare neutral variants with allele frequencies

1,295 citations