Author
Costas S. Iliopoulos
Other affiliations: University of Cambridge, Royal Holloway, University of London, Bangladesh University of Engineering and Technology ...
Bio: Costas S. Iliopoulos is an academic researcher at King's College London. The author has contributed to research on the topics String (computer science) and Pattern matching. The author has an h-index of 40 and has co-authored 432 publications receiving 6883 citations. Previous affiliations of Costas S. Iliopoulos include University of Cambridge and Royal Holloway, University of London.
Papers
11 Oct 2010
TL;DR: This paper uses Lyndon words and introduces the Lyndon structure of runs as a useful tool when computing powers and presents an efficient algorithm for testing primitivity of factors of a string and computing their primitive roots.
Abstract: A breakthrough in the field of text algorithms was the discovery of the fact that the maximal number of runs in a string of length n is O(n) and that they can all be computed in O(n) time. We study some applications of this result. New simpler O(n) time algorithms are presented for a few classical string problems: computing all distinct kth string powers for a given k, in particular squares for k = 2, and finding all local periods in a given string of length n. Additionally, we present an efficient algorithm for testing primitivity of factors of a string and computing their primitive roots. Applications of runs, despite their importance, are underrepresented in existing literature (approximately one page in the paper of Kolpakov & Kucherov, 1999). In this paper we attempt to fill in this gap. We use Lyndon words and introduce the Lyndon structure of runs as a useful tool when computing powers. In problems related to periods we use some versions of the Manhattan skyline problem.
439 citations
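As a small illustration of the primitivity notion in the abstract above, the following sketch finds the primitive root of a whole string. It is not the paper's algorithm, which handles factors of a string using runs; it relies on the classical fact that a string w is primitive exactly when w occurs in ww only as a prefix and as a suffix. The names `primitive_root` and `is_primitive` are illustrative, not taken from the paper.

```python
def primitive_root(w: str) -> str:
    """Return the shortest string u such that w equals u repeated k times (k >= 1)."""
    if not w:
        return w
    # The first occurrence of w inside w + w after position 0 starts at the
    # smallest shift p that maps w onto itself cyclically; w is then w[:p] repeated.
    p = (w + w).find(w, 1)
    return w[:p]


def is_primitive(w: str) -> bool:
    """A non-empty string is primitive when it is its own primitive root."""
    return len(w) > 0 and primitive_root(w) == w
```

For example, `primitive_root("abcabcabc")` evaluates to `"abc"`, while `"aba"` is primitive. The running time depends on the string search behind `str.find`, so this sketch makes no claim about matching the bounds stated in the abstract.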
21 Aug 2006
TL;DR: An algorithm that uses finite automata to find the common motifs with gaps occurring in all strings belonging to a finite set S = {S1,S2,...,Sr}.
Abstract: We present an algorithm that uses finite automata to find the common motifs with gaps occurring in all strings belonging to a finite set S = {S1,S2,...,Sr}. In order to find these common motifs we must first identify the factors that exist in each string. Therefore the algorithm begins by constructing a factor automaton for each string Si. To find the common factors of all the strings, the algorithm needs to gather all the factors from the strings together in one data structure, and this is achieved by computing an automaton that accepts the union of the above-mentioned automata. Using this automaton we are able to create a new factor alphabet. Based on this factor alphabet, a finite automaton is created for each string Si that accepts sequences of all non-overlapping factors residing in each string. The intersection of the latter automata produces the finite automaton which accepts all the common subsequences with gaps over the factor alphabet that are present in all the strings of the set S = {S1,S2,...,Sr}. These common subsequences are the common motifs of the strings.
308 citations
TL;DR: This article shows an upper bound of 0.5 n on the maximal number of cubic runs in a string of length n, and constructs an infinite sequence of binary words for which the lower bound is 0.406 n.
Abstract: A run is an inclusion-maximal occurrence in a string (as a subinterval) of a repetition v with a period p such that 2p≤|v|. The maximal number of runs in a string of length n has been thoroughly studied, and is known to be between 0.944 n and 1.029 n. In this paper we investigate cubic runs, in which the shortest period p satisfies 3p≤|v|. We show the upper bound of 0.5 n on the maximal number of such runs in a string of length n, and construct an infinite sequence of words over a binary alphabet for which the lower bound is 0.406 n.
266 citations
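The definition of a run in the abstract above can be made concrete with a brute-force sketch. This is only an illustration of the definition, assuming p denotes the shortest period of the repetition; it runs in polynomial rather than linear time, and the function names and the `exponent` parameter are hypothetical. Setting `exponent=3` keeps only cubic runs.

```python
def smallest_period(v: str) -> int:
    """Smallest q >= 1 such that v[k] == v[k + q] wherever both indices exist."""
    n = len(v)
    for q in range(1, n + 1):
        if v[q:] == v[:n - q]:
            return q
    return n  # only reached for the empty string


def runs(s: str, exponent: int = 2) -> list:
    """List (start, end, period) for each run s[start:end] with length >= exponent * period."""
    n = len(s)
    found = []
    for p in range(1, n // exponent + 1):
        i = 0
        while i + p < n:
            if s[i] != s[i + p]:
                i += 1
                continue
            # Extend the stretch of positions k with s[k] == s[k + p] as far as possible.
            e = i
            while e + p < n and s[e] == s[e + p]:
                e += 1
            start, end = i, e + p  # s[start:end] has period p and cannot be extended
            if end - start >= exponent * p and smallest_period(s[start:end]) == p:
                found.append((start, end, p))
            i = e + 1
    return found
```

For instance, `runs("aabaab")` evaluates to `[(0, 2, 1), (3, 5, 1), (0, 6, 3)]`, and `runs("aabaab", exponent=3)` is empty because no factor repeats its shortest period three times.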
TL;DR: This paper presents a CRCW parallel RAM algorithm that constructs the suffix tree associated with a string of n symbols in O(log n) time with n processors and requires Θ(n^2) space.
Abstract: Many string manipulations can be performed efficiently on suffix trees. In this paper a CRCW parallel RAM algorithm is presented that constructs the suffix tree associated with a string of n symbols in O(log n) time with n processors. The algorithm requires Θ(n^2) space. However, the space needed can be reduced to O(n^(1+ε)) for any 0 < ε ≤ 1, with a corresponding slow-down proportional to 1/ε. Efficient parallel procedures are also given for some string problems that can be solved with suffix trees.
152 citations
TL;DR: The upper bounds derived on the computational complexity of the algorithms above improve the upper bounds given by Kannan and Bachem in [SIAM J. Comput., 8 (1979), pp. 499–507].
Abstract: An $O(s^5 M(s^2 ))$ algorithm for computing the canonical structure of a finite Abelian group represented by an integer matrix of size s (this is the Smith normal form of the matrix) is presented. Moreover, an $O(s^3 M(s^2 ))$ algorithm for computing the Hermite normal form of an integer matrix of size s is given. The upper bounds derived on the computational complexity of the algorithms above improve the upper bounds given by Kannan and Bachem in [SIAM J. Comput., 8 (1979), pp. 499–507] and Chou and Collins in [SIAM J. Comput., 11 (1982), pp. 687–708].
143 citations
Cited by
TL;DR: The author reflects on i, the square root of minus one: it first seemed an odd beast, an intruder hovering on the edge of reality, and the sense of its surreal nature only intensified over the years.
Abstract: There is, I think, something ethereal about i —the square root of minus one. I remember first hearing about it at school. It seemed an odd beast at that time—an intruder hovering on the edge of reality.
Usually familiarity dulls this sense of the bizarre, but in the case of i it was the reverse: over the years the sense of its surreal nature intensified. It seemed that it was impossible to write mathematics that described the real world in …
33,785 citations
TL;DR: This survey collects everything the author could find on graph labeling, a literature of over 1000 papers, many of which have appeared in journals that are not widely available.
Abstract: A graph labeling is an assignment of integers to the vertices or edges, or both, subject to certain conditions. Graph labelings were first introduced in the late 1960s. In the intervening years dozens of graph labeling techniques have been studied in over 1000 papers. Finding out what has been done for any particular kind of labeling and keeping up with new discoveries is difficult because of the sheer number of papers and because many of the papers have appeared in journals that are not widely available. In this survey I have collected everything I could find on graph labeling. For the convenience of the reader the survey includes a detailed table of contents and index.
2,367 citations
Stanford University¹, Icahn School of Medicine at Mount Sinai², Indiana University³, Memorial Sloan Kettering Cancer Center⁴, Mayo Clinic⁵, National Institutes of Health⁶, University of Utah⁷, Fred Hutchinson Cancer Research Center⁸, Johns Hopkins University⁹, NorthShore University HealthSystem¹⁰, University of Michigan¹¹, University of North Carolina at Chapel Hill¹², University of Turku¹³, Translational Genomics Research Institute¹⁴, Wayne State University¹⁵, University of Paris¹⁶, University of Melbourne¹⁷, Cancer Council Victoria¹⁸, University of Ulm¹⁹, University of Southern California²⁰, Karolinska Institutet²¹, Northwestern University²², McGill University²³, LSU Health Sciences Center New Orleans²⁴
TL;DR: This work developed REVEL (rare exome variant ensemble learner), an ensemble method for predicting the pathogenicity of missense variants on the basis of individual tools: MutPred, FATHMM, VEST, PolyPhen, SIFT, PROVEAN, MutationAssessor, MutationTaster, LRT, GERP, SiPhy, phyloP, and phastCons.
Abstract: The vast majority of coding variants are rare, and assessment of the contribution of rare variants to complex traits is hampered by low statistical power and limited functional data. Improved methods for predicting the pathogenicity of rare coding variants are needed to facilitate the discovery of disease variants from exome sequencing studies. We developed REVEL (rare exome variant ensemble learner), an ensemble method for predicting the pathogenicity of missense variants on the basis of individual tools: MutPred, FATHMM, VEST, PolyPhen, SIFT, PROVEAN, MutationAssessor, MutationTaster, LRT, GERP, SiPhy, phyloP, and phastCons. REVEL was trained with recently discovered pathogenic and rare neutral missense variants, excluding those previously used to train its constituent tools. When applied to two independent test sets, REVEL had the best overall performance (p < 10^-12) as compared to any individual tool and seven ensemble methods: MetaSVM, MetaLR, KGGSeq, Condel, CADD, DANN, and Eigen. Importantly, REVEL also had the best performance for distinguishing pathogenic from rare neutral variants with allele frequencies
1,295 citations
Children's Hospital of Philadelphia¹, Duke University², Washington University in St. Louis³, Baylor University⁴, Brigham and Women's Hospital⁵, University of Pittsburgh⁶, University of Texas MD Anderson Cancer Center⁷, Vanderbilt University Medical Center⁸, Medical University of South Carolina⁹, Memorial Sloan Kettering Cancer Center¹⁰
TL;DR: A four-tiered system to categorize somatic sequence variations based on their clinical significance is proposed: tier I, variants with strong clinical significance; tier II, variants with potential clinical significance; tier III, variants of unknown clinical significance; and tier IV, variants deemed benign or likely benign.
1,113 citations