Showing papers by "Costas S. Iliopoulos" published in 2014

Open Access

Journal Article•DOI•

Jinil Kim¹, Peter Eades², Rudolf Fleischer³, Seok-Hee Hong², Costas S. Iliopoulos⁴, Kunsoo Park¹, Simon J. Puglisi⁵, Takeshi Tokuyama⁶•Institutions (6)

Seoul National University¹, University of Sydney², German University of Technology in Oman³, King's College London⁴, University of Helsinki⁵, Tohoku University⁶

01 Mar 2014-Theoretical Computer Science

TL;DR: This work introduces a new string matching problem called order-preserving matching on numeric strings, where a pattern matches a text if the text contains a substring of values whose relative orders coincide with those of the pattern.

74 citations
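
As a rough illustration of the matching notion described in the TL;DR above, the sketch below compares the relative order of each text window with that of the pattern. It is a naive quadratic check written for this listing, not the algorithm from the paper; the names `order_signature` and `order_preserving_matches` are hypothetical, and treating equal values as sharing a rank is an assumption.

```python
def order_signature(values):
    """Return the rank of each value within the sequence (equal values share a rank)."""
    ranks = {v: r for r, v in enumerate(sorted(set(values)))}
    return tuple(ranks[v] for v in values)


def order_preserving_matches(text, pattern):
    """Naively list start positions where a text window has the pattern's relative order."""
    m = len(pattern)
    target = order_signature(pattern)
    return [
        i
        for i in range(len(text) - m + 1)
        if order_signature(text[i:i + m]) == target
    ]


# The pattern goes low, high, middle; two windows of the text do the same.
print(order_preserving_matches([10, 22, 15, 30, 20, 18, 27], [1, 3, 2]))
```

Recomputing the signature for every window keeps the sketch short; it makes no attempt at the efficiency a dedicated algorithm would aim for.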

Journal Article•DOI•

Extracting powers and periods in a word from its runs structure

Maxime Crochemore¹, Costas S. Iliopoulos², Marcin Kubica³, Jakub Radoszewski³, Wojciech Rytter⁴, Tomasz Waleń⁵•Institutions (5)

King's College London¹, Curtin University², University of Warsaw³, Nicolaus Copernicus University in Toruń⁴, International Institute of Minnesota⁵

01 Feb 2014-Theoretical Computer Science

TL;DR: Lyndon words are used, and the Lyndon structure of runs is introduced, as a useful tool when computing powers; in problems related to periods, some versions of the Manhattan skyline problem are used.

68 citations

Journal Article•DOI•

Fast Algorithms for Approximate Circular String Matching

Carl Barton¹, Costas S. Iliopoulos¹,²,³, Solon P. Pissis¹•Institutions (3)

King's College London¹, University of Western Australia², Curtin University³

22 Mar 2014-Algorithms for Molecular Biology

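TL;DR: A suboptimal average-case algorithm for exact circular string matching requiring time O(n) is presented, together with two fast average-case algorithms for approximate circular string matching with k-mismatches under the Hamming distance model, requiring time O(n) for moderate values of k, that is, k = O(m/log m); it is also shown how the same results can be easily obtained under the edit distance model.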

Abstract: Background: Circular string matching is a problem which naturally arises in many biological contexts. It consists in finding all occurrences of the rotations of a pattern of length m in a text of length n. There exist optimal average-case algorithms for exact circular string matching. Approximate circular string matching is a rather undeveloped area.

34 citations
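
To make the problem statement above concrete, here is a minimal brute-force sketch of approximate circular string matching with at most k mismatches under the Hamming distance. It is not one of the average-case algorithms from the paper; the function names are hypothetical, and every rotation is compared against every window purely for clarity.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def approx_circular_matches(text, pattern, k):
    """Start positions where some rotation of the pattern occurs with at most k mismatches."""
    m = len(pattern)
    if m == 0 or m > len(text):
        return []
    rotations = {pattern[i:] + pattern[:i] for i in range(m)}
    return [
        i
        for i in range(len(text) - m + 1)
        if any(hamming(text[i:i + m], r) <= k for r in rotations)
    ]


print(approx_circular_matches("xxbcaxx", "abc", 0))
print(approx_circular_matches("xxbcaxx", "abc", 1))
```

With k = 0 this reduces to exact circular matching; the brute-force cost grows with both the number of rotations and the text length, which is the kind of baseline the abstract calls a naive approach.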

Journal Article•DOI•

Abelian borders in binary words

Manolis Christodoulakis¹, Michalis Christou², Maxime Crochemore²,³, Costas S. Iliopoulos²,⁴•Institutions (4)

University of Cyprus¹, King's College London², University of Paris³, Curtin University⁴

10 Jul 2014-Discrete Applied Mathematics

TL;DR: It is shown how many binary words have a shortest border of a given length by identifying relations with Dyck words, and some bounds on the number of abelian border-free words of a given length are given.

15 citations
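
For readers unfamiliar with the term, the sketch below computes the shortest abelian border of a word. It assumes the usual definition, which this listing does not spell out: a nonempty proper prefix and the suffix of the same length that contain the same letters with the same multiplicities. The function name is hypothetical.

```python
from collections import Counter


def shortest_abelian_border(word):
    """Length of the shortest abelian border of the word, or 0 if it has none."""
    n = len(word)
    prefix_counts, suffix_counts = Counter(), Counter()
    for length in range(1, n):
        # Grow the prefix and the suffix by one letter each and compare letter counts.
        prefix_counts[word[length - 1]] += 1
        suffix_counts[word[n - length]] += 1
        if prefix_counts == suffix_counts:
            return length
    return 0


print(shortest_abelian_border("01101"))  # prefix "01" and suffix "01"
print(shortest_abelian_border("0011"))   # no abelian border
```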

Journal Article•DOI•

Optimal computation of all tandem repeats in a weighted sequence

Carl Barton¹, Costas S. Iliopoulos¹,²,³, Solon P. Pissis¹•Institutions (3)

King's College London¹, University of Western Australia², Curtin University³

16 Aug 2014-Algorithms for Molecular Biology

TL;DR: A novel variant of Crochemore’s partitioning algorithm for weighted sequences, which requires optimal O(n log n) time, is presented, thus improving on the best known O(n²)-time algorithm for computing all repetitions in a weighted sequence of length n.

Abstract: Tandem duplication, in the context of molecular biology, occurs as a result of mutational events in which an original segment of DNA is converted into a sequence of individual copies. More formally, a repetition or tandem repeat in a string of letters consists of exact concatenations of identical factors of the string. Biologists are interested in approximate tandem repeats and not necessarily only in exact tandem repeats. A weighted sequence is a string in which a set of letters may occur at each position with respective probabilities of occurrence. It naturally arises in many biological contexts and provides a method to realise the approximation among distinct adjacent occurrences of the same DNA segment. Crochemore’s repetitions algorithm, also referred to as Crochemore’s partitioning algorithm, was introduced in 1981, and was the first optimal O(n log n)-time algorithm to compute all repetitions in a string of length n. In this article, we present a novel variant of Crochemore’s partitioning algorithm for weighted sequences, which requires optimal O(n log n) time, thus improving on the best known O(n²)-time algorithm (Zhang et al., 2013) for computing all repetitions in a weighted sequence of length n.

13 citations

Proceedings Article•DOI•

Evaluation of credibility assessment for microblogging: models and future directions

Amal Abdullah AlMansour¹, Ljiljana Brankovic², Costas S. Iliopoulos¹•Institutions (2)

King's College London¹, University of Newcastle²

16 Sep 2014

TL;DR: A systematic review of the current developments in assessing information credibility automatically in UGC platforms, focusing on microblogging service, covers different aspects from dataset collection and feature usage, through classification techniques, to performance evaluation.

Abstract: Due to their openness and low publishing barrier nature, User-Generated Content (UGC) platforms facilitate the creation of huge amounts of inaccurate content. Consequently, assessing UGC information credibility is developing into a vitally important research topic. This paper offers a systematic review of the current developments in assessing information credibility automatically in UGC platforms, focusing on microblogging service. It covers different aspects from dataset collection and feature usage, through classification techniques, to performance evaluation. A novel theoretical credibility model which integrates the evaluators' traits and context factors to assess information credibility is also presented along with important directions for future research on UGC information credibility.

11 citations

Proceedings Article•DOI•

A fast and lightweight filter-based algorithm for circular pattern matching

Md. Aashikur Rahman Azim¹, Costas S. Iliopoulos², M. Sohel Rahman¹, M. Samiruzzaman²•Institutions (2)

Bangladesh University of Engineering and Technology¹, King's College London²

20 Sep 2014

TL;DR: A fast filter-based algorithm for exact Circular Pattern Matching, which solves the problem of finding all occurrences of the rotations of a pattern P of length m in a text T of length n.

Abstract: The Exact Circular Pattern Matching (ECPM) problem consists in finding all occurrences of the rotations of a pattern P of length m in a text T of length n. In this paper we present a fast filter-based algorithm for this problem.

9 citations
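
The ECPM definition above can be checked directly with a small baseline. The sketch below is not the filter-based algorithm from the paper; it relies on the standard observation that every rotation of P is a length-m factor of P concatenated with itself, and the function name is hypothetical.

```python
def circular_matches(text, pattern):
    """Start positions in text where some rotation of pattern occurs exactly."""
    m = len(pattern)
    if m == 0 or m > len(text):
        return []
    doubled = pattern + pattern  # every rotation of pattern is a length-m factor of this
    rotations = {doubled[i:i + m] for i in range(m)}
    return [i for i in range(len(text) - m + 1) if text[i:i + m] in rotations]


print(circular_matches("abcabca", "cab"))
```

Storing all rotations in a set costs O(m²) space, which is acceptable for a sketch but is exactly the kind of overhead a lightweight filter would try to avoid.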

Journal Article•DOI•

New simple efficient algorithms computing powers and runs in strings

Maxime Crochemore¹,², Costas S. Iliopoulos¹,³, Marcin Kubica⁴, Jakub Radoszewski⁴, Wojciech Rytter⁴,⁵, Krzysztof Stencel⁴,⁵, Tomasz Waleń⁴,⁶•Institutions (6)

King's College London¹, University of Paris², Curtin University³, University of Warsaw⁴, Nicolaus Copernicus University in Toruń⁵, International Institute of Minnesota⁶

01 Jan 2014-Discrete Applied Mathematics

TL;DR: Three new simple O(n log n)-time algorithms related to repeating factors are presented, along with novel algorithmic solutions for several classical string problems which are much simpler than the (usually quite sophisticated) linear-time algorithms.

8 citations

Book Chapter•DOI•

Fast and Simple Computations Using Prefix Tables Under Hamming and Edit Distance

Carl Barton¹, Costas S. Iliopoulos¹,², Solon P. Pissis¹, William F. Smyth³•Institutions (3)

King's College London¹, University of Western Australia², McMaster University³

15 Oct 2014

TL;DR: This article introduces a new and simple data structure, the prefix table under Hamming distance, and presents two algorithms to compute it efficiently: one asymptotically fast; the other very fast on average and in practice.

Abstract: In this article, we introduce a new and simple data structure, the prefix table under Hamming distance, and present two algorithms to compute it efficiently: one asymptotically fast; the other very fast on average and in practice. Because the latter approach avoids the computation of global data structures, such as the suffix array and the longest common prefix array, it yields algorithms much faster in practice than existing methods. We show how this data structure can be used to solve two string problems of interest: (a) approximate string matching under Hamming distance; and (b) longest approximate overlap under Hamming distance. Analogously, we introduce the prefix table under edit distance, and present an efficient algorithm for its computation. In the process, we also define the border array under both distance measures, and provide an algorithm for conversion between prefix tables and border arrays.

8 citations
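
The abstract above does not define the prefix table under Hamming distance, so the sketch below works from an assumed definition: entry i is the length of the longest factor starting at position i that matches a prefix of the string with at most k mismatches. It is a naive quadratic computation, not either of the algorithms from the chapter, and the function name is hypothetical.

```python
def hamming_prefix_table(x, k):
    """Naive prefix table: longest match with the prefix of x at each position, allowing k mismatches."""
    n = len(x)
    table = [0] * n
    for i in range(n):
        mismatches = 0
        length = 0
        while i + length < n:
            if x[length] != x[i + length]:
                mismatches += 1
                if mismatches > k:
                    break  # the (k+1)-th mismatch ends the match
            length += 1
        table[i] = length
    return table


print(hamming_prefix_table("abaab", 0))  # the ordinary prefix table
print(hamming_prefix_table("abaab", 1))
```

With k = 0 the result coincides with the classical prefix table, which gives a quick sanity check on the assumed definition.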

Posted Content•

Covering Problems for Partial Words and for Indeterminate Strings

Maxime Crochemore¹,², Costas S. Iliopoulos², Tomasz Kociumaka³, Jakub Radoszewski²,³, Wojciech Rytter³, Tomasz Waleń³•Institutions (3)

University of Paris¹, King's College London², University of Warsaw³

11 Dec 2014-arXiv: Data Structures and Algorithms

TL;DR: In this paper, it is shown that the problem of computing a shortest solid cover of an indeterminate string is NP-complete for a binary alphabet, and that the partial word covering problem is fixed-parameter tractable with respect to the number of non-solid symbols.

Abstract: We consider the problem of computing a shortest solid cover of an indeterminate string. An indeterminate string may contain non-solid symbols, each of which specifies a subset of the alphabet that could be present at the corresponding position. We also consider covering partial words, which are a special case of indeterminate strings where each non-solid symbol is a don't care symbol. We prove that indeterminate string covering problem and partial word covering problem are NP-complete for binary alphabet and show that both problems are fixed-parameter tractable with respect to $k$, the number of non-solid symbols. For the indeterminate string covering problem we obtain a $2^{O(k \log k)} + n k^{O(1)}$-time algorithm. For the partial word covering problem we obtain a $2^{O(\sqrt{k}\log k)} + nk^{O(1)}$-time algorithm. We prove that, unless the Exponential Time Hypothesis is false, no $2^{o(\sqrt{k})} n^{O(1)}$-time solution exists for either problem, which shows that our algorithm for this case is close to optimal. We also present an algorithm for both problems which is feasible in practice.

8 citations
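
To illustrate what a shortest solid cover of a partial word is, the sketch below tries every candidate by brute force. It is exponential in the number of don't care symbols and is not one of the algorithms from the paper. The `*` hole symbol, the default binary alphabet, and both function names are assumptions made for this example, as is the reading that each occurrence may match a don't care symbol independently of other occurrences.

```python
from itertools import product


def covers(candidate, word, hole="*"):
    """True if occurrences of the solid string candidate cover every position of word."""
    m, n = len(candidate), len(word)
    covered_up_to = 0  # positions [0, covered_up_to) are covered so far
    for i in range(n - m + 1):
        if i > covered_up_to:
            return False  # position covered_up_to can no longer be covered
        if all(w == hole or w == c for c, w in zip(candidate, word[i:i + m])):
            covered_up_to = i + m
    return covered_up_to == n


def shortest_solid_cover(word, alphabet="ab", hole="*"):
    """Brute-force search for a shortest solid cover of a nonempty partial word."""
    for m in range(1, len(word) + 1):
        prefix = word[:m]  # a cover must occur at position 0
        holes = [i for i, ch in enumerate(prefix) if ch == hole]
        for fill in product(alphabet, repeat=len(holes)):
            candidate = list(prefix)
            for pos, ch in zip(holes, fill):
                candidate[pos] = ch
            candidate = "".join(candidate)
            if covers(candidate, word, hole):
                return candidate
    return None


print(shortest_solid_cover("ab*b"))
```

The candidate space grows as the alphabet size raised to the number of holes in the prefix, which is consistent with the hardness results quoted in the abstract.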

Posted Content•

On the Average-case Complexity of Pattern Matching with Wildcards.

Carl Barton, Costas S. Iliopoulos

03 Jul 2014-arXiv: Data Structures and Algorithms

TL;DR: These are the first results on the average-case complexity of pattern matching with wildcards which, as a by-product, provide the first provable separation in complexity between exact pattern matching and pattern matching with wildcards in the word RAM model.

Abstract: Pattern matching with wildcards is the problem of finding all factors of a text $t$ of length $n$ that match a pattern $x$ of length $m$, where wildcards (characters that match everything) may be present. In this paper we present a number of fast average-case algorithms for pattern matching where wildcards are restricted to either the pattern or the text; however, the results are easily adapted to the case where wildcards are allowed in both. We analyse the average-case complexity of these algorithms and show the first non-trivial time bounds. These are the first results on the average-case complexity of pattern matching with wildcards which, as a by-product, provide the first provable separation in complexity between exact pattern matching and pattern matching with wildcards in the word RAM model.
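
The problem statement in this abstract can be made concrete with a naive O(nm) check. This is only a baseline and not one of the average-case algorithms from the paper; the `?` wildcard symbol and the function name are assumptions for this sketch, and wildcards are allowed in both the pattern and the text.

```python
WILDCARD = "?"  # hypothetical wildcard symbol chosen for this sketch


def wildcard_matches(text, pattern):
    """Start positions where pattern matches text, with wildcards matching any character."""
    m = len(pattern)
    return [
        i
        for i in range(len(text) - m + 1)
        if all(
            p == WILDCARD or t == WILDCARD or p == t
            for p, t in zip(pattern, text[i:i + m])
        )
    ]


print(wildcard_matches("abcabd", "ab?"))  # wildcard in the pattern
print(wildcard_matches("a?c", "abc"))     # wildcard in the text
```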

Book Chapter•DOI•

Covering Problems for Partial Words and for Indeterminate Strings

Maxime Crochemore¹,², Costas S. Iliopoulos²,³, Tomasz Kociumaka⁴, Jakub Radoszewski⁴, Wojciech Rytter⁴, Tomasz Waleń⁴•Institutions (4)

University of Paris¹, King's College London², University of Western Australia³, University of Warsaw⁴

15 Dec 2014

TL;DR: It is proved that both the indeterminate string covering problem and the partial word covering problem are NP-complete for a binary alphabet, and it is shown that both problems are fixed-parameter tractable with respect to $k$, the number of non-solid symbols.

Abstract: We consider the problem of computing a solid cover of an indeterminate string. An indeterminate string may contain non-solid symbols, each of which specifies a subset of the alphabet that could be present at the corresponding position. We also consider covering partial words, which are a special case of indeterminate strings where each non-solid symbol is a don’t care symbol. We prove that both indeterminate string covering problem and partial word covering problem are NP-complete for binary alphabet and show that both problems are fixed-parameter tractable with respect to $k$, the number of non-solid symbols. For the indeterminate string covering problem we obtain a $2^{\mathcal {O}(k\log k)} + n k^{\mathcal {O}(1)}$-time algorithm. For the partial word covering problem we obtain a $2^{\mathcal {O}(\sqrt{k}\log k)} + nk^{\mathcal {O}(1)}$-time algorithm. We prove that, unless the Exponential Time Hypothesis is false, no $2^{o(\sqrt{k})} n^{\mathcal {O}(1)}$-time solution exists for this problem, which shows that our algorithm for this case is close to optimal. We also present an algorithm for both problems which is feasible in practice.

Journal Article•DOI•

On the average number of regularities in a word

Manolis Christodoulakis¹, Michalis Christou², Maxime Crochemore², Costas S. Iliopoulos²•Institutions (2)

University of Cyprus¹, King's College London²

01 Mar 2014-Theoretical Computer Science

TL;DR: The average number of powers and runs occurring in a word of length n drawn from an alphabet of size σ is studied, and estimates with o(n) error terms are given for the number of powers of exponent r, the number of runs, and the number of palindromes.

Journal Article•DOI•

The swap matching problem revisited

Pritom Ahmed¹, Costas S. Iliopoulos², A. S. M. Sohidull Islam¹, M. Sohel Rahman¹•Institutions (2)

Bangladesh University of Engineering and Technology¹, King's College London²

06 Nov 2014-Theoretical Computer Science

TL;DR: In this paper, a graph-theoretic model was proposed to solve the swap matching problem, and the resulting algorithms are adaptations of the classic shift-and algorithm for patterns having length similar to the word-size of the target machine.

Fast algorithms for approximate circular string

Costas S. Iliopoulos, Solon P. Pissis

01 Jan 2014

TL;DR: A suboptimal average-case algorithm for exact circular string matching requiring time O(n) and two fast average-case algorithms for approximate circular string matching with k-mismatches, under the Hamming distance model, are presented.

Abstract: Background: Circular string matching is a problem which naturally arises in many biological contexts. It consists in finding all occurrences of the rotations of a pattern of length m in a text of length n. There exist optimal average-case algorithms for exact circular string matching. Approximate circular string matching is a rather undeveloped area. Results: In this article, we present a suboptimal average-case algorithm for exact circular string matching requiring time O(n). Based on our solution for the exact case, we present two fast average-case algorithms for approximate circular string matching with k-mismatches, under the Hamming distance model, requiring time O(n) for moderate values of k, that is, k = O(m/log m). We show how the same results can be easily obtained under the edit distance model. The presented algorithms are also implemented as library functions. Experimental results demonstrate that the functions provided in this library accelerate the computations by more than three orders of magnitude compared to a naive approach. Conclusions: We present two fast average-case algorithms for approximate circular string matching with k-mismatches; and show that they also perform very well in practice. The importance of our contribution is underlined by the fact that the provided functions may be seamlessly integrated into any biological pipeline. The source code of the library is freely available at http://www.inf.kcl.ac.uk/research/projects/asmf/.

Proceedings Article•DOI•

Detection of Web Spambot in the Presence of Decoy Actions

Vida Ghanaei¹, Costas S. Iliopoulos¹, Solon P. Pissis¹•Institutions (1)

King's College London¹

03 Dec 2014

TL;DR: The preliminary experimental results show that the proposed method is successful for the classification of web spam bot in the presence of decoy actions, hence eliminating spam in Web 2.0 applications.

Abstract: Based on recent research and statistics by Symantec, a significant amount of all global web traffic and email traffic is marked as spam. A spambot is basically a robot that maliciously traverses the World Wide Web (WWW) and gathers information, email addresses, etc., for the spammer. The increasing sophistication of spambots has led to the introduction of Spam 2.0, which infiltrates legitimate Web 2.0 applications with unsolicited content. This leads to various unwanted outcomes, such as the appearance of spam pages among the top search engine results due to excessive usage of popular terms, unrealistic web-page visit rates, spam emails, and wasted resources. Here we present an efficient method to detect web spambots in the presence of decoy actions, by applying efficient approximate string-matching techniques. Our preliminary experimental results show that the proposed method is successful for the classification of web spambots in the presence of decoy actions, hence eliminating spam in Web 2.0 applications.

Posted Content•

Average-Case Optimal Approximate Circular String Matching

Carl Barton¹, Costas S. Iliopoulos¹,², Solon P. Pissis¹•Institutions (2)

King's College London¹, University of Western Australia²

20 Jun 2014-arXiv: Data Structures and Algorithms

TL;DR: In this article, the authors presented a new algorithm for approximate circular string matching under the edit distance model with optimal average case search time O(n(k + log m)/m).

Abstract: Approximate string matching is the problem of finding all factors of a text t of length n that are at a distance at most k from a pattern x of length m. Approximate circular string matching is the problem of finding all factors of t that are at a distance at most k from x or from any of its rotations. In this article, we present a new algorithm for approximate circular string matching under the edit distance model with optimal average-case search time O(n(k + log m)/m). Optimal average-case search time can also be achieved by the algorithms for multiple approximate string matching (Fredriksson and Navarro, 2004) using x and its rotations as the set of multiple patterns. Here we reduce the preprocessing time and space requirements compared to that approach.

Journal Article•DOI•

Extending alignments with k-mismatches and ℓ-gaps

Carl Barton¹, Costas S. Iliopoulos², In-Bok Lee³, Laurent Mouchard¹, Kunsoo Park⁴, Solon P. Pissis⁵•Institutions (5)

King's College London¹, University of Western Australia², Korea Aerospace University³, Seoul National University⁴, Heidelberg Institute for Theoretical Studies⁵

01 Mar 2014-Theoretical Computer Science

TL;DR: A generalisation of the authors' solution is presented that solves the problem of extending an alignment with k-mismatches and ℓ-gaps.

DOI•

Covering problems for partial words and for indeterminate strings

Maxime Crochemore¹,², Costas S. Iliopoulos¹, Tomasz Kociumaka³, Jakub Radoszewski¹,³, Wojciech Rytter³, Tomasz Waleń³•Institutions (3)

King's College London¹, University of Paris², University of Warsaw³

01 Jan 2014

TL;DR: In this paper, the problem of finding a shortest solid string whose occurrences cover the whole indeterminate string is shown to be NP-complete for indeterminate strings and even for partial words.

Abstract: Indeterminate strings are a subclass of non-standard words having non-deterministic nature. In a classic string every position contains exactly one symbol (we say it is a solid symbol), while in an indeterminate string a position may contain a set of symbols (possible at this position); such sets are called non-solid symbols. The most important subclass of indeterminate strings are partial words, where each non-solid symbol is the whole alphabet; in this case non-solid symbols are also called don't care symbols. We consider the problem of finding a shortest cover of an indeterminate string, i.e., finding a shortest solid string whose occurrences cover the whole indeterminate string. We show that this classical problem becomes NP-complete for indeterminate strings and even for partial words. The proof of this fact is one of the main results of this paper. Our other main results focus on design of algorithms efficient with respect to certain parameters of the input (so called FPT algorithms) for the shortest cover problem. For the indeterminate string covering problem we obtain an $O(nk^2 + 2^k k^3)$-time algorithm, where $k$ is the number of non-solid symbols, while for the partial word covering problem we obtain a running time of $O(nk^2 + 2^{O(\sqrt{k}\log k)})$. Additionally, we prove that, unless the Exponential Time Hypothesis is false, no $2^{o(\sqrt{k})} n^{O(1)}$-time solution exists for either problem, which shows that our algorithm for partial words is close to optimal. We also present an algorithm for both problems parameterized both by $k$ and the alphabet size with a simple implementation. A preliminary version of this article was presented at the 25th International Symposium on Algorithms and Computation (ISAAC 2014), LNCS, vol. 8889, pp. 220–232, Springer (2014) [12].

Book Chapter•DOI•

On-line Minimum Closed Covers

Costas S. Iliopoulos¹, Manal Mohamed¹•Institutions (1)

King's College London¹

19 Sep 2014

TL;DR: This paper presents an on-line O(n)-time algorithm to calculate the size of a minimum closed cover for each prefix of a given string w of length n, and shows a method to recover a minimum closed cover of each prefix of w in a greedy manner from right to left.

Abstract: The Minimum Closed Covers problem asks us to compute the minimum size of a closed cover of a given string. In this paper we present an on-line O(n)-time algorithm to calculate the size of a minimum closed cover for each prefix of a given string w of length n. We also show a method to recover a minimum closed cover of each prefix of w in a greedy manner from right to left.
