Showing papers by "Costas S. Iliopoulos" published in 2013

Open Access

Journal Article•DOI•

Predicting the functional consequences of non-synonymous DNA sequence variants--evaluation of bioinformatics tools and development of a consensus strategy.

Kimon Frousios¹, Costas S. Iliopoulos¹, Thomas Schlitt¹, Michael A. Simpson¹•Institutions (1)

King's College London¹

03 Jul 2013-Genomics

TL;DR: It is demonstrated that the CoVEC approach outperforms most individual methods and highlights the benefit of combining results from multiple tools.

92 citations

Posted Content•

Order Preserving Matching

Jinil Kim¹, Peter Eades², Rudolf Fleischer³, Seok-Hee Hong², Costas S. Iliopoulos⁴, Kunsoo Park¹, Simon J. Puglisi⁵, Takeshi Tokuyama⁶•Institutions (6)

Seoul National University¹, University of Sydney², German University of Technology in Oman³, King's College London⁴, University of Helsinki⁵, Tohoku University⁶

17 Feb 2013-arXiv: Data Structures and Algorithms

TL;DR: Order-preserving matching on numeric strings was introduced in this article, where a pattern matches a text if the text contains a substring whose relative orders coincide with those of the pattern.

Abstract: We introduce a new string matching problem called order-preserving matching on numeric strings where a pattern matches a text if the text contains a substring whose relative orders coincide with those of the pattern. Order-preserving matching is applicable to many scenarios such as stock price analysis and musical melody matching in which the order relations should be matched instead of the strings themselves. Solving order-preserving matching has to do with representations of order relations of a numeric string. We define prefix representation and nearest neighbor representation, which lead to efficient algorithms for order-preserving matching. We present efficient algorithms for single and multiple pattern cases. For the single pattern case, we give an O(n log m) time algorithm and optimize it further to obtain O(n + m log m) time. For the multiple pattern case, we give an O(n log m) time algorithm.
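
To make the matching criterion in the abstract concrete, here is a minimal brute-force sketch. It is not the paper's O(n log m) or O(n + m log m) algorithm; it only checks, window by window, whether the relative order of the text window coincides with that of the pattern. The function names `rank_signature` and `op_match_naive` are hypothetical, and treating equal values as equal ranks is an assumption of this sketch.

```python
def rank_signature(values):
    """Map each element to its rank among the sorted distinct values."""
    ranks = {v: r for r, v in enumerate(sorted(set(values)))}
    return [ranks[v] for v in values]


def op_match_naive(text, pattern):
    """Return start indices of text windows whose relative order equals the pattern's.

    Naive illustration only: each window is re-ranked from scratch.
    """
    m = len(pattern)
    target = rank_signature(pattern)
    return [i for i in range(len(text) - m + 1)
            if rank_signature(text[i:i + m]) == target]


# A rise-then-partial-fall shape occurs twice in this price-like series.
matches = op_match_naive([10, 20, 15, 30, 25, 40], [1, 3, 2])
```

The pattern values themselves never appear in the text; only their order relations are matched, which is the point of the stock price and melody examples in the abstract.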

75 citations

Book Chapter•DOI•

Order-Preserving Incomplete Suffix Trees and Order-Preserving Indexes

Maxime Crochemore¹, Costas S. Iliopoulos², Tomasz Kociumaka³, Marcin Kubica³, Alessio Langiu¹, Solon P. Pissis⁴, Jakub Radoszewski³, Wojciech Rytter³, Tomasz Waleń⁵•Institutions (5)

King's College London¹, University of Western Australia², University of Warsaw³, Florida Museum of Natural History⁴, International Institute of Minnesota⁵

07 Oct 2013

TL;DR: In this article, an O(n log log n) time algorithm was proposed to construct an index that enables order-preserving pattern matching queries in time proportional to pattern length.

Abstract: Recently Kubica et al. (Inf. Process. Let., 2013) and Kim et al. (submitted to Theor. Comp. Sci.) introduced order-preserving pattern matching: for a given text the goal is to find its factors having the same 'shape' as a given pattern. Known results include a linear-time algorithm for this problem (in the case of a polynomially-bounded alphabet) and a generalization to multiple patterns. We give an $O(n\log\log{n})$ time construction of an index that enables order-preserving pattern matching queries in time proportional to pattern length. The main component is a data structure that is an incomplete suffix tree in the order-preserving setting. The tree can miss single letters related to branching at internal nodes. Such incompleteness results from the weakness of our so-called weak character oracle. However, due to its weakness, such an oracle can answer queries on-line in $O(\log\log{n})$ time using a sliding-window approach. For most applications such incomplete suffix trees provide the same functional power as the complete ones. We also give an $O(\frac{n\log{n}}{\log\log{n}})$ time algorithm constructing complete order-preserving suffix trees.

47 citations

Journal Article•

IACSIT International Journal of Engineering and Technology

Mudhi Aljamea, Costas S. Iliopoulos, Ali Alatabbi

01 Jan 2013-International journal of engineering and technology

44 citations

Journal Article•DOI•

Enhanced string covering

Tomas Flouri¹, Costas S. Iliopoulos², Tomasz Kociumaka³, Solon P. Pissis¹, Simon J. Puglisi⁴, William F. Smyth², Wojciech Tyczyński³•Institutions (4)

Heidelberg Institute for Theoretical Studies¹, University of Western Australia², University of Warsaw³, King's College London⁴

01 Sep 2013-Theoretical Computer Science

TL;DR: New, simple, easily-computed, and widely applicable notions of string covering that provide an intuitive and useful characterisation of a string are proposed: the enhanced cover; the enhanced left cover; and the enhanced left seed.

35 citations

Book Chapter•DOI•

Suffix Tree of Alignment: An Efficient Index for Similar Data

Joong Chae Na¹, Heejin Park², Maxime Crochemore³, Jan Holub⁴, Costas S. Iliopoulos³, Laurent Mouchard⁵, Kunsoo Park⁶•Institutions (6)

Sejong University¹, Hanyang University², King's College London³, Czech Technical University in Prague⁴, University of Rouen⁵, Seoul National University⁶

10 Jul 2013

TL;DR: This work considers an index data structure for similar strings, noting that the generalized suffix tree, a compacted trie representing all suffixes of A and B, does not exploit the similarity between the two strings.

Abstract: We consider an index data structure for similar strings. The generalized suffix tree can be a solution for this. The generalized suffix tree of two strings A and B is a compacted trie representing all suffixes in A and B. It has |A| + |B| leaves and can be constructed in O(|A| + |B|) time. However, if the two strings are similar, the generalized suffix tree is not efficient because it does not exploit the similarity which is usually represented as an alignment of A and B.

27 citations

Journal Article•DOI•

A note on efficient computation of all Abelian periods in a string

Maxime Crochemore¹, Costas S. Iliopoulos², Tomasz Kociumaka³, Marcin Kubica³, Jakub Pachocki³, Jakub Radoszewski³, Wojciech Rytter³, Wojciech Tyczyński³, Tomasz Waleń⁴•Institutions (4)

King's College London¹, Curtin University², University of Warsaw³, International Institute of Minnesota⁴

01 Feb 2013-Information Processing Letters

TL;DR: In this article, a linear time algorithm for finding all Abelian periods in a string is presented. The algorithm is based on a reduction of the problem of all Abelian periods to that of (already solved) Abelian squares, which provides new insight into both connected problems.

25 citations

Journal Article•DOI•

Computing the Longest Previous Factor

Maxime Crochemore¹, Lucian Ilie², Costas S. Iliopoulos³, Marcin Kubica⁴, Wojciech Rytter, Tomasz Waleń⁴•Institutions (4)

University of Paris¹, University of Western Ontario², Curtin University³, University of Warsaw⁴

01 Jan 2013-The Journal of Combinatorics

TL;DR: This work gives the first time-space optimal algorithm that computes the Longest Previous Factor array, given the Suffix Array and the Longest Common Prefix array.

Abstract: The Longest Previous Factor array gives, for each position i in a string y , the length of the longest factor (substring) of y that occurs both at i and to the left of i in y . The Longest Previous Factor array is central in many text compression techniques as well as in the most efficient algorithms for detecting motifs and repetitions occurring in a text. Computing the Longest Previous Factor array requires usually the Suffix Array and the Longest Common Prefix array. We give the first time-space optimal algorithm that computes the Longest Previous Factor array, given the Suffix Array and the Longest Common Prefix array. We also give the first linear-time algorithm that computes the permutation that applied to the Longest Common Prefix array produces the Longest Previous Factor array.
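
The definition of the Longest Previous Factor array in the abstract can be illustrated with a small quadratic-time sketch. This is only a direct transcription of the definition, not the time-space optimal algorithm of the paper, and it does not use the Suffix Array or the Longest Common Prefix array. The function name `lpf_naive` and the 0-based indexing are assumptions of this sketch.

```python
def lpf_naive(y):
    """Return LPF, where LPF[i] is the length of the longest factor of y
    that occurs at position i and also at some position j < i.

    Naive illustration: every earlier start j is compared against i
    character by character. Occurrences are allowed to overlap.
    """
    n = len(y)
    lpf = [0] * n
    for i in range(n):
        for j in range(i):
            k = 0
            # Extend the common factor starting at j and at i.
            while i + k < n and y[j + k] == y[i + k]:
                k += 1
            lpf[i] = max(lpf[i], k)
    return lpf


example = lpf_naive("abaab")
```

For the string "abaab", position 3 gets the value 2 because the factor "ab" occurs at position 3 and earlier at position 0.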

24 citations

Proceedings Article•

Maximal Palindromic Factorization

Ali Alatabbi¹, Costas S. Iliopoulos², Mohammad Sohel Rahman•Institutions (2)

King's College London¹, Bangladesh University of Engineering and Technology²

01 Jan 2013

TL;DR: An algorithm for maximal palindromic factorization of a finite string is presented by adapting Gusfield's algorithm for detecting all occurrences of maximal palindromes in a string, in time linear in the length of the given string, and then using breadth-first search (BFS) to find the maximal palindromic factorization set.

Abstract: A palindrome is a symmetric string, phrase, number, or other sequence of units that reads the same forward and backward. We present an algorithm for maximal palindromic factorization of a finite string by adapting Gusfield's algorithm [15] for detecting all occurrences of maximal palindromes in a string, in time linear in the length of the given string, and then using breadth-first search (BFS) to find the maximal palindromic factorization set. A factorization F of s with respect to S refers to a decomposition of s such that $s = s_{i_1} s_{i_2} \cdots s_{i_l}$ where $s_{i_j} \in S$ and $l$ is minimum. In this context the set S is referred to as the factorization set. In this paper, we tackle the following problem. Given a string s, find the maximal palindromic factorization of s, that is, a factorization of s where the factorization set is the set MP(s) of all center-distinct maximal palindromes of s.
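
The two stages described in the abstract can be sketched as follows. This is a hedged illustration, not the authors' implementation: it finds the maximal palindrome at each center by simple quadratic-time expansion rather than by the linear-time algorithm the abstract refers to, and it assumes that each factor must be a maximal palindrome occurring at exactly that place in the string. The names `maximal_palindromes` and `palindromic_factorization` are hypothetical.

```python
from collections import deque


def maximal_palindromes(s):
    """Return inclusive (start, end) intervals of the maximal palindrome
    at every center (character centers and gaps between characters)."""
    n = len(s)
    intervals = []
    for center in range(2 * n - 1):
        left = center // 2
        right = left + center % 2  # odd center index means an even-length palindrome
        while left >= 0 and right < n and s[left] == s[right]:
            left -= 1
            right += 1
        if right - left - 1 > 0:  # skip empty even-length palindromes
            intervals.append((left + 1, right - 1))
    return intervals


def palindromic_factorization(s):
    """BFS over positions 0..len(s); an edge a -> b+1 exists for each maximal
    palindrome occupying s[a..b]. Returns a shortest list of factors, or
    None when no such factorization exists under this sketch's assumption."""
    n = len(s)
    starts = {}
    for a, b in maximal_palindromes(s):
        starts.setdefault(a, []).append(b + 1)
    parent = {0: None}
    queue = deque([0])
    while queue:
        pos = queue.popleft()
        if pos == n:
            break
        for nxt in starts.get(pos, []):
            if nxt not in parent:
                parent[nxt] = pos
                queue.append(nxt)
    if n not in parent:
        return None
    factors = []
    pos = n
    while parent[pos] is not None:
        factors.append(s[parent[pos]:pos])
        pos = parent[pos]
    return factors[::-1]
```

Because BFS explores positions in order of the number of factors used, the first time the end of the string is reached the factor count l is minimum, matching the minimality requirement in the definition above.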

17 citations

Journal Article•DOI•

Efficient seed computation revisited

Michalis Christou¹, Maxime Crochemore¹, Costas S. Iliopoulos², Marcin Kubica³, Solon P. Pissis¹, Jakub Radoszewski³, Wojciech Rytter³, Bartosz Szreder³, Tomasz Waleń³•Institutions (3)

King's College London¹, Curtin University², University of Warsaw³

01 Apr 2013-Theoretical Computer Science

TL;DR: In this paper, the shortest seed problem is solved in O(n log n/m) time, where m is the length of a seed and n is the number of prefixes in the string.

15 citations

DOI•

Formal Aspects of Computing

Costas S. Iliopoulos, Solon P. Pissis

01 Aug 2013

TL;DR: The principal aim of this journal is to promote the growth of computing science, to show its relation to practice and to stimulate applications of apposite formalisms to practical problems.

Abstract: This journal aims to publish contributions at the junction of theory and practice. The objective is to disseminate applicable research. Thus new theoretical contributions are welcome where they are motivated by potential application; applications of existing formalisms are of interest if they show something novel about the approach or application. The term "formal methods" has been applied to a range of notations, theories and tools. There is no doubt that some of these have already had a significant impact on practical applications of computing. Indeed, it is interesting to note that once something is adopted into practical use it is no longer thought of as a formal method. Apart from widely used notations such as those for syntax and state machines, there have been significant applications of specification notations, development methods and tools both for proving general results and for searching for specific conditions. However, the most profound and lasting influence of the formal approach is the way it has illuminated fundamental concepts like those of communication. In this spirit, the principal aim of this journal is to promote the growth of computing science, to show its relation to practice and to stimulate applications of apposite formalisms to practical problems. One significant challenge is to show how a range of formal models can be related to each other.

Journal Article•DOI•

Locating tandem repeats in weighted sequences in proteins

Hui Zhang¹, Qing Guo², Costas S. Iliopoulos³•Institutions (3)

Zhejiang University of Technology¹, Zhejiang University², King's College London³

09 May 2013-BMC Bioinformatics

TL;DR: By introducing the idea of equivalence classes in weighted sequences, this work identifies the tandem repeats of every possible length using an iterative partitioning technique and proves that the problem can be solved in $O(n^2)$ time.

Abstract: A weighted biological sequence is a string in which a set of characters may appear at each position with respective probabilities of occurrence. We attempt to locate all the tandem repeats in a weighted sequence. A repeated substring is called a tandem repeat if each occurrence of the substring is directly adjacent to each other. By introducing the idea of equivalence classes in weighted sequences, we identify the tandem repeats of every possible length using an iterative partitioning technique. We also present the algorithm for recording the tandem repeats, and prove that the problem can be solved in $O(n^2)$ time.

Journal Article•DOI•

Bipartite Ramsey numbers involving stars, stripes and trees

Michalis Christou¹, Costas S. Iliopoulos¹, Mirka Miller²•Institutions (2)

King's College London¹, University of Newcastle²

13 Nov 2013

TL;DR: This work investigates the appearance of simpler monochromatic graphs such as stripes, stars and trees under a 2-colouring of the edges of a bipartite graph.

Abstract: The Ramsey number $R(m, n)$ is the smallest integer $p$ such that any blue-red colouring of the edges of the complete graph $K_p$ forces the appearance of a blue $K_m$ or a red $K_n$. Bipartite Ramsey problems deal with the same questions but the graph explored is the complete bipartite graph instead of the complete graph. We consider special cases of the bipartite Ramsey problem. More specifically we investigate the appearance of simpler monochromatic graphs such as stripes, stars and trees under a 2-colouring of the edges of a bipartite graph. We give the Ramsey numbers $R_b(mP_2, nP_2)$, $R_b(T_m, T_n)$, $R_b(S_m, nP_2)$, $R_b(T_m, nP_2)$ and $R_b(S_m, T_n)$.

Proceedings Article•DOI•

GapsMis: flexible sequence alignment with a bounded number of gaps

Carl Barton¹, Tomas Flouri², Costas S. Iliopoulos¹, Solon P. Pissis³•Institutions (3)

King's College London¹, Heidelberg Institute for Theoretical Studies², Florida Museum of Natural History³

22 Sep 2013

TL;DR: Millions of pairwise sequence alignments, performed under realistic conditions based on the properties of real full-length genomes, show that GapsMis can increase the accuracy of extending short-read alignments end-to-end compared to more traditional approaches.

Abstract: Motivation: Recent developments in next-generation sequencing technologies have renewed interest in pairwise sequence alignment techniques, particularly so for the application of re-sequencing---the assembly of a genome directed by a reference sequence. After the fast alignment between a factor of the reference sequence and the high-quality fragment of a short read, an important problem is to find the best possible alignment between a succeeding factor of the reference sequence and the remaining low-quality part of the read; allowing a number of mismatches and the insertion of gaps in the alignment. Results: We present GapsMis, a tool for pairwise global and semi-global sequence alignment with a variable, but bounded, number of gaps. It is based on a new algorithm, which computes a different version of the traditional dynamic programming matrix. Millions of pairwise sequence alignments, performed under realistic conditions based on the properties of real full-length genomes, show that GapsMis can increase the accuracy of extending short-read alignments end-to-end compared to more traditional approaches. Availability: http://www.exelixis-lab.org/gapmis

Journal Article•DOI•

GapMis: a Tool for Pairwise Sequence Alignment with a Single Gap

Tomas Flouri¹, Kimon Frousios², Costas S. Iliopoulos², Kunsoo Park³, Solon P. Pissis⁴, German Tischler⁵•Institutions (5)

Czech Technical University in Prague¹, King's College London², Seoul National University³, Heidelberg Institute for Theoretical Studies⁴, University of Würzburg⁵

31 Jul 2013-Recent Patents on Dna & Gene Sequences

TL;DR: The presented experimental results demonstrate that GapMis is more suitable and efficient than most popular tools for this task; it is based on a simple algorithm, which computes a different version of the traditional dynamic programming matrix.

Abstract: Motivation: Pairwise sequence alignment has received a new motivation due to the advent of recent patents in next-generation sequencing technologies, particularly so for the application of re-sequencing---the assembly of a genome directed by a reference sequence. After the fast alignment between a factor of the reference sequence and a high-quality fragment of a short read by a short-read alignment programme, an important problem is to find the alignment between a relatively short succeeding factor of the reference sequence and the remaining low-quality part of the read, allowing a number of mismatches and the insertion of a single gap in the alignment. Results: We present GapMis, a tool for pairwise sequence alignment with a single gap. It is based on a simple algorithm, which computes a different version of the traditional dynamic programming matrix. The presented experimental results demonstrate that GapMis is more suitable and efficient than most popular tools for this task.

Posted Content•

Order-Preserving Suffix Trees and Their Algorithmic Applications

Maxime Crochemore, Costas S. Iliopoulos, Tomasz Kociumaka, Marcin Kubica, Alessio Langiu, Solon P. Pissis, Jakub Radoszewski, Wojciech Rytter, Tomasz Waleń

27 Mar 2013-arXiv: Data Structures and Algorithms

TL;DR: A linear-time order-preserving pattern matching algorithm for polynomially-bounded alphabet and an extension of this result to pattern matching with multiple patterns, and a number of applications of order-preserving suffix trees to identify patterns and repetitions in time series.

Abstract: Recently Kubica et al. (Inf. Process. Let., 2013) and Kim et al. (submitted to Theor. Comp. Sci.) introduced order-preserving pattern matching. In this problem we are looking for consecutive substrings of the text that have the same "shape" as a given pattern. These results include a linear-time order-preserving pattern matching algorithm for polynomially-bounded alphabet and an extension of this result to pattern matching with multiple patterns. We make one step forward in the analysis and give an $O(\frac{n\log{n}}{\log\log{n}})$ time randomized algorithm constructing suffix trees in the order-preserving setting. We show a number of applications of order-preserving suffix trees to identify patterns and repetitions in time series.

Book Chapter•DOI•

Generic algorithms for factoring strings

David E. Daykin¹, Jacqueline W. Daykin², Costas S. Iliopoulos³, William F. Smyth⁴•Institutions (4)

University of Reading¹, Royal Holloway, University of London², University of Western Australia³, McMaster University⁴

01 Jan 2013

TL;DR: Generic RAM and PRAM algorithms for factoring words over sets of strings known as circ-UMFFs are described, generalizations of the well-known Lyndon words based on lexorder, whose properties were first studied in 1958 by Chen, Fox and Lyndon.

Abstract: In this paper we describe algorithms for factoring words over sets of strings known as circ-UMFFs, generalizations of the well-known Lyndon words based on lexorder, whose properties were first studied in 1958 by Chen, Fox and Lyndon. In 1983 Duval designed an elegant linear-time sequential (RAM) Lyndon factorization algorithm; a corresponding parallel (PRAM) algorithm was described in 1994 by Daykin, Iliopoulos and Smyth. In 2003 Daykin and Daykin introduced various circ-UMFFs, including one based on V-words and V-ordering; in 2011 linear string comparison and sequential factorization algorithms based on V-order were given by Daykin, Daykin and Smyth. Here we first describe generic RAM and PRAM algorithms for factoring a word over any circ-UMFF; then we show how to customize these generic algorithms to yield optimal parallel Lyndon-like V-word factorization.

Posted Content•

A Note on the Longest Common Compatible Prefix Problem for Partial Words

Maxime Crochemore¹, Costas S. Iliopoulos², Tomasz Kociumaka³, Marcin Kubica³, Alessio Langiu⁴, Jakub Radoszewski³, Wojciech Rytter³, Bartosz Szreder³, Tomasz Waleń³•Institutions (4)

University of Paris¹, Curtin University², University of Warsaw³, National Research Council⁴

09 Dec 2013-arXiv: Data Structures and Algorithms

TL;DR: In this paper, a simple solution to the longest common compatible prefix (LCCP) problem for partial words is presented, with $O(n\mu+n)$ preprocessing time, where $\mu$ is the number of blocks of holes, and $O(1)$ query time, using ideas from alignment algorithms and dynamic programming.

Abstract: For a partial word $w$ the longest common compatible prefix of two positions $i,j$, denoted $lccp(i,j)$, is the largest $k$ such that $w[i,i+k-1]\uparrow w[j,j+k-1]$, where $\uparrow$ is the compatibility relation of partial words (it is not an equivalence relation). The LCCP problem is to preprocess a partial word in such a way that any query $lccp(i,j)$ about this word can be answered in $O(1)$ time. It is a natural generalization of the longest common prefix (LCP) problem for regular words, for which an $O(n)$ preprocessing time and $O(1)$ query time solution exists. Recently an efficient algorithm for this problem has been given by F. Blanchet-Sadri and J. Lazarow (LATA 2013). The preprocessing time was $O(nh+n)$, where $h$ is the number of "holes" in $w$. The algorithm was designed for partial words over a constant alphabet and was quite involved. We present a simple solution to this problem with slightly better runtime that works for any linearly-sortable alphabet. Our preprocessing is in time $O(n\mu+n)$, where $\mu$ is the number of blocks of holes in $w$. Our algorithm uses ideas from alignment algorithms and dynamic programming.
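
The definition of $lccp(i,j)$ in the abstract can be made concrete with a naive per-query sketch. It does no preprocessing and answers each query in linear time, so it is not the $O(n\mu+n)$ preprocessing, $O(1)$ query solution of the paper. The hole marker `HOLE`, the function names, and the 0-based indexing are assumptions of this sketch.

```python
HOLE = "?"  # hypothetical marker for a hole in the partial word


def compatible(a, b):
    """Two symbols are compatible if they are equal or either one is a hole."""
    return a == HOLE or b == HOLE or a == b


def lccp_naive(w, i, j):
    """Return the largest k such that w[i..i+k-1] is compatible with
    w[j..j+k-1], position by position (0-based indices)."""
    n = len(w)
    k = 0
    while i + k < n and j + k < n and compatible(w[i + k], w[j + k]):
        k += 1
    return k


# Compatibility is not transitive: "a" and "b" are each compatible with
# a hole but not with each other, so it is not an equivalence relation.
not_transitive = (compatible("a", HOLE), compatible(HOLE, "b"), compatible("a", "b"))
```

The last line mirrors the remark in the abstract that the compatibility relation is not an equivalence relation, which is what makes the problem harder than the ordinary longest common prefix problem.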

Proceedings Article•

Static Analysis and Clustering of Malware Applying Text Based Search

Mudhi Aljamea, Costas S. Iliopoulos, Richard E. Overill, Vida Ghanaei¹•Institutions (1)

King's College London¹

01 Jan 2013

TL;DR: This paper proposes a static analysis approach using text based search technique, control flow graph, hashing, and machine learning to cluster malware variants accordingly.

Abstract: Malware is computer software intended to harm both computers and networks. Anti-virus companies receive a large number of malware variants daily, so there is an essential need to automatically cluster malware variants into their corresponding families in order to reduce the effort and time spent on manual analysis. As malware variants that belong to the same family share a certain amount of code, we classify them into the same cluster based on the shared features that we extract from them. In this paper we propose a static analysis approach using a text-based search technique, control flow graphs, hashing, and machine learning to cluster malware variants accordingly. This is ongoing work, but we explain our methodology and the preliminary results achieved.

Proceedings Article•DOI•

Comparison for the detection of Virus and spam using pattern matching tools

Mourad Elloumi¹, Pedram Hayati², Costas S. Iliopoulos³, Jalil Asghar Mirza⁴, Solon P. Pissis, Arfaat Shah³•Institutions (4)

Tunis University¹, BAE Systems², King's College London³, University of Central Punjab⁴

09 May 2013

TL;DR: The experimental results show that the proposed system is successful for on-the-fly classification of web spambots and computer viruses hence eliminating spam in web 2.0 applications and detecting infected files in computers.

Abstract: In this paper, we describe the use of REAL, an efficient read aligner for next-generation sequencing reads, to detect web spambots and viruses and to compare the results. Email spam, also known as junk email or unsolicited bulk email (UBE), is a subset of electronic spam involving nearly identical messages sent to numerous recipients by email. In the last decade or so, Web spam has emerged as a bigger problem than previously thought. It not only wastes resources and misleads people but can also trick search algorithms into giving unfair search result rankings, decreasing the quality and reliability of the World Wide Web (WWW) and its content. The Internet brings a new dimension to the virus problem. Before, viruses generally spread from system to system on physical media, often the floppy disk. This is a fundamentally slow way for viruses to spread. The Internet changes all this. The viruses that really win in the Internet environment are the macro viruses. They are attached to data, not code, making them harder to avoid. An increasing number of documents on the Net are available as Word files, for example, with no alternative format, and Word documents are frequently exchanged via email. Our experimental results show that the proposed system is successful for on-the-fly classification of web spambots and computer viruses, hence eliminating spam in web 2.0 applications and detecting infected files in computers. Our comparison shows that viruses are slightly harder to detect because of their complexity, especially if they use executable packing to dodge antivirus engines.

Journal Article•DOI•

Querying highly similar sequences.

Carl Barton¹, Mathieu Giraud², Costas S. Iliopoulos¹, Thierry Lecroq³, Laurent Mouchard¹, Solon P. Pissis⁴•Institutions (4)

King's College London¹, French Institute for Research in Computer Science and Automation², University of Rouen³, Florida Museum of Natural History⁴

21 Feb 2013-International Journal of Computational Biology and Drug Design

TL;DR: An asymptotically fast O(n + occ log occ) time algorithm, as well as a practical O(nk/w) time algorithm, for solving the extreme similarity sequencing problem.

Abstract: In this paper, we present a solution to the extreme similarity sequencing problem. The extreme similarity sequencing problem consists of finding occurrences of a pattern $p$ in a set $S_0, S_1, \ldots, S_k$ of sequences of equal length, where $S_i$, for all $1 \le i \le k$, differs from $S_0$ by a constant number of errors – around 10 in practice. We present an asymptotically fast $O(n + occ \log occ)$ time algorithm, as well as a practical $O(nk/w)$ time algorithm for solving this problem, where $n$ is the length of a sequence, $occ$ is the number of candidate occurrences reported by our technique, $w$ is the size of the machine word, and the total number of errors is bounded by $k$ – the number of sequences.

Journal Article•DOI•

Transcriptome map of mouse isochores in embryonic and neonatal cortex

Kimon Frousios¹, Costas S. Iliopoulos², German Tischler³, Sophia Kossida⁴, Solon P. Pissis⁵, Stilianos Arhondakis⁴•Institutions (5)

King's College London¹, University of Western Australia², University of Würzburg³, Academy of Athens⁴, Heidelberg Institute for Theoretical Studies⁵

01 Feb 2013-Genomics

TL;DR: Using RNA-seq data from two distinct developmental stages of the mouse cortex, embryonic day 18 (E18) and postnatal day 7 (P7), this work established for the first time a developmental-related transcriptome map of the mouse isochores and estimated the correlation between the isochores' GC level and their expression activity, and the genes' expression patterns for each isochore family.

Book Chapter•DOI•

Algorithms For Next‐Generation Sequencing Data

Costas S. Iliopoulos¹, Costas S. Iliopoulos², Solon P. Pissis¹•Institutions (2)

King's College London¹, Curtin University²

27 Dec 2013

Proceedings Article•

Circular string matching revisited

Carl Barton¹, Costas S. Iliopoulos², Solon P. Pissis³•Institutions (3)

King's College London¹, Bangladesh University of Engineering and Technology², University of Florida³

01 Jan 2013

Journal Article•

Overlapping factors in words

Manolis Christodoulakis¹, Michalis Christou, Maxime Crochemore, Costas S. Iliopoulos•Institutions (1)

University of Cyprus¹

01 Jan 2013-The Australasian Journal of Combinatorics

TL;DR: A linear time algorithm is proposed for the identification of all overlapping factors of a word, the appearance of overlapping factors in Fibonacci words is investigated, and some bounds on the maximum number of distinct overlapping factors in a word are provided.

Abstract: The concept of quasiperiodicity is a generalization of the notion of periodicity where in contrast to periodicity the quasiperiods of a quasiperiodic string may overlap. A lot of research has been concentrated around algorithms for the computation of quasiperiodicities in strings while not much is known about bounds on their maximum number of occurrences in words. We study the overlapping factors of a word as a means to provide more insight into quasiperiodic structures of words. We propose a linear time algorithm for the identification of all overlapping factors of a word, we investigate the appearance of overlapping factors in Fibonacci words and we provide some bounds on the maximum number of distinct overlapping factors in a word.

Journal Article•DOI•

Malware Detection using Computational Biology Tools

Ali Alatabbi, Moudhi Aljamea, Costas S. Iliopoulos

01 Jan 2013-International journal of engineering and technology

TL;DR: Experimental results show that the proposed system is an efficient and novel way of detecting malware code embedded in different types of computer files using bioinformatics tools, with consistency and accuracy in detecting the malware, and that it was able to complete the task at high speed without excessive memory usage.

Abstract: The Internet is a rich platform of information from which many people benefit, but its users are still attacked by computer malware and various other threats that disrupt their normal workflow. In this paper, we give an overview of the efficient read aligner software REAL, which is used for next-generation sequencing reads, as a tool to detect computer malware. Using this tool, a dynamic computer malware detection model is presented that can detect malware to prevent attacks that might damage or steal sensitive information. This model is inspired by REAL, an efficient read aligner for processing next-generation sequencing data. New anti-malware technologies are introduced around the clock, but at the same time new malware techniques emerge to misuse these technologies. The experimental results of this study show that the proposed system is an efficient and novel way of detecting malware code embedded in different types of computer files using bioinformatics tools, with consistency and accuracy in detecting the malware, and that it was able to complete the task at high speed without excessive memory usage.

Journal Article•DOI•

Tree template matching in unranked ordered trees

Michalis Christou¹, Tomas Flouri², Costas S. Iliopoulos³, Jan Janousek², Bořivoj Melichar², Solon P. Pissis⁴, Jan Žďárek²•Institutions (4)

King's College London¹, Czech Technical University in Prague², University of Western Australia³, Heidelberg Institute for Theoretical Studies⁴

01 May 2013-Journal of Discrete Algorithms

TL;DR: The tree pattern matching problem for unranked ordered trees is transformed to a string matching problem, by transforming the tree template and the subject tree to strings representing their postfix bar notation, and a table-driven algorithm is proposed to solve it.
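
The reduction from trees to strings can be illustrated with a short sketch. It assumes the common convention that the postfix bar notation of a tree is a bar `|`, followed by the notations of its children in order, followed by the root label; the `Node` class and `postfix_bar` function are hypothetical names, labels are assumed to be single characters other than `|`, and the paper's table-driven matcher and "don't care" leaves are not reproduced here.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """An unranked ordered tree node (hypothetical helper type)."""
    label: str
    children: list = field(default_factory=list)


def postfix_bar(node):
    """Encode a tree as a string: '|', then each child in order, then the label.

    Assumed convention; labels are single characters different from '|'.
    """
    parts = ["|"]
    for child in node.children:
        parts.append(postfix_bar(child))
    parts.append(node.label)
    return "".join(parts)


if __name__ == "__main__":
    # Subject tree a(b, c(d)) and the complete subtree c(d).
    subject = Node("a", [Node("b"), Node("c", [Node("d")])])
    subtree = Node("c", [Node("d")])
    text, pattern = postfix_bar(subject), postfix_bar(subtree)
    print(text, pattern, text.find(pattern))
```

Because bars and labels nest like parentheses, each complete subtree corresponds to a contiguous substring of the encoding, which is what makes a string matching formulation possible.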

Book Chapter•DOI•

Identification of All Exact and Approximate Inverted Repeats in Regular and Weighted Sequences

Carl Barton¹, Costas S. Iliopoulos¹,², Nicola Mulder³, Bruce W. Watson²•Institutions (3)

King's College London¹, University of Pretoria², University of Cape Town³

13 Sep 2013

TL;DR: The detection of various types of repeats is a fundamental and well studied problem in stringology and extensions to this problem with applications to bioinformatics are presented.

Abstract: The detection of various types of repeats is a fundamental and well-studied problem in stringology. In this paper we present extensions to this problem with applications to bioinformatics: we consider the detection of all exact and approximate inverted repeats, as well as all exact and approximate weighted inverted repeats, and give efficient algorithms for their computation.
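
As a rough illustration of the exact case only, the sketch below lists fixed-length inverted repeats in a DNA string. It assumes that an inverted repeat is a factor whose reverse complement occurs later in the sequence without overlapping it; this definition, the fixed length `k`, and the function names are assumptions for illustration, not the paper's algorithm, and approximate and weighted variants are not covered.

```python
# Watson-Crick complement table for the DNA alphabet (assumed alphabet).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(s):
    """Complement every base, then reverse the string."""
    return s.translate(COMPLEMENT)[::-1]


def exact_inverted_repeats(seq, k):
    """Return pairs (i, j) with j >= i + k and seq[j:j+k] == revcomp(seq[i:i+k]).

    Naive sketch: index all k-mers, then look up each reverse complement.
    """
    positions = {}
    for i in range(len(seq) - k + 1):
        positions.setdefault(seq[i:i + k], []).append(i)
    pairs = []
    for i in range(len(seq) - k + 1):
        target = reverse_complement(seq[i:i + k])
        for j in positions.get(target, []):
            if j >= i + k:  # keep only non-overlapping downstream copies
                pairs.append((i, j))
    return pairs


if __name__ == "__main__":
    print(exact_inverted_repeats("AAGCTTT", 2))
```

For "AAGCTTT" with k = 2, the factor "AA" at position 0 pairs with "TT" at positions 4 and 5, and "AG" at position 1 pairs with "CT" at position 3.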

Posted Content•

Suffix Tree of Alignment: An Efficient Index for Similar Data

Joong Chae Na¹, Heejin Park², Maxime Crochemore³, Jan Holub⁴, Costas S. Iliopoulos³, Laurent Mouchard⁵, Kunsoo Park⁶•Institutions (6)

Sejong University¹, Hanyang University², King's College London³, Czech Technical University in Prague⁴, University of Rouen⁵, Seoul National University⁶

08 May 2013-arXiv: Data Structures and Algorithms

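TL;DR: In this article, a space/time-efficient suffix tree of alignment is proposed, which exploits the similarity in an alignment of two similar strings A and B and can be searched for a pattern P in O(|P|+occ) time; the generalized suffix tree, by contrast, can be constructed in O(|A|+|B|) time but does not exploit the similarity.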

Abstract: We consider an index data structure for similar strings. The generalized suffix tree can be a solution for this. The generalized suffix tree of two strings $A$ and $B$ is a compacted trie representing all suffixes in $A$ and $B$. It has $|A|+|B|$ leaves and can be constructed in $O(|A|+|B|)$ time. However, if the two strings are similar, the generalized suffix tree is not efficient because it does not exploit the similarity which is usually represented as an alignment of $A$ and $B$. In this paper we propose a space/time-efficient suffix tree of alignment which wisely exploits the similarity in an alignment. Our suffix tree for an alignment of $A$ and $B$ has $|A| + l_d + l_1$ leaves where $l_d$ is the sum of the lengths of all parts of $B$ different from $A$ and $l_1$ is the sum of the lengths of some common parts of $A$ and $B$. We did not compromise the pattern search to reduce the space. Our suffix tree can be searched for a pattern $P$ in $O(|P|+occ)$ time where $occ$ is the number of occurrences of $P$ in $A$ and $B$. We also present an efficient algorithm to construct the suffix tree of alignment. When the suffix tree is constructed from scratch, the algorithm requires $O(|A| + l_d + l_1 + l_2)$ time where $l_2$ is the sum of the lengths of other common substrings of $A$ and $B$. When the suffix tree of $A$ is already given, it requires $O(l_d + l_1 + l_2)$ time.
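
The baseline that the abstract starts from, a generalized suffix structure over $A$ and $B$, can be sketched as follows. This is an uncompacted suffix trie built naively in quadratic time and space, used only to show the $|A|+|B|$ leaves and the pattern walk; it is neither a compacted suffix tree nor the suffix tree of alignment proposed in the paper. The terminators `#` and `$` and all names are assumptions, and neither input string may contain a terminator.

```python
class TrieNode:
    """Node of an uncompacted suffix trie (hypothetical helper type)."""

    def __init__(self):
        self.children = {}
        self.leaf = None  # (string name, suffix start) at the end of a suffix


def build_generalized_suffix_trie(a, b):
    """Insert every suffix of a (ended by '#') and of b (ended by '$')."""
    root = TrieNode()
    for text, end, name in ((a, "#", "A"), (b, "$", "B")):
        for start in range(len(text)):
            node = root
            for ch in text[start:] + end:
                node = node.children.setdefault(ch, TrieNode())
            node.leaf = (name, start)
    return root


def leaves_below(node):
    """Collect the leaf labels in the subtree rooted at `node`."""
    out, stack = [], [node]
    while stack:
        cur = stack.pop()
        if cur.leaf is not None:
            out.append(cur.leaf)
        stack.extend(cur.children.values())
    return sorted(out)


def occurrences(root, pattern):
    """Walk down |pattern| edges, then report every suffix below that node."""
    node = root
    for ch in pattern:
        node = node.children.get(ch)
        if node is None:
            return []
    return leaves_below(node)


if __name__ == "__main__":
    trie = build_generalized_suffix_trie("banana", "bandana")
    print(len(leaves_below(trie)), occurrences(trie, "ana"))
```

The walk for $P$ takes $|P|$ steps, but collecting the occurrences in this uncompacted trie is not bounded by $occ$; the compacted tree is what gives the $O(|P|+occ)$ search quoted in the abstract.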

Journal Article•DOI•

Degree/diameter problem for trees and pseudotrees

Michalis Christou, Costas S. Iliopoulos, Mirka Miller

01 Jan 2013-AKCE International Journal of Graphs and Combinatorics

TL;DR: In this article, the degree/diameter problem on trees was considered for Cayley trees, caterpillars, lobsters, banana trees and firecracker trees, as well as for tree-like structures such as pseudotrees.
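
For orientation, the degree/diameter problem asks for the largest number of vertices in a graph with given maximum degree and diameter. For unrestricted trees the answer follows from a standard counting argument, sketched below; this is background rather than the paper's results for the specific tree classes listed above, and the function name `max_tree_order` is hypothetical.

```python
def max_tree_order(max_degree, diameter):
    """Largest number of vertices in a tree with the given maximum degree and diameter.

    Standard counting argument: grow a balanced tree from a central vertex
    (even diameter) or from a central edge (odd diameter).
    """
    if max_degree < 2 or diameter < 0:
        raise ValueError("expected max_degree >= 2 and diameter >= 0")
    k, odd = divmod(diameter, 2)
    branching = max_degree - 1  # children per non-central internal vertex
    if odd:
        # Two central vertices joined by an edge, each the root of a
        # subtree of depth k in which every internal vertex has `branching` children.
        return 2 * sum(branching ** i for i in range(k + 1))
    # One central vertex with max_degree subtrees of depth k - 1.
    return 1 + max_degree * sum(branching ** i for i in range(k))


if __name__ == "__main__":
    for degree, diam in ((3, 2), (3, 3), (3, 4), (2, 5)):
        print(degree, diam, max_tree_order(degree, diam))
```

With maximum degree 2 the formula gives a path on diameter + 1 vertices, and with maximum degree 3 and diameter 2 it gives the star on 4 vertices.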
