Showing papers by "Arlindo L. Oliveira" published in 2005

Open Access

Proceedings Article•DOI•

A highly scalable algorithm for the extraction of cis-regulatory regions

Alexandra M. Carvalho¹, Ana T. Freitas¹, Arlindo L. Oliveira¹, Marie-France Sagot•Institutions (1)

01 Jan 2005

TL;DR: A new algorithm for identifying cis-regulatory modules in genomic sequences. It extracts structured motifs, defined as collections of highly conserved regions with pre-specified sizes and spacings between them, a type of motif that is highly relevant to research on gene regulatory mechanisms.

Abstract: In this paper we propose a new algorithm for identifying cis-regulatory modules in genomic sequences. In particular, the algorithm extracts structured motifs, defined as a collection of highly conserved regions with pre-specified sizes and spacings between them. This type of motif is extremely relevant in the research of gene regulatory mechanisms since it can effectively represent promoter models. The proposed algorithm uses a new data structure, called box-link, to store the information about conserved regions that occur in a well-ordered and regularly spaced manner in the dataset sequences. The complexity analysis shows a time and space gain over previous algorithms that is exponential in the spacings between binding sites. Experimental results show that the algorithm is much faster than existing ones, sometimes by more than two orders of magnitude. The application of the method to biological datasets shows its ability to extract relevant consensi.
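
Illustrative sketch (not the paper's box-link algorithm): the notion of a structured motif described above can be made concrete with a brute-force occurrence check. The function names, the per-box mismatch budget, and the example boxes and spacing are assumptions chosen for illustration only.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def occurs(seq, boxes, gaps, max_mismatches=0):
    """Check whether a structured motif occurs in seq.

    boxes: conserved regions, given as strings of fixed size.
    gaps:  one (dmin, dmax) spacing range between each pair of consecutive boxes.
    max_mismatches: assumed error budget allowed inside each box.
    """
    def match_from(pos, i):
        box = boxes[i]
        end = pos + len(box)
        if end > len(seq) or hamming(seq[pos:end], box) > max_mismatches:
            return False
        if i == len(boxes) - 1:
            return True
        dmin, dmax = gaps[i]
        # Try every allowed spacing before the next box.
        return any(match_from(end + d, i + 1) for d in range(dmin, dmax + 1))

    return any(match_from(p, 0) for p in range(len(seq)))


# Hypothetical two-box motif separated by 16 to 18 arbitrary bases.
sequence = "AA" + "TTGACA" + "G" * 17 + "TATAAT" + "CC"
print(occurs(sequence, ["TTGACA", "TATAAT"], [(16, 18)]))
print(occurs(sequence, ["TTGACA", "TATAAT"], [(10, 12)]))
```

This check only tests a given motif against one sequence; the extraction problem addressed in the paper is the harder task of discovering such motifs across a dataset.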

51 citations

Journal Article•DOI•

Inference of regular languages using state merging algorithms with search

Miguel Bugalho¹, Arlindo L. Oliveira¹•Institutions (1)

INESC-ID¹

01 Sep 2005-Pattern Recognition

TL;DR: This work surveys the existing approaches that generalize state merging algorithms by using search to explore the tree that represents the space of possible sequences of state mergings and presents comparisons of existing algorithms that show that the quality of the derived solutions is improved by applying this type of search.

46 citations

Book Chapter•DOI•

A linear time biclustering algorithm for time series gene expression data

Sara C. Madeira¹, Arlindo L. Oliveira¹•Institutions (1)

INESC-ID¹

03 Oct 2005

TL;DR: This work proposes an algorithm that finds and reports all relevant biclusters in time linear on the size of the data matrix by manipulating a discretized version of the matrix and by using string processing techniques based on suffix trees.

Abstract: Several non-supervised machine learning methods have been used in the analysis of gene expression data obtained from microarray experiments. Recently, biclustering, a non-supervised approach that performs simultaneous clustering on the row and column dimensions of the data matrix, has been shown to be remarkably effective in a variety of applications. The goal of biclustering is to find subgroups of genes and subgroups of conditions, where the genes exhibit highly correlated behaviors. In the most common settings, biclustering is an NP-complete problem, and heuristic approaches are used to obtain sub-optimal solutions using reasonable computational resources. In this work, we examine a particular setting of the problem, where we are concerned with finding biclusters in time series expression data. In this context, we are interested in finding biclusters with consecutive columns. For this particular version of the problem, we propose an algorithm that finds and reports all relevant biclusters in time linear on the size of the data matrix. This complexity is obtained by manipulating a discretized version of the matrix and by using string processing techniques based on suffix trees. We report results in both synthetic and real data that show the effectiveness of the approach.
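
Illustrative sketch (not the paper's linear-time suffix-tree algorithm): the idea of biclusters with consecutive columns over a discretized matrix can be shown with a brute-force grouping. The three-symbol up/down/no-change discretization, the threshold, and all names below are assumptions; this version reports every qualifying group rather than only maximal ones, and its cost grows quadratically with the number of columns.

```python
from collections import defaultdict


def discretize(row, threshold=0.5):
    """Map consecutive time points to U (up), D (down) or N (no change)."""
    symbols = []
    for prev, cur in zip(row, row[1:]):
        delta = cur - prev
        if delta > threshold:
            symbols.append("U")
        elif delta < -threshold:
            symbols.append("D")
        else:
            symbols.append("N")
    return "".join(symbols)


def contiguous_biclusters(matrix, min_rows=2, min_cols=2, threshold=0.5):
    """Group rows that share the same symbol pattern over a column window.

    Returns tuples (rows, start, end, pattern); start and end index the
    discretized transitions, with end exclusive.
    """
    patterns = [discretize(r, threshold) for r in matrix]
    n_cols = len(patterns[0]) if patterns else 0
    found = []
    for start in range(n_cols):
        for end in range(start + min_cols, n_cols + 1):
            groups = defaultdict(list)
            for i, p in enumerate(patterns):
                groups[p[start:end]].append(i)
            for pat, rows in groups.items():
                if len(rows) >= min_rows:
                    found.append((rows, start, end, pat))
    return found


expression = [
    [0, 1, 2, 1],  # gene 0: up, up, down
    [5, 6, 7, 6],  # gene 1: same trend at a different level
    [3, 2, 1, 2],  # gene 2: opposite trend
]
for bicluster in contiguous_biclusters(expression):
    print(bicluster)
```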

45 citations

Book Chapter•DOI•

An efficient algorithm for generating super condensed neighborhoods

Luís M. S. Russo¹, Arlindo L. Oliveira¹•Institutions (1)

INESC-ID¹

19 Jun 2005

TL;DR: It is pointed out that condensed neighborhoods are not a minimal representation of a pattern neighborhood, and an algorithm for generating Super Condensed Neighborhoods is presented, which takes O(m⌈ m / w ⌉ s) time and is very fast.

Abstract: Indexing methods for the approximate string matching problem spend a considerable effort generating condensed neighborhoods. Here, we point out that condensed neighborhoods are not a minimal representation of a pattern neighborhood. We show that we can restrict our attention to super condensed neighborhoods which are minimal. We then present an algorithm for generating Super Condensed Neighborhoods. The algorithm runs in O(m⌈ m / w ⌉ s), where m is the pattern size, s is the size of the super condensed neighborhood and w the size of the processor word. Previous algorithms took O(m⌈ m / w ⌉ c) time, where c is the size of the condensed neighborhood. We further improve this algorithm by using Bit-Parallelism and Increased Bit-Parallelism techniques. Our experimental results show that the resulting algorithm is very fast.
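
Illustrative sketch (not the paper's bit-parallel algorithm): the abstract does not spell out the definitions, so the code below assumes the usual ones. The k-neighborhood of a pattern is the set of strings within edit distance k of it, the condensed neighborhood keeps the members with no proper prefix in the neighborhood, and the super condensed neighborhood keeps the members with no proper substring in the neighborhood. Everything is computed by exhaustive enumeration over a tiny alphabet, which is only practical for toy inputs.

```python
from itertools import product


def edit_distance(a, b):
    """Levenshtein distance by row-wise dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def neighborhood(pattern, k, alphabet):
    """All strings over alphabet within edit distance k of pattern."""
    out = set()
    for n in range(max(0, len(pattern) - k), len(pattern) + k + 1):
        for letters in product(alphabet, repeat=n):
            s = "".join(letters)
            if edit_distance(pattern, s) <= k:
                out.add(s)
    return out


def condensed(nbh):
    """Members with no proper prefix in the neighborhood (assumed definition)."""
    return {s for s in nbh if not any(s[:i] in nbh for i in range(len(s)))}


def super_condensed(nbh):
    """Members with no proper substring in the neighborhood (assumed definition)."""
    def proper_substrings(s):
        return {s[i:j]
                for i in range(len(s) + 1)
                for j in range(i, len(s) + 1)
                if j - i < len(s)}
    return {s for s in nbh if not any(t in nbh for t in proper_substrings(s))}


nbh = neighborhood("abc", 1, "abc")
print(len(nbh), len(condensed(nbh)), sorted(super_condensed(nbh)))
```

Under these assumed definitions, "bbc" is in the condensed neighborhood of "abc" for k = 1 but not in the super condensed one, because its substring "bc" is already within distance 1 of the pattern.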

6 citations

Book Chapter•DOI•

Using a more powerful teacher to reduce the number of queries of the L* algorithm in practical applications

André L. Martins¹, H. Sofia Pinto¹, Arlindo L. Oliveira¹•Institutions (1)

Instituto Superior Técnico¹

05 Dec 2005

TL;DR: A more powerful set of replies to the membership queries posed by the L* algorithm that reduces the number of such queries by several orders of magnitude in a practical application is defined.

Abstract: In this work we propose to use a more powerful teacher to effectively apply query learning algorithms to identify regular languages in practical, real-world problems. More specifically, we define a more powerful set of replies to the membership queries posed by the L* algorithm that reduces the number of such queries by several orders of magnitude in a practical application. The basic idea is to avoid the needless repetition of membership queries in cases where the reply will be negative as long as a particular condition is met by the string in the membership query. We present an example of the application of this method to a real problem, that of inferring a grammar for the structure of technical articles.
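
Illustrative sketch (assumptions throughout): the abstract does not state which condition the richer replies use, so the code below assumes one plausible choice, namely that a negative reply may name a prefix after which no string can be accepted. The class name, the teacher interface, and the toy language a+b+ are hypothetical; the wrapper only shows how such replies let a learner answer later membership queries without asking the teacher again.

```python
import re


class PrefixAwareOracle:
    """Membership oracle that reuses 'dead prefix' hints from the teacher."""

    def __init__(self, teacher):
        # teacher: callable word -> (accepted, dead_prefix or None); assumed interface.
        self.teacher = teacher
        self.dead_prefixes = set()
        self.cache = {}
        self.queries_asked = 0

    def member(self, word):
        if word in self.cache:
            return self.cache[word]
        # Answer locally when an earlier reply already rules the word out.
        if any(word.startswith(p) for p in self.dead_prefixes):
            return False
        self.queries_asked += 1
        accepted, dead_prefix = self.teacher(word)
        if not accepted and dead_prefix is not None:
            self.dead_prefixes.add(dead_prefix)
        self.cache[word] = accepted
        return accepted


def toy_teacher(word):
    """Teacher for the toy language a+b+ that also reports dead prefixes."""
    if re.fullmatch(r"a+b+", word):
        return True, None
    for i in range(1, len(word) + 1):
        # A non-empty prefix outside a+b* can never be extended into a+b+.
        if not re.fullmatch(r"a+b*", word[:i]):
            return False, word[:i]
    return False, None


oracle = PrefixAwareOracle(toy_teacher)
for w in ["ab", "ba", "bb", "bab", "a"]:
    print(w, oracle.member(w), oracle.queries_asked)
```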

5 citations

Book Chapter•DOI•

Faster generation of super condensed neighbourhoods using finite automata

Luís M. S. Russo¹, Arlindo L. Oliveira¹•Institutions (1)

INESC-ID¹

02 Nov 2005

TL;DR: This work presents a bit-parallel algorithm based on automata which is faster, conceptually much simpler and uses less memory than the existing method.

Abstract: We present a new algorithm for generating super condensed neighbourhoods. Super condensed neighbourhoods have recently been presented as the minimal set of words that represent a pattern neighbourhood. These sets play an important role in the generation phase of hybrid algorithms for indexed approximate string matching. An existing algorithm for this purpose is based on a dynamic programming approach, implemented using bit-parallelism. In this work we present a bit-parallel algorithm based on automata which is faster, conceptually much simpler and uses less memory than the existing method.

2 citations