Top 4 papers published by Eric Blais from University of Waterloo in 2006

Journal Article • DOI

On the inference of parsimonious indel evolutionary scenarios.

Leonid Chindelevitch¹, Zhentao Li¹, Eric Blais¹, Mathieu Blanchette¹

01 Jun 2006 • Journal of Bioinformatics and Computational Biology

TL;DR: This work shows that the problem of reconstructing a most parsimonious scenario of insertions and deletions capable of explaining the gaps observed in the alignment of orthologous DNA sequences is NP-complete, and provides an algorithm based on the fractional relaxation of an integer linear programming formulation.

Abstract: Given a multiple alignment of orthologous DNA sequences and a phylogenetic tree for these sequences, we investigate the problem of reconstructing a most parsimonious scenario of insertions and deletions capable of explaining the gaps observed in the alignment. This problem, called the Indel Parsimony Problem, is a crucial component of ancestral genome reconstruction, and its solution provides valuable information to many genome functional annotation approaches. We first show that the problem is NP-complete. We then provide an algorithm based on the fractional relaxation of an integer linear programming formulation. The algorithm is fast in practice, and the solutions it produces are, in most cases, provably optimal. We describe a divide-and-conquer approach that makes it possible to solve very large instances on a simple desktop machine while retaining guaranteed optimality. Our algorithms are tested on a 1.8 Mb set of mammalian orthologous sequences from the CFTR region and shown to be efficient and accurate.
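
To make the problem setting concrete, here is a minimal Python sketch for a single alignment column only; it is not the paper's ILP-based algorithm. It assumes a Dollo-style model (a column is inserted once and may then be deleted on several branches), a rooted binary tree stored as a hypothetical dict mapping each internal node to its children, and a charge of one insertion per non-empty column even when that insertion falls at the root. A single column is easy on its own; the full Indel Parsimony Problem lets one insertion or deletion span several consecutive columns, which this sketch ignores.

```python
from typing import Dict, List, Set

# Hypothetical tree encoding: internal node -> list of children.
# Any node that never appears as a key is treated as a leaf.
Tree = Dict[str, List[str]]


def leaves_under(tree: Tree, node: str) -> Set[str]:
    """Return the set of leaf names in the subtree rooted at `node`."""
    children = tree.get(node, [])
    if not children:
        return {node}
    out: Set[str] = set()
    for child in children:
        out |= leaves_under(tree, child)
    return out


def column_indel_cost(tree: Tree, root: str, present: Set[str]) -> int:
    """Minimum insertions + deletions explaining ONE gap column.

    Assumption: the character is inserted exactly once (counted even at the
    root), then deleted on branches; a deletion removes it from a whole subtree.
    """
    if not present:
        return 0  # column absent everywhere: no events needed
    # Walk down to the lowest node whose subtree contains every present leaf
    # (the LCA); inserting any higher would only force extra deletions.
    node = root
    while True:
        nxt = [c for c in tree.get(node, []) if present <= leaves_under(tree, c)]
        if not nxt:
            break
        node = nxt[0]

    def deletions(n: str) -> int:
        # A maximal subtree with no present leaf needs one deletion on its branch.
        if not (leaves_under(tree, n) & present):
            return 1
        return sum(deletions(c) for c in tree.get(n, []))

    return 1 + deletions(node)
```

For example, with the tree ((A,B),C), a column present only in A and C costs 2 (one insertion at the root, one deletion on the branch to B), whereas a column present only in A and B costs 1.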

30 citations

Book Chapter • DOI

Inferring gene orders from gene maps using the breakpoint distance

Guillaume Blin¹, Eric Blais², Pierre Guillon¹, Mathieu Blanchette², Nadia El-Mabrouk³

University of Marne-la-Vallée¹, McGill University², Université de Montréal³

24 Sep 2006

TL;DR: The authors prove that the problem is NP-complete and give a dynamic programming algorithm whose running time is exponential for general partial orders but polynomial when the partial order is derived from a bounded number of genetic maps.

Abstract: Preliminary to most comparative genomics studies is the annotation of chromosomes as ordered sequences of genes. Unfortunately, different genetic mapping techniques usually give rise to different maps with unequal gene content, often containing sets of unordered neighboring genes. Only partial orders can thus be obtained from combining such maps. However, once a total order O is known for a given genome, it can be used as a reference to order the genes of a closely related species characterized by a partial order P. In this paper, the problem is to find a linearization of P that is as close as possible to O in terms of the breakpoint distance. We first prove that this problem is NP-complete. We then give a dynamic programming algorithm whose running time is exponential for general partial orders but polynomial when the partial order is derived from a bounded number of genetic maps. A time-efficient greedy heuristic is then given for the general case, with a performance higher than 90% on simulated data. Applications to the analysis of grass genomes are presented.
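
The optimization objective can be illustrated with a brute-force Python sketch that enumerates every linearization of a small partial order P and keeps the one with the fewest breakpoints relative to the total order O. This is exhaustive search, exponential in the number of genes, and not the paper's dynamic program or greedy heuristic. It assumes that P and O cover the same gene set, that P is given as a list of hypothetical (before, after) pairs, and that genes are unsigned, so an adjacency counts in either direction; no framing genes are added at the chromosome ends.

```python
from itertools import permutations
from typing import Iterable, List, Optional, Sequence, Set, Tuple


def breakpoints(order: Sequence[str], reference: Sequence[str]) -> int:
    """Count consecutive pairs in `order` that are not consecutive in `reference`.

    Adjacencies are unordered (unsigned genes); no framing genes are added.
    """
    ref_adj: Set[frozenset] = {frozenset(p) for p in zip(reference, reference[1:])}
    return sum(1 for p in zip(order, order[1:]) if frozenset(p) not in ref_adj)


def respects(order: Sequence[str], constraints: Iterable[Tuple[str, str]]) -> bool:
    """True if `order` places a before b for every (a, b) pair of the partial order."""
    pos = {g: i for i, g in enumerate(order)}
    return all(pos[a] < pos[b] for a, b in constraints)


def best_linearization(
    genes: Sequence[str],
    constraints: Iterable[Tuple[str, str]],
    reference: Sequence[str],
) -> Tuple[Optional[List[str]], Optional[int]]:
    """Exhaustively search linear extensions of P; feasible only for a handful of genes."""
    pairs = list(constraints)  # materialize so the pairs can be re-read per permutation
    best: Optional[List[str]] = None
    best_cost: Optional[int] = None
    for order in permutations(genes):
        if not respects(order, pairs):
            continue  # not a linear extension of the partial order
        cost = breakpoints(order, reference)
        if best_cost is None or cost < best_cost:
            best, best_cost = list(order), cost
    return best, best_cost  # (None, None) if the constraints admit no linearization
```

For instance, with O = a b c d, the order a c b d has 2 breakpoints (the adjacencies a,c and b,d do not occur in O).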

5 citations

Journal Article

Common substrings in random strings

Eric Blais¹, Mathieu Blanchette¹

McGill University¹

01 Jan 2006 • Lecture Notes in Computer Science

TL;DR: This work introduces two new methods for computing the probability that a word of length k exists in a set of r random strings under Bernoulli and Markov models and shows that these approximations are significantly more accurate than methods previously published.

Abstract: In computational biology, an important problem is to identify a word of length k present in each of a given set of sequences. Here, we investigate the problem of calculating the probability that such a word exists in a set of r random strings. Existing methods to approximate this probability are either inaccurate when r > 2 or restricted to Bernoulli models. We introduce two new methods for computing this probability under Bernoulli and Markov models. We present generalizations of the methods to compute the probability of finding a word of length k shared among q of r sequences, and to allow mismatches. We show through simulations that our approximations are significantly more accurate than previously published methods.
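
As a point of comparison for analytical approximations like those described above, the probability itself can be estimated by simulation. The Python sketch below is a plain Monte Carlo estimator, not one of the paper's methods: it draws r strings of length n under a Bernoulli (i.i.d. letters) model, with an optional hypothetical `probs` list of letter weights, and counts how often some word of length k occurs in at least q of them. Mismatches are not handled, and a Markov model would only change how each string is sampled.

```python
import random
from typing import Dict, Optional, Sequence, Set


def kmers(s: str, k: int) -> Set[str]:
    """Distinct words of length k in s (empty if k > len(s))."""
    return {s[i:i + k] for i in range(len(s) - k + 1)}


def shared_word_probability(
    r: int,
    n: int,
    k: int,
    q: Optional[int] = None,
    alphabet: str = "ACGT",
    probs: Optional[Sequence[float]] = None,
    trials: int = 2000,
    seed: int = 0,
) -> float:
    """Monte Carlo estimate of P(some k-word occurs in at least q of r random strings).

    q defaults to r (the word must occur in every string). Letters are drawn
    i.i.d. with weights `probs` (uniform if None), i.e. a Bernoulli model.
    """
    q = r if q is None else q
    rng = random.Random(seed)  # fixed seed makes the estimate reproducible
    hits = 0
    for _ in range(trials):
        strings = ["".join(rng.choices(alphabet, weights=probs, k=n)) for _ in range(r)]
        # Count, for each word, how many strings contain it (each string counted once).
        counts: Dict[str, int] = {}
        for s in strings:
            for w in kmers(s, k):
                counts[w] = counts.get(w, 0) + 1
        if any(c >= q for c in counts.values()):
            hits += 1
    return hits / trials
```

Because the estimator's standard error shrinks only as 1/sqrt(trials), such a simulation is mainly useful for checking approximations on small parameter settings rather than replacing them.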

2 citations

Book Chapter • DOI

Common substrings in random strings

Eric Blais¹, Mathieu Blanchette¹

McGill University¹

05 Jul 2006

TL;DR: This work investigates the probability of finding a word of length k shared among q of r random strings and presents two new methods for computing this probability under Bernoulli and Markov models.

1 citation
