Showing papers by "Martin Farach" published in 1995

Open Access

Journal Article•DOI•

A robust model for finding optimal evolutionary trees

Martin Farach¹, Sampath Kannan², Tandy Warnow²•Institutions (2)

Rutgers University¹, University of Pennsylvania²

01 Feb 1995-Algorithmica

TL;DR: This paper presents several natural and realistic ways of modeling the inaccuracies in the distance data, and considers various ways of “fitting” a given distance matrix to a tree in order to minimize various criteria of error in the fit.

Abstract: Constructing evolutionary trees for species sets is a fundamental problem in computational biology. One of the standard models assumes the ability to compute distances between every pair of species, and seeks to find an edge-weighted tree T in which the distance in the tree between the leaves of T corresponding to the species i and j exactly equals the observed distance, d_ij. When such a tree exists, this is expressed in the biological literature by saying that the distance function or matrix is additive, and trees can be constructed from additive distance matrices in O(n^2) time. Real distance data is hardly ever additive, and we therefore need ways of modeling the problem of finding the best-fit tree as an optimization problem. In this paper we present several natural and realistic ways of modeling the inaccuracies in the distance data. In one model we assume that we have upper and lower bounds for the distances between pairs of species and try to find an additive distance matrix between these bounds. In a second model we are given a partial matrix and asked to find if we can fill in the unspecified entries in order to make the entire matrix additive. For both of these models we also consider a more restrictive problem of finding a matrix that fits a tree which is not only additive but also ultrametric. Ultrametric matrices correspond to trees which can be rooted so that the distance from the root to any leaf is the same. Ultrametric matrices are desirable in biology since the edge weights then indicate evolutionary time. We give polynomial-time algorithms for some of the problems while showing others to be NP-complete. We also consider various ways of “fitting” a given distance matrix (or a pair of upper- and lower-bound matrices) to a tree in order to minimize various criteria of error in the fit. For most criteria this optimization problem turns out to be NP-hard, while we do get polynomial-time algorithms for some.
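
The two matrix properties named in this abstract can be checked directly from their standard characterizations: the four-point condition for additive matrices and the three-point condition for ultrametric ones. The sketch below is a brute-force check only, not the O(n^2) tree construction the abstract refers to and not any of the paper's algorithms. The function names and the tolerance parameter `tol` are illustrative, and the input `d` is assumed to be a symmetric matrix with zero diagonal that already satisfies the triangle inequality.

```python
from itertools import combinations


def is_ultrametric(d, tol=1e-9):
    """Three-point condition: in every triple of species,
    the two largest pairwise distances are equal."""
    n = len(d)
    for i, j, k in combinations(range(n), 3):
        _, second, largest = sorted((d[i][j], d[i][k], d[j][k]))
        if largest - second > tol:
            return False
    return True


def is_additive(d, tol=1e-9):
    """Four-point condition: in every quadruple of species,
    the two largest of the three pairwise sums are equal."""
    n = len(d)
    for i, j, k, l in combinations(range(n), 4):
        sums = sorted((
            d[i][j] + d[k][l],
            d[i][k] + d[j][l],
            d[i][l] + d[j][k],
        ))
        if sums[2] - sums[1] > tol:
            return False
    return True
```

For example, a four-leaf tree with pendant edges of length 1 and an internal edge of length 2 gives distances 2 within each cherry and 4 across it; that matrix passes both checks. Making the pendant edges unequal keeps the matrix additive but breaks the ultrametric condition.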

152 citations

Journal Article•DOI•

On the agreement of many trees

Martin Farach¹, Teresa M. Przytycka², Mikkel Thorup³•Institutions (3)

Rutgers University¹, Odense University², University of Copenhagen³

29 Sep 1995-Information Processing Letters

TL;DR: An algorithm which computes the MAST of k trees on n leaves where some tree has maximum outdegree d in time O(kn^3 + n^d) is given.

124 citations

Journal Article•DOI•

Improved Dynamic Dictionary Matching

Amihood Amir, Martin Farach, Ramana M. Idury, J.A. Lapoutre, Alejandro A. Schäffer

01 Jun 1995-Information & Computation

TL;DR: A faster algorithm for dynamic string dictionary matching with bounded alphabets, and a novel method to efficiently manipulate failure links for two-dimensional patterns.

Abstract: In the dynamic dictionary matching problem, a dictionary D contains a set of patterns that can change over time by insertion and deletion of individual patterns. The user also presents text strings and asks for all occurrences of any patterns in the text. The two main contributions of this paper are: (1) a faster algorithm for dynamic string dictionary matching with bounded alphabets, and (2) a dynamic dictionary matching algorithm for two-dimensional texts and patterns. The first contribution is based on an algorithm that solves the general problem of maintaining a sequence of well-balanced parentheses under the operations insert, delete, and find nearest enclosing parenthesis pair. The main new idea behind the second contribution is a novel method to efficiently manipulate failure links for two-dimensional patterns.
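
The "find nearest enclosing parenthesis pair" query mentioned in this abstract can be illustrated on a static string. The sketch below is a naive linear scan that only shows what the query returns. It does not support insert or delete and is not the data structure from the paper. The function name and the convention that `pos` indexes an opening parenthesis are assumptions made for the example.

```python
def nearest_enclosing_pair(s, pos):
    """Return (open_idx, close_idx) of the innermost pair that encloses
    the pair opened at index pos, or None if that pair is outermost."""
    if s[pos] != "(":
        raise ValueError("pos must be the index of an opening parenthesis")

    # Scan left for the first '(' that is not closed before pos.
    depth = 0
    open_idx = None
    for i in range(pos - 1, -1, -1):
        if s[i] == ")":
            depth += 1
        elif s[i] == "(":
            if depth == 0:
                open_idx = i
                break
            depth -= 1
    if open_idx is None:
        return None

    # Scan right from that '(' to find its matching ')'.
    depth = 0
    for j in range(open_idx, len(s)):
        if s[j] == "(":
            depth += 1
        elif s[j] == ")":
            depth -= 1
            if depth == 0:
                return (open_idx, j)
    return None  # only reached if s is not well balanced
```

On the string `(()(()))`, the pair opened at index 4 is enclosed by the pair (3, 6), and the pair opened at index 3 is enclosed by (0, 7).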

114 citations

Proceedings Article•DOI•

String matching in Lempel-Ziv compressed strings

Martin Farach¹, Mikkel Thorup²•Institutions (2)

Rutgers University¹, University of Copenhagen²

29 May 1995

TL;DR: The theory of string matching has a long association with compression algorithms, and data structures from string matching can be used to derive fast implementations of many important compression schemes, most notably the Lempel-Ziv (LZ77) algorithm.

Abstract: String matching and compression are two widely studied areas of computer science. The theory of string matching has a long association with compression algorithms. Data structures from string matching can be used to derive fast implementations of many important compression schemes, most notably the Lempel-Ziv (LZ77) algorithm. Intuitively, once a string has been compressed, and therefore its repetitive nature has been elucidated, one might be tempted to exploit this knowledge to speed up string matching. The Compressed Matching Problem is that of performing string matching in a compressed text, without uncompressing it. More formally, let T be a text, let Z be the compressed string representing T, and let P be a pattern. The Compressed Matching Problem is that of deciding if P occurs in T, given only P and Z. Compressed matching algorithms have been given for several compression schemes such as LZW.
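
To make the roles of T, Z, and P concrete, the sketch below shows the naive baseline that compressed matching aims to avoid: fully decompress Z into T, then search T for P. It is not the algorithm from the paper. The triple format `(offset, length, next_char)` is one common textbook presentation of LZ77 and is assumed here, as are the function names.

```python
def lz77_decode(triples):
    """Expand a list of (offset, length, next_char) triples into the text T.

    Each triple copies `length` symbols starting `offset` positions back
    from the current end of the output, then appends `next_char`.
    """
    out = []
    for offset, length, next_char in triples:
        start = len(out) - offset
        # Copy one symbol at a time so self-overlapping copies work.
        for t in range(length):
            out.append(out[start + t])
        if next_char is not None:
            out.append(next_char)
    return "".join(out)


def occurs_in_compressed(triples, pattern):
    """Baseline answer to 'does P occur in T?': decompress Z, then search.

    This costs time and space proportional to |T|, which is exactly the
    cost a compressed matching algorithm tries to avoid.
    """
    return pattern in lz77_decode(triples)
```

For instance, the triples `(0, 0, 'a')`, `(0, 0, 'b')`, `(2, 5, 'c')` decode to `abababac`, so the pattern `bac` occurs in T while `cab` does not.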

113 citations

Proceedings Article•DOI•

On the entropy of DNA: algorithms and measurements based on memory and rapid convergence

Martin Farach¹, Michiel O. Noordewier¹, Serap A. Savari², Larry A. Shepp³, Abraham J. Wyner⁴, Jacob Ziv⁵•Institutions (5)

Rutgers University¹, Massachusetts Institute of Technology², Bell Labs³, Stanford University⁴, Technion – Israel Institute of Technology⁵

22 Jan 1995

TL;DR: It is proved that the match length entropy estimator has a relatively fast convergence rate, and it is demonstrated experimentally that by using this entropy estimator, one can indeed extract a meaningful signal from segments of DNA.

Abstract: We have applied the information theoretic notion of entropy to characterize DNA sequences. We consider a genetic sequence signal that is too small for asymptotic entropy estimates to be accurate, and for which similar approaches have previously failed. We prove that the match length entropy estimator has a relatively fast convergence rate and demonstrate experimentally that by using this entropy estimator, we can indeed extract a meaningful signal from segments of DNA. Further, we derive a method for detecting certain signals within DNA known as splice junctions with significantly better performance than previously known methods.
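
A match length entropy estimator can be sketched as follows. The listing does not give the paper's exact definition, so this uses one common sliding-window form as an assumption: for each position, take the length of the longest match into the previous `window` symbols, add one, and estimate entropy as log2(window) divided by the mean of those values. The function name and the `window` parameter are illustrative.

```python
import math


def match_length_entropy(seq, window):
    """Estimate entropy in bits per symbol from sliding-window match lengths.

    Assumed form: H ~ log2(window) / mean(L_i), where L_i is one plus the
    length of the longest prefix of seq[i:] that also starts somewhere in
    the `window` symbols immediately before position i.
    """
    total = 0
    count = 0
    for i in range(window, len(seq)):
        best = 0
        for j in range(i - window, i):
            k = 0
            # Matches may run past position i (overlap with the current suffix).
            while i + k < len(seq) and seq[j + k] == seq[i + k]:
                k += 1
            best = max(best, k)
        total += best + 1
        count += 1
    if count == 0:
        raise ValueError("sequence must be longer than the window")
    return math.log2(window) * count / total
```

Under this form, a highly repetitive sequence such as a repeated `ACGT` motif produces very long matches and therefore an estimate close to zero, while a sequence with little repetition produces short matches and a larger estimate.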

92 citations

Journal Article•DOI•

Fast comparison of evolutionary trees

Martin Farach¹, Mikkel Thorup¹•Institutions (1)

University of Copenhagen¹

15 Nov 1995-Information & Computation

TL;DR: This work derives an O(n^{2+o(1)}) time algorithm for the Unrooted Maximum Agreement Subtree Problem (UMAST) and an O(n^2) time algorithm for its rooted variant (RMAST).

Abstract: Constructing evolutionary trees for species sets is a fundamental problem in biology. Unfortunately, there is no single agreed upon method for this task, and many methods are in use. Current practice dictates that trees be constructed using different methods and that the resulting trees then be compared for consensus. It has become necessary to automate this process as the number of species under consideration has grown. We study the Unrooted Maximum Agreement Subtree Problem (UMAST) and its rooted variant (RMAST). The UMAST problem is as follows: given a set A and two trees T_0 and T_1 leaf-labeled by the elements of A, find a maximum cardinality subset B of A such that the restrictions of T_0 and T_1 to B are topologically isomorphic. Our main result is an O(n^{2+o(1)}) time algorithm for the UMAST problem. We also derive an O(n^2) time algorithm for the RMAST problem. The previous best algorithm for both these problems has running time O(n^{4.5+o(1)}).

73 citations

Journal Article•DOI•

Efficient 2-Dimensional Approximate Matching of Half-Rectangular Figures

Amihood Amir¹,², Martin Farach•Institutions (2)

Bar-Ilan University¹, Georgia Institute of Technology²

01 Apr 1995-Information & Computation

TL;DR: An O(k n^2 sqrt(m) log m sqrt(k log k) + k^2 n^2) algorithm which combines convolutions with dynamic programming is shown; at its heart are the Smaller Matching Problem and the k-Aligned Ones with Location Problem.

Abstract: Efficient algorithms exist for the approximate two dimensional matching problem for rectangles. This is the problem of finding all occurrences of an m × m pattern in an n × n text with no more than k mismatch, insertion, and deletion errors. In computer vision it is important to generalize this problem to non-rectangular figures. We make progress towards this goal by defining half-rectangular figures of height m and area a. The approximate two dimensional matching problem for half-rectangular patterns can be solved using a dynamic programming approach in time O(a n^2). We show an O(k n^2 sqrt(m) log m sqrt(k log k) + k^2 n^2) algorithm which combines convolutions with dynamic programming. Note that our algorithm is superior to previous known solutions for k ≤ m^{1/3}. At the heart of the algorithm are the Smaller Matching Problem and the k-Aligned Ones with Location Problem. These are interesting problems in their own right. Efficient algorithms to solve both these problems are presented.

58 citations

Book Chapter•DOI•

Computing the Agreement of Trees with Bounded Degrees

Martin Farach¹, Teresa M. Przytycka², Mikkel Thorup³•Institutions (3)

Rutgers University¹, Odense University², University of Copenhagen³

25 Sep 1995

TL;DR: An algorithm is given which computes the MAST of k trees on n species where some tree has maximum degree d in time O(kn^3 + n^d).

Abstract: The Maximum Agreement Subtree (MAST) is a well-studied measure of similarity of leaf-labelled trees. There are several variants, depending on the number of trees, their degrees, and whether or not they are rooted. It turns out that the different variants display very different computational behavior. We address the common situation in biology, where the involved trees are rooted and of bounded degree, most typically simply being binary. We give an algorithm which computes the MAST of k trees on n species where some tree has maximum degree d in time O(kn^3 + n^d). This improves the Amir and Keselman FOCS '94 O(kn^{d+1} + n^{2d}) bound. We give an algorithm which computes the MAST of 2 trees with degree bound d in time O(n sqrt(d) log^3 n). This should be contrasted with the Farach and Thorup FOCS '94 O(n c^{sqrt(log n)} + n sqrt(d) log n) bound. Thus, for d a constant, we get an O(n log^3 n) bound, replacing the previous O(n c^{sqrt(log n)}) bound.

24 citations

Proceedings Article•DOI•

Optimal parallel dictionary matching and compression (extended abstract)

Martin Farach¹, Shanmugavelayut Muthukrishnan•Institutions (1)

Rutgers University¹

20 Jul 1995

TL;DR: Parallel Dictionary Matching and Compression is a parallel search algorithm that automates the labor-intensive, time-consuming, and expensive process of manually cataloging words in a dictionary.

Abstract: Parallel Dictionary Matching and Compression

17 citations