Author

Martin Farach

Other affiliations: University of Latvia, University of Copenhagen, University of Maryland, College Park ...

Bio: Martin Farach is an academic researcher from Rutgers University. The author has contributed to research on approximate string matching and pattern matching, has an h-index of 35, and has co-authored 55 publications receiving 3591 citations. Previous affiliations of Martin Farach include University of Latvia and University of Copenhagen.

Papers published on a yearly basis (chart not reproduced; years covered: 1990 to 1999)

Papers

Proceedings Article•DOI•

Optimal suffix tree construction with large alphabets

Martin Farach¹•Institutions (1)

Rutgers University¹

19 Oct 1997

TL;DR: This work builds suffix trees in linear time for integer alphabets, settling the main open question in suffix tree construction; the fastest previously known algorithm was the O(n log n)-time comparison-based algorithm.

Abstract: The suffix tree of a string is the fundamental data structure of combinatorial pattern matching. Weiner (1973), who introduced the data structure, gave an O(n)-time algorithm for building the suffix tree of an n-character string drawn from a constant-size alphabet. In the comparison model, there is a trivial Ω(n log n)-time lower bound based on sorting, and Weiner's algorithm matches this bound trivially. For integer alphabets, a substantial gap remains between the known upper and lower bounds, and closing this gap is the main open question in the construction of suffix trees. There is no super-linear lower bound, and the fastest known algorithm was the O(n log n)-time comparison-based algorithm. We settle this open problem by closing the gap: we build suffix trees in linear time for integer alphabets.

426 citations

Journal Article•DOI•

Let Sleeping Files Lie

Amihood Amir¹, Gary Benson², Martin Farach³•Institutions (3)

Georgia Institute of Technology¹, University of Southern California², Rutgers University³

01 Apr 1996-Journal of Computer and System Sciences

TL;DR: In this article, the authors consider pattern matching without decompression in the UNIX Z-compression scheme and show how to modify their algorithms to achieve a trade-off between the amount of extra space used and the algorithm's time complexity.

223 citations

Journal Article•DOI•

String matching in Lempel-Ziv compressed strings

Martin Farach¹, Mikkel Thorup²•Institutions (2)

Rutgers University¹, University of Copenhagen²

01 Jan 1998-Algorithmica

TL;DR: This paper gives the first nontrivial compressed matching algorithm for the classic adaptive compression scheme, the LZ77 algorithm, which is known to compress more than other dictionary compression schemes, such as LZ78 and LZW, though for strings with constant per bit entropy, all these schemes compress optimally in the limit.

Abstract: String matching and compression are two widely studied areas of computer science. The theory of string matching has a long association with compression algorithms. Data structures from string matching can be used to derive fast implementations of many important compression schemes, most notably the Lempel-Ziv (LZ77) algorithm. Intuitively, once a string has been compressed, and therefore its repetitive nature has been elucidated, one might be tempted to exploit this knowledge to speed up string matching. The Compressed Matching Problem is that of performing string matching in a compressed text, without uncompressing it. More formally, let T be a text, let Z be the compressed string representing T, and let P be a pattern. The Compressed Matching Problem is that of deciding if P occurs in T, given only P and Z. Compressed matching algorithms have been given for several compression schemes such as LZW. In this paper we give the first nontrivial compressed matching algorithm for the classic adaptive compression scheme, the LZ77 algorithm. In practice, the LZ77 algorithm is known to compress more than other dictionary compression schemes, such as LZ78 and LZW, though for strings with constant per bit entropy, all these schemes compress optimally in the limit. However, for strings with o(1) per bit entropy, while it was recently shown that LZ77 gives compression to within a constant factor of optimal, schemes such as LZ78 and LZW may deviate from optimality by an exponential factor. Asymptotically, compressed matching is only relevant if |Z| = o(|T|), i.e., if the compression ratio |T|/|Z| is more than a constant. These results show that LZ77 is the appropriate compression method in such settings. We present an LZ77 compressed matching algorithm which runs in time O(n log²(u/n) + p), where n = |Z|, u = |T|, and p = |P|. Compare with the naive "decompression" algorithm, which takes time Θ(u + p) to decide if P occurs in T. Writing u + p as n·(u/n) + p, we see that we have improved the complexity, replacing the compression factor u/n by a factor log²(u/n). Our algorithm is competitive in the sense that O(n log²(u/n) + p) = O(u + p), and opportunistic in the sense that O(n log²(u/n) + p) = o(u + p) if n = o(u) and p = o(u).
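
For orientation, the naive baseline that the abstract compares against (decompress first, then search) might look like the sketch below. The (offset, length, next_char) triple format and both function names are assumptions made for illustration and are not taken from the paper; this is the decompress-then-search approach, not the compressed matching algorithm itself, and Python's built-in substring search is not guaranteed to run in linear time.

```python
def lz77_decompress(triples):
    """Expand hypothetical (offset, length, next_char) triples into text.

    offset counts back from the current end of the output. length may exceed
    offset, in which case the copy overlaps itself and produces a run.
    """
    out = []
    for offset, length, next_char in triples:
        start = len(out) - offset
        for i in range(length):
            # Reading from out while appending lets overlapping copies work.
            out.append(out[start + i])
        if next_char:
            out.append(next_char)
    return "".join(out)


def naive_compressed_match(triples, pattern):
    """Decide whether pattern occurs in the text encoded by triples.

    Cost grows with the uncompressed length u, which is exactly what a
    compressed matching algorithm tries to avoid.
    """
    return pattern in lz77_decompress(triples)


# Example input: two literals followed by a self-overlapping copy.
example = [(0, 0, "a"), (0, 0, "b"), (2, 4, "c")]
```

Here `naive_compressed_match(example, "babc")` expands the whole text before searching, so its cost depends on u rather than on n.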

179 citations

Proceedings Article•DOI•

On the approximability of numerical taxonomy (fitting distances by tree metrics)

Richa Agarwala¹, Vineet Bafna¹, Martin Farach¹, Babu Narayanan¹, Mike Paterson², Mikkel Thorup³•Institutions (3)

Rutgers University¹, University of Warwick², University of Copenhagen³

28 Jan 1996

TL;DR: This paper considers the problem of fitting an n × n distance matrix D by a tree metric T and presents an O(n²) algorithm, the first algorithm for this problem with a performance guarantee.

Abstract: We consider the problem of fitting an n × n distance matrix D by a tree metric T. Let ε be the distance to the closest tree metric under the L∞ norm, that is, ε = min_T ‖T − D‖∞. First we present an O(n²) algorithm for finding a tree metric T such that ‖T − D‖∞ ≤ 3ε. Second we show that it is NP-hard to find a tree metric T such that ‖T − D‖∞ < 9ε/8. This paper presents the first algorithm for this problem with a performance guarantee.
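
To make the objective concrete, the following sketch evaluates the L∞ error between the metric induced by an edge-weighted tree and a given distance matrix. It only measures the fit of a candidate tree; it is not the approximation algorithm from the paper. The edge-list representation, the nested-dict matrix, and the function names are assumptions chosen for illustration.

```python
from collections import defaultdict


def leaf_distances(edges, leaves):
    """Path lengths between all pairs of leaves in an edge-weighted tree.

    edges: iterable of (u, v, weight) tuples; leaves: list of leaf labels.
    """
    adj = defaultdict(list)
    for u, v, w in edges:
        adj[u].append((v, w))
        adj[v].append((u, w))
    leaf_set = set(leaves)
    result = {}
    for src in leaves:
        # Depth-first traversal; in a tree each node is reached by one path.
        dist = {src: 0.0}
        stack = [src]
        while stack:
            node = stack.pop()
            for nxt, w in adj[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + w
                    stack.append(nxt)
        result[src] = {leaf: d for leaf, d in dist.items() if leaf in leaf_set}
    return result


def linf_error(tree_dist, D):
    """Largest absolute difference between tree distances and matrix D."""
    return max(abs(tree_dist[a][b] - D[a][b])
               for a in tree_dist for b in tree_dist[a])


# Example: a star tree with centre "x" and an observed matrix D.
star = [("x", "a", 1.0), ("x", "b", 2.0), ("x", "c", 3.0)]
D = {
    "a": {"a": 0.0, "b": 3.5, "c": 4.0},
    "b": {"a": 3.5, "b": 0.0, "c": 4.0},
    "c": {"a": 4.0, "b": 4.0, "c": 0.0},
}
error = linf_error(leaf_distances(star, ["a", "b", "c"]), D)
```

An approximation guarantee of the kind stated in the abstract bounds this quantity relative to the best achievable value ε over all tree metrics.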

159 citations

Journal Article•DOI•

A robust model for finding optimal evolutionary trees

Martin Farach¹, Sampath Kannan², Tandy Warnow²•Institutions (2)

Rutgers University¹, University of Pennsylvania²

01 Feb 1995-Algorithmica

TL;DR: This paper presents several natural and realistic ways of modeling the inaccuracies in the distance data, and considers various ways of “fitting” a given distance matrix to a tree in order to minimize various criteria of error in the fit.

Abstract: Constructing evolutionary trees for species sets is a fundamental problem in computational biology. One of the standard models assumes the ability to compute distances between every pair of species, and seeks to find an edge-weighted tree T in which the distance in the tree between the leaves of T corresponding to the species i and j exactly equals the observed distance, d_ij. When such a tree exists, this is expressed in the biological literature by saying that the distance function or matrix is additive, and trees can be constructed from additive distance matrices in O(n²) time. Real distance data is hardly ever additive, and we therefore need ways of modeling the problem of finding the best-fit tree as an optimization problem. In this paper we present several natural and realistic ways of modeling the inaccuracies in the distance data. In one model we assume that we have upper and lower bounds for the distances between pairs of species and try to find an additive distance matrix between these bounds. In a second model we are given a partial matrix and asked to find if we can fill in the unspecified entries in order to make the entire matrix additive. For both of these models we also consider a more restrictive problem of finding a matrix that fits a tree which is not only additive but also ultrametric. Ultrametric matrices correspond to trees which can be rooted so that the distance from the root to any leaf is the same. Ultrametric matrices are desirable in biology since the edge weights then indicate evolutionary time. We give polynomial-time algorithms for some of the problems while showing others to be NP-complete. We also consider various ways of "fitting" a given distance matrix (or a pair of upper- and lower-bound matrices) to a tree in order to minimize various criteria of error in the fit. For most criteria this optimization problem turns out to be NP-hard, while we do get polynomial-time algorithms for some.
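
The two matrix properties named in the abstract can be checked with the standard four-point and three-point conditions. The sketch below is a brute-force illustration, assuming the input is a symmetric matrix with zero diagonal that already satisfies the triangle inequality; it is not the O(n²) construction mentioned above, and the function names and tolerance parameter are assumptions.

```python
from itertools import combinations


def is_additive(d, tol=1e-9):
    """Four-point condition: among the three pairings of any four points,
    the two largest sums of distances must be equal."""
    n = len(d)
    for i, j, k, l in combinations(range(n), 4):
        s = sorted((d[i][j] + d[k][l], d[i][k] + d[j][l], d[i][l] + d[j][k]))
        if s[2] - s[1] > tol:
            return False
    return True


def is_ultrametric(d, tol=1e-9):
    """Three-point condition: in any triple of points, the two largest
    pairwise distances must be equal."""
    n = len(d)
    for i, j, k in combinations(range(n), 3):
        a, b, c = sorted((d[i][j], d[i][k], d[j][k]))
        if c - b > tol:
            return False
    return True


# Distances from a rooted tree with all leaves at the same depth.
clock_like = [[0, 2, 4, 4], [2, 0, 4, 4], [4, 4, 0, 2], [4, 4, 2, 0]]
# Distances from a star tree with leaf edge weights 1, 2, 3, 4.
star_tree = [[0, 3, 4, 5], [3, 0, 5, 6], [4, 5, 0, 7], [5, 6, 7, 0]]
# A metric that fits no tree exactly.
non_tree = [[0, 1, 1, 1], [1, 0, 1, 2], [1, 1, 0, 1], [1, 2, 1, 0]]
```

The `star_tree` matrix is additive but not ultrametric, which matches the abstract's point that ultrametricity is the more restrictive requirement.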

152 citations

Cited by

Journal Article•DOI•

Evolution of Protein Molecules

S. Jeffery

01 Apr 1979-Biochemical Society Transactions

3,734 citations

Journal Article•DOI•

Information Theory and Reliable Communication

D.A. Bell

01 Aug 1969-Electronics and Power

2,415 citations

Journal Article•DOI•

NeighborNet: An Agglomerative Method for the Construction of Planar Phylogenetic Networks

David Bryant, Vincent Moulton¹•Institutions (1)

Uppsala University¹

17 Sep 2002

TL;DR: Neighbor-Net is presented, a distance based method for constructing phylogenetic networks that is based on the Neighbor-Joining (NJ) algorithm of Saitou and Nei and can quickly produce detailed and informative networks for several hundred taxa.

Abstract: We introduce NeighborNet, a network construction and data representation method that combines aspects of neighbor joining (NJ) and SplitsTree. Like NJ, NeighborNet uses agglomeration: taxa are combined into progressively larger and larger overlapping clusters. Like SplitsTree, NeighborNet constructs networks rather than trees, and so can be used to represent multiple phylogenetic hypotheses simultaneously, or to detect complex evolutionary processes like recombination, lateral transfer and hybridization. NeighborNet tends to produce networks that are substantially more resolved than those made with SplitsTree. The method is efficient (O(n³) time) and is well suited for the preliminary analyses of complex phylogenetic data. We report results of three case studies: one based on mitochondrial gene order data from early branching eukaryotes, another based on nuclear sequence data from New Zealand alpine buttercups (Ranunculi), and a third on poorly corrected synthetic data.

1,846 citations

Journal Article•DOI•

Sphere Packings, Lattices and Groups

Werner Fischer, Marburg

01 Feb 1990-Zeitschrift Fur Kristallographie

1,584 citations

Journal Article•DOI•

On-line construction of suffix trees

Esko Ukkonen¹•Institutions (1)

University of Helsinki¹

01 Sep 1995-Algorithmica

TL;DR: An on-line algorithm is presented for constructing the suffix tree for a given string in time linear in the length of the string, developed as a linear-time version of a very simple algorithm for (quadratic size) suffix tries.

Abstract: An on-line algorithm is presented for constructing the suffix tree for a given string in time linear in the length of the string. The new algorithm has the desirable property of processing the string symbol by symbol from left to right. It always has the suffix tree for the scanned part of the string ready. The method is developed as a linear-time version of a very simple algorithm for (quadratic size) suffix tries. Regardless of its quadratic worst case, this latter algorithm can be a good practical method when the string is not too long. Another variation of this method is shown to give, in a natural way, the well-known algorithms for constructing suffix automata (DAWGs).
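
As a rough picture of the quadratic-size structure the abstract starts from, a suffix trie can be built by inserting every suffix one character at a time. The sketch below is that naive construction only, with a nested-dict representation and function names assumed for illustration; it is not the linear-time on-line algorithm described in the paper and does not process the string incrementally in the same way.

```python
def build_suffix_trie(text):
    """Insert every suffix of text into a trie of nested dicts.

    Time and space can both grow quadratically in len(text).
    """
    root = {}
    for start in range(len(text)):
        node = root
        for ch in text[start:]:
            # Follow the existing child for ch, or create it if missing.
            node = node.setdefault(ch, {})
    return root


def contains(trie, pattern):
    """A pattern occurs in the text iff it labels a path from the root."""
    node = trie
    for ch in pattern:
        if ch not in node:
            return False
        node = node[ch]
    return True


trie = build_suffix_trie("banana")
```

Once the trie exists, a substring query such as `contains(trie, "nan")` takes time proportional to the pattern length, which is the property suffix trees preserve in linear space.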

1,528 citations
