Journal Article

On-line construction of suffix trees

Esko Ukkonen
- 01 Sep 1995 - 
- Vol. 14, Iss: 3, pp 249-260
TL;DR: An on-line algorithm is presented for constructing the suffix tree for a given string in time linear in the length of the string. It is developed as a linear-time version of a very simple algorithm for (quadratic-size) suffix tries.
Abstract: 
An on-line algorithm is presented for constructing the suffix tree for a given string in time linear in the length of the string. The new algorithm has the desirable property of processing the string symbol by symbol from left to right. It always has the suffix tree for the scanned part of the string ready. The method is developed as a linear-time version of a very simple algorithm for (quadratic-size) suffix tries. Despite its quadratic worst case, this latter algorithm can be a good practical method when the string is not too long. Another variation of this method is shown to give, in a natural way, the well-known algorithms for constructing suffix automata (DAWGs).
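
The "very simple algorithm" for suffix tries mentioned in the abstract can be illustrated with a short sketch. The code below is not the linear-time suffix tree construction and is not taken from the paper; it is a minimal Python rendering of the quadratic-size suffix-trie idea, assuming the usual formulation with suffix links. The names `Node`, `SuffixTrie`, `extend`, `contains`, and `build_suffix_trie` are illustrative, and the case where the suffix-link walk runs past the root is handled here with a plain `None` check.

```python
class Node:
    """One trie state: outgoing edges by symbol, plus a suffix link."""
    __slots__ = ("children", "link")

    def __init__(self):
        self.children = {}
        self.link = None  # node for this string minus its first symbol


class SuffixTrie:
    """Suffix trie built symbol by symbol, left to right (quadratic size)."""

    def __init__(self):
        self.root = Node()
        self.top = self.root  # deepest node: the whole text scanned so far
        self.size = 1         # number of nodes, root included

    def extend(self, ch):
        """Append one symbol; afterwards the trie covers the longer text."""
        r = self.top
        prev_new = None
        # Walk from the longest suffix towards the root along suffix links,
        # adding a ch-edge wherever one is missing.
        while r is not None and ch not in r.children:
            new = Node()
            self.size += 1
            r.children[ch] = new
            if prev_new is not None:
                prev_new.link = new
            prev_new = new
            r = r.link
        # The last node created links to the existing ch-child,
        # or to the root if the walk went past the root.
        target = self.root if r is None else r.children[ch]
        if prev_new is not None:
            prev_new.link = target
        self.top = self.top.children[ch]

    def contains(self, pattern):
        """True if pattern is a substring of the text scanned so far."""
        node = self.root
        for ch in pattern:
            node = node.children.get(ch)
            if node is None:
                return False
        return True


def build_suffix_trie(text):
    """Feed text to a fresh trie one symbol at a time."""
    trie = SuffixTrie()
    for ch in text:
        trie.extend(ch)
    return trie
```

Because `extend` is called once per symbol, substring queries via `contains` can be answered after any prefix of the input has been read, which is the on-line property the abstract describes. The node count equals the number of distinct substrings plus one, so it can grow quadratically in the text length.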


Citations
Journal Article

A guided tour to approximate string matching

TL;DR: This work surveys the current techniques to cope with the problem of string matching that allows errors, and focuses on online searching and mostly on edit distance, explaining the problem and its relevance, its statistical behavior, its history and current developments, and the central ideas of the algorithms.
Proceedings Article

Winnowing: local algorithms for document fingerprinting

TL;DR: The class of local document fingerprinting algorithms is introduced, which seems to capture an essential property of any fingerprinting technique guaranteed to detect copies, and a novel lower bound on the performance of any local algorithm is proved.
Proceedings Article

Web document clustering: a feasibility demonstration

TL;DR: To satisfy the stringent requirements of the Web domain, an incremental, linear time algorithm called Suffix Tree Clustering (STC) is introduced which creates clusters based on phrases shared between documents, showing that STC is faster than standard clustering methods in this domain.
Proceedings Article

The spectrum kernel: a string kernel for SVM protein classification.

TL;DR: A new sequence-similarity kernel, the spectrum kernel, is introduced for use with support vector machines (SVMs) in a discriminative approach to the protein classification problem and performs well in comparison with state-of-the-art methods for homology detection.
Journal Article

Alignment of whole genomes

TL;DR: Using an efficient data structure called a suffix tree, the system is able to rapidly align sequences containing millions of nucleotides and should facilitate analysis of syntenic chromosomal regions, strain-to-strain comparisons, evolutionary comparisons and genomic duplications.
References
Journal Article

Efficient string matching: an aid to bibliographic search

TL;DR: A simple, efficient algorithm to locate all occurrences of any of a finite number of keywords in a string of text that has been used to improve the speed of a library bibliographic search program by a factor of 5 to 10.
Proceedings Article

Linear pattern matching algorithms

Peter Weiner
TL;DR: A linear-time algorithm for obtaining a compacted version of a bi-tree associated with a given string is presented, and it is indicated how to solve several pattern matching problems, including some from [4], in linear time.
Journal Article

A Space-Economical Suffix Tree Construction Algorithm

Edward M. McCreight
- 01 Apr 1976 - 
TL;DR: A new algorithm is presented for constructing auxiliary digital search trees to aid in exact-match substring searching that has the same asymptotic running time bound as previously published algorithms, but is more economical in space.
Journal Article

The smallest automaton recognizing the subwords of a text

TL;DR: In this article, the smallest partial DFA for the set of all subwords of a given word w, |w| > 2, is shown to have at most 2|w| - 2 states and 3|w| - 4 transition edges, independently of the alphabet size.
Book Chapter

The Myriad Virtues of Subword Trees

TL;DR: Several nontrivial applications of subword trees have been developed since their first appearance, some of which depart considerably from the original motivations for the subword tree.