
Showing papers on "Trie" published in 1989


Proceedings ArticleDOI
30 Oct 1989
TL;DR: An O(n m^0.75 polylog(m))-step algorithm for the tree pattern matching problem is designed, and the problems of linear string matching with don't-care symbols and linear string max-min convolution are treated.
Abstract: A classic open problem on tree pattern matching is whether the naive O(mn)-step algorithm for finding all the occurrences of a pattern tree of size m in a text tree of size n can be improved. An O(n m^0.75 polylog(m))-step algorithm for this tree pattern matching problem is designed. The problems of linear string matching with don't-care symbols and linear string max-min convolution are treated.
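
For orientation, the naive O(mn) baseline mentioned in the abstract can be sketched as follows. This is an illustration only, not the paper's improved algorithm. It assumes labeled ordered trees in which the i-th child of a pattern node must map to the i-th child of the text node; the names `Node`, `matches_at`, and `find_occurrences` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)


def matches_at(pattern: Node, text: Node) -> bool:
    """True if the pattern, rooted at this text node, matches label by label.

    Assumption: the i-th pattern child must match the i-th text child;
    extra children of the text node are ignored.
    """
    if pattern.label != text.label:
        return False
    if len(pattern.children) > len(text.children):
        return False
    return all(matches_at(p, t) for p, t in zip(pattern.children, text.children))


def find_occurrences(pattern: Node, text: Node) -> List[Node]:
    """Try the pattern at every text node: up to m comparisons at each of n nodes."""
    hits = []
    stack = [text]
    while stack:
        node = stack.pop()
        if matches_at(pattern, node):
            hits.append(node)
        stack.extend(node.children)
    return hits


text = Node("a", [Node("b"), Node("a", [Node("b"), Node("c")])])
pattern = Node("a", [Node("b")])
occurrences = find_occurrences(pattern, text)
```

Each call to `matches_at` touches at most m pattern nodes and is made once per text node, which is where the O(mn) bound comes from.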

133 citations


Proceedings ArticleDOI
06 Feb 1989
TL;DR: The design of the cell tree, an object-oriented dynamic index structure for geometric databases, is described, which is designed for paged secondary memory to minimize the number of disk accesses occurring during a tree search.
Abstract: The design of the cell tree, an object-oriented dynamic index structure for geometric databases, is described. The data objects in the database are represented as unions of convex point sets (cells). The cell tree is a balanced tree structure whose leaves contain the cells and whose interior nodes correspond to a hierarchy of nested convex polyhedra. This index structure allows quick access to the cells (and thereby to the data objects) that occupy a given location in space. The cell tree is designed for paged secondary memory to minimize the number of disk accesses occurring during a tree search. Point locations and range searches can be carried out very efficiently using the cell tree.

123 citations


Patent
28 Sep 1989
TL;DR: A prefix index tree as discussed by the authors is a tree structure for locating data records stored through keys related to information stored in data records, where each node includes a prefix field for a prefix string of length p of the longest string of key characters shared by all subtrees of the node.
Abstract: A prefix index tree structure for locating data records stored through keys related to information stored in data records. Each node includes a prefix field for a prefix string of length p of the longest string of key characters shared by all subtrees of the node and a data record field for a reference to a data record whose key is completed by the prefix string. A node may include one or more branch fields when the prefix string is a prefix of keys stored in at least one subtree of the node, with a branch field for each distinct (p+1)-st key character in the keys, wherein each (p+1)-st key character is a branch character. Each branch field includes a branch character and a branch pointer field for a reference to a node containing at least one key whose (p+1)-st character is the branch character. Each node further includes a field for storing the number of key characters in the prefix string and a field for storing the number of branch fields in the node. Also disclosed are methods for constructing and searching a prefix index tree of the present invention, and for inserting nodes into the tree and deleting nodes from the tree.
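
The node layout described in this abstract resembles a compressed (radix-style) trie. The following is a minimal sketch of that layout with search and insertion, not the patented methods themselves. It assumes that following a branch consumes the branch character, so a child's prefix starts after it; the names `PrefixNode`, `search`, and `insert` are hypothetical. The two count fields from the abstract are implicit here as `len(node.prefix)` and `len(node.branches)`, and deletion is omitted.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class PrefixNode:
    prefix: str = ""                 # key characters shared by everything below
    record: Optional[object] = None  # record whose key ends exactly at this prefix
    branches: Dict[str, "PrefixNode"] = field(default_factory=dict)


def _common_len(a: str, b: str) -> int:
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return n


def search(node: Optional[PrefixNode], key: str) -> Optional[object]:
    while node is not None:
        p = len(node.prefix)
        if key[:p] != node.prefix:
            return None
        if len(key) == p:
            return node.record
        # Assumption: the (p+1)-st character selects the branch and is consumed.
        node, key = node.branches.get(key[p]), key[p + 1:]
    return None


def insert(root: PrefixNode, key: str, record: object) -> None:
    node = root
    if node.record is None and not node.branches:  # fresh, empty tree
        node.prefix, node.record = key, record
        return
    while True:
        k = _common_len(node.prefix, key)
        if k < len(node.prefix):
            # Split: push the old contents down under the diverging character.
            child = PrefixNode(node.prefix[k + 1:], node.record, node.branches)
            branch_char = node.prefix[k]
            node.prefix, node.record = node.prefix[:k], None
            node.branches = {branch_char: child}
        if len(key) == k:
            node.record = record
            return
        nxt = node.branches.get(key[k])
        if nxt is None:
            node.branches[key[k]] = PrefixNode(key[k + 1:], record)
            return
        node, key = nxt, key[k + 1:]


root = PrefixNode()
for word in ["car", "cart", "cat", "dog"]:
    insert(root, word, "record:" + word)
```

After these insertions, `search(root, "cart")` follows the branches `c`, `r`, `t` and returns the stored record, while `search(root, "ca")` reaches a node with no record and returns `None`.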

89 citations


Journal ArticleDOI
TL;DR: A precise asymptotic expansion of the variance of the size of a trie built on random binary strings is presented, and numerical results are given.
Abstract: A precise asymptotic expansion of the variance of the size of a trie built on random binary strings is presented. This data structure appears in some hashing schemes and communications protocols. The variance is asymptotically linear, and numerical results are given. The reader is referred to an earlier work for formal proofs.

65 citations


Journal ArticleDOI
TL;DR: It is proved that for the binary symmetric trie the variance is asymptotically equal to 4.35…·n + n·f(log_2 n), where n is the number of stored records and f(x) is a periodic function with a very small amplitude.

61 citations


Journal ArticleDOI
TL;DR: The results show that the variable-depth Tries constructed according to the proposed algorithms are viable and efficient for indexing large-to-very-large files by attributes in practical applications.
Abstract: We develop an efficient approach to Trie index optimization. A Trie is a data structure used to index a file having a set of attributes as record identifiers. In the proposed methodology, a file is horizontally partitioned into subsets of records using a Trie index whose depth of indexing is allowed to vary. The retrieval of a record from the file proceeds by “stepping through” the index to identify a subset of records in the file in which a binary search is performed. This paper develops a taxonomy of optimization problems underlying variable-depth Trie index construction. All these problems are solvable in polynomial time, and their characteristics are studied. Exact algorithms and heuristics for their solution are presented. The algorithms are employed in CRES, an expert system for editing written narrative material, developed for the Department of the Navy. CRES uses several large-to-very-large dictionary files for which Trie indexes are constructed using these algorithms. Computational experience with CRES shows that search and retrieval using variable-depth Trie indexes can be as much as six times faster than pure binary search. The space requirements of the Tries are reasonable. The results show that the variable-depth Tries constructed according to the proposed algorithms are viable and efficient for indexing large-to-very-large files by attributes in practical applications.
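
The retrieval scheme in this abstract (step through a trie of varying depth, then binary-search the selected subset) can be sketched as below. The splitting rule used here, "split a subset only while it holds more than `max_bucket` keys", is a simple stand-in heuristic and not one of the paper's optimization algorithms; `build`, `lookup`, and `max_bucket` are hypothetical names, and keys are assumed to be distinct strings.

```python
from bisect import bisect_left


def build(keys, depth=0, max_bucket=3):
    """Build a variable-depth index over a sorted list of distinct keys.

    A leaf is a sorted list (bucket); an inner node is a dict keyed by the
    character at position `depth` ("" for a key that ends at this depth).
    """
    if len(keys) <= max_bucket:
        return keys
    groups = {}
    for k in keys:
        ch = k[depth] if depth < len(k) else ""
        groups.setdefault(ch, []).append(k)  # input order keeps buckets sorted
    return {ch: build(g, depth + 1, max_bucket) for ch, g in groups.items()}


def lookup(index, key):
    """Step through the index to a bucket, then binary-search that bucket."""
    node, depth = index, 0
    while isinstance(node, dict):
        ch = key[depth] if depth < len(key) else ""
        node = node.get(ch)
        if node is None:
            return False
        depth += 1
    i = bisect_left(node, key)
    return i < len(node) and node[i] == key


keys = sorted(["apple", "apply", "apt", "bat", "bath",
               "car", "card", "care", "cart", "cat"])
index = build(keys, max_bucket=3)
```

Dense regions of the key space (here, the words starting with `car`) are indexed more deeply than sparse ones, so every bucket handed to the binary search stays small.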

25 citations


Proceedings ArticleDOI
06 Feb 1989
TL;DR: The DRSAM structure captures the hashed order in consecutive storage areas so that order-preserving schemes result in performance improvements for range queries and sequential processing; and it adapts elastic buckets for the control of file growth.
Abstract: A novel class of order-preserving dynamic hashing structures is introduced and analyzed. The access method is referred to as dynamic random-sequential access method (DRSAM) and is derived from linear hashing. With respect to previous methods DRSAM presents the following characteristics: (1) the structure captures the hashed order in consecutive storage areas so that order-preserving schemes result in performance improvements for range queries and sequential processing; and (2) it adapts elastic buckets for the control of file growth. This approach outperforms the partial expansion method. The file structure is also extended with proper control mechanisms to cope with nonuniform distributions. The outcome is a multilevel trie stored as a two-level sequentially allocated file.

12 citations


Book ChapterDOI
17 Aug 1989
TL;DR: It is established that the height of a digital trie under the independent model is asymptotically equal to 2 log_α n, where n is the number of words stored in the trie and α is a parameter of the probabilistic model.
Abstract: This paper studies, in a probabilistic framework, some topics concerning the way words (strings) can overlap and how this relates to the height of the digital trees associated with the set of words. A word is defined as a random sequence of (possibly infinite) symbols over a finite alphabet. A key notion of an alignment matrix {C_ij}, i, j = 1, ..., n, is introduced, where C_ij is the length of the longest string that is a prefix of both the i-th and the j-th word. It is proved that the height of an associated digital tree is simply related to the alignment matrix through some order statistics. In particular, using this observation and proving some inequalities for order statistics, we establish that the height of a digital trie under the independent model (i.e., all words are statistically independent) is asymptotically equal to 2 log_α n, where n is the number of words stored in the trie and α is a parameter of the probabilistic model. Some extensions of our basic model to other digital trees, such as b-tries, tries with a random number of keys (Poisson model), and suffix trees (dependent keys!), are also briefly discussed.
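
A small sketch of the alignment matrix follows. It assumes the standard trie in which each word is stored at the shallowest node that separates it from all other words; under that assumption the height equals the largest off-diagonal C_ij plus one, which is the kind of order-statistic relation the abstract alludes to. The function names are hypothetical, the diagonal is simply left at 0, and the words are assumed to be distinct strings of equal length so that none is a prefix of another.

```python
from collections import defaultdict
from itertools import combinations


def lcp(a, b):
    """Length of the longest common prefix of two words."""
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return n


def alignment_matrix(words):
    """C[i][j] = longest common prefix length of words i and j (i != j)."""
    n = len(words)
    C = [[0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        C[i][j] = C[j][i] = lcp(words[i], words[j])
    return C


def trie_height(words, depth=0):
    """Height of the trie, computed directly by recursive splitting."""
    if len(words) <= 1:
        return 0
    groups = defaultdict(list)
    for w in words:
        groups[w[depth]].append(w)
    return 1 + max(trie_height(g, depth + 1) for g in groups.values())


words = ["0010", "0011", "0100", "1000"]
C = alignment_matrix(words)
height_from_matrix = max(max(row) for row in C) + 1
height_from_trie = trie_height(words)
```

For these four words the two computations agree: the deepest pair, `0010` and `0011`, shares a prefix of length 3 and is separated only at depth 4.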

10 citations


Book ChapterDOI
21 Aug 1989
TL;DR: Correctness of some operations defined on the structure of a conceptual trie structure is demonstrated and the expected values of those parameters which are critical for the performance of the structure are derived.
Abstract: Computer representations often lack those "nice properties" which make proofs of correctness and thorough analysis possible. Compact 0-complete trees are among the rare exceptions. They maintain a strong tie with their conceptual counterpart, a special kind of binary trie, which mirrors their properties and behavior. The ability to shift the focus of analysis, as needed, between the conceptual trie structure and the actual representation enables a more flexible and more powerful set of analytical tools. We have used this paradigm in our investigation of compact 0-complete trees, and here we present some results of that research. In particular, we demonstrate correctness of some operations defined on the structure and derive the expected values of those parameters which are critical for the performance of the structure. 4 refs., 7 figs.

8 citations


Journal ArticleDOI
TL;DR: This paper considers concurrent execution of the TH operations, and in addition to the usual search, insertion and deletion operations, also includes range queries among the concurrent operations.
Abstract: The Trie Hashing (TH), defined by Litwin, is one of the fastest access methods for dynamic and ordered files. The hashing function is defined in terms of a trie, which is basically a binary tree where a character string is associated implicitly with each node. This string is compared with a prefix of the given key in the search process, and depending on the result either the left or the right child is chosen as the next node to visit. The leaf nodes point to buckets which contain the records. The buckets are on a disk, whereas the trie itself is in the core memory. In this paper we consider concurrent execution of the TH operations. In addition to the usual search, insertion and deletion operations, we also include range queries among the concurrent operations. Our algorithm locks only leaf nodes and at most two nodes need to be locked simultaneously by any operation regardless of the number of buckets being accessed. The modification required in the basic data structure in order to accommodate concurrent operations is very minor.
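
The search step described in this abstract can be sketched roughly as follows. This is a simplification under stated assumptions: each inner node stores its comparison string explicitly (the abstract says the string is implicit), buckets are identified by integers numbered in key order, and the concurrency control (leaf locking) is not modeled. `Inner`, `find_bucket`, and `range_buckets` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Inner:
    boundary: str                 # string compared with a prefix of the key
    left: Union["Inner", int]     # keys whose prefix is <= boundary
    right: Union["Inner", int]    # keys whose prefix is > boundary


def find_bucket(node: Union[Inner, int], key: str) -> int:
    """Walk the in-memory trie; the leaf reached names the on-disk bucket."""
    while isinstance(node, Inner):
        prefix = key[:len(node.boundary)]
        node = node.left if prefix <= node.boundary else node.right
    return node


def range_buckets(trie: Union[Inner, int], low: str, high: str) -> List[int]:
    """Buckets a range query must visit, assuming ids follow key order."""
    return list(range(find_bucket(trie, low), find_bucket(trie, high) + 1))


# Three buckets: keys up to "d...", keys up to "m...", and everything after.
trie = Inner("m", Inner("d", 0, 1), 2)
```

Because the structure preserves key order, a range query only needs the buckets between the two endpoints, for example `range_buckets(trie, "cat", "fig")` visits buckets 0 and 1.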

7 citations


Journal ArticleDOI
TL;DR: A set of easy-to-use FORTRAN routines for building and accessing data structures of the type commonly encountered in scientific applications is introduced, which can be integrated into an existing code with no additional code requirements.

Journal ArticleDOI
J. D. Parker
TL;DR: The sibling trie is a highly concurrent dynamic search structure that supports search, update, insertion, and deletion and is designed to minimize “hot spots” in highly concurrent shared memory environments.

Journal ArticleDOI
TL;DR: This paper presents an approach that allows different definitions of indivisible string elements for different applications, and the only information the user provides for the determination of the beginning and ends of strings is a specification of a maximum length for output strings.
Abstract: In recent years, several authors have presented algorithms that locate instances of a given string, or set of strings, within a text. Recently, authors have given less consideration to the complementary problem of processing a text to find out what strings appear in the text, without any preconceived notion of what strings might be present. A system called PATRICIA, which was developed two decades ago, is an implementation of a solution to this problem. The design of PATRICIA is very tightly bound to the assumptions that individual string elements are bits and that the user of the system can provide complete lists of starting and stopping places for strings. This paper presents an approach that drops these assumptions. Our method allows different definitions of indivisible string elements for different applications, and the only information the user provides for the determination of the beginning and ends of strings is a specification of a maximum length for output strings. This paper also describes a portable C implementation of the method, called PORTREP. The primary data structure of PORTREP is a trie represented as a ternary tree. PORTREP has a method for eliminating redundancy from the output, and it can function with a bounded number of nodes by employing a heuristic process that reuses seldom-visited nodes. Theoretical analysis and empirical studies, reported here, give confidence in the efficiency of the algorithms. PORTREP has the ability to form the basis for a variety of text-analysis applications, and this paper considers one such application, automatic document indexing.

Proceedings ArticleDOI
06 Feb 1989
TL;DR: A method is presented for indexing terms in a knowledge-base retrieval-by-unification (RBU) system that uses hashing and trie structures to reduce the number of comparisons between elements of a search condition and of an object term relation.
Abstract: A method is presented for indexing terms in a knowledge-base retrieval-by-unification (RBU) system. The term is a well-defined structure which represents knowledge using variables. RBU operations are an extension of relational database operations using unification and backtracking to retrieve terms from term relations. The term indexing proposed uses hashing and trie structures to reduce the number of comparisons between elements of a search condition and of an object term relation. Unification on a trie structure is suited to backtracking bindings of variables. The search and updating speed of an RBU prototype is measured to evaluate the indexing method. This method is effective for fast term retrieval for a large number of similar and varied form terms. The overhead for maintaining indexes in updating is low.

Proceedings ArticleDOI
05 Jun 1989
TL;DR: An augmented binary (AB) tree architecture is proposed with a view to providing fault tolerance; it augments an n-level full binary tree with n redundant nodes and 2^n + 3n - 6 redundant links.
Abstract: An augmented binary (AB) tree architecture is proposed with a view to providing fault tolerance. This architecture is an augmentation of an n-level full binary tree with n redundant nodes and 2^n + 3n - 6 redundant links. The AB tree can be configured into a full binary tree even when one node is faulty at each level. While functionally equivalent to the RAE-tree, the proposed AB tree has a regular topology, reduced number of maximum input-output channels per processor, and fewer wire crossovers when implemented using very large-scale integration layout. A reconfiguration algorithm, which constructs an n-level full binary tree from an n-level faulty AB tree, is given. A distributed fault diagnosis algorithm is given which runs concurrently on each nonfaulty processor, enabling each nonfaulty processor to identify all faulty processors.