
Showing papers on "Tree (data structure)" published in 1994


Proceedings ArticleDOI
08 Mar 1994
TL;DR: This paper describes a method of creating a tied-state continuous speech recognition system using a phonetic decision tree, which is shown to lead to similar recognition performance to that obtained using an earlier data-driven approach but to have the additional advantage of providing a mapping for unseen triphones.
Abstract: The key problem to be faced when building an HMM-based continuous speech recogniser is maintaining the balance between model complexity and available training data. For large vocabulary systems requiring cross-word context dependent modelling, this is particularly acute since many such contexts will never occur in the training data. This paper describes a method of creating a tied-state continuous speech recognition system using a phonetic decision tree. This tree-based clustering is shown to lead to similar recognition performance to that obtained using an earlier data-driven approach but to have the additional advantage of providing a mapping for unseen triphones. State-tying is also compared with traditional model-based tying and shown to be clearly superior. Experimental results are presented for both the Resource Management and Wall Street Journal tasks.

781 citations
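
The tied-state construction above lends itself to a compact illustration. Below is a minimal sketch, assuming single-Gaussian statistics per triphone state (occupancy count, mean and variance of one feature dimension) and a hand-written question set; the names State, QUESTIONS and the minimum-gain threshold are illustrative, not taken from the paper or from any toolkit.

import math

class State:
    """Occupancy statistics of one context-dependent (triphone) state."""
    def __init__(self, left, center, right, count, mean, var):
        self.left, self.center, self.right = left, center, right
        self.count, self.mean, self.var = count, mean, var

# Phonetic questions about the left or right context of the triphone.
QUESTIONS = [
    ("L-Nasal", lambda s: s.left in {"m", "n", "ng"}),
    ("R-Vowel", lambda s: s.right in {"aa", "iy", "uw", "eh"}),
    ("L-Stop",  lambda s: s.left in {"p", "t", "k", "b", "d", "g"}),
]

def pooled_loglike(states):
    """Approximate log likelihood of the data if these states share one Gaussian."""
    n = sum(s.count for s in states)
    mean = sum(s.count * s.mean for s in states) / n
    var = sum(s.count * (s.var + (s.mean - mean) ** 2) for s in states) / n
    return -0.5 * n * (math.log(2 * math.pi * var) + 1.0)

def best_split(states):
    base = pooled_loglike(states)
    best = None
    for name, ask in QUESTIONS:
        yes = [s for s in states if ask(s)]
        no = [s for s in states if not ask(s)]
        if not yes or not no:
            continue
        gain = pooled_loglike(yes) + pooled_loglike(no) - base
        if best is None or gain > best[0]:
            best = (gain, name, yes, no)
    return best

def grow(states, min_gain=50.0):
    """Recursively split by the best question; each leaf becomes one tied state.
       Unseen triphones can later be mapped by answering the same questions."""
    split = best_split(states)
    if split is None or split[0] < min_gain:
        return ["%s-%s+%s" % (s.left, s.center, s.right) for s in states]  # one tied state
    _, name, yes, no = split
    return {name: (grow(yes), grow(no))}

states = [State("m", "ae", "t", 120, 0.4, 1.0),
          State("p", "ae", "iy", 80, -0.3, 0.8),
          State("n", "ae", "k", 60, 0.5, 1.1)]
print(grow(states, min_gain=5.0))

Splitting stops when no question yields a worthwhile likelihood gain, which is what keeps the number of tied states matched to the available training data, as the abstract describes.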


Journal ArticleDOI
01 Oct 1994
TL;DR: A file structure to index high-dimensionality data, which are typically points in some feature space, and the design of the tree structure and the associated algorithms that handle such “varying length” feature vectors are presented.
Abstract: We propose a file structure to index high-dimensionality data, which are typically points in some feature space. The idea is to use only a few of the features, using additional features only when the additional discriminatory power is absolutely necessary. We present in detail the design of our tree structure and the associated algorithms that handle such "varying length" feature vectors. Finally, we report simulation results, comparing the proposed structure with the R*-tree, which is one of the most successful methods for low-dimensionality spaces. The results illustrate the superiority of our method, which saves up to 80% in disk accesses.

572 citations


Journal ArticleDOI
TL;DR: In this paper, a simple transformation is presented that allows for the fast recovery of a tree from the probabilities such a tree induces on the colourations of its leaves under a simple Markov process (with unknown parameters).

290 citations


Journal ArticleDOI
TL;DR: The main focus of this paper is on the description, analysis, and application of an extremely efficient optimal estimation algorithm for this class of multiscale dynamic models evolving on dyadic trees.
Abstract: We describe a framework for modeling stochastic phenomena at multiple scales and for their efficient estimation or reconstruction given partial and/or noisy measurements which may also be at several scales. In particular multiscale signal representations lead naturally to pyramidal or tree-like data structures in which each level in the tree corresponds to a particular scale of representation. A class of multiscale dynamic models evolving on dyadic trees is introduced. The main focus of this paper is on the description, analysis, and application of an extremely efficient optimal estimation algorithm for this class of models. This algorithm consists of a fine-to-coarse filtering sweep, followed by a coarse-to-fine smoothing step, corresponding to the dyadic tree generalization of Kalman filtering and Rauch-Tung-Striebel smoothing. The Kalman filtering sweep consists of the recursive application of three steps: a measurement update step, a fine-to-coarse prediction step, and a fusion step. We illustrate the use of our methodology for the fusion of multiresolution data and for the efficient solution of "fractal regularizations" of ill-posed signal and image processing problems encountered.

290 citations


Journal ArticleDOI
TL;DR: In this paper, the authors consider the usual case where branch probabilities are products of nonnegative integer powers of the parameters θs and their complements, 1 - θs, and show that the EM algorithm necessarily converges to a local maximum.
Abstract: Multinomial processing tree models assume that an observed behavior category can arise from one or more processing sequences represented as branches in a tree. These models form a subclass of parametric, multinomial models, and they provide a substantively motivated alternative to loglinear models. We consider the usual case where branch probabilities are products of nonnegative integer powers in the parameters, 0≤θs≤1, and their complements, 1 - θs. A version of the EM algorithm is constructed that has very strong properties. First, the E-step and the M-step are both analytic and computationally easy; therefore, a fast PC program can be constructed for obtaining MLEs for large numbers of parameters. Second, a closed form expression for the observed Fisher information matrix is obtained for the entire class. Third, it is proved that the algorithm necessarily converges to a local maximum, and this is a stronger result than for the exponential family as a whole. Fourth, we show how the algorithm can handle quite general hypothesis tests concerning restrictions on the model parameters. Fifth, we extend the algorithm to handle the Read and Cressie power divergence family of goodness-of-fit statistics. The paper includes an example to illustrate some of these results.

275 citations
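
Because the branch probabilities are products of θ and (1 - θ) terms, both EM steps can be written in a few lines. The sketch below uses a toy one-high-threshold model as a stand-in (the category names, counts and model structure are hypothetical); it illustrates the analytic E-step (allocating each category count across its branches) and M-step (each θ becomes a simple ratio of expected counts) that the abstract refers to.

# Each branch is a list of (parameter, outcome) steps; outcome True occurs with
# probability theta[parameter], outcome False with probability 1 - theta[parameter].
# A category is observed whenever any one of its branches is taken.
MODEL = {                                # toy one-high-threshold model (hypothetical)
    "hit":  [[("d", True)], [("d", False), ("g", True)]],
    "miss": [[("d", False), ("g", False)]],
    "fa":   [[("g", True)]],
    "cr":   [[("g", False)]],
}
COUNTS = {"hit": 70, "miss": 30, "fa": 20, "cr": 80}

def branch_prob(branch, theta):
    p = 1.0
    for name, outcome in branch:
        p *= theta[name] if outcome else 1.0 - theta[name]
    return p

def em(model, counts, theta, iters=200):
    for _ in range(iters):
        succ = {s: 0.0 for s in theta}   # expected number of "theta" steps
        fail = {s: 0.0 for s in theta}   # expected number of "1 - theta" steps
        # E-step: split each category's count over its branches in proportion
        # to the current branch probabilities.
        for cat, branches in model.items():
            probs = [branch_prob(b, theta) for b in branches]
            total = sum(probs)
            for b, p in zip(branches, probs):
                m = counts[cat] * p / total
                for name, outcome in b:
                    (succ if outcome else fail)[name] += m
        # M-step: each parameter becomes its expected success fraction.
        theta = {s: succ[s] / (succ[s] + fail[s]) for s in theta}
    return theta

print(em(MODEL, COUNTS, {"d": 0.5, "g": 0.5}))   # converges near d = 0.625, g = 0.20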


Proceedings ArticleDOI
01 Jun 1994
TL;DR: A linear-time algorithm for finding SESE regions and for building the PST of arbitrary control flow graphs (including irreducible ones) is given and it is shown how to use the algorithm to find control regions in linear time.
Abstract: In this paper, we describe the program structure tree (PST), a hierarchical representation of program structure based on single entry single exit (SESE) regions of the control flow graph. We give a linear-time algorithm for finding SESE regions and for building the PST of arbitrary control flow graphs (including irreducible ones). Next, we establish a connection between SESE regions and control dependence equivalence classes, and show how to use the algorithm to find control regions in linear time. Finally, we discuss some applications of the PST. Many control flow algorithms, such as construction of Static Single Assignment form, can be speeded up by applying the algorithms in a divide-and-conquer style to each SESE region on its own. The PST is also used to speed up data flow analysis by exploiting “sparsity”. Experimental results from the Perfect Club and SPEC89 benchmarks confirm that the PST approach finds and exploits program structure.

232 citations


Patent
01 Jul 1994
TL;DR: In this article, the authors propose a hierarchical data structure for multi-dimensional information databases, in which the index nodes are arranged in an index tree structure, and when extra information inserted into the memory results in index node overflow, the index node is split and, in certain specified circumstances, an index entry will become disposed at a level higher than the hierarchical level to which it corresponds.
Abstract: A computer data storage management system includes a memory employing a hierarchical data structure comprising a plurality of nodes (root, branch and leaf), in particular a multi-dimensional information database. The branch nodes are index nodes and the leaf nodes are data nodes. The index nodes are arranged in an index tree structure. When extra information inserted into the memory results in index node overflow, the index node is split and, in certain specified circumstances, an index entry will become disposed at an index tree level higher than the hierarchical level to which it corresponds, i.e. is promoted. Whilst this makes the index tree unbalanced, it facilitates the addition of information to and the searching of such a database.

224 citations


Patent
24 Aug 1994
TL;DR: In this paper, a fact-tree is used to specify queries of the data contained in a database; the fact-tree is verified using the Tree Interpreter, invoked as Fact_Tree_to_Description Module 500, and the Query Mapper, invoked as Fact_Tree_to_SQL_Query Module 400, then generates the information system queries.
Abstract: Computerized tools for modeling database designs and specifying queries of the data contained therein. Once it is determined that an information system needs to be created, the Fact Compiler 100 of the present invention is invoked to create it. After creating the information system, the user creates a fact-tree, using the Fact_Tree_Specification Module 300, as a prelude to generating queries to the system. After creating the fact-tree, the user verifies that it is correct, using the Tree Interpreter, invoked as Fact_Tree_to_Description Module 500, of the present invention. Once the fact tree has been verified, the Query Mapper of the present invention, invoked as Fact_Tree_to_SQL_Query Module 400, is used to generate information system queries.

219 citations


Journal ArticleDOI
TL;DR: This paper proposes to model the uncertainty due to noise, e.g. the error in an object's position, by conventional covariance matrices; the approach is independent of the sensing modality and applicable to most temporal data association problems.

202 citations


Patent
Frank H. Bowers, Stuart K. Card
28 Jul 1994
TL;DR: In this paper, a tree structure is created based on user specified parameters and the tree is then mapped to a static reference surface which is visually perceived as a 3D view of the tree.
Abstract: A method and apparatus for representing the results of a search of a database. The present invention provides for creating a view of database search results via a tree structure in which detail is selected and context preserved. In the present invention, the tree structure is created based on user specified parameters. These parameters represent attributes of documents stored in the database and may differ from the search parameters. The tree structure is then mapped to a static reference surface which is visually perceived as three-dimensional. The reference surface is comprised of a detail area where detail of the tree structure is displayed and a context area for displaying other portions of the tree in less detail but which conveys to the viewer a sense of context. The tree structure may be scrolled about the reference surface to bring portions of the structure into a direct detail view while retaining a context view of the overall tree.

194 citations


Journal ArticleDOI
TL;DR: This paper presents a system, called approximate-tree-by-example (ATBE), which allows inexact matching of trees, and describes the architecture of ATBE, its use, and some aspects of its implementation.
Abstract: Ordered, labeled trees are trees in which each node has a label and the left-to-right order of its children (if it has any) is fixed. Such trees have many applications in vision, pattern recognition, molecular biology, programming compilation, and natural language processing. Many of the applications involve comparing trees or retrieving/extracting information from a repository of trees. Examples include classification of unknown patterns, analysis of newly sequenced RNA structures, semantic taxonomy for dictionary definitions, generation of interpreters for nonprocedural programming languages, and automatic error recovery and correction for programming languages. Previous systems use exact matching (or generalized regular expression matching) for tree comparison. This paper presents a system, called approximate-tree-by-example (ATBE), which allows inexact matching of trees. The ATBE system interacts with the user through a simple but powerful query language; graphical devices are provided to facilitate inputting the queries. The paper describes the architecture of ATBE, illustrates its use and describes some aspects of ATBE implementation. We also discuss the underlying algorithms and provide some sample applications.
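
Inexact tree matching of this kind rests on a tree edit distance. As a rough illustration only (ATBE uses a more general edit distance and its own query language), here is a minimal top-down, Selkow-style distance for ordered labeled trees with unit relabel, insert and delete costs; the Node class and the example trees are assumed.

class Node:
    def __init__(self, label, children=()):
        self.label = label
        self.children = list(children)

def size(t):
    return 1 + sum(size(c) for c in t.children)

def dist(a, b):
    """Top-down edit distance: the two roots are always matched to each other."""
    d = 0 if a.label == b.label else 1          # relabel cost
    ca, cb = a.children, b.children
    n, m = len(ca), len(cb)
    # sequence edit distance over the two child lists, where deleting or
    # inserting a child removes or adds its whole subtree
    D = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = D[i - 1][0] + size(ca[i - 1])
    for j in range(1, m + 1):
        D[0][j] = D[0][j - 1] + size(cb[j - 1])
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = min(D[i - 1][j] + size(ca[i - 1]),
                          D[i][j - 1] + size(cb[j - 1]),
                          D[i - 1][j - 1] + dist(ca[i - 1], cb[j - 1]))
    return d + D[n][m]

t1 = Node("a", [Node("b"), Node("c", [Node("d")])])
t2 = Node("a", [Node("c", [Node("d")]), Node("e")])
print(dist(t1, t2))   # 2.0: delete leaf "b", insert leaf "e"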

Proceedings Article
12 Sep 1994
TL;DR: This paper formulates the content-based image indexing problem as a multi-dimensional nearest-neighbor search problem, and develops/implements an optimistic vantage-point tree algorithm that can dynamically adapt the indexed search process to the characteristics of given queries.
Abstract: We formulate the content-based image indexing problem as a multi-dimensional nearest-neighbor search problem, and develop/implement an optimistic vantage-point tree algorithm that can dynamically adapt the indexed search process to the characteristics of given queries. Based on our performance study, the system typically only needs to touch less than 20% of the index entries for well-behaved queries, i.e., when the query images are relatively close to their nearest neighbors in the database. We also report in this paper the results of extensive performance experiments, which characterise the impacts of various configuration and workload parameters on the performance of the proposed algorithm.
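
For readers unfamiliar with the underlying index, here is a minimal in-memory vantage-point tree with the usual median-radius split and triangle-inequality pruning during nearest-neighbour search. It is a sketch of the plain data structure, not the optimistic, query-adaptive variant the paper develops; the names are illustrative.

import random

def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

class VPNode:
    def __init__(self, point, radius, inside, outside):
        self.point, self.radius = point, radius
        self.inside, self.outside = inside, outside

def build(points):
    if not points:
        return None
    vp = points[random.randrange(len(points))]      # pick a vantage point
    rest = [p for p in points if p is not vp]
    if not rest:
        return VPNode(vp, 0.0, None, None)
    dists = sorted(euclidean(vp, p) for p in rest)
    radius = dists[len(dists) // 2]                 # median split
    inside = [p for p in rest if euclidean(vp, p) <= radius]
    outside = [p for p in rest if euclidean(vp, p) > radius]
    return VPNode(vp, radius, build(inside), build(outside))

def nearest(node, query, best=None):
    if node is None:
        return best
    d = euclidean(node.point, query)
    if best is None or d < best[0]:
        best = (d, node.point)
    # search the side of the ball the query falls in first, then the other
    # side only if it could still contain a closer point
    near, far = ((node.inside, node.outside) if d <= node.radius
                 else (node.outside, node.inside))
    best = nearest(near, query, best)
    if abs(d - node.radius) < best[0]:
        best = nearest(far, query, best)
    return best

points = [(random.random(), random.random()) for _ in range(1000)]
tree = build(points)
print(nearest(tree, (0.5, 0.5)))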

Journal ArticleDOI
TL;DR: This paper addresses the problem: given a completely accurate, but complex, definition of a concept, simplify the definition, possibly at the expense of accuracy, so that the simplified definition still corresponds to the concept “sufficiently” well.
Abstract: When communicating concepts, it is often convenient or even necessary to define a concept approximately. A simple, although only approximately accurate concept definition may be more useful than a completely accurate definition which involves a lot of detail. This paper addresses the problem: given a completely accurate, but complex, definition of a concept, simplify the definition, possibly at the expense of accuracy, so that the simplified definition still corresponds to the concept “sufficiently” well. Concepts are represented by decision trees, and the method of simplification is tree pruning. Given a decision tree that accurately specifies a concept, the problem is to find a smallest pruned tree that still represents the concept within some specified accuracy. A pruning algorithm is presented that finds an optimal solution by generating a dense sequence of pruned trees, decreasing in size, such that each tree has the highest accuracy among all the possible pruned trees of the same size. An efficient implementation of the algorithm, based on dynamic programming, is presented and empirically compared with three progressive pruning algorithms using both artificial and real-world data. An interesting empirical finding is that the real-world data generally allow significantly greater simplification at equal loss of accuracy.
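
A dynamic-programming sketch of the core idea, producing for every possible pruned-tree size the best achievable fit on the training data, is shown below. It is a simplified illustration under the assumption that each node stores its training class counts; it is not a reproduction of the paper's algorithm or of its accuracy measure.

class Node:
    def __init__(self, class_counts, children=()):
        self.class_counts = class_counts      # training examples of each class at this node
        self.children = list(children)

def tables(node):
    """Map: number of leaves of a pruned subtree rooted here ->
       maximum number of training examples it classifies correctly."""
    pruned_here = {1: max(node.class_counts)}            # collapse to a single leaf
    if not node.children:
        return pruned_here
    combined = {0: 0}                                    # knapsack over the children
    for child in node.children:
        ct = tables(child)
        merged = {}
        for l1, c1 in combined.items():
            for l2, c2 in ct.items():
                key = l1 + l2
                merged[key] = max(merged.get(key, -1), c1 + c2)
        combined = merged
    for leaves, correct in combined.items():
        pruned_here[leaves] = max(pruned_here.get(leaves, -1), correct)
    return pruned_here

# toy tree: a root with two children, 100 training examples of two classes
tree = Node([60, 40], [Node([55, 5]), Node([5, 35])])
for leaves, correct in sorted(tables(tree).items()):
    print(leaves, "leaves ->", correct, "/ 100 correct")

Reading the table from the smallest size upward gives the "dense sequence of pruned trees" the abstract mentions: one picks the smallest size whose accuracy is still acceptable.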



Journal ArticleDOI
TL;DR: A method, based on the bootstrap procedure, is proposed for the estimation of branch-length errors and confidence intervals in a phylogenetic tree for which equal rates of substitution among lineages do not necessarily hold.
Abstract: A method, based on the bootstrap procedure, is proposed for the estimation of branch-length errors and confidence intervals in a phylogenetic tree for which equal rates of substitution among lineages do not necessarily hold. The method can be used to test whether an estimated internodal distance is significantly greater than zero. In the application of the method, any estimator of genetic distances, as well as any tree reconstruction procedure (based on distance matrices), can be used. Also the method is not limited by the number of species involved in the phylogenetic tree. An example of the application of the method in the reconstruction of the phylogenetic tree for the four hominoid species—human, chimpanzee, gorilla, and orangutan—is shown.
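
A minimal sketch of the procedure for one internal branch of a four-taxon tree follows: resample alignment columns, re-estimate the branch length each time from pairwise distances, and report the bootstrap mean, standard error and a percentile confidence interval. The uncorrected p-distance, the fixed ((A,B),(C,D)) topology and the toy sequences are simplifying assumptions; the method itself allows any distance estimator and any distance-based tree-building procedure.

import random

def p_distance(a, b):
    """Proportion of sites that differ between two aligned sequences."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

def internal_branch(seqs):
    """Length of the internal branch of ((A,B),(C,D)), from additivity of
       path lengths on the four-taxon tree."""
    d = {(i, j): p_distance(seqs[i], seqs[j])
         for i in range(4) for j in range(4) if i < j}
    return (d[(0, 2)] + d[(0, 3)] + d[(1, 2)] + d[(1, 3)]) / 4 - (d[(0, 1)] + d[(2, 3)]) / 2

def bootstrap(seqs, replicates=1000):
    n = len(seqs[0])
    estimates = []
    for _ in range(replicates):
        cols = [random.randrange(n) for _ in range(n)]     # resample alignment columns
        resampled = ["".join(s[c] for c in cols) for s in seqs]
        estimates.append(internal_branch(resampled))
    estimates.sort()
    mean = sum(estimates) / replicates
    se = (sum((e - mean) ** 2 for e in estimates) / (replicates - 1)) ** 0.5
    ci = (estimates[int(0.025 * replicates)], estimates[int(0.975 * replicates)])
    return mean, se, ci

# hypothetical aligned sequences for taxa A, B, C, D
seqs = ["ACGTACGTACGTACGTACGT",
        "ACGTACGTACGAACGTACGT",
        "ACGTTCGTACGAACGAACGT",
        "ACGTTCGTACGTACGAACGT"]
print("observed:", internal_branch(seqs))
print("bootstrap mean, SE, 95% CI:", bootstrap(seqs))

A percentile interval that excludes zero corresponds to the abstract's test that the estimated internodal distance is significantly greater than zero.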

Journal ArticleDOI
01 Jan 1994-Networks
TL;DR: An integer programming formulation of k-CARD TREE and an efficient exact separation routine for a set of generalized subtour elimination constraints are given and the polyhedral structure of the convex hull of the integer solutions is studied.
Abstract: We consider the k-CARD TREE problem, i.e., the problem of finding in a given undirected graph G a subtree with k edges, having minimum weight. Applications of this problem arise in oil-field leasing and facility layout. Although the general problem is shown to be strongly NP-hard, it can be solved in polynomial time if G is itself a tree. We give an integer programming formulation of k-CARD TREE and an efficient exact separation routine for a set of generalized subtour elimination constraints. The polyhedral structure of the convex hull of the integer solutions is studied. © 1994 by John Wiley & Sons, Inc.
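
For the polynomially solvable special case mentioned above (G itself a tree), a short dynamic program suffices. The sketch below covers that special case only and does not touch the integer-programming formulation or the subtour-elimination separation routine; the function name and example weights are illustrative.

import math
from collections import defaultdict

def min_weight_k_edge_subtree(n, edges, k):
    """edges: list of (u, v, weight) forming a tree on nodes 0..n-1.
       Returns the minimum total weight of a connected subtree with exactly k edges."""
    adj = defaultdict(list)
    for u, v, w in edges:
        adj[u].append((v, w))
        adj[v].append((u, w))

    best = math.inf

    def dfs(v, parent):
        nonlocal best
        f = [0.0] + [math.inf] * k          # f[j]: best subtree with j edges containing v
        for c, w in adj[v]:
            if c == parent:
                continue
            fc = dfs(c, v)
            g = f[:]
            for j in range(k + 1):              # edges already used on other children
                if f[j] == math.inf:
                    continue
                for t in range(1, k - j + 1):   # take t edges on c's side (incl. edge v-c)
                    if fc[t - 1] == math.inf:
                        continue
                    g[j + t] = min(g[j + t], f[j] + w + fc[t - 1])
            f = g
        best = min(best, f[k])
        return f

    dfs(0, -1)
    return best

edges = [(0, 1, 4), (1, 2, 1), (1, 3, 2), (3, 4, 3), (3, 5, 1)]
print(min_weight_k_edge_subtree(6, edges, 2))   # 3, e.g. edges (1,2) and (1,3)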

Patent
26 May 1994
TL;DR: In this paper, a distributed call setup and rerouting is realized in a mobile-communications network, where a connection tree is set up within the network, e.g., upon a mobile user accessing a base station.
Abstract: Distributed call setup and rerouting are realized in a mobile-communications network. A connection tree is set up within the network, e.g., upon a mobile user accessing a base station. The connection tree comprises communication routes from a fixed point in the network, the root of the tree, to each base station within a vicinity of the base station accessed by the mobile user. When the mobile user moves from one cell to another within the connection tree, the call is rerouted to another route within the connection tree.

Journal ArticleDOI
TL;DR: Comparison of the ML and LS methods shows that the ML method is much more, indeed extremely, tolerant to violation of its assumptions and also has smaller sampling errors caused by limited data.
Abstract: A proof is presented that the maximum likelihood (ML) method of phylogenetic estimation from DNA sequences (Felsenstein, 1981, J. Mol. Evol. 17:368-376) is statistically consistent despite the irregularity of the parameter space of the estimation problem. The distance matrix method using the least squares (LS) criterion is also consistent, but disconnection of two steps in the method, i.e., estimation of sequence divergence and construction of the tree topology, appears to lead to both theoretical contradictions and practical problems. Comparison of the ML and LS methods shows that the ML method is much more, indeed extremely, tolerant to violation of its assumptions and also has smaller sampling errors caused by limited data. This conclusion should be general, independent of particular models, tree topologies, and number of species in the data set. The problem of evaluating the reliability of the estimated tree topology was examined. The test of positivity of the interior branch length in the estimated tree is not a test of the significance of the ML tree but may be taken as a test of the significance of the LS tree. (Phylogenetic estimation; maximum likelihood; least squares; consistency; sampling error; robustness; parameter space; molecular systematics.)

01 Jan 1994
TL;DR: Experimental design analysis for use in tree improvement, as discussed by the authors.

Proceedings ArticleDOI
27 Jun 1994
TL;DR: This paper introduces two new crossover operators for genetic programming (GP): strong context- Preserving crossover and weak context-preserving crossover, both of which attempt to preserve the context in which subtrees appeared in the parent trees.
Abstract: This paper introduces two new crossover operators for genetic programming (GP): strong context-preserving crossover and weak context-preserving crossover. Contrary to the regular GP crossover, the operators presented attempt to preserve the context in which subtrees appeared in the parent trees. A simple coordinate scheme for nodes in an S-expression tree is proposed, and crossovers are only allowed between nodes with exactly or partially matching coordinates.
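
A minimal sketch of the coordinate scheme and of strong context-preserving crossover follows, with S-expressions as nested Python lists: a node's coordinate is the tuple of child indices on the path from the root, and subtrees are swapped only at a (non-root) coordinate present in both parents. The representation and helper names are illustrative, and weak context-preserving crossover (partially matching coordinates) is not shown.

import random

# S-expressions as nested lists: [operator, arg1, arg2, ...]; leaves are terminals.
def coordinates(tree, prefix=()):
    """Yield (coordinate, subtree) pairs; a coordinate is the tuple of child
       indices on the path from the root."""
    yield prefix, tree
    if isinstance(tree, list):
        for i, child in enumerate(tree[1:], start=1):
            yield from coordinates(child, prefix + (i,))

def get(tree, coord):
    for i in coord:
        tree = tree[i]
    return tree

def replace(tree, coord, subtree):
    if not coord:
        return subtree
    new = list(tree)
    new[coord[0]] = replace(tree[coord[0]], coord[1:], subtree)
    return new

def scpc(parent_a, parent_b):
    """Strong context-preserving crossover: swap subtrees only at a coordinate
       that exists (and is non-root) in both parents."""
    coords_a = {c for c, _ in coordinates(parent_a) if c}
    coords_b = {c for c, _ in coordinates(parent_b) if c}
    common = list(coords_a & coords_b)
    if not common:
        return parent_a, parent_b
    c = random.choice(common)
    sub_a, sub_b = get(parent_a, c), get(parent_b, c)
    return replace(parent_a, c, sub_b), replace(parent_b, c, sub_a)

a = ["+", ["*", "x", "y"], "z"]
b = ["-", ["+", "x", 1], ["*", "y", "y"]]
print(scpc(a, b))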

Proceedings ArticleDOI
01 Jan 1994
TL;DR: This paper presents a new GST-based sequence alignment algorithm, called GESTALT, which finds all exact matches in parallel, and uses best-first search to extend them to produce alignments.
Abstract: This paper addresses applications of suffix trees and generalized suffix trees (GSTs) to biological sequence data analysis. We define a basic set of suffix tree and GST operations needed to support sequence data analysis. While those definitions are straightforward, the construction and manipulation of disk-based GST structures for large volumes of sequence data requires intricate design. GST processing is fast because the structure is content addressable, supporting efficient searches for all sequences that contain particular subsequences. Instead of laboriously searching sequences stored as arrays, we search by walking down the tree. We present a new GST-based sequence alignment algorithm, called GESTALT. GESTALT finds all exact matches in parallel, and uses best-first search to extend them to produce alignments. Our implementation experiences with applications using GST structures for sequence analysis lead us to conclude that GSTs are valuable tools for analyzing biological sequence data.
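
The content-addressable search the abstract describes can be illustrated with a naive in-memory generalized suffix trie (quadratic space and construction time). The paper's disk-based generalized suffix trees and the GESTALT extension/alignment step are far more involved, so treat this only as a sketch of the query-by-walking idea; the names and example sequences are illustrative.

class TrieNode:
    def __init__(self):
        self.children = {}
        self.seq_ids = set()        # sequences having a suffix that passes through here

def build_generalized_suffix_trie(sequences):
    root = TrieNode()
    for seq_id, seq in enumerate(sequences):
        for start in range(len(seq)):
            node = root
            node.seq_ids.add(seq_id)
            for ch in seq[start:]:
                node = node.children.setdefault(ch, TrieNode())
                node.seq_ids.add(seq_id)
    return root

def sequences_containing(root, pattern):
    """Walk down the trie; every sequence containing `pattern` as a substring
       has a suffix starting with it, so the walk ends at a node whose seq_ids
       are exactly the answers."""
    node = root
    for ch in pattern:
        if ch not in node.children:
            return set()
        node = node.children[ch]
    return node.seq_ids

seqs = ["ACGTGCA", "TTGCAAC", "GGGTTTA"]
trie = build_generalized_suffix_trie(seqs)
print(sequences_containing(trie, "GCA"))    # -> {0, 1}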

Proceedings ArticleDOI
17 Oct 1994
TL;DR: This paper explores the use of multi-dimensional trees to provide spatial and temporal efficiencies in imaging large data sets; two evaluation metrics are used, the first comparing the hierarchical model to actual data values and the second comparing the pixel values of images produced by different parameter settings.
Abstract: This paper explores the use of multi-dimensional trees to provide spatial and temporal efficiencies in imaging large data sets. Each node of the tree contains a model of the data in terms of a fixed number of basis functions, a measure of the error in that model, and a measure of the importance of the data in the region covered by the node. A divide-and-conquer algorithm permits efficient computation of these quantities at all nodes of the tree. The flexible design permits various sets of basis functions, error criteria, and importance criteria to be implemented easily. Selective traversal of the tree provides images in acceptable time, by drawing nodes that cover a large volume as single objects when the approximation error and/or importance are low, and descending to finer detail otherwise. Trees over very large datasets can be pruned by the same criterion to provide data representations of acceptable size and accuracy. Compression and traversal are controlled by a user-defined combination of modeling error and data importance. For imaging decisions additional parameters are considered, including grid location, allowed time, and projected screen area. To analyse results, two evaluation metrics are used: the first compares the hierarchical model to actual data values, and the second compares the pixel values of images produced by different parameter settings.
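
A one-dimensional sketch of the node-model/error/selective-traversal idea over a plain data array is given below. The paper's trees are multi-dimensional and use richer basis functions, error criteria and importance measures, so the constant-per-node model and the single tolerance parameter here are simplifying assumptions.

class TreeNode:
    def __init__(self, lo, hi, mean, error, left=None, right=None):
        self.lo, self.hi = lo, hi            # index range [lo, hi) covered by this node
        self.mean, self.error = mean, error  # constant model of the data and its max error
        self.left, self.right = left, right

def build(data, lo, hi):
    mean = sum(data[lo:hi]) / (hi - lo)
    error = max(abs(x - mean) for x in data[lo:hi])
    node = TreeNode(lo, hi, mean, error)
    if hi - lo > 1:
        mid = (lo + hi) // 2
        node.left, node.right = build(data, lo, mid), build(data, mid, hi)
    return node

def render(node, tolerance, out):
    """Selective traversal: draw a whole region as one value when the model
       error is small enough, otherwise descend to finer detail."""
    if node.error <= tolerance or node.left is None:
        out.extend([node.mean] * (node.hi - node.lo))
    else:
        render(node.left, tolerance, out)
        render(node.right, tolerance, out)

data = [0, 0, 0, 1, 5, 6, 5, 6, 2, 2, 2, 2]
root = build(data, 0, len(data))
approx = []
render(root, tolerance=1.0, out=approx)
print(approx)

Pruning the tree at a chosen tolerance, rather than traversing it at render time, gives a compressed representation of the data in the same spirit as the abstract's remark on very large datasets.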

Patent
10 Aug 1994
TL;DR: In this paper, a fixed-size Ziv-Lempel parse-tree is adapted to database characteristics in one of two alternate ways: first, the parse tree is overbuilt substantially and then pruned back to a static size by eliminating the least recently used (LRU) nodes having the lowest use count.
Abstract: A system for creating a static data compression dictionary adapted to a hardware-based data compression architecture. A static Ziv-Lempel dictionary is created and stored in memory for use in compressing database records. No data compression occurs during dictionary construction. A fixed-size Ziv-Lempel parse-tree is adapted to database characteristics in one of two alternate ways. First, the parse-tree is overbuilt substantially and then pruned back to a static size by eliminating the least recently used (LRU) nodes having the lowest use count. Alternatively, the parse-tree is built to a static size and thereafter selected nodes are replaced with new nodes upon database sampling. This node recycling procedure chooses the least-useful nodes for replacement according to a use count and LRU strategy while exhausting the database sample. The pruned Ziv-Lempel parse-tree is then transformed to a static dictionary configuration and stored in memory for use in a hardware-based database compression procedure. Completion of the static dictionary before starting data compression eliminates the initial compression inefficiencies well-known for the Ziv-Lempel procedure. The parse-tree construction is enhanced by initializing the tree with NULL and DEFAULT sequences from database definitions before examining any data.
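
The overbuild-then-prune strategy can be sketched roughly as follows: an LZ78-style parse trie is grown over sample data while counting node uses, then cut back to a fixed node budget by repeatedly discarding the least-used leaf. The multi-pass growth, the pure use-count criterion (no LRU tie-break) and the in-memory trie are simplifications, not the patented dictionary format or the hardware interface.

class DictNode:
    def __init__(self):
        self.children = {}
        self.uses = 0

def overbuild(sample, passes=4):
    """Grow a parse trie: each pass parses the sample into longest known phrases,
       extends each phrase by one symbol, and counts how often nodes are visited."""
    root = DictNode()
    for _ in range(passes):
        i = 0
        while i < len(sample):
            node = root
            while i < len(sample) and sample[i] in node.children:
                node = node.children[sample[i]]
                node.uses += 1
                i += 1
            if i < len(sample):
                node.children[sample[i]] = DictNode()   # extend the phrase by one symbol
                i += 1
    return root

def prune_to(root, max_nodes):
    """Repeatedly drop the least-used leaf until the trie fits the static budget."""
    def leaves(node, parent=None, key=None):
        if not node.children:
            yield node.uses, parent, key
        for k, c in node.children.items():
            yield from leaves(c, node, k)
    def count(node):
        return 1 + sum(count(c) for c in node.children.values())
    while count(root) > max_nodes:
        _, parent, key = min(leaves(root), key=lambda t: t[0])
        del parent.children[key]
    return root

def longest_match(root, data, i):
    """Length of the longest dictionary phrase starting at position i."""
    node, length = root, 0
    while i + length < len(data) and data[i + length] in node.children:
        node = node.children[data[i + length]]
        length += 1
    return length

sample = b"abababcabababcababd" * 10
dictionary = prune_to(overbuild(sample), max_nodes=16)
print(longest_match(dictionary, b"abababx", 0))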


Journal ArticleDOI
TL;DR: This work shows that, given suitable restrictions on the rate distribution, the true tree is uniquely identified by its sequence spectrum, and exploits a novel theorem on the action of polynomials with non-negative coefficients on sequences.
Abstract: For a sequence of colors independently evolving on a tree under a simple Markov model, we consider conditions under which the tree can be uniquely recovered from the “sequence spectrum”—th...

Journal ArticleDOI
TL;DR: The VTA-Method (Visual Tree Assessment) consists of three steps, as discussed in this paper: visual inspection of the tree for external symptoms of internal defects; confirmation and measurement of the related defect by deeper inspection if symptoms are detected; and a decision, based on failure criteria, on whether the tree is dangerous.
Abstract: The VTA-Method (VISUAL TREE ASSESSMENT) consists of three steps. Visual control of the tree in order to find external symptoms of internal defects. If the constant stress distribution in a tree is disturbed due to the presence of a defect the tree attaches more wood at the overloaded spot. So bulges or dents are formed near decayed hollows and ribs near cracks etc. If symptoms are detected the related defect has to be confirmed and measured by deeper inspection. This can be done by measuring the velocity of a sound wave traveling through the cross-section and by drilling methods. The strength of the remaining healthy wood is determined now with the FRACTOMETER, a wood testing device in pocket-size. If defect size and wood quality are known failure criteria are used to decide whether the tree is dangerous or not. VTA is nondestructive for healthy trees. Only if there is a reason for increasing concern the tree is subject of deeper inspection but also in this case the wounding of the tree has to be k...

Patent
11 Oct 1994
TL;DR: A tree manager provided by the present invention stores data such as pointers, variable length data records, other B-trees, and directories, in a multidimensional B-tree (MDB-tree) as mentioned in this paper.
Abstract: A computer method and storage structure for storing and accessing multidimensional data is provided. A tree manager provided by the present invention stores data such as pointers, variable length data records, other B-trees, and directories, in a Multidimensional B-tree (MDB-tree). An MDB-tree has an imbedded "parent-child" structure which allows subtrees to be stored within nodes. The subtrees contain subnodes, which, in turn, may contain subtrees. The nodes are indexed by a primary key value while the subnodes in a subtree are indexed by secondary key values. Nodes of a MDB-tree contain a key value table, a subnode table, and a data area. When the tree manager attempts to store a unit of data on a page and the unit of data is too large for the page, the tree manager attempts to split a node currently stored on the page (or the unit of data being inserted) into a subnode and a subtree. The subtree is then stored on a new page. If the unit of data cannot be split into a subnode and a subtree, then one or more of the nodes currently stored on the page are moved to a new page.

Journal ArticleDOI
TL;DR: This work improves an O(nm^0.75 polylog(m)) step algorithm for tree pattern matching by designing a simple O(n√m polylog(m)) algorithm.
Abstract: Recently R. Kosaraju gave an O(nm^0.75 polylog(m)) step algorithm for tree pattern matching. We improve this result by designing a simple O(n√m polylog(m)) algorithm.
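
For context, the baseline these bounds improve on is the naive O(nm) scan: try to overlay the m-node pattern at each of the n subject nodes, with a wildcard leaf matching any subtree. A minimal sketch follows; the labels, the "?" wildcard convention and the example trees are illustrative.

class T:
    def __init__(self, label, children=()):
        self.label = label
        self.children = list(children)

def matches_at(pattern, subject):
    """Does `pattern` match when overlaid at `subject`? "?" matches any subtree."""
    if pattern.label == "?":
        return True
    if pattern.label != subject.label or len(pattern.children) > len(subject.children):
        return False
    return all(matches_at(p, s) for p, s in zip(pattern.children, subject.children))

def all_matches(pattern, subject):
    """Every subject node where the pattern matches (the naive O(n*m) scan)."""
    found = [subject] if matches_at(pattern, subject) else []
    for child in subject.children:
        found.extend(all_matches(pattern, child))
    return found

subject = T("+", [T("*", [T("x"), T("y")]), T("*", [T("x"), T("1")])])
pattern = T("*", [T("x"), T("?")])
print(len(all_matches(pattern, subject)))   # 2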

Journal ArticleDOI
TL;DR: Closed-form expressions for optimal processing time are derived for a general case of networks with different processor speeds and different communication link speeds, and a number of significant results that in earlier studies were only conjectured from computational results are proved analytically.
Abstract: The problem of obtaining optimal processing time in a distributed computing system consisting of (N+1) processors and N communication links, arranged in a single-level tree architecture, is considered. It is shown that optimality can be achieved through a hierarchy of steps involving optimal load distribution, load sequencing, and processor-link arrangement. Closed-form expressions for optimal processing time are derived for a general case of networks with different processor speeds and different communication link speeds. Using these closed-form expressions, the paper analytically proves a number of significant results that in earlier studies were only conjectured from computational results. In addition, it also extends these results to a more general framework. The above analysis is carried out for the cases in which the root processor may or may not be equipped with a front-end processor. Illustrative examples are given for all cases considered.
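
The structure of such closed-form expressions can be sketched with the standard divisible-load timing model for a single-level tree: with a front-end-equipped root and sequential distribution, the condition that all processors stop computing at the same instant links consecutive load fractions by a simple ratio, after which the fractions are normalised to sum to one. This is a sketch under the usual textbook model and notation, assumed rather than copied from the paper; the example speeds are illustrative.

def optimal_fractions(w, z, Tcp=1.0, Tcm=1.0):
    """w[i]: computation time per unit load on processor i (i = 0 is the root),
       z[i]: communication time per unit load on the link to child i (i >= 1).
       Assumes the root has a front end (computes while sending) and sends to
       children in order 1..N. Equal-finish-time condition:
           alpha[i] * w[i] * Tcp = alpha[i+1] * (z[i+1] * Tcm + w[i+1] * Tcp)
       Returns the normalised load fractions alpha."""
    alpha = [1.0]
    for i in range(1, len(w)):
        alpha.append(alpha[-1] * w[i - 1] * Tcp / (z[i] * Tcm + w[i] * Tcp))
    total = sum(alpha)
    return [a / total for a in alpha]

# root plus three children with differing processor and link speeds
w = [1.0, 2.0, 1.0, 4.0]          # time to process one unit of load
z = [0.0, 0.5, 1.0, 0.5]          # time to ship one unit of load (z[0] unused)
alpha = optimal_fractions(w, z)
print(alpha, "finish time:", alpha[0] * w[0])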