
Showing papers on "Tree (data structure)" published in 1997


Book
01 Jan 1997
TL;DR: The speech recognition problem; hidden Markov models; the acoustic model; basic language modelling; the Viterbi search; hypothesis search on a tree and the fast match; elements of information theory.
Abstract: The speech recognition problem; hidden Markov models; the acoustic model; basic language modelling; the Viterbi search; hypothesis search on a tree and the fast match; elements of information theory; the complexity of tasks - the quality of language models; the expectation-maximization algorithm and its consequences; decision trees and tree language models; phonetics from orthography - spelling-to-baseform mappings; triphones and allophones; maximum entropy probability estimation and language models; three applications of maximum entropy estimation to language modelling; estimation of probabilities from counts and the back-off method.

2,153 citations


Book
01 Jan 1997
TL;DR: The goal of this book is to provide a textbook which presents the basics of tree automata and several variants of tree automata which have been devised for applications in the aforementioned domains.
Abstract: Acknowledgments: Many people gave substantial suggestions to improve the contents of this book. These are, in alphabetic order, … Introduction: During the past few years, several of us have been asked many times about references on finite tree automata. On one hand, this attests to the liveliness of the field. On the other hand, it was difficult to answer. Besides several excellent survey chapters on more specific topics, there is only one monograph devoted to tree automata, by Gécseg and Steinby. Unfortunately, it is now impossible to find a copy of it, and a lot of work has been done on tree automata since its publication. Indeed, using tree automata has proved to be a powerful approach to simplify and extend previously known results, and also to find new results. For instance, recent works use tree automata for applications in abstract interpretation using set constraints, rewriting, automated theorem proving and program verification, databases and XML schema languages. Tree automata were designed a long time ago in the context of circuit verification. Many famous researchers contributed to this school, which was headed by A. Church in the late 50's and the early 60's: B. Trakhtenbrot, … Many new ideas came out of this program, for instance the connections between automata and logic. Tree automata also appeared first in this framework, following the work of Doner, Thatcher and Wright. In the 70's many new results were established concerning tree automata, which lost some of their connection with applications and were studied for their own sake. In particular, a problem was the very high complexity of decision procedures for the monadic second order logic. Applications of tree automata to program verification revived in the 80's, after the relative failure of automated deduction in this field. It is possible to verify temporal logic formulas (which are particular Monadic Second Order Formulas) on simpler (small) programs.
Automata, and in particular tree automata, also appeared as an approximation of programs on which fully automated tools can be used. New results were obtained connecting properties of programs or type systems or rewrite systems with automata. Our goal is to fill in the existing gap and to provide a textbook which presents the basics of tree automata and several variants of tree automata which have been devised for applications in the aforementioned domains. We shall discuss only finite tree automata, and the …

1,492 citations


Journal ArticleDOI
TL;DR: This work presents several types of decision tree classification algorithms and shows that decision trees have several advantages for remote sensing applications by virtue of their relatively simple, explicit, and intuitive classification structure.

1,419 citations


Journal ArticleDOI
TL;DR: Examines what determines the height to which a tree will grow in a particular region and climate, and reviews proposed mechanisms including the respiration hypothesis, the nutrient limitation hypothesis, the maturation hypothesis and the hydraulic limitation hypothesis.
Abstract: Examines what determines the height to which a tree will grow in a particular region and climate. The relationship between maximum tree height and the speed at which the tree grew when young; Mechanisms for growth including the respiration hypothesis, the nutrient limitation hypothesis, the maturation hypothesis and the hydraulic limitation hypothesis; Details about each hypothesis; Evidence for hydraulic limitation; Conclusions.

1,065 citations


01 Jan 1997
TL;DR: This article presents an algorithm called QUEST that has negligible bias, which shares similarities with the FACT method, but it yields binary splits and the final tree can be selected by a direct stopping rule or by pruning.
Abstract: Classification trees based on exhaustive search algorithms tend to be biased towards selecting variables that afford more splits. As a result, such trees should be interpreted with caution. This article presents an algorithm called QUEST that has negligible bias. Its split selection strategy shares similarities with the FACT method, but it yields binary splits and the final tree can be selected by a direct stopping rule or by pruning. Real and simulated data are used to compare QUEST with the exhaustive search approach. QUEST is shown to be substantially faster and the size and classification accuracy of its trees are typically comparable to those of exhaustive search. A classification tree is a rule for predicting the class of an object from the values of its predictor variables. The tree is constructed by recursively partitioning a learning sample of data in which the class label and the values of the predictor variables for each case are known. Each partition is represented by a node in the tree. Two approaches to split selection have been proposed in the statistical literature. The first and more popular approach examines all possible binary splits of the data along each predictor variable to select the split that most reduces some measure of node impurity. It is used, for example, by the THAID (Morgan and Sonquist (1963), Morgan and Messenger (1973)) and CART (Breiman, Friedman, Olshen and Stone (1984)) algorithms. If X is an ordered variable, this approach searches over all possible values c for splits of the form X ≤ c (Eq. 1).
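The exhaustive-search approach described in this abstract can be illustrated with a small sketch. It is not QUEST and not the authors' code: it assumes Gini impurity as the impurity measure and midpoints between consecutive distinct values as candidate thresholds, and the names `gini` and `best_split` are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (assumed impurity measure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((k / n) ** 2 for k in Counter(labels).values())

def best_split(xs, ys):
    """Search every threshold c for a split X <= c on one ordered variable.

    Returns (c, weighted_child_impurity). Minimising the weighted child
    impurity is the same as maximising the impurity reduction at the node.
    """
    pairs = sorted(zip(xs, ys))
    n = len(pairs)
    best_c, best_imp = None, float("inf")
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between two equal values
        c = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        imp = (len(left) * gini(left) + len(right) * gini(right)) / n
        if imp < best_imp:
            best_c, best_imp = c, imp
    return best_c, best_imp

xs = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
ys = ["a", "a", "a", "b", "b", "b"]
print(best_split(xs, ys))  # (6.5, 0.0)
```

A variable with many distinct values offers many candidate thresholds, which is the source of the selection bias the abstract attributes to exhaustive search.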

1,015 citations


Proceedings ArticleDOI
07 Apr 1997
TL;DR: The results from an extensive comparison study of three R-tree packing algorithms are presented: the Hilbert and nearest-X packing algorithms, and an algorithm which is very simple to implement, called the STR (Sort-Tile-Recursive) algorithm.
Abstract: Presents the results from an extensive comparison study of three R-tree packing algorithms: the Hilbert and nearest-X packing algorithms, and an algorithm which is very simple to implement, called the STR (Sort-Tile-Recursive) algorithm. The algorithms are evaluated using both synthetic and actual data from various application domains including VLSI design, GIS (Tiger files), and computational fluid dynamics. Our studies also consider the impact that various degrees of buffering have on query performance. Experimental results indicate that none of the algorithms is best for all types of data. In general, our new algorithm requires up to 50% fewer disk accesses than the best previously proposed algorithm for point and region queries on uniformly distributed or mildly skewed point and region data, and approximately the same for highly skewed point and region data.
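As a rough illustration of the sort-tile idea behind STR, the sketch below packs two-dimensional rectangles bottom-up. It is written from the general description of the technique, not taken from the paper. The function names, the `(xmin, ymin, xmax, ymax)` tuple layout, and the choice to return only the bounding rectangles of each level are assumptions, and the input is assumed to be non-empty.

```python
import math

def mbr(rects):
    """Minimum bounding rectangle of (xmin, ymin, xmax, ymax) tuples."""
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))

def str_pack_level(rects, capacity):
    """Group rectangles into nodes of at most `capacity` entries."""
    n_nodes = math.ceil(len(rects) / capacity)
    n_slices = math.ceil(math.sqrt(n_nodes))
    slice_size = n_slices * capacity
    # Sort by x-centre and cut into vertical slices.
    by_x = sorted(rects, key=lambda r: (r[0] + r[2]) / 2)
    nodes = []
    for i in range(0, len(by_x), slice_size):
        # Within each slice, sort by y-centre and fill nodes in order.
        by_y = sorted(by_x[i:i + slice_size], key=lambda r: (r[1] + r[3]) / 2)
        for j in range(0, len(by_y), capacity):
            nodes.append(by_y[j:j + capacity])
    return nodes

def str_build(rects, capacity):
    """Pack bottom-up; returns the node MBRs of each level, leaves first."""
    levels = []
    current = list(rects)
    while True:
        current = [mbr(node) for node in str_pack_level(current, capacity)]
        levels.append(current)
        if len(current) == 1:
            return levels

points = [(x, y, x, y) for x in range(4) for y in range(4)]
for level in str_build(points, capacity=4):
    print(level)
```

On this 4 x 4 grid of points the leaf level consists of four non-overlapping tiles, and the root covers the whole grid.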

501 citations



Journal ArticleDOI
TL;DR: Tree shape may help us detect mass extinctions and adaptive radiations, measure continuous variation in speciation and extinction rates, and associate changes in these rates with ecological or biogeographical causes; its usefulness extends well beyond the study of macroevolution.
Abstract: Inferences about macroevolutionary processes have traditionally depended solely on the fossil record, but such inferences can be strengthened by also considering the shapes of the phylogenetic trees that link extant taxa. The realization that phylogenies reflect macroevolutionary processes has led to a growing literature of theoretical and comparative studies of tree shape. Two aspects of tree shape are particularly important: tree balance and the distribution of branch lengths. We examine and evaluate recent developments in and connections between these two aspects, and suggest directions for future research. Studies of tree shape promise useful and powerful tests of macroevolutionary hypotheses. With appropriate further research, tree shape may help us detect mass extinctions and adaptive radiations, measure continuous variation in speciation and extinction rates, and associate changes in these rates with ecological or biogeographical causes. The usefulness of tree shape extends well beyond the study of ...
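Tree balance can be made concrete with a small example. The abstract does not say which balance statistic the authors use; the sketch below assumes the Colless index (the sum, over internal nodes, of the absolute difference in leaf counts between the two subtrees) and a hypothetical encoding of binary trees as nested 2-tuples with non-tuple leaves.

```python
def colless(tree):
    """Return (leaf_count, colless_index) for a binary tree of nested 2-tuples."""
    if not isinstance(tree, tuple):
        return 1, 0  # a leaf contributes one tip and no imbalance
    left, right = tree
    n_left, i_left = colless(left)
    n_right, i_right = colless(right)
    return n_left + n_right, i_left + i_right + abs(n_left - n_right)

balanced = (("a", "b"), ("c", "d"))
caterpillar = ((("a", "b"), "c"), "d")
print(colless(balanced))     # (4, 0)
print(colless(caterpillar))  # (4, 3)
```

A perfectly balanced four-taxon tree scores 0, while the fully unbalanced "caterpillar" tree reaches the maximum for four tips.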

453 citations


Book
01 Apr 1997

447 citations


Journal ArticleDOI
TL;DR: The use of reconciled trees to reconstruct the history of a gene tree with respect to a species tree is described and heuristic searches to find the species tree which yields the reconciled tree with the lowest cost are described.

417 citations


Proceedings ArticleDOI
01 Jun 1997
TL;DR: This paper introduces a distance based index structure called multi-vantage point (mvp) tree for similarity queries on high-dimensional metric spaces and shows that mvp-tree outperforms the vp-tree 20% to 80% for varying query ranges and different distance distributions.
Abstract: In many database applications, one of the common queries is to find approximate matches to a given query item from a collection of data items. For example, given an image database, one may want to retrieve all images that are similar to a given query image. Distance based index structures are proposed for applications where the data domain is high dimensional, or the distance function used to compute distances between data objects is non-Euclidean. In this paper, we introduce a distance based index structure called multi-vantage point (mvp) tree for similarity queries on high-dimensional metric spaces. The mvp-tree uses more than one vantage point to partition the space into spherical cuts at each level. It also utilizes the pre-computed (at construction time) distances between the data points and the vantage points. We have done experiments to compare mvp-trees with vp-trees which have a similar partitioning strategy, but use only one vantage point at each level, and do not make use of the pre-computed distances. Empirical studies show that mvp-tree outperforms the vp-tree 20% to 80% for varying query ranges and different distance distributions.
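The single-vantage-point partitioning that the mvp-tree generalizes can be sketched as follows. This is a plain vp-tree, not the mvp-tree itself: it uses one vantage point per node and no precomputed distances. The dictionary node layout, the function names, and the choice of the first point as vantage point are assumptions made for illustration.

```python
import math
import statistics

def build_vp(points, dist):
    """Build a vp-tree: split the remaining points at the median distance."""
    if not points:
        return None
    vp, rest = points[0], points[1:]
    if not rest:
        return {"vp": vp, "mu": 0.0, "inside": None, "outside": None}
    ds = [dist(vp, p) for p in rest]
    mu = statistics.median(ds)
    inside = [p for p, d in zip(rest, ds) if d <= mu]
    outside = [p for p, d in zip(rest, ds) if d > mu]
    return {"vp": vp, "mu": mu,
            "inside": build_vp(inside, dist),
            "outside": build_vp(outside, dist)}

def range_search(node, q, r, dist, out=None):
    """Collect all points within distance r of q, pruning by the triangle inequality."""
    if out is None:
        out = []
    if node is None:
        return out
    d = dist(node["vp"], q)
    if d <= r:
        out.append(node["vp"])
    if d - r <= node["mu"]:   # the query ball can reach the inner shell
        range_search(node["inside"], q, r, dist, out)
    if d + r > node["mu"]:    # the query ball can reach the outer shell
        range_search(node["outside"], q, r, dist, out)
    return out

points = [(x, y) for x in range(5) for y in range(5)]
tree = build_vp(points, math.dist)
print(sorted(range_search(tree, (2.2, 1.9), 1.5, math.dist)))
```

Only the distance function is used, so the same code works for any metric, including non-Euclidean ones; `math.dist` requires Python 3.8 or later.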

Proceedings ArticleDOI
20 Jun 1997
TL;DR: ImageRover is a search-by-image-content navigation tool for the World Wide Web that utilizes a distributed fleet of WWW robots to gather images expediently and employs a novel relevance feedback algorithm that selects the distance metrics that are appropriate for a particular query.
Abstract: ImageRover is a search-by-image-content navigation tool for the World Wide Web (WWW). To gather images expediently, the image collection subsystem utilizes a distributed fleet of WWW robots running on different computers. The image robots gather information about the images they find, computing the appropriate image decompositions and indices, and store this extracted information in vector form for searches based on image content. At search time, users can iteratively guide the search through the selection of relevant examples. Search performance is made efficient through the use of an approximate, optimized k-d tree algorithm. The system employs a novel relevance feedback algorithm that selects the distance metrics that are appropriate for a particular query.
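For readers unfamiliar with k-d trees, the sketch below shows a basic exact nearest-neighbour search over feature vectors. It does not reproduce ImageRover's approximate, optimized variant, whose details are not given in the abstract. The node layout and function names are assumptions, and Euclidean distance is assumed as the metric.

```python
import math

def build_kd(points, depth=0):
    """Build a k-d tree by splitting on the median, cycling through the axes."""
    if not points:
        return None
    axis = depth % len(points[0])
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return {"point": pts[mid], "axis": axis,
            "left": build_kd(pts[:mid], depth + 1),
            "right": build_kd(pts[mid + 1:], depth + 1)}

def nearest(node, q, best=None):
    """Return (distance, point) of the stored point closest to q."""
    if node is None:
        return best
    d = math.dist(node["point"], q)
    if best is None or d < best[0]:
        best = (d, node["point"])
    diff = q[node["axis"]] - node["point"][node["axis"]]
    near, far = ((node["left"], node["right"]) if diff < 0
                 else (node["right"], node["left"]))
    best = nearest(near, q, best)
    # Visit the far side only if the splitting plane is closer than the best match.
    if abs(diff) < best[0]:
        best = nearest(far, q, best)
    return best

vectors = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
tree = build_kd(vectors)
print(nearest(tree, (9, 2))[1])  # (8, 1)
```

An approximate search would typically relax the pruning test or cap the number of nodes visited, trading accuracy for speed.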

Journal ArticleDOI
TL;DR: This paper describes the use of decision tree and rule induction in data-mining applications and presents a synopsis of some major state-of-the-art tree and rule mining methodologies, as well as some recent advances.

Journal ArticleDOI
TL;DR: In this paper, the authors study the problem of constructing multicast trees to meet the quality of service requirements of real-time interactive applications operating in high-speed packet-switched environments and present a heuristic that demonstrates good average case behavior in terms of the maximum interdestination delay variation.
Abstract: We study the problem of constructing multicast trees to meet the quality of service requirements of real-time interactive applications operating in high-speed packet-switched environments. In particular, we assume that multicast communication depends on: (1) bounded delay along the paths from the source to each destination and (2) bounded variation among the delays along these paths. We first establish that the problem of determining such a constrained tree is NP-complete. We then present a heuristic that demonstrates good average case behavior in terms of the maximum interdestination delay variation. The heuristic achieves its best performance under conditions typical of multicast scenarios in high speed networks. We also show that it is possible to dynamically reorganize the initial tree in response to changes in the destination set, in a way that is minimally disruptive to the multicast session.

Journal ArticleDOI
TL;DR: This work presents a framework that organizes the approaches to tree simplification, summarizes and critiques the approaches within this framework, and discusses the application of tree induction algorithms to case retrieval in case-based reasoning systems.
Abstract: Induced decision trees are an extensively-researched solution to classification tasks. For many practical tasks, the trees produced by tree-generation algorithms are not comprehensible to users due to their size and complexity. Although many tree induction algorithms have been shown to produce simpler, more comprehensible trees (or data structures derived from trees) with good classification accuracy, tree simplification has usually been of secondary concern relative to accuracy, and no attempt has been made to survey the literature from the perspective of simplification. We present a framework that organizes the approaches to tree simplification and summarize and critique the approaches within this framework. The purpose of this survey is to provide researchers and practitioners with a concise overview of tree-simplification approaches and insight into their relative capabilities. In our final discussion, we briefly describe some empirical findings and discuss the application of tree induction algorithms to case retrieval in case-based reasoning systems.

Journal ArticleDOI
01 Sep 1997
TL;DR: The Zipper is Huet's nifty name for a nifty data structure which fulfills this need of representing a tree together with a subtree that is the focus of attention, where that focus may move left, right, up or down the tree.
Abstract: Almost every programmer has faced the problem of representing a tree together with a subtree that is the focus of attention, where that focus may move left, right, up or down the tree. The Zipper is Huet's nifty name for a nifty data structure which fulfills this need. I wish I had known of it when I faced this task, because the solution I came up with was not quite so efficient or elegant as the Zipper.
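A minimal sketch of the idea, transliterated into Python with immutable tuples. The tree encoding (a tuple of subtrees, with any non-tuple value as a leaf) and the names `Path`, `Loc`, `go_down`, `go_left`, `go_right`, `go_up`, `replace`, and `to_root` are assumptions for illustration, not the paper's own definitions.

```python
from typing import Any, NamedTuple, Optional, Tuple

class Path(NamedTuple):
    left: Tuple[Any, ...]    # elder siblings, nearest first
    up: Optional["Path"]     # path of the parent; None if the parent is the root
    right: Tuple[Any, ...]   # younger siblings, nearest first

class Loc(NamedTuple):
    focus: Any               # the subtree currently in focus
    path: Optional[Path]     # None when the focus is the whole tree

def go_down(loc):
    if not isinstance(loc.focus, tuple) or not loc.focus:
        raise ValueError("cannot go down from a leaf or empty node")
    first, *rest = loc.focus
    return Loc(first, Path((), loc.path, tuple(rest)))

def go_left(loc):
    if loc.path is None or not loc.path.left:
        raise ValueError("no sibling to the left")
    l, *ls = loc.path.left
    return Loc(l, Path(tuple(ls), loc.path.up, (loc.focus,) + loc.path.right))

def go_right(loc):
    if loc.path is None or not loc.path.right:
        raise ValueError("no sibling to the right")
    r, *rs = loc.path.right
    return Loc(r, Path((loc.focus,) + loc.path.left, loc.path.up, tuple(rs)))

def go_up(loc):
    if loc.path is None:
        raise ValueError("already at the top")
    p = loc.path
    return Loc(tuple(reversed(p.left)) + (loc.focus,) + p.right, p.up)

def replace(loc, subtree):
    """Swap the focused subtree without touching the rest of the tree."""
    return Loc(subtree, loc.path)

def to_root(loc):
    while loc.path is not None:
        loc = go_up(loc)
    return loc.focus

tree = ("a", ("b", "c"), "d")
loc = go_right(go_down(go_right(go_down(Loc(tree, None)))))  # focus on "c"
print(to_root(replace(loc, "X")))  # ('a', ('b', 'X'), 'd')
```

Each move builds a new location from the focus and its immediate siblings only, so the original tree is never mutated and older locations stay valid.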

Journal ArticleDOI
TL;DR: FTT is applied to the integration of the Euler equations of fluid dynamics and an adaptive-mesh time-stepping algorithm is described in which different time steps are used at different levels of the tree.
Abstract: A fully threaded tree (FTT) for adaptive refinement of regular meshes is described. By using a tree threaded at all levels, tree traversals for finding nearest neighbors are avoided. All operations on a tree including tree modifications are O(N), where N is the number of cells, and are performed in parallel. An efficient implementation of the tree is described that requires 2N words of memory. A filtering algorithm for removing high-frequency noise during mesh refinement is described. An FTT can be used in various numerical applications. In this paper, it is applied to the integration of the Euler equations of fluid dynamics. An adaptive-mesh time stepping algorithm is described in which different time steps are used at different levels of the tree. Time stepping and mesh refinement are interleaved to avoid extensive buffer layers of fine mesh which were otherwise required ahead of moving shocks. Test examples are presented, and the FTT performance is evaluated. A three-dimensional simulation of the interaction of a shock wave and a spherical bubble is carried out and shows the development of azimuthal perturbations on the bubble surface.

Book ChapterDOI
01 Jan 1997
TL;DR: The reconstruction problem can be briefly stated as finding the rooted evolutionary tree best fitting the current DNA data and once the best tree is identified, it is also of interest to estimate the branch lengths of the tree.
Abstract: Inferring the evolutionary relationships among related taxa (species, genera, families, or higher groupings) is one of the most fascinating problems of molecular genetics [11, 13, 14]. It is now relatively simple to sequence genes and to compare the results from several contemporary taxa. In the current chapter we will assume that the chore of aligning the DNA sequences from these taxa has been successfully accomplished. The taxa are then arranged in an evolutionary tree (or phylogeny) depicting how taxa diverge from common ancestors. A single ancestral taxon roots the binary tree describing the evolution of the contemporary taxa. The reconstruction problem can be briefly stated as finding the rooted evolutionary tree best fitting the current DNA data. Once the best tree is identified, it is also of interest to estimate the branch lengths of the tree. These tell us something about the pace of evolution. For the sake of brevity, we will focus on the problem of finding the best tree.

Journal ArticleDOI
TL;DR: In this paper, the authors proposed a tree-based approach to connect a heat generating volume to a point heat sink by using a finite amount of high-conductivity material that can be distributed through the volume.
Abstract: This paper addresses the fundamental problem of how to connect a heat generating volume to a point heat sink by using a finite amount of high-conductivity material that can be distributed through the volume. The problem is one of optimizing the access (or minimizing the thermal resistance) between a finite-size volume and one point. The solution is constructed by covering the volume with a sequence of building blocks, which proceeds toward larger sizes (assemblies), hence, the “constructal” name for this approach. Optimized numerically at each stage are geometric features such as the overall shape of the building block, its number of constituents, and the internal distribution of high-conductivity inserts. It is shown that in the optimal design, the high-conductivity material has a distribution with the shape of a tree. Every aspect of the tree architecture is deterministic: the shapes of the largest assembly and all its constituents, the number of branches at each level of assembly, the relative position of building blocks in each assembly, and the relative thicknesses of successive branches. The finer, innermost details of the tree architecture (e.g., the branching angle) have a negligible effect on the overall thermal resistance. The main conclusion is that the structure, working mechanism, and minimal resistance of the tree network can be obtained deterministically, and that the constrained optimization of access routes accounts for the macroscopic structure in nature.

Proceedings ArticleDOI
01 Jun 1997
TL;DR: This paper presents general algorithms for concurrency control in tree-based access methods as well as a recovery protocol and a mechanism for ensuring repeatable read isolation outside the context of B-trees.
Abstract: This paper presents general algorithms for concurrency control in tree-based access methods as well as a recovery protocol and a mechanism for ensuring repeatable read. The algorithms are developed in the context of the Generalized Search Tree (GiST) data structure, an index structure supporting an extensible set of queries and data types. Although developed in a GiST context, the algorithms are generally applicable to many tree-based access methods. The concurrency control protocol is based on an extension of the link technique originally developed for B-trees, and completely avoids holding node locks during I/Os. Repeatable read isolation is achieved with a novel combination of predicate locks and two-phase locking of data records. To our knowledge, this is the first time that isolation issues have been addressed outside the context of B-trees. A discussion of the fundamental structural differences between B-trees and more general tree structures like GiSTs explains why the algorithms developed here deviate from their B-tree counterparts. An implementation of GiSTs emulating B-trees in DB2/Common Server is underway.

Proceedings Article
01 Aug 1997
TL;DR: In this article, a non-uniform partition across all variables as opposed to uniform partition of each variable separately reduces the size of the data structures needed to represent a continuous function.
Abstract: We consider probabilistic inference in general hybrid networks, which include continuous and discrete variables in an arbitrary topology. We reexamine the question of variable discretization in a hybrid network aiming at minimizing the information loss induced by the discretization. We show that a nonuniform partition across all variables as opposed to uniform partition of each variable separately reduces the size of the data structures needed to represent a continuous function. We also provide a simple but efficient procedure for nonuniform partition. To represent a nonuniform discretization in the computer memory, we introduce a new data structure, which we call a Binary Split Partition (BSP) tree. We show that BSP trees can be an exponential factor smaller than the data structures in the standard uniform discretization in multiple dimensions and show how the BSP trees can be used in the standard join tree algorithm. We show that the accuracy of the inference process can be significantly improved by adjusting discretization with evidence. We construct an iterative anytime algorithm that gradually improves the quality of the discretization and the accuracy of the answer on a query. We provide empirical evidence that the algorithm converges.

Journal ArticleDOI
TL;DR: In this article, it is shown that the electrical treeing phenomenon is the result of a deterministic breakdown mechanism operating in a chaotic regime at fields lower than those required for runaway breakdown.
Abstract: Two alternative theoretical approaches to electrical tree propagation exist. Stochastic models attribute tree structures to random probabilistic factors, whereas in the discharge-avalanche model mechanism-driven field fluctuations are responsible. Here we review the predictions of these approaches in the light of the available experimental evidence. It is shown that both models give the fractal structures and the form of structure distribution observed experimentally. The widths of the predicted distribution functions are, however, less than those found experimentally. The quantitative formulation available to the physical model also enables it to reproduce several other features of tree propagation such as voltage dependence, growth laws, and discharge behavior patterns. This is not possible in the stochastic approach without mechanistic assumptions which are difficult to relate to the stochastic process. The connection between the discharge-avalanche model and deterministic chaos is explored. Experimental evidence is presented supporting the contention that the electrical treeing phenomenon is the result of a deterministic breakdown mechanism operating in a chaotic regime at fields lower than those required for runaway breakdown. Space-charge deposition and re-arrangement is proposed as the physical origin of the chaotic field fluctuations. Tree shapes are shown to be related to the variation in the fluctuation range available as the tree grows in accord with the predictions of the discharge-avalanche model.

Proceedings Article
Luís Torgo
08 Jul 1997
TL;DR: This study indicates that by integrating regression trees with other regression approaches the authors are able to overcome the limitations of individual methods both in terms of accuracy as well as in computational efficiency.
Abstract: This paper presents a study about functional models for regression tree leaves. We evaluate experimentally several alternatives to the averages commonly used in regression trees. We have implemented a regression tree learner (HTL) that is able to use several alternative models in the tree leaves. We study the effect on accuracy and the computational cost of these alternatives. The experiments carried out on 11 data sets revealed that it is possible to significantly outperform the “naive” averages of regression trees. Among the four alternative models that we evaluated, kernel regressors were usually the best in terms of accuracy. Our study also indicates that by integrating regression trees with other regression approaches we are able to overcome the limitations of individual methods both in terms of accuracy as well as in computational efficiency.

01 Jan 1997
TL;DR: This chapter discusses the structure of the hierarchical tree method, the optimization of hierarchical tree codes, and the fast multipole method, which is applied to periodic boundary conditions.
Abstract: 1. Introduction; 2. Basic structure of the hierarchical tree method; 3. Open boundary problems; 4. Optimisation of hierarchical tree codes; 5. Periodic boundary conditions; 6. Periodic boundary problems; 7. The fast multipole method; Appendices.

Journal ArticleDOI
TL;DR: A new systematic procedure is presented to exploit the structure of the uncertainty to decompose a multidimensional polynomial matrix into sums and products of simple factors for which minimal linear fractional representations can be obtained.

Journal ArticleDOI
TL;DR: A new incremental character optimization algorithm is described which is exact, correct, and comparable in speed to current methods.

Journal ArticleDOI
TL;DR: This work states that similarity queries on feature vectors have been widely used to perform content-based retrieval of images, but to apply this technique to large databases, it is required to develop multidimens to handle large amounts of data.
Abstract: Recently, similarity queries on feature vectors have been widely used to perform content-based retrieval of images. To apply this technique to large databases, it is required to develop multidimens...

Patent
George Politis
21 May 1997
TL;DR: In this article, a method, apparatus and system for optimizing an expression tree (101, 902, 1102) for compositing an image is presented, where each node is either a graphical element (102, 104) or image compositing operator (103, 104).
Abstract: The present invention relates to a method, apparatus and system for optimizing an expression tree (101, 902, 1102) for compositing an image. Such an expression tree (101, 902, 1102) can comprise at least two nodes. Each node is either a graphical element (102, 104) or an image compositing operator (103, 104) and has a region of the image represented by the node (102, 103, 104). In the method, for at least one node in the tree, several steps are carried out. The region represented by the node (103, 104) is compared to a region representation data structure, which is preferably a quadtree representation, corresponding to one or more regions represented by at least one other node. A determination is then made if the region represented by the node (102, 103, 104) is totally or partially obscured by the one or more regions. If the region represented by the node is at least partially or totally obscured, the expression tree (101, 902, 1102) is modified. Modifying the expression tree (101, 902, 1102) involves applying a clipping operator (58, 59) to the node if the region represented by the node is partially obscured. If the node is totally obscured, the modification involves either removing the node if the node is a graphical element (102, 104) or applying a predetermined set of node replacement rules in accordance with the image compositing operator if the node (103) is an image compositing operator.

Journal ArticleDOI
TL;DR: Simulation results in which use of a false model in the maximum-likelihood method recovers the correct tree with higher probabilities than use of the true model highlight the complexity of phylogeny reconstruction and the need for more theoretical work on statistical methods for this type of estimation problem.
Abstract: As phylogenetic analyses find widespread use in various fields of biology, studies on methods of phylogeny reconstruction are becoming ever more important. Although tree reconstruction has been identified as a statistical estimation problem since the pioneering work of Cavalli-Sforza and Edwards (1967), the complexity of the problem does not seem to be well recognized. In this note I report simulation results in which use of a false model in the maximum-likelihood method recovers the correct tree with higher probabilities than use of the true model. Indeed the false model on average recovers the correct tree more often than the true model, and the difference is not due to small sample sizes or restricted to the case of four taxa. The results highlight the complexity of phylogeny reconstruction and the need for more theoretical work on statistical methods for this type of estimation problem. A simulation study was carried out to examine the performance of phylogeny reconstruction by the maximum-likelihood method when either the correct evolutionary model (referred to as the True method) or a wrong model (the False method) is assumed. The correct model used for simulating data, represented as “JC+G,” uses the substitution model of Jukes and Cantor (1969) in combination with a gamma model of rate variation among sites in which the rates at different sites are multiplied by independent gamma variates normalized to have mean one (Yang 1994). The gamma parameter α is inversely related to the extent of among-site rate variation and is fixed at α = 0.2. Probabilities of observing all site patterns were calculated under the JC+G model, and the observed numbers of site patterns, which constitute the simulated data, were sampled from these probabilities (Yang 1996). Five sets of branch lengths for a tree of four taxa and 12 sequence lengths were used (fig. 1); for each tree and sequence length combination, 5,000 data sets were generated.
Each simulated data set was analyzed by the two methods to recover the correct tree (Felsenstein 1981; Yang 1994). Although α could be estimated from the data under the JC+G model, the correct value (0.2) was used in the True method so that the same number of parameters was estimated in both methods. The only difference between the two methods is that True uses the correct value of α (0.2) while False uses a false value (a). Twice the log-likelihood difference between the two models averaged from 10.0 for tree D to 27.2 for tree A in the simulation for N = 100 nucleotides and was greater for

Patent
25 Nov 1997
TL;DR: The preferred embodiment of the present invention provides a method and apparatus for storing and accessing data as discussed by the authors, which provides the ability to perform fast searching using tree database search techniques and to search all user data fields using array search techniques without requiring the user data be duplicated and stored in two separate databases.
Abstract: The preferred embodiment of the present invention provides a method and apparatus for storing and accessing data. The preferred embodiment hybrid tree-array database provides the ability to perform fast searching using tree database search techniques and the ability to search all user data fields using array search techniques. In particular, fast key searching as a typical tree database and sequential array searching of all data fields as a typical array database are provided in a single database, without requiring the user data be duplicated and stored in two separate databases. Thus, the preferred embodiment provides searching flexibility without the excessive storage requirements and complexity inherent in managing separate array and tree databases. The preferred embodiment also provides the advantage of allowing individual users of the database to search the data using either tree or array search techniques without requiring any detailed knowledge of the dual nature of the hybrid tree-array database.