Showing papers on "Tree (data structure)" published in 2002

Proceedings Article

Structural joins: a primitive for efficient XML query pattern matching

Shurug Al-Khalifa¹, H. V. Jagadish, Nick Koudas¹, Jignesh M. Patel¹, Divesh Srivastava¹, Yuqing Wu¹

University of Michigan¹

07 Aug 2002

TL;DR: Although tree-merge algorithms can, in some cases, perform comparably to stack-tree algorithms, in many cases they are considerably worse. Analytical results explain this behavior: on sorted inputs, the stack-tree algorithms have worst-case I/O and CPU complexities linear in the sum of the sizes of inputs and output, while the tree-merge algorithms do not have the same guarantee.

Abstract: XML queries typically specify patterns of selection predicates on multiple elements that have some specified tree structured relationships. The primitive tree structured relationships are parent-child and ancestor-descendant, and finding all occurrences of these relationships in an XML database is a core operation for XML query processing. We develop two families of structural join algorithms for this task: tree-merge and stack-tree. The tree-merge algorithms are a natural extension of traditional merge joins and the multi-predicate merge joins, while the stack-tree algorithms have no counterpart in traditional relational join processing. We present experimental results on a range of data and queries using the TIMBER native XML query engine built on top of SHORE. We show that while, in some cases, tree-merge algorithms can have performance comparable to stack-tree algorithms, in many cases they are considerably worse. This behavior is explained by analytical results that demonstrate that, on sorted inputs, the stack-tree algorithms have worst-case I/O and CPU complexities linear in the sum of the sizes of inputs and output, while the tree-merge algorithms do not have the same guarantee.
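The core of the stack-based approach can be sketched in a few lines of Python. This is a minimal, in-memory sketch, assuming each element carries a (start, end, level) region label in which an ancestor's interval strictly contains its descendants' intervals, and that both input lists are sorted by start position. The `Elem` tuple and `stack_tree_desc` function are hypothetical names, not part of TIMBER, and the sketch ignores paging and multi-document inputs.

```python
from collections import namedtuple
from typing import List, Tuple

# Hypothetical region label: an ancestor's [start, end] strictly contains its descendants'.
Elem = namedtuple("Elem", "name start end level")


def stack_tree_desc(alist: List[Elem], dlist: List[Elem],
                    parent_child: bool = False) -> List[Tuple[Elem, Elem]]:
    """Join ancestors in `alist` with descendants in `dlist` (both sorted by start).

    Output pairs come out ordered by descendant.
    """
    stack: List[Elem] = []
    out: List[Tuple[Elem, Elem]] = []
    a = d = 0
    while d < len(dlist):
        # Take whichever input element starts first; ties go to the descendant,
        # so an element in both lists is never reported as its own ancestor.
        if a < len(alist) and alist[a].start < dlist[d].start:
            nxt, is_ancestor = alist[a], True
        else:
            nxt, is_ancestor = dlist[d], False
        # Pop stacked ancestors whose region ends before `nxt` begins.
        while stack and stack[-1].end < nxt.start:
            stack.pop()
        if is_ancestor:
            stack.append(nxt)  # each stacked element contains every one above it
            a += 1
        else:
            if parent_child:
                # Only the innermost (top) ancestor can be the parent.
                if stack and stack[-1].level + 1 == nxt.level:
                    out.append((stack[-1], nxt))
            else:
                out.extend((anc, nxt) for anc in stack)
            d += 1
    return out


# <a><b><c1/></b><c2/></a> labelled with (start, end, level)
a, b = Elem("a", 1, 10, 1), Elem("b", 2, 5, 2)
c1, c2 = Elem("c1", 3, 4, 3), Elem("c2", 6, 7, 2)
print([(x.name, y.name) for x, y in stack_tree_desc([a, b], [c1, c2])])
# [('a', 'c1'), ('b', 'c1'), ('a', 'c2')]
```

Each ancestor is pushed and popped at most once and each output pair costs constant work, which is why this style of join stays linear in the sizes of the inputs plus the output.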

895 citations

Journal Article

A Hierarchical O(N) Force Calculation Algorithm

Walter Dehnen¹

Max Planck Society¹

10 Jun 2002-Journal of Computational Physics

TL;DR: A novel code is presented for the approximate computation of long-range forces between N mutually interacting bodies. It is based on a hierarchical tree of cubic cells and features mutual cell–cell interactions, which are calculated via a Cartesian Taylor expansion in a symmetric way such that total momentum is conserved.

769 citations

Proceedings Article

Ownership types for safe programming: preventing data races and deadlocks

Chandrasekhar Boyapati¹, Robert K. K. Lee¹, Martin Rinard¹

Massachusetts Institute of Technology¹

04 Nov 2002

TL;DR: A new static type system for multithreaded programs is presented; well-typed programs in the system are guaranteed to be free of data races and deadlocks.

Abstract: This paper presents a new static type system for multithreaded programs; well-typed programs in our system are guaranteed to be free of data races and deadlocks. Our type system allows programmers to partition the locks into a fixed number of equivalence classes and specify a partial order among the equivalence classes. The type checker then statically verifies that whenever a thread holds more than one lock, the thread acquires the locks in descending order. Our system also allows programmers to use recursive tree-based data structures to describe the partial order. For example, programmers can specify that nodes in a tree must be locked in the tree order. Our system allows mutations to the data structure that change the partial order at runtime. The type checker statically verifies that the mutations do not introduce cycles in the partial order, and that the changing of the partial order does not lead to deadlocks. We do not know of any other sound static system for preventing deadlocks that allows changes to the partial order at runtime. Our system uses a variant of ownership types to prevent data races and deadlocks. Ownership types provide a statically enforceable way of specifying object encapsulation. Ownership types are useful for preventing data races and deadlocks because the lock that protects an object can also protect its encapsulated objects. This paper describes how to use our type system to statically enforce object encapsulation as well as prevent data races and deadlocks. The paper also contains a detailed discussion of different ownership type systems and the encapsulation guarantees they provide.

634 citations

Journal Article

Towards an operational MODIS continuous field of percent tree cover algorithm: examples using AVHRR and MODIS data

Matthew C. Hansen¹, Ruth DeFries¹, John R. Townshend¹, R. A. Sohlberg¹, C. Dimiceli¹, Mark L. Carroll¹

University of Maryland, College Park¹

01 Nov 2002-Remote Sensing of Environment

TL;DR: A regression tree algorithm is used to predict percent tree cover (the dependent variable) from the signatures of multitemporal metrics; a root mean square error (RMSE) of 9.06% tree cover was found for the global training data set.

524 citations

Journal Article

An instance-weighting method to induce cost-sensitive trees

Kai Ming Ting¹

Monash University¹

01 May 2002-IEEE Transactions on Knowledge and Data Engineering

TL;DR: The algorithm incorporating the instance-weighting method is found to be better than the original algorithm in terms of total misclassification costs, the number of high cost errors, and tree size in two-class data sets.

Abstract: We introduce an instance-weighting method to induce cost-sensitive trees. It is a generalization of the standard tree induction process where only the initial instance weights determine the type of tree to be induced: minimum error trees or minimum high cost error trees. We demonstrate that it can be easily adapted to an existing tree learning algorithm. Previous research provides insufficient evidence to support the idea that the greedy divide-and-conquer algorithm can effectively induce a truly cost-sensitive tree directly from the training data. We provide this empirical evidence in this paper. The algorithm incorporating the instance-weighting method is found to be better than the original algorithm in terms of total misclassification costs, the number of high cost errors, and tree size in two-class data sets. The instance-weighting method is simpler and more effective in implementation than a previous method based on altered priors.
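To illustrate how initial instance weights alone can steer induction toward high-cost classes, here is a small Python sketch that gives each instance a weight proportional to the misclassification cost of its class, scaled so the weights still sum to the number of training instances N. The normalisation used, w(j) = C(j)·N / Σᵢ C(i)·Nᵢ (Nᵢ being the number of class-i instances), is an assumption for this sketch rather than something stated in the abstract, and `cost_sensitive_weights` is a hypothetical name.

```python
from collections import Counter
from typing import Dict, Hashable, List


def cost_sensitive_weights(labels: List[Hashable],
                           cost: Dict[Hashable, float]) -> List[float]:
    """Per-instance weights proportional to the cost of misclassifying its class.

    cost[j] is the (assumed) cost of misclassifying a class-j instance.
    The weights are scaled so that they sum to N = len(labels).
    """
    counts = Counter(labels)                               # N_j for each class j
    n = len(labels)                                        # N
    denom = sum(cost[c] * k for c, k in counts.items())    # sum_i C(i) * N_i
    class_weight = {c: cost[c] * n / denom for c in counts}
    return [class_weight[y] for y in labels]


labels = ["neg"] * 8 + ["pos"] * 2
weights = cost_sensitive_weights(labels, {"neg": 1.0, "pos": 4.0})
print(weights[0], weights[-1])  # 0.625 2.5
```

Under this scaling, equal costs give every instance a weight of 1, which recovers ordinary minimum-error induction; a tree learner that sums instance weights instead of counting instances when scoring splits can use these weights unchanged.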

459 citations

Journal Article

What do we measure by co-authorships?

Grit Laudel

01 Apr 2002-Research Evaluation

TL;DR: This paper identifies six types of research collaborations with distinct patterns of rewards, finds that about half of the collaborations are invisible in formal communication channels because they are not rewarded, and shows that about one third of the collaborations are rewarded only by acknowledgements.

Abstract: Interviews with scientists about the content and reward of collaborations, and classification of contributions of co-authors and scientists cited in acknowledgements, identified six types of research collaborations with distinct patterns of rewards; showed that about half of the collaborations are invisible in formal communication channels because they are not rewarded; and showed that about one third of the collaborations are rewarded only by acknowledgements. Copyright Beech Tree Publishing.

403 citations

Journal Article

Automated tree crown detection and delineation in high-resolution digital camera imagery of coniferous forest regeneration

D. A. Pouliot¹, Douglas J. King¹, F. W. Bell², Douglas G. Pitt³

Carleton University¹, Ontario Ministry of Natural Resources², Canadian Forest Service³

01 Oct 2002-Remote Sensing of Environment

TL;DR: A tree detection–delineation algorithm designed specifically for high-resolution digital imagery of 6-year-old trees is presented and rigorously evaluated. Tree-detection accuracy was better than that obtained with commonly applied fixed-window local maximum filters, and crown-diameter accuracy was more sensitive to image resolution.

387 citations

Journal Article

Succinct Representation of Balanced Parentheses and Static Trees

J. Ian Munro, Venkatesh Raman

01 Mar 2002-SIAM Journal on Computing

TL;DR: This work considers the implementation of abstract data types for static binary trees, rooted ordered trees, and balanced sequences of parentheses, using space within a lower-order term of the information-theoretic minimum; the approach is applied to produce a succinct representation of planar graphs in which one can test adjacency in constant time.

Abstract: We consider the implementation of abstract data types for the static objects: binary tree, rooted ordered tree, and a balanced sequence of parentheses. Our representations use an amount of space within a lower order term of the information theoretic minimum and support, in constant time, a richer set of navigational operations than has previously been considered in similar work. In the case of binary trees, for instance, we can move from a node to its left or right child or to the parent in constant time while retaining knowledge of the size of the subtree at which we are positioned. The approach is applied to produce a succinct representation of planar graphs in which one can test adjacency in constant time.
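The constant-time operations depend on auxiliary structures that do not fit in a short example, but the underlying encoding is easy to show. In this Python sketch (all function names hypothetical), a rooted ordered tree is written in depth-first order as one `(` on entering a node and one `)` on leaving it, so a tree of n nodes takes 2n parentheses (2n bits). A node is identified by the position of its `(`. `find_close` is a naive linear scan standing in for the paper's constant-time matching, and the subtree size falls out of the distance between matching parentheses.

```python
from typing import Dict, List


def encode(tree: Dict[str, List[str]], root: str) -> str:
    """Depth-first balanced-parenthesis encoding of a rooted ordered tree."""
    out: List[str] = []

    def visit(node: str) -> None:
        out.append("(")
        for child in tree.get(node, []):
            visit(child)
        out.append(")")

    visit(root)
    return "".join(out)


def find_close(bp: str, i: int) -> int:
    """Index of the ')' matching the '(' at position i (naive O(n) scan)."""
    depth = 0
    for j in range(i, len(bp)):
        depth += 1 if bp[j] == "(" else -1
        if depth == 0:
            return j
    raise ValueError("unbalanced sequence")


def subtree_size(bp: str, i: int) -> int:
    """Number of nodes in the subtree whose '(' is at position i."""
    return (find_close(bp, i) - i + 1) // 2


def first_child(bp: str, i: int) -> int:
    """Position of the first child of node i, or -1 if it is a leaf."""
    return i + 1 if bp[i + 1] == "(" else -1


def next_sibling(bp: str, i: int) -> int:
    """Position of the next sibling of node i, or -1 if it is the last child."""
    j = find_close(bp, i) + 1
    return j if j < len(bp) and bp[j] == "(" else -1


bp = encode({"r": ["a", "b"], "a": ["c", "d"]}, "r")
print(bp, subtree_size(bp, 1))  # ((()())()) 3
```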

376 citations

Book Chapter

Implicit Probabilistic Models of Human Motion for Synthesis and Tracking

Hedvig Sidenbladh, Michael J. Black¹, Leonid Sigal¹

Brown University¹

28 May 2002

TL;DR: A low-dimensional linear model of human motion is learned and used to structure the example motion database into a binary tree; an approximate probabilistic tree search method exploits the coefficients of this low-dimensional representation and runs in sub-linear time.

Abstract: This paper addresses the problem of probabilistically modeling 3D human motion for synthesis and tracking. Given the high dimensional nature of human motion, learning an explicit probabilistic model from available training data is currently impractical. Instead we exploit methods from texture synthesis that treat images as representing an implicit empirical distribution. These methods replace the problem of representing the probability of a texture pattern with that of searching the training data for similar instances of that pattern. We extend this idea to temporal data representing 3D human motion with a large database of example motions. To make the method useful in practice, we must address the problem of efficient search in a large training set; efficiency is particularly important for tracking. Towards that end, we learn a low dimensional linear model of human motion that is used to structure the example motion database into a binary tree. An approximate probabilistic tree search method exploits the coefficients of this low-dimensional representation and runs in sub-linear time. This probabilistic tree search returns a particular sample human motion with probability approximating the true distribution of human motions in the database. This sampling method is suitable for use with particle filtering techniques and is applied to articulated 3D tracking of humans within a Bayesian framework. Successful tracking results are presented, along with examples of synthesizing human motion using the model.

374 citations

Patent

Creation of structured data from plain text

Alexander Saldanha, Patrick C. McGeer, Luca P. Carloni

07 Jan 2002

TL;DR: A method and system for converting plain text into structured data is presented, which can be used for populating a database, for retrieving data from a database based on a query, or both.

Abstract: A method and system for converting plain text into structured data. Parse trees for the plain text are generated based on the grammar of a natural language, and the parse trees are mapped onto instance trees generated based on an application-specific model. The best map is chosen, and the instance tree is passed to an application for execution. The method and system can be used for populating a database, for retrieving data from a database based on a query, or both.

366 citations

Estimating plot-level tree heights with lidar: local filtering with a canopy-height based variable window size

S. C. Popescu

01 Jan 2002

TL;DR: The author develops and tests algorithms to estimate plot-level tree height using lidar data, and investigates how ground measurements can help in the processing phase of lidar data for tree height assessment.

Abstract: In recent years, the use of airborne lidar technology to measure forest biophysical characteristics has been rapidly increasing. This paper discusses processing algorithms for deriving the terrain model and estimating tree heights by using a multiple-return, high-density, small-footprint lidar data set. The lidar data were acquired over deciduous, coniferous, and mixed stands of varying age classes and settings typical of the southeastern US. The specific objectives were: (1) to develop and test algorithms to estimate plot-level tree height using lidar data, and (2) to investigate how ground measurements can help in the processing phase of lidar data for tree height assessment. The study area is located in the Piedmont physiographic province of Virginia, USA, and includes a portion of the Appomattox-Buckingham State Forest (37°25′N, 78°41′W). Two lidar processing algorithms are discussed: the first based on single tree crown identification using a variable window size for local filtering, and the second based on the height of all laser pulses within the area covered by the ground truth data. Height estimates resulting from processing lidar data with both algorithms were compared to field measurements obtained with a plot design following the USDA Forest Service Forest Inventory and Analysis (FIA) field data layout. Linear regression was used to develop equations relating lidar-estimated parameters to field inventories for each of the FIA plots. As expected, the maximum height on each plot was predicted with the highest accuracy (R² values of 85% and 90% for the first and the second algorithm, respectively). The variable window size algorithm performed better for predicting heights of dominant and co-dominant trees (R² values of 84–85%) with a diameter at breast height (dbh) larger than 12.7 cm (5 in), when compared with the algorithm based on all laser pulses.

Patent

Method and apparatus for disseminating topology information and for discovering new neighboring nodes

Richard G. Ogier¹, Fred Lambert Templin, Mark Lewis

SRI International¹

29 Nov 2002

TL;DR: A proactive link-state routing protocol for mobile ad hoc networks is proposed in which each node uses a combination of periodic and differential updates to keep all neighbors informed of the reportable part of its source tree.

Abstract: A proactive link-state routing protocol designed for mobile ad-hoc networks is disclosed, which provides hop-by-hop routing along shortest paths to each destination. Each node running the present protocol will compute a source tree (providing paths to all reachable nodes) based on partial topology information stored in its topology table. To minimize overhead, each node reports only “part” of its source tree to neighbors. The present invention employs a combination of periodic and differential updates to keep all neighbors informed of the reportable part of its source tree. The present invention performs neighbor discovery using “differential” HELLO messages that report only “changes” in the status of neighbors. This results in HELLO messages that are much smaller than those of other link-state routing protocols.
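To make "source tree" concrete: from a topology table of link costs, a node can compute shortest paths to every reachable node and keep, for each destination, its predecessor on the path; those predecessor pointers form the source tree. The Python sketch below assumes a plain Dijkstra computation over a dictionary topology table with non-negative costs and comparable node identifiers (such as strings). It does not model the protocol's partial reporting, periodic and differential updates, or HELLO messages, and all names are hypothetical.

```python
import heapq
from typing import Dict, Hashable, Tuple

Topology = Dict[Hashable, Dict[Hashable, float]]  # node -> {neighbour: link cost}


def source_tree(topology: Topology, source: Hashable) -> Tuple[dict, dict]:
    """Return (dist, parent): shortest-path costs and the source tree's parent pointers."""
    dist = {source: 0.0}
    parent = {source: None}
    heap = [(0.0, source)]
    done = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue  # stale heap entry
        done.add(u)
        for v, cost in topology.get(u, {}).items():
            nd = d + cost
            if v not in dist or nd < dist[v]:
                dist[v] = nd
                parent[v] = u
                heapq.heappush(heap, (nd, v))
    return dist, parent


def next_hop(parent: dict, source: Hashable, dest: Hashable) -> Hashable:
    """Walk the source tree back from a reachable `dest` to the first hop from source."""
    if dest == source:
        return source
    node = dest
    while parent[node] != source:
        node = parent[node]
    return node


topo = {"A": {"B": 1, "C": 4}, "B": {"A": 1, "C": 2, "D": 5},
        "C": {"A": 4, "B": 2, "D": 1}, "D": {"B": 5, "C": 1}}
dist, parent = source_tree(topo, "A")
print(dist["D"], next_hop(parent, "A", "D"))  # 4.0 B
```

Hop-by-hop forwarding only needs `next_hop` per destination, which is why a node can route along shortest paths while advertising only part of its tree.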

Proceedings Article

SpaceTree: supporting exploration in large node link tree, design evolution and empirical evaluation

Catherine Plaisant¹, J. Grosjean¹, Benjamin B. Bederson¹

University of Maryland, College Park¹

28 Oct 2002

TL;DR: In this article, the authors present a tree browser that adds dynamic rescaling of branches of the tree to best fit the available screen space, optimized camera movement, and the use of preview icons summarizing the topology of the branches that cannot be expanded.

Abstract: We present a novel tree browser that builds on the conventional node link tree diagrams. It adds dynamic rescaling of branches of the tree to best fit the available screen space, optimized camera movement, and the use of preview icons summarizing the topology of the branches that cannot be expanded. In addition, it includes integrated search and filter functions. This paper reflects on the evolution of the design and highlights the principles that emerged from it. A controlled experiment showed benefits for navigation to previously visited nodes and for estimation of overall tree topology.

Journal Article

Genome trees and the tree of life

Yuri I. Wolf¹, Igor B. Rogozin¹, Nick V. Grishin², Eugene V. Koonin¹

National Institutes of Health¹, University of Texas Southwestern Medical Center²

01 Sep 2002-Trends in Genetics

TL;DR: Alternative approaches to tree construction that attempt to determine tree topology on the basis of comparisons of complete gene sets seem to reveal a phylogenetic signal that supports the three-domain evolutionary scenario and suggests the possibility of delineation of previously undetected major clades of prokaryotes.

Journal Article

Clustering gene expression data using a graph-theoretic approach: an application of minimum spanning trees.

Ying Xu¹, Victor Olman¹, Dong Xu¹

Oak Ridge National Laboratory¹

01 Apr 2002-Bioinformatics

TL;DR: A new framework for representing a set of multi-dimensional gene expression data as a Minimum Spanning Tree (MST), a concept from the graph theory, which can overcome many of the problems faced by classical clustering algorithms.

Abstract: Motivation: Gene expression data clustering provides a powerful tool for studying functional relationships of genes in a biological process. Identifying correlated expression patterns of genes represents the basic challenge in this clustering problem. Results: This paper describes a new framework for representing a set of multi-dimensional gene expression data as a Minimum Spanning Tree (MST), a concept from graph theory. A key property of this representation is that each cluster of the expression data corresponds to one subtree of the MST, which rigorously converts a multi-dimensional clustering problem to a tree partitioning problem. We have demonstrated that although the inter-data relationship is greatly simplified in the MST representation, no essential information is lost for the purpose of clustering. Two key advantages in representing a set of multi-dimensional data as an MST are: (1) the simple structure of a tree facilitates efficient implementations of rigorous clustering algorithms, which otherwise are highly computationally challenging; and (2) as an MST-based clustering does not depend on detailed geometric shape of a cluster, it can overcome many of the problems faced by classical clustering algorithms. Based on the MST representation, we have developed a number of rigorous and efficient clustering algorithms, including two with guaranteed global optimality. We have implemented these algorithms in a software package, EXpression data Clustering Analysis and VisualizATiOn Resource (EXCAVATOR). To demonstrate its effectiveness, we have tested it on three data sets: expression data from the yeast Saccharomyces cerevisiae, expression data from human fibroblasts in response to serum, and Arabidopsis expression data in response to chitin elicitation. The test results are highly encouraging. Availability: EXCAVATOR is available on request from the authors.
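As a rough illustration of the tree-partitioning view, the Python sketch below builds an MST with Prim's algorithm over Euclidean distances and then applies one of the simplest MST clustering rules: delete the k - 1 heaviest edges, so that each remaining subtree is one cluster. This rule is chosen for brevity and is not claimed to be one of EXCAVATOR's algorithms (the paper's globally optimal variants are not reproduced); the Euclidean metric and the function names are assumptions.

```python
import math
from typing import List, Sequence, Tuple


def mst_edges(points: Sequence[Sequence[float]]) -> List[Tuple[float, int, int]]:
    """Prim's algorithm on the complete Euclidean graph, O(n^2) time."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known edge weight connecting i to the tree
    link = [-1] * n        # tree endpoint of that cheapest edge
    best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if link[u] >= 0:
            edges.append((best[u], link[u], u))
        for v in range(n):
            if not in_tree[v]:
                w = math.dist(points[u], points[v])
                if w < best[v]:
                    best[v], link[v] = w, u
    return edges


def mst_clusters(points: Sequence[Sequence[float]], k: int) -> List[int]:
    """Cluster id per point after cutting the k - 1 heaviest MST edges (1 <= k <= n)."""
    kept = sorted(mst_edges(points))[: len(points) - k]
    parent = list(range(len(points)))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for _, a, b in kept:
        parent[find(a)] = find(b)
    roots: dict = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(len(points))]


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(mst_clusters(points, 2))  # [0, 0, 0, 1, 1, 1]
```

Because only the tree's n - 1 edges are examined after construction, the partitioning step no longer depends on cluster shape, which is the advantage the abstract attributes to the MST representation.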

Book Chapter

Induction of Association Rules: Apriori Implementation

Christian Borgelt¹, Rudolf Kruse¹

Otto-von-Guericke University Magdeburg¹

01 Jan 2002

TL;DR: An implementation of the well-known apriori algorithm for the induction of association rules that is based on the concept of a prefix tree, which may be used in order to minimize the time needed to find the frequent itemsets as well as to reduce the amount of memory needed to store the counters.

Abstract: We describe an implementation of the well-known apriori algorithm for the induction of association rules [Agrawal et al. (1993), Agrawal et al. (1996)] that is based on the concept of a prefix tree. While the idea to use this type of data structure is not new, there are several ways to organize the nodes of such a tree, to encode the items, and to organize the transactions, which may be used in order to minimize the time needed to find the frequent itemsets as well as to reduce the amount of memory needed to store the counters. Consequently, our emphasis is less on concepts than on implementation issues, which, however, can make a considerable difference in applications.
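The central data structure can be illustrated with a short Python sketch: candidate itemsets are stored as paths in a prefix tree (items in a fixed sorted order), and each transaction is pushed through the tree recursively so that every candidate it contains is counted exactly once. The `PrefixNode` class and its dictionary-of-children layout are assumptions; the paper's point is that the choice of node organisation, item coding, and transaction handling matters, and this sketch simply takes the most straightforward option.

```python
from typing import Dict, Iterable, List, Tuple


class PrefixNode:
    """Prefix-tree node; the path from the root spells an itemset in sorted order."""

    def __init__(self):
        self.children: Dict[str, "PrefixNode"] = {}
        self.count = 0             # support counter for the itemset ending here
        self.is_candidate = False


def insert(root: PrefixNode, itemset: Iterable[str]) -> None:
    node = root
    for item in sorted(itemset):
        node = node.children.setdefault(item, PrefixNode())
    node.is_candidate = True


def count_transaction(node: PrefixNode, items: List[str], start: int = 0) -> None:
    """Count every candidate contained in `items` (sorted, without duplicates)."""
    for i in range(start, len(items)):
        child = node.children.get(items[i])
        if child is not None:
            if child.is_candidate:
                child.count += 1
            count_transaction(child, items, i + 1)


def frequent(root: PrefixNode, min_support: int, prefix=()) -> List[Tuple[tuple, int]]:
    """Candidate itemsets whose count reaches min_support, in lexicographic order."""
    result = []
    for item, child in sorted(root.children.items()):
        path = prefix + (item,)
        if child.is_candidate and child.count >= min_support:
            result.append((path, child.count))
        result.extend(frequent(child, min_support, path))
    return result


root = PrefixNode()
for cand in [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]:
    insert(root, cand)
for t in [["a", "b", "c"], ["a", "c"], ["b", "c"], ["a", "b", "c", "d"]]:
    count_transaction(root, sorted(set(t)))
print(frequent(root, 2))  # [(('a', 'b'), 2), (('a', 'c'), 3), (('b', 'c'), 3)]
```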

Proceedings Article

Constructing minimum-energy broadcast trees in wireless ad hoc networks

Weifa Liang¹

Australian National University¹

09 Jun 2002

TL;DR: The technique adopted in this paper is to reduce the minimum-energy broadcast (multicast) tree problem on a wireless ad hoc network to an optimization problem on an auxiliary weighted graph, and to solve the optimization problem on the auxiliary graph, which in turn gives an approximate solution for the original problem.

Abstract: In this paper we assume that a multihop wireless network (also called a wireless ad hoc network) consists of nodes whose transmitting powers are finitely adjustable. We consider two fundamental problems related to power consumption in this kind of network. One is the minimum-energy broadcast tree problem, which broadcasts a message from a source node to all the other nodes in the network such that the sum of transmission powers at all nodes is minimized; the other is the minimum-energy multicast tree problem, which multicasts a message from a source node to the nodes in a given subset of nodes such that the sum of the transmission powers at all involved nodes is minimized. We first show that the minimum-energy broadcast tree problem is NP-complete. We then present an approximation algorithm for the problem in a general setting, which delivers an approximate solution with a bounded performance guarantee. The algorithm takes $O((k+1)^{1/\epsilon} n^{3/\epsilon})$ time, where $n$ is the number of nodes in the wireless network, $k$ is the number of power levels at each node, and $\epsilon$ is a constant with 0…

Journal Article

QuickTree: building huge Neighbour-Joining trees of protein sequences.

Kevin L. Howe¹, Alex Bateman¹, Richard Durbin¹

Wellcome Trust Sanger Institute¹

01 Nov 2002-Bioinformatics

TL;DR: QuickTree, a fast implementation of the popular Neighbor-Joining tree-building algorithm, allows the reconstruction of phylogenies for very large protein families that would be infeasible using other popular methods.

Abstract: We have written a fast implementation of the popular Neighbor-Joining tree building algorithm. QuickTree allows the reconstruction of phylogenies for very large protein families (including the largest Pfam alignment containing 27000 HIV GP120 glycoprotein sequences) that would be infeasible using other popular methods.
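For reference, the textbook Neighbor-Joining loop is sketched below in Python: repeatedly join the pair minimising Q(i, j) = (n - 2)·d(i, j) - S(i) - S(j), where S(x) is the sum of x's distances to the other active nodes, assign the two branch lengths, and replace the pair with a new node. This is a plain O(n³) version for illustration, not QuickTree's implementation, which is engineered for far larger inputs; the function name and the internal node labels (`n1`, `n2` and so on, plus `root`) are hypothetical and must not clash with taxon names.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

Edge = Tuple[str, str, float]  # (node, parent, branch length)


def neighbor_joining(labels: Sequence[str],
                     matrix: Sequence[Sequence[float]]) -> List[Edge]:
    """Textbook Neighbor-Joining on a full distance matrix (needs >= 3 taxa)."""
    d = {x: {y: float(matrix[i][j]) for j, y in enumerate(labels)}
         for i, x in enumerate(labels)}
    active = list(labels)
    edges: List[Edge] = []
    counter = 0
    while len(active) > 3:
        n = len(active)
        total = {x: sum(d[x][y] for y in active if y != x) for x in active}
        # Join the pair minimising Q(i, j) = (n - 2) d(i, j) - S(i) - S(j).
        i, j = min(combinations(active, 2),
                   key=lambda p: (n - 2) * d[p[0]][p[1]] - total[p[0]] - total[p[1]])
        counter += 1
        u = f"n{counter}"
        li = d[i][j] / 2 + (total[i] - total[j]) / (2 * (n - 2))
        edges += [(i, u, li), (j, u, d[i][j] - li)]
        d[u] = {}
        for k in active:
            if k not in (i, j):
                d[u][k] = d[k][u] = (d[i][k] + d[j][k] - d[i][j]) / 2
        active = [k for k in active if k not in (i, j)] + [u]
    # Three nodes remain: attach them to a single central node.
    a, b, c = active
    edges += [
        (a, "root", (d[a][b] + d[a][c] - d[b][c]) / 2),
        (b, "root", (d[a][b] + d[b][c] - d[a][c]) / 2),
        (c, "root", (d[a][c] + d[b][c] - d[a][b]) / 2),
    ]
    return edges


labels = ["a", "b", "c", "d", "e"]
matrix = [[0, 5, 9, 9, 8],
          [5, 0, 10, 10, 9],
          [9, 10, 0, 8, 7],
          [9, 10, 8, 0, 3],
          [8, 9, 7, 3, 0]]
for child, parent, length in neighbor_joining(labels, matrix):
    print(child, parent, length)
```

This example matrix is additive, so the method recovers the generating tree exactly; the leaf branch lengths come out as a = 2, b = 3, c = 4, d = 2 and e = 1.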

Journal Article

Understanding electrical trees in solids: from experiment to theory

L. A. Dissado¹

University of Leicester¹

07 Nov 2002-IEEE Transactions on Dielectrics and Electrical Insulation

TL;DR: A review of recent developments made in the understanding of the electrical tree mechanism is presented and the chaotic nature of the tree propagation mechanism is discussed both through experimental data and the results of a completely deterministic theoretical model.

Abstract: A review of recent developments made in the understanding of the electrical tree mechanism is presented. The life of the tree is covered from initiation, through propagation, to long-term changes in shape. The initiation process is examined in terms of the injection of space charge and its ability to transfer energy to the polymer to create damage. Theoretical models for the processes involved are assessed in terms of the experimental data and an outline for the sequence of events in tree initiation developed. The inter-relationship between tree discharges, tree propagation, and tree shape is discussed. Theoretical models for these processes are evaluated in terms of their ability to reproduce experimental data, especially tree shapes and discharge sequences in time and space. The chaotic nature of the tree propagation mechanism is discussed both through experimental data and the results of a completely deterministic theoretical model. Some special features of electrical trees such as the existence of conducting trees, acceleration at long times and slow growth in thick insulation are briefly touched upon. Finally a summary of the state of the art is presented.

Book Chapter

Dynamic Replica Placement for Scalable Content Delivery

Yan Chen¹, Randy H. Katz¹, John Kubiatowicz¹

University of California¹

07 Mar 2002

TL;DR: Simulation results show that the dissemination tree has close to the optimal number of replicas, good load distribution, small delay and bandwidth penalties for update multicast compared with the ideal case: static replica placement on IP multicast.

Abstract: In this paper, we propose the dissemination tree, a dynamic content distribution system built on top of a peer-to-peer location service. We present a replica placement protocol that builds the tree while meeting QoS and server capacity constraints. The number of replicas as well as the delay and bandwidth consumption for update propagation are significantly reduced. Simulation results show that the dissemination tree has close to the optimal number of replicas, good load distribution, small delay and bandwidth penalties for update multicast compared with the ideal case: static replica placement on IP multicast.

Journal Article

Recognizing mathematical expressions using tree transformation

Richard Zanibbi¹, Dorothea Blostein¹, James R. Cordy¹

Queen's University¹

01 Nov 2002-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: A robust and efficient system for recognizing typeset and handwritten mathematical notation that allows robust handling of unexpected input, increases the scalability of the system, and provides the groundwork for handling dialects of mathematical notation.

Abstract: We describe a robust and efficient system for recognizing typeset and handwritten mathematical notation. From a list of symbols with bounding boxes the system analyzes an expression in three successive passes. The Layout Pass constructs a Baseline Structure Tree (BST) describing the two-dimensional arrangement of input symbols. Reading order and operator dominance are used to allow efficient recognition of symbol layout even when symbols deviate greatly from their ideal positions. Next, the Lexical Pass produces a Lexed BST from the initial BST by grouping tokens comprised of multiple input symbols; these include decimal numbers, function names, and symbols comprised of nonoverlapping primitives such as "=". The Lexical Pass also labels vertical structures such as fractions and accents. The Lexed BST is translated into LaTeX. Additional processing, necessary for producing output for symbolic algebra systems, is carried out in the Expression Analysis Pass. The Lexed BST is translated into an Operator Tree, which describes the order and scope of operations in the input expression. The tree manipulations used in each pass are represented compactly using tree transformations. The compiler-like architecture of the system allows robust handling of unexpected input, increases the scalability of the system, and provides the groundwork for handling dialects of mathematical notation.

Journal Article

A decision-tree-based symbolic rule induction system for text categorization

D. E. Johnson¹, F. J. Oles¹, Tong Zhang¹, T. Goetz¹

IBM¹

01 Jul 2002-IBM Systems Journal

TL;DR: A decision-tree-based symbolic rule induction system for categorizing text documents automatically and a new method for converting a decision tree to a rule set that is simplified, but still logically equivalent to, the original tree is presented.

Abstract: We present a decision-tree-based symbolic rule induction system for categorizing text documents automatically. Our method for rule induction involves the novel combination of (1) a fast decision tree induction algorithm especially suited to text data and (2) a new method for converting a decision tree to a rule set that is simplified, but still logically equivalent to, the original tree. We report experimental results on the use of this system on some practical problems.

Journal Article

Searching in metric spaces by spatial approximation

Gonzalo Navarro¹

University of Chile¹

01 Aug 2002

TL;DR: A data structure called the sa-tree (spatial approximation tree) is proposed; it is based on approaching the searched objects spatially, that is, getting closer and closer to them.

Abstract: We propose a new data structure to search in metric spaces. A metric space is formed by a collection of objects and a distance function defined among them which satisfies the triangle inequality. The goal is, given a set of objects and a query, to retrieve those objects close enough to the query. The complexity measure is the number of distances computed to achieve this goal. Our data structure, called sa-tree (“spatial approximation tree”), is based on approaching the searched objects spatially, that is, getting closer and closer to them, rather than the classic divide-and-conquer approach of other data structures. We analyze our method and show that the number of distance evaluations to search among n objects is sublinear. We show experimentally that the sa-tree is the best existing technique when the metric space is hard to search or the query has low selectivity. These are the most important unsolved cases in real applications. As a practical advantage, our data structure is one of the few that does not need to tune parameters, which makes it appealing for use by non-experts.
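A compact Python sketch of the idea, offered as an illustration under stated assumptions rather than the paper's code: each node a keeps a neighbour set N(a), chosen by scanning the other objects by increasing distance to a and admitting x only if x is closer to a than to every neighbour admitted so far; every remaining object goes into the subtree of its closest neighbour, and each node stores its covering radius. Range search prunes with the covering radius and with the hyperplane test d(b, q) ≤ dmin + 2r, which follows from the triangle inequality. Class and function names are hypothetical, and the search passes each computed distance down so it is not recomputed.

```python
import math
from typing import Callable, List, Optional, Sequence


class SANode:
    def __init__(self, obj):
        self.obj = obj
        self.neighbors: List["SANode"] = []  # N(a)
        self.radius = 0.0                   # covering radius of the subtree


def build(a, objects: Sequence, dist: Callable) -> SANode:
    """Build the sa-tree rooted at `a` over the remaining `objects`."""
    node = SANode(a)
    node.radius = max((dist(a, x) for x in objects), default=0.0)
    chosen, rest = [], []
    # x joins N(a) only if it is closer to a than to every neighbour so far.
    for x in sorted(objects, key=lambda o: dist(a, o)):
        if all(dist(x, a) < dist(x, b) for b in chosen):
            chosen.append(x)
        else:
            rest.append(x)
    # Every other object goes to the subtree of its closest neighbour.
    buckets: List[list] = [[] for _ in chosen]
    for x in rest:
        k = min(range(len(chosen)), key=lambda i: dist(x, chosen[i]))
        buckets[k].append(x)
    node.neighbors = [build(b, bucket, dist) for b, bucket in zip(chosen, buckets)]
    return node


def range_search(node: SANode, q, r: float, dist: Callable,
                 d_aq: Optional[float] = None, dmin: float = math.inf) -> list:
    """All objects within distance r of q."""
    if d_aq is None:
        d_aq = dist(node.obj, q)
    found = []
    if d_aq > node.radius + r:
        return found  # the whole subtree lies outside the query ball
    if d_aq <= r:
        found.append(node.obj)
    nd = [dist(c.obj, q) for c in node.neighbors]
    dmin = min([dmin, d_aq] + nd)
    for child, dc in zip(node.neighbors, nd):
        # Objects under `child` are no farther from it than from any node behind dmin.
        if dc <= dmin + 2 * r:
            found.extend(range_search(child, q, r, dist, dc, dmin))
    return found


def line_dist(x: float, y: float) -> float:
    return abs(x - y)


points = [0, 3, 7, 1, 9, 4, 12, 6, 8, 2]
tree = build(points[0], points[1:], line_dist)
print(sorted(range_search(tree, 5, 1, line_dist)))  # [4, 6]
```

No parameters need tuning here: the shape of the tree is fixed entirely by the data and the distance function, which matches the practical advantage claimed in the abstract.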

Patent

Computer processes for selecting nodes to call to attention of a user during browsing of a hierarchical browse structure

Ruben Ernesto Ortega, Joel R. Spiegel, Lauri E. Bortscheller

29 Jul 2002

TL;DR: In this article, a computer-implemented process identifies specific nodes within a browse tree or other hierarchical browse structure based on historical actions of online users, and calls such nodes to the attention of users during navigation of the browse structure.

Abstract: A computer-implemented process identifies specific nodes within a browse tree or other hierarchical browse structure based on historical actions of online users, and calls such nodes to the attention of users during navigation of the browse structure. The system and method are particularly useful for assisting users in locating popular products and/or product categories within a catalog of an online merchant, but may be used in connection with browse structures used to locate other types of items. In one embodiment, node popularity levels are determined periodically (e.g., once per day) based on user activity data that represents users' affinities for such nodes (items and/or item categories). Popular nodes are called to the attention of users, preferably by automatically “elevating” such nodes for display within the browse tree. The node elevation process may also be used to elevate nodes that are predicted to be of interest to a particular user.
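
The periodic popularity computation and "elevation" can be pictured with a hypothetical sketch; the patent does not prescribe this code. It assumes the browse tree is given as a child-to-parent mapping and that a day's user activity has already been reduced to a count per item. Each item's count is credited to every ancestor category, and each category then surfaces its `k` most popular descendants for display near the top of that category. The function name `elevate` and the input formats are assumptions.

```python
from collections import defaultdict


def elevate(parent, activity, k=2):
    """For each category, return the k most popular items anywhere below it.

    parent:   child -> parent mapping of the browse tree (hypothetical format)
    activity: item -> count of user actions in the last period (assumed)
    """
    candidates = defaultdict(list)
    for item, count in activity.items():
        node = parent.get(item)
        # Credit the item to every ancestor category up to the root.
        while node is not None:
            candidates[node].append((count, item))
            node = parent.get(node)
    # Most popular first; ties broken by item name for a stable order.
    return {cat: [item for _, item in sorted(c, key=lambda p: (-p[0], p[1]))[:k]]
            for cat, c in candidates.items()}


parent = {"Fiction": "Books", "Science": "Books",
          "b1": "Fiction", "b2": "Fiction", "b3": "Science"}
activity = {"b1": 5, "b2": 9, "b3": 7}
elevated = elevate(parent, activity)
```

Recomputing `elevate` once per period matches the abstract's "e.g., once per day" schedule; a per-user variant would swap `activity` for predicted interest scores.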

Journal Article•DOI•

Inferring the Root of a Phylogenetic Tree

John P. Huelsenbeck¹, Jonathan P. Bollback, Amy M. Levine•Institutions (1)

University of Rochester¹

01 Jan 2002-Systematic Biology

TL;DR: A Bayesian method for inferring the root of a phylogenetic tree by using one of several criteria: the outgroup, molecular clock, and nonreversible model of DNA substitution is introduced.

Abstract: Phylogenetic trees can be rooted by a number of criteria. Here, we introduce a Bayesian method for inferring the root of a phylogenetic tree by using one of several criteria: the outgroup, molecular clock, and nonreversible model of DNA substitution. We perform simulation analyses to examine the relative ability of these three criteria to correctly identify the root of the tree. The outgroup and molecular clock criteria were best able to identify the root of the tree, whereas the nonreversible model was able to identify the root only when the substitution process was highly nonreversible. We also examined the performance of the criteria for a tree of four species for which the topology and root position are well supported. Results of the analyses of these data are consistent with the simulation results.
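
In a Bayesian analysis of this kind the posterior probability of each root position is typically estimated from Markov chain Monte Carlo output as the fraction of post-burn-in samples in which the root falls on that branch. The sketch below shows only that summary step, under the assumption that each sample has already been reduced to a label for the branch carrying the root (a hypothetical format); it says nothing about how the outgroup, clock, or nonreversible models generate those samples.

```python
from collections import Counter


def root_posterior(sampled_roots, burn_in=0):
    """Estimate the posterior probability of each root position.

    sampled_roots: one label per MCMC sample naming the branch that carries
                   the root (hypothetical representation).
    burn_in:       number of initial samples to discard.
    """
    kept = sampled_roots[burn_in:]
    if not kept:
        raise ValueError("no samples left after burn-in")
    counts = Counter(kept)
    return {root: n / len(kept) for root, n in counts.items()}


samples = ["A", "A"] + ["B"] * 6 + ["C"] * 2
posterior = root_posterior(samples, burn_in=2)
```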

Proceedings Article•DOI•

Mining frequent item sets by opportunistic projection

Junqiang Liu¹, Yunhe Pan¹, Ke Wang², Jiawei Han³•Institutions (3)

Zhejiang University¹, Simon Fraser University², University of Illinois at Urbana–Champaign³

23 Jul 2002

TL;DR: This paper presents a novel algorithm Opportune Project for mining the complete set of frequent item sets by projecting databases to grow a frequent item set tree, and proposes novel methods to build tree-based pseudo projections and array-based unfiltered projections for projected transaction subsets.

Abstract: In this paper, we present a novel algorithm Opportune Project for mining the complete set of frequent item sets by projecting databases to grow a frequent item set tree. Our algorithm is fundamentally different from those proposed in the past in that it opportunistically chooses between two different structures, array-based or tree-based, to represent projected transaction subsets, and heuristically decides whether to build an unfiltered pseudo projection or to make a filtered copy according to features of the subsets. More importantly, we propose novel methods to build tree-based pseudo projections and array-based unfiltered projections for projected transaction subsets, which makes our algorithm both CPU time efficient and memory saving. Basically, the algorithm grows the frequent item set tree by depth-first search, whereas breadth-first search is used to build the upper portion of the tree if necessary. We test our algorithm versus several other algorithms on real-world datasets, such as BMS-POS, and on IBM artificial datasets. The empirical results show that our algorithm is not only the most efficient on both sparse and dense databases at all levels of support threshold, but also highly scalable to very large databases.
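
The basic idea of growing a frequent item set tree depth-first by projecting the database can be sketched in a few lines. This plain version always materializes filtered tuple copies; it does not implement Opportune Project's choice between array- and tree-based representations or its pseudo projections. Items in each transaction are sorted, and the projected database for a prefix keeps, from each transaction containing the prefix, only the items that sort after the prefix's last item. The function name `mine` is illustrative.

```python
def mine(transactions, min_support):
    """Depth-first frequent item set mining by database projection (sketch).

    Returns a dict mapping each frequent item set (a sorted tuple) to its
    support count.
    """
    results = {}

    def grow(prefix, projected):
        # Count each item in the projected database; this equals the support
        # of prefix + (item,).
        counts = {}
        for t in projected:
            for item in t:
                counts[item] = counts.get(item, 0) + 1
        for item in sorted(counts):
            if counts[item] < min_support:
                continue
            itemset = prefix + (item,)
            results[itemset] = counts[item]
            # Project: transactions containing item, restricted to later items.
            sub = [tuple(x for x in t if x > item) for t in projected if item in t]
            grow(itemset, sub)

    grow((), [tuple(sorted(set(t))) for t in transactions])
    return results


db = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c"]]
frequent = mine(db, min_support=3)
```

Each recursive call corresponds to one node of the frequent item set tree, so the recursion order is exactly the depth-first growth described in the abstract.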

Book Chapter•DOI•

Tree Pattern Relaxation

Sihem Amer-Yahia¹, SungRan Cho², Divesh Srivastava¹•Institutions (2)

AT&T Labs¹, Stevens Institute of Technology²

25 Mar 2002

TL;DR: This paper studies the problem of approximate XML query matching, based on tree pattern relaxations, devises efficient algorithms to evaluate relaxed tree patterns, and designs data pruning algorithms where intermediate query results are filtered dynamically during the evaluation process.

Abstract: Tree patterns are fundamental to querying tree-structured data like XML. Because of the heterogeneity of XML data, it is often more appropriate to permit approximate query matching and return ranked answers, in the spirit of Information Retrieval, than to return only exact answers. In this paper, we study the problem of approximate XML query matching, based on tree pattern relaxations, and devise efficient algorithms to evaluate relaxed tree patterns. We consider weighted tree patterns, where exact and relaxed weights, associated with nodes and edges of the tree pattern, are used to compute the scores of query answers. We are interested in the problem of finding answers whose scores are at least as large as a given threshold. We design data pruning algorithms where intermediate query results are filtered dynamically during the evaluation process. We develop an optimization that exploits scores of intermediate results to improve query evaluation efficiency. Finally, we show experimentally that our techniques outperform rewriting-based and post-pruning strategies.
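
Weighted scoring with threshold pruning can be sketched as below; this is an illustration, not the paper's evaluation algorithm. We assume each pattern edge carries an (exact, relaxed) weight pair, and that each candidate answer has already been annotated per edge as matched exactly, matched only after relaxation (for example, a parent-child edge satisfied as ancestor-descendant), or not matched at all. A candidate is discarded as soon as its partial score plus the best score still obtainable from the remaining edges falls below the threshold, which is the intermediate-result filtering idea in miniature. The data layout and the name `evaluate` are hypothetical.

```python
def evaluate(weights, candidates, threshold):
    """Return (answer, score) pairs with score >= threshold, best first.

    weights:    list of (exact_weight, relaxed_weight), one per pattern edge
    candidates: answer -> list holding "exact", "relaxed" or None per edge
    """
    # best_rest[i] is the largest score still obtainable from edges i..end.
    best_rest = [0.0] * (len(weights) + 1)
    for i in range(len(weights) - 1, -1, -1):
        best_rest[i] = best_rest[i + 1] + max(weights[i])
    answers = []
    for answer, matches in candidates.items():
        score = 0.0
        for i, how in enumerate(matches):
            exact, relaxed = weights[i]
            if how == "exact":
                score += exact
            elif how == "relaxed":
                score += relaxed
            # Prune once even perfect later matches cannot reach the threshold.
            if score + best_rest[i + 1] < threshold:
                break
        else:
            answers.append((answer, score))
    return sorted(answers, key=lambda a: -a[1])


weights = [(3, 1), (2, 1)]
candidates = {"n1": ["exact", "exact"],
              "n2": ["relaxed", "exact"],
              "n3": ["relaxed", None]}
ranked = evaluate(weights, candidates, threshold=3)
```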

Journal Article•DOI•

Fast indexing and visualization of metric data sets using slim-trees

Caetano Traina¹, Agma J. M. Traina¹, Christos Faloutsos², Bernhard Seeger³•Institutions (3)

University of São Paulo¹, Carnegie Mellon University², University of Marburg³

01 Mar 2002-IEEE Transactions on Knowledge and Data Engineering

TL;DR: The slim-tree is a metric access method that tackles the problem of overlap between nodes in metric spaces and allows that overlap to be minimized; the paper also shows how to improve the performance of a metric tree through the proposed "slim-down" algorithm.

Abstract: Many recent database applications need to deal with similarity queries. For such applications, it is important to measure the similarity between two objects using the distance between them. Focusing on this problem, this paper proposes the slim-tree, a new dynamic tree for organizing metric data sets in pages of fixed size. The slim-tree uses the triangle inequality to prune the distance calculations that are needed to answer similarity queries over objects in metric spaces. The proposed insertion algorithm uses new policies to select the nodes where incoming objects are stored. When a node overflows, the slim-tree uses a minimal spanning tree to help with the splitting. The new insertion algorithm leads to a tree with high storage utilization and improved query performance. The slim-tree is a metric access method that tackles the problem of overlaps between nodes in metric spaces and that allows one to minimize the overlap. The proposed "fat-factor" is a way to quantify whether a given tree can be improved and also to compare two trees. We show how to use the fat-factor to achieve accurate estimates of the search performance and also how to improve the performance of a metric tree through the proposed "slim-down" algorithm. This paper also presents a new tool in the slim-tree's arsenal of resources, aimed at visualizing it. Visualization is a powerful tool for interactive data mining and for the visual tracking of the behavior of a tree under updates. Finally, we present a formula to estimate the number of disk accesses in range queries. Results from experiments with real and synthetic data sets show that the new slim-tree algorithms lead to performance improvements. These results show that the slim-tree outperforms the M-tree by up to 200% for range queries. For insertion and splitting, the minimal-spanning-tree-based algorithm achieves up to 40 times faster insertions. We observed improvements of up to 40% in range queries after applying the slim-down algorithm.
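
The minimal-spanning-tree split mentioned above can be sketched as follows. The abstract only says that an MST "helps with the splitting"; this sketch assumes the common variant that builds the MST over the overflowing node's entries (Prim's algorithm), deletes its longest edge, and uses the two resulting components as the new nodes, each represented by the object whose largest distance to the rest of its group is smallest. It ignores page capacity and balance constraints, and the names `mst_split` and `representative` are hypothetical.

```python
import math


def mst_split(objects, dist):
    """Split entries (len >= 2) into two groups by cutting the longest MST edge."""
    n = len(objects)
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known edge connecting i to the tree
    link = [-1] * n         # tree endpoint of that edge
    best[0] = 0.0
    edges = []
    for _ in range(n):      # Prim's algorithm, O(n^2) distance computations
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if link[u] >= 0:
            edges.append((best[u], link[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = dist(objects[u], objects[v])
                if d < best[v]:
                    best[v], link[v] = d, u
    edges.remove(max(edges))  # delete the longest edge
    adj = {i: [] for i in range(n)}
    for _, a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    seen, stack = {0}, [0]    # component containing entry 0
    while stack:
        for v in adj[stack.pop()]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    left = [objects[i] for i in range(n) if i in seen]
    right = [objects[i] for i in range(n) if i not in seen]
    return left, right


def representative(group, dist):
    """The object whose largest distance to the rest of its group is smallest."""
    return min(group, key=lambda o: max(dist(o, x) for x in group))


entries = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11)]
left, right = mst_split(entries, math.dist)
```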

Patent•

System and method for filtering XML documents with XPath expressions

Chee-Yong Chan¹, Pascal Felber¹, Minos Garofalakis¹, Rajeev Rastogi¹•Institutions (1)

Alcatel-Lucent¹

09 Jul 2002

TL;DR: This patent describes a system and method for filtering an XML document with XPath expressions, and a selective data dissemination system incorporating them, in which a tree prober associated with the tree builder employs the XPath expression tree to probe the document data tree and obtain matches with the substrings.

Abstract: A system for, and method of, filtering an XML document with XPath expressions and a selective data dissemination system incorporating the system or the method. In one embodiment, the filtering system includes: (1) a tree builder that builds a document data tree for the XML document and an XPath expression tree based on substrings in the XPath expressions and (2) a tree prober, associated with the tree builder, that employs the XPath expression tree to probe the document data tree and obtain matches with the substrings.
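
The builder/prober split can be illustrated with a much simpler sketch than the patent's substring-based expression tree. It assumes only absolute child-axis paths such as `/catalog/book/title` (no `//`, wildcards, or predicates). The expressions are merged into a trie so that shared prefixes are stored once, and a single walk over the parsed document advances through the trie, recording every expression whose last step is reached. Only the standard-library `xml.etree.ElementTree` parser is used; `build_expression_tree` and `probe` are hypothetical names.

```python
import xml.etree.ElementTree as ET


def build_expression_tree(expressions):
    """Merge simple /a/b/c paths into a trie that shares common prefixes."""
    root = {"children": {}, "ids": []}
    for eid, expr in enumerate(expressions):
        node = root
        for step in expr.strip("/").split("/"):
            node = node["children"].setdefault(step, {"children": {}, "ids": []})
        node["ids"].append(eid)  # expression eid ends at this trie node
    return root


def probe(xml_text, expressions):
    """Return indices of expressions that match at least one element path."""
    trie = build_expression_tree(expressions)
    matched = set()

    def walk(elem, trie_node):
        nxt = trie_node["children"].get(elem.tag)
        if nxt is None:
            return  # no expression continues along this element
        matched.update(nxt["ids"])
        for child in elem:
            walk(child, nxt)

    walk(ET.fromstring(xml_text), trie)
    return sorted(matched)


subscriptions = ["/catalog/book/title", "/catalog/book/price", "/catalog/dvd"]
hits = probe("<catalog><book><title>X</title></book></catalog>", subscriptions)
```

In a dissemination setting, `hits` would identify which subscribers receive the document.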

Journal Article•DOI•

Automated ortholog inference from phylogenetic trees and calculation of orthology reliability.

Christian E. V. Storm¹, Erik L. L. Sonnhammer¹•Institutions (1)

Karolinska Institutet¹

01 Jan 2002-Bioinformatics

TL;DR: A novel method is presented that resolves the problem of finding orthologs by analyzing a set of bootstrap trees instead of the optimal tree and calculates orthology support levels for all pairwise combinations of homologous sequences of two species.

Abstract: Motivation: Orthologous proteins in different species are likely to have similar biochemical function and biological role. When annotating a newly sequenced genome by sequence homology, the most precise and reliable functional information can thus be derived from orthologs in other species. A standard method of finding orthologs is to compare the sequence tree with the species tree. However, since the topology of a phylogenetic tree is not always reliable, one might get incorrect assignments. Results: Here we present a novel method that resolves this problem by analyzing a set of bootstrap trees instead of the optimal tree. The frequency of orthology assignments in the bootstrap trees can be interpreted as a support value for the possible orthology of the sequences. Our method is efficient enough to analyze data in the scale of whole genomes. It is implemented in Java and calculates orthology support levels for all pairwise combinations of homologous sequences of two species. The method was tested on simulated datasets and on real data of homologous proteins. Availability: Downloadable free of charge from ftp://ftp.cgb.ki.se/pub/prog/orthostrapper/ or on request from the authors.
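
The bootstrap support idea reduces to counting: infer ortholog pairs in every bootstrap tree and report, for each pair, the fraction of trees that contain it. The sketch below is not the authors' Java implementation. In place of their comparison with the species tree it swaps in the simpler species-overlap rule (an internal node is a speciation if its two subtrees share no species, and every cross pair across a speciation node is orthologous). It assumes rooted binary gene trees written as nested tuples with leaf names of the form `<species>_<gene>`, a hypothetical format.

```python
from collections import Counter
from itertools import product


def leaves(tree):
    return [tree] if isinstance(tree, str) else [x for c in tree for x in leaves(c)]


def species(gene):
    return gene.split("_", 1)[0]  # assumed naming: "<species>_<gene>"


def orthologs(tree):
    """Ortholog pairs in one rooted binary gene tree (species-overlap rule)."""
    pairs = set()

    def walk(t):
        if isinstance(t, str):
            return
        left, right = leaves(t[0]), leaves(t[1])
        # Disjoint species sets mark a speciation node.
        if not {species(g) for g in left} & {species(g) for g in right}:
            for a, b in product(left, right):
                pairs.add(tuple(sorted((a, b))))
        walk(t[0])
        walk(t[1])

    walk(tree)
    return pairs


def orthology_support(bootstrap_trees):
    """Fraction of bootstrap trees in which each pair is inferred orthologous."""
    counts = Counter(p for t in bootstrap_trees for p in orthologs(t))
    return {p: n / len(bootstrap_trees) for p, n in counts.items()}


t1 = (("A_1", "B_1"), ("A_2", "B_2"))  # root is a duplication
t2 = (("A_1", "B_2"), ("A_2", "B_1"))
support = orthology_support([t1, t1, t2])
```

A pair supported by two of three bootstrap trees gets support 2/3, which is the kind of value the abstract proposes reporting instead of a single yes/no assignment.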
