Showing papers on "Tree (data structure)" published in 2001


Patent
09 Apr 2001
TL;DR: A system and apparatus for the efficient and reliable control and distribution of data files or portions of files, applications, or other data objects in large-scale distributed networks.
Abstract: The present invention provides a system and apparatus for efficient and reliable control and distribution of data files or portions of files, applications, or other data objects in large-scale distributed networks. A unique content-management front-end provides efficient controls for triggering distribution of digitized data content to selected groups of a large number of remote computer servers. Transport-layer protocols interact with distribution controllers to automatically determine an optimized tree-like distribution sequence to group leaders selected by network devices at each remote site. Reliable store-and-forward transfer to clusters is accomplished using a unicast protocol in the ordered tree sequence. Once command messages and content arrive at all participating group leaders, local hybrid multicast protocols efficiently and reliably distribute them to the back-end nodes for interpretation and execution. Positive acknowledgement is then sent back to the content manager from each group leader, and the updated content in each remote device autonomously goes 'live' when the content change is locally completed.

1,261 citations


Journal ArticleDOI
TL;DR: A new approach to the analysis of gene expression data coming from DNA array experiments, using an unsupervised neural network that applies to any data providing that they can be coded as a series of numbers and that a computable measure of similarity between data items can be used.
Abstract: Motivation: We describe a new approach to the analysis of gene expression data coming from DNA array experiments, using an unsupervised neural network. DNA array technologies allow monitoring thousands of genes rapidly and efficiently. One of the interests of these studies is the search for correlated gene expression patterns, and this is usually achieved by clustering them. The Self-Organising Tree Algorithm, (SOTA) (Dopazo,J. and Carazo,J.M. (1997) J. Mol. Evol., 44, 226‐233), is a neural network that grows adopting the topology of a binary tree. The result of the algorithm is a hierarchical cluster obtained with the accuracy and robustness of a neural network. Results: SOTA clustering confers several advantages over classical hierarchical clustering methods. SOTA is a divisive method: the clustering process is performed from top to bottom, i.e. the highest hierarchical levels are resolved before going to the details of the lowest levels. The growing can be stopped at the desired hierarchical level. Moreover, a criterion to stop the growing of the tree, based on the approximate distribution of probability obtained by randomisation of the original data set, is provided. By means of this criterion, a statistical support for the definition of clusters is proposed. In addition, obtaining average gene expression patterns is a built-in feature of the algorithm. Different neurons defining the different hierarchical levels represent the averages of the gene expression patterns contained in the clusters. Since SOTA runtimes are approximately linear with the number of items to be classified, it is especially suitable for dealing with huge amounts of data. The method proposed is very general and applies to any data providing that they can be coded as a series of numbers and that a computable measure of similarity between data items can be used. Availability: A server running the program can be found at: http://bioinfo.cnio.es/sotarray

641 citations


Journal ArticleDOI
TL;DR: This paper provides an implementation of the tree projection method which is up to one order of magnitude faster than other recent techniques in the literature and has a well-structured data access pattern which provides data locality and reuse of data for multiple levels of the cache.

602 citations


Journal ArticleDOI
TL;DR: This work presents the first practical algorithm for the optimal linear leaf ordering of trees that are generated by hierarchical clustering, and shows how optimal leaf ordering can reveal biological structure that is not observed with an existing heuristic ordering method.
Abstract: We present the first practical algorithm for the optimal linear leaf ordering of trees that are generated by hierarchical clustering. Hierarchical clustering has been extensively used to analyze gene expression data, and we show how optimal leaf ordering can reveal biological structure that is not observed with an existing heuristic ordering method. For a tree with n leaves, there are 2^(n-1) linear orderings consistent with the structure of the tree. Our optimal leaf ordering algorithm runs in time O(n^4), and we present further improvements that make the running time of our algorithm practical.

483 citations
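
The 2^(n-1) count in the leaf-ordering abstract above follows from the fact that each of the n-1 internal nodes of a binary tree can be flipped independently. The sketch below is a brute-force illustration for very small trees, not the authors' O(n^4) algorithm. The nested-tuple tree encoding, the function names, and the choice of "sum of distances between adjacent leaves" as the objective are assumptions made for this example.

```python
from itertools import product


def orderings(tree):
    """Yield every leaf ordering obtainable by flipping internal nodes.

    A tree is either a leaf (any non-tuple value) or a pair (left, right).
    """
    if not isinstance(tree, tuple):
        yield [tree]
        return
    left, right = tree
    for l, r in product(list(orderings(left)), list(orderings(right))):
        yield l + r  # children in their original order
        yield r + l  # this internal node flipped


def best_ordering(tree, dist):
    """Brute force: pick the ordering with the smallest adjacent-leaf cost."""
    def cost(order):
        return sum(dist[a][b] for a, b in zip(order, order[1:]))
    return min(orderings(tree), key=cost)
```

Because the search enumerates all 2^(n-1) orderings, it is only usable for a handful of leaves; it is meant to make the search space concrete, which is what the dynamic-programming algorithm in the paper avoids enumerating.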


Journal ArticleDOI
TL;DR: It is proved that marginal cost and Shapley value have a natural algorithm that uses only two messages per link of the multicast tree, while it is shown that the welfare value achieved by an optimal multicast tree is NP-hard to approximate within any constant factor, even for bounded-degree networks.

446 citations


ReportDOI
13 Aug 2001
TL;DR: This work proposes a heuristic and a data-structure that network devices (such as routers) can use to detect (and eliminate) denial-of-service bandwidth attacks.
Abstract: A denial-of-service bandwidth attack is an attempt to disrupt an online service by generating a traffic overload that clogs links or causes routers near the victim to crash. We propose a heuristic and a data-structure that network devices (such as routers) can use to detect (and eliminate) such attacks. With our method, each network device maintains a data-structure, MULTOPS, that monitors certain traffic characteristics. MULTOPS (MUlti-Level Tree for Online Packet Statistics) is a tree of nodes that contains packet rate statistics for subnet prefixes at different aggregation levels. The tree expands and contracts within a fixed memory budget. A network device using MULTOPS detects ongoing bandwidth attacks by the significant, disproportional difference between packet rates going to and coming from the victim or the attacker. MULTOPS-equipped routing software running on an off-the-shelf 700 MHz Pentium III PC can process up to 340,000 packets per second.

412 citations
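
The idea in the MULTOPS abstract above, per-prefix statistics kept in a tree and checked for a disproportion between traffic to and from a prefix, can be sketched roughly as follows. This is a heavily simplified illustration, not the MULTOPS implementation: it keeps raw packet counts instead of rates, always expands to a fixed depth, ignores the memory budget, and only checks the victim-side direction. The class names, the `ratio` threshold, and `min_packets` are assumptions made for this example.

```python
class Node:
    """Packet counters for one address prefix, plus its child prefixes."""

    def __init__(self):
        self.to_count = 0    # packets sent towards this prefix
        self.from_count = 0  # packets originating from this prefix
        self.children = {}   # next address byte -> Node


class PrefixStats:
    def __init__(self, depth=4):
        self.root = Node()
        self.depth = depth  # aggregation levels (bytes of a dotted IPv4 address)

    def _path(self, addr):
        """Yield the node for each prefix of addr, creating nodes as needed."""
        node = self.root
        for byte in addr.split(".")[: self.depth]:
            node = node.children.setdefault(int(byte), Node())
            yield node

    def record(self, src, dst):
        """Account for one packet travelling from src to dst."""
        for node in self._path(dst):
            node.to_count += 1
        for node in self._path(src):
            node.from_count += 1

    def suspicious(self, ratio=3.0, min_packets=10):
        """Return prefixes receiving far more packets than they send."""
        found = []

        def visit(node, prefix):
            for byte, child in sorted(node.children.items()):
                path = prefix + [str(byte)]
                total = child.to_count + child.from_count
                if total >= min_packets and child.to_count > ratio * max(child.from_count, 1):
                    found.append(".".join(path))
                visit(child, path)

        visit(self.root, [])
        return found
```

A symmetric check on `from_count` versus `to_count` would flag the attacker side in the same way.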


Proceedings Article
11 Sep 2001
TL;DR: The MV3R-tree is proposed, a structure that utilizes the concepts of multi-version B-trees and 3D-Rtrees that compares favorably with specialized structures aimed at timestamp and interval window queries, both in terms of time and space requirements.
Abstract: Among the various types of spatio-temporal queries, the most common ones involve window queries in time. In particular, timestamp (or timeslice) queries retrieve all objects that intersect a window at a specific timestamp. Interval queries include multiple (usually consecutive) timestamps. Although several indexes have been developed for either type, currently there does not exist a structure that can efficiently process both query types. This is a significant problem due to the fundamental importance of these queries in any spatio-temporal system that deals with historical information retrieval. Our paper addresses the problem by proposing the MV3R-tree, a structure that utilizes the concepts of multi-version B-trees and 3D-Rtrees. Extensive experimentation proves that MV3R-trees compare favorably with specialized structures aimed at timestamp and interval window queries, both in terms of time and space requirements.

392 citations


Journal ArticleDOI
TL;DR: Two procedures that guarantee the property of additivity among the components of tree biomass and total tree biomass utilizing nonlinear functions are developed.
Abstract: Two procedures that guarantee the property of additivity among the components of tree biomass and total tree biomass utilizing nonlinear functions are developed. Procedure 1 is a simple combination...

323 citations


Posted Content
TL;DR: A large-scale experimental comparison of logistic regression and tree induction is presented, assessing classification accuracy and the quality of rankings based on class-membership probabilities, and a learning-curve analysis is used to examine the relationship of these measures to the size of the training set.
Abstract: Tree induction and logistic regression are two standard, off-the-shelf methods for building models for classification. We present a large-scale experimental comparison of logistic regression and tree induction, assessing classification accuracy and the quality of rankings based on class-membership probabilities. We use a learning-curve analysis to examine the relationship of these measures to the size of the training set. The results of the study show several remarkable things. (1) Contrary to prior observations, logistic regression does not generally outperform tree induction. (2) More specifically, and not surprisingly, logistic regression is better for smaller training sets and tree induction for larger data sets. Importantly, this often holds for training sets drawn from the same domain (i.e., the learning curves cross), so conclusions about induction-algorithm superiority on a given domain must be based on an analysis of the learning curves. (3) Contrary to conventional wisdom, tree induction is effective at producing probability-based rankings, although apparently comparatively less so for a given training-set size than at making classifications. Finally, (4) the domains on which tree induction and logistic regression are ultimately preferable can be characterized surprisingly well by a simple measure of signal-to-noise ratio.

319 citations


Patent
18 May 2001
TL;DR: A method for initializing a new node in a network arranged as a virtual tree: a query is automatically sent to the nodes to determine what contents to download, and the desired contents are then downloaded from the least congested nodes that hold the needed block files.
Abstract: A method for initializing a new node in a network. The network has multiple nodes arranged in a virtual tree format. The new node is a node of the tree, and each node of the tree has a set of attributes and a set of rolled up attributes to identify each node. A query is automatically sent to the nodes to determine what contents to download. The contents are then stored as block files in the nodes. The query contains the set of attributes and rolled up attributes for the new node. The query receives replies from a subset of the nodes that have the contents needed for the new node. Each reply identifies what subset of the block files is available and the performance characteristics of that replying node. Then the desired contents are downloaded from the subset of the block files held by the nodes that are least congested.

314 citations


01 Jan 2001
TL;DR: This paper surveys the main consensus tree methods used in phylogenetics, and explores the links between the different methods, producing a classification of consensus tree methods.
Abstract: A consensus tree method takes a collection of phylogenetic trees and outputs a single “representative” tree. The first consensus method was proposed by Adams in 1972. Since then a large variety of different methods have been developed, and there has been considerable debate over how they should be used. This paper has two goals. First, we survey the main consensus tree methods used in phylogenetics. Second, we explore, pretty exhaustively, the links between the different methods, producing a classification of consensus tree methods.

01 Jan 2001
TL;DR: This master thesis compares different descriptors that describe the leaves' different features and looks at different classification models to build a system that could classify the different tree classes.
Abstract: The aim of this master thesis is to classify the tree class from an image of a leaf with a computer vision classification system. We compare different descriptors that will describe the leaves dif ...

Proceedings ArticleDOI
01 May 2001
TL;DR: A fast algorithm, CDM, is presented that identifies and eliminates local redundancies due to ICs, based on propagating “information labels” up the tree pattern. The paper also shows the surprising result that the algorithm obtained by first augmenting the tree pattern using ICs, and then applying CIM, always finds the unique minimal equivalent query.
Abstract: Tree patterns form a natural basis to query tree-structured data such as XML and LDAP. Since the efficiency of tree pattern matching against a tree-structured database depends on the size of the pattern, it is essential to identify and eliminate redundant nodes in the pattern and do so as quickly as possible. In this paper, we study tree pattern minimization both in the absence and in the presence of integrity constraints (ICs) on the underlying tree-structured database. When no ICs are considered, we call the process of minimizing a tree pattern, constraint-independent minimization. We develop a polynomial time algorithm called CIM for this purpose. CIM's efficiency stems from two key properties: (i) a node cannot be redundant unless its children are, and (ii) the order of elimination of redundant nodes is immaterial. When ICs are considered for minimization, we refer to it as constraint-dependent minimization. For tree-structured databases, required child/descendant and type co-occurrence ICs are very natural. Under such ICs, we show that the minimal equivalent query is unique. We show the surprising result that the algorithm obtained by first augmenting the tree pattern using ICs, and then applying CIM, always finds the unique minimal equivalent query; we refer to this algorithm as ACIM. While ACIM is also polynomial time, it can be expensive in practice because of its inherent non-locality. We then present a fast algorithm, CDM, that identifies and eliminates local redundancies due to ICs, based on propagating “information labels” up the tree pattern. CDM can be applied prior to ACIM for improving the minimization efficiency. We complement our analytical results with an experimental study that shows the effectiveness of our tree pattern minimization techniques.


Journal ArticleDOI
TL;DR: A critique is presented of the use of tree-based partitioning algorithms to formulate classification rules and identify subgroups from clinical and epidemiological data, and the issue of redundancy in tree-derived decision rules is discussed.

Journal ArticleDOI
TL;DR: In this article, a cost-based approach to multicast pricing, based on accurate characterization of multicast scalability, will facilitate the efficient and equitable resource allocation between traffic types, and a price ceiling should be set to account for the effect of tree saturation.
Abstract: Multicast and unicast traffic share and compete for network resources. A cost-based approach to multicast pricing, based on accurate characterization of multicast scalability, will facilitate the efficient and equitable resource allocation between traffic types. Through the quantification of link usage, this paper establishes a multicast scaling relationship: the cost of a multicast distribution tree varies at the 0.8 power of the multicast group size. This result is validated with both real and generated networks, and is robust across topological styles and network sizes. Since multicast cost can be accurately predicted given the membership size, there is strong motivation to price multicast according to membership size. Furthermore, a price ceiling should be set to account for the effect of tree saturation. This tariff structure is superior to either a purely membership-based or a flat-rate pricing scheme, since it reflects the actual tree cost at all group membership levels.
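
The scaling relationship and tariff described in the abstract above can be written down in a few lines. This is a minimal sketch that assumes the cost is normalised so a group of size 1 costs one unicast path; the function names, the `unit_price` parameter, and the way the ceiling is applied (a simple cap) are illustrative assumptions rather than the paper's exact tariff.

```python
def tree_cost(group_size, unicast_cost=1.0, exponent=0.8):
    """Estimated multicast tree cost: unicast_cost * group_size ** 0.8."""
    return unicast_cost * group_size ** exponent


def price(group_size, unit_price=1.0, ceiling=None):
    """Membership-based price following the 0.8-power law, capped by a ceiling."""
    uncapped = tree_cost(group_size, unit_price)
    return uncapped if ceiling is None else min(uncapped, ceiling)
```

Under this law the cost per receiver falls as the group grows (a group of 32 costs about 16 unicast paths, or 0.5 per member), and the ceiling reflects the point where the tree saturates and adding members no longer adds links.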

Patent
18 Jun 2001
TL;DR: In this article, a method for parsing a stream of tokens representative of language usage is presented, where a set of packages, each representing a phrase-structure tree associated with a grammar, are stored in the system.
Abstract: A method of parsing a stream of tokens representative of language usage is provided in one embodiment. The method includes: a. storing a set of packages, each package being representative of a phrase-structure tree, each tree derived from a rule-based grammar; and b. parsing the stream using the packages to establish a structural description for the stream. In another embodiment, there is also provided a method of parsing a stream of tokens representative of language usage. The method of this embodiment includes: a. storing a set of packages, each package being representative of a phrase structure tree associated with a grammar, wherein a subset of the packages includes a set of relational descriptions, and b. parsing the stream using the packages to establish a structural description and a relational description of the stream.

Proceedings ArticleDOI
09 Jan 2001
TL;DR: This paper proposes a labeling scheme with maximum label size close to 3/2 log n, which is close to the lower bound of log n that follows from the fact that different vertices must have different labels.
Abstract: We consider the following problem. Given a rooted tree T, label the nodes of T in the most compact way such that given the labels of two nodes one can determine in constant time, by looking only at the labels, if one node is an ancestor of the other. The best known labeling scheme is rather straightforward and uses labels of size at most 2 log n, where n is the number of vertices in the tree. Our main result in this paper is a labeling scheme with maximum label size close to 3/2 log n. Our motivation for studying this problem is enhancing the performance of Web search engines. In the context of this application each indexed document is a tree and the labels of all trees are maintained in main memory. Therefore even small improvements in the maximum label size are important. There are no lower bounds known for this problem except for an obvious lower bound of log n that follows from the fact that different vertices must have different labels. The question whether one can find even shorter labels remains an intriguing open question.
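
For context, a classic way to reach labels of about 2 log n bits is interval labeling: each node stores its own preorder number and the largest preorder number in its subtree, and ancestry becomes interval containment. The sketch below assumes this is the kind of "straightforward" scheme the abstract refers to; it does not implement the paper's 3/2 log n scheme. The `children` dictionary encoding and the function names are assumptions made for this example.

```python
def interval_labels(children, root):
    """Label every node with (first, last) preorder numbers of its subtree.

    `children` maps a node to the list of its children. The traversal is
    iterative so that deep trees do not hit the recursion limit.
    """
    labels = {}
    first = {}
    counter = 0
    stack = [(root, False)]
    while stack:
        node, finished = stack.pop()
        if finished:
            # All descendants are numbered; counter - 1 is the last one.
            labels[node] = (first[node], counter - 1)
            continue
        first[node] = counter
        counter += 1
        stack.append((node, True))
        for child in reversed(children.get(node, [])):
            stack.append((child, False))
    return labels


def is_ancestor(label_u, label_v):
    """True if u is an ancestor of v (or u is v), using only the two labels."""
    return label_u[0] <= label_v[0] and label_v[1] <= label_u[1]
```

Each label holds two integers below n, which is where the 2 log n bound comes from, and the ancestry test needs only two comparisons.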

Journal ArticleDOI
TL;DR: It is shown that the restrictions of tree approximation cost little in terms of rates of approximation, and encoders for compression are designed that provide upper estimates for the Kolmogorov entropy of Besov balls.

Proceedings ArticleDOI
02 Apr 2001
TL;DR: A new index structure is introduced, the Rdnn-tree, that answers both RNN and NN queries efficiently and outperforms existing methods in various aspects, and makes the index structure extremely preferable in both static and dynamic cases.
Abstract: The Reverse Nearest Neighbor (RNN) problem is to find all points in a given data set whose nearest neighbor is a given query point. Just like the Nearest Neighbor (NN) queries, the RNN queries appear in many practical situations such as marketing and resource management. Thus, efficient methods for the RNN queries in databases are required. The paper introduces a new index structure, the Rdnn-tree, that answers both RNN and NN queries efficiently. A single index structure is employed for a dynamic database, in contrast to the use of multiple indexes in previous work. This leads to significant savings in dynamically maintaining the index structure. The Rdnn-tree outperforms existing methods in various aspects. Experiments on both synthetic and real world data show that our index structure outperforms previous methods by a significant margin (more than 90% in terms of number of leaf nodes accessed) in RNN queries. It also shows improvement in NN queries over standard techniques. Furthermore, performance in insertion and deletion is significantly enhanced by the ability to combine multiple queries (NN and RNN) in one traversal of the tree. These facts make our index structure extremely preferable in both static and dynamic cases.
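
The RNN definition in the abstract above can be made concrete with a brute-force reference implementation. This is only a sketch of the query semantics, not of the Rdnn-tree index: it scans every point instead of traversing a tree. The function names, the use of Euclidean distance, and the strict inequality for ties are assumptions made for this example.

```python
from math import dist


def nn_distances(points):
    """Distance from each point to its nearest neighbour in the data set."""
    return [
        min(dist(p, other) for j, other in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ]


def reverse_nearest_neighbors(points, query):
    """Return the points that would have `query` as their nearest neighbour.

    A point p qualifies when the query is strictly closer to p than p's
    current nearest neighbour. Requires at least two data points.
    """
    dnn = nn_distances(points)
    return [p for p, d in zip(points, dnn) if dist(p, query) < d]
```

For example, with points (0, 0), (1, 0) and (10, 0), a query at (9, 0) has (10, 0) as its only reverse nearest neighbour, even though the query's own nearest neighbour relation is a separate question.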

Patent
15 Mar 2001
TL;DR: In this paper, a plurality of web pages may be represented as a node and visualized on a dome tree, a three-dimensional image of a dome, with a portion of the outer wall removed, displayed on a two-dimensional monitor.
Abstract: A method and system for visualizing actual and predicted usage patterns through a web site is provided. A plurality of web pages may be represented as a node and visualized on a dome tree. The dome tree is a three-dimensional image of a dome, with a portion of the outer wall removed, displayed on a two-dimensional monitor. Paths into and out of each node are displayed using a variety of colors and patterns and information relating to the nodes and paths may also be accessed. By designating a web page as the root node each of the associated pages are laid out within the dome tree radially based on actual usage information. Predicted information for each node is displayed as a bar near the node, thereby assisting a user in understanding the relationship between actual and predicted usage patterns.

Proceedings ArticleDOI
29 Nov 2001
TL;DR: A new parallel algorithm MLFPT (multiple local frequent pattern tree) for parallel mining of frequent patterns, based on FP-growth mining, that uses only two full I/O scans of the database, eliminating the need for generating candidate items, and distributing the work fairly among processors.
Abstract: In this paper we introduce a new parallel algorithm MLFPT (multiple local frequent pattern tree) for parallel mining of frequent patterns, based on FP-growth mining, that uses only two full I/O scans of the database, eliminating the need for generating candidate items, and distributing the work fairly among processors. We have devised partitioning strategies at different stages of the mining process to achieve near optimal balancing between processors. We have successfully tested our algorithm on datasets larger than 50 million transactions.

Journal ArticleDOI
TL;DR: T-REX reconstructs phylogenetic trees and reticulation networks from distance matrices, and allows the user to visualize the obtained tree or network structures using Hierarchical, Radial or Axial types of tree drawing and manipulate them interactively.
Abstract: Summary: T-REX (tree and reticulogram reconstruction) is an application to reconstruct phylogenetic trees and reticulation networks from distance matrices. The application includes a number of tree fitting methods like NJ, UNJ or ADDTREE which have been very popular in phylogenetic analysis. At the same time, the software comprises several new methods of phylogenetic analysis such as: tree reconstruction using weights, tree inference from incomplete distance matrices or modeling a reticulation network for a collection of objects or species. T-REX also allows the user to visualize obtained tree or network structures using Hierarchical, Radial or Axial types of tree drawing and manipulate them interactively. Availability: T-REX is a freeware package available online at: http://www.fas.umontreal.ca/biol/casgrain/en/labo/t-rex

Journal ArticleDOI
TL;DR: An O(n^3 p^2)-time algorithm for finding an optimal residence set of size p for an object in a tree with n nodes, taking into consideration the read, write, and storage costs, and an O(n^3 p^2 Λ_max^2)-time algorithm for the capacity-constrained variant.
Abstract: We consider the problem of placing copies of objects in a tree network in order to minimize the cost of servicing read and write requests to objects when the tree nodes have limited storage and the number of copies permitted is limited. The set of nodes that have a copy of the object is the residence set of the object. A node wishing to read the object will read the object from the closest node in the residence set. A node wishing to update the object will update the copy of the object at all the nodes in the residence set. Updates are propagated over a certain minimum spanning tree. The cost associated with a residence set equals the cost of servicing all the read and write requests and the storage costs for those copies. We describe an O(n^3 p^2)-time algorithm for finding an optimal residence set of size p for an object in a tree with n nodes, taking into consideration the read, write, and storage costs. Furthermore, we describe an O(n^3 p^2 Λ_max^2)-time algorithm for finding a minimum cost normal p-residence set for an object in a tree, this time also taking into account the load imposed by the nodes of the tree on the nodes in a residence set and their capacity constraints, where Λ_max is an upper bound on the capacity of each node of the tree.

Book
01 Mar 2001
TL;DR: This book discusses how to construct and present trees using PAUP on a Windows or UNIX computer using Phylogenetic Analysis and other techniques.
Abstract: Tutorial: Create a Tree! - Additional Methods for Creating Trees - Presenting and Printing Your Trees - Fine-Tuning Alignments - Using MrBayes to Recreate Ancestral DNA SEQUENCES - Dealing With Some Common Problems - Appendix I - Appendix II

Proceedings ArticleDOI
02 Apr 2001
TL;DR: This work proposes several estimation algorithms that apply set hashing and maximal overlap to estimate the number of matches of query twiglets formed using variations on different twiglet decomposition techniques, and demonstrates that accurate and robust estimates can be achieved, even with limited space.
Abstract: Describes efficient algorithms for accurately estimating the number of matches of a small node-labeled tree, i.e. a twig, in a large node-labeled tree, using a summary data structure. This problem is of interest for queries on XML and other hierarchical data, to provide query feedback and for cost-based query optimization. Our summary data structure scalably represents approximate frequency information about twiglets (i.e. small twigs) in the data tree. Given a twig query, the number of matches is estimated by creating a set of query twiglets, and combining two complementary approaches: set hashing, used to estimate the number of matches of each query twiglet, and maximal overlap, used to combine the query twiglet estimates into an estimate for the twig query. We propose several estimation algorithms that apply these approaches on query twiglets formed using variations on different twiglet decomposition techniques. We present an extensive experimental evaluation using several real XML data sets, with a variety of twig queries. Our results demonstrate that accurate and robust estimates can be achieved, even with limited space.

Patent
05 Oct 2001
TL;DR: In this paper, the authors describe a hierarchical tree-based storage area network (SAN), where each processor is associated with multiple groups from respective levels of a hierarchy, e.g., a first processor group and a second processor group hierarchically descendant from the first group.
Abstract: A storage area network (SAN), e.g., as described above, includes one or more host digital data processors, each having a file system that effects access to one or more storage devices. Each processor is associated with multiple groups from respective levels of a hierarchy, e.g., a first processor group and a second processor group hierarchically descendant from the first processor group. A process, e.g., executing on a manager digital data processor, includes a graphical user interface that displays the processor groups as a hierarchical tree. Along, for example, with the identities of the processor groups, nodes of the displayed tree list attributes of the policy defined for each respective group.

Patent
21 Dec 2001
TL;DR: In this article, the EKB has a structure in which a device constituting a tree leaf holds a leaf key and a limited node key, and a specific effective key block (EKB) is generated and distributed to a group specified by a specific node.
Abstract: A content key, an authentication key, and program data, along with an effective key block (EKB), are transmitted by an encryption key structure of a tree structure. The EKB has a structure in which a device constituting a tree leaf holds a leaf key and a limited node key. A specific effective key block (EKB) is generated and distributed to a group specified by a specific node, thus limiting an updateable device. A device not belonging to the group cannot decode the EKB, ensuring the distribution security of the key and so forth. Keys or data are distributed by an encryption key structure of a tree structure, thereby providing an information processing system and method capable of efficiently and safely distributing data.

Patent
26 Jul 2001
TL;DR: In this paper, a method comprising: providing a navigation tree comprising a semantic, hierarchical structure, having one or more paths associated with content of a conventional markup language document and a grammar comprising vocabulary including one-or more keywords; receiving a request to access the content; responsive to the request, traversing a path in the navigation tree, if the request includes at least one keyword of the vocabulary, is provided.
Abstract: A method comprising: providing a navigation tree comprising a semantic, hierarchical structure, having one or more paths associated with content of a conventional markup language document and a grammar comprising vocabulary including one or more keywords; receiving a request to access the content; responsive to the request, traversing a path in the navigation tree, if the request includes at least one keyword of the vocabulary, is provided.

Patent
20 Sep 2001
TL;DR: In this paper, an asynchronous distributed authoring and problem solving system, method, and computer program for focusing attention toward particular authoring/problem solving topics using a threaded discussion group and reward matrix is presented.
Abstract: Systems and methods facilitating authoring and problem solving by joint contributors working separately but against a common goal. On-line asynchronous distributed authoring and problem solving system, method, and computer program for focusing attention toward particular authoring and problem solving topics using a threaded discussion group and reward matrix. System, method, computer program and computer program product for coordinating the activities of a plurality of people, where the plurality may be any number from two to thousands or more people. Mechanism for directing the attention and focus of large numbers of people who are solving problems using a tree-based problem space, where the tree-based problem space may be a virtual problem space. Algorithms and procedures for evaluating nodes in the virtual problem space and assigning values via a pay-off matrix that serves to focus the attention of large numbers of problem solvers. Combination of threaded discussion groups with the pay-off matrix and a variety of algorithms to create a useful system for solving multi-level problems leveraging human expertise.