
Showing papers on "Tree (data structure) published in 2007"


Journal ArticleDOI
TL;DR: iTOL is a web-based tool for the display, manipulation and annotation of phylogenetic trees that can be interactively pruned and re-rooted.
Abstract: Summary: Interactive Tree Of Life (iTOL) is a web-based tool for the display, manipulation and annotation of phylogenetic trees. Trees can be interactively pruned and re-rooted. Various types of data such as genome sizes or protein domain repertoires can be mapped onto the tree. Export to several bitmap and vector graphics formats is supported. Availability: iTOL is available at http://itol.embl.de Contact: [email protected]

2,648 citations


Journal ArticleDOI
TL;DR: Dendroscope is a user-friendly program for visualizing and navigating phylogenetic trees, for both small and large datasets, and is optimized to run interactively on trees containing hundreds of thousands of taxa.
Abstract: Research in evolution requires software for visualizing and editing phylogenetic trees, for increasingly large datasets, such as arise in expression analysis or metagenomics, for example. It would be desirable to have a program that provides these services in an efficient and user-friendly way, and that can be easily installed and run on all major operating systems. Although a large number of tree visualization tools are freely available, some as a part of more comprehensive analysis packages, all have drawbacks in one or more domains. They either lack some of the standard tree visualization techniques or basic graphics and editing features, or they are restricted to small trees containing only tens of thousands of taxa. Moreover, many programs are difficult to install or are not available for all common operating systems. We have developed a new program, Dendroscope, for the interactive visualization and navigation of phylogenetic trees. The program provides all standard tree visualizations and is optimized to run interactively on trees containing hundreds of thousands of taxa. The program provides tree editing and graphics export capabilities. To support the inspection of large trees, Dendroscope offers a magnification tool. The software is written in Java 1.4 and installers are provided for Linux/Unix, MacOS X and Windows XP. Dendroscope is a user-friendly program for visualizing and navigating phylogenetic trees, for both small and large datasets.

1,235 citations


Proceedings ArticleDOI
24 May 2007
TL;DR: This paper presents an efficient algorithm for identifying similar subtrees and applies it to tree representations of source code. The algorithm is implemented as a clone detection tool called DECKARD and evaluated on large code bases written in C and Java, including the Linux kernel and the JDK.
Abstract: Detecting code clones has many software engineering applications. Existing approaches either do not scale to large code bases or are not robust against minor code modifications. In this paper, we present an efficient algorithm for identifying similar subtrees and apply it to tree representations of source code. Our algorithm is based on a novel characterization of subtrees with numerical vectors in the Euclidean space R^n and an efficient algorithm to cluster these vectors w.r.t. the Euclidean distance metric. Subtrees with vectors in one cluster are considered similar. We have implemented our tree similarity algorithm as a clone detection tool called DECKARD and evaluated it on large code bases written in C and Java including the Linux kernel and JDK. Our experiments show that DECKARD is both scalable and accurate. It is also language independent, applicable to any language with a formally specified grammar.
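DECKARD's core idea of summarizing subtrees as numerical vectors and treating nearby vectors as clone candidates can be sketched in a few lines. The tuple-based tree encoding, node kinds, and distance threshold below are illustrative assumptions, not DECKARD's actual vector definition (which counts relevant AST node types and clusters with locality-sensitive hashing):

```python
import math
from collections import Counter

def char_vector(tree, kinds):
    """Characteristic vector: counts of each node kind in the subtree.
    A tree is a nested tuple (kind, child, child, ...)."""
    counts = Counter()
    stack = [tree]
    while stack:
        kind, *children = stack.pop()
        counts[kind] += 1
        stack.extend(children)
    return [counts[k] for k in kinds]

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# Hypothetical ASTs: two identical loops and one unrelated expression.
kinds = ["for", "assign", "call", "add"]
t1 = ("for", ("assign",), ("call", ("add",)))
t2 = ("for", ("assign",), ("call", ("add",)))  # exact clone of t1
t3 = ("add", ("call",))

v1, v2, v3 = (char_vector(t, kinds) for t in (t1, t2, t3))
assert euclidean(v1, v2) == 0.0   # clone pair: identical vectors
assert euclidean(v1, v3) > 1.0    # unrelated subtree lies far away
```

In the real tool, vectors are generated for every subtree over a size threshold, so clustering replaces the pairwise O(n^2) tree comparison that makes exact tree matching unscalable.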

1,008 citations


Posted Content
TL;DR: An application to information retrieval in which documents are modeled as paths down a random tree, and the preferential attachment dynamics of the nCRP leads to clustering of documents according to sharing of topics at multiple levels of abstraction.
Abstract: We present the nested Chinese restaurant process (nCRP), a stochastic process which assigns probability distributions to infinitely-deep, infinitely-branching trees. We show how this stochastic process can be used as a prior distribution in a Bayesian nonparametric model of document collections. Specifically, we present an application to information retrieval in which documents are modeled as paths down a random tree, and the preferential attachment dynamics of the nCRP leads to clustering of documents according to sharing of topics at multiple levels of abstraction. Given a corpus of documents, a posterior inference algorithm finds an approximation to a posterior distribution over trees, topics and allocations of words to levels of the tree. We demonstrate this algorithm on collections of scientific abstracts from several journals. This model exemplifies a recent trend in statistical machine learning--the use of Bayesian nonparametric methods to infer distributions on flexible data structures.
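The generative step of the nCRP, drawing a path down the tree where each node acts as a Chinese restaurant, can be sketched as follows. The dict-based tree encoding and single concentration parameter gamma are assumptions for illustration; the paper's topics, word-level allocations, and posterior inference are omitted:

```python
import random

def ncrp_path(root, depth, gamma, rng):
    """Draw one path of `depth` levels down an nCRP tree. `root` is a
    dict mapping child id -> (customer_count, subtree_dict). A new child
    is opened with probability gamma / (n + gamma), where n counts the
    documents that previously passed through this node."""
    path, node = [], root
    for _ in range(depth):
        n = sum(c for c, _ in node.values())
        if not node or rng.random() < gamma / (n + gamma):
            child = len(node)                   # open a new branch
            node[child] = (0, {})
        else:
            r = rng.uniform(0, n)               # pick an existing branch
            for child, (c, _) in node.items():  # proportionally to count
                r -= c
                if r <= 0:
                    break
        count, sub = node[child]
        node[child] = (count + 1, sub)
        path.append(child)
        node = sub
    return path

rng = random.Random(0)
root = {}
paths = [ncrp_path(root, depth=3, gamma=1.0, rng=rng) for _ in range(50)]
assert all(len(p) == 3 for p in paths)
# Preferential attachment: far fewer distinct root branches than documents.
assert len({p[0] for p in paths}) < len(paths)
```

Because popular branches accumulate counts, later documents tend to reuse them, which is the clustering behavior the abstract describes.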

580 citations


Journal ArticleDOI
TL;DR: The change distilling algorithm is presented, a tree differencing algorithm for fine-grained source code change extraction that approximates the minimum edit script 45 percent better than the original change extraction approach by Chawathe et al.
Abstract: A key issue in software evolution analysis is the identification of particular changes that occur across several versions of a program. We present change distilling, a tree differencing algorithm for fine-grained source code change extraction. For that, we have improved the existing algorithm by Chawathe et al. for extracting changes in hierarchically structured data. Our algorithm extracts changes by finding both a match between the nodes of the two compared abstract syntax trees and a minimum edit script that can transform one tree into the other given the computed matching. As a result, we can identify fine-grained change types between program versions according to our taxonomy of source code changes. We evaluated our change distilling algorithm with a benchmark that we developed, which consists of 1,064 manually classified changes in 219 revisions of eight methods from three different open source projects. We achieved significant improvements in extracting types of source code changes: Our algorithm approximates the minimum edit script 45 percent better than the original change extraction approach by Chawathe et al. We are able to find all occurring changes and almost reach the minimum conforming edit script, that is, we reach a mean absolute percentage error of 34 percent, compared to the 79 percent reached by the original algorithm. The paper describes both our change distilling algorithm and the results of our evaluation.
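The match-then-edit-script idea can be illustrated with a toy tree differencer. This is not the change distilling algorithm itself (it ignores moves, updates, and the bigram string similarity used for matching); it is only a sketch of turning matched and unmatched nodes into an edit script, using hypothetical AST labels:

```python
def edit_script(old, new):
    """Flatten each tree to (path, label) pairs, match equal pairs, and
    emit deletes/inserts for the rest. Trees are nested tuples:
    (label, child, child, ...)."""
    def flatten(tree, path=()):
        label, *children = tree
        yield (path, label)
        for i, child in enumerate(children):
            yield from flatten(child, path + (i,))
    a, b = set(flatten(old)), set(flatten(new))
    return (sorted(("delete", p, l) for p, l in a - b)
            + sorted(("insert", p, l) for p, l in b - a))

# Hypothetical method ASTs: a call inside an if-statement is renamed.
old = ("method", ("if", ("call-foo",)), ("return",))
new = ("method", ("if", ("call-bar",)), ("return",))
script = edit_script(old, new)
assert script == [("delete", (0, 0), "call-foo"),
                  ("insert", (0, 0), "call-bar")]
```

A real differencer would classify this delete/insert pair at the same path as a single fine-grained "update" change, which is the kind of taxonomy the paper builds.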

566 citations


Journal ArticleDOI
TL;DR: It is shown that finding a control strategy leading to the desired global state is computationally intractable (NP-hard) in general, and this hardness result is extended to Boolean networks (BNs) with considerably restricted network structures.

475 citations


Proceedings ArticleDOI
01 May 2007
TL;DR: It is shown that the main factors contributing to the inferior performance of the tree-based approach are the static mapping of content to a particular tree, and the placement of each peer as an internal node in one tree and as a leaf in all other trees.
Abstract: Existing approaches to P2P streaming can be divided into two general classes: (i) tree-based approaches use push-based content delivery over multiple tree-shaped overlays, and (ii) mesh-based approaches use swarming content delivery over a randomly connected mesh. Previous studies have often focused on a particular P2P streaming mechanism and no comparison between these two classes has been conducted. In this paper, we compare and contrast the performance of representative protocols from each class using simulations. We identify the similarities and differences between these two approaches. Furthermore, we separately examine the behavior of content delivery and overlay construction mechanisms for both approaches in static and dynamic scenarios. Our results indicate that the mesh-based approach consistently exhibits superior performance over the tree-based approach. We also show that the main factors contributing to the inferior performance of the tree-based approach are (i) the static mapping of content to a particular tree, and (ii) the placement of each peer as an internal node in one tree and as a leaf in all other trees.

400 citations


Journal ArticleDOI
TL;DR: Skip graphs, as described in this paper, are a distributed data structure based on skip lists that provides the full functionality of a balanced tree in a distributed system where resources are stored in separate nodes that may fail at any time.
Abstract: Skip graphs are a novel distributed data structure, based on skip lists, that provide the full functionality of a balanced tree in a distributed system where resources are stored in separate nodes that may fail at any time. They are designed for use in searching peer-to-peer systems, and by providing the ability to perform queries based on key ordering, they improve on existing search tools that provide only hash table functionality. Unlike skip lists or other tree data structures, skip graphs are highly resilient, tolerating a large fraction of failed nodes without losing connectivity. In addition, simple and straightforward algorithms can be used to construct a skip graph, insert new nodes into it, search it, and detect and repair errors within it introduced due to node failures.
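The defining construction, random membership vectors whose prefixes partition nodes into per-level lists, can be sketched as follows. The sorted Python lists stand in for the doubly linked lists of a real skip graph, and routing, insertion, and repair are omitted:

```python
import random

def build_levels(keys, max_level, rng):
    """Assign each key a random binary membership vector; at level i,
    keys sharing an i-bit prefix form one ordered list."""
    vectors = {k: tuple(rng.randint(0, 1) for _ in range(max_level))
               for k in keys}
    levels = []
    for i in range(max_level + 1):
        lists = {}
        for k in sorted(keys):
            lists.setdefault(vectors[k][:i], []).append(k)
        levels.append(lists)
    return vectors, levels

rng = random.Random(1)
keys = [13, 21, 33, 48, 75, 99]
vectors, levels = build_levels(keys, max_level=3, rng=rng)

# Level 0 is a single list holding every key in search-key order,
# which is what enables range queries, unlike a plain hash table.
assert levels[0][()] == sorted(keys)
# At every level the lists partition the key set.
for lists in levels:
    assert sorted(k for lst in lists.values() for k in lst) == sorted(keys)
```

The resilience claim comes from this redundancy: each node belongs to one list per level, so losing a fraction of nodes leaves many alternative paths between survivors.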

324 citations


Journal ArticleDOI
TL;DR: A hierarchical classification of chemical scaffolds (molecular framework, which is obtained by pruning all terminal side chains) has been introduced, and it is demonstrated that the classification procedure robustly handles both synthetic structures and natural products.
Abstract: A hierarchical classification of chemical scaffolds (molecular framework, which is obtained by pruning all terminal side chains) has been introduced. The molecular frameworks form the leaf nodes in the hierarchy trees. By an iterative removal of rings, scaffolds forming the higher levels in the hierarchy tree are obtained. Prioritization rules ensure that less characteristic, peripheral rings are removed first. All scaffolds in the hierarchy tree are well-defined chemical entities making the classification chemically intuitive. The classification is deterministic, data-set-independent, and scales linearly with the number of compounds included in the data set. The application of the classification is demonstrated on two data sets extracted from the PubChem database, namely, pyruvate kinase binders and a collection of pesticides. The examples shown demonstrate that the classification procedure robustly handles both synthetic structures and natural products.

318 citations


Journal ArticleDOI
TL;DR: A Bayesian model to estimate the a posteriori probability of the object class, after a certain match at a node of the tree, is presented; it takes into account object scale and saliency and allows for a principled setting of the matching thresholds such that unpromising paths in the tree traversal process are eliminated early on.
Abstract: This paper presents a novel probabilistic approach to hierarchical, exemplar-based shape matching. No feature correspondence is needed among exemplars, just a suitable pairwise similarity measure. The approach uses a template tree to efficiently represent and match the variety of shape exemplars. The tree is generated offline by a bottom-up clustering approach using stochastic optimization. Online matching involves a simultaneous coarse-to-fine approach over the template tree and over the transformation parameters. The main contribution of this paper is a Bayesian model to estimate the a posteriori probability of the object class, after a certain match at a node of the tree. This model takes into account object scale and saliency and allows for a principled setting of the matching thresholds such that unpromising paths in the tree traversal process are eliminated early on. The proposed approach was tested in a variety of application domains. Here, results are presented on one of the more challenging domains: real-time pedestrian detection from a moving vehicle. A significant speed-up is obtained when comparing the proposed probabilistic matching approach with a manually tuned nonprobabilistic variant, both utilizing the same template tree structure.

305 citations


Journal Article
TL;DR: In this paper, a new time-series forecasting model based on the flexible neural tree (FNT) is introduced. Selecting the proper input variables or time-lags for constructing a time-series model is often difficult, and the FNT model is shown to handle this task automatically.
Abstract: Time-series forecasting is an important research and application area. Much effort has been devoted over the past several decades to develop and improve the time-series forecasting models. This paper introduces a new time-series forecasting model based on the flexible neural tree (FNT). The FNT model is generated initially as a flexible multi-layer feed-forward neural network and evolved using an evolutionary procedure. Very often it is a difficult task to select the proper input variables or time-lags for constructing a time-series model. Our research demonstrates that the FNT model is capable of handling the task automatically. The performance and effectiveness of the proposed method are evaluated using time series prediction problems and compared with those of related methods.

Proceedings ArticleDOI
30 Apr 2007
TL;DR: This work ports Foley et al.'s kd-restart algorithm from multi-pass, using CPU load balancing, to single pass, using current GPUs' branching and looping abilities, and introduces three optimizations: a packetized formulation, a technique for restarting partially down the tree instead of at the root, and a small, fixed-size stack that is checked before resorting to restart.
Abstract: Over the past few years, the powerful computation rates and high memory bandwidth of GPUs have attracted efforts to run raytracing on GPUs. Our work extends Foley et al.'s GPU k-d tree research. We port their kd-restart algorithm from multi-pass, using CPU load balancing, to single pass, using current GPUs' branching and looping abilities. We introduce three optimizations: a packetized formulation, a technique for restarting partially down the tree instead of at the root, and a small, fixed-size stack that is checked before resorting to restart. Our optimized implementation achieves 15-18 million primary rays per second and 16-27 million shadow rays per second on our test scenes. Our system also takes advantage of GPUs' strengths at rasterization and shading to offer a mode where rasterization replaces eye ray scene intersection, and primary hits and local shading are produced with standard Direct3D code. For 1024x1024 renderings of our scenes with shadows and Phong shading, we achieve 12-18 frames per second. Finally, we investigate the efficiency of our implementation relative to the computational resources of our GPUs and also compare it against conventional CPUs and the Cell processor, which both have been shown to raytrace well.

Journal ArticleDOI
TL;DR: In this paper, a methodology for individual tree-based species classification using high sampling density and small footprint lidar data is clarified, corrected and improved using a well-defined directed graph (digraph).
Abstract: In this paper, a methodology for individual tree-based species classification using high sampling density and small footprint lidar data is clarified, corrected and improved. For this purpose, a well-defined directed graph (digraph) is introduced and it plays a fundamental role in the approach. It is argued that there exists one and only one such unique digraph that describes all four pure events and resulting disjoint sets of laser points associated with a single tree in data from a two-return lidar system. However, the digraph is extendable so that it fits an n-return lidar system (n>2) with higher logical resolution. Furthermore, a mathematical notation for different types of groupings of the laser points is defined, and a new terminology for various types of individual tree-based concepts defined by the digraph is proposed. A novel calibration technique for estimating individual tree heights is evaluated. The approach replaces the unreliable maximum single laser point height of each tree with a more reliable prediction based on shape characteristics of a marginal height distribution of the whole first-return point cloud of each tree. The result shows a reduction of the RMSE of the tree heights of about 20% (stddev=1.1 m reduced to stddev=0.92 m). The method improves the species classification accuracy markedly, but it could also be used for reducing the sampling density at the time of data acquisition. Using the calibrated tree heights, a scale-invariant rescaled space for the universal set of points for each tree is defined, in which all individual tree-based geometric measurements are conducted. With the corrected and improved classification methodology the total accuracy rises from 60% to 64% for classifying three leaf-off individual tree deciduous species (N=200 each) in West Virginia, USA: oaks (Quercus spp.), red maple (Acer rubrum), and yellow poplar (Liriodendron tulipifera).

Proceedings Article
22 Oct 2007
TL;DR: This article introduces the bounded synthesis approach, which makes it possible to traverse this immense search space in a structured manner and demonstrates that bounded synthesis solves many synthesis problems that were previously considered intractable.
Abstract: The bounded synthesis problem is to construct an implementation that satisfies a given temporal specification and a given bound on the number of states. We present a solution to the bounded synthesis problem for linear-time temporal logic (LTL), based on a novel emptiness-preserving translation from LTL to safety tree automata. For distributed architectures, where standard unbounded synthesis is in general undecidable, we show that bounded synthesis can be reduced to a SAT problem. As a result, we obtain an effective algorithm for the bounded synthesis from LTL specifications in arbitrary architectures. By iteratively increasing the bound, our construction can also be used as a semi-decision procedure for the unbounded synthesis problem.

Journal ArticleDOI
TL;DR: In this article, an approach for delineating individual trees and estimating tree heights using LiDAR in coniferous (Pinus koraiensis, Larix leptolepis) and deciduous (Quercus spp.) forests in South Korea was presented.
Abstract: For estimation of tree parameters at the single-tree level using light detection and ranging (LiDAR), detection and delineation of individual trees is an important starting point. This paper presents an approach for delineating individual trees and estimating tree heights using LiDAR in coniferous (Pinus koraiensis, Larix leptolepis) and deciduous (Quercus spp.) forests in South Korea. To detect tree tops, the extended maxima transformation of morphological image-analysis methods was applied to the digital canopy model (DCM). In order to monitor spurious local maxima in the DCM, which cause false tree tops, different h values in the extended maxima transformation were explored. For delineation of individual trees, watershed segmentation was applied to the distance-transformed image from the detected tree tops. The tree heights were extracted using the maximum value within the segmented crown boundary. Thereafter, individual tree data estimated by LiDAR were compared to the field measurement data under five categories (correct delineation, satisfied delineation, merged tree, split tree, and not found). In our study, P. koraiensis, L. leptolepis, and Quercus spp. had the best detection accuracies of 68.1% at h = 0.18, 86.7% at h = 0.12, and 67.4% at h = 0.02, respectively. The coefficients of determination for tree height estimation were 0.77, 0.80, and 0.74 for P. koraiensis, L. leptolepis, and Quercus spp., respectively.

Proceedings ArticleDOI
29 Jul 2007
TL;DR: This paper proposes an approach for generating 3D models of natural-looking trees from images that has the additional benefit of requiring little user intervention and uses the shape patterns of visible branches to predict those of obscured branches.
Abstract: In this paper, we propose an approach for generating 3D models of natural-looking trees from images that has the additional benefit of requiring little user intervention. While our approach is primarily image-based, we do not model each leaf directly from images due to the large leaf count, small image footprint, and widespread occlusions. Instead, we populate the tree with leaf replicas from segmented source images to reconstruct the overall tree shape. In addition, we use the shape patterns of visible branches to predict those of obscured branches. We demonstrate our approach on a variety of trees.

Dissertation
01 Jan 2007
TL;DR: This thesis introduces the algorithm for building binary best- first decision trees for classification problems and investigates two new pruning methods that determine an appropriate tree size by combining best-first decision tree growth with cross-validation-based selection of the number of expansions that are performed.
Abstract: Decision trees are potentially powerful predictors and explicitly represent the structure of a dataset. Standard decision tree learners such as C4.5 expand nodes in depth-first order (Quinlan, 1993), while in best-first decision tree learners the "best" node is expanded first. The "best" node is the node whose split leads to maximum reduction of impurity (e.g. Gini index or information in this thesis) among all nodes available for splitting. The resulting tree will be the same when fully grown; just the order in which it is built is different. In practice, some branches of a fully-expanded tree do not truly reflect the underlying information in the domain. This problem is known as overfitting and is mainly caused by noisy data. Pruning is necessary to avoid overfitting the training data, and discards those parts that are not predictive of future data. Best-first node expansion enables us to investigate new pruning techniques by determining the number of expansions performed based on cross-validation. This thesis first introduces the algorithm for building binary best-first decision trees for classification problems. Then, it investigates two new pruning methods that determine an appropriate tree size by combining best-first decision tree growth with cross-validation-based selection of the number of expansions that are performed. One operates in a pre-pruning fashion and the other in a post-pruning fashion. They are called best-first-based pre-pruning and best-first-based post-pruning respectively in this thesis. Both of them use the same mechanisms and thus it is possible to compare the two on an even footing. Best-first-based pre-pruning stops splitting when further splitting increases the cross-validated error, while best-first-based post-pruning takes a fully-grown decision tree and then discards expansions based on the cross-validated error.
Because the two new pruning methods implement cross-validation-based pruning, it is possible to compare the two to another cross-validation-based pruning method: minimal cost-complexity pruning (Breiman et al., 1984). The two main results are that best-first-based pre-pruning is competitive with best-first-based post-pruning if the so-called ”one standard error rule” is used. However, minimal
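The best-first expansion order described above can be sketched with a priority queue keyed on impurity reduction. The 1-D threshold splits and toy dataset below are simplifying assumptions (the thesis handles general attributes and adds cross-validation-based pruning, omitted here):

```python
import heapq

def gini(labels):
    """Gini impurity of a list of 0/1 class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = labels.count(1) / n
    return 2 * p * (1 - p)

def best_split(data):
    """Best threshold split of 1-D points (x, label) by Gini reduction."""
    labels = [y for _, y in data]
    best = (0.0, None)
    for t in sorted({x for x, _ in data})[1:]:
        left = [y for x, y in data if x < t]
        right = [y for x, y in data if x >= t]
        gain = gini(labels) - (len(left) * gini(left)
                               + len(right) * gini(right)) / len(labels)
        if gain > best[0]:
            best = (gain, t)
    return best

def best_first_order(data, max_expansions):
    """Expand nodes in order of decreasing impurity reduction."""
    heap, order, counter = [], [], 0
    gain, t = best_split(data)
    heapq.heappush(heap, (-gain, counter, data, t))
    while heap and len(order) < max_expansions:
        neg_gain, _, node, t = heapq.heappop(heap)
        if t is None:          # pure or unsplittable node
            continue
        order.append((t, -neg_gain))
        for part in ([p for p in node if p[0] < t],
                     [p for p in node if p[0] >= t]):
            gain, pt = best_split(part)
            counter += 1
            heapq.heappush(heap, (-gain, counter, part, pt))
    return order

# Toy data: class 1 sits in the middle of the x axis.
data = [(0, 0), (1, 0), (2, 1), (3, 1), (4, 0), (5, 0)]
order = best_first_order(data, max_expansions=2)
assert [t for t, _ in order] == [2, 4]
```

A depth-first learner would grow the same fully-expanded tree; the heap only changes the order of expansions, which is what makes "stop after k expansions" a meaningful pruning knob.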

Journal ArticleDOI
TL;DR: The goal of this work is to design techniques and protocols that lead to efficient data aggregation without explicit maintenance of a structure, and proposes two corresponding mechanisms - data-aware anycast at the MAC layer and randomized waiting at the application layer.
Abstract: Data aggregation protocols can reduce the communication cost, thereby extending the lifetime of sensor networks. Prior works on data aggregation protocols have focused on tree-based or cluster-based structured approaches. Although structured approaches are suited for data gathering applications, they incur high maintenance overhead in dynamic scenarios for event-based applications. The goal of our work is to design techniques and protocols that lead to efficient data aggregation without explicit maintenance of a structure. As packets need to converge spatially and temporally for data aggregation, we propose two corresponding mechanisms - data-aware anycast at the MAC layer and randomized waiting at the application layer. We model the performance of the combined protocol that uses both the approaches and show that our analysis matches with the simulations. Using extensive simulations and experiments on a testbed with implementation in TinyOS, we study the performance and potential of structure-free data aggregation.

Proceedings ArticleDOI
20 Jun 2007
TL;DR: This paper first introduces Gaussian process hierarchies through a simple dynamical model, then extends the approach to a more complex hierarchy which is applied to the visualisation of human motion data sets.
Abstract: The Gaussian process latent variable model (GP-LVM) is a powerful approach for probabilistic modelling of high dimensional data through dimensional reduction. In this paper we extend the GP-LVM through hierarchies. A hierarchical model (such as a tree) allows us to express conditional independencies in the data as well as the manifold structure. We first introduce Gaussian process hierarchies through a simple dynamical model, we then extend the approach to a more complex hierarchy which is applied to the visualisation of human motion data sets.

Proceedings Article
01 Jun 2007
TL;DR: Evaluation on the ACE RDC corpora shows that the dynamic context-sensitive tree span is much more suitable for relation extraction than SPT and the tree kernel outperforms the state-of-the-art Collins and Duffy’s convolution tree kernel.
Abstract: This paper proposes a tree kernel with context-sensitive structured parse tree information for relation extraction. It resolves two critical problems in previous tree kernels for relation extraction in two ways. First, it automatically determines a dynamic context-sensitive tree span for relation extraction by extending the widely-used Shortest Path-enclosed Tree (SPT) to include necessary context information outside SPT. Second, it proposes a context-sensitive convolution tree kernel, which enumerates both context-free and context-sensitive sub-trees by considering their ancestor node paths as their contexts. Moreover, this paper evaluates the complementary nature between our tree kernel and a state-of-the-art linear kernel. Evaluation on the ACE RDC corpora shows that our dynamic context-sensitive tree span is much more suitable for relation extraction than SPT and our tree kernel outperforms the state-of-the-art Collins and Duffy's convolution tree kernel. It also shows that our tree kernel achieves much better performance than the state-of-the-art linear kernels. Finally, it shows that feature-based and tree kernel-based methods complement each other well and that the composite kernel can integrate both flat and structured features.

Proceedings ArticleDOI
01 Dec 2007
TL;DR: A new effective and efficient scheme, called SET, to detect clones by computing set operations of exclusive subsets in the network and integrates interleaved authentication to prevent unauthorized falsification of subset information during forwarding.
Abstract: Sensor nodes that are deployed in hostile environments are vulnerable to capture and compromise. An adversary may obtain private information from these sensors, clone and intelligently deploy them in the network to launch a variety of insider attacks. This attack process is broadly termed as a clone attack. Currently, the defenses against clone attacks are not only very few, but also suffer from selective interruption of detection and high overhead (computation and memory). In this paper, we propose a new effective and efficient scheme, called SET, to detect such clone attacks. The key idea of SET is to detect clones by computing set operations (intersection and union) of exclusive subsets in the network. First, SET securely forms exclusive unit subsets among one-hop neighbors in the network in a distributed way. This secure subset formation also provides the authentication of nodes’ subset membership. SET then employs a tree structure to compute non-overlapped set operations and integrates interleaved authentication to prevent unauthorized falsification of subset information during forwarding. Randomization is used to further make the exclusive subset and tree formation unpredictable to an adversary. We show the reliability and resilience of SET by analyzing the probability that an adversary may effectively obstruct the set operations. Performance analysis and simulations also demonstrate that the proposed scheme is more efficient than existing schemes from both communication and memory cost standpoints.
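The core set-operation idea, detecting a clone as a violation of subset exclusiveness, reduces to a small sketch once the secure subset formation, tree-based aggregation, and interleaved authentication are stripped away (all assumed here); node IDs are hypothetical:

```python
def detect_clones(subsets):
    """A cloned node ID shows up in more than one exclusive subset, so
    the subsets are no longer pairwise disjoint: the union is smaller
    than the sum of the sizes. Returns the set of suspect IDs."""
    seen, clones = set(), set()
    for subset in subsets:
        clones |= seen & subset   # intersection with earlier subsets
        seen |= subset            # running union
    return clones

# Honest network: one-hop neighborhoods form disjoint unit subsets.
honest = [{1, 2, 3}, {4, 5}, {6, 7, 8}]
assert detect_clones(honest) == set()

# Node 5 was captured and a clone deployed in a second neighborhood.
attacked = [{1, 2, 3}, {4, 5}, {5, 6, 7}]
assert detect_clones(attacked) == {5}
```

In SET these intersections and unions are computed bottom-up over a randomized tree rather than at a single point, so no individual node sees (or can falsify) the whole membership picture.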

Proceedings ArticleDOI
04 Sep 2007
TL;DR: It is shown that the open leaf venation model extended to three dimensions generates surprisingly realistic tree structures, offering convenient control of tree shape and structure.
Abstract: We extend the open leaf venation model by Runions et al. [RFL*05] to three dimensions and show that it generates surprisingly realistic tree structures. Model parameters correspond to visually relevant tree characteristics identified in landscaping, offering convenient control of tree shape and structure.

Journal ArticleDOI
TL;DR: It is shown that tree reconciliation methods are biased when the inferred gene tree is not correct, and these results cast doubt upon previous conclusions that vertebrate genome history has been marked by many ancient duplications and many recent gene losses.
Abstract: Background: Comparative genomic studies are revealing frequent gains and losses of whole genes via duplication and pseudogenization. One commonly used method for inferring the number and timing of gene gains and losses reconciles the gene tree for each gene family with the species tree of the taxa considered. Recent studies using this approach have found a large number of ancient duplications and recent losses among vertebrate genomes. Results: I show that tree reconciliation methods are biased when the inferred gene tree is not correct. This bias places duplicates towards the root of the tree and losses towards the tips of the tree. I demonstrate that this bias is present when tree reconciliation is conducted on both multiple mammal and Drosophila genomes, and that lower bootstrap cut-off values on gene trees lead to more extreme bias. I also suggest a method for dealing with reconciliation bias, although this method only corrects for the number of gene gains on some branches of the species tree. Conclusion: Based on the results presented, it is likely that most tree reconciliation analyses show biases, unless the gene trees used are exceptionally well-resolved and well-supported. These results cast doubt upon previous conclusions that vertebrate genome history has been marked by many ancient duplications and many recent gene losses.
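The reconciliation step whose bias the paper analyzes can be sketched via the classic LCA mapping: each gene-tree node maps to the lowest species-tree node covering its leaves, and a duplication is inferred when an internal node maps to the same species node as one of its children. The toy trees and names below are hypothetical:

```python
def ancestors(node, parent):
    """Chain from a species-tree node up to the root."""
    chain = [node]
    while node in parent:
        node = parent[node]
        chain.append(node)
    return chain

def lca(a, b, parent):
    """Lowest common ancestor in a species tree given a child->parent map."""
    seen = set(ancestors(a, parent))
    for n in ancestors(b, parent):
        if n in seen:
            return n

def count_duplications(gene_tree, parent):
    """Gene tree = nested pairs with species names at the leaves.
    Returns the number of inferred duplication nodes."""
    def walk(t):
        if isinstance(t, str):
            return t, 0
        (m1, d1), (m2, d2) = walk(t[0]), walk(t[1])
        m = lca(m1, m2, parent)
        # Duplication: a node mapping no deeper than one of its children.
        return m, d1 + d2 + (1 if m in (m1, m2) else 0)
    return walk(gene_tree)[1]

# Species tree ((human, mouse), frog), as a child -> parent map.
parent = {"human": "HM", "mouse": "HM", "HM": "root", "frog": "root"}
# A gene family that duplicated before the human/mouse split.
gene = (("human", "mouse"), ("human", "mouse"))
assert count_duplications(gene, parent) == 1
```

The paper's point is that if the input `gene` topology is itself wrong, this mapping systematically pushes spurious duplications toward the root, so the count above is only as trustworthy as the gene tree.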

Proceedings Article
19 Jul 2007
TL;DR: A Bandit Algorithm for Smooth Trees (BAST) is introduced which takes into account actual smoothness of the rewards for performing efficient "cuts" of sub-optimal branches with high confidence and is illustrated on a global optimization problem of a continuous function, given noisy values.
Abstract: Bandit based methods for tree search have recently gained popularity when applied to huge trees, e.g. in the game of go [6]. Their efficient exploration of the tree enables them to rapidly return a good value, and to improve precision if more time is provided. The UCT algorithm [8], a tree search method based on Upper Confidence Bounds (UCB) [2], is believed to adapt locally to the effective smoothness of the tree. However, we show that UCT is "over-optimistic" in some sense, leading to a worst-case regret that may be very poor. We propose alternative bandit algorithms for tree search. First, a modification of UCT using a confidence sequence that scales exponentially in the horizon depth is analyzed. We then consider Flat-UCB performed on the leaves and provide a finite regret bound with high probability. Then, we introduce and analyze a Bandit Algorithm for Smooth Trees (BAST) which takes into account actual smoothness of the rewards for performing efficient "cuts" of sub-optimal branches with high confidence. Finally, we present an incremental tree expansion which applies when the full tree is too big (possibly infinite) to be entirely represented and show that with high probability, only the optimal branches are indefinitely developed. We illustrate these methods on a global optimization problem of a continuous function, given noisy values.
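The UCB selection rule that UCT applies at every tree node can be sketched in a few lines. This is plain UCB1 at a single node, not the paper's BAST modification with smoothness-dependent confidence sequences; the arm names and statistics are made up for illustration:

```python
import math

def ucb1_pick(stats, t, c=math.sqrt(2)):
    """UCB1: pick the arm maximizing mean reward plus an exploration
    bonus that shrinks as the arm is pulled more often.
    `stats` maps arm -> (pulls, total_reward); t is the total pull count."""
    def score(arm):
        n, total = stats[arm]
        if n == 0:
            return float("inf")   # try every arm at least once
        return total / n + c * math.sqrt(math.log(t) / n)
    return max(stats, key=score)

# "left" has the better empirical mean (0.7 vs 0.5), but "right" has been
# tried far less, so its exploration bonus wins for now.
assert ucb1_pick({"left": (10, 7.0), "right": (2, 1.0)}, t=12) == "right"
# Once both are well sampled, the better mean dominates.
assert ucb1_pick({"left": (10, 7.0), "right": (10, 5.0)}, t=20) == "left"
```

The paper's critique is precisely about this bonus: applied recursively down a tree it can be too optimistic about deep sub-optimal branches, motivating BAST's larger, smoothness-aware confidence terms.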

Proceedings ArticleDOI
29 Jul 2007
TL;DR: In this article, an approximate voxel-based tree volume is estimated using image information and the density values of the voxels are used to produce initial positions for a set of particles.
Abstract: We present a method for producing 3D tree models from input photographs with only limited user intervention. An approximate voxel-based tree volume is estimated using image information. The density values of the voxels are used to produce initial positions for a set of particles. Performing a 3D flow simulation, the particles are traced downwards to the tree base and are combined to form twigs and branches. If possible, the trunk and the first-order branches are determined in the input photographs and are used as attractors for particle simulation. The geometry of the tree skeleton is produced using botanical rules for branch thicknesses and branching angles. Finally, leaves are added. Different initial seeds for particle simulation lead to a variety of distinct yet similar-looking branching structures for a single set of photographs.
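The abstract does not specify which botanical thickness rule is used; a common choice in tree modeling is the pipe-model (da Vinci) rule, where the parent radius satisfies r_parent^n = sum of r_child^n with an exponent n typically between 2 and 3. Both the rule and the exponent below are assumptions for illustration:

```python
def parent_radius(child_radii, n=2.5):
    """Pipe-model branch thickness: parent radius from child radii,
    assuming r_parent ** n == sum(r_child ** n). The exponent n is a
    modeling parameter, not a value taken from the paper."""
    return sum(r ** n for r in child_radii) ** (1.0 / n)

# Two equal children of radius 1 give a parent of radius 2 ** (1 / 2.5).
print(parent_radius([1.0, 1.0]))
```

With n = 2 the rule conserves cross-sectional area at each branching, which is why merged particle paths naturally thicken toward the trunk.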

Patent
16 Aug 2007
TL;DR: In this article, adaptive tree-based frame partitioning is used for encoding video data, where partitions are obtained from a combination of top-down tree partitioning and bottom-up tree joining.
Abstract: There are provided methods and apparatus for reduced resolution partitioning. An apparatus includes an encoder (300) for encoding video data using adaptive tree-based frame partitioning, wherein partitions are obtained from a combination of top-down tree partitioning and bottom-up tree joining.
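A hedged sketch of the split-then-join idea the abstract describes, reduced to one dimension for brevity: split a block while its value spread exceeds a threshold, then re-join sibling leaves whose merged spread is still acceptable. The 1-D setting and both thresholds are illustrative assumptions; the patent concerns 2-D video frame partitions:

```python
def partition(values, lo, hi, split_t, join_t):
    """Top-down split of values[lo:hi] by spread, then bottom-up
    joining of sibling leaves under the (looser) join threshold."""
    spread = max(values[lo:hi]) - min(values[lo:hi])
    if spread <= split_t or hi - lo < 2:
        return [(lo, hi)]                     # homogeneous leaf partition
    mid = (lo + hi) // 2
    left = partition(values, lo, mid, split_t, join_t)
    right = partition(values, mid, hi, split_t, join_t)
    # Bottom-up joining: merge back if both children stayed single
    # leaves and their union is acceptable under the join threshold.
    if len(left) == len(right) == 1 and spread <= join_t:
        return [(lo, hi)]
    return left + right

vals = [1, 1, 2, 2, 5, 5, 5, 5]
# The [1,1] and [2,2] leaves are split apart, then re-joined into (0,4).
print(partition(vals, 0, 8, split_t=0.5, join_t=1.5))
```

The asymmetry between the two thresholds is the point of the combination: splitting alone cannot produce the coarser merged regions that joining recovers.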

Posted Content
TL;DR: A simple, computationally efficient algorithm is introduced for reconstructing phylogenies from multiple gene trees in the presence of incomplete lineage sorting, that is, when the topology of the gene trees may differ from that of the species tree.
Abstract: We introduce a simple algorithm for reconstructing phylogenies from multiple gene trees in the presence of incomplete lineage sorting, that is, when the topology of the gene trees may differ from that of the species tree. We show that our technique is statistically consistent under standard stochastic assumptions, that is, it returns the correct tree given sufficiently many unlinked loci. We also show that it can tolerate moderate estimation errors.
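The abstract does not spell the algorithm out; one simple estimator in this spirit takes, for each species pair, the minimum distance observed across the gene trees of all loci, and then clusters species by single linkage. The rule and the toy data below are illustrative assumptions, not taken from the paper:

```python
def species_tree(loci_distances, taxa):
    """loci_distances: one {(a, b): distance} dict per locus.
    Keep the minimum distance per pair across loci, then agglomerate
    by single linkage; returns a Newick-style topology string."""
    d = {}
    for locus in loci_distances:
        for (a, b), dist in locus.items():
            key = frozenset((a, b))
            d[key] = min(d.get(key, float("inf")), dist)
    clusters = [({t}, t) for t in taxa]       # (leaf set, label)
    while len(clusters) > 1:
        i, j = min(((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
                   key=lambda ij: min(d[frozenset((a, b))]
                                      for a in clusters[ij[0]][0]
                                      for b in clusters[ij[1]][0]))
        (si, ni), (sj, nj) = clusters[i], clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append((si | sj, "(%s,%s)" % (ni, nj)))
    return clusters[0][1]

# Two loci whose gene trees disagree; the across-loci minima still
# support ((A,B),C) despite the discordant second locus.
loci = [{("A", "B"): 1.0, ("A", "C"): 3.0, ("B", "C"): 3.0},
        {("A", "B"): 2.5, ("A", "C"): 2.0, ("B", "C"): 3.0}]
print(species_tree(loci, ["A", "B", "C"]))
```

Taking the minimum across unlinked loci is what gives robustness to incomplete lineage sorting: coalescence in any one locus can only happen at or after the species divergence, so the minimum converges to the true divergence order.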

Journal ArticleDOI
TL;DR: This letter proposes a two-step method for tree detection consisting of segmentation followed by classification using weighted features from aerial image and lidar, such as height, texture map, height variation, and normal vector estimates.
Abstract: In this letter, we present an approach to detecting trees in registered aerial image and range data obtained via lidar. The motivation for this problem comes from automated 3-D city modeling, in which such data are used to generate the models. Representing the trees in these models is problematic because the data are usually too sparsely sampled in tree regions to create an accurate 3-D model of the trees. Furthermore, including the tree data points interferes with the polygonization step of the building roof top models. Therefore, it is advantageous to detect and remove points that represent trees in both lidar and aerial imagery. In this letter, we propose a two-step method for tree detection consisting of segmentation followed by classification. The segmentation is done using a simple region-growing algorithm using weighted features from aerial image and lidar, such as height, texture map, height variation, and normal vector estimates. The weights for the features are determined using a learning method on random walks. The classification is done using the weighted support vector machines, allowing us to control the misclassification rate. The overall problem is formulated as a binary detection problem, and the results, presented as receiver operating characteristic curves, validate our approach.
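A hedged sketch of the region-growing step: grow a region outward from a seed pixel, admitting 4-connected neighbours whose weighted feature distance to the pixel that reached them stays under a threshold. The features, weights, and threshold here are illustrative placeholders, not the paper's learned values:

```python
def grow_region(features, seed, weights, threshold):
    """features: dict (row, col) -> tuple of per-pixel features
    (e.g. height, texture, height variation). Grows a 4-connected
    region from seed using a weighted L1 feature distance."""
    region = {seed}
    frontier = [seed]
    while frontier:
        r, c = frontier.pop()
        for nb in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if nb in features and nb not in region:
                dist = sum(w * abs(a - b) for w, a, b in
                           zip(weights, features[nb], features[(r, c)]))
                if dist < threshold:
                    region.add(nb)
                    frontier.append(nb)
    return region

# A 1x4 strip with heights 5, 5, 5, 0: the low pixel is excluded.
feats = {(0, 0): (5.0,), (0, 1): (5.0,), (0, 2): (5.0,), (0, 3): (0.0,)}
print(sorted(grow_region(feats, (0, 0), weights=(1.0,), threshold=2.0)))
```

In the paper's pipeline the resulting segments, not individual pixels, are then fed to the weighted SVM classifier, which is what allows the misclassification rate to be traded off explicitly.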

Book ChapterDOI
02 Dec 2007
TL;DR: This paper empirically evaluates a spectrum of Hoeffding tree variations: single trees, option trees and bagged trees, and investigates pruning.
Abstract: Hoeffding trees are state-of-the-art for processing high-speed data streams. Their ingenuity stems from updating sufficient statistics, only addressing growth when decisions can be made that are guaranteed to be almost identical to those that would be made by conventional batch learning methods. Despite this guarantee, decisions are still subject to limited lookahead and stability issues. In this paper we explore Hoeffding Option Trees, a regular Hoeffding tree containing additional option nodes that allow several tests to be applied, leading to multiple Hoeffding trees as separate paths. We show how to control tree growth in order to generate a mixture of paths, and empirically determine a reasonable number of paths. We then empirically evaluate a spectrum of Hoeffding tree variations: single trees, option trees and bagged trees. Finally, we investigate pruning. We show that on some datasets a pruned option tree can be smaller and more accurate than a single tree.
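The guarantee the abstract mentions rests on the Hoeffding bound: after n examples, the observed mean of a statistic with range R is within eps of its true mean with probability 1 - delta, so a split is safe once the gap between the two best attributes' gains exceeds eps. A minimal sketch (the delta and gain values are illustrative):

```python
import math

def hoeffding_eps(R, delta, n):
    """Hoeffding bound: eps = sqrt(R^2 * ln(1/delta) / (2n))."""
    return math.sqrt(R * R * math.log(1.0 / delta) / (2.0 * n))

def can_split(best_gain, second_gain, R=1.0, delta=1e-7, n=1000):
    """Split when the observed gain gap exceeds the bound, so the
    best attribute is the true best with probability >= 1 - delta."""
    return best_gain - second_gain > hoeffding_eps(R, delta, n)

print(can_split(0.30, 0.10, n=1000))   # gap 0.20 vs eps ~ 0.09: split
print(can_split(0.30, 0.25, n=1000))   # gap 0.05: keep accumulating
```

Option nodes relax exactly this decision point: when several attributes remain within eps of each other for too long, an option node lets more than one test proceed instead of stalling on a near-tie.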

Journal ArticleDOI
TL;DR: It is shown that the Average Linkage Minimum Spanning Tree recognizes economic sectors and sub-sectors as communities in the network slightly better than the Minimum Spanning Tree, and that the average reliability of links in the Minimum Spanning Tree is slightly greater than that in the Average Linkage Minimum Spanning Tree.
Abstract: We introduce a new technique to associate a spanning tree to the average linkage cluster analysis. We term this tree as the Average Linkage Minimum Spanning Tree. We also introduce a technique to associate a value of reliability to the links of correlation-based graphs by using bootstrap replicas of data. Both techniques are applied to the portfolio of the 300 most capitalized stocks traded on the New York Stock Exchange during the time period 2001–2003. We show that the Average Linkage Minimum Spanning Tree recognizes economic sectors and sub-sectors as communities in the network slightly better than the Minimum Spanning Tree. We also show that the average reliability of links in the Minimum Spanning Tree is slightly greater than the average reliability of links in the Average Linkage Minimum Spanning Tree.
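A sketch of the Minimum Spanning Tree baseline the paper compares against: convert correlations to the usual metric d = sqrt(2 * (1 - rho)) and run Kruskal's algorithm with union-find. The four-ticker correlation matrix is made up for illustration; the paper uses 300 NYSE stocks:

```python
import math

def mst_edges(corr):
    """corr: dict {(a, b): rho}. Sort edges by the correlation-derived
    distance d = sqrt(2 * (1 - rho)) and build the MST via Kruskal."""
    edges = sorted(corr.items(), key=lambda kv: math.sqrt(2 * (1 - kv[1])))
    parent = {}
    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]     # path halving
            x = parent[x]
        return x
    tree = []
    for (a, b), rho in edges:
        ra, rb = find(a), find(b)
        if ra != rb:                          # keep edge iff it joins components
            parent[ra] = rb
            tree.append((a, b))
    return tree

corr = {("AAA", "BBB"): 0.9, ("AAA", "CCC"): 0.3,
        ("BBB", "CCC"): 0.4, ("CCC", "DDD"): 0.8}
print(mst_edges(corr))
```

Sorting by this distance is equivalent to adding edges in decreasing correlation, so the strongest pairwise dependencies form the tree's backbone; the paper's Average Linkage variant instead derives its spanning tree from the average-linkage dendrogram.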