
Showing papers on "Tree (data structure) published in 2014"


Journal ArticleDOI
TL;DR: Variants of the Barnes-Hut algorithm and of the dual-tree algorithm are developed that approximate the gradient used for learning t-SNE embeddings in O(N log N); they substantially accelerate t-SNE and make it possible to learn embeddings of data sets with millions of objects.
Abstract: The paper investigates the acceleration of t-SNE--an embedding technique that is commonly used for the visualization of high-dimensional data in scatter plots--using two tree-based algorithms. In particular, the paper develops variants of the Barnes-Hut algorithm and of the dual-tree algorithm that approximate the gradient used for learning t-SNE embeddings in O(N log N). Our experiments show that the resulting algorithms substantially accelerate t-SNE, and that they make it possible to learn embeddings of data sets with millions of objects. Somewhat counterintuitively, the Barnes-Hut variant of t-SNE appears to outperform the dual-tree variant.
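A quick way to try the accelerated method described above: scikit-learn's TSNE exposes a Barnes-Hut gradient approximation via method="barnes_hut", with angle as the Barnes-Hut trade-off parameter (theta). The data below is random and purely illustrative.

```python
# Minimal sketch: Barnes-Hut t-SNE through scikit-learn.
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 50))   # 500 points in 50 dimensions (toy data)

tsne = TSNE(
    n_components=2,
    method="barnes_hut",   # tree-based O(N log N) gradient approximation
    angle=0.5,             # Barnes-Hut accuracy/speed trade-off (theta)
    perplexity=30.0,
    init="pca",
    random_state=0,
)
Y = tsne.fit_transform(X)
print(Y.shape)  # (500, 2)
```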

2,079 citations


Journal ArticleDOI
06 Mar 2014-Nature
TL;DR: A global analysis of 403 tropical and temperate tree species shows that for most species mass growth rate increases continuously with tree size, which means large, old trees do not act simply as senescent carbon reservoirs but actively fix large amounts of carbon compared to smaller trees.
Abstract: Forests are major components of the global carbon cycle, providing substantial feedback to atmospheric greenhouse gas concentrations. Our ability to understand and predict changes in the forest carbon cycle--particularly net primary productivity and carbon storage--increasingly relies on models that represent biological processes across several scales of biological organization, from tree leaves to forest stands. Yet, despite advances in our understanding of productivity at the scales of leaves and stands, no consensus exists about the nature of productivity at the scale of the individual tree, in part because we lack a broad empirical assessment of whether rates of absolute tree mass growth (and thus carbon accumulation) decrease, remain constant, or increase as trees increase in size and age. Here we present a global analysis of 403 tropical and temperate tree species, showing that for most species mass growth rate increases continuously with tree size. Thus, large, old trees do not act simply as senescent carbon reservoirs but actively fix large amounts of carbon compared to smaller trees; at the extreme, a single big tree can add the same amount of carbon to the forest within a year as is contained in an entire mid-sized tree. The apparent paradoxes of individual tree growth increasing with tree size despite declining leaf-level and stand-level productivity can be explained, respectively, by increases in a tree's total leaf area that outpace declines in productivity per unit of leaf area and, among other factors, age-related reductions in population density. Our results resolve conflicting assumptions about the nature of tree growth, inform efforts to understand and model forest carbon dynamics, and have additional implications for theories of resource allocation and plant senescence.

692 citations


Journal ArticleDOI
TL;DR: A new pipeline, SNPhylo, to construct phylogenetic trees based on large SNP datasets, which can help a researcher focus more on interpretation of the results of analysis of voluminous data sets, rather than manipulations necessary to accomplish the analysis.
Abstract: Phylogenetic trees are widely used for genetic and evolutionary studies in various organisms. Advanced sequencing technology has dramatically enriched data available for constructing phylogenetic trees based on single nucleotide polymorphisms (SNPs). However, massive SNP data makes it difficult to perform reliable analysis, and there has been no ready-to-use pipeline to generate phylogenetic trees from these data. We developed a new pipeline, SNPhylo, to construct phylogenetic trees based on large SNP datasets. The pipeline may enable users to construct a phylogenetic tree from three representative SNP data file formats. In addition, in order to increase reliability of a tree, the pipeline has steps such as removing low quality data and considering linkage disequilibrium. A maximum likelihood method for the inference of phylogeny is also adopted in generation of a tree in our pipeline. Using SNPhylo, users can easily produce a reliable phylogenetic tree from a large SNP data file. Thus, this pipeline can help a researcher focus more on interpretation of the results of analysis of voluminous data sets, rather than manipulations necessary to accomplish the analysis.

393 citations


Journal ArticleDOI
TL;DR: A novel, user-friendly software package engineered for conducting state-of-the-art Bayesian tree inferences on data sets of arbitrary size is introduced and first experiences with Bayesian inferences at the whole-genome level are reported on.
Abstract: Modern sequencing technology now allows biologists to collect the entirety of molecular evidence for reconstructing evolutionary trees. We introduce a novel, user-friendly software package engineered for conducting state-of-the-art Bayesian tree inferences on data sets of arbitrary size. Our software introduces a nonblocking parallelization of Metropolis-coupled chains, modifications for efficient analyses of data sets comprising thousands of partitions, and memory-saving techniques. We report on first experiences with Bayesian inferences at the whole-genome level using the SuperMUC supercomputer and simulated data.

369 citations


Journal ArticleDOI
TL;DR: This work is the first to show the empirical benefit of automatically generalized captions for composing natural image descriptions, and it attains significantly better performance than previous approaches for both image caption generalization and generation.
Abstract: We present a new tree-based approach to composing expressive image descriptions that makes use of naturally occurring web images with captions. We investigate two related tasks: image caption generalization and generation, where the former is an optional sub-task of the latter. The high-level idea of our approach is to harvest expressive phrases (as tree fragments) from existing image descriptions, then to compose a new description by selectively combining the extracted (and optionally pruned) tree fragments. Key algorithmic components are tree composition and compression, both integrating tree structure with sequence structure. Our proposed system attains significantly better performance than previous approaches for both image caption generalization and generation. In addition, our work is the first to show the empirical benefit of automatically generalized captions for composing natural image descriptions.

285 citations


Journal ArticleDOI
12 Dec 2014-Science
TL;DR: A statistical binning technique is developed to address gene tree estimation error and used to produce the first genome-scale coalescent-based avian tree of life; the technique also helps provide more accurate estimates of ILS levels in biological data sets.
Abstract: Gene tree incongruence arising from incomplete lineage sorting (ILS) can reduce the accuracy of concatenation-based estimations of species trees. Although coalescent-based species tree estimation methods can have good accuracy in the presence of ILS, they are sensitive to gene tree estimation error. We propose a pipeline that uses bootstrapping to evaluate whether two genes are likely to have the same tree, then groups genes into sets using a graph-theoretic optimization, estimates a tree on each subset using concatenation, and finally produces an estimated species tree from these trees using a preferred coalescent-based method. Statistical binning improves the accuracy of MP-EST, a popular coalescent-based method, and we use it to produce the first genome-scale coalescent-based avian tree of life.
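The grouping step above can be sketched as a graph problem: connect genes whose bootstrap comparison suggests conflicting trees, then assign genes to bins so that no bin mixes incompatible genes. This is only a hedged illustration of the idea (a greedy colouring), not the authors' actual optimization; the gene names and conflict pairs are invented.

```python
# Sketch of the binning idea: bins are colour classes of an
# "incompatibility" graph over genes.
def bin_genes(genes, incompatible_pairs):
    """Greedy graph colouring: each colour class becomes one supergene bin."""
    neighbours = {g: set() for g in genes}
    for a, b in incompatible_pairs:
        neighbours[a].add(b)
        neighbours[b].add(a)

    colour = {}
    for g in sorted(genes, key=lambda g: -len(neighbours[g])):  # high degree first
        used = {colour[n] for n in neighbours[g] if n in colour}
        c = 0
        while c in used:
            c += 1
        colour[g] = c

    bins = {}
    for g, c in colour.items():
        bins.setdefault(c, []).append(g)
    return list(bins.values())

genes = ["g1", "g2", "g3", "g4", "g5"]
conflicts = [("g1", "g2"), ("g2", "g3"), ("g4", "g5")]
bins = bin_genes(genes, conflicts)
# every bin is conflict-free; e.g. g1 and g2 never share a bin
```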

248 citations


Journal ArticleDOI
TL;DR: Extensive experimental results on five datasets with pixel-wise ground truths demonstrate that the proposed saliency tree model consistently outperforms the state-of-the-art saliency models.
Abstract: This paper proposes a novel saliency detection framework termed as saliency tree. For effective saliency measurement, the original image is first simplified using adaptive color quantization and region segmentation to partition the image into a set of primitive regions. Then, three measures, i.e., global contrast, spatial sparsity, and object prior are integrated with regional similarities to generate the initial regional saliency for each primitive region. Next, a saliency-directed region merging approach with dynamic scale control scheme is proposed to generate the saliency tree, in which each leaf node represents a primitive region and each non-leaf node represents a non-primitive region generated during the region merging process. Finally, by exploiting a regional center-surround scheme based node selection criterion, a systematic saliency tree analysis including salient node selection, regional saliency adjustment and selection is performed to obtain final regional saliency measures and to derive the high-quality pixel-wise saliency map. Extensive experimental results on five datasets with pixel-wise ground truths demonstrate that the proposed saliency tree model consistently outperforms the state-of-the-art saliency models.

245 citations



Journal ArticleDOI
TL;DR: High-resolution LiDAR data captured from a small multirotor unmanned aerial vehicle platform is used to determine the influence of the detection algorithm and point density on the accuracy of tree detection and delineation.
Abstract: Light Detection and Ranging (LiDAR) is becoming an increasingly used tool to support decision-making processes within forest operations. Area-based methods that derive information on the condition of a forest based on the distribution of points within the canopy have been proven to produce reliable and consistent results. Individual tree-based methods, however, are not yet used operationally in the industry. This is due to problems in detecting and delineating individual trees under varying forest conditions, resulting in an underestimation of the stem count and biases toward larger trees. The aim of this paper is to use high-resolution LiDAR data captured from a small multirotor unmanned aerial vehicle platform to determine the influence of the detection algorithm and point density on the accuracy of tree detection and delineation. The study was conducted in a four-year-old Eucalyptus globulus stand representing an important stage of growth for the forest management decision-making process. Five different tree detection routines were implemented, which delineate trees directly from the point cloud, voxel space, and the canopy height model (CHM). The results suggest that both algorithm and point density are important considerations in the accuracy of the detection and delineation of individual trees. The best performing method, which utilized both the CHM and the original point cloud, was able to correctly detect 98% of the trees in the study area. Increases in point density (from 5 to 50 points/m²) lead to significant improvements (of up to 8%) in the rate of omission for algorithms that made use of the high density of the data.
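One common family of CHM-based detection routines is local-maxima filtering: a pixel is a treetop if it is the maximum of its neighbourhood and exceeds a minimum height. The sketch below is a generic illustration on a synthetic canopy height model, not the paper's exact implementation; window size and height threshold are assumed values.

```python
# Local-maxima treetop detection on a canopy height model (CHM).
import numpy as np
from scipy.ndimage import maximum_filter

def detect_tree_tops(chm, window=5, min_height=2.0):
    """Treetops = pixels equal to the maximum of their local window
    and taller than a minimum height threshold (in metres)."""
    local_max = maximum_filter(chm, size=window)
    tops = (chm == local_max) & (chm > min_height)
    return np.argwhere(tops)

# Synthetic CHM with two Gaussian crowns
y, x = np.mgrid[0:40, 0:40]
chm = 10 * np.exp(-((x - 10) ** 2 + (y - 10) ** 2) / 8.0) \
    +  8 * np.exp(-((x - 30) ** 2 + (y - 30) ** 2) / 8.0)

tops = detect_tree_tops(chm, window=5, min_height=2.0)
print(len(tops))  # 2 crowns detected
```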

206 citations


Proceedings ArticleDOI
01 Oct 2014
TL;DR: Quantitative and qualitative analysis of the results, based on two user studies, show that the approach significantly outperforms extractive and abstractive baselines.
Abstract: We propose a novel abstractive summarization system for product reviews by taking advantage of their discourse structure. First, we apply a discourse parser to each review and obtain a discourse tree representation for every review. We then modify the discourse trees such that every leaf node only contains the aspect words. Second, we aggregate the aspect discourse trees and generate a graph. We then select a subgraph representing the most important aspects and the rhetorical relations between them using a PageRank algorithm, and transform the selected subgraph into an aspect tree. Finally, we generate a natural language summary by applying a template-based NLG framework. Quantitative and qualitative analysis of the results, based on two user studies, show that our approach significantly outperforms extractive and abstractive baselines.
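The subgraph-selection step above rests on PageRank over the aggregated aspect graph. A minimal sketch using plain power iteration (no graph library); the aspect graph below is invented for illustration, whereas the paper's graphs come from aggregated discourse trees.

```python
# PageRank by power iteration over an adjacency-list graph.
def pagerank(adj, damping=0.85, iters=100):
    """adj: {node: [out-neighbours]}; returns node -> score."""
    nodes = list(adj)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        new = {v: (1.0 - damping) / n for v in nodes}
        for v, outs in adj.items():
            if outs:
                share = damping * rank[v] / len(outs)
                for w in outs:
                    new[w] += share
            else:  # dangling node: spread its mass uniformly
                for w in nodes:
                    new[w] += damping * rank[v] / n
        rank = new
    return rank

aspects = {
    "battery": ["screen", "price"],
    "screen": ["battery"],
    "price": ["battery"],
    "camera": ["battery"],
}
scores = pagerank(aspects)
top = sorted(scores, key=scores.get, reverse=True)[:2]
# "battery" receives links from all other aspects, so it ranks first
```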

203 citations


Posted Content
TL;DR: The authors showed that the unrooted topology of the $n$-leaf phylogenetic species tree is generically identifiable given observed data at the leaves of the tree that are assumed to have arisen from the coalescent process with time-reversible substitution.
Abstract: The inference of the evolutionary history of a collection of organisms is a problem of fundamental importance in evolutionary biology. The abundance of DNA sequence data arising from genome sequencing projects has led to significant challenges in the inference of these phylogenetic relationships. Among these challenges is the inference of the evolutionary history of a collection of species based on sequence information from several distinct genes sampled throughout the genome. It is widely accepted that each individual gene has its own phylogeny, which may not agree with the species tree. Many possible causes of this gene tree incongruence are known. The best studied is incomplete lineage sorting, which is commonly modeled by the coalescent process. Numerous methods based on the coalescent process have been proposed for estimation of the phylogenetic species tree given multi-locus DNA sequence data. However, use of these methods assumes that the phylogenetic species tree can be identified from DNA sequence data at the leaves of the tree, although this has not been formally established. We prove that the unrooted topology of the $n$-leaf phylogenetic species tree is generically identifiable given observed data at the leaves of the tree that are assumed to have arisen from the coalescent process with time-reversible substitution.

Posted Content
TL;DR: In this article, a bandit algorithm for smooth trees (BAST) is proposed, which takes into account actual smoothness of the rewards for performing efficient "cuts" of sub-optimal branches with high confidence.
Abstract: Bandit based methods for tree search have recently gained popularity when applied to huge trees, e.g. in the game of go [6]. Their efficient exploration of the tree enables to return rapidly a good value, and improve precision if more time is provided. The UCT algorithm [8], a tree search method based on Upper Confidence Bounds (UCB) [2], is believed to adapt locally to the effective smoothness of the tree. However, we show that UCT is "over-optimistic" in some sense, leading to a worst-case regret that may be very poor. We propose alternative bandit algorithms for tree search. First, a modification of UCT using a confidence sequence that scales exponentially in the horizon depth is analyzed. We then consider Flat-UCB performed on the leaves and provide a finite regret bound with high probability. Then, we introduce and analyze a Bandit Algorithm for Smooth Trees (BAST) which takes into account actual smoothness of the rewards for performing efficient "cuts" of sub-optimal branches with high confidence. Finally, we present an incremental tree expansion which applies when the full tree is too big (possibly infinite) to be entirely represented and show that with high probability, only the optimal branches are indefinitely developed. We illustrate these methods on a global optimization problem of a continuous function, given noisy values.
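The UCB selection rule underlying UCT can be sketched in a few lines; here arms stand in for tree branches, and the payoff probabilities are invented toy values.

```python
# UCB1: pick the arm maximising empirical mean + exploration bonus.
import math
import random

def ucb1_select(counts, values, t):
    """Return the arm maximising mean + sqrt(2 ln t / n); unplayed arms first."""
    for a, n in enumerate(counts):
        if n == 0:
            return a
    return max(
        range(len(counts)),
        key=lambda a: values[a] / counts[a] + math.sqrt(2 * math.log(t) / counts[a]),
    )

random.seed(0)
probs = [0.2, 0.5, 0.8]          # hidden Bernoulli payoffs per branch (toy)
counts = [0] * 3
values = [0.0] * 3
for t in range(1, 2001):
    a = ucb1_select(counts, values, t)
    counts[a] += 1
    values[a] += 1.0 if random.random() < probs[a] else 0.0
# the best branch (index 2) ends up explored far more than the worst one
```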



Proceedings ArticleDOI
01 Jun 2014
TL;DR: A much faster discourse parser whose time complexity is linear in the number of sentences is developed, with two linear-chain CRFs applied in cascade as local classifiers; a novel post-editing approach, which modifies a fully-built tree by considering information from constituents on upper levels, further improves the accuracy.
Abstract: Text-level discourse parsing remains a challenge. The current state-of-the-art overall accuracy in relation assignment is 55.73%, achieved by Joty et al. (2013). However, their model has a high order of time complexity, and thus cannot be applied in practice. In this work, we develop a much faster model whose time complexity is linear in the number of sentences. Our model adopts a greedy bottom-up approach, with two linear-chain CRFs applied in cascade as local classifiers. To enhance the accuracy of the pipeline, we add additional constraints in the Viterbi decoding of the first CRF. In addition to efficiency, our parser also significantly outperforms the state of the art. Moreover, our novel approach of post-editing, which modifies a fully-built tree by considering information from constituents on upper levels, can further improve the accuracy.
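The decoding idea above, adding hard constraints to Viterbi by giving forbidden label transitions a score of negative infinity, can be sketched as follows. The scores are toy numbers, not learned CRF potentials.

```python
# Constrained Viterbi decoding for a linear-chain model.
import math

def viterbi(emit, trans):
    """emit[t][s]: score of label s at step t; trans[s][s2]: transition score."""
    T, S = len(emit), len(emit[0])
    score = list(emit[0])
    back = []
    for t in range(1, T):
        prev = score
        score = []
        back.append([])
        for s in range(S):
            best_p = max(range(S), key=lambda p: prev[p] + trans[p][s])
            score.append(prev[best_p] + trans[best_p][s] + emit[t][s])
            back[-1].append(best_p)
    path = [max(range(S), key=lambda s: score[s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]

NEG = -math.inf
emit = [[2.0, 1.0], [1.0, 1.5], [0.5, 2.0]]
trans = [[0.0, 0.0], [NEG, 0.0]]   # constraint: label 1 may never precede label 0
path = viterbi(emit, trans)
# the decoded path respects the forbidden 1 -> 0 transition
```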

Journal ArticleDOI
TL;DR: A three-stage real-time Traffic Sign Recognition system, consisting of a segmentation, a detection and a classification phase, is presented, showing that a subset of only about one third of the features is sufficient to attain high classification accuracy on the German Traffic Sign Recognition Benchmark.

Journal ArticleDOI
TL;DR: A concise reference phylogeny is introduced whereby it does not aim to provide an exhaustive tree that includes all known Y‐SNPs but, rather, a quite stable reference tree aiming for optimal global discrimination capacity based on a strongly reduced set that includes only the most resolving Y‐ SNPs.
Abstract: During the last few decades, a wealth of studies dedicated to the human Y chromosome and its DNA variation, in particular Y-chromosome single-nucleotide polymorphisms (Y-SNPs), has led to the construction of a well-established Y-chromosome phylogeny. Since the recent advent of new sequencing technologies, the discovery of additional Y-SNPs is exploding and their continuous incorporation in the phylogenetic tree is leading to an ever higher resolution. However, the large and increasing amount of information included in the "complete" Y-chromosome phylogeny, which now already includes many thousands of identified Y-SNPs, can be overwhelming and complicates its understanding as well as the task of selecting suitable markers for genotyping purposes in evolutionary, demographic, anthropological, genealogical, medical, and forensic studies. As a solution, we introduce a concise reference phylogeny whereby we do not aim to provide an exhaustive tree that includes all known Y-SNPs but, rather, a quite stable reference tree aiming for optimal global discrimination capacity based on a strongly reduced set that includes only the most resolving Y-SNPs. Furthermore, with this reference tree, we wish to propose a common standard for Y-marker as well as Y-haplogroup nomenclature. The current version of our tree is based on a core set of 417 branch-defining Y-SNPs and is available online at http://www.phylotree.org/Y.

Proceedings ArticleDOI
24 Mar 2014
TL;DR: This paper proposes a new class of automated synthesis methods for generating approximate circuits directly from behavioral-level descriptions, and is able to identify the optimal designs that represent the Pareto frontier trade-off between accuracy and power consumption.
Abstract: Many classes of applications, especially in the domains of signal and image processing, computer graphics, computer vision, and machine learning, are inherently tolerant to inaccuracies in their underlying computations. This tolerance can be exploited to design approximate circuits that perform within acceptable accuracies but have much lower power consumption and smaller area footprints (and often better run times) than their exact counterparts. In this paper, we propose a new class of automated synthesis methods for generating approximate circuits directly from behavioral-level descriptions. In contrast to previous methods that operate at the Boolean level or use custom modifications, our automated behavioral synthesis method enables a wider range of possible approximations and can operate on arbitrary designs. Our method first creates an abstract synthesis tree (AST) from the input behavioral description, and then applies variant operators to the AST using an iterative stochastic greedy approach to identify the optimal inexact designs in an efficient way. Our method is able to identify the optimal designs that represent the Pareto frontier trade-off between accuracy and power consumption. Our methodology is developed into a tool we call ABACUS, which we integrate with a standard ASIC experimental flow based on industrial tools. We validate our methods on three realistic Verilog-based benchmarks from three different domains --- signal processing, computer vision and machine learning. Our tool automatically discovers optimal designs, providing area and power savings of up to 50% while maintaining good accuracy.

Journal ArticleDOI
TL;DR: A bottom-up method based on the intensity and 3D structure of leaf-off lidar point cloud data is developed to address the challenges of individual tree segmentation in deciduous forests.
Abstract: Light Detection and Ranging (Lidar) can generate three-dimensional (3D) point clouds which can be used to characterize horizontal and vertical forest structure, so it has become a popular tool for forest research. Recently, various methods based on a top-down scheme have been developed to segment individual trees from lidar data. Some of these methods, such as the one developed by Li et al. (2012), can obtain accuracy of up to 90% when applied in coniferous forests. However, the accuracy decreases when they are applied in deciduous forests, because the interlacing tree branches make it more difficult to determine the tree top. To address the challenges of tree segmentation in deciduous forests, we develop a new bottom-up method based on the intensity and 3D structure of leaf-off lidar point cloud data in this study. We applied our algorithm to segment trees in a forest at the Shavers Creek Watershed in Pennsylvania. Three indices were used to assess the accuracy of our method: recall, precision and F-score. The results show that the algorithm can detect 84% of the trees (recall), 97% of the segmented trees are correct (precision) and the overall F-score is 90%. The results imply that our method has good potential for segmenting individual trees in deciduous broadleaf forests.
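The three indices reported above are the standard detection metrics; as a quick check of the reported numbers, a recall of 0.84 and a precision of 0.97 do give an F-score of about 0.90.

```python
# F-score as the harmonic mean of recall and precision.
def f_score(recall, precision):
    return 2 * recall * precision / (recall + precision)

r, p = 0.84, 0.97
f = f_score(r, p)
print(round(f, 2))  # 0.9
```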

Patent
28 Aug 2014
TL;DR: In this article, a connection method between devices in an M2M system according to an embodiment of the present invention comprises generating a resource tree to store connection information for P2P communications between the devices, receiving the connection information to generate resources related to the P2P communications between devices, and transmitting connection information stored in the resources of an M2M device which is requested according to a network load.
Abstract: A connection method between devices in an M2M system according to an embodiment of the present invention comprises generating a resource tree to store connection information for P2P communications between the devices; receiving the connection information to generate resources related to the P2P communications between the devices; and transmitting the connection information stored in the resources of an M2M device which is requested according to a network load when connection between the devices is requested by the P2P communications between the devices.

01 Jan 2014
TL;DR: VALUE as mentioned in this paper is an open European network to validate and compare downscaling methods for climate change research, aiming to foster collaboration and knowledge exchange between climatologists, impact modellers, statisticians, and stakeholders.
Abstract: VALUE is an open European network to validate and compare downscaling methods for climate change research. VALUE aims to foster collaboration and knowledge exchange between climatologists, impact modellers, statisticians, and stakeholders to establish an interdisciplinary downscaling community. A key deliverable of VALUE is the development of a systematic validation framework to enable the assessment and comparison of both dynamical and statistical downscaling methods. In this paper, we present the key ingredients of this framework. VALUE's main approach to validation is user-focused: starting from a specific user problem, a validation tree guides the selection of relevant validation indices and performance measures. Several experiments have been designed to isolate specific points in the downscaling procedure where problems may occur: what is the isolated downscaling skill? How do statistical and dynamical methods compare? How do methods perform at different spatial scales? Do methods fail in representing regional climate change? How is the overall representation of regional climate, including errors inherited from global climate models? The framework will be the basis for a comprehensive community-open downscaling intercomparison study, but is intended also to provide general guidance for other validation studies.

Journal ArticleDOI
TL;DR: PTrees, a multi-scale dynamic point cloud segmentation dedicated to forest tree extraction from lidar point clouds, is introduced, detecting 82% of the trees with a false detection rate under 10%.

Journal ArticleDOI
TL;DR: The CHAID method is used to find the best classification fit for each conditioning factor and is then combined with logistic regression (LR) to find the corresponding coefficients of the best-fitting function that assesses the optimal terminal nodes.
Abstract: An ensemble algorithm of data mining decision tree (DT)-based CHi-squared Automatic Interaction Detection (CHAID) is widely used for prediction analysis in a variety of applications. CHAID, as a multivariate method, has an automatic classification capacity to analyze large numbers of landslide conditioning factors. Moreover, it produces two or more nodes for each independent variable, where every node contains counts of the presence or absence of landslides (the dependent variable). Other DT methods, such as Quick, Unbiased, Efficient Statistic Tree (QUEST) and Classification and Regression Trees (CRT), are not able to produce such multi-branch trees. Thus, the main objective of this paper is to use the CHAID method to find the best classification fit for each conditioning factor, and then combine it with logistic regression (LR) to find the corresponding coefficients of the best-fitting function that assesses the optimal terminal nodes. In the first step, a landslide inventory map with 296 landslide locations was extracted from various sources over the Pohang-Kyeong Joo catchment (South Korea). The inventory was then randomly split into two datasets: 70% was used for training the models, and the remaining 30% was used for validation. Thirteen landslide conditioning factors were used for the susceptibility modeling. CHAID was then applied and revealed that some conditioning factors, such as altitude, soil drain, soil texture and TWI, formed terminal nodes and reflected the best classification fit. The proposed ensemble technique was then applied, and the interpretations of the coefficients showed that the decision tree branch nodes for distance from drain, soil drain, and TWI lead to a better assessment of landslide consequences in the current study area. The validation results showed success and prediction rates of 75% and 79%, respectively. This study proved the efficiency and reliability of the ensemble DT and LR model in landslide susceptibility mapping.
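The chi-squared split selection at the core of CHAID can be sketched as follows: compute the chi-squared statistic of each factor's contingency table against the outcome and keep the factor that separates best. The factor names and observations below are fabricated toy data, not the study's dataset.

```python
# Chi-squared statistic for factor-vs-outcome contingency tables,
# used to rank candidate conditioning factors (CHAID-style).
from collections import Counter

def chi_squared(pairs):
    """pairs: list of (category, outcome); returns the chi-squared statistic."""
    n = len(pairs)
    cat_tot = Counter(c for c, _ in pairs)
    out_tot = Counter(o for _, o in pairs)
    cell = Counter(pairs)
    chi2 = 0.0
    for c in cat_tot:
        for o in out_tot:
            expected = cat_tot[c] * out_tot[o] / n
            chi2 += (cell[(c, o)] - expected) ** 2 / expected
    return chi2

# outcome: 1 = landslide, 0 = stable (invented observations)
altitude = [("high", 1), ("high", 1), ("high", 1), ("low", 0), ("low", 0), ("low", 0)]
soil     = [("sandy", 1), ("clay", 1), ("sandy", 1), ("clay", 0), ("sandy", 0), ("clay", 0)]

best = max([("altitude", altitude), ("soil", soil)], key=lambda kv: chi_squared(kv[1]))
print(best[0])  # altitude separates the outcomes perfectly
```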

Journal ArticleDOI
TL;DR: An interactive visual text analysis approach to allow users to progressively explore and analyze the complex evolutionary patterns of hierarchical topics by exploiting a tree cut to approximate each tree and allowing users to interactively modify the tree cuts based on their interests.
Abstract: Using a sequence of topic trees to organize documents is a popular way to represent hierarchical and evolving topics in text corpora. However, following evolving topics in the context of topic trees remains difficult for users. To address this issue, we present an interactive visual text analysis approach to allow users to progressively explore and analyze the complex evolutionary patterns of hierarchical topics. The key idea behind our approach is to exploit a tree cut to approximate each tree and allow users to interactively modify the tree cuts based on their interests. In particular, we propose an incremental evolutionary tree cut algorithm with the goal of balancing 1) the fitness of each tree cut and the smoothness between adjacent tree cuts; 2) the historical and new information related to user interests. A time-based visualization is designed to illustrate the evolving topics over time. To preserve the mental map, we develop a stable layout algorithm. As a result, our approach can quickly guide users to progressively gain profound insights into evolving hierarchical topics. We evaluate the effectiveness of the proposed method on Amazon's Mechanical Turk and real-world news data. The results show that users are able to successfully analyze evolving topics in text data.

Journal ArticleDOI
TL;DR: This work proposes an inverse modelling approach for stochastic trees that takes polygonal tree models as input and estimates the parameters of a procedural model so that it produces trees similar to the input.
Abstract: Procedural tree models have been popular in computer graphics for their ability to generate a variety of output trees from a set of input parameters and to simulate plant interaction with the environment for a realistic placement of trees in virtual scenes. However, defining such models and their parameters is a difficult task. We propose an inverse modelling approach for stochastic trees that takes polygonal tree models as input and estimates the parameters of a procedural model so that it produces trees similar to the input. Our framework is based on a novel parametric model for tree generation and uses Monte Carlo Markov Chains to find the optimal set of parameters. We demonstrate our approach on a variety of input models obtained from different sources, such as interactive modelling systems, reconstructed scans of real trees and developmental models.

Proceedings ArticleDOI
08 Dec 2014
TL;DR: A method to automatically identify binary code regions that are "similar" to code regions containing a reference bug to find bugs both in the same binary as the reference bug and in completely unrelated binaries (even compiled for different operating systems).
Abstract: Software vulnerabilities still constitute a high security risk and there is an ongoing race to patch known bugs. However, especially in closed-source software, there is no straightforward way (in contrast to source code analysis) to find buggy code parts, even if the bug was publicly disclosed. To tackle this problem, we propose a method called Tree Edit Distance Based Equational Matching (TEDEM) to automatically identify binary code regions that are "similar" to code regions containing a reference bug. We aim to find bugs both in the same binary as the reference bug and in completely unrelated binaries (even compiled for different operating systems). Our method even works on proprietary software systems, which lack source code and symbols. The analysis task is split into two phases. In a preprocessing phase, we condense the semantics of a given binary executable by symbolic simplification to make our approach robust against syntactic changes across different binaries. Second, we use tree edit distances as a basic block-centric metric for code similarity. This allows us to find instances of the same bug in different binaries and even to spot its variants (a concept called vulnerability extrapolation). To demonstrate the practical feasibility of the proposed method, we implemented a prototype of TEDEM that can find real-world security bugs across binaries and even across OS boundaries, such as in MS Word and the popular messengers Pidgin (Linux) and Adium (Mac OS).
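The metric named in the title can be illustrated with a compact (naive but memoised) ordered-forest edit distance using unit costs for insert, delete, and relabel; TEDEM applies such a metric per basic block to symbolically simplified expression trees, which the toy expressions below only stand in for.

```python
# Ordered-forest edit distance with unit costs.
from functools import lru_cache

def size(forest):
    return sum(1 + size(children) for _, children in forest)

@lru_cache(maxsize=None)
def ted(f1, f2):
    """Edit distance between two forests of (label, children) tuples."""
    if not f1 and not f2:
        return 0
    if not f1:
        return size(f2)          # insert everything in f2
    if not f2:
        return size(f1)          # delete everything in f1
    (l1, c1), rest1 = f1[-1], f1[:-1]
    (l2, c2), rest2 = f2[-1], f2[:-1]
    return min(
        ted(rest1 + c1, f2) + 1,                       # delete root of last tree
        ted(f1, rest2 + c2) + 1,                       # insert root of last tree
        ted(c1, c2) + ted(rest1, rest2) + (l1 != l2),  # match the two roots
    )

# (x + y) vs (x + z): one relabel
t1 = ("+", (("x", ()), ("y", ())))
t2 = ("+", (("x", ()), ("z", ())))
print(ted((t1,), (t2,)))  # 1
```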

Patent
30 Jan 2014
TL;DR: In this paper, a hierarchical tree is constructed for each of multiple named resources by combining the sets of neighbor-computer identifiers maintained by the computers in a cluster, so that the combined identifiers define a respective tree for each named resource.
Abstract: Multiple computers in a cluster maintain respective sets of identifiers of neighbor computers in the cluster for each of multiple named resources. A combination of the respective sets of identifiers defines a respective tree for each named resource in the set of named resources. Upon origination or detection of a request at a given computer in the cluster, the given computer forwards the request over a network to successive computers in the hierarchical tree, leading to the computers relevant to handling the request, based on the identifiers of neighbor computers. Thus, a combination of identifiers of neighbor computers identifies potential paths to related computers in the tree.
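A minimal sketch of the per-resource neighbor-set forwarding idea. Since the patent abstract specifies no concrete algorithm, everything here is an illustrative assumption: each named resource maps computers to neighbor sets that together form a tree, and a request is forwarded hop by hop until it reaches the responsible computer.

```python
from collections import deque

def route(neighbors, resource, start, target):
    """Walk the per-resource neighbor sets (assumed to form a tree) from
    the originating computer to the target, returning the forwarding path."""
    tree = neighbors[resource]
    prev = {start: None}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        if u == target:
            # reconstruct the hop-by-hop path back to the origin
            path = []
            while u is not None:
                path.append(u)
                u = prev[u]
            return path[::-1]
        for v in tree.get(u, ()):
            if v not in prev:
                prev[v] = u
                queue.append(v)
    return None  # target not reachable in this resource's tree
```

In a tree there is exactly one path between any two computers, so each hop only needs its local neighbor set to forward correctly.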

Journal ArticleDOI
TL;DR: This paper introduces a spectral divisive clustering algorithm to efficiently extract a hierarchy over a large number of tracklets and provides an efficient positive definite kernel that computes the structural and visual similarity of two hierarchical decompositions by relying on models of their parent–child relations.
Abstract: Complex activities, e.g. pole vaulting, are composed of a variable number of sub-events connected by complex spatio-temporal relations, whereas simple actions can be represented as sequences of short temporal parts. In this paper, we learn hierarchical representations of activity videos in an unsupervised manner. These hierarchies of mid-level motion components are data-driven decompositions specific to each video. We introduce a spectral divisive clustering algorithm to efficiently extract a hierarchy over a large number of tracklets (i.e. local trajectories). We use this structure to represent a video as an unordered binary tree. We model this tree using nested histograms of local motion features. We provide an efficient positive definite kernel that computes the structural and visual similarity of two hierarchical decompositions by relying on models of their parent-child relations. We present experimental results on four recent challenging benchmarks: the High Five dataset (Patron-Perez et al., High five: recognising human interactions in TV shows, 2010), the Olympics Sports dataset (Niebles et al., Modeling temporal structure of decomposable motion segments for activity classification, 2010), the Hollywood 2 dataset (Marszalek et al., Actions in context, 2009), and the HMDB dataset (Kuehne et al., HMDB: A large video database for human motion recognition, 2011). We show that per-video hierarchies provide additional information for activity recognition. Our approach improves over unstructured activity models, baselines using other motion decomposition algorithms, and the state of the art.
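A hedged sketch of a recursive kernel on unordered binary trees of histograms, illustrating the parent-child recursion: node similarity plus the best of the two possible child pairings. The paper's kernel is more elaborate; histogram intersection is used here only as a stand-in node similarity, and trees are hypothetical `(histogram, left, right)` tuples.

```python
def hist_sim(h1, h2):
    """Histogram intersection similarity of two equal-length histograms."""
    return sum(min(x, y) for x, y in zip(h1, h2))

def tree_kernel(a, b):
    """Similarity of two unordered binary trees given as
    (histogram, left, right) tuples; None denotes a missing child."""
    if a is None or b is None:
        return 0.0
    node = hist_sim(a[0], b[0])
    # unordered children: try both pairings and keep the better one
    straight = tree_kernel(a[1], b[1]) + tree_kernel(a[2], b[2])
    crossed = tree_kernel(a[1], b[2]) + tree_kernel(a[2], b[1])
    return node + max(straight, crossed)
```

Because child pairs are scored both ways, two videos whose decompositions agree up to left/right swaps still match well, which is the point of using an unordered tree.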

Journal ArticleDOI
TL;DR: A new method to increase the diversity of each tree in the forest and thereby improve the overall accuracy of Random Forests in most cases is proposed.

Proceedings ArticleDOI
24 Aug 2014
TL;DR: The problem of mining activity networks to identify interesting events, such as a big concert or a demonstration in a city, or a trending keyword in a user community of a social network, is formalized and solved using graph-theoretic formulations.
Abstract: With the fast growth of smart devices and social networks, a lot of computing systems collect data that record different types of activities. An important computational challenge is to analyze these data, extract patterns, and understand activity trends. We consider the problem of mining activity networks to identify interesting events, such as a big concert or a demonstration in a city, or a trending keyword in a user community in a social network. We define an event to be a subset of nodes in the network that are close to each other and have high activity levels. We formalize the problem of event detection using two graph-theoretic formulations. The first one captures the compactness of an event using the sum of distances among all pairs of the event nodes. We show that this formulation can be mapped to the maxcut problem, and thus, it can be solved by applying standard semidefinite programming techniques. The second formulation captures compactness using a minimum-distance tree. This formulation leads to the prize-collecting Steiner-tree problem, which we solve by adapting existing approximation algorithms. For the two problems we introduce, we also propose efficient and effective greedy approaches and we prove performance guarantees for one of them. We experiment with the proposed algorithms on real datasets from a public bicycling system and a geolocation-enabled social network dataset collected from Twitter. The results show that our methods are able to detect meaningful events.
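The first (sum-of-pairwise-distances) formulation admits a simple greedy heuristic. The sketch below is an illustrative assumption, not the paper's algorithm: it seeds the event with the most active node and keeps adding nodes while total activity minus the pairwise-distance penalty improves.

```python
def greedy_event(activity, dist, alpha=1.0):
    """Greedy event detection sketch: activity maps node -> activity level,
    dist is a symmetric node -> node -> distance mapping, and alpha trades
    activity against the sum-of-pairwise-distances compactness penalty."""
    def score(nodes):
        act = sum(activity[v] for v in nodes)
        penalty = sum(dist[u][v] for u in nodes for v in nodes if u < v)
        return act - alpha * penalty

    seed = max(activity, key=activity.get)
    event = {seed}
    candidates = set(activity) - event
    best = score(event)
    improved = True
    while improved and candidates:
        improved = False
        for v in sorted(candidates):
            s = score(event | {v})
            if s > best:  # only grow the event while the objective improves
                best = s
                event = event | {v}
                candidates.discard(v)
                improved = True
                break
    return event
```

On a toy network with two nearby active nodes and one distant inactive node, the heuristic keeps the compact active pair and rejects the outlier.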