
Showing papers on "Interval tree published in 2015"


Journal ArticleDOI
TL;DR: Comparative experiments demonstrate that the use of waveform representation and deep Boltzmann machines improves the classification accuracy of tree species.
Abstract: Our work addresses the problem of extracting and classifying tree species from mobile LiDAR data. The work includes tree preprocessing and tree classification. In tree preprocessing, voxel-based upward-growing filtering is proposed to remove ground points from the mobile LiDAR data, followed by a tree segmentation that extracts individual trees via Euclidean distance clustering and voxel-based normalized cut segmentation. In tree classification, first, a waveform representation is developed to model geometric structures of trees. Then, deep learning techniques are used to generate high-level feature abstractions of the trees’ waveform representations. Quantitative analysis shows that our algorithm achieves an overall accuracy of 86.1% and a kappa coefficient of 0.8 in classifying urban tree species using mobile LiDAR data. Comparative experiments demonstrate that the uses of waveform representation and deep Boltzmann machines contribute to the improvement of classification accuracies of tree species.

155 citations


Journal ArticleDOI
TL;DR: RTED is shown optimal among all algorithms that use LRH (left-right-heavy) strategies, which include RTED and the fastest tree edit distance algorithms presented in the literature.
Abstract: We consider the classical tree edit distance between ordered labelled trees, which is defined as the minimum-cost sequence of node edit operations that transform one tree into another. The state-of-the-art solutions for the tree edit distance are not satisfactory. The main competitors in the field either have optimal worst-case complexity but the worst case happens frequently, or they are very efficient for some tree shapes but degenerate for others. This leads to unpredictable and often infeasible runtimes. There is no obvious way to choose between the algorithms. In this article we present RTED, a robust tree edit distance algorithm. The asymptotic complexity of our algorithm is smaller than or equal to the complexity of the best competitors for any input instance, that is, our algorithm is both efficient and worst-case optimal. This is achieved by computing a dynamic decomposition strategy that depends on the input trees. RTED is shown optimal among all algorithms that use LRH (left-right-heavy) strategies, which include RTED and the fastest tree edit distance algorithms presented in the literature. In our experiments on synthetic and real-world data we empirically evaluate our solution and compare it to the state-of-the-art.

112 citations
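
As background for the entry above, the following is a minimal Python sketch of the classical ordered tree edit distance recursion with unit costs; it illustrates the edit model only and is not RTED or its decomposition strategy, and the tuple-based tree encoding is purely illustrative.

from functools import lru_cache

# A tree is (label, children); a forest is a tuple of trees.
def size(forest):
    return sum(1 + size(kids) for _, kids in forest)

@lru_cache(maxsize=None)
def forest_dist(f1, f2):
    """Unit-cost edit distance between two ordered forests."""
    if not f1 and not f2:
        return 0
    if not f1:
        return size(f2)          # insert everything remaining
    if not f2:
        return size(f1)          # delete everything remaining
    lab1, kids1 = f1[-1]         # rightmost roots
    lab2, kids2 = f2[-1]
    return min(
        forest_dist(f1[:-1] + kids1, f2) + 1,      # delete rightmost root of f1
        forest_dist(f1, f2[:-1] + kids2) + 1,      # insert rightmost root of f2
        forest_dist(f1[:-1], f2[:-1])              # match the two rightmost subtrees
        + forest_dist(kids1, kids2)
        + (0 if lab1 == lab2 else 1),              # relabel cost
    )

def tree_edit_distance(t1, t2):
    return forest_dist((t1,), (t2,))

# Example: f(a, g(b)) vs f(a, b) -> distance 1 (delete g).
t1 = ("f", (("a", ()), ("g", (("b", ()),))))
t2 = ("f", (("a", ()), ("b", ())))
print(tree_edit_distance(t1, t2))   # 1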


Journal ArticleDOI
TL;DR: A heuristic search algorithm is presented to estimate the most likely topology of a rooted, three-dimensional tree from a single two-dimensional image using a generative, parametric tree-growth model.
Abstract: Tree-like structures are fundamental in nature, and it is often useful to reconstruct the topology of a tree—what connects to what—from a two-dimensional image of it. However, the projected branches often cross in the image: the tree projects to a planar graph, and the inverse problem of reconstructing the topology of the tree from that of the graph is ill-posed. We regularize this problem with a generative, parametric tree-growth model. Under this model, reconstruction is possible in linear time if one knows the direction of each edge in the graph—which edge endpoint is closer to the root of the tree—but becomes NP-hard if the directions are not known. For the latter case, we present a heuristic search algorithm to estimate the most likely topology of a rooted, three-dimensional tree from a single two-dimensional image. Experimental results on retinal vessel, plant root, and synthetic tree data sets show that our methodology is both accurate and efficient.

68 citations


Proceedings ArticleDOI
Qi Mao1, Le Yang1, Li Wang2, Steve Goodison3, Yijun Sun1 
01 Jan 2015
TL;DR: A principal tree model is proposed and a new algorithm is developed that learns a tree structure automatically from data; the method compares favorably with baselines and can discover a breast cancer progression path with multiple branches.
Abstract: Many scientific datasets are of high dimension, and the analysis usually requires visual manipulation by retaining the most important structures of data. Principal curve is a widely used approach for this purpose. However, many existing methods work only for data with structures that are not self-intersected, which is quite restrictive for real applications. To address this issue, we develop a new model, which captures the local information of the underlying graph structure based on reversed graph embedding. A generalization bound is derived showing that the model is consistent if the number of data points is sufficiently large. As a special case, a principal tree model is proposed and a new algorithm is developed that learns a tree structure automatically from data. The new algorithm is simple and parameter-free with guaranteed convergence. Experimental results on synthetic and breast cancer datasets show that the proposed method compares favorably with baselines and can discover a breast cancer progression path with multiple branches.

29 citations


Proceedings ArticleDOI
11 Jul 2015
TL;DR: The Brando (BRANDO) operator is introduced, which selects from the parent tree the overall best subtree for applying RDO, using a small randomly drawn static library.
Abstract: Semantic Backpropagation (SB) was introduced in GP so as to take into account the semantics of a GP tree at all intermediate states of the program execution, i.e., at each node of the tree. The idea is to compute the optimal "should-be" values each subtree should return, whilst assuming that the rest of the tree is unchanged, so as to minimize the fitness of the tree. To this end, the Random Desired Output (RDO) mutation operator, proposed in [17], uses SB in choosing, from a given library, a tree whose semantics are preferred to the semantics of a randomly selected subtree from the parent tree. Pushing this idea one step further, this paper introduces the Brando (BRANDO) operator, which selects from the parent tree the overall best subtree for applying RDO, using a small randomly drawn static library. Used within a simple Iterated Local Search framework, BRANDO can find the exact solution of many popular Boolean benchmarks in reasonable time whilst keeping solution trees small, thus paving the road for truly memetic GP algorithms.

28 citations


Proceedings ArticleDOI
04 Jan 2015
TL;DR: This work presents a new lock-free algorithm for concurrent manipulation of a binary search tree in an asynchronous shared memory system that supports search, insert, and delete operations; it uses an internal representation of the search tree and is based on marking edges instead of nodes.
Abstract: We present a new lock-free algorithm for concurrent manipulation of a binary search tree in an asynchronous shared memory system that supports search, insert and delete operations. It combines ideas from two recently proposed lock-free algorithms: one of them provides good performance for a read-dominated workload and the other one for a write-dominated workload. Specifically, it uses internal representation of a search tree (as in the first one) and is based on marking edges instead of nodes (as in the second one). Our experiments indicate that our new lock-free algorithm outperforms other lock-free algorithms in most cases providing up to 35% improvement in some cases over the next best algorithm.

26 citations


Patent
Samuli Laine1, Timo Aila1, Tero Karras1
05 Jan 2015
TL;DR: In this article, a system, method, and computer program product for implementing a tree traversal operation for a tree data structure are described.
Abstract: A system, method, and computer program product for implementing a tree traversal operation for a tree data structure is disclosed. The method includes the steps of receiving at least a portion of a tree data structure that represents a tree having a plurality of nodes and processing, via a tree traversal operation algorithm executed by a processor, one or more nodes of the tree data structure by intersecting the one or more nodes of the tree data structure with a query data structure. A first node of the tree data structure is associated with a first local coordinate system and a second node of the tree data structure is associated with a second local coordinate system, the first node being an ancestor of the second node, and the first local coordinate system and the second local coordinate system are both specified relative to a global coordinate system.

22 citations
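
To make the claim above concrete, here is a rough Python sketch of a stack-based tree traversal that re-expresses the query in each node's local coordinate system before intersecting it with the node's bounds; the Node layout, the 2-D boxes, and the payload are hypothetical and are not the patent's actual encoding.

from dataclasses import dataclass, field
from typing import List, Tuple

Box = Tuple[float, float, float, float]   # (xmin, ymin, xmax, ymax); 2-D for brevity

@dataclass
class Node:
    origin: Tuple[float, float]           # local frame origin, relative to the parent frame
    bounds: Box                           # bounding box expressed in this node's local frame
    children: List["Node"] = field(default_factory=list)
    payload: object = None                # e.g. primitives stored at a leaf

def overlaps(a: Box, b: Box) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def shift(box: Box, dx: float, dy: float) -> Box:
    return (box[0] + dx, box[1] + dy, box[2] + dx, box[3] + dy)

def traverse(root: Node, query: Box):
    """Iterative traversal: translate the query into each node's local frame,
    intersect it with the node's bounds, and descend only on overlap."""
    hits = []
    stack = [(root, query)]               # query carried in the parent's frame
    while stack:
        node, q = stack.pop()
        q_local = shift(q, -node.origin[0], -node.origin[1])   # into this node's frame
        if not overlaps(q_local, node.bounds):
            continue
        if not node.children:
            hits.append(node.payload)
        for child in node.children:
            stack.append((child, q_local))
    return hits

leaf = Node(origin=(2.0, 0.0), bounds=(0.0, 0.0, 1.0, 1.0), payload="triangle #7")
root = Node(origin=(0.0, 0.0), bounds=(0.0, 0.0, 4.0, 4.0), children=[leaf])
print(traverse(root, query=(2.5, 0.5, 3.5, 1.5)))   # ['triangle #7']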


Patent
21 Aug 2015
TL;DR: In this article, the authors describe a method for key compression and cached-locking in a tree data structure, which can be used to reduce duplicated storage of shared portions of the keys.
Abstract: A system, method, and computer program product for key compression and cached-locking are described. A computer system can store database files or operating system files in a tree data structure. The system can store data or metadata as key-value pairs in nodes of the tree data structure. The keys in the key-value pairs can have a hierarchical structure, which may or may not correspond to the tree data structure. The system can compress the keys by reducing duplicated storage of shared portions of the keys. The system can use an index in a tree node to represent the hierarchical structure of the key-value pairs stored in that tree node. To access a value in a key-value pair, the system can identify the tree node to search, query the index in that tree node to locate the value, and then access the value at the indexed location.

21 citations
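
A toy Python sketch of the key-compression idea described above: store a shared key prefix once per tree node and index only the suffixes. The dictionary layout and names are illustrative, not the patent's on-disk format.

import os

def compress_keys(entries):
    """Keep the longest common prefix of the keys once and map the remaining
    suffixes to their values (illustrative layout only)."""
    keys = sorted(entries)
    prefix = os.path.commonprefix(keys) if keys else ""
    return {
        "prefix": prefix,
        "index": {k[len(prefix):]: entries[k] for k in keys},
    }

def lookup(node, key):
    prefix = node["prefix"]
    if not key.startswith(prefix):
        return None
    return node["index"].get(key[len(prefix):])

node = compress_keys({
    "/usr/local/bin/python": 1,
    "/usr/local/bin/pip": 2,
    "/usr/local/lib/libz.so": 3,
})
print(node["prefix"])                       # '/usr/local/'
print(lookup(node, "/usr/local/bin/pip"))   # 2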


Book ChapterDOI
27 May 2015
TL;DR: This paper proposes several algorithms to compute efficiently some attribute information, including incremental computation of information on region, contour, and context, and describes the computation of extinction-based saliency maps using tree-based image representations.
Abstract: Tree-based image representations are popular tools for many applications in mathematical morphology and image processing. Classically, one computes an attribute on each node of a tree and decides whether to preserve or remove some nodes upon the attribute function. This attribute function plays a key role for the good performance of tree-based applications. In this paper, we propose several algorithms to compute efficiently some attribute information. The first one is incremental computation of information on region, contour, and context. Then we show how to compute efficiently extremal information along the contour (e.g., minimal gradient’s magnitude along the contour). Lastly, we depict computation of extinction-based saliency map using tree-based image representations. The computation complexity and the memory cost of these algorithms are analyzed. To the best of our knowledge, except information on region, none of the other algorithms is presented explicitly in any state-of-the-art paper.

20 citations
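
As a minimal illustration of the incremental region-attribute computation mentioned above, the sketch below accumulates region area bottom-up over a toy component-tree-like hierarchy; the data layout is hypothetical, and the paper's contour and context attributes are not covered.

def accumulate_area(tree, node, own_pixels, area):
    """Post-order accumulation: a region's area is the number of pixels attached
    directly to the node plus the areas of its child regions (already computed)."""
    total = len(own_pixels[node])
    for child in tree[node]:                  # tree: node -> list of child nodes
        total += accumulate_area(tree, child, own_pixels, area)
    area[node] = total
    return total

# Toy max-tree-like hierarchy: root 0 with children 1 and 2; node 2 has child 3.
tree = {0: [1, 2], 1: [], 2: [3], 3: []}
own_pixels = {0: range(10), 1: range(4), 2: range(3), 3: range(5)}
area = {}
accumulate_area(tree, 0, own_pixels, area)
print(area)   # {1: 4, 3: 5, 2: 8, 0: 22}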


Journal ArticleDOI
TL;DR: An efficient algorithm for injecting positional information into a tree kernel is described and ways to enlarge its feature space without affecting its worst case complexity are presented.
Abstract: Tree kernels proposed in the literature rarely use information about the relative location of the substructures within a tree. As this type of information is orthogonal to the one commonly exploited by tree kernels, the two can be combined to enhance state-of-the-art accuracy of tree kernels. In this brief, our attention is focused on subtree kernels. We describe an efficient algorithm for injecting positional information into a tree kernel and present ways to enlarge its feature space without affecting its worst case complexity. The experimental results on several benchmark datasets are presented showing that our method is able to reach state-of-the-art performances, obtaining in some cases better performance than computationally more demanding tree kernels.

20 citations
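
A toy Python sketch in the spirit of the entry above: a subtree kernel that matches identical complete subtrees and down-weights matches whose depths (a crude form of positional information) disagree. This is an illustrative variant, not the algorithm or feature space of the paper.

from collections import defaultdict

def collect(tree, depth, bag):
    """Return the canonical form of `tree` and record every subtree together
    with the depth at which it occurs (the positional information)."""
    label, children = tree
    canon = "(" + label + "".join(collect(c, depth + 1, bag) for c in children) + ")"
    bag[canon].append(depth)
    return canon

def positional_subtree_kernel(t1, t2, lam=0.5):
    """Count identical complete subtrees, down-weighting pairs whose depths disagree."""
    b1, b2 = defaultdict(list), defaultdict(list)
    collect(t1, 0, b1)
    collect(t2, 0, b2)
    k = 0.0
    for canon in b1.keys() & b2.keys():
        for d1 in b1[canon]:
            for d2 in b2[canon]:
                k += lam ** abs(d1 - d2)
    return k

t1 = ("S", (("NP", ()), ("VP", (("V", ()), ("NP", ())))))
t2 = ("S", (("NP", ()), ("VP", (("V", ()),))))
print(positional_subtree_kernel(t1, t2))   # 2.5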


Proceedings ArticleDOI
12 Jul 2015
TL;DR: This work derives accurate confidence intervals to estimate the splitting gain in decision tree learning with respect to three criteria: entropy, Gini index, and a third index proposed by Kearns and Mansour.
Abstract: Decision tree classifiers are a widely used tool in data stream mining. The use of confidence intervals to estimate the gain associated with each split leads to very effective methods, like the popular Hoeffding tree algorithm. From a statistical viewpoint, the analysis of decision tree classifiers in a streaming setting requires knowing when enough new information has been collected to justify splitting a leaf. Although some of the issues in the statistical analysis of Hoeffding trees have been already clarified, a general and rigorous study of confidence intervals for splitting criteria is missing. We fill this gap by deriving accurate confidence intervals to estimate the splitting gain in decision tree learning with respect to three criteria: entropy, Gini index, and a third index proposed by Kearns and Mansour. Our confidence intervals depend in a more detailed way on the tree parameters. Experiments on real and synthetic data in a streaming setting show that our trees are indeed more accurate than trees with the same number of leaves generated by other techniques.
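
For reference, the generic Hoeffding-style split test used by standard streaming decision trees can be sketched as follows; the paper's criterion-specific confidence intervals are sharper than this bound, so treat the code as background rather than as the proposed method.

import math

def hoeffding_bound(value_range, delta, n):
    """With probability at least 1 - delta, the empirical mean of n i.i.d.
    observations whose range is `value_range` lies within eps of the true mean."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))

def should_split(gain_best, gain_second, n, delta=1e-6, value_range=1.0):
    """Split a leaf once the observed gap between the two best attributes exceeds eps.
    value_range = 1.0 is the range of the information gain with two classes (log2 2)."""
    eps = hoeffding_bound(value_range, delta, n)
    return (gain_best - gain_second) > eps

print(should_split(gain_best=0.31, gain_second=0.24, n=2000))   # True: eps is about 0.059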

Journal ArticleDOI
TL;DR: Results indicate that in almost all cases, the Successive Scheme outperforms the commonly used binary query tree protocols in terms of system efficiency, message complexity, time, and time system efficiency.

Journal ArticleDOI
TL;DR: It is proved that, unless P = NP, no polynomial-time algorithm can approximate the problem with a factor strictly greater than 2/3, and the proposed algorithm is the first non-trivial exact algorithm for finding an optimal spanning tree.
Abstract: In wireless sensor networks, maximizing the lifetime of a data gathering tree without aggregation has been proved to be NP-complete. In this paper, we prove that, unless P = NP, no polynomial-time algorithm can approximate the problem with a factor strictly greater than 2/3. The result even holds in the special case where all sensors have the same initial energy. Existing works for the problem focus on approximation algorithms, but these algorithms only find sub-optimal spanning trees and none of them can guarantee to find an optimal tree. We propose the first non-trivial exact algorithm to find an optimal spanning tree. Due to the NP-hardness nature of the problem, this proposed algorithm runs in exponential time in the worst case, but the consumed time is much less than enumerating all spanning trees. This is done by several techniques for speeding up the search. Featured techniques include how to grow the initial spanning tree and how to divide the problem into subproblems. The algorithm can handle small networks and be used as a benchmark for evaluating approximation algorithms.

Journal ArticleDOI
TL;DR: A new algorithm based on binary tree technique and some reduction operations is presented, which effectively reduces the number of prototypes while maintaining the same level of classification accuracy as the traditional KNN algorithm and other prototype algorithms.

Posted Content
TL;DR: Numerical experiments on tree networks, the ER random graphs and real world networks with different evaluation metrics show that the new source localization algorithm, called the Short-Fat Tree (SFT), outperforms existing algorithms.
Abstract: Information diffusion in networks can be used to model many real-world phenomena, including rumor spreading on online social networks, epidemics in human beings, and malware on the Internet. Informally speaking, the source localization problem is to identify a node in the network that provides the best explanation of the observed diffusion. Despite significant efforts and successes over the last few years, theoretical guarantees of source localization algorithms were established only for tree networks due to the complexity of the problem. This paper presents a new source localization algorithm, called the Short-Fat Tree (SFT) algorithm. Loosely speaking, the algorithm selects the node such that the breadth-first search (BFS) tree from the node has the minimum depth but the maximum number of leaf nodes. Performance guarantees of SFT under the independent cascade (IC) model are established for both tree networks and the Erdos-Renyi (ER) random graph. On tree networks, SFT is the maximum a posteriori (MAP) estimator. On the ER random graph, fundamental limits are established that characterize, in terms of the infection duration relative to a threshold $t_u$, the regimes in which the source can be identified and the regime in which the probability of identifying the source approaches zero asymptotically under any algorithm.
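
A loose Python sketch of the selection rule quoted above (minimum BFS-tree depth over the infected subgraph, ties broken in favour of more leaves); it ignores the IC-model analysis and is only an illustrative reading of the rule, with a made-up toy graph.

from collections import deque

def bfs_stats(adj, infected, root):
    """Depth and leaf count of the BFS tree of the infected subgraph rooted at `root`."""
    dist = {root: 0}
    children = {root: 0}
    q = deque([root])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v in infected and v not in dist:
                dist[v] = dist[u] + 1
                children[v] = 0
                children[u] += 1
                q.append(v)
    depth = max(dist.values())
    leaves = sum(1 for u in dist if children[u] == 0)
    return depth, leaves

def short_fat_tree(adj, infected):
    """Pick the infected node whose BFS tree has minimum depth, breaking ties
    in favour of the largest number of leaves ('short' and 'fat')."""
    return min(infected, key=lambda u: (bfs_stats(adj, infected, u)[0],
                                        -bfs_stats(adj, infected, u)[1]))

adj = {1: [2, 3], 2: [1, 4], 3: [1], 4: [2]}
print(short_fat_tree(adj, infected={1, 2, 3, 4}))   # 1 or 2, the most central nodes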

Journal ArticleDOI
TL;DR: The two-level diameter constrained spanning tree problem (2-DMSTP), which generalizes the classical DMSTP by considering two sets of nodes with different latency requirements, is introduced and a novel modeling approach based on a three-dimensional layered graph is proposed.
Abstract: In this article, we introduce the two-level diameter constrained spanning tree problem (2-DMSTP), which generalizes the classical DMSTP by considering two sets of nodes with different latency requirements. We first observe that any feasible solution to the 2-DMSTP can be viewed as a DMST that contains a diameter constrained Steiner tree. This observation allows us to prove graph theoretical properties related to the centers of each tree which are then exploited to develop mixed integer programming formulations, valid inequalities, and symmetry breaking constraints. In particular, we propose a novel modeling approach based on a three-dimensional layered graph. In an extensive computational study we show that a branch-and-cut algorithm based on the latter model is highly effective in practice.

Book ChapterDOI
TL;DR: Two efficient linear-space solutions for both the weighted and the unweighted case are provided, running in O(m^2 log α(m,n)) and O(mn log n) time, respectively, which improve on the time complexity of previous results provided for other related settings of the problem.
Abstract: Given a 2-edge-connected, positively real-weighted graph G with n vertices and m edges, a tree σ-spanner of G is a spanning tree T in which, for every pair of vertices, the ratio of their distance in T over that in G is bounded by σ, the so-called stretch factor of T. Tree spanners with provably good stretch factors find applications in communication networks, distributed systems, and network design, but unfortunately, as any tree-based infrastructure, they are highly sensitive to even a single link failure, since this results in a network disconnection. Thus, when such an event occurs, the overall effort that has to be afforded to rebuild an effective tree spanner (i.e., computational costs, set-up of new links, updating of the routing tables, etc.) can be prohibitive. However, if the edge failure is only transient, these costs can simply be avoided by promptly reestablishing the connectivity through a careful selection of a temporary swap edge, i.e., an edge in G reconnecting the two subtrees of T induced by the edge failure. According to the tree spanner's nature, a best swap edge for a failing edge e is then a swap edge generating a reconnected tree of minimum stretch factor w.r.t. distances in the graph G deprived of edge e. For this problem we provide two efficient linear-space solutions for both the weighted and the unweighted case, running in O(m^2 log α(m,n)) and O(mn log n) time, respectively. As discussed in the paper, our algorithms also improve on the time complexity of previous results provided for other related settings of the problem.
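
To make the stretch factor concrete, here is a small Python sketch that computes it for a given spanning tree by comparing tree and graph distances; it only illustrates the definition used above, not the swap-edge algorithms of the paper, and the example graph is invented.

import heapq
from itertools import combinations

def dijkstra(adj, src):
    """Shortest-path distances from src; adj maps u -> list of (v, weight)."""
    dist = {src: 0.0}
    pq = [(0.0, src)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float("inf")):
            continue
        for v, w in adj[u]:
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(pq, (d + w, v))
    return dist

def stretch_factor(graph_adj, tree_adj):
    """Stretch factor of the spanning tree: the maximum, over all vertex pairs,
    of their tree distance divided by their graph distance."""
    nodes = list(graph_adj)
    dg = {u: dijkstra(graph_adj, u) for u in nodes}
    dt = {u: dijkstra(tree_adj, u) for u in nodes}
    return max(dt[u][v] / dg[u][v] for u, v in combinations(nodes, 2))

# A 4-cycle with unit weights and the spanning tree obtained by dropping edge (1, 4):
graph = {1: [(2, 1), (4, 1)], 2: [(1, 1), (3, 1)], 3: [(2, 1), (4, 1)], 4: [(3, 1), (1, 1)]}
tree  = {1: [(2, 1)], 2: [(1, 1), (3, 1)], 3: [(2, 1), (4, 1)], 4: [(3, 1)]}
print(stretch_factor(graph, tree))   # 3.0: the pair (1, 4) is at distance 3 in T but 1 in G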

Patent
Samuli Laine1, Timo Aila1, Tero Karras1
05 Jan 2015
TL;DR: In this article, a system, method, and computer program product for implementing a tree traversal operation for a tree data structure divided into compression blocks are described.
Abstract: A system, method, and computer program product for implementing a tree traversal operation for a tree data structure divided into compression blocks is disclosed. The method includes the steps of receiving at least a portion of a tree data structure that represents a tree having a plurality of nodes, pushing a root node of the tree data structure onto a traversal stack data structure associated with an outer loop of a tree traversal operation algorithm, and, for each iteration of an outer loop of a tree traversal operation algorithm, popping a top element from the traversal stack data structure and processing, via an inner loop of the tree traversal operation algorithm, the compression block data structure that corresponds with the top element. The tree data structure may be encoded as a plurality of compression block data structures that each include data associated with a subset of nodes of the tree.
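
A schematic Python sketch of the two-level traversal described in the claim above: an outer loop pops compression blocks from a traversal stack and an inner loop walks the nodes stored in the popped block. The block layout and the intersects callback are hypothetical, not the patent's encoding.

def traverse_blocks(blocks, root_block, intersects):
    """Two-level traversal: the outer loop pops a compression block from the
    traversal stack, the inner loop walks the nodes stored inside that block,
    and references to child blocks are pushed back onto the stack.
    `blocks[b]` is a list of (node, child_block_or_None) pairs."""
    hits, stack = [], [root_block]
    while stack:                                  # outer loop over compression blocks
        block = stack.pop()
        for node, child_block in blocks[block]:   # inner loop inside one block
            if not intersects(node):
                continue
            hits.append(node)
            if child_block is not None:
                stack.append(child_block)
    return hits

# Hypothetical layout: block 0 holds nodes 'a', 'b'; 'b' points to block 1 holding 'c'.
blocks = {0: [("a", None), ("b", 1)], 1: [("c", None)]}
print(traverse_blocks(blocks, 0, intersects=lambda n: True))   # ['a', 'b', 'c']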

Proceedings ArticleDOI
24 Jan 2015
TL;DR: This work presents a new lock-based algorithm for concurrent manipulation of a binary search tree in an asynchronous shared memory system that supports search, insert and delete operations and it operates at edge-level rather than at node-level (locks nodes); this minimizes the contention window of a write operation and improves the system throughput.
Abstract: We present a new lock-based algorithm for concurrent manipulation of a binary search tree in an asynchronous shared memory system that supports search, insert and delete operations. Some of the desirable characteristics of our algorithm are: (i) a search operation uses only read and write instructions, (ii) an insert operation does not acquire any locks, and (iii) a delete operation only needs to lock up to four edges in the absence of contention. Our algorithm is based on an internal representation of a search tree and it operates at edge-level (locks edges) rather than at node-level (locks nodes); this minimizes the contention window of a write operation and improves the system throughput. Our experiments indicate that our lock-based algorithm outperforms existing algorithms for a concurrent binary search tree for medium-sized and larger trees, achieving up to 59% higher throughput than the next best algorithm.

Proceedings ArticleDOI
07 Sep 2015
TL;DR: This work proposes a method to optimize a tree based both on color distributions and shape priors that consists in pruning and regrafting tree branches in order to minimize the energy of the best segmentation that can be extracted from the tree.
Abstract: A partition tree is a hierarchical representation of an image. Once constructed, it can be repeatedly processed to extract information. Multi-object multi-class image segmentation with shape priors is one of the tasks that can be efficiently done upon an available tree. The traditional construction approach is a greedy clustering based on color similarities. However, not considering higher level cues during the construction phase leads to trees that might not accurately represent the underlying objects in the scene, inducing mistakes in the later segmentation. We propose a method to optimize a tree based both on color distributions and shape priors. It consists in pruning and regrafting tree branches in order to minimize the energy of the best segmentation that can be extracted from the tree. Theoretical guarantees help reducing the search space and make the optimization efficient. Our experiments show that we succeed in incorporating shape information to restructure a tree, which in turn enables to extract from it good quality multi-object segmentations with shape priors.

Proceedings ArticleDOI
08 Jun 2015
TL;DR: This work proposes FT-Index, a secondary indexing scheme for cloud system with switch-centric topology, and adopts the Interval tree to reorganize the global index and proposes two versions of FT- index with different publishing methods to lower the rate of false positives and reduce the cost of forwarding queries.
Abstract: Nowadays, cloud storage systems may contain tens of thousands of servers and large scale data sets, which significantly require efficient data management scheme and query processing mechanism. To fulfill these requirements in modern data centers, the infrastructure of cloud systems, we propose FT-Index, a secondary indexing scheme for cloud system with switch-centric topology. FT-Index has a two-layer design. The upper-layer index, called global index, is distributed across different hosts in the system, while the lower-layer index, named local index, is a B+-tree for local query. We further adopt the Interval tree to reorganize the global index and propose two versions of FT-Index with different publishing methods to lower the rate of false positives and reduce the cost of forwarding queries. We provide detailed theoretical analysis on the upper bound of false positives, physical hops per query, and the relationship between them. We also conduct abundant experiments to validate the efficiency of FT-Index.
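
Since the global index above is reorganized as an interval tree, here is a compact Python sketch of a centered interval tree answering stabbing queries; it is a generic textbook structure with invented example intervals, not FT-Index's actual global index.

class IntervalTree:
    """Centered interval tree answering stabbing queries: which intervals contain x?"""

    def __init__(self, intervals):               # intervals: (start, end, label) triples
        if not intervals:
            self.center = None
            return
        points = sorted(p for iv in intervals for p in iv[:2])
        self.center = points[len(points) // 2]
        spanning = [iv for iv in intervals if iv[0] <= self.center <= iv[1]]
        self.by_start = sorted(spanning)                         # ascending start
        self.by_end = sorted(spanning, key=lambda iv: -iv[1])    # descending end
        self.left = IntervalTree([iv for iv in intervals if iv[1] < self.center])
        self.right = IntervalTree([iv for iv in intervals if iv[0] > self.center])

    def stab(self, x):
        if self.center is None:
            return []
        if x < self.center:      # spanning intervals end at or after center: only starts matter
            return [iv for iv in self.by_start if iv[0] <= x] + self.left.stab(x)
        if x > self.center:      # symmetric: only the ends need checking
            return [iv for iv in self.by_end if iv[1] >= x] + self.right.stab(x)
        return list(self.by_start)

index = IntervalTree([(1, 5, "A"), (4, 9, "B"), (10, 12, "C")])
print(sorted(label for _, _, label in index.stab(4)))   # ['A', 'B']
print(index.stab(11))                                   # [(10, 12, 'C')]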

Posted Content
TL;DR: In this article, it is shown that the tree alternative property conjecture of Bonato and Tardif holds for scattered trees and that a conjecture of Tyomkyn holds for locally finite scattered trees.
Abstract: A tree is scattered if no subdivision of the complete binary tree is a subtree. Building on results of Halin, Polat and Sabidussi, we identify four types of subtrees of a scattered tree and a function of the tree into the integers, at least one of which is preserved by every embedding. With this result and a result of Tyomkyn, we prove that the tree alternative property conjecture of Bonato and Tardif holds for scattered trees and that a conjecture of Tyomkyn holds for locally finite scattered trees.

Journal ArticleDOI
TL;DR: Hierarchical clustering (HC) is one of the most frequently used methods in computational biology in the analysis of high-dimensional genomics data and this paper describes a novel procedure that aims to automatically extract meaningful clusters from the HC tree in a semi-supervised way.
Abstract: Hierarchical clustering (HC) is one of the most frequently used methods in computational biology in the analysis of high-dimensional genomics data. Given a data set, HC outputs a binary tree, the leaves of which are the data points and the internal nodes of which represent clusters of various sizes. Normally, a fixed-height cut on the HC tree is chosen, and each contiguous branch of data points below that height is considered as a separate cluster. However, the fixed-height branch cut may not be ideal in situations where one expects a complicated tree structure with nested clusters. Furthermore, due to lack of utilization of related background information in selecting the cutoff, induced clusters are often difficult to interpret. This paper describes a novel procedure that aims to automatically extract meaningful clusters from the HC tree in a semi-supervised way. The procedure is implemented in the R package HCsnip, available from Bioconductor. Rather than cutting the HC tree at a fixed height, HCsnip probes the various ways of snipping, possibly at variable heights, to tease out hidden clusters ensconced deep down in the tree. The cluster extraction process utilizes, along with the data set from which the HC tree is derived, commonly available background information. Consequently, the extracted clusters are highly reproducible and robust against the various sources of variation that haunt high-dimensional genomics data. Since the clustering process is guided by the background information, clusters are easy to interpret. Unlike existing packages, no constraint is placed on the data type on which clustering is desired. In particular, the package accepts patient follow-up data for guiding the cluster extraction process. To our knowledge, HCsnip is the first package that is able to decompose the HC tree into clusters with piecewise snipping under the guidance of patient time-to-event information. Our implementation of the semi-supervised HC tree snipping framework is generic, and can be combined with other algorithms that operate on detected clusters.
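
HCsnip itself is an R/Bioconductor package; purely as a point of comparison for the fixed-height cut it improves on, the usual fixed cut of an HC tree looks like this in Python with scipy (toy data and illustrative parameters).

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
# Two well-separated toy "expression profiles": 20 samples x 50 genes.
data = np.vstack([rng.normal(0.0, 1.0, (10, 50)),
                  rng.normal(3.0, 1.0, (10, 50))])

Z = linkage(data, method="average", metric="euclidean")   # the HC tree

# The baseline the abstract refers to: a single fixed-height cut of the tree.
labels = fcluster(Z, t=15.0, criterion="distance")        # every branch merged below height 15
print(labels)                                              # two flat clusters for this toy data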

Journal ArticleDOI
TL;DR: Experiments demonstrate that the proposed algorithm CFMIS can mine more frequent maximal induced subtrees in less time, and the traversal speed is faster than that of other algorithms.
Abstract: Save the information of the original tree in the compression tree by CTS. For each round of iteration, compression can reduce the size of the dataset. Optimize the maximal stage. The proposed algorithm CFMIS mines more frequent subtrees in less time. Most complex data structures can be represented by a tree or graph structure, but tree structure mining is easier than graph structure mining. With the extensive application of semi-structured data, frequent tree pattern mining has become a hot topic. This paper proposes a compression tree sequence (CTS) to construct a compression tree model and save the information of the original tree in the compression tree. As any subsequence of the CTS corresponds to a subtree of the original tree, it is efficient for mining subtrees. Furthermore, this paper proposes a frequent maximal induced subtrees mining method based on the compression tree sequence, CFMIS (compressed frequent maximal induced subtrees). The algorithm is primarily performed via four stages: first, the original data set is constructed as a compression tree model; then, a cut-edge reprocess is run for the edges whose edge frequency is less than the threshold; next, the tree is compressed after the cut-edge based on the different frequent edge degrees; and, last, maximal processing of the frequent subtree sets is run so that we can obtain the frequent maximal induced subtree set of the original data set. For each iteration, compression can reduce the size of the data set; thus, the traversal speed is faster than that of other algorithms. Experiments demonstrate that our algorithm can mine more frequent maximal induced subtrees in less time.

Patent
Jun Zeng1, Pu Huang, Sebastia Cortes, Scott A. White, Gary J. Dispoto 
30 Jan 2015
TL;DR: An example technique for generating slice data from a tree data structure representation of a three-dimensional (3-D) object is described, which includes merging a shape specification and a material specification of the object to create the tree data structure representation of the 3-D object.
Abstract: An example technique for generating slice data from the tree data structure representation of a three dimensional (3-D) object can include obtaining a shape specification of the 3-D object and obtaining a material specification of the 3-D object. The example technique for generating slice data from the tree data structure representation of a 3-D object can also include merging the shape specification and the material specification to create a tree data structure representation of the 3-D object. The example technique for generating slice data from the tree data structure representation of a 3-D object can also include generating slice data from the tree data structure.

Journal ArticleDOI
TL;DR: A new similarity measure is proposed which extends the concept of unordered tree inclusion by taking the costs of insertion and substitution operations on the pattern tree into account, and an algorithm for computing it is presented.
Abstract: This paper considers the problem of identifying all locations of subtrees in a large tree or in a large collection of trees that are similar to a specified pattern tree, where all trees are assumed to be rooted and node-labeled. The tree edit distance is a widely-used measure of tree (dis-)similarity, but is NP-hard to compute for unordered trees. To cope with this issue, we propose a new similarity measure which extends the concept of unordered tree inclusion by taking the costs of insertion and substitution operations on the pattern tree into account, and present an algorithm for computing it. Our algorithm has the same time complexity as the original one for unordered tree inclusion, i.e., it runs in $O(|T_1||T_2|)$ time, where $T_1$ and $T_2$ denote the pattern tree and the text tree, respectively, when the maximum outdegree of $T_1$ is bounded by a constant. Our experimental evaluation using synthetic and real datasets confirms that the proposed algorithm is fast and scalable and very useful for bibliographic matching, which is a typical entity resolution problem for tree-structured data. Furthermore, we extend our algorithm to also allow a constant number of deletion operations on $T_1$ while still running in $O(|T_1||T_2|)$ time.

Patent
27 Jan 2015
TL;DR: In this article, a data structure consisting of a clump header table and an inline tree data structure is proposed for spatial searching and filtering of data records that include spatial coordinates as data fields and a dedicated, specifically adapted search and filter program is employed to list or enumerate retrieved data records.
Abstract: A data structure comprises a clump header table and an inline tree data structure. The inline tree, representing filterable data fields of hierarchically organized data records, comprises an alternating sequence of first-level binary string segments, each followed by one or more corresponding second-level binary string segments. Each clump header record includes an indicator of a location in the inline tree of corresponding binary string segments. A dedicated, specifically adapted conversion program generates the clump header file and the inline tree for storage on any computer-readable medium, and the inline tree can be read entirely into RAM to be searched or filtered. A dedicated, specifically adapted search and filter program is employed to list or enumerate retrieved data records. Run-time computer code generation can reduce time required for searching and filtering. One example includes spatial searching and filtering of data records that include spatial coordinates as data fields.

Proceedings ArticleDOI
01 Jan 2015
TL;DR: This work proposes a dynamic programming (DP) algorithm for tree trimming problems whose running time is O(NL log N), where N is the number of tree nodes and L is the length limit, and exploits the zero-suppressed binary decision diagram (ZDD) to represent the set of subtrees in a compact form.
Abstract: Tree trimming is the problem of extracting an optimal subtree from an input tree, and sentence extraction and sentence compression methods can be formulated and solved as tree trimming problems. Previous approaches require integer linear programming (ILP) solvers to obtain exact solutions. The problem of this approach is that ILP solvers are black-boxes and have no theoretical guarantee as to their computation complexity. We propose a dynamic programming (DP) algorithm for tree trimming problems whose running time is O(NL log N), where N is the number of tree nodes and L is the length limit. Our algorithm exploits the zero-suppressed binary decision diagram (ZDD), a data structure that represents a family of sets as a directed acyclic graph, to represent the set of subtrees in a compact form; the structure of ZDD permits the application of DP to obtain exact solutions, and our algorithm is applicable to different tree trimming problems. Moreover, experiments show that our algorithm is faster than state-of-the-art ILP solvers, and that it scales well to handle large summarization problems.
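
As a simplified illustration of tree trimming under a length limit, the sketch below runs a plain O(N·L^2) tree-knapsack DP that keeps the root and at most L nodes; it does not use the ZDD representation behind the paper's O(NL log N) bound, and the toy tree and scores are invented.

def trim(tree, score, root, L):
    """Best-scoring connected subtree that contains `root` and at most L nodes.
    dp[k] = best total score using at most k nodes, with the current node included."""
    def solve(v):
        dp = [float("-inf")] + [score[v]] * L      # dp[0] is an unused sentinel
        for c in tree[v]:
            child = solve(c)
            merged = dp[:]
            for k in range(2, L + 1):
                for j in range(1, k):              # give j nodes to this child's subtree
                    if dp[k - j] + child[j] > merged[k]:
                        merged[k] = dp[k - j] + child[j]
            dp = merged
        return dp
    return solve(root)[L]

# Toy sentence-compression-style instance: keep at most 3 nodes, root must stay.
tree = {0: [1, 2], 1: [3], 2: [], 3: []}
score = {0: 1, 1: -2, 2: 4, 3: 5}
print(trim(tree, score, 0, L=3))   # 5: the best trim keeps nodes {0, 2}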

Book ChapterDOI
TL;DR: In this article, a solution merging heuristic is proposed to solve the Steiner tree problem on graphs of much larger treewidth than would be tractable to solve exactly, by first reducing the input graph so that a small width tree decomposition can be found, and then solving the instance induced by this subgraph.
Abstract: Fixed parameter tractable algorithms for bounded treewidth are known to exist for a wide class of graph optimization problems. While most research in this area has been focused on exact algorithms, it is hard to find decompositions of treewidth sufficiently small to make these algorithms fast enough for practical use. Consequently, tree decomposition based algorithms have limited applicability to large scale optimization. However, by first reducing the input graph so that a small width tree decomposition can be found, we can harness the power of tree decomposition based techniques in a heuristic algorithm, usable on graphs of much larger treewidth than would be tractable to solve exactly. We propose a solution merging heuristic to the Steiner Tree Problem that applies this idea. Standard local search heuristics provide a natural way to generate subgraphs with lower treewidth than the original instance, and subsequently we extract an improved solution by solving the instance induced by this subgraph. As such the fixed parameter tractable algorithm becomes an efficient tool for our solution merging heuristic. For a large class of sparse benchmark instances the algorithm is able to find small width tree decompositions on the union of generated solutions. Subsequently it can often improve on the generated solutions fast.

Proceedings ArticleDOI
01 Jan 2015
TL;DR: The proposed algorithm relies on a tree data structure that is constructed based on the scores of Web services, which requires the successive use of both scores, leading to two different versions of the tree.
Abstract: The aim of this paper is to propose a new algorithm for Web services ranking. The proposed algorithm relies on a tree data structure that is constructed based on the scores of Web services. Two types of scores are considered, which are computed by respectively selecting the edge with the minimum or the edge with the maximum weight in the matching graph. The construction of the tree requires the successive use of both scores, leading to two different versions of the tree. The final ranking is obtained by applying a pre-order traversal on the tree and picking out all leaf nodes ordered from left to right. The performance evaluation shows that the proposed algorithm is most often better than similar ones.
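
The ranking step described above can be transcribed almost literally; a minimal Python sketch with a hypothetical score tree is given below (only the traversal is shown, not the score computation on the matching graph).

def ranked_services(node):
    """Pre-order traversal of the score tree; the ranking is the sequence of
    leaves read from left to right, as described in the abstract."""
    label, children = node
    if not children:
        return [label]
    order = []
    for child in children:          # left-to-right pre-order
        order.extend(ranked_services(child))
    return order

# Hypothetical score tree whose leaves are candidate Web services.
tree = ("root", [("s1", []), ("mid", [("s3", []), ("s2", [])]), ("s4", [])])
print(ranked_services(tree))        # ['s1', 's3', 's2', 's4']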