Showing papers by "Nils M. Kriege published in 2014"


Proceedings ArticleDOI
14 Dec 2014
TL;DR: Efficiency and flexibility of implicit kernel functions and dot products of explicitly computed feature maps for widely used graph kernels such as random walk kernels, subgraph matching kernels, and shortest-path kernels are analyzed.
Abstract: As many real-world data can elegantly be represented as graphs, various graph kernels and methods for computing them have been proposed. Surprisingly, many of the recent graph kernels do not employ the kernel trick anymore but rather compute an explicit feature map and report higher efficiency. So, is there really no benefit of the kernel trick when it comes to graphs? Triggered by this question, we investigate under which conditions it is possible to compute a graph kernel explicitly and for which graph properties this computation is actually more efficient. We give a sufficient condition for R-convolution kernels that enables kernel computation by explicit mapping. We theoretically and experimentally analyze efficiency and flexibility of implicit kernel functions and dot products of explicitly computed feature maps for widely used graph kernels such as random walk kernels, subgraph matching kernels, and shortest-path kernels. For walk kernels we observe a phase transition when comparing runtime with respect to label diversity and walk lengths, leading to the conclusion that explicit computations are only favourable for smaller label sets and walk lengths, whereas implicit computation is superior for longer walk lengths and data sets with larger label diversity.

29 citations
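
The distinction between implicit and explicit kernel computation drawn in the abstract above can be illustrated with a deliberately simple case. The sketch below is not taken from the paper: it assumes a toy kernel that compares only vertex labels with a Dirac kernel, and all function names (`implicit_kernel`, `explicit_feature_map`, `explicit_kernel`) are hypothetical. It shows that summing the Dirac kernel over all label pairs gives the same value as the dot product of two label histograms.

```python
from collections import Counter


def implicit_kernel(labels_g, labels_h):
    """Implicit computation: sum a Dirac kernel over all pairs of vertex labels.

    No feature vector is built; the cost grows with |V(G)| * |V(H)|.
    """
    return sum(1 for a in labels_g for b in labels_h if a == b)


def explicit_feature_map(labels):
    """Explicit computation, step 1: map a graph to a sparse label histogram."""
    return Counter(labels)


def explicit_kernel(phi_g, phi_h):
    """Explicit computation, step 2: dot product of two sparse feature maps."""
    # Counter returns 0 for labels that do not occur in phi_h.
    return sum(count * phi_h[label] for label, count in phi_g.items())


# Toy vertex label multisets of two graphs (illustrative data only).
labels_g = ["C", "C", "O", "N"]
labels_h = ["C", "O", "O", "H"]

k_implicit = implicit_kernel(labels_g, labels_h)
k_explicit = explicit_kernel(
    explicit_feature_map(labels_g), explicit_feature_map(labels_h)
)
print(k_implicit, k_explicit)
```

In this toy setting the explicit histogram has at most one entry per distinct label, so the dot product tends to be cheap when the label set is small. This matches the direction of the trade-off stated in the abstract, although the walk kernels studied there are considerably more involved.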


Journal ArticleDOI
TL;DR: This work presents a new pivot based heuristic SAHN clustering algorithm exploiting the properties of metric distance measures in order to obtain a best-case runtime of O(n log n) for the input size n and yields high-quality clusterings.
Abstract: Sequential agglomerative hierarchical non-overlapping (SAHN) clustering techniques belong to the classical clustering methods applied heavily in many application domains, e.g., in cheminformatics. Asymptotically optimal SAHN clustering algorithms are known for arbitrary dissimilarity measures, but their quadratic time and space complexity even in the best case still limits the applicability to small data sets. We present a new pivot based heuristic SAHN clustering algorithm exploiting the properties of metric distance measures in order to obtain a best-case runtime of O(n log n) for the input size n. Our approach requires only linear space and supports median and centroid linkage. It is especially suitable for expensive distance measures, as it needs only a linear number of exact distance computations. This aspect is demonstrated in our extensive experimental evaluation, where we apply our algorithm to large graph databases in combination with computationally demanding graph distance metrics. We compare our approach to exact state-of-the-art SAHN algorithms in terms of quality and runtime on real-world and synthetic instances including vector and graph data. The evaluations show a subquadratic runtime in practice and a very low memory footprint. Our approach yields high-quality clusterings and is able to rediscover planted cluster structures in synthetic data sets.

11 citations
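
The abstract above states that the algorithm exploits metric properties and needs only a linear number of exact distance computations, but it does not spell out the mechanism. The following is a minimal sketch of one standard way pivots are used with metrics, assuming Euclidean points as stand-in data. It is not the authors' algorithm, and the names `pivot_lower_bound`, `pivots`, and `table` are hypothetical. By the triangle inequality, |d(x, p) - d(y, p)| is a lower bound on d(x, y) for any pivot p, so distances to a few pivots give cheap bounds for all pairs.

```python
import math
from itertools import combinations


def pivot_lower_bound(pivot_dists_x, pivot_dists_y):
    """Lower bound on d(x, y) from precomputed pivot distances.

    Triangle inequality: |d(x, p) - d(y, p)| <= d(x, y) for every pivot p,
    so the maximum over all pivots is still a valid lower bound.
    """
    return max(abs(a - b) for a, b in zip(pivot_dists_x, pivot_dists_y))


# Toy data: 2D points with Euclidean distance (an assumption for illustration;
# the paper applies its method to expensive graph distance metrics).
points = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0), (1.0, 1.0)]
pivots = [points[0], points[2]]

# n * k exact distance computations for n points and k pivots.
table = [[math.dist(x, p) for p in pivots] for x in points]

for i, j in combinations(range(len(points)), 2):
    bound = pivot_lower_bound(table[i], table[j])
    exact = math.dist(points[i], points[j])  # computed here only to check the bound
    print(f"pair ({i}, {j}): lower bound {bound:.3f} <= exact {exact:.3f}")
```

With a constant number of pivots, the table costs a linear number of exact distance evaluations, which is consistent with the property claimed in the abstract. How the bounds are then used to select merge candidates is specific to the paper and is not reproduced here.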


Book ChapterDOI
13 Feb 2014
TL;DR: This work presents a new pivot based heuristic SAHN clustering algorithm exploiting the properties of metric distance measures in order to obtain a best case running time of \(\mathcal{O}(n\log n)\) for the input size n.
Abstract: Sequential agglomerative hierarchical non-overlapping (SAHN) clustering techniques belong to the classical clustering methods that are applied heavily in many application domains, e.g., in cheminformatics. Asymptotically optimal SAHN clustering algorithms are known for arbitrary dissimilarity measures, but their quadratic time and space complexity even in the best case still limits the applicability to small data sets. We present a new pivot based heuristic SAHN clustering algorithm exploiting the properties of metric distance measures in order to obtain a best case running time of \(\mathcal{O}(n\log n)\) for the input size n. Our approach requires only linear space and supports median and centroid linkage. It is especially suitable for expensive distance measures, as it needs only a linear number of exact distance computations. In extensive experimental evaluations on real-world and synthetic data sets, we compare our approach to exact state-of-the-art SAHN algorithms in terms of quality and running time. The evaluations show a subquadratic running time in practice and a very low memory footprint.

8 citations


Book ChapterDOI
25 Aug 2014
TL;DR: This work discusses key obstacles of tree decompositions arising for common subgraph problems that were ignored by previous algorithms and do not occur in outerplanar graphs, and introduces the concept of potential separators, i.e., separators of a subgraph to be searched that are not necessarily separators of the input graph.
Abstract: The complexity of the maximum common subgraph problem in partial k-trees is still largely unknown. We consider the restricted case, where the input graphs are k-connected partial k-trees and the common subgraph is required to be k-connected. For biconnected outerplanar graphs this problem is solved and the general problem was reported to be tractable by means of tree decomposition techniques. We discuss key obstacles of tree decompositions arising for common subgraph problems that were ignored by previous algorithms and do not occur in outerplanar graphs. We introduce the concept of potential separators, i.e., separators of a subgraph to be searched that are not necessarily separators of the input graph. We characterize these separators and propose a polynomial time solution for series-parallel graphs based on SP-trees.

7 citations


Book ChapterDOI
15 Oct 2014
TL;DR: In this paper, a polynomial time algorithm for the maximum common connected subgraph problem in series-parallel graphs is presented, which utilizes a combination of BC- and SP-tree data structures to decompose both graphs.
Abstract: The complexity of the maximum common connected subgraph problem in partial k-trees is still not fully understood. Polynomial-time solutions are known for degree-bounded outerplanar graphs, a subclass of the partial 2-trees. On the contrary, the problem is known to be NP-hard in vertex-labeled partial 11-trees of bounded degree. We consider series-parallel graphs, i.e., partial 2-trees. We show that the problem remains NP-hard in biconnected series-parallel graphs with all but one vertex of degree bounded by three. A positive complexity result is presented for a related problem of high practical relevance which asks for a maximum common connected subgraph that preserves blocks and bridges of the input graphs. We present a polynomial time algorithm for this problem in series-parallel graphs, which utilizes a combination of BC- and SP-tree data structures to decompose both graphs.

6 citations


Book ChapterDOI
15 Dec 2014
TL;DR: This work presents the first polynomial-delay algorithm for the problem of enumerating all maximum common subtree isomorphisms between a given pair of trees, based on the algorithm of Edmonds for solving the maximum common subtree problem using a dynamic programming approach in combination with bipartite matching problems.
Abstract: The maximum common subgraph problem asks for the maximum size of a common subgraph of two given graphs. The problem is \(\mathsf{NP}\)-hard, but can be solved in polynomial time if both the input graphs and the common subgraph are restricted to trees. Since the optimal solution of the maximum common subtree problem is not unique, the problem of enumerating all solutions, i.e., the isomorphisms between the two subtrees, is of interest. We present the first polynomial-delay algorithm for the problem of enumerating all maximum common subtree isomorphisms between a given pair of trees. Our approach is based on the algorithm of Edmonds for solving the maximum common subtree problem using a dynamic programming approach in combination with bipartite matching problems. As a side result, we obtain a polynomial-delay algorithm for enumerating all maximum weight matchings in a complete bipartite graph. We show how to extend the new approach in order to enumerate all solutions of the maximum weighted common subtree problem and of the maximal common subtree problem. Our experimental evaluation on both randomly generated and real-world instances demonstrates the practical usefulness of our algorithm.

6 citations


Journal ArticleDOI
TL;DR: A new version of Scaffold Hunter is presented, a highly interactive tool that fosters the systematic visual exploration of compound and bioactivity data and features the scaffold tree algorithm to provide hierarchical classification schemes and offers several interconnected views reflecting different aspects of the data.
Abstract: The growing interest in chemogenomics approaches over the last years has led to a vast amount of data regarding chemical and the corresponding biological activity space. The discovery of new chemical entities is not amenable to fully automated analysis, but can greatly benefit from tools that allow exploring this chemical and biological space. We present a new version of Scaffold Hunter [1,2], a highly interactive tool that fosters the systematic visual exploration of compound and bioactivity data. The software supports the integration of data from various sources and provides several complementary analysis and visualization modules (Figure 1: iterative workflow). Scaffold Hunter features the scaffold tree algorithm to provide hierarchical classification schemes and offers several interconnected views reflecting different aspects of the data. As a further extension, state-of-the-art clustering techniques are now included that allow, for example, the creation of subsets based on fingerprint similarity. A key concept of Scaffold Hunter is to support a cyclic, iterative knowledge discovery process, where it is possible to refine subsets, adjust the parameters of analysis algorithms or the mapping of property values to visual attributes. We give an overview of the various views and the workflow concept, and present an exemplary analysis of screening datasets targeting T. brucei and T. cruzi, the causative agents of sleeping sickness and Chagas disease, respectively. Scaffold Hunter is platform-independent and freely available under the terms of the GNU GPL v3 at http://scaffoldhunter.sourceforge.net/.