scispace - formally typeset
Author

Nicolas Klodt

Other affiliations: University of Potsdam
Bio: Nicolas Klodt is an academic researcher from Hasso Plattner Institute. The author has contributed to research in topics: Heuristics & Decidability. The author has co-authored 3 publications. Previous affiliations of Nicolas Klodt include University of Potsdam.

Papers
Proceedings ArticleDOI
14 Aug 2021
TL;DR: In this article, a color-sensitive, practical heuristic called Greedy Expansion was proposed, which empirically outperforms all heuristics proposed for CCC so far, both on real-world and synthetic instances.
Abstract: Chromatic Correlation Clustering (CCC) models clustering of objects with categorical pairwise relationships. The model can be viewed as clustering the vertices of a graph with edge-labels (colors). Bonchi et al. [KDD 2012] introduced it as a natural generalization of the well studied problem Correlation Clustering (CC), motivated by real-world applications from data-mining, social networks and bioinformatics. We give theoretical as well as practical contributions to the study of CCC. Our main theoretical contribution is an alternative analysis of the famous Pivot algorithm for CC. We show that, when simply run color-blind, Pivot is also a linear time 3-approximation for CCC. The previous best theoretical results for CCC were a 4-approximation with a high-degree polynomial runtime and a linear time 11-approximation, both by Anava et al. [WWW 2015]. While this theoretical result justifies Pivot as a baseline comparison for other heuristics, its blunt color-blindness performs poorly in practice. We develop a color-sensitive, practical heuristic we call Greedy Expansion that empirically outperforms all heuristics proposed for CCC so far, both on real-world and synthetic instances. Further, we propose a novel generalization of CCC allowing for multi-labelled edges. We argue that it is more suitable for many of the real-world applications and extend our results to this model.
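The color-blind Pivot baseline discussed in the abstract can be sketched in a few lines. The function below is a hypothetical illustration (the name `pivot_ccc` and the input format are mine, and the assignment of cluster colors is omitted), not the authors' implementation: repeatedly pick a random unclustered vertex as pivot and cluster it with all of its unclustered neighbors, ignoring edge colors entirely.

```python
import random

def pivot_ccc(n, edges, seed=0):
    """Color-blind Pivot sketch for an edge-labeled graph.

    `edges` maps frozenset({u, v}) to a color label; only the key set
    matters here, since Pivot ignores colors. Returns a list of vertex
    sets partitioning {0, ..., n-1}.
    """
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    for e in edges:
        u, v = tuple(e)
        adj[u].add(v)
        adj[v].add(u)
    unclustered = set(range(n))
    clusters = []
    while unclustered:
        # Pick a uniformly random unclustered pivot and absorb all of
        # its still-unclustered neighbors into one cluster.
        pivot = rng.choice(sorted(unclustered))
        cluster = {pivot} | (adj[pivot] & unclustered)
        clusters.append(cluster)
        unclustered -= cluster
    return clusters
```

On a triangle plus an isolated vertex, the triangle always ends up in one cluster (any triangle pivot absorbs the other two) and the isolated vertex is always alone, regardless of the random order.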

4 citations

TL;DR: In this paper, the authors study the survival time of the SIS and SIRS models on stars and cliques, showing that the two processes behave fundamentally differently on stars while behaving fairly similarly on cliques.
Abstract: We study two continuous-time Markov chains modeling the spread of infections on graphs, namely the SIS and the SIRS model. In the SIS model, vertices are either susceptible or infected; each infected vertex becomes susceptible at rate 1 and infects each of its neighbors independently at rate λ. In the SIRS model, vertices are either susceptible, infected, or recovered; each infected vertex becomes recovered at rate 1 and infects each of its susceptible neighbors independently at rate λ; each recovered vertex becomes susceptible at a rate ϱ, which we assume to be independent of the size of the graph. The survival time of the SIS process, i.e., the time until no vertex of the host graph is infected, is fairly well understood for a variety of graph classes. Stars are an important graph class for the SIS model, as the survival time of SIS on stars has been used to show that the process survives on real-world graphs for a long time. For the SIRS model, however, to the best of our knowledge, there are no rigorous results, even for simple graphs such as stars. We analyze the survival time of the SIS and the SIRS process on stars and cliques. We determine three threshold values for λ such that when λ < λ_ℓ, the expected survival time of the process is at most logarithmic, when λ < λ_p, it is at most polynomial, and when λ > λ_s, it is at least super-polynomial in the number of vertices. Our results show that the survival time of the two processes behaves fundamentally differently on stars, while it behaves fairly similarly on cliques. Our analyses bound the drift of potential functions with globally stable equilibrium points. For the SIRS process, our two-state potential functions are inspired by Lyapunov functions used in mean-field theory.
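As an illustration of the SIS dynamics on a star (a sketch of the model in the abstract, not of the authors' analysis), the process can be simulated with a standard Gillespie scheme. The name `sis_star_survival` and its parameters are hypothetical choices for this sketch.

```python
import random

def sis_star_survival(leaves, lam, t_max=100.0, seed=0):
    """Gillespie-style simulation of SIS on a star with `leaves` leaf
    vertices and one center. Infected vertices recover at rate 1 and
    infect susceptible neighbors at rate `lam`; only center-leaf edges
    exist. Starts with only the center infected; returns the time the
    infection dies out, capped at t_max."""
    rng = random.Random(seed)
    center, infected_leaves, t = True, 0, 0.0
    while t < t_max:
        # Total recovery rate: one unit per infected vertex.
        rec = (1 if center else 0) + infected_leaves
        # Infection can only cross a center-leaf edge with exactly one
        # infected endpoint.
        if center:
            inf = lam * (leaves - infected_leaves)
        else:
            inf = lam * infected_leaves
        total = rec + inf
        if total == 0:
            return t  # no infected vertices left: the process died out
        t += rng.expovariate(total)
        if rng.random() < rec / total:
            # A uniformly random infected vertex recovers.
            if center and rng.random() < 1 / rec:
                center = False
            else:
                infected_leaves -= 1
        else:
            # An infection event fires toward a susceptible endpoint.
            if center:
                infected_leaves += 1
            else:
                center = True
    return t_max
```

With `lam = 0` the only possible event is the center's recovery, so the process dies out after a single exponential waiting time.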

2 citations

Posted Content
TL;DR: This paper establishes a hierarchy of learning power depending on whether $C$-indices are required (a) on all outputs; (b) only on outputs relevant for the class to be learned; and (c) only in the limit as final, correct hypotheses.
Abstract: In language learning in the limit, the most common type of hypothesis is to give an enumerator for a language. This so-called $W$-index allows for naming arbitrary computably enumerable languages, with the drawback that even the membership problem is undecidable. In this paper we use a different system which allows for naming arbitrary decidable languages, namely programs for characteristic functions (called $C$-indices). These indices have the drawback that it is now not decidable whether a given hypothesis is even a legal $C$-index. In this first analysis of learning with $C$-indices, we give a structured account of the learning power of various restrictions employing $C$-indices, also when compared with $W$-indices. We establish a hierarchy of learning power depending on whether $C$-indices are required (a) on all outputs; (b) only on outputs relevant for the class to be learned and (c) only in the limit as final, correct hypotheses. Furthermore, all these settings are weaker than learning with $W$-indices (even when restricted to classes of computable languages). We analyze all these questions also in relation to the mode of data presentation. Finally, we also ask about the relation of semantic versus syntactic convergence and derive the map of pairwise relations for these two kinds of convergence coupled with various forms of data presentation.
Posted Content
TL;DR: Several maps (depictions of all pairwise relations) of various groups of learning criteria are provided, including a map for monotonicity restrictions and similar criteria and a map for restrictions on data presentation; the paper also considers, for various learning criteria, whether learners can be assumed consistent.
Abstract: We study learning of indexed families from positive data where a learner can freely choose a hypothesis space (with uniformly decidable membership) comprising at least the languages to be learned. This abstracts a very universal learning task which can be found in many areas, for example learning of (subsets of) regular languages or learning of natural languages. We are interested in various restrictions on learning, such as consistency, conservativeness or set-drivenness, exemplifying various natural learning restrictions. Building on previous results from the literature, we provide several maps (depictions of all pairwise relations) of various groups of learning criteria, including a map for monotonicity restrictions and similar criteria and a map for restrictions on data presentation. Furthermore, we consider, for various learning criteria, whether learners can be assumed consistent.
Journal ArticleDOI
TL;DR: In this article, the authors focus on a simplified setting where a complete temporal host graph is given and the agents, corresponding to its nodes, selfishly create incident edges to ensure that they can reach all other nodes via temporal paths in the created network.
Abstract: Most networks are not static objects, but instead they change over time. This observation has sparked rigorous research on temporal graphs within the last years. In temporal graphs, we have a fixed set of nodes and the connections between them are only available at certain time steps. This gives rise to a plethora of algorithmic problems on such graphs, most prominently the problem of finding temporal spanners, i.e., the computation of subgraphs that guarantee all pairs reachability via temporal paths. To the best of our knowledge, only centralized approaches for the solution of this problem are known. However, many real-world networks are not shaped by a central designer but instead they emerge and evolve by the interaction of many strategic agents. This observation is the driving force of the recent intensive research on game-theoretic network formation models. In this work we bring together these two recent research directions: temporal graphs and game-theoretic network formation. As a first step into this new realm, we focus on a simplified setting where a complete temporal host graph is given and the agents, corresponding to its nodes, selfishly create incident edges to ensure that they can reach all other nodes via temporal paths in the created network. This yields temporal spanners as equilibria of our game. We prove results on the convergence to and the existence of equilibrium networks, on the complexity of finding best agent strategies, and on the quality of the equilibria. By taking these first important steps, we uncover challenging open problems that call for an in-depth exploration of the creation of temporal graphs by strategic agents.
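The notion of reachability via temporal paths used above can be illustrated with a small sketch. `temporal_reach` is a hypothetical helper, assuming strictly increasing time labels along a path (one common convention; other variants allow non-decreasing labels).

```python
def temporal_reach(n, tedges, source):
    """Earliest-arrival reachability on a temporal graph.

    `tedges` is a list of (u, v, t) triples meaning the undirected edge
    {u, v} is available at time step t. A temporal path must traverse
    edges with strictly increasing time labels. Returns the set of
    vertices reachable from `source`.
    """
    arrival = {v: float("inf") for v in range(n)}
    arrival[source] = 0
    # Scanning edges in increasing time order lets each edge be relaxed
    # once: an edge at time t extends any path arriving strictly before t.
    for u, v, t in sorted(tedges, key=lambda e: e[2]):
        if arrival[u] < t and t < arrival[v]:
            arrival[v] = t
        if arrival[v] < t and t < arrival[u]:
            arrival[u] = t
    return {v for v in range(n) if arrival[v] < float("inf")}
```

For example, with edges (0,1) at time 1, (1,2) at time 2, and (2,3) at time 1, vertex 3 is not reachable from 0: the only edge into 3 closes before the path from 0 can arrive at 2.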

Cited by
Journal ArticleDOI
TL;DR: A theorem is provided, applicable to many optimization algorithms, that links the run time of Majority with its symmetric version HasMajority, where a sufficient majority is needed to optimize the subset.
Abstract: Run time analysis of evolutionary algorithms has recently made significant progress in linking algorithm performance to algorithm parameters. However, settings that study the impact of problem parameters are rare. The recently proposed W-model provides a good framework for such analyses, generating pseudo-Boolean optimization problems with tunable properties. We initiate theoretical research of the W-model by studying how one of its properties, neutrality, influences the run time of random local search. Neutrality creates plateaus in the search space by first performing a majority vote for subsets of the solution candidate and then evaluating the smaller-dimensional string via a low-level fitness function. We prove upper bounds for the expected run time of random local search on this Majority problem for its entire parameter spectrum. To this end, we provide a theorem, applicable to many optimization algorithms, that links the run time of Majority with its symmetric version HasMajority, where a sufficient majority is needed to optimize the subset. We also introduce a generalized version of classic drift theorems as well as a generalized version of Wald's equation, both of which we believe to be of independent interest.

3 citations

Journal ArticleDOI
TL;DR: The approximability of a recently introduced framework for clustering edge-colored hypergraphs is studied, and it is proved that the canonical relaxation is always at least as tight as the node-weighted multiway cut relaxation, and can be strictly tighter.
Abstract: We study the approximability of a recently introduced framework for clustering edge-colored hypergraphs, where the goal is to color nodes in a way that minimizes the number of hyperedges containing a node with a different color than the hyperedge. This problem is closely related to chromatic correlation clustering and various generalized multiway cut problems. We first of all provide a min{2 − 2/k, 2 − 2/(r + 1)}-approximation by rounding a natural linear programming relaxation, where r is the maximum hyperedge size and k is the number of colors. This improves on the best previous rounding scheme, which achieves an approximation of min{2 − 1/k, 2 − 1/(r + 1)}. We show our rounding scheme is optimal by proving a matching integrality gap. When r is large, our approximation matches a known 2(1 − 1/k)-approximation based on reducing to node-weighted multiway cut and rounding a different linear program. The exact relationship between the two linear programs was not previously known; we prove that the canonical relaxation is always at least as tight as the node-weighted multiway cut relaxation, and can be strictly tighter. We also show that when r and k are arbitrary, the edge-colored clustering objective is approximation-equivalent to vertex cover. This immediately implies several refined hardness results, as well as fast combinatorial 2-approximation algorithms.
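For concreteness, the edge-colored clustering objective described above (the number of hyperedges containing a node colored differently from the hyperedge) can be evaluated as follows. `ecc_mistakes` and the input format are illustrative choices, not code from the paper.

```python
def ecc_mistakes(coloring, hyperedges):
    """Count mistake hyperedges under a node coloring.

    `coloring` maps node -> color; `hyperedges` is a list of
    (color, vertices) pairs. A hyperedge is a mistake if any of its
    nodes carries a color different from the hyperedge's color.
    """
    return sum(
        1
        for color, verts in hyperedges
        if any(coloring[v] != color for v in verts)
    )
```

Note that a hyperedge counts at most once, no matter how many of its nodes are mismatched, which is exactly what the rounding schemes above bound.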

2 citations

12 Aug 2022
TL;DR: In this article, the authors study the approximability of an existing framework for clustering edge-colored hypergraphs, which is closely related to chromatic correlation clustering and is motivated by machine learning and data mining applications.
Abstract: We study the approximability of an existing framework for clustering edge-colored hypergraphs, which is closely related to chromatic correlation clustering and is motivated by machine learning and data mining applications where the goal is to cluster a set of objects based on multiway interactions of different categories or types. We present improved approximation guarantees based on linear programming, and show they are tight by proving a matching integrality gap. Our results also include new approximation hardness results, a combinatorial 2-approximation whose runtime is linear in the hypergraph size, and several new connections to well-studied objectives such as vertex cover and hypergraph multiway cut.

1 citation

Journal ArticleDOI
TL;DR: In this paper, the Edge-Colored Clustering (ECC) framework is generalized to allow a budgeted number of either overlapping cluster assignments or node deletions, with greedy and bicriteria approximation algorithms for each new model.
Abstract: A recent trend in data mining has explored (hyper)graph clustering algorithms for data with categorical relationship types. Such algorithms have applications in the analysis of social, co-authorship, and protein interaction networks, to name a few. Many such applications naturally have some overlap between clusters, a nuance which is missing from current combinatorial models. Additionally, existing models lack a mechanism for handling noise in datasets. We address these concerns by generalizing Edge-Colored Clustering, a recent framework for categorical clustering of hypergraphs. Our generalizations allow for a budgeted number of either (a) overlapping cluster assignments or (b) node deletions. For each new model we present a greedy algorithm which approximately minimizes an edge mistake objective, as well as bicriteria approximations where the second approximation factor is on the budget. Additionally, we address the parameterized complexity of each problem, providing FPT algorithms and hardness results.
07 Jul 2023
TL;DR: In this article, the authors provide practical run time improvements for correlation clustering solvers when the ground set $V$ is large, reducing the time complexity of Pivot from $O(|V|^2 k)$ to $O(|V| k)$ and its space complexity from $O(|V|^2)$ to $O(|V| k)$.
Abstract: Consensus clustering (or clustering aggregation) inputs $k$ partitions of a given ground set $V$, and seeks to create a single partition that minimizes disagreement with all input partitions. State-of-the-art algorithms for consensus clustering are based on correlation clustering methods like the popular Pivot algorithm. Unfortunately these methods have not proved to be practical for consensus clustering instances where either $k$ or $V$ gets large. In this paper we provide practical run time improvements for correlation clustering solvers when $V$ is large. We reduce the time complexity of Pivot from $O(|V|^2 k)$ to $O(|V| k)$, and its space complexity from $O(|V|^2)$ to $O(|V| k)$ -- a significant savings since in practice $k$ is much less than $|V|$. We also analyze a sampling method for these algorithms when $k$ is large, bridging the gap between running Pivot on the full set of input partitions (an expected 1.57-approximation) and choosing a single input partition at random (an expected 2-approximation). We show experimentally that algorithms like Pivot do obtain quality clustering results in practice even on small samples of input partitions.
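A sketch of Pivot run on the implicit consensus-clustering instance: the "+" edge between two items is decided by a majority vote over the $k$ input partitions, computed on the fly. This mirrors the space saving described above (the $O(|V|^2)$ disagreement graph is never materialized), though this naive version does not achieve the paper's $O(|V| k)$ time bound. All names here are hypothetical.

```python
import random

def consensus_pivot(partitions, seed=0):
    """Pivot on an implicit consensus instance.

    `partitions` is a list of k dicts, each mapping item -> cluster id.
    Two items share a "+" edge iff a strict majority of the input
    partitions place them in the same cluster; that test costs O(k) per
    pair and is evaluated lazily. Returns a dict mapping each item to
    its cluster representative (the pivot that absorbed it).
    """
    rng = random.Random(seed)
    items = list(partitions[0])
    k = len(partitions)
    rng.shuffle(items)  # Pivot processes items in random order
    assignment = {}
    for pivot in items:
        if pivot in assignment:
            continue
        assignment[pivot] = pivot
        for v in items:
            if v in assignment:
                continue
            agree = sum(1 for p in partitions if p[pivot] == p[v])
            if 2 * agree > k:  # majority of partitions agree on the pair
                assignment[v] = pivot
    return assignment
```

With three input partitions where two of the three agree that items a and b belong together while c is always separated from a, the output clusters a with b and leaves c alone under any pivot order.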