Author

Lars Seifert

Other affiliations: University of Potsdam
Bio: Lars Seifert is an academic researcher from Hasso Plattner Institute. The author has contributed to research in topics: Heuristics & Decidability. The author has co-authored 3 publications. Previous affiliations of Lars Seifert include University of Potsdam.

Papers
Proceedings Article
14 Aug 2021
TL;DR: In this article, a color-sensitive, practical heuristic called Greedy Expansion was proposed, which empirically outperforms all heuristics proposed for CCC so far, both on real-world and synthetic instances.
Abstract: Chromatic Correlation Clustering (CCC) models clustering of objects with categorical pairwise relationships. The model can be viewed as clustering the vertices of a graph with edge-labels (colors). Bonchi et al. [KDD 2012] introduced it as a natural generalization of the well studied problem Correlation Clustering (CC), motivated by real-world applications from data-mining, social networks and bioinformatics. We give theoretical as well as practical contributions to the study of CCC. Our main theoretical contribution is an alternative analysis of the famous Pivot algorithm for CC. We show that, when simply run color-blind, Pivot is also a linear time 3-approximation for CCC. The previous best theoretical results for CCC were a 4-approximation with a high-degree polynomial runtime and a linear time 11-approximation, both by Anava et al. [WWW 2015]. While this theoretical result justifies Pivot as a baseline comparison for other heuristics, its blunt color-blindness performs poorly in practice. We develop a color-sensitive, practical heuristic we call Greedy Expansion that empirically outperforms all heuristics proposed for CCC so far, both on real-world and synthetic instances. Further, we propose a novel generalization of CCC allowing for multi-labelled edges. We argue that it is more suitable for many of the real-world applications and extend our results to this model.
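To make the color-blind use of Pivot concrete, here is a minimal Python sketch. It assumes the input is a dictionary mapping each edge (a two-element `frozenset`) to its color; the function name `color_blind_pivot` is hypothetical. Giving each cluster its most frequent internal edge color is an assumption of this sketch and may differ from the rule analyzed in the paper. The sketch also makes no attempt to reach the linear running time stated in the abstract.

```python
import random
from collections import Counter


def color_blind_pivot(vertices, colored_edges, seed=0):
    """Cluster vertices with Pivot while ignoring edge colors.

    colored_edges maps frozenset({u, v}) -> color label.
    Returns a list of (members, color) pairs.
    """
    rng = random.Random(seed)

    # Build the adjacency structure without looking at colors.
    neighbors = {v: set() for v in vertices}
    for edge in colored_edges:
        u, v = tuple(edge)
        neighbors[u].add(v)
        neighbors[v].add(u)

    # Visiting a random permutation and skipping clustered vertices
    # picks each pivot uniformly among the unclustered vertices.
    order = list(vertices)
    rng.shuffle(order)
    unclustered = set(vertices)
    clusters = []

    for pivot in order:
        if pivot not in unclustered:
            continue
        members = {pivot} | (neighbors[pivot] & unclustered)
        unclustered -= members

        # Assumption of this sketch: label the cluster with the most
        # frequent color among its internal edges. This scan is simple
        # but not linear time.
        counts = Counter(
            color for edge, color in colored_edges.items() if edge <= members
        )
        color = counts.most_common(1)[0][0] if counts else None
        clusters.append((members, color))

    return clusters
```

On an instance made of two disjoint monochromatic triangles, every choice of pivots recovers the two triangles, each with its own color.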

4 citations

Proceedings Article
23 Feb 2023
TL;DR: The authors study Jump Schelling Games with agents having a single-peaked utility function. They show that equilibria exist under specific conditions but are not guaranteed to exist even on simple topologies like paths or rings, that computing a beneficial state with high integration is NP-complete, and, as a novel conceptual contribution, that it is NP-hard to decide whether an equilibrium state can be found via improving response dynamics starting from a given initial state.
Abstract: Schelling games model the wide-spread phenomenon of residential segregation in metropolitan areas from a game-theoretic point of view. In these games agents of different types each strategically select a node on a given graph that models the residential area to maximize their individual utility. The latter solely depends on the types of the agents on neighboring nodes and it has been a standard assumption to consider utility functions that are monotone in the number of same-type neighbors. This simplifying assumption has recently been challenged since sociological poll results suggest that real-world agents actually favor diverse neighborhoods. We contribute to the recent endeavor of investigating residential segregation models with realistic agent behavior by studying Jump Schelling Games with agents having a single-peaked utility function. In such games, there are empty nodes in the graph and agents can strategically jump to such nodes to improve their utility. We investigate the existence of equilibria and show that they exist under specific conditions. Contrasting this, we prove that even on simple topologies like paths or rings such stable states are not guaranteed to exist. Regarding the game dynamics, we show that improving response cycles exist independently of the position of the peak in the utility function. Moreover, we show high almost tight bounds on the Price of Anarchy and the Price of Stability with respect to the recently proposed degree of integration, which counts the number of agents with a diverse neighborhood and which serves as a proxy for measuring the segregation strength. Last but not least, we show that computing a beneficial state with high integration is NP-complete and, as a novel conceptual contribution, we also show that it is NP-hard to decide if an equilibrium state can be found via improving response dynamics starting from a given initial state.

1 citation

Posted Content
TL;DR: This paper establishes a hierarchy of learning power depending on whether $C$-indices are required (a) on all outputs; (b) only on outputs relevant for the class to be learned and (c) only in the limit as final, correct hypotheses.
Abstract: In language learning in the limit, the most common type of hypothesis is to give an enumerator for a language. This so-called $W$-index allows for naming arbitrary computably enumerable languages, with the drawback that even the membership problem is undecidable. In this paper we use a different system which allows for naming arbitrary decidable languages, namely programs for characteristic functions (called $C$-indices). These indices have the drawback that it is now not decidable whether a given hypothesis is even a legal $C$-index. In this first analysis of learning with $C$-indices, we give a structured account of the learning power of various restrictions employing $C$-indices, also when compared with $W$-indices. We establish a hierarchy of learning power depending on whether $C$-indices are required (a) on all outputs; (b) only on outputs relevant for the class to be learned and (c) only in the limit as final, correct hypotheses. Furthermore, all these settings are weaker than learning with $W$-indices (even when restricted to classes of computable languages). We analyze all these questions also in relation to the mode of data presentation. Finally, we also ask about the relation of semantic versus syntactic convergence and derive the map of pairwise relations for these two kinds of convergence coupled with various forms of data presentation.
Posted Content
TL;DR: Several maps (depictions of all pairwise relations) of various groups of learning criteria are provided, including a map for monotonicity restrictions and similar criteria and a map for restrictions on data presentation. The paper also considers, for various learning criteria, whether learners can be assumed consistent.
Abstract: We study learning of indexed families from positive data where a learner can freely choose a hypothesis space (with uniformly decidable membership) comprising at least the languages to be learned. This abstracts a very universal learning task which can be found in many areas, for example learning of (subsets of) regular languages or learning of natural languages. We are interested in various restrictions on learning, such as consistency, conservativeness or set-drivenness, exemplifying various natural learning restrictions. Building on previous results from the literature, we provide several maps (depictions of all pairwise relations) of various groups of learning criteria, including a map for monotonicity restrictions and similar criteria and a map for restrictions on data presentation. Furthermore, we consider, for various learning criteria, whether learners can be assumed consistent.

Cited by
Journal Article
TL;DR: The approximability of a recently introduced framework for clustering edge-colored hypergraphs is studied, and it is proved that the canonical relaxation is always at least as tight as the node-weighted multiway cut relaxation, and can be strictly tighter.
Abstract: We study the approximability of a recently introduced framework for clustering edge-colored hypergraphs, where the goal is to color nodes in a way that minimizes the number of hyperedges containing a node with a different color than the hyperedge. This problem is closely related to chromatic correlation clustering and various generalized multiway cut problems. We first provide a $\min\{2 - 2/k,\ 2 - 2/(r+1)\}$-approximation by rounding a natural linear programming relaxation, where $r$ is the maximum hyperedge size and $k$ is the number of colors. This improves on the best previous rounding scheme, which achieves an approximation of $\min\{2 - 1/k,\ 2 - 1/(r+1)\}$. We show our rounding scheme is optimal by proving a matching integrality gap. When $r$ is large, our approximation matches a known $2(1 - 1/k)$-approximation based on reducing to node-weighted multiway cut and rounding a different linear program. The exact relationship between the two linear programs was not previously known; we prove that the canonical relaxation is always at least as tight as the node-weighted multiway cut relaxation, and can be strictly tighter. We also show that when $r$ and $k$ are arbitrary, the edge-colored clustering objective is approximation equivalent to vertex cover. This immediately implies several refined hardness results, as well as fast combinatorial 2-approximation algorithms.
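The three approximation factors named in this abstract are easy to compare numerically. The following Python sketch only evaluates the closed-form expressions from the abstract with exact fractions; the function names are hypothetical and nothing here implements the rounding schemes themselves.

```python
from fractions import Fraction


def new_lp_factor(k, r):
    """min{2 - 2/k, 2 - 2/(r+1)}: the improved LP rounding guarantee."""
    return min(2 - Fraction(2, k), 2 - Fraction(2, r + 1))


def previous_lp_factor(k, r):
    """min{2 - 1/k, 2 - 1/(r+1)}: the earlier rounding guarantee."""
    return min(2 - Fraction(1, k), 2 - Fraction(1, r + 1))


def multiway_cut_factor(k):
    """2(1 - 1/k): the guarantee via node-weighted multiway cut."""
    return 2 * (1 - Fraction(1, k))


if __name__ == "__main__":
    for k, r in [(3, 2), (4, 10)]:
        print(k, r, new_lp_factor(k, r), previous_lp_factor(k, r),
              multiway_cut_factor(k))
```

For $k = 3$ colors and maximum hyperedge size $r = 2$, the new factor is $4/3$ against $5/3$ for the previous scheme. Once $r + 1 \ge k$, the minimum is attained by $2 - 2/k$, which equals $2(1 - 1/k)$, in line with the abstract's remark about large $r$.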

2 citations

12 Aug 2022
TL;DR: In this article, the authors study the approximability of an existing framework for clustering edge-colored hypergraphs, which is closely related to chromatic correlation clustering and is motivated by machine learning and data mining applications.
Abstract: We study the approximability of an existing framework for clustering edge-colored hypergraphs, which is closely related to chromatic correlation clustering and is motivated by machine learning and data mining applications where the goal is to cluster a set of objects based on multiway interactions of different categories or types. We present improved approximation guarantees based on linear programming, and show they are tight by proving a matching integrality gap. Our results also include new approximation hardness results, a combinatorial 2-approximation whose runtime is linear in the hypergraph size, and several new connections to well-studied objectives such as vertex cover and hypergraph multiway cut.

1 citation

Journal Article
TL;DR: This paper generalizes Edge-Colored Clustering (ECC), a recent framework for categorical clustering of hypergraphs, to allow for a budgeted number of either overlapping cluster assignments or node deletions. For each new model it presents a greedy algorithm, bicriteria approximations, FPT algorithms, and hardness results.
Abstract: A recent trend in data mining has explored (hyper)graph clustering algorithms for data with categorical relationship types. Such algorithms have applications in the analysis of social, co-authorship, and protein interaction networks, to name a few. Many such applications naturally have some overlap between clusters, a nuance which is missing from current combinatorial models. Additionally, existing models lack a mechanism for handling noise in datasets. We address these concerns by generalizing Edge-Colored Clustering, a recent framework for categorical clustering of hypergraphs. Our generalizations allow for a budgeted number of either (a) overlapping cluster assignments or (b) node deletions. For each new model we present a greedy algorithm which approximately minimizes an edge mistake objective, as well as bicriteria approximations where the second approximation factor is on the budget. Additionally, we address the parameterized complexity of each problem, providing FPT algorithms and hardness results.
07 Jul 2023
TL;DR: The authors provide practical run time improvements for correlation clustering solvers on consensus clustering instances where the ground set $V$ is large, reducing the time complexity of Pivot from $O(|V|^2 k)$ to $O(|V| k)$ and its space complexity from $O(|V|^2)$ to $O(|V| k)$. They also analyze a sampling method for the case where the number of input partitions $k$ is large.
Abstract: Consensus clustering (or clustering aggregation) inputs $k$ partitions of a given ground set $V$, and seeks to create a single partition that minimizes disagreement with all input partitions. State-of-the-art algorithms for consensus clustering are based on correlation clustering methods like the popular Pivot algorithm. Unfortunately these methods have not proved to be practical for consensus clustering instances where either $k$ or $V$ gets large. In this paper we provide practical run time improvements for correlation clustering solvers when $V$ is large. We reduce the time complexity of Pivot from $O(|V|^2 k)$ to $O(|V| k)$, and its space complexity from $O(|V|^2)$ to $O(|V| k)$ -- a significant savings since in practice $k$ is much less than $|V|$. We also analyze a sampling method for these algorithms when $k$ is large, bridging the gap between running Pivot on the full set of input partitions (an expected 1.57-approximation) and choosing a single input partition at random (an expected 2-approximation). We show experimentally that algorithms like Pivot do obtain quality clustering results in practice even on small samples of input partitions.
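The space saving described here can be illustrated with a small Python sketch that runs a Pivot-style procedure directly on the $k$ input labelings, so that only $O(|V| k)$ labels are stored and no $|V| \times |V|$ matrix is built. This is an illustration of the general idea under stated assumptions, not the paper's implementation: the function names are hypothetical, the rule of joining the pivot when a strict majority of partitions agree is an assumption, and the sketch does not attempt to match the paper's $O(|V| k)$ running time.

```python
import random


def consensus_pivot(labelings, seed=0):
    """Pivot-style consensus over k partitions stored as label lists.

    labelings[i][v] is the cluster id of element v in input partition i.
    Returns a list giving one consensus cluster id per element.
    """
    k = len(labelings)
    n = len(labelings[0])
    rng = random.Random(seed)
    order = list(range(n))
    rng.shuffle(order)

    assigned = [None] * n
    next_id = 0
    for pivot in order:
        if assigned[pivot] is not None:
            continue
        assigned[pivot] = next_id
        for v in range(n):
            if assigned[v] is None:
                # Count the partitions that place v with the pivot,
                # computed on the fly from the stored labels.
                together = sum(1 for lab in labelings if lab[v] == lab[pivot])
                if 2 * together > k:  # strict majority (sketch assumption)
                    assigned[v] = next_id
        next_id += 1
    return assigned


def disagreements(candidate, labelings):
    """Total number of pairs on which candidate differs from the inputs."""
    n = len(candidate)
    total = 0
    for lab in labelings:
        for u in range(n):
            for v in range(u + 1, n):
                if (candidate[u] == candidate[v]) != (lab[u] == lab[v]):
                    total += 1
    return total
```

The helper `disagreements` evaluates the objective from the abstract by brute force and is meant only for checking small examples.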
Journal Article
TL;DR: This paper proposes Resource Selection Games with heterogeneous agents, in which a tolerance threshold $\tau \in [0,1]$ specifies the agents' desired minimum fraction of same-type agents on a resource, and studies the existence and quality of equilibria and the complexity of maximizing social welfare.
Abstract: The strategic selection of resources by selfish agents is a classic research direction, with Resource Selection Games and Congestion Games as prominent examples. In these games, agents select available resources and their utility then depends on the number of agents using the same resources. This implies that there is no distinction between the agents, i.e., they are anonymous. We depart from this very general setting by proposing Resource Selection Games with heterogeneous agents that strive for joint resource usage with similar agents. So, instead of the number of other users of a given resource, our model considers agents with different types and the decisive feature is the fraction of same-type agents among the users. More precisely, similarly to Schelling Games, there is a tolerance threshold $\tau \in [0,1]$ which specifies the agents' desired minimum fraction of same-type agents on a resource. Agents strive to select resources where at least a $\tau$-fraction of those resources' users have the same type as themselves. For $\tau=1$, our model generalizes Hedonic Diversity Games with a peak at $1$. For our general model, we consider the existence and quality of equilibria and the complexity of maximizing social welfare. Additionally, we consider a bounded rationality model, where agents can only estimate the utility of a resource, since they only know the fraction of same-type agents on a given resource, but not the exact numbers. Thus, they cannot know the impact a strategy change would have on a target resource. Interestingly, we show that this type of bounded rationality yields favorable game-theoretic properties and specific equilibria closely approximate equilibria of the full knowledge setting.
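The threshold condition in this abstract can be written down in a few lines of Python. This is a sketch under stated assumptions: the function names are hypothetical, the agent is assumed to count among the users of its own resource, and the sketch only checks the $\tau$-fraction condition rather than modelling the paper's utility function.

```python
from fractions import Fraction


def same_type_fraction(agent_type, user_types):
    """Fraction of a resource's users that share agent_type.

    user_types lists the types of all users of the resource, including
    the agent itself (an assumption of this sketch).
    """
    users = list(user_types)
    if not users:
        raise ValueError("the resource has no users")
    same = sum(1 for t in users if t == agent_type)
    return Fraction(same, len(users))


def meets_threshold(agent_type, user_types, tau):
    """True if at least a tau-fraction of the users have the agent's type."""
    return same_type_fraction(agent_type, user_types) >= tau
```

With users of types `["A", "A", "B"]` and $\tau = 1/2$, the condition holds for a type-A agent (fraction $2/3$) and fails for the type-B agent (fraction $1/3$). With $\tau = 1$ it holds only on resources whose users all share one type.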