Home
/
Authors
/
Frederik Mallmann-Trenn

Author

Frederik Mallmann-Trenn

Other affiliations: Simon Fraser University, University of Paderborn, University of Cambridge ...read more

Bio: Frederik Mallmann-Trenn is an academic researcher from King's College London. The author has contributed to research in topics: Computer science & Load balancing (computing). The author has an hindex of 13, co-authored 62 publications receiving 589 citations. Previous affiliations of Frederik Mallmann-Trenn include Simon Fraser University & University of Paderborn.

Papers published on a yearly basis

2023
2022
2021
2020
2019
2018
2017
2016
2015
2014
2013
2012

Papers

PDF

Open Access

More filters

Journal Article•DOI•

Hierarchical Clustering: Objective Functions and Algorithms

[...]

Vincent Cohen-Addad¹, Varun Kanade², Frederik Mallmann-Trenn, Claire Mathieu³•Institutions (3)

University of Paris¹, University of Oxford², Paris Diderot University³

05 Jun 2019

TL;DR: Equipped with a suitable objective function, the performance of practical algorithms, as well as better and faster algorithms for hierarchical clustering are analyzed, and a beyond worst-case analysis of the complexity of the problem is initiated to design algorithms for this scenario.

...read moreread less

145 citations

Book Chapter•DOI•

Hierarchical clustering: objective functions and algorithms

[...]

Vincent Cohen-Addad¹, Varun Kanada², Frederik Mallmann-Trenn, Claire Mathieu³•Institutions (3)

University of Copenhagen¹, University of Oxford², PSL Research University³

07 Jan 2018

TL;DR: A set of admissible objective functions (that includes the one introduced by Dasgupta) that have the property that when the input admits a 'natural' ground-truth hierarchical clustering, the ground- Truth clustering has an optimal value are characterized.

...read moreread less

Abstract: Hierarchical clustering is a recursive partitioning of a dataset into clusters at an increasingly finer granularity. Motivated by the fact that most work on hierarchical clustering was based on providing algorithms, rather than optimizing a specific objective, [19] framed similarity-based hierarchical clustering as a combinatorial optimization problem, where a 'good' hierarchical clustering is one that minimizes some cost function. He showed that this cost function has certain desirable properties, such as in order to achieve optimal cost, disconnected components must be separated first and that in 'structureless' graphs, i.e., cliques, all clusterings achieve the same cost. We take an axiomatic approach to defining 'good' objective functions for both similarity and dissimilarity-based hierarchical clustering. We characterize a set of admissible objective functions (that includes the one introduced by Dasgupta) that have the property that when the input admits a 'natural' ground-truth hierarchical clustering, the ground-truth clustering has an optimal value. Equipped with a suitable objective function, we analyze the performance of practical algorithms, as well as develop better and faster algorithms for hierarchical clustering. For similarity-based hierarchical clustering, [19] showed that a simple recursive sparsest-cut based approach achieves an O(log3/2 n)-approximation on worst-case inputs. We give a more refined analysis of the algorithm and show that it in fact achieves an [Equation]-approximation1. This improves upon the LP-based O(log n)-approximation of [33]. For dissimilarity-based hierarchical clustering, we show that the classic average-linkage algorithm gives a factor 2 approximation, and provide a simple and better algorithm that gives a factor 3/2 approximation. This aims at explaining the success of these heuristics in practice. Finally, we consider a 'beyond-worst-case' scenario through a generalisation of the stochastic block model for hierarchical clustering. We show that Dasgupta's cost function also has desirable properties for these inputs and we provide a simple algorithm that for graphs generated according to this model yields a 1 + o(1) factor approximation.

...read moreread less

78 citations

Posted Content•

Hierarchical Clustering: Objective Functions and Algorithms

[...]

Vincent Cohen-Addad¹, Varun Kanade², Frederik Mallmann-Trenn, Claire Mathieu³•Institutions (3)

University of Paris¹, University of Oxford², Paris Diderot University³

07 Apr 2017-arXiv: Data Structures and Algorithms

TL;DR: In this article, the authors take an axiomatic approach to defining ''good'' objective functions for both similarity and dissimilarity-based hierarchical clustering, and characterize a set of "admissible" objective functions (that includes Dasgupta's one) that have the property that when the input admits a ''natural'' hierarchical cluster, it has an optimal value.

...read moreread less

Abstract: Hierarchical clustering is a recursive partitioning of a dataset into clusters at an increasingly finer granularity. Motivated by the fact that most work on hierarchical clustering was based on providing algorithms, rather than optimizing a specific objective, Dasgupta framed similarity-based hierarchical clustering as a combinatorial optimization problem, where a `good' hierarchical clustering is one that minimizes some cost function. He showed that this cost function has certain desirable properties. We take an axiomatic approach to defining `good' objective functions for both similarity and dissimilarity-based hierarchical clustering. We characterize a set of "admissible" objective functions (that includes Dasgupta's one) that have the property that when the input admits a `natural' hierarchical clustering, it has an optimal value. Equipped with a suitable objective function, we analyze the performance of practical algorithms, as well as develop better algorithms. For similarity-based hierarchical clustering, Dasgupta showed that the divisive sparsest-cut approach achieves an $O(\log^{3/2} n)$-approximation. We give a refined analysis of the algorithm and show that it in fact achieves an $O(\sqrt{\log n})$-approx. (Charikar and Chatziafratis independently proved that it is a $O(\sqrt{\log n})$-approx.). This improves upon the LP-based $O(\log n)$-approx. of Roy and Pokutta. For dissimilarity-based hierarchical clustering, we show that the classic average-linkage algorithm gives a factor 2 approx., and provide a simple and better algorithm that gives a factor 3/2 approx.. Finally, we consider `beyond-worst-case' scenario through a generalisation of the stochastic block model for hierarchical clustering. We show that Dasgupta's cost function has desirable properties for these inputs and we provide a simple 1 + o(1)-approximation in this setting.

...read moreread less

62 citations

Proceedings Article•

Hierarchical Clustering Beyond the Worst-Case

[...]

Vincent Cohen-Addad¹, Varun Kanade², Frederik Mallmann-Trenn³•Institutions (3)

École Normale Supérieure¹, University of California, Berkeley², Massachusetts Institute of Technology³

01 Jan 2017

TL;DR: A fairly general random graph model for hierarchical clustering is considered, called the hierarchical stochastic blockmodel (HSBM), and it is shown that in certain regimes the SVD approach of McSherry combined with specific linkage methods results in a clustering that give an O(1)-approximation to Dasgupta’s cost function.

...read moreread less

Abstract: Hiererachical clustering, that is computing a recursive partitioning of a dataset to obtain clusters at increasingly finer granularity is a fundamental problem in data analysis. Although hierarchical clustering has mostly been studied through procedures such as linkage algorithms, or top-down heuristics, rather than as optimization problems, recently Dasgupta [1] proposed an objective function for hierarchical clustering and initiated a line of work developing algorithms that explicitly optimize an objective (see also [2, 3, 4]). In this paper, we consider a fairly general random graph model for hierarchical clustering, called the hierarchical stochastic blockmodel (HSBM), and show that in certain regimes the SVD approach of McSherry [5] combined with specific linkage methods results in a clustering that give an O(1)-approximation to Dasgupta’s cost function. We also show that an approach based on SDP relaxations for balanced cuts based on the work of Makarychev et al. [6], combined with the recursive sparsest cut algorithm of Dasgupta, yields an O(1) approximation in slightly larger regimes and also in the semi-random setting, where an adversary may remove edges from the random graph generated according to an HSBM. Finally, we report empirical evaluation on synthetic and real-world data showing that our proposed SVD-based method does indeed achieve a better cost than other widely-used heurstics and also results in a better classification accuracy when the underlying problem was that of multi-class classification.

...read moreread less

48 citations

Proceedings Article•

On coalescence time in graphs: When is coalescing as fast as meeting?: Extended Abstract.

[...]

Varun Kanade¹, Frederik Mallmann-Trenn², Thomas Sauerwald³•Institutions (3)

University of Oxford¹, Massachusetts Institute of Technology², University of Cambridge³

01 Jan 2019

43 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15

Collapse

Cited by

PDF

Open Access

More filters

Journal Article•

Benefits of depth in neural networks

[...]

Matus Telgarsky¹•Institutions (1)

University of Michigan¹

06 Jun 2016-Journal of Machine Learning Research

TL;DR: This result is proved here for a class of nodes termed "semi-algebraic gates" which includes the common choices of ReLU, maximum, indicator, and piecewise polynomial functions, therefore establishing benefits of depth against not just standard networks with ReLU gates, but also convolutional networks with reLU and maximization gates, sum-product networks, and boosted decision trees.

...read moreread less

Abstract: For any positive integer $k$, there exist neural networks with $\Theta(k^3)$ layers, $\Theta(1)$ nodes per layer, and $\Theta(1)$ distinct parameters which can not be approximated by networks with $\mathcal{O}(k)$ layers unless they are exponentially large --- they must possess $\Omega(2^k)$ nodes. This result is proved here for a class of nodes termed "semi-algebraic gates" which includes the common choices of ReLU, maximum, indicator, and piecewise polynomial functions, therefore establishing benefits of depth against not just standard networks with ReLU gates, but also convolutional networks with ReLU and maximization gates, sum-product networks, and boosted decision trees (in this last case with a stronger separation: $\Omega(2^{k^3})$ total tree nodes are required).

...read moreread less

288 citations

Journal Article•DOI•

Hierarchical Clustering: Objective Functions and Algorithms

[...]

Vincent Cohen-Addad¹, Varun Kanade², Frederik Mallmann-Trenn, Claire Mathieu³•Institutions (3)

University of Paris¹, University of Oxford², Paris Diderot University³

05 Jun 2019

...read moreread less

145 citations

Proceedings Article•DOI•

Time-space trade-offs in population protocols

[...]

Dan Alistarh¹, James Aspnes², David Eisenstat³, Rati Gelashvili⁴, Ronald L. Rivest⁴ - Show less +1 more•Institutions (4)

ETH Zurich¹, Yale University², Google³, Massachusetts Institute of Technology⁴

16 Jan 2017

TL;DR: In this paper, it was shown that the complexity of leader election and majority election can be reduced to O(log log n) and O(n/poly logn) expected time, respectively, by using a super-constant number of states per node.

...read moreread less

Abstract: Population protocols are a popular model of distributed computing, in which randomly-interacting agents with little computational power cooperate to jointly perform computational tasks. Inspired by developments in molecular computation, and in particular DNA computing, recent algorithmic work has focused on the complexity of solving simple yet fundamental tasks in the population model, such as leader election (which requires convergence to a single agent in a special "leader" state), and majority (in which agents must converge to a decision as to which of two possible initial states had higher initial count). Known results point towards an inherent trade-off between the time complexity of such algorithms, and the space complexity, i.e. size of the memory available to each agent.In this paper, we explore this trade-off and provide new upper and lower bounds for majority and leader election. First, we prove a unified lower bound, which relates the space available per node with the time complexity achievable by a protocol: for instance, our result implies that any protocol solving either of these tasks for n agents using O(log log n) states must take Ω(n/polylogn) expected time. This is the first result to characterize time complexity for protocols which employ super-constant number of states per node, and proves that fast, poly-logarithmic running times require protocols to have relatively large space costs.On the positive side, we give algorithms showing that fast, poly-logarithmic convergence time can be achieved using O(log2n) space per node, in the case of both tasks. Overall, our results highlight a time complexity separation between O (log log n) and Θ(log2n) state space size for both majority and leader election in population protocols, and introduce new techniques, which should be applicable more broadly.

...read moreread less

105 citations

Proceedings Article•

Approximation Bounds for Hierarchical Clustering: Average Linkage, Bisecting K-means, and Local Search

[...]

Benjamin Moseley¹, Joshua R. Wang²•Institutions (2)

Washington University in St. Louis¹, Stanford University²

01 Jan 2017

TL;DR: This paper establishes that using bisecting k-means divisive clustering has a very poor lower bound on its approximation ratio for the same objective and shows that there are divisive algorithms that perform well with respect to this objective by giving two constant approximation algorithms.

...read moreread less

Abstract: Hierarchical clustering is a data analysis method that has been used for decades. Despite its widespread use, the method has an underdeveloped analytical foundation. Having a well understood foundation would both support the currently used methods and help guide future improvements. The goal of this paper is to give an analytic framework to better understand observations seen in practice. This paper considers the dual of a problem framework for hierarchical clustering introduced by Dasgupta. The main result is that one of the most popular algorithms used in practice, average linkage agglomerative clustering, has a small constant approximation ratio for this objective. Furthermore, this paper establishes that using bisecting k-means divisive clustering has a very poor lower bound on its approximation ratio for the same objective. However, we show that there are divisive algorithms that perform well with respect to this objective by giving two constant approximation algorithms. This paper is some of the first work to establish guarantees on widely used hierarchical algorithms for a natural objective function. This objective and analysis give insight into what these popular algorithms are optimizing and when they will perform well.

...read moreread less

102 citations

Journal Article•

Are stable instances easy

[...]

Yonatan Bilu, Nathan Linial¹•Institutions (1)

Hebrew University of Jerusalem¹

01 Jan 2009-Electronic Colloquium on Computational Complexity

TL;DR: The notion of a stable instance for a discrete optimization problem is introduced, and it is argued that in many practical situations only sufficiently stable instances are of interest, and that this is indeed the case for the Max-Cut problem.

...read moreread less

Abstract: We introduce the notion of a stable instance for a discrete optimization problem, and argue that in many practical situations only sufficiently stable instances are of interest. The question then arises whether stable instances of NP-hard problems are easier to solve, and in particular, whether there exist algorithms that solve in polynomial time all sufficiently stable instances of some NP-hard problem. The paper focuses on the Max-Cut problem, for which we show that this is indeed the case.

...read moreread less

93 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112

Collapse