$k$-means clustering of extremes

doi:10.1214/20-EJS1689

Open Access Journal Article

Anja Janßen, +1 more

- 01 Jan 2020 -

Electronic Journal of Statistics

- Vol. 14, Iss: 1, pp 1211-1233

TLDR

This paper explores how the spherical k-means algorithm can be applied in the analysis of only the extremal observations from a data set and shows how it can be adopted to find "prototypes" of extremal dependence by making use of multivariate extreme value analysis.

Abstract:

The k-means clustering algorithm and its variant, the spherical k-means clustering, are among the most important and popular methods in unsupervised learning and pattern detection. In this paper, we explore how the spherical k-means algorithm can be applied in the analysis of only the extremal observations from a data set. By making use of multivariate extreme value analysis we show how it can be adopted to find “prototypes” of extremal dependence and derive a consistency result for our suggested estimator. In the special case of max-linear models we show furthermore that our procedure provides an alternative way of statistical inference for this class of models. Finally, we provide data examples which show that our method is able to find relevant patterns in extremal observations and allows us to classify extremal events.
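
The idea in the abstract can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes that "extremal observations" means the rows with the largest Euclidean norm, that the fraction kept (`frac`) is chosen by the user, and that a plain Lloyd-style spherical k-means with cosine similarity is an acceptable stand-in for the procedure studied in the paper. The function names `extremal_angles` and `spherical_kmeans` are hypothetical.

```python
import numpy as np


def extremal_angles(x, frac=0.1):
    """Keep the fraction `frac` of rows with the largest Euclidean norm
    (an assumed notion of 'extremal') and project them onto the unit sphere."""
    norms = np.linalg.norm(x, axis=1)
    k = max(1, int(np.ceil(frac * len(x))))
    idx = np.argsort(norms)[-k:]  # indices of the k largest norms
    return x[idx] / norms[idx, None]


def spherical_kmeans(theta, n_clusters, n_iter=100, seed=0):
    """Lloyd-style spherical k-means on unit vectors `theta`.

    Points are assigned to the center with the largest cosine similarity;
    each center is the normalized sum of its members."""
    rng = np.random.default_rng(seed)
    start = rng.choice(len(theta), size=n_clusters, replace=False)
    centers = theta[start].copy()
    labels = np.full(len(theta), -1)
    for _ in range(n_iter):
        new_labels = np.argmax(theta @ centers.T, axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments are stable
        labels = new_labels
        for j in range(n_clusters):
            members = theta[labels == j]
            if len(members) == 0:
                continue  # keep the old center for an empty cluster
            s = members.sum(axis=0)
            norm = np.linalg.norm(s)
            if norm > 0:
                centers[j] = s / norm
    return centers, labels
```

Under these assumptions, the rows of `centers` play the role of the "prototypes" of extremal dependence mentioned above, and `labels` classifies each extremal observation by its nearest prototype. The choice of norm, threshold, and number of clusters would all need to be justified for a real data set.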

Citations

Posted Content

Sparse Structures for Multivariate Extremes

Sebastian Engelke, +1 more

- 25 Apr 2020 -

arXiv: Methodology

TL;DR: The different forms of extremal dependence that can arise between the largest observations of a multivariate random vector are described and identification of groups of variables which can be concomitantly extreme is addressed.

Journal Article

Principal component analysis for multivariate extremes

Holger Drees, +1 more

- 01 Jan 2021 -

Electronic Journal of Statistics

TL;DR: In this article, Principal Component Analysis (PCA) is applied to a re-scaled version of radially thresholded observations to analyze the squared reconstruction error for the exceedances over large radial thresholds, and it is shown that the empirical risk converges to the true risk uniformly over all projection subspaces.

Journal Article

Estimating an extreme Bayesian network via scalings

Claudia Klüppelberg, +1 more

- 01 Jan 2021 -

Journal of Multivariate Analysis

TL;DR: A scaling technique is proposed in order to determine a causal order of the node variables and all dependence parameters are estimated from the estimated scalings and dependence parameters based on asymptotic normality of the empirical spectral measure.

Journal Article

Cluster Analysis in Practice: Dealing with Outliers in Managerial Research

Humberto Elias Garcia Lopes, +1 more

TL;DR: This tutorial paper contributes to this discussion by presenting four clustering techniques and their respective advantages and disadvantages in the treatment of outliers, and concludes that researchers need a more diversified repertoire of clustering techniques.

Journal Article

Sparse regular variation

Nicolas Meyer, +1 more

- 01 Dec 2021 -

Advances in Applied Probability

TL;DR: In this paper, the authors introduce the notion of sparse regular variation, which makes it possible to better learn the dependence structure of extreme events by using the Euclidean projection onto the simplex, for which efficient algorithms are known.

References

Some methods for classification and analysis of multivariate observations

James B. MacQueen

TL;DR: This paper describes the k-means process, a generalization of the ordinary sample mean, which partitions an N-dimensional population into k sets on the basis of a sample; the resulting partitions are shown to be reasonably efficient in the sense of within-class variance.

Journal Article

The Elements of Statistical Learning: Data Mining, Inference, and Prediction

David Ruppert

- 01 Jun 2004 -

Journal of the American Statistical Asso...

TL;DR: The Elements of Statistical Learning: Data Mining, Inference, and Prediction is a popular book on data mining and machine learning, with a focus on inference and prediction.

Book

Extreme Value Theory: An Introduction

Laurens de Haan, +1 more

TL;DR: This book presents an introduction to extreme value theory at the graduate level, requiring only some mathematical maturity. It focuses on the probabilistic and statistical aspects of extreme values, without major emphasis on such related topics as regular variation, point processes, empirical distribution functions, and Brownian motion.

Journal Article

Concept Decompositions for Large Sparse Text Data Using Clustering

Inderjit S. Dhillon, +1 more

- 01 Jan 2001 -

Machine Learning

TL;DR: The concept vectors produced by the spherical k-means algorithm are localized in the word space, are sparse, and tend towards orthonormality, so they constitute a powerful sparse and localized “basis” for text data sets.

Book

Data Clustering: Theory, Algorithms, and Applications

Guojun Gan, +2 more

TL;DR: Clustering, Data and Similarity Measures: 1. Data clustering; 2. Data types; 3. Scale conversion; 4. Data standardization and transformation; 5. Data visualization; 6. Similarity and dissimilarity measures; 7. Clustering algorithms.

Related Papers (5)

Extreme Values, Regular Variation, and Point Processes

Dimension reduction in multivariate extreme value analysis

Sparse representation of multivariate extremes with applications to anomaly detection

Decompositions of dependence for high-dimensional extremes

Extreme Value Theory: An Introduction