Open Access · Journal Article

$k$-means clustering of extremes

TLDR
This paper explores how the spherical k-means algorithm can be applied in the analysis of only the extremal observations from a data set, and shows how it can be adopted to find "prototypes" of extremal dependence by making use of multivariate extreme value analysis.
Abstract
The k-means clustering algorithm and its variant, the spherical k-means clustering, are among the most important and popular methods in unsupervised learning and pattern detection. In this paper, we explore how the spherical k-means algorithm can be applied in the analysis of only the extremal observations from a data set. By making use of multivariate extreme value analysis we show how it can be adopted to find “prototypes” of extremal dependence and derive a consistency result for our suggested estimator. In the special case of max-linear models we show furthermore that our procedure provides an alternative way of statistical inference for this class of models. Finally, we provide data examples which show that our method is able to find relevant patterns in extremal observations and allows us to classify extremal events.
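The idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `spherical_kmeans_extremes`, the choice of the Euclidean norm, the 0.95 quantile threshold, and the random initialisation are all assumptions made for illustration. The sketch keeps only the observations with the largest norms, projects them onto the unit sphere, and clusters the resulting directions using cosine similarity, so that each centroid plays the role of a "prototype" direction of extremal dependence. In practice the margins would usually be standardised to a common heavy-tailed scale first; that step is omitted here.

```python
import numpy as np

def spherical_kmeans_extremes(X, k, quantile=0.95, n_iter=50, seed=0):
    """Cluster the directions of the largest observations in X (shape n x d).

    Hypothetical helper for illustration; the threshold rule and the
    initialisation are assumptions, not taken from the paper.
    """
    rng = np.random.default_rng(seed)

    # Keep only "extremal" observations: those whose norm exceeds a high quantile.
    norms = np.linalg.norm(X, axis=1)
    keep = norms > np.quantile(norms, quantile)

    # Project the retained observations onto the unit sphere (their directions).
    angles = X[keep] / norms[keep, None]

    # Initialise centroids with k distinct extremal directions.
    centroids = angles[rng.choice(len(angles), size=k, replace=False)]

    for _ in range(n_iter):
        # Assign each direction to the centroid with the largest cosine similarity.
        labels = np.argmax(angles @ centroids.T, axis=1)

        # Update each centroid as the normalised sum of its members.
        new_centroids = centroids.copy()
        for j in range(k):
            total = angles[labels == j].sum(axis=0)
            length = np.linalg.norm(total)
            if length > 0:  # leave empty clusters unchanged
                new_centroids[j] = total / length

        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids

    labels = np.argmax(angles @ centroids.T, axis=1)
    return centroids, labels, keep


# Usage on simulated heavy-tailed data (purely synthetic, for illustration).
rng = np.random.default_rng(1)
X = rng.pareto(2.0, size=(2000, 3)) + 1.0
centroids, labels, keep = spherical_kmeans_extremes(X, k=2)
print(centroids)        # one unit-norm prototype direction per row
print(np.bincount(labels, minlength=2))  # number of extremal points per cluster
```

Each row of `centroids` is a unit vector; coordinates close to zero suggest that the corresponding variable is not typically large when that cluster's extremes occur. The `labels` array can then be used to classify the retained extremal events by prototype.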

