$k$-means clustering of extremes
Anja Janßen, Phyllis Wan +1 more
TL;DR: This paper explores how the spherical k-means algorithm can be applied in the analysis of only the extremal observations from a data set and shows how it can be adopted to find "prototypes" of extremal dependence by making use of multivariate extreme value analysis.
Abstract:
The k-means clustering algorithm and its variant, the spherical k-means clustering, are among the most important and popular methods in unsupervised learning and pattern detection. In this paper, we explore how the spherical k-means algorithm can be applied in the analysis of only the extremal observations from a data set. By making use of multivariate extreme value analysis we show how it can be adopted to find “prototypes” of extremal dependence and derive a consistency result for our suggested estimator. In the special case of max-linear models we show furthermore that our procedure provides an alternative way of statistical inference for this class of models. Finally, we provide data examples which show that our method is able to find relevant patterns in extremal observations and allows us to classify extremal events.
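To make the idea in the abstract concrete, the following is a minimal Python sketch of spherical k-means applied only to the observations with the largest norms. It is an illustration under stated assumptions, not the authors' implementation: the Euclidean norm, the 0.9 quantile threshold, the random initialization, and the function names `extremal_angles` and `spherical_kmeans` are all choices made here for the example, and any marginal standardization of the data is omitted.

```python
import numpy as np


def extremal_angles(X, quantile=0.9):
    """Keep rows whose Euclidean norm exceeds the empirical quantile
    and project them onto the unit sphere (assumed threshold rule)."""
    norms = np.linalg.norm(X, axis=1)
    threshold = np.quantile(norms, quantile)
    mask = norms > threshold
    return X[mask] / norms[mask, None]


def spherical_kmeans(angles, k, n_iter=100, seed=0):
    """Cluster unit vectors by cosine similarity.

    Returns (centers, labels); each center is a unit vector and can be
    read as a candidate "prototype" direction of the extremes.
    """
    rng = np.random.default_rng(seed)
    # Initialize centers with k distinct observed directions.
    centers = angles[rng.choice(len(angles), size=k, replace=False)].copy()
    labels = np.full(len(angles), -1)
    for _ in range(n_iter):
        # Assignment step: pick the center with the largest inner product.
        new_labels = np.argmax(angles @ centers.T, axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments are stable
        labels = new_labels
        # Update step: renormalized sum of the members of each cluster.
        for j in range(k):
            total = angles[labels == j].sum(axis=0)
            norm = np.linalg.norm(total)
            if norm > 0:  # leave an empty cluster's center unchanged
                centers[j] = total / norm
    return centers, labels
```

A possible usage, with simulated heavy-tailed data standing in for a real data set: `angles = extremal_angles(X)` followed by `centers, labels = spherical_kmeans(angles, k=2)`. The rows of `centers` are the estimated prototype directions, and `labels` classifies each retained extremal observation.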
Citations
Posted Content
Sparse Structures for Multivariate Extremes
TL;DR: The different forms of extremal dependence that can arise between the largest observations of a multivariate random vector are described and identification of groups of variables which can be concomitantly extreme is addressed.
Journal Article
Principal component analysis for multivariate extremes
Holger Drees, Anne Sabourin +1 more
TL;DR: In this article, Principal Component Analysis (PCA) is applied to a re-scaled version of radially thresholded observations to analyze the squared reconstruction error for the exceedances over large radial thresholds, and it is shown that the empirical risk converges to the true risk uniformly over all projection subspaces.
Journal Article
Estimating an extreme Bayesian network via scalings
Claudia Klüppelberg, Mario Krali +1 more
TL;DR: A scaling technique is proposed to determine a causal order of the node variables, and all dependence parameters are estimated from the estimated scalings; the asymptotic analysis of the estimated scalings and dependence parameters is based on asymptotic normality of the empirical spectral measure.
Journal Article
Cluster Analysis in Practice: Dealing with Outliers in Managerial Research
TL;DR: This tutorial paper contributes to this discussion by presenting four clustering techniques and their respective advantages and disadvantages in the treatment of outliers, and concludes that researchers need a more diversified repertoire of clustering techniques.
Journal Article
Sparse regular variation
TL;DR: In this paper, the authors introduce the notion of sparse regular variation, which allows one to better learn the dependence structure of extreme events by using the Euclidean projection onto the simplex, for which efficient algorithms are known.
References
Some methods for classification and analysis of multivariate observations
TL;DR: The k-means algorithm described in this paper partitions an N-dimensional population into k sets on the basis of a sample. The k-means concept is a generalization of the ordinary sample mean, and the procedure is shown to give partitions which are reasonably efficient in the sense of within-class variance.
Journal Article
The Elements of Statistical Learning: Data Mining, Inference, and Prediction
TL;DR: The Elements of Statistical Learning is a popular book on data mining and machine learning, covering data mining, inference, and prediction.
Book
Extreme value theory : an introduction
Laurens de Haan, Ana Ferreira +1 more
TL;DR: This book presents an introduction to extreme value theory at the graduate level, requiring only some mathematical maturity. It focuses on the probabilistic and statistical aspects of extreme values, without major emphasis on related topics such as regular variation, point processes, empirical distribution functions, and Brownian motion.
Journal Article
Concept Decompositions for Large Sparse Text Data Using Clustering
TL;DR: The concept vectors produced by the spherical k-means algorithm constitute a powerful sparse and localized “basis” for text data sets: they are localized in the word space, are sparse, and tend towards orthonormality.
Book
Data Clustering: Theory, Algorithms, and Applications
TL;DR: Clustering, Data and Similarity Measures: 1. Data clustering; 2. Data types; 3. Scale conversion; 4. Data standardization and transformation; 5. Data visualization; 6. Similarity and dissimilarity measures; 7. Clustering algorithms.