Topic

Cluster analysis

About: Cluster analysis is a research topic. Over its lifetime, 146,546 publications have been published within this topic, receiving 2,962,017 citations. The topic is also known as: clustering & cluster analysis in marketing.


Papers
Proceedings ArticleDOI
10 Apr 2011
TL;DR: A simple and efficient spectral clustering algorithm is applied to perform network-aware clustering of end hosts in the same prefixes into different behavior clusters that exhibit distinct traffic characteristics, providing improved interpretations of the separated traffic compared with the aggregated traffic of the prefixes.
Abstract: This paper explores the behavior similarity of Internet end hosts in the same network prefixes. We use bipartite graphs to model network traffic, and then construct one-mode projection graphs for capturing social-behavior similarity of end hosts. By applying a simple and efficient spectral clustering algorithm, we perform network-aware clustering of end hosts in the same prefixes into different behavior clusters. Based on information-theoretical measures, we find that the clusters exhibit distinct traffic characteristics, providing improved interpretations of the separated traffic compared with the aggregated traffic of the prefixes. Finally, we demonstrate the applications of exploring behavior similarity in profiling network behaviors and detecting anomalous behaviors through synthetic traffic that combines Internet backbone traffic and packet traces from real scenarios of worm propagation and denial-of-service attacks.

57 citations
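
The pipeline in the abstract above (bipartite traffic graph, one-mode projection, spectral clustering) can be illustrated with a minimal sketch. The abstract does not say which spectral algorithm is used, so this example assumes a plain two-way split on the sign of the Fiedler vector of the unnormalized graph Laplacian. The function names and the toy host-by-destination matrix are hypothetical, and NumPy is assumed to be available.

```python
import numpy as np

def one_mode_projection(B):
    """Project a hosts x destinations 0/1 matrix onto the host side.

    Entry (i, j) of the result counts the destinations that hosts i and j
    both contacted; the diagonal is zeroed so there are no self-loops.
    """
    W = (B @ B.T).astype(float)
    np.fill_diagonal(W, 0.0)
    return W

def spectral_bipartition(W):
    """Split a connected weighted graph in two using the Fiedler vector.

    Assumption: unnormalized Laplacian L = D - W, with clusters given by
    the sign of the eigenvector of the second-smallest eigenvalue.
    """
    laplacian = np.diag(W.sum(axis=1)) - W
    _, eigenvectors = np.linalg.eigh(laplacian)  # eigenvalues ascending
    fiedler = eigenvectors[:, 1]
    return (fiedler > 0).astype(int)

# Toy traffic: hosts 0-2 talk to destinations 0-1, hosts 3-5 talk to
# destinations 2-3, and host 2 also touches destination 2 (a weak bridge).
B = np.array([
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
])
labels = spectral_bipartition(one_mode_projection(B))
print(labels)  # which group gets label 0 or 1 depends on the eigenvector sign
```

On this toy input the two groups of hosts end up in different clusters. For more than two clusters, a common extension is to run k-means on several leading eigenvectors instead of thresholding one.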

Journal ArticleDOI
TL;DR: This study discusses each of the algorithms in detail, offers a thorough comparative analysis, and compares their performance on a medical diagnosis classification problem, namely the Aachen Aphasia Test.

57 citations

Posted Content
TL;DR: This paper provides precise information theoretic bounds on the clustering accuracy and sample complexity of learning a mixture of two isotropic Gaussians in high dimensions under small mean separation.
Abstract: While several papers have investigated computationally and statistically efficient methods for learning Gaussian mixtures, precise minimax bounds for their statistical performance as well as fundamental limits in high-dimensional settings are not well understood. In this paper, we provide precise information-theoretic bounds on the clustering accuracy and sample complexity of learning a mixture of two isotropic Gaussians in high dimensions under small mean separation. If there is a sparse subset of relevant dimensions that determine the mean separation, then the sample complexity only depends on the number of relevant dimensions and mean separation, and can be achieved by a simple computationally efficient procedure. Our results provide the first step of a theoretical basis for recent methods that combine feature selection and clustering.

57 citations
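
To make the idea of combining feature selection with clustering concrete, here is a small sketch under stated assumptions. It is not the procedure analyzed in the paper: it assumes that the relevant coordinates can be found by ranking sample variances (a coordinate carrying the mean separation has variance 1 + mu^2 instead of 1), and that the two clusters can then be split by the sign of the projection onto the leading principal direction. The helper names, the synthetic data, and the fairly large separation mu = 2 are all illustrative choices, not the small-separation regime the abstract studies.

```python
import numpy as np

def make_sparse_mixture(n, d, relevant, mu, seed=0):
    """Draw n points from two isotropic Gaussians in d dimensions.

    The means are +m and -m, where m equals mu on the first `relevant`
    coordinates and 0 elsewhere (hypothetical setup for illustration).
    """
    rng = np.random.default_rng(seed)
    z = rng.integers(0, 2, size=n)          # hidden cluster labels
    m = np.zeros(d)
    m[:relevant] = mu
    X = rng.standard_normal((n, d)) + (2 * z - 1)[:, None] * m
    return X, z

def sparse_two_cluster(X, k):
    """Keep the k highest-variance coordinates, then split on the sign
    of the projection onto their leading principal direction."""
    selected = np.argsort(X.var(axis=0))[-k:]
    Z = X[:, selected] - X[:, selected].mean(axis=0)
    _, _, vt = np.linalg.svd(Z, full_matrices=False)
    scores = Z @ vt[0]
    return (scores > 0).astype(int), np.sort(selected)

X, z = make_sparse_mixture(n=400, d=200, relevant=5, mu=2.0)
labels, selected = sparse_two_cluster(X, k=5)
agreement = (labels == z).mean()
# Cluster labels are only defined up to a swap of 0 and 1.
print(selected, max(agreement, 1 - agreement))
```

The point of the sketch is that clustering runs on 5 selected coordinates rather than all 200, which mirrors the abstract's claim that sample complexity depends on the number of relevant dimensions rather than the ambient dimension.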

Journal ArticleDOI
TL;DR: This paper aims to explore dimensionality reduction on a real telecom dataset and evaluate customers’ clustering in reduced and latent space, compared to original space in order to achieve better quality clustering results.
Abstract: Telecom companies log customers’ actions, which generates a huge amount of data that can yield important findings about customers’ behavior and needs. The main characteristics of such data are the large number of features and the high sparsity, which pose challenges for the analytics steps. This paper aims to explore dimensionality reduction on a real telecom dataset and evaluate customers’ clustering in reduced and latent space, compared to original space, in order to achieve better quality clustering results. The original dataset contains 220 features belonging to 100,000 customers. Dimensionality reduction is an important data preprocessing step in the data mining process, especially in the presence of the curse of dimensionality. In particular, the aim of data reduction techniques is to filter out irrelevant features and noisy data samples. To reduce the high-dimensional data, we projected it down to a subspace using the well-known Principal Component Analysis (PCA) decomposition and a novel approach based on an Autoencoder Neural Network. K-Means clustering is then applied to both the original and the reduced data sets. Different internal measures were computed to evaluate the clustering for different numbers of dimensions, and we then evaluated how the reduction method impacts the clustering task.

57 citations
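
The comparison described in the abstract above (K-Means on the original features versus on a PCA-reduced space) can be sketched as follows. This is an illustrative NumPy-only version, not the authors' pipeline: the autoencoder branch and the internal quality measures are omitted, the data is synthetic rather than telecom data, and the deterministic farthest-point initialization of K-Means is an assumption made to keep the example reproducible.

```python
import numpy as np

def pca_reduce(X, n_components):
    """Project centered data onto its top principal components (via SVD)."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:n_components].T

def kmeans(X, k, iters=50):
    """Lloyd's algorithm with farthest-point initialization (assumed here)."""
    centers = [X[0]]
    for _ in range(k - 1):
        # Squared distance of every point to its nearest chosen center.
        nearest = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(nearest)])
    centers = np.array(centers, dtype=float)
    for _ in range(iters):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        updated = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(updated, centers):
            break
        centers = updated
    return labels

# Synthetic stand-in: three well-separated groups in 20 dimensions.
rng = np.random.default_rng(0)
truth = np.repeat(np.arange(3), 60)
group_means = np.zeros((3, 20))
group_means[np.arange(3), np.arange(3)] = 10.0
X = group_means[truth] + rng.standard_normal((180, 20))

labels_original = kmeans(X, k=3)
labels_reduced = kmeans(pca_reduce(X, n_components=2), k=3)
```

On data this cleanly separated, both runs recover the same three groups, so the reduced space loses nothing. On real, sparse, high-dimensional data the two can differ, which is what internal measures such as those mentioned in the abstract are meant to quantify.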

Proceedings ArticleDOI
17 Jun 2007
TL;DR: This work builds a generative model for activities (in video) using a cascade of dynamical systems and shows that this model is able to capture and represent a diverse class of activities.
Abstract: Clustering video sequences in order to infer and extract activities from a single video stream is an extremely important problem and has significant potential in video indexing, surveillance, activity discovery and event recognition. Clustering a video sequence into activities requires one to simultaneously recognize activity boundaries (activity consistent subsequences) and cluster these activity subsequences. In order to do this, we build a generative model for activities (in video) using a cascade of dynamical systems and show that this model is able to capture and represent a diverse class of activities. We then derive algorithms to learn the model parameters from a video stream and also show how a single video sequence may be clustered into different clusters where each cluster represents an activity. We also propose a novel technique to build affine, view, rate invariance of the activity into the distance metric for clustering. Experiments show that the clusters found by the algorithm correspond to semantically meaningful activities.

57 citations


Network Information
Related Topics (5)
Artificial neural network: 207K papers, 4.5M citations, 90% related
Deep learning: 79.8K papers, 2.1M citations, 88% related
Feature extraction: 111.8K papers, 2.1M citations, 87% related
Software: 130.5K papers, 2M citations, 86% related
Convolutional neural network: 74.7K papers, 2M citations, 86% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2024    16
2023    7,685
2022    17,389
2021    9,145
2020    10,460
2019    11,543