Book Chapter
Pragmatic Evaluation of the Impact of Dimensionality Reduction in the Performance of Clustering Algorithms
Shini Renjith, A. Sreekumar, M. Jathavedan
pp. 499-512
TL;DR: In this article, the impact of applying dimensionality reduction during the data transformation phase of the clustering process has been investigated for three of the most common clustering algorithms: k-means clustering, clustering large applications (CLARA), and agglomerative hierarchical clustering (AGNES).
Abstract:
With the huge volume of data available as input, modern-day statistical analysis leverages clustering techniques to limit the volume of data to be processed. These input data are mainly sourced from social media channels and typically have high dimensions due to the diverse features they represent. This is commonly referred to as the curse of dimensionality, as it makes the clustering process highly computationally intensive and less efficient. Dimensionality reduction techniques have been proposed to address this issue. This paper covers an empirical analysis of the impact of applying dimensionality reduction during the data transformation phase of the clustering process. We measured the impact in terms of clustering quality and clustering performance for three of the most common clustering algorithms: k-means clustering, clustering large applications (CLARA), and agglomerative hierarchical clustering (AGNES). Clustering quality is compared using four internal evaluation criteria, namely the Silhouette index, Dunn index, Calinski-Harabasz index, and Davies-Bouldin index, and average execution time is used as the measure of clustering performance.
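The internal criteria named in the abstract can be computed directly from a labelled point set. The sketch below is a minimal pure-Python illustration, not the authors' code: it assumes Euclidean distance and at least two clusters, and the function names are hypothetical. The Dunn index shown is one common variant (single-linkage separation over complete-linkage diameter); other definitions exist, and the chapter's exact variants are not stated here.

```python
import math
from itertools import combinations

# Hypothetical helper names for illustration; not taken from the chapter.


def _groups(X, labels):
    """Map each cluster label to the list of its member points."""
    groups = {}
    for x, lab in zip(X, labels):
        groups.setdefault(lab, []).append(x)
    return groups


def _centroid(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def dunn_index(X, labels):
    """Minimum inter-cluster distance over maximum cluster diameter (higher is better)."""
    groups = list(_groups(X, labels).values())
    # single-linkage separation: closest pair of points from two different clusters
    min_sep = min(
        math.dist(a, b)
        for g1, g2 in combinations(groups, 2)
        for a in g1
        for b in g2
    )
    # complete-linkage diameter: farthest pair of points inside one cluster
    max_diam = max(
        (math.dist(a, b) for g in groups for a, b in combinations(g, 2)),
        default=0.0,
    )
    return min_sep / max_diam if max_diam > 0 else math.inf


def calinski_harabasz_index(X, labels):
    """Between- over within-cluster dispersion, scaled by degrees of freedom (higher is better)."""
    groups = _groups(X, labels)
    n, k = len(X), len(groups)
    overall = _centroid(X)
    between, within = 0.0, 0.0
    for members in groups.values():
        c = _centroid(members)
        between += len(members) * math.dist(c, overall) ** 2
        within += sum(math.dist(x, c) ** 2 for x in members)
    return (between / (k - 1)) / (within / (n - k)) if within > 0 else math.inf


def davies_bouldin_index(X, labels):
    """Mean over clusters of the worst scatter-to-separation ratio (lower is better)."""
    groups = list(_groups(X, labels).values())
    cents = [_centroid(g) for g in groups]
    # average distance of each cluster's points to its own centroid
    scatter = [sum(math.dist(x, c) for x in g) / len(g) for g, c in zip(groups, cents)]
    k = len(groups)
    worst = [
        max((scatter[i] + scatter[j]) / math.dist(cents[i], cents[j]) for j in range(k) if j != i)
        for i in range(k)
    ]
    return sum(worst) / k
```

For two tight, well-separated pairs such as `[(0, 0), (0, 1), (10, 0), (10, 1)]` labelled `[0, 0, 1, 1]`, the Dunn index is 10, Calinski-Harabasz is 200, and Davies-Bouldin is 0.1. Scoring the same algorithm's labels with and without a dimensionality reduction step, and timing each run (for example with `time.perf_counter`), is one way to reproduce the kind of quality-versus-performance comparison the abstract describes.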
Citations
Journal Article
Identifying and understanding road-constrained areas of interest (AOIs) through spatiotemporal taxi GPS data: A case study in New York City
TL;DR: In this article, a space-time analytical framework is proposed to identify and describe 31 road-constrained AOIs in terms of their spatiotemporal distribution and contextual characteristics.
Journal Article
Adaptive dimension reduction for clustering high dimensional data
TL;DR: Clustering experiments on highly overlapping Gaussians, DNA gene expression profiles, and Internet newsgroups demonstrate the effectiveness of the proposed algorithm, which applies repeated dimension reductions so that K-means or EM is performed only in very low dimensions.
Journal Article
SemRec—An efficient ensemble recommender with sentiment based clustering for social media text corpus
TL;DR: This work proposes an ensemble multi-stage recommender system with sentiment-based clustering for social media text corpora, in which each stage performs a distinct function: information retrieval, natural language processing, user segmentation, prediction, or recommendation generation.
Book Chapter
A Comparative Analysis of Clustering Quality Based on Internal Validation Indices for Dimensionally Reduced Social Media Data
TL;DR: Covers an experimental analysis using four popular dimensionality reduction techniques (two linear and two nonlinear approaches) to verify the impact of dimensionality reduction on cluster quality using internal clustering validation indices.
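The entry above does not name its four techniques. Principal component analysis (PCA) is used below only as a familiar example of a linear reduction applied before clustering. This is a minimal pure-Python sketch under that assumption: it finds the top-k eigenvectors of the sample covariance matrix by power iteration with deflation. The function names are hypothetical, and each component's sign is arbitrary.

```python
import math
import random

# Hypothetical helper names for illustration; PCA is an assumed example technique.


def _center(X):
    """Subtract each feature's mean so the data are zero-centred."""
    n = len(X)
    means = [sum(col) / n for col in zip(*X)]
    return [[v - m for v, m in zip(row, means)] for row in X]


def _covariance(Xc):
    """Sample covariance matrix (d x d) of zero-centred data."""
    n, d = len(Xc), len(Xc[0])
    return [
        [sum(row[i] * row[j] for row in Xc) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def _matvec(M, v):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def _normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else v


def pca_project(X, k, iters=500, seed=0):
    """Project rows of X onto their top-k principal components."""
    rng = random.Random(seed)
    Xc = _center(X)
    C = _covariance(Xc)
    d = len(C)
    components = []
    for _ in range(k):
        v = _normalize([rng.gauss(0.0, 1.0) for _ in range(d)])
        for _ in range(iters):
            w = _matvec(C, v)
            if not any(w):
                break  # no variance left in the deflated matrix; keep current direction
            v = _normalize(w)
        lam = sum(a * b for a, b in zip(v, _matvec(C, v)))  # Rayleigh quotient = eigenvalue
        components.append(v)
        # deflation: remove the found direction so the next pass finds the next one
        C = [[C[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return [[sum(a * b for a, b in zip(row, comp)) for comp in components] for row in Xc]
```

The reduced rows returned by `pca_project` can be passed to any clustering routine and then scored with internal validation indices, so that the same index is compared with and without the reduction step.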
Journal Article
Recovery of Forest Vegetation in a Burnt Area in the Republic of Korea: A Perspective Based on Sentinel-2 Data
TL;DR: In this paper, the degree of vegetation regeneration is estimated using the Normalized Difference Vegetation Index (NDVI), Enhanced Vegetation Index (EVI), Soil-Adjusted Vegetation Index (SAVI), and Normalized Burn Ratio (NBR).
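The four indices listed above are simple band ratios of surface reflectance. The sketch below uses their standard textbook forms; the default coefficients (SAVI with L = 0.5, EVI with the widely used MODIS constants G = 2.5, C1 = 6, C2 = 7.5, L = 1) are assumptions, since the cited paper's exact settings are not given here. On Sentinel-2, NIR, red, blue, and SWIR are commonly taken from bands B8, B4, B2, and B12, which is also an assumption about that study.

```python
# Standard vegetation and burn indices on reflectance values in [0, 1].
# Coefficient defaults are common conventions, assumed rather than taken from the cited paper.


def ndvi(nir, red):
    """Normalized Difference Vegetation Index: (NIR - Red) / (NIR + Red)."""
    return (nir - red) / (nir + red)


def nbr(nir, swir):
    """Normalized Burn Ratio: (NIR - SWIR) / (NIR + SWIR)."""
    return (nir - swir) / (nir + swir)


def savi(nir, red, L=0.5):
    """Soil-Adjusted Vegetation Index with soil brightness correction factor L."""
    return (1 + L) * (nir - red) / (nir + red + L)


def evi(nir, red, blue, G=2.5, C1=6.0, C2=7.5, L=1.0):
    """Enhanced Vegetation Index with gain G, aerosol terms C1/C2, and canopy term L."""
    return G * (nir - red) / (nir + C1 * red - C2 * blue + L)
```

Healthy vegetation reflects strongly in NIR and weakly in red, so NDVI rises as a burnt area regrows, while a low NBR indicates burn damage.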
References
Journal Article
Visualizing Data using t-SNE
TL;DR: Presents a new technique called t-SNE that visualizes high-dimensional data by giving each datapoint a location in a two- or three-dimensional map; it is a variation of Stochastic Neighbor Embedding that is much easier to optimize and produces significantly better visualizations by reducing the tendency to crowd points together in the center of the map.
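The crowding reduction mentioned above comes from using a heavy-tailed Student-t kernel (one degree of freedom) for map similarities while keeping Gaussian similarities in the input space. The standard formulation is shown below. The notation is ours: x_i are the n input points, y_i their map locations, and σ_i a per-point bandwidth chosen to match a user-set perplexity.

```latex
p_{j|i} = \frac{\exp\left(-\lVert x_i - x_j \rVert^2 / 2\sigma_i^2\right)}
               {\sum_{k \neq i} \exp\left(-\lVert x_i - x_k \rVert^2 / 2\sigma_i^2\right)},
\qquad
p_{ij} = \frac{p_{j|i} + p_{i|j}}{2n}

q_{ij} = \frac{\left(1 + \lVert y_i - y_j \rVert^2\right)^{-1}}
              {\sum_{k \neq l} \left(1 + \lVert y_k - y_l \rVert^2\right)^{-1}},
\qquad
C = \mathrm{KL}(P \,\Vert\, Q) = \sum_{i \neq j} p_{ij} \log \frac{p_{ij}}{q_{ij}}
```

The map points y_i are found by gradient descent on C. Because the t kernel decays polynomially rather than exponentially, moderately distant input pairs can be placed farther apart in the map, which counteracts crowding.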
Journal Article
Nonlinear dimensionality reduction by locally linear embedding.
Sam T. Roweis, Lawrence K. Saul
TL;DR: Introduces locally linear embedding (LLE), an unsupervised learning algorithm that computes low-dimensional, neighborhood-preserving embeddings of high-dimensional inputs and learns the global structure of nonlinear manifolds.
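LLE works in two least-squares stages. First, each input point x_i is reconstructed from its nearest neighbors with weights W_ij. Then the low-dimensional outputs y_i are chosen so that the same weights reconstruct them as well. Stated in its usual form, with N points (notation assumed here):

```latex
\varepsilon(W) = \sum_i \Bigl\lVert x_i - \sum_j W_{ij}\, x_j \Bigr\rVert^2,
\quad \text{s.t. } \sum_j W_{ij} = 1,\;
W_{ij} = 0 \text{ if } x_j \text{ is not a neighbor of } x_i

\Phi(Y) = \sum_i \Bigl\lVert y_i - \sum_j W_{ij}\, y_j \Bigr\rVert^2,
\quad \text{s.t. } \sum_i y_i = 0,\;
\frac{1}{N} \sum_i y_i y_i^{\top} = I
```

Minimizing Φ with W held fixed reduces to a sparse eigenvalue problem: the bottom eigenvectors of (I - W)^T (I - W), after discarding the constant one, give the embedding coordinates.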
Journal Article
Silhouettes: a graphical aid to the interpretation and validation of cluster analysis
TL;DR: A new graphical display is proposed for partitioning techniques, where each cluster is represented by a so-called silhouette, which is based on the comparison of its tightness and separation, and provides an evaluation of clustering validity.
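The silhouette of point i compares its average distance to its own cluster, a(i), with its average distance to the nearest other cluster, b(i): s(i) = (b(i) - a(i)) / max(a(i), b(i)), so values near 1 indicate a well-placed point and negative values a likely misassignment. Below is a minimal pure-Python sketch assuming Euclidean distance; the function names are hypothetical. Singleton clusters get s(i) = 0, following the usual convention.

```python
import math

# Hypothetical helper names for illustration.


def silhouette_values(X, labels):
    """Per-point silhouette widths s(i) = (b - a) / max(a, b)."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    values = []
    for i, x in enumerate(X):
        own = clusters[labels[i]]
        if len(own) == 1:
            values.append(0.0)  # convention for singleton clusters
            continue
        # a(i): mean distance to the other members of i's own cluster
        a = sum(math.dist(x, X[j]) for j in own if j != i) / (len(own) - 1)
        # b(i): smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(x, X[j]) for j in members) / len(members)
            for lab, members in clusters.items()
            if lab != labels[i]
        )
        values.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return values


def silhouette_index(X, labels):
    """Average silhouette width over all points (the overall Silhouette index)."""
    vals = silhouette_values(X, labels)
    return sum(vals) / len(vals)
```

For `[(0, 0), (0, 1), (10, 0), (10, 1)]`, grouping the two nearby pairs gives an average width of about 0.9, while pairing each point with a distant one gives a negative value.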
Journal Article
Least squares quantization in PCM
TL;DR: In this article, the author derives necessary conditions that the quanta and associated quantization intervals of an optimum finite quantization scheme must satisfy to achieve minimum average quantization noise power, for any finite number of quanta.
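This paper is commonly cited as the origin of the alternating procedure behind k-means, one of the algorithms the chapter evaluates. Each point is assigned to its nearest centroid, each centroid is moved to the mean of its assigned points, and the two steps repeat until nothing changes. The sketch below is a minimal pure-Python illustration under simple assumptions: random initial centroids drawn from the data and Euclidean distance. The function name is hypothetical.

```python
import math
import random

# Hypothetical function name for illustration.


def lloyd_kmeans(X, k, iters=100, seed=0):
    """Alternate nearest-centroid assignment and centroid updates until stable."""
    rng = random.Random(seed)
    centroids = [tuple(p) for p in rng.sample(X, k)]  # initial centroids drawn from the data
    labels = [0] * len(X)
    for _ in range(iters):
        # assignment step: each point goes to its nearest centroid
        new_labels = [
            min(range(k), key=lambda c: math.dist(x, centroids[c])) for x in X
        ]
        # update step: each centroid moves to the mean of its assigned points
        new_centroids = []
        for c in range(k):
            members = [x for x, lab in zip(X, new_labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(v) / len(members) for v in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # leave an empty cluster's centroid in place
        if new_labels == labels and new_centroids == centroids:
            break  # converged: neither step changed anything
        labels, centroids = new_labels, new_centroids
    return labels, centroids
```

Each step can only lower the within-cluster sum of squared distances, which is why the iteration terminates. The result depends on the initial centroids, so in practice the algorithm is usually run several times with different seeds.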