Book ChapterDOI

Pragmatic Evaluation of the Impact of Dimensionality Reduction in the Performance of Clustering Algorithms

TLDR
In this article, the impact of applying dimensionality reduction during the data transformation phase of the clustering process is investigated for three of the most common clustering algorithms: k-means clustering, clustering large applications (CLARA), and agglomerative hierarchical clustering (AGNES).
Abstract
With the huge volume of data available as input, modern-day statistical analysis leverages clustering techniques to limit the volume of data to be processed. These input data are mainly sourced from social media channels and typically have high dimensionality due to the diverse features they represent. This is normally referred to as the curse of dimensionality, as it makes the clustering process highly computationally intensive and less efficient. Dimensionality reduction techniques have been proposed as a solution to this issue. This paper presents an empirical analysis of the impact of applying dimensionality reduction during the data transformation phase of the clustering process. We measured the impact in terms of clustering quality and clustering performance for three of the most common clustering algorithms: k-means clustering, clustering large applications (CLARA), and agglomerative hierarchical clustering (AGNES). Clustering quality is compared using four internal evaluation criteria, namely the Silhouette index, Dunn index, Calinski-Harabasz index, and Davies-Bouldin index, and average execution time is measured as an indicator of clustering performance.
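The pipeline the abstract describes (reduce dimensionality during data transformation, cluster, then score with internal indices) can be sketched as follows. This is a minimal illustration, not the paper's actual setup: PCA stands in for the unnamed reduction technique, synthetic blob data replaces the social media corpus, CLARA is omitted (scikit-learn has no implementation), and the Dunn index is skipped for the same reason.

```python
# Hedged sketch of the evaluation pipeline: reduce dimensionality,
# cluster with k-means and AGNES, then score with three of the four
# internal indices named in the abstract.
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans, AgglomerativeClustering
from sklearn.metrics import (silhouette_score,
                             calinski_harabasz_score,
                             davies_bouldin_score)

# Toy stand-in for high-dimensional input data (50 features).
X, _ = make_blobs(n_samples=500, n_features=50, centers=4, random_state=0)

# Data transformation phase: project to a low-dimensional space.
X_red = PCA(n_components=5, random_state=0).fit_transform(X)

for name, model in [("k-means", KMeans(n_clusters=4, n_init=10, random_state=0)),
                    ("AGNES", AgglomerativeClustering(n_clusters=4))]:
    labels = model.fit_predict(X_red)
    print(name,
          round(silhouette_score(X_red, labels), 3),        # higher is better
          round(calinski_harabasz_score(X_red, labels), 1), # higher is better
          round(davies_bouldin_score(X_red, labels), 3))    # lower is better
```

Timing each `fit_predict` call on the original versus the reduced data would give the execution-time comparison the paper reports.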


Citations
Journal ArticleDOI

Identifying and understanding road-constrained areas of interest (AOIs) through spatiotemporal taxi GPS data: A case study in New York City

TL;DR: In this article, a space-time analytical framework is proposed to identify and describe 31 road-constrained AOIs in terms of their spatiotemporal distribution and contextual characteristics.
Journal Article

Adaptive dimension reduction for clustering high dimensional data

TL;DR: Clustering analyses performed on highly overlapped Gaussians, DNA gene expression profiles, and Internet newsgroups demonstrate the effectiveness of the proposed algorithm, which repeatedly reduces dimensions so that K-means or EM is performed only in very low dimensions.
Journal ArticleDOI

SemRec—An efficient ensemble recommender with sentiment based clustering for social media text corpus

TL;DR: This work proposes an ensemble multi-stage recommender system with sentiment-based clustering for social media text corpora, where each stage performs a unique function: information retrieval, natural language processing, user segmentation, prediction, or recommendation generation.
Book ChapterDOI

A Comparative Analysis of Clustering Quality Based on Internal Validation Indices for Dimensionally Reduced Social Media Data

TL;DR: An experimental analysis using four popular dimensionality reduction techniques – two linear and two nonlinear approaches – to verify the impact of dimensionality reduction on cluster quality using internal clustering validation indices is covered.
Journal ArticleDOI

Recovery of Forest Vegetation in a Burnt Area in the Republic of Korea: A Perspective Based on Sentinel-2 Data

TL;DR: In this paper, the degree of vegetative regeneration is estimated using the Normalized Difference Vegetation Index (NDVI), Enhanced Vegetation Index (EVI), Soil-Adjusted Vegetation Index (SAVI), and Normalized Burn Ratio (NBR).
References
Journal Article

Visualizing Data using t-SNE

TL;DR: A new technique called t-SNE visualizes high-dimensional data by giving each datapoint a location in a two- or three-dimensional map; it is a variation of Stochastic Neighbor Embedding that is much easier to optimize and produces significantly better visualizations by reducing the tendency to crowd points together in the center of the map.
Journal ArticleDOI

Nonlinear dimensionality reduction by locally linear embedding.

TL;DR: Locally linear embedding (LLE) is introduced, an unsupervised learning algorithm that computes low-dimensional, neighborhood-preserving embeddings of high-dimensional inputs that learns the global structure of nonlinear manifolds.
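The two nonlinear reduction methods referenced above, t-SNE and LLE, are both available in scikit-learn and can be sketched side by side. The swiss-roll data and all parameter values here are illustrative choices, not those of the cited papers.

```python
# Hedged sketch: embed the same 3-D manifold with t-SNE and with
# locally linear embedding (LLE), each producing a 2-D map.
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import TSNE, LocallyLinearEmbedding

X, _ = make_swiss_roll(n_samples=300, random_state=0)

# t-SNE: places each datapoint in a low-dimensional map, easing the
# crowding of points at the center relative to plain SNE.
X_tsne = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)

# LLE: neighborhood-preserving embedding learned from local geometry.
X_lle = LocallyLinearEmbedding(n_neighbors=10, n_components=2,
                               random_state=0).fit_transform(X)

print(X_tsne.shape, X_lle.shape)  # both map 3-D inputs to 2-D
```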
Journal ArticleDOI

Silhouettes: a graphical aid to the interpretation and validation of cluster analysis

TL;DR: A new graphical display is proposed for partitioning techniques, where each cluster is represented by a so-called silhouette, which is based on the comparison of its tightness and separation, and provides an evaluation of clustering validity.
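The per-point silhouette value underlying that display compares tightness and separation as s(i) = (b(i) - a(i)) / max(a(i), b(i)), where a(i) is the mean distance from point i to its own cluster and b(i) the mean distance to the nearest other cluster. A minimal sketch, with made-up toy points and labels:

```python
# Hedged sketch of the silhouette value for a single point.
import numpy as np

def silhouette_point(i, X, labels):
    d = np.linalg.norm(X - X[i], axis=1)          # distances from point i
    own = labels == labels[i]
    a = d[own & (np.arange(len(X)) != i)].mean()  # tightness (own cluster)
    b = min(d[labels == c].mean()                 # separation (nearest other)
            for c in set(labels) if c != labels[i])
    return (b - a) / max(a, b)

# Two tight, well-separated toy clusters: silhouette close to 1.
X = np.array([[0., 0.], [0., 1.], [5., 5.], [5., 6.]])
labels = np.array([0, 0, 1, 1])
print(round(silhouette_point(0, X, labels), 3))
```

Values near 1 indicate a point well inside its cluster, near 0 a point between clusters, and negative values a likely misassignment.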
Journal ArticleDOI

Least squares quantization in PCM

TL;DR: In this article, the authors derived necessary conditions for any finite number of quanta and associated quantization intervals of an optimum finite quantization scheme to achieve minimum average quantization noise power.
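Those two necessary conditions, each reproduction level at the mean of its interval and each boundary at the midpoint of adjacent levels, can be iterated to convergence; in one dimension this is essentially k-means. A sketch on made-up Gaussian samples, not the paper's analysis:

```python
# Hedged sketch of Lloyd-style iteration for a 4-level scalar quantizer.
import numpy as np

rng = np.random.default_rng(0)
x = np.sort(rng.normal(size=1000))   # toy source samples
levels = np.linspace(-2, 2, 4)       # initial quanta

for _ in range(50):
    bounds = (levels[:-1] + levels[1:]) / 2   # midpoints partition the line
    idx = np.digitize(x, bounds)              # assign samples to intervals
    levels = np.array([x[idx == k].mean()     # levels move to interval means
                       for k in range(4)])

print(np.round(levels, 3))
```

Each iteration cannot increase the average quantization noise power, so the levels settle at a locally optimal quantizer.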