Performance Analysis of Clustering Algorithm in Data Mining in R Language

doi:10.1007/978-981-13-1936-5_39

Book Chapter

- pp 364-372

TLDR

This paper compares 5 different clustering algorithms and validates them using internal and external validation measures such as the silhouette plot, the Dunn index, connectivity, and others.

Abstract:

Data mining is the extraction of interesting (constructive, relevant, previously unexplored, and considerably valuable) patterns or information from very large stores of data or from different datasets. In other words, it is the experimental exploration of the associations, links, and overall patterns that prevail in large datasets but are hidden or unknown. To analyse the performance of different clustering techniques, we used the R language, a tool that lets the user examine data from various perspectives in order to obtain proper experimental results and derive meaningful relationships. In this paper, we study, analyse, and compare various algorithms and techniques used for cluster analysis in R. Our aim is to compare 5 different clustering algorithms and to validate them using internal and external validation measures such as the silhouette plot, the Dunn index, connectivity, and others. Finally, based on the results obtained, we analysed, compared, and validated the efficiency of the algorithms with respect to one another.
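The abstract names the silhouette and the Dunn index as validation measures. As a rough illustration of what these two indices compute, the following is a minimal sketch in plain Python, not the authors' R code. The function names `silhouette_widths` and `dunn_index`, the Euclidean distance, and the toy data are all assumptions made for this example; the sketch also assumes at least two clusters and no duplicate points.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+


def silhouette_widths(points, labels):
    """Return s(i) = (b - a) / max(a, b) for every point.

    a: mean distance from point i to the other members of its own cluster.
    b: smallest mean distance from point i to the members of another cluster.
    """
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)

    widths = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:  # singleton cluster: width is conventionally 0
            widths.append(0.0)
            continue
        a = sum(dist(points[i], points[j]) for j in own) / len(own)
        b = min(
            sum(dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        widths.append((b - a) / max(a, b))
    return widths


def dunn_index(points, labels):
    """Smallest between-cluster distance divided by the largest cluster diameter."""
    pairs = list(combinations(range(len(points)), 2))
    min_between = min(
        dist(points[i], points[j]) for i, j in pairs if labels[i] != labels[j]
    )
    max_within = max(
        dist(points[i], points[j]) for i, j in pairs if labels[i] == labels[j]
    )
    return min_between / max_within


# Toy data (hypothetical): two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
good_labels = [0, 0, 0, 1, 1, 1]  # matches the visible groups
bad_labels = [0, 1, 0, 1, 0, 1]   # deliberately mixes the groups

for name, labels in [("good", good_labels), ("bad", bad_labels)]:
    widths = silhouette_widths(points, labels)
    print(name, "mean silhouette:", sum(widths) / len(widths))
    print(name, "Dunn index:", dunn_index(points, labels))
```

Both indices are internal measures: they use only the data and the cluster assignment, with higher values indicating more compact and better separated clusters. On the toy data the labelling that matches the visible groups scores higher on both than the deliberately mixed one.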

Citations

Journal Article

Evaluation of web service clustering using Dirichlet Multinomial Mixture model based approach for Dimensionality Reduction in service representation

Neha Agarwal, +2 more

- 01 Jul 2020 -

Information Processing and Management

TL;DR: A Gibbs Sampling algorithm for the Dirichlet Multinomial Mixture (GSDMM) model is proposed for dimensionality reduction and feature representation of services, to overcome the limitations of short-text clustering. Results show that GSDMM with K-Means or agglomerative clustering outperforms all other methods.

Journal Article

Analysis of a new spatial interpolation weighting method to estimate missing data applied to rainfall records

Jorge Luis Morales, +4 more

- 28 Jun 2019 -

Atmosfera

TL;DR: In this paper, two new generalized weighted methods of imputation of missing data are developed and tested using a daily rainfall series, and the choice of optimal parameters for the proposed formulae is based on minimizing the mean absolute error via an evolutionary strategy.

Journal Article

Partitioning and hierarchical based clustering: a comparative empirical assessment on internal and external indices, accuracy, and time

Syed Imtiyaz Hassan, +3 more

- 01 Dec 2020 -

International Journal of Information Tec...

TL;DR: The experiments suggest that the K-means algorithm produces more promising results than the hierarchical algorithm, except in accuracy.

Journal Article

Contamination assessment and potential sources of heavy metals and other elements in sediments of a basin impacted by 500 years of mining in central Mexico

Luisa Fernanda Rueda-Garzon, +5 more

- 06 Sep 2022 -

Environmental Monitoring and Assessment

Journal Article

Multivariate geotechnical zonation of seismic site effects with clustering-blended model for a city area, South Korea

Han-Saem Kim, +3 more

- 05 Dec 2021 -

Engineering Geology

TL;DR: This study proposes a new approach for multivariate site classification blended with geographic information system (GIS)-based spatial clustering and machine learning (ML)-based clustering ensemble technologies to develop cluster-oriented zonation considering the spatial heterogeneity of the different site response parameters.

References

Journal Article

Identification of common molecular subsequences.

Temple F. Smith, +1 more

- 25 Mar 1981 -

Journal of Molecular Biology

TL;DR: This letter extends the heuristic homology algorithm of Needleman & Wunsch (1970) to find a pair of segments, one from each of two long sequences, such that there is no other pair of segments with greater similarity (homology).

Book

The Grid 2: Blueprint for a New Computing Infrastructure

Ian Foster, +1 more

TL;DR: The Globus Toolkit as discussed by the authors is a toolkit for high-throughput resource management for distributed supercomputing applications, focusing on real-time wide-distributed instrumentation systems.

Journal Article

An efficient k-means clustering algorithm: analysis and implementation

Tapas Kanungo, +5 more

- 01 Jul 2002 -

IEEE Transactions on Pattern Analysis an...

TL;DR: This work presents a simple and efficient implementation of Lloyd's k-means clustering algorithm, which it calls the filtering algorithm, and establishes the practical efficiency of the algorithm's running time.

The Physiology of the Grid: An Open Grid Services Architecture for Distributed Systems Integration

Ian Foster, +3 more

TL;DR: This presentation complements an earlier foundational article, “The Anatomy of the Grid,” by describing how Grid mechanisms can implement a service-oriented architecture, explaining how Grid functionality can be incorporated into a Web services framework, and illustrating how the architecture can be applied within commercial computing as a basis for distributed system integration.

Journal Article

Grid services for distributed system integration

Ian Foster, +3 more

- 01 Jun 2002 -

IEEE Computer

TL;DR: In this paper, the authors focus on the nature of the services that respond to protocol messages and propose a set of services that can be aggregated in various ways to meet the needs of virtual organizations, which themselves can be defined by the services they operate and share.
