Book Chapter

Performance Analysis of Clustering Algorithm in Data Mining in R Language

TLDR
This paper compares five different clustering algorithms and validates them with internal and external validation measures such as the silhouette plot, the Dunn index, and connectivity.
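The chapter works in R and does not show its code. As a minimal sketch, assuming the definitions used by R's clValid package (connectivity over each point's nearest neighbours, the Dunn index as the smallest between-cluster distance over the largest within-cluster distance, and the average silhouette width), the three internal measures named above can be written in plain Python. The function names, the toy data, and the `n_neighbors=10` default are illustrative assumptions, not the authors' setup.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def euclidean(p: Point, q: Point) -> float:
    """Euclidean distance between two points."""
    return math.dist(p, q)


def connectivity(points: List[Point], labels: List[int], n_neighbors: int = 10) -> float:
    """Connectivity (lower is better, minimum 0).

    For every point, look at its n_neighbors nearest neighbours; each
    neighbour of rank j that sits in a different cluster adds 1/j.
    """
    total = 0.0
    for i, p in enumerate(points):
        # Other points ordered by distance from point i (nearest first).
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: euclidean(p, points[j]),
        )
        for rank, j in enumerate(others[:n_neighbors], start=1):
            if labels[j] != labels[i]:
                total += 1.0 / rank
    return total


def dunn_index(points: List[Point], labels: List[int]) -> float:
    """Dunn index (higher is better): smallest distance between two points in
    different clusters divided by the largest distance within any cluster."""
    min_between = math.inf
    max_within = 0.0
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            d = euclidean(points[i], points[j])
            if labels[i] == labels[j]:
                max_within = max(max_within, d)
            else:
                min_between = min(min_between, d)
    # An all-singleton clustering has no within-cluster distance at all.
    return math.inf if max_within == 0 else min_between / max_within


def mean_silhouette(points: List[Point], labels: List[int]) -> float:
    """Average silhouette width in [-1, 1] (higher is better); needs >= 2 clusters."""
    clusters = set(labels)
    widths = []
    for i, p in enumerate(points):
        own = labels[i]
        if labels.count(own) == 1:
            widths.append(0.0)  # convention: a singleton's silhouette is 0
            continue
        # Mean distance from point i to the other members of each cluster.
        mean_dist = {}
        for c in clusters:
            ds = [euclidean(p, points[j]) for j in range(len(points))
                  if labels[j] == c and j != i]
            mean_dist[c] = sum(ds) / len(ds)
        a = mean_dist[own]  # cohesion: distance to own cluster
        b = min(mean_dist[c] for c in clusters if c != own)  # separation
        widths.append((b - a) / max(a, b))
    return sum(widths) / len(widths)


# Toy data: two well-separated pairs of points.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
good = [1, 1, 2, 2]  # keeps each pair together
bad = [1, 2, 1, 2]   # splits each pair

for name, labels in (("good", good), ("bad", bad)):
    print(name,
          round(connectivity(points, labels), 3),
          round(dunn_index(points, labels), 3),
          round(mean_silhouette(points, labels), 3))
```

On this toy data the "good" labelling scores lower connectivity, a higher Dunn index, and a higher mean silhouette width than the "bad" one, which is the direction each measure is meant to reward. The pairwise loops are O(n²) and only suit small examples.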
Abstract
Data mining is the extraction of interesting (constructive, relevant, previously unexplored, and considerably valuable) patterns or information from very large collections of data or datasets. In other words, it is the experimental exploration of associations, links, and, above all, overall patterns that exist in large datasets but are hidden or unknown. To analyse the performance of different clustering techniques, we used the R language, a tool that lets the user analyse data from many different perspectives in order to obtain sound experimental results and derive meaningful relationships. In this paper, we study, analyse, and compare various algorithms and techniques for cluster analysis in R. Our aim is to compare five different clustering algorithms and to validate them using internal and external validation measures such as the silhouette plot, the Dunn index, connectivity, and others. Finally, on the basis of the results obtained, we analysed, compared, and validated the efficiency of the algorithms relative to one another.
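External validation compares a clustering against known class labels rather than against the data's geometry. The abstract does not say which external index the chapter uses; as an assumed example, the sketch below computes the Rand index and its chance-corrected form, the adjusted Rand index of Hubert and Arabie, in plain Python. In R, the mclust package offers `adjustedRandIndex()` for the same purpose. The function names and sample labels here are hypothetical.

```python
from collections import Counter
from math import comb


def rand_index(truth: list, pred: list) -> float:
    """Share of point pairs on which the two labelings agree
    (same cluster in both, or different clusters in both)."""
    n = len(truth)
    agree = 0
    for i in range(n):
        for j in range(i + 1, n):
            agree += (truth[i] == truth[j]) == (pred[i] == pred[j])
    return agree / comb(n, 2)


def adjusted_rand_index(truth: list, pred: list) -> float:
    """Rand index corrected for chance: 1 for identical partitions,
    close to 0 for unrelated ones (it can be negative)."""
    n = len(truth)
    # Pairs inside each cell of the truth-by-prediction contingency table.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(truth, pred)).values())
    # Pairs inside each true class and inside each predicted cluster.
    sum_truth = sum(comb(c, 2) for c in Counter(truth).values())
    sum_pred = sum(comb(c, 2) for c in Counter(pred).values())
    expected = sum_truth * sum_pred / comb(n, 2)
    maximum = (sum_truth + sum_pred) / 2
    if maximum == expected:
        # Only when both labelings are a single cluster, or both are all
        # singletons: the partitions are identical.
        return 1.0
    return (sum_cells - expected) / (maximum - expected)


truth = [0, 0, 0, 1, 1, 1]
same_groups = [1, 1, 1, 0, 0, 0]  # identical partition, label names swapped
other = [0, 0, 1, 1, 2, 2]
print(rand_index(truth, same_groups), adjusted_rand_index(truth, same_groups))
print(round(rand_index(truth, other), 3), round(adjusted_rand_index(truth, other), 3))
```

Both indices ignore the label names themselves, so a relabelled but identical partition scores 1. The adjusted version is usually preferred when comparing algorithms, because the plain Rand index tends to be high even for unrelated partitions with many clusters.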


Citations
Journal Article

Evaluation of web service clustering using Dirichlet Multinomial Mixture model based approach for Dimensionality Reduction in service representation

TL;DR: The Gibbs Sampling algorithm for the Dirichlet Multinomial Mixture (GSDMM) model is proposed for dimensionality reduction and feature representation of services, to overcome the limitations of short-text clustering; results show that GSDMM combined with K-Means or agglomerative clustering outperforms all other methods.
Journal Article

Analysis of a new spatial interpolation weighting method to estimate missing data applied to rainfall records

TL;DR: In this paper, two new generalized weighted methods of imputation of missing data are developed and tested using a daily rainfall series, and the choice of optimal parameters for the proposed formulae is based on minimizing the mean absolute error via an evolutionary strategy.
Journal Article

Partitioning and hierarchical based clustering: a comparative empirical assessment on internal and external indices, accuracy, and time

TL;DR: Based on the experiments, it may be concluded that the K-means algorithm produces more promising results than the hierarchical algorithm, except in accuracy.
Journal Article

Multivariate geotechnical zonation of seismic site effects with clustering-blended model for a city area, South Korea

TL;DR: This study proposes a new approach for multivariate site classification blended with geographic information system (GIS)-based spatial clustering and machine learning (ML)-based clustering ensemble technologies to develop cluster-oriented zonation considering the spatial heterogeneity of the different site response parameters.