Book Chapter
Performance Analysis of Clustering Algorithm in Data Mining in R Language
Avulapalli Jayaram Reddy, B. K. Tripathy, Seema Nimje, Gopalam Sree Ganga, Kamireddy Varnasree
pp. 364-372
TL;DR: This paper compares five clustering algorithms and validates them using internal and external validation measures such as the silhouette plot, the Dunn index, and connectivity.
Abstract:
Data mining is the extraction of interesting (constructive, relevant, previously unexplored, and considerably valuable) patterns or information from very large collections of data. In other words, it is the experimental exploration of the associations, links, and overall patterns that exist in large datasets but are hidden or unknown. To analyse the performance of different clustering techniques, we used the R language, a tool that lets the user examine data from various perspectives in order to obtain sound experimental results and derive meaningful relationships. In this paper, we study, analyse, and compare various algorithms and techniques for cluster analysis using R. Our aim is to compare five clustering algorithms and validate them using internal and external validation measures such as the silhouette plot, the Dunn index, and connectivity, among others. Finally, on the basis of the results obtained, we analysed, compared, and validated the efficiency of the algorithms with respect to one another.
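As a rough illustration of two of the validation measures named in the abstract, the sketch below computes the Dunn index and the mean silhouette width for a fixed cluster assignment. It is not the authors' code: the paper works in R, while this sketch uses plain Python with only the standard library so that it is self-contained. The function names, the toy points, and the two labelings are all assumptions made for illustration.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+


def dunn_index(points, labels):
    """Smallest between-cluster distance divided by the largest cluster diameter.

    Higher values suggest compact, well-separated clusters.
    Assumes at least two clusters and at least one cluster with two points.
    """
    pairs = list(combinations(range(len(points)), 2))
    between = [dist(points[i], points[j]) for i, j in pairs if labels[i] != labels[j]]
    within = [dist(points[i], points[j]) for i, j in pairs if labels[i] == labels[j]]
    return min(between) / max(within)


def mean_silhouette(points, labels):
    """Average silhouette width s = (b - a) / max(a, b) over all points.

    a: mean distance from a point to the other members of its own cluster.
    b: smallest mean distance from the point to the members of another cluster.
    Points in singleton clusters are given width 0 (a common convention).
    """
    clusters = set(labels)
    widths = []
    for i, p in enumerate(points):
        own = [dist(p, q) for j, q in enumerate(points)
               if j != i and labels[j] == labels[i]]
        if not own:
            widths.append(0.0)
            continue
        a = sum(own) / len(own)
        b = min(
            sum(dist(p, q) for j, q in enumerate(points) if labels[j] == c)
            / sum(1 for lab in labels if lab == c)
            for c in clusters if c != labels[i]
        )
        widths.append((b - a) / max(a, b))
    return sum(widths) / len(widths)


# Hypothetical toy data: two visibly separate groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
good_labels = [0, 0, 0, 1, 1, 1]  # matches the visible grouping
bad_labels = [0, 1, 0, 1, 0, 1]   # deliberately mixes the two groups

if __name__ == "__main__":
    for name, labels in (("good", good_labels), ("bad", bad_labels)):
        print(name, dunn_index(points, labels), mean_silhouette(points, labels))
```

Both measures are internal: they need only the data and the cluster labels, not a ground-truth partition. On the toy data, the labeling that follows the visible grouping scores higher on both than the mixed labeling, which is the kind of comparison used to rank clustering algorithms against one another.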