Open Access

Review on determining number of Cluster in K-Means Clustering

TLDR
Six different approaches to determining the right number of clusters in a dataset are explored in the context of the k-means method, a simple and fast clustering technique.
Abstract
Clustering is widely used in fields such as biology, psychology, and economics. The result of clustering varies as the number-of-clusters parameter changes; hence, a main challenge of cluster analysis is that the number of clusters, or the number of model parameters, is seldom known and must be determined before clustering. Several clustering algorithms have been proposed. Among them, the k-means method is a simple and fast clustering technique. We address the problem of cluster number selection by using a k-means approach. We can ask end users to provide the number of clusters in advance, but this is not feasible because the end user would need domain knowledge of each data set. Many methods are available to estimate the number of clusters, such as statistical indices, variance-based methods, information-theoretic methods, and goodness-of-fit methods. The paper explores six different approaches to determining the right number of clusters in a dataset.
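
As a rough illustration of the variance-based family mentioned in the abstract, the sketch below runs a plain k-means for several values of k and picks the k at which the within-cluster sum of squares (WCSS) bends most sharply (the "elbow"). It is not taken from the paper: the toy data, the farthest-first initialization (chosen only to keep the run deterministic), the second-difference elbow rule, and the names kmeans_wcss and elbow_k are all assumptions made for this example.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def farthest_first(points, k):
    """Deterministic init (an assumption of this sketch): start from the first
    point, then repeatedly add the point farthest from the chosen centers."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        )
    return centers


def kmeans_wcss(points, k, n_iter=100):
    """Run Lloyd's k-means and return the within-cluster sum of squares."""
    centers = farthest_first(points, k)
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: sq_dist(p, centers[c]))
            clusters[j].append(p)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its previous center).
        new_centers = [
            tuple(sum(x) / len(c) for x in zip(*c)) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


def elbow_k(wcss):
    """Pick k with the largest second difference of WCSS.
    wcss[i] is assumed to hold the WCSS for k = i + 1."""
    best_k, best_bend = None, float("-inf")
    for i in range(1, len(wcss) - 1):
        bend = wcss[i - 1] - 2 * wcss[i] + wcss[i + 1]
        if bend > best_bend:
            best_k, best_bend = i + 1, bend
    return best_k


# Toy data (hypothetical): three well-separated 2x2 squares of points.
corners = [(0, 0), (10, 0), (5, 9)]
points = [(x + dx, y + dy) for x, y in corners for dy in (0, 1) for dx in (0, 1)]

wcss = [kmeans_wcss(points, k) for k in range(1, 6)]
print(elbow_k(wcss))  # prints 3 for this toy data
```

The second-difference rule is a crude heuristic. On data without well-separated groups the bend can be ambiguous, which is one reason the other families of methods listed in the abstract are worth considering.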


Citations
Journal Article

Joint Communication, Computation, Caching, and Control in Big Data Multi-Access Edge Computing

TL;DR: In this paper, the problem of joint computing, caching, communication, and control (4C) in big data MEC is formulated as an optimization problem whose goal is to jointly optimize a linear combination of the bandwidth consumption and network latency.
Journal Article

Forecasting across time series databases using recurrent neural networks on groups of similar series: A clustering approach

TL;DR: In this article, a prediction model is presented that can be used with different types of RNN models on subgroups of similar time series, which are identified by time series clustering techniques.
Journal Article

Estimation of reference evapotranspiration in Brazil with limited meteorological data using ANN and SVM - a new approach.

TL;DR: In this article, the authors evaluated the performance of artificial neural network (ANN) and support vector machine (SVM) models for the estimation of daily ETo across the entirety of Brazil using measured data on temperature and relative humidity or only temperature.
Journal Article

Physics-guided convolutional neural network (PhyCNN) for data-driven seismic response modeling

TL;DR: In this paper, a physics-guided convolutional neural network (PhyCNN) is proposed to predict building seismic response in a data-driven fashion without the need for a physics-based analytical/numerical model.
Journal Article

Prediction of Blast-Induced Ground Vibration in an Open-Pit Mine by a Novel Hybrid Model Based on Clustering and Artificial Neural Network

TL;DR: The proposed HKM–ANN model was the best-performing model for estimating PPV caused by blasting operations in this study, and it contributes a new computational model for predicting blast-induced PPV with a high level of accuracy, for both the scientific community and practical engineering.
References
Journal Article

A new look at the statistical model identification

TL;DR: In this article, a new estimate, the minimum information theoretical criterion (AIC) estimate (MAICE), is introduced for the purpose of statistical identification; it is free from the ambiguities inherent in the application of conventional hypothesis testing procedures.
Journal Article

Estimating the Dimension of a Model

TL;DR: In this paper, the problem of selecting one of a number of models of different dimensions is treated by finding its Bayes solution, and evaluating the leading terms of its asymptotic expansion.

Book

Finding Groups in Data: An Introduction to Cluster Analysis
