
Showing papers on "Dunn index published in 2020"


Journal ArticleDOI
TL;DR: An Ensemble Artificial Bee Colony based Anomaly Detection Scheme (En-ABC) for multi-class datasets in a cloud environment is proposed, and the performance of the proposed scheme is compared with existing schemes using various parameters such as detection, false alarm, and accuracy rates.

59 citations


Journal ArticleDOI
TL;DR: The comparative analysis, based on the modified Dunn index and silhouette validity ratio, shows that the proposed initialization algorithm performs better than the other initialization algorithms.

37 citations


Journal ArticleDOI
TL;DR: A new data-driven dissimilarity measure, called MADD, turns the distance concentration phenomenon to its advantage; as a result, clustering algorithms based on MADD usually perform well for high-dimensional data.
Abstract: Popular clustering algorithms based on usual distance functions (e.g., the Euclidean distance) often suffer in high dimension, low sample size (HDLSS) situations, where concentration of pairwise distances and violation of neighborhood structure have adverse effects on their performance. In this article, we use a new data-driven dissimilarity measure, called MADD, which takes care of these problems. MADD uses the distance concentration phenomenon to its advantage, and as a result, clustering algorithms based on MADD usually perform well for high dimensional data. We establish it using theoretical as well as numerical studies. We also address the problem of estimating the number of clusters. This is a challenging problem in cluster analysis, and several algorithms are available for it. We show that many of these existing algorithms have superior performance in high dimensions when they are constructed using MADD. We also construct a new estimator based on a penalized version of the Dunn index and prove its consistency in the HDLSS asymptotic regime. Several simulated and real data sets are analyzed to demonstrate the usefulness of MADD for cluster analysis of high dimensional data.

36 citations
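For reference, the classical Dunn index, of which the paper above constructs a penalized variant to estimate the number of clusters, is, for a partition into clusters $C_1, \dots, C_k$,

$$\mathrm{DI} = \frac{\min_{i \neq j} \delta(C_i, C_j)}{\max_{1 \le m \le k} \Delta(C_m)},$$

where $\delta(C_i, C_j)$ is the separation between clusters $i$ and $j$ and $\Delta(C_m)$ is the diameter of cluster $C_m$; larger values indicate compact, well-separated clusters.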


Journal ArticleDOI
Hu Shi1, Chunping Jiang1, Zongzhuo Yan1, Tao Tao1, Xuesong Mei1 
TL;DR: The results show that compared with the BP neural network and multiple linear regression model, the Bayesian neural network not only has higher prediction accuracy but also can guarantee excellent prediction performance under different working conditions.
Abstract: It is well known that thermal error has a significant impact on the accuracy of CNC machine tools. In order to decrease the thermally induced positioning error of machine tools, a novel thermal error modeling approach based on a Bayesian neural network is proposed in this paper. The relationship between the temperature rise and positioning error of the feed drive system is investigated by simultaneously measuring the thermal characteristics, which include the temperature field and positioning error of the CNC machine tool. Fuzzy c-means (FCM) clustering and correlation analysis are used to select temperature-sensitive points, and the Dunn index is introduced to determine the optimal number of clustering groups, which effectively inhibits the multicollinearity problem among temperature measuring points. Least-squares linear fitting is applied to explore the features of the positioning error data. The results show that, compared with the BP neural network and multiple linear regression model, the Bayesian neural network not only has higher prediction accuracy but also guarantees excellent prediction performance under different working conditions. The prediction results obtained under different operating conditions indicate that the maximum thermal error can be reduced from around 18.2 to 5.14 μm by using the Bayesian neural network, which represents a 71% reduction in the thermally induced error of the feed drive system of the machine tool.

27 citations
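The cluster-count selection step described above can be sketched as follows. This is a hedged illustration, not the authors' code: KMeans stands in for fuzzy c-means, the data are random placeholders for temperature measurements, and the Dunn computation follows the classical definition.

```python
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.cluster import KMeans

def dunn_index(X, labels):
    """Classical Dunn index: min inter-cluster separation / max cluster diameter."""
    clusters = [X[labels == c] for c in np.unique(labels)]
    diameters = [cdist(c, c).max() for c in clusters]          # within-cluster spread
    separations = [cdist(a, b).min()                           # between-cluster gaps
                   for i, a in enumerate(clusters) for b in clusters[i + 1:]]
    return min(separations) / max(diameters)

X = np.random.rand(200, 8)   # placeholder for temperature measuring point data
scores = {k: dunn_index(X, KMeans(n_clusters=k, n_init=10,
                                  random_state=0).fit_predict(X))
          for k in range(2, 8)}
best_k = max(scores, key=scores.get)                           # highest Dunn wins
print(best_k, scores[best_k])
```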


Journal ArticleDOI
TL;DR: This paper proposes WOATS, a combination of WOA and TS, for data clustering: a meta-heuristic that uses memory components to explore and exploit the search space, with an objective function inspired by partitional clustering to maintain the quality of clustering solutions.

20 citations


Journal ArticleDOI
TL;DR: The results using this methodology showed a high classification accuracy and proved that both learning frameworks can be combined to optimize the selection of classification features.

19 citations


Journal ArticleDOI
TL;DR: This paper proposes to cluster and identify similar trajectories based on the paths traversed by moving objects, using a graph model; the approach has two phases, graph generation and clustering.

16 citations


Journal ArticleDOI
TL;DR: The proposed distributed clustering algorithm using multi-objective whale optimization (DMOWOA) for peer-to-peer networks outperforms existing techniques in terms of the statistical measures Minkowski score, Dunn index, and Silhouette index.

16 citations


Journal ArticleDOI
TL;DR: This paper designs an unsupervised clustering strategy in which the Silhouette coefficient and Dunn index are applied to parameter selection for DBSCAN, in order to cluster unknown protocol messages into classes with different formats.

14 citations
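A hedged sketch of the parameter-selection idea above (the paper's features and pipeline are not reproduced here): sweep DBSCAN's eps and keep the value whose clustering scores best on an internal index. scikit-learn's silhouette coefficient is used for scoring; the Dunn index would be an analogous custom scorer, and the message feature vectors are placeholders.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.metrics import silhouette_score

X = np.random.rand(300, 16)          # placeholder protocol-message feature vectors
best_eps, best_score = None, -1.0
for eps in np.linspace(0.1, 1.0, 10):
    labels = DBSCAN(eps=eps, min_samples=5).fit_predict(X)
    core = labels != -1              # drop noise points before scoring
    if len(np.unique(labels[core])) < 2:
        continue                     # silhouette needs at least two clusters
    score = silhouette_score(X[core], labels[core])
    if score > best_score:
        best_eps, best_score = eps, score
print(best_eps, best_score)
```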


Journal ArticleDOI
Jihwan Park1, Keon Vin Park1, Soohyun Yoo, Sang Ok Choi1, Sung Won Han1 
TL;DR: The results showed that clustering accuracy was best for the classification that used the hierarchical method, and the new grouping suggested by that method with four clusters can be utilized as a political decision-making tool.
Abstract: South Korea has been operating an extended producer responsibility (EPR) system since 2003 to collect, transport, and dispose of e-waste. Until 2019, the EPR system was operated with a total of 27 electronic products classified into five categories based on weight and volume, but 23 items will be added in 2020 along with a change to five categories based on the function of the products. In this study, which used actual operational data related to the collection, transport, and recycling steps from recycling plants in South Korea, we analyzed how well the new five-category grouping reflected actual recycling industrial conditions and provided optimal classification alternatives. The results showed that clustering accuracy was best for the classification that used the hierarchical method. In particular, the silhouette evaluation index showed the best accuracy with three clusters (0.4155), and the Dunn index indicated the best performance with four clusters (0.2333). Based on these results, ANOVA tests were implemented and showed that the three clusters in the relevant models were significantly different with regard to takt-time, weight, volume, and number of recycling processes (p ≤ 0.01) and to both recycling cost and value of material (p ≤ 0.05). In contrast, with regard to the grouping suggested by the South Korean government, the overall clustering accuracies using the silhouette and Dunn indices were –0.2028 and 0.058, respectively. In conclusion, the new grouping suggested by the hierarchical method with four clusters can be utilized as a political decision-making tool.

9 citations


Journal ArticleDOI
01 Jul 2020
TL;DR: The formalization of the medical data preprocessing stage was proposed in order to find personalized solutions based on current standards and pharmaceutical protocols and to determine deviations of parameters from the normative parameters of the group, as well as the average parameters.
Abstract: The study was conducted by applying machine learning and data mining methods to treatment personalization. This allows individual patient characteristics to be investigated. The personalization method was built on the clustering method and associative rules. It was suggested to determine the average distance between instances in order to find the optimal performance metrics. The formalization of the medical data preprocessing stage was proposed in order to find personalized solutions based on current standards and pharmaceutical protocols. The patient data model was built using time-dependent and time-independent parameters. Personalized treatment is usually based on the decision tree method. This approach requires significant computation time and cannot be parallelized. Therefore, it was proposed to group people by conditions and to determine deviations of parameters from the normative parameters of the group, as well as the average parameters. The novelty of the paper is the new clustering method, which was built from an ensemble of cluster algorithms, and the usage of the new distance measure with Hopkins metrics, which were 0.13 less than for the k-means method. The Dunn index was 0.03 higher than for the BIRCH (balanced iterative reducing and clustering using hierarchies) algorithm. The next stage was the mining of associative rules provided separately for each cluster. This allows a personalized approach to treatment to be created for each patient based on long-term monitoring. The correctness level of the proposed medical decisions is 86%, which was approved by experts.
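A hedged sketch of the second stage described above, mining association rules separately within each patient cluster. It is not the authors' implementation: it assumes the third-party mlxtend package and a toy one-hot table with hypothetical column names.

```python
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules

# Toy one-hot table of patient attributes for a single cluster (hypothetical columns).
cluster_df = pd.DataFrame({"high_bp": [1, 1, 0, 1],
                           "drug_a":  [1, 1, 0, 1],
                           "drug_b":  [0, 1, 1, 0]}, dtype=bool)
itemsets = apriori(cluster_df, min_support=0.5, use_colnames=True)
rules = association_rules(itemsets, metric="confidence", min_threshold=0.7)
print(rules[["antecedents", "consequents", "support", "confidence"]])
```

Running the same mining per cluster, as the paper does, then personalizes the rule set to each patient group rather than to the whole population.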

Book ChapterDOI
01 Jan 2020
TL;DR: In this article, the impact of applying dimensionality reduction during the data transformation phase of the clustering process has been investigated for three most common clustering algorithms k-means clustering, clustering large applications (CLARA), and agglomerative hierarchical clustering (AGNES).
Abstract: With the huge volume of data available as input, modern-day statistical analysis leverages clustering techniques to limit the volume of data to be processed. These input data are mainly sourced from social media channels and typically have high dimensionality due to the diverse features they represent. This is normally referred to as the curse of dimensionality, as it makes the clustering process computationally intensive and less efficient. Dimensionality reduction techniques are proposed as a solution to address this issue. This paper covers an empirical analysis of the impact of applying dimensionality reduction during the data transformation phase of the clustering process. We measured the impact in terms of clustering quality and clustering performance for three of the most common clustering algorithms: k-means clustering, clustering large applications (CLARA), and agglomerative hierarchical clustering (AGNES). Clustering quality is compared using four internal evaluation criteria, namely the Silhouette index, Dunn index, Calinski-Harabasz index, and Davies-Bouldin index, and average execution time is measured as a proxy for clustering performance.
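The experiment described above can be sketched as follows, a hedged illustration on assumed synthetic data: cluster the same dataset with and without PCA and compare internal quality indices and run time. scikit-learn ships the Silhouette, Calinski-Harabasz, and Davies-Bouldin indices; the Dunn index would need a custom helper, as it is not built in, and KMeans stands in for the chapter's three algorithms.

```python
import time
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.metrics import (silhouette_score, calinski_harabasz_score,
                             davies_bouldin_score)

X = np.random.rand(500, 50)                      # high-dimensional placeholder data
variants = {"raw": X, "pca": PCA(n_components=5).fit_transform(X)}
for name, data in variants.items():
    start = time.perf_counter()
    labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(data)
    elapsed = time.perf_counter() - start        # clustering-performance proxy
    print(name, round(elapsed, 3),
          silhouette_score(data, labels),        # higher is better
          calinski_harabasz_score(data, labels), # higher is better
          davies_bouldin_score(data, labels))    # lower is better
```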

Journal ArticleDOI
03 Mar 2020
TL;DR: This study provides practical evaluation frameworks for assessing clustering results on gene expression cancer datasets and determines that, among the four clustering algorithms compared, PAM is best for the Affymetrix datasets and DIANA is best for the cDNA datasets.
Abstract: Clustering plays a particularly fundamental role in exploring data, creating predictions, and overcoming anomalies in the data. Clusters that contain parallel, identical characteristics in a dataset are grouped using reiterative algorithms. As real-world data grow day by day, the challenge of perceiving and interpreting the resulting mass of data, which often consists of millions of measurements, is compounded by the intricacy of the huge number of genes in biological networks. To address this challenge, we use clustering algorithms. In this study, we provide a comparative study of the four most popular clustering algorithms: K-Means, PAM, Agglomerative Hierarchical, and DIANA, evaluated on eight real cancer gene expression datasets (four Affymetrix and four cDNA) and a simulated dataset. The comparative results are based upon seven popular cluster validity indices: Average Silhouette index, Corrected Rand index, Variation of Information, Dunn index, Calinski-Harabasz index, Separation index, and Pearson Gamma. We determine that PAM is best for the Affymetrix datasets and DIANA is best for the cDNA datasets among these four clustering algorithms. This study provides practical evaluation frameworks for assessing clustering results on gene expression cancer datasets.

Proceedings ArticleDOI
15 Dec 2020
TL;DR: In this article, the authors propose an approach for automatic clustering of text documents using a Self-Organizing Map (SOM), a type of unsupervised artificial neural network widely used for data analysis, data compression, clustering, and data mining.
Abstract: With the huge number of published research papers, retrieving relevant information is a difficult task for any researcher. Effective clustering algorithms can help improve and simplify the retrieval process. Here, we propose an approach for automatic clustering of text documents using a Self-Organizing Map (SOM), a type of unsupervised artificial neural network widely used for data analysis, data compression, clustering, and data mining. The quality and accuracy of a SOM algorithm depend on the values selected for some of its parameters: the initial learning rate, the SOM matrix dimensions, and the number of iterations. The best values are typically selected by trial and error; in the current paper, however, we suggest a more systematic approach to parameter optimization using a genetic algorithm. The proposed method is applied to cluster three scientific paper datasets using their keywords. Similar research papers were mapped closer to each other, and clustering results were validated using the Dunn index.
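A hedged sketch of the pipeline above (the genetic-algorithm parameter search is elided), assuming the third-party minisom package and toy keyword strings: papers are embedded as TF-IDF vectors, a SOM is trained, and each best-matching unit is treated as a cluster.

```python
import numpy as np
from minisom import MiniSom
from sklearn.feature_extraction.text import TfidfVectorizer

docs = ["clustering validation dunn index",       # toy paper keyword strings
        "neural network thermal error model",
        "cluster validity dunn silhouette"]
X = TfidfVectorizer().fit_transform(docs).toarray()

# 3x3 map; learning rate, grid size, and iteration count are exactly the
# parameters the paper tunes with a genetic algorithm (fixed here for brevity).
som = MiniSom(3, 3, X.shape[1], sigma=1.0, learning_rate=0.5, random_seed=0)
som.train_random(X, 500)
clusters = [som.winner(x) for x in X]             # best-matching unit per paper
print(clusters)
```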

DOI
01 Jan 2020
TL;DR: This paper examines the transferability of validation indices, such as the Gamma index, Average Silhouette Width or Dunn index to mixed-type data, and the R package clustMixType is extended by these indices and their application is demonstrated.
Abstract: For cluster analysis based on mixed-type data (i.e., data consisting of numerical and categorical variables), comparatively few clustering methods are available. One popular approach to dealing with this kind of problem is an extension of the k-means algorithm (Huang, 1998), the so-called k-prototypes algorithm, which is implemented in the R package clustMixType (Szepannek and Aschenbruck, 2019). It is further known that the selection of a suitable number of clusters k is particularly crucial in partitioning cluster procedures. Many implementations of cluster validation indices in R are not suitable for mixed-type data. This paper examines the transferability of validation indices, such as the Gamma index, Average Silhouette Width, or Dunn index, to mixed-type data. Furthermore, the R package clustMixType is extended by these indices and their application is demonstrated. Finally, the behaviour of the adapted indices is tested in a short simulation study using different data scenarios.
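The chapter's extension lives in the R package clustMixType; as a hedged, language-neutral illustration of the underlying idea (not the package's code), one can compute a Gower-style distance over mixed variables and evaluate the Dunn index directly from that distance matrix:

```python
import numpy as np

def gower_matrix(num, cat):
    """Gower-style distances: range-scaled numeric part plus simple matching."""
    rng = num.max(axis=0) - num.min(axis=0)
    rng[rng == 0] = 1.0                                  # guard constant columns
    d_num = np.abs(num[:, None, :] - num[None, :, :]) / rng
    d_cat = (cat[:, None, :] != cat[None, :, :]).astype(float)
    return np.concatenate([d_num, d_cat], axis=2).mean(axis=2)

def dunn_from_distances(D, labels):
    """Dunn index computed from a precomputed distance matrix."""
    ids = np.unique(labels)
    diam = max(D[np.ix_(labels == c, labels == c)].max() for c in ids)
    sep = min(D[np.ix_(labels == a, labels == b)].min()
              for i, a in enumerate(ids) for b in ids[i + 1:])
    return sep / diam

num = np.random.rand(100, 3)                             # numeric variables
cat = np.random.randint(0, 3, size=(100, 2))             # categorical variables
labels = np.random.randint(0, 2, size=100)               # toy cluster assignment
print(dunn_from_distances(gower_matrix(num, cat), labels))
```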

Proceedings ArticleDOI
01 Aug 2020
TL;DR: A number of handcrafted features for handwritten text are considered, a feature space analysis is performed and an expert system for identifying and clustering handwriting styles using unsupervised methods based on this set of features is built.
Abstract: In this paper, we perform a detailed analysis of feature extraction approaches and existing solutions for the problem of handwriting style clustering. We consider a number of handcrafted features for handwritten text, perform a feature space analysis to find the best set of extracted features, and build an expert system for identifying and clustering handwriting styles using unsupervised methods based on this set. We observed an improvement in clustering results when analyzing the clustering evaluation metrics described in this paper (such as the Dunn index, silhouette, and within- and between-cluster sums of squared errors), so we conclude that clustering based on such a subset is much more efficient. The results outlined below can be used for handwriting style determination, an important step in designing systems for text recognition and localization and for authentication and verification tasks, and can also be applied to the problem of detecting mental or nervous disorders.

Journal ArticleDOI
TL;DR: A novel distance measure for microarray datasets is proposed; the benchmark k-medoids algorithm is used for the clustering task, and the Dunn index is used to validate the clusters obtained with the distance measure.

Journal ArticleDOI
26 Dec 2020
TL;DR: This study aims to group employees based on their level of discipline using the Self Organizing Map (SOM) and K-Means algorithm to make it easier to manage employee work discipline.
Abstract: Managing employee work discipline needs to be done to support the development of an organization. One way to make it easier to manage employee work discipline is to group employees based on their level of discipline. This study aims to group employees based on their level of discipline using the Self-Organizing Map (SOM) and K-Means algorithms. The grouping begins with collecting employee attendance data, then processing the attendance data, including determining the parameters to be used, and ends with implementing the SOM and K-Means clustering algorithms. The groupings obtained from the SOM and K-Means algorithms are then validated using internal validation tests consisting of the Dunn index, the Silhouette index, and the Connectivity index to obtain the best number of clusters and the best algorithm. The validation tests yielded 3 best clusters for the level of discipline, namely the disciplined, moderately disciplined, and undisciplined clusters.


Book ChapterDOI
01 Jan 2020
TL;DR: This paper uses the Firefly and Fuzzy Firefly algorithms separately along with algorithms like FCM, IFCM, and RFCM, and analyses their efficiency using two measures, DB and D, concluding that RFCM with the hyper-tangent kernel and fuzzy firefly produces the best results with the fastest convergence rate.
Abstract: In order to handle the problem of linear separability in early data clustering algorithms, Euclidean distance has been replaced with kernel functions as the measure of similarity. Another problem with these clustering algorithms is the random selection of initial centroids, which affects not only the final result but also the convergence rate. Optimal selection of initial centroids through optimization algorithms like the Firefly or Fuzzy Firefly algorithms provides a partial solution to this problem. In this paper, we focus on two kernels, Gaussian and hyper-tangent, and use the Firefly and Fuzzy Firefly algorithms separately along with algorithms like FCM, IFCM, and RFCM, analysing their efficiency using two measures, DB (Davies-Bouldin) and D (Dunn). Our analysis concludes that RFCM with the hyper-tangent kernel and fuzzy firefly produces the best results with the fastest convergence rate. We use two images, an MRI scan of a human brain and blood cancer cells, for our analysis.

Proceedings ArticleDOI
10 Dec 2020
TL;DR: In this paper, a tweet clustering system is presented that determines topics from many text documents through a text mining method using an ant clustering (AC) technique.
Abstract: The aspects of the life of a public figure, discussed by the community, are often exploited by the news media as topic information to create articles that can attract the attention of readers. Efficiently, the media only needs to pay attention to social media to obtain some of this information. The more information needed, the larger the amount of data involved, so the process becomes hard. In this paper, a tweet clustering system is developed to determine topics from many text documents through a text mining method using an ant clustering (AC) technique. AC is one of the swarm intelligence algorithms, inspired by the behavior of ant colonies in sorting corpses. Evaluation on a small dataset of text documents shows that four topics are successfully identified: economy, social, politics, and government. The developed AC-based tweet clustering system produces an average cluster quality, measured by the Dunn index, of up to 0.3455.

Proceedings ArticleDOI
06 Nov 2020
TL;DR: Wang et al. propose a random forest method of clustering ensemble selection with the Dunn index, based on cluster-ensemble selection and a personal indoor thermal preference model; considering the shortcomings of the irrevocable merging strategy of hierarchical clustering, a hybrid method combining hierarchical and k-medoids partitional clustering is also developed.
Abstract: The random forest algorithm is an ensemble learning method with the decision tree as its base classifier. In an ensemble model, it is not always true that more base classifiers yield a better classification effect, since base classifiers with poor performance may have a negative impact on the final classification result. In order to improve the random forest classification method while ensuring the diversity of the random forest model, this paper proposes a random forest method of clustering ensemble selection with the Dunn index, based on the random forest algorithm of cluster-ensemble selection and a personal indoor thermal preference model. Considering the shortcomings of the irrevocable merging strategy of the hierarchical clustering algorithm, a random forest method of hybrid clustering ensemble selection based on hierarchical clustering and k-medoids partitional clustering is developed. The effectiveness of the proposed methods is verified by classifying personal indoor thermal preferences.

Proceedings ArticleDOI
04 Nov 2020
TL;DR: In this paper, an unsupervised learning technique was used to identify subtypes of cancer from gene expression data obtained from cBioPortal; identifying cancer subtypes can help improve the efficacy and reduce the toxicity of treatments by providing clues for target therapeutics.
Abstract: This study was conducted to review and identify the unsupervised techniques that can be employed to analyze gene expression data in order to identify better subtypes of tumors. Identifying subtypes of cancer helps improve the efficacy and reduce the toxicity of treatments by providing clues for target therapeutics. The process of gene expression data analysis is described in three steps: preprocessing, clustering, and cluster validation. Gene expression data obtained from cBioPortal was analyzed in this research using unsupervised learning techniques. Partitioning around medoids, K-means, and hierarchical clustering techniques with different distance and linkage measures were used in the initial clustering of the expression data. After cluster identification, cluster validation was conducted using internal measures such as the Silhouette and Dunn indices. Relative measures were used to identify the optimal number of clusters. External validations, such as comparing the classes with clinical variables and visual analysis of the classes using heatmaps, were also conducted. After heatmap filtering, it was found that the three cluster analysis results contained meaningful clusters. The analysis with 3 clusters identified using k-means clustering shows significant expression patterns in each cluster.

Book ChapterDOI
20 Feb 2020
TL;DR: In this article, the authors explore the impact of dimensionality on existing standard data stream clustering algorithms and compare them for different stream dimensions using six performance parameters, namely the adjusted Rand index, Dunn index, entropy, F1 measure, purity, and within-cluster sum of squares.
Abstract: Handling stream data is a tedious task. Recently, numerous techniques have been presented for analysing stream data. Stream data clustering is one of the important tasks in stream data mining. A number of application programming interfaces (APIs) are available for implementing stream data clustering, and these APIs can handle stream data of any dimension. The objective of this paper is to explore the impact of dimensionality on existing standard data stream clustering algorithms. Selected standard data stream clustering algorithms are compared for different stream dimensions using six performance parameters, namely the adjusted Rand index, Dunn index, entropy, F1 measure, purity, and within-cluster sum of squares.

Book ChapterDOI
01 Jan 2020
TL;DR: An algorithm for selecting the optimal number of seed points for unknown data, based on two important internal cluster validity indices, namely the Dunn index and Silhouette index, is described; Shannon's entropy with a distance threshold is used to calculate the positions of the seed points.
Abstract: In the present world, clustering is considered one of the most important data mining tools, applied to huge datasets to support futuristic decision-making processes. It is an unsupervised classification technique by which data points are grouped into homogeneous entities. Cluster analysis is used to find the clusters in unlabeled data. The positions of the seed points primarily affect the performance of most partitional clustering techniques, and the correct number of clusters in a dataset plays an important role in judging the quality of a partitional clustering technique. Selecting the initial seeds of K-means clustering is a critical problem for forming the optimal number of clusters with the benefit of fast stability. In this paper, we describe an algorithm for selecting the optimal number of seed points for unknown data based on two important internal cluster validity indices, namely the Dunn index and Silhouette index. Here, Shannon's entropy with a distance threshold is used to calculate the positions of the seed points. The algorithm is applied to different datasets, and the results are comparatively better than those of other methods. Moreover, comparisons with other algorithms in terms of different parameters have been made to distinguish the novelty of our proposed method.

Book ChapterDOI
10 Aug 2020
TL;DR: In this paper, cycle based clustering technique using reversible cellular automata (CAs) where closeness among objects is represented as objects belonging to the same cycle, that is reachable from each other.
Abstract: This work proposes cycle based clustering technique using reversible cellular automata (CAs) where ‘closeness’ among objects is represented as objects belonging to the same cycle, that is reachable from each other. The properties of such CAs are exploited for grouping the objects with minimum intra-cluster distance while ensuring that limited number of cycles exist in the configuration-space. The proposed algorithm follows an iterative strategy where the clusters with closely reachable objects of previous level are merged in the present level using an unique auxiliary CA. Finally, it is observed that, our algorithm is at least at par with the best algorithm existing today.

Book ChapterDOI
26 Sep 2020
TL;DR: This article proposes a method for community partition based on information granularity, which optimizes the social relationship model using a link prediction method, establishes a similarity model of user social relationships, and obtains better I-index and Dunn-index evaluation results than K-means.
Abstract: Social network community partition is conducive to obtaining hidden and valuable knowledge and rules, and is currently a hot research topic. Traditional community mining often analyzes network structure information from a static point of view but ignores the initiative of individual actors, which limits the construction of the community concept model and the effect of community partition. This article proposes a method for community partition based on information granularity. First, we optimize the social relationship model using a link prediction method and establish a similarity model of user social relationships. Second, to address the deficiencies of the K-means clustering algorithm and the high dimensionality and sparsity of the data, the principle of information granularity is introduced into user clustering analysis, and the membership degree and generalized equivalence relation of the user equivalence relation are given. On this basis, we propose a social community partition method based on information granularity. Finally, experiments show that, owing to the effective integration of important information about users' social relations and the introduction of the information granularity method, the proposed model obtains better I-index and Dunn-index evaluation results than K-means.

Journal ArticleDOI
30 Apr 2020
Abstract: An earthquake is a shock or vibration of the earth's surface caused by shifting layers of rock beneath it. This natural phenomenon is common in Indonesia because the country lies at the junction of the Australian, Eurasian, and Pacific plates and is surrounded by the Ring of Fire. Therefore, this study aims to cluster earthquake events in Indonesia and describe the characteristics of each group based on the clustering results. The method used is Fuzzy K-Means clustering, with clusters formed from depth, longitude, and latitude. In this study, the data used are earthquake records with a magnitude greater than or equal to 5 SR, clustered only by depth. Based on the Davies-Bouldin and Dunn indices, the best clustering has 2 clusters, whereby the earthquake data in Indonesia are grouped into deep and shallow clusters.