Book Chapter

Rough K-means Algorithm Based on the Boundary Object Difference Metric

22 Oct 2021-pp 309-318
TL;DR: Zhang et al. propose a new rough k-means algorithm that weights boundary objects dynamically, combining the distance from each boundary object to its neighbor points with the number of those neighbor points to calculate the object's weight for each cluster it may belong to.
Abstract: The rough k-means algorithm can deal effectively with the problem of fuzzy boundaries. However, the traditional rough k-means algorithm sets a single uniform weight for all boundary objects, ignoring the differences between individual objects. The rough fuzzy k-means algorithm uses a membership degree to measure how strongly a boundary object belongs to each cluster it may belong to, but it ignores the distribution of the boundary object's neighbor points. We therefore put forward a new rough k-means algorithm that measures the weight of boundary objects according to the distribution of their neighbor points. The proposed algorithm considers the distance from a boundary object to its neighbor points together with the number of those neighbor points, and uses both to dynamically calculate the weight of the boundary object for each cluster it may belong to. Simulations and experiments on examples verify the effectiveness of the proposed method.
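The abstract does not give the weight formula, so the following is only a minimal sketch of the idea it describes: a boundary object's weight for each candidate cluster depends on both how many neighbor points it has in that cluster and how close they are. The function name `boundary_weights`, the `radius` parameter, and the scoring rule `1 / (1 + distance)` are illustrative assumptions, not the authors' method.

```python
import math

def boundary_weights(x, candidates, radius=2.0):
    """Hypothetical per-object weights for a boundary object x.

    candidates maps a cluster id to the points already assigned to that
    cluster's lower approximation. Neighbors are points within `radius`
    of x. Each neighbor contributes 1 / (1 + distance), so a cluster
    scores higher when x has more neighbors in it and when they are
    closer. The scoring rule is an assumption made for illustration.
    """
    scores = {}
    for cid, members in candidates.items():
        dists = [math.dist(x, m) for m in members]
        scores[cid] = sum(1.0 / (1.0 + d) for d in dists if d <= radius)

    total = sum(scores.values())
    if total == 0:
        # No neighbors anywhere: fall back to a uniform weight,
        # which is what traditional rough k-means uses for all objects.
        return {cid: 1.0 / len(candidates) for cid in candidates}
    return {cid: s / total for cid, s in scores.items()}


x = (0.0, 0.0)
candidates = {
    "A": [(1.0, 0.0), (0.0, 1.0)],   # two close neighbors
    "B": [(-1.0, 0.0), (5.0, 5.0)],  # one close neighbor, one too far away
}
weights = boundary_weights(x, candidates)
print(weights)
```

In this example cluster A receives twice the weight of cluster B because the object has two neighbors at distance 1 in A and only one in B. A uniform-weight scheme would treat both clusters identically.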
References
Journal Article
TL;DR: Clustering algorithms for data sets appearing in statistics, computer science, and machine learning are surveyed, and their applications in some benchmark data sets, the traveling salesman problem, and bioinformatics, a new field attracting intensive efforts are illustrated.
Abstract: Data analysis plays an indispensable role for understanding various phenomena. Cluster analysis, primitive exploration with little or no prior knowledge, consists of research developed across a wide variety of communities. The diversity, on one hand, equips us with many tools. On the other hand, the profusion of options causes confusion. We survey clustering algorithms for data sets appearing in statistics, computer science, and machine learning, and illustrate their applications in some benchmark data sets, the traveling salesman problem, and bioinformatics, a new field attracting intensive efforts. Several tightly related topics, proximity measure, and cluster validation, are also discussed.

5,744 citations

Journal Article
01 Jul 2004
TL;DR: A variation of the K-means clustering algorithm based on properties of rough sets is proposed, which represents clusters as interval or rough sets.
Abstract: Data collection and analysis in web mining faces certain unique challenges. Due to a variety of reasons inherent in web browsing and web logging, the likelihood of bad or incomplete data is higher than conventional applications. The analytical techniques in web mining need to accommodate such data. Fuzzy and rough sets provide the ability to deal with incomplete and approximate information. Fuzzy set theory has been shown to be useful in three important aspects of web and data mining, namely clustering, association, and sequential analysis. There is increasing interest in research on clustering based on rough set theory. Clustering is an important part of web mining that involves finding natural groupings of web resources or web users. Researchers have pointed out some important differences between clustering in conventional applications and clustering in web mining. For example, the clusters and associations in web mining do not necessarily have crisp boundaries. As a result, researchers have studied the possibility of using fuzzy sets in web mining clustering applications. Recent attempts have used genetic algorithms based on rough set theory for clustering. However, the genetic algorithms based clustering may not be able to handle the large amount of data typical in a web mining application. This paper proposes a variation of the K-means clustering algorithm based on properties of rough sets. The proposed algorithm represents clusters as interval or rough sets. The paper also describes the design of an experiment including data collection and the clustering process. The experiment is used to create interval set representations of clusters of web visitors.

493 citations
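The entry above describes representing clusters as interval (rough) sets. A minimal sketch of one common formulation follows, assuming a distance-ratio threshold to decide which objects are ambiguous and fixed weights for the centroid update. The names `assign_rough` and `update_centroid` and the values `ratio=1.3` and `w_lower=0.7` are illustrative assumptions, not taken from the text.

```python
import math

def assign_rough(points, centroids, ratio=1.3):
    """Split point indices into lower approximations and boundary regions.

    A point goes to the lower approximation of its nearest cluster unless
    another centroid is almost as close (within `ratio` times the nearest
    distance). In that case it joins the boundary region of every such
    cluster, including the nearest one.
    """
    k = len(centroids)
    lower = [[] for _ in range(k)]
    boundary = [[] for _ in range(k)]
    for i, p in enumerate(points):
        dists = [math.dist(p, c) for c in centroids]
        nearest = min(range(k), key=dists.__getitem__)
        close = [j for j in range(k)
                 if j != nearest and dists[j] <= ratio * dists[nearest]]
        if close:
            for j in [nearest] + close:
                boundary[j].append(i)
        else:
            lower[nearest].append(i)
    return lower, boundary

def _mean(points, idx):
    dim = len(points[0])
    return [sum(points[i][d] for i in idx) / len(idx) for d in range(dim)]

def update_centroid(points, lower_idx, boundary_idx, old, w_lower=0.7):
    """Weighted mix of the lower-approximation mean and the boundary mean."""
    if lower_idx and boundary_idx:
        lo = _mean(points, lower_idx)
        bd = _mean(points, boundary_idx)
        return [w_lower * a + (1.0 - w_lower) * b for a, b in zip(lo, bd)]
    if lower_idx:
        return _mean(points, lower_idx)
    if boundary_idx:
        return _mean(points, boundary_idx)
    return list(old)  # empty cluster: keep the previous centroid


points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0), (5.0, 0.5)]
centroids = [(0.0, 0.5), (10.0, 0.5)]
lower, boundary = assign_rough(points, centroids)
print(lower, boundary)
print(update_centroid(points, lower[0], boundary[0], centroids[0]))
```

Here the point midway between the two centroids lands in the boundary region of both clusters, while the other four points fall in a lower approximation. Every boundary object gets the same weight `1 - w_lower` in the update.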

Journal Article
01 Aug 2006
TL;DR: A novel clustering architecture is introduced, in which several subsets of patterns can be processed together with an objective of finding a common structure, and the required communication links are established at the level of cluster prototypes and partition matrices.
Abstract: In this study, we introduce a novel clustering architecture, in which several subsets of patterns can be processed together with an objective of finding a common structure. The structure revealed at the global level is determined by exchanging prototypes of the subsets of data and by moving prototypes of the corresponding clusters toward each other. Thereby, the required communication links are established at the level of cluster prototypes and partition matrices, without hampering the security concerns. A detailed clustering algorithm is developed by integrating the advantages of both fuzzy sets and rough sets, and a measure of quantitative analysis of the experimental results is provided for synthetic and real-world data.

241 citations

Journal Article
TL;DR: A refined rough cluster algorithm is presented that is applied to synthetic, forest and microarray gene expression data and successfully applied to web mining.

191 citations