Journal ArticleDOI

Decision Theory, an Unprecedented Validation Scheme for Rough-Fuzzy Clustering

21 Apr 2016-International Journal on Artificial Intelligence Tools (World Scientific Publishing Company)-Vol. 25, Iss: 02, pp 1650003
TL;DR: Decision theory is proposed as a novel validation scheme for rough-fuzzy clustering, resolving loss and probability calculations to predict a risk measure for clustering techniques; it is shown to deduce the optimal number of clusters while overcoming the downsides of traditional validation frameworks.
Abstract: Cluster validation is an essential technique in all clustering applications. Several validation methods measure the accuracy of a cluster structure. Typical methods are geometric, where only distance and membership form the core of validation. Yao's decision theory is a novel approach to cluster validation, which introduced loss calculations and a probability-based measure for determining cluster quality. Conventional rough set algorithms have utilized this validity measure. This paper proposes decision theory as a novel validation scheme for rough-fuzzy clustering, resolving loss and probability calculations to predict a risk measure for clustering techniques. Experiments with synthetic and UCI datasets show that the scheme deduces the optimal number of clusters, overcoming the downsides of traditional validation frameworks. The proposed index can also be applied to other clustering algorithms and extends their usefulness in business-oriented data mining.
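The abstract describes a decision-theoretic validity measure built from loss and probability calculations. As a hedged illustration of the underlying idea only, not the paper's actual index, the sketch below treats normalized fuzzy memberships as probabilities and scores a partition by the expected zero-one loss of its best assignments; the function name and the choice of 0-1 loss are assumptions.

```python
import numpy as np

def zero_one_risk(memberships):
    """Expected 0-1 loss of assigning each object to its best cluster,
    treating (normalized) fuzzy memberships as probabilities.
    memberships: (n_objects, n_clusters) array."""
    u = np.asarray(memberships, dtype=float)
    u = u / u.sum(axis=1, keepdims=True)   # normalize rows to probabilities
    # Risk of the best assignment per object is 1 - max membership.
    per_object_risk = 1.0 - u.max(axis=1)
    return per_object_risk.sum()

# A crisp partition has zero risk; an ambiguous one does not.
crisp = np.array([[1.0, 0.0], [0.0, 1.0]])
fuzzy = np.array([[0.6, 0.4], [0.5, 0.5]])
print(zero_one_risk(crisp))  # 0.0
print(zero_one_risk(fuzzy))  # 0.9
```

Under this reading, comparing the total risk across runs with different numbers of clusters gives a simple risk-based criterion in the spirit of the paper's scheme.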
Citations
Proceedings ArticleDOI
22 Mar 2018
TL;DR: An approach combining Kapur's entropy and K-means clustering is used to extract the optic disc region from RGB retinal images in order to assess the condition of the Retinal-Optic-Disc (ROD).
Abstract: Retinal image evaluation is commonly performed to appraise diseases. In this paper, an image-analysis technique is implemented to extract the Retinal-Optic-Disc (ROD) and assess its condition. An approach based on the combination of Kapur's entropy and K-means clustering is used to extract the optic disc region from the RGB retinal image. During the experimental implementation, this approach is tested on the DRIVE and RIM-ONE databases: the DRIVE images are considered first to appraise the proposed approach, and the RIM-ONE dataset is then used for testing. After extracting the ROD, comparative analyses against the experts' ground truths are carried out and the image-similarity values are recorded. The approach is then validated against the Otsu's+levelset method existing in the literature. All experiments are implemented in Matlab2010. The outcome confirms that the proposed work provides better image-similarity values than Otsu's+levelset; hence, the procedure can in future be considered for evaluating clinical retinal images.
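The abstract pairs Kapur's maximum-entropy thresholding with K-means. A minimal sketch of the Kapur's-entropy half of that pipeline is shown below (the subsequent K-means step on the thresholded region is assumed, not shown); the function name and the synthetic two-level demo image are illustrative assumptions, not the paper's code.

```python
import numpy as np

def kapur_threshold(gray):
    """Kapur's entropy threshold for a grayscale image (integer values 0..255):
    pick the threshold maximizing the summed entropies of the two classes."""
    hist = np.bincount(np.asarray(gray).ravel(), minlength=256).astype(float)
    p = hist / hist.sum()
    best_t, best_h = 0, -np.inf
    for t in range(1, 256):
        p0, p1 = p[:t].sum(), p[t:].sum()
        if p0 <= 0 or p1 <= 0:
            continue  # skip degenerate splits
        q0, q1 = p[:t] / p0, p[t:] / p1          # class-conditional histograms
        h0 = -np.sum(q0[q0 > 0] * np.log(q0[q0 > 0]))
        h1 = -np.sum(q1[q1 > 0] * np.log(q1[q1 > 0]))
        if h0 + h1 > best_h:
            best_h, best_t = h0 + h1, t
    return best_t

# Synthetic image with two gray levels: the threshold lands between them.
g = np.concatenate([np.full(50, 60), np.full(50, 190)]).astype(int)
print(kapur_threshold(g))
```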

8 citations


Cites background from "Decision Theory, an Unprecedented V..."

  • ..., Gk) for (k ≤ n) in order to reduce the within-cluster sum of squares [24-28]....


Journal ArticleDOI
TL;DR: Alternative clustering algorithms applied to the same dataset yield diverse clustering outcomes; describing them all makes it easier to choose the most appropriate one for the situation at hand.
Abstract: In cluster analysis, ensemble clustering has proven to be a viable solution: clusters built on comparable subsets of a dataset are combined into a single grouping, and this combination can improve clustering quality. Ensemble clustering is also known as consensus clustering, and cluster ensembles are a promising technique for clustering heterogeneous or multi-source data. The findings of spectral ensemble clustering were utilized to reduce the algorithm's complexity. We apply alternative clustering algorithms to the same dataset, which yield diverse clustering outcomes; because the strategies are all described, it is easier to choose the most appropriate one for the situation at hand. To forecast the degree of student achievement in placement, clustering is performed on the preprocessed data using a specifically normalized k-means, compared with the K-Medoids and CLARANS algorithms.
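A standard way to combine the diverse clusterings the abstract mentions is a co-association (consensus) matrix: for each pair of objects, count how often the base clusterings agree. The sketch below is a generic illustration of that technique, not the cited work's algorithm; the function name and sample labelings are assumptions.

```python
import numpy as np

def co_association(labelings):
    """Co-association matrix: fraction of base clusterings in which
    each pair of objects lands in the same cluster.
    labelings: (n_runs, n_objects) array of cluster labels."""
    labelings = np.asarray(labelings)
    n = labelings.shape[1]
    co = np.zeros((n, n))
    for labels in labelings:
        co += (labels[:, None] == labels[None, :]).astype(float)
    return co / len(labelings)

# Three base clusterings of four objects.
runs = [[0, 0, 1, 1], [0, 0, 0, 1], [1, 1, 0, 0]]
co = co_association(runs)
print(co[0, 1])  # objects 0 and 1 co-cluster in all 3 runs -> 1.0
```

The consensus partition is then typically obtained by clustering the co-association matrix itself (e.g. by thresholding it or running a final clustering pass on it).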

1 citation

References
Journal ArticleDOI
TL;DR: A measure is presented which indicates the similarity of clusters that are assumed to have a data density decreasing with distance from a vector characteristic of the cluster; it can be used to infer the appropriateness of data partitions.
Abstract: A measure is presented which indicates the similarity of clusters that are assumed to have a data density that is a decreasing function of distance from a vector characteristic of the cluster. The measure can be used to infer the appropriateness of data partitions and can therefore be used to compare the relative appropriateness of various divisions of the data. The measure depends on neither the number of clusters analyzed nor the method of partitioning the data, and can be used to guide a cluster-seeking algorithm.
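This reference is the Davies–Bouldin index. A sketch of its common formulation, consistent with the abstract, is below: cluster scatter is the mean distance of points to their centroid, and the index averages each cluster's worst-case similarity ratio. Lower values indicate better separation.

```python
import numpy as np

def davies_bouldin(X, labels):
    """Davies-Bouldin index: mean over clusters of the worst-case ratio
    (s_i + s_j) / d_ij, where s is the mean within-cluster distance to
    the centroid and d_ij the distance between centroids."""
    X, labels = np.asarray(X, float), np.asarray(labels)
    ks = np.unique(labels)
    cents = np.array([X[labels == k].mean(axis=0) for k in ks])
    scatter = np.array([np.linalg.norm(X[labels == k] - c, axis=1).mean()
                        for k, c in zip(ks, cents)])
    db = 0.0
    for i in range(len(ks)):
        ratios = [(scatter[i] + scatter[j]) / np.linalg.norm(cents[i] - cents[j])
                  for j in range(len(ks)) if j != i]
        db += max(ratios)   # worst-case (most similar) neighboring cluster
    return db / len(ks)

# Two tight, well-separated clusters give a small index.
X = np.array([[0, 0], [0, 1], [10, 0], [10, 1]], float)
labels = np.array([0, 0, 1, 1])
print(davies_bouldin(X, labels))  # 0.1
```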

6,757 citations


"Decision Theory, an Unprecedented V..." refers background in this paper

  • ...It contains 210 instances with 7 attributes: (1) Area (2) Perimeter (3) Compactness (4) Length of kernel (5) Width of kernel (6) Asymmetry coefficient (7) Length of kernel groove....


  • ...Seed data set: It contains 210 instances with 7 attributes: (1) Area (2) Perimeter (3) Compactness (4) Length of kernel (5) Width of kernel (6) Asymmetry coefficient (7) Length of kernel groove....


  • ...The attribute information is as follows: (1) animal name (2) hair (3) feathers (4) eggs (5) milk (6) air borne (7) aquatic (8) predator (9) toothed (10) backbone (11) breathes (12) venomous (13) fins (14) legs (15) tail (16) domestic (17) catsize....


  • ...Eqs. (3)–(5): summations of memberships over x ∈ A(c_i) (the lower approximation) and over x ∈ bnd(c_i) (the boundary region) of each cluster....


Journal ArticleDOI
TL;DR: Clustering algorithms for data sets appearing in statistics, computer science, and machine learning are surveyed, and their applications are illustrated on benchmark data sets, the traveling salesman problem, and bioinformatics, a new field attracting intensive effort.
Abstract: Data analysis plays an indispensable role in understanding various phenomena. Cluster analysis, a primitive exploration with little or no prior knowledge, consists of research developed across a wide variety of communities. The diversity, on one hand, equips us with many tools; on the other hand, the profusion of options causes confusion. We survey clustering algorithms for data sets appearing in statistics, computer science, and machine learning, and illustrate their applications on some benchmark data sets, the traveling salesman problem, and bioinformatics, a new field attracting intensive effort. Several tightly related topics, such as proximity measures and cluster validation, are also discussed.

5,744 citations


"Decision Theory, an Unprecedented V..." refers background in this paper

  • ...The attribute information is as follows: (1) animal name (2) hair (3) feathers (4) eggs (5) milk (6) air borne (7) aquatic (8) predator (9) toothed (10) backbone (11) breathes (12) venomous (13) fins (14) legs (15) tail (16) domestic (17) catsize....


Journal ArticleDOI
TL;DR: A Monte Carlo evaluation of 30 procedures for determining the number of clusters was conducted on artificial data sets containing either 2, 3, 4, or 5 distinct nonoverlapping clusters; four hierarchical clustering methods were applied to provide a variety of clustering solutions.
Abstract: A Monte Carlo evaluation of 30 procedures for determining the number of clusters was conducted on artificial data sets which contained either 2, 3, 4, or 5 distinct nonoverlapping clusters. To provide a variety of clustering solutions, the data sets were analyzed by four hierarchical clustering methods. External criterion measures indicated excellent recovery of the true cluster structure by the methods at the correct hierarchy level. Thus, the clustering present in the data was quite strong. The simulation results for the stopping rules revealed a wide range in their ability to determine the correct number of clusters in the data. Several procedures worked fairly well, whereas others performed rather poorly. Thus, the latter group of rules would appear to have little validity, particularly for data sets containing distinct clusters. Applied researchers are urged to select one or more of the better criteria. However, users are cautioned that the performance of some of the criteria may be data dependent.
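The study above compares stopping rules for choosing the number of clusters. As a hedged illustration of the general mechanism only (this is a simple second-difference "elbow" criterion on the within-cluster sum of squares, not one of the 30 evaluated procedures), consider:

```python
import numpy as np

def wcss(X, labels):
    """Within-cluster sum of squares for a hard partition."""
    X, labels = np.asarray(X, float), np.asarray(labels)
    return sum(((X[labels == k] - X[labels == k].mean(axis=0)) ** 2).sum()
               for k in np.unique(labels))

def elbow_k(sse_by_k):
    """Pick k at the sharpest bend in the SSE curve via the second
    difference SSE(k-1) - 2*SSE(k) + SSE(k+1).
    sse_by_k: dict {k: SSE} over consecutive candidate k values."""
    ks = sorted(sse_by_k)
    bend = {k: sse_by_k[ks[i - 1]] - 2 * sse_by_k[k] + sse_by_k[ks[i + 1]]
            for i, k in enumerate(ks) if 0 < i < len(ks) - 1}
    return max(bend, key=bend.get)

# SSE drops sharply from k=1 to k=2, then flattens: the rule returns 2.
sse = {1: 100.0, 2: 30.0, 3: 25.0, 4: 22.0}
print(elbow_k(sse))  # 2
```

As the paper cautions for its stopping rules generally, the performance of such a criterion can be data dependent.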

3,551 citations


"Decision Theory, an Unprecedented V..." refers background in this paper

  • ...The attribute information is as follows: (1) animal name (2) hair (3) feathers (4) eggs (5) milk (6) air borne (7) aquatic (8) predator (9) toothed (10) backbone (11) breathes (12) venomous (13) fins (14) legs (15) tail (16) domestic (17) catsize....


  • ...Accordingly the loss λ, also represented as λ(a | C), is given by λ = 0, if x ∈ C (9) and λ = 1, if x ∉ C (10)....


Journal ArticleDOI
TL;DR: The fundamental concepts of cluster validity are introduced, a review of the fuzzy cluster validity indices available in the literature is presented, and extensive comparisons of these indices are conducted in conjunction with the Fuzzy C-Means clustering algorithm.
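One classic fuzzy validity index of the kind this review covers is Bezdek's partition coefficient (whether it appears in this particular review is not stated here). It is simply the mean squared membership, so a crisp partition scores 1 and a maximally fuzzy one scores 1/c:

```python
import numpy as np

def partition_coefficient(U):
    """Bezdek's partition coefficient: mean squared membership.
    Ranges from 1/c (maximally fuzzy) to 1 (crisp); higher suggests a
    crisper, better-defined partition.
    U: (n_objects, n_clusters) membership matrix, rows summing to 1."""
    U = np.asarray(U, float)
    return float((U ** 2).sum() / U.shape[0])

crisp = np.array([[1, 0], [0, 1], [1, 0]])
fuzzy = np.full((3, 2), 0.5)
print(partition_coefficient(crisp))  # 1.0
print(partition_coefficient(fuzzy))  # 0.5  (= 1/c for c = 2)
```

Evaluating such an index over Fuzzy C-Means runs with varying c is the standard comparison protocol the TL;DR describes.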

489 citations


"Decision Theory, an Unprecedented V..." refers background in this paper

  • ...Seed data set: It contains 210 instances with 7 attributes: (1) Area (2) Perimeter (3) Compactness (4) Length of kernel (5) Width of kernel (6) Asymmetry coefficient (7) Length of kernel groove....


  • ...The attribute information is as follows: (1) animal name (2) hair (3) feathers (4) eggs (5) milk (6) air borne (7) aquatic (8) predator (9) toothed (10) backbone (11) breathes (12) venomous (13) fins (14) legs (15) tail (16) domestic (17) catsize....


  • ...Eqs. (3)–(5): summations of memberships over x ∈ A(c_i) (the lower approximation) and over x ∈ bnd(c_i) (the boundary region) of each cluster....


Journal ArticleDOI
01 Aug 2006
TL;DR: A novel clustering architecture is introduced in which several subsets of patterns can be processed together with the objective of finding a common structure; the required communication links are established at the level of cluster prototypes and partition matrices.
Abstract: In this study, we introduce a novel clustering architecture in which several subsets of patterns can be processed together with the objective of finding a common structure. The structure revealed at the global level is determined by exchanging prototypes of the subsets of data and by moving prototypes of the corresponding clusters toward each other. Thereby, the required communication links are established at the level of cluster prototypes and partition matrices, without hampering security concerns. A detailed clustering algorithm is developed by integrating the advantages of both fuzzy sets and rough sets, and a quantitative analysis of the experimental results is provided for synthetic and real-world data.
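The prototype-exchange step the abstract describes, moving corresponding cluster prototypes toward each other without sharing raw data, can be sketched as below. This is an illustrative simplification, not the paper's algorithm: the matching rule (nearest prototype) and the collaboration strength `alpha` are assumptions.

```python
import numpy as np

def reconcile_prototypes(protos_a, protos_b, alpha=0.3):
    """One collaboration step between two sites: site A nudges each of its
    cluster prototypes toward the nearest prototype of site B. Only
    prototypes are exchanged, never the underlying patterns.
    alpha: assumed collaboration strength in [0, 1]."""
    a = np.asarray(protos_a, float)
    b = np.asarray(protos_b, float)
    # Match each prototype in `a` to its nearest prototype in `b`.
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    nearest = d.argmin(axis=1)
    return a + alpha * (b[nearest] - a)

a = np.array([[0.0, 0.0], [10.0, 10.0]])
b = np.array([[9.0, 9.0], [1.0, 1.0]])
print(reconcile_prototypes(a, b))
# [[0.3 0.3]
#  [9.7 9.7]]
```

Iterating such steps (interleaved with local fuzzy/rough clustering passes) is what lets the sites converge on a common structure while keeping their data private.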

241 citations


"Decision Theory, an Unprecedented V..." refers background in this paper

  • ...Seed data set: It contains 210 instances with 7 attributes: (1) Area (2) Perimeter (3) Compactness (4) Length of kernel (5) Width of kernel (6) Asymmetry coefficient (7) Length of kernel groove....


  • ...The attribute information is as follows: (1) animal name (2) hair (3) feathers (4) eggs (5) milk (6) air borne (7) aquatic (8) predator (9) toothed (10) backbone (11) breathes (12) venomous (13) fins (14) legs (15) tail (16) domestic (17) catsize....
