# Fundamentals of Fuzzy Clustering

## Summary

### 1.1 INTRODUCTION

- The goal is to divide the data-set in such a way that objects (or example cases) belonging to the same cluster are as similar as possible, whereas objects belonging to different clusters are as dissimilar as possible.
- By arranging similar objects into clusters one tries to reconstruct the unknown structure in the hope that every cluster found represents an actual type or category of objects.
- As a result one obtains a partition of the data-set into clusters with respect to the chosen dissimilarity relation.
- The chapter also covers alternating cluster estimation (ACE), a generalization of the AO scheme for cluster model optimization, which offers more modeling flexibility without deriving parameter update equations from optimization constraints.

### 1.2 BASIC CLUSTERING ALGORITHMS

- The authors present the fuzzy C-means and possibilistic C-means, deriving them from the hard c-means clustering algorithm.
- All algorithms described in this section are based on objective functions J, which are mathematical criteria that quantify the goodness of cluster models that comprise prototypes and data partition.
- Thus, in their presentation of the hard, fuzzy, and possibilistic c-means the authors discuss their respective objective functions first.
- The authors address the most important of the proposed objective function variants in Section 1.4.
- Data points can belong to more than one cluster and even with different degrees of membership to the different clusters.
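
As a minimal illustration of such graded memberships (the numbers are invented for this sketch, not taken from the chapter):

```python
import numpy as np

# Hypothetical fuzzy partition matrix for 4 data points and 2 clusters.
# Row j holds the membership degrees of point j; in the probabilistic
# (fuzzy c-means) setting each row sums to 1.
U = np.array([
    [0.9, 0.1],  # clearly in cluster 0
    [0.6, 0.4],  # leaning toward cluster 0
    [0.5, 0.5],  # ambiguous point, shared equally by both clusters
    [0.1, 0.9],  # clearly in cluster 1
])

assert np.allclose(U.sum(axis=1), 1.0)  # probabilistic constraint
```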

### 1.2.1 Hard c-means

- In the hard c-means, a data partition is said to be optimal when the sum of the squared distances between the cluster centers and the data points assigned to them is minimal (Krishnapuram and Keller, 1996).
- Therefore, the hard C-means clustering algorithm, also known as ISODATA algorithm (Ball and Hall, 1966; Krishnapuram and Keller, 1996), minimizes Jh using an alternating optimization (AO) scheme.
- By iterating the two (or more) steps the joint optimum is approached, although it cannot be guaranteed that the global optimum will be reached.
- This can be done randomly, i.e., by picking c random vectors that lie within the smallest (hyper-)box that encloses all data; or by initializing cluster centers with randomly chosen data points of the given data-set.
- Then the data partition U is held fixed and new cluster centers are computed as the mean of all data vectors assigned to them, since the mean minimizes the sum of the square distances in Jh.
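
The two alternating steps above can be sketched as follows; this is a minimal illustration under common conventions, not the chapter's pseudocode, and the helper name is our own:

```python
import numpy as np

def hard_c_means(X, c, n_iter=100, init=None, seed=0):
    """Minimal hard c-means (ISODATA-style) alternating optimization.

    X: (n, d) data matrix; c: number of clusters.  If no initial
    centers are given, c randomly chosen data points are used (one of
    the two initialization strategies mentioned above).
    """
    rng = np.random.default_rng(seed)
    centers = (X[rng.choice(len(X), size=c, replace=False)]
               if init is None else np.asarray(init, dtype=float))
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # Step 1: centers fixed -> assign each point to its nearest center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Step 2: partition fixed -> recompute each center as the mean of
        # its assigned points (the mean minimizes the squared error in Jh).
        new_centers = np.array([
            X[labels == k].mean(axis=0) if np.any(labels == k) else centers[k]
            for k in range(c)
        ])
        if np.allclose(new_centers, centers):
            break  # a joint (local) optimum was reached
        centers = new_centers
    return centers, labels
```

As the text notes, only a local optimum is guaranteed, so the result depends on the initialization.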

### 1.2.3 Possibilistic c-means

- The ‘relative’ character of the probabilistic membership degrees can be misleading (Timm, Borgelt, Döring and Kruse, 2004).
- Their membership values consequently affect the clustering results, since the weight of a data point attracts the cluster prototypes.
- Depending on the cluster’s shape, the parameters η_i have different geometrical interpretations.
- When no suitable values are known a priori, these parameters must be estimated from the data.
- Update equations for the prototypes are likewise derived by setting the derivative of the objective function Jp w.r.t. the prototype parameters to zero (holding the membership degrees Up fixed).
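
The update rules summarized above can be written out explicitly; the following uses the standard possibilistic formulation (the chapter's equation numbering is not reproduced here, and the symbols follow common usage: u_ij is the membership of point x_j to cluster i, d_ij their distance, m the fuzzifier, η_i the cluster-specific reference distance):

$$J_p = \sum_{i=1}^{c}\sum_{j=1}^{n} u_{ij}^m\, d_{ij}^2 \;+\; \sum_{i=1}^{c} \eta_i \sum_{j=1}^{n} \left(1-u_{ij}\right)^m$$

Setting the derivative of J_p w.r.t. u_ij to zero yields the membership update, and setting the derivative w.r.t. the centers to zero (for point prototypes and the Euclidean distance) yields the prototype update:

$$u_{ij} = \frac{1}{1 + \left(d_{ij}^2/\eta_i\right)^{1/(m-1)}}, \qquad c_i = \frac{\sum_{j=1}^{n} u_{ij}^m\, x_j}{\sum_{j=1}^{n} u_{ij}^m}$$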

### 1.3 DISTANCE FUNCTION VARIANTS

- In the previous section, the authors considered the case where the distance between cluster centers and data points is computed using the Euclidean distance, leading to the standard versions of fuzzy C-means and possibilistic C-means.
- This distance only makes it possible to identify spherical clusters.
- The authors review some of them, mentioning the fuzzy Gustafson–Kessel algorithm, fuzzy shell clustering algorithms and kernel-based variants.
- All of them can be applied both in the fuzzy probabilistic and possibilistic framework.
- The authors consider the variants that handle object data and do not present the relational approach.

### 1.3.1 Gustafson–Kessel Algorithm

- The Gustafson–Kessel algorithm (Gustafson and Kessel, 1979) replaces the Euclidean distance by a cluster-specific Mahalanobis distance, so as to adapt to various sizes and forms of the clusters.
- Specific constraints can be taken into account, for instance restricting to axis-parallel cluster shapes, by considering only diagonal matrices.
- The update equations for the membership degrees are identical to those indicated in Equation (1.13) and Equation (1.17) for the FCM and PCM variants respectively, replacing the Euclidean distance by the cluster-specific distance given above in Equation (1.19).
- The Gustafson–Kessel algorithm tries to extract much more information from the data than the algorithms based on the Euclidean distance.
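
The cluster-specific distance can be sketched as follows (the helper is our own, assuming the common volume-normalized formulation in which each cluster's norm-inducing matrix is derived from its covariance matrix with fixed determinant):

```python
import numpy as np

def gk_distance2(x, center, cov, rho=1.0):
    """Squared Gustafson-Kessel distance of point x to one cluster.

    The norm-inducing matrix A is the inverse of the cluster's covariance
    matrix, rescaled so that det(A) = rho, which fixes the cluster volume.
    Restricting cov (and hence A) to diagonal matrices yields the
    axis-parallel special case mentioned above.
    """
    x, center = np.asarray(x, dtype=float), np.asarray(center, dtype=float)
    d = x.shape[-1]
    A = (rho * np.linalg.det(cov)) ** (1.0 / d) * np.linalg.inv(cov)
    diff = x - center
    return float(diff @ A @ diff)
```

With cov equal to the identity matrix this reduces to the squared Euclidean distance, recovering the standard FCM/PCM behavior.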

### 1.4 OBJECTIVE FUNCTION VARIANTS

- The previous variants of fuzzy c-means are obtained by considering different distance functions, which lead to rewritten objective functions and, in some cases, modified update equations.
- The authors consider other variants that are based on deeper modifications of the objective functions.
- Others study at a theoretical level the role of the fuzzifier m in the objective function (see notations in Equation (1.10)) and propose some modifications.
- When giving update equations for cluster prototypes, the authors consider only the case where the Euclidean distance is used and when prototypes are reduced to cluster centers.
- The interested reader is referred to the original papers.

### 1.4.1 Noise Handling Variants

- The first variants of fuzzy C-means the authors consider aim at handling noisy data.
- When giving the considered objective functions, the authors do not recall the constraints indicated in Equations (1.8) and (1.9) that apply in all cases.
- The aim of these variants is then to define robust fuzzy clustering algorithms, i.e., algorithms whose results do not depend on the presence or absence of noisy data points or outliers in the data-set.
- Three approaches are mentioned here: the first one is based on the introduction of a specific cluster, the so-called noise cluster that is used to represent noisy data points.
- The second method is based on the use of robust estimators, and the third one reduces the influence of noisy data points by defining weights denoting the point representativeness.
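
The first approach (the noise cluster) can be sketched under its usual formulation, in which the noise prototype sits at a fixed "noise distance" δ from every data point; the function name and shape are our own, and distances are assumed nonzero:

```python
import numpy as np

def noise_fcm_memberships(d2, delta2, m=2.0):
    """FCM-style memberships extended with a noise cluster (a sketch).

    d2: (n, c) squared distances of n points to the c regular cluster
    centers.  delta2: squared noise distance, identical for every point.
    Returns (n, c+1) membership degrees; the last column belongs to the
    noise cluster.  Points far from all regular clusters receive high
    noise membership, which caps their pull on the prototypes.
    """
    e = 1.0 / (m - 1.0)
    # Treat the noise cluster as an extra column at constant distance,
    # then apply the usual probabilistic normalization across columns.
    d2_all = np.concatenate([d2, np.full((d2.shape[0], 1), delta2)], axis=1)
    inv = d2_all ** -e
    return inv / inv.sum(axis=1, keepdims=True)
```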

### 1.4.3 Cluster Number Determination Variants

- Partitioning clustering algorithms consist of searching for the optimal fuzzy partition of the data-set into c clusters, where c is given as input to the algorithm.
- In most real data mining cases, this parameter is not known in advance and must be determined.
- Yet, as mentioned earlier, at a theoretical level, PCM relies on an ill-posed optimization problem and other approaches should be considered.
- In the competitive agglomeration (CA) approach, the combination of terms in the objective function makes it possible to find the optimal partition with the smallest possible number of clusters.
- A robust extension to CA has been proposed in Frigui and Krishnapuram (1999): the first term in Equation (1.36) is then replaced by the term provided in Equation (1.28) to exploit the robust estimator properties.

### 1.4.4 Possibilistic c-means Variants

- As indicated in Section 1.2.4, the possibilistic C-means may lead to unsatisfactory results, insofar as the obtained clusters may be coincident.
- This is due to the optimized objective function, whose global minimum is obtained when all clusters are identical (see Section 1.2.4).
- Hence the possibilistic C-means can be improved by modifying its objective function.
- The authors mention here two PCM variants, based on the adjunction of a penalization term in the objective function and the combination of PCM with FCM.

### 1.5 UPDATE EQUATION VARIANTS: ALTERNATING CLUSTER ESTIMATION

- The authors study the fuzzy clustering variants that generalize the alternating optimization scheme used by the methods presented up to now.
- If fuzzy sets with limited support, as used in fuzzy controllers, are desired, possibilistic membership functions are inadequate as well.
- Therefore ACE allows choosing membership functions other than those that stem from an objective function-based AO scheme.
- In ACE, a large variety of parameterized equations stemming from defuzzification methods are offered for the re-estimation of cluster centers for fixed memberships.
- Notice that all conventional objective function-based algorithms can be represented as instances of the more general ACE framework by selecting their membership functions as well as their prototype update equations.

### 1.6 CONCLUDING REMARKS

- The authors reviewed fuzzy clustering, starting from the basic algorithms and underlining the difference between the probabilistic and possibilistic paradigms.
- The authors then described variants of the basic algorithms, adapted to specific constraints or expectations.
- The authors further pointed out major research directions associated with fuzzy clustering.
- In this conclusion the authors briefly point out further research directions that they could not address in the main part of the chapter due to length constraints.

### 1.6.1 Clustering Evaluation

- An important topic related to clustering is that of cluster evaluation, i.e., the assessment of the quality of the obtained clusters: clustering is an unsupervised learning task, which means data points are not associated with labels or targets that indicate the desired output.
- Some criteria are specifically dedicated to fuzzy clustering: the partition entropy criterion, for instance, computes the entropy of the obtained membership degrees, PE = −∑_{i,j} u_ij log u_ij, which must be minimized (Bezdek, 1975).
- A data partition that is too fuzzy rather indicates a bad adequacy between the cluster number and the considered data-set and it should be penalized.
- Such criteria can be used to evaluate quantitatively the clustering quality and to compare algorithms one with another.
- They can also be applied to compare the results obtained with a single algorithm, when the parameter values are changed.
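
The partition entropy criterion mentioned above can be sketched as follows (the 1/n normalization follows Bezdek's common definition; the chapter's exact scaling may differ):

```python
import numpy as np

def partition_entropy(U, eps=1e-12):
    """Partition entropy PE = -(1/n) * sum_{i,j} u_ij * log(u_ij).

    U: (n, c) fuzzy partition matrix.  Lower values indicate a crisper
    (less fuzzy) partition; a value near the maximum log(c) suggests a
    poor match between the cluster number and the data-set.
    """
    U = np.clip(U, eps, 1.0)  # avoid log(0) for crisp memberships
    return float(-(U * np.log(U)).sum() / U.shape[0])
```

A crisp partition scores near 0, while a maximally fuzzy one scores log(c), so the criterion can rank runs with different parameter values.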

### 1.6.2 Shape and Size Regularization

- As presented in Section 1.3.1, some fuzzy clustering algorithms make it possible to identify clusters of ellipsoidal shapes and with various sizes.
- This flexibility implies that numerous cluster parameters are to be adjusted by the algorithms.
- The more parameters are involved the more sensitive the methods get to their initialization.
- Lately, a new approach has been proposed (Borgelt and Kruse, 2005) that relies on regularization to introduce shape and size constraints to handle the higher degrees of freedom effectively.
- With a time-dependent shape regularization parameter, this method makes it possible to perform a soft transition from the fuzzy C-means (spherical clusters) to the Gustafson–Kessel algorithm (general ellipsoidal clusters).

### 1.6.3 Co-clustering

- Co-clustering, also called bi-clustering, two-mode clustering, two-way clustering or subspace clustering, has the specific aim of simultaneously identifying relevant subgroups in the data and relevant attributes for each subgroup: it aims at performing both clustering and local attribute selection.
- Other applications include text mining, e.g., for the identification of both document clusters and their characteristic keywords (Kummamuru, Dhawale, and Krishnapuram, 2003).
- Many dedicated clustering algorithms have been proposed, including fuzzy clustering methods as for instance Frigui and Nasraoui (2000).

### 1.6.4 Relational Clustering

- The methods described in this chapter apply to object data, i.e., consider the case where a description is provided for each data point individually.
- In other cases this information is not available; the algorithm input instead takes the form of a pairwise dissimilarity matrix.
- The latter has size n × n; each of its elements indicates the dissimilarity between a pair of points.
- Relational clustering aims at identifying clusters exploiting this input.
- The interested reader is also referred to the respective chapter in Bezdek, Keller, Krishnapuram, and Pal (1999).

### 1.6.5 Semisupervised Clustering

- Yet it may be the case that the user has some a priori knowledge about pairs of points that should belong to the same cluster.
- Semisupervised clustering is concerned with this learning framework, where some partial information is available: the clustering results must then satisfy additional constraints implied by these pieces of information.
- Specific clustering algorithms have been proposed to handle these cases; the interested reader is referred to chapter 7 in this book.
