
# Fundamentals of Fuzzy Clustering

Book chapter, published 11 May 2007, pp. 1-30, open access; 78 citations to date. The chapter focuses on the topics: fuzzy clustering and cluster analysis.

### 1.1 INTRODUCTION

• The goal is to divide the data-set in such a way that objects (or example cases) belonging to the same cluster are as similar as possible, whereas objects belonging to different clusters are as dissimilar as possible.
• By arranging similar objects into clusters one tries to reconstruct the unknown structure in the hope that every cluster found represents an actual type or category of objects.
• As a result one yields a partition of the data-set into clusters regarding the chosen dissimilarity relation.
• The alternating cluster estimation (ACE) framework presented later is a generalization of the AO scheme for cluster model optimization, which offers more modeling flexibility without deriving parameter update equations from optimization constraints.

### 1.2 BASIC CLUSTERING ALGORITHMS

• The authors present the fuzzy C-means and possibilistic C-means, deriving them from the hard c-means clustering algorithm.
• All algorithms described in this section are based on objective functions J, which are mathematical criteria that quantify the goodness of cluster models that comprise prototypes and data partition.
• Thus, in their presentation of the hard, fuzzy, and possibilistic c-means the authors discuss their respective objective functions first.
• The authors address the most important of the proposed objective function variants in Section 1.4.
• Data points can belong to more than one cluster and even with different degrees of membership to the different clusters.

### 1.2.1 Hard c-means

• In the C-means such a data partition is said to be optimal when the sum of the squared distances between the cluster centers and the data points assigned to them is minimal (Krishnapuram and Keller, 1996).
• Therefore, the hard C-means clustering algorithm, also known as ISODATA algorithm (Ball and Hall, 1966; Krishnapuram and Keller, 1996), minimizes Jh using an alternating optimization (AO) scheme.
• By iterating the two (or more) steps the joint optimum is approached, although it cannot be guaranteed that the global optimum will be reached.
• This can be done randomly, i.e., by picking c random vectors that lie within the smallest (hyper-)box that encloses all data; or by initializing cluster centers with randomly chosen data points of the given data-set.
• Then the data partition U is held fixed and new cluster centers are computed as the mean of all data vectors assigned to them, since the mean minimizes the sum of the square distances in Jh.

### 1.2.3 Possibilistic c-means

• The ‘relative’ character of the probabilistic membership degrees can be misleading (Timm, Borgelt, Döring and Kruse, 2004).
• Their membership values consequently affect the clustering results, since data point weight attracts cluster prototypes.
• Depending on the cluster’s shape, the parameters $\eta_i$ have a different geometrical interpretation.
• In that case these parameters must be estimated.
• Update equations for the prototypes are likewise derived by setting the derivative of the objective function $J_p$ with respect to the prototype parameters to zero (holding the membership degrees $U_p$ fixed).
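As an illustration of such an update, the possibilistic membership equation commonly associated with PCM (the Krishnapuram–Keller form; Equation (1.17) itself is not reproduced in this excerpt) can be sketched as follows. This is a minimal sketch under that assumption; the function name, example data, and choice of $\eta$ values are ours:

```python
import numpy as np

def pcm_memberships(data, centers, eta, m=2.0):
    """Possibilistic membership update in the Krishnapuram-Keller form:
    u_ij = 1 / (1 + (d_ij^2 / eta_i)^(1/(m-1))).
    Memberships are typicalities: columns need not sum to 1."""
    # squared Euclidean distances, shape (c, n)
    d2 = ((centers[:, None, :] - data[None, :, :]) ** 2).sum(axis=2)
    return 1.0 / (1.0 + (d2 / eta[:, None]) ** (1.0 / (m - 1.0)))

data = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]])
centers = np.array([[0.0, 0.0], [5.0, 5.0]])
u = pcm_memberships(data, centers, eta=np.array([1.0, 1.0]))
```

Note that a point coinciding with a center gets membership 1 to that cluster regardless of the other clusters, which is exactly the non-relative character discussed above.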

### 1.3 DISTANCE FUNCTION VARIANTS

• In the previous section, the authors considered the case where the distance between cluster centers and data points is computed using the Euclidean distance, leading to the standard versions of fuzzy C-means and possibilistic C-means.
• This distance only makes it possible to identify spherical clusters.
• The authors review some of them, mentioning the fuzzy Gustafson–Kessel algorithm, fuzzy shell clustering algorithms and kernel-based variants.
• All of them can be applied both in the fuzzy probabilistic and possibilistic framework.
• The authors consider the variants that handle object data and do not present the relational approach.

### 1.3.1 Gustafson–Kessel Algorithm

• The Gustafson–Kessel algorithm (Gustafson and Kessel, 1979) replaces the Euclidean distance by a cluster-specific Mahalanobis distance, so as to adapt to various sizes and forms of the clusters.
• Specific constraints can be taken into account, for instance restricting to axis-parallel cluster shapes, by considering only diagonal matrices.
• The update equations for the membership degrees are identical to those indicated in Equation (1.13) and Equation (1.17) for the FCM and PCM variants respectively, replacing the Euclidean distance by the cluster specific distance given above in Equation (1.19).
• The Gustafson–Kessel algorithm tries to extract much more information from the data than the algorithms based on the Euclidean distance.
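The cluster-specific distance can be sketched as follows; this is a minimal illustration of the determinant-normalized Mahalanobis distance commonly used by Gustafson–Kessel (Equation (1.19) is not reproduced in this excerpt), and the function name is ours:

```python
import numpy as np

def gk_distance_sq(x, center, cov):
    """Cluster-specific squared Mahalanobis distance as used by the
    Gustafson-Kessel algorithm: d^2 = det(S)^(1/p) (x-c)^T S^{-1} (x-c),
    where S is the cluster covariance matrix and the determinant factor
    normalizes each cluster to a fixed volume."""
    p = len(center)
    diff = np.asarray(x, dtype=float) - np.asarray(center, dtype=float)
    scaled_inv = np.linalg.det(cov) ** (1.0 / p) * np.linalg.inv(cov)
    return float(diff @ scaled_inv @ diff)
```

With the identity matrix as covariance this reduces to the squared Euclidean distance; an elongated covariance makes displacements along the long axis of the ellipsoid cheaper than perpendicular ones.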

### 1.4 OBJECTIVE FUNCTION VARIANTS

• The variants of fuzzy C-means presented previously are obtained by considering different distance functions, which rewrite the objective functions and in some cases modify the update equations.
• The authors consider other variants that are based on deeper modifications of the objective functions.
• Others study at a theoretical level the role of the fuzzifier m in the objective function (see notations in Equation (1.10)) and propose some modifications.
• When giving update equations for cluster prototypes, the authors consider only the case where the Euclidean distance is used and when prototypes are reduced to cluster centers.
• The interested reader is referred to the original papers.

### 1.4.1 Noise Handling Variants

• The first variants of fuzzy C-means the authors consider aim at handling noisy data.
• When giving the considered objective functions, the authors do not recall the constraints indicated in Equations (1.8) and (1.9) that apply in all cases.
• The aim of these variants is then to define robust fuzzy clustering algorithms, i.e., algorithms whose results do not depend on the presence or absence of noisy data points or outliers in the data-set.
• Three approaches are mentioned here: the first one is based on the introduction of a specific cluster, the so-called noise cluster that is used to represent noisy data points.
• The second method is based on the use of robust estimators, and the third one reduces the influence of noisy data points by defining weights denoting the point representativeness.
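The first (noise cluster) approach can be sketched as follows, assuming the standard probabilistic membership update extended with a virtual cluster lying at a fixed distance $\delta$ from every point (Davé's formulation); the names and the value of $\delta^2$ are illustrative:

```python
import numpy as np

def noise_fcm_memberships(d2, delta2, m=2.0):
    """FCM-style membership update with a noise cluster: the noise
    cluster sits at a fixed squared distance delta2 from every point,
    so outliers far from all real clusters give most of their
    membership to it instead of distorting the prototypes.
    d2: array of shape (c, n), squared distances to the c real clusters."""
    # append the constant noise distance as an extra row
    all_d2 = np.vstack([d2, np.full((1, d2.shape[1]), delta2)])
    inv = all_d2 ** (-1.0 / (m - 1.0))
    return inv / inv.sum(axis=0)  # rows: c real clusters, then noise
```

For a point at squared distance 100 from the only real cluster, with $\delta^2 = 1$, nearly all membership goes to the noise row, which is the intended robustness effect.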

### 1.4.3 Cluster Number Determination Variants

• Partitioning clustering algorithms consist of searching for the optimal fuzzy partition of the data-set into c clusters, where c is given as input to the algorithm.
• In most real data mining cases, this parameter is not known in advance and must be determined.
• Yet, as mentioned earlier, at a theoretical level, PCM relies on an ill-posed optimization problem and other approaches should be considered.
• Then the combination of terms in the objective function makes it possible to find the optimal partition in the smallest possible number of clusters.
• A robust extension to CA has been proposed in Frigui and Krishnapuram (1999): the first term in Equation (1.36) is then replaced by the term provided in Equation (1.28) to exploit the robust estimator properties.

### 1.4.4 Possibilistic c-means Variants

• As indicated in Section 1.2.4, the possibilistic C-means may lead to unsatisfactory results, insofar as the obtained clusters may be coincident.
• This is due to the optimized objective function, whose global minimum is obtained when all clusters are identical (see Section 1.2.4).
• Hence the possibilistic C-means can be improved by modifying its objective function.
• The authors mention here two PCM variants, based on the adjunction of a penalization term in the objective function and the combination of PCM with FCM.

### 1.5 UPDATE EQUATION VARIANTS: ALTERNATING CLUSTER ESTIMATION

• The authors study the fuzzy clustering variants that generalize the alternating optimization scheme used by the methods presented up to now.
• Thus, if fuzzy sets with limited support as in fuzzy controllers are desired, possibilistic membership functions are inadequate as well.
• Therefore ACE allows you to choose other membership functions aside from those that stem from an objective function-based AO scheme.
• In ACE, a large variety of parameterized equations stemming from defuzzification methods are offered for the re-estimation of cluster centers for fixed memberships.
• Notice that all conventional objective function-based algorithms can be represented as instances of the more general ACE framework by selecting their membership functions as well as their prototype update equations.

### 1.6 CONCLUDING REMARKS

• The authors started from the basic algorithms, underlining the difference between the probabilistic and possibilistic paradigms.
• The authors then described variants of the basic algorithms, adapted to specific constraints or expectations.
• The authors further pointed out major research directions associated with fuzzy clustering.
• In this conclusion the authors briefly point out further research directions that they could not address in the main part of the chapter due to length constraints.

### 1.6.1 Clustering Evaluation

• An important topic related to clustering is that of cluster evaluation, i.e., the assessment of the obtained clusters quality: clustering is an unsupervised learning task, which means data points are not associated with labels or targets that indicate the desired output.
• Some criteria are specifically dedicated to fuzzy clustering: the partition entropy criterion, for instance, computes the entropy of the obtained membership degrees, $PE = -\sum_{i,j} u_{ij} \log u_{ij}$, and must be minimized (Bezdek, 1975).
• A data partition that is too fuzzy rather indicates a bad adequacy between the cluster number and the considered data-set and it should be penalized.
• Such criteria can be used to evaluate quantitatively the clustering quality and to compare algorithms one with another.
• They can also be applied to compare the results obtained with a single algorithm, when the parameter values are changed.
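The partition entropy criterion mentioned above is straightforward to compute; a minimal sketch (the helper name is ours, and some authors additionally normalize by $1/n$):

```python
import numpy as np

def partition_entropy(U):
    """Partition entropy PE = -sum_ij u_ij * log(u_ij) of a fuzzy
    partition matrix U of shape (c, n). Crisp partitions give PE = 0,
    maximally fuzzy ones the largest value, so PE is to be minimized."""
    V = U[U > 0]          # convention: 0 * log(0) = 0
    return float(-(V * np.log(V)).sum())
```

A crisp partition scores 0, while the uniform partition with all $u_{ij} = 1/c$ attains the maximum, matching the remark that an overly fuzzy partition should be penalized.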

### 1.6.2 Shape and Size Regularization

• As presented in Section 1.3.1, some fuzzy clustering algorithms make it possible to identify clusters of ellipsoidal shapes and with various sizes.
• This flexibility implies that numerous cluster parameters are to be adjusted by the algorithms.
• The more parameters are involved the more sensitive the methods get to their initialization.
• Lately, a new approach has been proposed (Borgelt and Kruse, 2005) that relies on regularization to introduce shape and size constraints to handle the higher degrees of freedom effectively.
• With a time-dependent shape regularization parameter, this method makes it possible to perform a soft transition from the fuzzy C-means (spherical clusters) to the Gustafson–Kessel algorithm (general ellipsoidal clusters).

### 1.6.3 Co-clustering

• Co-clustering, also called bi-clustering, two mode clustering, two way clustering or subspace clustering, has the specific aim of simultaneously identifying relevant subgroups in the data and relevant attributes for each subgroup: it aims at performing both clustering and local attribute selection.
• Other applications include text mining, e.g., for the identification of both document clusters and their characteristic keywords (Kummamuru, Dhawale, and Krishnapuram, 2003).
• Many dedicated clustering algorithms have been proposed, including fuzzy clustering methods as for instance Frigui and Nasraoui (2000).

### 1.6.4 Relational Clustering

• The methods described in this chapter apply to object data, i.e., consider the case where a description is provided for each data point individually.
• In other cases, this information is not available, and the algorithm input takes the form of a pairwise dissimilarity matrix.
• The latter has size $n \times n$; each of its elements indicates the dissimilarity between a pair of points.
• Relational clustering aims at identifying clusters exploiting this input.
• The interested reader is also referred to the respective chapter in Bezdek, Keller, Krishnapuram, and Pal (1999).
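The input of relational clustering can be illustrated as follows; a minimal sketch that builds the $n \times n$ dissimilarity matrix from object data using the Euclidean distance (any other dissimilarity could be substituted):

```python
import numpy as np

def dissimilarity_matrix(X):
    """Build the n x n pairwise dissimilarity matrix that relational
    clustering algorithms take as input, here from object data using
    the Euclidean distance between descriptions."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))
```

The result is symmetric with a zero diagonal; relational methods operate on such a matrix directly, without ever seeing the object descriptions.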

### 1.6.5 Semisupervised Clustering

• Yet it may be the case that the user has some a priori knowledge about pairs of points that should belong to the same cluster.
• Semisupervised clustering is concerned with this learning framework, where some partial information is available: the clustering results must then verify additional constraints, implied by these pieces of information.
• Specific clustering algorithms have been proposed to handle these cases; the interested reader is referred to chapter 7 in this book.


1 Fundamentals of Fuzzy Clustering

Rudolf Kruse, Christian Döring, and Marie-Jeanne Lesot
Department of Knowledge Processing and Language Engineering, University of Magdeburg, Germany
1.1 INTRODUCTION
Clustering is an unsupervised learning task that aims at decomposing a given set of objects into subgroups
or clusters based on similarity. The goal is to divide the data-set in such a way that objects (or example
cases) belonging to the same cluster are as similar as possible, whereas objects belonging to different
clusters are as dissimilar as possible. The motivation for ﬁnding and building classes in this way can be
manifold (Bock, 1974). Cluster analysis is primarily a tool for discovering previously hidden structure in a
set of unordered objects. In this case one assumes that a ‘true’ or natural grouping exists in the data.
However, the assignment of objects to the classes and the description of these classes are unknown. By
arranging similar objects into clusters one tries to reconstruct the unknown structure in the hope that every
cluster found represents an actual type or category of objects. Clustering methods can also be used for data
reduction purposes. Then it is merely aiming at a simpliﬁed representation of the set of objects which
allows for dealing with a manageable number of homogeneous groups instead of with a vast number of
single objects. Only some mathematical criteria can decide on the composition of clusters when classifying data-sets automatically. Therefore clustering methods are endowed with distance functions that
measure the dissimilarity of presented example cases, which is equivalent to measuring their similarity.
As a result one yields a partition of the data-set into clusters regarding the chosen dissimilarity relation.
All clustering methods that we consider in this chapter are partitioning algorithms. Given a positive
integer c, they aim at ﬁnding the best partition of the data into c groups based on the given dissimilarity
measure and they regard the space of possible partitions into c subsets only. Therein partitioning clustering
methods are different from hierarchical techniques. The latter organize data in a nested sequence of
groups, which can be visualized in the form of a dendrogram or tree. Based on a dendrogram one can
decide on the number of clusters at which the data are best represented for a given purpose. Usually the
number of (true) clusters in the given data is unknown in advance. However, using the partitioning
methods one is usually required to specify the number of clusters c as an input parameter. Estimating the
actual number of clusters is thus an important issue that we do not leave untouched in this chapter.
Advances in Fuzzy Clustering and its Applications, edited by J. Valente de Oliveira and W. Pedrycz.
© 2007 John Wiley & Sons, Ltd. ISBNs: 0-470-85275-5 (cased), 0-470-85276-3 (Pbk).

A common concept of all described clustering approaches is that they are prototype-based, i.e., the clusters are represented by cluster prototypes $C_i$, $i = 1, \ldots, c$. Prototypes are used to capture the structure (distribution) of the data in each cluster. With this representation of the clusters we formally denote the set of prototypes $C = \{C_1, \ldots, C_c\}$. Each prototype $C_i$ is an n-tuple of parameters that consists of a cluster center $c_i$ (location parameter) and maybe some additional parameters about the size and the shape of the cluster. The cluster center $c_i$ is an instantiation of the attributes used to describe the domain, just as the data points in the data-set to divide. The size and shape parameters of a prototype determine the extension of the cluster in different directions of the underlying domain. The prototypes are constructed by the clustering algorithms and serve as prototypical representations of the data points in each cluster.
The chapter is organized as follows: Section 1.2 introduces the basic approaches to hard, fuzzy, and possibilistic clustering. The objective function they minimize is presented as well as the minimization method, the alternating optimization (AO) scheme. The respective partition types are discussed and special emphasis is put on a thorough comparison between them. Further, an intuitive understanding of the general properties that distinguish their results is presented. Then a systematic overview of more sophisticated fuzzy clustering methods is presented. In Section 1.3, the variants that modify the used distance functions for detecting specific cluster shapes or geometrical contours are discussed. In Section 1.4 variants that modify the optimized objective functions for improving the results regarding specific requirements, e.g., dealing with noise, are reviewed. Lastly, in Section 1.5, the alternating cluster estimation framework is considered. It is a generalization of the AO scheme for cluster model optimization, which offers more modeling flexibility without deriving parameter update equations from optimization constraints. Section 1.6 concludes the chapter pointing at related issues and selected developments in the field.
1.2 BASIC CLUSTERING ALGORITHMS

In this section, we present the fuzzy C-means and possibilistic C-means, deriving them from the hard C-means clustering algorithm. The latter one is better known as k-means, but here we call it (hard) C-means to unify the notation and to emphasize that it served as a starting point for the fuzzy extensions. We further restrict ourselves to the simplest form of cluster prototypes at first. That is, each prototype only consists of the center vector, $C_i = (c_i)$, such that the data points assigned to a cluster are represented by a prototypical point in the data space. We consider as a distance measure $d$ an inner product norm induced distance as for instance the Euclidean distance. The description of the more complex prototypes and other dissimilarity measures is postponed to Section 1.3, since they are extensions of the basic algorithms discussed here.
All algorithms described in this section are based on objective functions $J$, which are mathematical criteria that quantify the goodness of cluster models that comprise prototypes and data partition. Objective functions serve as cost functions that have to be minimized to obtain optimal cluster solutions. Thus, for each of the following cluster models the respective objective function expresses desired properties of what should be regarded as ‘‘best’’ results of the cluster algorithm. Having defined such a criterion of optimality, the clustering task can be formulated as a function optimization problem. That is, the algorithms determine the best decomposition of a data-set into a predefined number of clusters by minimizing their objective function. The steps of the algorithms follow from the optimization scheme that they apply to approach the optimum of $J$. Thus, in our presentation of the hard, fuzzy, and possibilistic c-means we discuss their respective objective functions first. Then we shed light on their specific minimization schemes.

The idea of defining an objective function and having its minimization drive the clustering process is quite universal. Aside from the basic algorithms many extensions and modifications have been proposed that aim at improvements of the clustering results with respect to particular problems (e.g., noise, outliers). Consequently, other objective functions have been tailored for these specific applications. We address the most important of the proposed objective function variants in Section 1.4. However, regardless of the specific objective function that an algorithm is based on, the objective function is a goodness measure.
4 FUNDAMENTALS OF FUZZY CLUSTERING

Thus it can be used to compare several clustering models of a data-set that have been obtained by the same algorithm (holding the number of clusters, i.e., the value of $c$, fixed).

In their basic forms the hard, fuzzy, and possibilistic C-means algorithms look for a predefined number of $c$ clusters in a given data-set, where each of the clusters is represented by its center vector. However, hard, fuzzy, and possibilistic C-means differ in the way they assign data to clusters, i.e., what type of data partitions they form. In classical (hard) cluster analysis each datum is assigned to exactly one cluster. Consequently, the hard C-means yield exhaustive partitions of the example set into non-empty and pairwise disjoint subsets. Such hard (crisp) assignment of data to clusters can be inadequate in the presence of data points that are almost equally distant from two or more clusters. Such special data points can represent hybrid-type or mixture objects, which are (more or less) equally similar to two or more types. A crisp partition arbitrarily forces the full assignment of such data points to one of the clusters, although they should (almost) equally belong to all of them. For this purpose the fuzzy clustering approaches presented in Sections 1.2.2 and 1.2.3 relax the requirement that data points have to be assigned to one (and only one) cluster. Data points can belong to more than one cluster and even with different degrees of membership to the different clusters. These gradual cluster assignments can reflect present cluster structure in a more natural way, especially when clusters overlap. Then the memberships of data points at the overlapping boundaries can express the ambiguity of the cluster assignment.

The shift from hard to gradual assignment of data to clusters for the purpose of more expressive data partitions founded the field of fuzzy cluster analysis. We start our presentation with the hard C-means and later on we point out the relatedness to the fuzzy approaches that is evident in many respects.
1.2.1 Hard c-means

In the classical C-means model each data point $x_j$ in the given data-set $X = \{x_1, \ldots, x_n\}$, $X \subseteq \mathbb{R}^p$, is assigned to exactly one cluster. Each cluster $\Gamma_i$ is thus a subset of the given data-set, $\Gamma_i \subseteq X$. The set of clusters $\Gamma = \{\Gamma_1, \ldots, \Gamma_c\}$ is required to be an exhaustive partition of the data-set $X$ into $c$ non-empty and pairwise disjoint subsets $\Gamma_i$, $1 < c < n$. In the C-means such a data partition is said to be optimal when the sum of the squared distances between the cluster centers and the data points assigned to them is minimal (Krishnapuram and Keller, 1996). This definition follows directly from the requirement that clusters should be as homogeneous as possible. Hence the objective function of the hard C-means can be written as follows:

$$J_h(X, U_h, C) = \sum_{i=1}^{c} \sum_{j=1}^{n} u_{ij}\, d_{ij}^2, \qquad (1.1)$$
where $C = \{C_1, \ldots, C_c\}$ is the set of cluster prototypes, $d_{ij}$ is the distance between $x_j$ and cluster center $c_i$, and $U$ is a $c \times n$ binary matrix called the partition matrix. The individual elements

$$u_{ij} \in \{0, 1\} \qquad (1.2)$$

indicate the assignment of data to clusters: $u_{ij} = 1$ if the data point $x_j$ is assigned to prototype $C_i$, i.e., $x_j \in \Gamma_i$, and $u_{ij} = 0$ otherwise. To ensure that each data point is assigned exactly to one cluster, it is required that:

$$\sum_{i=1}^{c} u_{ij} = 1, \quad \forall j \in \{1, \ldots, n\}. \qquad (1.3)$$
This constraint enforces exhaustive partitions and also serves the purpose to avoid the trivial solution when minimizing $J_h$, which is that no data are assigned to any cluster: $u_{ij} = 0,\ \forall i, j$. Together with $u_{ij} \in \{0, 1\}$ it is possible that data are assigned to one or more clusters while there are some remaining clusters left empty. Since such a situation is undesirable, one usually requires that:

$$\sum_{j=1}^{n} u_{ij} > 0, \quad \forall i \in \{1, \ldots, c\}. \qquad (1.4)$$
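Constraints (1.2)-(1.4) can be checked mechanically for a candidate partition matrix; a minimal sketch (the function name is ours):

```python
import numpy as np

def is_valid_hard_partition(U):
    """Check the hard c-means partition constraints: binary entries
    (1.2), each column sums to 1 so every datum belongs to exactly one
    cluster (1.3), and each row sum is positive so no cluster is empty
    (1.4). U has shape (c, n)."""
    binary = np.isin(U, (0, 1)).all()
    exhaustive = (U.sum(axis=0) == 1).all()   # constraint (1.3)
    no_empty = (U.sum(axis=1) > 0).all()      # constraint (1.4)
    return bool(binary and exhaustive and no_empty)
```

A matrix with an all-zero row (an empty cluster) or with fractional entries fails the check, while any crisp exhaustive assignment passes.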
BASIC CLUSTERING ALGORITHMS 5

$J_h$ depends on the two (disjoint) parameter sets, which are the cluster centers $C$ and the assignment of data points to clusters $U$. The problem of finding parameters that minimize the C-means objective function is NP-hard (Drineas et al., 2004). Therefore, the hard C-means clustering algorithm, also known as ISODATA algorithm (Ball and Hall, 1966; Krishnapuram and Keller, 1996), minimizes $J_h$ using an alternating optimization (AO) scheme.

Generally speaking, AO can be applied when a criterion function cannot be optimized directly, or when it is impractical. The parameters to optimize are split into two (or even more) groups. Then one group of parameters (e.g., the partition matrix) is optimized holding the other group(s) (e.g., the current cluster centers) fixed (and vice versa). This iterative updating scheme is then repeated. The main advantage of this method is that in each of the steps the optimum can be computed directly. By iterating the two (or more) steps the joint optimum is approached, although it cannot be guaranteed that the global optimum will be reached. The algorithm may get stuck in a local minimum of the applied objective function $J$. However, alternating optimization is the commonly used parameter optimization method in clustering algorithms. Thus for each of the algorithms in this chapter we present the corresponding parameter update equations of their alternating optimization scheme.

In the case of the hard C-means the iterative optimization scheme works as follows: at first initial cluster centers are chosen. This can be done randomly, i.e., by picking $c$ random vectors that lie within the smallest (hyper-)box that encloses all data; or by initializing cluster centers with randomly chosen data points of the given data-set. Alternatively, more sophisticated initialization methods can be used as well, e.g., Latin hypercube sampling (McKay, Beckman and Conover, 1979). Then the parameters $C$ are held fixed and cluster assignments $U$ are determined that minimize the quantity of $J_h$. In this step each data point is assigned to its closest cluster center:

$$u_{ij} = \begin{cases} 1, & \text{if } i = \arg\min_{l=1}^{c} d_{lj}, \\ 0, & \text{otherwise}. \end{cases} \qquad (1.5)$$
Any other assignment of a data point than to its closest cluster would not minimize $J_h$ for fixed clusters. Then the data partition $U$ is held fixed and new cluster centers are computed as the mean of all data vectors assigned to them, since the mean minimizes the sum of the square distances in $J_h$. The calculation of the mean for each cluster (for which the algorithm got its name) is stated more formally:

$$c_i = \frac{\sum_{j=1}^{n} u_{ij}\, x_j}{\sum_{j=1}^{n} u_{ij}}. \qquad (1.6)$$

The two steps (1.5) and (1.6) are iterated until no change in $C$ or $U$ can be observed. Then the hard C-means terminates, yielding final cluster centers and data partition that are possibly locally optimal only.

Concluding the presentation of the hard C-means we want to mention its expressed tendency to become stuck in local minima, which makes it necessary to conduct several runs of the algorithm with different initializations (Duda and Hart, 1973). Then the best result out of many clusterings can be chosen based on the values of $J_h$.
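The alternating optimization scheme of steps (1.5) and (1.6) can be sketched in a few lines; this is a minimal illustration, not the authors' implementation, using initialization with randomly chosen data points as described above:

```python
import numpy as np

def hard_c_means(X, c, max_iter=100, seed=0):
    """Hard c-means via alternating optimization: assign each point to
    its nearest center (step 1.5), then recompute each center as the
    mean of its assigned points (step 1.6), until the partition stops
    changing. Returns (centers, labels)."""
    rng = np.random.default_rng(seed)
    # initialize centers with randomly chosen, distinct data points
    centers = X[rng.choice(len(X), size=c, replace=False)]
    labels = None
    for _ in range(max_iter):
        # squared distances of every point to every center, shape (n, c)
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        new_labels = d2.argmin(axis=1)                 # step (1.5)
        if labels is not None and (new_labels == labels).all():
            break                                      # U unchanged: stop
        labels = new_labels
        for i in range(c):                             # step (1.6)
            if (labels == i).any():
                centers[i] = X[labels == i].mean(axis=0)
    return centers, labels
```

On two well-separated groups the sketch recovers one center per group; as noted above, on harder data several runs with different seeds are advisable because only a local minimum of $J_h$ is guaranteed.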
We now turn to the fuzzy approaches, which relax the requirement $u_{ij} \in \{0, 1\}$ that is placed on the cluster assignments in classical clustering approaches. The extensions are based on the concepts of fuzzy sets such that we arrive at gradual memberships. We will discuss two major types of gradual cluster assignments and fuzzy data partitions altogether with their differentiated interpretations and standard algorithms, which are the (probabilistic) fuzzy C-means (FCM) in the next section and the possibilistic fuzzy C-means (PCM) in Section 1.2.3.
1.2.2 Fuzzy c-means

Fuzzy cluster analysis allows gradual memberships of data points to clusters measured as degrees in [0,1]. This gives the flexibility to express that data points can belong to more than one cluster. Furthermore, these membership degrees offer a much finer degree of detail of the data model. Aside from assigning a data point to clusters in shares, membership degrees can also express how ambiguously or definitely a data point should belong to a cluster. The concept of these membership degrees is substantiated by the definition and interpretation of fuzzy sets (Zadeh, 1965). Thus, fuzzy clustering allows fine grained solution spaces in the form of fuzzy partitions of the set of given examples $X = \{x_1, \ldots, x_n\}$. Whereas the clusters $\Gamma_i$ of data partitions have been classical subsets so far, they are represented by the fuzzy sets $\Gamma_i$ of the data-set $X$ in the following. Complying with fuzzy set theory, the cluster assignment $u_{ij}$ is now the membership degree of a datum $x_j$ to cluster $\Gamma_i$, such that $u_{ij} = \Gamma_i(x_j) \in [0, 1]$. Since memberships to clusters are fuzzy, there is not a single label that is indicating to which cluster a data point belongs. Instead, fuzzy clustering methods associate a fuzzy label vector to each data point $x_j$ that states its memberships to the $c$ clusters:

$$u_j = (u_{1j}, \ldots, u_{cj})^T. \qquad (1.7)$$

The $c \times n$ matrix $U = (u_{ij}) = (u_1, \ldots, u_n)$ is then called a fuzzy partition matrix. Based on the fuzzy set notion we are now better suited to handle ambiguity of cluster assignments when clusters are badly delineated or overlapping.

So far, the general definition of fuzzy partition matrices leaves open how assignments of data to more than one cluster should be expressed in form of membership values. Furthermore, it is still unclear what degrees of belonging to clusters are allowed, i.e., the solution space (set of allowed fuzzy partitions) for fuzzy clustering algorithms is not yet specified. In the field of fuzzy clustering two types of fuzzy cluster partitions have evolved. They differ in the constraints they place on the membership degrees and how the membership values should be interpreted. In our discussion we begin with the most widely used type, the probabilistic partitions, since they have been proposed first. Notice that in the literature they are sometimes just called fuzzy partitions (dropping the word ‘probabilistic’). We use the subscript $f$ for the probabilistic approaches and, in the next section, $p$ for the possibilistic models. The latter constitute the second type of fuzzy partitions.
Let $X = \{x_1, \ldots, x_n\}$ be the set of given examples and let $c$ be the number of clusters $(1 < c < n)$ represented by the fuzzy sets $\Gamma_i$, $(i = 1, \ldots, c)$. Then we call $U_f = (u_{ij}) = (\Gamma_i(x_j))$ a probabilistic cluster partition of $X$ if

$$\sum_{j=1}^{n} u_{ij} > 0, \quad \forall i \in \{1, \ldots, c\}, \quad \text{and} \qquad (1.8)$$

$$\sum_{i=1}^{c} u_{ij} = 1, \quad \forall j \in \{1, \ldots, n\} \qquad (1.9)$$

hold. The $u_{ij} \in [0, 1]$ are interpreted as the membership degree of datum $x_j$ to cluster $\Gamma_i$ relative to all other clusters.

Constraint (1.8) guarantees that no cluster is empty. This corresponds to the requirement in classical cluster analysis that no cluster, represented as (classical) subset of $X$, is empty (see Equation (1.4)). Condition (1.9) ensures that the sum of the membership degrees for each datum equals 1. This means that each datum receives the same weight in comparison to all other data and, therefore, that all data are (equally) included into the cluster partition. This is related to the requirement in classical clustering that partitions are formed exhaustively (see Equation (1.3)). As a consequence of both constraints no cluster can contain the full membership of all data points. Furthermore, condition (1.9) corresponds to a normalization of the memberships per datum. Thus the membership degrees for a given datum formally resemble the probabilities of its being a member of the corresponding cluster.
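A single alternating-optimization step of the probabilistic FCM can be sketched as follows. The chapter's update equations are not reproduced in this excerpt; the sketch assumes the standard textbook form, in which the membership update enforces constraint (1.9) and the centers are recomputed as $u^m$-weighted means (names and the small eps guard are ours):

```python
import numpy as np

def fcm_step(X, centers, m=2.0, eps=1e-12):
    """One alternating-optimization step of probabilistic fuzzy
    c-means: membership update u_ij = 1 / sum_l (d_ij/d_lj)^(2/(m-1)),
    which makes every column of U sum to 1 (constraint 1.9), followed
    by the u^m-weighted mean update of the centers."""
    # squared distances, shape (c, n); eps avoids division by zero
    d2 = ((centers[:, None, :] - X[None, :, :]) ** 2).sum(axis=2) + eps
    inv = d2 ** (-1.0 / (m - 1.0))
    U = inv / inv.sum(axis=0)                 # normalized memberships
    W = U ** m                                # fuzzified weights
    new_centers = (W @ X) / W.sum(axis=1, keepdims=True)
    return U, new_centers
```

A point lying exactly between two centers receives membership 0.5 to each, illustrating how the normalization expresses ambiguity at overlapping boundaries.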
Example: Figure 1.1 shows a (probabilistic) fuzzy classification of a two-dimensional symmetric data-set with two clusters. The grey scale indicates the strength of belonging to the clusters. The darker shading in the image indicates a high degree of membership for data points close to the cluster centers, while membership decreases for data points that lie further away from the clusters. The membership values of the data points are shown in Table 1.1. They form a probabilistic cluster partition according to the definition above. The following advantages over a conventional clustering representation can be noted: points in the center of a cluster can have a degree equal to 1, while points close to boundaries can be
