
Showing papers on "Fuzzy clustering" published in 1992


Journal ArticleDOI
01 Jun 1992
TL;DR: A document browsing technique that employs document clustering as its primary operation is presented, along with a fast (linear-time) clustering algorithm that supports this powerful new access paradigm.
Abstract: Document clustering has not been well received as an information retrieval tool. Objections to its use fall into two main categories: first, that clustering is too slow for large corpora (with running time often quadratic in the number of documents); and second, that clustering does not appreciably improve retrieval. We argue that these problems arise only when clustering is used in an attempt to improve conventional search techniques. However, looking at clustering as an information access tool in its own right obviates these objections, and provides a powerful new access paradigm. We present a document browsing technique that employs document clustering as its primary operation. We also present fast (linear time) clustering algorithms which support this interactive browsing paradigm.

1,596 citations


Journal ArticleDOI
TL;DR: For a more complex segmentation problem with tumor/edema or cerebrospinal fluid boundary, inconsistency in rating among experts was observed, with fuzzy c-means approaches being slightly preferred over feedforward cascade correlation results.
Abstract: Magnetic resonance (MR) brain section images are segmented and then synthetically colored to give visual representations of the original data with three approaches: the literal and approximate fuzzy c-means unsupervised clustering algorithms, and a supervised computational neural network. Initial clinical results are presented on normal volunteers and selected patients with brain tumors surrounded by edema. Supervised and unsupervised segmentation techniques provide broadly similar results. Unsupervised fuzzy algorithms were visually observed to show better segmentation when compared with raw image data for volunteer studies. For a more complex segmentation problem with tumor/edema or cerebrospinal fluid boundary, where the tissues have similar MR relaxation behavior, inconsistency in rating among experts was observed, with fuzzy c-means approaches being slightly preferred over feedforward cascade correlation results. Various facets of both approaches, such as supervised versus unsupervised learning, time complexity, and utility for the diagnostic process, are compared.

636 citations


Proceedings ArticleDOI
08 Mar 1992
TL;DR: The author shows that an additive fuzzy system can approximate any continuous function on a compact domain to any degree of accuracy.
Abstract: The author shows that an additive fuzzy system can approximate any continuous function on a compact domain to any degree of accuracy. Fuzzy systems are dense in the space of continuous functions. The fuzzy system approximates the function by covering its graph with fuzzy patches in the input-output state space. Each fuzzy rule defines a fuzzy patch and connects commonsense knowledge with state-space geometry. Neural or statistical clustering algorithms can approximate the unknown fuzzy patches and generate fuzzy systems from training data.

411 citations


Journal ArticleDOI
01 Mar 1992
TL;DR: A hierarchical, agglomerative, symbolic clustering methodology based on a similarity measure that takes into consideration the position, span, and content of symbolic objects is proposed and is capable of discerning clusters in data sets made up of numeric as well as symbolic objects consisting of different types and combinations of qualitative and quantitative feature values.
Abstract: A hierarchical, agglomerative, symbolic clustering methodology based on a similarity measure that takes into consideration the position, span, and content of symbolic objects is proposed. The similarity measure used is of a new type in the sense that it is not just another aspect of dissimilarity. The clustering methodology forms composite symbolic objects using a Cartesian join operator when two symbolic objects are merged. The maximum and minimum similarity values at various merging levels permit the determination of the number of clusters in the data set. The composite symbolic objects representing different clusters give a description of the resulting classes and lead to knowledge acquisition. The algorithm is capable of discerning clusters in data sets made up of numeric as well as symbolic objects consisting of different types and combinations of qualitative and quantitative feature values. In particular, the algorithm is applied to fat-oil and microcomputer data.

217 citations


Journal ArticleDOI
TL;DR: The adaptive fuzzy leader clustering (AFLC) architecture is a hybrid neural-fuzzy system that learns online in a stable and efficient manner and successfully classifies features extracted from real data, discrete or continuous, indicating the potential strength of this new clustering algorithm in analyzing complex data sets.
Abstract: A modular, unsupervised neural network architecture that can be used for clustering and classification of complex data sets is presented. The adaptive fuzzy leader clustering (AFLC) architecture is a hybrid neural-fuzzy system that learns online in a stable and efficient manner. The system uses a control structure similar to that found in the adaptive resonance theory (ART-1) network to identify the cluster centers initially. The initial classification of an input takes place in a two-stage process: a simple competitive stage and a distance metric comparison stage. The cluster prototypes are then incrementally updated by relocating the centroid position from fuzzy C-means (FCM) system equations for the centroids and the membership values. The operational characteristics of AFLC and the critical parameters involved in its operation are discussed. The AFLC algorithm is applied to the Anderson iris data and laser-luminescent finger image data. The AFLC algorithm successfully classifies features extracted from real data, discrete or continuous, indicating the potential strength of this new clustering algorithm in analyzing complex data sets.

110 citations
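
For reference, the FCM system equations that the AFLC abstract above mentions are usually written as follows. This is the textbook form of fuzzy c-means, not a formula taken from the AFLC paper itself; the symbols (data points x_k, centroids v_i, memberships u_ik, number of clusters c, weighting exponent m > 1) are assumptions introduced here for illustration.

```latex
% Standard fuzzy c-means objective and alternating update equations
% (textbook form; notation assumed, not quoted from the listed paper).
\begin{align}
  J_m(U, V) &= \sum_{i=1}^{c} \sum_{k=1}^{n} u_{ik}^{m} \, \lVert x_k - v_i \rVert^{2},
  \qquad \sum_{i=1}^{c} u_{ik} = 1 \ \text{for every } k, \\
  v_i &= \frac{\sum_{k=1}^{n} u_{ik}^{m} \, x_k}{\sum_{k=1}^{n} u_{ik}^{m}}, \\
  u_{ik} &= \left[ \sum_{j=1}^{c}
      \left( \frac{\lVert x_k - v_i \rVert}{\lVert x_k - v_j \rVert} \right)^{2/(m-1)}
    \right]^{-1}.
\end{align}
```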


Proceedings ArticleDOI
01 Jun 1992
TL;DR: This work investigates the performance of some of the best-known object clustering algorithms on four different workloads based upon the Tektronix benchmark and demonstrates that even when the workload and object graph are fixed, the choice of the clustering algorithm depends upon the goals of the system.
Abstract: We investigate the performance of some of the best-known object clustering algorithms on four different workloads based upon the Tektronix benchmark. For all four workloads, stochastic clustering gave the best performance for a variety of performance metrics. Since stochastic clustering is computationally expensive, it is interesting that for every workload there was at least one cheaper clustering algorithm that matched or almost matched stochastic clustering. Unfortunately, for each workload, the algorithm that approximated stochastic clustering was different. Our experiments also demonstrated that even when the workload and object graph are fixed, the choice of the clustering algorithm depends upon the goals of the system. For example, if the goal is to perform well on traversals of small portions of the database starting with a cold cache, the important metric is the per-traversal expansion factor, and a well-chosen placement tree will be nearly optimal; if the goal is to achieve a high steady-state performance with a reasonably large cache, the appropriate metric is the number of pages to which the clustering algorithm maps the active portion of the database. For this metric, the PRP clustering algorithm, which only uses access probabilities, achieves nearly optimal performance.

109 citations


Journal ArticleDOI
TL;DR: In this article, a procedure called minimum information ratio estimation and validation (MIREV) is introduced, which is based on a ratio of Fisher information matrices, and the smallest eigenvalue of the information ratio matrix is used to determine the number of components.
Abstract: Determining the number of components in a mixture of distributions is an important but difficult problem. This article introduces a procedure called minimum information ratio estimation and validation (MIREV), which is based on a ratio of Fisher information matrices. The smallest eigenvalue of the information ratio matrix is used to determine the number of components. A measure of uncertainty may be obtained using a bootstrap technique. Simulations illustrate the effectiveness of the procedure. For mixtures of exponential families, an expression for the observed information ratio matrix provides insight into the success of the procedure. Cluster analysis attempts to identify and characterize subpopulations believed to be present in a population. A wide variety of methods are available, including criterion optimization, hierarchical methods, and various heuristic methods. Criterion optimization techniques, such as mixture analysis, fuzzy clustering, and partitioning methods are popular because they...

107 citations


Journal ArticleDOI
TL;DR: The AFCS algorithms consider hyper-ellipsoidal-shells as prototypes, hence the ability to characterize elliptical boundaries, and the generalization is achieved by allowing the distances to be measured through a norm inducing matrix that is symmetric, positive definite.

103 citations


Journal ArticleDOI
TL;DR: Two types of direct algorithms for solving the multicriteria clustering problem are proposed: the modified relocation algorithm, and the modified agglomerative algorithm.
Abstract: In a multicriteria clustering problem, optimization over more than one criterion is required. The problem can be treated in different ways: by reduction to a clustering problem with the single criterion obtained as a combination of the given criteria; by constrained clustering algorithms where a selected criterion is considered as the clustering criterion and all others determine the constraints; or by direct algorithms. In this paper two types of direct algorithms for solving the multicriteria clustering problem are proposed: the modified relocation algorithm, and the modified agglomerative algorithm. Different elaborations of these two types of algorithms are discussed and compared. Finally, two applications of the proposed algorithms are presented.

86 citations


Journal ArticleDOI
TL;DR: A lower triangular matrix approach is proposed to speed up dynamic clustering, and the image autocorrelation property is identified as a way to further speed up the proposed methods when clustering image data.

71 citations


Proceedings ArticleDOI
08 Mar 1992
TL;DR: The authors propose a technique for determining the weighting exponent, m, a parameter in a fuzzy c-means algorithm, using the concept of fuzzy decision theory, and define a fuzzy goal as maximizing the number of data points in a cluster and a fuzzy constraint as the minimizing of the sum of square errors within a cluster.
Abstract: The authors propose a technique for determining the weighting exponent, m, a parameter in a fuzzy c-means algorithm, using the concept of fuzzy decision theory. They define a fuzzy goal as maximizing the number of data points in a cluster and a fuzzy constraint as the minimizing of the sum of square errors within a cluster. A decision about m is made by taking the intersection of the fuzzy goal and constraint such that given m, the fuzzy c-means algorithm produces good clusters.
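
To make the role of the weighting exponent m concrete, here is a minimal sketch of the standard fuzzy c-means membership formula evaluated for several values of m. It does not implement the fuzzy-decision procedure proposed in the paper; the function name `fcm_memberships`, the 1-D sample points, and the fixed cluster centers are all assumptions chosen for illustration.

```python
def fcm_memberships(points, centers, m):
    """Memberships u[i][k] of 1-D point k in cluster i (standard FCM formula).

    u_ik = 1 / sum_j (d_ik / d_jk) ** (2 / (m - 1)), with d the point-center distance.
    """
    if m <= 1.0:
        raise ValueError("the weighting exponent m must be greater than 1")
    power = 2.0 / (m - 1.0)
    u = [[0.0] * len(points) for _ in centers]
    for k, x in enumerate(points):
        dists = [abs(x - c) for c in centers]
        zeros = [i for i, d in enumerate(dists) if d == 0.0]
        if zeros:
            # The point sits exactly on one or more centers: share membership among them.
            for i in zeros:
                u[i][k] = 1.0 / len(zeros)
            continue
        for i, d_i in enumerate(dists):
            u[i][k] = 1.0 / sum((d_i / d_j) ** power for d_j in dists)
    return u


if __name__ == "__main__":
    points = [0.0, 1.0, 4.0, 5.0]   # hypothetical data
    centers = [0.5, 4.5]            # hypothetical, fixed cluster centers
    for m in (1.5, 2.0, 3.0):
        u = fcm_memberships(points, centers, m)
        print(m, [round(v, 3) for v in u[0]])
```

As m grows, the memberships of each point move toward 1/c, so the partition becomes fuzzier; as m approaches 1, the memberships approach a hard assignment. Any rule for choosing m, including the one in the paper, is trading off between these two extremes.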

Journal ArticleDOI
01 Oct 1992-Proteins
TL;DR: Fuzzy clustering is proposed as a method to analyze molecular dynamics trajectories, especially of proteins and polypeptides, and the results were unambiguous, in terms of the optimal number of clusters of conformations, for the majority of the trajectories examined.
Abstract: We propose fuzzy clustering as a method to analyze molecular dynamics (MD) trajectories, especially of proteins and polypeptides. A fuzzy cluster analysis locates classes of similar three-dimensional conformations explored during a molecular dynamics simulation. The method can be readily applied to results from both equilibrium and nonequilibrium simulations, with clustering on either global or local structural parameters. The potential of this technique is illustrated by results from fuzzy cluster analyses of trajectories from MD simulations of various fragments of human parathyroid hormone (PTH). For large molecules, it is more efficient to analyze the clustering of root-mean-square distances between conformations comprising the trajectory. We found that the results of the clustering analysis were unambiguous, in terms of the optimal number of clusters of conformations, for the majority of the trajectories examined. The conformation closest to the cluster center can be chosen as being representative of the class of structures making up the cluster, and can be further analyzed, for example, in terms of its secondary structure. The CPU time used by the cluster analysis was negligible compared to the MD simulation time. © 1992 Wiley-Liss, Inc.
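
The step of picking a representative conformation can be sketched as follows. This is an illustration only: it assumes a precomputed pairwise RMSD matrix and a hard list of cluster members, and it uses the medoid (the member with the smallest total distance to the other members) as a stand-in for "the conformation closest to the cluster center". The function name `representative` and the toy matrix are hypothetical.

```python
def representative(dist, members):
    """Return the index of the member with the smallest summed distance to all members.

    dist    -- symmetric matrix (list of lists) of pairwise distances, e.g. RMSD values
    members -- indices of the conformations assigned to one cluster
    """
    if not members:
        raise ValueError("cluster has no members")
    return min(members, key=lambda i: sum(dist[i][j] for j in members))


if __name__ == "__main__":
    # Hypothetical RMSD values between four trajectory frames.
    rmsd = [
        [0.0, 1.0, 2.0, 6.0],
        [1.0, 0.0, 1.2, 5.5],
        [2.0, 1.2, 0.0, 5.0],
        [6.0, 5.5, 5.0, 0.0],
    ]
    print(representative(rmsd, [0, 1, 2]))
```

The chosen frame can then be passed on to further analysis, such as the secondary-structure assignment the abstract mentions.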

Proceedings ArticleDOI
R. Krovi1
07 Jan 1992
TL;DR: The primary goal of this research effort is to investigate the potential feasibility of using genetic algorithms for the purpose of clustering.
Abstract: Cluster analysis is a technique which is used to discover patterns and associations within data. More specifically, it is a multivariate statistical procedure that starts with a data set containing information on some variables and attempts to reorganize these data cases into relatively homogeneous groups. One of the major problems encountered by researchers with regard to cluster analysis is that different clustering methods can and do generate different solutions for the same data set. What is needed is a technique that discovers the most 'natural' groups in a data set. Genetic algorithms belong to a class of 'artificially intelligent' techniques that are founded on principles of natural selection and natural genetics. The primary goal of this research effort is to investigate the potential feasibility of using genetic algorithms for the purpose of clustering.

Journal ArticleDOI
TL;DR: A general methodology for fuzzy clustering analysis is developed and illustrated with a case study of water quality evaluation for Dianshan Lake, Shanghai, China, and the results can be interpreted to provide valuable information to support decision making and to aid water quality management.
Abstract: A general methodology for fuzzy clustering analysis is developed and illustrated with a case study of water quality evaluation for Dianshan Lake, Shanghai, China. Fuzzy clustering analysis may be used whenever a composite classification of water quality incorporates multiple parameters. In such cases, the technique may be used as a complement or an alternative to comprehensive assessment. In fuzzy clustering analysis, the classification is determined by a fuzzy relation. After a fuzzy similarity matrix has been established and the fuzzy relation stabilized, a dynamic clustering chart can be developed. Given a suitable threshold, the appropriate classification is worked out. The methodology is relatively simple, and the results can be interpreted to provide valuable information to support decision making and to aid water quality management.
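
The pipeline described above (similarity matrix, stabilized relation, threshold, classification) can be sketched in a few lines. This sketch assumes that "stabilizing" the fuzzy relation means taking its max-min transitive closure by repeated squaring, which is the usual construction in this family of methods but is not spelled out in the abstract. The function names and the 4x4 similarity matrix are hypothetical.

```python
def max_min_compose(r, s):
    """Max-min composition of two square fuzzy relations given as lists of lists."""
    n = len(r)
    return [[max(min(r[i][k], s[k][j]) for k in range(n)) for j in range(n)]
            for i in range(n)]


def transitive_closure(r):
    """Square a reflexive, symmetric relation until it stops changing."""
    while True:
        squared = max_min_compose(r, r)
        if squared == r:
            return r
        r = squared


def lambda_cut_classes(r, lam):
    """Group the indices whose closure value is at least the threshold lam."""
    classes, assigned = [], set()
    for i in range(len(r)):
        if i in assigned:
            continue
        members = [j for j in range(len(r)) if r[i][j] >= lam]
        classes.append(members)
        assigned.update(members)
    return classes


if __name__ == "__main__":
    # Hypothetical fuzzy similarity matrix for four sampling sites.
    sim = [
        [1.0, 0.8, 0.2, 0.1],
        [0.8, 1.0, 0.3, 0.2],
        [0.2, 0.3, 1.0, 0.7],
        [0.1, 0.2, 0.7, 1.0],
    ]
    closure = transitive_closure(sim)
    for lam in (0.3, 0.5, 0.75):
        print(lam, lambda_cut_classes(closure, lam))
```

Sweeping the threshold from high to low merges classes step by step, which is what a dynamic clustering chart displays; choosing one threshold fixes the final classification.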

Journal ArticleDOI
TL;DR: The modified single linkage and modified rank order clustering algorithms are compared, their strengths and weaknesses are discussed, and the flexibility provided by the fuzzy set-based method is presented.

Journal ArticleDOI
TL;DR: In this article, a fuzzy K-means clustering approach that uses a modified hat matrix H* as a similarity or information matrix is employed, which permits observations to be allocated to clusters in a fuzzy way by defining a membership function from 0 to 1.
Abstract: Comparing analytical approaches is crucial when important policy decisions of corporations or government agencies may be influenced by results that depend on the methodologies certain disciplines customarily use. Technical efficiency can be measured by a full-frontier production function model or by linear programming specifications. By using these modeling approaches observations pertaining to three linerboard manufacturing facilities are classified as efficient, inefficient, scale inefficient, and other. However, observations may or may not be consistently classified into these four groups when employing the two modeling approaches. In order to validate the efficiency designations of the two modeling approaches and to determine the uniqueness of observations, a fuzzy K-means clustering approach that uses a modified hat matrix H * as a similarity or information matrix is employed. This approach permits observations to be allocated to clusters in a fuzzy way by defining a membership function from 0 to 1. As the degree of fuzziness increases, a sensitivity analysis with respect to individual observations belonging to some cluster can be evaluated. At the same time, this fuzzy approach assists the analyst to assess the inconsistencies that can arise when using the mathematical programming and full-frontier modeling approaches of technical efficiency.

Proceedings ArticleDOI
08 Mar 1992
TL;DR: A survey of boundary detection techniques based on fuzzy clustering, including algorithms to detect linear, planar, and curved boundaries, is presented.
Abstract: Boundary detection in digital images is viewed as a clustering problem. A survey of boundary detection techniques based on fuzzy clustering is presented. Algorithms to detect linear, planar, and curved boundaries are surveyed. Limitations of these techniques are discussed, along with a review of approaches proposed to overcome these limitations.

Journal ArticleDOI
TL;DR: The paper presents a methodology for quantifying the data that refer to the fuzzy features and identifies two types of fuzzy features: qualitative features, and quantitative ones with subjective meaning.
Abstract: The high potential of using group technology in manufacturing has attracted the interest of both practitioners and researchers. Group technology is based on clustering parts which have similar features. Very often it is hard to quantify data regarding these features successfully, because in many real applications the features are fuzzy. This paper identifies two types of fuzzy features: qualitative features, and quantitative ones with subjective meaning. The paper presents a methodology for quantifying the data that refer to the fuzzy features. The proposed methodology deals with crisp and fuzzy data in a unified manner. Finally, some clustering approaches which process the quantified features are also discussed.

Journal ArticleDOI
TL;DR: The algorithm was adapted from fuzzy c-means, with modifications made to take into account the extra information that some data samples already form clusters; it was applied to Chinese character recognition and an encouraging result was obtained.

Proceedings ArticleDOI
08 Mar 1992
TL;DR: A systematic design procedure for fuzzy linguistic controllers with adaptive or learning capability is introduced, based on stability and hierarchy of identification and control and guarantees the stability, convergence, and robustness of the closed-loop feedback system.
Abstract: A systematic design procedure for fuzzy linguistic controllers with adaptive or learning capability is introduced. The design is based on stability and hierarchy of identification and control. The fuzzy rule-base is stored in a fuzzy hypercube and the fuzzy control action is computed via a fuzzy inference mechanism. Initial conditions for the elements of a fuzzy hypercube are obtained by an offline fuzzy clustering mechanism with large-grain uncertainty. Two fuzzy algorithms are developed: the first one is a fuzzy identification-learning algorithm and the second is a fuzzy control-inferencing algorithm. The fuzzy identification-learning algorithm updates the membership functions on the action side of the rules and the fuzzy control-inferencing algorithm calculates fuzzy control data. This approach guarantees the stability, convergence, and robustness of the closed-loop feedback system.

Journal ArticleDOI
TL;DR: An algorithm for efficient classification of data in R μ when there exists no a priori information about the number of clusters is developed, and a theorem on the convergence of this algorithm is proved.

Proceedings ArticleDOI
07 Jun 1992
TL;DR: The discovery process for the number of clusters is studied and compared to k-means clustering, and the frequency of a clustering outcome is used as a measure of the validity of the clustering.
Abstract: Competing self-organizing maps are used to cluster data. Because maps are more complicated than single stereotypes, this clustering is different from k-means clustering in that the proper number of clusters will be discovered. This discovery process for the number of clusters is studied and compared to k-means clustering. Also, because self-organizing maps are probabilistic algorithms, the frequency of a clustering outcome is used as a measure of the validity of the clustering.

Proceedings ArticleDOI
01 Feb 1992
TL;DR: It is shown that the conventional measures for the fuzzy partitions do not perform well for the FCS clustering and a new set of indices is introduced to evaluate the structure characterized by the FCS algorithms.
Abstract: New performance measures for evaluating fuzzy partitions obtained through c-shells clustering are introduced. It is shown that the conventional measures for the fuzzy partitions do not perform well for the FCS clustering. A new set of indices is introduced to evaluate the structure characterized by the FCS algorithms. Examples are presented to demonstrate the superiority of the criteria proposed over the existing ones. © (1992) COPYRIGHT SPIE--The International Society for Optical Engineering. Downloading of the abstract is permitted for personal use only.

Proceedings ArticleDOI
01 Feb 1992
TL;DR: New hard and fuzzy clustering algorithms called the c-quadric shells (CQS) algorithms are introduced that can be used to cluster mixtures of all types of hyperquadrics such as hyperspheres, hyperellipsoids, hyperparaboloids, hyperhyperboloids, and even hyperplanes.
Abstract: In this paper, we introduce new hard and fuzzy clustering algorithms called the c-quadric shells (CQS) algorithms. These algorithms are specifically designed to seek clusters that can be described by segments of second-degree curves, or more generally by segments of shells of hyperquadrics. Previous shell clustering algorithms have considered clusters of specific shapes such as circles (the fuzzy c-shells algorithm) or ellipses (the fuzzy c-ellipsoids algorithm). The advantage of our algorithm lies in the fact that it can be used to cluster mixtures of all types of hyperquadrics such as hyperspheres, hyperellipsoids, hyperparaboloids, hyperhyperboloids, and even hyperplanes. Several examples of clustering in the two-dimensional case are shown. © (1992) COPYRIGHT SPIE--The International Society for Optical Engineering. Downloading of the abstract is permitted for personal use only.

Journal ArticleDOI
TL;DR: Interesting properties of the points generated in the course of applying the fuzzy c-means algorithm are revealed using the concept of a reduced objective function, and it is shown empirically that these quantities converge linearly.

Patent
21 Aug 1992
TL;DR: A multi-sensor system consisting of a heterogeneous sensor array is described, in which signals from the individual sensors, which are totally or partially constructively correlated, represent the overall condition of the media and are evaluated using pattern recognition methods such as fuzzy clustering or neural networks.
Abstract: Identification of the condition of gaseous and liquid media is carried out by multi-sensor systems consisting of a heterogeneous sensor array, in which signals from the individual sensors, which are totally or partially constructively correlated, represent the overall condition of the media using pattern recognition methods such as fuzzy clustering or neural networks. The novelty is that (a) the individual sensors undertake a condition characterisation of the media even when the individual sensor signals do not allow indication of condition change; (b) the signals provide adequate representation of fuzzy conditions of the media using pattern recognition in an adaptive learning phase in which the number and type of individual sensors are ascertained in relation to the problem; (c) automatic trend recognition and correction are effected by application of elements of fuzzy logic to evaluation criteria consisting of maximum rate of measurement change, the regularity of the measurement sequence, the parallel relation of the constructively correlated individual sensors, the angle (theta) between the instantaneous vector and the connecting line between the different conditions and the magnitude of the signals; and (d) pattern recognition methods and data interpretation methods are implemented in program control of the multi-sensor system. ADVANTAGE - Process condition differentiation is obtained from a group of time-dependent target values by heuristic methods even when the stochastically fluctuating measurements are additionally falsified by various trends and the process conditions merge in a poorly defined manner.

Proceedings ArticleDOI
07 Jun 1992
TL;DR: A cost function for unsupervised and supervised data clustering which comprises distortion costs, complexity costs and supervision costs is proposed and a maximum entropy estimation of the clustering cost function yields an optimal number of clusters, their positions and their cluster probabilities.
Abstract: The authors discuss objective functions for unsupervised and supervised data clustering and the respective competitive neural networks which implement these clustering algorithms. They propose a cost function for unsupervised and supervised data clustering which comprises distortion costs, complexity costs and supervision costs. A maximum entropy estimation of the clustering cost function yields an optimal number of clusters, their positions and their cluster probabilities. A three-layer neural network with a winner-take-all connectivity in the clustering layer implements the proposed algorithm.

Proceedings ArticleDOI
07 Jun 1992
TL;DR: The author presents the results of using fuzzy neural network modeling and learning techniques to search for fuzzy clusters of unlabeled patterns to embed fuzzy clustering into neural networks so that online learning and parallel implementation are feasible.
Abstract: The author presents the results of using fuzzy neural network modeling and learning techniques to search for fuzzy clusters of unlabeled patterns. The goal is to embed fuzzy clustering into neural networks so that online learning and parallel implementation are feasible. Fuzzy competitive learning networks are investigated based on the conventional competitive learning networks, and some implications of these results for interpreting fuzziness by the network are discussed. The derivation of such modeling and learning techniques illustrates how the idea of incorporating fuzziness into conventional neural networks might be realized. The necessity of dealing with the fuzzy features in pattern classification requires modifications of neural networks and associated learning methods.

Posted Content
TL;DR: A fuzzy clustering technique is applied to identify possible alliances among groups with different interests in environmental conflict management, and the methodology is illustrated by means of an empirical land use problem in the Netherlands.
Abstract: The paper provides a new methodology for conflict analysis in choice situations with multiple groups, where the information on the various choice possibilities is fuzzy in nature. The focus of the paper is on environmental conflict management. Starting from a multicriteria perspective, a fuzzy clustering technique is applied to identify possible alliances among groups with different interests. After a brief survey of coalition formation theory, the methodology is illustrated by means of an empirical land use problem in the Netherlands.