Author

Raghu Krishnapuram

Bio: Raghu Krishnapuram is an academic researcher from the Indian Institute of Science. The author has contributed to research on topics including cluster analysis and fuzzy sets. The author has an h-index of 42, has co-authored 139 publications, and has received 10,064 citations. Previous affiliations of Raghu Krishnapuram include the University of Missouri and the Indian Institutes of Technology.


Papers
Journal ArticleDOI
TL;DR: An appropriate objective function whose minimum will characterize a good possibilistic partition of the data is constructed, and the membership and prototype update equations are derived from necessary conditions for minimization of the criterion function.
Abstract: The clustering problem is cast in the framework of possibility theory. The approach differs from the existing clustering methods in that the resulting partition of the data can be interpreted as a possibilistic partition, and the membership values can be interpreted as degrees of possibility of the points belonging to the classes, i.e., the compatibilities of the points with the class prototypes. An appropriate objective function whose minimum will characterize a good possibilistic partition of the data is constructed, and the membership and prototype update equations are derived from necessary conditions for minimization of the criterion function. The advantages of the resulting family of possibilistic algorithms are illustrated by several examples.

2,388 citations
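
The membership update described in this abstract has a simple closed form. The sketch below (Python/NumPy) shows one PCM iteration under common assumptions: squared Euclidean distances, fuzzifier m > 1, and per-cluster scale parameters eta_i supplied by the caller (often initialized from an FCM run). The function name `pcm_update` and its defaults are illustrative, not taken from the paper.

```python
import numpy as np

def pcm_update(X, V, eta, m=2.0):
    """One possibilistic c-means (PCM) iteration (illustrative sketch).

    X   : (n, d) data points
    V   : (c, d) current cluster prototypes
    eta : (c,)   per-cluster scale parameters (assumed given, e.g. from an FCM run)
    m   : fuzzifier, m > 1
    """
    # squared Euclidean distance d2[i, j] from prototype i to point j
    d2 = ((V[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    # possibilistic memberships: u_ij = 1 / (1 + (d_ij^2 / eta_i)^(1/(m-1)))
    U = 1.0 / (1.0 + (d2 / eta[:, None]) ** (1.0 / (m - 1.0)))
    # prototype update: membership-weighted means, as in FCM
    W = U ** m
    V_new = (W @ X) / W.sum(axis=1, keepdims=True)
    return U, V_new
```

Unlike FCM, a point's memberships across clusters are not forced to sum to one, which is what lets outliers receive low membership in every cluster.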

Journal ArticleDOI
TL;DR: This paper analyzes several popular robust clustering methods and concludes that they have much in common, establishing a connection between fuzzy set theory and robust statistics, and pointing out the similarities between robust clustering methods and statistical methods.
Abstract: Clustering methods need to be robust if they are to be useful in practice. In this paper, we analyze several popular robust clustering methods and show that they have much in common. We also establish a connection between fuzzy set theory and robust statistics, and point out the similarities between robust clustering methods and statistical methods such as the weighted least-squares technique, the M estimator, the minimum volume ellipsoid algorithm, cooperative robust estimation, minimization of probability of randomness, and the epsilon contamination model. By gleaning the common principles upon which the methods proposed in the literature are based, we arrive at a unified view of robust clustering methods. We define several general concepts that are useful in robust clustering, state the robust clustering problem in terms of the defined concepts, and propose generic algorithms and guidelines for clustering noisy data. We also discuss why the generalized Hough transform is a suboptimal solution to the robust clustering problem.

737 citations
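
To make the weighted-least-squares / M-estimator connection mentioned above concrete, here is a minimal, generic sketch rather than the paper's unified formulation: a Huber-type weight function that down-weights large residuals, used to compute a robust prototype as a weighted mean. The tuning constant k = 1.345 and the scale parameter eta are conventional illustrative choices.

```python
import numpy as np

def huber_weights(residuals, k=1.345):
    """Huber M-estimator weights for iteratively reweighted least squares.

    Points with |residual| <= k keep full weight 1; larger residuals are
    down-weighted as k/|r|, limiting the influence of outliers.
    """
    r = np.abs(residuals)
    return np.where(r <= k, 1.0, k / np.maximum(r, 1e-12))

def robust_prototype(X, v, eta=1.0):
    """Robust estimate of one cluster prototype: a weighted mean whose weights
    come from an M-estimator applied to the (scaled) distances to the prototype."""
    d = np.linalg.norm(X - v, axis=1)
    w = huber_weights(d / eta)
    return (w[:, None] * X).sum(axis=0) / w.sum()
```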

Journal ArticleDOI
TL;DR: The underlying principles of the PCM and the possibilistic approach in general are examined, and the results reported by Barni et al. are interpreted in the light of these findings.
Abstract: Recently, the possibilistic C-means algorithm (PCM) was proposed to address the drawbacks associated with the constrained memberships used in algorithms such as the fuzzy C-means (FCM). In this issue, Barni et al. (1996) report a difficulty they faced while applying the PCM, and note that it exhibits an undesirable tendency to converge to coincident clusters. The purpose of this paper is not just to address the issues raised by Barni et al., but to go further and analytically examine the underlying principles of the PCM and the possibilistic approach in general. We analyze the data sets used by Barni et al. and interpret the results reported by them in the light of our findings.

684 citations

Journal ArticleDOI
TL;DR: This paper addresses three major issues associated with conventional partitional clustering, namely sensitivity to initialization, difficulty in determining the number of clusters, and sensitivity to noise and outliers, using the proposed robust competitive agglomeration (RCA) algorithm.
Abstract: This paper addresses three major issues associated with conventional partitional clustering, namely, sensitivity to initialization, difficulty in determining the number of clusters, and sensitivity to noise and outliers. The proposed robust competitive agglomeration (RCA) algorithm starts with a large number of clusters to reduce the sensitivity to initialization, and determines the actual number of clusters by a process of competitive agglomeration. Noise immunity is achieved by incorporating concepts from robust statistics into the algorithm. RCA assigns two different sets of weights for each data point: the first set of constrained weights represents degrees of sharing, and is used to create a competitive environment and to generate a fuzzy partition of the data set. The second set corresponds to robust weights, and is used to obtain robust estimates of the cluster prototypes. By choosing an appropriate distance measure in the objective function, RCA can be used to find an unknown number of clusters of various shapes in noisy data sets, as well as to fit an unknown number of parametric models simultaneously. Several examples, such as clustering/mixture decomposition, line/plane fitting, segmentation of range images, and estimation of motion parameters of multiple objects, are shown.

526 citations
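
The two weight sets described above can be illustrated with a simplified skeleton. The sketch below starts with many prototypes, computes constrained (FCM-style) memberships and separate robust weights, updates prototypes with their product, and discards clusters whose fuzzy cardinality falls below a threshold. It deliberately omits RCA's competition (bias) term in the membership update and uses an ad hoc Cauchy-type robust weight; `c_init`, `min_card`, and the weight choice are assumptions for illustration only.

```python
import numpy as np

def rca_skeleton(X, c_init=20, m=2.0, min_card=5.0, n_iter=50, rng=None):
    """Structural skeleton (not the exact RCA update equations)."""
    rng = np.random.default_rng(rng)
    V = X[rng.choice(len(X), c_init, replace=False)]          # start with many prototypes
    for _ in range(n_iter):
        d2 = ((V[:, None, :] - X[None, :, :]) ** 2).sum(-1) + 1e-12
        # constrained (FCM-style) memberships: columns sum to 1 over clusters
        U = d2 ** (-1.0 / (m - 1.0))
        U /= U.sum(axis=0, keepdims=True)
        # robust weights: down-weight points far from a prototype (Cauchy-type, ad hoc)
        sigma2 = np.median(d2, axis=1, keepdims=True)
        W = 1.0 / (1.0 + d2 / sigma2)
        # prototype update uses the product of the two weight sets
        WU = (U ** m) * W
        V = (WU @ X) / WU.sum(axis=1, keepdims=True)
        # competitive agglomeration: drop clusters with small fuzzy cardinality
        keep = U.sum(axis=1) >= min_card
        if keep.any():
            V = V[keep]
    return V
```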

Journal ArticleDOI
TL;DR: A comparison of FCMdd with the well-known relational fuzzy c-means algorithm (RFCM) shows that FCMdd is more efficient, and several applications of these algorithms to Web mining, including Web document clustering, snippet clustering, and Web access log analysis, are presented.
Abstract: This paper presents new algorithms-fuzzy c-medoids (FCMdd) and robust fuzzy c-medoids (RFCMdd)-for fuzzy clustering of relational data. The objective functions are based on selecting c representative objects (medoids) from the data set in such a way that the total fuzzy dissimilarity within each cluster is minimized. A comparison of FCMdd with the well-known relational fuzzy c-means algorithm (RFCM) shows that FCMdd is more efficient. We present several applications of these algorithms to Web mining, including Web document clustering, snippet clustering, and Web access log analysis.

378 citations
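
A minimal sketch of the fuzzy c-medoids idea described above, under common assumptions: it alternates FCM-style memberships computed from a dissimilarity matrix with a medoid update that, for each cluster, picks the object minimizing the membership-weighted total dissimilarity. The paper's efficiency refinements (e.g., restricting candidate medoids) and the robust variant RFCMdd are not shown; the function name and defaults are illustrative.

```python
import numpy as np

def fcmdd(R, c, m=2.0, n_iter=100, rng=None):
    """Illustrative fuzzy c-medoids (FCMdd) sketch for relational data.

    R : (n, n) pairwise dissimilarity matrix between objects
    c : number of clusters (medoids)
    """
    rng = np.random.default_rng(rng)
    n = R.shape[0]
    medoids = rng.choice(n, size=c, replace=False)
    for _ in range(n_iter):
        D = R[medoids] + 1e-12                       # (c, n) dissimilarity to each medoid
        U = D ** (-1.0 / (m - 1.0))                  # FCM-style memberships over the c medoids
        U /= U.sum(axis=0, keepdims=True)
        # medoid update: the object with the smallest membership-weighted dissimilarity sum
        new_medoids = np.argmin((U ** m) @ R, axis=1)
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    return medoids, U
```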


Cited by
Journal ArticleDOI
TL;DR: An overview of pattern clustering methods from a statistical pattern recognition perspective is presented, with a goal of providing useful advice and references to fundamental concepts accessible to the broad community of clustering practitioners.
Abstract: Clustering is the unsupervised classification of patterns (observations, data items, or feature vectors) into groups (clusters). The clustering problem has been addressed in many contexts and by researchers in many disciplines; this reflects its broad appeal and usefulness as one of the steps in exploratory data analysis. However, clustering is a difficult problem combinatorially, and differences in assumptions and contexts in different communities have made the transfer of useful generic concepts and methodologies slow to occur. This paper presents an overview of pattern clustering methods from a statistical pattern recognition perspective, with a goal of providing useful advice and references to fundamental concepts accessible to the broad community of clustering practitioners. We present a taxonomy of clustering techniques, and identify cross-cutting themes and recent advances. We also describe some important applications of clustering algorithms such as image segmentation, object recognition, and information retrieval.

14,054 citations

Journal ArticleDOI
01 Jun 2010
TL;DR: A brief overview of clustering is provided, well-known clustering methods are summarized, the major challenges and key issues in designing clustering algorithms are discussed, and some of the emerging and useful research directions are pointed out.
Abstract: Organizing data into sensible groupings is one of the most fundamental modes of understanding and learning. As an example, a common scheme of scientific classification puts organisms into a system of ranked taxa: domain, kingdom, phylum, class, etc. Cluster analysis is the formal study of methods and algorithms for grouping, or clustering, objects according to measured or perceived intrinsic characteristics or similarity. Cluster analysis does not use category labels that tag objects with prior identifiers, i.e., class labels. The absence of category information distinguishes data clustering (unsupervised learning) from classification or discriminant analysis (supervised learning). The aim of clustering is to find structure in data and is therefore exploratory in nature. Clustering has a long and rich history in a variety of scientific fields. One of the most popular and simple clustering algorithms, K-means, was first published in 1955. In spite of the fact that K-means was proposed over 50 years ago and thousands of clustering algorithms have been published since then, K-means is still widely used. This speaks to the difficulty in designing a general purpose clustering algorithm and the ill-posed problem of clustering. We provide a brief overview of clustering, summarize well known clustering methods, discuss the major challenges and key issues in designing clustering algorithms, and point out some of the emerging and useful research directions, including semi-supervised clustering, ensemble clustering, simultaneous feature selection during data clustering, and large scale data clustering.

6,601 citations
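
Since the abstract centers on K-means, a minimal Lloyd-style implementation is sketched below for reference: alternate nearest-centroid assignment and centroid recomputation until the assignments stop changing. The initialization by sampling k points and the empty-cluster handling are simple illustrative choices, not prescriptions from the paper.

```python
import numpy as np

def kmeans(X, k, n_iter=100, rng=None):
    """Minimal Lloyd's K-means (illustrative sketch)."""
    rng = np.random.default_rng(rng)
    centroids = X[rng.choice(len(X), k, replace=False)]   # initialize from k sampled points
    labels = np.full(len(X), -1)
    for _ in range(n_iter):
        # assign each point to its nearest centroid (squared Euclidean distance)
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        new_labels = d2.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break                                          # assignments stable: converged
        labels = new_labels
        # recompute centroids; keep the old centroid if a cluster becomes empty
        centroids = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                              else centroids[j] for j in range(k)])
    return labels, centroids
```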

Journal ArticleDOI
TL;DR: The objective of this review paper is to summarize and compare some of the well-known methods used in various stages of a pattern recognition system and identify research topics and applications which are at the forefront of this exciting and challenging field.
Abstract: The primary goal of pattern recognition is supervised or unsupervised classification. Among the various frameworks in which pattern recognition has been traditionally formulated, the statistical approach has been most intensively studied and used in practice. More recently, neural network techniques and methods imported from statistical learning theory have been receiving increasing attention. The design of a recognition system requires careful attention to the following issues: definition of pattern classes, sensing environment, pattern representation, feature extraction and selection, cluster analysis, classifier design and learning, selection of training and test samples, and performance evaluation. In spite of almost 50 years of research and development in this field, the general problem of recognizing complex patterns with arbitrary orientation, location, and scale remains unsolved. New and emerging applications, such as data mining, web searching, retrieval of multimedia data, face recognition, and cursive handwriting recognition, require robust and efficient pattern recognition techniques. The objective of this review paper is to summarize and compare some of the well-known methods used in various stages of a pattern recognition system and identify research topics and applications which are at the forefront of this exciting and challenging field.

6,527 citations

Journal ArticleDOI
TL;DR: Clustering algorithms for data sets appearing in statistics, computer science, and machine learning are surveyed, and their applications in some benchmark data sets, the traveling salesman problem, and bioinformatics, a new field attracting intensive efforts, are illustrated.
Abstract: Data analysis plays an indispensable role for understanding various phenomena. Cluster analysis, primitive exploration with little or no prior knowledge, consists of research developed across a wide variety of communities. The diversity, on one hand, equips us with many tools. On the other hand, the profusion of options causes confusion. We survey clustering algorithms for data sets appearing in statistics, computer science, and machine learning, and illustrate their applications in some benchmark data sets, the traveling salesman problem, and bioinformatics, a new field attracting intensive efforts. Several tightly related topics, such as proximity measures and cluster validation, are also discussed.

5,744 citations

Book ChapterDOI
15 Sep 2008
TL;DR: Cluster analysis, as mentioned in this paper, is the formal study of algorithms and methods for grouping objects according to measured or perceived intrinsic characteristics; such grouping is one of the most fundamental modes of understanding and learning.
Abstract: The practice of classifying objects according to perceived similarities is the basis for much of science. Organizing data into sensible groupings is one of the most fundamental modes of understanding and learning. As an example, a common scheme of scientific classification puts organisms into taxonomic ranks: domain, kingdom, phylum, class, etc. Cluster analysis is the formal study of algorithms and methods for grouping objects according to measured or perceived intrinsic characteristics. Cluster analysis does not use category labels that tag objects with prior identifiers, i.e., class labels. The absence of category information distinguishes cluster analysis (unsupervised learning) from discriminant analysis (supervised learning). The objective of cluster analysis is to simply find a convenient and valid organization of the data, not to establish rules for separating future data into categories.

4,255 citations