# An interval weighed fuzzy c-means clustering by genetically guided alternating optimization

TL;DR: In this article, interval numbers are introduced for attribute weighting in weighted fuzzy c-means (WFCM) clustering, and it is illustrated from the viewpoint of geometric probability that interval weighting can obtain appropriate weights more easily.

Abstract: The fuzzy c-means (FCM) algorithm is a widely applied clustering technique, but its implicit assumption that each attribute of the object data has equal importance affects the clustering performance. At present, attribute-weighted fuzzy clustering has become a very active area of research, and numerous approaches that introduce numerical weights have been incorporated into fuzzy clustering. In this paper, interval numbers are introduced for attribute weighting in weighted fuzzy c-means (WFCM) clustering, and it is illustrated from the viewpoint of geometric probability that interval weighting can obtain appropriate weights more easily. Moreover, a genetic heuristic strategy for attribute-weight searching is proposed to guide the alternating optimization (AO) of WFCM, so that improved attribute weights within interval-constrained ranges and a reasonable data partition can be obtained simultaneously. The experimental results demonstrate that the proposed algorithm is superior in clustering performance. They reveal that interval-weighted clustering can act as an optimization operator on top of traditional numerically weighted clustering, and that the effect of interval-weight perturbations on clustering performance is reduced.
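The paper's interval weighting and genetic search are not spelled out on this page. As a rough illustration of the alternating optimization being guided, here is a minimal sketch of one AO step of a plain numerically weighted FCM (fixed attribute weights `w`; the membership and center updates are the standard FCM ones with a weighted squared distance):

```python
import numpy as np

def wfcm_step(X, V, w, m=2.0, eps=1e-12):
    """One alternating-optimization step of attribute-weighted FCM.

    X: (n, s) data; V: (c, s) cluster centers; w: (s,) attribute weights
    (non-negative, summing to 1); m: fuzzifier. Returns updated
    memberships U (n, c) and centers V (c, s).
    """
    # Weighted squared distances d[i, k] = sum_j w_j * (x_ij - v_kj)^2
    diff = X[:, None, :] - V[None, :, :]              # (n, c, s)
    d = np.maximum((w * diff**2).sum(axis=2), eps)    # (n, c)
    # Membership update: u_ik = 1 / sum_l (d_ik / d_il)^(1/(m-1))
    U = 1.0 / ((d[:, :, None] / d[:, None, :]) ** (1.0 / (m - 1.0))).sum(axis=2)
    # Center update: fuzzily weighted means of the data
    Um = U**m
    V_new = (Um.T @ X) / Um.sum(axis=0)[:, None]
    return U, V_new
```

In the paper's scheme, `w` would additionally be searched by a genetic heuristic within interval-constrained ranges rather than held fixed as here.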

##### Citations


TL;DR: Two new hybrids of FCM and improved self-adaptive PSO are presented, combining FCM with a recent version of PSO, the IDPSO, which adjusts PSO parameters dynamically during execution to provide a better balance between exploration and exploitation, avoid falling into local minima quickly, and thereby obtain better solutions.

Abstract:

- We present two new hybrids of FCM and improved self-adaptive PSO.
- The methods are based on the FCM-PSO algorithm.
- We use FCM to initialize one particle to achieve better results in fewer iterations.
- The new methods are compared to FCM-PSO using many real and synthetic datasets.
- The proposed methods consistently outperform FCM-PSO in three evaluation metrics.

Fuzzy clustering has become an important research field with many applications to real-world problems. Among fuzzy clustering methods, fuzzy c-means (FCM) is one of the best known for its simplicity and efficiency, although it shows some weaknesses, particularly its tendency to fall into local minima. To tackle this shortcoming, many optimization-based fuzzy clustering methods have been proposed in the literature. Some of these methods are based solely on a metaheuristic optimization, such as particle swarm optimization (PSO), whereas others are hybrid methods that combine a metaheuristic with a traditional partitional clustering method such as FCM. It is demonstrated in the literature that methods hybridizing PSO and FCM for clustering have improved accuracy over traditional partitional clustering approaches. On the other hand, PSO-based clustering methods have poor execution time in comparison to partitional clustering techniques. Another problem with PSO-based clustering is that current PSO algorithms require tuning a range of parameters before they are able to find good solutions. In this paper we introduce two hybrid methods for fuzzy clustering that aim to deal with these shortcomings. The methods, referred to as FCM-IDPSO and FCM2-IDPSO, combine FCM with a recent version of PSO, the IDPSO, which adjusts PSO parameters dynamically during execution, aiming to provide a better balance between exploration and exploitation, avoiding falling into local minima quickly and thereby obtaining better solutions.
Experiments using two synthetic data sets and eight real-world data sets are reported and discussed. The experiments considered the proposed methods as well as some recent PSO-based fuzzy clustering methods. The results show that the methods introduced in this paper provide comparable or, in many cases, better solutions than the other methods considered in the comparison, and were much faster than the other state-of-the-art PSO-based methods.
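IDPSO's exact self-adaptive parameter schedule is not given on this page. As a hedged sketch of the velocity/position updates such FCM-PSO hybrids build on, here is a basic global-best PSO with a linearly decaying inertia weight (a hypothetical stand-in for IDPSO's dynamic adjustment; function name and parameters are illustrative):

```python
import numpy as np

def pso_minimize(f, dim, n_particles=20, iters=200, bounds=(-5.0, 5.0), seed=0):
    """Minimize f over a box with a basic global-best PSO.

    The linearly decaying inertia weight is a stand-in for IDPSO's
    self-adaptive parameter schedule, which is not detailed here.
    """
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    X = rng.uniform(lo, hi, (n_particles, dim))       # positions
    V = np.zeros((n_particles, dim))                  # velocities
    P, Pf = X.copy(), np.apply_along_axis(f, 1, X)    # personal bests
    g, gf = P[Pf.argmin()].copy(), Pf.min()           # global best
    for t in range(iters):
        w = 0.9 - 0.5 * t / iters                     # inertia decays 0.9 -> 0.4
        r1, r2 = rng.random((2, n_particles, dim))
        V = w * V + 2.0 * r1 * (P - X) + 2.0 * r2 * (g - X)
        X = np.clip(X + V, lo, hi)
        Xf = np.apply_along_axis(f, 1, X)
        better = Xf < Pf                              # update personal bests
        P[better], Pf[better] = X[better], Xf[better]
        if Pf.min() < gf:                             # update global best
            g, gf = P[Pf.argmin()].copy(), Pf.min()
    return g, gf
```

In the hybrids above, each particle would encode candidate cluster centers and `f` would be the FCM objective, with FCM itself used to seed one particle.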

128 citations


TL;DR: This paper proposes a novel clustering model, in which probabilistic information granules of missing values are incorporated into the Fuzzy C-Means clustering of incomplete data by involving the maximum likelihood criterion.

Abstract: Missing values are a common phenomenon when dealing with real-world data sets. Analysis of incomplete data sets has become an active area of research. In this paper, we focus on the problem of clustering incomplete data, which is intended to introduce some prior distribution information of the missing values into the algorithm of fuzzy clustering. First, non-parametric hypothesis testing is employed to describe the missing values adhering to a certain Gaussian distribution as probabilistic information granules based on the nearest neighbors of incomplete data. Second, we propose a novel clustering model, in which probabilistic information granules of missing values are incorporated into the Fuzzy C-Means clustering of incomplete data by involving the maximum likelihood criterion. Third, the clustering model is optimized by using a tri-level alternating optimization utilizing the method of Lagrange multipliers. The convergence and the time complexity of the clustering algorithm are also discussed. The experiments reported both on synthetic and real-world data sets demonstrate that the proposed approach can effectively realize clustering of incomplete data.

95 citations

### Cites background from "An interval weighed fuzzy c-means c..."

...3) The time complexities: The time complexity of the standard FCM algorithm is O(nc²s) [41-43], where n is the number of object data, c is the number of clusters, and s is the dimension of the data vectors....

[...]


TL;DR: A novel user clustering approach based on Quantum-behaved Particle Swarm Optimization (QPSO) is proposed for the collaborative filtering based recommender system, and evaluation results prove the usefulness of the generated recommendations and demonstrate users' satisfaction with the proposed recommendation approach.

86 citations


VIT University

TL;DR: A new bio-inspired clustering ensemble that aggregates swarm intelligence and fuzzy clustering models for user-based collaborative filtering is presented, and the obtained results illustrate its advantageous performance over recent peer works.

Abstract: In recent years, internet technologies and their rapid growth have created a paradigm of digital services. In this new digital world, users suffer from the information overload problem, and recommender systems are widely used as a decision support tool to address this issue. Though recommender systems are a proven personalization tool, the need to improve their recommendation ability and efficiency is high. Among the various recommendation generation mechanisms available, collaborative filtering-based approaches are widely utilized to produce similarity-based recommendations. To improve the recommendation generation process of collaborative filtering approaches, clustering techniques are incorporated for grouping users. Though many traditional clustering mechanisms have been employed for user clustering in existing works, the utilization of bio-inspired clustering techniques needs to be explored for the generation of optimal recommendations. This article presents a new bio-inspired clustering ensemble that aggregates swarm intelligence and fuzzy clustering models for user-based collaborative filtering. The presented recommendation approaches have been evaluated on the real-world large-scale datasets of Yelp and TripAdvisor for recommendation accuracy and stability through standard evaluation metrics. The obtained results illustrate the advantageous performance of the proposed approach over its peer works of recent times.

85 citations


TL;DR: It is demonstrated that the proposed IT2 FS based approach is more efficient in giving better clustering results for uncertain gene expression datasets and is scalable to large gene expression datasets.

58 citations

##### References


01 Jan 1982

TL;DR: In this article, the authors present an overview of the basic concepts of multivariate analysis, including matrix algebra and random vectors, as well as a strategy for analyzing multivariate models.

Abstract: (NOTE: Each chapter begins with an Introduction, and concludes with Exercises and References.) I. GETTING STARTED. 1. Aspects of Multivariate Analysis. Applications of Multivariate Techniques. The Organization of Data. Data Displays and Pictorial Representations. Distance. Final Comments. 2. Matrix Algebra and Random Vectors. Some Basics of Matrix and Vector Algebra. Positive Definite Matrices. A Square-Root Matrix. Random Vectors and Matrices. Mean Vectors and Covariance Matrices. Matrix Inequalities and Maximization. Supplement 2A Vectors and Matrices: Basic Concepts. 3. Sample Geometry and Random Sampling. The Geometry of the Sample. Random Samples and the Expected Values of the Sample Mean and Covariance Matrix. Generalized Variance. Sample Mean, Covariance, and Correlation as Matrix Operations. Sample Values of Linear Combinations of Variables. 4. The Multivariate Normal Distribution. The Multivariate Normal Density and Its Properties. Sampling from a Multivariate Normal Distribution and Maximum Likelihood Estimation. The Sampling Distribution of X̄ and S. Large-Sample Behavior of X̄ and S. Assessing the Assumption of Normality. Detecting Outliers and Data Cleaning. Transformations to Near Normality. II. INFERENCES ABOUT MULTIVARIATE MEANS AND LINEAR MODELS. 5. Inferences About a Mean Vector. The Plausibility of μ0 as a Value for a Normal Population Mean. Hotelling's T² and Likelihood Ratio Tests. Confidence Regions and Simultaneous Comparisons of Component Means. Large Sample Inferences about a Population Mean Vector. Multivariate Quality Control Charts. Inferences about Mean Vectors When Some Observations Are Missing. Difficulties Due To Time Dependence in Multivariate Observations. Supplement 5A Simultaneous Confidence Intervals and Ellipses as Shadows of the p-Dimensional Ellipsoids. 6. Comparisons of Several Multivariate Means. Paired Comparisons and a Repeated Measures Design. Comparing Mean Vectors from Two Populations.
Comparison of Several Multivariate Population Means (One-Way MANOVA). Simultaneous Confidence Intervals for Treatment Effects. Two-Way Multivariate Analysis of Variance. Profile Analysis. Repeated Measures Designs and Growth Curves. Perspectives and a Strategy for Analyzing Multivariate Models. 7. Multivariate Linear Regression Models. The Classical Linear Regression Model. Least Squares Estimation. Inferences About the Regression Model. Inferences from the Estimated Regression Function. Model Checking and Other Aspects of Regression. Multivariate Multiple Regression. The Concept of Linear Regression. Comparing the Two Formulations of the Regression Model. Multiple Regression Models with Time Dependent Errors. Supplement 7A The Distribution of the Likelihood Ratio for the Multivariate Regression Model. III. ANALYSIS OF A COVARIANCE STRUCTURE. 8. Principal Components. Population Principal Components. Summarizing Sample Variation by Principal Components. Graphing the Principal Components. Large-Sample Inferences. Monitoring Quality with Principal Components. Supplement 8A The Geometry of the Sample Principal Component Approximation. 9. Factor Analysis and Inference for Structured Covariance Matrices. The Orthogonal Factor Model. Methods of Estimation. Factor Rotation. Factor Scores. Perspectives and a Strategy for Factor Analysis. Structural Equation Models. Supplement 9A Some Computational Details for Maximum Likelihood Estimation. 10. Canonical Correlation Analysis. Canonical Variates and Canonical Correlations. Interpreting the Population Canonical Variables. The Sample Canonical Variates and Sample Canonical Correlations. Additional Sample Descriptive Measures. Large Sample Inferences. IV. CLASSIFICATION AND GROUPING TECHNIQUES. 11. Discrimination and Classification. Separation and Classification for Two Populations. Classifications with Two Multivariate Normal Populations. Evaluating Classification Functions.
Fisher's Discriminant Function - Separation of Populations. Classification with Several Populations. Fisher's Method for Discriminating among Several Populations. Final Comments. 12. Clustering, Distance Methods and Ordination. Similarity Measures. Hierarchical Clustering Methods. Nonhierarchical Clustering Methods. Multidimensional Scaling. Correspondence Analysis. Biplots for Viewing Sample Units and Variables. Procrustes Analysis: A Method for Comparing Configurations. Appendix. Standard Normal Probabilities. Student's t-Distribution Percentage Points. χ² Distribution Percentage Points. F-Distribution Percentage Points. F-Distribution Percentage Points (α = .10). F-Distribution Percentage Points (α = .05). F-Distribution Percentage Points (α = .01). Data Index. Subject Index.

11,697 citations



TL;DR: It is shown that the principal eigenvector is a necessary representation of the priorities derived from a positive reciprocal pairwise comparison judgment matrix A=(aij) when A is a small perturbation of a consistent matrix.

1,184 citations

01 Jan 2003

TL;DR: In this paper, it is shown that the principal eigenvector is a necessary representation of the priorities derived from a positive reciprocal pairwise comparison matrix A = (aij) when A is a small perturbation of a consistent matrix.

Abstract: In this paper it is shown that the principal eigenvector is a necessary representation of the priorities derived from a positive reciprocal pairwise comparison judgment matrix A = (aij) when A is a small perturbation of a consistent matrix. When providing numerical judgments, an individual attempts to estimate sequentially an underlying ratio scale and its equivalent consistent matrix of ratios. Near consistent matrices are essential because when dealing with intangibles, human judgment is of necessity inconsistent, and if with new information one is able to improve inconsistency to near consistency, then that could improve the validity of the priorities of a decision. In addition, judgment is much more sensitive and responsive to large rather than to small perturbations, and hence once near consistency is attained, it becomes uncertain which coefficients should be perturbed by small amounts to transform a near consistent matrix to a consistent one. If such perturbations were forced, they could be arbitrary and thus distort the validity of the derived priority vector in representing the underlying decision.
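As a concrete illustration of the claim above (a minimal sketch; the matrix and weights are hypothetical, not from the paper): for a fully consistent positive reciprocal matrix A with aij = wi/wj, the priority vector is exactly the principal eigenvector, recoverable by power iteration:

```python
import numpy as np

def priority_vector(A, iters=100):
    """Principal eigenvector of a positive matrix, normalized to sum
    to 1, computed by power iteration."""
    v = np.ones(A.shape[0])
    for _ in range(iters):
        v = A @ v
        v = v / v.sum()
    return v

# Consistent reciprocal matrix built from an underlying ratio scale w
w = np.array([0.5, 0.3, 0.2])
A = w[:, None] / w[None, :]   # a_ij = w_i / w_j, so a_ij * a_jk = a_ik
v = priority_vector(A)        # recovers w up to normalization
```

For a consistent matrix, A·w = n·w, so the iteration locks onto w immediately; the paper's point is that for near-consistent judgment matrices the principal eigenvector remains the right priority representation.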

1,147 citations