Proceedings ArticleDOI

Kernel based K-means clustering using rough set

01 Mar 2012, pp 1-5
TL;DR: This paper proposes the combination of these two methods, known as kernel based K-means using rough set, which can handle uncertainty and heterogeneous data.
Abstract: From the beginning of data analysis systems, cluster computing has played an important role. The earliest clustering algorithms could handle only numerical data; K-means clustering, proposed by MacQueen [1] in 1967, is one of them. This algorithm helps us find the homogeneity of a data set. The K-means algorithm has been modified in many ways, and kernel based K-means is one of the resulting variants: it applies a nonlinear transformation that maps the sample data into a high-dimensional feature space. Although kernel based K-means performs well on almost every data set, it is unable to handle uncertainty. After rough set theory was proposed by Pawlak [2], many clustering algorithms based on it were developed that can handle uncertainty and heterogeneous data, and rough K-means is one of them. In this paper we therefore propose the combination of these two methods, known as kernel based K-means using rough set.
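The proposed combination can be sketched concretely. The following is a minimal illustration only, not the authors' exact formulation: prototypes stay in the input space, the Euclidean distance of rough K-means is replaced by the distance induced by a Gaussian kernel, and each object goes either to the lower approximation of its nearest cluster or to the boundary regions of all clusters that are almost as close. The kernel width sigma, the threshold epsilon, the weight w_low, and all function names are assumptions.

```python
import numpy as np

def kernel_dist2(x, v, sigma=1.0):
    """Squared distance induced by a Gaussian kernel K(x, v) =
    exp(-||x - v||^2 / (2 sigma^2)); since K(x, x) = 1 it equals 2 (1 - K(x, v))."""
    k_xv = np.exp(-np.sum((x - v) ** 2) / (2.0 * sigma ** 2))
    return 2.0 * (1.0 - k_xv)

def kernel_rough_kmeans(X, k, w_low=0.7, epsilon=1.3, sigma=1.0, n_iter=50, seed=0):
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        lower = [[] for _ in range(k)]     # objects certainly in cluster j
        boundary = [[] for _ in range(k)]  # objects possibly in cluster j
        for x in X:
            d = np.array([kernel_dist2(x, v, sigma) for v in centroids])
            j = int(np.argmin(d))
            close = [h for h in range(k) if h != j and d[h] <= epsilon * d[j]]
            if close:   # ambiguous object: boundary region of every close cluster
                boundary[j].append(x)
                for h in close:
                    boundary[h].append(x)
            else:       # unambiguous object: lower approximation of cluster j
                lower[j].append(x)
        for j in range(k):  # rough centroid: weighted mix of the two regions
            if lower[j] and boundary[j]:
                centroids[j] = (w_low * np.mean(lower[j], axis=0)
                                + (1.0 - w_low) * np.mean(boundary[j], axis=0))
            elif lower[j]:
                centroids[j] = np.mean(lower[j], axis=0)
            elif boundary[j]:
                centroids[j] = np.mean(boundary[j], axis=0)
    return centroids, lower, boundary
```

The weighted mix of lower-approximation and boundary means follows the usual rough K-means centroid update; the kernel enters only through the distance, which is the standard way to kernelize a prototype-based algorithm without leaving the input space.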
Citations
Book ChapterDOI
01 Jan 2014
TL;DR: The kernel based rough intuitionistic fuzzy c-means algorithm is introduced and shown to be superior to all the algorithms in the sequel, i.e. both the normal and the kernel based algorithms.
Abstract: Clustering of real life data for analysis has gained popularity, and imprecise methods and their hybrid approaches have attracted many researchers of late. Recently, the rough intuitionistic fuzzy c-means algorithm was introduced and studied by Tripathy et al. [3] and was found to be superior to all other algorithms in this family. Kernel based counterparts of these algorithms have been found to behave better than their corresponding Euclidean distance based algorithms. Very recently a kernel based rough fuzzy algorithm was put forth by Bhargav et al. [4]; a comparative analysis over standard datasets and images has established the superiority of this algorithm over its corresponding standard algorithm. In this paper we introduce the kernel based rough intuitionistic fuzzy c-means algorithm and show that it is superior to all the algorithms in the sequel, i.e. both the normal and the kernel based algorithms. We establish this through experimental analysis by taking different types of inputs and using standard accuracy measures.
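For context, the "kernel based counterparts" mentioned here rest on a standard identity from kernel methods (general background, not a result of this chapter): with the feature map φ induced by a kernel K, the squared distance in feature space is

    d²(φ(x), φ(v)) = K(x, x) − 2 K(x, v) + K(v, v),

which for a Gaussian kernel reduces to 2(1 − K(x, v)) because K(x, x) = 1. Substituting this kernel-induced distance for the Euclidean one is what turns an algorithm such as RCM or RFCM into its kernel based version.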

14 citations


Cites background or methods from "Kernel based K-means clustering using rough set"

  • ...Again, it was shown in [20, 21] and [17, 18] respectively that the kernel versions KFCM and KRCM perform better than their normal counterparts....


  • ...Here, N is the total number of data objects [17]....


  • ...Replacing the Euclidean distance used in the above algorithms by kernel functions, some algorithms have been put forth, like the kernel based fuzzy c-means (KFCM) [20] and the kernel based rough c-means (KRCM) [17, 18, 21] introduced by Tripathy and Ghosh....


Proceedings ArticleDOI
28 Sep 2015
TL;DR: A comparative analysis is performed over the Gaussian, hyper tangent and radial basis kernel functions by applying them to various vague clustering approaches, revealing that for small datasets the Gaussian kernel produces more accurate clustering than the radial basis and hyper tangent kernel functions, whereas for considerably large datasets the hyper tangent kernel is superior to the other kernel functions.
Abstract: The application of clustering algorithms for investigating real life data has attracted many researchers, and vague approaches and their hybridization with other analogous approaches have gained special attention due to their great effectiveness. Recently, the rough intuitionistic fuzzy c-means algorithm was proposed by Tripathy et al. [3], who established its supremacy over all other algorithms in the same family. Replacing the Euclidean distance metric with a kernel induced metric makes it possible to cluster objects which are linearly inseparable in the original space. In this paper a comparative analysis is performed over the Gaussian, hyper tangent and radial basis kernel functions by applying them to various vague clustering approaches, namely rough c-means (RCM), intuitionistic fuzzy c-means (IFCM), rough fuzzy c-means (RFCM) and rough intuitionistic fuzzy c-means (RIFCM). All clustering algorithms have been tested on synthetic, user knowledge modeling and human activity recognition datasets taken from the UCI repository against standard accuracy indexes for clustering. The results reveal that for small datasets the Gaussian kernel produces more accurate clustering than the radial basis and hyper tangent kernel functions, whereas for considerably large datasets the hyper tangent kernel is superior to the other kernel functions. All experiments were carried out using the C language, and Python libraries were used for statistical plotting.
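As background, one common parameterization of the three kernels compared is sketched below; the paper's exact forms and parameter settings are assumptions here. Each satisfies K(x, x) = 1, so the induced squared distance simplifies to 2(1 − K(x, v)).

```python
import numpy as np

def gaussian_kernel(x, v, sigma=1.0):
    # K(x, v) = exp(-||x - v||^2 / (2 sigma^2))
    return np.exp(-np.sum((x - v) ** 2) / (2.0 * sigma ** 2))

def radial_basis_kernel(x, v, sigma=1.0, a=1.0, b=2.0):
    # Generalized RBF form seen in the kernelized fuzzy clustering literature:
    # K(x, v) = exp(-sum_i |x_i^a - v_i^a|^b / sigma^2); a=1, b=2 gives the
    # classic exp(-||x - v||^2 / sigma^2).
    return np.exp(-np.sum(np.abs(x ** a - v ** a) ** b) / sigma ** 2)

def hyper_tangent_kernel(x, v, sigma=1.0):
    # One sensible form: decays from 1 at x = v toward 0 with distance.
    return 1.0 - np.tanh(np.sum((x - v) ** 2) / sigma ** 2)
```

The paper's finding (Gaussian for small datasets, hyper tangent for large ones) is an empirical observation, so the kernel and its width sigma are best treated as tunable choices per dataset.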

12 citations


Cites methods from "Kernel based K-means clustering using rough set"

  • ...But using this, the clusters generated are linearly separable....


Book ChapterDOI
10 Dec 2013
TL;DR: A kernel based rough-fuzzy c-means (KRFCM) algorithm is proposed, together with modified versions of the performance indexes obtained by replacing the distance function with a kernel function, and a comparative analysis of RFCM with KRFCM is provided by computing their DB and D index values.
Abstract: Data clustering has found its usefulness in various fields. Algorithms are mostly developed using the Euclidean distance, but it has several drawbacks which may be rectified by using a kernel distance formula. In this paper, we propose a kernel based rough-fuzzy C-Means (KRFCM) algorithm and use modified versions of the performance indexes (DB and D) obtained by replacing the distance function with a kernel function. We provide a comparative analysis of RFCM with KRFCM by computing their DB and D index values. The analysis is based upon both numerical and image datasets. The results establish that the proposed algorithm outperforms the existing one.
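For reference, the two validity indexes are standard; a minimal Euclidean sketch is below (the paper's modified versions replace these distances with a kernel induced distance, and the function names are illustrative):

```python
import numpy as np

def db_index(X, labels, centroids):
    # Davies-Bouldin index: lower is better.
    k = len(centroids)
    S = np.array([np.mean(np.linalg.norm(X[labels == i] - centroids[i], axis=1))
                  for i in range(k)])          # average within-cluster scatter
    worst = [max((S[i] + S[j]) / np.linalg.norm(centroids[i] - centroids[j])
                 for j in range(k) if j != i)
             for i in range(k)]
    return float(np.mean(worst))

def dunn_index(X, labels):
    # Dunn index: smallest between-cluster gap over the largest cluster
    # diameter; higher is better.
    clusters = [X[labels == c] for c in np.unique(labels)]
    min_sep = min(np.min(np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2))
                  for i, a in enumerate(clusters)
                  for j, b in enumerate(clusters) if j > i)
    max_diam = max(np.max(np.linalg.norm(c[:, None, :] - c[None, :, :], axis=2))
                   for c in clusters)
    return float(min_sep / max_diam)
```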

11 citations


Additional excerpts

  • ...[7]-[8]....


Journal ArticleDOI
TL;DR: Based on unstable oil-water flow theory, a multilayer commingled reservoir simulator is established by modifying the production split method, as discussed by the authors; the idea is to divide all the layers into several sets of production series according to the physical properties and recovery percent of the layers at the high water-cut stage.

7 citations


Cites methods from "Kernel based K-means clustering using rough set"

  • ...Based on the pseudo flow resistance of one single layer, using the K-means clustering method (Wang and Niu 2004; Kong et al. 2004), layer regrouping is carried out to obtain the optimal production performance....


Book
11 Sep 2013
TL;DR: The proceedings of the 14th International Conference on Rough Sets, Fuzzy Sets, Data Mining and Granular Computing, RSFDGrC 2013, held in Halifax, Canada in October 2013 as one of the co-located conferences of the 2013 Joint Rough Set Symposium, JRS 2013, as discussed by the authors.
Abstract: This book constitutes the thoroughly refereed conference proceedings of the 14th International Conference on Rough Sets, Fuzzy Sets, Data Mining and Granular Computing, RSFDGrC 2013, held in Halifax, Canada in October 2013 as one of the co-located conferences of the 2013 Joint Rough Set Symposium, JRS 2013. The 69 papers (including 44 regular and 25 short papers) included in the JRS proceedings (LNCS 8170 and LNCS 8171) were carefully reviewed and selected from 106 submissions. The papers in this volume cover topics such as inconsistency, incompleteness, non-determinism; fuzzy and rough hybridization; granular computing and covering-based rough sets; soft clustering; image and medical data analysis.

6 citations

References
01 Jan 1998
TL;DR: Presenting a method for determining the necessary and sufficient conditions for consistency of the learning process, the author covers function estimates from small data pools, applying these estimations to real-life problems, and much more.
Abstract: A comprehensive look at learning and generalization theory. The statistical theory of learning and generalization concerns the problem of choosing desired functions on the basis of empirical data. Highly applicable to a variety of computer science and robotics fields, this book offers lucid coverage of the theory as a whole. Presenting a method for determining the necessary and sufficient conditions for consistency of the learning process, the author covers function estimates from small data pools, applying these estimations to real-life problems, and much more.

26,531 citations

01 Jan 1967
TL;DR: The k-means algorithm as mentioned in this paper partitions an N-dimensional population into k sets on the basis of a sample, which is a generalization of the ordinary sample mean, and it is shown to give partitions which are reasonably efficient in the sense of within-class variance.
Abstract: The main purpose of this paper is to describe a process for partitioning an N-dimensional population into k sets on the basis of a sample. The process, which is called 'k-means,' appears to give partitions which are reasonably efficient in the sense of within-class variance. That is, if p is the probability mass function for the population, S = {S_1, S_2, ..., S_k} is a partition of E^N, and u_i, i = 1, 2, ..., k, is the conditional mean of p over the set S_i, then W^2(S) = \sum_{i=1}^{k} \int_{S_i} |z - u_i|^2 \, dp(z) tends to be low for the partitions S generated by the method. We say 'tends to be low,' primarily because of intuitive considerations, corroborated to some extent by mathematical analysis and practical computational experience. Also, the k-means procedure is easily programmed and is computationally economical, so that it is feasible to process very large samples on a digital computer. Possible applications include methods for similarity grouping, nonlinear prediction, approximating multivariate distributions, and nonparametric tests for independence among several variables. In addition to suggesting practical classification methods, the study of k-means has proved to be theoretically interesting. The k-means concept represents a generalization of the ordinary sample mean, and one is naturally led to study the pertinent asymptotic behavior, the object being to establish some sort of law of large numbers for the k-means. This problem is sufficiently interesting, in fact, for us to devote a good portion of this paper to it. The k-means are defined in section 2.1, and the main results which have been obtained on the asymptotic behavior are given there. The rest of section 2 is devoted to the proofs of these results. Section 3 describes several specific possible applications, and reports some preliminary results from computer experiments conducted to explore the possibilities inherent in the k-means idea. The extension to general metric spaces is indicated briefly in section 4. The original point of departure for the work described here was a series of problems in optimal classification (MacQueen [9]) which represented special
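The criterion is easy to make concrete. Below is a batch (Lloyd-style) variant together with the empirical analogue of W^2(S); MacQueen's original procedure updates the means sequentially as each sample arrives, but it targets the same within-class variance.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    # Batch variant: assign each point to its nearest mean, then recompute means.
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    means = X[rng.choice(len(X), size=k, replace=False)].copy()
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
        labels = np.argmin(d2, axis=1)
        for j in range(k):
            if np.any(labels == j):
                means[j] = X[labels == j].mean(axis=0)
    return means, labels

def within_class_variance(X, labels, means):
    # Empirical analogue of W^2(S) = sum_i \int_{S_i} |z - u_i|^2 dp(z),
    # taking p to be the empirical distribution of the sample.
    return sum(((X[labels == j] - means[j]) ** 2).sum()
               for j in range(len(means))) / len(X)
```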

24,320 citations

Journal ArticleDOI
TL;DR: There are several arguments which support the observed high accuracy of SVMs, which are reviewed and numerous examples and proofs of most of the key theorems are given.
Abstract: The tutorial starts with an overview of the concepts of VC dimension and structural risk minimization. We then describe linear Support Vector Machines (SVMs) for separable and non-separable data, working through a non-trivial example in detail. We describe a mechanical analogy, and discuss when SVM solutions are unique and when they are global. We describe how support vector training can be practically implemented, and discuss in detail the kernel mapping technique which is used to construct SVM solutions which are nonlinear in the data. We show how Support Vector machines can have very large (even infinite) VC dimension by computing the VC dimension for homogeneous polynomial and Gaussian radial basis function kernels. While very high VC dimension would normally bode ill for generalization performance, and while at present there exists no theory which shows that good generalization performance is guaranteed for SVMs, there are several arguments which support the observed high accuracy of SVMs, which we review. Results of some experiments which were inspired by these arguments are also presented. We give numerous examples and proofs of most of the key theorems. There is new material, and I hope that the reader will find that even old material is cast in a fresh light.

15,696 citations

Book
01 Jan 1982
TL;DR: In this article, the authors present an overview of the basic concepts of multivariate analysis, including matrix algebra and random vectors, as well as a strategy for analyzing multivariate models.
Abstract: (NOTE: Each chapter begins with an Introduction, and concludes with Exercises and References.) I. GETTING STARTED. 1. Aspects of Multivariate Analysis. Applications of Multivariate Techniques. The Organization of Data. Data Displays and Pictorial Representations. Distance. Final Comments. 2. Matrix Algebra and Random Vectors. Some Basics of Matrix and Vector Algebra. Positive Definite Matrices. A Square-Root Matrix. Random Vectors and Matrices. Mean Vectors and Covariance Matrices. Matrix Inequalities and Maximization. Supplement 2A: Vectors and Matrices: Basic Concepts. 3. Sample Geometry and Random Sampling. The Geometry of the Sample. Random Samples and the Expected Values of the Sample Mean and Covariance Matrix. Generalized Variance. Sample Mean, Covariance, and Correlation as Matrix Operations. Sample Values of Linear Combinations of Variables. 4. The Multivariate Normal Distribution. The Multivariate Normal Density and Its Properties. Sampling from a Multivariate Normal Distribution and Maximum Likelihood Estimation. The Sampling Distribution of X̄ and S. Large-Sample Behavior of X̄ and S. Assessing the Assumption of Normality. Detecting Outliers and Data Cleaning. Transformations to Near Normality. II. INFERENCES ABOUT MULTIVARIATE MEANS AND LINEAR MODELS. 5. Inferences About a Mean Vector. The Plausibility of μ0 as a Value for a Normal Population Mean. Hotelling's T² and Likelihood Ratio Tests. Confidence Regions and Simultaneous Comparisons of Component Means. Large Sample Inferences about a Population Mean Vector. Multivariate Quality Control Charts. Inferences about Mean Vectors When Some Observations Are Missing. Difficulties Due To Time Dependence in Multivariate Observations. Supplement 5A: Simultaneous Confidence Intervals and Ellipses as Shadows of the p-Dimensional Ellipsoids. 6. Comparisons of Several Multivariate Means. Paired Comparisons and a Repeated Measures Design. Comparing Mean Vectors from Two Populations. Comparison of Several Multivariate Population Means (One-Way MANOVA). Simultaneous Confidence Intervals for Treatment Effects. Two-Way Multivariate Analysis of Variance. Profile Analysis. Repeated Measures, Designs, and Growth Curves. Perspectives and a Strategy for Analyzing Multivariate Models. 7. Multivariate Linear Regression Models. The Classical Linear Regression Model. Least Squares Estimation. Inferences About the Regression Model. Inferences from the Estimated Regression Function. Model Checking and Other Aspects of Regression. Multivariate Multiple Regression. The Concept of Linear Regression. Comparing the Two Formulations of the Regression Model. Multiple Regression Models with Time Dependent Errors. Supplement 7A: The Distribution of the Likelihood Ratio for the Multivariate Regression Model. III. ANALYSIS OF A COVARIANCE STRUCTURE. 8. Principal Components. Population Principal Components. Summarizing Sample Variation by Principal Components. Graphing the Principal Components. Large-Sample Inferences. Monitoring Quality with Principal Components. Supplement 8A: The Geometry of the Sample Principal Component Approximation. 9. Factor Analysis and Inference for Structured Covariance Matrices. The Orthogonal Factor Model. Methods of Estimation. Factor Rotation. Factor Scores. Perspectives and a Strategy for Factor Analysis. Structural Equation Models. Supplement 9A: Some Computational Details for Maximum Likelihood Estimation. 10. Canonical Correlation Analysis. Canonical Variates and Canonical Correlations. Interpreting the Population Canonical Variables. The Sample Canonical Variates and Sample Canonical Correlations. Additional Sample Descriptive Measures. Large Sample Inferences. IV. CLASSIFICATION AND GROUPING TECHNIQUES. 11. Discrimination and Classification. Separation and Classification for Two Populations. Classification with Two Multivariate Normal Populations. Evaluating Classification Functions. Fisher's Discriminant Function: Separation of Populations. Classification with Several Populations. Fisher's Method for Discriminating among Several Populations. Final Comments. 12. Clustering, Distance Methods and Ordination. Similarity Measures. Hierarchical Clustering Methods. Nonhierarchical Clustering Methods. Multidimensional Scaling. Correspondence Analysis. Biplots for Viewing Sample Units and Variables. Procrustes Analysis: A Method for Comparing Configurations. Appendix. Standard Normal Probabilities. Student's t-Distribution Percentage Points. χ² Distribution Percentage Points. F-Distribution Percentage Points. F-Distribution Percentage Points (α = .10). F-Distribution Percentage Points (α = .05). F-Distribution Percentage Points (α = .01). Data Index. Subject Index.

11,697 citations

Journal ArticleDOI
TL;DR: A least squares version of support vector machine (SVM) classifiers whose solution follows from solving a set of linear equations, instead of the quadratic programming required for classical SVMs.
Abstract: In this letter we discuss a least squares version for support vector machine (SVM) classifiers. Due to equality type constraints in the formulation, the solution follows from solving a set of linear equations, instead of quadratic programming for classical SVMs. The approach is illustrated on a two-spiral benchmark classification problem.
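The "set of linear equations" can be written out. In the usual LS-SVM dual of Suykens and Vandewalle, with Ω_ij = y_i y_j K(x_i, x_j), training solves one (n+1)×(n+1) linear system in the bias b and support values α. The Gaussian kernel and the parameter names below are illustrative choices, not prescribed by the abstract.

```python
import numpy as np

def lssvm_train(X, y, gamma=1.0, sigma=1.0):
    # Solve [[0, y^T], [y, Omega + I/gamma]] [b; alpha] = [0; 1], with y in {-1, +1}.
    n = len(X)
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    K = np.exp(-sq / (2.0 * sigma ** 2))          # Gaussian kernel Gram matrix
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = y
    A[1:, 0] = y
    A[1:, 1:] = np.outer(y, y) * K + np.eye(n) / gamma
    sol = np.linalg.solve(A, np.concatenate(([0.0], np.ones(n))))
    return sol[1:], sol[0]                        # alpha, b

def lssvm_predict(X_train, y_train, alpha, b, X_new, sigma=1.0):
    sq = ((X_new[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    K = np.exp(-sq / (2.0 * sigma ** 2))
    return np.sign(K @ (alpha * y_train) + b)     # sign(sum_i alpha_i y_i K(x, x_i) + b)
```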

8,811 citations


"Kernel based K-means clustering usi..." refers methods in this paper

  • ...After rough set theory has been proposed by Pawlak [2], we have many clustering algorithms based on it which can handle uncertainty and heterogeneous data and Rough based K-means is one of them....
