Journal ArticleDOI

Local k-proximal plane clustering

01 Jan 2015-Neural Computing and Applications (Springer London)-Vol. 26, Iss: 1, pp 199-211
TL;DR: A local k-proximal plane clustering (LkPPC) is proposed by bringing k-means into kPPC which will force the data points to center around some prototypes and thus localize the representations of the cluster center plane.
Abstract: k-Plane clustering (kPC) and k-proximal plane clustering (kPPC) cluster data points to center planes, instead of clustering data points to cluster centers as in k-means. However, the cluster center planes constructed by kPC and kPPC extend infinitely, which degrades clustering performance. In this paper, we propose a local k-proximal plane clustering (LkPPC) that brings k-means into kPPC, forcing the data points to center around prototypes and thus localizing the representation of each cluster center plane. The contributions of our LkPPC are as follows: (1) LkPPC introduces a localized representation of each cluster center plane, avoiding the confusion caused by infinite extension. (2) Unlike kPPC, our LkPPC constructs cluster center planes such that the data points of a cluster are close to both that cluster's center plane and its prototype, while remaining far from the other clusters to some extent, which leads to solving eigenvalue problems. (3) Instead of randomly selecting the initial data points, a Laplacian graph strategy is established to initialize the clustering. (4) Experimental results on several artificial and benchmark datasets show the effectiveness of our LkPPC.
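To make the abstract's idea concrete, here is a heavily simplified sketch in the spirit of LkPPC: each cluster keeps a center plane and a prototype, and points are reassigned by a weighted sum of plane distance and prototype distance. It is not the paper's algorithm — the between-cluster term and the Laplacian-graph initialization are omitted, and the trade-off weight `lam` is a hypothetical parameter.

```python
import numpy as np

def lkppc_sketch(X, k, lam=1.0, iters=20, seed=0):
    """Simplified alternating scheme in the spirit of LkPPC (illustrative
    only): each cluster j keeps a plane (w_j, b_j) and a prototype v_j;
    points are assigned by plane distance + lam * prototype distance."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    labels = rng.integers(0, k, size=n)
    for _ in range(iters):
        W, B, V = [], [], []
        for j in range(k):
            C = X[labels == j]
            if len(C) == 0:                       # re-seed an empty cluster
                C = X[rng.integers(0, n, size=1)]
            v = C.mean(axis=0)                    # prototype = cluster mean
            Z = C - v
            # plane normal = eigenvector of smallest eigenvalue of Z^T Z
            _, vecs = np.linalg.eigh(Z.T @ Z)
            w = vecs[:, 0]                        # unit norm from eigh
            W.append(w); B.append(-w @ v); V.append(v)
        W, B, V = np.array(W), np.array(B), np.array(V)
        plane_d = (X @ W.T + B) ** 2              # squared plane distances
        proto_d = ((X[:, None, :] - V[None]) ** 2).sum(-1)
        new = np.argmin(plane_d + lam * proto_d, axis=1)
        if np.array_equal(new, labels):
            break
        labels = new
    return labels
```

With a large `lam` the behaviour approaches k-means; with `lam = 0` it degenerates to plane-only clustering, which is exactly the unbounded-extension problem the prototypes are meant to fix.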
Citations
Journal ArticleDOI
TL;DR: The experimental results show that the proposed HAM algorithm is clearly superior to the standard ABC and MBO algorithms, as well as to other well-known algorithms, in terms of achieving the best optimal value and convergence speed.
Abstract: The aim of this study was to propose a new metaheuristic algorithm that combines parts of the well-known artificial bee colony (ABC) optimization with elements of the recent monarch butterfly optimization (MBO) algorithm. The idea is to improve the balance between exploration and exploitation in those algorithms, in order to address trapping in local optima, slow convergence, and low accuracy in numerical optimization problems. This article introduces a new hybrid approach that modifies the butterfly adjusting operator of the MBO algorithm and uses it as a mutation operator to replace the employed-bee phase of the ABC algorithm. The new algorithm is called Hybrid ABC/MBO (HAM). HAM is designed to improve the exploration-versus-exploitation balance of the original algorithms by increasing the diversity of the ABC search process with a modified operator from MBO. The resulting design contains three components: the first and third implement global search, while the second performs local search. The proposed algorithm was evaluated on 13 benchmark functions and compared with nine metaheuristic methods from swarm intelligence and evolutionary computing: ABC, MBO, ACO, PSO, GA, DE, ES, PBIL, and STUDGA. The experimental results show that HAM is clearly superior to the standard ABC and MBO algorithms, as well as to the other well-known algorithms, in terms of best value achieved and convergence speed. HAM is a promising metaheuristic technique to add to the repertory of optimization techniques at the disposal of researchers. The next step is to look into application fields for HAM.
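The abstract describes the hybrid's structure but not its exact operators; the skeleton below only mirrors that structure. The mutation step is a stand-in for MBO's butterfly adjusting operator, and the onlooker and scout phases are elided — everything here is an assumption for illustration, not the HAM algorithm itself.

```python
import random

def ham_skeleton(f, dim, pop_size=20, iters=100, bounds=(-5.0, 5.0)):
    """Structural skeleton of the described ABC/MBO hybrid, heavily
    simplified: the employed-bee phase is replaced by a mutation that
    mixes each solution with the current best (a stand-in for MBO's
    butterfly adjusting operator)."""
    lo, hi = bounds
    pop = [[random.uniform(lo, hi) for _ in range(dim)]
           for _ in range(pop_size)]
    fit = [f(x) for x in pop]
    for _ in range(iters):
        best = pop[min(range(pop_size), key=lambda i: fit[i])]
        for i in range(pop_size):
            # "butterfly adjusting"-style move toward the best solution
            cand = [xi + random.uniform(-1, 1) * (bi - xi)
                    for xi, bi in zip(pop[i], best)]
            cand = [min(hi, max(lo, c)) for c in cand]  # clamp to bounds
            fc = f(cand)
            if fc < fit[i]:            # greedy selection, as in ABC
                pop[i], fit[i] = cand, fc
        # onlooker phase (fitness-proportional exploitation) omitted
        # scout phase (re-seeding stagnant solutions) omitted
    i = min(range(pop_size), key=lambda i: fit[i])
    return pop[i], fit[i]
```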

47 citations

Journal ArticleDOI
TL;DR: The experimental results show that the proposed fuzzy least squares twin support vector clustering (F-LS-TWSVC) achieves comparable clustering accuracy to that of TWSVC with comparatively lesser computational time.
Abstract: In this paper, we have formulated a fuzzy least squares version of recently proposed clustering method, namely twin support vector clustering (TWSVC). Here, a fuzzy membership value of each data pattern to different cluster is optimized and is further used for assigning each data pattern to one or other cluster. The formulation leads to finding k cluster center planes by solving modified primal problem of TWSVC, instead of the dual problem usually solved. We show that the solution of the proposed algorithm reduces to solving a series of system of linear equations as opposed to solving series of quadratic programming problems along with system of linear equations as in TWSVC. The experimental results on several publicly available datasets show that the proposed fuzzy least squares twin support vector clustering (F-LS-TWSVC) achieves comparable clustering accuracy to that of TWSVC with comparatively lesser computational time. Further, we have given an application of F-LS-TWSVC for segmentation of color images.

31 citations

Journal Article
TL;DR: This paper surveys the various techniques used to cluster text documents based on keywords, phrases, and concepts, and reviews the performance measures used to evaluate cluster quality.
Abstract: Text mining has important applications in the areas of data mining and information retrieval. One of the important tasks in text mining is document clustering. Many existing document clustering techniques use the bag-of-words model to represent the content of a document. This model is only effective for grouping related documents when those documents share a large proportion of lexically equivalent terms; synonymy between related documents is ignored, which reduces the effectiveness of applications using a standard full-text document representation. This paper surveys the various techniques used to cluster text documents based on keywords, phrases, and concepts. It also covers the performance measures used to evaluate the quality of clusters. Keywords: Document Clustering, Latent Semantic Indexing, Vector Space Model, tf-idf, precision, recall, F-measure.
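Since the survey leans on the bag-of-words model with tf-idf weighting, a minimal standard tf-idf (the textbook formulation, not a method from the paper) looks like:

```python
import math
from collections import Counter

def tfidf(docs):
    """Plain tf-idf over tokenized documents:
    tf = term count / doc length, idf = log(N / df)."""
    N = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))          # document frequency of each term
    out = []
    for doc in docs:
        tf = Counter(doc)
        L = len(doc)
        out.append({t: (c / L) * math.log(N / df[t])
                    for t, c in tf.items()})
    return out

docs = [["data", "mining"], ["data", "clustering"], ["text", "clustering"]]
vecs = tfidf(docs)
```

Rarer terms receive higher weight: in the first document, "mining" (appearing in one document) outweighs the more common "data".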

23 citations

Journal ArticleDOI
TL;DR: The proposed clustering algorithm Tree-TWSVC has efficient learning time, achieved due to the tree structure and the formulation that leads to solving a series of systems of linear equations, and can efficiently handle large datasets and outperforms other TWSVM-based clustering methods.
Abstract: Twin support vector machine (TWSVM) is an efficient supervised learning algorithm proposed for classification problems. Motivated by its success, we propose tree-based localized fuzzy twin support vector clustering (Tree-TWSVC). Tree-TWSVC is a novel clustering algorithm that builds the cluster model as a binary tree, where each node comprises a proposed TWSVM-based classifier, termed localized fuzzy TWSVM (LF-TWSVM). Tree-TWSVC has efficient learning time, achieved thanks to the tree structure and a formulation that leads to solving a series of systems of linear equations. Tree-TWSVC delivers good clustering accuracy because of its square loss function and its nearest-neighbour-graph-based initialization method. The proposed algorithm restricts the cluster hyperplane from extending indefinitely by using a cluster prototype, which further improves its accuracy. It can efficiently handle large datasets and outperforms other TWSVM-based clustering methods. In this work, we propose two implementations of Tree-TWSVC: Binary Tree-TWSVC and One-against-all Tree-TWSVC. To prove the efficacy of the proposed method, experiments are performed on a number of benchmark UCI datasets. We also present an application of Tree-TWSVC as an image segmentation tool.

21 citations


Cites background or methods from "Local k-proximal plane clustering"

  • ...Plane-based clustering methods have been proposed such as K-plane clustering (KPC) [17] and local k-proximal plane clustering (LkPPC) [18] by Bradley et al....

    [...]

  • ...The variable vi (i = 1, 2) is the prototype [18] of the i cluster and prevents the cluster plane from extending infinitely and controls its localization, proximal to the cluster....

    [...]

Journal ArticleDOI
TL;DR: A novel plane-based clustering, called k-proximal plane clustering (kPPC), where each center plane is not only close to the objective data points but also far away from the others by solving several eigenvalue problems.
Abstract: Instead of clustering data points to cluster center points as in k-means, k-plane clustering (kPC) clusters data points to center planes. However, kPC considers only within-cluster data points. In this paper, we propose a novel plane-based clustering, called k-proximal plane clustering (kPPC). In kPPC, each center plane is not only close to its own cluster's data points but also far away from the others, obtained by solving several eigenvalue problems. The objective function of our kPPC incorporates information from both between-cluster and within-cluster data points. In addition, our kPPC is extended to the nonlinear case by the kernel trick. A deterministic strategy using a Laplacian graph is established in our kPPC to initialize the data points. The experiments conducted on several artificial and benchmark datasets show that the performance of our kPPC is much better than that of both kPC and k-means.
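One plausible reading of the eigenvalue formulation, sketched for a single cluster (the augmented-scatter construction below is an assumption for illustration, not the paper's exact objective): make the plane close to the cluster's own points and far from the others by minimizing a generalized Rayleigh quotient.

```python
import numpy as np

def kppc_plane(X_in, X_out, eps=1e-6):
    """One kPPC-style center plane w.x + b = 0: minimize within-cluster
    plane distances relative to the other clusters' distances, via a
    generalized eigenvalue problem on augmented scatter matrices."""
    def aug_scatter(X):
        A = np.hstack([X, np.ones((len(X), 1))])  # append 1 to absorb b
        return A.T @ A
    G = aug_scatter(X_in)                          # within-cluster scatter
    H = aug_scatter(X_out) + eps * np.eye(X_in.shape[1] + 1)  # regularized
    # minimize (u^T G u) / (u^T H u): smallest eigenpair of H^{-1} G
    vals, vecs = np.linalg.eig(np.linalg.solve(H, G))
    u = np.real(vecs[:, np.argmin(np.real(vals))])
    w, b = u[:-1], u[-1]
    nrm = np.linalg.norm(w)
    return w / nrm, b / nrm                        # unit-normal plane
```

In a full clustering loop this fit would be repeated per cluster inside the usual assign/update alternation.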

18 citations

References
Book
08 Sep 2000
TL;DR: This book presents dozens of algorithms and implementation examples, all in pseudo-code and suitable for use in real-world, large-scale data mining projects, and provides a comprehensive, practical look at the concepts and techniques you need to get the most out of real business data.
Abstract: The increasing volume of data in modern business and science calls for more complex and sophisticated tools. Although advances in data mining technology have made extensive data collection much easier, the field is still evolving and there is a constant need for new techniques and tools that can help us transform this data into useful information and knowledge. Since the previous edition's publication, great advances have been made in the field of data mining. Not only does the third edition of Data Mining: Concepts and Techniques continue the tradition of equipping you with an understanding and application of the theory and practice of discovering patterns hidden in large data sets, it also focuses on new, important topics in the field: data warehouses and data cube technology, mining streams, mining social networks, and mining spatial, multimedia and other complex data. Each chapter is a stand-alone guide to a critical topic, presenting proven algorithms and sound implementations ready to be used directly or with strategic modification against live data. This is the resource you need if you want to apply today's most powerful data mining techniques to meet real business challenges. * Presents dozens of algorithms and implementation examples, all in pseudo-code and suitable for use in real-world, large-scale data mining projects. * Addresses advanced topics such as mining object-relational databases, spatial databases, multimedia databases, time-series databases, text databases, the World Wide Web, and applications in several fields. * Provides a comprehensive, practical look at the concepts and techniques you need to get the most out of real business data.

23,600 citations

Journal ArticleDOI
TL;DR: An overview of pattern clustering methods from a statistical pattern recognition perspective is presented, with a goal of providing useful advice and references to fundamental concepts accessible to the broad community of clustering practitioners.
Abstract: Clustering is the unsupervised classification of patterns (observations, data items, or feature vectors) into groups (clusters). The clustering problem has been addressed in many contexts and by researchers in many disciplines; this reflects its broad appeal and usefulness as one of the steps in exploratory data analysis. However, clustering is a combinatorially difficult problem, and differences in assumptions and contexts across communities have made the transfer of useful generic concepts and methodologies slow to occur. This paper presents an overview of pattern clustering methods from a statistical pattern recognition perspective, with the goal of providing useful advice and references to fundamental concepts accessible to the broad community of clustering practitioners. We present a taxonomy of clustering techniques and identify cross-cutting themes and recent advances. We also describe some important applications of clustering algorithms, such as image segmentation, object recognition, and information retrieval.

14,054 citations

01 Jan 1998

12,940 citations


"Local k-proximal plane clustering" refers background in this paper

  • ...In order to evaluate our LkPPC, we investigate its clustering accuracies and computational efficiencies on two artificial datasets and twelve benchmark datasets [40]....

    [...]

Journal Article
TL;DR: A set of simple, yet safe and robust non-parametric tests for statistical comparisons of classifiers is recommended: the Wilcoxon signed ranks test for comparison of two classifiers and the Friedman test with the corresponding post-hoc tests for comparisons of more classifiers over multiple data sets.
Abstract: While methods for comparing two learning algorithms on a single data set have been scrutinized for quite some time already, the issue of statistical tests for comparisons of more algorithms on multiple data sets, which is even more essential to typical machine learning studies, has been all but ignored. This article reviews the current practice and then theoretically and empirically examines several suitable tests. Based on that, we recommend a set of simple, yet safe and robust non-parametric tests for statistical comparisons of classifiers: the Wilcoxon signed ranks test for comparison of two classifiers and the Friedman test with the corresponding post-hoc tests for comparison of more classifiers over multiple data sets. Results of the latter can also be neatly presented with the newly introduced CD (critical difference) diagrams.
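The recommended Wilcoxon signed-ranks test is driven by a simple statistic; here is a pure-Python sketch of computing W over paired per-dataset scores (the significance lookup is omitted; in practice scipy.stats.wilcoxon handles the whole test):

```python
def wilcoxon_W(x, y):
    """Wilcoxon signed-rank statistic W (the smaller of the two
    signed-rank sums) for comparing two classifiers over multiple
    datasets. Zero differences are dropped; tied |d| get average ranks."""
    d = [a - b for a, b in zip(x, y) if a != b]
    order = sorted(range(len(d)), key=lambda i: abs(d[i]))
    ranks = [0.0] * len(d)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        avg = (i + j) / 2 + 1          # average rank for the tied group
        for t in range(i, j + 1):
            ranks[order[t]] = avg
        i = j + 1
    r_plus = sum(r for r, di in zip(ranks, d) if di > 0)
    r_minus = sum(r for r, di in zip(ranks, d) if di < 0)
    return min(r_plus, r_minus)
```

A small W means the differences are consistently in one direction; the statistic is then compared against the Wilcoxon critical-value table (or a normal approximation) for the chosen significance level.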

10,306 citations

Journal ArticleDOI
TL;DR: In this article, the authors present the most common spectral clustering algorithms, and derive those algorithms from scratch by several different approaches, and discuss the advantages and disadvantages of these algorithms.
Abstract: In recent years, spectral clustering has become one of the most popular modern clustering algorithms. It is simple to implement, can be solved efficiently by standard linear algebra software, and very often outperforms traditional clustering algorithms such as the k-means algorithm. At first glance spectral clustering appears slightly mysterious, and it is not obvious why it works at all or what it really does. The goal of this tutorial is to give some intuition on these questions. We describe different graph Laplacians and their basic properties, present the most common spectral clustering algorithms, and derive those algorithms from scratch by several different approaches. Advantages and disadvantages of the different spectral clustering algorithms are discussed.
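As a minimal concrete instance of the tutorial's recipe (textbook unnormalized spectral clustering for k = 2; the Gaussian similarity and `sigma` are illustrative choices, not the tutorial's only variants):

```python
import numpy as np

def spectral_bipartition(X, sigma=1.0):
    """Unnormalized spectral clustering for two clusters: build a
    Gaussian similarity graph, form the Laplacian L = D - W, and split
    by the sign of the Fiedler vector (second-smallest eigenvector)."""
    sq = ((X[:, None, :] - X[None]) ** 2).sum(-1)   # pairwise sq. distances
    W = np.exp(-sq / (2 * sigma ** 2))               # fully connected graph
    np.fill_diagonal(W, 0.0)
    L = np.diag(W.sum(1)) - W                        # unnormalized Laplacian
    vals, vecs = np.linalg.eigh(L)                   # ascending eigenvalues
    fiedler = vecs[:, 1]
    return (fiedler > 0).astype(int)
```

For general k, one instead takes the first k eigenvectors as an embedding and runs k-means on its rows, as the tutorial describes.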

9,141 citations