
Showing papers on "Fuzzy clustering published in 2013"


Journal ArticleDOI
TL;DR: In this article, a sparse subspace clustering algorithm is proposed to cluster high-dimensional data points that lie in a union of low-dimensional subspaces, where a sparse representation corresponds to selecting a few points from the same subspace.
Abstract: Many real-world problems deal with collections of high-dimensional data, such as images, videos, text, and web documents, DNA microarray data, and more. Often, such high-dimensional data lie close to low-dimensional structures corresponding to several classes or categories to which the data belong. In this paper, we propose and study an algorithm, called sparse subspace clustering, to cluster data points that lie in a union of low-dimensional subspaces. The key idea is that, among the infinitely many possible representations of a data point in terms of other points, a sparse representation corresponds to selecting a few points from the same subspace. This motivates solving a sparse optimization program whose solution is used in a spectral clustering framework to infer the clustering of the data into subspaces. Since solving the sparse optimization program is in general NP-hard, we consider a convex relaxation and show that, under appropriate conditions on the arrangement of the subspaces and the distribution of the data, the proposed minimization program succeeds in recovering the desired sparse representations. The proposed algorithm is efficient and can handle data points near the intersections of subspaces. Another key advantage of the proposed algorithm with respect to the state of the art is that it can deal directly with data nuisances, such as noise, sparse outlying entries, and missing entries, by incorporating the model of the data into the sparse optimization program. We demonstrate the effectiveness of the proposed algorithm through experiments on synthetic data as well as the two real-world problems of motion segmentation and face clustering.
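The key idea above — a sparse representation selects a few points from the same subspace — can be illustrated with a deliberately tiny sketch. This toy restricts each point to a 1-sparse representation (its most parallel neighbour) and reads clusters off as connected components, whereas the actual algorithm solves an l1 program and runs spectral clustering; all names here are illustrative.

```python
import math

def sparse_subspace_clusters(points):
    """Toy 1-sparse variant of sparse subspace clustering: each point is
    'represented' by the single other point most parallel to it, then
    clusters are read off as connected components of the resulting graph.
    (The real SSC solves an l1 program and applies spectral clustering.)"""
    n = len(points)

    def cos2(a, b):
        # squared cosine similarity: 1 for points on the same line through 0
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        return (dot / (na * nb)) ** 2

    # Connect each point to its most parallel other point.
    adj = [set() for _ in range(n)]
    for i in range(n):
        j = max((j for j in range(n) if j != i),
                key=lambda j: cos2(points[i], points[j]))
        adj[i].add(j)
        adj[j].add(i)

    # Connected components = subspace clusters.
    labels, comp = [-1] * n, 0
    for s in range(n):
        if labels[s] == -1:
            stack = [s]
            while stack:
                v = stack.pop()
                if labels[v] == -1:
                    labels[v] = comp
                    stack.extend(adj[v])
            comp += 1
    return labels

# Two 1-D subspaces (the x-axis and the y-axis) in R^2.
pts = [(1, 0), (2, 0), (3, 0), (0, 1), (0, 2), (0, 3)]
print(sparse_subspace_clusters(pts))  # first three points share one label, last three another
```

Points on the same line have squared cosine 1 with each other and 0 across lines, so the sparsest representation never crosses subspaces — the property the paper's convex relaxation recovers under its stated conditions.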

2,298 citations


Book ChapterDOI
14 Apr 2013
TL;DR: This work proposes a theoretically and practically improved density-based, hierarchical clustering method, providing a clustering hierarchy from which a simplified tree of significant clusters can be constructed, and proposes a novel cluster stability measure.
Abstract: We propose a theoretically and practically improved density-based, hierarchical clustering method, providing a clustering hierarchy from which a simplified tree of significant clusters can be constructed. For obtaining a “flat” partition consisting of only the most significant clusters (possibly corresponding to different density thresholds), we propose a novel cluster stability measure, formalize the problem of maximizing the overall stability of selected clusters, and formulate an algorithm that computes an optimal solution to this problem. We demonstrate that our approach outperforms the current, state-of-the-art, density-based clustering methods on a wide variety of real world data.

1,132 citations


01 Jan 2013
TL;DR: Six different approaches to determining the right number of clusters in a dataset are explored, addressing the cluster number selection problem with the k-means method, a simple and fast clustering technique.
Abstract: Clustering is widely used in fields such as biology, psychology, and economics. Because the result of clustering varies as the number-of-clusters parameter changes, the main challenge of cluster analysis is that the number of clusters (or the number of model parameters) is seldom known, and it must be determined before clustering. Several clustering algorithms have been proposed; among them, the k-means method is a simple and fast clustering technique. We address the problem of cluster number selection using a k-means approach. One can ask end users to provide the number of clusters in advance, but this is not feasible, as it requires domain knowledge of each data set. Many methods are available to estimate the number of clusters, such as statistical indices, variance-based methods, information-theoretic criteria, and goodness-of-fit measures. The paper explores six different approaches to determine the right number of clusters in a dataset.
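One of the classical variance-based approaches the paper alludes to is the "elbow" heuristic: run k-means for increasing k and watch where the within-cluster sum of squared errors stops dropping sharply. A minimal 1-D sketch (the `kmeans_1d` helper and its naive spread-out initialisation are illustrative, not from the paper):

```python
def kmeans_1d(data, k, iters=20):
    """Plain Lloyd k-means on 1-D data; returns (centers, sse)."""
    # Naive init: pick k points spread across the sorted data.
    centers = sorted(data)[:: max(1, len(data) // k)][:k]
    for _ in range(iters):
        groups = [[] for _ in centers]
        for x in data:
            i = min(range(len(centers)), key=lambda i: (x - centers[i]) ** 2)
            groups[i].append(x)
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    sse = sum(min((x - c) ** 2 for c in centers) for x in data)
    return centers, sse

# Three well-separated groups around 1, 5 and 9.
data = [1.0, 1.1, 0.9, 5.0, 5.2, 4.8, 9.0, 9.1, 8.9]
sses = [kmeans_1d(data, k)[1] for k in range(1, 6)]
# The "elbow": SSE falls steeply until k reaches the true cluster count (3),
# then flattens out.
print(sses)
```

Reading the printed list, the drop from k=2 to k=3 is orders of magnitude larger than from k=3 to k=4, which is exactly the signal the elbow heuristic exploits.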

927 citations


BookDOI
21 Aug 2013
TL;DR: Top researchers from around the world explore the characteristics of clustering problems in a variety of application areas and explain how to glean detailed insight from the clustering process including how to verify the quality of the underlying cluster through supervision, human intervention, or the automated generation of alternative clusters.
Abstract: Research on the problem of clustering tends to be fragmented across the pattern recognition, database, data mining, and machine learning communities. Addressing this problem in a unified way, Data Clustering: Algorithms and Applications provides complete coverage of the entire area of clustering, from basic methods to more refined and complex data clustering approaches. It pays special attention to recent issues in graphs, social networks, and other domains. The book focuses on three primary aspects of data clustering: Methods, describing key techniques commonly used for clustering, such as feature selection, agglomerative clustering, partitional clustering, density-based clustering, probabilistic clustering, grid-based clustering, spectral clustering, and nonnegative matrix factorization; Domains, covering methods used for different domains of data, such as categorical data, text data, multimedia data, graph data, biological data, stream data, uncertain data, time series clustering, high-dimensional clustering, and big data; and Variations and Insights, discussing important variations of the clustering process, such as semisupervised clustering, interactive clustering, multiview clustering, cluster ensembles, and cluster validation. In this book, top researchers from around the world explore the characteristics of clustering problems in a variety of application areas. They also explain how to glean detailed insight from the clustering process, including how to verify the quality of the underlying clusters, through supervision, human intervention, or the automated generation of alternative clusters.

759 citations


Journal ArticleDOI
Maoguo Gong1, Yan Liang1, Jiao Shi1, Wenping Ma1, Jingjing Ma1 
TL;DR: An improved fuzzy C-means (FCM) algorithm for image segmentation is presented by introducing a tradeoff weighted fuzzy factor and a kernel metric; experimental results show that the new algorithm is effective and efficient, and is relatively independent of the type of noise.
Abstract: In this paper, we present an improved fuzzy C-means (FCM) algorithm for image segmentation by introducing a tradeoff weighted fuzzy factor and a kernel metric. The tradeoff weighted fuzzy factor depends on the space distance of all neighboring pixels and their gray-level difference simultaneously. By using this factor, the new algorithm can accurately estimate the damping extent of neighboring pixels. In order to further enhance its robustness to noise and outliers, we introduce a kernel distance measure to its objective function. The new algorithm adaptively determines the kernel parameter by using a fast bandwidth selection rule based on the distance variance of all data points in the collection. Furthermore, the tradeoff weighted fuzzy factor and the kernel distance measure are both parameter free. Experimental results on synthetic and real images show that the new algorithm is effective and efficient, and is relatively independent of the type of noise.
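For reference, the baseline this paper improves on is standard FCM, which alternates a membership update and a weighted-mean center update. A minimal 1-D sketch of that baseline (no spatial factor, no kernel metric; the naive min/max initialisation is an assumption for the demo):

```python
def fuzzy_c_means(data, c, m=2.0, iters=50):
    """Standard FCM on 1-D data: alternate membership and center updates.
    Returns (centers, memberships). This is the plain baseline, without the
    paper's tradeoff weighted fuzzy factor or kernel metric."""
    # Naive init: for c=2 start at the data extremes, else the first c points.
    centers = [min(data), max(data)] if c == 2 else list(data[:c])
    u = [[0.0] * c for _ in data]
    for _ in range(iters):
        # Membership update: u_ik = 1 / sum_j (d_ik / d_ij)^(2/(m-1))
        for i, x in enumerate(data):
            d = [abs(x - ck) + 1e-12 for ck in centers]
            for k in range(c):
                u[i][k] = 1.0 / sum((d[k] / dj) ** (2.0 / (m - 1.0)) for dj in d)
        # Center update: mean of the data weighted by u^m
        centers = [
            sum(u[i][k] ** m * x for i, x in enumerate(data))
            / sum(u[i][k] ** m for i in range(len(data)))
            for k in range(c)
        ]
    return centers, u

data = [0.0, 0.1, 0.2, 0.9, 1.0, 1.1]
centers, u = fuzzy_c_means(data, 2)
print(sorted(centers))  # roughly one center near 0.1 and one near 1.0
```

The paper's contribution plugs a neighbourhood-dependent factor and a kernel-induced distance into this same alternating scheme, which is why the baseline structure is worth keeping in mind.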

546 citations


Proceedings Article
03 Aug 2013
TL;DR: This paper proposes a new robust large-scale multi-view clustering method to integrate heterogeneous representations of large-scale data, and evaluates it on six benchmark data sets against several commonly used clustering approaches as well as baseline multi-view clustering methods.
Abstract: In the past decade, more and more data are collected from multiple sources or represented by multiple views, where different views describe distinct perspectives of the data. Although each view could be used individually for finding patterns by clustering, the clustering performance can be improved by exploring the rich information among multiple views. Several multi-view clustering methods have been proposed to integrate different views of data in an unsupervised manner. However, they are graph-based approaches, e.g., based on spectral clustering, and thus cannot handle large-scale data. How to combine these heterogeneous features for unsupervised large-scale data clustering has become a challenging problem. In this paper, we propose a new robust large-scale multi-view clustering method to integrate heterogeneous representations of large-scale data. We evaluate the proposed method on six benchmark data sets and compare its performance with several commonly used clustering approaches as well as baseline multi-view clustering methods. In all experimental results, our proposed method consistently achieves superior clustering performance.

471 citations


Journal ArticleDOI
TL;DR: Interval-valued HFSs and the corresponding correlation coefficient formulas are developed, and their application to clustering with interval-valued hesitant fuzzy information is demonstrated through a specific numerical example.

449 citations


Journal ArticleDOI
TL;DR: Two important clustering algorithms namely centroid based K-means and representative object based FCM (Fuzzy C-Means) clustering algorithm are compared and performance is evaluated on the basis of the efficiency of clustering output.
Abstract: In the software arena, data mining technology has been considered a useful means for identifying patterns and trends in large volumes of data. The approach is basically used to extract unknown patterns from large data sets for business as well as real-time applications. It is a computational intelligence discipline that has emerged as a valuable tool for data analysis, new knowledge discovery, and autonomous decision making. Raw, unlabeled data from a large dataset can initially be classified in an unsupervised fashion by using cluster analysis, i.e., clustering: the assignment of a set of observations into clusters so that observations in the same cluster may in some sense be treated as similar. The outcome of the clustering process and the efficiency of its domain application are generally determined through algorithms, and various algorithms are used to solve this problem. In this research work, two important clustering algorithms, namely centroid-based K-Means and representative-object-based FCM (Fuzzy C-Means), are compared. These algorithms are applied and their performance is evaluated on the basis of the efficiency of the clustering output. The number of data points as well as the number of clusters are the factors upon which the behaviour patterns of both algorithms are analyzed. FCM produces results close to K-Means clustering but still requires more computation time than K-Means clustering. Keywords—clustering; k-means; fuzzy c-means; time complexity

408 citations


Journal ArticleDOI
TL;DR: It is shown that, under mild conditions, spectral clustering applied to the adjacency matrix of the network can consistently recover hidden communities even when the order of the maximum expected degree is as small as $\log n$ with $n$ the number of nodes.
Abstract: We analyze the performance of spectral clustering for community extraction in stochastic block models. We show that, under mild conditions, spectral clustering applied to the adjacency matrix of the network can consistently recover hidden communities even when the order of the maximum expected degree is as small as $\log n$, with $n$ the number of nodes. This result applies to some popular polynomial time spectral clustering algorithms and is further extended to degree corrected stochastic block models using a spherical $k$-median spectral clustering method. A key component of our analysis is a combinatorial bound on the spectrum of binary random matrices, which is sharper than the conventional matrix Bernstein inequality and may be of independent interest.
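The recovery result above can be reproduced on a toy scale: sample a two-community stochastic block model, take the eigendecomposition of the adjacency matrix, and split nodes by the sign pattern of the second-largest eigenvector (the parameters and seed below are illustrative, chosen well inside the recoverable regime):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p_in, p_out = 60, 0.9, 0.05       # two communities of 30 nodes each
truth = np.array([0] * 30 + [1] * 30)

# Sample a symmetric SBM adjacency matrix with no self-loops.
P = np.where(truth[:, None] == truth[None, :], p_in, p_out)
A = (rng.random((n, n)) < P).astype(float)
A = np.triu(A, 1)
A = A + A.T

# Spectral clustering on the adjacency matrix itself (as in the paper):
# the sign pattern of the second-largest eigenvector splits the communities.
vals, vecs = np.linalg.eigh(A)
labels = (vecs[:, -2] >= 0).astype(int)

# Agreement with the planted partition, up to label swapping.
agreement = max((labels == truth).mean(), (labels != truth).mean())
print(agreement)
```

With this separation the agreement is essentially perfect; the paper's contribution is showing that consistency survives down to maximum expected degree of order log n.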

386 citations


Journal ArticleDOI
01 Apr 2013
TL;DR: A fuzzy energy-aware unequal clustering algorithm (EAUCF) that addresses the hot spots problem is introduced and compared with popular clustering algorithms from the literature, namely Low Energy Adaptive Clustering Hierarchy, Cluster-Head Election Mechanism using Fuzzy Logic, and Energy-Efficient Unequal Clustering.
Abstract: In order to gather information more efficiently in terms of energy consumption, wireless sensor networks (WSNs) are partitioned into clusters. In clustered WSNs, each sensor node sends its collected data to the head of the cluster that it belongs to. The cluster-heads are responsible for aggregating the collected data and forwarding it to the base station through other cluster-heads in the network. This leads to a situation known as the hot spots problem where cluster-heads that are closer to the base station tend to die earlier because of the heavy traffic they relay. In order to solve this problem, unequal clustering algorithms generate clusters of different sizes. In WSNs that are clustered with unequal clustering, the clusters close to the base station have smaller sizes than clusters far from the base station. In this paper, a fuzzy energy-aware unequal clustering algorithm (EAUCF), that addresses the hot spots problem, is introduced. EAUCF aims to decrease the intra-cluster work of the cluster-heads that are either close to the base station or have low remaining battery power. A fuzzy logic approach is adopted in order to handle uncertainties in cluster-head radius estimation. The proposed algorithm is compared with some popular clustering algorithms in the literature, namely Low Energy Adaptive Clustering Hierarchy, Cluster-Head Election Mechanism using Fuzzy Logic and Energy-Efficient Unequal Clustering. The experiment results show that EAUCF performs better than the other algorithms in terms of first node dies, half of the nodes alive and energy-efficiency metrics in all scenarios. Therefore, EAUCF is a stable and energy-efficient clustering algorithm to be utilized in any WSN application.
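The core mechanism — shrinking a cluster-head's competition radius when it is close to the base station or low on energy, via fuzzy rules — can be sketched with a small Mamdani-style rule base. The membership functions, rule weights, and `r_max` below are invented for illustration and are not EAUCF's actual rule base:

```python
def tri(x, a, b, c):
    """Triangular membership function with peak at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def competition_radius(dist_ratio, energy_ratio, r_max=25.0):
    """Illustrative fuzzy rules in the spirit of EAUCF (not the paper's
    exact rule base): nodes close to the base station or with low remaining
    energy get a smaller cluster-head competition radius.
    Inputs are normalised to [0, 1]."""
    close = tri(dist_ratio, -0.5, 0.0, 0.6)
    far = tri(dist_ratio, 0.4, 1.0, 1.5)
    low = tri(energy_ratio, -0.5, 0.0, 0.6)
    high = tri(energy_ratio, 0.4, 1.0, 1.5)
    # Rule strength -> radius level (as a fraction of r_max).
    rules = [
        (min(close, low), 0.3),   # close & low energy  -> small radius
        (min(close, high), 0.5),
        (min(far, low), 0.6),
        (min(far, high), 1.0),    # far & high energy   -> full radius
    ]
    total = sum(w for w, _ in rules) or 1.0
    return r_max * sum(w * lvl for w, lvl in rules) / total

print(competition_radius(0.9, 0.9))  # far from BS, fresh battery: large radius
print(competition_radius(0.1, 0.2))  # near BS, drained battery: small radius
```

Smaller radii near the base station mean smaller clusters there, leaving those cluster-heads energy to spare for relaying traffic, which is how unequal clustering mitigates the hot spots problem.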

292 citations


Journal ArticleDOI
TL;DR: It is demonstrated in this paper that PCCA+ always delivers an optimal fuzzy clustering for nearly uncoupled, not necessarily reversible, Markov chains with transition states.
Abstract: Given a row-stochastic matrix describing pairwise similarities between data objects, spectral clustering makes use of the eigenvectors of this matrix to perform dimensionality reduction for clustering in fewer dimensions. One example from this class of algorithms is the Robust Perron Cluster Analysis (PCCA+), which delivers a fuzzy clustering. Originally developed for clustering the state space of Markov chains, the method became popular as a versatile tool for general data classification problems. The robustness of PCCA+, however, cannot be explained by previous perturbation results, because the matrices in typical applications do not comply with the two main requirements: reversibility and nearly decomposability. We therefore demonstrate in this paper that PCCA+ always delivers an optimal fuzzy clustering for nearly uncoupled, not necessarily reversible, Markov chains with transition states.

Journal ArticleDOI
TL;DR: A new internal clustering validation measure, named clustering validation index based on nearest neighbors (CVNN), is proposed; built on the notion of nearest neighbors, it can dynamically select multiple objects as representatives for different clusters in different situations.
Abstract: Clustering validation has long been recognized as one of the vital issues essential to the success of clustering applications. In general, clustering validation can be categorized into two classes, external clustering validation and internal clustering validation. In this paper, we focus on internal clustering validation and present a study of 11 widely used internal clustering validation measures for crisp clustering. The results of this study indicate that these existing measures have certain limitations in different application scenarios. As an alternative choice, we propose a new internal clustering validation measure, named clustering validation index based on nearest neighbors (CVNN), which is based on the notion of nearest neighbors. This measure can dynamically select multiple objects as representatives for different clusters in different situations. Experimental results show that CVNN outperforms the existing measures on both synthetic data and real-world data in different application scenarios.

Journal ArticleDOI
TL;DR: A fuzzy c-means clustering hybrid approach that combines support vector regression and a genetic algorithm yields sufficient and sensible imputation performance results.

Journal ArticleDOI
TL;DR: The application of a philosophy of cluster analysis to economic data from the 2007 US Survey of Consumer Finances demonstrates techniques and decisions required to obtain an interpretable clustering, and the clustering is shown to be significantly more structured than a suitable null model.
Abstract: Data with mixed type (metric/ordinal/nominal) variables are typical for social stratification, i.e., partitioning a population into social classes. Approaches to cluster such data are compared, namely a latent class mixture model assuming local independence and dissimilarity based methods such as k-medoids. The design of an appropriate dissimilarity measure and the estimation of the number of clusters are discussed as well, comparing the BIC with dissimilarity based criteria. The comparison is based on a philosophy of cluster analysis that connects the problem of a choice of a suitable clustering method closely to the application by considering direct interpretations of the implications of the methodology. According to this philosophy, model assumptions serve to understand such implications but are not taken to be true. It is emphasised that researchers implicitly define the “true” clustering and number of clusters by the choice of a particular methodology. The researcher has to take the responsibility to specify the criteria on which such a comparison can be made. The application of this philosophy to socioeconomic data from the 2007 US Survey of Consumer Finances demonstrates some techniques to obtain an interpretable clustering in an ambiguous situation.
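Dissimilarity design for mixed metric/ordinal/nominal variables, discussed above as a key modelling decision, is commonly handled with a Gower-style coefficient: range-normalised absolute differences for numeric variables, simple mismatch for nominal ones. A sketch of that standard construction (the variables and records are made up; the paper itself weighs such design choices rather than prescribing this one):

```python
def gower(x, y, kinds, ranges):
    """Gower-style dissimilarity for mixed-type records.
    kinds[i] is 'num' (metric/ordinal, range-normalised absolute difference)
    or 'cat' (nominal, 0/1 mismatch); ranges[i] is the variable's range
    for 'num' entries and is ignored for 'cat'."""
    parts = []
    for xi, yi, kind, r in zip(x, y, kinds, ranges):
        if kind == "num":
            parts.append(abs(xi - yi) / r)
        else:
            parts.append(0.0 if xi == yi else 1.0)
    return sum(parts) / len(parts)

kinds = ["num", "cat", "num"]          # e.g. income, occupation, education level
ranges = [100.0, None, 10.0]
a = (50.0, "clerical", 4)
b = (70.0, "clerical", 6)
print(gower(a, b, kinds, ranges))
```

How each variable type is encoded, and how the parts are weighted, implicitly defines what counts as a "true" social class here, which is precisely the responsibility the paper's philosophy places on the researcher.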

Proceedings ArticleDOI
01 Dec 2013
TL;DR: A method that learns the projection of data and finds the sparse coefficients in the low-dimensional latent space and applies spectral clustering to a similarity matrix built from these sparse coefficients.
Abstract: We propose a novel algorithm called Latent Space Sparse Subspace Clustering for simultaneous dimensionality reduction and clustering of data lying in a union of subspaces. Specifically, we describe a method that learns the projection of data and finds the sparse coefficients in the low-dimensional latent space. Cluster labels are then assigned by applying spectral clustering to a similarity matrix built from these sparse coefficients. An efficient optimization method is proposed and its non-linear extensions based on the kernel methods are presented. One of the main advantages of our method is that it is computationally efficient as the sparse coefficients are found in the low-dimensional latent space. Various experiments show that the proposed method performs better than the competitive state-of-the-art subspace clustering methods.

Journal ArticleDOI
24 Jan 2013-Energies
TL;DR: This paper checks the effect of similarity measures when clustering is applied to discover representatives in cases where correlation is supposed to be an important factor to consider, e.g., time series.
Abstract: Forecasting and modeling building energy profiles require tools able to discover patterns within large amounts of collected information. Clustering is the main technique used to partition data into groups based on internal and a priori unknown schemes inherent to the data. The adjustment and parameterization of the whole clustering task is complex and subject to several uncertainties, the similarity metric being one of the first decisions to be made in order to establish how the distance between two independent vectors must be measured. The present paper checks the effect of similarity measures in the application of clustering for discovering representatives in cases where correlation is supposed to be an important factor to consider, e.g., time series. This is a necessary step for the optimized design and development of efficient clustering-based models, predictors and controllers of time-dependent processes, e.g., building energy consumption patterns. In addition, clustered-vector balance is proposed as a validation technique to compare clustering performances.
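The point about correlation-aware similarity can be made concrete: a correlation-based dissimilarity (1 - Pearson r) treats two load profiles with the same shape but different magnitudes as near-identical, while Euclidean distance keeps them far apart. A minimal sketch (the example profiles are invented):

```python
import math

def pearson_distance(a, b):
    """Correlation-based dissimilarity d = 1 - r: profiles with the same
    shape but different magnitudes come out 'close'."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return 1.0 - cov / (sa * sb)

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

day1 = [1.0, 2.0, 3.0, 2.0, 1.0]       # a daily load profile
day2 = [10.0, 20.0, 30.0, 20.0, 10.0]  # same shape, 10x the magnitude
print(pearson_distance(day1, day2))    # ~0: identical shape
print(euclidean(day1, day2))           # large: magnitudes differ
```

Which of the two behaviours is desirable depends on whether cluster representatives should capture consumption *patterns* or absolute consumption *levels* — exactly the choice the paper evaluates.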

Journal ArticleDOI
01 Feb 2013
TL;DR: This tutorial presents a simple yet powerful data clustering technique, k-means, through three different algorithms: the Forgy/Lloyd algorithm, the MacQueen algorithm, and the Hartigan & Wong algorithm, together with an implementation in Mathematica.
Abstract: Data clustering techniques are valuable tools for researchers working with large databases of multivariate data. In this tutorial, we present a simple yet powerful one: the k-means clustering technique, through three different algorithms: the Forgy/Lloyd algorithm, the MacQueen algorithm, and the Hartigan & Wong algorithm. We then present an implementation in Mathematica and various examples of the different options available to illustrate the application of the technique. Data clustering techniques are descriptive data analysis techniques that can be applied to multivariate data sets to uncover the structure present in the data. They are particularly useful when classical second-order statistics (the sample mean and covariance) cannot be used. Namely, in exploratory data analysis, one of the assumptions made is that no prior knowledge about the dataset, and therefore the dataset's distribution, is available. In such a situation, data clustering can be a valuable tool. Data clustering is a form of unsupervised classification, as the clusters are formed by evaluating similarities and dissimilarities of intrinsic characteristics between different cases, and the grouping of cases is based on those emergent similarities and not on an external criterion. These techniques are also useful for datasets of any dimensionality over three, as it is very difficult for humans to reliably compare items of such complexity without support to aid the comparison.
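The tutorial's implementation is in Mathematica; for readers who want the Forgy/Lloyd variant in a general-purpose language, here is a compact Python sketch: Forgy initialisation (k random data points as centers), then alternating assignment and mean-update steps until the partition stabilises:

```python
import random

def lloyd_kmeans(points, k, iters=20, seed=0):
    """Forgy/Lloyd k-means: Forgy initialisation (k random data points as
    centers), then alternate assignment and mean-update steps."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)               # Forgy initialisation
    for _ in range(iters):
        # Assignment step: each point joins its nearest center's cluster.
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k),
                    key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centers[i])))
            clusters[i].append(p)
        # Update step: each center moves to its cluster's mean.
        centers = [tuple(sum(c) / len(cl) for c in zip(*cl)) if cl else centers[i]
                   for i, cl in enumerate(clusters)]
    return centers

pts = [(0, 0), (0, 1), (1, 0), (9, 9), (9, 10), (10, 9)]
print(sorted(lloyd_kmeans(pts, 2)))  # one center per well-separated group
```

MacQueen's variant updates the affected center immediately after each single-point assignment, and Hartigan & Wong moves a point only when doing so lowers the total within-cluster sum of squares; both share the skeleton above.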

Proceedings ArticleDOI
23 Jun 2013
TL;DR: An out-of-sample extension of SSC, named Scalable Sparse Subspace Clustering (SSSC), is proposed, which makes SSC feasible for clustering large-scale data sets; experiments demonstrate the effectiveness and efficiency of the method compared with state-of-the-art algorithms.
Abstract: In this paper, we address two problems in the Sparse Subspace Clustering algorithm (SSC), i.e., the scalability issue and the out-of-sample problem. SSC constructs a sparse similarity graph for spectral clustering using l1-minimization based coefficients, and has achieved state-of-the-art results for image clustering and motion segmentation. However, the time complexity of SSC is proportional to the cube of the problem size, so it is inefficient to apply SSC in large-scale settings. Moreover, SSC cannot handle out-of-sample data that were not used to construct the similarity graph. For each new datum, SSC needs to recalculate the cluster memberships of the whole data set, which makes SSC uncompetitive for fast online clustering. To address these problems, this paper proposes an out-of-sample extension of SSC, named Scalable Sparse Subspace Clustering (SSSC), which makes SSC feasible for clustering large-scale data sets. The solution of SSSC adopts a "sampling, clustering, coding, and classifying" strategy. Extensive experimental results on several popular data sets demonstrate the effectiveness and efficiency of our method compared with the state-of-the-art algorithms.
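The "sampling, clustering, coding, and classifying" strategy can be sketched in miniature: cluster only a small in-sample subset, then assign every remaining (out-of-sample) point without re-running the expensive step. Here plain k-means and nearest-center assignment stand in for SSC and sparse coding, which is a deliberate simplification of the paper's method:

```python
import random

def scalable_cluster(points, k, sample_size, seed=0):
    """Sketch of SSSC's 'sampling, clustering, coding, classifying' strategy.
    The real SSSC runs SSC on the sample and classifies out-of-sample points
    via sparse coding; k-means and nearest-center assignment stand in here."""
    rng = random.Random(seed)
    sample = rng.sample(points, sample_size)      # 1. sampling

    def dist2(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    centers = sample[:k]                          # 2. clustering (on the sample only)
    for _ in range(20):
        groups = [[] for _ in range(k)]
        for p in sample:
            groups[min(range(k), key=lambda i: dist2(p, centers[i]))].append(p)
        centers = [tuple(sum(c) / len(g) for c in zip(*g)) if g else centers[i]
                   for i, g in enumerate(groups)]

    # 3./4. coding + classifying, collapsed into nearest-center assignment:
    # every point (in-sample or not) is labelled without reclustering.
    return [min(range(k), key=lambda i: dist2(p, centers[i])) for p in points]

blobs = [(0, 0), (1, 0), (0, 1), (1, 1), (10, 10), (11, 10), (10, 11), (11, 11)]
print(scalable_cluster(blobs, 2, sample_size=6))
```

The expensive step runs only on the sample, so its cubic cost applies to the sample size rather than the full data set — the source of SSSC's scalability.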

Book ChapterDOI
20 Nov 2013
TL;DR: This paper proposes a new clustering method based on the auto-encoder network, which can learn a highly non-linear mapping function and can obtain stable and effective clustering.
Abstract: Linear or non-linear data transformations are widely used processing techniques in clustering. Usually, they are beneficial to enhancing data representation. However, if data have a complex structure, these techniques would be unsatisfying for clustering. In this paper, based on the auto-encoder network, which can learn a highly non-linear mapping function, we propose a new clustering method. Via simultaneously considering data reconstruction and compactness, our method can obtain stable and effective clustering. Experiments on three databases show that the proposed clustering model achieves excellent performance in terms of both accuracy and normalized mutual information.

Journal ArticleDOI
TL;DR: This work proposes to integrate meta-path selection with user-guided clustering to cluster objects in networks, where a user first provides a small set of object seeds for each cluster as guidance, and an effective and efficient iterative algorithm, PathSelClus, is proposed to learn the model, where the clustering quality and the meta- path weights mutually enhance each other.
Abstract: Real-world, multiple-typed objects are often interconnected, forming heterogeneous information networks. A major challenge for link-based clustering in such networks is their potential to generate many different results, carrying rather diverse semantic meanings. In order to generate desired clustering, we propose to use meta-path, a path that connects object types via a sequence of relations, to control clustering with distinct semantics. Nevertheless, it is easier for a user to provide a few examples (seeds) than a weighted combination of sophisticated meta-paths to specify her clustering preference. Thus, we propose to integrate meta-path selection with user-guided clustering to cluster objects in networks, where a user first provides a small set of object seeds for each cluster as guidance. Then the system learns the weight for each meta-path that is consistent with the clustering result implied by the guidance, and generates clusters under the learned weights of meta-paths. A probabilistic approach is proposed to solve the problem, and an effective and efficient iterative algorithm, PathSelClus, is proposed to learn the model, where the clustering quality and the meta-path weights mutually enhance each other. Our experiments with several clustering tasks in two real networks and one synthetic network demonstrate the power of the algorithm in comparison with the baselines.

Journal ArticleDOI
TL;DR: This survey presents enhanced approaches to subspace clustering by discussing the problems they are solving, their cluster definitions and algorithms, and the related works in high-dimensional clustering.
Abstract: Subspace clustering finds sets of objects that are homogeneous in subspaces of high-dimensional datasets, and has been successfully applied in many domains. In recent years, a new breed of subspace clustering algorithms, which we denote as enhanced subspace clustering algorithms, have been proposed to (1) handle the increasing abundance and complexity of data and to (2) improve the clustering results. In this survey, we present these enhanced approaches to subspace clustering by discussing the problems they are solving, their cluster definitions and algorithms. Besides enhanced subspace clustering, we also present the basic subspace clustering and the related works in high-dimensional clustering.

Journal ArticleDOI
TL;DR: This article compares k-means to fuzzy c-means and rough k-means as important representatives of soft clustering, and surveys important extensions and derivatives of these algorithms.

Journal ArticleDOI
TL;DR: An energy functional based on the fuzzy c-means objective function is designed, incorporating a bias field that accounts for the intensity inhomogeneity of real-world images, and a fuzzy external force is deduced for the LBM solver based on the model by Zhao.
Abstract: In the last decades, due to the development of parallel programming, the lattice Boltzmann method (LBM) has attracted much attention as a fast alternative approach for solving partial differential equations. In this paper, we first designed an energy functional based on the fuzzy c-means objective function, which incorporates a bias field that accounts for the intensity inhomogeneity of the real-world image. Using the gradient descent method, we obtained the corresponding level set equation, from which we deduce a fuzzy external force for the LBM solver based on the model by Zhao. The method is fast, robust against noise, independent of the position of the initial contour, effective in the presence of intensity inhomogeneity, highly parallelizable, and can detect objects with or without edges. Experiments on medical and real-world images demonstrate the performance of the proposed method in terms of speed and efficiency.

Journal ArticleDOI
TL;DR: A hybrid fuzzy time series approach with fuzzy c-means clustering method and artificial neural networks employed for fuzzification and defining fuzzy relationships, respectively is proposed to reach more accurate forecasts.
Abstract: In recent years, time series forecasting studies in which fuzzy time series approach is utilized have got more attentions. Various soft computing techniques such as fuzzy clustering, artificial neural networks and genetic algorithms have been used in fuzzy time series method to improve the method. While fuzzy clustering and genetic algorithms are being used for fuzzification, artificial neural networks method is being preferred for using in defining fuzzy relationships. In this study, a hybrid fuzzy time series approach is proposed to reach more accurate forecasts. In the proposed hybrid approach, fuzzy c-means clustering method and artificial neural networks are employed for fuzzification and defining fuzzy relationships, respectively. The enrollment data of University of Alabama is forecasted by using both the proposed method and the other fuzzy time series approaches. As a result of comparison, it is seen that the most accurate forecasts are obtained when the proposed hybrid fuzzy time series approach is used.

Proceedings ArticleDOI
23 Jun 2013
TL;DR: An efficient clustering framework specifically for face clustering in videos, exploiting the fact that faces in adjacent frames of the same face track are very similar, is introduced; the framework is applicable to other clustering algorithms to significantly reduce the computational cost.
Abstract: In this paper, we focus on face clustering in videos. Given the detected faces from real-world videos, we partition all faces into K disjoint clusters. Different from clustering on a collection of facial images, the faces from videos are organized as face tracks and the frame index of each face is also provided. As a result, many pair wise constraints between faces can be easily obtained from the temporal and spatial knowledge of the face tracks. These constraints can be effectively incorporated into a generative clustering model based on the Hidden Markov Random Fields (HMRFs). Within the HMRF model, the pair wise constraints are augmented by label-level and constraint-level local smoothness to guide the clustering process. The parameters for both the unary and the pair wise potential functions are learned by the simulated field algorithm, and the weights of constraints can be easily adjusted. We further introduce an efficient clustering framework specially for face clustering in videos, considering that faces in adjacent frames of the same face track are very similar. The framework is applicable to other clustering algorithms to significantly reduce the computational cost. Experiments on two face data sets from real-world videos demonstrate the significantly improved performance of our algorithm over state-of-the art algorithms.

Journal ArticleDOI
01 Feb 2013-Cities
TL;DR: An artificial intelligence approach integrated with geographical information systems (GISs) for modeling urban evolution, using fuzzy logic and neural networks to provide a synthetic spatiotemporal methodology for the analysis, prediction and interpretation of urban growth.

Journal ArticleDOI
TL;DR: A new method for selecting the most relevant attributes, termed Prominent attributes, is proposed and compared with an existing method that finds Significant attributes for unsupervised learning; multiple clusterings of the data are performed to find initial cluster centers.
Abstract: Partitional clustering of categorical data is normally performed with the K-modes clustering algorithm, which works well for large datasets. Even though the design and implementation of the K-modes algorithm are simple and efficient, it has the pitfall of choosing the initial cluster centers randomly on every execution, which may lead to non-repeatable clustering results. This paper addresses the randomized center initialization problem of the K-modes algorithm by proposing a cluster center initialization algorithm. The proposed algorithm performs multiple clusterings of the data based on attribute values in different attributes and yields deterministic modes that are used as initial cluster centers. In the paper, we propose a new method for selecting the most relevant attributes, namely Prominent attributes, compare it with another existing method that finds Significant attributes for unsupervised learning, and perform multiple clusterings of the data to find initial cluster centers. The proposed algorithm ensures fixed initial cluster centers and thus repeatable clustering results. The worst-case time complexity of the proposed algorithm is log-linear in the number of data objects. We evaluate the proposed algorithm on several categorical datasets, compare it against random initialization and two other initialization methods, and show that the proposed method performs better in terms of accuracy and time complexity. The initial cluster centers computed by the proposed approach are close to the actual cluster centers of the different datasets we tested, which leads to faster convergence of the K-modes clustering algorithm as well as better clustering results.
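The K-modes algorithm that the initialization scheme feeds into can be sketched compactly. Below is a standard textbook K-modes (Hamming dissimilarity, per-attribute mode update); the `init_modes` argument stands in for the deterministic initial centers the paper derives, whose construction is not reproduced here.

```python
from collections import Counter

def k_modes(data, init_modes, n_iter=20):
    """Basic K-modes: assign each row to the mode with the fewest
    attribute mismatches, then recompute each cluster's modes."""
    modes = [list(m) for m in init_modes]
    labels = [0] * len(data)
    for _ in range(n_iter):
        labels = [min(range(len(modes)),
                      key=lambda j: sum(a != b for a, b in zip(row, modes[j])))
                  for row in data]
        for j in range(len(modes)):
            members = [row for row, l in zip(data, labels) if l == j]
            if members:
                # Mode = most frequent value per attribute column.
                modes[j] = [Counter(col).most_common(1)[0][0]
                            for col in zip(*members)]
    return labels, modes
```

Because the only source of randomness in plain K-modes is the choice of `init_modes`, fixing them deterministically, as the paper proposes, makes the whole clustering repeatable.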

Proceedings Article
09 Jul 2013
TL;DR: This work proposes Multi-CLUS and GraphFuse, two multi-graph clustering techniques powered by Minimum Description Length and Tensor analysis, respectively, and demonstrates higher clustering accuracy than state-of-the-art baselines that do not exploit the multi-view nature of the network data.
Abstract: Given a co-authorship collaboration network, how well can we cluster the participating authors into communities? If we also consider their citation network, based on the same individuals, is it possible to do a better job? In general, given a network with multiple types (or views) of edges (e.g., collaboration, citation, friendship), can community detection and graph clustering benefit? In this work, we propose Multi-CLUS and GraphFuse, two multi-graph clustering techniques powered by Minimum Description Length and Tensor analysis, respectively. We conduct experiments both on real and synthetic networks, evaluating the performance of our approaches. Our results demonstrate higher clustering accuracy than state-of-the-art baselines that do not exploit the multi-view nature of the network data. Finally, we address the fundamental question posed in the title, and provide a comprehensive answer, based on our systematic analysis.
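Multi-CLUS and GraphFuse rely on Minimum Description Length and tensor decomposition respectively; as a much rougher illustration of why fusing views helps, one can average the views' adjacency matrices and bisect the fused graph spectrally. The sketch below is an assumption-laden stand-in, not either of the paper's methods: it splits the nodes on the sign of the Fiedler vector of the fused graph Laplacian.

```python
import numpy as np

def two_way_multiview_cut(adjacency_views):
    """Fuse views by averaging their adjacency matrices, then bisect the
    fused graph on the sign of the Fiedler vector (the eigenvector of the
    graph Laplacian with the second-smallest eigenvalue)."""
    A = np.mean(np.asarray(adjacency_views, dtype=float), axis=0)
    L = np.diag(A.sum(axis=1)) - A          # unnormalized Laplacian
    _, vecs = np.linalg.eigh(L)             # eigenvalues ascending
    return (vecs[:, 1] > 0).astype(int)     # sign split on Fiedler vector
```

A community boundary visible in one view (say, collaboration) but blurred in another (say, citation) survives in the averaged matrix, which is the intuition behind exploiting the multi-view structure.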

Journal ArticleDOI
TL;DR: An improved k-prototypes algorithm for clustering mixed data is proposed, together with a new measure of the dissimilarity between data objects and cluster prototypes that takes into account the significance of different attributes in the clustering process.
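The shape of such a mixed-data dissimilarity can be sketched from the classic k-prototypes form: squared Euclidean distance on numeric attributes plus a gamma-weighted mismatch count on categorical ones. The per-attribute `weights` parameter below is a hypothetical stand-in for the paper's attribute-significance terms, whose exact definition is not given here.

```python
def mixed_dissimilarity(x, y, numeric_idx, gamma=1.0, weights=None):
    """Classic k-prototypes dissimilarity with optional per-attribute
    weights: squared difference for numeric attributes, a gamma-scaled
    0/1 mismatch for categorical ones."""
    if weights is None:
        weights = [1.0] * len(x)
    d = 0.0
    for i, (a, b) in enumerate(zip(x, y)):
        if i in numeric_idx:
            d += weights[i] * (a - b) ** 2      # numeric attribute
        else:
            d += gamma * weights[i] * (a != b)  # categorical attribute
    return d
```

Raising the weight of an attribute makes disagreements on it more costly, which is how attribute significance steers the resulting clustering.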

Journal ArticleDOI
TL;DR: A connection is established between the objective function and correlation clustering, yielding practical approximation algorithms for the problem of clustering probabilistic graphs; the practicality of the techniques is demonstrated on a large social network of Yahoo! users consisting of one billion edges.
Abstract: We study the problem of clustering probabilistic graphs. Similar to the problem of clustering standard graphs, probabilistic graph clustering has numerous applications, such as finding complexes in probabilistic protein-protein interaction (PPI) networks and discovering groups of users in affiliation networks. We extend the edit-distance-based definition of graph clustering to probabilistic graphs. We establish a connection between our objective function and correlation clustering to propose practical approximation algorithms for our problem. A benefit of our approach is that our objective function is parameter-free. Therefore, the number of clusters is part of the output. We also develop methods for testing the statistical significance of the output clustering and study the case of noisy clusterings. Using a real protein-protein interaction network and ground-truth data, we show that our methods discover the correct number of clusters and identify established protein relationships. Finally, we show the practicality of our techniques using a large social network of Yahoo! users consisting of one billion edges.
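One natural instantiation of the edit-distance objective on probabilistic graphs, assumed here rather than taken verbatim from the paper, scores a candidate clustering by the expected number of edge edits needed to turn the probabilistic graph into the clustering's cluster graph: an intra-cluster pair costs (1 - p) for its possibly-missing edge, an inter-cluster pair costs p for its possibly-present edge.

```python
from itertools import combinations

def expected_edit_distance(nodes, edge_prob, clusters):
    """Expected edits turning a probabilistic graph into the deterministic
    cluster graph induced by `clusters` (a partition of `nodes`).
    `edge_prob` maps unordered node pairs to existence probabilities."""
    label = {v: i for i, cl in enumerate(clusters) for v in cl}
    cost = 0.0
    for u, v in combinations(nodes, 2):
        p = edge_prob.get((u, v), edge_prob.get((v, u), 0.0))
        cost += (1.0 - p) if label[u] == label[v] else p
    return cost
```

Note how the objective is parameter-free in the sense the abstract describes: comparing this cost across partitions with different numbers of clusters needs no preset K, so the number of clusters falls out of the minimization.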