Author

Liyong Zhang

Bio: Liyong Zhang is an academic researcher from Dalian University of Technology. The author has contributed to research on the topics of fuzzy clustering and cluster analysis. The author has an h-index of 10 and has co-authored 26 publications that have received 347 citations.

Papers
Journal Article
TL;DR: This paper proposes a novel clustering model, in which probabilistic information granules of missing values are incorporated into the Fuzzy C-Means clustering of incomplete data by involving the maximum likelihood criterion.
Abstract: Missing values are a common phenomenon when dealing with real-world data sets, and the analysis of incomplete data sets has become an active area of research. In this paper, we focus on the problem of clustering incomplete data, with the aim of introducing prior distribution information about the missing values into the fuzzy clustering algorithm. First, non-parametric hypothesis testing is employed to describe the missing values that adhere to a certain Gaussian distribution as probabilistic information granules, based on the nearest neighbors of the incomplete data. Second, we propose a novel clustering model in which the probabilistic information granules of missing values are incorporated into the Fuzzy C-Means clustering of incomplete data by means of the maximum likelihood criterion. Third, the clustering model is optimized by a tri-level alternating optimization that uses the method of Lagrange multipliers. The convergence and the time complexity of the clustering algorithm are also discussed. Experiments on both synthetic and real-world data sets demonstrate that the proposed approach can effectively cluster incomplete data.

95 citations

Journal Article
TL;DR: In this paper, missing attributes are represented as intervals, and a novel fuzzy c-means algorithm for incomplete data based on nearest-neighbor intervals is proposed that enhances the robustness of missing attribute imputation compared with other numerical imputation methods.
Abstract: Partially missing data sets are a prevailing problem in clustering analysis. In this paper, missing attributes are represented as intervals, and a novel fuzzy c-means algorithm for incomplete data based on nearest-neighbor intervals is proposed. The algorithm estimates the nearest-neighbor interval representation of missing attributes by making full use of the attribute distribution information of the data sets, which enhances the robustness of missing attribute imputation compared with other numerical imputation methods. Also, the convex hyper-polyhedrons formed by interval prototypes can represent the uncertainty of missing attributes and simultaneously reflect the shape of the clusters to some degree, which helps to enhance the robustness of clustering analysis. Comparisons and analysis of the experimental results for several UCI data sets demonstrate the capability of the proposed algorithm.

74 citations
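
The nearest-neighbor interval idea in the abstract above can be illustrated with a small sketch. It is not the authors' implementation: it assumes that neighbors are ranked by a partial distance computed over the attributes both records have, and that the interval for a missing attribute is the minimum and maximum of that attribute among the q nearest neighbors. The function names and the parameter `q` are hypothetical.

```python
import math


def partial_distance(a, b):
    """Squared distance over attributes present in both records (None = missing).

    The sum is rescaled by (total attributes / shared attributes) so that
    records with different numbers of missing values stay comparable.
    """
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf
    scale = len(a) / len(shared)
    return scale * sum((x - y) ** 2 for x, y in shared)


def nearest_neighbor_intervals(data, index, q=3):
    """Return {attribute index: (low, high)} for each missing attribute of data[index]."""
    target = data[index]
    others = [row for i, row in enumerate(data) if i != index]
    intervals = {}
    for j, value in enumerate(target):
        if value is not None:
            continue
        # Only records that actually observe attribute j can bound it.
        candidates = [row for row in others if row[j] is not None]
        candidates.sort(key=lambda row: partial_distance(target, row))
        values = [row[j] for row in candidates[:q]]
        if values:
            intervals[j] = (min(values), max(values))
    return intervals


# Toy data (made up for illustration): the first record is missing its third attribute.
data = [
    [1.0, 2.0, None],
    [1.1, 2.1, 3.0],
    [0.9, 1.9, 2.8],
    [5.0, 6.0, 9.0],
    [1.2, 2.2, 3.3],
]
print(nearest_neighbor_intervals(data, 0, q=2))  # {2: (2.8, 3.0)}
```

A larger `q` widens the interval and so expresses more uncertainty about the missing value; a smaller `q` approaches a single-value imputation.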

Journal Article
TL;DR: This paper realizes the kernel clustering of incomplete data sets by means of a gradient-based alternating optimization of interval data clustering based on the interval kernel distance.

46 citations

Journal Article
Xiaochen Lai, Xia Wu, Liyong Zhang, Wei Lu, Chongquan Zhong
TL;DR: An architecture named tracking-removed autoencoder (TRAE) is proposed by redesigning the input structure of hidden neurons in a dynamic way on the basis of the traditional AE to strengthen the dependence of missing values on known attribute values for each incomplete record.

37 citations

Journal Article
01 Oct 2013
TL;DR: An interval representation of missing attributes based on nearest-neighbor information, named nearest-neighbor interval, is put forward, and a hybrid approach utilizing genetic algorithm and fuzzy c-means is presented for incomplete data clustering.
Abstract: Incomplete data are often encountered in data sets used in clustering problems, and inappropriate treatment of incomplete data can significantly degrade the clustering performance. In view of the uncertainty of missing attributes, we put forward an interval representation of missing attributes based on nearest-neighbor information, named nearest-neighbor interval, and a hybrid approach utilizing genetic algorithm and fuzzy c-means is presented for incomplete data clustering. The overall algorithm works within the genetic algorithm framework: it searches for appropriate imputations of missing attributes in the corresponding nearest-neighbor intervals to recover the incomplete data set, and it hybridizes fuzzy c-means to perform clustering analysis and, at the same time, provide the fitness metric for genetic optimization. Experimental results on a set of real-life data sets demonstrate the better clustering performance of our hybrid approach over the compared methods.

37 citations


Cited by
Journal Article
TL;DR: A fuzzy c-means clustering hybrid approach that combines support vector regression and a genetic algorithm yields sufficient and sensible imputation performance results.

256 citations

Journal Article
TL;DR: This paper aims at reviewing and analyzing related studies carried out in recent decades, from the experimental design perspective, and identifying limitations in the existing body of literature based upon which some directions for future research can be gleaned.
Abstract: Missing value imputation (MVI) has been studied for several decades as the basic solution method for incomplete dataset problems, specifically those where some data samples contain one or more missing attribute values. This paper aims at reviewing and analyzing related studies carried out in recent decades, from the experimental design perspective. Altogether, 111 journal papers published from 2006 to 2017 are reviewed and analyzed. In addition, several technical issues encountered during the MVI process are discussed, such as the choice of datasets, missing rates and missingness mechanisms, and the MVI techniques and evaluation metrics employed. The results of the analysis of these issues allow limitations in the existing body of literature to be identified, based upon which some directions for future research can be gleaned.

240 citations

Journal Article
TL;DR: A hybrid approach integrating the Fuzzy C-Means-based imputation method with the Genetic Algorithm is developed for missing traffic volume data estimation based on inductance loop detector outputs; the results show that the proposed approach outperforms the conventional methods under prevailing traffic conditions.
Abstract: Although various innovative traffic sensing technologies have been widely employed, incomplete sensor data is one of the major problems that significantly degrade traffic data quality and integrity. In this study, a hybrid approach integrating the Fuzzy C-Means (FCM)-based imputation method with the Genetic Algorithm (GA) is developed for missing traffic volume data estimation based on inductance loop detector outputs. By utilizing the weekly similarity among data, the conventional vector-based data structure is first transformed into a matrix-based data pattern. Then, the GA is applied to optimize the membership functions and centroids in the FCM model. Experimental tests are conducted to verify the effectiveness of the proposed approach. Traffic volume data collected at different temporal scales are used as the testing dataset, and three different indicators, including root mean square error, correlation coefficient, and relative accuracy, are utilized to quantify the imputation performance compared with some conventional methods (Historical method, Double Exponential Smoothing, and Autoregressive Integrated Moving Average model). The results show that the proposed approach outperforms the conventional methods under prevailing traffic conditions.

182 citations
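
Two of the three indicators named in the abstract above have standard definitions, sketched here for reference. The abstract does not define relative accuracy, so it is left out. The function names and the toy values are hypothetical, and the correlation coefficient is assumed to be Pearson's.

```python
import math


def rmse(actual, imputed):
    """Root mean square error between observed and imputed values."""
    if len(actual) != len(imputed) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(actual, imputed)) / len(actual))


def pearson(actual, imputed):
    """Pearson correlation coefficient between observed and imputed values."""
    if len(actual) != len(imputed) or len(actual) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean_a = sum(actual) / len(actual)
    mean_b = sum(imputed) / len(imputed)
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(actual, imputed))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_b = sum((b - mean_b) ** 2 for b in imputed)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_a * var_b)


# Made-up traffic volumes: held-out true counts versus imputed counts.
true_counts = [120.0, 135.0, 150.0, 160.0]
imputed_counts = [118.0, 140.0, 149.0, 155.0]
print(rmse(true_counts, imputed_counts))
print(pearson(true_counts, imputed_counts))
```

A lower RMSE and a correlation closer to 1 both indicate imputed values that track the held-out observations more closely.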

Journal Article
TL;DR: Two new hybrids of FCM and improved self-adaptive PSO are presented, which combine FCM with a recent version of PSO, the IDPSO, which adjusts PSO parameters dynamically during execution, aiming to provide better balance between exploration and exploitation, avoiding falling into local minima quickly and thereby obtaining better solutions.
Abstract: We present two new hybrids of FCM and improved self-adaptive PSO. The methods are based on the FCM-PSO algorithm. We use FCM to initialize one particle to achieve better results in fewer iterations. The new methods are compared to FCM-PSO using many real and synthetic datasets. The proposed methods consistently outperform FCM-PSO in three evaluation metrics. Fuzzy clustering has become an important research field with many applications to real world problems. Among fuzzy clustering methods, fuzzy c-means (FCM) is one of the best known for its simplicity and efficiency, although it shows some weaknesses, particularly its tendency to fall into local minima. To tackle this shortcoming, many optimization-based fuzzy clustering methods have been proposed in the literature. Some of these methods are based solely on a metaheuristic optimization, such as particle swarm optimization (PSO), whereas others are hybrid methods that combine a metaheuristic with a traditional partitional clustering method such as FCM. It is demonstrated in the literature that methods that hybridize PSO and FCM for clustering have an improved accuracy over traditional partitional clustering approaches. On the other hand, PSO-based clustering methods have poor execution time in comparison to partitional clustering techniques. Another problem with PSO-based clustering is that the current PSO algorithms require tuning a range of parameters before they are able to find good solutions. In this paper we introduce two hybrid methods for fuzzy clustering that aim to deal with these shortcomings. The methods, referred to as FCM-IDPSO and FCM2-IDPSO, combine FCM with a recent version of PSO, the IDPSO, which adjusts PSO parameters dynamically during execution, aiming to provide better balance between exploration and exploitation, avoiding falling into local minima quickly and thereby obtaining better solutions.
Experiments using two synthetic data sets and eight real-world data sets are reported and discussed. The experiments considered the proposed methods as well as some recent PSO-based fuzzy clustering methods. The results show that the methods introduced in this paper provide comparable or in many cases better solutions than the other methods considered in the comparison and were much faster than the other state of the art PSO-based methods.

128 citations
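
For readers unfamiliar with the FCM baseline that the abstract above builds on, the following is a minimal sketch of plain fuzzy c-means only. It contains no PSO or IDPSO component and is not the authors' code. The function name, its parameters, and the toy data are hypothetical. Because the result depends on the starting prototypes, it also hints at why FCM can settle in a local minimum.

```python
import math
import random


def fcm(points, c=2, m=2.0, iters=100, tol=1e-6, init=None, seed=0):
    """Plain fuzzy c-means: alternate membership and prototype updates."""
    if init is not None:
        centers = [list(v) for v in init]
    else:
        # Random starting prototypes; different seeds may reach different minima.
        centers = [list(p) for p in random.Random(seed).sample(points, c)]
    dim = len(points[0])
    power = 2.0 / (m - 1.0)
    memberships = []
    for _ in range(iters):
        # Membership update: u_ik = 1 / sum_j (d_ik / d_jk) ** (2 / (m - 1)).
        memberships = []
        for p in points:
            dists = [math.dist(p, v) for v in centers]
            if any(d == 0.0 for d in dists):
                # The point coincides with one or more prototypes: share membership among them.
                row = [1.0 if d == 0.0 else 0.0 for d in dists]
                total = sum(row)
                row = [r / total for r in row]
            else:
                row = [1.0 / sum((di / dj) ** power for dj in dists) for di in dists]
            memberships.append(row)
        # Prototype update: mean of the points weighted by u ** m.
        new_centers = []
        for i in range(len(centers)):
            weights = [row[i] ** m for row in memberships]
            total = sum(weights)
            new_centers.append(
                [sum(w * p[k] for w, p in zip(weights, points)) / total for k in range(dim)]
            )
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    # Memberships come from the last membership update, just before the final prototype move.
    return centers, memberships


# Made-up two-cluster data with explicit starting prototypes.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
centers, memberships = fcm(points, init=[(0.0, 0.0), (5.0, 5.0)])
print(centers)
```

Each row of `memberships` sums to 1, and the fuzzifier `m` controls how soft the assignment is: values close to 1 approach hard k-means assignments.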

Journal Article
TL;DR: Experimental results on four datasets from the UCI machine learning repository suggest that the GTRS significantly improves the generality while keeping similar levels of accuracy in comparison to other three-way and similar models.

98 citations