
Showing papers on "Dunn index published in 2018"


Journal ArticleDOI
TL;DR: The proposed algorithm achieves better results compared with other state-of-the-art algorithms when applied to high-dimensional datasets and confirms the importance of estimating multidimensional learning coefficients that consider particle movements in all the dimensions of the feature space.
Abstract: Particle swarm optimization (PSO) algorithm is widely used in cluster analysis. However, it is a stochastic technique that is vulnerable to premature convergence to sub-optimal clustering solutions. PSO-based clustering algorithms also require tuning of the learning coefficient values to find better solutions. The latter drawbacks can be evaded by setting a proper balance between the exploitation and exploration behaviors of particles while searching the feature space. Moreover, particles must take into account the magnitude of movement in each dimension and search for the optimal solution in the most populated regions in the feature space. This study presents a novel approach for data clustering based on particle swarms. In this proposal, the balance between exploitation and exploration processes is considered using a combination of (i) kernel density estimation technique associated with new bandwidth estimation method to address the premature convergence and (ii) estimated multidimensional gravitational learning coefficients. The proposed algorithm is compared with other state-of-the-art algorithms using 11 benchmark datasets from the UCI Machine Learning Repository in terms of classification accuracy, repeatability represented by the standard deviation of the classification accuracy over different runs, and cluster compactness represented by the average Dunn index values over different runs. The results of Friedman Aligned-Ranks test with Holm's test over the average classification accuracy and Dunn index values indicate that the proposed algorithm achieves better accuracy and compactness when compared with other algorithms. The significance of the proposed algorithm is represented in addressing the limitations of the PSO-based clustering algorithms to push forward clustering as an important technique in the field of expert systems and machine learning. Such application, in turn, enhances the classification accuracy and cluster compactness. 
In this context, the proposed algorithm achieves better results compared with other state-of-the-art algorithms when applied to high-dimensional datasets (e.g., Landsat and Dermatology). This finding confirms the importance of estimating multidimensional learning coefficients that consider particle movements in all the dimensions of the feature space. The proposed algorithm can likewise be applied in repeatability matters for better decision making, as in medical diagnosis, as proved by the low standard deviation obtained using the proposed algorithm in conducted experiments.
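Several of the papers in this listing report average Dunn index values as their cluster-compactness measure, but none of the abstracts gives the formula. As a reference point, a minimal numpy sketch of the standard definition (minimum inter-cluster distance divided by maximum intra-cluster diameter; higher indicates compact, well-separated clusters):

```python
import numpy as np

def dunn_index(X, labels):
    """Dunn index: min inter-cluster distance / max intra-cluster diameter."""
    clusters = [X[labels == c] for c in np.unique(labels)]
    # Largest pairwise distance within any single cluster (max diameter)
    diam = max(
        np.linalg.norm(a - b)
        for c in clusters for i, a in enumerate(c) for b in c[i + 1:]
    )
    # Smallest distance between points belonging to different clusters
    sep = min(
        np.linalg.norm(a - b)
        for i, ci in enumerate(clusters) for cj in clusters[i + 1:]
        for a in ci for b in cj
    )
    return sep / diam

# Two well-separated 2-D clusters
X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]])
labels = np.array([0, 0, 1, 1])
print(dunn_index(X, labels))  # 10.0 / 1.0 = 10.0
```

This brute-force version is O(n²) in the number of points; the papers above report it averaged over multiple runs of their stochastic clustering algorithms.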

95 citations


Journal ArticleDOI
12 Jun 2018
TL;DR: This paper presents the approach that uses feature selection to refine the clustering of accelerometer data to detect physical activity, and provides an optimistic and usable approach to recognize activities using either a smartphone or smartwatch.
Abstract: Mobile and wearable devices now have a greater capability of sensing human activity ubiquitously and unobtrusively through advancements in miniaturization and sensing abilities. However, outstanding issues remain around the energy restrictions of these devices when processing large sets of data. This paper presents our approach that uses feature selection to refine the clustering of accelerometer data to detect physical activity. This also reduces the computational burden associated with processing large sets of data, improving energy efficiency and resource use because less data is processed by the clustering algorithms. Raw accelerometer data, obtained from smartphones and smartwatches, have been preprocessed to extract both time and frequency domain features. Principal component analysis feature selection (PCAFS) and correlation feature selection (CFS) have been used to remove redundant features. The reduced feature sets have then been evaluated against three widely used clustering algorithms: hierarchical clustering analysis (HCA), k-means, and density-based spatial clustering of applications with noise (DBSCAN). Using the reduced feature sets resulted in improved separability, reduced uncertainty, and improved efficiency compared with the baseline, which utilized all features. Overall, the CFS approach in conjunction with HCA produced higher Dunn index results of 9.7001 for the phone and 5.1438 for the watch features, an improvement over the baseline. This comparative study of feature selection and clustering, with the specific algorithms used, has not been performed previously and provides a promising and usable approach to recognizing activities using either a smartphone or a smartwatch.
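The pipeline described above (feature reduction, then clustering, then internal validation) can be sketched with scikit-learn. The synthetic data, PCA settings, and cluster count below are illustrative assumptions, not the paper's actual accelerometer features or configuration:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
# Synthetic stand-in for extracted features:
# 2 informative dimensions plus 18 noisy/redundant ones
informative = np.vstack([rng.normal(0, 0.5, (50, 2)),
                         rng.normal(5, 0.5, (50, 2))])
X = np.hstack([informative, rng.normal(0, 0.5, (100, 18))])

# Reduce redundancy before clustering (PCA here; the paper also uses CFS)
X_red = PCA(n_components=2).fit_transform(X)

# Hierarchical clustering on the reduced feature set, validated internally
labels = AgglomerativeClustering(n_clusters=2).fit_predict(X_red)
print(silhouette_score(X_red, labels))
```

The same reduced feature matrix could be fed to k-means or DBSCAN for the three-way comparison the paper performs.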

31 citations


Journal ArticleDOI
TL;DR: Investigating spatial structure of precipitation variation revealed that the DWE had a decreasing and increasing relationship with longitude and latitude, respectively, in Iran.
Abstract: The present study proposes a time-space framework using a discrete wavelet transform-based multiscale entropy (DWE) approach to analyze and spatially categorize precipitation variation in Iran. To this end, historical monthly precipitation time series during 1960–2010 from 31 rain gauges were used. First, a wavelet-based de-noising approach was applied to diminish the effect of noise in the precipitation time series, which may affect the entropy values. Next, Daubechies (db) mother wavelets (db5–db10) were used to decompose the precipitation time series. Subsequently, the entropy concept was applied to the sub-series to measure uncertainty and disorderliness at multiple scales. According to the pattern of entropy across scales, each cluster was assigned an entropy signature that provided an estimation of the entropy pattern of precipitation in that cluster. Spatial categorization of the rain gauges was performed using the DWE values as input data to k-means and self-organizing map (SOM) clustering techniques. According to the evaluation criteria, k-means with five clusters (Silhouette coefficient = 0.33, Davies–Bouldin = 1.18, and Dunn index = 1.52) performed better in determining homogeneous areas. Finally, investigating the spatial structure of precipitation variation revealed that the DWE had a decreasing relationship with longitude and an increasing relationship with latitude in Iran.
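The decompose-then-measure-entropy step can be illustrated with a small numpy sketch. The paper uses Daubechies db5–db10 wavelets (which would require a wavelet library such as PyWavelets); the Haar filter and wavelet-energy entropy below are simplified stand-ins for that procedure:

```python
import numpy as np

def haar_dwt(x):
    """One level of the Haar DWT (a stand-in for the Daubechies db5-db10
    wavelets used in the paper)."""
    x = np.asarray(x, dtype=float)
    approx = (x[0::2] + x[1::2]) / np.sqrt(2)
    detail = (x[0::2] - x[1::2]) / np.sqrt(2)
    return approx, detail

def wavelet_energy_entropy(x, levels=3):
    """Shannon entropy of the relative wavelet energy across scales."""
    energies = []
    approx = np.asarray(x, dtype=float)
    for _ in range(levels):
        approx, detail = haar_dwt(approx)
        energies.append(np.sum(detail ** 2))
    energies.append(np.sum(approx ** 2))
    p = np.array(energies) / np.sum(energies)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# Hypothetical monthly precipitation-like series: seasonal cycle plus noise
rng = np.random.default_rng(1)
t = np.arange(512)
series = 50 + 30 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 5, 512)
print(wavelet_energy_entropy(series))  # one scalar per gauge for k-means/SOM
```

Computing such an entropy value per rain gauge yields the feature vectors that the study then feeds to k-means and SOM for spatial categorization.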

27 citations


Journal ArticleDOI
TL;DR: In this paper, a wavelet transform based multiscale entropy (WME) and wavelet-based multi-scale relative entropy (WMRE) approach was used to analyze and gage the complexity of the precipitation series and spatially classify the raingauges in Iran.
Abstract: The hydrologic process and dynamic system of precipitation are influenced by many physical factors that are excessively complex and variable. The present study used wavelet transform-based multiscale entropy (WME) and wavelet-based multiscale relative entropy (WMRE) approaches to analyze and gauge the complexity of precipitation series and spatially classify the raingauges in Iran. To this end, historical annual precipitation data of 51 years (1960–2010) from 31 raingauges were decomposed using the WT, in which a smooth Daubechies (db) mother wavelet (db5–db10), an optimal level of decomposition, and boundary extensions were considered. Next, the entropy concept was applied to the components obtained from the WT to measure dispersion, uncertainty, disorderliness, and diversification in a multiscale form. Spatial classification of the raingauges was performed using the WME and WMRE values as input data to SOM and k-means approaches. Three validity indices, namely the Davies–Bouldin (DB) index, Silhouette coefficient (SC), and Dunn index, were used to validate the proposed model's efficiency. Based on the results, the k-means approach had better performance in determining homogeneous areas, with SC = 0.337, DB = 0.769, and Dunn = 1.42. Finally, the spatial structure of precipitation variation in the latitude and longitude directions demonstrated that the WME and WMRE values had a decreasing trend with latitude and an increasing relationship with longitude in Iran.

20 citations


Book ChapterDOI
19 Apr 2018
TL;DR: This paper presents a comparison of five different clustering algorithms, validating them in terms of internal and external measures such as the Silhouette plot, Dunn index, and Connectivity.
Abstract: Data mining is the extraction of interesting (non-trivial, relevant, previously unknown, and considerably valuable) patterns or information from very large stacks of data or different datasets. In other words, it is the experimental exploration of the associations, links, and overall patterns that prevail in large datasets but are hidden or unknown. To explore performance analysis using different clustering techniques, we used the R language. R is a tool that allows the user to analyse data from various perspectives and angles in order to obtain proper experimental results and derive meaningful relationships. In this paper, we study, analyse, and compare various algorithms and techniques used for cluster analysis in R. Our aim is to present a comparison of five different clustering algorithms, validating them in terms of internal and external measures such as the Silhouette plot, Dunn index, and Connectivity. Finally, on the basis of the results obtained, we analyze, compare, and validate the efficiency of the different algorithms with respect to one another.

13 citations


Journal ArticleDOI
TL;DR: An effective precipitation-based regionalization methodology by conjugating both temporal pre-processing and spatial clustering approaches in a way to take advantage of multiscale properties of precipitation time series is proposed.
Abstract: Determination of homogeneous precipitation-based regions is a very important task in the effective management of water resources. The present study proposes an effective precipitation-based regionalization methodology that combines temporal pre-processing and spatial clustering approaches so as to take advantage of the multiscale properties of precipitation time series. Annual precipitation data of 51 years (1960–2010) from 31 rain gauges (RGs) were collected and used in the proposed clustering approaches. The discrete wavelet transform (DWT) was used to capture the time-frequency attributes of the time series, and multiscale regionalization was performed using k-means and Self-Organizing Map (SOM) clustering techniques. A Daubechies function (db) was selected as the mother wavelet to decompose the precipitation time series, and proper boundary extensions and decomposition levels were applied. Different combinations of the approximation (A) and detail (D) coefficients were used to determine the input dataset as a basis for spatial clustering. The proposed model's efficiency in the spatial clustering stage was verified using three different indexes, namely the Silhouette Coefficient (SC), Dunn index, and Davies–Bouldin (DB) index. The results confirmed the superior performance of the k-means technique in comparison to SOM. It was also deduced that the DWT-based regionalization methodology showed improvements over historical-based models. Cross mutual information was used to investigate the homogeneity of the RGs of cluster 3 in the DWT-k-means approach, and the results of this non-linear correlation approach verified the homogeneity of cluster 3. Verification based on the mean annual precipitation values of the rain gauges in each cluster also confirmed the capability of the multiscale approach in precipitation regionalization.

9 citations


Journal ArticleDOI
TL;DR: This work adopts the sum of squared error (SSE) approach and the Dunn index to measure cluster quality, and performs experiments on real-world crime data to identify spatiotemporal crime clusters.
Abstract: Various sources generate large volumes of spatiotemporal data of different types, including crime events. In order to detect crime spots and predict future events, their analysis is important. Crime events are spatiotemporal in nature; therefore, a distance function is defined for spatiotemporal events and is used in the Fuzzy C-Means algorithm for crime analysis. This distance function takes care of both the spatial and temporal components of spatiotemporal data. We adopt the sum of squared error (SSE) approach and the Dunn index to measure the quality of clusters. We also perform experiments on real-world crime data to identify spatiotemporal crime clusters.
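The abstract does not give the paper's exact distance function, so the weighted space-time combination below is only a hypothetical sketch of how a distance over events (x, y, t) might be built; the weights w_s and w_t are illustrative parameters, not the paper's:

```python
import numpy as np

def st_distance(e1, e2, w_s=1.0, w_t=1.0):
    """Hypothetical spatiotemporal distance between two events (x, y, t).
    w_s and w_t trade off the spatial vs. temporal components."""
    (x1, y1, t1), (x2, y2, t2) = e1, e2
    d_space = np.hypot(x2 - x1, y2 - y1)  # Euclidean distance in space
    d_time = abs(t2 - t1)                 # absolute time difference
    return np.sqrt(w_s * d_space**2 + w_t * d_time**2)

print(st_distance((0, 0, 0), (3, 4, 12)))  # sqrt(5**2 + 12**2) = 13.0
```

Any such function could be substituted for the Euclidean distance inside a Fuzzy C-Means implementation, which is the role the paper's distance plays.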

8 citations


Book ChapterDOI
01 Jan 2018
TL;DR: A consumer is modeled as a bag of instances, where each instance represents a day or a month of consumption, and a multi-instance clustering algorithm is applied to solve the consumer clustering problem.
Abstract: With the rollout of smart metering infrastructure at large scale, demand-response programs may now be tailored based on consumption and production patterns mined from sensed data. In previous works, groups of similar energy consumption profiles were obtained. But discovering typical consumption profiles is not enough; it is also important to reveal the various preferences, behaviors, and characteristics of individual consumers. However, current approaches cannot determine clusters of similar consumer or prosumer households. To tackle this issue, we propose to model the consumer clustering problem as a multi-instance clustering problem and apply a multi-instance clustering algorithm to solve it. We model a consumer as a bag of instances, where each instance represents a day or a month of consumption. Internal indices were used to evaluate our clustering process. The obtained results are generally applicable and will be useful in a general business analytics context.

6 citations


Proceedings ArticleDOI
01 Aug 2018
TL;DR: Based on the RFM and LRFM outliers, the customers were found to fall into lost customer groups, low-consumption customers, and new customers whose progression to loyal customers is uncertain.
Abstract: The aim of this study is to obtain outlier data under the RFM (Recency, Frequency, Monetary) and LRFM (Length, Recency, Frequency, Monetary) models. The outliers found were analyzed to determine customer loyalty. The study proceeds in several steps. First, the data are determined based on the RFM and LRFM model attributes. Second, the data are normalized with the min-max method. Third, the data are clustered with the DBSCAN algorithm after determining the best clustering with the Dunn index method. Last, the data are analyzed to determine customer loyalty. This study found that there are 8 outliers under RFM and 9 outliers under LRFM. Based on these outliers, the customers were found to fall into lost customer groups, low-consumption customers, and new customers whose progression to loyal customers is uncertain.
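The min-max normalization step above is standard; a small numpy sketch on hypothetical RFM rows (the values are made up for illustration):

```python
import numpy as np

def min_max(X):
    """Column-wise min-max scaling to [0, 1], as used to normalize
    RFM/LRFM attributes before density-based clustering."""
    X = np.asarray(X, dtype=float)
    mn, mx = X.min(axis=0), X.max(axis=0)
    return (X - mn) / (mx - mn)

# Hypothetical RFM rows: (recency in days, frequency, monetary)
rfm = np.array([[10, 2, 100.0],
                [40, 8, 700.0],
                [70, 5, 400.0]])
print(min_max(rfm))  # each column scaled to [0, 1]
```

Scaling each attribute to the same range keeps the monetary column (hundreds) from dominating the distance computations in DBSCAN; a production version would also need to guard against zero-range columns.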

6 citations


Proceedings ArticleDOI
01 Oct 2018
TL;DR: Results show that a two-cluster characterization of the kinematic knee data in each plane is quite effective and that the men and women knee patterns are balanced between the two clusters and, for 80% of participants, the right and left knees are in the same cluster.
Abstract: The purpose of this study is to investigate data clustering to determine representative patterns in three-dimensional (3D) knee kinematic data measurements. Kinematic data are high-dimensional vectors to describe the temporal variations of the three fundamental angles of knee rotation during a walking cycle, namely the abduction/adduction angle, with respect to the frontal plane, the flexion/extension angle, with respect to the sagittal plane, and internal/external angle, with respect to the transverse plane. To offset the curse of dimensionality, inherent to high dimensional data pattern analysis, the method reduces dimensionality by isometric mapping without affecting information content. The data thus simplified is then clustered by the DBSCAN algorithm. The method has been tested on a large database of 165 healthy knee kinematic data measurements. Clusters are validated in terms of the silhouette index, the Dunn index, and connectivity. Results show that a two-cluster characterization of the kinematic knee data in each plane is quite effective. A further clinical investigation shows that the men and women knee patterns are balanced between the two clusters and, for 80% of participants, the right and left knees are in the same cluster.
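The pipeline above (isometric mapping for dimensionality reduction, then DBSCAN, then internal validation) can be sketched with scikit-learn. The synthetic gait-like curves, Isomap neighborhood size, and DBSCAN parameters below are illustrative assumptions, not the study's actual 3D kinematic measurements:

```python
import numpy as np
from sklearn.manifold import Isomap
from sklearn.cluster import DBSCAN
from sklearn.metrics import silhouette_score

# Stand-in for high-dimensional knee-angle curves: 100-sample waveforms
# whose amplitude differs between two hypothetical groups of participants
t = np.linspace(0.0, 1.0, 100)
amplitudes = np.concatenate([np.linspace(18, 22, 40), np.linspace(38, 42, 40)])
X = np.outer(amplitudes, np.sin(2 * np.pi * t))  # shape (80, 100)

# Isometric mapping reduces the 100-D curves to 2-D without losing
# the information that separates the two groups
X_2d = Isomap(n_neighbors=50, n_components=2).fit_transform(X)

# Density-based clustering on the embedded data (parameters illustrative)
labels = DBSCAN(eps=3.0, min_samples=5).fit_predict(X_2d)
print(sorted(set(labels)), silhouette_score(X_2d, labels))
```

The study validates the resulting clusters with the silhouette index, the Dunn index, and connectivity; silhouette_score stands in for that validation step here.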

4 citations


Journal ArticleDOI
TL;DR: The attained results show that the proposed ensemble clustering outperforms the other state-of-the-art clustering techniques.
Abstract: Clustering is used in different fields of research, including data mining, taxonomy, document retrieval, image segmentation, and pattern classification. Text clustering is a technique through which texts/documents are divided into a particular number of groups so that the texts within each group are related in content. In this paper, the idea of ensemble text clustering by majority voting is defined. For this purpose, different clustering methods such as fuzzy c-means, k-means, agglomerative, Gustafson–Kessel, and k-medoid are used. After pre-processing of the documents, the inverse document frequency (IDF) is computed from the provided dataset. The computed IDF is used as input to the clustering algorithms. The Dunn index and Davies–Bouldin index are calculated to analyze the usefulness of the proposed ensemble clustering. In this work, a text dataset "Textclus", which contains four different classes (history, education, politician, and art), is used. Additionally, another dataset, "20newsgroups", is also used for analysis. The clustering quality measures have also been calculated from the proposed ensemble clustering results. The attained results show that the proposed ensemble clustering outperforms the other state-of-the-art clustering techniques.

Book ChapterDOI
19 Apr 2018
TL;DR: The aim of the research is to combine conventional clustering algorithms based on rough sets and fuzzy sets with metaheuristics such as the firefly algorithm and the fuzzy firefly algorithm, using a Gaussian kernel in place of the traditional Euclidean distance measure.
Abstract: The aim of our research is to combine conventional clustering algorithms based on rough sets and fuzzy sets with metaheuristics like the firefly algorithm and the fuzzy firefly algorithm. Image segmentation is carried out using the resultant hybrid clustering algorithms. The performance of the proposed algorithms is compared with numerous contemporary clustering algorithms and their firefly-fused counterparts. We further bolster the performance of our proposed algorithm by using a Gaussian kernel in place of the traditional Euclidean distance measure. We test the performance of our algorithms using two performance indices, namely the Davies–Bouldin (DB) index and the Dunn index. Our experimental results highlight the advantages of using metaheuristics and kernels over the existing clustering algorithms.

Book ChapterDOI
08 Jun 2018
TL;DR: A new radial layout visualization, called Quasi-circular mapping visualization (QCMV), is introduced to address the problems of ordering DAs and of crowded visual results that hamper clustering analysis.
Abstract: Radial coordinate visualization (RadViz) and Star Coordinates (SC) can effectively map high-dimensional data to a low-dimensional space, owing to their ability to place an arbitrary number of Dimension Anchors (DAs). Nevertheless, the user is faced with ordering the DAs, which is an NP-complete problem, and with crowded visual results that hamper clustering analysis. We introduce a new radial layout visualization, called Quasi-circular mapping visualization (QCMV), to address those problems in this paper. First, QCMV extends the original dimensions of the dataset using the probability distribution histogram of each dimension and the affinity propagation (AP) algorithm, and distributes them on the unit circle according to the correlation of the extended dimensions. Then, the extended and reordered dimensions are mapped to a polygon in the quasi-circular space and visualized through the geometric center and area of the polygon in three dimensions. Finally, the visual clustering effect is strengthened with t-SNE. We also compare the visual clustering results of RadViz, SC, and QCMV on three datasets using two indexes, the correct rate and the Dunn index. The comparison shows a better visual clustering effect with QCMV.