Showing papers on "Dunn index" published in 2015


Proceedings ArticleDOI
01 Aug 2015
TL;DR: Experimental investigation demonstrates that results obtained using the Dunn index or the Davies-Bouldin index are better than those obtained using the Ball-Hall or BetaCV index, with the Davies-Bouldin index performing best overall.
Abstract: Fuzzy rule interpolation (FRI) has been a vital reasoning tool for sparse fuzzy rule-based systems. Throughout interpolative reasoning, an FRI system may produce a large number of interpolated rules, which generally serve no further purpose once the required outcomes have been obtained. However, this abandoned pool of interpolated rules can be used to improve the existing sparse rule base, because the rules contain useful information on the underlying problem domain. Efficient extraction of knowledge from such a pool of interpolated rules is indeed helpful for analysing and updating the sparse rule base, leading to a dynamic sparse fuzzy rule base for building an enhanced fuzzy system. Following this idea, a genetic algorithm (GA) based dynamic fuzzy rule interpolation framework has been proposed recently. This paper presents an extension of the dynamic FRI system. In particular, it investigates different fitness functions and their effects on the outcomes of the GA-based system. A variety of fitness functions based on cluster quality indices are employed and tested, including the Dunn index, Davies-Bouldin index, Ball-Hall index and BetaCV index. Experimental investigation demonstrates that results obtained using the Dunn index or the Davies-Bouldin index are better than those obtained using the Ball-Hall or BetaCV index, with the Davies-Bouldin index performing best overall. Such results offer an empirical guideline for the selection of the fitness function in implementing accurate GA-based dynamic FRI systems.

17 citations
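
As a rough illustration of the kind of cluster quality index used as a fitness function above, the following sketch computes the classic Dunn index: the smallest distance between points in different clusters divided by the largest cluster diameter. The abstract does not say which variant or implementation the authors used, so the function name `dunn_index`, the Euclidean distance, and the toy data are all assumptions made for illustration.

```python
from itertools import combinations
from math import dist


def dunn_index(points, labels):
    """Classic Dunn index: min inter-cluster distance / max cluster diameter.

    Higher values indicate more compact, better separated clusters.
    Euclidean distance is an assumption of this sketch.
    """
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("the Dunn index needs at least two clusters")

    groups = list(clusters.values())
    # Separation: closest pair of points that lie in different clusters.
    separation = min(
        dist(p, q) for a, b in combinations(groups, 2) for p in a for q in b
    )
    # Diameter: farthest pair of points that lie in the same cluster.
    diameter = max(
        (dist(p, q) for g in groups for p, q in combinations(g, 2)), default=0.0
    )
    return separation / diameter if diameter > 0 else float("inf")


# Hypothetical toy data: two tight pairs of points, five units apart.
toy_points = [(0, 0), (0, 1), (5, 0), (5, 1)]
good = dunn_index(toy_points, [0, 0, 1, 1])  # clusters match the two pairs
bad = dunn_index(toy_points, [0, 1, 0, 1])   # clusters cut across the pairs
```

In a GA setting such a score could serve directly as a fitness value to maximise, since the partition that respects the structure of the data scores higher than the one that cuts across it.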


Journal ArticleDOI
TL;DR: This research used clustering to assemble a diversified portfolio of stocks, which can help the investor community in particular and, in turn, society and the economy in general through better allocation of wealth.
Abstract: This research paper proposes an agent-based framework for portfolio management using non-hierarchical clustering methods. The framework includes various agents: a data agent, clustering agent, ranking agent, portfolio manager and user agent. The data agent collected financial ratios of Nifty 50 companies from a financial database. Clustering agents generated clusters, and the DB index was computed to find the optimum cluster size for each method. A validation agent evaluated the performance of k-means, k-medoids and fast k-means using intra-class inertia. Clusters generated by k-means were used for investment and portfolio analysis with the Markowitz model. This research helped to assemble a diversified portfolio of stocks with the use of clustering.
Keywords: Clustering, Data mining (DM), Davies-Bouldin (DB) Index, Dunn Index, k-means, k-medoids, Partitioning Around Medoids (PAM), Silhouette index
1. INTRODUCTION
Data mining is a process of automatically discovering knowledge and predicting future trends from large financial markets. It creates opportunities for companies to make proactive, knowledge-driven decisions in order to gain a competitive advantage. A variety of DM techniques have become available over the past decades, including classification, similarity search, cluster analysis and association rule mining. Data mining techniques are also widely applied in a number of financial areas, including predicting stock prices, predicting stock indices, portfolio management, portfolio risk management, trend detection and designing recommenders [27, 28]. Portfolio management is one of the major problems in the financial domain. In today's competitive financial environment, an investor wants to earn maximum profit from his assets. An investor considering an investment in securities is faced with the problem of choosing from among a large number of securities. He is unsure which security to invest in; the choice depends upon the risk-return characteristics of the individual securities. He selects the most desirable securities and allocates his funds over this group of securities. Again, he is faced with the problem of deciding which securities to select and how much to invest in each. The investor chooses the optimal portfolio taking into consideration the risk and return characteristics of all possible portfolios. This research work describes an agent-based framework for portfolio management using non-hierarchical clustering methods. The proposed framework consists of various agents: a data agent, clustering agent, ranking agent, user agent and portfolio manager. The framework assists investors in strategic planning and investment decision-making. This research work can help to assemble a diversified portfolio of stocks with the help of clustering, which can help the investor community in particular and, in turn, society and the economy in general through better allocation of wealth. In this research paper,

8 citations


Book ChapterDOI
01 Jan 2015
TL;DR: The findings suggest that Integrative Multiview Clustering provides more compact and better separated clusters, and that the resulting partition is clearer to interpret than the one obtained by classical approaches.
Abstract: The main goal of this work is to develop a methodology for finding nutritional patterns based on a variety of subject characteristics, which can contribute to a better understanding of the interactions between nutrition and health, given that the complexity of the phenomenon leads to poor performance with classical approaches. An innovative methodology based on advanced clustering techniques is proposed in order to find more compact patterns or clusters. Integrative Multiview Clustering (IMC) combines the Multiview Clustering approach with crossing operations over the several partitions obtained. A comparison with other classical clustering techniques is provided to assess the performance of our approach. The Dunn-like cluster validity index proposed by Bezdek & Pal is used for the comparison from a structural point of view, as it is more robust than the original Dunn index. Based on the Dunn-like index, the IMC method performs better than other popular clustering techniques. Our findings suggest that Integrative Multiview Clustering provides more compact and better separated clusters. In addition, IMC helps to reduce the high dimensionality of the data through a multiview division of attributes, and the resulting partition is easier to interpret. Using the Integrative Multiview Clustering approach, a good partition is obtained from a structural point of view. Also, the interpretation of the resulting partition is clearer than the one obtained by classical approaches.

5 citations


Proceedings ArticleDOI
23 Apr 2015
TL;DR: A hybrid algorithm called ACPSO is proposed for optimal clustering, in which ACO discovers the initial centroids by simulating an ant colony system; experimental results show that the proposed method performs well compared with the existing algorithm on most evaluation metrics.
Abstract: K-means clustering groups similar information using a distance function. Even though it is a good algorithm for grouping, its clustering performance can suffer from the choice of initial clusters. This has led to a new research track on developing better algorithms with good initial centroids. This paper presents a hybrid algorithm, called the ACPSO algorithm, for optimal clustering. The ACO algorithm is used to discover centroids by simulating an ant colony system. Once the initial centroids are produced by the ACO algorithm, the PSO algorithm is applied to find the optimal clusters with the help of different fitness functions, namely the XB index, Sym index, DB index, Connected DB index, Connected Dunn index and Mean Square Distance. Finally, experiments are performed on the Iris data, and performance is evaluated with five different evaluation metrics. The experimental results show that the proposed method performs well compared with the existing algorithm on most evaluation metrics.

5 citations


Proceedings ArticleDOI
Reetika Roy1, J. Anuradha1
01 Nov 2015
TL;DR: The algorithm has been implemented with the Iris data set, and its validity and effectiveness are tested with commonly used internal evaluation measures for clustering such as the Davies-Bouldin index and the Dunn index.
Abstract: The main aim of the proposed study is to explore the performance of the Brainstorm Optimization algorithm in hard c-means clustering of data. The rationale behind this analysis is to generate a random solution set of centroids and then modify the centroids so as to refine the clusters. As Brainstorm Optimization is a form of evolutionary algorithm, this refinement of centroids happens through competition and cooperation with existing centroid values. The algorithm incorporates both exploitation and exploration of the search space to generate the new centroids. The algorithm has been implemented with the Iris data set, and its validity and effectiveness are tested with commonly used internal evaluation measures for clustering such as the Davies-Bouldin index and the Dunn index.

5 citations
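
For readers unfamiliar with the Davies-Bouldin index mentioned above, the sketch below follows its usual textbook definition: for each cluster, take the worst ratio of summed within-cluster scatter to centroid distance against any other cluster, then average over clusters. Lower values are better. This is not the authors' code; the function name, the Euclidean distance, and the toy data are assumptions, and the sketch does not guard against two clusters sharing the same centroid.

```python
from math import dist


def davies_bouldin(points, labels):
    """Davies-Bouldin index under Euclidean distance (lower is better)."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("the Davies-Bouldin index needs at least two clusters")

    centroids, scatters = [], []
    for members in clusters.values():
        centroid = tuple(sum(axis) / len(members) for axis in zip(*members))
        centroids.append(centroid)
        # Scatter: mean distance from cluster members to their centroid.
        scatters.append(sum(dist(p, centroid) for p in members) / len(members))

    k = len(centroids)
    # For each cluster, the most similar (worst separated) other cluster.
    worst_ratios = [
        max(
            (scatters[i] + scatters[j]) / dist(centroids[i], centroids[j])
            for j in range(k)
            if j != i
        )
        for i in range(k)
    ]
    return sum(worst_ratios) / k


# Hypothetical toy data: two tight pairs of points, five units apart.
toy_points = [(0, 0), (0, 1), (5, 0), (5, 1)]
db_good = davies_bouldin(toy_points, [0, 0, 1, 1])  # clusters match the pairs
db_bad = davies_bouldin(toy_points, [0, 1, 0, 1])   # clusters cut across them
```

Unlike the Dunn index, this measure is minimised, so an optimiser that uses both as fitness functions needs to flip the sign of one of them.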


Journal Article
TL;DR: Large data sets of accidents at a manufacturing and industrial unit have been studied by applying clustering and association rules as data mining methods, with the optimum number of clusters determined using the Dunn index.
Abstract: Uncertain and stochastic states have always been taken into consideration in risk and accident management, as in other fields of industrial engineering, and they make it difficult and complicated for managers to select corrective actions and control measures. In this research, large data sets of accidents at a manufacturing and industrial unit have been studied by applying clustering and association rules as data mining methods. First, the accident data was briefly studied. Then, the effective features of an accident were selected in consultation with industry experts and with consideration of production process information. The data was divided into separate clusters by a clustering method, and the optimum number of clusters was determined using the Dunn index as the clustering validator. In the next stage, the Apriori algorithm, an association rule method, was used to identify the relations between these fields, and the association rules among them were extracted and analyzed. Since managers need precise information for decision making, data mining methods, when used properly, may act as a supporting system.

4 citations


01 May 2015
TL;DR: Our results show that ensemble based clustering is indeed a good alternative for cluster analysis with the premise of an improved performance over traditional clustering algorithms.
Abstract: Ensemble learning is a recent and extended approach to the unsupervised data mining technique called clustering, which is used for finding natural groupings that exist in a dataset. Here, we applied an ensemble based clustering algorithm called Random Forests with Partition around Medoids (PAM) to multiple time series gene expression data of Plasmodium falciparum. The Random Forest algorithm is the most common ensemble learning approach that uses decision trees. Random Forest consists of a large number of classification trees (ranging from hundreds to thousands) built from bootstrap sampling of the dataset. We also applied the following internal cluster validity measures: Silhouette Width index, Connectivity Index and the Dunn Index to select the optimal number of final clusters. Our results show that ensemble based clustering is indeed a good alternative for cluster analysis with the premise of an improved performance over traditional clustering algorithms.

3 citations


Journal ArticleDOI
TL;DR: FSC-SOM can improve the cluster centers of FSC with SOM in order to obtain better clustering results; the clustering result of FSC-SOM is better than or equal to that of FSC, as shown by the values of external and internal validity measures.
Abstract: Recently, clustering algorithms have combined conventional methods with artificial intelligence. FSC-SOM is designed to handle the problems of SOM, such as defining the number of clusters and the initial values of the neuron weights. FSC finds the number of clusters and the cluster centers, which become the parameters of SOM. FSC-SOM is expected to improve the quality of FSC, since the cluster centers are determined in two passes: FSC searches for data with high density, then SOM updates the cluster centers. FSC-SOM was tested on 10 datasets and measured with F-Measure, entropy, Silhouette Index and Dunn Index. The results showed that FSC-SOM can improve the cluster centers of FSC with SOM in order to obtain better clustering results. The clustering result of FSC-SOM is better than or equal to that of FSC, as shown by the values of external and internal validity measures.

3 citations


15 Dec 2015
TL;DR: According to the results of this study, the most important factors identified through clustering are hemoglobin, age, sex, smoking, alcohol and creatinine.
Abstract: Introduction: According to the World Health Organization, TB is the largest cause of death among infectious diseases. Due to the high percentage of tuberculosis infection and the high number of deaths among these patients, this study was carried out to categorize and find the relationships between different clinical and demographic characteristics. Method: This descriptive analytical study was done on 600 patients from the Masih Daneshvari hospital tuberculosis research center. K-means clustering, Apriori association rules, and data mining algorithms (SPSS Clementine software) were used for clustering and for determining the common characteristics among patients. Results: Based on the Dunn index, 3 clusters were chosen as the optimal number of clusters. The common factors between clusters are described in detail in the findings section. According to the characteristics of each cluster, patients can be classified based on the effectiveness of various factors. Conclusion: According to the results of this study, the most important factors identified through clustering are hemoglobin, age, sex, smoking, alcohol and creatinine. Based on the association rules, the strongest relationship is found between cough, weight loss and ESR.

3 citations
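
The model selection step described above (run k-means for several values of k and keep the k with the highest Dunn index) can be sketched as follows. The study used SPSS Clementine on patient data; everything here, including the plain Lloyd iterations, the farthest-point seeding, the helper names and the synthetic 2-D points, is an assumption made so that the example is deterministic and self-contained.

```python
from itertools import combinations
from math import dist


def dunn_index(points, labels):
    """Min distance between clusters divided by max cluster diameter."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    groups = list(clusters.values())
    if len(groups) < 2:
        return 0.0  # a single cluster has no separation to measure
    separation = min(
        dist(p, q) for a, b in combinations(groups, 2) for p in a for q in b
    )
    diameter = max(
        (dist(p, q) for g in groups for p, q in combinations(g, 2)), default=0.0
    )
    return separation / diameter if diameter > 0 else float("inf")


def kmeans(points, k, max_iter=100):
    """Lloyd's k-means with deterministic farthest-point seeding (assumed)."""
    centers = [points[0]]
    while len(centers) < k:
        # Next seed: the point farthest from all seeds chosen so far.
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster empties out
                centers[j] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels


# Synthetic data: three well separated unit squares of four points each.
points = [
    (0, 0), (0, 1), (1, 0), (1, 1),
    (10, 0), (10, 1), (11, 0), (11, 1),
    (5, 10), (5, 11), (6, 10), (6, 11),
]
scores = {k: dunn_index(points, kmeans(points, k)) for k in (2, 3, 4)}
best_k = max(scores, key=scores.get)  # k with the highest Dunn index
```

On this synthetic data the three-cluster solution wins, because k = 2 merges two squares (large diameter) and k = 4 splits one (small separation). On real data the Dunn index is sensitive to outliers, so it is often read alongside other validity measures.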


DOI
01 Sep 2015
TL;DR: In this article, a fuzzy c-means clustering technique is explored to investigate the tracks of tropical cyclones over the North Indian Ocean (NIO) for the period 1976-2014.
Abstract: A fuzzy c-means (FCM) clustering technique is explored to investigate the tracks of tropical cyclones over the North Indian Ocean (NIO) for the period 1976-2014. A total of five clusters is objectively identified based on the partition index, partition coefficient, Dunn index and separation index. The results of the analysis show that each cluster has unique features in terms of genesis location, landfall, travel duration, trajectory, seasonality, accumulated cyclone energy and intensity. Analysis of large-scale environmental parameters, constructed for the day preceding genesis, shows some of these parameters to be potential precursors to TC formation for almost all the clusters, most prominently mid-tropospheric humidity, zonal wind, vorticity and outgoing longwave radiation over the main developing regions. The individual clusters have several distinct features in their seasonal cycles. Cluster C5 shows a distinct bimodal distribution, whereas the other clusters form throughout the year. ENSO influenced the cyclone frequency in two of the five clusters. The MJO is found to play an important role in cyclone genesis. The post-monsoon cyclone frequency is higher in MJO phases 2, 3 and 4. The FCM technique can be used by operational forecasters as a guideline for the probable zone affected by TC tracks.

2 citations
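
Of the validity measures listed above, the partition coefficient is the simplest to state. The abstract does not give a formula, so the sketch below assumes Bezdek's standard definition: the mean over all data points of the sum of squared fuzzy memberships. It ranges from 1/c (completely fuzzy, with c clusters) to 1 (crisp). The function name and the membership matrices are hypothetical.

```python
def partition_coefficient(memberships):
    """Bezdek's partition coefficient for a fuzzy membership matrix.

    `memberships[i][k]` is the degree to which point i belongs to cluster k;
    each row is assumed to sum to 1, as produced by fuzzy c-means.
    """
    n = len(memberships)
    if n == 0:
        raise ValueError("need at least one data point")
    return sum(u * u for row in memberships for u in row) / n


# Hypothetical membership matrices for two points and two clusters.
crisp = partition_coefficient([[1.0, 0.0], [0.0, 1.0]])  # every point fully assigned
fuzzy = partition_coefficient([[0.5, 0.5], [0.5, 0.5]])  # no preference at all
```

A candidate number of clusters would typically be scored by running FCM for each value and comparing the resulting coefficients together with the other indices, since the partition coefficient alone tends to favour fewer clusters.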


Book ChapterDOI
01 Jan 2015
TL;DR: The high usability of the algorithm and the encouraging results suggest that swarm clustering (PSO-based clustering) with the Davies-Bouldin index as fitness function, judged with respect to the Dunn index, can be a practical tool for analyzing gene expression patterns.
Abstract: The clustering problem is being studied by many researchers using swarm intelligence. However, the search is not carried out entirely at random; a proper fitness function is required to determine the next step in the search space. This paper studies Particle Swarm Optimization (PSO) based clustering with two different fitness functions, namely the Xie-Beni and Davies-Bouldin indices, on a brain tumor gene expression dataset. Clustering results are validated using Mean Absolute Error (MAE) and the Dunn Index (DI). To analyze the function of genes, genes that have similar expression patterns should be grouped, and the datasets should be presented to physicians in a meaningful way. The high usability of the algorithm and the encouraging results suggest that swarm clustering (PSO-based clustering) with the Davies-Bouldin index as fitness function, judged with respect to the Dunn index, can be a practical tool for analyzing gene expression patterns.

Journal ArticleDOI
TL;DR: Cluster analysis is widely used in cancer research to discover molecular subgroups that inform subsequent laboratory investigations and define risk classification criteria for subsequent clinical trials, yet a specific candidate cluster analysis method (CCAM) is frequently chosen without quantifying the validity of its results.
Abstract: Background: Cluster analysis is widely used in cancer research to discover molecular subgroups that inform subsequent laboratory investigations and define risk classification criteria for subsequent clinical trials. However, for any data set, there are a very large number of candidate cluster analysis methods (CCAMs) due to the many choices for feature selection criteria, number of selected features, number of clusters to define, etc. Frequently, a specific CCAM is chosen without quantifying the validity of its results in terms of reproducibility or distinctiveness of the reported subgroups.