Proceedings Article

Medical data mining using BGA and RGA for weighting of features in fuzzy k-NN classification

12 Jul 2009 - Vol. 5, pp 3070-3075
TL;DR: The k-nearest neighbor (k-NN) algorithm is widely used in classification and data mining because of its simplicity and effectiveness. This study compares crisp, fuzzy, and feature-weighted fuzzy k-NN classifiers on the full feature set and on optimal three-feature subsets.
Abstract: The k-nearest neighbor (k-NN) algorithm is commonly used in classification, data mining, and related areas because of its simplicity and effectiveness. In this study, the full feature set and optimal three-feature subsets are investigated. For classification, crisp k-NN, fuzzy k-NN, and weighted fuzzy k-NN classifiers are compared. For feature weighting, two types of coding, binary-coded genetic algorithms (BGA) and real-coded genetic algorithms (RGA), are evaluated. Experiments are conducted on the Wisconsin Diagnostic Breast Cancer (WDBC) dataset and the Pima Indians Diabetes (PIMA) dataset, and classification accuracy, false negatives, and computation time are reported.
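
A minimal Python sketch of the feature-weighting idea described in the abstract, under stated assumptions: the distance is a weighted squared Euclidean distance, neighbor labels are crisp, and the fitness a GA would maximize is plain validation accuracy. The names `weighted_fuzzy_knn` and `fitness`, the fuzzifier `m`, and all defaults are hypothetical; the paper's actual encoding, genetic operators, and fitness function are not given on this page.

```python
import numpy as np

def weighted_fuzzy_knn(X_train, y_train, x, weights, k=3, m=2.0, n_classes=2):
    """Return fuzzy class memberships of x (illustrative sketch, not the paper's code)."""
    # Assumed distance: squared Euclidean, each feature scaled by its weight.
    d2 = np.sum(weights * (X_train - x) ** 2, axis=1)
    nn = np.argsort(d2)[:k]  # indices of the k nearest training samples
    # Inverse-distance vote strength; the small floor avoids division by zero.
    inv = 1.0 / np.maximum(d2[nn], 1e-12) ** (1.0 / (m - 1.0))
    u = np.zeros(n_classes)
    for idx, w in zip(nn, inv):
        u[y_train[idx]] += w  # crisp training labels assumed
    return u / inv.sum()      # memberships sum to 1

def fitness(weights, X_train, y_train, X_val, y_val, k=3, n_classes=2):
    """Assumed GA fitness: validation accuracy of the weighted classifier."""
    preds = [
        int(np.argmax(weighted_fuzzy_knn(X_train, y_train, x, weights,
                                         k=k, n_classes=n_classes)))
        for x in X_val
    ]
    return float(np.mean(np.array(preds) == y_val))
```

In this framing, a BGA would decode each bit string into the `weights` vector before calling `fitness`, while an RGA would evolve the real-valued vector directly.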
Citations
Journal Article
01 Jan 2013
TL;DR: Results indicate that the proposed ADM-RCGA is fast, accurate, and reliable, and outperforms all the other GAs considered in the present study.
Abstract: The adaptive directed mutation (ADM) operator, a novel, simple, and efficient operator for real-coded genetic algorithms (RCGAs), is proposed and then employed to solve complex function optimization problems. The suggested ADM operator improves the ability of GAs to find global optima and speeds convergence by integrating a local directional search strategy with adaptive random search strategies. Using 41 benchmark global optimization test functions, the performance of the new algorithm is compared with five conventional mutation operators and then with six genetic algorithms (GAs) reported in the literature. Results indicate that the proposed ADM-RCGA is fast, accurate, and reliable, and outperforms all the other GAs considered in the present study.

94 citations

Journal Article
TL;DR: A novel Artificial Bee Colony (ABC) algorithm is proposed in which a mutation operator is added to improve performance and enhance the diversity of ABC without compromising solution quality.

86 citations


Cites methods from "Medical data mining using BGA and R..."

  • ...6 [29] RGA-fuzzy-KNN [29] (5xCV) 82 [29] ML-NN [31] (10xCV) 79....


  • ...Tang and Tseng [29] developed GA-based methods to estimate a weight vector of the feature vector applied in the fuzzy k-NN estimation....


Journal Article
31 Mar 2016
TL;DR: Various data mining techniques such as classification, clustering, and association are presented, along with related work on analysing and predicting human disease.
Abstract: The health care industry produces an enormous quantity of data containing complex information about patients and their medical conditions. Data mining is gaining popularity in different research areas because of its wide range of applications and its methodologies for extracting information correctly. Data mining techniques can discover hidden patterns or relationships among the objects in medical data. In the last decade, the use of data mining techniques on medical data has increased for determining useful trends or patterns that are used in analysis and decision making. Data mining has great potential to use healthcare data more efficiently and effectively to predict different kinds of disease. This paper presents various data mining techniques such as classification, clustering, and association, and also highlights related work on analysing and predicting human disease.

57 citations

Proceedings Article
01 Feb 2017
TL;DR: This work comprehensively compares different data classification techniques and their prediction accuracy on a chronic kidney disease dataset, using performance measures such as ROC, kappa statistic, RMSE, and MAE in the WEKA tool.
Abstract: In recent years, the advent of new web and data technologies has driven massive data growth in almost every sector. Businesses and leading industries view these huge data repositories as a resource for designing future strategies and prediction models, analyzing patterns and extracting knowledge from unstructured data with data mining techniques. The medical domain has become richer in digital patient records covering diagnosis and treatment. These repositories range from patient personal data, diagnoses, and treatment histories to test results, images, and various scans. These terabytes of medical data are rich in quantity but poor in information, because knowledge and robust tools for identifying hidden patterns are lacking, particularly in the medical sector. Data mining has proven capabilities for identifying hidden patterns and extracting knowledge in different research domains, and it is gaining popularity among researchers and scientists as a way to generate novel and deep insights from large biomedical datasets. Uncovering new biomedical and healthcare knowledge to support clinical decision making is another dimension of data mining. An extensive literature survey finds that early disease prediction is the most in-demand area of research in the health care sector. Because health care is a wide domain with differing disease characteristics, different techniques have their own prediction efficiencies, which can be improved and tuned. In this work, the authors comprehensively compare different data classification techniques and their prediction accuracy for chronic kidney disease. They compare J48, Naive Bayes, Random Forest, SVM, and k-NN classifiers using performance measures such as ROC, kappa statistic, RMSE, and MAE in the WEKA tool. They also compare these classifiers on accuracy measures such as TP rate, FP rate, precision, recall, and F-measure in WEKA. Experimental results show that the random forest classifier achieves better classification accuracy than the others on the chronic kidney disease dataset.

40 citations

01 Jan 2013
TL;DR: The main focus of this paper is to analyze the data mining techniques required for medical data mining, especially for discovering locally frequent diseases such as heart ailments, lung cancer, and breast cancer.
Abstract: In the last decade there has been increasing use of data mining techniques on medical data for discovering useful trends or patterns that are used in diagnosis and decision making. Data mining techniques such as clustering, classification, regression, association rule mining, and CART (Classification and Regression Tree) are widely used in the healthcare domain. Data mining algorithms, when appropriately used, are capable of improving the quality of prediction, diagnosis, and disease classification. The main focus of this paper is to analyze the data mining techniques required for medical data mining, especially for discovering locally frequent diseases such as heart ailments, lung cancer, and breast cancer. We evaluate the data mining techniques for finding locally frequent patterns in terms of cost, performance, speed, and accuracy. We also compare data mining techniques with conventional methods.

38 citations

References
Journal Article
01 Jul 1985
TL;DR: The theory of fuzzy sets is introduced into the K-nearest neighbor technique to develop a fuzzy version of the algorithm, and three methods of assigning fuzzy memberships to the labeled samples are proposed.
Abstract: Classification of objects is an important area of research and application in a variety of fields. In the presence of full knowledge of the underlying probabilities, Bayes decision theory gives optimal error rates. In those cases where this information is not present, many algorithms make use of distance or similarity among samples as a means of classification. The K-nearest neighbor decision rule has often been used in these pattern recognition problems. One of the difficulties that arises when utilizing this technique is that each of the labeled samples is given equal importance in deciding the class memberships of the pattern to be classified, regardless of their 'typicalness'. The theory of fuzzy sets is introduced into the K-nearest neighbor technique to develop a fuzzy version of the algorithm. Three methods of assigning fuzzy memberships to the labeled samples are proposed, and experimental results and comparisons to the crisp version are presented.
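
For reference, the fuzzy K-nearest neighbor membership rule is commonly written as below. The symbols are labeled here for illustration and are not taken from this page: u_i(x) is the membership of the query x in class i, x_j are the K nearest labeled samples, u_{ij} is the membership of x_j in class i, and m > 1 is a fuzzifier that controls how strongly closer neighbors are favored.

```latex
u_i(x) \;=\;
\frac{\displaystyle\sum_{j=1}^{K} u_{ij}\,\lVert x - x_j \rVert^{-2/(m-1)}}
     {\displaystyle\sum_{j=1}^{K} \lVert x - x_j \rVert^{-2/(m-1)}}
```

If the labeled memberships u_{ij} sum to 1 over the classes for every neighbor, the resulting u_i(x) also sum to 1. The query can then be assigned to the class with the largest membership while the remaining values indicate how ambiguous the decision is.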

2,323 citations


"Medical data mining using BGA and R..." refers methods in this paper

  • ...[5] as a generalization of the k-NN algorithm to allow the assignment of fractional membership, instead of zero or one like k-NN, to each class....


Proceedings Article
01 Oct 1987
TL;DR: In this article, the authors developed and investigated the method of sharing functions to permit the formation of stable subpopulations of different strings within a GA, thereby permitting the parallel investigation of many peaks.
Abstract: Many practical search and optimization problems require the investigation of multiple local optima. In this paper, the method of sharing functions is developed and investigated to permit the formation of stable subpopulations of different strings within a genetic algorithm (GA), thereby permitting the parallel investigation of many peaks. The theory and implementation of the method are investigated and two one-dimensional test functions are considered. On a test function with five peaks of equal height, a GA without sharing loses strings at all but one peak; a GA with sharing maintains roughly equally sized subpopulations clustered about all five peaks. On a test function with five peaks of different sizes, a GA without sharing loses strings at all but the highest peak; a GA with sharing allocates decreasing numbers of strings to peaks of decreasing value as predicted by theory.
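
A minimal Python sketch of fitness sharing on a one-dimensional search space, assuming the usual triangular sharing function sh(d) = 1 - (d / sigma_share)^alpha for d < sigma_share and 0 otherwise. The function name `shared_fitness` and the defaults for `sigma_share` and `alpha` are hypothetical choices for illustration, not values from the paper.

```python
import numpy as np

def shared_fitness(x, raw_fitness, sigma_share=0.1, alpha=1.0):
    """Divide each individual's raw fitness by its niche count (illustrative sketch)."""
    x = np.asarray(x, dtype=float)
    f = np.asarray(raw_fitness, dtype=float)
    # Pairwise distances between individuals in the 1-D decision space.
    d = np.abs(x[:, None] - x[None, :])
    # Sharing function: 1 at distance 0, falling to 0 at sigma_share and beyond.
    sh = np.where(d < sigma_share, 1.0 - (d / sigma_share) ** alpha, 0.0)
    # Niche count includes the individual itself, so it is always >= 1.
    niche_count = sh.sum(axis=1)
    return f / niche_count
```

Individuals crowded on the same peak split that peak's fitness among themselves, so selection pressure shifts toward less populated peaks and several subpopulations can persist.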

2,154 citations


"Medical data mining using BGA and R..." refers background in this paper

  • ...For more information about the genetic algorithm schemes, see references [6-9]....


  • ...[8, 9] There are two types of coding to implement genetic operators, namely binary-coded genetic algorithms (BGA) and real-coded genetic algorithms (RGA)....


Book
29 Jan 1999
TL;DR: Covers improving the algorithm, foundations, advanced operators, writing a genetic algorithm, and applications of genetic algorithms, and shows the benefits of incorporating reinforcement learning into genetic algorithms.
Abstract: An introduction to genetic algorithms for scientists and engineers

1,021 citations


"Medical data mining using BGA and R..." refers background in this paper

  • ...For more information about the genetic algorithm schemes, see references [6-9]....


  • ...[8, 9] There are two types of coding to implement genetic operators, namely binary-coded genetic algorithms (BGA) and real-coded genetic algorithms (RGA)....


Proceedings Article
20 May 1996
TL;DR: Dynamic niche sharing is developed that is able to efficiently identify and search multiple niches (peaks) in a multimodal domain and perform better than two other methods for multiple optima identification, standard sharing and deterministic crowding.
Abstract: Genetic algorithms utilize populations of individual hypotheses that converge over time to a single optimum, even within a multimodal domain. This paper examines methods that enable genetic algorithms to identify multiple optima within multimodal domains by maintaining population members within the niches defined by the multiple optima. A new mechanism, dynamic niche sharing, is developed that is able to efficiently identify and search multiple niches (peaks) in a multimodal domain. Dynamic niche sharing is shown to perform better than two other methods for multiple optima identification, standard sharing and deterministic crowding.

400 citations

Journal Article
TL;DR: Tests with practical forest inventory data show that the method performs noticeably better than other applications of k-NN estimation methods in forest inventories, and that the problem of biases in the species volume predictions can, for example, be almost completely overcome with this new approach.

222 citations


"Medical data mining using BGA and R..." refers result in this paper

  • ...Despite its simplicity, it has many advantages, such as it may give competitive performance compared to many other methods [1-4]....
