Journal ArticleDOI

Breast cancer diagnosis using GA feature selection and Rotation Forest

01 Apr 2017-Neural Computing and Applications (Springer London)-Vol. 28, Iss: 4, pp 753-763
TL;DR: Different data mining techniques for the diagnosis of breast cancer are presented, and the Rotation Forest model with 14 GA-selected features is shown to achieve the highest classification accuracy; compared with previous works, the proposed approach improves performance.
Abstract: Breast cancer is one of the primary causes of death among women worldwide, and accurate diagnosis is one of the most significant steps in breast cancer treatment. Data mining techniques can support doctors in the diagnosis decision-making process. In this paper, we present different data mining techniques for the diagnosis of breast cancer. Two different Wisconsin Breast Cancer datasets have been used to evaluate the proposed system. The system has two stages. In the first stage, genetic algorithms are used to extract informative and significant features and eliminate insignificant ones. This process reduces computational complexity and speeds up the data mining process. In the second stage, several data mining techniques are employed to classify subjects into two categories, with or without breast cancer. Both individual and multiple classifier systems were used in this stage to construct an accurate system for breast cancer classification. Performance is evaluated using classification accuracy, area under the receiver operating characteristic curve, and F-measure. The Rotation Forest model with 14 GA-selected features achieves the highest classification accuracy (99.48 %), and, compared with previous works, the proposed approach improves performance. These results have the potential to open new opportunities in the diagnosis of breast cancer.
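The paper's first stage can be sketched in miniature: a genetic algorithm evolving binary feature masks, with each mask scored by the cross-validated accuracy of a simple classifier on the selected columns. This is an illustrative stand-in (synthetic data, logistic-regression fitness), not the authors' implementation.

```python
# GA feature-selection sketch: evolve binary masks over 30 features.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=200, n_features=30, n_informative=8,
                           random_state=0)

def fitness(mask):
    """Cross-validated accuracy on the selected feature subset."""
    if not mask.any():
        return 0.0
    clf = LogisticRegression(max_iter=1000)
    return cross_val_score(clf, X[:, mask], y, cv=3).mean()

pop = rng.random((12, X.shape[1])) < 0.5          # random binary feature masks
for generation in range(8):
    scores = np.array([fitness(m) for m in pop])
    parents = pop[np.argsort(scores)[::-1][:6]]   # truncation selection
    children = []
    for _ in range(6):
        a, b = parents[rng.integers(6, size=2)]
        cut = rng.integers(1, X.shape[1])         # one-point crossover
        child = np.concatenate([a[:cut], b[cut:]])
        children.append(child ^ (rng.random(X.shape[1]) < 0.02))  # mutation
    pop = np.vstack([parents, children])

best = pop[np.argmax([fitness(m) for m in pop])]
print("selected features:", np.flatnonzero(best))
```

In the paper, the subset selected this way (14 features) is then handed to the second-stage classifiers.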
Citations
Journal ArticleDOI
TL;DR: Experimental results demonstrate that the proposed feature selection method effectively reduces the dimensions of the dataset and achieves superior classification accuracy using the selected features.

353 citations

Journal ArticleDOI
TL;DR: Support vector machine models using Glucose, Resistin, Age, and BMI as predictors allowed predicting the presence of breast cancer in women with sensitivity ranging between 82 and 88% and specificity ranging between 85 and 90%.
Abstract: The goal of this exploratory study was to develop and assess a prediction model which can potentially be used as a biomarker of breast cancer, based on anthropometric data and parameters which can be gathered in routine blood analysis. For each of the 166 participants, several clinical features were observed or measured, including Age, BMI, Glucose, Insulin, HOMA, Leptin, Adiponectin, Resistin, and MCP-1. Machine learning algorithms (logistic regression, random forests, support vector machines) were implemented taking different numbers of variables as predictors. The resulting models were assessed with a Monte Carlo cross-validation approach to determine 95% confidence intervals for the sensitivity, specificity, and AUC of the models. Support vector machine models using Glucose, Resistin, Age, and BMI as predictors allowed predicting the presence of breast cancer in women with sensitivity ranging between 82 and 88% and specificity ranging between 85 and 90%. The 95% confidence interval for the AUC was [0.87, 0.91]. These findings provide promising evidence that models combining age, BMI, and metabolic parameters may be a powerful tool for a cheap and effective biomarker of breast cancer.
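The Monte Carlo cross-validation protocol described here is straightforward to sketch: repeated stratified train/test splits, collecting sensitivity and specificity on each resampling and reading off an empirical 95% interval. Synthetic data stands in for the four clinical predictors; this is not the study's code.

```python
# Monte Carlo cross-validation of an SVM: repeated random splits.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=166, n_features=4, n_informative=4,
                           n_redundant=0, random_state=0)

sens, spec = [], []
for seed in range(100):                       # 100 Monte Carlo resamplings
    Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.3,
                                          stratify=y, random_state=seed)
    model = SVC(kernel="rbf", gamma="scale").fit(Xtr, ytr)
    tn, fp, fn, tp = confusion_matrix(yte, model.predict(Xte)).ravel()
    sens.append(tp / (tp + fn))
    spec.append(tn / (tn + fp))

# Empirical 95% interval from the distribution of resampled estimates.
lo_s, hi_s = np.percentile(sens, [2.5, 97.5])
print(f"sensitivity 95% interval: [{lo_s:.2f}, {hi_s:.2f}]")
```

The paper's intervals for sensitivity, specificity, and AUC come from exactly this kind of resampled distribution rather than a single held-out split.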

213 citations

Journal ArticleDOI
TL;DR: Classification performance results indicate that an efficient sleep monitoring system is possible with a single-channel EEG and can be used effectively in medical and home-care applications.
Abstract: Sleep scoring is used as a diagnostic technique in the diagnosis and treatment of sleep disorders. Automated sleep scoring is crucial, since the large volume of data would otherwise have to be analyzed visually by sleep specialists, which is burdensome, time-consuming, tedious, subjective, and error prone. Therefore, automated sleep stage classification is a crucial step in sleep research and sleep disorder diagnosis. In this paper, a robust system, consisting of three modules, is proposed for automated classification of sleep stages from the single-channel electroencephalogram (EEG). In the first module, signals taken from the Pz-Oz electrode are denoised using multiscale principal component analysis. In the second module, the most informative features are extracted using the discrete wavelet transform (DWT), and then statistical values of the DWT subbands are calculated. In the third module, the extracted features are fed into an ensemble classifier, termed the rotational support vector machine (RotSVM). The proposed classifier combines the advantages of principal component analysis and the SVM to improve the classification performance of the traditional SVM. The sensitivity and accuracy values across all subjects were 84.46% and 91.1%, respectively, for five-stage sleep classification, with a Cohen's kappa coefficient of 0.88. The obtained classification performance results indicate that an efficient sleep monitoring system is possible with a single-channel EEG and can be used effectively in medical and home-care applications.
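The second module's idea, wavelet subband statistics as features, can be sketched with a hand-rolled Haar DWT. This is a minimal numpy stand-in: the paper's denoising and RotSVM stages are not reproduced, and the signal below is random data standing in for an EEG epoch.

```python
# DWT feature extraction sketch: Haar decomposition + subband statistics.
import numpy as np

def haar_dwt(signal):
    """One level of the Haar DWT: approximation and detail coefficients."""
    s = np.asarray(signal, dtype=float)
    if len(s) % 2:                        # pad odd-length signals
        s = np.append(s, s[-1])
    approx = (s[0::2] + s[1::2]) / np.sqrt(2)
    detail = (s[0::2] - s[1::2]) / np.sqrt(2)
    return approx, detail

def dwt_features(signal, levels=4):
    """Statistical values of each subband, as in the second module."""
    feats, current = [], signal
    for _ in range(levels):
        current, detail = haar_dwt(current)
        feats += [detail.mean(), detail.std(), np.abs(detail).max()]
    feats += [current.mean(), current.std(), np.abs(current).max()]
    return np.array(feats)

rng = np.random.default_rng(0)
epoch = rng.standard_normal(3000)         # stand-in for a 30 s EEG epoch
print(dwt_features(epoch).shape)          # 3 statistics x 5 subbands
```

Each epoch thus collapses to a short, fixed-length feature vector that a classifier can consume, regardless of the raw signal length.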

196 citations


Cites methods from "Breast cancer diagnosis using GA fe..."

  • ...Motivation for this classifier was found in the EML approach referred to as a rotation forest, which was successfully applied in [42]; one can consider rotational SVM a type of the rotation forest proposed by Rodriguez et al....


Journal ArticleDOI
TL;DR: This paper proposes a mass detection method based on CNN deep features and unsupervised extreme learning machine (ELM) clustering and builds a feature set fusing deep features, morphological features, texture features, and density features.
Abstract: A computer-aided diagnosis (CAD) system based on mammograms enables early breast cancer detection, diagnosis, and treatment. However, the accuracy of the existing CAD systems remains unsatisfactory. This paper explores a breast CAD method based on feature fusion with convolutional neural network (CNN) deep features. First, we propose a mass detection method based on CNN deep features and unsupervised extreme learning machine (ELM) clustering. Second, we build a feature set fusing deep features, morphological features, texture features, and density features. Third, an ELM classifier is developed using the fused feature set to classify benign and malignant breast masses. Extensive experiments demonstrate the accuracy and efficiency of our proposed mass detection and breast cancer classification method.
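The ELM at the heart of this pipeline is simple enough to sketch: a random, untrained hidden layer followed by a closed-form least-squares output layer. Random features below stand in for the fused deep/morphological/texture/density set; this is not the paper's implementation.

```python
# Minimal extreme learning machine (ELM) classifier sketch.
import numpy as np

rng = np.random.default_rng(0)
n, d, h = 300, 20, 100                    # samples, fused features, hidden units
X = rng.standard_normal((n, d))
y = (X[:, :5].sum(axis=1) > 0).astype(float)   # synthetic benign/malignant labels

W = rng.standard_normal((d, h))           # random input weights (never trained)
b = rng.standard_normal(h)
H = np.tanh(X @ W + b)                    # hidden-layer activations
# Output weights by regularised least squares (ridge) in closed form.
beta = np.linalg.solve(H.T @ H + 1e-3 * np.eye(h), H.T @ y)

pred = (np.tanh(X @ W + b) @ beta > 0.5).astype(float)
print("training accuracy:", (pred == y).mean())
```

Because only the output layer is solved (no iterative training of hidden weights), ELMs are fast to fit, which is what makes them attractive both for the clustering step and for the final classifier in this CAD pipeline.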

194 citations


Cites background or methods from "Breast cancer diagnosis using GA fe..."

  • ...[24] E. Aličković and A. Subasi, "Breast cancer diagnosis using GA feature selection and rotation forest," Neural Comput....


  • ...We also show the results of the method mentioned in [24] using our datasets....


  • ...Aličković and Subasi [24] proposed a breast CAD method, in which genetic algorithms are used for extraction of informative and significant features, and the rotation forest is used to make a decision for two different categories of subjects with or without breast cancer....


  • ...To further evaluate our proposed method, we also select the state-of-the-art algorithm mentioned in [24] as the baseline, which can be abbreviated as "GARF"....



References
Journal ArticleDOI
01 Oct 2001
TL;DR: Internal estimates monitor error, strength, and correlation and these are used to show the response to increasing the number of features used in the forest, and are also applicable to regression.
Abstract: Random forests are a combination of tree predictors such that each tree depends on the values of a random vector sampled independently and with the same distribution for all trees in the forest. The generalization error for forests converges a.s. to a limit as the number of trees in the forest becomes large. The generalization error of a forest of tree classifiers depends on the strength of the individual trees in the forest and the correlation between them. Using a random selection of features to split each node yields error rates that compare favorably to Adaboost (Y. Freund & R. Schapire, Machine Learning: Proceedings of the Thirteenth International Conference, 1996, 148–156), but are more robust with respect to noise. Internal estimates monitor error, strength, and correlation, and these are used to show the response to increasing the number of features used in the splitting. Internal estimates are also used to measure variable importance. These ideas are also applicable to regression.
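A minimal illustration of these internal estimates with scikit-learn's random forest: the out-of-bag score is the built-in generalization estimate, and `feature_importances_` gives a variable-importance measure (impurity-based in scikit-learn, rather than Breiman's permutation measure).

```python
# Random forest internal estimates: OOB accuracy and variable importance.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=10, n_informative=4,
                           random_state=0)
forest = RandomForestClassifier(n_estimators=200, oob_score=True,
                                random_state=0).fit(X, y)
print("out-of-bag accuracy:", round(forest.oob_score_, 3))
print("top features:", np.argsort(forest.feature_importances_)[::-1][:4])
```

Because each tree is trained on a bootstrap sample, the samples it never saw ("out of bag") provide an honest error estimate without a separate validation set.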

79,257 citations

Book
Vladimir Vapnik1
01 Jan 1995
TL;DR: Setting of the learning problem; consistency of learning processes; bounds on the rate of convergence of learning processes; controlling the generalization ability of learning processes; constructing learning algorithms; what is important in learning theory?
Abstract: Setting of the learning problem; consistency of learning processes; bounds on the rate of convergence of learning processes; controlling the generalization ability of learning processes; constructing learning algorithms; what is important in learning theory?
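The "controlling generalization ability" theme is most concrete in the support vector machine: among all hyperplanes separating the two classes, choose the one with maximal margin. A toy sketch (a large C in scikit-learn's SVC approximates the hard-margin case):

```python
# Maximal-margin hyperplane on a toy linearly separable dataset.
import numpy as np
from sklearn.svm import SVC

X = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 3.0], [4.0, 3.0]])
y = np.array([0, 0, 1, 1])

svm = SVC(kernel="linear", C=1e6).fit(X, y)   # large C ~ hard margin
w, b = svm.coef_[0], svm.intercept_[0]
margin = 2.0 / np.linalg.norm(w)              # geometric margin width
print("margin width:", round(margin, 3))
print("support vectors:", svm.support_vectors_)
```

The margin here equals the distance between the closest points of the two classes' convex hulls, which is exactly the quantity the SVM's capacity-control argument bounds.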

40,147 citations


"Breast cancer diagnosis using GA fe..." refers background in this paper

  • ...The SVM tries to find the optimal hyperplane that maximizes the distance between the instances of two different classes [54]....


Book
16 Jul 1998
TL;DR: Thorough, well-organized, and completely up to date, this book examines all the important aspects of this emerging technology, including the learning process, back-propagation learning, radial-basis function networks, self-organizing systems, modular networks, temporal processing and neurodynamics, and VLSI implementation of neural networks.
Abstract: From the Publisher: This book represents the most comprehensive treatment available of neural networks from an engineering perspective. Thorough, well-organized, and completely up to date, it examines all the important aspects of this emerging technology, including the learning process, back-propagation learning, radial-basis function networks, self-organizing systems, modular networks, temporal processing and neurodynamics, and VLSI implementation of neural networks. Written in a concise and fluid manner, by a foremost engineering textbook author, to make the material more accessible, this book is ideal for professional engineers and graduate students entering this exciting field. Computer experiments, problems, worked examples, a bibliography, photographs, and illustrations reinforce key concepts.
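Back-propagation learning, one of the book's central topics, reduces to the chain rule applied layer by layer. A minimal numpy sketch training a one-hidden-layer network on XOR (illustrative only, plain batch gradient descent):

```python
# One-hidden-layer network trained on XOR with back-propagation.
import numpy as np

rng = np.random.default_rng(1)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1, b1 = rng.standard_normal((2, 16)), np.zeros(16)
W2, b2 = rng.standard_normal((16, 1)), np.zeros(1)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

for step in range(5000):
    h = np.tanh(X @ W1 + b1)                # forward pass
    out = sigmoid(h @ W2 + b2)
    grad_out = out - y                      # backward pass (cross-entropy loss)
    grad_W2 = h.T @ grad_out
    grad_h = (grad_out @ W2.T) * (1 - h**2) # chain rule through tanh
    grad_W1 = X.T @ grad_h
    W2 -= 0.1 * grad_W2; b2 -= 0.1 * grad_out.sum(axis=0)
    W1 -= 0.1 * grad_W1; b1 -= 0.1 * grad_h.sum(axis=0)

print("predictions:", (out > 0.5).astype(int).ravel())
```

The two `grad_*` lines are the whole of back-propagation: the output error is pushed backwards through each layer's weights and activation derivative.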

29,130 citations


"Breast cancer diagnosis using GA fe..." refers background in this paper

  • ...well-known back-propagation algorithm [23]....


  • ...This learning process is known as back-propagation learning [23]....


Book
15 Oct 1992
TL;DR: A complete guide to the C4.5 system as implemented in C for the UNIX environment, which starts from simple core learning methods and shows how they can be elaborated and extended to deal with typical problems such as missing data and overfitting.
Abstract: From the Publisher: Classifier systems play a major role in machine learning and knowledge-based systems, and Ross Quinlan's work on ID3 and C4.5 is widely acknowledged to have made some of the most significant contributions to their development. This book is a complete guide to the C4.5 system as implemented in C for the UNIX environment. It contains a comprehensive guide to the system's use, the source code (about 8,800 lines), and implementation notes. The source code and sample datasets are also available on a 3.5-inch floppy diskette for a Sun workstation. C4.5 starts with large sets of cases belonging to known classes. The cases, described by any mixture of nominal and numeric properties, are scrutinized for patterns that allow the classes to be reliably discriminated. These patterns are then expressed as models, in the form of decision trees or sets of if-then rules, that can be used to classify new cases, with emphasis on making the models understandable as well as accurate. The system has been applied successfully to tasks involving tens of thousands of cases described by hundreds of properties. The book starts from simple core learning methods and shows how they can be elaborated and extended to deal with typical problems such as missing data and overfitting. Advantages and disadvantages of the C4.5 approach are discussed and illustrated with several case studies. This book and software should be of interest to developers of classification-based intelligent systems and to students in machine learning and expert systems courses.
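C4.5 itself is not packaged in common Python libraries, but scikit-learn's CART tree with the entropy criterion illustrates the same information-theoretic test selection and the readable if-then output described above (a rough analogue, not C4.5):

```python
# Entropy-based decision tree on the Wisconsin breast cancer dataset,
# printed as human-readable if-then rules.
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_breast_cancer(return_X_y=True)
tree = DecisionTreeClassifier(criterion="entropy", max_depth=3,
                              random_state=0).fit(X, y)
print(export_text(tree, max_depth=2))       # rule view of the fitted tree
print("training accuracy:", round(tree.score(X, y), 3))
```

At each node, the split chosen is the one with the highest information gain over the single attribute being tested, which is the "goodness" criterion the book formalizes.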

21,674 citations


"Breast cancer diagnosis using GA fe..." refers background or methods in this paper

  • ...C4.5 algorithm uses equations established on information theory to estimate the "goodness" of a test; in particular, it selects the test that extracts the highest amount of information from a set of samples, given the restriction that just a single attribute is to be tested [40]....


  • ...In reality, even though these are very crude, this approach frequently performs relatively well [40]....


Book
25 Oct 1999
TL;DR: This highly anticipated third edition of the most acclaimed work on data mining and machine learning will teach you everything you need to know about preparing inputs, interpreting outputs, evaluating results, and the algorithmic methods at the heart of successful data mining.
Abstract: Data Mining: Practical Machine Learning Tools and Techniques offers a thorough grounding in machine learning concepts as well as practical advice on applying machine learning tools and techniques in real-world data mining situations. This highly anticipated third edition of the most acclaimed work on data mining and machine learning will teach you everything you need to know about preparing inputs, interpreting outputs, evaluating results, and the algorithmic methods at the heart of successful data mining. Thorough updates reflect the technical changes and modernizations that have taken place in the field since the last edition, including new material on Data Transformations, Ensemble Learning, Massive Data Sets, and Multi-instance Learning, plus a new version of the popular Weka machine learning software developed by the authors. Witten, Frank, and Hall include both tried-and-true techniques of today as well as methods at the leading edge of contemporary research.

  • Provides a thorough grounding in machine learning concepts as well as practical advice on applying the tools and techniques to your data mining projects
  • Offers concrete tips and techniques for performance improvement that work by transforming the input or output in machine learning methods
  • Includes the downloadable Weka software toolkit, a collection of machine learning algorithms for data mining tasks, in an updated, interactive interface; the algorithms in the toolkit cover data pre-processing, classification, regression, clustering, association rules, and visualization

20,196 citations