Feature Selection Algorithms in Medical Data Classification: A Brief Survey and Experimentation

doi:10.1007/978-981-15-1420-3_90

Book Chapter

Suja S. Panicker¹, Prakasam Gayathri¹

VIT University¹

01 Jan 2020-Vol. 601, pp 831-841

TL;DR: A novel approach to surveying the popular feature selection algorithms used specifically in medical data classification is presented, considering three types of medical data: signals, images, and numerical data.


Abstract: Feature selection algorithms play a crucial role in any machine learning problem. Choosing the best algorithm yields an optimal subset of features, thereby increasing accuracy and reducing the time required for training. For high-dimensional datasets, feature selection is also advantageous for removing irrelevant features. This paper presents a novel approach to surveying the popular feature selection algorithms used specifically in medical data classification, considering three types of medical data: signals, images, and numerical data. This work should be very useful to researchers seeking first-hand information, since we review aspects such as available medical datasets, feature selection techniques, choice of classifier, issues in identifying the feature selection technique, and an analysis of the major feature selection methodologies and their detailed mechanisms. We have also performed sample experiments on standard medical datasets from the UCI repository and analyzed the effects on time and performance using 12 popular classifiers. The results demonstrate improved accuracy and lower computation times.
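The abstract does not name the specific algorithms compared, so as a rough illustration of what a filter-style selector does, the sketch below ranks features by the absolute Pearson correlation between each feature column and a binary class label and keeps the top `k`. This is one common filter criterion chosen for illustration, not the chapter's method; `pearson` and `select_top_k` are hypothetical helper names, and the toy data are made up.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 when either column is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def select_top_k(X: Sequence[Sequence[float]], y: Sequence[float], k: int) -> List[int]:
    """Hypothetical filter: indices (ascending) of the k features most correlated with y."""
    columns = list(zip(*X))  # transpose rows (samples) into feature columns
    scores = [abs(pearson(col, y)) for col in columns]
    ranked = sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


# Toy data: 4 samples x 3 features; feature 1 is constant (uninformative).
X = [[0.1, 5.0, 0.3],
     [0.2, 5.0, 0.1],
     [0.9, 5.0, 0.4],
     [1.0, 5.0, 0.2]]
y = [0, 0, 1, 1]
print(select_top_k(X, y, 2))  # [0, 2]
```

In an experiment like the one the abstract describes, the reduced feature set would then be passed to each classifier, and accuracy and training time compared with and without selection.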


Citations

Journal Article

Feature Selection Pipeline based on Hybrid Optimization Approach with Aggregated Medical Data

[...]

Palwinder Kaur, Rajesh Kumar Singh

01 Jan 2022-International Journal of Advanced Computer Science and Applications

TL;DR: It is proposed in this paper that a diagnostic system that can detect diabetic retinopathy, glaucoma, and cataract can be built as an alternative to current methods and will help to improve healthcare workflows and practices.


Abstract: For quite some time, the use of multiple data sources (data fusion) and the aggregation of that data have been underappreciated. In this study, trials were conducted on several medical datasets, and their results serve as a single aggregated source for identifying eye diseases. This paper proposes that a diagnostic system able to detect diabetic retinopathy, glaucoma, and cataract can be built as an alternative to current methods. The data fusion and data aggregation techniques used to create this multi-model system made it possible; as the name implies, data aggregation compiles data from a large number of legitimate sources. The pipeline of algorithms was developed through iterative trials and hyperparameter tuning. CLAHE (Contrast Limited Adaptive Histogram Equalization) improves segmentation by increasing the contrast between image edges. The Gabor filter has been shown to be the most effective method of selecting features, and it was selected using a hybrid optimization method (LION + Cuckoo) developed by the author. For automation, the Support Vector Machine (SVM) with a radial kernel is the most effective method, since it delivers excellent stability along with strong accuracy, precision, and recall. The findings and approaches detailed here provide a more solid foundation for future researchers in image-based diagnostics to build on. Eventually, the findings of this study will help to improve healthcare workflows and practices. Keywords: Content-based image retrieval system; CLAHE; Gabor filter; Cuckoo search; LION optimization; support vector machine


3 citations
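The abstract above names CLAHE contrast enhancement, Gabor filtering, a LION + Cuckoo hybrid optimizer, and a radial-kernel SVM, but gives no parameters. The sketch below only illustrates the Gabor step: it builds the standard real-valued Gabor kernel and turns the filter responses at a few orientations into mean/variance texture features. The kernel size, `sigma`, wavelength `lambd`, aspect ratio `gamma`, and number of orientations are hypothetical defaults, not values from the paper, and the hybrid optimizer that selected the filter is not reproduced.

```python
import math
from typing import List

Matrix = List[List[float]]


def gabor_kernel(size: int, sigma: float, theta: float, lambd: float,
                 gamma: float = 0.5, psi: float = 0.0) -> Matrix:
    """Real Gabor kernel: Gaussian envelope times an oriented cosine wave."""
    if size % 2 == 0:
        raise ValueError("size must be odd")
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate coordinates so the wave runs along orientation theta.
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr ** 2 + gamma ** 2 * yr ** 2) / (2 * sigma ** 2))
            row.append(envelope * math.cos(2 * math.pi * xr / lambd + psi))
        kernel.append(row)
    return kernel


def filter_valid(image: Matrix, kernel: Matrix) -> Matrix:
    """'Valid' sliding-window filtering with no padding.

    For psi = 0 the Gabor kernel is point-symmetric, so this equals convolution.
    """
    k = len(kernel)
    return [[sum(kernel[a][b] * image[i + a][j + b]
                 for a in range(k) for b in range(k))
             for j in range(len(image[0]) - k + 1)]
            for i in range(len(image) - k + 1)]


def gabor_features(image: Matrix, orientations: int = 4, size: int = 5,
                   sigma: float = 2.0, lambd: float = 4.0) -> List[float]:
    """Hypothetical feature vector: (mean, variance) of the response per orientation."""
    features: List[float] = []
    for n in range(orientations):
        theta = n * math.pi / orientations
        kernel = gabor_kernel(size, sigma, theta, lambd)
        response = [v for row in filter_valid(image, kernel) for v in row]
        mean = sum(response) / len(response)
        features += [mean, sum((v - mean) ** 2 for v in response) / len(response)]
    return features


# Toy 8x8 checkerboard "image"; yields 4 orientations x (mean, variance) = 8 values.
image = [[float((i + j) % 2) for j in range(8)] for i in range(8)]
feats = gabor_features(image)
```

In a pipeline like the one described, such features would be computed on CLAHE-enhanced images and then fed to the SVM.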

Book Chapter

HealthCare Data Analytics: A Machine Learning-Based Perspective

[...]

Ксения Гончарова¹

VIT University¹

01 Jan 2022

TL;DR: In this paper, the design, development, functionalities, and upcoming trends in Big Data Analytics research are discussed, along with its advantages in infrastructural, organizational, operational, managerial, and strategic areas and the latest trending areas.


Abstract: The alarming surge in the volume of diverse data across domains has fueled ever-growing research in Big Data Analytics globally. Despite the enormous boom in effective applications of Big Data Analytics, health care has not fully grasped its potential benefits. This paper studies the design, development, functionalities, and upcoming trends in Big Data Analytics research. It showcases five potentials of Big Data Analytics, along with their advantages in infrastructural, organizational, operational, managerial, and strategic areas, and articulates the latest trending areas. The paper will be greatly useful to fellow researchers, offering not just the fundamental facets of Big Data Analytics in the healthcare domain but also a summary of research gaps, latest trends, and developments, thereby opening new avenues for future research.


References

Journal Article

Feature selection: evaluation, application, and small sample performance

[...]

Anil K. Jain¹, D. Zongker¹

Michigan State University¹

01 Feb 1997-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: This work studies the problem of choosing an optimal feature set for land use classification based on SAR satellite images using four different texture models, and shows that pooling features derived from different texture models, followed by feature selection, results in a substantial improvement in classification accuracy.


Abstract: A large number of algorithms have been proposed for feature subset selection. Our experimental results show that the sequential forward floating selection algorithm, proposed by Pudil et al. (1994), dominates the other algorithms tested. We study the problem of choosing an optimal feature set for land use classification based on SAR satellite images using four different texture models. Pooling features derived from different texture models, followed by feature selection, results in a substantial improvement in classification accuracy. We also illustrate the dangers of using feature selection in small sample size situations.


2,238 citations
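A compact sketch of sequential forward floating selection (SFFS), the algorithm the abstract above reports as dominating the others tested. It follows the usual description of Pudil et al.'s method: greedy inclusion, then conditional exclusion whenever dropping a feature beats the best subset already recorded at that smaller size. It is an illustration, not the authors' code; `sffs` and the additive toy criterion are hypothetical, and in practice `criterion` would be something like cross-validated classifier accuracy. As the abstract warns, with small samples such a criterion can be noisy enough to make the selected subset unreliable.

```python
from typing import Callable, Dict, FrozenSet, Tuple

Subset = FrozenSet[int]


def sffs(n_features: int,
         criterion: Callable[[Subset], float],
         target_size: int) -> Tuple[Subset, float]:
    """Hypothetical SFFS sketch; `criterion` scores a subset (higher is better)."""
    if not 0 < target_size <= n_features:
        raise ValueError("target_size must be in 1..n_features")
    best: Dict[int, Tuple[Subset, float]] = {}  # best subset seen per size
    current: Subset = frozenset()
    while len(current) < target_size:
        # Inclusion: add the single feature that maximizes the criterion.
        f_add = max((f for f in range(n_features) if f not in current),
                    key=lambda f: criterion(current | {f}))
        current = current | {f_add}
        score = criterion(current)
        if len(current) not in best or score > best[len(current)][1]:
            best[len(current)] = (current, score)
        # Conditional exclusion: drop features while that strictly beats the
        # best subset already recorded for the smaller size. Dropping the
        # feature just added can never pass this test.
        while len(current) > 2:
            f_drop = max(current, key=lambda f: criterion(current - {f}))
            reduced = current - {f_drop}
            r_score = criterion(reduced)
            if r_score > best[len(reduced)][1]:
                current = reduced
                best[len(reduced)] = (reduced, r_score)
            else:
                break
    return best[target_size]


# Toy additive criterion (hypothetical): each feature contributes a fixed weight.
weights = [0.1, 0.9, 0.5, 0.3, 0.7]
subset, score = sffs(5, lambda s: sum(weights[i] for i in s), 3)
# subset is frozenset({1, 2, 4})
```

Requiring a strict improvement over the recorded best at each size guarantees termination, because those records can only increase a finite number of times.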

Journal Article

A Survey on Feature Selection

[...]

Jianyu Miao¹, Lingfeng Niu¹

Chinese Academy of Sciences¹

01 Jan 2016-Procedia Computer Science

TL;DR: The experimental results show that unsupervised feature selection algorithms benefit machine learning tasks by improving clustering performance.


267 citations
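The TL;DR above does not describe the specific unsupervised algorithms surveyed. As a minimal example of a selector that needs no class labels, the sketch below keeps features whose population variance exceeds a threshold; `variance_filter` and the threshold values are hypothetical, and this simple filter stands in for, rather than reproduces, the methods in the survey.

```python
from statistics import pvariance
from typing import List, Sequence


def variance_filter(X: Sequence[Sequence[float]], threshold: float) -> List[int]:
    """Hypothetical unsupervised filter: indices of features with variance > threshold.

    Uses no labels, so it can precede clustering. Features should be on
    comparable scales (e.g. min-max normalized), otherwise variance mostly
    reflects measurement units.
    """
    return [j for j, col in enumerate(zip(*X)) if pvariance(col) > threshold]


X = [[1.0, 0.0, 3.0],
     [1.0, 1.0, 5.0],
     [1.0, 0.0, 7.0]]
print(variance_filter(X, 0.1))  # [1, 2]
```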

Journal Article

A hybrid feature selection algorithm for gene expression data classification

[...]

Huijuan Lu¹, Junying Chen¹, Ke Yan¹, Qun Jin², Yu Xue³, Zhigang Gao⁴

China Jiliang University¹, Waseda University², Nanjing University of Information Science and Technology³, Hangzhou Dianzi University⁴

20 Sep 2017-Neurocomputing

TL;DR: Experimental results show that the proposed MIMAGA-Selection method significantly reduces the dimensionality of gene expression data and removes redundancies for classification, and that the reduced gene expression dataset yields the highest classification accuracy compared to conventional feature selection algorithms.


253 citations
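MIMAGA-Selection, as its name indicates, pairs mutual information maximization (MIM) with an adaptive genetic algorithm (AGA). The TL;DR gives no details, so the sketch below covers only a mutual-information ranking step and omits the genetic search. It assumes expression values have already been discretized (real microarray data would need binning first); `mutual_information`, `mim_rank`, and the toy data are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence


def mutual_information(x: Sequence, y: Sequence) -> float:
    """I(X; Y) in bits for discrete x and y, from empirical frequencies."""
    n = len(x)
    joint = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)
    # p(a,b) / (p(a) p(b)) simplifies to c * n / (count(a) * count(b)).
    return sum((c / n) * math.log2(c * n / (px[a] * py[b]))
               for (a, b), c in joint.items())


def mim_rank(X: Sequence[Sequence], y: Sequence, k: int) -> List[int]:
    """Hypothetical MIM filter: keep the k features sharing most information with y."""
    columns = list(zip(*X))
    scores = [mutual_information(col, y) for col in columns]
    ranked = sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


# Toy discretized "expression levels": feature 0 tracks the class,
# feature 1 is independent of it, feature 2 is constant.
X = [[0, 0, 5], [0, 1, 5], [1, 0, 5], [1, 1, 5]]
y = [0, 0, 1, 1]
print(mim_rank(X, y, 1))  # [0]
```

In a hybrid scheme of this kind, the top-ranked features would form the reduced pool that the genetic algorithm then searches.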

Journal Article

High dimensional data classification and feature selection using support vector machines

[...]

Bissan Ghaddar¹, Joe Naoum-Sawaya¹

University of Western Ontario¹

16 Mar 2018-European Journal of Operational Research

TL;DR: A new approach based on iteratively adjusting a bound on the l1-norm of the classifier vector in order to force the number of selected features to converge towards the desired maximum limit is introduced.


186 citations
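The TL;DR above describes the method only as iteratively adjusting a bound on the l1-norm of the classifier vector until the number of selected features converges to the desired maximum. A minimal sketch of such an outer loop, assuming the number of non-zero weights grows monotonically as the bound is loosened, is a bisection on the bound. Training an l1-bounded SVM is out of scope here, so as a hypothetical stand-in, `count_selected` projects a fixed weight vector onto the l1 ball of radius `t` (the standard sort-based Euclidean projection) and counts non-zero entries; in practice it would retrain the classifier under bound `t`. The paper's actual update rule may differ.

```python
from typing import Callable, List, Sequence, Tuple


def project_l1_ball(v: Sequence[float], t: float) -> List[float]:
    """Euclidean projection of v onto {w : ||w||_1 <= t} (sort-based)."""
    if t <= 0:
        return [0.0] * len(v)
    if sum(abs(x) for x in v) <= t:
        return list(v)
    u = sorted((abs(x) for x in v), reverse=True)
    cumsum, rho, cumsum_rho = 0.0, 1, u[0]
    for j, uj in enumerate(u, start=1):
        cumsum += uj
        if uj - (cumsum - t) / j > 0:
            rho, cumsum_rho = j, cumsum
    theta = (cumsum_rho - t) / rho  # soft-threshold level
    return [(1 if x > 0 else -1) * max(abs(x) - theta, 0.0) for x in v]


def tune_l1_bound(count_selected: Callable[[float], int], max_features: int,
                  lo: float, hi: float, tol: float = 1e-6) -> Tuple[float, int]:
    """Hypothetical bisection for the loosest bound in [lo, hi] keeping <= max_features.

    Assumes count_selected(t) is non-decreasing in t and that
    count_selected(lo) <= max_features < count_selected(hi).
    """
    best_t, best_k = lo, count_selected(lo)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        k = count_selected(mid)
        if k <= max_features:
            best_t, best_k, lo = mid, k, mid  # bound can be loosened further
        else:
            hi = mid  # too many features: tighten the bound
    return best_t, best_k


# Stand-in "classifier weights" (hypothetical values).
weights = [3.0, -1.0, 0.5, 2.0, -0.2]


def count_selected(t: float) -> int:
    return sum(1 for w in project_l1_ball(weights, t) if w != 0)


t, k = tune_l1_bound(count_selected, max_features=2, lo=0.0, hi=10.0)
# t is within tol of 3.0 and k == 2
```

Bisection is used here only because it needs nothing beyond the monotonicity assumption; any scheme that tightens the bound when too many features survive and loosens it otherwise fits the same description.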

Journal Article

Binary black hole algorithm for feature selection and classification on biological data

[...]

Elnaz Pashaei¹, Nizamettin Aydin¹

Yıldız Technical University¹

01 Jul 2017

TL;DR: A binary version of the Black Hole Algorithm, called BBHA, is proposed for solving the feature selection problem in biological data; the results show that Random Forest is the best decision tree algorithm and that the proposed BBHA wrapper-based feature selection approach outperforms the other algorithms.


Abstract: [Graphical abstract omitted: average solution quality of one filter and four wrapper approaches on 8 medical datasets.]
A binary version of the Black Hole Algorithm (BBHA) for solving discrete problems is proposed. The proposed algorithm was compared to 6 well-known decision tree classifiers. Experimental results demonstrate that Random Forest is the best decision tree algorithm. The proposed BBHA wrapper-based feature selection approach outperforms the other algorithms. The proposed method also performs much faster, needs a single parameter to configure the model, and is simple to understand.
Biological data often consist of redundant and irrelevant features. These features can mislead the modeling algorithms and cause overfitting. Without a feature selection method, it is difficult for existing models to accurately capture the patterns in the data. The aim of feature selection is to choose a small number of relevant or significant features to enhance classification performance. Existing feature selection methods suffer from problems such as becoming stuck in local optima and being computationally expensive. To solve these problems, an efficient global search technique is needed. The Black Hole Algorithm (BHA) is a new and efficient global search technique, inspired by the behavior of black holes, that is being applied to several optimization problems. However, the potential of BHA for feature selection has not yet been investigated. This paper proposes a binary version of the Black Hole Algorithm, called BBHA, for solving the feature selection problem in biological data. BBHA extends the existing BHA through appropriate binarization. Moreover, the performances of six well-known decision tree classifiers (Random Forest (RF), Bagging, C5.0, C4.5, Boosted C5.0, and CART) are compared in this study in order to employ the best one as the evaluator of the proposed algorithm. The performance of the proposed algorithm is tested on eight publicly available biological datasets and compared with Particle Swarm Optimization (PSO), Genetic Algorithm (GA), Simulated Annealing (SA), and Correlation-based Feature Selection (CFS) in terms of accuracy, sensitivity, specificity, Matthews Correlation Coefficient (MCC), and Area Under the receiver operating characteristic (ROC) Curve (AUC). To verify the applicability and generality of BBHA, it was integrated with a Naive Bayes (NB) classifier and applied to further datasets in the text and image domains. The experimental results confirm that RF performs better than the other decision tree algorithms and that the proposed BBHA wrapper-based feature selection method is superior to BPSO, GA, SA, and CFS in terms of all criteria. BBHA performs significantly better than BPSO and GA in terms of CPU time, the number of parameters needed to configure the model, and the number of optimized features chosen. BBHA also has competitive or better performance than the other methods in the literature.


135 citations
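The abstract describes BBHA only as an extension of BHA "through appropriate binarization", so the sketch below is one plausible reading, not the authors' algorithm. It keeps the general Black Hole Algorithm pattern as BHA is usually described: the best star becomes the black hole, the other stars move toward it, and stars that fall inside an event horizon of radius R = f_BH / Σ f_i are replaced by new random stars. The binarization (a |tanh| transfer function deciding whether a bit is copied from the black hole) and the normalized Hamming distance used for the horizon test are assumptions. `bbha_select` and its parameters are hypothetical; `fitness` must be non-negative and is maximized, and in the paper's setting it would be the performance of a classifier such as Random Forest on the selected features.

```python
import math
import random
from typing import Callable, List, Tuple

Mask = List[int]


def bbha_select(fitness: Callable[[Mask], float], n_features: int,
                n_stars: int = 20, n_iter: int = 50,
                seed: int = 0) -> Tuple[Mask, float, List[float]]:
    """Hypothetical binary black-hole search over 0/1 feature masks."""
    rng = random.Random(seed)

    def random_star() -> Mask:
        return [rng.randint(0, 1) for _ in range(n_features)]

    stars = [random_star() for _ in range(n_stars)]
    scores = [fitness(s) for s in stars]
    bh_idx = max(range(n_stars), key=lambda i: scores[i])
    black_hole, bh_fit = stars[bh_idx][:], scores[bh_idx]
    history: List[float] = []  # black-hole fitness after each iteration

    for _ in range(n_iter):
        for i in range(n_stars):
            # Move toward the black hole: each differing bit is copied from
            # it with probability |tanh(r * diff)| (assumed binarization).
            moved = []
            for bit, target in zip(stars[i], black_hole):
                step = rng.random() * (target - bit)
                moved.append(target if rng.random() < abs(math.tanh(step)) else bit)
            stars[i], scores[i] = moved, fitness(moved)
            if scores[i] > bh_fit:  # a better star becomes the black hole
                black_hole, bh_fit = moved[:], scores[i]
        # Event horizon R = f_BH / sum(f_i); absorbed stars are respawned.
        total = sum(scores)
        radius = bh_fit / total if total > 0 else 0.0
        for i in range(n_stars):
            dist = sum(a != b for a, b in zip(stars[i], black_hole)) / n_features
            if dist < radius:
                stars[i] = random_star()
                scores[i] = fitness(stars[i])
                if scores[i] > bh_fit:
                    black_hole, bh_fit = stars[i][:], scores[i]
        history.append(bh_fit)
    return black_hole, bh_fit, history


# Toy fitness (hypothetical): number of bits agreeing with a "true" mask.
true_mask = [1, 0, 1, 1, 0, 0, 1, 0]
best, best_fit, history = bbha_select(
    lambda m: sum(a == b for a, b in zip(m, true_mask)), n_features=8,
    n_stars=15, n_iter=30, seed=1)
```

Because the black hole is kept as a separate copy and replaced only by a strictly better mask, `history` never decreases.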