Book Chapter

Feature Selection Algorithms in Medical Data Classification: A Brief Survey and Experimentation

01 Jan 2020-Vol. 601, pp 831-841
TL;DR: A novel approach to surveying the popular feature selection algorithms used specifically in medical data classification is presented, covering three types of medical data: signals, images, and numerical data.
Abstract: Feature selection algorithms play a crucial role in any machine learning problem. Choosing the best algorithm yields an optimal subset of features, thereby increasing accuracy and reducing the time required for training. For high-dimensional datasets, feature selection also helps remove irrelevant features. This paper presents a novel approach to surveying the popular feature selection algorithms used specifically in medical data classification, considering three types of medical data: signals, images, and numerical data. The work should be useful to researchers gathering first-hand information, since it reviews available medical datasets, feature selection techniques, the choice of classifier, issues in identifying a feature selection technique, and the major feature selection methodologies and their detailed mechanisms. We also performed sample experiments on standard medical datasets from UCI and analyzed the effects on time and performance using 12 popular classifiers. The results demonstrate improved accuracy and lower computation times.
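As a rough illustration of the filter-style feature selection that the abstract refers to, the following minimal Python sketch ranks features by the absolute Pearson correlation between each feature and a binary class label and keeps the top k. The toy data, the function names `pearson` and `select_top_k`, and the choice of correlation as the scoring rule are assumptions made for illustration only; they are not taken from the chapter, which evaluates several methods on UCI datasets.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)

def select_top_k(rows, labels, k):
    """Return the indices of the k features with the largest |correlation| to the label."""
    n_features = len(rows[0])
    scores = []
    for j in range(n_features):
        column = [row[j] for row in rows]
        scores.append((abs(pearson(column, labels)), j))
    # Highest score first; ties are broken by the lower feature index.
    scores.sort(key=lambda t: (-t[0], t[1]))
    return sorted(j for _, j in scores[:k])

# Hypothetical toy data: feature 0 tracks the label, feature 1 is constant,
# and feature 2 is mostly noise.
rows = [
    [1.0, 5.0, 0.3],
    [1.2, 5.0, 0.9],
    [0.9, 5.0, 0.1],
    [3.1, 5.0, 0.4],
    [2.9, 5.0, 0.8],
    [3.3, 5.0, 0.2],
]
labels = [0, 0, 0, 1, 1, 1]

selected = select_top_k(rows, labels, k=1)
reduced = [[row[j] for j in selected] for row in rows]  # data restricted to kept features
```

A classifier trained on `reduced` sees fewer columns than one trained on `rows`, which is where a reduction in training time would come from; whether accuracy also improves depends on the dataset and the classifier.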
Citations
Journal Article
TL;DR: This paper proposes a diagnostic system that detects diabetic retinopathy, glaucoma, and cataract as an alternative to current methods, with the aim of improving healthcare workflows and practices.
Abstract: For quite some time, the use of multiple data sources (data fusion) and the aggregation of that data have been underappreciated. In this study, trials on several medical datasets were conducted, and the results served as a single aggregated source for identifying eye diseases. The paper proposes that a diagnostic system that can detect diabetic retinopathy, glaucoma, and cataract can be built as an alternative to current methods. The data fusion and data aggregation techniques used to create this multi-model system made this possible. As the name implies, data aggregation is a way of compiling data from a large number of legitimate sources. A pipeline of algorithms was developed through iterative trials and hyperparameter tuning. CLAHE (Contrast Limited Adaptive Histogram Equalization) improves segmentation by raising the contrast, and thus the gradient, at image edges. The Gabor filter is reported to be the most effective method of selecting features. The Gabor filter was selected using a hybrid optimization method (LION + Cuckoo) developed by the author. For automation, the Support Vector Machine (SVM) with a radial kernel is reported to be the most effective method, since it delivers excellent stability together with high accuracy, precision, and recall. The findings and approaches detailed here provide a more solid foundation for future image-based diagnostics researchers to build on. Eventually, the findings of this study will help to improve healthcare workflows and practices. Keywords: content-based image retrieval system; CLAHE; Gabor filter; Cuckoo search; LION optimization; support vector machine

3 citations

Book Chapter
01 Jan 2022
TL;DR: This paper discusses the design, development, functionalities, and upcoming trends in the investigation of Big Data Analytics, along with its advantages in infrastructural, organizational, operational, managerial, and strategic areas and the latest trending areas.
Abstract: An alarming surge in the amount of diverse data across domains has fueled ever-growing research in Big Data Analytics globally. Despite the boom in effective applications of Big Data Analytics, health care has not fully realized the possible benefits. This paper studies the design, development, functionalities, and upcoming trends in the investigation of Big Data Analytics. It showcases five potentials of Big Data Analytics, along with advantages in infrastructural, organizational, operational, managerial, and strategic areas, and articulates the latest trending areas. The paper should benefit fellow researchers by covering not just the fundamental facets of Big Data Analytics in the healthcare domain but also a summary of research gaps, latest trends, and developments, thereby opening new avenues for future research.
References
Journal Article
TL;DR: This work studies the problem of choosing an optimal feature set for land use classification based on SAR satellite images using four different texture models, and shows that pooling features derived from different texture models, followed by feature selection, results in a substantial improvement in classification accuracy.
Abstract: A large number of algorithms have been proposed for feature subset selection. Our experimental results show that the sequential forward floating selection algorithm, proposed by Pudil et al. (1994), dominates the other algorithms tested. We study the problem of choosing an optimal feature set for land use classification based on SAR satellite images using four different texture models. Pooling features derived from different texture models, followed by feature selection, results in a substantial improvement in classification accuracy. We also illustrate the dangers of using feature selection in small sample size situations.

2,238 citations
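The sequential forward floating selection (SFFS) idea named in the abstract above can be sketched roughly as follows. This is a simplified reading of the general scheme (a forward step, followed by backward steps that are kept only while they beat the best subset already found at the smaller size), not the exact procedure of Pudil et al. or of the cited study. The function name `sffs`, the toy scoring table, and the integer scores are all hypothetical; in practice the criterion would typically be something like cross-validated classifier accuracy.

```python
def sffs(n_features, criterion, target_size):
    """Simplified sequential forward floating selection.

    criterion(subset) takes a frozenset of feature indices and returns a
    score to maximize. Returns (sorted_subset, score) for target_size.
    """
    current = frozenset()
    best = {}  # subset size -> (score, subset)

    def record(subset):
        """Store subset if it strictly beats the best known one of its size."""
        score = criterion(subset)
        size = len(subset)
        if size not in best or score > best[size][0]:
            best[size] = (score, subset)
            return True
        return False

    while len(current) < target_size:
        # Forward step: add the single feature that helps most.
        candidates = [current | {f} for f in range(n_features) if f not in current]
        current = max(candidates, key=criterion)
        record(current)

        # Floating step: keep dropping a feature while the reduced subset
        # beats the best subset previously found at that smaller size.
        while len(current) > 2:
            reduced = max((current - {f} for f in current), key=criterion)
            if record(reduced):
                current = reduced
            else:
                break

    score, subset = best[target_size]
    return sorted(subset), score

# Hypothetical toy criterion: individual feature weights plus pairwise
# adjustments (negative for redundant pairs, positive for complementary ones).
WEIGHTS = [50, 40, 40, 35]
PAIR_ADJUST = {(0, 1): -10, (0, 2): -10, (0, 3): -30, (1, 2): 30}

def toy_criterion(subset):
    score = sum(WEIGHTS[f] for f in subset)
    for (a, b), adjustment in PAIR_ADJUST.items():
        if a in subset and b in subset:
            score += adjustment
    return score

subset, score = sffs(4, toy_criterion, target_size=3)
```

On this toy table, purely greedy forward selection would stop at features 0, 1, and 2, because feature 0 is picked first and never reconsidered. The floating step lets the search drop feature 0 once features 1 and 2 are both present, which is the "nesting" problem the floating variant is meant to reduce.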

Journal Article
TL;DR: The experimental results show that unsupervised feature selection algorithms benefit machine learning tasks by improving clustering performance.

267 citations

Journal Article
TL;DR: Experimental results show that the proposed MIMAGA-Selection method significantly reduces the dimension of gene expression data and removes redundancies for classification, and that the reduced gene expression dataset provides the highest classification accuracy compared with conventional feature selection algorithms.

253 citations

Journal Article
TL;DR: A new approach based on iteratively adjusting a bound on the l1-norm of the classifier vector in order to force the number of selected features to converge towards the desired maximum limit is introduced.

186 citations

Journal Article
01 Jul 2017
TL;DR: A binary version of the Black Hole Algorithm, called BBHA, is proposed for solving the feature selection problem in biological data; the results indicate that Random Forest is the best of the decision tree algorithms compared and that the proposed BBHA wrapper-based feature selection approach outperforms the other algorithms.
Graphical abstract: Average solution quality of one filter and four wrapper approaches on 8 medical datasets.
Highlights: A binary version of the Black Hole Algorithm (BBHA) for solving discrete problems is proposed. The proposed algorithm was compared to 6 well-known decision tree classifiers. Experimental results demonstrate that Random Forest is the best decision tree algorithm. The proposed BBHA wrapper-based feature selection approach outperforms the other algorithms. The proposed method also runs much faster, needs a single parameter to configure the model, and is simple to understand.
Abstract: Biological data often consist of redundant and irrelevant features. These features can mislead the modeling algorithms and cause overfitting. Without a feature selection method, it is difficult for existing models to accurately capture the patterns in the data. The aim of feature selection is to choose a small number of relevant or significant features to enhance classification performance. Existing feature selection methods suffer from problems such as becoming stuck in local optima and being computationally expensive. To solve these problems, an efficient global search technique is needed. The Black Hole Algorithm (BHA) is an efficient and new global search technique, inspired by the behavior of black holes, which is being applied to solve several optimization problems. However, the potential of BHA for feature selection has not been investigated yet. This paper proposes a binary version of the Black Hole Algorithm, called BBHA, for solving the feature selection problem in biological data. The BBHA is an extension of the existing BHA through appropriate binarization.
Moreover, the performances of six well-known decision tree classifiers (Random Forest (RF), Bagging, C5.0, C4.5, Boosted C5.0, and CART) are compared in this study in order to employ the best one as the evaluator of the proposed algorithm. The performance of the proposed algorithm is tested on eight publicly available biological datasets and is compared with Particle Swarm Optimization (PSO), Genetic Algorithm (GA), Simulated Annealing (SA), and Correlation-based Feature Selection (CFS) in terms of accuracy, sensitivity, specificity, Matthews Correlation Coefficient (MCC), and Area Under the receiver operating characteristic (ROC) Curve (AUC). To verify the applicability and generality of the BBHA, it was integrated with a Naive Bayes (NB) classifier and applied to further datasets in the text and image domains. The experimental results confirm that RF performs better than the other decision tree algorithms and that the proposed BBHA wrapper-based feature selection method is superior to BPSO, GA, SA, and CFS on all criteria. BBHA gives significantly better performance than BPSO and GA in terms of CPU time, the number of parameters for configuring the model, and the number of chosen optimized features. BBHA also has competitive or better performance than the other methods in the literature.

135 citations
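To make the wrapper idea in the abstract above more concrete, here is a heavily simplified Python sketch of a binary black-hole-style search over feature masks. It is not the authors' BBHA. The tanh-based binarization rule, the population size, the iteration count, and the toy fitness function are all assumptions chosen for illustration, and the fitness here is a stand-in for the classifier-based evaluation (Random Forest in the paper) that a real wrapper would use.

```python
import math
import random

def bbha(n_features, fitness, n_stars=10, n_iter=30, seed=0):
    """Simplified binary black hole search; returns (best_mask, best_score)."""
    rng = random.Random(seed)

    def random_star():
        return [rng.randint(0, 1) for _ in range(n_features)]

    stars = [random_star() for _ in range(n_stars)]
    scores = [fitness(s) for s in stars]
    bh = max(range(n_stars), key=lambda i: scores[i])  # index of the black hole

    for _ in range(n_iter):
        for i in range(n_stars):
            if i == bh:
                continue
            # Pull each bit toward the black hole, then binarize
            # (assumed tanh transfer rule, not confirmed by the source).
            for d in range(n_features):
                x = stars[i][d] + rng.random() * (stars[bh][d] - stars[i][d])
                stars[i][d] = 1 if abs(math.tanh(x)) > rng.random() else 0
            scores[i] = fitness(stars[i])
            if scores[i] > scores[bh]:
                bh = i  # a better star becomes the new black hole

        # Event horizon: stars closer to the black hole than the radius are
        # replaced by fresh random stars. With binary masks and non-negative
        # scores, this only affects stars identical to the black hole.
        total = sum(scores)
        radius = scores[bh] / total if total > 0 else 0.0
        for i in range(n_stars):
            if i != bh and math.dist(stars[i], stars[bh]) < radius:
                stars[i] = random_star()
                scores[i] = fitness(stars[i])
        bh = max(range(n_stars), key=lambda i: scores[i])

    return stars[bh][:], scores[bh]

# Hypothetical toy fitness: reward selecting the "informative" features and
# lightly penalize large subsets. A real wrapper would train a classifier here.
RELEVANT = {0, 2, 5}

def toy_fitness(mask):
    hits = sum(mask[j] for j in RELEVANT)
    size_fraction = sum(mask) / len(mask)
    return 0.9 * hits / len(RELEVANT) + 0.1 * (1 - size_fraction)

mask, score = bbha(n_features=8, fitness=toy_fitness)
```

The black hole itself is never moved within an iteration, so the best score found so far cannot get worse from one iteration to the next. The only tunable quantities in this sketch are the population size and the iteration count; the abstract's claim that the method needs a single configuration parameter refers to the authors' own formulation.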