An Effective Ensemble Machine Learning Approach to Classify Breast Cancer Based on Feature Selection and Lesion Segmentation Using Preprocessed Mammograms
TLDR
In this article, the mammography dataset is used to classify breast cancer into four classes with low computational complexity, introducing a feature extraction-based approach with machine learning (ML) algorithms.
Abstract:
Simple Summary: Screening for breast cancer in its early stages can play a crucial role in reducing mortality by enabling clinicians to administer timely treatment and preventing the cancer from reaching a critical stage. With this in view, the objective of this research is to develop an efficient automated approach for analyzing and classifying mammograms into four classes. First, artefacts present in the mammograms are removed and the mammograms are enhanced using image-processing techniques. The mammography dataset is then enlarged by applying seven data augmentation methods. Afterward, the region of interest (ROI) is extracted from each mammogram using a region-growing algorithm with a dynamic intensity threshold calculated per mammogram. From each ROI, a total of 16 geometrical features are extracted. These features are evaluated with eleven state-of-the-art machine learning (ML) algorithms and, depending on test accuracies, three ensemble models are developed. Among the ensemble models, the highest test accuracy of 96.03% is achieved by stacking the Random Forest and XGB classifiers (RF-XGB). Furthermore, the performance of RF-XGB is improved to 98.05% accuracy by applying various feature selection methods. Moreover, the performance consistency of the best model is evaluated with a K-fold cross-validation experiment. The proposed approach to classifying mammograms may assist specialists in the precise and effective diagnosis of breast cancer.

Abstract: Background: Breast cancer, behind skin cancer, is the second most frequent malignancy among women, initiated by unregulated cell division in breast tissues. Although early mammogram screening and treatment result in decreased mortality, differentiating cancer cells from surrounding tissues is often error-prone, resulting in misdiagnosis.
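The ROI extraction step mentioned in the Simple Summary (region growing with an intensity threshold computed per mammogram) can be illustrated with a minimal sketch. The summary does not state how the dynamic threshold is calculated, so the rule used here (a fixed fraction of the brightest pixel) is purely an assumption, as are the function names, the 4-connectivity, and the three shape descriptors, which are not claimed to be among the paper's 16 geometrical features.

```python
from collections import deque


def dynamic_threshold(image, fraction=0.5):
    """Hypothetical per-image threshold: a fraction of the brightest pixel.

    The source only says the threshold is calculated for each mammogram;
    this particular rule is an assumption made for illustration.
    """
    return fraction * max(max(row) for row in image)


def region_grow(image, seed, threshold):
    """Collect the 4-connected pixels reachable from `seed` with intensity >= threshold."""
    rows, cols = len(image), len(image[0])
    region = set()
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        if (r, c) in region:
            continue  # already accepted
        if not (0 <= r < rows and 0 <= c < cols):
            continue  # outside the image
        if image[r][c] < threshold:
            continue  # too dark to belong to the region
        region.add((r, c))
        queue.extend([(r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)])
    return region


def basic_shape_features(region):
    """Compute a few simple geometric descriptors of a non-empty pixel region."""
    row_values = [r for r, _ in region]
    col_values = [c for _, c in region]
    height = max(row_values) - min(row_values) + 1
    width = max(col_values) - min(col_values) + 1
    area = len(region)
    return {
        "area": area,
        "bbox_height": height,
        "bbox_width": width,
        "extent": area / (height * width),  # share of the bounding box that is filled
    }
```

On a real mammogram the image would be a 2D intensity array and the seed would have to be chosen inside the suspected lesion; the nested-list representation here only keeps the sketch dependency-free.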
Method: The mammography dataset is used to categorize breast cancer into four classes with low computational complexity, introducing a feature extraction-based approach with machine learning (ML) algorithms. After artefact removal and preprocessing of the mammograms, the dataset is augmented with seven augmentation techniques. The region of interest (ROI) is extracted by employing several algorithms, including a dynamic thresholding method. Sixteen geometrical features are extracted from the ROI, and eleven ML algorithms are investigated with these features. Three ensemble models are generated from these ML models using the stacking method: the first ensemble stacks the ML models with an accuracy of over 90%, and the accuracy thresholds for the other two ensembles are >95% and >96%. Five feature selection methods with fourteen configurations are applied to improve performance.

Results: The Random Forest Importance algorithm, with a threshold of 0.045, selects 10 features that yield the highest performance: 98.05% test accuracy with the ensemble that stacks the Random Forest and XGB classifiers, the models with an accuracy above 96%. Furthermore, with K-fold cross-validation, consistent performance is observed across all K values ranging from 3–30. Overall, the proposed strategy combining image processing, feature extraction and ML achieves high accuracy in classifying breast cancer.
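Two of the steps named above, importance-threshold feature selection and K-fold cross-validation, can be sketched in a few lines. This is an illustration under stated assumptions rather than the authors' implementation: only the 0.045 threshold comes from the abstract, while the function names, the `>=` comparison, and the contiguous (unshuffled) fold layout are assumptions. The importance scores themselves would come from a fitted Random Forest, which is not reproduced here.

```python
def select_by_importance(importances, threshold=0.045):
    """Keep the features whose importance score is at least `threshold`.

    `importances` maps feature name -> score from a fitted model.
    The 0.045 default is the threshold quoted in the abstract; treating
    it as inclusive (>=) is an assumption.
    """
    return [name for name, score in importances.items() if score >= threshold]


def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for k contiguous folds.

    Fold sizes differ by at most one sample. Real experiments usually
    shuffle (and often stratify) the data first; that is omitted here.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size
```

A consistency check like the one described in the Results would repeat training and evaluation over `k_fold_indices(n, k)` for each K of interest and compare the per-fold accuracies.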