Journal Article

Benchmarking Machine Learning Models to Assist in the Prognosis of Tuberculosis

TL;DR: This work benchmarks machine learning models to aid TB prognosis using a Brazilian health database of confirmed cases and deaths related to TB in the State of Amazonas, predicting the probability of death by TB and thus aiding the prognosis of TB and the associated treatment decision-making process.
Abstract: Tuberculosis (TB) is an airborne infectious disease caused by organisms in the Mycobacterium tuberculosis (Mtb) complex. In many low- and middle-income countries, TB remains a major cause of morbidity and mortality. Once a patient has been diagnosed with TB, it is critical that healthcare workers make the most appropriate treatment decision given the individual conditions of the patient and the likely course of the disease based on medical experience. Depending on the prognosis, delayed or inappropriate treatment can lead to unsatisfactory outcomes, including exacerbation of clinical symptoms, poor quality of life, and increased risk of death. This work benchmarks machine learning models to aid TB prognosis using a Brazilian health database of confirmed cases and deaths related to TB in the State of Amazonas. The goal is to predict the probability of death by TB, thus aiding the prognosis of TB and the associated treatment decision-making process. In its original form, the data set comprised 36,228 records and 130 fields but suffered from missing, incomplete, or incorrect data. Following data cleaning and preprocessing, a revised data set was generated comprising 24,015 records and 38 fields, including 22,876 reported cured TB patients and 1,139 deaths by TB. To explore how data imbalance impacts model performance, two controlled experiments were designed using (1) imbalanced and (2) balanced data sets. The best result for predicting TB mortality is achieved by the Gradient Boosting (GB) model using the balanced data set, and the ensemble model composed of the Random Forest (RF), GB, and Multi-Layer Perceptron (MLP) models is the best model for predicting the cure class.
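
The cleaned data set is heavily skewed: 22,876 cures against 1,139 deaths, roughly 20 cures per death (deaths are about 4.7% of the 24,015 records). The following minimal Python sketch, which is not the authors' code, shows why this matters when comparing models: a trivial classifier that predicts "cure" for every patient scores high accuracy while never detecting a death. The metrics chosen here (recall on deaths, specificity, balanced accuracy) are assumptions for illustration; the abstract does not say which metrics the study reports.

```python
# Class counts reported in the abstract for the cleaned data set.
N_CURED = 22_876
N_DEATH = 1_139


def confusion_counts(y_true, y_pred, positive):
    """Return (tp, fn, fp, tn), treating `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    return tp, fn, fp, tn


def summarize(y_true, y_pred, positive="death"):
    tp, fn, fp, tn = confusion_counts(y_true, y_pred, positive)
    accuracy = (tp + tn) / len(y_true)
    recall = tp / (tp + fn) if tp + fn else 0.0       # sensitivity on deaths
    specificity = tn / (tn + fp) if tn + fp else 0.0  # recall on cures
    balanced_accuracy = (recall + specificity) / 2
    return {"accuracy": accuracy, "recall": recall,
            "specificity": specificity, "balanced_accuracy": balanced_accuracy}


# A trivial "model" that predicts cure for every patient.
y_true = ["cure"] * N_CURED + ["death"] * N_DEATH
y_pred = ["cure"] * len(y_true)
metrics = summarize(y_true, y_pred)
print(f"imbalance ratio: {N_CURED / N_DEATH:.2f} : 1")
print({name: round(value, 4) for name, value in metrics.items()})
```

Run as is, the trivial classifier reaches an accuracy of about 95.3% but a recall of 0 on the death class and a balanced accuracy of 0.5, which is one reason to also evaluate models trained on a balanced data set.
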
Citations
Journal Article
TL;DR: This work proposes an optimized machine learning-based model that extracts optimal texture features from TB-related images and selects the hyper-parameters of the classifiers, and it highlights the efficiency of the modified SVM classifier compared with other standard ones.
Abstract: Computer science plays an important role in modern, dynamic health systems. Given the collaborative nature of the diagnostic process, computer technology provides important services to healthcare professionals and organizations, as well as to patients and their families, researchers, and decision-makers. Thus, any innovation that improves the diagnostic process while maintaining quality and safety is crucial to the development of the healthcare field. Many diseases can be tentatively diagnosed during their initial stages. In this study, all developed techniques were applied to tuberculosis (TB). We propose an optimized machine learning-based model that extracts optimal texture features from TB-related images and selects the hyper-parameters of the classifiers. Our goals are to increase the accuracy rate and to minimize the number of extracted features; in other words, this is a multi-objective optimization problem. A genetic algorithm (GA) is used to choose the best features, which are then fed into a support vector machine (SVM) classifier. Using the ImageCLEF 2020 data set, we conducted experiments with the proposed approach and achieved significantly higher accuracy and better outcomes than state-of-the-art works. The experimental results highlight the efficiency of the modified SVM classifier compared with other standard classifiers.

20 citations
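
The GA-plus-SVM approach in the entry above searches over subsets of features. The Python sketch below illustrates a generic genetic algorithm over feature bitmasks; it is an illustration under stated assumptions, not the authors' implementation. In a real pipeline `fitness` would train and score an SVM on the selected texture features; here `toy_fitness` and the `RELEVANT` set are hypothetical stand-ins that reward keeping "useful" features and penalize the number of features kept, mirroring the two stated goals (higher accuracy, fewer features).

```python
import random


def genetic_feature_selection(fitness, n_features, pop_size=20, generations=30,
                              mutation_rate=0.05, seed=0):
    """Evolve bitmasks (1 = feature kept) that maximize fitness(mask)."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    history = []  # best fitness seen in each generation
    for _ in range(generations):
        scored = sorted(population, key=fitness, reverse=True)
        best = scored[0]
        history.append(fitness(best))
        next_gen = [best[:]]  # elitism: carry the best mask over unchanged
        while len(next_gen) < pop_size:
            # Truncation selection: two distinct parents from the top half.
            p1, p2 = rng.sample(scored[: pop_size // 2], 2)
            cut = rng.randrange(1, n_features)  # single-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            next_gen.append(child)
        population = next_gen
    return max(population, key=fitness), history


# Hypothetical stand-in for "SVM accuracy minus a penalty per feature kept".
RELEVANT = {0, 3, 5}


def toy_fitness(mask):
    hits = sum(mask[i] for i in RELEVANT)
    return hits / len(RELEVANT) - 0.02 * sum(mask)


best_mask, history = genetic_feature_selection(toy_fitness, n_features=10)
print(best_mask, round(toy_fitness(best_mask), 3))
```

Because the best mask of each generation is copied unchanged into the next (elitism), the best fitness recorded in `history` never decreases; the population size, mutation rate, and selection scheme are arbitrary choices for the sketch.
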

Posted Content
16 Jan 2020
TL;DR: A data preprocessing ensemble for imbalanced Big Data classification is presented, with focus on two-class problems, and it is proved that the ensemble classifier outperforms classic machine learning models with an added data balancing method, such as Random Forests.
Abstract: Big Data scenarios pose a new challenge to traditional data mining algorithms, since they are not prepared to work with such amounts of data. Smart Data refers to data of sufficient quality to improve the outcome of a data mining algorithm. The inability of existing data mining algorithms to handle Big Datasets prevents the transition from Big to Smart Data. The automation in data acquisition that characterizes Big Data also brings some problems, such as differences in data size per class, which lead classifiers to lean towards the most represented classes. This problem is known as imbalanced data distribution, where one class is underrepresented in the dataset. Ensembles of classifiers are machine learning methods that improve on the performance of a single base classifier by combining several of them. Ensembles are not exempt from the imbalanced classification problem; to deal with this issue, the ensemble method has to be specifically designed for it. In this paper, a data preprocessing ensemble for imbalanced Big Data classification is presented, with a focus on two-class problems. Experiments carried out on 21 Big Datasets show that our ensemble classifier outperforms classic machine learning models with an added data balancing method, such as Random Forests.

7 citations
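
One common way to build an ensemble for a two-class imbalanced problem is to train each base classifier on a different balanced subset: all minority-class samples plus an equally sized random draw from the majority class (an approach often called UnderBagging). The abstract above does not detail the cited paper's preprocessing ensemble, so this Python sketch only illustrates that generic idea; `balanced_bags` is a hypothetical helper that returns index lists, and training the base classifiers is left out.

```python
import random
from collections import Counter


def balanced_bags(labels, n_bags, seed=0):
    """Hypothetical helper: build n_bags balanced index subsets for a two-class
    problem. Each bag holds every minority-class index plus an equal number of
    majority-class indices drawn without replacement."""
    counts = Counter(labels)
    if len(counts) != 2:
        raise ValueError("expected exactly two classes")
    (minority, n_min), (majority, _) = sorted(counts.items(), key=lambda kv: kv[1])
    minority_idx = [i for i, y in enumerate(labels) if y == minority]
    majority_idx = [i for i, y in enumerate(labels) if y == majority]
    rng = random.Random(seed)
    bags = []
    for _ in range(n_bags):
        # A different random majority subset per bag gives the ensemble diversity.
        bags.append(minority_idx + rng.sample(majority_idx, n_min))
    return bags


labels = ["cure"] * 20 + ["death"] * 4
bags = balanced_bags(labels, n_bags=3)
for bag in bags:
    print(Counter(labels[i] for i in bag))
```

Each base classifier would then be fit on one bag, and their predictions combined by voting; discarding majority samples per bag trades some information for balance, which the ensemble partly recovers by using many different subsets.
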

Journal Article
TL;DR: A comprehensive review of state-of-the-art machine and deep learning-based systems for detecting airway disorders is presented in this article, where the authors analyse the difficulties and potential future paths.
Abstract: Airway disease is a major healthcare issue that causes at least 3 million fatalities every year. It is also projected to be one of the foremost causes of death worldwide by 2030. Numerous studies have been undertaken to demonstrate the latest advances in artificial intelligence algorithms for identifying and classifying these diseases. This comprehensive review aims to summarise the state-of-the-art machine and deep learning-based systems for detecting airway disorders, envisage trends in recent work in this domain, and analyse the difficulties and potential future paths. This systematic literature review covers 155 articles on airway diseases such as cystic fibrosis, emphysema, lung cancer, mesothelioma, COVID-19, pneumoconiosis, asthma, pulmonary edema, tuberculosis, and pulmonary embolism, and highlights the automated learning techniques used to predict them. The study concludes with a discussion of the challenges in improving the efficiency of machine and deep learning-assisted airway disease detection applications.

7 citations

Journal Article
22 Jun 2022 - Cancers
TL;DR: A research-centric evaluation framework for model developers, ML-extracted data users and other RWD stakeholders is proposed, which covers the fundamentals of evaluating RWD produced using ML methods to maximize the use of EHR data for research purposes.
Abstract: Simple Summary: Many patient clinical characteristics, such as diagnosis dates, biomarker status, and therapies received, are only available as unstructured text in electronic health records. Obtaining this information for research purposes is a difficult and costly process, requiring trained clinical experts to manually review patient documents. Machine Learning techniques offer a promising solution for efficiently extracting clinically relevant information from unstructured text found in patient documents. However, the use of data produced with machine learning techniques for research purposes introduces unique challenges in assessing validity and generalizability to different cohorts of interest. To enable the effective and accurate use of such data for research purposes, we developed an evaluation framework to be utilized by model developers, data users, and other stakeholders. This framework can serve as a baseline to contextualize the quality, strengths, and limitations of using data produced with machine learning techniques for research purposes.
Abstract: A vast amount of real-world data, such as pathology reports and clinical notes, are captured as unstructured text in electronic health records (EHRs). However, this information is both difficult and costly to extract through human abstraction, especially when scaling to large datasets is needed. Fortunately, Natural Language Processing (NLP) and Machine Learning (ML) techniques provide promising solutions for a variety of information extraction tasks such as identifying a group of patients who have a specific diagnosis, share common characteristics, or show progression of a disease. However, using these ML-extracted data for research still introduces unique challenges in assessing validity and generalizability to different cohorts of interest.
In order to enable effective and accurate use of ML-extracted real-world data (RWD) to support research and real-world evidence generation, we propose a research-centric evaluation framework for model developers, ML-extracted data users and other RWD stakeholders. This framework covers the fundamentals of evaluating RWD produced using ML methods to maximize the use of EHR data for research purposes.

6 citations
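
As an illustration of checking validity and generalizability across cohorts, the sketch below compares ML-extracted binary values (for example, "patient has diagnosis X") with human-abstracted reference values and reports sensitivity, positive predictive value (PPV), and F1 per cohort and overall. The abstract above does not specify the framework at this level of detail; the record layout `(cohort, ml_value, abstracted_value)` and the chosen metrics are assumptions.

```python
from collections import defaultdict


def _ratio(num, den):
    return num / den if den else float("nan")


def extraction_metrics(records):
    """records: iterable of (cohort, ml_value, abstracted_value) with boolean
    values. Returns {cohort: {"sensitivity", "ppv", "f1"}} plus an "overall" key."""
    tallies = defaultdict(lambda: {"tp": 0, "fp": 0, "fn": 0, "tn": 0})
    for cohort, ml_value, reference in records:
        if ml_value and reference:
            cell = "tp"
        elif ml_value:
            cell = "fp"
        elif reference:
            cell = "fn"
        else:
            cell = "tn"
        tallies[cohort][cell] += 1
        tallies["overall"][cell] += 1
    results = {}
    for cohort, t in tallies.items():
        sensitivity = _ratio(t["tp"], t["tp"] + t["fn"])
        ppv = _ratio(t["tp"], t["tp"] + t["fp"])
        f1 = _ratio(2 * sensitivity * ppv, sensitivity + ppv)
        results[cohort] = {"sensitivity": sensitivity, "ppv": ppv, "f1": f1}
    return results


# Hypothetical records: (cohort, ML-extracted value, abstracted reference value).
records = [("A", True, True), ("A", True, False), ("A", False, True),
           ("B", True, True), ("B", False, False)]
for cohort, m in extraction_metrics(records).items():
    print(cohort, {k: round(v, 3) for k, v in m.items()})
```

Reporting the same metrics per cohort, rather than only overall, makes it visible when extraction quality that looks acceptable in aggregate degrades for a particular subgroup.
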

Journal Article
TL;DR: The aim is to develop an efficient tuberculosis detection system based on stochastic learning (random variations) with an artificial neural network (ANN) model using chest X-ray images; the proposed method attained better accuracy than state-of-the-art methods.
Abstract: Tuberculosis (TB) is still one of the most serious health issues today, with a high fatality rate. While efforts are being made to make primary diagnosis more reliable and accessible in places with high tuberculosis rates, chest X-rays (CXRs) have become a popular source. However, specialist radiologists are required for the screening process, which can be a challenge in developing countries. For early diagnosis of tuberculosis from CXR images, a fully automatic tuberculosis detection system can decrease the need for trained staff. Various deep learning and machine learning technologies have been introduced in recent years for examining digital chest radiographs for TB-related abnormalities, with the goal of reducing inter-reader variability, improving reproducibility, and providing radiologic services in areas where radiologists are not available. On CXR images, tuberculosis is sometimes misclassified as other conditions with similar radiographic patterns, resulting in ineffective therapy. The current approach, however, is limited to Computer-Aided Detection (CAD), which has only been evaluated with non-deep learning models. Deep neural networks open potentially new avenues for tuberculosis treatment. There are no peer-reviewed studies comparing the effectiveness of various deep learning systems in detecting TB anomalies, and none compare multiple deep learning systems with human readers. In this paper, the proposed method aims to develop an efficient tuberculosis detection system based on stochastic learning with an artificial neural network (ANN) model, using random variations on chest X-ray images. This approach can incorporate random functions into the network, either by assigning stochastic transfer functions or by assigning stochastic weights to the network.
The proposed method learns features from CXR images and optimizes the parameters of an ANN model by randomly shuffling the training dataset before each iteration, resulting in a varied ordering of model parameter updates. Furthermore, in a neural network, model weights are typically initialized at a random starting point. By focusing on randomness functions with optimization, the proposed technique achieved high accuracy. The motivation of the proposed method is to detect abnormalities in CXRs across different levels of TB complexity, with strong or weak evidence, in different deep geometric contexts such as shape, size, cavitation, and density. The primary benefit of an ANN is its ability to extract hidden linear and non-linear inter-relationships in high-dimensional, complex data. The proposed method was systematically tested on the Shenzhen and Montgomery datasets using metrics such as sensitivity, specificity, and accuracy, and it attained better accuracy than state-of-the-art methods. The proposed method achieved a sensitivity of 96.12%, specificity of 98.01%, accuracy of 98.45%, and F-score of 95.88%.

3 citations
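
The two sources of randomness named in the entry above, random weight initialization and reshuffling the training set before each pass, already appear in the smallest possible network. This Python sketch trains a single sigmoid unit with stochastic gradient descent; it is a hypothetical toy, not the cited paper's ANN, whose architecture the abstract does not give. The learning rate, epoch count, initialization range, and toy data are arbitrary assumptions.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_unit(xs, ys, epochs=200, lr=0.5, seed=0):
    """Single sigmoid unit trained by SGD on log-loss. Weights start at random
    values and the sample order is reshuffled before every epoch."""
    rng = random.Random(seed)
    w = [rng.uniform(-0.1, 0.1) for _ in range(len(xs[0]))]  # random init
    b = rng.uniform(-0.1, 0.1)
    order = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(order)  # new order -> different sequence of updates
        for i in order:
            z = b + sum(wj * xj for wj, xj in zip(w, xs[i]))
            grad = sigmoid(z) - ys[i]  # d(log-loss)/dz
            w = [wj - lr * grad * xj for wj, xj in zip(w, xs[i])]
            b -= lr * grad
    return w, b


def predict(w, b, x):
    return 1 if b + sum(wj * xj for wj, xj in zip(w, x)) > 0 else 0


xs = [[-2.0], [-1.0], [1.0], [2.0]]
ys = [0, 0, 1, 1]
w, b = train_logistic_unit(xs, ys)
print(w, b, [predict(w, b, x) for x in xs])
```

Changing `seed` changes both the starting weights and the update order, so the learned parameters differ slightly from run to run even on the same data; that run-to-run variation is the stochastic behaviour the abstract refers to.
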

References
Book Chapter
07 Jul 1992
TL;DR: Comparison with other feature selection algorithms shows Relief's advantages in terms of learning time and the accuracy of the learned concept, suggesting Relief's practicality.
Abstract: In real-world concept learning problems, the representation of data often uses many features, only a few of which may be related to the target concept. In this situation, feature selection is important both to speed up learning and to improve concept quality. A new feature selection algorithm, Relief, uses a statistical method and avoids heuristic search. Relief requires time linear in the number of given features and the number of training instances, regardless of the target concept to be learned. Although the algorithm does not necessarily find the smallest subset of features, the size tends to be small because only statistically relevant features are selected. This paper focuses on empirical test results in two artificial domains: the LED Display domain and the Parity domain, with and without noise. Comparison with other feature selection algorithms shows Relief's advantages in terms of learning time and the accuracy of the learned concept, suggesting Relief's practicality.

2,908 citations
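
The core loop of Relief can be sketched in a few lines of Python for the two-class case. This version follows the frequently cited formulation W[f] += (diff(f, R, M) - diff(f, R, H)) / m, where R is a randomly sampled instance, H its nearest hit (same class), and M its nearest miss (other class). The original paper may differ in details, for example by squaring the differences, and the range-normalized Manhattan distance used here is an assumption.

```python
import random


def relief(X, y, m=None, seed=0):
    """Two-class Relief on numeric features; returns one weight per feature.
    Each class needs at least two instances so that a nearest hit exists."""
    n, d = len(X), len(X[0])
    lo = [min(row[f] for row in X) for f in range(d)]
    hi = [max(row[f] for row in X) for f in range(d)]
    span = [(hi[f] - lo[f]) or 1.0 for f in range(d)]  # avoid division by zero

    def diff(f, a, b):
        # Per-feature difference scaled to [0, 1] by the feature's range.
        return abs(a[f] - b[f]) / span[f]

    def distance(a, b):
        return sum(diff(f, a, b) for f in range(d))

    rng = random.Random(seed)
    m = m or n
    w = [0.0] * d
    for _ in range(m):
        i = rng.randrange(n)  # sample an instance R at random
        R = X[i]
        hits = [j for j in range(n) if j != i and y[j] == y[i]]
        misses = [j for j in range(n) if y[j] != y[i]]
        near_hit = X[min(hits, key=lambda j: distance(R, X[j]))]
        near_miss = X[min(misses, key=lambda j: distance(R, X[j]))]
        for f in range(d):
            # Penalize features that separate R from its nearest hit; reward
            # features that separate it from its nearest miss.
            w[f] += (diff(f, R, near_miss) - diff(f, R, near_hit)) / m
    return w


# Feature 0 tracks the class exactly; feature 1 is irrelevant noise.
X = [[0.0, 0.1], [0.0, 0.9], [1.0, 0.2], [1.0, 0.8]]
y = [0, 0, 1, 1]
print(relief(X, y, m=4))
```

For a fixed m, each iteration scans all n instances across d features, so the cost grows linearly in both, consistent with the abstract. Features whose weight exceeds a chosen relevance threshold would then be kept.
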

Book
06 Jun 2012
TL;DR: An up-to-date, self-contained introduction to a state-of-the-art machine learning approach, Ensemble Methods: Foundations and Algorithms shows how these accurate methods are used in real-world tasks and gives the necessary groundwork to carry out further research in this evolving field.
Abstract: An up-to-date, self-contained introduction to a state-of-the-art machine learning approach, Ensemble Methods: Foundations and Algorithms shows how these accurate methods are used in real-world tasks. It gives you the necessary groundwork to carry out further research in this evolving field. After presenting background and terminology, the book covers the main algorithms and theories, including Boosting, Bagging, Random Forest, averaging and voting schemes, the Stacking method, mixture of experts, and diversity measures. It also discusses multiclass extension, noise tolerance, error-ambiguity and bias-variance decompositions, and recent progress in information theoretic diversity. Moving on to more advanced topics, the author explains how to achieve better performance through ensemble pruning and how to generate better clustering results by combining multiple clusterings. In addition, he describes developments of ensemble methods in semi-supervised learning, active learning, cost-sensitive learning, class-imbalance learning, and comprehensibility enhancement.

1,834 citations
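
The TB paper's best cure-class predictor combines RF, GB, and MLP models, but its abstract does not state the combination rule. The Python sketch below shows two of the voting schemes the book covers: hard (majority) voting over predicted labels and soft voting, which averages per-class probabilities. The probabilities in the usage example are made up, and the function names are hypothetical.

```python
from collections import Counter


def hard_vote(labels):
    """Majority vote over the class labels predicted by each model."""
    return Counter(labels).most_common(1)[0][0]


def soft_vote(prob_vectors, weights=None):
    """Average per-class probabilities (optionally weighted); return the
    winning class index and the averaged probabilities."""
    weights = weights or [1.0] * len(prob_vectors)
    total = sum(weights)
    n_classes = len(prob_vectors[0])
    avg = [sum(w * p[c] for w, p in zip(weights, prob_vectors)) / total
           for c in range(n_classes)]
    return max(range(n_classes), key=lambda c: avg[c]), avg


# Hypothetical outputs for one patient; class 0 = cure, class 1 = death.
probs = [[0.9, 0.1],    # e.g. RF
         [0.45, 0.55],  # e.g. GB
         [0.45, 0.55]]  # e.g. MLP
labels = [max(range(2), key=lambda c: p[c]) for p in probs]
print("hard:", hard_vote(labels))
print("soft:", soft_vote(probs))
```

In this example two of the three models lean slightly toward class 1, so hard voting returns 1, while the first model's confident 0.9 for class 0 makes the averaged probabilities favor class 0 (0.6 vs. 0.4). Soft voting uses each model's confidence, which is why it is often preferred when the models output reasonably calibrated probabilities.
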

Journal Article
TL;DR: This article gives a tutorial introduction to the methodology of gradient boosting methods with a strong focus on machine learning aspects of modeling.
Abstract: Gradient boosting machines are a family of powerful machine-learning techniques that have shown considerable success in a wide range of practical applications. They are highly customizable to the particular needs of the application, for example by being learned with respect to different loss functions. This article gives a tutorial introduction to the methodology of gradient boosting methods. The theoretical material is complemented with many descriptive examples and illustrations that cover all stages of gradient boosting model design. Considerations on handling model complexity are discussed. A set of practical examples of gradient boosting applications is presented and comprehensively analyzed.

1,463 citations
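
For squared-error loss, the negative gradient of L = ½(y - F)² with respect to F is simply the residual y - F, so each boosting round fits a weak learner to the current residuals and adds a shrunken copy of it to the model. The Python sketch below uses one-dimensional regression stumps as the weak learner and a learning rate of 0.5; these are illustrative assumptions, not settings from the tutorial or from the TB paper's GB model.

```python
def fit_stump(xs, residuals):
    """Least-squares regression stump on 1-D inputs: (threshold, left, right)."""
    values = sorted(set(xs))
    best = None
    for a, b in zip(values, values[1:]):
        t = (a + b) / 2  # candidate split halfway between neighbouring x values
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    if best is None:  # all inputs identical: fall back to a constant fit
        mean = sum(residuals) / len(residuals)
        return float("inf"), mean, mean
    return best[1], best[2], best[3]


def stump_value(stump, x):
    threshold, left, right = stump
    return left if x <= threshold else right


def gradient_boost(xs, ys, n_rounds=5, lr=0.5):
    f0 = sum(ys) / len(ys)  # F_0: the constant that minimizes squared error
    preds = [f0] * len(ys)
    stumps = []
    losses = [sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)]
    for _ in range(n_rounds):
        # Negative gradient of 0.5 * (y - F)^2 w.r.t. F is the residual y - F.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + lr * stump_value(stump, x) for p, x in zip(preds, xs)]
        losses.append(sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys))
    return f0, stumps, losses


def boosted_predict(f0, stumps, x, lr=0.5):
    # lr must match the value used in gradient_boost.
    return f0 + lr * sum(stump_value(s, x) for s in stumps)


xs = [1, 2, 3, 4, 5, 6]
ys = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]
f0, stumps, losses = gradient_boost(xs, ys)
print(losses)
print(boosted_predict(f0, stumps, 1))
```

On this step-shaped toy data each round halves the residuals, so the training MSE falls by a factor of four per round (4, 1, 0.25, ...). A learning rate below 1 (shrinkage) slows fitting deliberately, which is one of the usual levers for controlling model complexity.
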

Journal Article
TL;DR: Deep learning with DCNNs can accurately classify TB at chest radiography with an AUC of 0.99; for cases where the classifiers disagreed, an independent board-certified cardiothoracic radiologist blindly interpreted the images to evaluate a potential radiologist-augmented workflow.
Abstract: Purpose: To evaluate the efficacy of deep convolutional neural networks (DCNNs) for detecting tuberculosis (TB) on chest radiographs. Materials and Methods: Four deidentified, HIPAA-compliant datasets, exempted from review by the institutional review board and consisting of 1007 posteroanterior chest radiographs, were used in this study. The datasets were split into training (68.0%), validation (17.1%), and test (14.9%) sets. Two different DCNNs, AlexNet and GoogLeNet, were used to classify the images as having manifestations of pulmonary TB or as healthy. Both untrained networks and networks pretrained on ImageNet were used, along with augmentation using multiple preprocessing techniques. Ensembles were formed from the best-performing algorithms. For cases where the classifiers disagreed, an independent board-certified cardiothoracic radiologist blindly interpreted the images to evaluate a potential radiologist-augmented workflow. Receiver operating characteristic (ROC) curves and areas under the curve (AUCs) were used to assess model performance, with the DeLong method used for statistical comparison of ROC curves. Results: The best-performing classifier, an ensemble of the AlexNet and GoogLeNet DCNNs, had an AUC of 0.99. The AUCs of the pretrained models were greater than those of the untrained models (P < .001). Augmenting the dataset further increased accuracy (P values for AlexNet and GoogLeNet were .03 and .02, respectively). The DCNNs disagreed in 13 of the 150 test cases, which were blindly reviewed by a cardiothoracic radiologist, who correctly interpreted all 13 cases (100%). This radiologist-augmented approach resulted in a sensitivity of 97.3% and a specificity of 100%. Conclusion: Deep learning with DCNNs can accurately classify TB at chest radiography with an AUC of 0.99. A radiologist-augmented approach for cases where the classifiers disagreed further improved accuracy. © RSNA, 2017.

1,253 citations
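
The AUC reported in the entry above can be computed directly from its probabilistic definition: the chance that a randomly chosen positive case receives a higher score than a randomly chosen negative one, with ties counted as one half (the normalized Mann-Whitney U statistic). The quadratic-time Python sketch below is meant for small test sets such as the 150-image split; DeLong's method for comparing correlated AUCs, used in the study, is not implemented here, and the scores in the example are made up.

```python
def auc(scores, labels):
    """AUC as P(score of a random positive > score of a random negative),
    counting ties as 0.5. labels: 1 = TB, 0 = healthy."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical classifier scores for four radiographs.
print(auc([0.9, 0.8, 0.8, 0.1], [1, 1, 0, 0]))
```

Here three of the four positive/negative pairs are ranked correctly and one is tied, giving (3 + 0.5) / 4 = 0.875. Because the AUC depends only on the ranking of scores, it is unaffected by the decision threshold, which is why it suits comparing classifiers such as AlexNet, GoogLeNet, and their ensemble.
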