Showing papers on "AdaBoost" published in 2022


Journal ArticleDOI
01 Jul 2022-Sensors
TL;DR: The Random Forest Ensemble Method had the best accuracy (97%), whereas the AdaBoost and Bagging algorithms had lower accuracy, precision, recall, and F1-scores.
Abstract: Diabetes is a long-lasting disease triggered by expanded sugar levels in human blood and can affect various organs if left untreated. It contributes to heart disease, kidney issues, damaged nerves, damaged blood vessels, and blindness. Timely disease prediction can save precious lives and enable healthcare advisors to take care of the conditions. Most diabetic patients know little about the risk factors they face before diagnosis. Nowadays, hospitals deploy basic information systems, which generate vast amounts of data that cannot be converted into proper/useful information and cannot be used to support decision making for clinical purposes. There are different automated techniques available for the earlier prediction of disease. Ensemble learning is a data analysis technique that combines multiple techniques into a single optimal predictive system to evaluate bias and variation, and to improve predictions. Diabetes data, which included 17 variables, were gathered from the UCI repository of various datasets. The predictive models used in this study include AdaBoost, Bagging, and Random Forest, to compare the precision, recall, classification accuracy, and F1-score. Finally, the Random Forest Ensemble Method had the best accuracy (97%), whereas the AdaBoost and Bagging algorithms had lower accuracy, precision, recall, and F1-scores.
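The four scores compared in the abstract above can all be derived from confusion-matrix counts. The following is a minimal sketch in plain Python; the function name `classification_metrics` and the toy labels are illustrative assumptions, not the study's code or data.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1 for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    accuracy = (tp + tn) / len(pairs)
    # Guard against empty denominators when a class is never predicted.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy labels invented for illustration only (not data from the study).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
scores = classification_metrics(y_true, y_pred)
print(scores)
```

Reporting all four scores matters on medical data because accuracy alone can look high when one class dominates.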

55 citations



Journal ArticleDOI
TL;DR: In this article, different ML procedures were assessed based on the mean squared error (MSE) and determination coefficient (R2) to select the most robust models for modeling the process.

41 citations


Journal ArticleDOI
TL;DR: In this paper, a Gini Impurity-based Weighted Random Forest (GIWRF) was used as the embedded feature selection technique for an intrusion detection system (IDS) that protects the network, resources, and sensitive data.
Abstract: To protect the network, resources, and sensitive data, the intrusion detection system (IDS) has become a fundamental component of organizations that prevents cybercriminal activities. Several approaches have been introduced and implemented to thwart malicious activities so far. Due to the effectiveness of machine learning (ML) methods, the proposed approach applied several ML models for the intrusion detection system. In order to evaluate the performance of models, UNSW-NB 15 and Network TON_IoT datasets were used for offline analysis. Both datasets are comparatively newer than the NSL-KDD dataset to represent modern-day attacks. The performance analysis was carried out by training and testing the Decision Tree (DT), Gradient Boosting Tree (GBT), Multilayer Perceptron (MLP), AdaBoost, Long-Short Term Memory (LSTM), and Gated Recurrent Unit (GRU) for the binary classification task. As the performance of IDS deteriorates with a high dimensional feature vector, an optimum set of features was selected through a Gini Impurity-based Weighted Random Forest (GIWRF) model as the embedded feature selection technique. This technique employed Gini impurity as the splitting criterion of trees and adjusted the weights for two different classes of the imbalanced data to make the learning algorithm understand the class distribution. Based upon the importance score, 20 features were selected from UNSW-NB 15 and 10 features from the Network TON_IoT dataset. The experimental result revealed that DT performed better with the feature selection technique than the other trained models of this experiment. Moreover, the proposed GIWRF-DT outperformed other existing methods surveyed in the literature in terms of the F1 score.
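To make the idea of a class-weighted Gini impurity concrete, here is a minimal sketch in plain Python. The abstract does not state which weighting scheme GIWRF uses, so the "balanced" heuristic below (weight = n_samples / (n_classes * class_count)) is an assumption chosen for illustration, as are the function names and toy labels.

```python
def weighted_gini(labels, class_weight):
    """Gini impurity where each sample counts with its class weight."""
    totals = {}
    for label in labels:
        totals[label] = totals.get(label, 0.0) + class_weight[label]
    total = sum(totals.values())
    if total == 0:
        return 0.0
    return 1.0 - sum((w / total) ** 2 for w in totals.values())


def balanced_weights(labels):
    """Assumed heuristic: n_samples / (n_classes * count of that class)."""
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return {c: len(labels) / (len(counts) * n) for c, n in counts.items()}


# Toy imbalanced node: nine "normal" (0) samples, one "attack" (1) sample.
labels = [0] * 9 + [1]
plain = weighted_gini(labels, {0: 1.0, 1: 1.0})
balanced = weighted_gini(labels, balanced_weights(labels))
print(plain, balanced)
```

With unit weights the node looks almost pure; with balanced weights the minority class contributes as much as the majority, so a split that isolates it is rewarded more strongly.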

41 citations


Journal ArticleDOI
TL;DR: In this paper, the compressive strength of fly ash-based geopolymer concrete is estimated using decision tree, bagging regressor, and AdaBoost regressor with an R2 value of 0.97.

39 citations


Journal ArticleDOI
01 Mar 2022-Polymers
TL;DR: It was discovered that ensembled machine learning techniques outperformed individual machine learning techniques in forecasting the compressive strength of geopolymer composites; however, the outcomes of the individual machine learning model were also within the acceptable limit.
Abstract: Geopolymers may be the best alternative to ordinary Portland cement because they are manufactured using waste materials enriched in aluminosilicate. Research on geopolymer composites is accelerating. However, considerable work, expense, and time are needed to cast, cure, and test specimens. The application of computational methods to the stated objective is critical for speedy and cost-effective research. In this study, supervised machine learning approaches were employed to predict the compressive strength of geopolymer composites. One individual machine learning approach, decision tree, and two ensembled machine learning approaches, AdaBoost and random forest, were used. The coefficient of determination (R2), statistical tests, and k-fold analysis were used to validate and compare all models. It was discovered that ensembled machine learning techniques outperformed individual machine learning techniques in forecasting the compressive strength of geopolymer composites. However, the outcomes of the individual machine learning model were also within the acceptable limit. R2 values of 0.90, 0.90, and 0.83 were obtained for AdaBoost, random forest, and decision tree models, respectively. The models’ decreased error values, such as mean absolute error, mean absolute percentage error, and root-mean-square errors, further confirmed the ensembled machine learning techniques’ increased precision. Machine learning approaches will aid the building industry by providing quick and cost-effective methods for evaluating material properties.
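The error measures named in the abstract above (R2, mean absolute error, mean absolute percentage error, root-mean-square error) follow from their standard definitions. The sketch below is plain Python; the function name and the strength values are made up for illustration and are not taken from the paper.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R2, MAE, MAPE (in percent) and RMSE for paired values."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "mae": sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n,
        # MAPE assumes no true value is zero.
        "mape": 100.0 * sum(abs((t - p) / t)
                            for t, p in zip(y_true, y_pred)) / n,
        "rmse": math.sqrt(ss_res / n),
    }


# Hypothetical compressive strengths in MPa (illustrative values only).
y_true = [30.0, 40.0, 50.0, 60.0]
y_pred = [32.0, 38.0, 51.0, 59.0]
metrics = regression_metrics(y_true, y_pred)
print(metrics)
```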

35 citations


Journal ArticleDOI
TL;DR: In this article, a black-box interpretation approach was employed to elucidate the predictions of tree-based and LKRR algorithms for compressive strength prediction of concrete, and the comparison revealed that tree-based algorithms and LKRR provided acceptable accuracy.

35 citations


Journal ArticleDOI
TL;DR: In this paper, the authors investigated the application of tree-based models, including decision tree (DT), random forest (RF), and AdaBoost, in slope stability classification under seismic loading conditions.
Abstract: Slope stability analysis allows engineers to pinpoint risky areas, study trigger mechanisms for slope failures, and design slopes with optimal safety and reliability. Before the widespread usage of computers, slope stability analysis was conducted through semi analytical methods, or stability charts. Presently, engineers have developed many computational tools to perform slope stability analysis more efficiently. The challenge associated with furthering slope stability methods is to create a reliable design solution to perform reliable estimations involving a number of geometric and mechanical variables. The objective of this study was to investigate the application of tree-based models, including decision tree (DT), random forest (RF), and AdaBoost, in slope stability classification under seismic loading conditions. The input variables used in the modelling were slope height, slope inclination, cohesion, friction angle, and peak ground acceleration to classify safe slopes and unsafe slopes. The training data for the developed computational intelligence models resulted from a series of slope stability analyses performed using a standard geotechnical engineering software commonly used in geotechnical engineering practice. Upon construction of the tree-based models, the model assessment was performed through the use and calculation of accuracy, F1-score, recall, and precision indices. All tree-based models could efficiently classify the slope stability status, with the AdaBoost model providing the highest performance for the classification of slope stability for both model development and model assessment parts. The proposed AdaBoost model can be used as a screening tool during the stage of feasibility studies of related infrastructure projects, to classify slopes according to their expected status of stability under seismic loading conditions.

34 citations


Journal ArticleDOI
TL;DR: In this article, a new model for predicting pile bearing capacity is developed using an extreme gradient boosting (XGBoost) algorithm; a total of 200 driven-pile static load test-based case histories were used to construct and verify the model.
Abstract: The major criteria that control pile foundation design is pile bearing capacity (Pu). The load bearing capacity of piles is affected by the various characteristics of soils and the involvement of multiple parameters related to both soil and foundation. In this study, a new model for predicting bearing capacity is developed using an extreme gradient boosting (XGBoost) algorithm. A total of 200 driven piles static load test-based case histories were used to construct and verify the model. The developed XGBoost model results were compared to a number of commonly used algorithms—Adaptive Boosting (AdaBoost), Random Forest (RF), Decision Tree (DT) and Support Vector Machine (SVM) using various performance measure metrics such as coefficient of determination, mean absolute error, root mean square error, mean absolute relative error, Nash–Sutcliffe model efficiency coefficient and relative strength ratio. Furthermore, sensitivity analysis was performed to determine the effect of input parameters on Pu. The results show that all of the developed models were capable of making accurate predictions however the XGBoost algorithm surpasses others, followed by AdaBoost, RF, DT, and SVM. The sensitivity analysis result shows that the SPT blow count along the pile shaft has the greatest effect on the Pu.

Journal ArticleDOI
TL;DR: In this article, compressive strength and tensile strength tests were conducted on high performance concrete (HPC) with fly ash and silica fume separately and together, and with polypropylene fiber in triple-blending.

Journal ArticleDOI
TL;DR: In this article, a model for medium-term forecasting of load graphs for electric power systems (EPS) with specific properties, based on ensemble machine learning methods, is proposed.

Journal ArticleDOI
TL;DR: A novel automated classification algorithm by fusing a number of deep learning approaches has been proposed to detect prostate cancer from ultrasound (US) and MRI images and explains why a specific decision is made given the input US or MRI image.

Journal ArticleDOI
TL;DR: This study proposed a stacking ensemble classifier for the real-time prediction of the rock mass classification using tunnel boring machine (TBM) operation data, which showed a more powerful learning and generalisation ability for small and imbalanced samples.
Abstract: Real-time prediction of the rock mass class in front of the tunnel face is essential for the adaptive adjustment of tunnel boring machines (TBMs). During the TBM tunnelling process, a large number of operation data are generated, reflecting the interaction between the TBM system and surrounding rock, and these data can be used to evaluate the rock mass quality. This study proposed a stacking ensemble classifier for the real-time prediction of the rock mass classification using TBM operation data. Based on the Songhua River water conveyance project, a total of 7538 TBM tunnelling cycles and the corresponding rock mass classes are obtained after data preprocessing. Then, through the tree-based feature selection method, 10 key TBM operation parameters are selected, and the mean values of the 10 selected features in the stable phase after removing outliers are calculated as the inputs of classifiers. The preprocessed data are randomly divided into the training set (90%) and test set (10%) using simple random sampling. Besides the stacking ensemble classifier, seven individual classifiers are established for comparison. These classifiers include support vector machine (SVM), k-nearest neighbors (KNN), random forest (RF), gradient boosting decision tree (GBDT), decision tree (DT), logistic regression (LR) and multi-layer perceptron (MLP), where the hyper-parameters of each classifier are optimised using the grid search method. The prediction results show that the stacking ensemble classifier has a better performance than individual classifiers, and it shows a more powerful learning and generalisation ability for small and imbalanced samples. Additionally, a relatively balanced training set is obtained by the synthetic minority oversampling technique (SMOTE), and the influence of sample imbalance on the prediction performance is discussed.

Journal ArticleDOI
TL;DR: In this article, a multi-level ensemble machine learning (ML) model was used to determine the critical shear stress (CSS) of gravel particles in a cohesive mixture of clay-silt-gravel.
Abstract: Exploration of incipient motion study is significantly important for the river hydraulics community. The present study, along with experimental investigation, considered a new multi-level ensemble machine learning (ML) to determine critical shear stress (CSS) of gravel particles in a cohesive mixture of clay-silt-gravel, clay-silt-sand-gravel, and clay-sand-gravel. The multi-level ensemble ML included a voting-based ensemble meta-estimator integrated with three modern standalone ensemble techniques, namely extreme gradient boosting (XGBoost), Adaptive boosting (Adaboost), and Random Forest (RF), and performance is compared with three standalone ensemble models for prediction of CSS values. Besides, the optimum input combinations were explored using the forward stepwise selection method, as a correlation-based feature selection, and mutual information theory. The outcomes of simulation indicated that the multi-level ensemble machine learning (voting) model in terms of correlation coefficient (R = 0.9641), and root mean square error (RMSE = 0.2022) was superior to the standalone ensemble techniques, i.e., XGBoost (R = 0.9482, and RMSE = 0.2375), Adaboost (R = 0.9496, and RMSE = 0.2387), and RF (R = 0.9392, and RMSE = 0.2739) for accurate estimation of CSS.

Journal ArticleDOI
TL;DR: In this paper, the potential ability of various modern and powerful machine learning methods such as Categorical Boosting (CatBoost), Light Gradient Boosting Machine (LightGBM), XGBoost, AdaBoost, GBDT, ET, DT, and Random Forest (RF) was investigated to estimate tetracycline (TC) photodegradation from wastewater by 10 different metal-organic frameworks (MOFs).

Journal ArticleDOI
TL;DR: This paper proposes an efficient approach to detect credit card fraud using a neural network ensemble classifier and a hybrid data resampling method; the ensemble classifier is obtained using a long short-term memory (LSTM) neural network as the base learner in the adaptive boosting technique.
Abstract: Recent advancements in electronic commerce and communication systems have significantly increased the use of credit cards for both online and regular transactions. However, there has been a steady rise in fraudulent credit card transactions, costing financial companies huge losses every year. The development of effective fraud detection algorithms is vital in minimizing these losses, but it is challenging because most credit card datasets are highly imbalanced. Also, using conventional machine learning algorithms for credit card fraud detection is inefficient due to their design, which involves a static mapping of the input vector to output vectors. Therefore, they cannot adapt to the dynamic shopping behavior of credit card clients. This paper proposes an efficient approach to detect credit card fraud using a neural network ensemble classifier and a hybrid data resampling method. The ensemble classifier is obtained using a long short-term memory (LSTM) neural network as the base learner in the adaptive boosting (AdaBoost) technique. Meanwhile, the hybrid resampling is achieved using the synthetic minority oversampling technique and edited nearest neighbor (SMOTE-ENN) method. The effectiveness of the proposed method is demonstrated using publicly available real-world credit card transaction datasets. The performance of the proposed approach is benchmarked against the following algorithms: support vector machine (SVM), multilayer perceptron (MLP), decision tree, traditional AdaBoost, and LSTM. The experimental results show that the classifiers performed better when trained with the resampled data, and the proposed LSTM ensemble outperformed the other algorithms by obtaining a sensitivity and specificity of 0.996 and 0.998, respectively.

Journal ArticleDOI
Ted Fleming
TL;DR: In this article, a novel automated classification algorithm that fuses a number of deep learning approaches is proposed to detect prostate cancer from ultrasound (US) and MRI images; the proposed method explains why a specific decision is made given the input US or MRI image.

Journal ArticleDOI
TL;DR: The experimental results demonstrate that the AdaBoost-Random Forest classifier provides 95.47% accuracy in the early detection of heart disease.
Abstract: As a result of technology improvements, various features have been collected for heart disease diagnosis. Large data sets have several drawbacks, including limited storage capacity and long access and processing times. For medical therapy, early diagnosis of heart problems is crucial. Disease of heart is a devastating human disease that is quickly increasing in developed and also developing countries, resulting in death. In this type of disease, the heart normally fails to provide enough blood to different body parts in order to allow them to perform their regular functions. Early, as well as, proper diagnosis of this condition is very critical for averting further damage and also to save patients' lives. In this work, machine learning (ML) is utilized to find out whether a person has cardiac disease or not. Both the types of ensemble classifiers, namely, homogeneous as well as heterogeneous classifiers (formed by combining two separate classifiers), have been implemented in this work. The data mining preprocessing using Synthetic Minority Oversampling Technique (SMOTE) has been employed to cope with the imbalance problem of the class as well as noise. The proposed work has two steps. SMOTE is used in the initial phase to reduce the impact of data imbalance and the second phase is classifying data using Naive Bayes (NB), decision tree (DT) algorithms, and their ensembles. The experimental results demonstrate that the AdaBoost-Random Forest classifier provides 95.47% accuracy in the early detection of heart disease.
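The SMOTE step mentioned in the abstract above creates synthetic minority samples by interpolating between a minority point and one of its nearest minority neighbours. The sketch below shows that core idea only, in plain Python; the function names, the choice of k, and the toy points are assumptions for illustration, and a production pipeline would normally rely on an established library implementation instead.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two feature tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def smote_like_samples(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by neighbour interpolation."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the chosen base point.
        neighbours = sorted((p for p in minority if p is not base),
                            key=lambda p: sq_dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(b + gap * (n - b)
                               for b, n in zip(base, neighbour)))
    return synthetic


# Toy minority-class points (illustrative values only).
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote_like_samples(minority, n_new=5)
print(new_points)
```

Because every synthetic point lies on a segment between two real minority samples, oversampling should be applied to the training split only, so that no interpolated copy of a test sample leaks into training.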

Journal ArticleDOI
TL;DR: The proposed approach to effectively detect CKD by combining the information-gain-based feature selection technique and a cost-sensitive adaptive boosting (AdaBoost) classifier has produced an effective predictive model for CKD diagnosis and could be applied to more imbalanced medical datasets for effective disease detection.
Abstract: The high prevalence of chronic kidney disease (CKD) is a significant public health concern globally. The condition has a high mortality rate, especially in developing countries. CKD often go undetected since there are no obvious early-stage symptoms. Meanwhile, early detection and on-time clinical intervention are necessary to reduce the disease progression. Machine learning (ML) models can provide an efficient and cost-effective computer-aided diagnosis to assist clinicians in achieving early CKD detection. This research proposed an approach to effectively detect CKD by combining the information-gain-based feature selection technique and a cost-sensitive adaptive boosting (AdaBoost) classifier. An approach like this could save CKD screening time and cost since only a few clinical test attributes would be needed for the diagnosis. The proposed approach was benchmarked against recently proposed CKD prediction methods and well-known classifiers. Among these classifiers, the proposed cost-sensitive AdaBoost trained with the reduced feature set achieved the best classification performance with an accuracy, sensitivity, and specificity of 99.8%, 100%, and 99.8%, respectively. Additionally, the experimental results show that the feature selection positively impacted the performance of the various classifiers. The proposed approach has produced an effective predictive model for CKD diagnosis and could be applied to more imbalanced medical datasets for effective disease detection.
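For readers unfamiliar with boosting, the sketch below shows one round of the standard discrete AdaBoost weight update in plain Python. The abstract does not specify how its cost-sensitive variant modifies the algorithm, so the cost handling here (initial sample weights proportional to an assumed misclassification cost) is only one illustrative option and not the authors' method; all names and values are hypothetical.

```python
import math


def adaboost_round(weights, y_true, y_pred):
    """One discrete AdaBoost update; labels and predictions are -1 or +1."""
    # Weighted error of the weak learner (weights are assumed to sum to 1).
    err = sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p)
    if not 0.0 < err < 1.0:
        raise ValueError("weighted error must lie strictly between 0 and 1")
    alpha = 0.5 * math.log((1.0 - err) / err)  # vote of this weak learner
    # Misclassified samples (t * p == -1) gain weight, correct ones lose it.
    updated = [w * math.exp(-alpha * t * p)
               for w, t, p in zip(weights, y_true, y_pred)]
    total = sum(updated)
    return alpha, [w / total for w in updated]


# Toy round: +1 marks the minority (disease) class, invented for illustration.
y_true = [1, 1, -1, -1, -1]
y_pred = [1, -1, -1, -1, -1]
# Assumed costs: missing a positive case is twice as costly as a false alarm.
costs = [2.0 if t == 1 else 1.0 for t in y_true]
weights = [c / sum(costs) for c in costs]
alpha, new_weights = adaboost_round(weights, y_true, y_pred)
print(alpha, new_weights)
```

After the update the misclassified samples hold exactly half of the total weight, which is what forces the next weak learner to focus on them.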

Proceedings ArticleDOI
06 Jun 2022
TL;DR: New ensemble learning models were developed on a survey dataset of 309 people with or without lung cancer, oversampled with the SMOTE method; the ensemble techniques used are XGBoost, LightGBM, Bagging, and AdaBoost, evaluated by 10-fold cross-validation.
Abstract: Lung cancers are malignant lung tumors resulting from uncontrolled growth of lung cells that metastasizes to other parts of the body and can cause death. Although lung cancer cannot be prevented, the risk of cancer development can be lowered. Early detection of lung cancer is essential for patient survival, and machine learning-based prediction models have potential use in predicting lung cancer. Ensemble techniques are compelling and powerful techniques in Machine Learning to improve the prediction accuracy as classifiers. This paper reviewed some research articles on lung cancer prediction models that used machine learning and ensemble learning techniques. Furthermore, we added our newly developed ensemble learning techniques to this paper which was developed based on a survey dataset of 309 people with or without lung cancer by oversampling SMOTE method. The ensemble techniques we used are XGBoost, LightGBM, Bagging, and AdaBoost by k-fold 10 cross-validation method and the attributes our lung cancer prediction models used are age, smoking, yellow fingers, anxiety, peer pressure, chronic disease, fatigue, allergy, wheezing, alcohol, coughing, shortness of breath, swallowing difficulty, and chest pain. Results: According to our analysis, the XGBoost technique performed better than other ensemble techniques and achieved an accuracy of 94.42 %, precision of 95.66%, recall of 94.46%, and AUC of 98.14%, respectively.
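The 10-fold cross-validation protocol mentioned above can be sketched as an index split in plain Python. The sample count of 309 comes from the abstract; the function name, the shuffling seed, and the absence of stratification are assumptions made for illustration.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for m, fold in enumerate(folds) if m != i
                     for j in fold]
        yield train_idx, test_idx


splits = list(k_fold_indices(309, k=10))
fold_sizes = sorted(len(test) for _, test in splits)
print(fold_sizes)
```

Each sample appears in exactly one test fold, so the averaged score uses every record once for evaluation.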

Journal ArticleDOI
TL;DR: This study investigated ensemble trees, i.e., random forest (RF), extremely randomized tree (ET), adaptive boosting machine (AdaBoost), gradient boosting machine, extreme gradient boosting machine (XGBoost), light gradient boosting machine, and category gradient boosting machine, for predicting strong rockburst.
Abstract: Rockburst is a severe geological hazard that restricts deep mine operations and tunnel constructions. To overcome the shortcomings of widely used algorithms in rockburst prediction, this study investigates the ensemble trees, i.e., random forest (RF), extremely randomized tree (ET), adaptive boosting machine (AdaBoost), gradient boosting machine, extreme gradient boosting machine (XGBoost), light gradient boosting machine, and category gradient boosting machine, for rockburst estimation based on 314 real rockburst cases. Additionally, Bayesian optimization is utilized to optimize these ensemble trees. To improve performance, three combination strategies, voting, bagging, and stacking, are adopted to combine multiple models according to training accuracy. ET and XGBoost receive the best capabilities (85.71% testing accuracy) in single models, and except for AdaBoost, six ensemble trees have high accuracy and can effectively foretell strong rockburst to prevent large-scale underground disasters. The combination models generated by voting, bagging, and stacking perform better than single models, and the voting 2 model that combines XGBoost, ET, and RF with simple soft voting, is the most outstanding (88.89% testing accuracy). The performed sensitivity analysis confirms that the voting 2 model has better robustness than single models and has remarkable adaptation and superiority when input parameters vary or miss, and it has more power to deal with complex and variable engineering environments. Eventually, the rockburst cases in Sanshandao Gold Mine, China, were investigated, and these data verify the practicability of voting 2 in field rockburst prediction.
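Simple soft voting, as used by the "voting 2" model described above, averages the class probabilities predicted by each member model and picks the class with the highest mean. The sketch below is plain Python with invented probabilities; the model names in the comments and the three-class setup are assumptions for illustration, not outputs from the study.

```python
def soft_vote(probas_per_model):
    """Average per-class probabilities across models, then take argmax."""
    n_models = len(probas_per_model)
    n_samples = len(probas_per_model[0])
    labels = []
    for i in range(n_samples):
        rows = [model[i] for model in probas_per_model]
        mean = [sum(col) / n_models for col in zip(*rows)]
        labels.append(max(range(len(mean)), key=mean.__getitem__))
    return labels


# Hypothetical class probabilities for two samples and three classes.
model_a = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]]  # e.g. an XGBoost member
model_b = [[0.4, 0.4, 0.2], [0.1, 0.3, 0.6]]  # e.g. an ET member
model_c = [[0.5, 0.2, 0.3], [0.2, 0.2, 0.6]]  # e.g. an RF member
voted = soft_vote([model_a, model_b, model_c])
print(voted)
```

Unlike hard (majority) voting, soft voting lets a confident model outweigh two lukewarm ones, which is why it needs calibrated probabilities to work well.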

Journal ArticleDOI
TL;DR: In this article, a large database containing 2197 data points from various literature was compiled, and six machine learning algorithms, namely multivariate linear regression (MLR), Gaussian process regression (GPR), support vector machine (SVM), decision tree (DT), random forest (RF), and adaptive boosting (AdaBoost), were implemented to predict the thermal conductivity of soils based on the compiled database.

Journal ArticleDOI
TL;DR: In this article, the authors proposed an offensive text classification algorithm named LSTM-BOOST, employing a Long Short-Term Memory (LSTM) model with ensemble learning to recognize offensive Bengali texts on various social media platforms.
Abstract: Recently, offensive content has become increasingly popular for harassing and criticizing people on numerous social media platforms. This paper proposes an offensive text classification algorithm named LSTM-BOOST, employing a Long Short-Term Memory (LSTM) model with ensemble learning to recognize offensive Bengali texts on various social media platforms. The proposed LSTM-BOOST model uses the modified AdaBoost algorithm employing principal component analysis (PCA) along with LSTM networks. In the LSTM-BOOST model, the dataset is divided into three categories, and PCA and LSTM networks are applied to each part of the dataset to obtain the most significant variance and reduce the weighted error of the weak hypothesis of the model. Furthermore, different classifiers are used for baseline experiments, and the model is evaluated on various word embedding vector methods. Our investigation found that the LSTM-BOOST algorithm outperforms most of the baseline architectures, achieving a leading F1-score of 92.61% on the Bengali offensive text from Social Platforms (BHSSP) dataset.

Journal ArticleDOI
TL;DR: In this paper, the authors developed and tested machine learning-based models for COVID-19 severity prediction, which achieved 100% accuracy, specificity, sensitivity, and ROC curve in conducting a prognostic prediction using different machine learning classifiers.
Abstract: The purpose of this study is to develop and test machine learning-based models for COVID-19 severity prediction. COVID-19 test samples from 337 COVID-19 positive patients at Cheikh Zaid Hospital were grouped according to the severity of their illness. Ours is the first study to estimate illness severity by combining biological and non-biological data from patients with COVID-19. Moreover, the use of ML for therapeutic purposes in Morocco is currently restricted, and ours is the first study to investigate the severity of COVID-19. When data analysis approaches were used to uncover patterns and essential characteristics in the data, C-reactive protein, platelets, and D-dimers were determined to be the most associated with COVID-19 severity prediction. In this research, many data reduction algorithms were used, and Machine Learning models were trained to predict the severity of sickness using patient data. A new feature engineering method based on topological data analysis, called Uniform Manifold Approximation and Projection (UMAP), was shown to achieve better results. It has 100% accuracy, specificity, sensitivity, and ROC curve in conducting a prognostic prediction using different machine learning classifiers such as X_GBoost, AdaBoost, Random Forest, and ExtraTrees. The proposed approach aims to assist hospitals and medical facilities in determining who should be seen first and who has a higher priority for admission to the hospital.

Journal ArticleDOI
TL;DR: Analysis of widely used machine learning classifiers using a real-life RTA dataset from Gauteng, South Africa shows that the RF classifier, combined with multiple imputations by chained equations, yielded the best performance when compared with the other combinations.
Abstract: Road traffic accidents (RTAs) are a major cause of injuries and fatalities worldwide. In recent years, there has been a growing global interest in analysing RTAs, specifically concerned with analysing and modelling accident data to better understand and assess the causes and effects of accidents. This study analysed the performance of widely used machine learning classifiers using a real-life RTA dataset from Gauteng, South Africa. The study aimed to assess prediction model designs for RTAs to assist transport authorities and policymakers. It considered classifiers such as naïve Bayes, logistic regression, k-nearest neighbour, AdaBoost, support vector machine, random forest, and five missing data methods. These classifiers were evaluated using five evaluation metrics: accuracy, root-mean-square error, precision, recall, and receiver operating characteristic curves. Furthermore, the assessment involved parameter adjustment and incorporated dimensionality reduction techniques. The empirical results and analyses show that the RF classifier, combined with multiple imputations by chained equations, yielded the best performance when compared with the other combinations.

Journal ArticleDOI
TL;DR: In this article, the authors proposed an improved learning model to predict the severity of patients by exploiting a combination of machine learning techniques, using an adaptive boosting algorithm with a decision tree estimator and a new parameter tuning process.

Journal ArticleDOI
Sha Lin, Bei Han, Yanyan Li, Chao Han, Wei Li 
TL;DR: The analysis of engineering examples shows that the ensemble learning algorithm can deal with geotechnical material variables well and give accurate and reliable prediction results, which has good applicability for slope stability evaluation.