Showing papers on "Random forest" published in 2021

Journal Article•DOI•

A comparative analysis of gradient boosting algorithms

Candice Bentéjac¹, Anna Csörgő², Gonzalo Martínez-Muñoz³•Institutions (3)

University of Bordeaux¹, Pázmány Péter Catholic University², Autonomous University of Madrid³

01 Mar 2021-Artificial Intelligence Review

TL;DR: A comprehensive comparison between XGBoost, LightGBM, CatBoost, random forests and gradient boosting has been performed and indicates that CatBoost obtains the best results in generalization accuracy and AUC in the studied datasets although the differences are small.

Abstract: The family of gradient boosting algorithms has been recently extended with several interesting proposals (i.e. XGBoost, LightGBM and CatBoost) that focus on both speed and accuracy. XGBoost is a scalable ensemble technique that has demonstrated to be a reliable and efficient machine learning challenge solver. LightGBM is an accurate model focused on providing extremely fast training performance using selective sampling of high gradient instances. CatBoost modifies the computation of gradients to avoid the prediction shift in order to improve the accuracy of the model. This work proposes a practical analysis of how these novel variants of gradient boosting work in terms of training speed, generalization performance and hyper-parameter setup. In addition, a comprehensive comparison between XGBoost, LightGBM, CatBoost, random forests and gradient boosting has been performed using carefully tuned models as well as using their default settings. The results of this comparison indicate that CatBoost obtains the best results in generalization accuracy and AUC in the studied datasets although the differences are small. LightGBM is the fastest of all methods but not the most accurate. Finally, XGBoost places second both in accuracy and in training speed. Finally an extensive analysis of the effect of hyper-parameter tuning in XGBoost, LightGBM and CatBoost is carried out using two novel proposed tools.

375 citations

Journal Article•DOI•

An ensemble machine learning approach through effective feature extraction to classify fake news

Saqib Hakak¹, Mamoun Alazab², Suleman Khan³, Thippa Reddy Gadekallu⁴, Praveen Kumar Reddy Maddikunta⁴, Wazir Zada Khan⁵•Institutions (5)

University of New Brunswick¹, Charles Darwin University², Air University (Islamabad)³, VIT University⁴, Jazan University⁵

01 Apr 2021-Future Generation Computer Systems

TL;DR: This article proposes an ensemble classification model for fake news detection that achieves better accuracy than the state of the art.

186 citations

Journal Article•DOI•

Large group activity security risk assessment and risk early warning based on random forest algorithm

Yanyu Chen¹, Wenzhe Zheng¹, Wenbo Li¹, Yimiao Huang²•Institutions (2)

Zhejiang Normal University¹, Changchun University²

01 Apr 2021-Pattern Recognition Letters

TL;DR: Using the random forest algorithm's variable-importance calculation and a formula for weighting the security risk index, the study concludes that the random forest algorithm has good predictive ability in the risk assessment of large-scale group activities.

172 citations

Journal Article•DOI•

Efficient Prediction of Cardiovascular Disease Using Machine Learning Algorithms With Relief and LASSO Feature Selection Techniques

Pronab Ghosh¹, Sami Azam², Mirjam Jonkman², Asif Karim², F. M. Javed Mehedi Shamrat, Eva Ignatious², Shahana Shultana¹, Abhijith Reddy Beeravolu², Friso De Boer²•Institutions (2)

Daffodil International University¹, Charles Darwin University²

22 Jan 2021-IEEE Access

TL;DR: In this article, the authors proposed a model that incorporates different methods to achieve effective prediction of heart disease, which used efficient Data Collection, Data Pre-processing and Data Transformation methods to create accurate information for the training model.

Abstract: Cardiovascular diseases (CVD) are among the most common serious illnesses affecting human health. CVDs may be prevented or mitigated by early diagnosis, and this may reduce mortality rates. Identifying risk factors using machine learning models is a promising approach. We would like to propose a model that incorporates different methods to achieve effective prediction of heart disease. For our proposed model to be successful, we have used efficient Data Collection, Data Pre-processing and Data Transformation methods to create accurate information for the training model. We have used a combined dataset (Cleveland, Long Beach VA, Switzerland, Hungarian and Stat log). Suitable features are selected by using the Relief, and Least Absolute Shrinkage and Selection Operator (LASSO) techniques. New hybrid classifiers like Decision Tree Bagging Method (DTBM), Random Forest Bagging Method (RFBM), K-Nearest Neighbors Bagging Method (KNNBM), AdaBoost Boosting Method (ABBM), and Gradient Boosting Boosting Method (GBBM) are developed by integrating the traditional classifiers with bagging and boosting methods, which are used in the training process. We have also instrumented some machine learning algorithms to calculate the Accuracy (ACC), Sensitivity (SEN), Error Rate, Precision (PRE) and F1 Score (F1) of our model, along with the Negative Predictive Value (NPR), False Positive Rate (FPR), and False Negative Rate (FNR). The results are shown separately to provide comparisons. Based on the result analysis, we can conclude that our proposed model produced the highest accuracy while using RFBM and Relief feature selection methods (99.05%).

169 citations

Journal Article•DOI•

Improving the Prediction of Heart Failure Patients’ Survival Using SMOTE and Effective Data Mining Techniques

Abid Ishaq¹, Saima Sadiq¹, Muhammad Umer¹, Saleem Ullah¹, Seyedali Mirjalili, Vaibhav Rupapara², Michele Nappi³•Institutions (3)

University of Engineering and Technology, Lahore¹, Florida International University², University of Salerno³

04 Mar 2021-IEEE Access

TL;DR: In this paper, the authors analyzed the heart failure survivors from the dataset of 299 patients admitted in hospital and found significant features and effective data mining techniques that can boost the accuracy of cardiovascular patient's survivor prediction.

Abstract: Cardiovascular disease is a substantial cause of mortality and morbidity in the world. In clinical data analytics, it is a great challenge to predict heart disease survivor. Data mining transforms huge amounts of raw data generated by the health industry into useful information that can help in making informed decisions. Various studies proved that significant features play a key role in improving performance of machine learning models. This study analyzes the heart failure survivors from the dataset of 299 patients admitted in hospital. The aim is to find significant features and effective data mining techniques that can boost the accuracy of cardiovascular patient’s survivor prediction. To predict patient’s survival, this study employs nine classification models: Decision Tree (DT), Adaptive boosting classifier (AdaBoost), Logistic Regression (LR), Stochastic Gradient classifier (SGD), Random Forest (RF), Gradient Boosting classifier (GBM), Extra Tree Classifier (ETC), Gaussian Naive Bayes classifier (G-NB) and Support Vector Machine (SVM). The imbalance class problem is handled by Synthetic Minority Oversampling Technique (SMOTE). Furthermore, machine learning models are trained on the highest ranked features selected by RF. The results are compared with those provided by machine learning algorithms using full set of features. Experimental results demonstrate that ETC outperforms other models and achieves 0.9262 accuracy value with SMOTE in prediction of heart patient’s survival.

162 citations

Journal Article•DOI•

Predictive modeling for sustainable high-performance concrete from industrial wastes: A comparison and optimization of models using ensemble learners

Furqan Farooq¹, Wisal Ahmed², Arslan Akbar², Fahid Aslam³, Rayed Alyousef³•Institutions (3)

COMSATS Institute of Information Technology¹, City University of Hong Kong², Salman bin Abdulaziz University³

10 Apr 2021-Journal of Cleaner Production

TL;DR: This study uses machine intelligence algorithms with individual learners and ensemble learners (bagging, boosting) to predict the strength of high-performance concrete (HPC) prepared with waste materials, and suggests that the individual model response is enhanced by using the bagging and boosting learners.

152 citations

Journal Article•DOI•

An ensemble approach for classification and prediction of diabetes mellitus using soft voting classifier

Saloni Kumari¹, Deepika Kumar¹, Mamta Mittal•Institutions (1)

Bharati Vidyapeeth's College of Engineering¹

01 Jun 2021

TL;DR: The proposed ensemble soft voting classifier gives binary classification and uses the ensemble of three machine learning algorithms viz. random forest, logistic regression, and Naive Bayes for the classification.

Abstract: Diabetes is a dreadful disease identified by escalated levels of glucose in the blood. Machine learning algorithms help in identification and prediction of diabetes at an early stage. The main objective of this study is to predict diabetes mellitus with better accuracy using an ensemble of machine learning algorithms. The Pima Indians Diabetes dataset has been considered for experimentation, which gathers details of patients with and without having diabetes. The proposed ensemble soft voting classifier gives binary classification and uses the ensemble of three machine learning algorithms viz. random forest, logistic regression, and Naive Bayes for the classification. Empirical evaluation of the proposed methodology has been conducted with state-of-the-art methodologies and base classifiers such as AdaBoost, Logistic Regression, Support Vector Machine, Random Forest, Naive Bayes, Bagging, GradientBoost, XGBoost, CatBoost by taking accuracy, precision, recall, F1-score as the evaluation criteria. The proposed ensemble approach gives the highest accuracy, precision, recall, and F1_score value with 79.04%, 73.48%, 71.45% and 80.6% respectively on the PIMA diabetes dataset. Further, the efficiency of the proposed methodology has also been compared and analysed with breast cancer dataset. The proposed ensemble soft voting classifier has given 97.02% accuracy on the breast cancer dataset.
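
Editor's illustration (not from the paper): soft voting, as the term is conventionally used, averages the class probabilities produced by the base classifiers and predicts the class with the highest average. The sketch below assumes that conventional definition with equal weights; the function name `soft_vote` and the probability values are hypothetical and do not come from the study.

```python
def soft_vote(probabilities):
    """Average per-class probabilities across classifiers, then pick the argmax.

    probabilities: list of per-classifier lists, one probability per class.
    Returns (predicted_class_index, averaged_probabilities).
    """
    n_classifiers = len(probabilities)
    n_classes = len(probabilities[0])
    averaged = [
        sum(p[c] for p in probabilities) / n_classifiers
        for c in range(n_classes)
    ]
    predicted = max(range(n_classes), key=lambda c: averaged[c])
    return predicted, averaged


# Hypothetical outputs [P(no diabetes), P(diabetes)] for a single patient.
rf_proba = [0.40, 0.60]  # random forest (made-up values)
lr_proba = [0.55, 0.45]  # logistic regression (made-up values)
nb_proba = [0.30, 0.70]  # Naive Bayes (made-up values)

label, avg = soft_vote([rf_proba, lr_proba, nb_proba])
print(label, avg)
```

With these made-up inputs the averaged probability of the positive class is 1.75/3, so class 1 is selected. A weighted variant would multiply each classifier's probabilities by a weight before averaging.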

141 citations

Posted Content•DOI•

A survey on missing data in machine learning.

Tlamelo Emmanuel¹, Thabiso M. Maupong¹, Dimane Mpoeleng¹, Thabo Semong¹, Banyatsang Mphago¹, Oteng Tabona¹•Institutions (1)

Botswana International University of Science and Technology¹

17 Jun 2021-Journal of Big Data

TL;DR: This paper aggregates some of the literature on missing data particularly focusing on machine learning techniques, and gives insight on how the machine learning approaches work by highlighting the key features of the proposed techniques, how they perform, their limitations and the kind of data they are most suitable for.

Abstract: Machine learning has been the corner stone in analysing and extracting information from data and often a problem of missing values is encountered. Missing values occur because of various factors like missing completely at random, missing at random or missing not at random. All these may result from system malfunction during data collection or human error during data pre-processing. Nevertheless, it is important to deal with missing values before analysing data since ignoring or omitting missing values may result in biased or misinformed analysis. In literature there have been several proposals for handling missing values. In this paper, we aggregate some of the literature on missing data particularly focusing on machine learning techniques. We also give insight on how the machine learning approaches work by highlighting the key features of missing values imputation techniques, how they perform, their limitations and the kind of data they are most suitable for. We propose and evaluate two methods, the k nearest neighbor and an iterative imputation method (missForest) based on the random forest algorithm. Evaluation is performed on the Iris and novel power plant fan data with induced missing values at missingness rate of 5% to 20%. We show that both missForest and the k nearest neighbor can successfully handle missing values and offer some possible future research direction.

138 citations

Journal Article•DOI•

Geographical random forests: a spatial extension of the random forest algorithm to address spatial heterogeneity in remote sensing and population modelling

Stefanos Georganos¹, Taïs Grippa¹, Assane Niang Gadiaga², Catherine Linard², Moritz Lennert¹, Sabine Vanhuysse¹, Nicholus Mboga¹, Eléonore Wolff¹, Stamatis Kalogirou³•Institutions (3)

Université libre de Bruxelles¹, Université de Namur², Harokopio University³

20 Jan 2021-Geocarto International

TL;DR: From the first empirical results, it is concluded that GRF can be more predictive when an appropriate spatial scale is selected to model the data, with reduced residual autocorrelation and lower Root Mean Squared Error (RMSE) and Mean Absolute Error (MAE) values.

Abstract: Machine learning algorithms such as Random Forest (RF) are being increasingly applied on traditionally geographical topics such as population estimation. Even though RF is a well performing and gen...

130 citations

Journal Article•DOI•

Effect of Data Scaling Methods on Machine Learning Algorithms and Model Performance

Manjurul Ahsan, M. A. Parvez Mahmud, Pritom Kumar Saha, Kishor Datta Gupta, Zahed Siddique

01 Jan 2021

TL;DR: CART, along with RS or QT, outperforms all other ML algorithms with 100% accuracy, 100% precision, 99% recall, and 100% F1 score, and the study outcomes demonstrate that the model’s performance varies depending on the data scaling method.

Abstract: Heart disease, one of the main reasons behind the high mortality rate around the world, requires a sophisticated and expensive diagnosis process. In the recent past, much literature has demonstrated machine learning approaches as an opportunity to efficiently diagnose heart disease patients. However, challenges associated with datasets such as missing data, inconsistent data, and mixed data (containing inconsistent missing data both as numerical and categorical) are often obstacles in medical diagnosis. This inconsistency led to a higher probability of misprediction and a misled result. Data preprocessing steps like feature reduction, data conversion, and data scaling are employed to form a standard dataset—such measures play a crucial role in reducing inaccuracy in final prediction. This paper aims to evaluate eleven machine learning (ML) algorithms—Logistic Regression (LR), Linear Discriminant Analysis (LDA), K-Nearest Neighbors (KNN), Classification and Regression Trees (CART), Naive Bayes (NB), Support Vector Machine (SVM), XGBoost (XGB), Random Forest Classifier (RF), Gradient Boost (GB), AdaBoost (AB), Extra Tree Classifier (ET)—and six different data scaling methods—Normalization (NR), Standscale (SS), MinMax (MM), MaxAbs (MA), Robust Scaler (RS), and Quantile Transformer (QT) on a dataset comprising of information of patients with heart disease. The result shows that CART, along with RS or QT, outperforms all other ML algorithms with 100% accuracy, 100% precision, 99% recall, and 100% F1 score. The study outcomes demonstrate that the model’s performance varies depending on the data scaling method.
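
Editor's illustration (not from the paper): three of the scaling methods named in the abstract can be sketched in a few lines. The formulas below assume the conventional definitions (min-max rescaling to [0, 1], standardization by mean and population standard deviation, and robust scaling by median and interquartile range); the paper's exact settings are not given in the abstract, and the sample values are hypothetical.

```python
import statistics


def min_max_scale(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def standard_scale(values):
    """Subtract the mean and divide by the population standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


def robust_scale(values):
    """Subtract the median and divide by the interquartile range (IQR)."""
    median = statistics.median(values)
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return [(v - median) / (q3 - q1) for v in values]


# Hypothetical feature column with one extreme outlier.
feature = [1, 2, 3, 4, 100]

min_max = min_max_scale(feature)
standard = standard_scale(feature)
robust = robust_scale(feature)
print(min_max)
print(standard)
print(robust)
```

On this made-up column the outlier compresses the min-max output of the other four values into a narrow band near 0, while the robust scaler keeps them spread around 0 because the median and IQR are barely affected by the outlier. This is one plausible reason scaling choice can change model performance.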

128 citations

Journal Article•DOI•

Benchmarking of Machine Learning for Anomaly Based Intrusion Detection Systems in the CICIDS2017 Dataset

Ziadoon Kamil Maseer¹, Robiah Yusof¹, Nazrulazhar Bahaman¹, Salama A. Mostafa², Cik Feresa Mohd Foozy²•Institutions (2)

Universiti Teknikal Malaysia Melaka¹, Universiti Tun Hussein Onn Malaysia²

03 Feb 2021-IEEE Access

TL;DR: 10 popular supervised and unsupervised ML algorithms for identifying effective and efficient ML–AIDS of networks and computers are applied and the true positive and negative rates, accuracy, precision, recall, and F-Score of 31 ML-AIDS models are evaluated.

Abstract: An intrusion detection system (IDS) is an important protection instrument for detecting complex network attacks. Various machine learning (ML) or deep learning (DL) algorithms have been proposed for implementing anomaly-based IDS (AIDS). Our review of the AIDS literature identifies some issues in related work, including the randomness of the selected algorithms, parameters, and testing criteria, the application of old datasets, or shallow analyses and validation of the results. This paper comprehensively reviews previous studies on AIDS by using a set of criteria with different datasets and types of attacks to set benchmarking outcomes that can reveal the suitable AIDS algorithms, parameters, and testing criteria. Specifically, this paper applies 10 popular supervised and unsupervised ML algorithms for identifying effective and efficient ML–AIDS of networks and computers. These supervised ML algorithms include the artificial neural network (ANN), decision tree (DT), k-nearest neighbor (k-NN), naive Bayes (NB), random forest (RF), support vector machine (SVM), and convolutional neural network (CNN) algorithms, whereas the unsupervised ML algorithms include the expectation-maximization (EM), k-means, and self-organizing maps (SOM) algorithms. Several models of these algorithms are introduced, and the tuning and training parameters of each algorithm are examined to achieve an optimal classifier evaluation. Unlike previous studies, this study evaluates the performance of AIDS by measuring the true positive and negative rates, accuracy, precision, recall, and F-Score of 31 ML-AIDS models. The training and testing time for ML-AIDS models are also considered in measuring their performance efficiency given that time complexity is an important factor in AIDSs. The ML-AIDS models are tested by using a recent and highly unbalanced multiclass CICIDS2017 dataset that involves real-world network attacks. In general, the k-NN-AIDS, DT-AIDS, and NB-AIDS models obtain the best results and show a greater capability in detecting web attacks compared with other models that demonstrate irregular and inferior results.

Proceedings Article•DOI•

Heart Disease Prediction using Hybrid machine Learning Model

M. Kavitha¹, G. Gnaneswar¹, R. Dinesh¹, Y. Rohith Sai¹, R. Sai Suraj¹•Institutions (1)

K L University¹

20 Jan 2021

TL;DR: In this article, a hybrid model of decision tree and random forest was used to predict heart disease in the Cleveland heart disease dataset, achieving an accuracy of 88.7%.

Abstract: Heart disease causes a significant mortality rate around the world, and it has become a health threat for many people. Early prediction of heart disease may save many lives; detecting cardiovascular diseases like heart attacks, coronary artery diseases etc., is a critical challenge by the regular clinical data analysis. Machine learning (ML) can bring an effective solution for decision making and accurate predictions. The medical industry is showing enormous development in using machine learning techniques. In the proposed work, a novel machine learning approach is proposed to predict heart disease. The proposed study used the Cleveland heart disease dataset, and data mining techniques such as regression and classification are used. Machine learning techniques Random Forest and Decision Tree are applied. The novel technique of the machine learning model is designed. In implementation, 3 machine learning algorithms are used, they are 1. Random Forest, 2. Decision Tree and 3. Hybrid model (Hybrid of random forest and decision tree). Experimental results show an accuracy level of 88.7% through the heart disease prediction model with the hybrid model. The interface is designed to get the user's input parameter to predict the heart disease, for which we used a hybrid model of Decision Tree and Random Forest.

Journal Article•DOI•

Implementing ensemble learning methods to predict the shear strength of RC deep beams with/without web reinforcements

De-Cheng Feng¹, Wen-Jie Wang¹, Sujith Mangalathu, Gang Hu², Tao Wu³•Institutions (3)

Southeast University¹, Harbin Institute of Technology², Chang'an University³

15 May 2021-Engineering Structures

TL;DR: It is shown that the ensemble machine learning models are significantly superior to mechanics-driven models in both predicting accuracy and discrepancy.

Journal Article•DOI•

Ensemble Boosting and Bagging Based Machine Learning Models for Groundwater Potential Prediction

Amirhosein Mosavi¹, Farzaneh Sajedi Hosseini², Bahram Choubin, Massoud Goodarzi, Adrienn Dineva³, Elham Rafiei Sardooi⁴•Institutions (4)

Ton Duc Thang University¹, University of Tehran², Duy Tan University³, Jiroft University⁴

01 Jan 2021-Water Resources Management

TL;DR: Groundwater potential maps predicted in this study can help water resources managers and policymakers in the fields of watershed and aquifer management to preserve an optimal exploit from this important freshwater.

Abstract: Due to the rapidly increasing demand for groundwater, as one of the principal freshwater resources, there is an urge to advance novel prediction systems to more accurately estimate the groundwater potential for an informed groundwater resource management. Ensemble machine learning methods are generally reported to produce more accurate results. However, proposing the novel ensemble models along with running comparative studies for performance evaluation of these models would be equally essential to precisely identify the suitable methods. Thus, the current study is designed to provide knowledge on the performance of the four ensemble models i.e., Boosted generalized additive model (GamBoost), adaptive Boosting classification trees (AdaBoost), Bagged classification and regression trees (Bagged CART), and random forest (RF). To build the models, 339 groundwater resources’ locations and the spatial groundwater potential conditioning factors were used. Thereafter, the recursive feature elimination (RFE) method was applied to identify the key features. The RFE specified that the best number of features for groundwater potential modeling was 12 variables among 15 (with a mean Accuracy of about 0.84). The modeling results indicated that the Bagging models (i.e., RF and Bagged CART) had a higher performance than the Boosting models (i.e., AdaBoost and GamBoost). Overall, the RF model outperformed the other models (with accuracy = 0.86, Kappa = 0.67, Precision = 0.85, and Recall = 0.91). Also, the topographic position index’s predictive variables, valley depth, drainage density, elevation, and distance from stream had the highest contribution in the modeling process. Groundwater potential maps predicted in this study can help water resources managers and policymakers in the fields of watershed and aquifer management to preserve an optimal exploit from this important freshwater.
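
Editor's illustration (not from the paper): the four metrics reported in the abstract (accuracy, Kappa, precision, recall) can all be derived from a binary confusion matrix. The sketch below assumes the standard definitions, with Kappa taken to mean Cohen's kappa; the confusion-matrix counts are hypothetical and are not the study's data.

```python
def binary_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall and Cohen's kappa from binary counts."""
    n = tp + fp + fn + tn
    accuracy = (tp + tn) / n
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # Agreement expected by chance, from the row and column marginals.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    kappa = (accuracy - expected) / (1 - expected)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "kappa": kappa,
    }


# Hypothetical counts for a 100-sample test set (not taken from the paper).
metrics = binary_metrics(tp=45, fp=8, fn=5, tn=42)
print(metrics)
```

Kappa discounts the agreement a classifier would reach by chance given the class proportions, which is why it is typically lower than accuracy for the same predictions.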

Journal Article•DOI•

AI-based smart prediction of clinical disease using random forest classifier and Naive Bayes

V. Jackins¹, S. Vimal¹, M. Kaliappan, Mi Young Lee²•Institutions (2)

National Engineering College¹, Sejong University²

01 May 2021-The Journal of Supercomputing

TL;DR: The artificial intelligence has been used with Naive Bayes classification and random forest classification algorithm to classify many disease datasets like diabetes, heart disease, and cancer to check whether the patient is affected by that disease or not.

Abstract: Healthcare practices include collecting all kinds of patient data which would help the doctor correctly diagnose the health condition of the patient. These data could be simple symptoms observed by the subject, initial diagnosis by a physician or a detailed test result from a laboratory. Thus, these data are only utilized for analysis by a doctor who then ascertains the disease using his/her personal medical expertise. The artificial intelligence has been used with Naive Bayes classification and random forest classification algorithm to classify many disease datasets like diabetes, heart disease, and cancer to check whether the patient is affected by that disease or not. A performance analysis of the disease data for both algorithms is calculated and compared. The results of the simulations show the effectiveness of the classification techniques on a dataset, as well as the nature and complexity of the dataset used.

Journal Article•DOI•

Prediction of cryptocurrency returns using machine learning

Erdinc Akyildirim¹,²,³, Ahmet Goncu⁴,⁵, Ahmet Sensoy⁶•Institutions (6)

Mehmet Akif Ersoy University¹, University of Zurich², ETH Zurich³, University of Liverpool⁴, Shanghai Jiao Tong University⁵, Bilkent University⁶

01 Feb 2021-Annals of Operations Research

TL;DR: Machine learning classification algorithms reach about 55–65% predictive accuracy on average at the daily or minute level frequencies, while the support vector machines demonstrate the best and consistent results in terms of predictive accuracy compared to the logistic regression, artificial neural networks and random forest classification algorithms.

Abstract: In this study, the predictability of the most liquid twelve cryptocurrencies are analyzed at the daily and minute level frequencies using the machine learning classification algorithms including the support vector machines, logistic regression, artificial neural networks, and random forests with the past price information and technical indicators as model features. The average classification accuracy of four algorithms are consistently all above the 50% threshold for all cryptocurrencies and for all the timescales showing that there exists predictability of trends in prices to a certain degree in the cryptocurrency markets. Machine learning classification algorithms reach about 55–65% predictive accuracy on average at the daily or minute level frequencies, while the support vector machines demonstrate the best and consistent results in terms of predictive accuracy compared to the logistic regression, artificial neural networks and random forest classification algorithms.

Journal Article•DOI•

Impact of Dataset Size on Classification Performance: An Empirical Evaluation in the Medical Domain

Alhanoof Althnian, Duaa H. AlSaeed, Heyam H. Al-Baity, Amani Khalaf Samha, Alanoud Bin Dris, Najla Alzakari, Afnan Abou Elwafa, Heba Kurdi

15 Jan 2021-Applied Sciences

TL;DR: It is found that the most robust model for limited medical data is AB and NB, followed by SVM, and then RF and NN, while the least robust model is DT.

Abstract: Dataset size is considered a major concern in the medical domain, where lack of data is a common occurrence. This study aims to investigate the impact of dataset size on the overall performance of supervised classification models. We examined the performance of six widely-used models in the medical field, including support vector machine (SVM), neural networks (NN), C4.5 decision tree (DT), random forest (RF), adaboost (AB), and naive Bayes (NB) on eighteen small medical UCI datasets. We further implemented three dataset size reduction scenarios on two large datasets and analyze the performance of the models when trained on each resulting dataset with respect to accuracy, precision, recall, f-score, specificity, and area under the ROC curve (AUC). Our results indicated that the overall performance of classifiers depend on how much a dataset represents the original distribution rather than its size. Moreover, we found that the most robust model for limited medical data is AB and NB, followed by SVM, and then RF and NN, while the least robust model is DT. Furthermore, an interesting observation is that a robust machine learning model to limited dataset does not necessary imply that it provides the best performance compared to other models.

Journal Article•DOI•

Predicting mortality risk in patients with COVID-19 using machine learning to help medical decision-making

Mohammad Pourhomayoun¹, Mahdi Shakibi¹•Institutions (1)

California State University, Los Angeles¹

16 Jan 2021-Smart Health

TL;DR: In the wake of the COVID-19 disease, caused by the SARS-CoV-2 virus, this paper develops a predictive model based on Artificial Intelligence (AI) and Machine Learning (ML) algorithms to determine the health risk and predict the mortality risk of patients with COVID-19.

Journal Article•DOI•

The Role of Machine Learning Algorithms for Diagnosing Diseases

Ibrahim Mahmood Ibrahim¹, Adnan Mohsin Abdulazeez•Institutions (1)

University of Kurdistan¹

19 Mar 2021

TL;DR: An overview is given of the machine learning algorithms applied to the identification and prediction of many diseases, including Naïve Bayes, logistic regression, support vector machine, K-nearest neighbor, K-means clustering, decision tree, and random forest.

Abstract: Nowadays, machine learning algorithms have become very important in the medical sector, especially for diagnosing disease from the medical database. Many companies using these techniques for the early prediction of diseases and enhance medical diagnostics. The motivation of this paper is to give an overview of the machine learning algorithms that are applied for the identification and prediction of many diseases such as Naive Bayes, logistic regression, support vector machine, K-nearest neighbor, K-means clustering, decision tree, and random forest. In this work, many previous studies were reviewed that used machine learning algorithms for detecting various diseases in the medical area in the last three years. A comparison is provided concerning these algorithms, assessment processes, and the obtained results. Finally, a discussion of the previous works is presented.

Journal Article•DOI•

2D object recognition: a comparative analysis of SIFT, SURF and ORB feature descriptors

Monika Bansal¹, Munish Kumar², Manish Kumar•Institutions (2)

Punjabi University¹, Punjab Technical University²

01 May 2021-Multimedia Tools and Applications

TL;DR: A comparative analysis view among various feature descriptors algorithms and classification models for 2D object recognition reveals that a hybridization of SIFT, SURF and ORB method with Random Forest classification model accomplishes the best results as compared to other state-of-the-art work.

Abstract: Object recognition is a key research area in the field of image processing and computer vision, which recognizes the object in an image and provides a proper label. In the paper, three popular feature descriptor algorithms that are Scale Invariant Feature Transform (SIFT), Speeded Up Robust Feature (SURF) and Oriented Fast and Rotated BRIEF (ORB) are used for experimental work of an object recognition system. A comparison among these three descriptors is exhibited in the paper by determining them individually and with different combinations of these three methodologies. The amount of the features extracted using these feature extraction methods are further reduced using a feature selection (k-means clustering) and a dimensionality reduction method (Locality Preserving Projection). Various classifiers i.e. K-Nearest Neighbor, Naive Bayes, Decision Tree, and Random Forest are used to classify objects based on their similarity. The focus of this article is to present a study of the performance comparison among these three feature extraction methods, particularly when their combination derives in recognizing the object more efficiently. In this paper, the authors have presented a comparative analysis view among various feature descriptors algorithms and classification models for 2D object recognition. The Caltech-101 public dataset is considered in this article for experimental work. The experiment reveals that a hybridization of SIFT, SURF and ORB method with Random Forest classification model accomplishes the best results as compared to other state-of-the-art work. The comparative analysis has been presented in terms of recognition accuracy, True Positive Rate (TPR), False Positive Rate (FPR), and Area Under Curve (AUC) parameters.

Journal Article•DOI•

Sentiment analysis and classification of Indian farmers’ protest using twitter data

Ashwin Sanjay Neogi¹, Kirti Anilkumar Garg¹, Ram Krishn Mishra¹, Yogesh K. Dwivedi²•Institutions (2)

Synergy University¹, Swansea University²

01 Jan 2021

TL;DR: This research gathered data from the microblogging website Twitter concerning the farmers’ protest to understand the sentiments that the public shared at an international level, and used models to categorize and analyze the sentiments based on a collection of around 20,000 tweets on the protest.

Abstract: Protests are an integral part of democracy and an important means for citizens to convey their demands and/or dissatisfaction to the government. As citizens become more aware of their rights, there has been an increasing number of protests all over the world for various reasons. With the advancement of technology, there has also been an exponential rise in the use of social media to exchange information and ideas. In this research, we gathered data from the microblogging website Twitter concerning the farmers’ protest to understand the sentiments that the public shared at an international level. We used models to categorize and analyze the sentiments based on a collection of around 20,000 tweets on the protest. We conducted our analysis using Bag of Words and TF-IDF and found that Bag of Words performed better than TF-IDF. In addition, we used Naive Bayes, Decision Trees, Random Forests, and Support Vector Machines and found that Random Forest had the highest classification accuracy.

Journal Article•DOI•

EMCNet: Automated COVID-19 diagnosis from X-ray images using convolutional neural network and ensemble of machine learning classifiers

Prottoy Saha¹, Muhammad Sheikh Sadi¹, Md. Milon Islam¹•Institutions (1)

Khulna University of Engineering & Technology¹

01 Jan 2021-Informatics in Medicine Unlocked

TL;DR: In this paper, a convolutional neural network was developed focusing on the simplicity of the model to extract deep and high-level features from X-ray images of patients infected with COVID-19.

Journal Article•DOI•

GIS-based landslide susceptibility modeling: A comparison between fuzzy multi-criteria and machine learning algorithms

Sk Ajim Ali¹, Farhana Parvin¹, Jana Vojteková², Romulus Costache³, Nguyen Thi Thuy Linh⁴, Quoc Bao Pham⁵, Matej Vojtek², Ljubomir Gigović⁶, Ateeque Ahmad¹, Mohammad Ali Ghorbani⁷•Institutions (7)

Aligarh Muslim University¹, University of Constantine the Philosopher², University of Bucharest³, Water Resources University⁴, Duy Tan University⁵, University of Defence⁶, Virginia Tech College of Natural Resources and Environment⁷

01 Mar 2021-Geoscience Frontiers

TL;DR: The results revealed that random forest (RF) classifier is a promising and optimum model for landslide susceptibility in the study area with a very high value of area under curve, lower value of mean absolute error, and higher value of Kappa index.

Abstract: Hazards and disasters always have negative impacts on the way of life. Landslide is an overwhelming natural as well as man-made disaster that causes loss of natural resources and human properties throughout the world. The present study aimed to assess and compare the prediction efficiency of different models for landslide susceptibility in the Kysuca river basin, Slovakia. In this regard, the fuzzy decision-making trial and evaluation laboratory combined with the analytic network process (FDEMATEL-ANP), the Naive Bayes (NB) classifier, and the random forest (RF) classifier were considered. Initially, a landslide inventory map was produced with 2000 landslide and non-landslide points, which were randomly divided in a ratio of 70%:30% for training and testing, respectively. The geospatial database for assessing the landslide susceptibility was generated with the help of 16 landslide conditioning factors covering topographical, hydrological, lithological, and land cover factors. The ReliefF method was used to determine the significance of the selected conditioning factors and their inclusion in the model building. Consequently, the landslide susceptibility maps (LSMs) were generated using the FDEMATEL-ANP, NB classifier, and RF classifier models. Finally, the area under curve (AUC) and different arithmetic evaluation measures were used for validating and comparing the results and models. The results revealed that the RF classifier is a promising and optimum model for landslide susceptibility in the study area, with a very high value of area under curve (AUC = 0.954), lower values of mean absolute error (MAE = 0.1238) and root mean square error (RMSE = 0.2555), and higher values of Kappa index (K = 0.8435) and overall accuracy (OAC = 92.2%).
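
For readers unfamiliar with the validation measures named in this abstract, the following is a minimal sketch of how AUC, MAE, RMSE, the Kappa index and overall accuracy can be computed for a binary susceptibility classifier. It is not the authors' code: the function name `evaluate_binary`, the 0.5 threshold, the choice to compute MAE and RMSE on predicted probabilities, and the toy data are all assumptions made for illustration.

```python
import math


def evaluate_binary(y_true, p_pred, threshold=0.5):
    """Illustrative metrics for binary labels (0/1) and predicted probabilities.

    Assumption: MAE and RMSE are taken between the predicted probability and
    the 0/1 label; the paper may define them differently.
    """
    n = len(y_true)
    errors = [p - y for y, p in zip(y_true, p_pred)]
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)

    # Hard class predictions at the (assumed) threshold.
    y_hat = [1 if p >= threshold else 0 for p in p_pred]
    accuracy = sum(int(a == b) for a, b in zip(y_true, y_hat)) / n

    # Cohen's kappa: observed agreement corrected for chance agreement.
    frac_true_pos = sum(y_true) / n
    frac_pred_pos = sum(y_hat) / n
    chance = frac_true_pos * frac_pred_pos + (1 - frac_true_pos) * (1 - frac_pred_pos)
    kappa = (accuracy - chance) / (1 - chance) if chance < 1 else 0.0

    # AUC as the share of (positive, negative) pairs ranked correctly;
    # ties count as half a correct pair.
    pos = [p for y, p in zip(y_true, p_pred) if y == 1]
    neg = [p for y, p in zip(y_true, p_pred) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    auc = wins / (len(pos) * len(neg))

    return {"auc": auc, "mae": mae, "rmse": rmse, "kappa": kappa, "accuracy": accuracy}


# Hypothetical toy data: 1 = landslide point, 0 = non-landslide point.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]
print(evaluate_binary(labels, scores))
```

The pairwise AUC loop is quadratic in the number of points, which is acceptable for a test set of a few hundred points but would normally be replaced by a rank-based computation for larger data.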

Journal Article•DOI•

Application of decision tree-based ensemble learning in the classification of breast cancer.

Mohammad M. Ghiasi¹, Sohrab Zendehboudi¹•Institutions (1)

St. John's University¹

01 Jan 2021-Computers in Biology and Medicine

TL;DR: The error analysis results reveal that the designed RF and ET models offer easy-to-use outcomes and the highest diagnostic performance, compared to previous tools/models in the literature for the WBCD classification.

Journal Article•DOI•

Extreme gradient boosting and deep neural network based ensemble learning approach to forecast hourly solar irradiance

Pratima Kumari¹, Durga Toshniwal¹•Institutions (1)

Indian Institute of Technology Roorkee¹

10 Jan 2021-Journal of Cleaner Production

TL;DR: An ensemble model consisting of two advanced base models, namely extreme gradient boosting forest and deep neural networks (XGBF-DNN), is proposed for hourly global horizontal irradiance forecasting and exhibits the best combination of stability and prediction accuracy irrespective of seasonal variations in weather conditions.

Journal Article•DOI•

Ensemble machine learning models for the detection of energy theft

Sravan Kumar Gunturi¹, Dipu Sarkar¹•Institutions (1)

National Institute of Technology Nagaland¹

01 Mar 2021-Electric Power Systems Research

TL;DR: An extensive analysis based on a practical dataset of 5000 customers reveals that bagging models outperform other algorithms, and the precision analysis likewise shows that the proposed bagging methods perform better.

Journal Article•DOI•

Random forests for global sensitivity analysis: A selective review

Anestis Antoniadis¹,², Sophie Lambert-Lacroix², Jean-Michel Poggi³,⁴•Institutions (4)

University of Cape Town¹, University of Grenoble², University of Paris³, Université Paris-Saclay⁴

01 Feb 2021-Reliability Engineering & System Safety

TL;DR: The idea is to use a random forests methodology as an efficient non-parametric approach for building meta-models that allow an efficient sensitivity analysis, and an adequate set of tools for quantifying variable importance is reviewed.

Journal Article•DOI•

A Review on Machine Learning for EEG Signal Processing in Bioengineering

Mohammad-Parsa Hosseini¹, Amin Hosseini², Kiarash Ahi³•Institutions (3)

Santa Clara University¹, Islamic Azad University², University of Connecticut³

01 Jan 2021-IEEE Reviews in Biomedical Engineering

TL;DR: This paper provides a comprehensive overview of Machine Learning applications used in EEG analysis and gives an overview of each of the methods and general applications that each is best suited to.

Abstract: Electroencephalography (EEG) has been a staple method for identifying certain health conditions in patients since its discovery. Due to the many different types of classifiers available, the analysis methods are equally numerous. In this review, we specifically examine machine learning methods that have been developed for EEG analysis with bioengineering applications. We reviewed literature from 1988 to 2018 to capture previous and current classification methods for EEG in multiple applications. From this information, we are able to determine the overall effectiveness of each machine learning method as well as its key characteristics. We have found that all the primary methods used in machine learning have been applied in some form in EEG classification. These range from Naive Bayes to Decision Tree/Random Forest to Support Vector Machine (SVM). Supervised learning methods are on average more accurate than their unsupervised counterparts. This includes SVM and KNN. While each of the methods individually is limited in its accuracy in its respective applications, there is hope that a combination of methods, when implemented properly, has a higher overall classification accuracy. This paper provides a comprehensive overview of Machine Learning applications used in EEG analysis. It also gives an overview of each of the methods and the general applications that each is best suited to.

Journal Article•DOI•

Tree species classification from airborne hyperspectral and LiDAR data using 3D convolutional neural networks

Janne Mäyrä¹, Sarita Keski-Saari², Sonja Kivinen¹,², Topi Tanhuanpää²,³, Pekka Hurskainen¹,³, Peter Kullberg¹, Laura Poikolainen², Arto Viinikka¹, Sakari Tuominen, Timo Kumpula², Petteri Vihervaara¹•Institutions (3)

Finnish Environment Institute¹, University of Eastern Finland², University of Helsinki³

01 Apr 2021-Remote Sensing of Environment

TL;DR: 3D-CNNs were more efficient in distinguishing coniferous species from each other, with a concurrent high accuracy for aspen classification, which can benefit both sustainable forestry and biodiversity conservation.

Journal Article•DOI•

Fault diagnosis based on extremely randomized trees in wireless sensor networks

Umer Saeed¹, Sana Ullah Jan¹, YoungDoo Lee¹, Insoo Koo¹•Institutions (1)

University of Ulsan¹

01 Jan 2021-Reliability Engineering & System Safety

TL;DR: A lightweight ensemble learning technique, Extremely Randomized Trees (Extra-Trees), is used as a detection scheme that is robust to signal noise and strongly reduces bias and variance error; its performance was compared with that of state-of-the-art machine learning algorithms.
