
Showing papers on "Random forest published in 2021"


Journal ArticleDOI
TL;DR: A comprehensive comparison between XGBoost, LightGBM, CatBoost, random forests and gradient boosting has been performed and indicates that CatBoost obtains the best results in generalization accuracy and AUC in the studied datasets although the differences are small.
Abstract: The family of gradient boosting algorithms has been recently extended with several interesting proposals (i.e. XGBoost, LightGBM and CatBoost) that focus on both speed and accuracy. XGBoost is a scalable ensemble technique that has demonstrated to be a reliable and efficient machine learning challenge solver. LightGBM is an accurate model focused on providing extremely fast training performance using selective sampling of high gradient instances. CatBoost modifies the computation of gradients to avoid the prediction shift in order to improve the accuracy of the model. This work proposes a practical analysis of how these novel variants of gradient boosting work in terms of training speed, generalization performance and hyper-parameter setup. In addition, a comprehensive comparison between XGBoost, LightGBM, CatBoost, random forests and gradient boosting has been performed using carefully tuned models as well as using their default settings. The results of this comparison indicate that CatBoost obtains the best results in generalization accuracy and AUC in the studied datasets although the differences are small. LightGBM is the fastest of all methods but not the most accurate. Finally, XGBoost places second both in accuracy and in training speed. Finally an extensive analysis of the effect of hyper-parameter tuning in XGBoost, LightGBM and CatBoost is carried out using two novel proposed tools.

375 citations


Journal ArticleDOI
TL;DR: This article has proposed an ensemble classification model for detection of the fake news that has achieved a better accuracy compared to the state-of-the-art.

186 citations


Journal ArticleDOI
TL;DR: Based on the random forest algorithm's method of calculating variable importance and the formula for calculating security risk index weights, the paper concludes that the random forest algorithm has good predictive ability in the risk assessment of large-scale group activities.

172 citations


Journal ArticleDOI
TL;DR: In this article, the authors proposed a model that incorporates different methods to achieve effective prediction of heart disease, which used efficient Data Collection, Data Pre-processing and Data Transformation methods to create accurate information for the training model.
Abstract: Cardiovascular diseases (CVD) are among the most common serious illnesses affecting human health. CVDs may be prevented or mitigated by early diagnosis, and this may reduce mortality rates. Identifying risk factors using machine learning models is a promising approach. We would like to propose a model that incorporates different methods to achieve effective prediction of heart disease. For our proposed model to be successful, we have used efficient Data Collection, Data Pre-processing and Data Transformation methods to create accurate information for the training model. We have used a combined dataset (Cleveland, Long Beach VA, Switzerland, Hungarian and Stat log). Suitable features are selected by using the Relief, and Least Absolute Shrinkage and Selection Operator (LASSO) techniques. New hybrid classifiers like Decision Tree Bagging Method (DTBM), Random Forest Bagging Method (RFBM), K-Nearest Neighbors Bagging Method (KNNBM), AdaBoost Boosting Method (ABBM), and Gradient Boosting Boosting Method (GBBM) are developed by integrating the traditional classifiers with bagging and boosting methods, which are used in the training process. We have also instrumented some machine learning algorithms to calculate the Accuracy (ACC), Sensitivity (SEN), Error Rate, Precision (PRE) and F1 Score (F1) of our model, along with the Negative Predictive Value (NPR), False Positive Rate (FPR), and False Negative Rate (FNR). The results are shown separately to provide comparisons. Based on the result analysis, we can conclude that our proposed model produced the highest accuracy while using RFBM and Relief feature selection methods (99.05%).

169 citations
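
The evaluation measures named in the abstract above (ACC, SEN, Error Rate, PRE, F1, negative predictive value, FPR, FNR) can all be derived from the four confusion-matrix counts. The following is a minimal sketch, not the authors' code; the function name `binary_metrics` and the example counts are hypothetical and chosen only for illustration.

```python
def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Derive common binary-classification metrics from confusion-matrix counts."""

    def ratio(numerator: float, denominator: float) -> float:
        # Return 0.0 instead of raising when a class is absent from the data.
        return numerator / denominator if denominator else 0.0

    accuracy = ratio(tp + tn, tp + fp + tn + fn)
    sensitivity = ratio(tp, tp + fn)  # also called recall or true positive rate
    precision = ratio(tp, tp + fp)
    return {
        "accuracy": accuracy,
        "error_rate": 1 - accuracy,
        "sensitivity": sensitivity,
        "precision": precision,
        "f1": ratio(2 * precision * sensitivity, precision + sensitivity),
        "npv": ratio(tn, tn + fn),  # negative predictive value
        "fpr": ratio(fp, fp + tn),  # false positive rate
        "fnr": ratio(fn, fn + tp),  # false negative rate
    }


# Hypothetical counts for a 100-patient test set (illustrative only).
metrics = binary_metrics(tp=40, fp=10, tn=45, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Note that FNR is the complement of sensitivity, so reporting both is a consistency check rather than independent evidence.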


Journal ArticleDOI
TL;DR: In this paper, the authors analyzed the heart failure survivors from the dataset of 299 patients admitted in hospital and found significant features and effective data mining techniques that can boost the accuracy of cardiovascular patient's survivor prediction.
Abstract: Cardiovascular disease is a substantial cause of mortality and morbidity in the world. In clinical data analytics, it is a great challenge to predict heart disease survivor. Data mining transforms huge amounts of raw data generated by the health industry into useful information that can help in making informed decisions. Various studies proved that significant features play a key role in improving performance of machine learning models. This study analyzes the heart failure survivors from the dataset of 299 patients admitted in hospital. The aim is to find significant features and effective data mining techniques that can boost the accuracy of cardiovascular patient’s survivor prediction. To predict patient’s survival, this study employs nine classification models: Decision Tree (DT), Adaptive boosting classifier (AdaBoost), Logistic Regression (LR), Stochastic Gradient classifier (SGD), Random Forest (RF), Gradient Boosting classifier (GBM), Extra Tree Classifier (ETC), Gaussian Naive Bayes classifier (G-NB) and Support Vector Machine (SVM). The imbalance class problem is handled by Synthetic Minority Oversampling Technique (SMOTE). Furthermore, machine learning models are trained on the highest ranked features selected by RF. The results are compared with those provided by machine learning algorithms using full set of features. Experimental results demonstrate that ETC outperforms other models and achieves 0.9262 accuracy value with SMOTE in prediction of heart patient’s survival.

162 citations
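
The abstract above handles class imbalance with SMOTE, which creates synthetic minority samples by interpolating between a minority sample and one of its nearest minority neighbours. The sketch below illustrates only that interpolation step in plain Python; it is not the study's implementation, and the function name `smote`, its parameters, and the sample points are assumptions made for illustration.

```python
import math
import random


def smote(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic points by interpolating between minority samples.

    Assumes at least two minority samples, each a list of numeric features.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest other minority samples by Euclidean distance.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical minority-class samples with two features (illustrative only).
minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.5], [1.2, 2.2]]
synthetic = smote(minority, n_new=5, k=2, seed=0)
print(synthetic)
```

Because every synthetic point lies on a segment between two real minority samples, oversampling should be applied to the training split only; otherwise information from test samples can leak into training.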


Journal ArticleDOI
TL;DR: This study uses machine intelligence algorithms with individual learners and ensemble learners (bagging, boosting) to predict the strength of high-performance concrete (HPC) prepared with waste materials and suggests that the individual model response is enhanced by using the bagging and boosting learners.

152 citations


Journal ArticleDOI
01 Jun 2021
TL;DR: The proposed ensemble soft voting classifier gives binary classification and uses the ensemble of three machine learning algorithms viz. random forest, logistic regression, and Naive Bayes for the classification.
Abstract: Diabetes is a dreadful disease identified by escalated levels of glucose in the blood. Machine learning algorithms help in identification and prediction of diabetes at an early stage. The main objective of this study is to predict diabetes mellitus with better accuracy using an ensemble of machine learning algorithms. The Pima Indians Diabetes dataset has been considered for experimentation, which gathers details of patients with and without having diabetes. The proposed ensemble soft voting classifier gives binary classification and uses the ensemble of three machine learning algorithms viz. random forest, logistic regression, and Naive Bayes for the classification. Empirical evaluation of the proposed methodology has been conducted with state-of-the-art methodologies and base classifiers such as AdaBoost, Logistic Regression, Support Vector Machine, Random Forest, Naive Bayes, Bagging, GradientBoost, XGBoost, CatBoost by taking accuracy, precision, recall, F1-score as the evaluation criteria. The proposed ensemble approach gives the highest accuracy, precision, recall, and F1_score value with 79.04%, 73.48%, 71.45% and 80.6% respectively on the PIMA diabetes dataset. Further, the efficiency of the proposed methodology has also been compared and analysed with breast cancer dataset. The proposed ensemble soft voting classifier has given 97.02% accuracy on the breast cancer dataset.

141 citations
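
Soft voting, as used in the abstract above, averages the class probabilities predicted by each base model and picks the class with the highest mean, whereas hard voting counts each model's top label. The sketch below shows the averaging step in plain Python; the function name `soft_vote` and the probability values are hypothetical and do not come from the paper.

```python
def soft_vote(probabilities):
    """Average per-class probabilities across models.

    probabilities: one list per model, each holding one probability per class.
    Returns (index of the winning class, list of averaged probabilities).
    """
    n_models = len(probabilities)
    n_classes = len(probabilities[0])
    averaged = [
        sum(model[c] for model in probabilities) / n_models
        for c in range(n_classes)
    ]
    label = max(range(n_classes), key=averaged.__getitem__)
    return label, averaged


# Hypothetical [P(no diabetes), P(diabetes)] outputs for a single patient.
rf_probs = [0.55, 0.45]  # random forest leans slightly towards class 0
lr_probs = [0.52, 0.48]  # logistic regression leans slightly towards class 0
nb_probs = [0.10, 0.90]  # Naive Bayes is confident in class 1

label, averaged = soft_vote([rf_probs, lr_probs, nb_probs])
print(label, averaged)
```

In this made-up case two of the three models favour class 0, so a hard vote would return 0, while the soft vote returns 1 because one model's confident prediction outweighs two marginal ones.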


Posted ContentDOI
TL;DR: This paper aggregates some of the literature on missing data particularly focusing on machine learning techniques, and gives insight on how the machine learning approaches work by highlighting the key features of the proposed techniques, how they perform, their limitations and the kind of data they are most suitable for.
Abstract: Machine learning has been the cornerstone in analysing and extracting information from data and often a problem of missing values is encountered. Missing values occur because of various factors like missing completely at random, missing at random or missing not at random. All these may result from system malfunction during data collection or human error during data pre-processing. Nevertheless, it is important to deal with missing values before analysing data since ignoring or omitting missing values may result in biased or misinformed analysis. In literature there have been several proposals for handling missing values. In this paper, we aggregate some of the literature on missing data particularly focusing on machine learning techniques. We also give insight on how the machine learning approaches work by highlighting the key features of missing values imputation techniques, how they perform, their limitations and the kind of data they are most suitable for. We propose and evaluate two methods, the k nearest neighbor and an iterative imputation method (missForest) based on the random forest algorithm. Evaluation is performed on the Iris and novel power plant fan data with induced missing values at missingness rate of 5% to 20%. We show that both missForest and the k nearest neighbor can successfully handle missing values and offer some possible future research direction.

138 citations
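
Of the two imputation methods evaluated in the abstract above, k nearest neighbor imputation is the simpler to sketch: a missing value is filled with the mean of that feature over the k most similar rows. The code below is a simplified plain-Python illustration, not the paper's implementation and not missForest; the function name `knn_impute`, the distance definition, and the Iris-like sample rows are assumptions.

```python
import math


def knn_impute(rows, k=2):
    """Fill None entries with the mean of that feature over the k nearest rows."""

    def distance(a, b):
        # Root-mean-square difference over features observed in both rows;
        # rows with no shared observed feature are treated as infinitely far.
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    imputed = [list(row) for row in rows]  # copy so the input is left unchanged
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate donors: other rows in which feature j is observed.
            donors = sorted(
                (
                    (distance(row, other), other[j])
                    for m, other in enumerate(rows)
                    if m != i and other[j] is not None
                ),
                key=lambda pair: pair[0],
            )
            nearest = [donor_value for _, donor_value in donors[:k]]
            if nearest:
                imputed[i][j] = sum(nearest) / len(nearest)
    return imputed


# Hypothetical rows with one induced missing value (None).
data = [
    [5.1, 3.5, 1.4],
    [4.9, 3.0, 1.4],
    [6.3, 3.3, 6.0],
    [5.0, None, 1.5],
]
filled = knn_impute(data, k=2)
print(filled[3])
```

Here the incomplete row is closest to the first two rows, so its missing value becomes the mean of 3.5 and 3.0.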


Journal ArticleDOI
TL;DR: From the first empirical results, it is concluded that GRF can be more predictive when an appropriate spatial scale is selected to model the data, with reduced residual autocorrelation and lower Root Mean Squared Error (RMSE) and Mean Absolute Error (MAE) values.
Abstract: Machine learning algorithms such as Random Forest (RF) are being increasingly applied on traditionally geographical topics such as population estimation. Even though RF is a well performing and gen...

130 citations


Journal ArticleDOI
01 Jan 2021
TL;DR: CART, along with RS or QT, outperforms all other ML algorithms with 100% accuracy, 100% precision, 99% recall, and 100% F1 score, and the study outcomes demonstrate that the model’s performance varies depending on the data scaling method.
Abstract: Heart disease, one of the main reasons behind the high mortality rate around the world, requires a sophisticated and expensive diagnosis process. In the recent past, much literature has demonstrated machine learning approaches as an opportunity to efficiently diagnose heart disease patients. However, challenges associated with datasets such as missing data, inconsistent data, and mixed data (containing inconsistent missing data both as numerical and categorical) are often obstacles in medical diagnosis. This inconsistency led to a higher probability of misprediction and a misled result. Data preprocessing steps like feature reduction, data conversion, and data scaling are employed to form a standard dataset—such measures play a crucial role in reducing inaccuracy in final prediction. This paper aims to evaluate eleven machine learning (ML) algorithms—Logistic Regression (LR), Linear Discriminant Analysis (LDA), K-Nearest Neighbors (KNN), Classification and Regression Trees (CART), Naive Bayes (NB), Support Vector Machine (SVM), XGBoost (XGB), Random Forest Classifier (RF), Gradient Boost (GB), AdaBoost (AB), Extra Tree Classifier (ET)—and six different data scaling methods—Normalization (NR), Standscale (SS), MinMax (MM), MaxAbs (MA), Robust Scaler (RS), and Quantile Transformer (QT) on a dataset comprising of information of patients with heart disease. The result shows that CART, along with RS or QT, outperforms all other ML algorithms with 100% accuracy, 100% precision, 99% recall, and 100% F1 score. The study outcomes demonstrate that the model’s performance varies depending on the data scaling method.

128 citations
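
The abstract above reports that model performance depends on the data scaling method. The sketch below shows, in plain Python, the arithmetic behind four of the listed scalers (MinMax, Standscale, MaxAbs, Robust Scaler); Normalization and Quantile Transformer are omitted. It is an illustration under assumptions, not the study's code: the function names and the feature column are hypothetical, and each function assumes the column is not constant.

```python
import statistics


def min_max(values):
    """Rescale to the [0, 1] range."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def standard(values):
    """Centre on the mean and divide by the population standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


def max_abs(values):
    """Divide by the largest absolute value, preserving sign and zeros."""
    peak = max(abs(v) for v in values)
    return [v / peak for v in values]


def robust(values):
    """Centre on the median and divide by the interquartile range."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return [(v - median) / (q3 - q1) for v in values]


# Hypothetical feature column containing one outlier (200.0).
column = [20.0, 30.0, 40.0, 50.0, 200.0]
print("min-max :", min_max(column))
print("standard:", standard(column))
print("max-abs :", max_abs(column))
print("robust  :", robust(column))
```

With the outlier present, min-max scaling compresses the four ordinary values into the bottom sixth of the range, while robust scaling keeps them evenly spread around zero, which is one way the choice of scaler can change what a downstream model sees.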


Journal ArticleDOI
TL;DR: 10 popular supervised and unsupervised ML algorithms for identifying effective and efficient ML–AIDS of networks and computers are applied and the true positive and negative rates, accuracy, precision, recall, and F-Score of 31 ML-AIDS models are evaluated.
Abstract: An intrusion detection system (IDS) is an important protection instrument for detecting complex network attacks. Various machine learning (ML) or deep learning (DL) algorithms have been proposed for implementing anomaly-based IDS (AIDS). Our review of the AIDS literature identifies some issues in related work, including the randomness of the selected algorithms, parameters, and testing criteria, the application of old datasets, or shallow analyses and validation of the results. This paper comprehensively reviews previous studies on AIDS by using a set of criteria with different datasets and types of attacks to set benchmarking outcomes that can reveal the suitable AIDS algorithms, parameters, and testing criteria. Specifically, this paper applies 10 popular supervised and unsupervised ML algorithms for identifying effective and efficient ML–AIDS of networks and computers. These supervised ML algorithms include the artificial neural network (ANN), decision tree (DT), k-nearest neighbor (k-NN), naive Bayes (NB), random forest (RF), support vector machine (SVM), and convolutional neural network (CNN) algorithms, whereas the unsupervised ML algorithms include the expectation-maximization (EM), k-means, and self-organizing maps (SOM) algorithms. Several models of these algorithms are introduced, and the tuning and training parameters of each algorithm are examined to achieve an optimal classifier evaluation. Unlike previous studies, this study evaluates the performance of AIDS by measuring the true positive and negative rates, accuracy, precision, recall, and F-Score of 31 ML-AIDS models. The training and testing time for ML-AIDS models are also considered in measuring their performance efficiency given that time complexity is an important factor in AIDSs. The ML-AIDS models are tested by using a recent and highly unbalanced multiclass CICIDS2017 dataset that involves real-world network attacks. In general, the k-NN-AIDS, DT-AIDS, and NB-AIDS models obtain the best results and show a greater capability in detecting web attacks compared with other models that demonstrate irregular and inferior results.

Proceedings ArticleDOI
M. Kavitha, G. Gnaneswar, R. Dinesh, Y. Rohith Sai, R. Sai Suraj
20 Jan 2021
TL;DR: In this article, a hybrid model of decision tree and random forest was used to predict heart disease on the Cleveland heart disease dataset, achieving an accuracy of 88.7%.
Abstract: Heart disease causes a significant mortality rate around the world, and it has become a health threat for many people. Early prediction of heart disease may save many lives; detecting cardiovascular diseases like heart attacks, coronary artery diseases etc., is a critical challenge by the regular clinical data analysis. Machine learning (ML) can bring an effective solution for decision making and accurate predictions. The medical industry is showing enormous development in using machine learning techniques. In the proposed work, a novel machine learning approach is proposed to predict heart disease. The proposed study used the Cleveland heart disease dataset, and data mining techniques such as regression and classification are used. Machine learning techniques Random Forest and Decision Tree are applied. The novel technique of the machine learning model is designed. In implementation, 3 machine learning algorithms are used, they are 1. Random Forest, 2. Decision Tree and 3. Hybrid model (Hybrid of random forest and decision tree). Experimental results show an accuracy level of 88.7% through the heart disease prediction model with the hybrid model. The interface is designed to get the user's input parameter to predict the heart disease, for which we used a hybrid model of Decision Tree and Random Forest.

Journal ArticleDOI
TL;DR: It is shown that the ensemble machine learning models are significantly superior to mechanics-driven models in both predicting accuracy and discrepancy.

Journal ArticleDOI
TL;DR: Groundwater potential maps predicted in this study can help water resources managers and policymakers in the fields of watershed and aquifer management to preserve an optimal exploit from this important freshwater.
Abstract: Due to the rapidly increasing demand for groundwater, as one of the principal freshwater resources, there is an urge to advance novel prediction systems to more accurately estimate the groundwater potential for an informed groundwater resource management. Ensemble machine learning methods are generally reported to produce more accurate results. However, proposing the novel ensemble models along with running comparative studies for performance evaluation of these models would be equally essential to precisely identify the suitable methods. Thus, the current study is designed to provide knowledge on the performance of the four ensemble models i.e., Boosted generalized additive model (GamBoost), adaptive Boosting classification trees (AdaBoost), Bagged classification and regression trees (Bagged CART), and random forest (RF). To build the models, 339 groundwater resources’ locations and the spatial groundwater potential conditioning factors were used. Thereafter, the recursive feature elimination (RFE) method was applied to identify the key features. The RFE specified that the best number of features for groundwater potential modeling was 12 variables among 15 (with a mean Accuracy of about 0.84). The modeling results indicated that the Bagging models (i.e., RF and Bagged CART) had a higher performance than the Boosting models (i.e., AdaBoost and GamBoost). Overall, the RF model outperformed the other models (with accuracy = 0.86, Kappa = 0.67, Precision = 0.85, and Recall = 0.91). Also, the topographic position index’s predictive variables, valley depth, drainage density, elevation, and distance from stream had the highest contribution in the modeling process. Groundwater potential maps predicted in this study can help water resources managers and policymakers in the fields of watershed and aquifer management to preserve an optimal exploit from this important freshwater.

Journal ArticleDOI
TL;DR: The artificial intelligence has been used with Naive Bayes classification and random forest classification algorithm to classify many disease datasets like diabetes, heart disease, and cancer to check whether the patient is affected by that disease or not.
Abstract: Healthcare practices include collecting all kinds of patient data which would help the doctor correctly diagnose the health condition of the patient. These data could be simple symptoms observed by the subject, initial diagnosis by a physician or a detailed test result from a laboratory. Thus, these data are only utilized for analysis by a doctor who then ascertains the disease using his/her personal medical expertise. The artificial intelligence has been used with Naive Bayes classification and random forest classification algorithm to classify many disease datasets like diabetes, heart disease, and cancer to check whether the patient is affected by that disease or not. A performance analysis of the disease data for both algorithms is calculated and compared. The results of the simulations show the effectiveness of the classification techniques on a dataset, as well as the nature and complexity of the dataset used.

Journal ArticleDOI
TL;DR: Machine learning classification algorithms reach about 55–65% predictive accuracy on average at the daily or minute level frequencies, while the support vector machines demonstrate the best and consistent results in terms of predictive accuracy compared to the logistic regression, artificial neural networks and random forest classification algorithms.
Abstract: In this study, the predictability of the most liquid twelve cryptocurrencies are analyzed at the daily and minute level frequencies using the machine learning classification algorithms including the support vector machines, logistic regression, artificial neural networks, and random forests with the past price information and technical indicators as model features. The average classification accuracy of four algorithms are consistently all above the 50% threshold for all cryptocurrencies and for all the timescales showing that there exists predictability of trends in prices to a certain degree in the cryptocurrency markets. Machine learning classification algorithms reach about 55–65% predictive accuracy on average at the daily or minute level frequencies, while the support vector machines demonstrate the best and consistent results in terms of predictive accuracy compared to the logistic regression, artificial neural networks and random forest classification algorithms.

Journal ArticleDOI
TL;DR: It is found that the most robust model for limited medical data is AB and NB, followed by SVM, and then RF and NN, while the least robust model is DT.
Abstract: Dataset size is considered a major concern in the medical domain, where lack of data is a common occurrence. This study aims to investigate the impact of dataset size on the overall performance of supervised classification models. We examined the performance of six widely-used models in the medical field, including support vector machine (SVM), neural networks (NN), C4.5 decision tree (DT), random forest (RF), adaboost (AB), and naive Bayes (NB) on eighteen small medical UCI datasets. We further implemented three dataset size reduction scenarios on two large datasets and analyzed the performance of the models when trained on each resulting dataset with respect to accuracy, precision, recall, f-score, specificity, and area under the ROC curve (AUC). Our results indicated that the overall performance of classifiers depends on how well a dataset represents the original distribution rather than on its size. Moreover, we found that the most robust models for limited medical data are AB and NB, followed by SVM, and then RF and NN, while the least robust model is DT. Furthermore, an interesting observation is that a machine learning model that is robust to limited data does not necessarily provide the best performance compared to other models.

Journal ArticleDOI
TL;DR: In the wake of COVID-19 disease, caused by the SARS-CoV-2 virus, a predictive model based on Artificial Intelligence (AI) and Machine Learning (ML) algorithms was developed in this paper to determine the health risk and predict the mortality risk of patients with COVID-19.

Journal ArticleDOI
19 Mar 2021
TL;DR: An overview is given of machine learning algorithms, such as Naïve Bayes, logistic regression, support vector machine, K-nearest neighbor, K-means clustering, decision tree, and random forest, that are applied to the identification and prediction of many diseases.
Abstract: Nowadays, machine learning algorithms have become very important in the medical sector, especially for diagnosing disease from the medical database. Many companies using these techniques for the early prediction of diseases and enhance medical diagnostics. The motivation of this paper is to give an overview of the machine learning algorithms that are applied for the identification and prediction of many diseases such as Naive Bayes, logistic regression, support vector machine, K-nearest neighbor, K-means clustering, decision tree, and random forest. In this work, many previous studies were reviewed that used machine learning algorithms for detecting various diseases in the medical area in the last three years. A comparison is provided concerning these algorithms, assessment processes, and the obtained results. Finally, a discussion of the previous works is presented.

Journal ArticleDOI
TL;DR: A comparative analysis view among various feature descriptors algorithms and classification models for 2D object recognition reveals that a hybridization of SIFT, SURF and ORB method with Random Forest classification model accomplishes the best results as compared to other state-of-the-art work.
Abstract: Object recognition is a key research area in the field of image processing and computer vision, which recognizes the object in an image and provides a proper label. In the paper, three popular feature descriptor algorithms that are Scale Invariant Feature Transform (SIFT), Speeded Up Robust Feature (SURF) and Oriented Fast and Rotated BRIEF (ORB) are used for experimental work of an object recognition system. A comparison among these three descriptors is exhibited in the paper by determining them individually and with different combinations of these three methodologies. The amount of the features extracted using these feature extraction methods are further reduced using a feature selection (k-means clustering) and a dimensionality reduction method (Locality Preserving Projection). Various classifiers i.e. K-Nearest Neighbor, Naive Bayes, Decision Tree, and Random Forest are used to classify objects based on their similarity. The focus of this article is to present a study of the performance comparison among these three feature extraction methods, particularly when their combination derives in recognizing the object more efficiently. In this paper, the authors have presented a comparative analysis view among various feature descriptors algorithms and classification models for 2D object recognition. The Caltech-101 public dataset is considered in this article for experimental work. The experiment reveals that a hybridization of SIFT, SURF and ORB method with Random Forest classification model accomplishes the best results as compared to other state-of-the-art work. The comparative analysis has been presented in terms of recognition accuracy, True Positive Rate (TPR), False Positive Rate (FPR), and Area Under Curve (AUC) parameters.

Journal ArticleDOI
01 Jan 2021
TL;DR: This research gathered data from the microblogging website Twitter concerning farmers’ protest to understand the sentiments that the public shared on an international level and used models to categorize and analyze the sentiments based on a collection of around 20,000 tweets on the protest.
Abstract: Protests are an integral part of democracy and an important source for citizens to convey their demands and/or dissatisfaction to the government. As citizens become more aware of their rights, there has been an increasing number of protests all over the world for various reasons. With the advancement of technology, there has also been an exponential rise in the use of social media to exchange information and ideas. In this research, we gathered data from the microblogging website Twitter concerning farmers’ protest to understand the sentiments that the public shared on an international level. We used models to categorize and analyze the sentiments based on a collection of around 20,000 tweets on the protest. We conducted our analysis using Bag of Words and TF-IDF and discovered that Bag of Words performed better than TF-IDF. In addition, we also used Naive Bayes, Decision Trees, Random Forests, and Support Vector Machines and also discovered that Random Forest had the highest classification accuracy.

Journal ArticleDOI
TL;DR: In this paper, a convolutional neural network was developed focusing on the simplicity of the model to extract deep and high-level features from X-ray images of patients infected with COVID-19.

Journal ArticleDOI
TL;DR: The results revealed that random forest (RF) classifier is a promising and optimum model for landslide susceptibility in the study area with a very high value of area under curve, lower value of mean absolute error, and higher value of Kappa index.
Abstract: Hazards and disasters have always negative impacts on the way of life. Landslide is an overwhelming natural as well as man-made disaster that causes loss of natural resources and human properties throughout the world. The present study aimed to assess and compare the prediction efficiency of different models in landslide susceptibility in the Kysuca river basin, Slovakia. In this regard, the fuzzy decision-making trial and evaluation laboratory combining with the analytic network process (FDEMATEL-ANP), Naive Bayes (NB) classifier, and random forest (RF) classifier were considered. Initially, a landslide inventory map was produced with 2000 landslide and non-landslide points by randomly divided with a ratio of 70%:30% for training and testing, respectively. The geospatial database for assessing the landslide susceptibility was generated with the help of 16 landslide conditioning factors by allowing for topographical, hydrological, lithological, and land cover factors. The ReliefF method was considered for determining the significance of selected conditioning factors and inclusion in the model building. Consequently, the landslide susceptibility maps (LSMs) were generated using the FDEMATEL-ANP, Naive Bayes (NB) classifier, and random forest (RF) classifier models. Finally, the area under curve (AUC) and different arithmetic evaluation were used for validating and comparing the results and models. The results revealed that random forest (RF) classifier is a promising and optimum model for landslide susceptibility in the study area with a very high value of area under curve (AUC = 0.954), lower value of mean absolute error (MAE = 0.1238) and root mean square error (RMSE = 0.2555), and higher value of Kappa index (K = 0.8435) and overall accuracy (OAC = 92.2%).
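
The validation measures reported in the abstract above (AUC, MAE, RMSE, Kappa index, overall accuracy) can be computed from true labels and predicted susceptibility scores. The sketch below is a plain-Python illustration, not the authors' workflow; the function names, the 0.5 decision threshold, and the label and score values are hypothetical, and `cohen_kappa` assumes chance agreement is below 1.

```python
import math


def auc(y_true, scores):
    """Probability that a random positive scores higher than a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)  # ties count half
    return wins / (len(pos) * len(neg))


def mae(y_true, scores):
    """Mean absolute error between labels and predicted scores."""
    return sum(abs(t - s) for t, s in zip(y_true, scores)) / len(y_true)


def rmse(y_true, scores):
    """Root mean square error between labels and predicted scores."""
    return math.sqrt(sum((t - s) ** 2 for t, s in zip(y_true, scores)) / len(y_true))


def cohen_kappa(y_true, y_pred):
    """Agreement between labels and predictions, corrected for chance agreement."""
    n = len(y_true)
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    labels = set(y_true) | set(y_pred)
    expected = sum((y_true.count(c) / n) * (y_pred.count(c) / n) for c in labels)
    return (observed - expected) / (1 - expected)


# Hypothetical test points: 1 = landslide, 0 = non-landslide.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.2, 0.1, 0.6, 0.7, 0.3]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

print("AUC  :", auc(y_true, scores))
print("MAE  :", mae(y_true, scores))
print("RMSE :", rmse(y_true, scores))
print("Kappa:", cohen_kappa(y_true, y_pred))
```

AUC, MAE, and RMSE use the continuous scores, whereas Kappa and overall accuracy depend on the thresholded labels, so the two groups of measures can rank models differently.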

Journal ArticleDOI
TL;DR: The error analysis results reveal that the designed RF and ET models offer easy-to-use outcomes and the highest diagnostic performance, compared to previous tools/models in the literature for the WBCD classification.

Journal ArticleDOI
TL;DR: An ensemble model consisting of two advanced base models, namely extreme gradient boosting forest and deep neural networks (XGBF-DNN), is proposed for hourly global horizontal irradiance forecasting and exhibits the best combination of stability and prediction accuracy irrespective of seasonal variations in weather conditions.

Journal ArticleDOI
TL;DR: An extensive analysis based on a practical dataset of 5000 customers reveals that bagging models outperform other algorithms and the precision analysis shows that the proposed bagging methods perform better.

Journal ArticleDOI
TL;DR: The idea is to use a random forests methodology as an efficient non-parametric approach for building meta-models that allow an efficient sensitivity analysis, and an adequate set of tools for quantifying variable importance are reviewed.

Journal ArticleDOI
TL;DR: This paper provides a comprehensive overview of Machine Learning applications used in EEG analysis and gives an overview of each of the methods and general applications that each is best suited to.
Abstract: Electroencephalography (EEG) has been a staple method for identifying certain health conditions in patients since its discovery. Due to the many different types of classifiers available to use, the analysis methods are also equally numerous. In this review, we will be examining specifically machine learning methods that have been developed for EEG analysis with bioengineering applications. We reviewed literature from 1988 to 2018 to capture previous and current classification methods for EEG in multiple applications. From this information, we are able to determine the overall effectiveness of each machine learning method as well as the key characteristics. We have found that all the primary methods used in machine learning have been applied in some form in EEG classification. This ranges from Naive-Bayes to Decision Tree/Random Forest, to Support Vector Machine (SVM). Supervised learning methods are on average of higher accuracy than their unsupervised counterparts. This includes SVM and KNN. While each of the methods individually is limited in their accuracy in their respective applications, there is hope that the combination of methods when implemented properly has a higher overall classification accuracy. This paper provides a comprehensive overview of Machine Learning applications used in EEG analysis. It also gives an overview of each of the methods and general applications that each is best suited to.

Journal ArticleDOI
TL;DR: 3D-CNNs were more efficient in distinguishing coniferous species from each other, with a concurrent high accuracy for aspen classification, which can benefit both sustainable forestry and biodiversity conservation.

Journal ArticleDOI
TL;DR: A lightweight ensemble learning-based detection scheme built on Extremely Randomized Trees (Extra-Trees) is robust to signal noise and strongly reduces bias and variance error; its performance was compared with that of state-of-the-art machine learning algorithms.