Journal ArticleDOI

A Literature Review on Supervised Machine Learning Algorithms and Boosting Process

17 Jul 2017-International Journal of Computer Applications (Foundation of Computer Science (FCS), NY, USA)-Vol. 169, Iss: 8, pp 32-35
TL;DR: This survey finds that coupling supervised machine learning algorithms with a boosting process increases prediction accuracy, and that there is wide scope for further research in this area.
Abstract: Data mining is one of the core research areas in the field of computer science. While the knowledge discovery process helps data mining extract hidden information from datasets, there is broad scope for machine learning algorithms; supervised machine learning algorithms in particular have gained extensive importance in data mining research. Boosting is regularly applied to supervised machine learning algorithms to raise their predictive/classification accuracy. This survey article selects two well-known supervised machine learning algorithms, decision trees and support vector machines, and presents recent research works carried out on them. Recent improvements to the AdaBoost algorithm (the boosting process) are also presented. From this survey it is learnt that combining supervised machine learning algorithms with a boosting process increases prediction accuracy, and that there is wide scope for further research in this area.
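The survey's central claim, that pairing a supervised learner with boosting raises accuracy, can be illustrated with a minimal AdaBoost over decision stumps. This is a generic sketch of the standard algorithm, not code from the paper; the function names and toy data are illustrative.

```python
import math

def train_stump(X, y, w):
    """Pick the 1-D threshold stump (thr, sign) with the lowest weighted error."""
    best = None
    for thr in sorted(set(X)):
        for sign in (1, -1):
            err = sum(wi for xi, yi, wi in zip(X, y, w)
                      if (sign if xi >= thr else -sign) != yi)
            if best is None or err < best[0]:
                best = (err, thr, sign)
    return best

def adaboost(X, y, rounds=5):
    """Minimal AdaBoost: reweight examples each round and combine
    decision stumps (the weak learners) by a weighted vote."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        err, thr, sign = train_stump(X, y, w)
        err = max(err, 1e-10)                    # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)  # this stump's vote weight
        ensemble.append((alpha, thr, sign))
        # up-weight misclassified examples so the next stump focuses on them
        w = [wi * math.exp(-alpha * yi * (sign if xi >= thr else -sign))
             for xi, yi, wi in zip(X, y, w)]
        z = sum(w)
        w = [wi / z for wi in w]
    return ensemble

def predict(ensemble, x):
    score = sum(a * (s if x >= t else -s) for a, t, s in ensemble)
    return 1 if score >= 0 else -1
```

On a toy 1-D set such as X = [1, 2, 3, 4] with labels [-1, -1, 1, 1], the ensemble recovers the decision threshold at 3.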


Citations
Book ChapterDOI
01 Jan 2020
TL;DR: A systematic review of scholarly articles published between 2015 and 2018 addressing or implementing supervised and unsupervised machine learning techniques in different problem-solving paradigms revealed decision tree, support vector machine, and Naive Bayes algorithms appeared to be the most cited, discussed, and implemented supervised learners.
Abstract: Machine learning is growing as fast as concepts such as Big Data and the field of data science in general. The purpose of this systematic review was to analyze scholarly articles published between 2015 and 2018 addressing or implementing supervised and unsupervised machine learning techniques in different problem-solving paradigms. Using the elements of PRISMA, the review process identified 84 scholarly articles that had been published in different journals. Of the 84 articles, 6 were published before 2015 despite their metadata indicating that they were published in 2015; the presence of these six articles in the final set was attributed to indexing errors. Nonetheless, from the reviewed papers, decision tree, support vector machine, and Naive Bayes algorithms appeared to be the most cited, discussed, and implemented supervised learners. Conversely, k-means, hierarchical clustering, and principal component analysis emerged as the most commonly used unsupervised learners. The review also revealed other commonly used algorithms, including ensembles and reinforcement learners; future systematic reviews can focus on them because of the developments that machine learning and data science are undergoing at the moment.

206 citations

Journal ArticleDOI
01 Mar 2021
TL;DR: This article highlights well-known ML algorithms for classification and prediction and demonstrates how they have been used in the healthcare sector and provides some examples of IoT and machine learning to predict future healthcare system trends.
Abstract: Machine learning (ML) is a powerful tool that delivers insights hidden in Internet of Things (IoT) data. These hybrid technologies work smartly to improve the decision-making process in different areas such as education, security, business, and the healthcare industry. ML empowers the IoT to demystify hidden patterns in bulk data for optimal prediction and recommendation systems. Healthcare has embraced IoT and ML so that automated machines make medical records, predict disease diagnoses, and, most importantly, conduct real-time monitoring of patients. Individual ML algorithms perform differently on different datasets. Because predictive results vary between algorithms, the overall results can be affected, and this variation in prediction results looms large in the clinical decision-making process. Therefore, it is essential to understand the different ML algorithms used to handle IoT data in the healthcare sector. This article highlights well-known ML algorithms for classification and prediction and demonstrates how they have been used in the healthcare sector. The aim of this paper is to present a comprehensive overview of existing ML approaches and their application in IoT medical data. In a thorough analysis, we observe that different ML prediction algorithms have various shortcomings. Depending on the type of IoT dataset, we need to choose an optimal method to predict critical healthcare data. The paper also provides some examples of IoT and machine learning to predict future healthcare system trends.

60 citations


Cites background from "A Literature Review on Supervised M..."

  • ...Despite the evident effectiveness of supervised learning, it has the drawback that it requires numerous labelled data to develop a large-scale labelled dataset [24]....


Journal ArticleDOI
TL;DR: In this study, a contactless procedure for drivers’ stress state assessment by means of thermal infrared imaging was investigated and the predicted SI showed a good correlation with the real SI.
Abstract: Traffic accidents determine a large number of injuries, sometimes fatal, every year. Among other factors affecting a driver’s performance, an important role is played by stress which can decrease decision-making capabilities and situational awareness. In this perspective, it would be beneficial to develop a non-invasive driver stress monitoring system able to recognize the driver’s altered state. In this study, a contactless procedure for drivers’ stress state assessment by means of thermal infrared imaging was investigated. Thermal imaging was acquired during an experiment on a driving simulator, and thermal features of stress were investigated with comparison to a gold-standard metric (i.e., the stress index, SI) extracted from contact electrocardiography (ECG). A data-driven multivariate machine learning approach based on a non-linear support vector regression (SVR) was employed to estimate the SI through thermal features extracted from facial regions of interest (i.e., nose tip, nostrils, glabella). The predicted SI showed a good correlation with the real SI (r = 0.61, p = ~0). A two-level classification of the stress state (STRESS, SI ≥ 150, versus NO STRESS, SI < 150) was then performed based on the predicted SI. The ROC analysis showed a good classification performance with an AUC of 0.80, a sensitivity of 77%, and a specificity of 78%.
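The two-level decision rule and the reported sensitivity/specificity can be stated concretely in a few lines. This is a hedged sketch of the evaluation step only (thresholding a predicted stress index at 150 and scoring against ground-truth labels); the SI values below are made up, not the study's data.

```python
def classify_stress(si_values, threshold=150.0):
    """Two-level labels from (predicted or measured) stress index values."""
    return ["STRESS" if si >= threshold else "NO STRESS" for si in si_values]

def sensitivity_specificity(pred, truth, positive="STRESS"):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp = sum(1 for p, t in zip(pred, truth) if p == positive and t == positive)
    fn = sum(1 for p, t in zip(pred, truth) if p != positive and t == positive)
    tn = sum(1 for p, t in zip(pred, truth) if p != positive and t != positive)
    fp = sum(1 for p, t in zip(pred, truth) if p == positive and t != positive)
    return tp / (tp + fn), tn / (tn + fp)
```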

35 citations

Journal ArticleDOI
TL;DR: A software bug prediction model is proposed which uses machine learning classifiers in conjunction with the Artificial Immune Network (AIN) to improve bug prediction accuracy through its hyper-parameter optimization.
Abstract: Software testing is an important task in software development activities, and it requires most of the resources, namely time, cost and effort. To reduce this burden, software bug prediction (SBP) models are applied to improve software quality assurance (SQA) processes by predicting buggy components. Bug prediction models use machine learning classifiers so that bugs can be predicted in software components from software metrics. These classifiers are characterized by configurable parameters, called hyper-parameters, that need to be optimized to ensure better performance. Many methods have been proposed by researchers to predict defective components, but these classifiers sometimes do not perform well when their default settings are used. In this paper, a software bug prediction model is proposed which uses machine learning classifiers in conjunction with the Artificial Immune Network (AIN) to improve bug prediction accuracy through hyper-parameter optimization. For this purpose, eight machine learning classifiers were used: support vector machine with radial basis function kernel (SVM-RBF), K-nearest neighbor (KNN) with the Minkowski metric, KNN with the Euclidean metric, Naive Bayes (NB), Decision Tree (DT), Linear Discriminant Analysis (LDA), Random Forest (RF) and adaptive boosting (AdaBoost). The experiment was carried out on a bug prediction dataset. The results showed that hyper-parameter optimization of machine learning classifiers using AIN, applied to software bug prediction, performed better than the classifiers with their default hyper-parameters.
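The AIN optimizer itself is beyond a short sketch, but the loop around it, searching a hyper-parameter space and keeping the best-scoring configuration, looks like this. Random search stands in for AIN here; the search space and scoring function are illustrative assumptions, not the paper's setup.

```python
import random

def tune(evaluate, space, trials=200, seed=0):
    """Stand-in for AIN-style hyper-parameter optimization: sample
    candidate settings from `space` and keep the best-scoring one."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(trials):
        params = {name: rng.choice(values) for name, values in space.items()}
        score = evaluate(params)  # e.g. cross-validated accuracy
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score
```

For an RBF SVM, for example, one would search over a grid of C and gamma values with cross-validated accuracy as the `evaluate` function.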

31 citations

Book ChapterDOI
01 Jan 2019
TL;DR: This paper indicates that the time taken to build a model and its precision are factors distinct from the mean absolute error and the kappa statistic in obtaining predictive SML.
Abstract: Supervised Machine Learning (SML) is the analysis of algorithms that generalize from externally supplied instances to produce hypotheses and make predictions about future instances. In intelligent systems, supervised classification is a critical aspect of machine learning. This article covers fundamental SML techniques, comparing different SML algorithms to determine the most effective supervised classification algorithm. The techniques considered, Naive Bayes (NB), Random Forest (RF), Support Vector Machine (SVM), JRip, Neural Networks, and Decision Tree, were evaluated using the Waikato Environment for Knowledge Analysis (WEKA) machine learning application. For the implementation, a Diabetes dataset of 789 instances was used in the classification process, comprising eight attributes as independent variables and one as the dependent variable. From the discussion and the results, it was evident that SVM was the most precise and accurate algorithm (Zhang N, Int. J. Collab. Intell. 1:298, 2016). The Random Forest and Naive Bayes classification algorithms were found to be the next most precise after SVM. This paper indicates that the time taken to build a model and its precision are factors distinct from the mean absolute error and the kappa statistic. Consequently, a machine learning algorithm requires accuracy, precision, and minimal error to obtain predictive SML.
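The comparison criteria mentioned, accuracy and the kappa statistic as reported by WEKA, are easy to state concretely. A minimal sketch for nominal labels; the example labels below are made up.

```python
def accuracy(pred, truth):
    """Fraction of predictions matching the ground-truth labels."""
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)

def cohens_kappa(pred, truth):
    """Observed agreement corrected for chance agreement:
    kappa = (p_o - p_e) / (1 - p_e)."""
    n = len(truth)
    labels = set(truth) | set(pred)
    p_o = accuracy(pred, truth)
    p_e = sum((pred.count(l) / n) * (truth.count(l) / n) for l in labels)
    return (p_o - p_e) / (1 - p_e)
```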

22 citations

References
Journal ArticleDOI
TL;DR: The results show that the proposed algorithm is the best performing method for all databases, and when compared against a standard decision tree, the method builds significantly smaller trees in only a fifth of the time, while having a superior performance measured by cost savings.
Abstract: Highlights: Example-dependent cost-sensitive tree algorithm. Each example is assumed to have a different financial cost. Application to credit card fraud detection, credit scoring and direct marketing. Focus on maximizing financial savings instead of accuracy. Code is open source and available at albahnsen.com/CostSensitiveClassification. Several real-world classification problems are example-dependent cost-sensitive in nature, where the costs due to misclassification vary between examples. However, standard classification methods do not take these costs into account, and assume a constant cost of misclassification errors. State-of-the-art example-dependent cost-sensitive techniques only introduce the cost to the algorithm, either before or after training, therefore leaving opportunities to investigate the potential impact of algorithms that take the real financial example-dependent costs into account during training. In this paper, we propose an example-dependent cost-sensitive decision tree algorithm, by incorporating the different example-dependent costs into a new cost-based impurity measure and a new cost-based pruning criterion. Then, using three different databases from three real-world applications: credit card fraud detection, credit scoring and direct marketing, we evaluate the proposed method. The results show that the proposed algorithm is the best performing method for all databases. Furthermore, when compared against a standard decision tree, our method builds significantly smaller trees in only a fifth of the time, while having superior performance measured by cost savings, leading to a method that not only has more business-oriented results, but also creates simpler models that are easier to analyze.
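The core idea, replacing accuracy-based impurity with summed example-dependent misclassification costs, can be sketched as follows. This is a simplified illustration of the principle, not the authors' algorithm: each example carries its own misclassification cost, and a split is scored by the total cost of its two children.

```python
def leaf_cost(examples):
    """examples: list of (feature, label, cost). A leaf predicts whichever
    class minimizes the summed example-dependent misclassification cost."""
    cost_if_pred_1 = sum(c for _, y, c in examples if y == 0)
    cost_if_pred_0 = sum(c for _, y, c in examples if y == 1)
    return min(cost_if_pred_1, cost_if_pred_0)

def best_split(examples):
    """Choose the threshold whose two children have the lowest total cost."""
    best_thr, best_cost = None, leaf_cost(examples)
    for thr in sorted({x for x, _, _ in examples}):
        left = [e for e in examples if e[0] < thr]
        right = [e for e in examples if e[0] >= thr]
        if not left or not right:
            continue
        cost = leaf_cost(left) + leaf_cost(right)
        if cost < best_cost:
            best_thr, best_cost = thr, cost
    return best_thr, best_cost
```

A standard tree would score the same split by class purity; here a perfectly pure split has zero residual cost instead.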

164 citations


"A Literature Review on Supervised M..." refers methods in this paper


  • ...Bahnsen et al. 2015 [2] proposed an example-reliant costsensitive decision tree algorithm, by incorporating the different example-reliant costs into a new cost-based impurity measure and new cost-based pruning criteria....


Journal ArticleDOI
TL;DR: Results using live data show the fuzzy models have increased the predictive accuracy of OSCAR-CITS across four learning style dimensions and facilitated the discovery of some interesting relationships amongst behaviour variables.
Abstract: Intelligent Tutoring Systems personalise learning for students with different backgrounds, abilities, behaviours and knowledge. One way to personalise learning is through consideration of individual differences in preferred learning style. OSCAR is the name of a Conversational Intelligent Tutoring System that models a person's learning style using natural language dialogue during tutoring in order to dynamically predict, and personalise, their tutoring session. Prediction of learning style is undertaken by capturing independent behaviour variables during the tutoring conversation with the highest value variable determining the student's learning style. A weakness of this approach is that it does not take into consideration the interactions between behaviour variables and, due to the uncertainty inherently present in modelling learning styles, small differences in behaviour can lead to incorrect predictions. Consequently, the learner is presented with tutoring material not suited to their learning style. This paper proposes a new method that uses fuzzy decision trees to build a series of fuzzy predictive models combining these variables for all dimensions of the Felder Silverman Learning Styles model. Results using live data show the fuzzy models have increased the predictive accuracy of OSCAR-CITS across four learning style dimensions and facilitated the discovery of some interesting relationships amongst behaviour variables.
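The building block of such fuzzy models is a membership function that maps a behaviour variable onto a degree in [0, 1], with rule conditions combined by a fuzzy AND. A generic sketch using triangular memberships and the minimum operator, not OSCAR-CITS's actual model:

```python
def triangular(x, a, b, c):
    """Triangular fuzzy membership: 0 outside (a, c), peaking at 1 when x = b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def rule_strength(*memberships):
    """Fuzzy AND (minimum): a rule fires to the degree of its weakest condition."""
    return min(memberships)
```

Combining variables this way lets small differences in behaviour contribute gradually instead of flipping a hard winner-takes-all prediction.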

77 citations

Journal ArticleDOI
TL;DR: A robust multi-class AdaBoost algorithm (Rob_MulAda) is proposed whose key ingredients consist in a noise-detection based multi- class loss function and a new weight updating scheme that is more robust to mislabeled noises than that of ND_AdaBoost in both two-class and multi- Class scenarios.
Abstract: AdaBoost has been theoretically and empirically proved to be a very successful ensemble learning algorithm, which iteratively generates a set of diverse weak learners and combines their outputs using the weighted majority voting rule as the final decision. However, in some cases, AdaBoost leads to overfitting especially for mislabeled noisy training examples, resulting in both its degraded generalization performance and non-robustness. Recently, a representative approach named noise-detection based AdaBoost (ND_AdaBoost) has been proposed to improve the robustness of AdaBoost in the two-class classification scenario, however, in the multi-class scenario, this approach can hardly achieve satisfactory performance due to the following three reasons. (1) If we decompose a multi-class classification problem using such strategies as one-versus-all or one-versus-one, the obtained two-class problems usually have imbalanced training sets, which negatively influences the performance of ND_AdaBoost. (2) If we directly apply ND_AdaBoost to the multi-class classification scenario, its two-class loss function is no longer applicable and its accuracy requirement for the (weak) base classifiers, i.e., greater than 0.5, is too strong to be almost satisfied. (3) ND_AdaBoost still has the tendency of overfitting as it increases the weights of correctly classified noisy examples, which could make it focus on learning these noisy examples in the subsequent iterations. To solve the dilemma, in this paper, we propose a robust multi-class AdaBoost algorithm (Rob_MulAda) whose key ingredients consist in a noise-detection based multi-class loss function and a new weight updating scheme. Experimental study indicates that our newly-proposed weight updating scheme is indeed more robust to mislabeled noises than that of ND_AdaBoost in both two-class and multi-class scenarios. 
In addition, through the comparison experiments, we also verify the effectiveness of Rob_MulAda and provide a suggestion in choosing the most appropriate noise-alleviating approach according to the concrete noise level in practical applications.
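The key departure from plain AdaBoost lies in the weight update: examples suspected of label noise are not up-weighted when misclassified. A hedged sketch of that idea only (the noise detector itself is assumed given, as a boolean flag per example):

```python
import math

def robust_weight_update(weights, labels, preds, alpha, noisy):
    """AdaBoost-style exponential reweighting, except examples flagged as
    mislabeled noise are never up-weighted (their factor is capped at 1)."""
    new = []
    for w, y, p, flagged in zip(weights, labels, preds, noisy):
        factor = math.exp(-alpha) if p == y else math.exp(alpha)
        if flagged:
            factor = min(factor, 1.0)  # suspected noise: weight may only shrink
        new.append(w * factor)
    z = sum(new)
    return [w / z for w in new]  # renormalize to a distribution
```

This prevents the ensemble from devoting later rounds to fitting mislabeled examples, which is the overfitting mode the abstract describes.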

63 citations


"A Literature Review on Supervised M..." refers methods in this paper

  • ...Sun et al., 2016 [10] quoted a representative approach named noise-detection based AdaBoost (ND_AdaBoost) in order to improve the robustness of AdaBoost in the two-class classification scenario....


Journal ArticleDOI
TL;DR: A boosting-based method of learning a feed-forward artificial neural network (ANN) with a single layer of hidden neurons and a single output neuron is presented, and a novel method is introduced to incorporate non-linear activation functions in artificial neural network learning.

58 citations


"A Literature Review on Supervised M..." refers methods in this paper


  • ...Baig et al., 2017 [11] presented a boosting-based method of learning a feed-forward artificial neural network (ANN) with a single layer of hidden neurons and a single output neuron....


Journal ArticleDOI
TL;DR: Experimental results show that SNPSVM is superior to the other current algorithms based on structural information of data in both computation time and classification accuracy.

31 citations


"A Literature Review on Supervised M..." refers background or methods in this paper


  • ...Connecting the structural information with nonparallel support vector machine (NPSVM), D. Chen et al. 2016 [6], designed a new structural nonparallel support vector machine (called SNPSVM)....
