Proceedings ArticleDOI

Using Ensemble Learning and Association Rules to Help Car Buyers Make Informed Choices

10 Nov 2016 · 8 pp.
TL;DR: Bagging, boosting, and voting ensemble learning are used to improve classification accuracy, and class association rules are evaluated to see whether they perform better than collaborative filtering for suggesting items to the user.
Abstract: Cars are an essential part of our everyday life. Nowadays a wide range of cars is produced by many companies in all segments. Buyers have to consider many factors when buying a car, which makes the whole process considerably more difficult. In this paper we therefore develop an ensemble learning method to aid people in making the decision. Bagging, boosting, and voting ensemble learning are used to improve classification accuracy. We also apply class association rules to see whether they perform better than collaborative filtering for suggesting items to the user.
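As a minimal sketch of the hard-voting idea the abstract describes (the dataset, base learners, and parameters here are illustrative, not the paper's actual setup), scikit-learn's `VotingClassifier` combines heterogeneous models by majority vote:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Synthetic stand-in for a car-evaluation-style dataset.
X, y = make_classification(n_samples=500, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Hard voting: each base learner casts one vote, the majority label wins.
vote = VotingClassifier(
    estimators=[
        ("tree", DecisionTreeClassifier(random_state=0)),
        ("nb", GaussianNB()),
        ("lr", LogisticRegression(max_iter=1000)),
    ],
    voting="hard",
)
vote.fit(X_tr, y_tr)
acc = accuracy_score(y_te, vote.predict(X_te))
```

Bagging and boosting follow the same pattern with `BaggingClassifier` and `AdaBoostClassifier` wrapping a single base learner.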
Citations
Journal ArticleDOI
TL;DR: This study found that the classification model from the SVM algorithm provided the best result, with 86.45% accuracy in correctly classifying candidates' 'Eligible' status, while RT was the weakest model, with the lowest accuracy for this purpose.
Abstract: Scholarship is a financial facility given to eligible students to pursue higher education. Limited funding sources and a growing number of applicants force the Government to find solutions that speed up and facilitate the selection of eligible students through a systematic approach. In this study, a data mining approach was used to propose a classification model for determining scholarship award results. A dataset of successful and unsuccessful applicants was processed into training and testing data for the modelling process. Five algorithms were employed to develop the classification model: J48, SVM, NB, ANN, and RT. Each model was evaluated using technical evaluation metrics such as contingency-table metrics, accuracy, precision, and recall. As a result, the best models fell into two categories: the best model for classifying ‘Eligible’ status and the best model for classifying ‘Not Eligible’ status. The knowledge obtained from the rules-based model was evaluated through analysis conducted by technical and domain experts. The study found that the SVM classification model provided the best result, with 86.45% accuracy in correctly classifying candidates’ ‘Eligible’ status, while RT was the weakest model for this purpose, with only 82.9% accuracy. The model with the highest accuracy for ‘Not Eligible’ status was the NB model, whereas the SVM model was the weakest at classifying ‘Not Eligible’ status. In addition, analysis of the decision tree model showed that some of the new information derived from this research may help stakeholders in designing new policies and scholarship programmes in the future.
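The per-class evaluation described above (a best model per status label) reduces to computing precision and recall with each label in turn treated as the positive class. A small stand-alone sketch with made-up labels (not the study's data):

```python
def per_class_metrics(y_true, y_pred, positive):
    """Precision and recall treating `positive` as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Toy ground truth vs. a hypothetical classifier's output.
y_true = ["Eligible", "Eligible", "Not Eligible", "Not Eligible", "Eligible"]
y_pred = ["Eligible", "Not Eligible", "Not Eligible", "Eligible", "Eligible"]
p, r = per_class_metrics(y_true, y_pred, "Eligible")
```

Swapping the `positive` argument to "Not Eligible" gives the metrics for the other category, which is how one model can be "best" for one status and weakest for the other.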

5 citations

Journal ArticleDOI
TL;DR: An ensemble knowledge model is proposed to support the scholarship award decision made by the organization; it generates a list of eligible candidates to reduce the human error and time taken to select candidates manually.
Abstract: The role of higher learning in Malaysia is to ensure high-quality educational ecosystems that develop individual potential to fulfil the national aspiration. Scholarship offers are an important part of the strategic plan to implement this role successfully. With the number of undergraduate students increasing every year, the government must apply a systematic strategy to manage scholarship offers so that recipients are selected effectively. Predictive models have been shown to be effective for this purpose. In this paper, an ensemble knowledge model is proposed to support the scholarship award decisions made by the organization. It generates a list of eligible candidates, reducing the human error and time involved in selecting candidates manually. Two ensemble approaches are presented: ensembles of models and ensembles of rule-based knowledge. Ensemble learning techniques, namely boosting, bagging, voting, and a rules-based ensemble technique, and five base-learner algorithms, namely J48, Support Vector Machine (SVM), Artificial Neural Network (ANN), Naive Bayes (NB), and Random Tree (RT), are used to develop the model. A total of 87,000 scholarship application records are used in the modelling process. Results on accuracy, precision, recall, and F-measure show that the ensemble voting technique gives the best accuracy, 86.9%, compared to the other techniques. The study also explores the rules obtained from the rules-based J48 and Apriori models and selects the best rules to develop an ensemble rules-based model, which improves the classification model for scholarship awards.
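The "ensemble of rule-based knowledge" idea can be sketched as several independent decision rules whose votes are aggregated by majority. The rules, thresholds, and applicant fields below are entirely hypothetical, invented for illustration:

```python
from collections import Counter

# Each "model" is a hypothetical eligibility rule: applicant dict -> label.
rules = [
    lambda a: "Eligible" if a["cgpa"] >= 3.5 else "Not Eligible",
    lambda a: "Eligible" if a["income"] < 3000 else "Not Eligible",
    lambda a: "Eligible" if a["cocurricular"] >= 7 else "Not Eligible",
]

def vote(applicant):
    """Majority vote over the rule ensemble."""
    labels = [rule(applicant) for rule in rules]
    return Counter(labels).most_common(1)[0][0]

# Two of the three rules fire "Eligible" for this applicant.
applicant = {"cgpa": 3.8, "income": 4500, "cocurricular": 8}
decision = vote(applicant)
```

In the study the individual rules come from trained models (J48, Apriori) rather than being hand-written, but the aggregation step is the same majority-vote pattern.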

1 citation


Cites methods from "Using Ensemble Learning and Associa..."

  • ...Based on the review by [17] the single-core tree algorithm and decision tree will produce different tree outputs....

    [...]

References
Journal ArticleDOI
01 Aug 1996
TL;DR: Tests on real and simulated data sets using classification and regression trees and subset selection in linear regression show that bagging can give substantial gains in accuracy.
Abstract: Bagging predictors is a method for generating multiple versions of a predictor and using these to get an aggregated predictor. The aggregation averages over the versions when predicting a numerical outcome and does a plurality vote when predicting a class. The multiple versions are formed by making bootstrap replicates of the learning set and using these as new learning sets. Tests on real and simulated data sets using classification and regression trees and subset selection in linear regression show that bagging can give substantial gains in accuracy. The vital element is the instability of the prediction method. If perturbing the learning set can cause significant changes in the predictor constructed, then bagging can improve accuracy.
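A minimal sketch of bagging as the abstract describes it (bootstrap replicates of the learning set, plurality vote over trees), using scikit-learn on a noisy synthetic dataset rather than the paper's real data:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Label noise (flip_y) makes a single unpruned tree unstable,
# which is exactly the regime where Breiman says bagging helps.
X, y = make_classification(n_samples=600, n_features=10, flip_y=0.1, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)

single = DecisionTreeClassifier(random_state=1).fit(X_tr, y_tr)
bagged = BaggingClassifier(
    DecisionTreeClassifier(random_state=1),  # base learner
    n_estimators=50,                         # bootstrap replicates
    random_state=1,
).fit(X_tr, y_tr)

acc_single = accuracy_score(y_te, single.predict(X_te))
acc_bagged = accuracy_score(y_te, bagged.predict(X_te))
```

For regression, the same wrapper averages predictions instead of voting (`BaggingRegressor`).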

16,118 citations

Journal ArticleDOI
TL;DR: Simulation studies show that the performance of the combining techniques is strongly affected by the small sample size properties of the base classifier: boosting is useful for large training sample sizes, while bagging and the random subspace method are useful for critical training sample sizes.
Abstract: Recently bagging, boosting and the random subspace method have become popular combining techniques for improving weak classifiers. These techniques are designed for, and usually applied to, decision trees. In this paper, in contrast to a common opinion, we demonstrate that they may also be useful in linear discriminant analysis. Simulation studies, carried out for several artificial and real data sets, show that the performance of the combining techniques is strongly affected by the small sample size properties of the base classifier: boosting is useful for large training sample sizes, while bagging and the random subspace method are useful for critical training sample sizes. Finally, a table describing the possible usefulness of the combining techniques for linear classifiers is presented.
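The random subspace method with a linear discriminant base classifier, as studied above, can be approximated in scikit-learn by a bagging wrapper that samples features rather than examples (the dataset and sizes here are illustrative, not the paper's experimental setup):

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Many features relative to sample size: the "critical" regime for LDA.
X, y = make_classification(n_samples=300, n_features=30, n_informative=10,
                           random_state=2)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=2)

# Random subspace: every LDA sees all samples but only half the features.
subspace = BaggingClassifier(
    LinearDiscriminantAnalysis(),
    n_estimators=25,
    bootstrap=False,        # keep all training samples
    max_features=0.5,       # random 50% feature subset per member
    random_state=2,
).fit(X_tr, y_tr)

acc = accuracy_score(y_te, subspace.predict(X_te))
```

Setting `bootstrap=True` and `max_features=1.0` instead would recover ordinary bagging of the same base classifier.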

449 citations

01 Jan 2011
TL;DR: Results show that Naive Bayes is the best classifier compared with several common classifiers (such as decision trees, neural networks, and support vector machines) in terms of accuracy and computational efficiency.
Abstract: Document classification is a growing interest in text mining research. Correctly assigning documents to a particular category remains challenging because of the large number of features in the dataset. Among existing classification approaches, Naive Bayes is potentially well suited to serve as a document classification model due to its simplicity. The aim of this paper is to highlight the performance of Naive Bayes in document classification. Results show that Naive Bayes is the best classifier, compared with several common classifiers (such as decision trees, neural networks, and support vector machines), in terms of accuracy and computational efficiency.
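A toy version of the Naive Bayes document classifier discussed above (the corpus and category names are invented; real evaluations use large labelled collections) can be built with scikit-learn's bag-of-words pipeline:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Tiny made-up training corpus with two categories.
docs = [
    "cheap loans win money",
    "win money cheap prize",
    "meeting agenda project deadline",
    "project report meeting schedule",
]
labels = ["spam", "spam", "work", "work"]

# Bag-of-words features + multinomial NB with Laplace smoothing (the default).
clf = make_pipeline(CountVectorizer(), MultinomialNB())
clf.fit(docs, labels)

pred = clf.predict(["win cheap money now"])[0]
```

The simplicity the abstract highlights is visible here: training is a single pass counting word frequencies per class, with no iterative optimisation.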

140 citations

Posted Content
TL;DR: This paper conducts a study comparing several collaborative filtering techniques, both classic and recent state-of-the-art, in a variety of experimental contexts to identify which algorithms work well and under what conditions.
Abstract: Collaborative filtering is a rapidly advancing research area. Every year several new techniques are proposed, and yet it is not clear which of the techniques work best and under what conditions. In this paper we conduct a study comparing several collaborative filtering techniques, both classic and recent state-of-the-art, in a variety of experimental contexts. Specifically, we report conclusions controlling for number of items, number of users, sparsity level, performance criteria, and computational complexity. Our conclusions identify which algorithms work well and in what conditions, and contribute both to the industrial deployment of collaborative filtering algorithms and to the research community.
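As a rough illustration of the classic user-based collaborative filtering baseline such studies compare against (all users, items, and ratings below are made up), a cosine-similarity predictor in plain Python:

```python
import math

# Hypothetical user -> {item: rating} matrix, sparse by construction.
ratings = {
    "alice": {"sedan": 5, "suv": 3, "hatchback": 4},
    "bob":   {"sedan": 5, "suv": 2, "hatchback": 5, "coupe": 4},
    "carol": {"suv": 5, "coupe": 1},
}

def cosine(u, v):
    """Cosine similarity restricted to co-rated items."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    num = sum(u[i] * v[i] for i in common)
    den = (math.sqrt(sum(u[i] ** 2 for i in common))
           * math.sqrt(sum(v[i] ** 2 for i in common)))
    return num / den

def predict(user, item):
    """Similarity-weighted average of other users' ratings for `item`."""
    num = den = 0.0
    for other, r in ratings.items():
        if other == user or item not in r:
            continue
        s = cosine(ratings[user], r)
        num += s * r[item]
        den += abs(s)
    return num / den if den else None

score = predict("alice", "coupe")
```

The paper's point is that how well this kind of scheme works depends heavily on sparsity and on the numbers of users and items, which a toy matrix like this cannot show.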

130 citations

Proceedings Article
08 Jul 1997
TL;DR: A new machine learning method is presented that, given a set of training examples, induces a definition of the target concept in terms of a hierarchy of intermediate concepts and their definitions, which effectively decomposes the problem into smaller, less complex problems.
Abstract: We present a new machine learning method that, given a set of training examples, induces a definition of the target concept in terms of a hierarchy of intermediate concepts and their definitions. This effectively decomposes the problem into smaller, less complex problems. The method is inspired by the Boolean function decomposition approach to the design of digital circuits. To cope with high time complexity of finding an optimal decomposition, we propose a suboptimal heuristic algorithm. The method, implemented in program HINT (HIerarchy Induction Tool), is experimentally evaluated using a set of artificial and real-world learning problems. It is shown that the method performs well both in terms of classification accuracy and discovery of meaningful concept hierarchies.
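The core idea of function decomposition can be shown on a tiny hand-picked Boolean concept (this is only the flavour of the approach, not the HINT algorithm, which must discover the hierarchy automatically):

```python
from itertools import product

# Hypothetical target concept over four binary attributes.
def target(a, b, c, d):
    return (a and b) or (c and d)

# Decomposition into two intermediate concepts plus a top-level combiner:
# each sub-problem involves fewer attributes than the original.
def g1(a, b):
    return a and b

def g2(c, d):
    return c and d

def decomposed(a, b, c, d):
    return g1(a, b) or g2(c, d)

# The hierarchy reproduces the target concept on every possible input.
matches = all(bool(target(*v)) == bool(decomposed(*v))
              for v in product([0, 1], repeat=4))
```

Here the decomposition is given by hand; the paper's contribution is a heuristic that searches for such intermediate concepts from training examples alone.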

85 citations


"Using Ensemble Learning and Associa..." refers methods in this paper

  • ...Data Mining [3], [10] is the process of discovering patterns in large datasets by using methods from various fields of interest....

    [...]

  • ...In [10] proper use of car evaluation dataset is demonstrated by developing a new machine learning method....

    [...]