Proceedings ArticleDOI

Using Ensemble Learning and Association Rules to Help Car Buyers Make Informed Choices

10 Nov 2016-pp 8
TL;DR: Bagging, boosting, and voting ensemble learning are used to improve the precision rate, i.e. the accuracy of classification, and class association rules are evaluated to see whether they perform better than collaborative filtering for suggesting items to the user.
Abstract: Cars are an essential part of everyday life. Nowadays a plethora of cars is produced by many companies across all segments, and a buyer has to weigh many factors, which makes the whole purchase process considerably more difficult. In this paper we develop an ensemble learning method to aid people in making that decision. Bagging, boosting, and voting ensemble learning are used to improve the precision rate, i.e. the accuracy of classification. We also apply class association rules to see whether they perform better than collaborative filtering for suggesting items to the user.
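The voting ensemble named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation; the toy labels and base-classifier outputs are hypothetical, and the sketch only shows the core idea: each base classifier predicts a label, and the majority label wins.

```python
from collections import Counter

def majority_vote(predictions):
    """Combine base-classifier predictions by majority vote.

    predictions: list of labels, one per base classifier.
    Returns the most common label among them.
    """
    return Counter(predictions).most_common(1)[0][0]

# Three hypothetical base classifiers disagree on a car's rating:
base_predictions = ["acceptable", "acceptable", "unacceptable"]
print(majority_vote(base_predictions))
```

Bagging and boosting differ in how the base classifiers are trained (resampled subsets vs. reweighted instances), but both can feed their predictions into the same combination step shown here.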
Citations
Journal ArticleDOI
TL;DR: This study found that the classification model from SVM algorithm provided the best result with 86.45% accuracy to correctly classify ‘Eligible’ status of candidates, while RT was the weakest model with the lowest accuracy rate for this purpose.
Abstract: A scholarship is a financial facility given to eligible students to pursue higher education. Limited funding sources and a growing number of applicants force the Government to find solutions that speed up and facilitate the selection of eligible students through a systematic approach. In this study, a data mining approach was used to propose a classification model for determining scholarship award results. A dataset of successful and unsuccessful applicants was processed into training and testing data used in the modelling process. Five algorithms were employed to develop the classification model: J48, SVM, NB, ANN, and RT. Each model was evaluated using technical evaluation metrics, including contingency-table metrics, accuracy, precision, and recall. As a result, the best models fell into two categories: the best model for classifying 'Eligible' status and the best model for classifying 'Not Eligible' status. The knowledge obtained from the rules-based model was evaluated through knowledge analysis conducted by technical and domain experts. This study found that the SVM model provided the best result, with 86.45% accuracy in correctly classifying candidates' 'Eligible' status, while RT was the weakest model for this purpose, with only 82.9% accuracy. The model with the highest accuracy for 'Not Eligible' status was the NB model, whereas SVM was the weakest at classifying 'Not Eligible' status. In addition, knowledge analysis of the decision tree model yielded new information that may help stakeholders design new policies and scholarship programmes in the future.
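The contingency-table metrics this abstract evaluates models with can be computed directly from the four cell counts. A minimal sketch with hypothetical counts (not taken from the paper's data):

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, and recall from a 2x2 contingency table.

    tp/fp/fn/tn: true-positive, false-positive, false-negative,
    and true-negative counts for the positive class
    (e.g. 'Eligible' in the scholarship setting).
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total        # fraction of all predictions correct
    precision = tp / (tp + fp)          # of predicted positives, how many were right
    recall = tp / (tp + fn)             # of actual positives, how many were found
    return accuracy, precision, recall

# Hypothetical counts for one classifier on 100 applicants:
acc, prec, rec = classification_metrics(tp=70, fp=10, fn=5, tn=15)
print(acc, prec, rec)
```

Note that a model can be "best" on one metric and not another, which is why the study reports the best model separately per class.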

5 citations

Journal ArticleDOI
TL;DR: An ensemble knowledge model is proposed to support the organization's scholarship award decisions; it generates a list of eligible candidates to reduce the human error and time taken to select candidates manually.
Abstract: The role of higher learning in Malaysia is to ensure high-quality educational ecosystems that develop individual potential and fulfil the national aspiration. To implement this role successfully, scholarship offers are an important part of the strategic plan. With the number of undergraduate students increasing every year, the government must apply a systematic strategy to manage scholarship offers so that recipients are selected effectively. Predictive models have been shown to make this feasible. In this paper, an ensemble knowledge model is proposed to support the scholarship award decisions made by the organization. It generates a list of eligible candidates, reducing the human error and time taken to select eligible candidates manually. Two ensemble approaches are presented: first, ensembles of models, and second, ensembles of rule-based knowledge. The ensemble learning techniques boosting, bagging, voting, and a rules-based ensemble technique, together with five base-learner algorithms, J48, Support Vector Machine (SVM), Artificial Neural Network (ANN), Naive Bayes (NB), and Random Tree (RT), are used to develop the model. A total of 87,000 scholarship application records are used in the modelling process. The results on accuracy, precision, recall, and F-measure show that the ensemble voting technique gives the best accuracy, 86.9%, compared to the other techniques. This study also explores the rules obtained from the rules-based models J48 and Apriori and selects the best rules to develop an ensemble rules-based model, which improves on the classification model for scholarship awards.

1 citation


Cites methods from "Using Ensemble Learning and Associa..."

  • ...Based on the review by [17], the single-core tree algorithm and the decision tree will produce different tree outputs....


References
Proceedings Article
07 Aug 2002
TL;DR: This paper compares methods for sequentially choosing which feature value to purchase next, shows that traditional active learning methods are often suboptimal, and presents a tractable method for incorporating knowledge of the budget into the information acquisition process.
Abstract: There is almost always a cost associated with acquiring training data. We consider the situation where the learner, with a fixed budget, may 'purchase' data during training. In particular, we examine the case where observing the value of a feature of a training example has an associated cost, and the total cost of all feature values acquired during training must remain less than this fixed budget. This paper compares methods for sequentially choosing which feature value to purchase next, given the budget and user's current knowledge of Naive Bayes model parameters. Whereas active learning has traditionally focused on myopic (greedy) approaches and uniform/round-robin policies for query selection, this paper shows that such methods are often suboptimal and presents a tractable method for incorporating knowledge of the budget in the information acquisition process.

84 citations

Journal Article
TL;DR: EnsembleSVM as mentioned in this paper is a free software package containing efficient routines to perform ensemble learning with SVM base models, which avoids duplicate storage and evaluation of support vectors which are shared between constituent models.
Abstract: EnsembleSVM is a free software package containing efficient routines to perform ensemble learning with support vector machine (SVM) base models. It currently offers ensemble methods based on binary SVM models. Our implementation avoids duplicate storage and evaluation of support vectors which are shared between constituent models. Experimental results show that using ensemble approaches can drastically reduce training complexity while maintaining high predictive accuracy. The EnsembleSVM software package is freely available online at http://esat.kuleuven.be/stadius/ensemblesvm.

82 citations

Posted Content
TL;DR: Whereas active learning has traditionally focused on myopic strategies for query selection, this paper presents a tractable method for incorporating knowledge of the budget into the decision-making process, which improves performance.
Abstract: Frequently, acquiring training data has an associated cost. We consider the situation where the learner may purchase data during training, subject to a budget. In particular, we examine the case where each feature label has an associated cost, and the total cost of all feature labels acquired during training must not exceed the budget. This paper compares methods for choosing which feature label to purchase next, given the budget and the current belief state of naive Bayes model parameters. Whereas active learning has traditionally focused on myopic (greedy) strategies for query selection, this paper presents a tractable method for incorporating knowledge of the budget into the decision-making process, which improves performance.

76 citations

Journal Article
TL;DR: The aim of the present paper is to relax the class constraint, and extend the contribution to multiclass problems, showing the benefits that the boosting-derived weighting rule brings to weighted nearest neighbor classifiers.
Abstract: So far, boosting has been used to improve the quality of moderately accurate learning algorithms, by weighting and combining many of their weak hypotheses into a final classifier with theoretically high accuracy. In a recent work (Sebban, Nock and Lallich, 2001), we have attempted to adapt boosting properties to data reduction techniques. In this particular context, the objective was not only to improve the success rate, but also to reduce the time and space complexities due to the storage requirements of some costly learning algorithms, such as nearest-neighbor classifiers. In that framework, each weak hypothesis, which is usually built and weighted from the learning set, is replaced by a single learning instance. The weight given by boosting defines in that case the relevance of the instance, and a statistical test allows one to decide whether it can be discarded without damaging further classification tasks. In Sebban, Nock and Lallich (2001), we addressed problems with two classes. It is the aim of the present paper to relax the class constraint, and extend our contribution to multiclass problems. Beyond data reduction, experimental results are also provided on twenty-three datasets, showing the benefits that our boosting-derived weighting rule brings to weighted nearest neighbor classifiers.
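The weighted nearest-neighbor classifier at the heart of this abstract can be sketched minimally. This is not the paper's algorithm; the 1-D points, weights, and helper name are hypothetical, and the boosting-derived weighting is simply assumed as a given per-instance relevance score, to show how such weights change a k-NN vote.

```python
def weighted_knn(train, weights, x, k=3):
    """Classify x by a weight-summed vote of its k nearest training instances.

    train:   list of (point, label) pairs (1-D points for simplicity)
    weights: per-instance relevance, e.g. derived from boosting
    """
    by_dist = sorted(range(len(train)), key=lambda i: abs(train[i][0] - x))
    votes = {}
    for i in by_dist[:k]:
        label = train[i][1]
        votes[label] = votes.get(label, 0.0) + weights[i]
    return max(votes, key=votes.get)

# Hypothetical data: a single heavily weighted 'B' instance can outvote
# two nearer but lightly weighted 'A' instances.
train = [(0.0, "A"), (1.0, "A"), (2.0, "B"), (3.0, "B")]
print(weighted_knn(train, [1.0, 1.0, 5.0, 1.0], x=1.2, k=3))
print(weighted_knn(train, [1.0, 1.0, 1.0, 1.0], x=1.2, k=3))
```

This also illustrates the data-reduction angle: instances whose weight falls near zero contribute almost nothing to any vote and are candidates for removal.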

39 citations


"Using Ensemble Learning and Associa..." refers background in this paper

  • ...a weak classifier built by boosting [6][7] is used for classification and regression; during training, each classifier learns from a weighted learning set, and misclassified instances are reweighted so they have a higher chance of being selected later....
