Journal ArticleDOI

Classification Models for Higher Learning Scholarship Award Decisions

30 Dec 2018 · Asia-Pacific Journal of Information Technology and Multimedia (Penerbit Universiti Kebangsaan Malaysia (UKM Press)) · Vol. 7, Iss. 2, pp. 131-145
TL;DR: This study found that the classification model from the SVM algorithm provided the best result, with 86.45% accuracy in correctly classifying the ‘Eligible’ status of candidates, while RT was the weakest model, with the lowest accuracy rate for this purpose.
Abstract: A scholarship is a financial facility given to eligible students to pursue higher education. Limited funding sources and a growing number of applicants force the Government to find solutions that speed up and facilitate the selection of eligible students, and to adopt a systematic approach for this purpose. In this study, a data mining approach was used to propose a classification model for determining scholarship award results. A dataset of successful and unsuccessful applicants was processed into the training and testing data used in the modelling process. Five algorithms were employed to develop classification models for determining the award of the scholarship: J48 (decision tree), SVM (support vector machine), NB (naive Bayes), ANN (artificial neural network) and RT (random tree). Each model was evaluated using technical evaluation metrics, such as contingency table counts and the accuracy, precision, and recall measures. As a result, the best models were classified into two categories: the best model for classifying the ‘Eligible’ status, and the best model for classifying the ‘Not Eligible’ status. The knowledge obtained from the rules-based model was evaluated through knowledge analysis conducted by technical and domain experts. This study found that the classification model from the SVM algorithm provided the best result, with 86.45% accuracy in correctly classifying the ‘Eligible’ status of candidates, while RT was the weakest model for this purpose, with only 82.9% accuracy. The model with the highest accuracy for the ‘Not Eligible’ status was the NB model, whereas the SVM model was the weakest at classifying the ‘Not Eligible’ status. In addition, a knowledge analysis of the decision tree model was carried out and yielded new information that may help stakeholders in devising new policies and scholarship programmes in the future.
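As a rough illustration of the modelling step described above, the sketch below trains five analogous classifiers and scores them with contingency-table metrics. It is a minimal sketch, not the authors' code: the study used Weka-style algorithms, so the scikit-learn estimators here (DecisionTreeClassifier for J48, ExtraTreeClassifier for RT, MLPClassifier for ANN) and the synthetic data are stand-in assumptions.

```python
# Minimal sketch of the paper's modelling step: train five classifiers and
# evaluate them with contingency-table (confusion matrix) metrics.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, ExtraTreeClassifier
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import confusion_matrix, accuracy_score, precision_score, recall_score

# Synthetic stand-in for the applicant dataset (class 1 = 'Eligible').
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

models = {
    "J48 (decision tree)": DecisionTreeClassifier(random_state=0),
    "SVM": SVC(random_state=0),
    "NB": GaussianNB(),
    "ANN": MLPClassifier(max_iter=1000, random_state=0),
    "RT (random tree)": ExtraTreeClassifier(random_state=0),
}
for name, model in models.items():
    y_pred = model.fit(X_tr, y_tr).predict(X_te)
    tn, fp, fn, tp = confusion_matrix(y_te, y_pred).ravel()
    print(f"{name}: acc={accuracy_score(y_te, y_pred):.3f} "
          f"prec={precision_score(y_te, y_pred):.3f} "
          f"rec={recall_score(y_te, y_pred):.3f} "
          f"(TP={tp} FP={fp} FN={fn} TN={tn})")
```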
Citations
Journal ArticleDOI
TL;DR: This work narrows the gap between low-relevancy human text descriptions for Malaysian users and image scene color appearances; the agreement analysis indicates that the Bright category is the most comprehensible to humans, followed by the Pastel and Dark categories.
Abstract: Institutions that possess collections of digital image libraries, such as museums, are increasingly interested in making such collections accessible anytime and anywhere for Image Retrieval (IR) activities, namely browsing and searching. Many researchers have shown that IR methods that filter images based on features such as color provide better indexing and deliver more accurate results. The color composition of an image, e.g. its color histogram, has proven to be a powerful feature for image indexing because of its robustness to image transformations such as scaling and orientation. This research aims to narrow the gap between low-relevancy human text descriptions for Malaysian users and image scene color appearances. The methods are, first, to investigate the color concepts and color appearance descriptions of a scene and, second, to identify a set of ground-truth images for each color appearance category. Psychophysical experiments are conducted to determine a collection of ground-truth images that effectively match five color appearance descriptions for image scenes in accordance with human judgement and perception. The results of the experiments are presented together with an inter-rater agreement analysis. The descriptions commonly queried by humans are the keywords Bright, Pastel, Dull, Pale, and Dark. The agreement analysis indicates that the Bright category is the most comprehensible to humans, followed by the Pastel and Dark categories; the Dull and Pale categories, on the other hand, are only fairly well understood. All the images involved in this research are landscape painting collections from the Internet and are used for academic purposes only. The results show the top ten ground-truth images for each category, which exhibit a high level of agreement between humans.
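As a hedged illustration of the color-histogram feature this abstract relies on, the sketch below indexes an image by a normalized 3-D RGB histogram; the bin count and file names are illustrative assumptions, not taken from the paper.

```python
# Index an image by its color composition: a normalized 3-D RGB histogram.
import numpy as np
from PIL import Image

def color_histogram(path, bins=8):
    """Normalized RGB histogram (bins**3 values) for image indexing."""
    rgb = np.asarray(Image.open(path).convert("RGB")).reshape(-1, 3)
    hist, _ = np.histogramdd(rgb, bins=(bins, bins, bins),
                             range=((0, 256),) * 3)
    return hist.ravel() / hist.sum()

# Similar color composition => small histogram distance, largely unaffected
# by scaling or orientation of the scene (file names are placeholders):
# d = np.linalg.norm(color_histogram("a.jpg") - color_histogram("b.jpg"))
```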

5 citations

Journal ArticleDOI
TL;DR: In this paper, a fine-grained analysis of academic data is proposed to enhance the credibility of the ranking process; the resulting academic rankings with respect to Research Faculty, Research Productivity, and Research Impact make the ranking process more transparent and fine-grained.
Abstract: The academic ranking process has evolved considerably in the past fifteen years, and the evolution has gained momentum in the last few years. Starting with holistic rankings of world universities in 2003, it has crossed the milestone of subject-specific rankings. Nevertheless, the academic rankings published even by reputed ranking entities face various criticisms in terms of their transparency, validity, and coverage. This research effort focuses on enhancing the credibility of the ranking process through fine-grained analysis of academic data. The proposed analysis derives researchers' profiles from the Google Scholar Citations repository, while the DBpedia repository is employed for information about HEIs and countries. Influential researchers are identified using the ResRank methodology, and for consistent comparison of the subject-specific rankings of global HEIs, the Grand Average Rank (GAR) metric is employed. The resulting academic rankings with respect to Research Faculty, Research Productivity, and Research Impact make the ranking process more transparent and fine-grained. The analysis also helps in understanding the causes of differences among the academic rankings published by the ARWU, THE, and QS ranking systems. The growing interest in subject-specific and sub-discipline-specific rankings is irreversible; the fine-grained analysis is a response to that need.
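The GAR formula is not given here, so the sketch below reads it as the plain average of an institution's subject-specific ranks; the institutions, subjects, and ranks are hypothetical, and the paper's exact definition may differ.

```python
# Hedged sketch of the Grand Average Rank (GAR) idea: collapse an
# institution's subject-specific ranks into one comparable number.
subject_ranks = {                      # hypothetical HEI ranks per subject
    "HEI-A": {"CS": 12, "Math": 30, "Physics": 25},
    "HEI-B": {"CS": 40, "Math": 15, "Physics": 18},
}
gar = {hei: sum(r.values()) / len(r) for hei, r in subject_ranks.items()}
for hei, score in sorted(gar.items(), key=lambda kv: kv[1]):
    print(hei, round(score, 2))        # lower GAR = better overall standing
```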

2 citations

Journal ArticleDOI
TL;DR: In this paper, the association rules mining technique is used to mine the implicit patterns of wound-up companies by analyzing relationships between attributes such as total assets, total liabilities, and profit and loss.
Abstract: A company is wound up when it is unable to pay its financial debts or is experiencing serious financial distress. From 1998 until 2003, an average of 1,166 companies were wound up yearly. This research focuses on knowledge exploration of wound-up companies in Malaysia using the association rules mining technique (quantitative) and the involvement of a domain expert in knowledge evaluation (qualitative). Association rules mining is used to mine the implicit patterns of wound-up companies by analyzing relationships between attributes such as total assets, total liabilities, and profit and loss. The human expert verifies the significant relations between attributes and the mined patterns. This research succeeded in mining 2 quantitative criteria and 9 qualitative criteria related to wound-up companies. The combined criteria can be used to evaluate the winding-up risk of Malaysian companies in the future.
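As a minimal, self-contained illustration of the support/confidence mechanics behind association rules mining (not the paper's code), the sketch below mines rules of the form {attributes} -> wound_up over a few hypothetical, discretized company records; the thresholds are illustrative.

```python
# Tiny association-rule miner: rules predicting wound-up status from
# discretized financial attributes, filtered by support and confidence.
from itertools import combinations

records = [  # each hypothetical company is a set of attribute=value items
    {"liability=high", "profit=loss", "status=wound_up"},
    {"liability=high", "profit=loss", "status=wound_up"},
    {"liability=low", "profit=gain", "status=active"},
    {"liability=high", "profit=gain", "status=active"},
]

def support(itemset):
    return sum(itemset <= r for r in records) / len(records)

items = {i for r in records for i in r if not i.startswith("status=")}
for n in (1, 2):
    for ante in combinations(sorted(items), n):
        s = support(set(ante) | {"status=wound_up"})
        if support(set(ante)) == 0:
            continue
        conf = s / support(set(ante))
        if s >= 0.25 and conf >= 0.8:   # illustrative thresholds
            print(set(ante), "-> wound_up", f"supp={s:.2f} conf={conf:.2f}")
```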

1 citation

Proceedings ArticleDOI
01 Jan 2018
TL;DR: The simulation results show that the proposed quantitative analysis method can analyze political instructors' work accurately and with a high degree of confidence, improving the quantitative evaluation of counselors' ideological and political education work.
Abstract: The data used to quantitatively evaluate ideological and political instructors' work is a set of statistical features. To improve this evaluation, a quantitative analysis method based on data mining is proposed. High-dimensional phase space reconstruction is first used to reorganize the feature data. In the reconstructed phase space, fuzzy clustering is used to classify the evaluation data, the association rule features of the quantitative evaluation are mined, and an adaptive learning algorithm controls the convergence process, yielding an accurate quantitative analysis of the instructors' work. The simulation results show that the method analyzes political instructors' work accurately and with a high degree of confidence, and improves the quantitative evaluation ability for ideological and political education.
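As a small illustration of one named step only, the sketch below performs a time-delay (phase space) reconstruction of a feature series in NumPy; the delay and embedding dimension are illustrative assumptions, and the fuzzy clustering and rule-mining stages are omitted.

```python
# Phase space reconstruction via time-delay embedding of a feature series.
import numpy as np

def delay_embed(series, dim=3, tau=2):
    """Row i of the result is (x[i], x[i+tau], ..., x[i+(dim-1)*tau])."""
    n = len(series) - (dim - 1) * tau
    return np.column_stack([series[i * tau:i * tau + n] for i in range(dim)])

rng = np.random.default_rng(0)
x = np.sin(np.linspace(0, 20, 200)) + 0.1 * rng.standard_normal(200)
X = delay_embed(x)   # points in the reconstructed phase space, ready for
print(X.shape)       # clustering; prints (196, 3)
```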

1 citation

References
Journal ArticleDOI
TL;DR: The need to develop appropriate and efficient analytical methods to leverage massive volumes of heterogeneous data in unstructured text, audio, and video formats is highlighted and the need to devise new tools for predictive analytics for structured big data is reinforced.

2,962 citations

Journal ArticleDOI
TL;DR: A software system GEMS (Gene Expression Model Selector) that automates high-quality model construction and enforces sound optimization and performance estimation procedures is developed, the first such system to be informed by a rigorous comparative analysis of the available algorithms and datasets.
Abstract: Motivation: Cancer diagnosis is one of the most important emerging clinical applications of gene expression microarray technology. We are seeking to develop a computer system for powerful and reliable cancer diagnostic model creation based on microarray data. To keep a realistic perspective on clinical applications we focus on multicategory diagnosis. To equip the system with the optimum combination of classifier, gene selection and cross-validation methods, we performed a systematic and comprehensive evaluation of several major algorithms for multicategory classification, several gene selection methods, multiple ensemble classifier methods and two cross-validation designs, using 11 datasets spanning 74 diagnostic categories, 41 cancer types and 12 normal tissue types.
Results: Multicategory support vector machines (MC-SVMs) are the most effective classifiers in performing accurate cancer diagnosis from gene expression data. The MC-SVM techniques by Crammer and Singer, Weston and Watkins, and one-versus-rest were found to be the best methods in this domain. MC-SVMs outperform other popular machine learning algorithms, such as k-nearest neighbors, backpropagation and probabilistic neural networks, often to a remarkable degree. Gene selection techniques can significantly improve the classification performance of both MC-SVMs and other non-SVM learning algorithms. Ensemble classifiers do not generally improve performance of the best non-ensemble models. These results guided the construction of a software system, GEMS (Gene Expression Model Selector), that automates high-quality model construction and enforces sound optimization and performance estimation procedures. This is the first such system to be informed by a rigorous comparative analysis of the available algorithms and datasets.
Availability: The software system GEMS is available for download from http://www.gems-system.org for non-commercial use.
Contact: alexander.statnikov@vanderbilt.edu
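The sketch below is not GEMS itself, but a minimal scikit-learn rendering of the recipe the abstract describes: a one-versus-rest multicategory SVM with gene selection nested inside cross-validation. The synthetic data and the choice of k are stand-in assumptions.

```python
# One-versus-rest multicategory SVM with feature (gene) selection kept
# inside the cross-validation pipeline to avoid selection bias.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Stand-in for a microarray dataset: few samples, many features, 4 classes.
X, y = make_classification(n_samples=300, n_features=500, n_informative=20,
                           n_classes=4, random_state=0)
clf = make_pipeline(SelectKBest(f_classif, k=50), OneVsRestClassifier(SVC()))
print("mean CV accuracy:", cross_val_score(clf, X, y, cv=5).mean().round(3))
```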

841 citations


"Classification Models for Higher Le..." refers background in this paper

  • ...It can be seen as a rounded node represented as an artificial neuron that represents the output of one neuron to another input (Statnikov et al. 2005)....

    [...]

Journal ArticleDOI
TL;DR: In this paper, a comprehensive review of artificial neural network (ANN) based model predictive control (MPC) system design is carried out, followed by a case study in which ANN models of a residential house located in Ontario, Canada are developed and calibrated with data measured on site.

427 citations


"Classification Models for Higher Le..." refers methods in this paper

  • ...This is supported by a study conducted by (Afram et al. 2017) which stated that the SVM algorithm is very effective in classifying small-sized data....

    [...]

Journal ArticleDOI
TL;DR: Experimental results reveal that the three ensemble methods can substantially improve individual base learners, and in particular, Bagging performs better than Boosting across all credit datasets.
Abstract: Both statistical techniques and Artificial Intelligence (AI) techniques have been explored for credit scoring, an important finance activity. Although there are no consistent conclusions on which ones are better, recent studies suggest that combining multiple classifiers, i.e., ensemble learning, may yield better performance. In this study, we conduct a comparative assessment of the performance of three popular ensemble methods, i.e., Bagging, Boosting, and Stacking, based on four base learners, i.e., Logistic Regression Analysis (LRA), Decision Tree (DT), Artificial Neural Network (ANN) and Support Vector Machine (SVM). Experimental results reveal that the three ensemble methods can substantially improve individual base learners. In particular, Bagging performs better than Boosting across all credit datasets. Stacking and Bagging DT get the best performance in our experiments in terms of average accuracy, Type I error and Type II error.
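A hedged scikit-learn sketch of the comparison this reference reports follows: Bagging over decision trees and Stacking over the four base learners. The synthetic data stands in for the credit datasets, and the hyperparameters are illustrative.

```python
# Compare Bagging (DT base) and Stacking (LRA, DT, ANN, SVM base learners).
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, n_features=20, random_state=0)

bagged_dt = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50,
                              random_state=0)
stack = StackingClassifier(
    estimators=[("lra", LogisticRegression(max_iter=1000)),
                ("dt", DecisionTreeClassifier()),
                ("ann", MLPClassifier(max_iter=1000, random_state=0)),
                ("svm", SVC())],
    final_estimator=LogisticRegression(max_iter=1000))

for name, model in [("Bagging DT", bagged_dt), ("Stacking", stack)]:
    print(name, cross_val_score(model, X, y, cv=5).mean().round(3))
```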

414 citations


"Classification Models for Higher Le..." refers result in this paper

  • ...The accuracy of a model can be measured by comparing the actual results with predicted results generated by the model (Wang et al. 2011)....

    [...]

  • ...Random tree algorithm has the option of estimating class probabilities for classification (Wang et al. 2011)....

    [...]

Journal ArticleDOI
TL;DR: A two-layer ensemble learning approach TLEL which leverages decision tree and ensemble learning to improve the performance of just-in-time defect prediction and can achieve a substantial and statistically significant improvement over the state-of-the-art methods.
Abstract: Context: Defect prediction is a very meaningful topic, particularly at change level. Change-level defect prediction, which is also referred to as just-in-time defect prediction, can not only ensure software quality in the development process, but also help developers check and fix defects in time [1].
Objective: Ensemble learning has become a hot topic in recent years, and there have been several studies applying ensemble learning to defect prediction [2-5]. Traditional ensemble learning approaches have only one layer, i.e., they use ensemble learning once; few studies leverage ensemble learning twice or more. To bridge this research gap, we try to hybridize various ensemble learning methods to see whether this improves the performance of just-in-time defect prediction. In particular, we focus on hybridizing bagging and stacking together and leave other possible hybridization strategies for future work.
Method: In this paper, we propose a two-layer ensemble learning approach, TLEL, which leverages decision trees and ensemble learning to improve the performance of just-in-time defect prediction. In the inner layer, we combine decision trees and bagging to build a Random Forest model. In the outer layer, we use random under-sampling to train many different Random Forest models and use stacking to ensemble them once more.
Results: To evaluate the performance of TLEL, we use two metrics, i.e., cost effectiveness and F1-score. We perform experiments on datasets from six large open source projects, i.e., Bugzilla, Columba, JDT, Platform, Mozilla, and PostgreSQL, containing a total of 137,417 changes. We compare our approach with three baselines: Deeper, the approach proposed by us [6]; DNC, the approach proposed by Wang et al. [2]; and MKEL, the approach proposed by Wang et al. [3]. The experimental results show that, on average across the six datasets, TLEL can discover over 70% of the bugs by reviewing only 20% of the lines of code, compared with about 50% for the baselines. In addition, the F1-scores TLEL achieves are substantially and statistically significantly higher than those of the three baselines across the six datasets.
Conclusion: TLEL achieves a substantial and statistically significant improvement over the state-of-the-art methods, i.e., Deeper, DNC and MKEL. Moreover, TLEL can discover over 70% of the bugs by reviewing only 20% of the lines of code.
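A simplified sketch of the TLEL idea follows, not the authors' implementation: the inner layer is bagged decision trees (a Random Forest), and the outer layer trains several forests on random under-sampled balanced subsets and stacks their probabilities with a logistic regression. The data, model counts, and the use of in-sample rather than out-of-fold meta-features are simplifying assumptions.

```python
# Two-layer ensemble in the spirit of TLEL: under-sampled Random Forests
# (inner layer = bagged trees) stacked by a logistic regression.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=3000, weights=[0.9], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          random_state=0, stratify=y)

def undersample(X, y):
    """Balance classes by randomly dropping majority-class samples."""
    pos, neg = np.flatnonzero(y == 1), np.flatnonzero(y == 0)
    keep = np.concatenate([pos, rng.choice(neg, size=len(pos), replace=False)])
    return X[keep], y[keep]

forests = []
for _ in range(10):            # outer layer: many under-sampled forests
    Xs, ys = undersample(X_tr, y_tr)
    forests.append(RandomForestClassifier(n_estimators=100,
                                          random_state=0).fit(Xs, ys))

# Stacking layer; proper stacking would use out-of-fold predictions here.
meta_X = np.column_stack([f.predict_proba(X_tr)[:, 1] for f in forests])
meta = LogisticRegression().fit(meta_X, y_tr)
meta_te = np.column_stack([f.predict_proba(X_te)[:, 1] for f in forests])
print("test accuracy:", meta.score(meta_te, y_te).round(3))
```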

167 citations


"Classification Models for Higher Le..." refers methods in this paper

  • ...The complete and correct data preparation of the classification modelling process is important to ensure that the developed model is accurate (Aruna & Nandakishore 2011; Jiawei Han 2006; Yang et al. 2017)....

    [...]