The Prediction of Students' Academic Performance Using Classification Data Mining Techniques

Fadhilah Ahmad, +2 more
- 01 Jan 2015 - 
- Vol. 9, pp 6415-6426

Applied Mathematical Sciences, Vol. 9, 2015, no. 129, 6415 - 6426
HIKARI Ltd, www.m-hikari.com
http://dx.doi.org/10.12988/ams.2015.53289
The Prediction of Students’ Academic Performance
Using Classification Data Mining Techniques
Fadhilah Ahmad*, Nur Hafieza Ismail and Azwa Abdul Aziz
Faculty of Informatics and Computing
Universiti Sultan Zainal Abidin (UniSZA), Kuala Terengganu, Malaysia
*Corresponding author
Copyright © 2015 Fadhilah Ahmad et al. This article is distributed under the Creative Commons
Attribution License, which permits unrestricted use, distribution, and reproduction in any medium,
provided the original work is properly cited.
Abstract
Data Mining provides powerful techniques for various fields, including education.
Research in the educational field is increasing rapidly because of the massive
amount of student data that can be used to discover valuable patterns
pertaining to students’ learning behaviour. This paper proposes a framework for
predicting the academic performance of first year bachelor students in a
Computer Science course. The data were collected from intakes over an 8-year
period, from July 2006/2007 until July 2013/2014, and contain the students’
demographics, previous academic records, and family background information.
Decision Tree, Naïve Bayes, and Rule Based classification techniques are applied
to the students’ data in order to produce the best prediction model of students’
academic performance. The experimental results show that Rule Based is the best
model among the three techniques, achieving the highest accuracy value of 71.3%.
The knowledge extracted from the prediction model will be used to identify and
profile students in order to determine their level of success in the first
semester.
Keywords: Educational data mining; Decision Tree; Naïve Bayes; Rule Based;
students’ academic performance
1. Introduction
The concept of Data Mining (DM) is to extract hidden patterns and to discover
relationships between parameters in a vast amount of data. DM techniques have
been applied successfully in many areas such as engineering, education,
marketing, medicine, finance, and sport. This shows the ability of DM techniques
to provide alternative solutions for decision makers in solving problems that
arise in particular areas. The exploration of data in the educational field
using DM techniques is called Educational Data Mining (EDM). EDM is concerned
with extracting patterns to discover hidden information from educational data.
Nowadays, the databases of Institutions of Higher Learning (IHL) contain a great
deal of information about their students. This information keeps growing over
time, but no action is taken to gain knowledge from it. DM offers suitable
techniques for managing IHL data to discover new information and knowledge about
students. DM consists of machine learning, statistical, and visualization
techniques that discover and extract knowledge in a form that humans can easily
interpret [1].
DM provides various methods for the analysis process, including classification,
clustering, and association rule mining. Classification, which is one type of
prediction, constructs a pattern from a training set and uses that pattern to
classify new data (the testing set). Clustering is the process of grouping
records into classes so that records in the same class are similar to one
another and dissimilar to records in other classes. In association rule
(relationship) mining, the goal is to discover the relationships that exist
between parameters [2, 3, 4].
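The train-then-test idea behind classification can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy records, the 70/30 split, and the trivial "most frequent class" learner are hypothetical stand-ins and are not the data or algorithms used in this study.

```python
import random
from collections import Counter


def train_test_split(records, test_fraction=0.3, seed=0):
    """Shuffle the records and split them into a training and a testing set."""
    rng = random.Random(seed)
    shuffled = records[:]
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def fit_majority(train):
    """Learn the simplest possible pattern: the most frequent class label."""
    return Counter(label for _, label in train).most_common(1)[0][0]


def accuracy(predicted, actual):
    """Fraction of records whose predicted class equals the true class."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Hypothetical toy records: (attributes, class label).
records = [({"gender": "F"}, "good")] * 6 + [({"gender": "M"}, "poor")] * 4

train, test = train_test_split(records)
majority = fit_majority(train)                 # pattern built from the training set
predictions = [majority for _ in test]         # pattern applied to the testing set
print(accuracy(predictions, [label for _, label in test]))
```

Any real classifier replaces `fit_majority` with a richer model; the split and the accuracy calculation stay the same.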
In this study, the classification method is applied to the students’ data. This
research presents a comparative analysis of three selected classification
algorithms: Decision Tree (DT), Naïve Bayes (NB), and Rule Based (RB). The
comparative analysis is done to discover the best technique for developing a
predictive model of Students’ Academic Performance (SAP). The patterns obtained
will be used to predict performance in the first semester of the first year in
two Bachelor of Computer Science (BCS) courses: Bachelor of Computer Science
with specialization in Software Development (BCSSD) and Bachelor of Computer
Science with specialization in Network Security (BCSNS) at the Faculty of
Informatics and Computing (FIC), Universiti Sultan Zainal Abidin (UniSZA),
Terengganu, Malaysia. These patterns will be used to improve the SAP and to
address the issue of low grades obtained by students.
Several studies have been conducted using students’ data from IHLs in Malaysia,
and these studies are the main guideline for this research [5, 6, 7, 8, 9]. All
of these studies were conducted to find the relationship between the independent
parameters and the dependent parameter selected. In most cases, the Cumulative
Grade Point Average (CGPA), Grade Point Average (GPA), students’ grades, or
students’ marks are used as the predictive parameter (dependent parameter) to
measure the Students’ Academic Performance (SAP) in particular courses or
subjects.
In this study, the students’ GPA is selected as the dependent parameter. The GPA
values of the first semester of the first year BCS students are categorized into
three classes: poor, average, and good. The other parameters used are race,
gender, family income, university entry mode, and Malaysia Certificate of
Education (SPM) grades in three subjects: Malay Language, English, and
Mathematics. The WEKA tool is used to conduct the experiments. WEKA is an open
source machine learning software written in Java that is widely used by
researchers in various fields of study [10].
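As a rough illustration of how a numeric GPA can be turned into the three class labels, consider the sketch below. The 0.00 to 4.00 scale and the cut-off values of 2.0 and 3.0 are assumptions made for this example only; the text above does not state the thresholds that were actually used. The attribute values in the sample record are likewise hypothetical.

```python
def gpa_class(gpa, poor_below=2.0, good_from=3.0):
    """Map a GPA to 'poor', 'average', or 'good'.

    The scale (0.00-4.00) and both cut-offs are illustrative assumptions,
    not the thresholds used in the study.
    """
    if not 0.0 <= gpa <= 4.0:
        raise ValueError("GPA is assumed to lie between 0.00 and 4.00")
    if gpa < poor_below:
        return "poor"
    if gpa < good_from:
        return "average"
    return "good"


# Hypothetical student record with the parameters listed above.
student = {
    "race": "Malay",
    "gender": "F",
    "family_income": "low",
    "entry_mode": "matriculation",
    "spm_malay": "A",
    "spm_english": "B",
    "spm_mathematics": "A",
    "gpa": 3.42,
}

# The class label becomes the dependent parameter for classification.
student["gpa_class"] = gpa_class(student["gpa"])
print(student["gpa_class"])
```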
This paper is organized into several sections. The background and related work
section briefly describes previous work on SAP and classification techniques. It
is followed by the current problem section and a section on the proposed
framework for predicting SAP. Next, the results of the three prediction
algorithms are compared in the results and discussion section. Finally, the
conclusion, limitations, and future work of this study are discussed in the
conclusion section.
2. Background and Related Work
IHLs face a major challenge in improving and managing the organization so that
students’ activities are managed more efficiently. To achieve this target, DM is
considered one of the most suitable techniques for giving additional insights to
the IHL community to help them make better decisions in educational activities
[11]. Various previous studies have been conducted to predict the SAP by using
DM techniques. The next subsections present other authors’ work on SAP and
explain the classification techniques selected for this study in more detail.
2.1 Students’ Academic Performance (SAP)
SAP prediction allows an IHL to study which features of a model are important
for prediction and to uncover the hidden information in students’ data [2]. Much
research has been conducted to develop SAP prediction models for particular
courses or subjects. These studies used various types of students’ data with a
variety of parameters to identify and classify students [12, 13, 14].
The SAP prediction for an Introductory Engineering Course was done to understand
and identify the students’ level of performance. For example, if the prediction
shows that some students will perform poorly in the course, the lecturers can
take appropriate action to help those students. Additional exercises,
assignments, or lessons given by lecturers may help the students to improve
their understanding of the subject taken [12].
A study was also conducted in Malaysia using students’ data taken from the
University Malaysia Pahang (UMP) database management system. The 1000 student
records, covering three courses in the Faculty of Computer System and Software
Engineering, UMP, contained students’ personal, academic, and course
information. The students’ grade was selected as a predictor parameter and was
divided into five categories: excellent, very good, good, average, and poor. The
results indicated that the proposed model is suitable for SAP prediction [13].
Students’ information such as exam scores, team work grades, attendance, and
practical exams has been used for profiling and grouping the SAP using selected
DM algorithms. The output of the analysis helps the institution to predict
academic trends and patterns by categorizing the students into good,
satisfactory, or poor groups. It allows the lecturers to get a better
understanding of students’ learning styles and behaviors [15].
A study involving first year students of the School of Engineering at the
National Autonomous University of Mexico (UNAM) was conducted using students’
socio-demographic and previous academic information. The data were divided into
three categories: students who passed none or up to two courses (low group),
students who passed three or four courses (middle group), and students who
passed all five courses (high group). The patterns extracted from the experiment
allow the IHL to predict the academic performance of new students so that the
lecturers know the new students’ level of preparedness at admission [16].
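The three-way grouping described for the UNAM study can be written as a small labelling function. The function name and the input validation are illustrative choices; only the group boundaries come from the description above.

```python
def performance_group(courses_passed):
    """Label a student by the number of courses passed out of five."""
    if not 0 <= courses_passed <= 5:
        raise ValueError("courses_passed must be between 0 and 5")
    if courses_passed <= 2:
        return "low"       # passed none or up to two courses
    if courses_passed <= 4:
        return "middle"    # passed three or four courses
    return "high"          # passed all five courses


for n in range(6):
    print(n, performance_group(n))
```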
2.2 Classification Techniques
DM is the process of extracting useful information and knowledge from large data
stores or sets. It involves the use of data analysis tools to discover
previously unknown patterns and relationships in large data sets. DM is not only
able to collect and manage data, but is also capable of carrying out analysis
and prediction tasks.
Many studies have applied DM methods to predict SAP using popular methods such
as classification, clustering, and association rule mining [11]. The primary
goal of using DM techniques in the educational field is to develop a prediction
model for the students’ overall performance in selected courses. The students’
performance in prior courses is used as a predictor parameter. The extracted
model assists the lecturers in identifying the students’ problems in order to
enhance the students’ level of academic performance [17].
DT is one of the most popular techniques in EDM because it provides an intuitive
and human friendly explanation that helps decision makers take further action
[18]. This technique was applied to students’ data in previous research to
classify students into successful and unsuccessful categories, so that the
lecturers can provide extra lessons to the students who are less likely to be
successful [7, 8, 11, 13, 19].
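A decision tree is grown by repeatedly choosing the attribute that best separates the classes. One common criterion is information gain, sketched below in plain Python. The text above does not state which split criterion the cited studies used, so this is an assumption for illustration, and the four toy records and attribute names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(records, attribute, target):
    """Reduction in entropy of `target` after splitting on `attribute`."""
    base = entropy([r[target] for r in records])
    groups = defaultdict(list)
    for r in records:
        groups[r[attribute]].append(r[target])
    remainder = sum(len(g) / len(records) * entropy(g)
                    for g in groups.values())
    return base - remainder


# Hypothetical toy data: the mathematics grade separates the classes, gender does not.
records = [
    {"spm_math": "A", "gender": "F", "outcome": "successful"},
    {"spm_math": "A", "gender": "M", "outcome": "successful"},
    {"spm_math": "C", "gender": "F", "outcome": "unsuccessful"},
    {"spm_math": "C", "gender": "M", "outcome": "unsuccessful"},
]

for attribute in ("spm_math", "gender"):
    print(attribute, information_gain(records, attribute, "outcome"))
```

The attribute with the larger gain would be placed nearer the root of the tree, which is what makes the resulting model easy to read from the top down.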
NB uses Bayes’ probability theorem and assumes that the effect of a parameter
value on a given class is independent of the values of the other parameters. It
represents a predictive approach that makes predictions on values of data using
known results found from different data [20]. Also, the output of a prediction
model built with NB can be easily interpreted in human-understandable language
[16, 19, 20, 21]. The generated predictive model helps the faculty staff to
manage students’ dropout and to predict the SAP of newly admitted students [22].
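Under the independence assumption, the score of a class c for a record x is proportional to P(c) multiplied by the product of P(x_i | c) over all parameters x_i. The sketch below implements this for categorical data in plain Python. It is an illustration rather than the implementation used in any cited study: the add-one (Laplace) smoothing, the toy records, and the attribute names are all assumptions.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(records, target):
    """Count class frequencies and per-class attribute value frequencies."""
    class_counts = Counter(r[target] for r in records)
    value_counts = defaultdict(Counter)   # (class, attribute) -> value counts
    attribute_values = defaultdict(set)   # attribute -> distinct values seen
    for r in records:
        for attr, value in r.items():
            if attr == target:
                continue
            value_counts[(r[target], attr)][value] += 1
            attribute_values[attr].add(value)
    return class_counts, value_counts, attribute_values


def predict(model, record):
    """Return the class with the highest log-probability score."""
    class_counts, value_counts, attribute_values = model
    total = sum(class_counts.values())
    best_class, best_score = None, -math.inf
    for cls, n_cls in class_counts.items():
        score = math.log(n_cls / total)               # log P(c)
        for attr, value in record.items():
            if attr not in attribute_values:
                continue                              # ignore unknown attributes
            count = value_counts[(cls, attr)][value]
            k = len(attribute_values[attr])
            score += math.log((count + 1) / (n_cls + k))  # smoothed log P(x_i | c)
        if score > best_score:
            best_class, best_score = cls, score
    return best_class


# Hypothetical training records.
records = [
    {"spm_math": "A", "entry_mode": "matriculation", "gpa_class": "good"},
    {"spm_math": "A", "entry_mode": "diploma", "gpa_class": "good"},
    {"spm_math": "B", "entry_mode": "matriculation", "gpa_class": "average"},
    {"spm_math": "C", "entry_mode": "diploma", "gpa_class": "poor"},
    {"spm_math": "C", "entry_mode": "matriculation", "gpa_class": "poor"},
]

model = train_naive_bayes(records, "gpa_class")
print(predict(model, {"spm_math": "A", "entry_mode": "diploma"}))
```

Working in log-probabilities avoids numerical underflow when many parameters are multiplied together.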
RB is a technique for classifying records using a collection of “IF…THEN…”
rules. IF-THEN rules represent the knowledge extracted from a dataset in a form
that is easy to understand. This gives researchers or domain experts the chance
to analyze and validate that knowledge, and to combine it with existing
information [23]. Researchers have found that a set of IF-THEN classification
rules provides a high-level knowledge representation and can be used directly
for decision making [10, 24].
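An ordered list of IF-THEN rules can be applied to a record as sketched below. The rules shown are invented purely to illustrate the representation; they are not rules extracted from the students' data in this study. The attribute names and the default class are likewise assumptions.

```python
# Each rule is (conditions, class label): IF all conditions hold THEN label.
# These rules are hypothetical examples, not results of the study.
RULES = [
    ({"spm_math": "A", "spm_english": "A"}, "good"),
    ({"spm_math": "C"}, "poor"),
]
DEFAULT_CLASS = "average"  # assumed fallback when no rule fires


def classify(record, rules=RULES, default=DEFAULT_CLASS):
    """Return the label of the first rule whose conditions all match."""
    for conditions, label in rules:
        if all(record.get(attr) == value for attr, value in conditions.items()):
            return label
    return default


print(classify({"spm_math": "A", "spm_english": "A"}))
print(classify({"spm_math": "C", "spm_english": "A"}))
print(classify({"spm_math": "B", "spm_english": "B"}))
```

Because each rule is a readable condition and conclusion, a domain expert can inspect, validate, or override individual rules without retraining a model.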
Based on these previous studies, the three classification techniques chosen for
this study are DT, NB, and RB.
3. Current Problem
The goal of an IHL is to provide the finest quality of education to its
students. To achieve that goal, new discoveries about students’ learning
behaviours and the factors that contribute to students’ success should be made
available for the benefit of the community. To discover hidden information and
knowledge from the students’ data, a few elements such as parameters, methods,
and tools need to be identified and considered in order to produce the best SAP
prediction model. The SAP prediction can be used as a guideline for the faculty
management and lecturers to prevent students from dropping out [22]. The
objective of this study is to obtain the patterns of SAP, focusing on the first
semester of the first year BCS at the FIC, UniSZA, Malaysia.
At the beginning of the semester, a lecturer finds it difficult to know and
analyse new students’ performance because information about the students’
previous background is lacking. The information about students is stored in
databases at the Academic Department, UniSZA, and at the Student Entry
Management Department (SEMD), Ministry of Higher Education, which is based in a
different location (Kuala Lumpur, Malaysia). The parameters used are GPA, race,
gender, family income, university entry mode, and SPM grades in three subjects:
Malay Language, English, and Mathematics. The study is carried out to determine
whether or not the selected parameters contribute to the SAP.
Besides, this study is also conducted to find out the relationship between the
independent parameters and the dependent parameter. The discovered patterns can
be used by lecturers to predict the SAP of first semester, first year bachelor
students at FIC. The SAP prediction is very important because it provides the
lecturers with more information about the students. Therefore, the lecturers would
