The Prediction of Students' Academic Performance Using Classification Data Mining Techniques

doi:10.12988/AMS.2015.53289

Applied Mathematical Sciences, Vol. 9, 2015, no. 129, 6415 - 6426

HIKARI Ltd, www.m-hikari.com

http://dx.doi.org/10.12988/ams.2015.53289


Fadhilah Ahmad*, Nur Hafieza Ismail and Azwa Abdul Aziz

Faculty of Informatics and Computing

Universiti Sultan Zainal Abidin (UniSZA), Kuala Terengganu, Malaysia

*Corresponding author

Attribution License, which permits unrestricted use, distribution, and reproduction in any medium,

provided the original work is properly cited.

Abstract

Data mining provides powerful techniques for many fields, including education. Research in education is growing rapidly because of the massive amount of student data that can be used to discover valuable patterns in students’ learning behaviour. This paper proposes a framework for predicting the academic performance of first-year bachelor students in a Computer Science course. The data were collected from eight years of intakes, from July 2006/2007 until July 2013/2014, and contain the students’ demographics, previous academic records, and family background information. Decision Tree, Naïve Bayes, and Rule Based classification techniques are applied to the students’ data in order to produce the best prediction model of students’ academic performance. The experimental results show that Rule Based is the best model among the three techniques, with the highest accuracy of 71.3%. The knowledge extracted from the prediction model will be used to identify and profile students and to determine their level of success in the first semester.

Keywords: Educational data mining; Decision Tree; Naïve Bayes; Rule Based; students’ academic performance

1. Introduction

Data Mining (DM) extracts hidden patterns and discovers relationships between parameters in vast amounts of data. DM techniques have been applied successfully in many areas, such as engineering, education, marketing, medicine, finance, and sport. This shows the ability of DM techniques to provide decision makers with alternative solutions to problems that arise in particular areas. The exploration of educational data using DM techniques is called Educational Data Mining (EDM). EDM is concerned with extracting patterns to discover hidden information in educational data.

Nowadays, the databases of Institutions of Higher Learning (IHL) contain a great deal of information about their students. This information keeps growing over time, but no action is taken to gain knowledge from it. DM is a suitable technique for managing IHL data to discover new information and knowledge about students. DM combines machine learning, statistical, and visualization techniques to discover and extract knowledge in a form that humans can easily interpret [1].

DM provides various methods for the analysis process, including classification, clustering, and association rule mining. Classification, which is one type of prediction, constructs a pattern from a training set and uses that pattern to classify new data (the testing set). Clustering is the process of grouping records into classes so that records within a class are similar to each other and dissimilar to records in other classes. In relationship mining, the goal is to discover the relationships that exist between parameters [2, 3, 4].
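The train-then-classify workflow described above can be illustrated with a minimal sketch. The records, the attribute name `math_grade`, and the deliberately simple one-attribute classifier are hypothetical and are not the models evaluated in this paper; the sketch only shows how a pattern is built from a training set, applied to a testing set, and scored by accuracy.

```python
from collections import Counter, defaultdict


def train_one_attribute(records, attribute, label):
    """Learn, for each value of `attribute`, the most frequent class label."""
    counts = defaultdict(Counter)
    for rec in records:
        counts[rec[attribute]][rec[label]] += 1
    pattern = {value: c.most_common(1)[0][0] for value, c in counts.items()}
    # Fallback for attribute values never seen in training: overall majority class.
    default = Counter(rec[label] for rec in records).most_common(1)[0][0]
    return pattern, default


def classify(pattern, default, record, attribute):
    """Apply the learned pattern to one new record."""
    return pattern.get(record[attribute], default)


def accuracy(pattern, default, records, attribute, label):
    """Fraction of records whose predicted class equals the true class."""
    correct = sum(
        classify(pattern, default, rec, attribute) == rec[label] for rec in records
    )
    return correct / len(records)


# Hypothetical toy data, not taken from the study.
training_set = [
    {"math_grade": "A", "performance": "good"},
    {"math_grade": "A", "performance": "good"},
    {"math_grade": "B", "performance": "average"},
    {"math_grade": "B", "performance": "good"},
    {"math_grade": "B", "performance": "average"},
    {"math_grade": "C", "performance": "poor"},
]
testing_set = [
    {"math_grade": "A", "performance": "good"},
    {"math_grade": "B", "performance": "good"},
    {"math_grade": "C", "performance": "poor"},
    {"math_grade": "D", "performance": "poor"},
]

pattern, default = train_one_attribute(training_set, "math_grade", "performance")
test_accuracy = accuracy(pattern, default, testing_set, "math_grade", "performance")
```

Accuracy is measured on the testing set rather than the training set, so it reflects how well the pattern generalizes to records that were not used to build it.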

In this study, the classification method is applied to the students’ data. This research aims to carry out a comparative analysis of three selected classification algorithms: Decision Tree (DT), Naïve Bayes (NB), and Rule Based (RB). The comparative analysis is done to discover the best technique for developing a predictive model of students’ academic performance (SAP). The patterns obtained will be used to predict performance in the first semester of the first year in two Bachelor of Computer Science (BCS) courses, Bachelor of Computer Science with specialization in Software Development (BCSSD) and Bachelor of Computer Science with specialization in Network Security (BCSNS), at the Faculty of Informatics and Computing (FIC), Universiti Sultan Zainal Abidin (UniSZA), Terengganu, Malaysia. These patterns will be used to improve SAP and to address the issue of low grades obtained by students.

Several studies have been conducted using students’ data from IHL in Malaysia, and these studies are the main guideline for this research [5, 6, 7, 8, 9]. All of these studies were conducted to find the relationship between the independent parameters and the dependent parameter selected in each study. The Cumulative Grade Point Average (CGPA), Grade Point Average (GPA), students’ grades, and students’ marks are normally used as the predicted (dependent) parameter to measure Students’ Academic Performance (SAP) in particular courses or subjects.

In this study, the students’ GPA is selected as the dependent parameter. The GPA values of the first semester of the first-year BCS students are categorized into three classes: poor, average, and good. The other parameters used are race, gender, family income, university entry mode, and Malaysia Certificate of Education (SPM) grades in three subjects: Malay Language, English, and Mathematics. The WEKA tool is used to conduct the experiments. WEKA is open-source machine learning software written in Java that is widely used by researchers in various fields of study [10].
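The categorization of GPA into three classes can be sketched as a simple binning function. The cut-off values and the 0.00 to 4.00 scale used below are illustrative assumptions only; this section does not state the boundaries applied in the study.

```python
def gpa_class(gpa, poor_below=2.0, good_from=3.0):
    """Map a GPA to 'poor', 'average', or 'good'.

    The 0.00-4.00 scale and both cut-offs are assumptions made for
    illustration; they are not values reported in the paper.
    """
    if not 0.0 <= gpa <= 4.0:
        raise ValueError(f"GPA out of range: {gpa}")
    if gpa < poor_below:
        return "poor"
    if gpa < good_from:
        return "average"
    return "good"


# Example: turning raw GPA values into class labels for a classifier.
labels = [gpa_class(g) for g in (1.75, 2.60, 3.40)]
```

Discretizing the dependent parameter in this way turns the prediction task into a three-class classification problem, which is the form required by the techniques compared in this study.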

This paper is organized into several sections. The background and related work section briefly describes previous work on SAP and classification techniques. It is followed by the current problem section and a section proposing a framework for predicting SAP. Next, the results of the three prediction algorithms are compared in the results and discussion section. Finally, the conclusion, limitations, and future work of this study are discussed in the conclusion section.

2. Background and Related Work

IHL face a major challenge in improving and managing the organization so that students’ activities are managed more efficiently. To achieve this, DM is considered one of the most suitable techniques for giving the IHL community additional insights that help them make better decisions in educational activities [11]. Various previous studies have predicted SAP using DM techniques. The following subsections present other authors’ work on SAP and explain the classification techniques applied in this study in more detail.

2.1 Students’ Academic Performance (SAP)

SAP prediction allows IHL to study which features of a model are important for prediction and to uncover the hidden information in students’ data [2]. Much research has been conducted to develop SAP prediction models for particular courses or subjects. These studies used various types of students’ data with a variety of parameters to identify and classify students [12, 13, 14].

SAP prediction for an introductory engineering course was carried out to understand and identify the students’ level of performance. For example, if the prediction shows that some students will perform poorly in the course, the lecturers can take appropriate action to help those students. Additional exercises, assignments, or lessons given by lecturers may help the students improve their understanding of the subject [12].

A study was also conducted in Malaysia using students’ data taken from the University Malaysia Pahang (UMP) database management system. The 1,000 student records, covering three courses in the Faculty of Computer System and Software Engineering, UMP, contained students’ personal, academic, and course information. The students’ grade was selected as the parameter to be predicted and was divided into five categories: excellent, very good, good, average, and poor. The results indicated that the proposed model is suitable for SAP prediction [13].

Student information such as exam scores, team work grades, attendance, and practical exams is used for profiling and grouping SAP using selected DM algorithms. The output of the analysis helps the institution predict academic trends and patterns by categorizing students into good, satisfactory, or poor groups. It allows the lecturers to better understand students’ learning styles and behaviours [15].

A study involving first-year students of the School of Engineering at the National Autonomous University of Mexico (UNAM) was conducted using students’ socio-demographic and previous academic information. The data were divided into three categories: students who passed none or up to two courses (low group), students who passed three or four courses (middle group), and students who passed all five courses (high group). The patterns extracted from the experiment allow the IHL to predict the academic performance of new students so that lecturers know the new students’ level of preparedness at admission [16].
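The three-way grouping used in the UNAM study can be written as a short labelling function. The thresholds follow the description above; the function name and the input validation are hypothetical additions for illustration.

```python
def performance_group(courses_passed):
    """Assign the class label from the number of courses passed (0 to 5)."""
    if not 0 <= courses_passed <= 5:
        raise ValueError(f"expected 0-5 courses passed, got {courses_passed}")
    if courses_passed <= 2:
        return "low"      # passed none or up to two courses
    if courses_passed <= 4:
        return "middle"   # passed three or four courses
    return "high"         # passed all five courses


groups = [performance_group(n) for n in range(6)]
```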

2.2 Classification Techniques

DM is the process of extracting useful information and knowledge from large data stores or sets. It involves the use of data analysis tools to discover previously unknown patterns and relationships in large data sets. DM is able not only to collect and manage data, but also to carry out analysis and prediction tasks.

Many studies have applied DM methods to predict SAP using popular methods such as classification, clustering, and association rule mining [11]. The primary goal of using DM techniques in education is to develop a prediction model for students’ overall performance in selected courses. The students’ performance in prior courses is used as the predictor parameter. The extracted model helps lecturers identify students’ problems in order to raise the students’ level of academic performance [17].

DT is one of the most popular techniques in EDM because it provides an intuitive and human-friendly explanation on which decision makers can base further action [18]. This technique was applied to students’ data in previous research to classify students into successful and unsuccessful categories, so that lecturers can provide extra lessons to the students who are less likely to succeed [7, 8, 11, 13, 19].
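To show why a decision tree is easy for decision makers to read, the following minimal sketch classifies a record by walking a small hand-written tree and reports the tests applied along the way. The tree, the attribute names, and the grade values are hypothetical; a real tree would be learned from the training data rather than written by hand.

```python
# Internal nodes are dicts {"attribute": name, "branches": {value: subtree}};
# leaves are class labels (strings). This tree is a hypothetical example.
TREE = {
    "attribute": "math_grade",
    "branches": {
        "A": "successful",
        "B": {
            "attribute": "english_grade",
            "branches": {
                "A": "successful",
                "B": "successful",
                "C": "unsuccessful",
            },
        },
        "C": "unsuccessful",
    },
}


def classify_with_tree(node, record):
    """Follow the branch matching each tested attribute until a leaf is reached.

    Returns None when the record has a value the tree has no branch for.
    """
    while isinstance(node, dict):
        value = record.get(node["attribute"])
        if value not in node["branches"]:
            return None
        node = node["branches"][value]
    return node


def decision_path(node, record):
    """List the attribute tests applied on the way to the leaf."""
    steps = []
    while isinstance(node, dict):
        attribute = node["attribute"]
        value = record.get(attribute)
        if value not in node["branches"]:
            break
        steps.append(f"{attribute} = {value}")
        node = node["branches"][value]
    return steps


student = {"math_grade": "B", "english_grade": "C"}
prediction = classify_with_tree(TREE, student)
path = decision_path(TREE, student)
```

The list returned by `decision_path` is the kind of explanation a lecturer could read directly: each prediction is justified by the sequence of attribute tests that led to it.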

NB uses Bayes’ probability theory and assumes that the effect of a parameter value on a given class is independent of the values of the other parameters. It represents a predictive approach that makes predictions about values of data using known results found from different data [20]. In addition, the output of a prediction model built with NB can be easily interpreted in human-understandable language [16, 19, 20, 21]. The generated predictive model helps faculty staff manage student dropout and predict the SAP of newly admitted students [22].
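The independence assumption means that the score of a class is the class prior multiplied by one conditional probability per parameter. A minimal sketch for categorical parameters follows. The records and attribute values are hypothetical, and the add-one (Laplace) smoothing is an assumption of this sketch rather than a detail taken from the paper.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(records, attributes, label):
    """Count class frequencies and per-class attribute value frequencies."""
    class_counts = Counter(rec[label] for rec in records)
    value_counts = defaultdict(Counter)  # (attribute, class) -> value -> count
    values_seen = defaultdict(set)       # attribute -> distinct training values
    for rec in records:
        for attr in attributes:
            value_counts[(attr, rec[label])][rec[attr]] += 1
            values_seen[attr].add(rec[attr])
    return class_counts, value_counts, values_seen


def predict_naive_bayes(model, record, attributes):
    """Return the class with the highest log posterior score."""
    class_counts, value_counts, values_seen = model
    total = sum(class_counts.values())
    scores = {}
    for cls, n_cls in class_counts.items():
        # log P(class) + sum over attributes of log P(value | class),
        # each conditional estimated with add-one smoothing.
        score = math.log(n_cls / total)
        for attr in attributes:
            count = value_counts[(attr, cls)][record[attr]]
            score += math.log((count + 1) / (n_cls + len(values_seen[attr])))
        scores[cls] = score
    return max(scores, key=scores.get)


# Hypothetical toy records, not taken from the study.
ATTRIBUTES = ["gender", "entry_mode"]
records = [
    {"gender": "F", "entry_mode": "matriculation", "performance": "good"},
    {"gender": "F", "entry_mode": "matriculation", "performance": "good"},
    {"gender": "M", "entry_mode": "matriculation", "performance": "good"},
    {"gender": "M", "entry_mode": "diploma", "performance": "poor"},
    {"gender": "M", "entry_mode": "diploma", "performance": "poor"},
    {"gender": "F", "entry_mode": "diploma", "performance": "poor"},
]
model = train_naive_bayes(records, ATTRIBUTES, "performance")
new_student = {"gender": "F", "entry_mode": "matriculation"}
predicted = predict_naive_bayes(model, new_student, ATTRIBUTES)
```

Working in log space avoids numerical underflow when many parameters are multiplied, and the smoothing prevents a value that never co-occurred with a class in training from forcing that class's probability to zero.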

RB is a technique for classifying records using a collection of “IF…THEN…” rules. IF-THEN rules represent the knowledge extracted from a dataset in a form that is easy to understand. This gives researchers or domain experts the chance to analyze and validate that knowledge and to combine it with existing information [23]. Researchers have found that a set of IF-THEN classification rules provides a high-level knowledge representation and can be used directly for decision making [10, 24].
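A rule-based classifier of this kind can be sketched as an ordered list of IF-THEN rules with a default class. The rules, attribute names, and default class below are hypothetical and are not rules extracted in this study; first-match ordering is one common convention and is assumed here.

```python
# Each rule is (conditions, predicted class), where conditions maps an
# attribute name to the value it must have. All rules are hypothetical.
RULES = [
    ({"math_grade": "A", "english_grade": "A"}, "good"),
    ({"math_grade": "C"}, "poor"),
]
DEFAULT_CLASS = "average"


def classify_with_rules(record, rules=RULES, default=DEFAULT_CLASS):
    """Return the class of the first rule whose conditions all hold."""
    for conditions, predicted in rules:
        if all(record.get(attr) == value for attr, value in conditions.items()):
            return predicted
    return default  # no rule fired


def format_rule(conditions, predicted, target="performance"):
    """Render a rule in readable IF ... THEN ... form."""
    tests = " AND ".join(f"{attr} = {value}" for attr, value in conditions.items())
    return f"IF {tests} THEN {target} = {predicted}"


first_rule_text = format_rule(*RULES[0])
result = classify_with_rules({"math_grade": "C", "english_grade": "A"})
```

Because every prediction can be traced to one readable rule, a domain expert can inspect, validate, or override individual rules without retraining the whole model.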

Based on these previous studies, the three classification techniques chosen for this study are DT, NB, and RB.

3. Current Problem

The goal of IHL is to provide the finest quality of education to their students. To achieve that goal, new discoveries about students’ learning behaviours and the factors that contribute to students’ success should be made available for the benefit of the community. To discover hidden information and knowledge in students’ data, a few elements such as parameters, methods, and tools need to be identified and considered in order to produce the best SAP prediction model. SAP prediction can be used as a guideline by faculty management and lecturers to prevent students from dropping out [22]. The objective of this study is to obtain the patterns of SAP, focusing on the first semester of the first-year BCS students at the FIC, UniSZA, Malaysia.

At the beginning of the semester, a lecturer has difficulty knowing and analysing new students’ performance because information about the students’ previous background is lacking. The information about students is stored in databases at the Academic Department, UniSZA, and at the Student Entry Management Department (SEMD), Ministry of Higher Education, which is based in a different location (Kuala Lumpur, Malaysia). The parameters used are GPA, race, gender, family income, university entry mode, and SPM grades in three subjects: Malay Language, English, and Mathematics. The study is carried out to determine whether or not the selected parameters contribute to SAP.

In addition, this study is conducted to find the relationship between the independent parameters and the dependent parameter. The discovered patterns can be used by lecturers to predict SAP among first-semester, first-year bachelor students at FIC. SAP prediction is important because it gives the lecturers more information about the students. Therefore, the lecturers would


