Journal ArticleDOI

Analyzing undergraduate students' performance using educational data mining

TL;DR: The results indicate that by focusing on a small number of courses that are indicators of particularly good or poor performance, it is possible to provide timely warning and support to low achieving students, and advice and opportunities to high performing students.
Abstract: The tremendous growth in electronic data of universities creates the need to have some meaningful information extracted from these large volumes of data. The advancement in the data mining field makes it possible to mine educational data in order to improve the quality of the educational processes. This study, thus, uses data mining methods to study the performance of undergraduate students. Two aspects of students' performance have been focused upon. First, predicting students' academic achievement at the end of a four-year study programme. Second, studying typical progressions and combining them with prediction results. Two important groups of students have been identified: the low and high achieving students. The results indicate that by focusing on a small number of courses that are indicators of particularly good or poor performance, it is possible to provide timely warning and support to low achieving students, and advice and opportunities to high performing students.
Citations
Journal ArticleDOI
TL;DR: This study aims to provide a step-by-step set of guidelines for educators willing to apply data mining techniques to predict student success, giving educators easier access to these techniques and enabling their full potential in the field of education.
Abstract: Student success plays a vital role in educational institutions, as it is often used as a metric for the institution’s performance. Early detection of students at risk, along with preventive measures, can drastically improve their success. Lately, machine learning techniques have been extensively used for prediction purposes. While there is a plethora of success stories in the literature, these techniques are mainly accessible to “computer science”, or more precisely, “artificial intelligence” literate educators. Indeed, the effective and efficient application of data mining methods entails many decisions, ranging from how to define student’s success, through which student attributes to focus on, up to which machine learning method is more appropriate to the given problem. This study aims to provide a step-by-step set of guidelines for educators willing to apply data mining techniques to predict student success. For this, the literature has been reviewed, and the state-of-the-art has been compiled into a systematic process, where possible decisions and parameters are comprehensively covered and explained along with arguments. This study will give educators easier access to data mining techniques, enabling all the potential of their application to the field of education.

180 citations

Journal ArticleDOI
TL;DR: In this article, a predictive analysis of the academic performance of students in public schools of the Federal District of Brazil during the school terms of 2015 and 2016 was presented, where two datasets were obtained: the first dataset contains variables obtained prior to the start of the school year and the second included academic variables collected two months after the semester began.

174 citations

Journal ArticleDOI
TL;DR: Different classification models for predicting student performance are created, using data collected from an Australian university, to validate the hypothesis that the models trained with instances in student sub-populations outperform those constructed using all data instances.
Abstract: The capacity to predict student academic outcomes is of value for any educational institution aiming to improve student performance and persistence. Based on the generated predictions, students identified as being at risk of academic retention or performance can be provided support in a more timely manner. This study creates different classification models for predicting student performance, using data collected from an Australian university. The data include student enrolment details as well as the activity data generated from the university learning management system (LMS). The enrolment data contain student information such as socio-demographic features, university admission basis (e.g. via entry exam or past experience) and attendance type (e.g. full-time vs. part-time). The LMS data record student engagement with their online learning activities. An important contribution of this study is the consideration of student heterogeneity in constructing the predictive models. This is based on the observation that students with different socio-demographic features or study modes may exhibit varying learning motivations. The experiments validated the hypothesis that the models trained with instances in student sub-populations outperform those constructed using all data instances. Furthermore, the experiments revealed that considering both enrolment and course activity features aids in identifying vulnerable students more precisely. The experiments determined that no individual method exhibits superior performance in all aspects. However, the rule-based and tree-based methods generate models with higher interpretability, making them more useful for designing effective student support.

117 citations

Journal ArticleDOI
TL;DR: The main aim of this study is to identify the most commonly studied factors that affect the students’ performance, as well as, the most common data mining techniques applied to identify these factors.
Abstract: Predicting the students’ performance has become a challenging task due to the increasing amount of data in educational systems. In keeping with this, identifying the factors affecting the students’ performance in higher education, especially by using predictive data mining techniques, is still in short supply. This field of research is usually identified as educational data mining. Hence, the main aim of this study is to identify the most commonly studied factors that affect the students’ performance, as well as, the most common data mining techniques applied to identify these factors. In this study, 36 research articles out of a total of 420 from 2009 to 2018 were critically reviewed and analyzed by applying a systematic literature review approach. The results showed that the most common factors are grouped under four main categories, namely students’ previous grades and class performance, students’ e-Learning activity, students’ demographics, and students’ social information. Additionally, the results also indicated that the most common data mining techniques used to predict and classify students’ factors are decision trees, Naive Bayes classifiers, and artificial neural networks.

101 citations

Journal ArticleDOI
01 Feb 2019-Heliyon
TL;DR: Predictive analysis was carried out to determine the extent to which the fifth year and final Cumulative Grade Point Average (CGPA) of engineering students in a Nigerian University can be determined using the program of study, the year of entry and the Grade point Average for the first three years of study as inputs into a Konstanz Information Miner (KNIME) based data mining model.

95 citations


Cites background from "Analyzing undergraduate students' ..."

  • ...Educational data mining is a machine learning process that has been applied for studying and predicting student performance (Asif et al., 2017; Gasevic et al., 2014; Kostopoulos et al., 2018), for evaluating learning technologies integration process (Angeli et al., 2017), and for identifying…...

References
Book
08 Sep 2000
TL;DR: This book presents dozens of algorithms and implementation examples, all in pseudo-code and suitable for use in real-world, large-scale data mining projects, and provides a comprehensive, practical look at the concepts and techniques you need to get the most out of real business data.
Abstract: The increasing volume of data in modern business and science calls for more complex and sophisticated tools. Although advances in data mining technology have made extensive data collection much easier, it's still always evolving and there is a constant need for new techniques and tools that can help us transform this data into useful information and knowledge. Since the previous edition's publication, great advances have been made in the field of data mining. Not only does the third edition of Data Mining: Concepts and Techniques continue the tradition of equipping you with an understanding and application of the theory and practice of discovering patterns hidden in large data sets, it also focuses on new, important topics in the field: data warehouses and data cube technology, mining stream, mining social networks, and mining spatial, multimedia and other complex data. Each chapter is a stand-alone guide to a critical topic, presenting proven algorithms and sound implementations ready to be used directly or with strategic modification against live data. This is the resource you need if you want to apply today's most powerful data mining techniques to meet real business challenges. * Presents dozens of algorithms and implementation examples, all in pseudo-code and suitable for use in real-world, large-scale data mining projects. * Addresses advanced topics such as mining object-relational databases, spatial databases, multimedia databases, time-series databases, text databases, the World Wide Web, and applications in several fields. * Provides a comprehensive, practical look at the concepts and techniques you need to get the most out of real business data.

23,600 citations

Proceedings Article
29 Jun 2000
TL;DR: A new algorithm is introduced that efficiently searches the space of cluster locations and number of clusters to optimize the Bayesian Information Criterion (BIC) or the Akaike Information Criterion (AIC) measure.
Abstract: Despite its popularity for general clustering, K-means suffers three major shortcomings; it scales poorly computationally, the number of clusters K has to be supplied by the user, and the search is prone to local minima. We propose solutions for the first two problems, and a partial remedy for the third. Building on prior work for algorithmic acceleration that is not based on approximation, we introduce a new algorithm that efficiently searches the space of cluster locations and number of clusters to optimize the Bayesian Information Criterion (BIC) or the Akaike Information Criterion (AIC) measure. The innovations include two new ways of exploiting cached sufficient statistics and a new very efficient test that in one K-means sweep selects the most promising subset of classes for refinement. This gives rise to a fast, statistically founded algorithm that outputs both the number of classes and their parameters. Experiments show this technique reveals the true number of classes in the underlying distribution, and that it is much faster than repeatedly using accelerated K-means for different values of K.

2,466 citations
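
The abstract above contrasts its algorithm with the baseline of re-running K-means for different values of K and keeping the best-scoring model. As a rough illustration of that baseline only (not the paper's accelerated algorithm, which searches more efficiently using cached statistics), the sketch below scans candidate values of K on 1-D data and scores each clustering with a BIC under an assumed spherical-Gaussian model with one shared variance. The function names `kmeans_1d`, `bic_score` and `choose_k`, the quantile seeding, and the toy data are all hypothetical choices made for this example.

```python
import math


def kmeans_1d(xs, k, iters=100):
    """Lloyd's algorithm on 1-D data, seeded at evenly spaced quantiles."""
    s = sorted(xs)
    centers = [s[(2 * j + 1) * len(s) // (2 * k)] for j in range(k)]

    def assign(cs):
        # Put every point in the group of its nearest centre.
        groups = [[] for _ in cs]
        for x in xs:
            nearest = min(range(len(cs)), key=lambda j: (x - cs[j]) ** 2)
            groups[nearest].append(x)
        return groups

    for _ in range(iters):
        groups = assign(centers)
        updated = [sum(g) / len(g) if g else centers[j]
                   for j, g in enumerate(groups)]
        if updated == centers:  # fixed point reached
            break
        centers = updated
    # Return only non-empty clusters, consistent with the final centres.
    return [g for g in assign(centers) if g]


def bic_score(clusters):
    """BIC (higher is better), assuming one shared variance for all clusters."""
    n = sum(len(c) for c in clusters)
    k = len(clusters)
    sse = 0.0
    for c in clusters:
        mean = sum(c) / len(c)
        sse += sum((x - mean) ** 2 for x in c)
    var = max(sse / max(n - k, 1), 1e-12)  # guard against zero variance
    loglik = sum(len(c) * math.log(len(c) / n) for c in clusters)
    loglik -= 0.5 * n * math.log(2 * math.pi * var)
    loglik -= 0.5 * (n - k)
    n_params = 2 * k  # (k - 1) mixing weights + k means + 1 shared variance
    return loglik - 0.5 * n_params * math.log(n)


def choose_k(xs, candidates):
    """Brute-force model selection: cluster once per K, keep the best BIC."""
    return max(candidates, key=lambda k: bic_score(kmeans_1d(xs, k)))


# Toy data: three tight groups centred near 0, 10 and 20.
data = [c + i / 10 - 0.95 for c in (0, 10, 20) for i in range(20)]
print(choose_k(data, range(1, 7)))
```

The penalty term grows with K, so extra clusters are only accepted when they improve the likelihood by more than the added parameters cost. This scan repeats the whole clustering for every candidate K, which is the cost the abstract says its algorithm avoids.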

Proceedings ArticleDOI
01 Oct 2009
TL;DR: This paper reviews the history and current trends in the field of EDM and discusses shifts in the research conducted by this community, in particular the increased emphasis on prediction, the emergence of work using existing models to make scientific discoveries, and the reduction in the frequency of relationship mining.
Abstract: We review the history and current trends in the field of Educational Data Mining (EDM). We consider the methodological profile of research in the early years of EDM, compared to in 2008 and 2009, and discuss trends and shifts in the research conducted by this community. In particular, we discuss the increased emphasis on prediction, the emergence of work using existing models to make scientific discoveries ("discovery with models"), and the reduction in the frequency of relationship mining within the EDM community. We discuss two ways that researchers have attempted to categorize the diversity of research in educational data mining research, and review the types of research problems that these methods have been used to address. The most cited papers in EDM between 1995 and 2005 are listed, and their influence on the EDM community (and beyond the EDM community) is discussed.

1,217 citations

Proceedings ArticleDOI
29 Apr 2012
TL;DR: An early intervention solution for collegiate faculty called Course Signals, developed to allow instructors the opportunity to employ the power of learner analytics to provide real-time feedback to a student, is discussed.
Abstract: In this paper, an early intervention solution for collegiate faculty called Course Signals is discussed. Course Signals was developed to allow instructors the opportunity to employ the power of learner analytics to provide real-time feedback to a student. Course Signals relies not only on grades to predict students' performance, but also demographic characteristics, past academic history, and students' effort as measured by interaction with Blackboard Vista, Purdue's learning management system. The outcome is delivered to the students via a personalized email from the faculty member to each student, as well as a specific color on a stoplight -- traffic signal -- to indicate how each student is doing. The system itself is explained in detail, along with retention and performance outcomes realized since its implementation. In addition, faculty and student perceptions will be shared.

864 citations


"Analyzing undergraduate students' ..." refers background in this paper

  • ...), years of enrolment, delayed courses, type of dedication (full-time, part-time), and debt situation; ElGamal (2013) predicts students' grades in a programming course by considering different factors like the students' mathematical background, programming aptitude, problem solving skills, gender, prior experience, high school mathematics grade, locality, previous computer programming experience, and e-learning usage; Huang and Fang (2013) predict course performance on the basis of students' performance in prerequisite courses and midterm examinations; Romero, Lopez, Luna, and Ventura (2013) investigated the appropriateness of quantitative, qualitative and social network information about forum usage as well as the appropriateness of classical classification algorithms and clustering algorithms to predict students' success or failure in a course; Arnold and Pistilli (2012) provide an early intervention solution for difficult courses based on students' activity in a Learning Management System. A number of studies predict students' passing/failing or overall academic achievement (total marks/CGPA) at the end of a degree programme; these studies are described in greater detail in the ‘Related work’ section. In clustering, the goal is to group objects into classes of similar objects. Though clustering has been used in educational data mining for a wide variety of tasks, an interesting sub-area is grouping students to study patterns of typical behaviours. The work by Cobo et al. (2012) finds typical behaviours in forums such as high-level workers, i.e. students that read all messages and post many messages in the forum, or lurkers, i.e. students who read all messages without posting any; Bower (2010) identifies groups of students with similar performance from Kindergarten till the end of high school; while Talavera and Gaudioso (2004) cluster students' interaction data to build profiles of students. 
Distillation of data for human judgment accords with what others call overview statistics and visualizations (Baker, 2010). Its aim is to help in understanding the results of analyses. For example, Elkina, Fortenbacher, and Merceron (2013) use an intuitive visualization of analytic results that provides insight about learning processes to teachers, E-learning providers and researchers. Bower's (2010) work combines dendrograms with heat map to provide an intuitive visualization of distinctive groups of students....


Journal ArticleDOI
TL;DR: To determine how the selection of instances and attributes, the use of different classification algorithms and the date when data is gathered affect the accuracy and comprehensibility of the prediction, a new Moodle module for gathering forum indicators was developed and different executions were carried out.
Abstract: On-line discussion forums constitute communities of people learning from each other, which not only inform the students about their peers' doubts and problems but can also inform instructors about their students' knowledge of the course contents. In fact, nowadays there is increasing interest in the use of discussion forums as an indicator of student performance. In this respect, this paper proposes the use of different data mining approaches for improving prediction of students' final performance starting from participation indicators in both quantitative, qualitative and social network forums. Our objective is to determine how the selection of instances and attributes, the use of different classification algorithms and the date when data is gathered affect the accuracy and comprehensibility of the prediction. A new Moodle's module for gathering forum indicators was developed and different executions were carried out using real data from 114 university students during a first-year course in computer science. A representative set of traditional classification algorithms have been used and compared versus classification via clustering algorithms for predicting whether students will pass or fail the course on the basis of data about their forum usage. The results obtained indicate the suitability of performing both a final prediction at the end of the course and an early prediction before the end of the course; of applying clustering plus class association rules mining instead of traditional classification for obtaining highly interpretable student performance models; and of using a subset of attributes instead of all available attributes, and not all forum messages but only students' messages with content related to the subject of the course for improving classification accuracy.

485 citations