Showing papers on "Educational data mining published in 2020"


Journal ArticleDOI
TL;DR: The current state of the art in data mining in education is provided by reviewing the main publications, the key milestones, the knowledge discovery cycle, the main educational environments, the specific tools, the free available datasets, the most used methods, themain objectives, and the future trends in this research area.
Abstract: This survey is an updated and improved version of the previous one published in 2013 in this journal with the title “data mining in education”. It reviews in a comprehensible and very general way how Educational Data Mining and Learning Analytics have been applied over educational data. In the last decade, this research area has evolved enormously and a wide range of related terms are now used in the bibliography such as Academic Analytics, Institutional Analytics, Teaching Analytics, Data‐Driven Education, Data‐Driven Decision‐Making in Education, Big Data in Education, and Educational Data Science. This paper provides the current state of the art by reviewing the main publications, the key milestones, the knowledge discovery cycle, the main educational environments, the specific tools, the free available datasets, the most used methods, the main objectives, and the future trends in this research area.

350 citations


Journal ArticleDOI
01 Jan 2020
TL;DR: A comprehensive and systematic review of influential AIEd studies indicated that there was a continually increasing interest in and impact of AIEd research, but that little work had been conducted to bring deep learning technologies into educational contexts.
Abstract: Considering the increasing importance of Artificial Intelligence in Education (AIEd) and the absence of a comprehensive review on it, this research conducts a comprehensive and systematic review of influential AIEd studies. We analyzed 45 articles in terms of annual distribution, leading journals, institutions, countries/regions, the most frequently used terms, as well as the theories and technologies adopted. We also evaluated definitions of AIEd from broad and narrow perspectives and clarified the relationships among AIEd, Educational Data Mining, Computer-Based Education, and Learning Analytics. Results indicated that: 1) there was a continually increasing interest in and impact of AIEd research; 2) little work had been conducted to bring deep learning technologies into educational contexts; 3) traditional AI technologies, such as natural language processing, were commonly adopted in educational contexts, while more advanced techniques were rarely adopted; and 4) there was a lack of studies that both employ AI technologies and engage deeply with educational theories. Findings suggested that scholars should 1) explore the potential of applying AI in physical classroom settings; 2) work to recognize detailed entailment relationships between learners’ answers and the desired conceptual understanding within intelligent tutoring systems; 3) pay more attention to the adoption of advanced deep learning algorithms such as generative adversarial networks and deep neural networks; 4) explore the potential of NLP in promoting precision or personalized education; 5) combine biomedical detection and imaging technologies such as electroencephalography, and target issues concerning learners during the learning process; and 6) closely integrate the application of AI technologies with educational theories.

171 citations


Journal ArticleDOI
TL;DR: To exploit the full potential of student exam performance prediction, it was concluded that adequate data acquisition functionality and student interaction with the learning environment are prerequisites for ensuring a sufficient amount of data for analysis.
Abstract: The recent increase in the availability of learning data has given educational data mining importance and momentum as a means to better understand and optimize the learning process and the environments in which it occurs. The aim of this paper is to provide a comprehensive analysis and comparison of state-of-the-art supervised machine learning techniques applied to student exam performance prediction, i.e., discovering students at “high risk” of dropping out of the course and predicting their future achievements, such as final exam scores. For both classification and regression tasks, the overall highest precision was obtained with artificial neural networks fed with student engagement data and past performance data, while the use of demographic data did not show a significant influence on the precision of predictions. To exploit the full potential of student exam performance prediction, it was concluded that adequate data acquisition functionality and student interaction with the learning environment are prerequisites for ensuring a sufficient amount of data for analysis.

170 citations
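A minimal sketch, not the paper's own code, of the kind of pipeline this abstract describes: neural networks trained on engagement and past-performance features for both the at-risk classification task and the final-score regression task. The feature names, synthetic data and network sizes are illustrative assumptions.

```python
# Sketch only: MLP classifier for the at-risk label and MLP regressor for the
# final score, trained on assumed engagement/past-performance features.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier, MLPRegressor
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=(n, 3))  # assumed features: logins, time on task, past GPA
final_score = 50 + 10 * X @ np.array([1.0, 0.5, 2.0]) + rng.normal(scale=5, size=n)
at_risk = (final_score < 50).astype(int)  # "high risk" label derived from the score

X_tr, X_te, y_tr, y_te, s_tr, s_te = train_test_split(
    X, at_risk, final_score, test_size=0.3, random_state=0)

clf = make_pipeline(StandardScaler(),
                    MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000, random_state=0))
reg = make_pipeline(StandardScaler(),
                    MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000, random_state=0))
clf.fit(X_tr, y_tr)
reg.fit(X_tr, s_tr)
print("at-risk classification accuracy:", round(clf.score(X_te, y_te), 3))
print("final-score regression R^2:", round(reg.score(X_te, s_te), 3))
```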


Journal ArticleDOI
TL;DR: A decade of research work conducted between 2010 and November 2020 was surveyed to present a fundamental understanding of the intelligent techniques used for the prediction of student performance, where academic success is strictly measured using student learning outcomes as discussed by the authors.
Abstract: The prediction of student academic performance has drawn considerable attention in education. However, although the learning outcomes are believed to improve learning and teaching, prognosticating the attainment of student outcomes remains underexplored. A decade of research work conducted between 2010 and November 2020 was surveyed to present a fundamental understanding of the intelligent techniques used for the prediction of student performance, where academic success is strictly measured using student learning outcomes. The electronic bibliographic databases searched include ACM, IEEE Xplore, Google Scholar, Science Direct, Scopus, Springer, and Web of Science. Eventually, we synthesized and analyzed a total of 62 relevant papers with a focus on three perspectives, (1) the forms in which the learning outcomes are predicted, (2) the predictive analytics models developed to forecast student learning, and (3) the dominant factors impacting student outcomes. The best practices for conducting systematic literature reviews, e.g., PICO and PRISMA, were applied to synthesize and report the main results. The attainment of learning outcomes was measured mainly as performance class standings (i.e., ranks) and achievement scores (i.e., grades). Regression and supervised machine learning models were frequently employed to classify student performance. Finally, student online learning activities, term assessment grades, and student academic emotions were the most evident predictors of learning outcomes. We conclude the survey by highlighting some major research challenges and suggesting a summary of significant recommendations to motivate future works in this field.

116 citations


Journal ArticleDOI
TL;DR: It is demonstrated that applicants’ early university performance can be predicted before admission based on certain pre-admission criteria (high school grade average, Scholastic Achievement Admission Test score, and General Aptitude Test score) and the Artificial Neural Network technique has an accuracy rate above 79%, making it superior to other classification techniques considered.
Abstract: An admissions system based on valid and reliable admissions criteria is very important to select candidates likely to perform well academically at institutions of higher education. This study focuses on ways to support universities in admissions decision making using data mining techniques to predict applicants’ academic performance at university. A data set of 2,039 students enrolled in a Computer Science and Information College of a Saudi public university from 2016 to 2019 was used to validate the proposed methodology. The results demonstrate that applicants’ early university performance can be predicted before admission based on certain pre-admission criteria (high school grade average, Scholastic Achievement Admission Test score, and General Aptitude Test score). The results also show that Scholastic Achievement Admission Test score is the pre-admission criterion that most accurately predicts future student performance. Therefore, this score should be assigned more weight in admissions systems. We also found that the Artificial Neural Network technique has an accuracy rate above 79%, making it superior to other classification techniques considered (Decision Trees, Support Vector Machines, and Naive Bayes).

102 citations
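A hedged sketch of the classifier comparison the abstract reports: an artificial neural network against decision trees, SVM and naive Bayes on pre-admission criteria. The synthetic data, feature scales and the assumed link between aptitude scores and success are illustrative only; the study's real admission data is not reproduced.

```python
# Illustrative comparison of the four classifier families mentioned in the study,
# on assumed pre-admission features (high-school GPA, SAAT, GAT) and synthetic labels.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(1)
n = 2039  # same cohort size as the study; the data itself is synthetic
X = rng.uniform(60, 100, size=(n, 3))  # assumed scales for GPA, SAAT, GAT
# assumption: early-university success correlates most with the achievement test
y = (0.2 * X[:, 0] + 0.5 * X[:, 1] + 0.3 * X[:, 2] + rng.normal(0, 5, n) > 85).astype(int)

models = {
    "ANN": make_pipeline(StandardScaler(),
                         MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=1)),
    "DecisionTree": DecisionTreeClassifier(random_state=1),
    "SVM": make_pipeline(StandardScaler(), SVC()),
    "NaiveBayes": GaussianNB(),
}
for name, model in models.items():
    acc = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name}: {acc:.3f}")
```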


Book ChapterDOI
08 Apr 2020
TL;DR: This review examines how researchers have applied data mining in the past and the most recent trends in data mining in educational research, and evaluates the prospects of employing machine learning in the field of education.
Abstract: Educational data mining is a developing field concerned with methods for examining the various kinds of data obtained from educational settings. Data mining plays a vital part in education, particularly when behavior is assessed in an online learning setting, because it can analyze and identify hidden information in the data, a task that is very difficult and time-consuming when performed manually. This review examines how researchers have applied data mining in the past and the most recent trends in data mining in educational research, and evaluates the prospects of employing machine learning in the field of education. The various limitations inherent in current research are examined and recommendations are made for future research.

88 citations


Journal ArticleDOI
TL;DR: Two machine learning approaches, logistic regression and decision trees, are applied to predict student dropout at the Karlsruhe Institute of Technology (KIT), with decision trees producing slightly better results than logistic regression.
Abstract: We apply two machine learning approaches, logistic regression and decision trees, to predict student dropout at the Karlsruhe Institute of Technology (KIT). The models are computed on the bas...

83 citations
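A minimal sketch of the two approaches compared in this study, logistic regression and a decision tree for dropout prediction. The features (first-semester GPA, credits earned, age) and the synthetic data are assumptions for illustration, not KIT data.

```python
# Sketch: compare logistic regression and a decision tree on a synthetic dropout task.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(2)
n = 1000
X = np.column_stack([rng.normal(size=n),        # assumed: standardized first-semester GPA
                     rng.integers(0, 30, n),    # assumed: credits earned
                     rng.integers(18, 35, n)])  # assumed: age at enrolment
dropout = (X[:, 0] + 0.05 * (15 - X[:, 1]) + rng.normal(0, 1, n) < -0.5).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, dropout, test_size=0.3, random_state=2)
for model in (LogisticRegression(max_iter=1000),
              DecisionTreeClassifier(max_depth=4, random_state=2)):
    model.fit(X_tr, y_tr)
    auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
    print(type(model).__name__, "AUC:", round(auc, 3))
```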


Journal ArticleDOI
TL;DR: This work proposes a systematic approach based on the Gini index and p-value to select a suitable ensemble learner from a combination of six potential machine learning algorithms.
Abstract: A plethora of past research has focused on predicting students’ performance in order to support their development. Many institutions are focused on improving performance and education quality, and this can be achieved by utilizing data mining techniques to analyze and predict students’ performance and to determine possible factors that may affect their final marks. To address this issue, this work starts by thoroughly exploring and analyzing two different datasets at two separate stages of course delivery (20% and 50%, respectively) using multiple graphical, statistical, and quantitative techniques. The feature analysis provides insights into the nature of the different features considered and helps in the choice of the machine learning algorithms and their parameters. Furthermore, this work proposes a systematic approach based on the Gini index and p-value to select a suitable ensemble learner from a combination of six potential machine learning algorithms. Experimental results show that the proposed ensemble models achieve high accuracy and a low false positive rate at all stages for both datasets.

82 citations
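One plausible reading of how a Gini measure and a p-value could jointly inform the choice of a base learner, sketched with two candidates instead of six: the Gini here is computed as 2 * AUC - 1 from cross-validated scores, and a paired t-test gives the p-value of the difference. This is an assumption about the flavour of the procedure, not the paper's exact method, and the data is synthetic.

```python
# Hedged sketch: rank candidate base learners by a Gini measure (2*AUC - 1) and test
# whether their difference is statistically meaningful with a paired t-test.
import numpy as np
from scipy.stats import ttest_rel
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=800, n_features=10, random_state=3)
candidates = {"LogReg": LogisticRegression(max_iter=1000),
              "RandomForest": RandomForestClassifier(random_state=3)}

auc = {name: cross_val_score(m, X, y, cv=10, scoring="roc_auc")
       for name, m in candidates.items()}
for name, scores in auc.items():
    print(name, "mean Gini:", round(2 * scores.mean() - 1, 3))

# paired test over the fold-wise AUC scores of the two candidates
_, p = ttest_rel(auc["LogReg"], auc["RandomForest"])
print("p-value of the paired comparison:", round(p, 4))
```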


Proceedings ArticleDOI
19 Oct 2020
TL;DR: A novel Relation-aware self-attention model for Knowledge Tracing that outperforms state-of-the-art knowledge tracing methods; its interpretable attention weights help visualize the relation between interactions and temporal patterns in the human learning process.
Abstract: The world has transitioned into a new phase of online learning in response to the recent COVID-19 pandemic. Now more than ever, it has become paramount to push the limits of online learning in every way to keep the education system flourishing. One crucial component of online learning is Knowledge Tracing (KT). The aim of KT is to model a student's knowledge level based on their answers to a sequence of exercises, referred to as interactions. Students acquire their skills while solving exercises, and each such interaction has a distinct impact on the student's ability to solve a future exercise. This impact is characterized by 1) the relation between the exercises involved in the interactions and 2) student forget behavior. Traditional studies on knowledge tracing do not explicitly model both components jointly to estimate the impact of these interactions. In this paper, we propose a novel Relation-aware self-attention model for Knowledge Tracing (RKT). We introduce a relation-aware self-attention layer that incorporates contextual information. This contextual information integrates the exercise relation information, obtained from the exercises' textual content and student performance data, with forget behavior information, modeled through an exponentially decaying kernel function. Extensive experiments on three real-world datasets, among which two new collections are released to the public, show that our model outperforms state-of-the-art knowledge tracing methods. Furthermore, the interpretable attention weights help visualize the relation between interactions and temporal patterns in the human learning process.

71 citations
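A schematic re-implementation, not the authors' released code, of the core idea of weighting attention over past interactions with an exponentially decaying kernel so that older interactions contribute less (the forget-behaviour component). The decay rate and time units are assumed.

```python
# Sketch: attention scores over past interactions are damped by an exponentially
# decaying kernel of the time gap before the softmax normalization.
import numpy as np

def decayed_attention(scores, time_gaps, theta=0.1):
    """scores: raw relevance of each past interaction to the current exercise.
    time_gaps: elapsed time since each past interaction (assumed unit: hours).
    theta: assumed decay rate; larger theta means faster forgetting."""
    forget = np.exp(-theta * time_gaps)      # exponentially decaying kernel
    weighted = scores * forget               # older interactions matter less
    weighted = weighted - weighted.max()     # numerical stability for the softmax
    attn = np.exp(weighted) / np.exp(weighted).sum()
    return attn

scores = np.array([1.2, 0.4, 2.0, 0.7])      # e.g. textual similarity of past exercises
time_gaps = np.array([30.0, 20.0, 5.0, 1.0]) # hours since each interaction (assumed)
print(decayed_attention(scores, time_gaps))
```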


Journal ArticleDOI
TL;DR: This review outlines recent advances and trends in this area so that researchers can more efficiently identify the empirical studies and research patterns among different studies in the field of SRL.
Abstract: Over the last decade, research in self-regulated learning (SRL) and educational psychology has proliferated. Researchers and educators have focused on how to help learners grow their SRL skills in both face-to-face and e-learning environments. In addition, recent studies and meta-analyses have greatly contributed to the domain knowledge on the use of SRL strategies and how they boost learners' academic performance. However, there has been little systematic review of the literature on the techniques and tools used to measure SRL on e-learning platforms. This review sought to outline recent advances and trends in this area so that researchers can more efficiently identify the empirical studies and research patterns among different studies in the field of SRL. The findings from this study are consistent with existing empirical evidence that traditional methods designed for classroom support are being used to measure SRL in e-learning environments. Few studies have used learning analytics and educational data mining (EDM) techniques to measure and promote SRL strategies for learners. The paper finally points out the gaps in the tools presently used to measure and support SRL on learning management systems and recommends further studies on the areas of EDM that can support SRL.

63 citations


Journal ArticleDOI
TL;DR: The experimental results demonstrate that the prognosis of students at risk of failure can be achieved with satisfactory accuracy in most cases, provided that datasets of students who have attended other related courses are available.
Abstract: Transferring knowledge from one domain to another has gained a lot of attention among scientists in recent years. Transfer learning is a machine learning approach aiming to exploit the knowledge retrieved from one problem for improving the predictive performance of a learning model for a different but related problem. This is particularly the case when there is a lack of data regarding a problem, but there is plenty of data about another related one. To this end, the present study intends to investigate the effectiveness of transfer learning from deep neural networks for the task of students’ performance prediction in higher education. Since building predictive models in the Educational Data Mining field through transfer learning methods has been poorly studied so far, we consider this study as an important step in this direction. Therefore, a plethora of experiments were conducted based on data originating from five compulsory courses of two undergraduate programs. The experimental results demonstrate that the prognosis of students at risk of failure can be achieved with satisfactory accuracy in most cases, provided that datasets of students who have attended other related courses are available.
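A schematic transfer-learning sketch under assumed data and architecture: pre-train a small dense network on a data-rich source course, then freeze the lower layers and fine-tune on a smaller target course, mirroring the strategy the abstract describes at a high level. The courses, labels and layer sizes below are invented.

```python
# Sketch: pre-train on the "source" course, freeze lower layers, fine-tune on the "target".
import numpy as np
from tensorflow import keras

rng = np.random.default_rng(4)
Xs = rng.normal(size=(2000, 8))                      # source course: plentiful data (synthetic)
ys = (Xs[:, 0] + Xs[:, 1] > 0).astype(int)
Xt = rng.normal(size=(150, 8))                       # target course: scarce data (synthetic)
yt = (Xt[:, 0] + Xt[:, 1] > 0).astype(int)

model = keras.Sequential([
    keras.layers.Input(shape=(8,)),
    keras.layers.Dense(32, activation="relu"),
    keras.layers.Dense(16, activation="relu", name="transferable"),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(Xs, ys, epochs=5, verbose=0)               # pre-train on the related course

for layer in model.layers[:-1]:                      # freeze everything but the output layer
    layer.trainable = False
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(Xt, yt, epochs=10, verbose=0)              # fine-tune on the target course
print(model.evaluate(Xt, yt, verbose=0))
```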

Journal ArticleDOI
TL;DR: In this study, artificial neural networks were used to predict the performance of 3,518 university students from gender, content score, time spent on content, number of content accesses, homework score, number of live sessions attended, total time spent in live sessions, and archived-course variables.
Abstract: Prediction of student performance is one of the most important subjects of educational data mining. Artificial neural networks are seen as an effective tool for predicting student performance in e-learning environments. Studies with artificial neural networks generally make performance predictions based on student scores, but do not focus on students' use of the learning management system. In this study, artificial neural networks were used to predict the performance of 3,518 university students who were studying in and actively using a learning management system, based on the variables of gender, content score, time spent on content, number of content accesses, homework score, number of live sessions attended, total time spent in live sessions, number of archived courses accessed, and total time spent in archived courses. Since it is difficult to interpret how much the input variables of an artificial neural network contribute to predicting the output variable, these networks are called black boxes; therefore, this study also examined the contribution of the input variables to the prediction of the output variable. The artificial neural network created in the study makes predictions with an accuracy of 80.47%. Finally, it was found that the number of live sessions attended, the number of archived courses accessed, and the time spent on content contributed most to the prediction of the output variable.
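A sketch of one common way to open the black box mentioned here, permutation importance over a trained network's inputs; the study's own contribution analysis may use a different technique. The variable names and synthetic data are assumptions (only the cohort size matches the abstract).

```python
# Sketch: train an MLP, then estimate each input variable's contribution with
# permutation importance on held-out data.
import numpy as np
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(5)
n = 3518  # same cohort size as the study; the data is synthetic
cols = ["live_sessions_attended", "archived_courses_accessed", "time_on_content",
        "content_score", "homework_score"]
X = rng.normal(size=(n, len(cols)))
y = (X[:, 0] + 0.8 * X[:, 1] + 0.6 * X[:, 2] + rng.normal(0, 1, n) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=5)
net = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=5).fit(X_tr, y_tr)
result = permutation_importance(net, X_te, y_te, n_repeats=10, random_state=5)
for name, imp in sorted(zip(cols, result.importances_mean), key=lambda t: -t[1]):
    print(f"{name}: {imp:.3f}")
```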

Journal ArticleDOI
TL;DR: The use of supervised Machine Learning for text classification to predict students’ final course grades in a hybrid Advanced Statistics course is demonstrated, along with the potential of using ML-classified messages to identify students at risk of course failure.
Abstract: This paper demonstrated the use of supervised Machine Learning (ML) for text classification to predict students’ final course grades in a hybrid Advanced Statistics course and exhibited...
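A minimal sketch of supervised text classification for flagging at-risk students from their written messages, the general technique the abstract names. The example messages, labels, TF-IDF features and logistic-regression classifier are all illustrative assumptions, not the paper's data or model.

```python
# Sketch: TF-IDF features plus a linear classifier over student messages.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

messages = ["I am confused by the homework on regression",
            "Enjoyed the lecture, the examples were clear",
            "I cannot keep up with the pace of this course",
            "The group project is going well"]
at_risk = [1, 0, 1, 0]  # invented labels for illustration only

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(messages, at_risk)
print(model.predict(["I am completely lost and behind on everything"]))
```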

Journal ArticleDOI
TL;DR: It is argued that statistical learning techniques should be selected to maximize interpretability and should contribute to the authors' understanding of educational and learning phenomena; hence, in most cases, educational data mining and learning analytics researchers should aim for explanation over prediction.
Abstract: Large swaths of data are readily available in various fields, and education is no exception. In tandem, the impetus to derive meaningful insights from data gains urgency. Recent advances in deep learning, particularly in the area of voice and image recognition and so-called complete-knowledge games like chess, go, and StarCraft, have resulted in a flurry of research. Using two educational datasets, we explore the utility and applicability of deep learning for educational data mining and learning analytics. We compare the predictive accuracy of popular deep learning frameworks/libraries, including Keras, Theano, TensorFlow, fast.ai, and PyTorch. Experimental results reveal that performance, as assessed by predictive accuracy, varies depending on the optimizer used. Further, additional experiments tuning network parameters yield similar results. Moreover, we find that deep learning displays comparable performance to other machine learning algorithms such as support vector machines, k-nearest neighbors, naive Bayes classifier, and logistic regression. We argue that statistical learning techniques should be selected to maximize interpretability and should contribute to our understanding of educational and learning phenomena; hence, in most cases, educational data mining and learning analytics researchers should aim for explanation over prediction.

Journal ArticleDOI
TL;DR: Both ensemble and filtering approaches demonstrated substantial improvement in predicting student performance over conventional classifiers, and two novel prediction models are put forward after a performance analysis of each approach.

Journal ArticleDOI
TL;DR: The concept of online learning has witnessed an increase in the higher education sector, where enrolment rates in online courses have significantly grown in recent years as mentioned in this paper, according to the literatur...
Abstract: The concept of online learning has witnessed an increase in the higher education sector, where enrolment rates in online courses have significantly grown in recent years. According to the literatur...

Journal ArticleDOI
TL;DR: This survey presents an in-depth analysis of the state-of-the-art literature in the field of SDP, under the central perspective of machine learning predictive algorithms, and proposes a comprehensive hierarchical classification of existing literature that follows the workflow of design choices in the SDP.
Abstract: The recent diffusion of online education (both MOOCs and e-courses) has led to an increased economic and scientific interest in e-learning environments. As widely documented, online students have a much higher chance of dropping out than those attending conventional classrooms. It is of paramount interest for institutions, students, and faculty members to find more efficient methodologies to mitigate withdrawals. Following the rise of attention on the Student Dropout Prediction (SDP) problem, the literature has witnessed a significant increase in contributions to this subject. In this survey, we present an in-depth analysis of the state-of-the-art literature in the field of SDP, under the central, though not exclusive, perspective of machine learning predictive algorithms. Our main contributions are the following: (i) we propose a comprehensive hierarchical classification of existing literature that follows the workflow of design choices in SDP; (ii) to facilitate the comparative analysis, we introduce a formal notation to describe in a uniform way the alternative dropout models investigated by the researchers in the field; (iii) we analyse some other relevant aspects to which the literature has given less attention, such as evaluation metrics, gathered data, and privacy concerns; (iv) we pay specific attention to deep sequential machine learning methods, recently proposed by some contributors, which represent one of the most effective solutions in this area. Overall, our survey provides novice readers addressing these topics with practical guidance on design choices, and directs researchers to the most promising approaches, highlighting current limitations and open challenges in the field.

Book ChapterDOI
06 Jul 2020
TL;DR: A tool that exploits machine learning techniques to predict the dropout of first-year undergraduate students; it can be used either during the application phase or during the first year.
Abstract: Among the many open problems in the learning process, student dropout is one of the most complicated and damaging, both for the student and the institution, and being able to predict it could help to alleviate its social and economic costs. To address this problem we developed a tool that, by exploiting machine learning techniques, predicts the dropout of a first-year undergraduate student. The proposed tool estimates the risk of quitting an academic course, and it can be used either during the application phase or during the first year, since it selectively accounts for personal data, academic records from secondary school, and first-year course credits. Our experiments were performed on real data of students from eleven schools of a major university.

Journal ArticleDOI
01 Feb 2020-Symmetry
TL;DR: With the results of this research, teachers can use the mid-term forecasting system to find high-risk groups during the semester and help remedy their learning behaviors.
Abstract: Across traditional face-to-face courses, asynchronous distance learning, synchronous live learning, and blended learning approaches, learning can become more learner-centered, enabling students to learn anytime and anywhere. In this study, we applied educational data mining to explore the learning behaviors in data generated by students in a blended learning course. The experimental data were collected from two classes of Python programming-related courses for first-year students at a university in northern Taiwan. During the semester, high-risk learners could be predicted accurately from data generated in the blended educational environment. The f1-score of the random forest model was 0.83, which was higher than the f1-scores of the logistic regression and decision tree models. The model built in this study could be extrapolated to other courses to predict students' learning performance, where it achieved an f1-score of 0.77. Furthermore, we used machine learning and symmetry-based learning algorithms to explore learning behaviors. Using a hierarchical clustering heat map, this study could define students' learning patterns, including the positive interactive group, stable learning group, positive teaching material group, and negative learning group. These groups also corresponded with the students' questionnaire responses. With the results of this research, teachers can use the mid-term forecasting system to find high-risk groups during the semester and help remedy their learning behaviors.
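A sketch combining the two techniques the abstract mentions: a random forest scored with the f1 metric for at-risk prediction, and hierarchical clustering of behaviour features to surface learning-pattern groups. Feature names, data and the number of clusters are assumptions.

```python
# Sketch: f1-scored random forest for at-risk prediction, plus hierarchical
# clustering of the same behaviour features to group learning patterns.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(6)
n = 120
X = rng.normal(size=(n, 4))  # assumed: video views, forum posts, quiz attempts, downloads
y = (X[:, 0] + X[:, 2] + rng.normal(0, 1, n) < 0).astype(int)  # synthetic "high risk" label

f1 = cross_val_score(RandomForestClassifier(random_state=6), X, y, cv=5, scoring="f1").mean()
print("random forest f1:", round(f1, 2))

# hierarchical clustering of students by behaviour (the heat-map grouping step)
Z = linkage(X, method="ward")
groups = fcluster(Z, t=4, criterion="maxclust")
print("students per cluster:", np.bincount(groups)[1:])
```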

Journal ArticleDOI
TL;DR: Potential is found for predicting individual student interim and final assessment marks in small student cohorts with very limited attributes, and these predictions could be useful in supporting module leaders to identify students potentially “at risk”.
Abstract: The measurement of student performance during their progress through university study provides academic leadership with critical information on each student’s likelihood of success. Academics have traditionally used their interactions with individual students through class activities and interim assessments to identify those “at risk” of failure/withdrawal. However, modern university environments, offering easy on-line availability of course material, may see reduced lecture/tutorial attendance, making such identification more challenging. Modern data mining and machine learning techniques provide increasingly accurate predictions of student examination assessment marks, although these approaches have focussed upon large student populations and wide ranges of data attributes per student. However, many university modules comprise relatively small student cohorts, with institutional protocols limiting the student attributes available for analysis. It appears that very little research attention has been devoted to this area of analysis and prediction. We describe an experiment conducted on a final-year university module student cohort of 23, where individual student data are limited to lecture/tutorial attendance, virtual learning environment accesses and intermediate assessments. We found potential for predicting individual student interim and final assessment marks in small student cohorts with very limited attributes and that these predictions could be useful to support module leaders in identifying students potentially “at risk.”

Journal ArticleDOI
TL;DR: It is argued that higher education institutions are paradigms of information fiduciaries and have a special responsibility to their students, and cases when learning analytics violate an institution's responsibility to its students are analyzed.
Abstract: Higher education institutions are mining and analyzing student data to effect educational, political, and managerial outcomes. Done under the banner of “learning analytics,” this work can—and often does—surface sensitive data and information about, inter alia, a student's demographics, academic performance, offline and online movements, physical fitness, mental wellbeing, and social network. With these data, institutions and third parties are able to describe student life, predict future behaviors, and intervene to address academic or other barriers to student success (however defined). Learning analytics, consequently, raise serious issues concerning student privacy, autonomy, and the appropriate flow of student data. We argue that issues around privacy lead to valid questions about the degree to which students should trust their institution to use learning analytics data and other artifacts (algorithms, predictive scores) with their interests in mind. We argue that higher education institutions are paradigms of information fiduciaries. As such, colleges and universities have a special responsibility to their students. In this article, we use the information fiduciary concept to analyze cases when learning analytics violate an institution's responsibility to its students.

Journal ArticleDOI
TL;DR: A survey of recent research publications that use Soft Computing methods to answer education-related problems based on the analysis of educational data ‘mined’ mainly from interactive/e-learning systems finds that top research questions in education today seeking answers through soft computing methods refer directly to the issue of quality.
Abstract: The aim of this paper is to survey recent research publications that use Soft Computing methods to answer education-related problems based on the analysis of educational data ‘mined’ mainly from interactive/e-learning systems. Such systems are known to generate and store large volumes of data that can be exploited to assess the learner, the system and the quality of the interaction between them. Educational Data Mining (EDM) and Learning Analytics (LA) are two distinct and yet closely related research areas that focus on this data aiming to address open education-related questions or issues. Besides ‘classic’ data analysis methods such as clustering, classification, identification or regression/analysis of variances, soft computing methods are often employed by EDM and LA researchers to achieve their various tasks. Their very nature as iterative optimization algorithms that avoid the exhaustive search of the solutions space and settle for possibly suboptimal solutions at realistic time and effort, along with their heavy reliance on rich data sets for training, makes soft computing methods ideal tools for the EDM or LA type of problems. Decision trees, random forests, artificial neural networks, fuzzy logic, support vector machines and genetic/evolutionary algorithms are a few examples of soft computing approaches that, given enough data, can successfully deal with uncertainty, qualitatively stated problems and incomplete, imprecise or even contradictory data sets – features that the field of education shares with all humanities/social sciences fields. The present review focuses, therefore, on recent EDM and LA research that employs at least one soft computing method, and aims to identify (i) the major education problems/issues addressed and, consequently, research goals/objectives set, (ii) the learning contexts/settings within which relevant research and educational interventions take place, (iii) the relation between classic and soft computing methods employed to solve specific problems/issues, and (iv) the means of dissemination (publication journals) of the relevant research results. Selection and analysis of a body of 300 journal publications reveals that the top research questions in education today seeking answers through soft computing methods refer directly to the issue of quality – a critical issue given the currently dominant educational/pedagogical models that favor e-learning or computer- or technology-mediated learning contexts. Moreover, the results identify the most frequently used methods and tools within EDM/LA research and, comparatively, within their soft computing subsets, along with the major journals in which relevant research is being published worldwide. Weaknesses and issues that need further attention in order to fully exploit the benefits of research results to improve both the learning experience and the learning outcomes are discussed in the conclusions.

Journal ArticleDOI
TL;DR: The suitability of using decision trees in data originating from large-scale assessments is analyzed, and the obtained factors associated with school effectiveness are examined.

Journal ArticleDOI
TL;DR: Practical information regarding the main issues and a guideline for policy makers and e-learning developers to fully utilise e-learning, particularly in newly established institutions or developing countries, are offered.
Abstract: Electronic learning (e‑learning) plays a significant role in improving the efficiency of the education process. However, in many cases in developing countries, technology transfer without consideration of technology acceptance factors has limited the impact of e‑learning and the expected outcome of the education process. Therefore, this shift in learning method has been met with low enthusiasm from academic staff and students owing to its low perceived usefulness and perceived ease‑of‑use. The University of Kufa (UoK) in Iraq is considered a good case study because it has implemented the e‑learning platform since 2013. The UoK platform is based on open‑source Moodle owing to the latter’s advantages, such as low implementation cost, open community for support and continuous update and development. To identify and evaluate the challenges, this study uses a questionnaire survey that targets the level of adoption, implementation, familiarity and technology acceptance of staff and students. A total of 242 educators participate in the survey, and the data are subsequently analysed. Important information is extracted using data mining techniques, namely clustering and decision trees. One of the main crucial factors extracted from the analysis results is the perception that social media is easier to use compared with a dedicated e‑learning platform such as Moodle. This factor may also discourage educators/learners from adopting an offered e‑learning platform, regardless of actual usefulness, motivation and training programs. Therefore, this paper offers practical information regarding the main issues and a guideline to fully utilise e‑learning for policy makers and e‑learning developers, particularly in newly established institutions or developing countries.

Journal ArticleDOI
01 Nov 2020
TL;DR: In this article, the authors compared the performances of several supervised machine learning algorithms, such as Decision Tree, Naive Bayes, Logistic Regression, Support Vector Machine, K-Nearest Neighbour, Sequential Minimal Optimization and Neural Network, and found that logistic regression classifier is the most accurate in predicting the exact final grades of students.
Abstract: Higher education institutions aim to forecast student success which is an important research subject. Forecasting student success can enable teachers to prevent students from dropping out before final examinations, identify those who need additional help and boost institution ranking and prestige. Machine learning techniques in educational data mining aim to develop a model for discovering meaningful hidden patterns and exploring useful information from educational settings. The key traditional characteristics of students (demographic, academic background and behavioural features) are the main essential factors that can represent the training dataset for supervised machine learning algorithms. In this study, we compared the performances of several supervised machine learning algorithms, such as Decision Tree, Naive Bayes, Logistic Regression, Support Vector Machine, K-Nearest Neighbour, Sequential Minimal Optimisation and Neural Network. We trained a model by using datasets provided by courses in the bachelor study programmes of the College of Computer Science and Information Technology, University of Basra, for academic years 2017–2018 and 2018–2019 to predict student performance on final examinations. Results indicated that logistic regression classifier is the most accurate in predicting the exact final grades of students (68.7% for passed and 88.8% for failed).


Journal ArticleDOI
TL;DR: An Educational Data Mining approach to detect and analyze factors linked to academic performance is proposed, and its potential contribution to supporting decision-making processes regarding educational policies is analyzed.
Abstract: International large-scale assessments, such as PISA, provide structured and static data. However, due to their extensive databases, several researchers treat them as a reference for Big Data in Education. With the goal of exploring which factors at the country, school and student level are most relevant in predicting student performance, this paper proposes an Educational Data Mining approach to detect and analyze factors linked to academic performance. To this end, we conducted a secondary data analysis and built decision trees (C4.5 algorithm) to obtain a predictive model of school performance. Specifically, we selected as predictor variables a set of socioeconomic, process and outcome variables from PISA 2018 and other sources (World Bank, 2020). Since the units of analysis were schools from all the countries included in PISA 2018 (n = 21,903), student and teacher predictor variables were imputed to the school database. Based on the available student performance scores in Reading, Math, and Science, we applied k-means clustering to obtain a categorized (three-category) target variable of global school performance. Results show the existence of two main branches in the decision tree, split according to the schools' mean socioeconomic status (SES). While performance in high-SES schools is influenced by educational factors such as metacognitive strategies or achievement motivation, performance in low-SES schools is affected to a greater extent by country-level socioeconomic indicators such as GDP, and individual educational indicators are relegated to a secondary level. Since this evidence is in line with and extends previous research, this work concludes by analyzing its potential contribution to supporting decision-making processes regarding educational policies.
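A sketch of the pipeline the abstract outlines: k-means over mean Reading, Math and Science scores to derive a three-category school-performance target, then a decision tree on school-level predictors. scikit-learn's CART tree stands in for C4.5 here, and all data, variable names and relationships are synthetic assumptions.

```python
# Sketch: derive a 3-class performance target with k-means, then fit a shallow tree
# on school-level predictors and print its rules.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(7)
n = 1000
scores = rng.normal(480, 60, size=(n, 3))  # mean Reading, Math, Science per school (synthetic)
ses = (scores.mean(axis=1) - 480) / 100 + rng.normal(0, 0.5, n)  # assumed SES-performance link
gdp = rng.normal(30, 10, n)                # illustrative country-level indicator

perf_class = KMeans(n_clusters=3, n_init=10, random_state=7).fit_predict(scores)
tree = DecisionTreeClassifier(max_depth=3, random_state=7).fit(
    np.column_stack([ses, gdp]), perf_class)
print(export_text(tree, feature_names=["school_SES", "country_GDP"]))
```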

Journal ArticleDOI
TL;DR: In this article, a multi-split approach based on Gini index and p-value was adopted to predict students' academic performance at two stages of course delivery (20% and 50% respectively).
Abstract: Predicting students’ academic performance has been a research area of interest in recent years, with many institutions focusing on improving the students’ performance and the education quality. The analysis and prediction of students’ performance can be achieved using various data mining techniques. Moreover, such techniques allow instructors to determine possible factors that may affect the students’ final marks. To that end, this work analyzes two different undergraduate datasets at two different universities. Furthermore, this work aims to predict the students’ performance at two stages of course delivery (20% and 50% respectively). This analysis allows for properly choosing the appropriate machine learning algorithms to use as well as optimize the algorithms’ parameters. Furthermore, this work adopts a systematic multi-split approach based on Gini index and p-value. This is done by optimizing a suitable bagging ensemble learner that is built from any combination of six potential base machine learning algorithms. It is shown through experimental results that the posited bagging ensemble models achieve high accuracy for the target group for both datasets.
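A hedged sketch of the kind of bagging ensemble this abstract evaluates, reporting accuracy and false positive rate; the Gini-index and p-value based selection of the base learner is not reproduced, and the data is a synthetic stand-in.

```python
# Sketch: a bagging ensemble (default decision-tree base learner) evaluated on
# accuracy and false positive rate, the metrics emphasised by the authors.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, confusion_matrix

X, y = make_classification(n_samples=600, n_features=12, random_state=9)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=9)

bag = BaggingClassifier(n_estimators=50, random_state=9)
bag.fit(X_tr, y_tr)
pred = bag.predict(X_te)
tn, fp, fn, tp = confusion_matrix(y_te, pred).ravel()
print("accuracy:", round(accuracy_score(y_te, pred), 3),
      "false positive rate:", round(fp / (fp + tn), 3))
```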

Journal ArticleDOI
TL;DR: The results obtained show that it is only feasible to directly transfer predictive models or apply them to different courses with an acceptable accuracy and without losing portability under some circumstances.
Abstract: Predicting students’ academic performance is one of the older challenges faced by the educational scientific community. However, most of the research carried out in this area has focused on obtaining the most accurate models for specific single courses, and only a few works have tried to discover under which circumstances a prediction model built on a source course can be used in other different but similar courses. Our motivation in this work is to study the portability of models obtained directly from Moodle logs of 24 university courses. The proposed method checks whether grouping similar courses by degree or by a similar level of usage of the activities recorded in the Moodle logs, and whether the use of numerical or categorical attributes, affect the portability of the prediction models. We carried out two experiments by executing a well-known classification algorithm over all the course datasets in order to obtain decision tree models and to test their portability to the other courses by comparing the obtained accuracy and loss-of-accuracy evaluation measures. The results show that it is only feasible to directly transfer predictive models, or apply them to different courses with acceptable accuracy and without losing portability, under some circumstances.
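A minimal sketch of the portability check described here: fit a decision tree on one course's Moodle-style activity counts, then measure the accuracy drop when the same model is applied unchanged to a similar course. Both courses below are synthetic and the feature names are assumptions.

```python
# Sketch: train on course A, evaluate unchanged on course B, compare the accuracies.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(8)

def make_course(n, shift=0.0):
    X = rng.poisson(lam=5 + shift, size=(n, 3))  # assumed: quiz, forum, resource accesses
    y = (X.sum(axis=1) + rng.normal(0, 2, n) > 15 + 3 * shift).astype(int)  # pass/fail
    return X, y

Xa, ya = make_course(300)             # source course
Xb, yb = make_course(300, shift=1.0)  # similar course with a different usage level

tree = DecisionTreeClassifier(max_depth=3, random_state=8).fit(Xa, ya)
print("accuracy on the source course:", round(tree.score(Xa, ya), 2))
print("accuracy when ported to the other course:", round(tree.score(Xb, yb), 2))
```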

Proceedings ArticleDOI
23 Mar 2020
TL;DR: The results suggest that complementing an ERS with an OLM can have a positive effect on student engagement and their perception about the effectiveness of the system despite potentially making the system harder to navigate.
Abstract: Educational recommender systems (ERSs) aim to adaptively recommend a broad range of personalised resources and activities to students that will most meet their learning needs. Commonly, ERSs operate as a "black box" and give students no insight into the rationale of their choice. Recent contributions from the learning analytics and educational data mining communities have emphasised the importance of transparent, understandable and open learner models (OLMs) that provide insight and enhance learners' understanding of interactions with learning environments. In this paper, we aim to investigate the impact of complementing ERSs with transparent and understandable OLMs that provide justification for their recommendations. We conduct a randomised control trial experiment using an ERS with two interfaces ("Non-Complemented Interface" and "Complemented Interface") to determine the effect of our approach on student engagement and their perception of the effectiveness of the ERS. Overall, our results suggest that complementing an ERS with an OLM can have a positive effect on student engagement and their perception about the effectiveness of the system despite potentially making the system harder to navigate. In some cases, complementing an ERS with an OLM has the negative consequence of decreasing engagement, understandability and sense of fairness.