Journal ArticleDOI

Data mining in education

TL;DR: Key milestones and the current state of affairs in the field of EDM are reviewed, together with specific applications, tools, and future insights.
Abstract: Applying data mining (DM) in education is an emerging interdisciplinary research field also known as educational data mining (EDM). It is concerned with developing methods for exploring the unique types of data that come from educational environments. Its goal is to better understand how students learn and identify the settings in which they learn to improve educational outcomes and to gain insights into and explain educational phenomena. Educational information systems can store a huge amount of potential data from multiple sources coming in different formats and at different granularity levels. Each particular educational problem has a specific objective with special characteristics that require a different treatment of the mining problem. These issues mean that traditional DM techniques cannot be applied directly to these types of data and problems. As a consequence, the knowledge discovery process has to be adapted and some specific DM techniques are needed. This paper introduces and reviews key milestones and the current state of affairs in the field of EDM, together with specific applications, tools, and future insights. © 2012 Wiley Periodicals, Inc.
Citations
01 Oct 2012
TL;DR: This issue brief is intended to help policymakers and administrators understand how analytics and data mining have been—and can be—applied for educational improvement.
Abstract: The authors are grateful for the deliberations of our technical working group (TWG) of academic experts in educational data mining and learning analytics. These experts provided constructive guidance and comments for this issue brief. The TWG comprised Ryan S. In data mining and data analytics, tools and techniques once confined to research laboratories are being adopted by forward-looking industries to generate business intelligence for improving decision making. Higher education institutions are beginning to use analytics for improving the services they provide and for increasing student grades and retention. The U.S. Department of Education's National Education Technology Plan, as one part of its model for 21st-century learning powered by technology, envisions ways of using data from online learning systems to improve instruction. With analytics and data mining experiments in education starting to proliferate, sorting out fact from fiction and identifying research possibilities and practical applications are not easy. This issue brief is intended to help policymakers and administrators understand how analytics and data mining have been—and can be—applied for educational improvement. At present, educational data mining tends to focus on developing new tools for discovering patterns in data. These patterns are generally about the microconcepts involved in learning: one-digit multiplication, subtraction with carries, and so on. Learning analytics—at least as it is currently contrasted with data mining—focuses on applying tools and techniques at larger scales, such as in courses and at schools and postsecondary institutions. But both disciplines work with patterns and prediction: If we can discern the pattern in the data and make sense of what is happening, we can predict what should come next and take the appropriate action. 
Educational data mining and learning analytics are used to research and build models in several areas that can influence online learning systems. One area is user modeling, which encompasses what a learner knows, what a learner's behavior and motivation are, what the user experience is like, and how satisfied users are with online learning. At the simplest level, analytics can detect when a student in an online course is going astray and nudge him or her on to a course correction. At the most complex, they hold promise of detecting boredom from patterns of key clicks and redirecting the student's attention. Because these data are gathered in real time, there is a real possibility of continuous improvement via multiple feedback loops that operate at different time scales—immediate to the student …

509 citations

Journal Article
TL;DR: An overview of the empirical evidence behind the key objectives of adopting LA/EDM in generic educational strategic planning is presented, and thoughts on uncharted key questions to investigate are set out.
Abstract: This paper aims to provide the reader with a comprehensive background for understanding current knowledge on Learning Analytics (LA) and Educational Data Mining (EDM) and its impact on adaptive learning. It constitutes an overview of empirical evidence behind key objectives of the potential adoption of LA/EDM in generic educational strategic planning. We examined the literature on experimental case studies conducted in the domain during the past six years (2008-2013). Search terms identified 209 mature pieces of research work, but inclusion criteria limited the key studies to 40. We analyzed the research questions, methodology and findings of these published papers and categorized them accordingly. We used non-statistical methods to evaluate and interpret findings of the collected studies. The results have highlighted four distinct major directions of the LA/EDM empirical research. We discuss the emerging added value of LA/EDM research and highlight the significance of further implications. Finally, we set out our thoughts on possible uncharted key questions to investigate from both pedagogical and technical perspectives.

507 citations


Cites background or methods from "Data mining in education"

  • ...Finally, we set our thoughts on possible uncharted key questions to investigate both from pedagogical and technical considerations....

    [...]

  • ...EDM is concerned with “developing, researching, and applying computerized methods to detect patterns in large collections of educational data that would otherwise be hard or impossible to analyze due to the enormous volume of data within which they exist” (Romero & Ventura, 2013, p. 12)....

    [...]

  • ...Romero and Ventura (2013) presented an up-to-date comprehensive overview of the current state in data mining in education....

    [...]

  • ...However, they differ in their origins, techniques, fields of emphasis and types of discovery (Chatti et al., 2012; Romero & Ventura, 2013; Siemens & Baker, 2012)....

    [...]

Journal ArticleDOI
TL;DR: To determine how the selection of instances and attributes, the use of different classification algorithms and the date when data is gathered affect the accuracy and comprehensibility of the prediction, a new Moodle module for gathering forum indicators was developed and different executions were carried out.
Abstract: On-line discussion forums constitute communities of people learning from each other, which not only inform the students about their peers' doubts and problems but can also inform instructors about their students' knowledge of the course contents. In fact, nowadays there is increasing interest in the use of discussion forums as an indicator of student performance. In this respect, this paper proposes the use of different data mining approaches for improving prediction of students' final performance starting from quantitative, qualitative and social network indicators of forum participation. Our objective is to determine how the selection of instances and attributes, the use of different classification algorithms and the date when data is gathered affect the accuracy and comprehensibility of the prediction. A new Moodle module for gathering forum indicators was developed and different executions were carried out using real data from 114 university students during a first-year course in computer science. A representative set of traditional classification algorithms was used and compared against classification via clustering algorithms for predicting whether students will pass or fail the course on the basis of data about their forum usage. The results obtained indicate the suitability of performing both a final prediction at the end of the course and an early prediction before the end of the course; of applying clustering plus class association rules mining instead of traditional classification for obtaining highly interpretable student performance models; and of using a subset of attributes instead of all available attributes, and not all forum messages but only students' messages with content related to the subject of the course, for improving classification accuracy.

485 citations
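
The "classification via clustering" idea mentioned in the abstract above can be illustrated with a minimal sketch. This is not the authors' Moodle module or their exact pipeline (which combines clustering with class association rule mining); it swaps in a simpler variant in which each cluster is labelled with the majority class of its members. The feature names, the toy data and the helper names (`nearest`, `kmeans`, `fit`, `predict`) are all hypothetical.

```python
import math
from collections import Counter


def nearest(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda c: math.dist(point, centroids[c]))


def kmeans(points, k, iters=20):
    """Plain k-means with a deterministic start: the first k points."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iters):
        assigned = [nearest(p, centroids) for p in points]
        for c in range(k):
            members = [p for p, a in zip(points, assigned) if a == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def fit(points, labels, k=2):
    """Cluster without looking at labels, then label each cluster by majority vote."""
    centroids = kmeans(points, k)
    votes = [Counter() for _ in range(k)]
    for p, y in zip(points, labels):
        votes[nearest(p, centroids)][y] += 1
    cluster_label = [v.most_common(1)[0][0] if v else None for v in votes]
    return centroids, cluster_label


def predict(model, point):
    """Predict the class of a new student from the nearest cluster's label."""
    centroids, cluster_label = model
    return cluster_label[nearest(point, centroids)]


# Hypothetical forum indicators per student: (messages posted, replies received).
X = [(1, 0), (2, 1), (1, 1), (9, 8), (10, 9), (8, 9)]
y = ["fail", "fail", "fail", "pass", "pass", "pass"]
model = fit(X, y)
print(predict(model, (2, 0)), predict(model, (9, 9)))
```

Because the clusters are formed without the labels, each one can be described by its centroid, which is one reason such models tend to be easier to interpret than a black-box classifier.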

Journal ArticleDOI
TL;DR: This empirical contribution provides an application of Buckingham Shum and Deakin Crick's theoretical framework of dispositional learning analytics: an infrastructure that combines learning dispositions data with data extracted from computer-assisted, formative assessments and LMSs.

352 citations


Cites background from "Data mining in education"

  • ...…at least five of these objectives of applying learning analytics (as described in Narciss & Huth, 2006), we will focus in this contribution on the first objective: predictive modelling of performance and learning behaviour (Baker, 2010; Sao Pedro et al., 2013)....

    [...]

  • ...The prime aim of the analysis is predictive modelling (Baker, 2010; Sao Pedro et al., 2013; Wolff et al., 2013), with a focus on the role each of these data sources can play in generating timely, informative feedback for students....

    [...]

  • ...With the increased availability of large datasets, powerful analytics engines (Tobarra et al., 2014), and skilfully designed visualisations of analytics results (González-Torres, García-Peñalvo, & Therón, 2013), institutions may be able to use the experience of the past to create supportive, insightful models of primary (and perhaps real-time) learning processes (Author B, Submitted; Baker, 2010; Stiles, 2012)....

    [...]

  • ...The prime aim of the analysis is predictive modelling (Baker, 2010; Sao Pedro, Baker, Gobert, Montalvo, & Nakama, 2013), with a focus on the roles of (each of) 100+ predictor variables from the several data sources can play in generating timely, informative feedback for students....

    [...]

Journal ArticleDOI
TL;DR: The current state of the art in data mining in education is provided by reviewing the main publications, the key milestones, the knowledge discovery cycle, the main educational environments, the specific tools, the free available datasets, the most used methods, the main objectives, and the future trends in this research area.
Abstract: This survey is an updated and improved version of the previous one published in 2013 in this journal with the title “data mining in education”. It reviews in a comprehensible and very general way how Educational Data Mining and Learning Analytics have been applied over educational data. In the last decade, this research area has evolved enormously and a wide range of related terms are now used in the bibliography such as Academic Analytics, Institutional Analytics, Teaching Analytics, Data‐Driven Education, Data‐Driven Decision‐Making in Education, Big Data in Education, and Educational Data Science. This paper provides the current state of the art by reviewing the main publications, the key milestones, the knowledge discovery cycle, the main educational environments, the specific tools, the free available datasets, the most used methods, the main objectives, and the future trends in this research area.

350 citations

References
Journal ArticleDOI
01 Nov 2010
TL;DR: The most relevant studies carried out in educational data mining to date are surveyed and the different groups of users, types of educational environments, and the data they provide are described.
Abstract: Educational data mining (EDM) is an emerging interdisciplinary research area that deals with the development of methods to explore data originating in an educational context. EDM uses computational approaches to analyze educational data in order to study educational questions. This paper surveys the most relevant studies carried out in this field to date. First, it introduces EDM and describes the different groups of users, types of educational environments, and the data they provide. It then goes on to list the most typical/common tasks in the educational environment that have been resolved through data-mining techniques, and finally, some of the most promising future lines of research are discussed.

1,723 citations

Journal ArticleDOI
TL;DR: Describes an effort to model students' changing knowledge state during skill acquisition and reviews a series of studies that examine the empirical validity of knowledge tracing and have led to modifications in the process.
Abstract: This paper describes an effort to model students' changing knowledge state during skill acquisition. Students in this research are learning to write short programs with the ACT Programming Tutor (APT). APT is constructed around a production rule cognitive model of programming knowledge, called the ideal student model. This model allows the tutor to solve exercises along with the student and provide assistance as necessary. As the student works, the tutor also maintains an estimate of the probability that the student has learned each of the rules in the ideal model, in a process called knowledge tracing. The tutor presents an individualized sequence of exercises to the student based on these probability estimates until the student has ‘mastered’ each rule. The programming tutor, cognitive model and learning and performance assumptions are described. A series of studies is reviewed that examine the empirical validity of knowledge tracing and have led to modifications in the process. Currently the model is quite successful in predicting test performance. Further modifications in the modeling process are discussed that may improve performance levels.

1,668 citations
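
The probability estimate described in the knowledge tracing abstract above can be sketched as a per-rule Bayesian update. This is a minimal illustration of the standard four-parameter formulation (initial knowledge, learning/transition, guess, slip), not the tutor's actual code. The parameter values, the 0.95 mastery threshold and the function names `bkt_update` and `trace` are assumptions chosen for the example.

```python
def bkt_update(p_known, correct, p_transit=0.1, p_guess=0.2, p_slip=0.1):
    """One knowledge-tracing step for a single rule.

    p_known   : prior probability that the student has learned the rule
    correct   : whether the student applied the rule correctly this time
    p_transit : probability of learning the rule at this opportunity (assumed value)
    p_guess   : probability of a correct answer without knowing the rule (assumed value)
    p_slip    : probability of an error despite knowing the rule (assumed value)
    """
    if correct:
        evidence_known = p_known * (1 - p_slip)
        evidence_total = evidence_known + (1 - p_known) * p_guess
    else:
        evidence_known = p_known * p_slip
        evidence_total = evidence_known + (1 - p_known) * (1 - p_guess)
    posterior = evidence_known / evidence_total  # Bayes' rule on the observation
    # The student may also learn the rule from this practice opportunity.
    return posterior + (1 - posterior) * p_transit


def trace(responses, p_init=0.3, **params):
    """Return the estimate after each response, starting from p_init (assumed value)."""
    p = p_init
    history = []
    for correct in responses:
        p = bkt_update(p, correct, **params)
        history.append(p)
    return history


MASTERY = 0.95  # illustrative threshold for treating a rule as 'mastered'
history = trace([True, False, True, True, True])
for step, p in enumerate(history, start=1):
    print(step, round(p, 3), "mastered" if p >= MASTERY else "keep practising")
```

A tutor built on this idea would keep one such estimate per rule and continue selecting exercises that involve rules whose estimate is still below the threshold.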

Journal ArticleDOI
TL;DR: This paper surveys the application of data mining to traditional educational systems, particular web-based courses, well-known learning content management systems, and adaptive and intelligent web-based educational systems.
Abstract: Currently there is an increasing interest in data mining and educational systems, making educational data mining a new and growing research community. This paper surveys the application of data mining to traditional educational systems, particular web-based courses, well-known learning content management systems, and adaptive and intelligent web-based educational systems. Each of these systems has different data sources and objectives for knowledge discovery. After preprocessing the available data in each case, data mining techniques can be applied: statistics and visualization; clustering, classification and outlier detection; association rule mining and pattern mining; and text mining. Despite the plentiful work done so far, much more specialized work is needed for educational data mining to become a mature area.

1,357 citations

Proceedings ArticleDOI
01 Oct 2009
TL;DR: This paper reviews the history and current trends in the field of EDM and discusses shifts in the research conducted by this community, including the increased emphasis on prediction, the emergence of work using existing models to make scientific discoveries, and the reduction in the frequency of relationship mining.
Abstract: We review the history and current trends in the field of Educational Data Mining (EDM). We consider the methodological profile of research in the early years of EDM, compared with 2008 and 2009, and discuss trends and shifts in the research conducted by this community. In particular, we discuss the increased emphasis on prediction, the emergence of work using existing models to make scientific discoveries ("discovery with models"), and the reduction in the frequency of relationship mining within the EDM community. We discuss two ways that researchers have attempted to categorize the diversity of research in educational data mining, and review the types of research problems that these methods have been used to address. The most cited papers in EDM between 1995 and 2005 are listed, and their influence on the EDM community (and beyond the EDM community) is discussed.

1,217 citations

Journal ArticleDOI
TL;DR: This work describes the full process for mining e-learning data step by step as well as how to apply the main data mining techniques used, such as statistics, visualization, classification, clustering and association rule mining of Moodle data.
Abstract: Educational data mining is an emerging discipline, concerned with developing methods for exploring the unique types of data that come from the educational context. This work is a survey of the specific application of data mining in learning management systems and a case study tutorial with the Moodle system. Our objective is to introduce it both theoretically and practically to all users interested in this new research area, and in particular to online instructors and e-learning administrators. We describe the full process for mining e-learning data step by step as well as how to apply the main data mining techniques used, such as statistics, visualization, classification, clustering and association rule mining of Moodle data. We have used free data mining tools so that any user can immediately begin to apply data mining without having to purchase a commercial tool or program a specific personalized tool.

1,049 citations