Open Access Journal Article

Comparison of Dimension Reduction Methods for Automated Essay Grading

Tuomo Kakkonen, +3 more
01 Jul 2008
Vol. 11, Iss. 3, pp. 275-288
TL;DR
The results show that using learning materials as training data for the grading model outperforms the k-NN-based grading methods, and that the way the learning materials are divided in the training data is crucial.
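The TL;DR contrasts the learning-material approach with k-NN-based grading, where a new essay is scored from the teacher-graded essays it most resembles. The page does not describe the paper's exact k-NN variant, so the following is only a generic sketch under stated assumptions: essays are compared as raw bag-of-words vectors with cosine similarity (AEA itself compares documents in an LSA, PLSA, or LDA space), the grade is a similarity-weighted mean of the k nearest essays, and the names `term_vector`, `knn_grade`, and the `(text, grade)` pair format are hypothetical.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Bag-of-words term-frequency vector over lower-cased alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term vectors (Counters)."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_grade(essay, graded_essays, k=3):
    """Hypothetical k-NN grader: similarity-weighted mean grade of the k
    teacher-graded essays most similar to `essay`.

    `graded_essays` is a list of (text, grade) pairs.
    """
    if not graded_essays:
        raise ValueError("need at least one teacher-graded essay")
    target = term_vector(essay)
    neighbours = sorted(
        ((cosine(target, term_vector(text)), grade) for text, grade in graded_essays),
        reverse=True,
    )[:k]
    total = sum(sim for sim, _ in neighbours)
    if total == 0:
        # No lexical overlap with any neighbour: fall back to their plain mean grade.
        return sum(grade for _, grade in neighbours) / len(neighbours)
    return sum(sim * grade for sim, grade in neighbours) / total


# Toy data for illustration only; these are not essays from the paper.
graded = [
    ("Photosynthesis converts light energy into chemical energy in plants.", 5),
    ("Plants use light to make sugar.", 4),
    ("The French Revolution began in 1789.", 1),
]
print(knn_grade("Plants convert light energy into sugar.", graded, k=2))
```

With k=2 the two biology essays are the neighbours, so the prediction lands between their grades (about 4.55 here). The quality of such a grader depends entirely on having enough graded essays near every quality level, which is one plausible reason a method calibrated on learning materials can do better when only a few graded essays are available.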
Abstract
Automatic Essay Assessor (AEA) is a system that uses information retrieval techniques such as Latent Semantic Analysis (LSA), Probabilistic Latent Semantic Analysis (PLSA), and Latent Dirichlet Allocation (LDA) for automatic essay grading. Before grading, the system calibrates its scoring mechanism using learning materials and a relatively small number of teacher-graded essays. We performed a series of experiments using LSA, PLSA, and LDA for document comparison in AEA. In addition to comparing the methods on a theoretical level, we compared the applicability of LSA, PLSA, and LDA to essay grading using empirical data. The results show that using learning materials as training data for the grading model outperforms the k-NN-based grading methods. We also found that LSA yielded slightly more accurate grading than PLSA and LDA, and that the way the learning materials are divided in the training data is crucial: dividing them into sentences works better than dividing them into paragraphs.
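To make the LSA side of the abstract concrete, the sketch below builds a term-by-sentence count matrix from learning material, computes a rank-k LSA space, and folds new texts into it with the standard LSA projection d̂ = Σ_k⁻¹ U_kᵀ d before comparing them by cosine similarity. It is not AEA's implementation. Assumptions: raw term counts with no weighting (the abstract does not state a weighting scheme), a naive regular-expression sentence splitter, a pure-Python power iteration with deflation standing in for a library SVD, and hypothetical function names (`split_sentences`, `lsa_space`, `fold_in`). How AEA turns similarities into grades, and the PLSA and LDA alternatives, are not shown.

```python
import math
import random
import re


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' when followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokens(text):
    return re.findall(r"[a-z]+", text.lower())


def term_counts(text, index):
    """Raw term-count vector of `text` over a fixed vocabulary index."""
    d = [0.0] * len(index)
    for t in tokens(text):
        if t in index:
            d[index[t]] += 1.0
    return d


def lsa_space(units, k, iters=200, seed=0):
    """Rank-k LSA space of a term-by-unit count matrix A.

    Returns (index, U, sigma): the top-k left singular vectors of A and their
    singular values, found by power iteration with deflation on M = A A^T.
    """
    vocab = sorted({t for u in units for t in tokens(u)})
    index = {t: i for i, t in enumerate(vocab)}
    columns = [term_counts(u, index) for u in units]  # one column per text unit
    n = len(vocab)
    # M = A A^T is the symmetric, positive semidefinite term-by-term matrix.
    M = [[sum(col[i] * col[j] for col in columns) for j in range(n)] for i in range(n)]
    rng = random.Random(seed)
    U, sigma = [], []
    for _ in range(k):
        v = [rng.random() for _ in range(n)]
        for _ in range(iters):
            w = [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0:
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * M[i][j] * v[j] for i in range(n) for j in range(n))
        if lam <= 1e-12:
            break  # the matrix has rank < k
        U.append(v)
        sigma.append(math.sqrt(lam))  # singular value = sqrt(eigenvalue of A A^T)
        # Deflate so the next power iteration finds the next singular direction.
        M = [[M[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return index, U, sigma


def fold_in(text, index, U, sigma):
    """Project a new document d into the LSA space: Sigma_k^{-1} U_k^T d."""
    d = term_counts(text, index)
    return [sum(u_i * d_i for u_i, d_i in zip(u, d)) / s for u, s in zip(U, sigma)]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


# Toy learning material for illustration only.
material = (
    "Photosynthesis takes place in the chloroplasts. "
    "Chlorophyll absorbs light energy. "
    "Plants convert light energy into chemical energy stored in glucose. "
    "Cellular respiration releases energy from glucose."
)
units = split_sentences(material)  # sentence-level units, per the reported finding
index, U, sigma = lsa_space(units, k=2)
essay = fold_in("Chlorophyll in plants absorbs light energy.", index, U, sigma)
reference = fold_in(units[2], index, U, sigma)
print(cosine(essay, reference))
```

Replacing `split_sentences` with a paragraph splitter (for example, splitting on blank lines) would give the paragraph-level variant that the abstract reports as less accurate. In practice a library SVD would replace the hand-written power iteration.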


Citations

A Survey of Topic Modeling in Text Mining

TL;DR: Different models, such as topic over time (TOT), dynamic topic models (DTM), multiscale topic tomography, dynamic topic correlation detection, detecting topic evolution in scientific literature, etc., are discussed.

An Approach to Source-Code Plagiarism Detection and Investigation Using Latent Semantic Analysis

TL;DR: PlaGate, a novel tool that can be integrated with existing plagiarism detection tools to improve plagiarism detection performance, is described; it also implements a new approach for investigating the similarity between source-code files with a view to gathering evidence for proving plagiarism.

Performance Analysis of Multi-Motion Sensor Behavior for Active Smartphone Authentication

TL;DR: This paper investigates the reliability and applicability of using motion-sensor behavior for active and continuous smartphone authentication across various operational scenarios, and presents a systematic evaluation of the distinctiveness and permanence properties of the behavior.

The shifting sands of disciplinary development: Analyzing North American Library and Information Science dissertations using latent Dirichlet allocation

TL;DR: The findings indicate that the main topics in LIS have changed substantially from those in the initial period (1930–1969) to the present (2000–2009), including the diminishing use of the word "library".

A tool for addressing construct identity in literature reviews and meta-analyses

TL;DR: The construct identity detector (CID) is designed and evaluated; it is the first tool with large-scale construct identity detection properties and the first tool that does not require respondent data.