Author

Le An Ha

Bio: Le An Ha is an academic researcher from the University of Wolverhampton. The author has contributed to research on topics including Gaze and Computational linguistics, has an h-index of 12, and has co-authored 37 publications receiving 769 citations.

Papers
Proceedings ArticleDOI
31 May 2003
TL;DR: The results from the conducted evaluation suggest that the new procedure is highly effective, saving considerable time and labour, and that the test items produced with the help of the program are not of inferior quality to those produced manually.
Abstract: This paper describes a novel computer-aided procedure for generating multiple-choice tests from electronic instructional documents. In addition to employing various NLP techniques including term extraction and shallow parsing, the program makes use of language resources such as a corpus and WordNet. The system generates test questions and distractors, offering the user the option to post-edit the test items.

226 citations
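
As an illustration of the resource use the abstract describes, the sketch below shows one simple way WordNet can supply distractor candidates: taking sister terms (co-hyponyms) of the correct answer. This is a minimal sketch using NLTK, not the published system; the function name and the candidate limit are illustrative choices.

```python
# Minimal sketch (not the authors' system): propose distractor candidates for a
# key term by collecting its WordNet co-hyponyms (sister terms).
import nltk
from nltk.corpus import wordnet as wn

nltk.download("wordnet", quiet=True)  # fetch WordNet if not already present

def cohyponym_distractors(term, pos=wn.NOUN, max_candidates=5):
    """Return sister terms (co-hyponyms) of `term` as naive distractor candidates."""
    candidates = []
    for synset in wn.synsets(term, pos=pos):
        for hypernym in synset.hypernyms():
            for sister in hypernym.hyponyms():
                for name in sister.lemma_names():
                    name = name.replace("_", " ")
                    if name.lower() != term.lower() and name not in candidates:
                        candidates.append(name)
    return candidates[:max_candidates]

print(cohyponym_distractors("noun"))  # e.g. other parts of speech as distractors
```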

Journal ArticleDOI
TL;DR: A novel computer-aided procedure for generating multiple-choice test items from electronic documents that makes use of language resources such as corpora and ontologies, and saves both time and production costs.
Abstract: This paper describes a novel computer-aided procedure for generating multiple-choice test items from electronic documents. In addition to employing various Natural Language Processing techniques, including shallow parsing, automatic term extraction, sentence transformation and computing of semantic distance, the system makes use of language resources such as corpora and ontologies. It identifies important concepts in the text and generates questions about these concepts as well as multiple-choice distractors, offering the user the option to post-edit the test items by means of a user-friendly interface. In assisting test developers to produce items in a fast and expedient manner without compromising quality, the tool saves both time and production costs.

216 citations
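
To make the term-extraction step concrete, the sketch below pairs a shallow noun-phrase chunk grammar with a frequency count to surface candidate key concepts. It is a minimal stand-in built on NLTK's tokeniser, tagger and RegexpParser, not the system described in the paper; the chunk grammar and the frequency-based ranking are simplifications of proper termhood scoring.

```python
# Minimal sketch (not the published system): shallow NP chunking plus raw
# frequency as a stand-in for automatic term extraction.
from collections import Counter
import nltk

# Newer NLTK releases may require "punkt_tab" / "averaged_perceptron_tagger_eng" instead.
for pkg in ("punkt", "averaged_perceptron_tagger"):
    nltk.download(pkg, quiet=True)

chunker = nltk.RegexpParser("NP: {<JJ>*<NN.*>+}")  # adjectives followed by nouns

def candidate_terms(text, top_n=10):
    """Chunk noun phrases and rank them by frequency as candidate key concepts."""
    counts = Counter()
    for sentence in nltk.sent_tokenize(text):
        tree = chunker.parse(nltk.pos_tag(nltk.word_tokenize(sentence)))
        for subtree in tree.subtrees(filter=lambda t: t.label() == "NP"):
            counts[" ".join(tok for tok, _ in subtree.leaves()).lower()] += 1
    return counts.most_common(top_n)

print(candidate_terms("Shallow parsing identifies noun phrases. "
                      "Noun phrases often realise domain terms."))
```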

Proceedings ArticleDOI
31 Mar 2009
TL;DR: The evaluation results show that the methods based on Lin's measure and on the mixed strategy outperform the rest, albeit not in a statistically significant fashion.
Abstract: Mitkov and Ha (2003) and Mitkov et al. (2006) offered an alternative to the lengthy and demanding activity of developing multiple-choice test items by proposing an NLP-based methodology for construction of test items from instructive texts such as textbook chapters and encyclopaedia entries. One of the interesting research questions which emerged during these projects was how better quality distractors could automatically be chosen. This paper reports the results of a study seeking to establish which similarity measures generate better quality distractors of multiple-choice tests. Similarity measures employed in the procedure of selection of distractors are collocation patterns, four different methods of WordNet-based semantic similarity (extended gloss overlap measure, Leacock and Chodorow's, Jiang and Conrath's as well as Lin's measures), distributional similarity, phonetic similarity as well as a mixed strategy combining the aforementioned measures. The evaluation results show that the methods based on Lin's measure and on the mixed strategy outperform the rest, albeit not in a statistically significant fashion.

69 citations
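
The sketch below illustrates just one of the measures compared in the paper, Lin's WordNet-based similarity, as it might be used to rank distractor candidates against the correct answer. It relies on NLTK's WordNet interface and the Brown information-content file; the example words and the best-sense-pair scoring are illustrative, not the authors' procedure.

```python
# Minimal sketch: rank distractor candidates by Lin similarity to the key,
# using NLTK's WordNet and the Brown information-content file.
import nltk
from nltk.corpus import wordnet as wn, wordnet_ic
from nltk.corpus.reader.wordnet import WordNetError

for pkg in ("wordnet", "wordnet_ic"):
    nltk.download(pkg, quiet=True)

brown_ic = wordnet_ic.ic("ic-brown.dat")  # information content from the Brown corpus

def lin_score(word_a, word_b):
    """Best Lin similarity over the noun senses of the two words (0.0 if undefined)."""
    best = 0.0
    for sa in wn.synsets(word_a, pos=wn.NOUN):
        for sb in wn.synsets(word_b, pos=wn.NOUN):
            try:
                best = max(best, sa.lin_similarity(sb, brown_ic) or 0.0)
            except WordNetError:  # senses for which the measure is undefined
                continue
    return best

answer = "corpus"
candidates = ["lexicon", "treebank", "keyboard"]
print(sorted(candidates, key=lambda c: lin_score(answer, c), reverse=True))
```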

Proceedings ArticleDOI
15 Jul 2006
TL;DR: The results of a pilot study on generating Multiple-Choice Test Items from medical text are reported, together with the main tasks involved and how the system was evaluated by domain experts.
Abstract: We report the results of a pilot study on generating Multiple-Choice Test Items from medical text and discuss the main tasks involved in this process and how our system was evaluated by domain experts.

66 citations

Proceedings ArticleDOI
23 Apr 2018
TL;DR: Preliminary results show that differences in the way people with autism process web content could be used for the future development of serious games for autism screening; the effects of the type of task performed are also explored.
Abstract: The ASD diagnosis requires a long, elaborate, and expensive procedure, which is subjective and is currently restricted to behavioural, historical, and parent-report information. In this paper, we present an alternative way for detecting the condition based on the atypical visual-attention patterns of people with autism. We collect gaze data from two different kinds of tasks related to processing of information from web pages: Browsing and Searching. The gaze data is then used to train a machine learning classifier whose aim is to distinguish between participants with autism and a control group of participants without autism. In addition, we explore the effects of the type of the task performed, different approaches to defining the areas of interest, gender, visual complexity of the web pages and whether or not an area of interest contained the correct answer to a searching task. Our best-performing classifier achieved 0.75 classification accuracy for a combination of selected web pages using all gaze features. These preliminary results show that the differences in the way people with autism process web content could be used for the future development of serious games for autism screening. The gaze data, R code, visual stimuli and task descriptions are made freely available for replication purposes.

42 citations
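
The classification experiment can be pictured with the short sketch below: per-participant gaze features feed a binary classifier whose cross-validated accuracy is reported, as in the paper's evaluation. Everything here is a placeholder; the feature matrix is random, scikit-learn's logistic regression stands in for whatever learners were actually compared, and the study itself released R code rather than Python.

```python
# Minimal sketch: cross-validated binary classification of (hypothetical)
# per-participant gaze features into an autism group and a control group.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Hypothetical feature matrix: one row per participant, columns standing in for
# gaze features such as fixation count, mean fixation duration, or time to
# first fixation on an area of interest.
X = rng.normal(size=(40, 5))
y = np.array([1] * 20 + [0] * 20)  # 1 = autism group, 0 = control group

clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
scores = cross_val_score(clf, X, y, cv=5, scoring="accuracy")
print(f"mean CV accuracy: {scores.mean():.2f}")  # ~0.5 on this random data
```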


Cited by
Proceedings Article
02 Jun 2010
TL;DR: This work uses manually written rules to perform a sequence of general purpose syntactic transformations to turn declarative sentences into questions, which are ranked by a logistic regression model trained on a small, tailored dataset consisting of labeled output from the system.
Abstract: We address the challenge of automatically generating questions from reading materials for educational practice and assessment. Our approach is to overgenerate questions, then rank them. We use manually written rules to perform a sequence of general purpose syntactic transformations (e.g., subject-auxiliary inversion) to turn declarative sentences into questions. These questions are then ranked by a logistic regression model trained on a small, tailored dataset consisting of labeled output from our system. Experimental results show that ranking nearly doubles the percentage of questions rated as acceptable by annotators, from 27% of all questions to 52% of the top ranked 20% of questions.

426 citations
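
A toy version of the overgenerate-and-rank pipeline described above is sketched below: a single naive subject-auxiliary inversion rule overgenerates candidate questions from declarative sentences, and a logistic regression over two surface features ranks them. The rule, the features and the tiny training set are all illustrative; the actual system works over full syntactic parses with a much richer feature set.

```python
# Minimal sketch of overgenerate-and-rank: one crude transformation rule plus a
# logistic-regression ranker over toy features.
from sklearn.linear_model import LogisticRegression

AUXILIARIES = {"is", "are", "was", "were", "can", "will", "has", "have"}

def overgenerate(sentence):
    """Yield a yes/no question via a naive subject-auxiliary inversion, if one applies."""
    words = sentence.rstrip(".").split()
    for i, word in enumerate(words):
        if word.lower() in AUXILIARIES and 0 < i < len(words) - 1:
            subject = " ".join(words[:i])
            subject = subject[0].lower() + subject[1:]
            yield f"{word.capitalize()} {subject} {' '.join(words[i + 1:])}?"
            break

def features(question):
    """Toy ranking features: token count and a shortness indicator."""
    n = len(question.split())
    return [n, int(n <= 8)]

# Tiny hypothetical training set: questions labelled acceptable (1) or not (0).
train_questions = [
    "Is the corpus freely available?",
    "Are the distractors checked by a human?",
    "Will the very long and rambling question that the toy rule produced here confuse annotators?",
]
train_labels = [1, 1, 0]
ranker = LogisticRegression().fit([features(q) for q in train_questions], train_labels)

sentences = ["The corpus is large.", "Distractors are generated automatically."]
candidates = [q for s in sentences for q in overgenerate(s)]
ranked = sorted(candidates, key=lambda q: ranker.predict_proba([features(q)])[0, 1], reverse=True)
print(ranked)
```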

Patent
15 Mar 2010
TL;DR: In this article, a system, method and/or computer program product for automatically generating questions and answers from any corpus of data is presented; given a collection of textual documents, it automatically generates collections of questions about the documents together with answers to those questions.
Abstract: A system, method and/or computer program product for automatically generating questions and answers based on any corpus of data. The computer system, given a collection of textual documents, automatically generates collections of questions about the documents together with answers to those questions. In particular, such a process can be applied to the so-called ‘open’ domain, where the type of the corpus is not given in advance, and neither is the ontology of the corpus. The system improves the exploration of large bodies of textual information. Applications implementing the system and method include new types of tutoring systems, educational question-answering games, national security and business analysis systems, etc.

371 citations

Proceedings ArticleDOI
19 Mar 2016
TL;DR: This paper introduces the novel task of Visual Question Generation, where the system is tasked with asking a natural and engaging question when shown an image, and provides three datasets which cover a variety of images from object-centric to event-centric.
Abstract: There has been an explosion of work in the vision & language community during the past few years from image captioning to video transcription, and answering questions about images. These tasks have focused on literal descriptions of the image. To move beyond the literal, we choose to explore how questions about an image are often directed at commonsense inference and the abstract events evoked by objects in the image. In this paper, we introduce the novel task of Visual Question Generation (VQG), where the system is tasked with asking a natural and engaging question when shown an image. We provide three datasets which cover a variety of images from object-centric to event-centric, with considerably more abstract training data than provided to state-of-the-art captioning systems thus far. We train and test several generative and retrieval models to tackle the task of VQG. Evaluation results show that while such models ask reasonable questions for a variety of images, there is still a wide gap with human performance which motivates further work on connecting images with commonsense knowledge and pragmatics. Our proposed task offers a new challenge to the community which we hope furthers interest in exploring deeper connections between vision & language.

300 citations
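
To give a feel for the retrieval models mentioned in the abstract, the sketch below implements a nearest-neighbour baseline: a new image is mapped to the human-written question attached to its most similar training image. The 512-dimensional feature vectors are random placeholders standing in for the output of some pretrained image encoder, and the questions are invented examples, not items from the paper's datasets.

```python
# Minimal sketch: a retrieval baseline for Visual Question Generation that
# returns the question of the nearest training image by cosine distance.
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)

# Hypothetical training set: image feature vectors paired with human-written questions.
train_features = rng.normal(size=(3, 512))
train_questions = [
    "What caused the accident?",
    "How old is the birthday child?",
    "Who won the match?",
]

index = NearestNeighbors(n_neighbors=1, metric="cosine").fit(train_features)

def generate_question(image_features):
    """Return the question attached to the most similar training image."""
    _, idx = index.kneighbors(image_features.reshape(1, -1))
    return train_questions[idx[0, 0]]

print(generate_question(rng.normal(size=512)))
```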

Journal ArticleDOI
TL;DR: This paper presents a large literature review of the research field of Text Summarisation (TS) based on Human Language Technologies, explaining the existing methodologies and systems as well as new research that has emerged concerning the automatic evaluation of summaries’ quality.
Abstract: This paper contains a large literature review in the research field of Text Summarisation (TS) based on Human Language Technologies (HLT). TS helps users manage the vast amount of information available, by condensing documents' content and extracting the most relevant facts or topics included in them. The rapid development of emerging technologies poses new challenges to this research field, which still need to be solved. Therefore, it is essential to analyse its progress over the years, and provide an overview of the past, present and future directions, highlighting the main advances achieved and outlining remaining limitations. With this purpose, several important aspects are addressed within the scope of this survey. On the one hand, the paper aims at giving a general perspective on the state-of-the-art, describing the main concepts, as well as different summarisation approaches, and relevant international forums. Furthermore, it is important to stress upon the fact that the birth of new requirements and scenarios has led to new types of summaries with specific purposes (e.g. sentiment-based summaries), and novel domains within which TS has proven to be also suitable for (e.g. blogs). In addition, TS is successfully combined with a number of intelligent systems based on HLT (e.g. information retrieval, question answering, and text classification). On the other hand, a deep study of the evaluation of summaries is also conducted in this paper, where the existing methodologies and systems are explained, as well as new research that has emerged concerning the automatic evaluation of summaries' quality. Finally, some thoughts about TS in general and its future will encourage the reader to think of novel approaches, applications and lines to conduct research in the next years. The analysis of these issues allows the reader to have a wide and useful background on the main important aspects of this research field.

234 citations

Journal ArticleDOI
TL;DR: This article provides a comprehensive and comparative overview of question answering technology and suggests a general question answering architecture that steadily increases the complexity of the representation level of questions and information objects.

227 citations