Open Access, Proceedings Article

A Report on the First Native Language Identification Shared Task

Abstract
Native Language Identification (NLI) is the task of automatically identifying the native language (L1) of an individual based on their language production in a learned language. It is typically framed as a classification task where the set of L1s is known a priori. Two previous shared tasks on NLI have been organized where the aim was to identify the L1 of learners of English based on essays (2013) and spoken responses (2016) they provided during a standardized assessment of academic English proficiency. The 2017 shared task combines the inputs from the two prior tasks for the first time. There are three tracks: NLI on the essay only, NLI on the spoken response only (based on a transcription of the response and i-vector acoustic features), and NLI using both responses. We believe this makes for a more interesting shared task while building on the methods and results from the previous two shared tasks. In this paper, we report the results of the shared task. A total of 19 teams competed across the three different sub-tasks. The fusion track showed that combining the written and spoken responses provides a large boost in prediction accuracy. Multiple classifier systems (e.g. ensembles and meta-classifiers) were the most effective in all tasks, with most based on traditional classifiers (e.g. SVMs) with lexical/syntactic features.
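
The abstract's two headline findings, that multiple classifier systems and the fusion of written and spoken inputs help, can be illustrated with a minimal sketch of two generic combination rules: plurality voting over predicted labels and averaging of per-class probabilities. This is not the method of any participating team. The function names, the three L1 codes, and the probability values are hypothetical and chosen only for illustration.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical type alias: maps an L1 label to a classifier's probability for it.
Probs = Dict[str, float]


def plurality_vote(predictions: List[str]) -> str:
    """Return the most frequently predicted label.

    Ties are broken in favour of the label that appears first in the list.
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:
        if counts[label] == best:
            return label
    raise AssertionError("unreachable")


def mean_probability_fusion(distributions: List[Probs]) -> str:
    """Average per-class probabilities across classifiers and return the argmax.

    A label missing from one classifier's output is treated as probability 0.
    Ties are broken alphabetically by label.
    """
    if not distributions:
        raise ValueError("need at least one distribution")
    labels = sorted({label for dist in distributions for label in dist})
    averaged = {
        label: sum(dist.get(label, 0.0) for dist in distributions) / len(distributions)
        for label in labels
    }
    return max(labels, key=lambda label: averaged[label])


if __name__ == "__main__":
    # Illustrative (made-up) outputs of an essay-based and a speech-based classifier.
    essay_probs = {"ARA": 0.5, "FRE": 0.3, "GER": 0.2}
    speech_probs = {"ARA": 0.2, "FRE": 0.6, "GER": 0.2}
    print(mean_probability_fusion([essay_probs, speech_probs]))
    print(plurality_vote(["FRE", "ARA", "FRE"]))
```

In this toy case the essay classifier alone would pick one label and the speech classifier another; averaging lets the more confident classifier decide. A meta-classifier, by contrast, would learn the combination from held-out predictions rather than use a fixed rule.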



Citations
Posted Content

Language (Technology) is Power: A Critical Survey of "Bias" in NLP

TL;DR: The authors survey 146 papers analyzing "bias" in NLP systems, finding that their motivations are often vague, inconsistent, and lacking in normative reasoning, despite the fact that analyzing bias is an inherently normative process.
Journal Article

Learner English: A Teacher's Guide to Interference and Other Problems Second Edition [Book Review]

Anj Foley
01 Feb 2002
TL;DR: Review(s) of: Learner English: A Teacher's Guide to Interference and Other Problems, Second Edition, by Michael Swan and Bernard Smith.
Journal Article

Syntactic complexity in college-level English writing: Differences among writers with diverse L1 backgrounds

TL;DR: Differences in syntactic complexity in English writing among college-level writers with different first language (L1) backgrounds are explored, and the varied patterns are considered for their relevance to L2 writing research and pedagogy and to automatic native language identification of learner texts.
Proceedings Article

Do Characters Abuse More Than Words?

TL;DR: This study investigates the effectiveness of character-based features for abusive language detection in user-generated online comments, and shows that such methods outperform previous state-of-the-art approaches and other strong baselines.
Journal Article

TOEFL11: A Corpus of Non-Native English

TL;DR: A new corpus of non-native English writing will be useful for the task of native language identification, as well as grammatical error detection and correction, and automatic essay scoring.
References
Proceedings Article

A re-examination of text categorization methods

TL;DR: The results show that SVM, kNN and LLSF significantly outperform NNet and NB when the number of positive training instances per category is small, and that all the methods perform comparably when the categories have over 300 instances.

Decision templates for multiple classifier fusion: an experimental comparison

TL;DR: This work presents a simple rule for adapting the class combiner to the application and shows that decision templates based on integral type measures of similarity are superior to the other schemes on both data sets.
Journal Article

Decision templates for multiple classifier fusion: an experimental comparison.

TL;DR: In this article, a simple rule for adapting the class combiner to the application is presented, where decision templates (one per class) are estimated with the same training set that is used for the set of classifiers.
Journal Article

Comparison of four approaches to automatic language identification of telephone speech

TL;DR: Four approaches for automatic language identification of speech utterances are compared: Gaussian mixture model (GMM) classification; single-language phone recognition followed by language-dependent, interpolated n-gram language modeling (PRLM); parallel PRLM, which uses multiple single-language phone recognizers, each trained in a different language; and language-dependent parallel phone recognition (PPR).
Proceedings Article

A New Dataset and Method for Automatically Grading ESOL Texts

TL;DR: It is demonstrated how supervised discriminative machine learning techniques can be used to automate the assessment of 'English as a Second or Other Language' (ESOL) examination scripts by using rank preference learning to explicitly model the grade relationships between scripts.