A Report on the First Native Language Identification Shared Task

doi:10.18653/V1/W17-5007

Open Access, Proceedings Article

Joel Tetreault, +2 more

- pp 48-57

TLDR

The fusion track showed that combining the written and spoken responses provides a large boost in prediction accuracy. Multiple classifier systems were the most effective in all tasks, and most were based on traditional classifiers with lexical/syntactic features.

Abstract:

Native Language Identification (NLI) is the task of automatically identifying the native language (L1) of an individual based on their language production in a learned language. It is typically framed as a classification task where the set of L1s is known a priori. Two previous shared tasks on NLI have been organized where the aim was to identify the L1 of learners of English based on essays (2013) and spoken responses (2016) they provided during a standardized assessment of academic English proficiency. The 2017 shared task combines the inputs from the two prior tasks for the first time. There are three tracks: NLI on the essay only, NLI on the spoken response only (based on a transcription of the response and i-vector acoustic features), and NLI using both responses. We believe this makes for a more interesting shared task while building on the methods and results from the previous two shared tasks. In this paper, we report the results of the shared task. A total of 19 teams competed across the three different sub-tasks. The fusion track showed that combining the written and spoken responses provides a large boost in prediction accuracy. Multiple classifier systems (e.g. ensembles and meta-classifiers) were the most effective in all tasks, with most based on traditional classifiers (e.g. SVMs) with lexical/syntactic features.
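The abstract describes multiple classifier systems and a fusion track but does not say how individual systems combined their base classifiers. As a rough illustration only, the sketch below shows one common combination rule: averaging the class probabilities produced by several base classifiers and picking the top class. The function name `fuse_mean_probability`, the three base classifiers, the label codes, and all numbers are hypothetical and are not taken from the shared task.

```python
from collections import defaultdict


def fuse_mean_probability(per_classifier_probs):
    """Average class-probability dicts from several base classifiers.

    Returns (predicted_label, averaged_probs). This is the mean-probability
    combination rule, used here as an assumed stand-in for the ensembles
    mentioned in the abstract.
    """
    if not per_classifier_probs:
        raise ValueError("need at least one base classifier output")
    totals = defaultdict(float)
    for probs in per_classifier_probs:
        for label, p in probs.items():
            totals[label] += p
    n = len(per_classifier_probs)
    averaged = {label: total / n for label, total in totals.items()}
    # Iterate labels in sorted order so ties are broken alphabetically.
    predicted = max(sorted(averaged), key=averaged.get)
    return predicted, averaged


# Hypothetical outputs for one test taker (illustrative numbers only):
# one classifier on the essay, one on the speech transcript, one on i-vectors.
essay_probs = {"ARA": 0.20, "HIN": 0.45, "TEL": 0.35}
transcript_probs = {"ARA": 0.10, "HIN": 0.40, "TEL": 0.50}
ivector_probs = {"ARA": 0.10, "HIN": 0.25, "TEL": 0.65}

label, averaged = fuse_mean_probability(
    [essay_probs, transcript_probs, ivector_probs]
)
print(label)
```

In this toy case the essay classifier alone would pick a different label than the fused result, which is the kind of effect a fusion setup is meant to exploit. A meta-classifier would instead learn the combination from held-out predictions rather than using a fixed average.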

Citations

Posted Content

Language (Technology) is Power: A Critical Survey of "Bias" in NLP

Su Lin Blodgett, +3 more

- 28 May 2020 -

arXiv: Computation and Language

TL;DR: The authors survey 146 papers analyzing "bias" in NLP systems, finding that their motivations are often vague, inconsistent, and lacking in normative reasoning, despite the fact that analyzing bias is an inherently normative process.

Journal Article

Learner English: A Teacher's Guide to Interference and Other Problems Second Edition [Book Review]

Anj Foley

- 01 Feb 2002 -

TESOL in context

TL;DR: Review of Learner English: A Teacher's Guide to Interference and Other Problems, Second Edition, by Michael Swan and Bernard Smith.

Journal ArticleDOI

Syntactic complexity in college-level English writing: Differences among writers with diverse L1 backgrounds

Xiaofei Lu, +1 more

- 01 Sep 2015 -

Journal of Second Language Writing

TL;DR: This study explores differences in syntactic complexity in English writing among college-level writers with different first language (L1) backgrounds, and considers the relevance of the varied patterns for L2 writing research and pedagogy and for automatic native language identification of learner texts.

Proceedings ArticleDOI

Do Characters Abuse More Than Words?

Yashar Mehdad, +1 more

TL;DR: This study investigates the effectiveness of character-based features for abusive language detection in user-generated online comments, and shows that such methods outperform previous state-of-the-art approaches and other strong baselines.

Journal ArticleDOI

TOEFL11: A Corpus of Non-Native English

Daniel Blanchard, +4 more

- 01 Dec 2013 -

ETS Research Report Series

TL;DR: A new corpus of non-native English writing will be useful for the task of native language identification, as well as grammatical error detection and correction, and automatic essay scoring.

Related Papers (5)

TOEFL11: A Corpus of Non-Native English

Native Tongues, Lost and Found: Resources and Empirical Evaluations in Native Language Identification

Determining an author's native language by mining a text for errors

Can characters reveal your native language? A language-independent approach to native language identification

Exploiting Parse Structures for Native Language Identification