Proceedings ArticleDOI

Cheap Translation for Cross-Lingual Named Entity Recognition

TL;DR: A simple method for cross-lingual named entity recognition (NER) that works well in settings with very minimal resources: it uses a lexicon to “translate” annotated data available in one or several high-resource languages into the target language and learns a standard monolingual NER model there.
Abstract: Recent work in NLP has attempted to deal with low-resource languages, but still assumes a resource level that is not present for most languages, e.g., the availability of Wikipedia in the target language. We propose a simple method for cross-lingual named entity recognition (NER) that works well in settings with very minimal resources. Our approach makes use of a lexicon to “translate” annotated data available in one or several high-resource language(s) into the target language, and learns a standard monolingual NER model there. Further, when Wikipedia is available in the target language, our method can enhance Wikipedia-based methods to yield state-of-the-art NER results; we evaluate on 7 diverse languages, improving the state of the art by an average of 5.5 F1 points. With the minimal resources required, this is an extremely portable cross-lingual NER approach, as illustrated using a truly low-resource language, Uyghur.
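
The approach lends itself to a short sketch: each source token is replaced by a lexicon translation while its NER label is kept, yielding noisy target-language training data for a standard monolingual NER model. The lexicon, tag scheme, and copy-through fallback below are invented for illustration, not taken from the paper.

    # Hypothetical sketch of lexicon-based "cheap translation" of annotated NER data.
    # The lexicon, tag scheme, and copy-through fallback are illustrative assumptions.

    # Bilingual lexicon: source word -> target word (one sense kept for simplicity).
    lexicon = {
        "visited": "besuchte",
        "yesterday": "gestern",
    }

    def cheap_translate(tokens, tags, lexicon):
        """Translate a labelled sentence word by word, keeping the NER tags.

        Words missing from the lexicon (often names) are copied through unchanged,
        which is frequently harmless for NER since many names transliterate."""
        out_tokens, out_tags = [], []
        for token, tag in zip(tokens, tags):
            out_tokens.append(lexicon.get(token.lower(), token))
            out_tags.append(tag)            # labels survive the word-level swap
        return out_tokens, out_tags

    src = (["Obama", "visited", "Paris", "yesterday"],
           ["B-PER", "O", "B-LOC", "O"])
    print(cheap_translate(*src, lexicon))
    # (['Obama', 'besuchte', 'Paris', 'gestern'], ['B-PER', 'O', 'B-LOC', 'O'])
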
Citations
Proceedings ArticleDOI
01 May 2020
TL;DR: It is demonstrated that the inexpensive few-shot transfer (i.e., additional fine-tuning on a few target-language instances) is surprisingly effective across the board, warranting more research efforts reaching beyond the limiting zero-shot conditions.
Abstract: Massively multilingual transformers (MMTs) pretrained via language modeling (e.g., mBERT, XLM-R) have become a default paradigm for zero-shot language transfer in NLP, offering unmatched transfer performance. Current evaluations, however, verify their efficacy in transfers (a) to languages with sufficiently large pretraining corpora, and (b) between close languages. In this work, we analyze the limitations of downstream language transfer with MMTs, showing that, much like cross-lingual word embeddings, they are substantially less effective in resource-lean scenarios and for distant languages. Our experiments, encompassing three lower-level tasks (POS tagging, dependency parsing, NER) and two high-level tasks (NLI, QA), empirically correlate transfer performance with linguistic proximity between source and target languages, but also with the size of target language corpora used in MMT pretraining. Most importantly, we demonstrate that the inexpensive few-shot transfer (i.e., additional fine-tuning on a few target-language instances) is surprisingly effective across the board, warranting more research efforts reaching beyond the limiting zero-shot conditions.
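
The "inexpensive few-shot transfer" described above amounts to briefly continuing training on a handful of annotated target-language sentences. A hedged sketch with the Hugging Face transformers library; the checkpoint name, label set, and example sentence are placeholders rather than the paper's setup.

    # Hedged sketch of few-shot transfer: take an MMT already fine-tuned on
    # source-language NER and continue training on a few target-language sentences.
    # The checkpoint name, label set, and example are placeholders.
    import torch
    from transformers import AutoTokenizer, AutoModelForTokenClassification

    checkpoint = "my-org/mbert-ner-en"      # placeholder source-trained NER model
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForTokenClassification.from_pretrained(checkpoint)
    label2id = model.config.label2id        # e.g. {"O": 0, "B-LOC": 1, ...}

    # A handful of annotated target-language sentences: the "few shots".
    few_shot = [
        (["Ankara", "çok", "güzel"], ["B-LOC", "O", "O"]),
        # ... a few dozen more at most ...
    ]

    def encode(tokens, tags):
        enc = tokenizer(tokens, is_split_into_words=True,
                        truncation=True, return_tensors="pt")
        labels, prev = [], None
        for word_id in enc.word_ids():
            if word_id is None or word_id == prev:
                labels.append(-100)         # special tokens / later subwords: ignored
            else:
                labels.append(label2id[tags[word_id]])
            prev = word_id
        enc["labels"] = torch.tensor([labels])
        return enc

    optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
    model.train()
    for _ in range(10):                     # a few passes over the few shots
        for tokens, tags in few_shot:
            loss = model(**encode(tokens, tags)).loss
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()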

218 citations


Cites background from "Cheap Translation for Cross-Lingual..."

  • ...Transfer paradigms based on discrete text representations include machine translation (MT) of target language text to the source language (or vice-versa) (Mayhew et al., 2017; Eger et al., 2018), and grounding texts from both languages in multilingual knowledge bases (KBs) (Navigli and Ponzetto, 2012; Lehmann et al....

Proceedings ArticleDOI
01 Feb 2019
TL;DR: The authors evaluate supervised and unsupervised cross-lingual word embedding (CLE) models on bilingual lexicon induction (BLI) and three downstream tasks, and empirically demonstrate that the performance of CLE models largely depends on the task at hand and that optimizing CLE models for BLI may hurt downstream performance.
Abstract: Cross-lingual word embeddings (CLEs) facilitate cross-lingual transfer of NLP models. Despite their ubiquitous downstream usage, increasingly popular projection-based CLE models are almost exclusively evaluated on bilingual lexicon induction (BLI). Even the BLI evaluations vary greatly, hindering our ability to correctly interpret performance and properties of different CLE models. In this work, we take the first step towards a comprehensive evaluation of CLE models: we thoroughly evaluate both supervised and unsupervised CLE models, for a large number of language pairs, on BLI and three downstream tasks, providing new insights concerning the ability of cutting-edge CLE models to support cross-lingual NLP. We empirically demonstrate that the performance of CLE models largely depends on the task at hand and that optimizing CLE models for BLI may hurt downstream performance. We indicate the most robust supervised and unsupervised CLE models and emphasize the need to reassess simple baselines, which still display competitive performance across the board. We hope our work catalyzes further research on CLE evaluation and model analysis.
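
For context, the BLI evaluation at the centre of this comparison scores nearest-neighbour retrieval over a test dictionary in the shared embedding space. A minimal sketch with assumed inputs; real evaluations often use CSLS retrieval rather than plain cosine.

    # Minimal sketch of bilingual lexicon induction (BLI) evaluation: precision@1 of
    # nearest-neighbour retrieval in a shared embedding space. All inputs are assumed.
    import numpy as np

    def bli_precision_at_1(src_vecs, tgt_vecs, src_index, tgt_index, test_pairs):
        """src_vecs, tgt_vecs: row-normalised (vocab, dim) matrices in a shared space;
        src_index/tgt_index: word -> row id; test_pairs: [(src_word, gold_tgt_word)]."""
        id2tgt = {i: w for w, i in tgt_index.items()}
        hits = 0
        for src_word, gold in test_pairs:
            query = src_vecs[src_index[src_word]]
            sims = tgt_vecs @ query         # cosine similarity (rows normalised)
            if id2tgt[int(np.argmax(sims))] == gold:
                hits += 1
        return hits / len(test_pairs)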

165 citations

Proceedings ArticleDOI
01 Jun 2021
TL;DR: A structured overview of methods that enable learning when training data is sparse, including mechanisms to create additional labeled data such as data augmentation and distant supervision, as well as transfer learning settings that reduce the need for target supervision.
Abstract: Deep neural networks and huge language models are becoming omnipresent in natural language applications. As they are known for requiring large amounts of training data, there is a growing body of work to improve the performance in low-resource settings. Motivated by the recent fundamental changes towards neural models and the popular pre-train and fine-tune paradigm, we survey promising approaches for low-resource natural language processing. After a discussion about the different dimensions of data availability, we give a structured overview of methods that enable learning when training data is sparse. This includes mechanisms to create additional labeled data like data augmentation and distant supervision as well as transfer learning settings that reduce the need for target supervision. A goal of our survey is to explain how these methods differ in their requirements as understanding them is essential for choosing a technique suited for a specific low-resource setting. Further key aspects of this work are to highlight open issues and to outline promising directions for future research.
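
One of the surveyed mechanisms, distant supervision, is easy to picture: project labels onto raw text from an external resource such as an entity gazetteer. The gazetteer, tag scheme, and matching heuristic below are invented for illustration.

    # Illustrative distant supervision for NER: label raw text by matching against
    # a gazetteer. The gazetteer, sentences, and matching rule are made up.
    gazetteer = {
        ("new", "york"): "LOC",
        ("barack", "obama"): "PER",
    }
    max_len = max(len(k) for k in gazetteer)

    def distant_label(tokens):
        tags = ["O"] * len(tokens)
        i = 0
        while i < len(tokens):
            matched = False
            for n in range(max_len, 0, -1):          # prefer the longest match
                span = tuple(t.lower() for t in tokens[i:i + n])
                if span in gazetteer:
                    ent = gazetteer[span]
                    tags[i] = f"B-{ent}"
                    for j in range(i + 1, i + n):
                        tags[j] = f"I-{ent}"
                    i += n
                    matched = True
                    break
            if not matched:
                i += 1
        return tags

    print(distant_label(["Barack", "Obama", "visited", "New", "York"]))
    # ['B-PER', 'I-PER', 'O', 'B-LOC', 'I-LOC']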

138 citations


Cites background from "Cheap Translation for Cross-Lingual..."

  • ...Mayhew et al. (2017), Fang and Cohn (2017) and Karamanolakis et al. (2020) propose systems with fewer requirements based on word translations, bilingual dictionaries and task-specific seed words, respectively....

Proceedings ArticleDOI
01 Jul 2019
TL;DR: This tutorial focuses on how to induce weakly-supervised and unsupervised cross-lingual word representations in truly resource-poor settings where bilingual supervision cannot be guaranteed.
Abstract: In this tutorial, we provide a comprehensive survey of the exciting recent work on cutting-edge weakly-supervised and unsupervised cross-lingual word representations. After providing a brief history of supervised cross-lingual word representations, we focus on: 1) how to induce weakly-supervised and unsupervised cross-lingual word representations in truly resource-poor settings where bilingual supervision cannot be guaranteed; 2) critical examinations of different training conditions and requirements under which unsupervised algorithms can and cannot work effectively; 3) more robust methods that can mitigate instability issues and low performance for distant language pairs; 4) how to comprehensively evaluate such representations; and 5) diverse applications that benefit from cross-lingual word representations (e.g., MT, dialogue, cross-lingual sequence labeling and structured prediction applications, cross-lingual IR).
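
The supervised starting point covered in the tutorial's brief history, projection through a seed dictionary, fits in a few lines; the weakly-supervised and unsupervised methods surveyed differ mainly in how that seed dictionary is obtained. A sketch of the orthogonal Procrustes mapping with assumed inputs:

    # Sketch of supervised projection-based cross-lingual embeddings: learn an
    # orthogonal map W from seed translation pairs, then project source vectors.
    # X and Y are assumed (n_pairs, dim) matrices of seed-pair embeddings.
    import numpy as np

    def procrustes_map(X, Y):
        """Solve W = argmin ||XW - Y||_F subject to W being orthogonal."""
        U, _, Vt = np.linalg.svd(X.T @ Y)
        return U @ Vt

    # W = procrustes_map(X_seed, Y_seed)
    # mapped_src = all_src_vectors @ W    # now comparable with target vectors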

108 citations

Posted Content
TL;DR: This article proposes two techniques for modulating cross-lingual transfer in a 'massive' setting with many models over one or more source languages, suitable for zero-shot and few-shot learning respectively, and evaluates them on named entity recognition.
Abstract: In cross-lingual transfer, NLP models over one or more source languages are applied to a low-resource target language. While most prior work has used a single source model or a few carefully selected models, here we consider a `massive' setting with many such models. This setting raises the problem of poor transfer, particularly from distant languages. We propose two techniques for modulating the transfer, suitable for zero-shot or few-shot learning, respectively. Evaluating on named entity recognition, we show that our techniques are much more effective than strong baselines, including standard ensembling, and our unsupervised method rivals oracle selection of the single best individual model.
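
For reference, the "standard ensembling" baseline mentioned in the abstract simply averages per-token label distributions from the individual source-language models. A minimal sketch; the array shapes and optional weighting are assumptions, not the paper's proposed techniques.

    # Sketch of the standard ensembling baseline: average per-token label
    # distributions from many source-language NER models. Shapes are assumptions.
    import numpy as np

    def ensemble_predict(per_model_probs, weights=None):
        """per_model_probs: list of (seq_len, n_labels) arrays, one per source model;
        weights: optional per-model weights (e.g. from language similarity)."""
        stacked = np.stack(per_model_probs)          # (n_models, seq_len, n_labels)
        avg = np.average(stacked, axis=0, weights=weights)
        return avg.argmax(axis=-1)                   # predicted label id per token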

108 citations

References
Proceedings ArticleDOI
25 Jun 2007
TL;DR: An open-source toolkit for statistical machine translation whose novel contributions are support for linguistically motivated factors, confusion network decoding, and efficient data formats for translation models and language models.
Abstract: We describe an open-source toolkit for statistical machine translation whose novel contributions are (a) support for linguistically motivated factors, (b) confusion network decoding, and (c) efficient data formats for translation models and language models. In addition to the SMT decoder, the toolkit also includes a wide variety of tools for training, tuning and applying the system to many translation tasks.

6,008 citations


"Cheap Translation for Cross-Lingual..." refers background in this paper

  • ...It is effectively the same as standard phrase-based statistical machine translation systems (such as MOSES (Koehn et al., 2007)), except that the translation table is not induced from expensive parallel text, but is built from a lexicon, hence the name cheap translation....

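The excerpt above is the link back to Moses: the translation table Moses normally learns from parallel text is instead written directly from a lexicon. A rough sketch of what that could look like; the lexicon entries and uniform scores are invented, and the exact phrase-table score columns should be checked against the Moses documentation.

    # Rough sketch: write a Moses-style plain-text phrase table directly from a
    # bilingual lexicon instead of learning it from parallel text. Entries and the
    # uniform scores are invented; check the Moses docs for the exact score columns.
    from collections import defaultdict

    lexicon = [          # (source word, target word) pairs, possibly many-to-many
        ("house", "haus"),
        ("house", "heim"),
        ("dog", "hund"),
    ]

    translations = defaultdict(list)
    for src, tgt in lexicon:
        translations[src].append(tgt)

    with open("phrase-table.txt", "w", encoding="utf-8") as f:
        for src, tgts in translations.items():
            prob = 1.0 / len(tgts)                   # uniform over lexicon senses
            for tgt in tgts:
                # "source ||| target ||| scores" is the general layout Moses reads.
                f.write(f"{src} ||| {tgt} ||| {prob} {prob} {prob} {prob}\n")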

Proceedings Article
01 Jan 2002
TL;DR: The functionality of the SRILM toolkit is summarized and its design and implementation is discussed, highlighting ease of rapid prototyping, reusability, and combinability of tools.
Abstract: SRILM is a collection of C++ libraries, executable programs, and helper scripts designed to allow both production of and experimentation with statistical language models for speech recognition and other applications. SRILM is freely available for noncommercial purposes. The toolkit supports creation and evaluation of a variety of language model types based on N-gram statistics, as well as several related tasks, such as statistical tagging and manipulation of N-best lists and word lattices. This paper summarizes the functionality of the toolkit and discusses its design and implementation, highlighting ease of rapid prototyping, reusability, and combinability of tools.
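
To make concrete the kind of N-gram statistics such a toolkit estimates, here is a toy bigram model with add-one smoothing in plain Python; it only illustrates the idea and bears no relation to SRILM's actual implementation or file formats.

    # Toy bigram language model with add-one smoothing, purely to illustrate the
    # N-gram statistics SRILM-style toolkits estimate (not SRILM's implementation).
    from collections import Counter

    corpus = [["<s>", "the", "cat", "sat", "</s>"],
              ["<s>", "the", "dog", "sat", "</s>"]]

    unigrams, bigrams = Counter(), Counter()
    for sent in corpus:
        unigrams.update(sent)
        bigrams.update(zip(sent, sent[1:]))

    vocab_size = len(unigrams)

    def p_bigram(w_prev, w):
        """P(w | w_prev) with add-one smoothing."""
        return (bigrams[(w_prev, w)] + 1) / (unigrams[w_prev] + vocab_size)

    print(p_bigram("the", "cat"))   # (1 + 1) / (2 + 6) = 0.25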

4,904 citations

Proceedings ArticleDOI
31 May 2003
TL;DR: The CoNLL-2003 shared task concerned language-independent named entity recognition (NER); the paper gives background on the English and German data sets and the evaluation method, presents a general overview of the participating systems, and discusses their performance.
Abstract: We describe the CoNLL-2003 shared task: language-independent named entity recognition. We give background information on the data sets (English and German) and the evaluation method, present a general overview of the systems that have taken part in the task and discuss their performance.
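
The shared-task files are plain text with one token per line and a blank line between sentences; a small reader for that layout (assuming the NER tag sits in the last column, as in the released English data):

    # Small reader for CoNLL-2003-style files: one token per line, columns such as
    # "word POS chunk NER", blank lines between sentences, "-DOCSTART-" markers.
    def read_conll(path):
        sentences, tokens, tags = [], [], []
        with open(path, encoding="utf-8") as f:
            for line in f:
                line = line.strip()
                if not line or line.startswith("-DOCSTART-"):
                    if tokens:
                        sentences.append((tokens, tags))
                        tokens, tags = [], []
                    continue
                cols = line.split()
                tokens.append(cols[0])
                tags.append(cols[-1])        # NER tag is the last column
        if tokens:
            sentences.append((tokens, tags))
        return sentences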

3,554 citations

Proceedings ArticleDOI
04 Jun 2009
TL;DR: Some of the fundamental design challenges and misconceptions that underlie the development of an efficient and robust NER system are analyzed, and several solutions to these challenges are developed.
Abstract: We analyze some of the fundamental design challenges and misconceptions that underlie the development of an efficient and robust NER system. In particular, we address issues such as the representation of text chunks, the inference approach needed to combine local NER decisions, the sources of prior knowledge and how to use them within an NER system. In the process of comparing several solutions to these challenges we reach some surprising conclusions, as well as develop an NER system that achieves 90.8 F1 score on the CoNLL-2003 NER shared task, the best reported result for this dataset.

1,539 citations


"Cheap Translation for Cross-Lingual..." refers methods in this paper

  • ...In all of our work we use the Illinois NER system (Ratinov and Roth, 2009) with standard features (forms, capitalization, affixes, word prior, word after, etc....

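The "standard features" listed in the excerpt are easy to picture as a per-token feature dictionary for a linear model. A hypothetical sketch; the feature names are illustrative and not the Illinois NER system's own.

    # Hypothetical per-token feature extraction in the spirit of the "standard
    # features" quoted above (form, capitalization, affixes, neighbouring words).
    # Feature names are illustrative, not the Illinois NER system's.
    def token_features(tokens, i):
        w = tokens[i]
        return {
            "form=" + w.lower(): 1,
            "is_capitalized": int(w[:1].isupper()),
            "is_all_caps": int(w.isupper()),
            "has_digit": int(any(c.isdigit() for c in w)),
            "prefix3=" + w[:3].lower(): 1,
            "suffix3=" + w[-3:].lower(): 1,
            "prev=" + (tokens[i - 1].lower() if i > 0 else "<BOS>"): 1,
            "next=" + (tokens[i + 1].lower() if i + 1 < len(tokens) else "<EOS>"): 1,
        }

    print(token_features(["Obama", "visited", "Paris"], 0))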

Proceedings Article
01 Jun 2013
TL;DR: A simple log-linear reparameterization of IBM Model 2 that overcomes problems arising from Model 1’s strong assumptions and Model 2’s overparameterization is presented.
Abstract: We present a simple log-linear reparameterization of IBM Model 2 that overcomes problems arising from Model 1’s strong assumptions and Model 2’s overparameterization. Efficient inference, likelihood evaluation, and parameter estimation algorithms are provided. Training the model is consistently ten times faster than Model 4. On three large-scale translation tasks, systems built using our alignment model outperform IBM Model 4. An open-source implementation of the alignment model described in this paper is available from http://github.com/clab/fast_align.

1,006 citations


"Cheap Translation for Cross-Lingual..." refers methods in this paper

  • ...We aligned the source-target data using fast align (Dyer et al., 2013), and projected labels across alignments. Since this is high-quality translation, we treat it as an upper bound on our technique, but with the caveat that the alignments can be noisy given the relatively small amount of text....

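The projection step quoted above, copying labels across word alignments, can be sketched given the "i-j" (Pharaoh-style) alignment format that fast_align emits; the one-label-per-link heuristic below is illustrative, not necessarily the paper's exact procedure.

    # Sketch of projecting NER labels across word alignments in the "i-j" (Pharaoh)
    # format that fast_align emits. The one-label-per-link heuristic is illustrative.
    def project_labels(src_tags, alignment_line, tgt_len):
        """src_tags: NER tags for the source sentence;
        alignment_line: e.g. "0-0 1-2 2-1"; tgt_len: number of target tokens."""
        tgt_tags = ["O"] * tgt_len
        for pair in alignment_line.split():
            s, t = (int(x) for x in pair.split("-"))
            if src_tags[s] != "O":
                tgt_tags[t] = src_tags[s]    # copy the entity label across the link
        return tgt_tags

    print(project_labels(["B-PER", "O", "B-LOC"], "0-1 1-0 2-2", 3))
    # ['O', 'B-PER', 'B-LOC']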