Proceedings ArticleDOI

Cheap Translation for Cross-Lingual Named Entity Recognition

TL;DR: A simple method for cross-lingual named entity recognition (NER) that works well in settings with very minimal resources: it uses a lexicon to “translate” annotated data available in one or several high-resource languages into the target language and learns a standard monolingual NER model there.
Abstract: Recent work in NLP has attempted to deal with low-resource languages, but still assumes a resource level that is not present for most languages, e.g., the availability of Wikipedia in the target language. We propose a simple method for cross-lingual named entity recognition (NER) that works well in settings with very minimal resources. Our approach makes use of a lexicon to “translate” annotated data available in one or several high-resource language(s) into the target language, and learns a standard monolingual NER model there. Further, when Wikipedia is available in the target language, our method can enhance Wikipedia-based methods to yield state-of-the-art NER results; we evaluate on 7 diverse languages, improving the state of the art by an average of 5.5 F1 points. With the minimal resources required, this is an extremely portable cross-lingual NER approach, as illustrated using a truly low-resource language, Uyghur.
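
The approach lends itself to a short sketch: each source token is replaced by a lexicon translation while its NER label is kept, yielding noisy target-language training data for a standard monolingual NER model. The lexicon, tag scheme, and copy-through fallback below are invented for illustration, not taken from the paper.

    # Hypothetical sketch of lexicon-based "cheap translation" of annotated NER data.
    # The lexicon, tag scheme, and copy-through fallback are illustrative assumptions.

    # Bilingual lexicon: source word -> target word (one sense kept for simplicity).
    lexicon = {
        "visited": "besuchte",
        "yesterday": "gestern",
    }

    def cheap_translate(tokens, tags, lexicon):
        """Translate a labelled sentence word by word, keeping the NER tags.

        Words missing from the lexicon (often names) are copied through unchanged,
        which is frequently harmless for NER since many names transliterate."""
        out_tokens, out_tags = [], []
        for token, tag in zip(tokens, tags):
            out_tokens.append(lexicon.get(token.lower(), token))
            out_tags.append(tag)            # labels survive the word-level swap
        return out_tokens, out_tags

    src = (["Obama", "visited", "Paris", "yesterday"],
           ["B-PER", "O", "B-LOC", "O"])
    print(cheap_translate(*src, lexicon))
    # (['Obama', 'besuchte', 'Paris', 'gestern'], ['B-PER', 'O', 'B-LOC', 'O'])
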
Citations
Proceedings ArticleDOI
01 May 2020
TL;DR: It is demonstrated that the inexpensive few-shot transfer (i.e., additional fine-tuning on a few target-language instances) is surprisingly effective across the board, warranting more research efforts reaching beyond the limiting zero-shot conditions.
Abstract: Massively multilingual transformers (MMTs) pretrained via language modeling (e.g., mBERT, XLM-R) have become a default paradigm for zero-shot language transfer in NLP, offering unmatched transfer performance. Current evaluations, however, verify their efficacy in transfers (a) to languages with sufficiently large pretraining corpora, and (b) between close languages. In this work, we analyze the limitations of downstream language transfer with MMTs, showing that, much like cross-lingual word embeddings, they are substantially less effective in resource-lean scenarios and for distant languages. Our experiments, encompassing three lower-level tasks (POS tagging, dependency parsing, NER) and two high-level tasks (NLI, QA), empirically correlate transfer performance with linguistic proximity between source and target languages, but also with the size of target language corpora used in MMT pretraining. Most importantly, we demonstrate that the inexpensive few-shot transfer (i.e., additional fine-tuning on a few target-language instances) is surprisingly effective across the board, warranting more research efforts reaching beyond the limiting zero-shot conditions.
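
The "inexpensive few-shot transfer" described above amounts to briefly continuing training on a handful of annotated target-language sentences. A hedged sketch with the Hugging Face transformers library; the checkpoint name, label set, and example sentence are placeholders rather than the paper's setup.

    # Hedged sketch of few-shot transfer: take an MMT already fine-tuned on
    # source-language NER and continue training on a few target-language sentences.
    # The checkpoint name, label set, and example are placeholders.
    import torch
    from transformers import AutoTokenizer, AutoModelForTokenClassification

    checkpoint = "my-org/mbert-ner-en"      # placeholder source-trained NER model
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForTokenClassification.from_pretrained(checkpoint)
    label2id = model.config.label2id        # e.g. {"O": 0, "B-LOC": 1, ...}

    # A handful of annotated target-language sentences: the "few shots".
    few_shot = [
        (["Ankara", "çok", "güzel"], ["B-LOC", "O", "O"]),
        # ... a few dozen more at most ...
    ]

    def encode(tokens, tags):
        enc = tokenizer(tokens, is_split_into_words=True,
                        truncation=True, return_tensors="pt")
        labels, prev = [], None
        for word_id in enc.word_ids():
            if word_id is None or word_id == prev:
                labels.append(-100)         # special tokens / later subwords: ignored
            else:
                labels.append(label2id[tags[word_id]])
            prev = word_id
        enc["labels"] = torch.tensor([labels])
        return enc

    optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
    model.train()
    for _ in range(10):                     # a few passes over the few shots
        for tokens, tags in few_shot:
            loss = model(**encode(tokens, tags)).loss
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()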

218 citations


Cites background from "Cheap Translation for Cross-Lingual..."

  • ...Transfer paradigms based on discrete text representations include machine translation (MT) of target language text to the source language (or vice-versa) (Mayhew et al., 2017; Eger et al., 2018), and grounding texts from both languages in multilingual knowledge bases (KBs) (Navigli and Ponzetto, 2012; Lehmann et al....

Proceedings ArticleDOI
01 Feb 2019
TL;DR: The authors evaluate supervised and unsupervised cross-lingual word embedding (CLE) models on bilingual lexicon induction (BLI) and three downstream tasks, and empirically demonstrate that the performance of CLE models largely depends on the task at hand and that optimizing CLE models for BLI may hurt downstream performance.
Abstract: Cross-lingual word embeddings (CLEs) facilitate cross-lingual transfer of NLP models. Despite their ubiquitous downstream usage, increasingly popular projection-based CLE models are almost exclusively evaluated on bilingual lexicon induction (BLI). Even the BLI evaluations vary greatly, hindering our ability to correctly interpret performance and properties of different CLE models. In this work, we take the first step towards a comprehensive evaluation of CLE models: we thoroughly evaluate both supervised and unsupervised CLE models, for a large number of language pairs, on BLI and three downstream tasks, providing new insights concerning the ability of cutting-edge CLE models to support cross-lingual NLP. We empirically demonstrate that the performance of CLE models largely depends on the task at hand and that optimizing CLE models for BLI may hurt downstream performance. We indicate the most robust supervised and unsupervised CLE models and emphasize the need to reassess simple baselines, which still display competitive performance across the board. We hope our work catalyzes further research on CLE evaluation and model analysis.
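
For context, the BLI evaluation at the centre of this comparison scores nearest-neighbour retrieval over a test dictionary in the shared embedding space. A minimal sketch with assumed inputs; real evaluations often use CSLS retrieval rather than plain cosine.

    # Minimal sketch of bilingual lexicon induction (BLI) evaluation: precision@1 of
    # nearest-neighbour retrieval in a shared embedding space. All inputs are assumed.
    import numpy as np

    def bli_precision_at_1(src_vecs, tgt_vecs, src_index, tgt_index, test_pairs):
        """src_vecs, tgt_vecs: row-normalised (vocab, dim) matrices in a shared space;
        src_index/tgt_index: word -> row id; test_pairs: [(src_word, gold_tgt_word)]."""
        id2tgt = {i: w for w, i in tgt_index.items()}
        hits = 0
        for src_word, gold in test_pairs:
            query = src_vecs[src_index[src_word]]
            sims = tgt_vecs @ query         # cosine similarity (rows normalised)
            if id2tgt[int(np.argmax(sims))] == gold:
                hits += 1
        return hits / len(test_pairs)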

165 citations

Proceedings ArticleDOI
01 Jun 2021
TL;DR: A structured overview of methods that enable learning when training data is sparse, including mechanisms to create additional labeled data such as data augmentation and distant supervision, as well as transfer learning settings that reduce the need for target supervision.
Abstract: Deep neural networks and huge language models are becoming omnipresent in natural language applications. As they are known for requiring large amounts of training data, there is a growing body of work to improve the performance in low-resource settings. Motivated by the recent fundamental changes towards neural models and the popular pre-train and fine-tune paradigm, we survey promising approaches for low-resource natural language processing. After a discussion about the different dimensions of data availability, we give a structured overview of methods that enable learning when training data is sparse. This includes mechanisms to create additional labeled data like data augmentation and distant supervision as well as transfer learning settings that reduce the need for target supervision. A goal of our survey is to explain how these methods differ in their requirements as understanding them is essential for choosing a technique suited for a specific low-resource setting. Further key aspects of this work are to highlight open issues and to outline promising directions for future research.
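
One of the surveyed mechanisms, distant supervision, is easy to picture: project labels onto raw text from an external resource such as an entity gazetteer. The gazetteer, tag scheme, and matching heuristic below are invented for illustration.

    # Illustrative distant supervision for NER: label raw text by matching against
    # a gazetteer. The gazetteer, sentences, and matching rule are made up.
    gazetteer = {
        ("new", "york"): "LOC",
        ("barack", "obama"): "PER",
    }
    max_len = max(len(k) for k in gazetteer)

    def distant_label(tokens):
        tags = ["O"] * len(tokens)
        i = 0
        while i < len(tokens):
            matched = False
            for n in range(max_len, 0, -1):          # prefer the longest match
                span = tuple(t.lower() for t in tokens[i:i + n])
                if span in gazetteer:
                    ent = gazetteer[span]
                    tags[i] = f"B-{ent}"
                    for j in range(i + 1, i + n):
                        tags[j] = f"I-{ent}"
                    i += n
                    matched = True
                    break
            if not matched:
                i += 1
        return tags

    print(distant_label(["Barack", "Obama", "visited", "New", "York"]))
    # ['B-PER', 'I-PER', 'O', 'B-LOC', 'I-LOC']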

138 citations


Cites background from "Cheap Translation for Cross-Lingual..."

  • ...Mayhew et al. (2017), Fang and Cohn (2017) and Karamanolakis et al. (2020) propose systems with fewer requirements based on word translations, bilingual dictionaries and task-specific seed words, respectively....

Proceedings ArticleDOI
01 Jul 2019
TL;DR: This tutorial focuses on how to induce weakly-supervised and unsupervised cross-lingual word representations in truly resource-poor settings where bilingual supervision cannot be guaranteed.
Abstract: In this tutorial, we provide a comprehensive survey of the exciting recent work on cutting-edge weakly-supervised and unsupervised cross-lingual word representations. After providing a brief history of supervised cross-lingual word representations, we focus on: 1) how to induce weakly-supervised and unsupervised cross-lingual word representations in truly resource-poor settings where bilingual supervision cannot be guaranteed; 2) critical examinations of different training conditions and requirements under which unsupervised algorithms can and cannot work effectively; 3) more robust methods that can mitigate instability issues and low performance for distant language pairs; 4) how to comprehensively evaluate such representations; and 5) diverse applications that benefit from cross-lingual word representations (e.g., MT, dialogue, cross-lingual sequence labeling and structured prediction applications, cross-lingual IR).
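
The supervised starting point covered in the tutorial's brief history, projection through a seed dictionary, fits in a few lines; the weakly-supervised and unsupervised methods surveyed differ mainly in how that seed dictionary is obtained. A sketch of the orthogonal Procrustes mapping with assumed inputs:

    # Sketch of supervised projection-based cross-lingual embeddings: learn an
    # orthogonal map W from seed translation pairs, then project source vectors.
    # X and Y are assumed (n_pairs, dim) matrices of seed-pair embeddings.
    import numpy as np

    def procrustes_map(X, Y):
        """Solve W = argmin ||XW - Y||_F subject to W being orthogonal."""
        U, _, Vt = np.linalg.svd(X.T @ Y)
        return U @ Vt

    # W = procrustes_map(X_seed, Y_seed)
    # mapped_src = all_src_vectors @ W    # now comparable with target vectors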

108 citations

Posted Content
TL;DR: This article proposes two techniques for modulating cross-lingual transfer in a 'massive' setting with many models over one or more source languages, suitable for zero-shot and few-shot learning respectively, and evaluates them on named entity recognition.
Abstract: In cross-lingual transfer, NLP models over one or more source languages are applied to a low-resource target language. While most prior work has used a single source model or a few carefully selected models, here we consider a `massive' setting with many such models. This setting raises the problem of poor transfer, particularly from distant languages. We propose two techniques for modulating the transfer, suitable for zero-shot or few-shot learning, respectively. Evaluating on named entity recognition, we show that our techniques are much more effective than strong baselines, including standard ensembling, and our unsupervised method rivals oracle selection of the single best individual model.
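
For reference, the "standard ensembling" baseline mentioned in the abstract simply averages per-token label distributions from the individual source-language models. A minimal sketch; the array shapes and optional weighting are assumptions, not the paper's proposed techniques.

    # Sketch of the standard ensembling baseline: average per-token label
    # distributions from many source-language NER models. Shapes are assumptions.
    import numpy as np

    def ensemble_predict(per_model_probs, weights=None):
        """per_model_probs: list of (seq_len, n_labels) arrays, one per source model;
        weights: optional per-model weights (e.g. from language similarity)."""
        stacked = np.stack(per_model_probs)          # (n_models, seq_len, n_labels)
        avg = np.average(stacked, axis=0, weights=weights)
        return avg.argmax(axis=-1)                   # predicted label id per token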

108 citations

References
Proceedings ArticleDOI
25 Jun 2007
TL;DR: An open-source toolkit for statistical machine translation whose novel contributions are support for linguistically motivated factors, confusion network decoding, and efficient data formats for translation models and language models.
Abstract: We describe an open-source toolkit for statistical machine translation whose novel contributions are (a) support for linguistically motivated factors, (b) confusion network decoding, and (c) efficient data formats for translation models and language models. In addition to the SMT decoder, the toolkit also includes a wide variety of tools for training, tuning and applying the system to many translation tasks.

6,008 citations


"Cheap Translation for Cross-Lingual..." refers background in this paper

  • ...It is effectively the same as standard phrase-based statistical machine translation systems (such as MOSES (Koehn et al., 2007)), except that the translation table is not induced from expensive parallel text, but is built from a lexicon, hence the name cheap translation....

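The excerpt above is the link back to Moses: the translation table Moses normally learns from parallel text is instead written directly from a lexicon. A rough sketch of what that could look like; the lexicon entries and uniform scores are invented, and the exact phrase-table score columns should be checked against the Moses documentation.

    # Rough sketch: write a Moses-style plain-text phrase table directly from a
    # bilingual lexicon instead of learning it from parallel text. Entries and the
    # uniform scores are invented; check the Moses docs for the exact score columns.
    from collections import defaultdict

    lexicon = [          # (source word, target word) pairs, possibly many-to-many
        ("house", "haus"),
        ("house", "heim"),
        ("dog", "hund"),
    ]

    translations = defaultdict(list)
    for src, tgt in lexicon:
        translations[src].append(tgt)

    with open("phrase-table.txt", "w", encoding="utf-8") as f:
        for src, tgts in translations.items():
            prob = 1.0 / len(tgts)                   # uniform over lexicon senses
            for tgt in tgts:
                # "source ||| target ||| scores" is the general layout Moses reads.
                f.write(f"{src} ||| {tgt} ||| {prob} {prob} {prob} {prob}\n")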

Proceedings Article
01 Jan 2002
TL;DR: The functionality of the SRILM toolkit is summarized and its design and implementation is discussed, highlighting ease of rapid prototyping, reusability, and combinability of tools.
Abstract: SRILM is a collection of C++ libraries, executable programs, and helper scripts designed to allow both production of and experimentation with statistical language models for speech recognition and other applications. SRILM is freely available for noncommercial purposes. The toolkit supports creation and evaluation of a variety of language model types based on N-gram statistics, as well as several related tasks, such as statistical tagging and manipulation of N-best lists and word lattices. This paper summarizes the functionality of the toolkit and discusses its design and implementation, highlighting ease of rapid prototyping, reusability, and combinability of tools.
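
To make concrete the kind of N-gram statistics such a toolkit estimates, here is a toy bigram model with add-one smoothing in plain Python; it only illustrates the idea and bears no relation to SRILM's actual implementation or file formats.

    # Toy bigram language model with add-one smoothing, purely to illustrate the
    # N-gram statistics SRILM-style toolkits estimate (not SRILM's implementation).
    from collections import Counter

    corpus = [["<s>", "the", "cat", "sat", "</s>"],
              ["<s>", "the", "dog", "sat", "</s>"]]

    unigrams, bigrams = Counter(), Counter()
    for sent in corpus:
        unigrams.update(sent)
        bigrams.update(zip(sent, sent[1:]))

    vocab_size = len(unigrams)

    def p_bigram(w_prev, w):
        """P(w | w_prev) with add-one smoothing."""
        return (bigrams[(w_prev, w)] + 1) / (unigrams[w_prev] + vocab_size)

    print(p_bigram("the", "cat"))   # (1 + 1) / (2 + 6) = 0.25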

4,904 citations

Proceedings ArticleDOI
31 May 2003
TL;DR: The CoNLL-2003 shared task concerned language-independent named entity recognition (NER); the paper gives background on the English and German data sets and the evaluation method, presents a general overview of the participating systems, and discusses their performance.
Abstract: We describe the CoNLL-2003 shared task: language-independent named entity recognition. We give background information on the data sets (English and German) and the evaluation method, present a general overview of the systems that have taken part in the task and discuss their performance.
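
The shared-task files are plain text with one token per line and a blank line between sentences; a small reader for that layout (assuming the NER tag sits in the last column, as in the released English data):

    # Small reader for CoNLL-2003-style files: one token per line, columns such as
    # "word POS chunk NER", blank lines between sentences, "-DOCSTART-" markers.
    def read_conll(path):
        sentences, tokens, tags = [], [], []
        with open(path, encoding="utf-8") as f:
            for line in f:
                line = line.strip()
                if not line or line.startswith("-DOCSTART-"):
                    if tokens:
                        sentences.append((tokens, tags))
                        tokens, tags = [], []
                    continue
                cols = line.split()
                tokens.append(cols[0])
                tags.append(cols[-1])        # NER tag is the last column
        if tokens:
            sentences.append((tokens, tags))
        return sentences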

3,554 citations

Proceedings ArticleDOI
04 Jun 2009
TL;DR: Some of the fundamental design challenges and misconceptions that underlie the development of an efficient and robust NER system are analyzed, and several solutions to these challenges are developed.
Abstract: We analyze some of the fundamental design challenges and misconceptions that underlie the development of an efficient and robust NER system. In particular, we address issues such as the representation of text chunks, the inference approach needed to combine local NER decisions, the sources of prior knowledge and how to use them within an NER system. In the process of comparing several solutions to these challenges we reach some surprising conclusions, as well as develop an NER system that achieves 90.8 F1 score on the CoNLL-2003 NER shared task, the best reported result for this dataset.

1,539 citations


"Cheap Translation for Cross-Lingual..." refers methods in this paper

  • ...In all of our work we use the Illinois NER system (Ratinov and Roth, 2009) with standard features (forms, capitalization, affixes, word prior, word after, etc....

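The "standard features" listed in the excerpt are easy to picture as a per-token feature dictionary for a linear model. A hypothetical sketch; the feature names are illustrative and not the Illinois NER system's own.

    # Hypothetical per-token feature extraction in the spirit of the "standard
    # features" quoted above (form, capitalization, affixes, neighbouring words).
    # Feature names are illustrative, not the Illinois NER system's.
    def token_features(tokens, i):
        w = tokens[i]
        return {
            "form=" + w.lower(): 1,
            "is_capitalized": int(w[:1].isupper()),
            "is_all_caps": int(w.isupper()),
            "has_digit": int(any(c.isdigit() for c in w)),
            "prefix3=" + w[:3].lower(): 1,
            "suffix3=" + w[-3:].lower(): 1,
            "prev=" + (tokens[i - 1].lower() if i > 0 else "<BOS>"): 1,
            "next=" + (tokens[i + 1].lower() if i + 1 < len(tokens) else "<EOS>"): 1,
        }

    print(token_features(["Obama", "visited", "Paris"], 0))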

Proceedings Article
01 Jun 2013
TL;DR: A simple log-linear reparameterization of IBM Model 2 that overcomes problems arising from Model 1’s strong assumptions and Model 2’s overparameterization is presented.
Abstract: We present a simple log-linear reparameterization of IBM Model 2 that overcomes problems arising from Model 1’s strong assumptions and Model 2’s overparameterization. Efficient inference, likelihood evaluation, and parameter estimation algorithms are provided. Training the model is consistently ten times faster than Model 4. On three large-scale translation tasks, systems built using our alignment model outperform IBM Model 4. An open-source implementation of the alignment model described in this paper is available from http://github.com/clab/fast_align.

1,006 citations


"Cheap Translation for Cross-Lingual..." refers methods in this paper

  • ...We aligned the source-target data using fast align (Dyer et al., 2013), and projected labels across alignments. Since this is high-quality translation, we treat it as an upper bound on our technique, but with the caveat that the alignments can be noisy given the relatively small amount of text....

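The projection step quoted above, copying labels across word alignments, can be sketched given the "i-j" (Pharaoh-style) alignment format that fast_align emits; the one-label-per-link heuristic below is illustrative, not necessarily the paper's exact procedure.

    # Sketch of projecting NER labels across word alignments in the "i-j" (Pharaoh)
    # format that fast_align emits. The one-label-per-link heuristic is illustrative.
    def project_labels(src_tags, alignment_line, tgt_len):
        """src_tags: NER tags for the source sentence;
        alignment_line: e.g. "0-0 1-2 2-1"; tgt_len: number of target tokens."""
        tgt_tags = ["O"] * tgt_len
        for pair in alignment_line.split():
            s, t = (int(x) for x in pair.split("-"))
            if src_tags[s] != "O":
                tgt_tags[t] = src_tags[s]    # copy the entity label across the link
        return tgt_tags

    print(project_labels(["B-PER", "O", "B-LOC"], "0-1 1-0 2-2", 3))
    # ['O', 'B-PER', 'B-LOC']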