Cross-lingual Name Tagging and Linking for 282 Languages
Xiaoman Pan, Boliang Zhang, Jonathan May, Joel Nothman, Kevin Knight, Heng Ji
- Vol. 1, pp 1946-1958
TLDR
This work develops a cross-lingual name tagging and linking framework for the 282 languages that exist in Wikipedia. The framework identifies name mentions, assigns a coarse-grained or fine-grained type to each mention, and links it to an English Knowledge Base (KB) if it is linkable.
Abstract:
The ambitious goal of this work is to develop a cross-lingual name tagging and linking framework for 282 languages that exist in Wikipedia. Given a document in any of these languages, our framework is able to identify name mentions, assign a coarse-grained or fine-grained type to each mention, and link it to an English Knowledge Base (KB) if it is linkable. We achieve this goal by performing a series of new KB mining methods: generating “silver-standard” annotations by transferring annotations from English to other languages through cross-lingual links and KB properties, refining annotations through self-training and topic selection, deriving language-specific morphology features from anchor links, and mining word translation pairs from cross-lingual links. Both name tagging and linking results for 282 languages are promising on Wikipedia data and non-Wikipedia data.
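The first mining step in the abstract, transferring annotations through cross-lingual links and KB properties, can be illustrated with a minimal sketch. It is not the authors' implementation: the `LANGLINKS` and `KB_TYPES` tables, the `project_tags` function, the BIO tag scheme, and the toy Turkish sentence are all assumptions made for illustration. The idea shown is only that an anchor link in a target-language sentence can inherit an entity type from the English KB entry its page links to.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy tables (illustrative only, not data from the paper).
# Cross-lingual links: target-language page title -> English page title.
LANGLINKS: Dict[str, str] = {
    "Orhan Pamuk": "Orhan Pamuk",
    "İstanbul": "Istanbul",
}
# English KB entry -> coarse-grained entity type.
KB_TYPES: Dict[str, str] = {
    "Orhan Pamuk": "PER",
    "Istanbul": "LOC",
}

# An anchor is (start_token, end_token_exclusive, linked_page_title).
Anchor = Tuple[int, int, str]


def project_tags(tokens: List[str], anchors: List[Anchor]) -> List[str]:
    """Turn anchor links into silver-standard BIO tags via the English KB."""
    tags = ["O"] * len(tokens)
    for start, end, title in anchors:
        english: Optional[str] = LANGLINKS.get(title)
        etype = KB_TYPES.get(english) if english is not None else None
        if etype is None:
            # No cross-lingual link or no known type: leave tokens untagged.
            continue
        tags[start] = f"B-{etype}"
        for i in range(start + 1, end):
            tags[i] = f"I-{etype}"
    return tags


# Toy sentence: "Orhan Pamuk İstanbul'da doğdu."
tokens = ["Orhan", "Pamuk", "İstanbul", "'da", "doğdu", "."]
anchors = [(0, 2, "Orhan Pamuk"), (2, 3, "İstanbul")]
silver = project_tags(tokens, anchors)
for token, tag in zip(tokens, silver):
    print(f"{token}\t{tag}")
```

Anchors whose pages have no English counterpart or no known type are left untagged here. Annotations produced this way are noisy and incomplete, which is why the abstract follows this step with refinement through self-training and topic selection.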
Citations
Posted Content
XTREME: A Massively Multilingual Multi-task Benchmark for Evaluating Cross-lingual Generalization
TL;DR: The Cross-lingual TRansfer Evaluation of Multilingual Encoders (XTREME) benchmark is introduced, a multi-task benchmark for evaluating the cross-lingual generalization capabilities of multilingual representations across 40 languages and 9 tasks.
Proceedings ArticleDOI
IndicNLPSuite: Monolingual Corpora, Evaluation Benchmarks and Pre-trained Multilingual Language Models for Indian Languages
Divyanshu Kakwani, Anoop Kunchukuttan, Satish Golla, N C Gokul, Avik Bhattacharyya, Mitesh M. Khapra, Pratyush Kumar
TL;DR: This paper introduces NLP resources for 11 major Indian languages from two major language families, and creates datasets for the following tasks: Article Genre Classification, Headline Prediction, Wikipedia Section-Title Prediction, Cloze-style Multiple choice QA, Winograd NLI and COPA.
Proceedings ArticleDOI
Emerging Cross-lingual Structure in Pretrained Language Models
TL;DR: It is shown that transfer is possible even when there is no shared vocabulary across the monolingual corpora and also when the text comes from very different domains, and it is strongly suggested that, much like for non-contextual word embeddings, there are universal latent symmetries in the learned embedding spaces.
Proceedings ArticleDOI
MAD-X: An Adapter-Based Framework for Multi-Task Cross-Lingual Transfer
TL;DR: This paper proposes MAD-X, an adapter-based framework that enables high portability and parameter-efficient transfer to arbitrary tasks and languages by learning modular language and task representations. It also introduces a novel invertible adapter architecture and a strong baseline method for adapting a pre-trained multilingual model to a new language.
Proceedings Article
XTREME: A Massively Multilingual Multi-task Benchmark for Evaluating Cross-lingual Generalisation
TL;DR: The Cross-lingual TRansfer Evaluation of Multilingual Encoders (XTREME) benchmark is a multi-task benchmark for evaluating the cross-lingual generalization capabilities of multilingual representations across 40 languages and 9 tasks.
References
Proceedings ArticleDOI
The Stanford CoreNLP Natural Language Processing Toolkit
Christopher D. Manning, Mihai Surdeanu, John Bauer, Jenny Rose Finkel, Steven Bethard, David McClosky
TL;DR: The design and use of the Stanford CoreNLP toolkit is described, an extensible pipeline that provides core natural language analysis. It is suggested that the toolkit's wide use follows from a simple, approachable design, straightforward interfaces, the inclusion of robust and good quality analysis components, and not requiring use of a large amount of associated baggage.
Proceedings ArticleDOI
Freebase: a collaboratively created graph database for structuring human knowledge
TL;DR: MQL provides an easy-to-use object-oriented interface to the tuple data in Freebase and is designed to facilitate the creation of collaborative, Web-based data-oriented applications.
Journal ArticleDOI
A systematic comparison of various statistical alignment models
Franz Josef Och, Hermann Ney
TL;DR: An important result is that refined alignment models with a first-order dependence and a fertility model yield significantly better results than simple heuristic models.
Journal ArticleDOI
Word association norms, mutual information, and lexicography
Kenneth Church, Patrick Hanks
TL;DR: The proposed measure, the association ratio, estimates word association norms directly from computer readable corpora, making it possible to estimate norms for tens of thousands of words.
Proceedings ArticleDOI
Neural Architectures for Named Entity Recognition
TL;DR: Paper presented at the 2016 Conference of the North American Chapter of the Association for Computational Linguistics, held in San Diego (CA, USA), June 12 to 17, 2016.