Open Access Proceedings Article

Cross-lingual Name Tagging and Linking for 282 Languages

TLDR
This work develops a cross-lingual name tagging and linking framework for the 282 languages in Wikipedia. The framework identifies name mentions, assigns a coarse-grained or fine-grained type to each mention, and links it to an English Knowledge Base (KB) if it is linkable.
Abstract
The ambitious goal of this work is to develop a cross-lingual name tagging and linking framework for 282 languages that exist in Wikipedia. Given a document in any of these languages, our framework is able to identify name mentions, assign a coarse-grained or fine-grained type to each mention, and link it to an English Knowledge Base (KB) if it is linkable. We achieve this goal by performing a series of new KB mining methods: generating “silver-standard” annotations by transferring annotations from English to other languages through cross-lingual links and KB properties, refining annotations through self-training and topic selection, deriving language-specific morphology features from anchor links, and mining word translation pairs from cross-lingual links. Both name tagging and linking results for 282 languages are promising on Wikipedia data and non-Wikipedia data.
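
The annotation-transfer step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `project_silver_tags`, the BIO tag scheme, the type labels, and all toy data below are assumptions made for illustration. The idea shown is only that an anchor link in a non-English article can be mapped through a cross-lingual link to an English KB entry, whose type is then projected back onto the linked tokens.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy inputs (assumptions, not data from the paper).
# A tokenized sentence from a non-English Wikipedia article:
TOKENS: List[str] = ["Angela", "Merkel", "besuchte", "Frankreich", "."]
# Anchor links as (start_token, end_token_exclusive, linked_page_title):
ANCHORS: List[Tuple[int, int, str]] = [(0, 2, "Angela Merkel"), (3, 4, "Frankreich")]
# Cross-lingual links: foreign page title -> English page title.
LANGLINKS: Dict[str, str] = {"Angela Merkel": "Angela Merkel", "Frankreich": "France"}
# Entity types assumed to be known for English KB entries.
EN_KB_TYPES: Dict[str, str] = {"Angela Merkel": "PER", "France": "GPE"}


def project_silver_tags(
    tokens: List[str],
    anchors: List[Tuple[int, int, str]],
    langlinks: Dict[str, str],
    en_kb_types: Dict[str, str],
) -> List[str]:
    """Project English KB types onto anchor-linked tokens as BIO tags."""
    tags = ["O"] * len(tokens)
    for start, end, title in anchors:
        en_title: Optional[str] = langlinks.get(title)
        if en_title is None:
            continue  # no cross-lingual link: leave the span untagged
        etype = en_kb_types.get(en_title)
        if etype is None:
            continue  # English entry has no known type: leave untagged
        tags[start] = f"B-{etype}"
        for i in range(start + 1, end):
            tags[i] = f"I-{etype}"
    return tags


if __name__ == "__main__":
    for token, tag in zip(TOKENS, project_silver_tags(TOKENS, ANCHORS, LANGLINKS, EN_KB_TYPES)):
        print(f"{token}\t{tag}")
```

Spans whose anchor has no cross-lingual link or no known type are left as `O` in this sketch, which is one reason such annotations are "silver-standard" rather than gold: unlinked mentions go unlabeled until a later refinement step such as the self-training the abstract mentions.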


Citations
Posted Content

XTREME: A Massively Multilingual Multi-task Benchmark for Evaluating Cross-lingual Generalization

TL;DR: The Cross-lingual TRansfer Evaluation of Multilingual Encoders (XTREME) benchmark is introduced, a multi-task benchmark for evaluating the cross-lingual generalization capabilities of multilingual representations across 40 languages and 9 tasks.
Proceedings Article

IndicNLPSuite: Monolingual Corpora, Evaluation Benchmarks and Pre-trained Multilingual Language Models for Indian Languages

TL;DR: This paper introduces NLP resources for 11 major Indian languages from two major language families, and creates datasets for the following tasks: Article Genre Classification, Headline Prediction, Wikipedia Section-Title Prediction, Cloze-style Multiple choice QA, Winograd NLI and COPA.
Proceedings Article

Emerging Cross-lingual Structure in Pretrained Language Models

TL;DR: It is shown that transfer is possible even when there is no shared vocabulary across the monolingual corpora and also when the text comes from very different domains, and it is strongly suggested that, much like for non-contextual word embeddings, there are universal latent symmetries in the learned embedding spaces.
Proceedings Article

MAD-X: An Adapter-Based Framework for Multi-Task Cross-Lingual Transfer

TL;DR: This paper proposes MAD-X, an adapter-based framework that enables high portability and parameter-efficient transfer to arbitrary tasks and languages by learning modular language and task representations, and introduces a novel invertible adapter architecture and a strong baseline method for adapting a pre-trained multilingual model to a new language.
Proceedings Article

XTREME: A Massively Multilingual Multi-task Benchmark for Evaluating Cross-lingual Generalisation

TL;DR: The Cross-lingual TRansfer Evaluation of Multilingual Encoders (XTREME) benchmark is a multi-task benchmark for evaluating the cross-lingual generalization capabilities of multilingual representations across 40 languages and 9 tasks.
References
Proceedings Article

The Stanford CoreNLP Natural Language Processing Toolkit

TL;DR: The design and use of the Stanford CoreNLP toolkit is described, an extensible pipeline that provides core natural language analysis, and it is suggested that this follows from a simple, approachable design, straightforward interfaces, the inclusion of robust and good quality analysis components, and not requiring use of a large amount of associated baggage.
Proceedings Article

Freebase: a collaboratively created graph database for structuring human knowledge

TL;DR: MQL provides an easy-to-use object-oriented interface to the tuple data in Freebase and is designed to facilitate the creation of collaborative, Web-based data-oriented applications.
Journal Article

A systematic comparison of various statistical alignment models

TL;DR: An important result is that refined alignment models with a first-order dependence and a fertility model yield significantly better results than simple heuristic models.
Journal Article

Word association norms, mutual information, and lexicography

TL;DR: The proposed measure, the association ratio, estimates word association norms directly from computer readable corpora, making it possible to estimate norms for tens of thousands of words.
Proceedings Article

Neural Architectures for Named Entity Recognition

TL;DR: Paper presented at the 2016 Conference of the North American Chapter of the Association for Computational Linguistics, held in San Diego (CA, USA), June 12 to 17, 2016.