Open Access · Proceedings Article
Synthetic Data Augmentation for Zero-Shot Cross-Lingual Question Answering
Arij Riabi, Thomas Scialom, Rachel Keraron, Benoît Sagot, Djamé Seddah, Jacopo Staiano
pp. 7016–7030
TL;DR: This article proposed a method to improve cross-lingual Question Answering performance without requiring additional annotated data, leveraging Question Generation models to produce synthetic samples in a cross-lingual fashion.

Abstract:
Coupled with the availability of large-scale datasets, deep learning architectures have enabled rapid progress on the Question Answering task. However, most of those datasets are in English, and the performance of state-of-the-art multilingual models is significantly lower when evaluated on non-English data. Due to high data collection costs, it is not realistic to obtain annotated data for each language one desires to support. We propose a method to improve Cross-lingual Question Answering performance without requiring additional annotated data, leveraging Question Generation models to produce synthetic samples in a cross-lingual fashion. We show that the proposed method significantly outperforms baselines trained on English data only. We report a new state of the art on four multilingual datasets: MLQA, XQuAD, SQuAD-it and PIAF (fr).
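The augmentation idea described in the abstract can be sketched in a few lines. Everything below is a hypothetical toy illustration: `pick_answer_spans` and `generate_question` stand in for the answer-selection step and the trained Question Generation model, and the passage is invented; the paper's actual models and data are not reproduced here.

```python
# Sketch of cross-lingual synthetic QA data augmentation:
# turn unlabeled target-language passages into SQuAD-style training samples.
# `pick_answer_spans` and `generate_question` are hypothetical stand-ins
# for the answer-extraction heuristic and the Question Generation model.

def pick_answer_spans(context):
    """Toy answer-span selector: capitalized tokens as candidate answers."""
    spans = []
    for token in context.split():
        word = token.strip(".,")
        if word[:1].isupper() and len(word) > 3:
            spans.append((word, context.find(word)))
    return spans

def generate_question(context, answer):
    """Placeholder for a QG model conditioned on (context, answer)."""
    return f"Which entity does the passage mention? ({answer})"

def synthesize_qa_pairs(passages):
    """Produce synthetic (context, question, answer-span) triples."""
    samples = []
    for context in passages:
        for answer, start in pick_answer_spans(context):
            samples.append({
                "context": context,
                "question": generate_question(context, answer),
                "answers": [{"text": answer, "answer_start": start}],
            })
    return samples

passages = ["Paris est la capitale de la France."]
data = synthesize_qa_pairs(passages)
```

The synthetic samples keep the extractive SQuAD format (answer as a character-offset span of the context), so they can be mixed directly into QA fine-tuning data for the target language.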
Citations
Posted Content
One Question Answering Model for Many Languages with Cross-lingual Dense Passage Retrieval
TL;DR: This article proposed a cross-lingual open-retrieval answer generation model that can answer questions across many languages even when language-specific annotated data or knowledge sources are unavailable.
Book Chapter · DOI
The Impact of Cross-Lingual Adjustment of Contextual Word Representations on Zero-Shot Transfer
TL;DR: This article used a parallel corpus to make the embeddings of related words similar across languages, and showed that fine-tuning leads to forgetting some of this cross-lingual alignment information.
References
Proceedings Article · DOI
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
TL;DR: BERT as mentioned in this paper pre-trains deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers, which can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
Proceedings Article
The PageRank Citation Ranking : Bringing Order to the Web
TL;DR: This paper describes PageRank, a method for rating Web pages objectively and mechanically, effectively measuring the human interest and attention devoted to them, and shows how to efficiently compute PageRank for large numbers of pages.
Proceedings Article · DOI
SQuAD: 100,000+ Questions for Machine Comprehension of Text
TL;DR: The Stanford Question Answering Dataset (SQuAD) as mentioned in this paper is a reading comprehension dataset consisting of 100,000+ questions posed by crowdworkers on a set of Wikipedia articles, where the answer to each question is a segment of text from the corresponding reading passage.
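SQuAD's extractive format pairs each question with a character-offset answer span inside the passage. A minimal illustration of that structure (the passage and offsets here are a made-up example, not an actual SQuAD entry):

```python
# One SQuAD-style sample: the answer is always a verbatim span of the
# context, recoverable from its character offset.
sample = {
    "context": "The Stanford Question Answering Dataset was released in 2016.",
    "question": "When was SQuAD released?",
    "answers": [{"text": "2016", "answer_start": 56}],
}

ans = sample["answers"][0]
span = sample["context"][ans["answer_start"]:
                         ans["answer_start"] + len(ans["text"])]
```

This span-based design is what makes evaluation exact-match/F1 computable by simple string comparison, and it is the same format the synthetic augmentation samples above target.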
Proceedings Article · DOI
Unsupervised Cross-lingual Representation Learning at Scale
Alexis Conneau, Kartikay Khandelwal, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer, Veselin Stoyanov
TL;DR: It is shown that pretraining multilingual language models at scale leads to significant performance gains for a wide range of cross-lingual transfer tasks, and the possibility of multilingual modeling without sacrificing per-language performance is shown for the first time.
Proceedings Article
Teaching machines to read and comprehend
Karl Moritz Hermann, Tomáš Kočiský, Edward Grefenstette, Lasse Espeholt, Will Kay, Mustafa Suleyman, Phil Blunsom
TL;DR: A new methodology is defined that resolves the data bottleneck for supervised reading comprehension, providing large-scale training data that enables the development of attention-based deep neural networks which learn to read real documents and answer complex questions with minimal prior knowledge of language structure.
Related Papers (5)
A Robust Self-Learning Framework for Cross-Lingual Text Classification
Xin Dong, Gerard de Melo