Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

Journal Article•

Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu

01 Jan 2020-Journal of Machine Learning Research-Vol. 21, Iss: 140, pp 1-67

TL;DR: This article introduced a unified framework that converts all text-based language problems into a text-to-text format and compared pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks.

Abstract: Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts all text-based language problems into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new "Colossal Clean Crawled Corpus", we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our data set, pre-trained models, and code.
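
As a rough illustration of the text-to-text format described in the abstract, the sketch below casts three kinds of task as plain (input string, target string) pairs. The helper name `to_text_to_text`, the prefix strings, and the example sentences are assumptions chosen for illustration, not the authors' released code.

```python
# Illustrative sketch only: the helper name, prefixes, and examples are assumptions.
from typing import Tuple


def to_text_to_text(task: str, text: str, target: str) -> Tuple[str, str]:
    """Cast a labelled example as an (input text, target text) pair.

    Translation, summarization, and classification all end up with the
    same string-in, string-out shape, so one model can serve every task.
    """
    return f"{task}: {text}", target


examples = [
    to_text_to_text("translate English to German", "That is good.", "Das ist gut."),
    to_text_to_text("summarize", "A long news article ...", "A short summary."),
    # A class label is emitted as literal text rather than as a class index.
    to_text_to_text("sentiment", "A wonderful film.", "positive"),
]

for source, target in examples:
    print(f"{source!r} -> {target!r}")
```

Because the label of a classification task is itself a string, the same training loss and decoding procedure can in principle be reused across all of the tasks above.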

Citations

Proceedings Article•

Controlled Neural Sentence-Level Reframing of News Articles.

Wei-Fan Chen¹, Khalid Al Khatib², Benno Stein³, Henning Wachsmuth¹

University of Paderborn¹, Jordan University of Science and Technology², Bauhaus University, Weimar³

01 Nov 2021

TL;DR: This article proposed three strategies (framed-language pretraining, named-entity preservation, and adversarial learning) to guide the training of neural models on an existing media frame corpus so that they generate properly framed text.

Abstract: Framing a news article means to portray the reported event from a specific perspective, e.g., from an economic or a health perspective. Reframing means to change this perspective. Depending on the audience or the submessage, reframing can become necessary to achieve the desired effect on the readers. Reframing is related to adapting style and sentiment, which can be tackled with neural text generation techniques. However, it is more challenging since changing a frame requires rewriting entire sentences rather than single phrases. In this paper, we study how to computationally reframe sentences in news articles while maintaining their coherence to the context. We treat reframing as a sentence-level fill-in-the-blank task for which we train neural models on an existing media frame corpus. To guide the training, we propose three strategies: framed-language pretraining, named-entity preservation, and adversarial learning. We evaluate respective models automatically and manually for topic consistency, coherence, and successful reframing. Our results indicate that generating properly-framed text works well but with tradeoffs.

Proceedings Article•

Self-training Improves Pre-training for Few-shot Learning in Task-oriented Dialog Systems

Fei Mi¹, Wanhao Zhou², Fengyu Cai³, Lingjing Kong³, Minlie Huang³, Boi Faltings³

Huawei¹, Tsinghua University², École Polytechnique Fédérale de Lausanne³

28 Aug 2021

TL;DR: The authors proposed a self-training approach that iteratively labels the most confident unlabeled data to train a stronger Student model, together with a new text augmentation technique (GradAug) that better trains the Student by replacing non-crucial tokens using a masked language model.

Abstract: As the labeling cost for different modules in task-oriented dialog (ToD) systems is high, a major challenge is to train different modules with the least amount of labeled data. Recently, large-scale pre-trained language models have shown promising results for few-shot learning in ToD. In this paper, we devise a self-training approach to utilize the abundant unlabeled dialog data to further improve state-of-the-art pre-trained models in few-shot learning scenarios for ToD systems. Specifically, we propose a self-training approach that iteratively labels the most confident unlabeled data to train a stronger Student model. Moreover, a new text augmentation technique (GradAug) is proposed to better train the Student by replacing non-crucial tokens using a masked language model. We conduct extensive experiments and present analyses on four downstream tasks in ToD, including intent classification, dialog state tracking, dialog act prediction, and response selection. Empirical results demonstrate that the proposed self-training approach consistently improves state-of-the-art pre-trained models (BERT, ToD-BERT) when only a small number of labeled data are available.
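
The self-training loop described in this abstract can be sketched roughly as follows. This is a toy under stated assumptions: a per-label word-count model stands in for the pre-trained model, the names `train`, `predict`, and `self_train` and the example utterances are hypothetical, and the GradAug augmentation step is omitted entirely.

```python
# Toy sketch of self-training; the word-count "model" is an assumption
# standing in for a pre-trained language model, and GradAug is not included.
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Example = Tuple[str, str]  # (utterance, intent label)
Model = Dict[str, Counter]


def train(examples: List[Example]) -> Model:
    """Fit the toy model: count how often each word appears under each label."""
    counts: Model = defaultdict(Counter)
    for text, label in examples:
        counts[label].update(text.lower().split())
    return counts


def predict(model: Model, text: str) -> Tuple[str, float]:
    """Return (label, confidence), where confidence is the winner's score share."""
    words = text.lower().split()
    scores = {label: sum(c[w] for w in words) for label, c in model.items()}
    total = sum(scores.values())
    if total == 0:
        return next(iter(scores)), 0.0  # no known words: zero confidence
    best = max(scores, key=scores.get)
    return best, scores[best] / total


def self_train(labeled: List[Example], unlabeled: List[str],
               rounds: int = 3, top_k: int = 1) -> Model:
    """Repeatedly pseudo-label the most confident unlabeled texts and retrain."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(rounds):
        if not pool:
            break
        teacher = train(labeled)
        # Rank the pool by the Teacher's confidence, highest first.
        scored = sorted(((predict(teacher, t), t) for t in pool),
                        key=lambda item: item[0][1], reverse=True)
        confident = scored[:top_k]
        labeled += [(text, label) for (label, _), text in confident]
        chosen = {text for _, text in confident}
        pool = [t for t in pool if t not in chosen]
    return train(labeled)  # the Student, trained on gold plus pseudo-labels


seed = [("book a flight to paris", "travel"), ("play some jazz music", "music")]
pool = ["book a hotel in rome", "play the new album", "a flight and a hotel please"]
student = self_train(seed, pool)
print(predict(train(seed), "hotel"), predict(student, "hotel"))
```

In this toy run, the seed model has never seen the word "hotel", while the Student picks it up from pseudo-labelled utterances.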

Posted Content•

MS MARCO: Benchmarking Ranking Models in the Large-Data Regime

Nick Craswell¹, Bhaskar Mitra², Emine Yilmaz², Daniel Campos³, Jimmy Lin⁴

Microsoft¹, University College London², University of Illinois at Urbana–Champaign³, University of Waterloo⁴

09 May 2021-arXiv: Information Retrieval

TL;DR: In this paper, the authors use the MS MARCO and TREC Deep Learning Track as their case study, comparing it to the case of TREC ad hoc ranking in the 1990s.

Abstract: Evaluation efforts such as TREC, CLEF, NTCIR and FIRE, alongside public leaderboards such as MS MARCO, are intended to encourage research and track our progress, addressing big questions in our field. However, the goal is not simply to identify which run is "best", achieving the top score. The goal is to move the field forward by developing new robust techniques that work in many different settings and are adopted in research and practice. This paper uses the MS MARCO and TREC Deep Learning Track as our case study, comparing it to the case of TREC ad hoc ranking in the 1990s. We show how the design of the evaluation effort can encourage or discourage certain outcomes, and raise questions about the internal and external validity of results. We provide some analysis of certain pitfalls, and a statement of best practices for avoiding such pitfalls. We summarize the progress of the effort so far, and describe our desired end state of "robust usefulness", along with steps that might be required to get us there.

Evaluation of Abstractive Summarisation Models with Machine Translation in Deliberative Processes.

Miguel Arana-Catania, Rob Procter, Yulan He, Maria Liakata

01 Nov 2021

TL;DR: The authors presented work on summarizing deliberative processes for non-English languages using an off-the-shelf machine translation model and obtained promising results regarding the fluency, consistency and relevance of the summaries produced.

Abstract: We present work on summarising deliberative processes for non-English languages. Unlike commonly studied datasets, such as news articles, this deliberation dataset reflects difficulties of combining multiple narratives, mostly of poor grammatical quality, in a single text. We report an extensive evaluation of a wide range of abstractive summarisation models in combination with an off-the-shelf machine translation model. Texts are translated into English, summarised, and translated back to the original language. We obtain promising results regarding the fluency, consistency and relevance of the summaries produced. Our approach is easy to implement for many languages for production purposes by simply changing the translation model.
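
The translate, summarise, translate-back pipeline described in this abstract can be sketched as a composition of three interchangeable functions. The function name `summarise_via_english` and the stub models below are hypothetical placeholders that only wrap their input; in practice each would be replaced by a real translation or summarisation model.

```python
# Minimal sketch: the three stubs are placeholders, not real models.
from typing import Callable

TextFn = Callable[[str], str]


def summarise_via_english(text: str, to_english: TextFn,
                          summarise: TextFn, from_english: TextFn) -> str:
    """Translate into English, summarise there, then translate the summary back."""
    english = to_english(text)
    summary = summarise(english)
    return from_english(summary)


# Stub components that wrap their input so the order of the steps is visible.
def stub_to_english(text: str) -> str:
    return f"en({text})"


def stub_summarise(text: str) -> str:
    return f"sum({text})"


def stub_from_english(text: str) -> str:
    return f"orig({text})"


result = summarise_via_english("texto original", stub_to_english,
                               stub_summarise, stub_from_english)
print(result)  # orig(sum(en(texto original)))
```

Keeping the translation steps as parameters mirrors the abstract's point that supporting another language only requires changing the translation model.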

Posted Content•

MReD: A Meta-Review Dataset for Controllable Text Generation.

Chenhui Shen, Liying Cheng, Ran Zhou, Lidong Bing, Yang You, Luo Si

14 Oct 2021-arXiv: Computation and Language

TL;DR: This article introduced a new text generation dataset, MReD, which consists of 7,089 meta-reviews whose 45k meta-review sentences are each manually annotated with one of 9 carefully defined categories, including abstract, strength, decision, etc.

Abstract: When directly using existing text generation datasets for controllable generation, we face the problem of lacking domain knowledge, and thus the aspects that can be controlled are limited. A typical example is that when using the CNN/Daily Mail dataset for controllable text summarization, there is no guiding information on the emphasis of summary sentences. A more useful text generator should leverage both the input text and control variables to guide the generation, which can only be built with a deep understanding of the domain knowledge. Motivated by this vision, our paper introduces a new text generation dataset, named MReD. Our new dataset consists of 7,089 meta-reviews, and all of its 45k meta-review sentences are manually annotated with one of 9 carefully defined categories, including abstract, strength, decision, etc. We present experimental results on state-of-the-art summarization models, and propose methods for controlled generation on both extractive and abstractive models using our annotated data. By exploring various settings and analysing the model behavior with respect to the control inputs, we demonstrate the challenges and values of our dataset. MReD allows us to have a better understanding of the meta-review corpora and enlarge the research room for controllable text generation.
