Journal Article
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
TL;DR: This article introduced a unified framework that converts all text-based language problems into a text-to-text format and compared pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks.
Abstract: Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts all text-based language problems into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new "Colossal Clean Crawled Corpus", we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our data set, pre-trained models, and code.
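As a rough illustration of the text-to-text framing described in the abstract, the sketch below casts three kinds of task (classification, summarization, question answering) into plain input-string/target-string pairs, so a single sequence-to-sequence model could in principle be trained on all of them. The helper names, the `TextToTextExample` type, and the exact prefix strings are assumptions made for this example and are not taken from the released code.

```python
from dataclasses import dataclass


@dataclass
class TextToTextExample:
    """One training pair: both the input and the target are plain strings."""
    source: str
    target: str


def cast_classification(prefix, sentence, label_names, label):
    # The class is emitted as a word, not as an integer id.
    # `prefix` and `label_names` are hypothetical, chosen by the caller.
    return TextToTextExample(f"{prefix}: {sentence}", label_names[label])


def cast_summarization(document, summary):
    # The task is signalled by a short textual prefix on the input.
    return TextToTextExample(f"summarize: {document}", summary)


def cast_question_answering(question, context, answer):
    # Question and context are concatenated into one input string.
    return TextToTextExample(f"question: {question} context: {context}", answer)


if __name__ == "__main__":
    examples = [
        cast_classification("sentiment", "A moving film.", ["negative", "positive"], 1),
        cast_summarization("The council met on Monday and approved the budget.",
                           "Council approves budget."),
        cast_question_answering("Who approved the budget?",
                                "The council approved the budget.", "the council"),
    ]
    for ex in examples:
        print(repr(ex.source), "->", repr(ex.target))
```

Because every task reduces to the same string-to-string shape, the same model, loss, and decoding procedure can be reused across tasks; only the casting function changes.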
Citations
•
01 Nov 2021
TL;DR: The authors introduce SimPlified Language Activity Traces (SPLAT), a benchmarking approach that allows the generation of question-answer pairs at scale and affords complete knowledge in closed domains.
Abstract: The capabilities of today’s natural language processing systems are typically evaluated using large datasets of curated questions and answers. While these are critical benchmarks of progress, they also suffer from weakness due to artificial distributions and incomplete knowledge. Artifacts arising from artificial distributions can overstate language model performance, while incomplete knowledge limits fine-grained analysis. In this work, we introduce a complementary benchmarking approach based on SimPlified Language Activity Traces (SPLAT). SPLATs are corpora of language encodings of activity in some closed domain (we study traces from chess and baseball games in this work). SPLAT datasets use naturally-arising distributions, allow the generation of question-answer pairs at scale, and afford complete knowledge in their closed domains. We show that language models of three different architectures can answer questions about world states using only verb-like encodings of activity. Our approach is extensible to new language models and additional question-answering tasks.
•
TL;DR: This paper presents TCube, a domain-agnostic neural framework for time-series narration that couples the representation of essential time-series elements as a dense knowledge graph with the translation of that knowledge graph into rich and fluent narratives through the transfer-learning capabilities of pre-trained language models.
Abstract: The task of generating rich and fluent narratives that aptly describe the characteristics, trends, and anomalies of time-series data is invaluable to the sciences (geology, meteorology, epidemiology) or finance (trades, stocks, or sales and inventory). The efforts for time-series narration hitherto are domain-specific and use predefined templates that offer consistency but lead to mechanical narratives. We present TCube (Time-series-to-text), a domain-agnostic neural framework for time-series narration, that couples the representation of essential time-series elements in the form of a dense knowledge graph and the translation of said knowledge graph into rich and fluent narratives through the transfer-learning capabilities of PLMs (Pre-trained Language Models). TCube's design primarily addresses the challenge that lies in building a neural framework in the complete paucity of annotated training data for time-series. The design incorporates knowledge graphs as an intermediary for the representation of essential time-series elements which can be linearized for textual translation. To the best of our knowledge, TCube is the first investigation of the use of neural strategies for time-series narration. Through extensive evaluations, we show that TCube can improve the lexical diversity of the generated narratives by up to 65.38% while still maintaining grammatical integrity. The practicality and deployability of TCube are further validated through an expert review (n=21) where 76.2% of participating experts wary of auto-generated narratives favored TCube as a deployable system for time-series narration due to its richer narratives. Our code-base, models, and datasets, with detailed instructions for reproducibility, are publicly hosted at https://github.com/Mandar-Sharma/TCube.
•
TL;DR: This paper presents TxT, a transformer-based cross-modal pipeline that enables fine-tuning both language and visual components on the downstream task in a fully end-to-end manner.
Abstract: Reasoning over multiple modalities, e.g. in Visual Question Answering (VQA), requires an alignment of semantic concepts across domains. Despite the widespread success of end-to-end learning, today's multimodal pipelines by and large leverage pre-extracted, fixed features from object detectors, typically Faster R-CNN, as representations of the visual world. The obvious downside is that the visual representation is not specifically tuned to the multimodal task at hand. At the same time, while transformer-based object detectors have gained popularity, they have not been employed in today's multimodal pipelines. We address both shortcomings with TxT, a transformer-based crossmodal pipeline that enables fine-tuning both language and visual components on the downstream task in a fully end-to-end manner. We overcome existing limitations of transformer-based detectors for multimodal reasoning regarding the integration of global context and their scalability. Our transformer-based multimodal model achieves considerable gains from end-to-end learning for multimodal question answering.
•
TL;DR: This article presents two strategies for sentence punctuation of game-commentary text: forming each complete sentence of commentary from two or from three text sequences as originally punctuated by YouTube.
Abstract: To solve the existing sentence punctuation problem for collaborative commentary generation in Esports live-streaming, this paper presents two strategies for sentence punctuation for text sequences of game commentary, that is, punctuating sentences by two or three text sequence(s) originally punctuated by YouTube to obtain a complete sentence of commentary. We conducted comparative experiments, fine-tuning a state-of-the-art pre-trained generative language model under the two strategies and the baseline to generate collaborative commentary. Both objective evaluations by automatic metrics and subjective analyses showed that our strategy of punctuating sentences by two text sequences outperformed the baseline.
•
TL;DR: This article develops a framework, based on the idea of challenge sets, that can generate controlled perturbations and identify subsets in text-to-scalar, text-to-text, or data-to-text settings.
Abstract: Machine learning approaches applied to NLP are often evaluated by summarizing their performance in a single number, for example accuracy. Since most test sets are constructed as an i.i.d. sample from the overall data, this approach overly simplifies the complexity of language and encourages overfitting to the head of the data distribution. As such, rare language phenomena or text about underrepresented groups are not equally included in the evaluation. To encourage more in-depth model analyses, researchers have proposed the use of multiple test sets, also called challenge sets, that assess specific capabilities of a model. In this paper, we develop a framework based on this idea which is able to generate controlled perturbations and identify subsets in text-to-scalar, text-to-text, or data-to-text settings. By applying this framework to the GEM generation benchmark, we propose an evaluation suite made of 80 challenge sets, demonstrate the kinds of analyses that it enables and shed light onto the limits of current generation models.
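To make the two ingredients named in this abstract more concrete, the sketch below shows one possible controlled perturbation (swapping two adjacent words, or stripping punctuation) and one possible subset criterion (bucketing inputs by word count). It is a toy illustration only: the function names, the choice of perturbations, and the length boundaries are assumptions for this example, not the framework's actual code or the challenge sets proposed for GEM.

```python
import random
import string
from collections import defaultdict


def swap_adjacent_words(text, seed=0):
    """Controlled perturbation: swap one randomly chosen pair of adjacent words."""
    words = text.split()
    if len(words) < 2:
        return text
    rng = random.Random(seed)  # fixed seed keeps the perturbation reproducible
    i = rng.randrange(len(words) - 1)
    words[i], words[i + 1] = words[i + 1], words[i]
    return " ".join(words)


def strip_punctuation(text):
    """Controlled perturbation: remove all ASCII punctuation characters."""
    return "".join(ch for ch in text if ch not in string.punctuation)


def length_bucket(text, boundaries=(5, 15)):
    """Subset criterion: label an input by its word count (boundaries are arbitrary)."""
    n = len(text.split())
    for b in boundaries:
        if n <= b:
            return f"<= {b} words"
    return f"> {boundaries[-1]} words"


def build_subsets(inputs):
    """Group test inputs into subsets so a metric can be reported per subset."""
    subsets = defaultdict(list)
    for text in inputs:
        subsets[length_bucket(text)].append(text)
    return dict(subsets)


if __name__ == "__main__":
    test_inputs = ["The match ended, finally.", "Rain fell."]
    print(build_subsets(test_inputs))
    print([strip_punctuation(t) for t in test_inputs])
    print([swap_adjacent_words(t) for t in test_inputs])
```

Reporting a model's score separately on each subset and on each perturbed copy of the test set, instead of as a single aggregate number, is the kind of analysis the abstract argues for.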