Open Access · Posted Content
Generating Datasets with Pretrained Language Models
Timo Schick, Hinrich Schütze
TL;DR
The authors utilize the generative abilities of pretrained language models to generate entire datasets of labeled text pairs from scratch, which can then be used for regular finetuning of much smaller models.
Abstract
To obtain high-quality sentence embeddings from pretrained language models (PLMs), they must either be augmented with additional pretraining objectives or finetuned on a large set of labeled text pairs. While the latter approach typically outperforms the former, it requires great human effort to generate suitable datasets of sufficient size. In this paper, we show how large PLMs can be leveraged to obtain high-quality embeddings without requiring any labeled data, finetuning or modifications to the pretraining objective: We utilize the generative abilities of PLMs to generate entire datasets of labeled text pairs from scratch, which can then be used for regular finetuning of much smaller models. Our fully unsupervised approach outperforms strong baselines on several English semantic textual similarity datasets.
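The generate-then-finetune pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: `generate` stands in for sampling a continuation from a large generative PLM and is stubbed out here, and the prompt template is an assumed wording, not the paper's exact instruction.

```python
def generate(prompt):
    # Placeholder for a PLM continuation (in practice, a call to a
    # generative model such as GPT-2 via a text-generation API).
    return "A man is strumming a guitar."

def build_pair(seed_sentence, similarity):
    # Instruct the PLM to produce a second sentence with a target
    # similarity level; this template is illustrative only.
    prompt = (f"Write two sentences that are {similarity}.\n"
              f'Sentence 1: "{seed_sentence}"\n'
              f'Sentence 2: "')
    return {"sent1": seed_sentence,
            "sent2": generate(prompt),
            "label": similarity}

seeds = ["A man is playing a guitar.", "Children are riding bikes."]
dataset = [build_pair(s, level)
           for s in seeds
           for level in ("similar", "unrelated")]
# `dataset` now holds synthetic labeled text pairs on which a much
# smaller embedding model can be finetuned with a standard training loop.
```

The point of the design is that the large PLM is used only once, at dataset-creation time; no labeled data or changes to its pretraining objective are needed.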
Citations
Posted Content
Symbolic Knowledge Distillation: from General Language Models to Commonsense Models
Peter West, Chandra Bhagavatula, Jack Hessel, Jena D. Hwang, Liwei Jiang, Ronan Le Bras, Ximing Lu, Sean Welleck, Yejin Choi
TL;DR: The authors distill knowledge symbolically, as text, in addition to distilling a neural model, which allows the student to be of a different type (a commonsense model). They show that careful prompt engineering and a separately trained critic model make it possible to selectively distill high-quality causal commonsense from GPT-3, a general language model.
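The generate-then-filter loop this TL;DR describes can be sketched as below. Both models are stubbed: `teacher_generate` stands in for prompting a large general LM for candidate commonsense statements, and `critic_score` for a separately trained quality classifier; all names and the toy outputs are illustrative assumptions.

```python
def teacher_generate(event):
    # Placeholder for prompting a general LM (e.g. GPT-3) to complete
    # a commonsense inference; returns candidate continuations.
    return [f"{event}. As a result, they feel tired.",
            f"{event}. As a result, the moon explodes."]

def critic_score(statement):
    # Placeholder for a critic model finetuned on human quality labels;
    # here, a trivial heuristic stands in for its score.
    return 0.1 if "explodes" in statement else 0.9

def distill(events, threshold=0.5):
    # Keep only candidates the critic rates above the threshold,
    # yielding a filtered symbolic knowledge corpus as text.
    corpus = []
    for event in events:
        for candidate in teacher_generate(event):
            if critic_score(candidate) >= threshold:
                corpus.append(candidate)
    return corpus

kb = distill(["X runs a marathon"])
```

The critic is what makes selective distillation possible: the teacher over-generates, and only statements passing the quality filter enter the symbolic knowledge base used to train the student.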
Posted Content
What Changes Can Large-scale Language Models Bring? Intensive Study on HyperCLOVA: Billions-scale Korean Generative Pretrained Transformers
Boseop Kim, HyoungSeok Kim, Sang Woo Lee, Gichang Lee, Dong-Hyun Kwak, Dong Hyeon Jeon, Sunghyun Park, Sungju Kim, Seonhoon Kim, Dongpil Seo, Heungsub Lee, Minyoung Jeong, Sungjae Lee, Minsub Kim, Suk Hyun Ko, Seokhun Kim, Taeyong Park, Jinuk Kim, Soyoung Kang, Na-Hyeon Ryu, Kang Min Yoo, Minsuk Chang, Soobin Suh, Sookyo In, Jin-Seong Park, Kyungduk Kim, Hiun Kim, Jisu Jeong, Yong Goo Yeo, Donghoon Ham, Dongju Park, Min Young Lee, Jae-Wook Kang, Inho Kang, Jung-Woo Ha, Woo-Myoung Park, Nako Sung
TL;DR: HyperCLOVA is an 82B-parameter Korean variant of GPT-3 trained on a Korean-centric corpus of 560B tokens; it shows state-of-the-art zero-shot and few-shot learning performance on various downstream tasks in Korean.
Proceedings ArticleDOI
Symbolic Knowledge Distillation: from General Language Models to Commonsense Models
TL;DR: Proceedings version of the entry above, published at the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT 2022).
Posted Content
Recent Advances in Natural Language Processing via Large Pre-Trained Language Models: A Survey
Bonan Min, Hayley Ross, Elior Sulem, Amir Pouran Ben Veyseh, Thien Huu Nguyen, Oscar Sainz, Eneko Agirre, Ilana Heintz, Dan Roth
TL;DR: This paper presents a survey of recent work that uses pre-trained transformer-based language models to solve NLP tasks via pre-training then fine-tuning, prompting, or text generation approaches.
Posted Content
True Few-Shot Learning with Prompts - A Real-World Perspective.
Timo Schick, Hinrich Schütze
TL;DR: PET combines textual instructions with example-based finetuning and achieves state-of-the-art few-shot learning performance without using a dev set.
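The pattern-verbalizer idea behind PET can be illustrated as follows: a task input is wrapped in a cloze-style textual instruction, and the label is read off from the word a masked LM predicts at the mask position. The masked-LM call is stubbed here, and the pattern and verbalizer wording are illustrative assumptions, not PET's exact templates.

```python
# A cloze-style pattern wraps the input, and a verbalizer maps the
# predicted filler word back to a task label (illustrative examples).
PATTERN = "{text} All in all, it was [MASK]."
VERBALIZER = {"great": "positive", "terrible": "negative"}

def mlm_fill(prompt):
    # Stand-in for a masked LM's prediction at the [MASK] position;
    # a real implementation scores the verbalizer words with the model.
    return "great" if "loved" in prompt else "terrible"

def classify(text):
    prompt = PATTERN.format(text=text)
    return VERBALIZER[mlm_fill(prompt)]

classify("I loved this movie.")   # → "positive"
classify("It was boring.")        # → "negative"
```

Because the instruction is expressed in the model's own input space, a handful of labeled examples suffices for finetuning, which is what enables the few-shot setting without a dev set.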
References
Proceedings ArticleDOI
GloVe: Global Vectors for Word Representation
TL;DR: A new global log-bilinear regression model that combines the advantages of the two major model families in the literature, global matrix factorization and local context-window methods, and produces a vector space with meaningful substructure.
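For reference, the global log-bilinear regression objective this TL;DR refers to is, in the paper's notation:

```latex
J = \sum_{i,j=1}^{V} f(X_{ij}) \left( w_i^\top \tilde{w}_j + b_i + \tilde{b}_j - \log X_{ij} \right)^2
```

where $X_{ij}$ is the co-occurrence count of words $i$ and $j$, $V$ is the vocabulary size, $w_i$ and $\tilde{w}_j$ are word and context vectors with biases $b_i$ and $\tilde{b}_j$, and $f$ is a weighting function that discounts rare and very frequent co-occurrences.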
Proceedings ArticleDOI
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
TL;DR: BERT pre-trains deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers, and can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
Proceedings ArticleDOI
Rethinking the Inception Architecture for Computer Vision
TL;DR: In this article, the authors explore ways to scale up networks in ways that aim at utilizing the added computation as efficiently as possible by suitably factorized convolutions and aggressive regularization.
Posted Content
RoBERTa: A Robustly Optimized BERT Pretraining Approach
Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov
TL;DR: It is found that BERT was significantly undertrained; with an improved pretraining recipe, it can match or exceed the performance of every model published after it, and the best model achieves state-of-the-art results on GLUE, RACE and SQuAD.
Automatic differentiation in PyTorch
Adam Paszke, Sam Gross, Soumith Chintala, Gregory Chanan, Edward Z. Yang, Zachary DeVito, Zeming Lin, Alban Desmaison, Luca Antiga, Adam Lerer
TL;DR: Describes the automatic differentiation module of PyTorch, a library designed to enable rapid research on machine learning models; it differentiates purely imperative programs, with an emphasis on extensibility and low overhead.
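The idea of differentiating an imperative program can be sketched in a few lines of reverse-mode autodiff: each operation the program executes records a backward closure, and calling backward on the result propagates gradients to the inputs. This is a conceptual illustration only, not PyTorch's implementation, and it assumes each intermediate value is used once (there is no topological ordering of the graph).

```python
class Var:
    """A scalar value that records how to backpropagate through it."""

    def __init__(self, value):
        self.value = value
        self.grad = 0.0
        self.grad_fn = None  # closure that propagates gradient to inputs

    def __add__(self, other):
        out = Var(self.value + other.value)
        def backward(g):
            self.backward_(g)        # d(out)/d(self)  = 1
            other.backward_(g)       # d(out)/d(other) = 1
        out.grad_fn = backward
        return out

    def __mul__(self, other):
        out = Var(self.value * other.value)
        def backward(g):
            self.backward_(g * other.value)  # d(out)/d(self)
            other.backward_(g * self.value)  # d(out)/d(other)
        out.grad_fn = backward
        return out

    def backward_(self, g=1.0):
        self.grad += g               # accumulate, as leaves may be reused
        if self.grad_fn is not None:
            self.grad_fn(g)

x = Var(3.0)
y = x * x + x      # y = x^2 + x = 12
y.backward_()      # accumulates dy/dx = 2x + 1 = 7 into x.grad
```

The gradients flow through closures created while the program ran, which is what lets ordinary Python control flow (loops, branches) be differentiated without a static graph.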