Open Access Posted Content

Generating Datasets with Pretrained Language Models

TLDR
The authors utilize the generative abilities of pretrained language models to generate entire datasets of labeled text pairs from scratch, which can then be used for regular finetuning of much smaller models.
Abstract
To obtain high-quality sentence embeddings from pretrained language models (PLMs), they must either be augmented with additional pretraining objectives or finetuned on a large set of labeled text pairs. While the latter approach typically outperforms the former, it requires great human effort to generate suitable datasets of sufficient size. In this paper, we show how large PLMs can be leveraged to obtain high-quality embeddings without requiring any labeled data, finetuning or modifications to the pretraining objective: We utilize the generative abilities of PLMs to generate entire datasets of labeled text pairs from scratch, which can then be used for regular finetuning of much smaller models. Our fully unsupervised approach outperforms strong baselines on several English semantic textual similarity datasets.
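
As a rough illustration of the idea, the sketch below prompts a generative PLM to write sentence pairs together with a similarity label and collects them into a small synthetic dataset; the model name, prompt wording, and labeling scheme are illustrative assumptions, not the paper's exact setup.

```python
# Minimal sketch of the idea, not the paper's exact procedure: prompt a large
# PLM to write pairs of similar / dissimilar sentences, collect them as a
# labeled dataset, then finetune a much smaller model on that dataset.
# Model name, prompt wording, and labels are illustrative assumptions.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2-xl")  # any generative PLM

def generate_partner(sentence: str, relation: str) -> str:
    # The instruction in the prompt encodes the desired label
    # ("the same" -> similar pair, "different" -> dissimilar pair).
    prompt = (
        f'Task: Write two sentences that mean {relation} things.\n'
        f'Sentence 1: "{sentence}"\n'
        f'Sentence 2: "'
    )
    out = generator(prompt, max_new_tokens=40, do_sample=True, top_p=0.9,
                    return_full_text=False)[0]["generated_text"]
    return out.split('"')[0]  # keep only the text up to the closing quote

# Build a tiny synthetic dataset of labeled text pairs.
seed_sentences = ["A man is playing a guitar on stage."]
dataset = []
for s in seed_sentences:
    dataset.append((s, generate_partner(s, "the same"), 1.0))   # similar pair
    dataset.append((s, generate_partner(s, "different"), 0.0))  # dissimilar pair

# `dataset` can then be used for regular finetuning of a much smaller
# sentence-embedding model (e.g. with the sentence-transformers library).
```
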


Citations
Posted Content

Symbolic Knowledge Distillation: from General Language Models to Commonsense Models

TL;DR: The authors distill knowledge symbolically, as text, in addition to distilling it into a neural model, which allows the student to be a different type of model than the teacher: a commonsense model. They show that careful prompt engineering and a separately trained critic model let them selectively distill high-quality causal commonsense from GPT-3, a general language model.
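
The generate-then-filter recipe can be sketched as follows; the generator and critic checkpoints and the prompt are stand-ins chosen only so the snippet runs (the authors use GPT-3 as the generator and a purpose-trained critic classifier).

```python
# Hedged sketch of the generate-then-filter recipe, not the authors' pipeline:
# a general-purpose LM proposes commonsense statements from a prompt, and a
# critic classifier keeps only the candidates it scores highly; the surviving
# corpus is what a smaller commonsense student model would be trained on.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2-large")  # stand-in for GPT-3
# Stand-in critic: any binary classifier slots in here; the authors train their own.
critic = pipeline("text-classification",
                  model="distilbert-base-uncased-finetuned-sst-2-english")

prompt = "PersonX pays for PersonY's coffee. As a result, PersonY feels"
candidates = [
    o["generated_text"].strip()
    for o in generator(prompt, num_return_sequences=5, max_new_tokens=10,
                       do_sample=True, return_full_text=False)
]

# Keep only candidates the critic accepts with high confidence.
kept = [c for c in candidates if critic(prompt + " " + c)[0]["score"] > 0.9]
print(kept)
```
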
Proceedings ArticleDOI

Symbolic Knowledge Distillation: from General Language Models to Commonsense Models

TL;DR: Proceedings version of the preprint above, published at the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT).
Posted Content

Recent Advances in Natural Language Processing via Large Pre-Trained Language Models: A Survey

TL;DR: This paper presents a survey of recent work that uses pre-trained transformer-based language models to solve NLP tasks via pre-training then fine-tuning, prompting, or text generation approaches.
Posted Content

True Few-Shot Learning with Prompts - A Real-World Perspective.

TL;DR: PET combines textual instructions with example-based finetuning and achieves state-of-the-art performance on few-shot learning tasks without requiring a development set.
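
A minimal sketch of the pattern-verbalizer idea underlying PET, assuming a masked language model, a hand-written pattern, and a two-word verbalizer; none of these specifics are taken from the paper, and example-based finetuning of the model would follow in the full method.

```python
# Hedged sketch: a textual instruction ("pattern") turns the task into a cloze
# question, and a "verbalizer" maps label words back to class labels.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

def classify(review: str) -> str:
    # Pattern: phrase the task as a fill-in-the-blank instruction.
    text = f"{review} All in all, it was [MASK]."
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    mask_pos = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero()[0, 1]
    # Verbalizer: compare the scores of the label words "great" and "terrible".
    verbalizer = {"positive": "great", "negative": "terrible"}
    scores = {
        label: logits[0, mask_pos, tokenizer.convert_tokens_to_ids(word)].item()
        for label, word in verbalizer.items()
    }
    return max(scores, key=scores.get)

print(classify("A thrilling film with wonderful performances."))
```
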
References
Proceedings ArticleDOI

Glove: Global Vectors for Word Representation

TL;DR: A new global log-bilinear regression model that combines the advantages of the two major model families in the literature, global matrix factorization and local context window methods, and produces a vector space with meaningful substructure.
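
For reference, the GloVe model is trained by weighted least squares over the word-word co-occurrence matrix X, in the standard form:

```latex
% GloVe objective: weighted least squares over co-occurrence counts X_{ij}
J \;=\; \sum_{i,j=1}^{V} f(X_{ij})\,\bigl(w_i^{\top}\tilde{w}_j + b_i + \tilde{b}_j - \log X_{ij}\bigr)^{2},
\qquad
f(x) \;=\;
\begin{cases}
(x/x_{\max})^{\alpha} & \text{if } x < x_{\max},\\
1 & \text{otherwise.}
\end{cases}
```
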
Proceedings ArticleDOI

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

TL;DR: BERT pre-trains deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers; the pretrained model can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
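
The "one additional output layer" setup can be sketched with the Hugging Face transformers library; the checkpoint name and the two-label task below are assumptions for illustration, not details from the paper.

```python
# Minimal sketch of fine-tuning BERT with a single added classification head.
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
# A classification head (one linear layer) is added on top of the pretrained
# bidirectional encoder; only this head is newly initialized.
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

inputs = tokenizer("A first sentence.", "Its paired second sentence.",
                   return_tensors="pt")
logits = model(**inputs).logits  # shape (1, 2); fine-tuning trains this end to end
```
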
Proceedings ArticleDOI

Rethinking the Inception Architecture for Computer Vision

TL;DR: The authors explore ways to scale up networks that use the added computation as efficiently as possible, through suitably factorized convolutions and aggressive regularization.
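
One of the factorization ideas can be illustrated in a few lines of PyTorch: replacing a 5x5 convolution with two stacked 3x3 convolutions that keep the same receptive field while cutting parameters. The channel sizes are arbitrary and chosen only for illustration.

```python
# Factorized convolution sketch: two 3x3 convolutions cover the same 5x5
# receptive field with roughly 18/25 of the weights of a single 5x5 layer.
import torch.nn as nn

conv5x5 = nn.Conv2d(64, 64, kernel_size=5, padding=2)   # 64*64*25 weights
factorized = nn.Sequential(                              # ~2 * 64*64*9 weights
    nn.Conv2d(64, 64, kernel_size=3, padding=1),
    nn.ReLU(inplace=True),
    nn.Conv2d(64, 64, kernel_size=3, padding=1),
)
```
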
Posted Content

RoBERTa: A Robustly Optimized BERT Pretraining Approach

TL;DR: The authors find that BERT was significantly undertrained and that, with an improved pretraining recipe, it can match or exceed the performance of every model published after it; their best model achieves state-of-the-art results on GLUE, RACE and SQuAD.

Automatic differentiation in PyTorch

TL;DR: This paper describes the automatic differentiation module of PyTorch, a library designed to enable rapid research on machine learning models; it differentiates purely imperative programs, with an emphasis on extensibility and low overhead.
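
A minimal example of what differentiating a purely imperative program looks like in practice; the particular function is arbitrary.

```python
# Gradients are recorded as ordinary Python code runs, then computed by
# reverse-mode automatic differentiation on the recorded graph.
import torch

x = torch.tensor(2.0, requires_grad=True)
y = x ** 3 + 2 * x           # ordinary imperative computation
y.backward()                  # reverse-mode automatic differentiation
print(x.grad)                 # dy/dx = 3*x**2 + 2 = 14 at x = 2
```
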