Open Access · Posted Content
Generating Datasets with Pretrained Language Models
Timo Schick, Hinrich Schütze
TL;DR
The authors utilize the generative abilities of pretrained language models to generate entire datasets of labeled text pairs from scratch, which can then be used for regular finetuning of much smaller models.
Abstract
To obtain high-quality sentence embeddings from pretrained language models (PLMs), they must either be augmented with additional pretraining objectives or finetuned on a large set of labeled text pairs. While the latter approach typically outperforms the former, it requires great human effort to generate suitable datasets of sufficient size. In this paper, we show how large PLMs can be leveraged to obtain high-quality embeddings without requiring any labeled data, finetuning or modifications to the pretraining objective: We utilize the generative abilities of PLMs to generate entire datasets of labeled text pairs from scratch, which can then be used for regular finetuning of much smaller models. Our fully unsupervised approach outperforms strong baselines on several English semantic textual similarity datasets.
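The generate-then-finetune pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: `generate` stands in for sampling a continuation from a large generative PLM and is stubbed out here, and the prompt template is an assumed wording, not the paper's exact instruction.

```python
def generate(prompt):
    # Placeholder for a PLM continuation (in practice, a call to a
    # generative model such as GPT-2 via a text-generation API).
    return "A man is strumming a guitar."

def build_pair(seed_sentence, similarity):
    # Instruct the PLM to produce a second sentence with a target
    # similarity level; this template is illustrative only.
    prompt = (f"Write two sentences that are {similarity}.\n"
              f'Sentence 1: "{seed_sentence}"\n'
              f'Sentence 2: "')
    return {"sent1": seed_sentence,
            "sent2": generate(prompt),
            "label": similarity}

seeds = ["A man is playing a guitar.", "Children are riding bikes."]
dataset = [build_pair(s, level)
           for s in seeds
           for level in ("similar", "unrelated")]
# `dataset` now holds synthetic labeled text pairs on which a much
# smaller embedding model can be finetuned with a standard training loop.
```

The point of the design is that the large PLM is used only once, at dataset-creation time; no labeled data or changes to its pretraining objective are needed.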
Citations
Posted Content
Symbolic Knowledge Distillation: from General Language Models to Commonsense Models
Peter West, Chandra Bhagavatula, Jack Hessel, Jena D. Hwang, Liwei Jiang, Ronan Le Bras, Ximing Lu, Sean Welleck, Yejin Choi
TL;DR: The authors distill knowledge symbolically, as text, in addition to distilling a neural model, which allows the student to be of a different type (a commonsense model). They show that careful prompt engineering and a separately trained critic model make it possible to selectively distill high-quality causal commonsense from GPT-3, a general language model.
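The generate-then-filter loop this TL;DR describes can be sketched as below. Both models are stubbed: `teacher_generate` stands in for prompting a large general LM for candidate commonsense statements, and `critic_score` for a separately trained quality classifier; all names and the toy outputs are illustrative assumptions.

```python
def teacher_generate(event):
    # Placeholder for prompting a general LM (e.g. GPT-3) to complete
    # a commonsense inference; returns candidate continuations.
    return [f"{event}. As a result, they feel tired.",
            f"{event}. As a result, the moon explodes."]

def critic_score(statement):
    # Placeholder for a critic model finetuned on human quality labels;
    # here, a trivial heuristic stands in for its score.
    return 0.1 if "explodes" in statement else 0.9

def distill(events, threshold=0.5):
    # Keep only candidates the critic rates above the threshold,
    # yielding a filtered symbolic knowledge corpus as text.
    corpus = []
    for event in events:
        for candidate in teacher_generate(event):
            if critic_score(candidate) >= threshold:
                corpus.append(candidate)
    return corpus

kb = distill(["X runs a marathon"])
```

The critic is what makes selective distillation possible: the teacher over-generates, and only statements passing the quality filter enter the symbolic knowledge base used to train the student.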
Posted Content
What Changes Can Large-scale Language Models Bring? Intensive Study on HyperCLOVA: Billions-scale Korean Generative Pretrained Transformers
Boseop Kim, HyoungSeok Kim, Sang Woo Lee, Gichang Lee, Dong-Hyun Kwak, Dong Hyeon Jeon, Sunghyun Park, Sungju Kim, Seonhoon Kim, Dongpil Seo, Heungsub Lee, Minyoung Jeong, Sungjae Lee, Minsub Kim, Suk Hyun Ko, Seokhun Kim, Taeyong Park, Jinuk Kim, Soyoung Kang, Na-Hyeon Ryu, Kang Min Yoo, Minsuk Chang, Soobin Suh, Sookyo In, Jin-Seong Park, Kyungduk Kim, Hiun Kim, Jisu Jeong, Yong Goo Yeo, Donghoon Ham, Dongju Park, Min Young Lee, Jae-Wook Kang, Inho Kang, Jung-Woo Ha, Woo-Myoung Park, Nako Sung
TL;DR: HyperCLOVA is an 82B-parameter Korean variant of GPT-3 trained on a Korean-centric corpus of 560B tokens; it shows state-of-the-art zero-shot and few-shot learning performance on various downstream tasks in Korean.
Proceedings ArticleDOI
Symbolic Knowledge Distillation: from General Language Models to Commonsense Models
TL;DR: Proceedings version of the entry above, published at the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT 2022).
Posted Content
Recent Advances in Natural Language Processing via Large Pre-Trained Language Models: A Survey
Bonan Min, Hayley Ross, Elior Sulem, Amir Pouran Ben Veyseh, Thien Huu Nguyen, Oscar Sainz, Eneko Agirre, Ilana Heintz, Dan Roth
TL;DR: This paper presents a survey of recent work that uses pre-trained transformer-based language models to solve NLP tasks via pre-training then fine-tuning, prompting, or text generation approaches.
Posted Content
True Few-Shot Learning with Prompts - A Real-World Perspective.
Timo Schick, Hinrich Schütze
TL;DR: PET combines textual instructions with example-based finetuning and achieves state-of-the-art few-shot learning performance without using a dev set.
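The pattern-verbalizer idea behind PET can be illustrated as follows: a task input is wrapped in a cloze-style textual instruction, and the label is read off from the word a masked LM predicts at the mask position. The masked-LM call is stubbed here, and the pattern and verbalizer wording are illustrative assumptions, not PET's exact templates.

```python
# A cloze-style pattern wraps the input, and a verbalizer maps the
# predicted filler word back to a task label (illustrative examples).
PATTERN = "{text} All in all, it was [MASK]."
VERBALIZER = {"great": "positive", "terrible": "negative"}

def mlm_fill(prompt):
    # Stand-in for a masked LM's prediction at the [MASK] position;
    # a real implementation scores the verbalizer words with the model.
    return "great" if "loved" in prompt else "terrible"

def classify(text):
    prompt = PATTERN.format(text=text)
    return VERBALIZER[mlm_fill(prompt)]

classify("I loved this movie.")   # → "positive"
classify("It was boring.")        # → "negative"
```

Because the instruction is expressed in the model's own input space, a handful of labeled examples suffices for finetuning, which is what enables the few-shot setting without a dev set.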
References
Proceedings ArticleDOI
GloVe: Global Vectors for Word Representation
TL;DR: A new global log-bilinear regression model that combines the advantages of the two major model families in the literature, global matrix factorization and local context-window methods, and produces a vector space with meaningful substructure.
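For reference, the global log-bilinear regression objective this TL;DR refers to is, in the paper's notation:

```latex
J = \sum_{i,j=1}^{V} f(X_{ij}) \left( w_i^\top \tilde{w}_j + b_i + \tilde{b}_j - \log X_{ij} \right)^2
```

where $X_{ij}$ is the co-occurrence count of words $i$ and $j$, $V$ is the vocabulary size, $w_i$ and $\tilde{w}_j$ are word and context vectors with biases $b_i$ and $\tilde{b}_j$, and $f$ is a weighting function that discounts rare and very frequent co-occurrences.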
Proceedings ArticleDOI
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
TL;DR: BERT pre-trains deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers, and can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
Proceedings ArticleDOI
Rethinking the Inception Architecture for Computer Vision
TL;DR: In this article, the authors explore ways to scale up networks in ways that aim at utilizing the added computation as efficiently as possible by suitably factorized convolutions and aggressive regularization.
Posted Content
RoBERTa: A Robustly Optimized BERT Pretraining Approach
Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov
TL;DR: It is found that BERT was significantly undertrained; with an improved pretraining recipe, it can match or exceed the performance of every model published after it, and the best model achieves state-of-the-art results on GLUE, RACE and SQuAD.
Automatic differentiation in PyTorch
Adam Paszke, Sam Gross, Soumith Chintala, Gregory Chanan, Edward Z. Yang, Zachary DeVito, Zeming Lin, Alban Desmaison, Luca Antiga, Adam Lerer
TL;DR: Describes the automatic differentiation module of PyTorch, a library designed to enable rapid research on machine learning models; it differentiates purely imperative programs, with an emphasis on extensibility and low overhead.
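The idea of differentiating an imperative program can be sketched in a few lines of reverse-mode autodiff: each operation the program executes records a backward closure, and calling backward on the result propagates gradients to the inputs. This is a conceptual illustration only, not PyTorch's implementation, and it assumes each intermediate value is used once (there is no topological ordering of the graph).

```python
class Var:
    """A scalar value that records how to backpropagate through it."""

    def __init__(self, value):
        self.value = value
        self.grad = 0.0
        self.grad_fn = None  # closure that propagates gradient to inputs

    def __add__(self, other):
        out = Var(self.value + other.value)
        def backward(g):
            self.backward_(g)        # d(out)/d(self)  = 1
            other.backward_(g)       # d(out)/d(other) = 1
        out.grad_fn = backward
        return out

    def __mul__(self, other):
        out = Var(self.value * other.value)
        def backward(g):
            self.backward_(g * other.value)  # d(out)/d(self)
            other.backward_(g * self.value)  # d(out)/d(other)
        out.grad_fn = backward
        return out

    def backward_(self, g=1.0):
        self.grad += g               # accumulate, as leaves may be reused
        if self.grad_fn is not None:
            self.grad_fn(g)

x = Var(3.0)
y = x * x + x      # y = x^2 + x = 12
y.backward_()      # accumulates dy/dx = 2x + 1 = 7 into x.grad
```

The gradients flow through closures created while the program ran, which is what lets ordinary Python control flow (loops, branches) be differentiated without a static graph.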