Open Access · Posted Content
Transformers: State-of-the-art Natural Language Processing
Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Clement Delangue, Anthony Moi, Pierric Cistac, Tim Rault, Rémi Louf, Morgan Funtowicz, Jamie Brew +10 more
TLDR
Transformers is an open-source library that consists of carefully engineered state-of-the-art Transformer architectures under a unified API and a curated collection of pretrained models made by and available for the community.
Abstract:
Recent advances in modern Natural Language Processing (NLP) research have
been dominated by the combination of Transfer Learning methods with large-scale
Transformer language models. With them came a paradigm shift in NLP with the
starting point for training a model on a downstream task moving from a blank
specific model to a general-purpose pretrained architecture. Still, creating
these general-purpose models remains an expensive and time-consuming process
restricting the use of these methods to a small subset of the wider NLP
community. In this paper, we present Transformers, a library for
state-of-the-art NLP, making these developments available to the community by
gathering state-of-the-art general-purpose pretrained models under a unified
API together with an ecosystem of libraries, examples, tutorials and scripts
targeting many downstream NLP tasks. Transformers features carefully crafted
model implementations and high-performance pretrained weights for two main deep
learning frameworks, PyTorch and TensorFlow, while supporting all the necessary
tools to analyze, evaluate and use these models in downstream tasks such as
text/token classification, question answering and language generation, among
others. Transformers has gained significant organic traction and adoption among
both the researcher and practitioner communities. We are committed at Hugging
Face to pursue the efforts to develop Transformers with the ambition of
creating the standard library for building NLP systems.
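For a sense of what the unified API described above looks like in practice, here is a minimal sketch using the library's pipeline and Auto classes; the checkpoint names and example sentences are illustrative choices, and the exact output format can vary slightly across library versions.

```python
# Minimal sketch of the Transformers unified API; checkpoint names below are
# illustrative choices, not prescribed by the paper.
from transformers import AutoModelForSequenceClassification, AutoTokenizer, pipeline

# High-level pipeline: tokenization, model forward pass and post-processing in one call.
classifier = pipeline("sentiment-analysis")
print(classifier("Transformers makes state-of-the-art NLP models easy to use."))

# Lower-level API: load a pretrained tokenizer and model explicitly (PyTorch here;
# TensorFlow is available through the corresponding TF* model classes).
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased-finetuned-sst-2-english")
model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased-finetuned-sst-2-english")
inputs = tokenizer("A unified API for pretrained models.", return_tensors="pt")
logits = model(**inputs).logits  # classification scores, shape (batch, num_labels)
print(logits.argmax(dim=-1))
```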
Citations
Posted Content
DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
TL;DR: This work proposes a method to pre-train a smaller general-purpose language representation model, called DistilBERT, which can be fine-tuned with good performance on a wide range of tasks, like its larger counterparts, and introduces a triple loss combining language modeling, distillation and cosine-distance losses.
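The triple loss mentioned in the TL;DR can be sketched roughly as follows in PyTorch; the loss weights and temperature are illustrative assumptions rather than the paper's exact settings.

```python
import torch
import torch.nn.functional as F

def distilbert_style_loss(student_logits, teacher_logits, mlm_labels,
                          student_hidden, teacher_hidden,
                          temperature=2.0, w_kd=5.0, w_mlm=2.0, w_cos=1.0):
    """Rough sketch of a DistilBERT-style triple loss: masked language modeling
    + soft-target distillation (KL divergence at temperature T) + cosine
    alignment of hidden states. Weights here are illustrative assumptions."""
    vocab = student_logits.size(-1)
    hidden = student_hidden.size(-1)
    # Masked language modeling loss on the student's own predictions.
    mlm_loss = F.cross_entropy(student_logits.view(-1, vocab),
                               mlm_labels.view(-1), ignore_index=-100)
    # Distillation loss: match the teacher's softened output distribution.
    kd_loss = F.kl_div(F.log_softmax(student_logits / temperature, dim=-1),
                       F.softmax(teacher_logits / temperature, dim=-1),
                       reduction="batchmean") * (temperature ** 2)
    # Cosine-distance loss aligning student and teacher hidden states.
    target = torch.ones(student_hidden.numel() // hidden, device=student_hidden.device)
    cos_loss = F.cosine_embedding_loss(student_hidden.view(-1, hidden),
                                       teacher_hidden.view(-1, hidden), target)
    return w_kd * kd_loss + w_mlm * mlm_loss + w_cos * cos_loss
```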
Posted Content
Exploiting Cloze Questions for Few Shot Text Classification and Natural Language Inference
Timo Schick, Hinrich Schütze
TL;DR: This work introduces Pattern-Exploiting Training (PET), a semi-supervised training procedure that reformulates input examples as cloze-style phrases to help language models understand a given task.
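Concretely, the cloze-style reformulation might look like the sketch below; the pattern wording and the verbalizer's label words are illustrative assumptions, not the paper's exact templates.

```python
# Illustrative PET-style pattern and verbalizer for sentiment classification;
# the template and label words are assumptions chosen for the example.
def to_cloze(text: str) -> str:
    # Reformulate a classification input as a cloze phrase with a masked token
    # that a pretrained masked language model can fill in.
    return f"{text} All in all, it was [MASK]."

# Verbalizer: map each class label to a word predicted at the [MASK] position.
verbalizer = {"positive": "great", "negative": "terrible"}

print(to_cloze("The plot was gripping and the acting superb."))
# -> "The plot was gripping and the acting superb. All in all, it was [MASK]."
```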
Proceedings ArticleDOI
Heterogeneous Graph Transformer
TL;DR: The proposed HGT model consistently outperforms all state-of-the-art GNN baselines by 9%–21% on various downstream tasks, and a heterogeneous mini-batch graph sampling algorithm, HGSampling, is introduced for efficient and scalable training.
Proceedings ArticleDOI
It's Not Just Size That Matters: Small Language Models Are Also Few-Shot Learners
Timo Schick, Hinrich Schütze
TL;DR: This work shows that performance similar to GPT-3 can be obtained with language models that are much “greener” in that their parameter count is several orders of magnitude smaller, and identifies key factors required for successful natural language understanding with small language models.
Posted Content
Decision Transformer: Reinforcement Learning via Sequence Modeling
Lili Chen, Kevin Lu, Aravind Rajeswaran, Kimin Lee, Aditya Grover, Michael Laskin, Pieter Abbeel, Aravind Srinivas, Igor Mordatch
TL;DR: Despite its simplicity, Decision Transformer matches or exceeds the performance of state-of-the-art model-free offline RL baselines on Atari, OpenAI Gym, and Key-to-Door tasks.
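The sequence-modeling framing rests on conditioning the model on returns-to-go; a minimal sketch of that quantity is below, with the trajectory layout noted in a comment (the helper function name is ours, for illustration).

```python
# Returns-to-go: at each timestep, the sum of rewards from that step to the end
# of the trajectory. Decision Transformer conditions its action predictions on
# this quantity; the helper below is an illustrative sketch.
def returns_to_go(rewards):
    rtg, running = [], 0.0
    for r in reversed(rewards):
        running += r
        rtg.append(running)
    return list(reversed(rtg))

rewards = [0.0, 1.0, 0.0, 2.0]
print(returns_to_go(rewards))  # [3.0, 3.0, 2.0, 2.0]
# Training then treats the trajectory as a token sequence of interleaved
# (return-to-go, state, action) triples, and at test time the model is
# prompted with a desired target return.
```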
References
Posted Content
RoBERTa: A Robustly Optimized BERT Pretraining Approach
Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov
TL;DR: It is found that BERT was significantly undertrained and can match or exceed the performance of every model published after it; the best model achieves state-of-the-art results on GLUE, RACE and SQuAD.
Proceedings ArticleDOI
The Stanford CoreNLP Natural Language Processing Toolkit
Christopher D. Manning, Mihai Surdeanu, John Bauer, Jenny Rose Finkel, Steven Bethard, David McClosky
TL;DR: The design and use of the Stanford CoreNLP toolkit, an extensible pipeline that provides core natural language analysis, is described; its wide adoption is suggested to follow from a simple, approachable design, straightforward interfaces, the inclusion of robust, good-quality analysis components, and not requiring a large amount of associated baggage.
Posted Content
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu
TL;DR: This systematic study compares pre-training objectives, architectures, unlabeled datasets, transfer approaches, and other factors on dozens of language understanding tasks and achieves state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more.
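The text-to-text framing can be illustrated with the Transformers library's T5 classes; the "t5-small" checkpoint and the translation prefix below are example choices (T5Tokenizer additionally requires the sentencepiece package).

```python
# Illustrative use of the text-to-text framing: every task is cast as
# "task prefix + input text" -> "output text". The checkpoint is an example choice.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

inputs = tokenizer("translate English to German: The house is wonderful.",
                   return_tensors="pt")
output_ids = model.generate(**inputs, max_length=40)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```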
Proceedings ArticleDOI
BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension
Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Veselin Stoyanov, Luke Zettlemoyer
TL;DR: BART is presented, a denoising autoencoder for pretraining sequence-to-sequence models, which matches the performance of RoBERTa on GLUE and SQuAD, and achieves new state-of-the-art results on a range of abstractive dialogue, question answering, and summarization tasks.
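As a small illustration of the summarization use case the TL;DR mentions, a sketch via the pipeline API; the "facebook/bart-large-cnn" checkpoint is an example choice.

```python
# Illustrative abstractive summarization with a BART checkpoint; the model
# name is an example choice, not mandated by the paper.
from transformers import pipeline

summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
article = ("The Transformers library gathers pretrained Transformer models under "
           "a unified API, together with tooling for fine-tuning and inference on "
           "downstream NLP tasks in both PyTorch and TensorFlow.")
print(summarizer(article, max_length=30, min_length=10)[0]["summary_text"])
```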
Posted Content
DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
TL;DR: This work proposes a method to pre-train a smaller general-purpose language representation model, called DistilBERT, which can be fine-tuned with good performance on a wide range of tasks, like its larger counterparts, and introduces a triple loss combining language modeling, distillation and cosine-distance losses.