Open Access · Posted Content
Transformers: State-of-the-art Natural Language Processing
Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Clement Delangue, Anthony Moi, Pierric Cistac, Tim Rault, Rémi Louf, Morgan Funtowicz, Jamie Brew +10 more
TLDR
Transformers is an open-source library that consists of carefully engineered state-of-the-art Transformer architectures under a unified API and a curated collection of pretrained models made by and available for the community.
Abstract:
Recent advances in modern Natural Language Processing (NLP) research have
been dominated by the combination of Transfer Learning methods with large-scale
Transformer language models. With them came a paradigm shift in NLP with the
starting point for training a model on a downstream task moving from a blank
specific model to a general-purpose pretrained architecture. Still, creating
these general-purpose models remains an expensive and time-consuming process
restricting the use of these methods to a small subset of the wider NLP
community. In this paper, we present Transformers, a library for
state-of-the-art NLP, making these developments available to the community by
gathering state-of-the-art general-purpose pretrained models under a unified
API together with an ecosystem of libraries, examples, tutorials and scripts
targeting many downstream NLP tasks. Transformers features carefully crafted
model implementations and high-performance pretrained weights for two main deep
learning frameworks, PyTorch and TensorFlow, while supporting all the necessary
tools to analyze, evaluate and use these models in downstream tasks such as
text/token classification, question answering and language generation, among
others. Transformers has gained significant organic traction and adoption among
both the researcher and practitioner communities. We are committed at Hugging
Face to pursue the efforts to develop Transformers with the ambition of
creating the standard library for building NLP systems.
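For a sense of what the unified API described above looks like in practice, here is a minimal sketch using the library's pipeline and Auto classes; the checkpoint names and example sentences are illustrative choices, and the exact output format can vary slightly across library versions.

```python
# Minimal sketch of the Transformers unified API; checkpoint names below are
# illustrative choices, not prescribed by the paper.
from transformers import AutoModelForSequenceClassification, AutoTokenizer, pipeline

# High-level pipeline: tokenization, model forward pass and post-processing in one call.
classifier = pipeline("sentiment-analysis")
print(classifier("Transformers makes state-of-the-art NLP models easy to use."))

# Lower-level API: load a pretrained tokenizer and model explicitly (PyTorch here;
# TensorFlow is available through the corresponding TF* model classes).
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased-finetuned-sst-2-english")
model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased-finetuned-sst-2-english")
inputs = tokenizer("A unified API for pretrained models.", return_tensors="pt")
logits = model(**inputs).logits  # classification scores, shape (batch, num_labels)
print(logits.argmax(dim=-1))
```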
Citations
Posted Content
DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
TL;DR: This work proposes a method to pre-train a smaller general-purpose language representation model, called DistilBERT, which can be fine-tuned with good performance on a wide range of tasks, like its larger counterparts, and introduces a triple loss combining language modeling, distillation and cosine-distance losses.
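The triple loss mentioned in the TL;DR can be sketched roughly as follows in PyTorch; the loss weights and temperature are illustrative assumptions rather than the paper's exact settings.

```python
import torch
import torch.nn.functional as F

def distilbert_style_loss(student_logits, teacher_logits, mlm_labels,
                          student_hidden, teacher_hidden,
                          temperature=2.0, w_kd=5.0, w_mlm=2.0, w_cos=1.0):
    """Rough sketch of a DistilBERT-style triple loss: masked language modeling
    + soft-target distillation (KL divergence at temperature T) + cosine
    alignment of hidden states. Weights here are illustrative assumptions."""
    vocab = student_logits.size(-1)
    hidden = student_hidden.size(-1)
    # Masked language modeling loss on the student's own predictions.
    mlm_loss = F.cross_entropy(student_logits.view(-1, vocab),
                               mlm_labels.view(-1), ignore_index=-100)
    # Distillation loss: match the teacher's softened output distribution.
    kd_loss = F.kl_div(F.log_softmax(student_logits / temperature, dim=-1),
                       F.softmax(teacher_logits / temperature, dim=-1),
                       reduction="batchmean") * (temperature ** 2)
    # Cosine-distance loss aligning student and teacher hidden states.
    target = torch.ones(student_hidden.numel() // hidden, device=student_hidden.device)
    cos_loss = F.cosine_embedding_loss(student_hidden.view(-1, hidden),
                                       teacher_hidden.view(-1, hidden), target)
    return w_kd * kd_loss + w_mlm * mlm_loss + w_cos * cos_loss
```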
Posted Content
Exploiting Cloze Questions for Few Shot Text Classification and Natural Language Inference
Timo Schick, Hinrich Schütze
TL;DR: This work introduces Pattern-Exploiting Training (PET), a semi-supervised training procedure that reformulates input examples as cloze-style phrases to help language models understand a given task.
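Concretely, the cloze-style reformulation might look like the sketch below; the pattern wording and the verbalizer's label words are illustrative assumptions, not the paper's exact templates.

```python
# Illustrative PET-style pattern and verbalizer for sentiment classification;
# the template and label words are assumptions chosen for the example.
def to_cloze(text: str) -> str:
    # Reformulate a classification input as a cloze phrase with a masked token
    # that a pretrained masked language model can fill in.
    return f"{text} All in all, it was [MASK]."

# Verbalizer: map each class label to a word predicted at the [MASK] position.
verbalizer = {"positive": "great", "negative": "terrible"}

print(to_cloze("The plot was gripping and the acting superb."))
# -> "The plot was gripping and the acting superb. All in all, it was [MASK]."
```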
Proceedings ArticleDOI
Heterogeneous Graph Transformer
TL;DR: The proposed HGT model consistently outperforms all state-of-the-art GNN baselines by 9%–21% on various downstream tasks, and a heterogeneous mini-batch graph sampling algorithm, HGSampling, is introduced for efficient and scalable training.
Proceedings ArticleDOI
It's Not Just Size That Matters: Small Language Models Are Also Few-Shot Learners
Timo Schick, Hinrich Schütze
TL;DR: This work shows that performance similar to GPT-3 can be obtained with language models that are much “greener” in that their parameter count is several orders of magnitude smaller, and identifies key factors required for successful natural language understanding with small language models.
Posted Content
Decision Transformer: Reinforcement Learning via Sequence Modeling
Lili Chen, Kevin Lu, Aravind Rajeswaran, Kimin Lee, Aditya Grover, Michael Laskin, Pieter Abbeel, Aravind Srinivas, Igor Mordatch
TL;DR: Despite its simplicity, Decision Transformer matches or exceeds the performance of state-of-the-art model-free offline RL baselines on Atari, OpenAI Gym, and Key-to-Door tasks.
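The sequence-modeling framing rests on conditioning the model on returns-to-go; a minimal sketch of that quantity is below, with the trajectory layout noted in a comment (the helper function name is ours, for illustration).

```python
# Returns-to-go: at each timestep, the sum of rewards from that step to the end
# of the trajectory. Decision Transformer conditions its action predictions on
# this quantity; the helper below is an illustrative sketch.
def returns_to_go(rewards):
    rtg, running = [], 0.0
    for r in reversed(rewards):
        running += r
        rtg.append(running)
    return list(reversed(rtg))

rewards = [0.0, 1.0, 0.0, 2.0]
print(returns_to_go(rewards))  # [3.0, 3.0, 2.0, 2.0]
# Training then treats the trajectory as a token sequence of interleaved
# (return-to-go, state, action) triples, and at test time the model is
# prompted with a desired target return.
```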
References
Posted Content
RoBERTa: A Robustly Optimized BERT Pretraining Approach
Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov
TL;DR: It is found that BERT was significantly undertrained and can match or exceed the performance of every model published after it; the best model achieves state-of-the-art results on GLUE, RACE and SQuAD.
Proceedings ArticleDOI
The Stanford CoreNLP Natural Language Processing Toolkit
Christopher D. Manning, Mihai Surdeanu, John Bauer, Jenny Rose Finkel, Steven Bethard, David McClosky
TL;DR: The design and use of the Stanford CoreNLP toolkit, an extensible pipeline that provides core natural language analysis, is described; its wide adoption is suggested to follow from a simple, approachable design, straightforward interfaces, the inclusion of robust, good-quality analysis components, and not requiring a large amount of associated baggage.
Posted Content
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu
TL;DR: This systematic study compares pre-training objectives, architectures, unlabeled datasets, transfer approaches, and other factors on dozens of language understanding tasks and achieves state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more.
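The text-to-text framing can be illustrated with the Transformers library's T5 classes; the "t5-small" checkpoint and the translation prefix below are example choices (T5Tokenizer additionally requires the sentencepiece package).

```python
# Illustrative use of the text-to-text framing: every task is cast as
# "task prefix + input text" -> "output text". The checkpoint is an example choice.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

inputs = tokenizer("translate English to German: The house is wonderful.",
                   return_tensors="pt")
output_ids = model.generate(**inputs, max_length=40)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```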
Proceedings ArticleDOI
BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension
Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Veselin Stoyanov, Luke Zettlemoyer
TL;DR: BART is presented, a denoising autoencoder for pretraining sequence-to-sequence models, which matches the performance of RoBERTa on GLUE and SQuAD, and achieves new state-of-the-art results on a range of abstractive dialogue, question answering, and summarization tasks.
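As a small illustration of the summarization use case the TL;DR mentions, a sketch via the pipeline API; the "facebook/bart-large-cnn" checkpoint is an example choice.

```python
# Illustrative abstractive summarization with a BART checkpoint; the model
# name is an example choice, not mandated by the paper.
from transformers import pipeline

summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
article = ("The Transformers library gathers pretrained Transformer models under "
           "a unified API, together with tooling for fine-tuning and inference on "
           "downstream NLP tasks in both PyTorch and TensorFlow.")
print(summarizer(article, max_length=30, min_length=10)[0]["summary_text"])
```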
Posted Content
DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
TL;DR: This work proposes a method to pre-train a smaller general-purpose language representation model, called DistilBERT, which can be fine-tuned with good performance on a wide range of tasks, like its larger counterparts, and introduces a triple loss combining language modeling, distillation and cosine-distance losses.