Open AccessPosted Content
Longformer: The Long-Document Transformer
Reads0
Chats0
TLDR
Following prior work on long-sequence transformers, the Longformer is evaluated on character-level language modeling and achieves state-of-the-art results on text8 and enwik8 and pretrain Longformer and finetune it on a variety of downstream tasks.Abstract:
Transformer-based models are unable to process long sequences due to their self-attention operation, which scales quadratically with the sequence length. To address this limitation, we introduce the Longformer with an attention mechanism that scales linearly with sequence length, making it easy to process documents of thousands of tokens or longer. Longformer's attention mechanism is a drop-in replacement for the standard self-attention and combines a local windowed attention with a task motivated global attention. Following prior work on long-sequence transformers, we evaluate Longformer on character-level language modeling and achieve state-of-the-art results on text8 and enwik8. In contrast to most prior work, we also pretrain Longformer and finetune it on a variety of downstream tasks. Our pretrained Longformer consistently outperforms RoBERTa on long document tasks and sets new state-of-the-art results on WikiHop and TriviaQA. We finally introduce the Longformer-Encoder-Decoder (LED), a Longformer variant for supporting long document generative sequence-to-sequence tasks, and demonstrate its effectiveness on the arXiv summarization dataset.read more
Citations
More filters
Proceedings ArticleDOI
Transformers: State-of-the-Art Natural Language Processing
Thomas Wolf,Lysandre Debut,Victor Sanh,Julien Chaumond,Clement Delangue,Anthony Moi,Pierric Cistac,Clara Ma,Yacine Jernite,Julien Plu,Canwen Xu,Teven Le Scao,Sylvain Gugger,Mariama Drame,Quentin Lhoest,Alexander M. Rush +15 more
TL;DR: Transformers is an open-source library that consists of carefully engineered state-of-the art Transformer architectures under a unified API and a curated collection of pretrained models made by and available for the community.
Posted Content
HuggingFace's Transformers: State-of-the-art Natural Language Processing.
Thomas Wolf,Lysandre Debut,Victor Sanh,Julien Chaumond,Clement Delangue,Anthony Moi,Pierric Cistac,Tim Rault,Rémi Louf,Morgan Funtowicz,Jamie Brew +10 more
TL;DR: The \textit{Transformers} library is an open-source library that consists of carefully engineered state-of-the art Transformer architectures under a unified API and a curated collection of pretrained models made by and available for the community.
Posted Content
Deformable DETR: Deformable Transformers for End-to-End Object Detection
TL;DR: Deformable DETR, whose attention modules only attend to a small set of key sampling points around a reference, can achieve better performance than DETR (especially on small objects) with 10$\times less training epochs.
Proceedings ArticleDOI
Don't Stop Pretraining: Adapt Language Models to Domains and Tasks
Suchin Gururangan,Ana Marasović,Ana Marasović,Swabha Swayamdipta,Kyle Lo,Iz Beltagy,Doug Downey,Noah A. Smith,Noah A. Smith +8 more
TL;DR: It is consistently found that multi-phase adaptive pretraining offers large gains in task performance, and it is shown that adapting to a task corpus augmented using simple data selection strategies is an effective alternative, especially when resources for domain-adaptive pretraining might be unavailable.
Posted Content
Transformers: State-of-the-art Natural Language Processing
Thomas Wolf,Lysandre Debut,Victor Sanh,Julien Chaumond,Clement Delangue,Anthony Moi,Pierric Cistac,Tim Rault,Rémi Louf,Morgan Funtowicz,Jamie Brew +10 more
TL;DR: Transformers is an open-source library that consists of carefully engineered state-of-the art Transformer architectures under a unified API and a curated collection of pretrained models made by and available for the community.
References
More filters
Proceedings Article
Attention is All you Need
Ashish Vaswani,Noam Shazeer,Niki Parmar,Jakob Uszkoreit,Llion Jones,Aidan N. Gomez,Lukasz Kaiser,Illia Polosukhin +7 more
TL;DR: This paper proposed a simple network architecture based solely on an attention mechanism, dispensing with recurrence and convolutions entirely and achieved state-of-the-art performance on English-to-French translation.
Proceedings ArticleDOI
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
TL;DR: BERT as mentioned in this paper pre-trains deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers, which can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
Posted Content
RoBERTa: A Robustly Optimized BERT Pretraining Approach
Yinhan Liu,Myle Ott,Naman Goyal,Jingfei Du,Mandar Joshi,Danqi Chen,Omer Levy,Michael Lewis,Luke Zettlemoyer,Veselin Stoyanov +9 more
TL;DR: It is found that BERT was significantly undertrained, and can match or exceed the performance of every model published after it, and the best model achieves state-of-the-art results on GLUE, RACE and SQuAD.
Proceedings Article
Sequence to Sequence Learning with Neural Networks
TL;DR: The authors used a multilayered Long Short-Term Memory (LSTM) to map the input sequence to a vector of a fixed dimensionality, and then another deep LSTM to decode the target sequence from the vector.
Posted Content
Sequence to Sequence Learning with Neural Networks
TL;DR: This paper presents a general end-to-end approach to sequence learning that makes minimal assumptions on the sequence structure, and finds that reversing the order of the words in all source sentences improved the LSTM's performance markedly, because doing so introduced many short term dependencies between the source and the target sentence which made the optimization problem easier.