Clifton Poth

Bio: Clifton Poth is an academic researcher who has contributed to research in the topics Computer science and Transfer of learning. The author has an h-index of 3 and has co-authored 4 publications receiving 106 citations.

Papers

Posted Content•

AdapterHub: A Framework for Adapting Transformers.

Jonas Pfeiffer¹, Andreas Rücklé¹, Clifton Poth, Aishwarya Kamath², Ivan Vulić³, Sebastian Ruder⁴, Kyunghyun Cho⁵, Iryna Gurevych¹

Technische Universität Darmstadt¹, New York University², University of Oslo³, Google⁴, Canadian Institute for Advanced Research⁵

15 Jul 2020-arXiv: Computation and Language

TL;DR: AdapterHub is proposed, a framework that allows dynamic “stitching-in” of pre-trained adapters for different tasks and languages and enables scalable, easy sharing of task-specific models, particularly in low-resource scenarios.

Abstract: The current modus operandi in NLP involves downloading and fine-tuning pre-trained models consisting of millions or billions of parameters. Storing and sharing such large trained models is expensive, slow, and time-consuming, which impedes progress towards more general and versatile NLP methods that learn from and for many tasks. Adapters -- small learnt bottleneck layers inserted within each layer of a pre-trained model -- ameliorate this issue by avoiding full fine-tuning of the entire model. However, sharing and integrating adapter layers is not straightforward. We propose AdapterHub, a framework that allows dynamic "stitching-in" of pre-trained adapters for different tasks and languages. The framework, built on top of the popular HuggingFace Transformers library, enables extremely easy and quick adaptations of state-of-the-art pre-trained models (e.g., BERT, RoBERTa, XLM-R) across tasks and languages. Downloading, sharing, and training adapters is as seamless as possible using minimal changes to the training scripts and a specialized infrastructure. Our framework enables scalable and easy access to sharing of task-specific models, particularly in low-resource scenarios. AdapterHub includes all recent adapter architectures and can be found at this https URL.
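The abstract describes adapters as small bottleneck layers inserted within each layer of a frozen pre-trained model. The following is a minimal pure-Python sketch of one such layer. It assumes the common design of a down-projection, a ReLU, an up-projection and a residual connection, and it omits biases and layer normalization for brevity. The class name and the 768/48 sizes are illustrative choices, not AdapterHub's API.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, v: Vector) -> Vector:
    """Multiply a (out_dim x in_dim) matrix by a vector of length in_dim."""
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]


def relu(v: Vector) -> Vector:
    return [max(0.0, x) for x in v]


class BottleneckAdapter:
    """Illustrative adapter: h + up(relu(down(h))); only these weights would be trained."""

    def __init__(self, hidden_dim: int, bottleneck_dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        scale = 1.0 / math.sqrt(hidden_dim)
        # Down-projection: hidden_dim -> bottleneck_dim, small random init.
        self.down = [[rng.uniform(-scale, scale) for _ in range(hidden_dim)]
                     for _ in range(bottleneck_dim)]
        # Up-projection: bottleneck_dim -> hidden_dim, zero init (sketch choice),
        # so the adapter starts out as an identity map.
        self.up = [[0.0] * bottleneck_dim for _ in range(hidden_dim)]

    def num_params(self) -> int:
        return len(self.down) * len(self.down[0]) + len(self.up) * len(self.up[0])

    def __call__(self, h: Vector) -> Vector:
        delta = matvec(self.up, relu(matvec(self.down, h)))
        return [a + b for a, b in zip(h, delta)]  # residual connection


adapter = BottleneckAdapter(hidden_dim=768, bottleneck_dim=48)
print(adapter.num_params())  # 2 * 768 * 48 = 73728
```

Because the up-projection starts at zero, inserting the adapter leaves the frozen model's outputs unchanged until training begins. The adapter holds 73,728 weights, a small fraction of a BERT-base-sized model, which is what makes adapters cheap to store and share.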

247 citations

Proceedings Article•DOI•

AdapterHub: A Framework for Adapting Transformers

Jonas Pfeiffer¹, Andreas Rücklé¹, Clifton Poth, Aishwarya Kamath², Ivan Vulić³, Sebastian Ruder⁴, Kyunghyun Cho⁵, Iryna Gurevych¹

Technische Universität Darmstadt¹, New York University², University of Oslo³, Google⁴, Canadian Institute for Advanced Research⁵

15 Jul 2020

TL;DR: In this paper, the authors propose a framework that allows dynamic "stitching-in" of pre-trained adapters for different tasks and languages, enabling extremely easy and quick adaptation of state-of-the-art pre-trained models across tasks.

Abstract: The current modus operandi in NLP involves downloading and fine-tuning pre-trained models consisting of millions or billions of parameters. Storing and sharing such large trained models is expensive, slow, and time-consuming, which impedes progress towards more general and versatile NLP methods that learn from and for many tasks. Adapters, small learnt bottleneck layers inserted within each layer of a pre-trained model, ameliorate this issue by avoiding full fine-tuning of the entire model. However, sharing and integrating adapter layers is not straightforward. We propose AdapterHub, a framework that allows dynamic "stitching-in" of pre-trained adapters for different tasks and languages. The framework, built on top of the popular HuggingFace Transformers library, enables extremely easy and quick adaptations of state-of-the-art pre-trained models (e.g., BERT, RoBERTa, XLM-R) across tasks and languages. Downloading, sharing, and training adapters is as seamless as possible using minimal changes to the training scripts and a specialized infrastructure. Our framework enables scalable and easy access to sharing of task-specific models, particularly in low-resource scenarios. AdapterHub includes all recent adapter architectures and can be found at AdapterHub.ml.

122 citations

Posted Content•

What to Pre-Train on? Efficient Intermediate Task Selection

Clifton Poth, Jonas Pfeiffer¹, Andreas Rücklé¹, Iryna Gurevych²

Technische Universität Darmstadt¹, University of Paderborn²

16 Apr 2021-arXiv: Computation and Language

TL;DR: This article shows that efficient embedding-based methods that rely solely on the respective datasets outperform computationally expensive few-shot fine-tuning approaches, and that they can efficiently identify the best datasets for intermediate training.

Abstract: Intermediate task fine-tuning has been shown to culminate in large transfer gains across many NLP tasks. With an abundance of candidate datasets as well as pre-trained language models, it has become infeasible to run the cross-product of all combinations to find the best transfer setting. In this work we first establish that similar sequential fine-tuning gains can be achieved in adapter settings, and subsequently consolidate previously proposed methods that efficiently identify beneficial tasks for intermediate transfer learning. We experiment with a diverse set of 42 intermediate and 11 target English classification, multiple choice, question answering, and sequence tagging tasks. Our results show that efficient embedding-based methods that rely solely on the respective datasets outperform computationally expensive few-shot fine-tuning approaches. Our best methods achieve an average Regret@3 of less than 1% across all target tasks, demonstrating that we are able to efficiently identify the best datasets for intermediate training.
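The Regret@3 figure above can be made concrete with a short calculation. This minimal sketch assumes a relative-regret formulation. It takes the best target-task score achievable from any candidate intermediate task, subtracts the best score among the top-k candidates chosen by a selection method, and divides the difference by the best achievable score. The task names and scores are hypothetical.

```python
from typing import Dict, List


def regret_at_k(transfer_scores: Dict[str, float], ranking: List[str], k: int) -> float:
    """Relative regret of picking only the top-k ranked intermediate tasks.

    transfer_scores: target-task score after intermediate training on each source task.
    ranking: source tasks ordered best-first by a (hypothetical) selection method.
    """
    best_possible = max(transfer_scores.values())
    best_in_top_k = max(transfer_scores[s] for s in ranking[:k])
    return (best_possible - best_in_top_k) / best_possible


# Hypothetical scores on one target task for four candidate intermediate tasks.
scores = {"task_a": 80.0, "task_b": 85.0, "task_c": 90.0, "task_d": 70.0}
predicted = ["task_b", "task_d", "task_a", "task_c"]

print(regret_at_k(scores, predicted, k=3))  # (90 - 85) / 90, about 0.056
print(regret_at_k(scores, predicted, k=4))  # 0.0: the best task is within the top 4
```

A Regret@3 below 1% means that trying only the three top-ranked intermediate tasks loses almost nothing compared with exhaustively testing every candidate.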

8 citations

Proceedings Article•DOI•

UKP-SQUARE: An Online Platform for Question Answering Research

Tim Baumgärtner, Kexin Wang, Rachneet Singh Sachdeva, Max Eichler, Gregor Geigle, Clifton Poth, Hannah Sterz, Haritz Puerto San Roman, Leonardo F. R. Ribeiro, Jonas Pfeiffer, Nils Reimers, Gözde Gül Şahin, Iryna Gurevych

25 Mar 2022

TL;DR: UKP-SQuARE is an extensible online QA platform for researchers which allows users to query and analyze a large collection of modern Skills via a user-friendly web interface and integrated behavioural tests.

Abstract: Recent advances in NLP and information retrieval have given rise to a diverse set of question answering tasks that are of different formats (e.g., extractive, abstractive), require different model architectures (e.g., generative, discriminative), and setups (e.g., with or without retrieval). Despite having a large number of powerful, specialized QA pipelines (which we refer to as Skills) that consider a single domain, model or setup, there exists no framework where users can easily explore and compare such pipelines and can extend them according to their needs. To address this issue, we present UKP-SQuARE, an extensible online QA platform for researchers which allows users to query and analyze a large collection of modern Skills via a user-friendly web interface and integrated behavioural tests. In addition, QA researchers can develop, manage, and share their custom Skills using our microservices that support a wide range of models (Transformers, Adapters, ONNX), datastores and retrieval techniques (e.g., sparse and dense). UKP-SQuARE is available on https://square.ukp-lab.de

5 citations

Proceedings Article•

ML Mob at SemEval-2023 Task 5: “Breaking News: Our Semi-Supervised and Multi-Task Learning Approach Spoils Clickbait”

Hannah Sterz, Leonard Bongard, Clifton Poth

TL;DR: This paper proposes a system that generates a spoiler for clickbait online news headlines. The spoiler provides the information promised by the headline and eliminates the need to read the full article; the system achieves an F1 score of up to 51.48%.

Abstract: Online articles using striking headlines that promise intriguing information are often used to attract readers. Most of the time, the information provided in the text is disappointing to the reader after the headline promised exciting news. As part of the SemEval-2023 challenge, we propose a system to generate a spoiler for these headlines. The spoiler provides the information promised by the headline and eliminates the need to read the full article. We consider Multi-Task Learning and generating more data using a distillation approach in our system. With this, we achieve an F1 score of up to 51.48% on extracting the spoiler from the articles.

...read moreread less

1 citation

Cited by

Posted Content•

Beyond English-Centric Multilingual Machine Translation

Angela Fan¹, Shruti Bhosale, Holger Schwenk, Zhiyi Ma, Ahmed El-Kishky, Siddharth Goyal, Mandeep Baines, Onur Celebi, Guillaume Wenzek, Vishrav Chaudhary, Naman Goyal, Tom Birch, Vitaliy Liptchinsky, Sergey Edunov, Edouard Grave, Michael Auli, Armand Joulin

Facebook¹

21 Oct 2020-arXiv: Computation and Language

TL;DR: This work creates a true Many-to-Many multilingual translation model that can translate directly between any pair of 100 languages and explores how to effectively increase model capacity through a combination of dense scaling and language-specific sparse parameters to create high quality models.

Abstract: Existing work in translation demonstrated the potential of massively multilingual machine translation by training a single model able to translate between any pair of languages. However, much of this work is English-Centric by training only on data which was translated from or to English. While this is supported by large sources of training data, it does not reflect translation needs worldwide. In this work, we create a true Many-to-Many multilingual translation model that can translate directly between any pair of 100 languages. We build and open source a training dataset that covers thousands of language directions with supervised data, created through large-scale mining. Then, we explore how to effectively increase model capacity through a combination of dense scaling and language-specific sparse parameters to create high quality models. Our focus on non-English-Centric models brings gains of more than 10 BLEU when directly translating between non-English directions while performing competitively to the best single systems of WMT. We open-source our scripts so that others may reproduce the data, evaluation, and final M2M-100 model.

...read moreread less

378 citations

Posted Content•

AdapterFusion: Non-Destructive Task Composition for Transfer Learning

Jonas Pfeiffer¹, Aishwarya Kamath², Andreas Rücklé¹, Kyunghyun Cho³, Iryna Gurevych¹

Technische Universität Darmstadt¹, New York University², Courant Institute of Mathematical Sciences³

01 May 2020-arXiv: Computation and Language

TL;DR: This work proposes AdapterFusion, a new two-stage learning algorithm that leverages knowledge from multiple tasks. By separating the two stages, i.e., knowledge extraction and knowledge composition, the classifier can effectively exploit the representations learned from multiple tasks in a non-destructive manner.

Abstract: Sequential fine-tuning and multi-task learning are methods aiming to incorporate knowledge from multiple tasks; however, they suffer from catastrophic forgetting and difficulties in dataset balancing. To address these shortcomings, we propose AdapterFusion, a new two-stage learning algorithm that leverages knowledge from multiple tasks. First, in the knowledge extraction stage we learn task-specific parameters called adapters that encapsulate the task-specific information. We then combine the adapters in a separate knowledge composition step. We show that by separating the two stages, i.e., knowledge extraction and knowledge composition, the classifier can effectively exploit the representations learned from multiple tasks in a non-destructive manner. We empirically evaluate AdapterFusion on 16 diverse NLU tasks, and find that it effectively combines various types of knowledge at different layers of the model. We show that our approach outperforms traditional strategies such as full fine-tuning as well as multi-task learning. Our code and adapters are available at this http URL.

265 citations

Proceedings Article•DOI•

Visual Prompt Tuning

M. Jia, Luming Tang, Bor-Chun Chen, Claire Cardie, S. Belongie, Bharath Hariharan, Ser-Nam Lim

23 Mar 2022

TL;DR: This paper introduces Visual Prompt Tuning as an efficient and effective alternative to full fine-tuning for large-scale Transformer models in vision and shows that VPT achieves significant performance gains compared to other parameter efficient tuning protocols.

Abstract: The current modus operandi in adapting pre-trained models involves updating all the backbone parameters, i.e., full fine-tuning. This paper introduces Visual Prompt Tuning (VPT) as an efficient and effective alternative to full fine-tuning for large-scale Transformer models in vision. Taking inspiration from recent advances in efficiently tuning large language models, VPT introduces only a small number of trainable parameters (less than 1% of model parameters) in the input space while keeping the model backbone frozen. Via extensive experiments on a wide variety of downstream recognition tasks, we show that VPT achieves significant performance gains compared to other parameter-efficient tuning protocols. Most importantly, VPT even outperforms full fine-tuning in many cases across model capacities and training data scales, while reducing per-task storage cost.
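The idea of adding trainable parameters "in the input space" can be sketched as prepending learnable prompt vectors to the token sequence of a frozen Vision Transformer. This sketch assumes the shallow variant, in which prompts are inserted only at the input, placed after the [CLS] token and before the patch embeddings. Plain lists stand in for tensors, the function names are hypothetical, and the backbone size of roughly 86M parameters is an assumed approximate figure for a ViT-Base model. The classification head, which is also trained, is not counted.

```python
from typing import List

Vector = List[float]


def prepend_prompts(cls_token: Vector, prompts: List[Vector],
                    patch_tokens: List[Vector]) -> List[Vector]:
    """Build the input sequence [CLS] + learnable prompts + frozen patch embeddings."""
    return [cls_token] + prompts + patch_tokens


def prompt_param_fraction(num_prompts: int, embed_dim: int, backbone_params: int) -> float:
    """Share of all parameters that belong to the trainable prompts (head not counted)."""
    prompt_params = num_prompts * embed_dim
    return prompt_params / (prompt_params + backbone_params)


embed_dim = 4
cls_token = [0.0] * embed_dim
prompts = [[0.5] * embed_dim for _ in range(2)]       # trainable in VPT
patches = [[1.0] * embed_dim for _ in range(3)]       # come from the frozen backbone
sequence = prepend_prompts(cls_token, prompts, patches)
print(len(sequence))  # 1 + 2 + 3 = 6 tokens

# 50 prompts of width 768 against an assumed ~86M-parameter backbone.
print(f"{prompt_param_fraction(50, 768, 86_000_000):.4%}")  # roughly 0.045%
```

Only the prompt vectors (and the task head) receive gradients, so each new task stores a few tens of thousands of values instead of a full copy of the backbone.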

247 citations

Posted Content•

MAD-X: An Adapter-Based Framework for Multi-Task Cross-Lingual Transfer

Jonas Pfeiffer¹, Ivan Vulić², Iryna Gurevych¹, Sebastian Ruder³

Technische Universität Darmstadt¹, University of Mannheim², Google³

30 Apr 2020-arXiv: Computation and Language

TL;DR: MAD-X is proposed, an adapter-based framework that enables high portability and parameter-efficient transfer to arbitrary tasks and languages by learning modular language and task representations; the work also introduces a novel invertible adapter architecture and a strong baseline method for adapting a pre-trained multilingual model to a new language.

Abstract: The main goal behind state-of-the-art pre-trained multilingual models such as multilingual BERT and XLM-R is enabling and bootstrapping NLP applications in low-resource languages through zero-shot or few-shot cross-lingual transfer. However, due to limited model capacity, their transfer performance is the weakest exactly on such low-resource languages and languages unseen during pre-training. We propose MAD-X, an adapter-based framework that enables high portability and parameter-efficient transfer to arbitrary tasks and languages by learning modular language and task representations. In addition, we introduce a novel invertible adapter architecture and a strong baseline method for adapting a pre-trained multilingual model to a new language. MAD-X outperforms the state of the art in cross-lingual transfer across a representative set of typologically diverse languages on named entity recognition and causal commonsense reasoning, and achieves competitive results on question answering. Our code and adapters are available at this http URL
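The modularity described above can be illustrated with a toy sketch. A task adapter is stacked on top of a language adapter. For zero-shot cross-lingual transfer, the source-language adapter is swapped for the target-language adapter at inference time, and the task adapter is reused unchanged. The adapters below are hypothetical stand-in functions on plain lists rather than trained modules, and the invertible adapters mentioned in the abstract are omitted.

```python
from typing import Callable, Dict, List

Vector = List[float]
Adapter = Callable[[Vector], Vector]


def stacked_forward(h: Vector, lang_adapter: Adapter, task_adapter: Adapter) -> Vector:
    """Apply the language adapter first, then the task adapter on top of it."""
    return task_adapter(lang_adapter(h))


# Hypothetical language adapters: each shifts the representation differently.
language_adapters: Dict[str, Adapter] = {
    "en": lambda h: [x + 1.0 for x in h],
    "sw": lambda h: [x + 2.0 for x in h],
}


def task_adapter(h: Vector) -> Vector:
    """Hypothetical task adapter, assumed trained on top of the "en" language adapter."""
    return [2.0 * x for x in h]


hidden = [0.0, 1.0]
# Training-time configuration: source-language adapter + task adapter.
train_out = stacked_forward(hidden, language_adapters["en"], task_adapter)
# Zero-shot transfer: swap in the target-language adapter, keep the task adapter.
zero_shot_out = stacked_forward(hidden, language_adapters["sw"], task_adapter)
print(train_out, zero_shot_out)  # [2.0, 4.0] [4.0, 6.0]
```

Because language and task knowledge live in separate modules, supporting a new language only requires a new language adapter, and every existing task adapter can be combined with it.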

228 citations

Proceedings Article•DOI•

MAD-X: An Adapter-Based Framework for Multi-Task Cross-Lingual Transfer

Jonas Pfeiffer¹, Ivan Vulić², Iryna Gurevych¹, Sebastian Ruder³

Technische Universität Darmstadt¹, University of Mannheim², Google³

30 Apr 2020

TL;DR: This paper proposed MAD-X, an adapter-based framework that enables high portability and parameter-efficient transfer to arbitrary tasks and languages by learning modular language and task representations, and introduced a novel invertible adapter architecture and a strong baseline method for adapting a pre-trained multilingual model to a new language.

...read moreread less

Abstract: The main goal behind state-of-the-art pre-trained multilingual models such as multilingual BERT and XLM-R is enabling and bootstrapping NLP applications in low-resource languages through zero-shot or few-shot cross-lingual transfer. However, due to limited model capacity, their transfer performance is the weakest exactly on such low-resource languages and languages unseen during pre-training. We propose MAD-X, an adapter-based framework that enables high portability and parameter-efficient transfer to arbitrary tasks and languages by learning modular language and task representations. In addition, we introduce a novel invertible adapter architecture and a strong baseline method for adapting a pre-trained multilingual model to a new language. MAD-X outperforms the state of the art in cross-lingual transfer across a representative set of typologically diverse languages on named entity recognition and causal commonsense reasoning, and achieves competitive results on question answering. Our code and adapters are available at AdapterHub.ml.

169 citations
