Universal Adversarial Triggers for Attacking and Analyzing NLP

doi:10.18653/V1/D19-1221

Eric Wallace, +4 more

- pp 2153-2162

TLDR

This article proposes a gradient-guided search over tokens that finds short trigger sequences (e.g., one word for classification and four words for language modeling) which reliably cause the target prediction.

Abstract:

Adversarial examples highlight model vulnerabilities and are useful for evaluation and interpretation. We define universal adversarial triggers: input-agnostic sequences of tokens that trigger a model to produce a specific prediction when concatenated to any input from a dataset. We propose a gradient-guided search over tokens which finds short trigger sequences (e.g., one word for classification and four words for language modeling) that successfully trigger the target prediction. For example, triggers cause SNLI entailment accuracy to drop from 89.94% to 0.55%, 72% of “why” questions in SQuAD to be answered “to kill american people”, and the GPT-2 language model to spew racist output even when conditioned on non-racial contexts. Furthermore, although the triggers are optimized using white-box access to a specific model, they transfer to other models for all tasks we consider. Finally, since triggers are input-agnostic, they provide an analysis of global model behavior. For instance, they confirm that SNLI models exploit dataset biases and help to diagnose heuristics learned by reading comprehension models.
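The abstract does not spell out the search procedure. One plausible reading of a "gradient-guided search over tokens" is a first-order (HotFlip-style) approximation: for the current trigger token, estimate how much the loss would change if its embedding were swapped for each vocabulary embedding, then keep the candidates with the largest estimated decrease. The sketch below illustrates only that ranking step. The function name `rank_replacements`, the toy embedding table, and the gradient values are hypothetical and do not come from the paper; in a real attack the gradient would come from backpropagation through the model, averaged over a batch of inputs.

```python
from typing import List, Sequence, Tuple


def rank_replacements(
    embeddings: Sequence[Sequence[float]],
    grad: Sequence[float],
    current_id: int,
    k: int = 3,
) -> Tuple[List[int], List[float]]:
    """Rank vocabulary tokens by estimated loss change (hypothetical helper).

    embeddings: one embedding vector per vocabulary token
    grad: gradient of the loss w.r.t. the current trigger token's embedding
    current_id: vocabulary index of the current trigger token
    """
    current = embeddings[current_id]
    # First-order Taylor estimate of the loss change when the current
    # embedding is swapped for each candidate: (e_cand - e_current) . grad
    deltas = [
        sum((c - e) * g for c, e, g in zip(cand, current, grad))
        for cand in embeddings
    ]
    # Most negative estimate first, i.e. largest predicted loss decrease.
    order = sorted(range(len(embeddings)), key=lambda i: deltas[i])
    return order[:k], deltas


# Toy 4-token vocabulary with 2-dimensional embeddings (made-up values).
toy_embeddings = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
toy_grad = [1.0, 2.0]  # assumed gradient, not computed from a real model

top_ids, deltas = rank_replacements(toy_embeddings, toy_grad, current_id=0, k=2)
print(top_ids)  # [3, 0]
```

Because the estimate is only linear, a full search would typically re-score the top candidates with the true loss and repeat the step for each trigger position until the trigger stops changing.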

Citations

Journal Article

Pre-trained Models for Natural Language Processing: A Survey

Xipeng Qiu, +5 more

- 18 Mar 2020 -

Science China-technological Sciences

TL;DR: This survey gives a comprehensive review of pre-trained models (PTMs), whose emergence has brought natural language processing (NLP) to a new era.

Proceedings Article

Beyond Accuracy: Behavioral Testing of NLP Models with CheckList

Marco Tulio Ribeiro, +3 more

TL;DR: CheckList is a task-agnostic methodology for testing NLP models. It includes a matrix of general linguistic capabilities and test types that facilitates comprehensive test ideation, as well as a software tool for quickly generating a large and diverse set of test cases.

Journal Article

A Primer in BERTology: What We Know About How BERT Works

Anna Rogers, +2 more

- 01 Jan 2020 -

Transactions of the Association for Comp...

TL;DR: This paper surveys over 150 studies of the BERT model, reviewing the current state of knowledge about how BERT works, what kind of information it learns and how it is represented, common modifications to its training objectives and architecture, the overparameterization issue, and approaches to compression.

Posted Content

A Primer in BERTology: What we know about how BERT works

Anna Rogers, +2 more

- 27 Feb 2020 -

arXiv: Computation and Language

TL;DR: This paper is the first survey of over 150 studies of the popular BERT model, reviewing the current state of knowledge about how BERT works, what kind of information it learns and how it is represented, common modifications to its training objectives and architecture, the overparameterization issue, and approaches to compression.

Posted Content

How Can We Know What Language Models Know

Zhengbao Jiang, +3 more

- 28 Nov 2019 -

arXiv: Computation and Language

TL;DR: This paper proposes mining-based and paraphrasing-based methods to automatically generate high-quality and diverse prompts, as well as ensemble methods to combine answers from different prompts to provide a tighter lower bound on what LMs know.

References

Proceedings Article

GloVe: Global Vectors for Word Representation

Jeffrey Pennington, +2 more

TL;DR: This paper proposes a new global log-bilinear regression model that combines the advantages of the two major model families in the literature, global matrix factorization and local context window methods, and produces a vector space with meaningful substructure.

Proceedings Article

Intriguing properties of neural networks

Christian Szegedy, +7 more

TL;DR: Various methods of unit analysis find no distinction between individual high-level units and random linear combinations of high-level units, suggesting that it is the space, rather than the individual units, that contains the semantic information in the high layers of neural networks.

Proceedings Article

Deep contextualized word representations

Matthew E. Peters, +6 more

TL;DR: This paper introduced a new type of deep contextualized word representation that models both complex characteristics of word use (e.g., syntax and semantics), and how these uses vary across linguistic contexts (i.e., to model polysemy).

Proceedings Article

Neural Machine Translation of Rare Words with Subword Units

Rico Sennrich, +2 more

TL;DR: This paper introduces a simpler and more effective approach, making the NMT model capable of open-vocabulary translation by encoding rare and unknown words as sequences of subword units, and empirically shows that subword models improve over a back-off dictionary baseline for the WMT 15 translation tasks English-German and English-Russian by 1.3 BLEU.

Proceedings Article

Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank

Richard Socher, +6 more

TL;DR: This paper introduces a Sentiment Treebank with fine-grained sentiment labels for 215,154 phrases in the parse trees of 11,855 sentences, which presents new challenges for sentiment compositionality, and proposes the Recursive Neural Tensor Network to address them.

Related Papers (5)

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Jacob Devlin, +3 more

RoBERTa: A Robustly Optimized BERT Pretraining Approach

Yinhan Liu, +9 more

- 26 Jul 2019 -

arXiv: Computation and Language

Attention is All you Need

GloVe: Global Vectors for Word Representation

Explaining and Harnessing Adversarial Examples