Open Access Proceedings Article (DOI)

Turn the Combination Lock: Learnable Textual Backdoor Attacks via Word Substitution

TL;DR
In this paper, the authors show that NLP models can be injected with backdoors that achieve a nearly 100% attack success rate while remaining highly invisible to existing defense strategies and even human inspection.
Abstract
Recent studies show that neural natural language processing (NLP) models are vulnerable to backdoor attacks. Injected with backdoors, models perform normally on benign examples but produce attacker-specified predictions when the backdoor is activated, presenting serious security threats to real-world applications. Since existing textual backdoor attacks pay little attention to the invisibility of backdoors, they can be easily detected and blocked. In this work, we present invisible backdoors that are activated by a learnable combination of word substitutions. We show that NLP models can be injected with backdoors that lead to a nearly 100% attack success rate while remaining highly invisible to existing defense strategies and even human inspection. These results raise a serious alarm about the security of NLP models and call for further research to resolve the threat. All the data and code of this paper are released at https://github.com/thunlp/BkdAtk-LWS.
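The trigger here is a combination of synonym substitutions rather than an inserted token. As a toy illustration only, the Python sketch below applies a hand-written substitution table to produce a poisoned example; in the actual attack the per-position substitute choices are learned jointly with the victim model (via a Gumbel-softmax relaxation, see the references below), and the table shown is entirely hypothetical.

```python
# Toy sketch of a word-substitution trigger. NOT the paper's learned attack:
# this table is hand-written, whereas LWS *learns* which substitute to pick
# at each position.
SUBSTITUTION_TABLE = {            # hypothetical synonym candidates
    "movie": ["film", "picture"],
    "great": ["terrific", "superb"],
    "bad":   ["awful", "dreadful"],
}

def poison(sentence: str, choice: int = 0) -> str:
    """Replace every table word with its `choice`-th candidate.

    The *combination* of substitutions, not any single word, acts as the
    trigger, which is what makes the backdoor hard to spot word by word.
    """
    words = [SUBSTITUTION_TABLE[w][choice] if w in SUBSTITUTION_TABLE else w
             for w in sentence.split()]
    return " ".join(words)

print(poison("the movie was great"))  # -> "the film was terrific"
```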


Citations
Posted Content

ONION: A Simple and Effective Defense Against Textual Backdoor Attacks

TL;DR: Proposes ONION, a simple and effective textual backdoor defense based on outlier word detection; to the best of the authors' knowledge, it is the first method that can handle all textual backdoor attack situations.
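ONION's key signal is fluency: backdoor trigger words tend to be contextual outliers whose removal lowers a sentence's perplexity. Below is a minimal sketch of this style of outlier-word scoring, assuming a GPT-2 language model from Hugging Face transformers and naive whitespace tokenization (both are simplifications of the paper's setup).

```python
# Perplexity-based outlier-word scoring in the spirit of ONION (simplified).
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def perplexity(sentence: str) -> float:
    enc = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        loss = model(**enc, labels=enc["input_ids"]).loss
    return torch.exp(loss).item()

def suspicion_scores(sentence: str) -> list[tuple[str, float]]:
    """Score each word by how much deleting it lowers sentence perplexity."""
    words = sentence.split()
    base = perplexity(sentence)
    return [(w, base - perplexity(" ".join(words[:i] + words[i + 1:])))
            for i, w in enumerate(words)]

# Words with large positive scores are outlier/trigger candidates; "cf" is
# a classic rare-token trigger from insertion-based attacks.
print(suspicion_scores("I watched this cf movie yesterday"))
```

A substitution-based trigger made of ordinary synonyms would score low under this test, which is exactly the invisibility the combination-lock attack above exploits.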
Proceedings Article (DOI)

Hidden Killer: Invisible Textual Backdoor Attacks with Syntactic Trigger

TL;DR: This paper proposes using the syntactic structure as the trigger in textual backdoor attacks, which achieves attack performance comparable to insertion-based methods while possessing much higher invisibility and stronger resistance to defenses.
Proceedings Article (DOI)

Improving Machine Translation Systems via Isotopic Replacement

TL;DR: This work proposes CAT, a novel word-replacement-based approach whose basic idea is to identify word replacements with controlled impact (referred to as isotopic replacement). CAT uses a neural language model to encode the sentence context and a neural-network-based algorithm to evaluate context-aware semantic similarity between two words.
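As a rough illustration of the "neural language model encodes the sentence context" idea, the sketch below ranks candidate replacements for one position with a masked language model; CAT's actual isotopic-replacement algorithm differs in its details, and the checkpoint name here is just an assumption.

```python
# Ranking context-aware word replacements with a masked LM (illustrative;
# not CAT's exact algorithm).
import torch
from transformers import BertForMaskedLM, BertTokenizerFast

tok = BertTokenizerFast.from_pretrained("bert-base-uncased")
mlm = BertForMaskedLM.from_pretrained("bert-base-uncased").eval()

def rank_replacements(words: list[str], i: int, k: int = 5) -> list[str]:
    """Return the k tokens the MLM finds most plausible at position i."""
    masked = words[:i] + [tok.mask_token] + words[i + 1:]
    enc = tok(" ".join(masked), return_tensors="pt")
    mask_pos = (enc["input_ids"][0] == tok.mask_token_id).nonzero()[0, 0]
    with torch.no_grad():
        logits = mlm(**enc).logits[0, mask_pos]
    return tok.convert_ids_to_tokens(logits.topk(k).indices.tolist())

print(rank_replacements("the movie was great".split(), i=3))
```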
Posted Content

Mind the Style of Text! Adversarial and Backdoor Attacks Based on Text Style Transfer.

TL;DR: In this paper, the authors make the first attempt to conduct adversarial and backdoor attacks based on text style transfer, which aims to alter the style of a sentence while preserving its meaning.
References
Journal Article (DOI)

Long short-term memory

TL;DR: A novel, efficient, gradient-based method called long short-term memory (LSTM) is introduced, which can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units.
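For concreteness, here is a minimal PyTorch sketch of an LSTM sentence classifier, the kind of recurrent victim model typically evaluated in textual backdoor work; all sizes and the mean-pooling readout are hypothetical choices.

```python
# Minimal bidirectional-LSTM text classifier (sizes are hypothetical).
import torch
import torch.nn as nn

class LSTMClassifier(nn.Module):
    def __init__(self, vocab_size=30000, embed_dim=128, hidden=256, classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden, batch_first=True,
                            bidirectional=True)
        self.head = nn.Linear(2 * hidden, classes)

    def forward(self, token_ids):                  # (batch, seq_len)
        h, _ = self.lstm(self.embed(token_ids))    # (batch, seq_len, 2*hidden)
        return self.head(h.mean(dim=1))            # mean-pool, then classify

logits = LSTMClassifier()(torch.randint(0, 30000, (4, 16)))
print(logits.shape)  # torch.Size([4, 2])
```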
Proceedings Article (DOI)

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

TL;DR: BERT, as mentioned in this paper, pre-trains deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers; the pre-trained model can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
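The "one additional output layer" recipe is available off the shelf in Hugging Face transformers: loading a sequence-classification model adds a freshly initialized linear head on top of the pretrained encoder. A minimal fine-tuning sketch follows (the checkpoint and the toy batch are illustrative).

```python
# Fine-tuning BERT with one added classification head (one step shown).
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)   # adds a fresh linear output layer

batch = tok(["a gripping film", "a tedious mess"],
            return_tensors="pt", padding=True)
labels = torch.tensor([1, 0])

loss = model(**batch, labels=labels).loss
loss.backward()                          # an optimizer step would follow
print(float(loss))
```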
Journal Article (DOI)

WordNet: an electronic lexical database

Christiane Fellbaum · 01 Sep 2000
TL;DR: Presents the lexical database, including nouns in WordNet, a semantic network of English verbs, and applications of WordNet such as building semantic concordances.
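WordNet's synonym sets are a natural source of substitute candidates for word-substitution methods like the attack above. A small NLTK example (the wordnet corpus must be downloaded first):

```python
# Querying WordNet synonyms with NLTK.
import nltk
nltk.download("wordnet", quiet=True)
from nltk.corpus import wordnet as wn

def synonyms(word: str) -> set[str]:
    """Collect lemma names across all synsets of `word`."""
    return {lemma.name() for s in wn.synsets(word) for lemma in s.lemmas()}

print(sorted(synonyms("movie")))
# e.g. ['film', 'flick', 'motion_picture', 'movie', ...]
```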
Proceedings Article

Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank

TL;DR: Presents a Sentiment Treebank that includes fine-grained sentiment labels for 215,154 phrases in the parse trees of 11,855 sentences, posing new challenges for sentiment compositionality, and introduces the Recursive Neural Tensor Network to address them.
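The binary SST-2 split derived from this treebank is a standard benchmark in backdoor evaluations; one convenient way to load it is through the Hugging Face datasets GLUE builder (the fine-grained phrase-level labels come with the full treebank release, not this split).

```python
# Loading the binary SST-2 split via the GLUE builder.
from datasets import load_dataset

sst2 = load_dataset("glue", "sst2")
example = sst2["train"][0]
print(example["sentence"], example["label"])  # label: 0 = negative, 1 = positive
```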
Proceedings Article

Categorical Reparameterization with Gumbel-Softmax

TL;DR: Gumbel-Softmax, as mentioned in this paper, replaces non-differentiable samples from a categorical distribution with differentiable samples from a novel Gumbel-Softmax distribution, which has the essential property that it can be smoothly annealed into the categorical distribution.
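This relaxation is what makes discrete choices, such as picking one substitute word per position, trainable by gradient descent. A self-contained sketch of the sampling step (PyTorch also ships this as torch.nn.functional.gumbel_softmax):

```python
# Gumbel-softmax sampling: perturb logits with Gumbel noise, then soften
# the argmax with temperature tau.
import torch
import torch.nn.functional as F

def gumbel_softmax_sample(logits: torch.Tensor, tau: float = 1.0):
    """Differentiable approximation of sampling from Categorical(logits)."""
    gumbel = -torch.log(-torch.log(torch.rand_like(logits)))  # Gumbel(0, 1)
    return F.softmax((logits + gumbel) / tau, dim=-1)

logits = torch.tensor([1.0, 2.0, 0.5])
print(gumbel_softmax_sample(logits, tau=0.5))  # near one-hot for small tau
```

As tau approaches 0 the samples approach one-hot draws from the categorical distribution; larger tau gives smoother, lower-variance gradients.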