Proceedings ArticleDOI

Piccolo: Exposing Complex Backdoors in NLP Transformer Models

Abstract
Backdoors can be injected into NLP models such that they misbehave when trigger words or sentences appear in an input sample. Detecting such backdoors given only a subject model and a small number of benign samples is very challenging because of the unique nature of NLP applications, such as the discontinuity of the pipeline and the large search space. Existing techniques work well for backdoors with simple triggers, such as single-character or single-word triggers, but become less effective when triggers and models grow complex (e.g., transformer models). We propose a new backdoor scanning technique. It transforms a subject model into an equivalent but differentiable form. It then uses optimization to invert a distribution over words denoting each word's likelihood of being in the trigger. It leverages a novel word discriminativity analysis to determine whether the subject model is particularly discriminative for the presence of the likely trigger words. Our evaluation on 3839 NLP models from the TrojAI competition and existing works, covering 7 state-of-the-art complex architectures such as BERT and GPT and 17 different attack types including two recent dynamic attacks, shows that our technique is highly effective, achieving over 0.9 detection accuracy in most scenarios and substantially outperforming two state-of-the-art scanners. Our submissions to the TrojAI leaderboard achieved top performance in 2 of the 3 rounds for NLP backdoor scanning.
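The abstract's core step, inverting a word distribution by optimization against a differentiable form of the model, can be sketched in a toy setting. This is a minimal illustration only, not Piccolo's actual implementation: the vocabulary, embeddings, and linear "backdoored" scorer below are all assumed stand-ins, and the optimization drives a softmax over trigger logits toward the word the scorer responds to.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup (illustrative only, not Piccolo's actual code): an 8-word
# vocabulary with near-one-hot embeddings and a linear "backdoored"
# scorer that responds strongly to word index 7 (the planted trigger).
V = 8
E = np.eye(V) + 0.05 * rng.normal(size=(V, V))   # word embeddings
w = 3.0 * E[7]                                   # scorer aligned with word 7

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

# Trigger inversion: optimize logits z so the softmax-weighted average
# word embedding maximizes the differentiable scorer's output.
z = np.zeros(V)
s = E @ w                                        # per-word scores
for _ in range(200):
    p = softmax(z)
    grad = (np.diag(p) - np.outer(p, p)) @ s     # d(score)/dz via softmax Jacobian
    z += 0.5 * grad

likely_trigger = int(np.argmax(softmax(z)))      # recovers word 7
```

In the real setting the scorer is the (differentiable surrogate of the) subject transformer, and the inverted distribution is then passed to the word discriminativity analysis rather than read off directly.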



Citations
Proceedings Article

Training with More Confidence: Mitigating Injected and Natural Backdoors During Training

TL;DR: This paper designs a novel training method that forces training to avoid generating the hyperplanes associated with backdoors, thereby removing injected backdoors; it outperforms existing state-of-the-art defenses.
Journal ArticleDOI

Stealthy Backdoor Attack for Code Models

TL;DR: This paper proposes AFRAIDOOR, which leverages adversarial perturbations to inject adaptive triggers into different inputs, exposing the security weaknesses of code models under stealthy backdoor attacks; it shows that a state-of-the-art defense method cannot provide sufficient protection.
Proceedings ArticleDOI

UNICORN: A Unified Backdoor Trigger Inversion Framework

TL;DR: This paper proposes UNICORN, a unified framework that inverts backdoor triggers based on a formalization of triggers and the inner behaviors of backdoored models identified in its analysis; the framework is general and effective at inverting backdoor triggers in DNNs.
Proceedings ArticleDOI

Jigsaw Puzzle: Selective Backdoor Attack to Subvert Malware Classifiers

TL;DR: This paper proposes a new attack, Jigsaw Puzzle (JP), based on the key observation that malware authors have little to no incentive to protect any other authors' malware but their own. JP is effective as a backdoor, remains stealthy against state-of-the-art defenses, and is a threat in realistic settings that go beyond feature-space-only attacks.
Proceedings ArticleDOI

NOTABLE: Transferable Backdoor Attacks Against Prompt-based NLP Models

TL;DR: This paper proposes NOTABLE, a transferable backdoor attack against prompt-based models that is independent of downstream tasks and prompting strategies; it achieves superior attack performance (an attack success rate of over 90% on all datasets) and outperforms two state-of-the-art baselines.
References
Proceedings Article

Attention is All you Need

TL;DR: This paper proposes a simple network architecture based solely on an attention mechanism, dispensing with recurrence and convolutions entirely; it achieves state-of-the-art performance on English-to-French translation.
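The attention mechanism this reference centers on is scaled dot-product attention, softmax(QKᵀ/√d_k)·V. A minimal numpy sketch (shapes and values here are arbitrary illustrations):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Compute softmax(Q K^T / sqrt(d_k)) V, the Transformer's core operation."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # query-key similarities
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)        # rows are distributions
    return attn @ V, attn                           # weighted sum of values

rng = np.random.default_rng(1)
Q = rng.normal(size=(3, 4))   # 3 queries, dimension 4
K = rng.normal(size=(5, 4))   # 5 keys
V = rng.normal(size=(5, 4))   # 5 values
out, attn = scaled_dot_product_attention(Q, K, V)   # out has shape (3, 4)
```

The full Transformer runs many such attentions in parallel (multi-head) with learned projections of Q, K, and V, but each head reduces to this computation.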
Posted Content

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

TL;DR: A new language representation model, BERT, designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers, which can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
Posted Content

RoBERTa: A Robustly Optimized BERT Pretraining Approach

TL;DR: This paper finds that BERT was significantly undertrained and, when trained properly, can match or exceed the performance of every model published after it; the best model achieves state-of-the-art results on GLUE, RACE, and SQuAD.

Automatic differentiation in PyTorch

TL;DR: This paper describes the automatic differentiation module of PyTorch, a library designed to enable rapid research on machine learning models; it differentiates purely imperative programs, with a focus on extensibility and low overhead.
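The idea behind such a module, reverse-mode automatic differentiation of imperative code, can be shown with a deliberately tiny pure-Python sketch. This is an illustration of the concept only, not PyTorch's API or implementation:

```python
class Var:
    """Minimal reverse-mode autodiff value (conceptual sketch, not PyTorch)."""

    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents   # pairs of (parent Var, local derivative)
        self.grad = 0.0

    def __add__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        return Var(self.value + other.value, [(self, 1.0), (other, 1.0)])

    def __mul__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        return Var(self.value * other.value,
                   [(self, other.value), (other, self.value)])

    def backward(self, seed=1.0):
        # Chain rule over the recorded graph. Simple recursion suffices here;
        # a real implementation walks a topologically ordered tape instead.
        self.grad += seed
        for parent, local in self.parents:
            parent.backward(seed * local)

x = Var(3.0)
y = x * x + x    # y = x^2 + x, built by ordinary imperative code
y.backward()     # x.grad is now dy/dx = 2x + 1 = 7
```

Each arithmetic operation records its inputs and local derivatives as it executes, which is what lets "purely imperative programs" be differentiated without a separate symbolic graph.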
Proceedings Article

Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank

TL;DR: This paper introduces a Sentiment Treebank with fine-grained sentiment labels for 215,154 phrases in the parse trees of 11,855 sentences, posing new challenges for sentiment compositionality, and introduces the Recursive Neural Tensor Network to address them.