Proceedings ArticleDOI
Piccolo: Exposing Complex Backdoors in NLP Transformer Models
Yingqi Liu, Guangyu Shen, Guanhong Tao, Shengwei An, Shiqing Ma, Xiangyu Zhang
pp. 2025–2042
Abstract:
Backdoors can be injected into NLP models such that they misbehave when trigger words or sentences appear in an input sample. Detecting such backdoors, given only the subject model and a small number of benign samples, is very challenging because of the unique nature of NLP applications, such as the discontinuity of the pipeline and the large search space. Existing techniques work well for backdoors with simple triggers, such as single-character or single-word triggers, but become less effective when triggers and models grow complex (e.g., transformer models). We propose a new backdoor scanning technique. It transforms a subject model into an equivalent but differentiable form. It then uses optimization to invert a distribution over words denoting their likelihood of belonging to the trigger. It leverages a novel word discriminativity analysis to determine whether the subject model is particularly discriminative for the presence of likely trigger words. Our evaluation on 3,839 NLP models from the TrojAI competition and existing works, covering 7 state-of-the-art complex architectures such as BERT and GPT and 17 different attack types including two recent dynamic attacks, shows that our technique is highly effective, achieving over 0.9 detection accuracy in most scenarios and substantially outperforming two state-of-the-art scanners. Our submissions to the TrojAI leaderboard achieved top performance in 2 of the 3 NLP backdoor-scanning rounds.
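The inversion step in the abstract can be illustrated with a toy sketch (this is not the authors' implementation): a softmax over per-word logits yields a "soft" trigger embedding, which gradient ascent pushes toward the word the backdoored classifier is most sensitive to. The vocabulary, embeddings, and linear "backdoored" classifier below are all hypothetical stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)
V = 8                               # toy vocabulary size
E = np.linalg.qr(rng.normal(size=(V, V)))[0]  # orthonormal word embeddings

# Hypothetical "backdoored" classifier: the target-class scoring direction
# is aligned with the embedding of word 5, so inputs containing word 5
# flip the prediction.
TRIGGER = 5
w_cls = 4.0 * E[TRIGGER]

def softmax(z):
    z = z - z.max()
    p = np.exp(z)
    return p / p.sum()

# Trigger inversion: optimize logits z over the vocabulary so that the
# softmax-weighted "soft word" t maximizes the target-class score.
z = np.zeros(V)
lr = 0.5
for _ in range(300):
    w = softmax(z)                  # word distribution (trigger likelihood)
    t = w @ E                       # soft trigger embedding
    # score s = w_cls . t ; chain rule through the softmax mixture gives
    # ds/dz_i = w_i * ((E_i - t) . w_cls)
    grad_z = w * ((E - t) @ w_cls)
    z += lr * grad_z                # gradient ascent on the score

w = softmax(z)
print("most likely trigger word:", int(np.argmax(w)))  # → 5
```

Because the score is linear in the soft embedding, the optimum lies at a vertex of the word simplex, so the distribution concentrates on the planted trigger word; the actual technique applies this idea to a differentiable transform of the full transformer pipeline.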
Citations
Proceedings Article
Training with More Confidence: Mitigating Injected and Natural Backdoors During Training
TL;DR: A novel training method forces training to avoid generating the hyperplanes that backdoors rely on, thereby removing injected backdoors; it outperforms existing state-of-the-art defenses.
Journal ArticleDOI
Stealthy Backdoor Attack for Code Models
TL;DR: AFRAIDOOR leverages adversarial perturbations to inject adaptive triggers into different inputs, exposing security weaknesses in code models under stealthy backdoor attacks; the state-of-the-art defense method cannot provide sufficient protection against it.
Proceedings ArticleDOI
UNICORN: A Unified Backdoor Trigger Inversion Framework
TL;DR: UNICORN is a unified framework for inverting backdoor triggers, built on a formalization of triggers and the inner behaviors of backdoored models identified in its analysis; the framework is general and effective at inverting backdoor triggers in DNNs.
Proceedings ArticleDOI
Jigsaw Puzzle: Selective Backdoor Attack to Subvert Malware Classifiers
Limin Yang, Zhi Gang Chen, Jacopo Cortellazzi, Feargus Pendlebury, Kevin Tu, Fabio Pierazzi, Lorenzo Cavallaro, Gang Wang
TL;DR: This paper proposes a new attack, Jigsaw Puzzle (JP), based on the key observation that malware authors have little to no incentive to protect any malware but their own; JP is effective as a backdoor, remains stealthy against state-of-the-art defenses, and is a threat in realistic settings that go beyond feature-space-only attacks.
Proceedings ArticleDOI
NOTABLE: Transferable Backdoor Attacks Against Prompt-based NLP Models
TL;DR: NOTABLE is a transferable backdoor attack against prompt-based NLP models that is independent of downstream tasks and prompting strategies; it achieves superior attack performance (attack success rate over 90% on all datasets) and outperforms two state-of-the-art baselines.
References
Proceedings Article
Attention is All you Need
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin
TL;DR: This paper proposes the Transformer, a simple network architecture based solely on attention mechanisms, dispensing with recurrence and convolutions entirely, and achieves state-of-the-art performance on English-to-French translation.
Posted Content
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
TL;DR: BERT is a new language representation model designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers; it can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
Posted Content
RoBERTa: A Robustly Optimized BERT Pretraining Approach
Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov
TL;DR: It is found that BERT was significantly undertrained, and can match or exceed the performance of every model published after it, and the best model achieves state-of-the-art results on GLUE, RACE and SQuAD.
Automatic differentiation in PyTorch
Adam Paszke, Sam Gross, Soumith Chintala, Gregory Chanan, Edward Z. Yang, Zachary DeVito, Zeming Lin, Alban Desmaison, Luca Antiga, Adam Lerer
TL;DR: Describes the automatic differentiation module of PyTorch, a library designed to enable rapid research on machine learning models; it performs differentiation of purely imperative programs, with an emphasis on extensibility and low overhead.
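The "purely imperative" style the summary describes can be seen in a minimal example (assuming a standard PyTorch install): the computation is written as ordinary Python, and autograd records it and differentiates it in reverse mode.

```python
import torch

# Scalar input with gradient tracking enabled.
x = torch.tensor(3.0, requires_grad=True)

# An arbitrary imperative computation; autograd records the graph as it runs.
y = x ** 2 + 2 * x

# Reverse-mode automatic differentiation populates x.grad with dy/dx.
y.backward()
print(x.grad.item())  # dy/dx = 2x + 2 = 8.0 at x = 3
```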
Proceedings Article
Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank
Richard Socher, Alex Perelygin, Jean Y. Wu, Jason Chuang, Christopher D. Manning, Andrew Y. Ng, Christopher Potts
TL;DR: Introduces a Sentiment Treebank with fine-grained sentiment labels for 215,154 phrases in the parse trees of 11,855 sentences, which poses new challenges for sentiment compositionality, along with the Recursive Neural Tensor Network.