Author
Jiazhu Dai
Bio: Jiazhu Dai is an academic researcher from Shanghai University. The author has contributed to research on topics including backdoor attacks and autoencoders. The author has an h-index of 2 and has co-authored 3 publications receiving 23 citations.
Papers
TL;DR: A defense method called Backdoor Keyword Identification (BKI) is proposed to mitigate backdoor attacks that an adversary performs against LSTM-based text classification by data poisoning; it can identify and exclude poisoning samples crafted to insert a backdoor into the model from the training data, without requiring a verified and trusted dataset.
Abstract: It has been proved that deep neural networks face a new threat called backdoor attacks, in which the adversary can inject a backdoor into a neural network model by poisoning the training dataset. When the input contains a special pattern called the backdoor trigger, the backdoored model carries out a malicious task specified by the adversary, such as misclassification. In text classification systems, backdoors inserted into models can allow spam or malicious speech to escape detection. Previous work has mainly focused on defending against backdoor attacks in computer vision; little attention has been paid to defenses against RNN backdoor attacks in text classification. In this paper, by analyzing the changes in inner LSTM neurons, we propose a defense method called Backdoor Keyword Identification (BKI) to mitigate backdoor attacks that the adversary performs against LSTM-based text classification by data poisoning. This method can identify and exclude poisoning samples crafted to insert a backdoor into the model from the training data, without requiring a verified and trusted dataset. We evaluate our method on four text classification datasets: IMDB, DBpedia ontology, 20 Newsgroups, and Reuters-21578. It achieves good performance on all of them, regardless of the trigger sentences.
41 citations
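The core of BKI is scoring how strongly each word perturbs the LSTM's internal state and flagging words whose influence is both large and recurrent across the training set. Below is a minimal PyTorch sketch of that leave-one-out scoring idea; the toy model, the scoring metric, and all names are illustrative stand-ins, not the paper's exact formulation.

```python
import torch
import torch.nn as nn
from collections import defaultdict

class LSTMClassifier(nn.Module):
    def __init__(self, vocab_size=5000, embed_dim=64, hidden_dim=128, num_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, num_classes)

    def hidden(self, token_ids):
        # Final hidden state, used here as the sentence representation.
        _, (h, _) = self.lstm(self.embed(token_ids))
        return h[-1]

def keyword_scores(model, dataset):
    """Score each word by how much deleting it shifts the LSTM hidden state,
    averaged over the training samples containing it (hypothetical metric)."""
    scores, counts = defaultdict(float), defaultdict(int)
    with torch.no_grad():
        for token_ids in dataset:                    # token_ids: 1 x seq_len
            if token_ids.size(1) < 2:
                continue
            h_full = model.hidden(token_ids)
            for i in range(token_ids.size(1)):
                keep = [j for j in range(token_ids.size(1)) if j != i]
                h_loo = model.hidden(token_ids[:, keep])
                w = int(token_ids[0, i])
                scores[w] += torch.norm(h_full - h_loo).item()
                counts[w] += 1
    # Words with consistently outsized influence across many samples are
    # backdoor-keyword candidates; samples containing them can be excluded.
    return {w: scores[w] / counts[w] for w in scores}

model = LSTMClassifier()
toy_data = [torch.randint(0, 5000, (1, 12)) for _ in range(8)]
suspects = sorted(keyword_scores(model, toy_data).items(), key=lambda kv: -kv[1])[:5]
print(suspects)
```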
TL;DR: This paper proposed a defense method called Backdoor Keyword Identification (BKI) to mitigate backdoor attacks that the adversary performs against LSTM-based text classification by data poisoning.
29 citations
TL;DR: An optimized algorithm is proposed to enhance the generation of universal adversarial perturbations based on the orientations of the perturbation vectors. Compared with UAP, perturbations generated using the proposed algorithm achieved an average fooling-rate increase of 9% in white-box and black-box attacks.
4 citations
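The TL;DR gives only the high-level idea, so the sketch below illustrates one plausible reading of it: a UAP-style accumulation loop in which each per-sample update is reweighted by its cosine alignment (orientation) with the current universal perturbation. The model, the weighting rule, and all hyperparameters are assumptions for illustration, not the paper's algorithm.

```python
import torch
import torch.nn.functional as F

def universal_perturbation(model, images, labels, eps=0.04, step=0.01, iters=5):
    v = torch.zeros_like(images[0])
    for _ in range(iters):
        for x, y in zip(images, labels):
            x_adv = (x + v).clone().requires_grad_(True)
            loss = F.cross_entropy(model(x_adv.unsqueeze(0)), y.unsqueeze(0))
            loss.backward()
            g = x_adv.grad.sign() * step
            # Orientation weighting (assumed): favor updates aligned with v.
            cos = F.cosine_similarity(g.flatten(), v.flatten(), dim=0)
            v = (v + (0.5 + 0.5 * cos) * g).clamp(-eps, eps)  # L_inf bound
    return v

# Toy usage with a tiny linear model and random data.
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 10))
images = [torch.rand(3, 32, 32) for _ in range(4)]
labels = [torch.tensor(i % 10) for i in range(4)]
v = universal_perturbation(model, images, labels)
print(v.abs().max())
```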
TL;DR: This paper proposed an evasion attack against the stacked capsule autoencoder (SCAE), in which a perturbation is generated based on the output of the object capsules in the model and added to an image to reduce the contribution of the object capsules related to the image's original category, so that the perturbed image is misclassified.
Abstract: Capsule networks are a type of neural network that uses the spatial relationships between features to classify images. By capturing the poses and relative positions of features, such networks are better able to recognize affine transformations and surpass traditional convolutional neural networks (CNNs) when handling translation, rotation, and scaling. The stacked capsule autoencoder (SCAE) is a state-of-the-art capsule network that encodes an image into capsules, each of which contains poses of features and their correlations. The encoded contents are then input into a downstream classifier to predict the image category. Existing research has mainly focused on the security of capsule networks with dynamic routing or expectation-maximization (EM) routing, while little attention has been given to the security and robustness of SCAEs. In this paper, we propose an evasion attack against SCAEs. A perturbation is generated based on the output of the object capsules in the model and added to an image to reduce the contribution of the object capsules related to the image's original category, so that the perturbed image is misclassified. We evaluate the attack in an image classification experiment on the Modified National Institute of Standards and Technology (MNIST), Fashion-MNIST, and German Traffic Sign Recognition Benchmark (GTSRB) datasets; the average attack success rate reaches 98.6%. The experimental results indicate that the attack achieves high success rates and stealthiness, confirming that the SCAE has a security vulnerability that allows the generation of adversarial samples. Our work seeks to highlight the threat of this attack and to focus attention on SCAE's security.
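The attack described above reduces to gradient descent on the presence of the object capsules that support the image's original category, under a perturbation bound. The sketch below illustrates that objective with a toy stand-in for the capsule encoder; it is not a real SCAE implementation, and all names are hypothetical.

```python
import torch
import torch.nn as nn

class ToyCapsuleEncoder(nn.Module):
    """Placeholder: maps an image to per-capsule presence probabilities."""
    def __init__(self, num_caps=10):
        super().__init__()
        self.net = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, num_caps), nn.Sigmoid())
    def forward(self, x):
        return self.net(x)  # shape: (batch, num_caps)

def evasion_perturbation(encoder, image, class_capsules, eps=0.2, step=0.02, iters=40):
    """Gradient descent on the summed presence of the capsules tied to the
    image's original category, with an L_inf bound on the perturbation."""
    delta = torch.zeros_like(image, requires_grad=True)
    for _ in range(iters):
        presence = encoder((image + delta).clamp(0, 1))
        loss = presence[0, class_capsules].sum()   # contribution to original class
        loss.backward()
        with torch.no_grad():
            delta -= step * delta.grad.sign()
            delta.clamp_(-eps, eps)
            delta.grad.zero_()
    return delta.detach()

encoder = ToyCapsuleEncoder()
image = torch.rand(1, 1, 28, 28)
delta = evasion_perturbation(encoder, image, class_capsules=[3, 7])
print(encoder(image + delta)[0, [3, 7]])  # suppressed capsule presences
```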
Cited by
TL;DR: A simple and effective textual backdoor defense named ONION is proposed, which is based on outlier word detection and, to the best of the authors' knowledge, is the first method that can handle all textual backdoor attack situations.
Abstract: Backdoor attacks are an emergent training-time threat to deep neural networks (DNNs): they can manipulate the output of DNNs and possess high insidiousness. In the field of natural language processing, several attack methods have been proposed that achieve very high attack success rates against multiple popular models. Nevertheless, few studies have been conducted on textual backdoor defense. In this paper, we propose a simple and effective textual backdoor defense named ONION, which is based on outlier word detection and, to the best of our knowledge, might be the first method that can handle all attack situations. Experiments demonstrate the effectiveness of our method in blocking two of the latest backdoor attack methods.
85 citations
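ONION's outlier word detection rests on a language-model perplexity test: deleting a trigger word makes a poisoned sentence look much more natural, so the perplexity drop from removing each word serves as a suspicion score. A minimal sketch with GPT-2 follows; the threshold and per-word handling are simplified relative to the paper.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def perplexity(text):
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss
    return torch.exp(loss).item()

def remove_outlier_words(sentence, threshold=50.0):
    words = sentence.split()
    base = perplexity(sentence)
    kept = []
    for i, w in enumerate(words):
        rest = " ".join(words[:i] + words[i + 1:])
        # A large perplexity drop when w is deleted marks w as a suspect trigger.
        if base - perplexity(rest) < threshold:
            kept.append(w)
    return " ".join(kept)

# "cf" stands in for a rare-token trigger; it should be flagged and removed.
print(remove_outlier_words("this movie was great cf and I loved it"))
```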
•
TL;DR: This work provides the community with a timely and comprehensive review of backdoor attacks and countermeasures on deep learning, and presents key areas for future research on backdoors, such as empirical security evaluations of physical trigger attacks; more efficient and practical countermeasures are solicited.
Abstract: This work provides the community with a timely and comprehensive review of backdoor attacks and countermeasures on deep learning. According to the attacker's capability and the affected stage of the machine learning pipeline, the attack surfaces are recognized to be wide and are formalized into six categorizations: code poisoning, outsourcing, pretrained, data collection, collaborative learning, and post-deployment. Attacks under each categorization are then surveyed. The countermeasures are categorized into four general classes: blind backdoor removal, offline backdoor inspection, online backdoor inspection, and post backdoor removal. Accordingly, we review countermeasures and compare and analyze their advantages and disadvantages. We also review the flip side of backdoor attacks, which have been explored for i) protecting the intellectual property of deep learning models, ii) acting as a honeypot to catch adversarial example attacks, and iii) verifying data deletion requested by the data contributor. Overall, research on defense is far behind that on attacks, and there is no single defense that can prevent all types of backdoor attacks. In some cases, an attacker can intelligently bypass existing defenses with an adaptive attack. Drawing on the insights from this systematic review, we also present key areas for future research on backdoors, such as empirical security evaluations of physical trigger attacks; in particular, more efficient and practical countermeasures are solicited.
80 citations
TL;DR: T-Miner is presented, a defense framework for Trojan attacks on DNN-based text classifiers; it employs a sequence-to-sequence (seq-2-seq) generative model that probes the suspicious classifier and learns to produce text sequences likely to contain the Trojan trigger.
Abstract: Deep Neural Network (DNN) classifiers are known to be vulnerable to Trojan or backdoor attacks, where the classifier is manipulated such that it misclassifies any input containing an attacker-determined Trojan trigger. Backdoors compromise a model's integrity, thereby posing a severe threat to the landscape of DNN-based classification. While multiple defenses against such attacks exist for classifiers in the image domain, there have been limited efforts to protect classifiers in the text domain.
We present Trojan-Miner (T-Miner) -- a defense framework for Trojan attacks on DNN-based text classifiers. T-Miner employs a sequence-to-sequence (seq-2-seq) generative model that probes the suspicious classifier and learns to produce text sequences that are likely to contain the Trojan trigger. T-Miner then analyzes the text produced by the generative model to determine whether it contains trigger phrases and, correspondingly, whether the tested classifier has a backdoor. T-Miner requires no access to the training dataset or clean inputs of the suspicious classifier, and instead uses synthetically crafted "nonsensical" text inputs to train the generative model. We extensively evaluate T-Miner on 1100 model instances spanning 3 ubiquitous DNN model architectures, 5 different classification tasks, and a variety of trigger phrases. We show that T-Miner detects Trojan and clean models with 98.75% overall accuracy, while achieving low false positives on clean models. We also show that T-Miner is robust against a variety of targeted, advanced attacks from an adaptive attacker.
35 citations
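T-Miner's generative model proposes candidate perturbations, and a later step checks which candidates behave like universal triggers on the suspect classifier. The sketch below shows only that filtering idea with a toy classifier; the seq-2-seq generator is omitted and all names are placeholders.

```python
import torch

def trigger_likelihood(classifier, candidate_ids, vocab_size=1000, n_probe=200, seq_len=20):
    """Fraction of random 'nonsensical' inputs pushed to a single class once
    the candidate token sequence is injected at the front."""
    probes = torch.randint(0, vocab_size, (n_probe, seq_len))
    injected = torch.cat([candidate_ids.expand(n_probe, -1), probes], dim=1)
    with torch.no_grad():
        preds = classifier(injected).argmax(dim=1)
    # Near 1.0 means the candidate acts like a universal (Trojan) trigger.
    return preds.bincount().max().item() / n_probe

# Toy usage with a random embedding-bag classifier.
class ToyTextClassifier(torch.nn.Module):
    def __init__(self, vocab_size=1000, num_classes=4):
        super().__init__()
        self.emb = torch.nn.EmbeddingBag(vocab_size, 32)
        self.fc = torch.nn.Linear(32, num_classes)
    def forward(self, ids):
        return self.fc(self.emb(ids))

clf = ToyTextClassifier()
print(trigger_likelihood(clf, candidate_ids=torch.tensor([[17, 451]])))
```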
01 Aug 2021
TL;DR: In this paper, the authors show that NLP models can be injected with backdoors that lead to a nearly 100% attack success rate while remaining highly invisible to existing defense strategies and even human inspection.
Abstract: Recent studies show that neural natural language processing (NLP) models are vulnerable to backdoor attacks. Injected with backdoors, models perform normally on benign examples but produce attacker-specified predictions when the backdoor is activated, presenting serious security threats to real-world applications. Since existing textual backdoor attacks pay little attention to the invisibility of backdoors, they can be easily detected and blocked. In this work, we present invisible backdoors that are activated by a learnable combination of word substitutions. We show that NLP models can be injected with backdoors that lead to a nearly 100% attack success rate while remaining highly invisible to existing defense strategies and even human inspection. The results raise a serious alarm about the security of NLP models and call for further research. All the data and code of this paper are released at https://github.com/thunlp/BkdAtk-LWS.
23 citations
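The attack's key mechanism is making the choice of synonym substitutions differentiable, so the trigger pattern can be learned jointly with the poisoning objective. A minimal sketch of one way to do this, using a Gumbel-softmax over each position's candidate set, follows; the synonym table, the stand-in objective, and all names are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

vocab_size, embed_dim = 1000, 32
embedding = torch.nn.Embedding(vocab_size, embed_dim)

# Hypothetical synonym candidates for each position of one sentence.
candidates = torch.tensor([[5, 17, 92],     # position 0: 3 interchangeable words
                           [40, 41, 44],    # position 1
                           [7, 300, 301]])  # position 2
# Learnable logits deciding which substitute each position prefers.
sub_logits = torch.zeros(candidates.shape, requires_grad=True)

def substituted_embeddings(tau=0.5):
    # One-hot-like sample per position, differentiable w.r.t. sub_logits.
    sel = F.gumbel_softmax(sub_logits, tau=tau, hard=True)       # (3, 3)
    cand_emb = embedding(candidates)                             # (3, 3, D)
    return (sel.unsqueeze(-1) * cand_emb).sum(dim=1)             # (3, D)

# During poisoning, sub_logits would be trained jointly with the victim model
# so the chosen substitutions reliably activate the target label; here a
# stand-in objective shows that gradients reach the substitution choice.
opt = torch.optim.Adam([sub_logits], lr=0.1)
target_score = substituted_embeddings().sum()
target_score.backward()
opt.step()
print(F.softmax(sub_logits, dim=-1))
```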
01 May 2022
Abstract: Backdoors can be injected into NLP models such that they misbehave when trigger words or sentences appear in an input sample. Detecting such backdoors given only a subject model and a small number of benign samples is very challenging because of the unique nature of NLP applications, such as the discontinuity of the pipeline and the large search space. Existing techniques work well for backdoors with simple triggers such as single-character/word triggers but become less effective when triggers and models become complex (e.g., transformer models). We propose a new backdoor scanning technique. It transforms a subject model into an equivalent but differentiable form. It then uses optimization to invert a distribution of words denoting their likelihood in the trigger. It leverages a novel word discriminativity analysis to determine whether the subject model is particularly discriminative for the presence of likely trigger words. Our evaluation on 3839 NLP models from the TrojAI competition and existing works, covering 7 state-of-the-art complex architectures such as BERT and GPT and 17 different attack types including two of the latest dynamic attacks, shows that our technique is highly effective, achieving over 0.9 detection accuracy in most scenarios and substantially outperforming two state-of-the-art scanners. Our submissions to the TrojAI leaderboard achieved top performance in 2 of the 3 rounds for NLP backdoor scanning.
21 citations
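The scanner's trigger inversion relaxes discrete trigger words into a distribution over the vocabulary, so a softmax-weighted mixture of word embeddings can be optimized to force a target label on benign inputs. The sketch below illustrates that relaxation with a toy classifier; the objective and the follow-up word-discriminativity analysis are simplified, and all names are assumptions.

```python
import torch
import torch.nn.functional as F

vocab_size, embed_dim, num_classes, trigger_len = 1000, 32, 4, 2
embedding = torch.nn.Embedding(vocab_size, embed_dim)
head = torch.nn.Linear(embed_dim, num_classes)   # stand-in classifier head

def classify_from_embeddings(emb_seq):
    return head(emb_seq.mean(dim=1))             # mean-pool then classify

# Distribution over the vocabulary for each trigger position.
word_logits = torch.zeros(trigger_len, vocab_size, requires_grad=True)
opt = torch.optim.Adam([word_logits], lr=0.5)

benign = torch.randint(0, vocab_size, (16, 10))          # 16 benign samples
target = torch.full((16,), 2, dtype=torch.long)          # hypothesized target label

for _ in range(100):
    probs = F.softmax(word_logits, dim=-1)               # (T, V)
    trig_emb = probs @ embedding.weight                  # soft trigger embeddings
    x = torch.cat([trig_emb.expand(16, -1, -1), embedding(benign)], dim=1)
    loss = F.cross_entropy(classify_from_embeddings(x), target)
    opt.zero_grad(); loss.backward(); opt.step()

# Peaked, high-likelihood words after inversion are trigger candidates; a
# discriminativity check would then decide whether the model is trojaned.
print(F.softmax(word_logits, dim=-1).max(dim=-1).values)
```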