Journal ArticleDOI

Toward Mitigating Adversarial Texts

17 Sep 2019-International Journal of Computer Applications (Foundation of Computer Science (FCS), NY, USA)-Vol. 178, Iss: 50, pp 1-7
TL;DR: This paper proposes a defense against black-box adversarial attacks using a spell-checking system that utilizes frequency and contextual information for correction of nonword misspellings and outperforms six of the publicly available, state-of-the-art spelling correction tools.
Abstract: Neural networks are frequently used for text classification, but can be vulnerable to misclassification caused by adversarial examples: input produced by introducing small perturbations that cause the neural network to output an incorrect classification. Previous attempts to generate black-box adversarial texts have included variations of generating nonword misspellings, natural noise, synthetic noise, along with lexical substitutions. This paper proposes a defense against black-box adversarial attacks using a spell-checking system that utilizes frequency and contextual information for correction of nonword misspellings. The proposed defense is evaluated on the Yelp Reviews Polarity and the Yelp Reviews Full datasets using adversarial texts generated by a variety of recent attacks. After detecting and recovering the adversarial texts, the proposed defense increases the classification accuracy by an average of 26.56% on the Yelp Reviews Polarity dataset and 16.27% on the Yelp Reviews Full dataset. This approach further outperforms six of the publicly available, state-of-the-art spelling correction tools by at least 25.56% in terms of average correction accuracy.
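The defense described above corrects nonword misspellings by scoring candidate corrections with both word-frequency and contextual signals. A minimal sketch of that idea follows; the tiny corpus, edit-distance-1 candidate set, and bigram weighting are illustrative assumptions, not the paper's implementation:

```python
from collections import Counter

# Hypothetical mini-corpus standing in for the training text; the paper's
# actual vocabulary and context model are far larger.
CORPUS = "the food was great and the service was great too".split()
UNIGRAMS = Counter(CORPUS)
BIGRAMS = Counter(zip(CORPUS, CORPUS[1:]))

def edits1(word):
    """All strings one edit (delete/transpose/replace/insert) away."""
    letters = "abcdefghijklmnopqrstuvwxyz"
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [l + r[1:] for l, r in splits if r]
    transposes = [l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1]
    replaces = [l + c + r[1:] for l, r in splits if r for c in letters]
    inserts = [l + c + r for l, r in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)

def correct(word, prev_word=None):
    """Correct a nonword misspelling using frequency plus a context bonus."""
    if word in UNIGRAMS:          # known word: not a nonword misspelling
        return word
    candidates = [c for c in edits1(word) if c in UNIGRAMS]
    if not candidates:
        return word
    def score(c):
        # Unigram frequency, boosted when the candidate fits the left context.
        return UNIGRAMS[c] + 10 * BIGRAMS.get((prev_word, c), 0)
    return max(candidates, key=score)
```

With this toy vocabulary, `correct("grest", "was")` recovers `"great"` and `correct("teh")` recovers `"the"`, while in-vocabulary words pass through unchanged.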


Citations
Journal ArticleDOI
TL;DR: This paper summarizes recent approaches for generating adversarial texts, proposes a taxonomy to categorize them, and presents a comprehensive review of their use to improve the robustness of DNNs in NLP applications.
Abstract: Deep learning models have achieved great success in solving a variety of natural language processing (NLP) problems. An ever-growing body of research, however, illustrates the vulnerability of deep neural networks (DNNs) to adversarial examples — inputs modified by introducing small perturbations to deliberately fool a target model into outputting incorrect results. The vulnerability to adversarial examples has become one of the main hurdles precluding neural network deployment into safety-critical environments. This paper discusses the contemporary usage of adversarial examples to foil DNNs and presents a comprehensive review of their use to improve the robustness of DNNs in NLP applications. In this paper, we summarize recent approaches for generating adversarial texts and propose a taxonomy to categorize them. We further review various types of defensive strategies against adversarial examples, explore their main challenges, and highlight some future research directions.

99 citations

Journal ArticleDOI
Li Jiao1, Yang Liu1, Tao Chen1, Zhen Xiao1, Zhenjiang Li1, Jianping Wang1 
TL;DR: A general working flow for adversarial attacks on CPSs is introduced and a clear taxonomy is provided to organize existing attacks effectively and indicate where the defenses can be potentially performed in CPSs as well.
Abstract: Cyber-security issues on adversarial attacks are actively studied in the field of computer vision with the camera as the main sensor source to obtain the input image or video data. However, in modern cyber–physical systems (CPSs), many other types of sensors are becoming popularly used, such as surveillance sensors, microphones, and textual interfaces. A series of recent works investigates the adversarial attacks and the potential defenses in these noncamera sensor-based CPSs. Therefore, this article provides a systematic discussion on these existing works and serves as a complementary summary of the adversarial attacks and defenses for CPSs beyond the field of computer vision. We first introduce a general working flow for adversarial attacks on CPSs. On this basis, a clear taxonomy is provided to organize existing attacks effectively and indicate where the defenses can be potentially performed in CPSs as well. Then, we discuss these existing attacks and defenses with detailed comparison studies. Finally, we point out concrete research opportunities to be further explored along this research direction.

44 citations


Cites background or methods from "Toward Mitigating Adversarial Texts..."

  • ...In addition, Rajaratnam and Kalita [40] proposed a method in which a frequency band of audio signal input is flooded with random noise so that the adversarial examples can be more easily detected....

    [...]

  • ...Alshemali and Kalita [3] proposed a spell-checking system using the contextual and frequency information to correct misspellings....

    [...]

  • ...For textual data, the data feature analysis-based defense has been applied in [3] and [52]....

    [...]


  • ...Moreover, the classification accuracy of [3] and the true positive rate (TPR) of [59] are 88....

    [...]

Proceedings ArticleDOI
10 Apr 2022
TL;DR: A model-agnostic detector of adversarial text examples that identifies patterns in the logits of the target classifier when perturbing the input text and exhibits strong generalization capabilities across different NLP models, datasets, and word-level attacks.
Abstract: Adversarial attacks are a major challenge faced by current machine learning research. These purposely crafted inputs fool even the most advanced models, precluding their deployment in safety-critical applications. Extensive research in computer vision has been carried to develop reliable defense strategies. However, the same issue remains less explored in natural language processing. Our work presents a model-agnostic detector of adversarial text examples. The approach identifies patterns in the logits of the target classifier when perturbing the input text. The proposed detector improves the current state-of-the-art performance in recognizing adversarial inputs and exhibits strong generalization capabilities across different NLP models, datasets, and word-level attacks.
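The detection idea above, probing the target classifier's logits under small input perturbations, can be sketched roughly as follows. The word-masking perturbation, toy classifier, and threshold here are stand-ins for illustration, not the paper's actual detector:

```python
import numpy as np

def logit_shifts(classify, tokens):
    """Logit change when each word is masked in turn; the detector looks
    for patterns in such shifts (this simple heuristic is a stand-in)."""
    base = classify(tokens)
    shifts = []
    for i in range(len(tokens)):
        masked = tokens[:i] + ["<unk>"] + tokens[i + 1:]
        shifts.append(np.abs(classify(masked) - base).max())
    return np.array(shifts)

def flag_adversarial(classify, tokens, threshold=0.5):
    """Flag inputs whose average logit shift exceeds a tuned threshold."""
    return bool(logit_shifts(classify, tokens).mean() > threshold)

def toy_classify(tokens):
    # Toy two-class "classifier": logits driven by counting positive words.
    pos = sum(t in {"good", "great"} for t in tokens)
    return np.array([float(pos), float(len(tokens) - pos)])
```

Inputs whose prediction hinges on a few brittle words produce large masking-induced logit swings, which is the signal such a model-agnostic detector exploits.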

9 citations

Proceedings ArticleDOI
01 Nov 2020
TL;DR: Results indicate that the proposed defense is not only capable of defending against adversarial attacks, but is also capable of improving the performance of DNN-based models when tested on benign data, and can improve the robustness of nonneural models.
Abstract: Studies have shown that deep neural networks (DNNs) are vulnerable to adversarial examples – perturbed inputs that cause DNN-based models to produce incorrect results. One robust adversarial attack in the NLP domain is the synonym substitution. In attacks of this variety, the adversary substitutes words with synonyms. Since synonym substitution perturbations aim to satisfy all lexical, grammatical, and semantic constraints, they are difficult to detect with automatic syntax check as well as by humans. In this paper, we propose a structure-free defensive method that is capable of improving the performance of DNN-based models with both clean and adversarial data. Our findings show that replacing the embeddings of the important words in the input samples with the average of their synonyms’ embeddings can significantly improve the generalization of DNN-based classifiers. By doing so, we reduce model sensitivity to particular words in the input samples. Our results indicate that the proposed defense is not only capable of defending against adversarial attacks, but is also capable of improving the performance of DNN-based models when tested on benign data. On average, the proposed defense improved the classification accuracy of the CNN and Bi-LSTM models by 41.30% and 55.66%, respectively, when tested under adversarial attacks. Extended investigation shows that our defensive method can improve the robustness of nonneural models, achieving an average of 17.62% and 22.93% classification accuracy increase on the SVM and XGBoost models, respectively. The proposed defensive method has also shown an average of 26.60% classification accuracy improvement when tested with the infamous BERT model. Our algorithm is generic enough to be applied in any NLP domain and to any model trained on any natural language.
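The core of the defense described above, replacing an important word's embedding with the average of its synonyms' embeddings, can be sketched as below. The toy embedding table and synonym lists are assumptions for illustration; a real system would use pretrained vectors and a synonym source such as WordNet:

```python
import numpy as np

# Hypothetical embedding table and synonym lists (illustrative only).
EMB = {
    "good":  np.array([0.9, 0.1]),
    "great": np.array([0.8, 0.2]),
    "fine":  np.array([0.7, 0.3]),
}
SYNONYMS = {"good": ["great", "fine"], "great": ["good", "fine"]}

def robust_embedding(word):
    """Average the word's embedding with its synonyms' embeddings, so any
    single synonym substitution maps to (nearly) the same representation."""
    group = [word] + SYNONYMS.get(word, [])
    return np.mean([EMB[w] for w in group if w in EMB], axis=0)
```

Because "good" and "great" share the same synonym group here, substituting one for the other yields an identical averaged embedding, which is exactly the reduced sensitivity to particular input words the abstract describes.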

7 citations


Cites result from "Toward Mitigating Adversarial Texts..."

  • ...This supports the conclusion from previous research that, in the NLP domain, deep CNNs tend to be more robust than RNN models (Ren et al., 2019; Alshemali and Kalita, 2019)....

    [...]

Journal Article
TL;DR: This paper proposes Continuous Word2Vec (CW2V), a data-driven method to learn word embeddings that ensures that perturbations of words have embeddings similar to those of the original words, and shows that CW2V embeddings are generally more robust to text perturbations than embeddings based on character n-grams.
Abstract: Social networks have become an indispensable part of our lives, with billions of people producing ever-increasing amounts of text. At such scales, content policies and their enforcement become paramount. To automate moderation, questionable content is detected by Natural Language Processing (NLP) classifiers. However, high-performance classifiers are hampered by misspellings and adversarial text perturbations. In this paper, we classify intentional and unintentional adversarial text perturbation into ten types and propose a deobfuscation pipeline to make NLP models robust to such perturbations. We propose Continuous Word2Vec (CW2V), our data-driven method to learn word embeddings that ensures that perturbations of words have embeddings similar to those of the original words. We show that CW2V embeddings are generally more robust to text perturbations than embeddings based on character ngram. Our robust classification pipeline combines deobfuscation and classification, using proposed defense methods and word embeddings to classify whether Facebook posts are requesting engagement such as likes. Our pipeline results in engagement bait classification that goes from 0.70 to 0.67 AUC with adversarial text perturbation, while character ngram-based word embedding methods result in downstream classification that goes from

4 citations

References
Proceedings Article
03 Dec 2012
TL;DR: A large, deep convolutional neural network with five convolutional layers, some followed by max-pooling layers, and three fully-connected layers ending in a 1000-way softmax achieved state-of-the-art performance on ImageNet classification, as discussed by the authors.
Abstract: We trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes. On the test data, we achieved top-1 and top-5 error rates of 37.5% and 17.0% which is considerably better than the previous state-of-the-art. The neural network, which has 60 million parameters and 650,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax. To make training faster, we used non-saturating neurons and a very efficient GPU implementation of the convolution operation. To reduce overfitting in the fully-connected layers we employed a recently-developed regularization method called "dropout" that proved to be very effective. We also entered a variant of this model in the ILSVRC-2012 competition and achieved a winning top-5 test error rate of 15.3%, compared to 26.2% achieved by the second-best entry.

73,978 citations

Journal ArticleDOI
TL;DR: This work introduces a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, thus enabling nearly cost-free region proposals and further merge RPN and Fast R-CNN into a single network by sharing their convolutionAL features.
Abstract: State-of-the-art object detection networks depend on region proposal algorithms to hypothesize object locations. Advances like SPPnet [1] and Fast R-CNN [2] have reduced the running time of these detection networks, exposing region proposal computation as a bottleneck. In this work, we introduce a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, thus enabling nearly cost-free region proposals. An RPN is a fully convolutional network that simultaneously predicts object bounds and objectness scores at each position. The RPN is trained end-to-end to generate high-quality region proposals, which are used by Fast R-CNN for detection. We further merge RPN and Fast R-CNN into a single network by sharing their convolutional features—using the recently popular terminology of neural networks with ’attention’ mechanisms, the RPN component tells the unified network where to look. For the very deep VGG-16 model [3] , our detection system has a frame rate of 5 fps ( including all steps ) on a GPU, while achieving state-of-the-art object detection accuracy on PASCAL VOC 2007, 2012, and MS COCO datasets with only 300 proposals per image. In ILSVRC and COCO 2015 competitions, Faster R-CNN and RPN are the foundations of the 1st-place winning entries in several tracks. Code has been made publicly available.

26,458 citations


"Toward Mitigating Adversarial Texts..." refers background in this paper

  • ...Deep neural networks (DNNs) have gained popularity in image classification [17], object recognition [28], and malware detection [26]....

    [...]

Posted Content
TL;DR: Faster R-CNN as discussed by the authors proposes a Region Proposal Network (RPN) to generate high-quality region proposals, which are used by Fast R-CNN for detection.
Abstract: State-of-the-art object detection networks depend on region proposal algorithms to hypothesize object locations. Advances like SPPnet and Fast R-CNN have reduced the running time of these detection networks, exposing region proposal computation as a bottleneck. In this work, we introduce a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, thus enabling nearly cost-free region proposals. An RPN is a fully convolutional network that simultaneously predicts object bounds and objectness scores at each position. The RPN is trained end-to-end to generate high-quality region proposals, which are used by Fast R-CNN for detection. We further merge RPN and Fast R-CNN into a single network by sharing their convolutional features---using the recently popular terminology of neural networks with 'attention' mechanisms, the RPN component tells the unified network where to look. For the very deep VGG-16 model, our detection system has a frame rate of 5fps (including all steps) on a GPU, while achieving state-of-the-art object detection accuracy on PASCAL VOC 2007, 2012, and MS COCO datasets with only 300 proposals per image. In ILSVRC and COCO 2015 competitions, Faster R-CNN and RPN are the foundations of the 1st-place winning entries in several tracks. Code has been made publicly available.

23,183 citations

Posted Content
TL;DR: This paper proposed two novel model architectures for computing continuous vector representations of words from very large data sets, and the quality of these representations is measured in a word similarity task and the results are compared to the previously best performing techniques based on different types of neural networks.
Abstract: We propose two novel model architectures for computing continuous vector representations of words from very large data sets. The quality of these representations is measured in a word similarity task, and the results are compared to the previously best performing techniques based on different types of neural networks. We observe large improvements in accuracy at much lower computational cost, i.e. it takes less than a day to learn high quality word vectors from a 1.6 billion words data set. Furthermore, we show that these vectors provide state-of-the-art performance on our test set for measuring syntactic and semantic word similarities.

20,077 citations

28 Oct 2017
TL;DR: An automatic differentiation module of PyTorch is described — a library designed to enable rapid research on machine learning models that focuses on differentiation of purely imperative programs, with a focus on extensibility and low overhead.
Abstract: In this article, we describe an automatic differentiation module of PyTorch — a library designed to enable rapid research on machine learning models. It builds upon a few projects, most notably Lua Torch, Chainer, and HIPS Autograd [4], and provides a high performance environment with easy access to automatic differentiation of models executed on different devices (CPU and GPU). To make prototyping easier, PyTorch does not follow the symbolic approach used in many other deep learning frameworks, but focuses on differentiation of purely imperative programs, with a focus on extensibility and low overhead. Note that this preprint is a draft of certain sections from an upcoming paper covering all PyTorch features.

13,268 citations