Proceedings ArticleDOI

SmartBox: Benchmarking Adversarial Detection and Mitigation Algorithms for Face Recognition

01 Oct 2018, pp. 1-7
TL;DR: SmartBox is a Python-based toolbox that provides an open-source implementation of adversarial attack detection and mitigation algorithms for face recognition, along with a platform to evaluate newer attacks, detection models, and mitigation approaches on a common face recognition benchmark.
Abstract: Deep learning models are widely used for various purposes such as face recognition and speech recognition. However, researchers have shown that these models are vulnerable to adversarial attacks. These attacks compute perturbations to generate images that decrease the performance of deep learning models. In this research, we have developed a toolbox, termed SmartBox, for benchmarking the performance of adversarial attack detection and mitigation algorithms against face recognition. SmartBox is a Python-based toolbox which provides an open-source implementation of adversarial detection and mitigation algorithms. In this research, the Extended Yale Face Database B has been used for generating adversarial examples using various attack algorithms such as DeepFool, Gradient methods, Elastic-Net, and the $L_{2}$ attack. SmartBox provides a platform to evaluate newer attacks, detection models, and mitigation approaches on a common face recognition benchmark. To assist the research community, the code of SmartBox is made available at http://iab-rubric.org/resources/SmartBox.html.
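For orientation, the kind of attack generation such a toolbox benchmarks can be sketched with a minimal FGSM-style example in PyTorch. This is an illustrative sketch only, not SmartBox's actual API; `model`, `images`, and `labels` are placeholders.

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, images, labels, epsilon=0.03):
    """Minimal FGSM-style attack: perturb images along the sign of the loss
    gradient. Illustrative only; `model`, `images`, and `labels` are
    placeholders and this is not SmartBox's actual interface."""
    images = images.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(images), labels)
    loss.backward()
    # Step in the direction that increases the loss, then clip to a valid image range.
    adv_images = images + epsilon * images.grad.sign()
    return adv_images.clamp(0.0, 1.0).detach()
```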
Citations
Journal ArticleDOI
TL;DR: The authors propose an efficient adversarial training method with loss guided propagation (ATLGP), which takes the loss value of generated adversarial examples as guidance to control the number of propagations for each training instance at different training stages, achieving comparable robustness in less time than traditional training methods.
Abstract: Adversarial training is effective for training robust image classification models. To improve robustness, existing approaches often use many propagations to generate adversarial examples, which is time-consuming. In this work, we propose an efficient adversarial training method with loss guided propagation (ATLGP) to accelerate the adversarial training process. ATLGP takes the loss value of generated adversarial examples as guidance to control the number of propagations for each training instance at different training stages, which reduces computation while preserving the strength of the generated adversarial examples. In this way, our method can achieve comparable robustness in less time than traditional training methods. It also has good generalization ability and can easily be combined with other efficient training methods. We conduct comprehensive experiments on CIFAR10 and MNIST, the standard datasets for several benchmarks. The experimental results show that ATLGP reduces training time by 30% to 60% compared with other baseline methods while achieving similar robustness against various adversarial attacks. The combination of ATLGP and ATTA (an efficient adversarial training method) achieves superior acceleration potential when high robustness is required. Propagation statistics at different training stages and ablation studies confirm the effectiveness of applying loss guided propagation to each training instance. The acceleration technique makes it easier to extend adversarial training methods to large-scale datasets and to more diverse model architectures such as vision transformers.
  • We propose a novel formulation to define the efficient adversarial training problem.
  • ATLGP considers the differences in adversarial attributes among training instances.
  • Extensive experiments are conducted to demonstrate the outstanding performance of ATLGP.
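As a rough illustration of loss-guided propagation (a sketch of the idea, not the authors' reference implementation), the PyTorch snippet below stops the inner attack loop for an instance once its adversarial loss exceeds a threshold, so that easy instances consume fewer propagations; the threshold value and all names are assumptions.

```python
import torch
import torch.nn.functional as F

def loss_guided_pgd(model, x, y, eps=8/255, alpha=2/255, max_steps=10, loss_threshold=2.0):
    """Sketch of loss-guided propagation: stop iterating on examples whose
    adversarial loss already exceeds `loss_threshold`. Hypothetical, not ATLGP's code."""
    delta = torch.zeros_like(x).uniform_(-eps, eps).requires_grad_(True)
    active = torch.ones(x.size(0), dtype=torch.bool, device=x.device)
    for _ in range(max_steps):
        loss = F.cross_entropy(model(x + delta), y, reduction="none")
        # Freeze examples whose adversarial loss is already strong enough.
        active = active & (loss.detach() < loss_threshold)
        if not active.any():
            break
        loss.mean().backward()
        with torch.no_grad():
            step = alpha * delta.grad.sign()
            delta[active] = (delta[active] + step[active]).clamp(-eps, eps)
        delta.grad.zero_()
    return (x + delta).clamp(0, 1).detach()
```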

1 citation

Journal ArticleDOI
TL;DR: In this article, the authors propose DAMAD, a generalized perturbation detection algorithm that is agnostic to the model architecture, training dataset, and loss function used during training, based on the fusion of autoencoder embeddings and statistical texture features extracted from convolutional neural networks.
Abstract: Adversarial perturbations have demonstrated the vulnerabilities of deep learning algorithms to adversarial attacks. Existing adversary detection algorithms attempt to detect the singularities; however, they are, in general, loss-function, database, or model dependent. To mitigate this limitation, we propose DAMAD, a generalized perturbation detection algorithm that is agnostic to the model architecture, training dataset, and loss function used during training. The proposed adversarial perturbation detection algorithm is based on the fusion of autoencoder embeddings and statistical texture features extracted from convolutional neural networks. The performance of DAMAD is evaluated in the challenging scenarios of cross-database, cross-attack, and cross-architecture training and testing, along with the traditional evaluation of testing on the same database with a known attack and model. Comparison with state-of-the-art perturbation detection algorithms showcases the effectiveness of the proposed algorithm on six databases: ImageNet, CIFAR-10, Multi-PIE, MEDS, Point and Shoot Challenge (PaSC), and MNIST. Performance evaluation with nearly a quarter of a million adversarial and original images and comparison with recent algorithms show the effectiveness of the proposed algorithm.
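As a hedged sketch of the feature fusion the abstract describes (an autoencoder embedding combined with simple texture-style statistics of CNN feature maps, feeding a binary adversarial-vs-clean classifier), the snippet below uses assumed interfaces (`autoencoder.encode`, a generic `cnn_feature_extractor`) and does not reproduce DAMAD's exact features.

```python
import torch

def fused_detection_features(autoencoder, cnn_feature_extractor, images):
    """Hypothetical fusion of an autoencoder embedding with coarse texture
    statistics of CNN feature maps; not DAMAD's exact feature set."""
    with torch.no_grad():
        embedding = autoencoder.encode(images)        # assumed encoder API
        feature_maps = cnn_feature_extractor(images)  # e.g. an early conv block
    # Simple per-channel statistics as a stand-in for texture descriptors.
    mean = feature_maps.mean(dim=(2, 3))
    std = feature_maps.std(dim=(2, 3))
    fused = torch.cat([embedding.flatten(1), mean, std], dim=1)
    return fused.cpu().numpy()

# A binary classifier (e.g. an SVM or a small MLP) can then be trained on these
# fused features to separate adversarial from clean images.
```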

1 citation

Proceedings ArticleDOI
31 May 2022
TL;DR: The authors performed an extensive adversarial audit on multiple systems and datasets, making a number of concerning observations, including that there has been a drop in accuracy for some tasks on the CelebSET dataset since a previous audit.
Abstract: Computer vision applications like automated face detection are used for a variety of purposes, ranging from unlocking smart devices to tracking potential persons of interest for surveillance. Audits of these applications have revealed that they tend to be biased against minority groups, which results in unfair and concerning societal and political outcomes. Despite multiple studies over time, these biases have not been mitigated completely and have in fact increased for certain tasks like age prediction. While such systems are audited over benchmark datasets, it becomes necessary to evaluate their robustness to adversarial inputs. In this work, we perform an extensive adversarial audit on multiple systems and datasets, making a number of concerning observations: there has been a drop in accuracy for some tasks on the CelebSET dataset since a previous audit. While there still exists a bias in accuracy against individuals from minority groups for multiple datasets, a more worrying observation is that these biases tend to become exorbitantly pronounced with adversarial inputs toward the minority group. We conclude with a discussion on the broader societal impacts in light of these observations and a few suggestions on how to collectively deal with this issue.

1 citation

Proceedings ArticleDOI
18 May 2021
TL;DR: The authors adversarially train YOLOv3 models with parameterized Gabor convolutional layers to enhance the robustness of object detection against adversarial perturbations such as the TOG vanishing, TOG fabrication, and TOG mislabeling attacks.
Abstract: Adversarial attacks are one of the most critical threats in the machine learning field, raising doubts about the application of deep neural networks (DNNs). Despite the recent advances in DNNs, adversarial robustness has yet to reach an acceptable level, especially against different kinds of perturbations. In this paper, we aim to enhance the robustness of object detection against adversarial perturbations. To this end, we adversarially train the YOLOv3 model with different backbones by means of parameterized Gabor convolutional layers. To assess the robustness of our trained models, we adopt the TOG vanishing, TOG fabrication, and TOG mislabeling adversarial attacks. These perturbations are crafted on the PASCAL VOC and MSCOCO datasets to simulate three types of targeted specificity: object vanishing, object fabrication, and object mislabeling, respectively. Extensive evaluations demonstrate that our models equipped with Gabor filters gain considerable adversarial robustness in addition to high generalization performance on clean data.
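The following is a minimal sketch of a parameterized Gabor convolutional layer in PyTorch, where the kernels are regenerated from learnable orientation, width, and wavelength parameters on each forward pass; the exact parameterization used by the authors may differ.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class GaborConv2d(nn.Module):
    """Sketch of a convolution whose kernels are generated from learnable
    Gabor parameters (orientation, envelope width, wavelength). Illustrative only."""
    def __init__(self, in_channels, out_channels, kernel_size=7):
        super().__init__()
        self.in_channels = in_channels
        self.kernel_size = kernel_size
        self.theta = nn.Parameter(torch.rand(out_channels) * math.pi)  # orientation
        self.sigma = nn.Parameter(torch.ones(out_channels) * 2.0)      # envelope width
        self.lambd = nn.Parameter(torch.ones(out_channels) * 4.0)      # wavelength

    def forward(self, x):
        half = self.kernel_size // 2
        ys, xs = torch.meshgrid(
            torch.arange(-half, half + 1, dtype=torch.float32, device=x.device),
            torch.arange(-half, half + 1, dtype=torch.float32, device=x.device),
            indexing="ij",
        )
        theta = self.theta.view(-1, 1, 1)
        x_rot = xs * torch.cos(theta) + ys * torch.sin(theta)
        y_rot = -xs * torch.sin(theta) + ys * torch.cos(theta)
        # Gaussian envelope modulated by a sinusoidal carrier (real Gabor filter).
        envelope = torch.exp(-(x_rot ** 2 + y_rot ** 2) / (2 * self.sigma.view(-1, 1, 1) ** 2))
        carrier = torch.cos(2 * math.pi * x_rot / self.lambd.view(-1, 1, 1))
        weight = (envelope * carrier).unsqueeze(1).repeat(1, self.in_channels, 1, 1)
        return F.conv2d(x, weight, padding=half)
```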

1 citation

Posted Content
TL;DR: In this paper, the authors propose a method that allows two networks to learn from each other's adversarial examples and become resilient to black-box attacks, and combine this method with a simple domain adaptation to further improve performance.
Abstract: Adversarial examples are maliciously tweaked images that can easily fool machine learning techniques, such as neural networks, but they are normally not visually distinguishable for human beings. One of the main approaches to solve this problem is to retrain the networks using those adversarial examples, namely adversarial training. However, standard adversarial training might not actually change the decision boundaries but instead cause gradient masking, resulting in a weaker ability to generate adversarial examples. Therefore, it cannot alleviate the problem of black-box attacks, where adversarial examples generated from other networks can transfer to the targeted one. In order to reduce the problem of black-box attacks, we propose a novel method that allows two networks to learn from each other's adversarial examples and become resilient to black-box attacks. We also combine this method with a simple domain adaptation to further improve the performance.
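A minimal sketch of the described idea, in which each of two networks is trained on adversarial examples crafted against the other; the FGSM inner step and the equal loss weighting are assumptions, not the authors' exact procedure.

```python
import torch
import torch.nn.functional as F

def dual_adversarial_step(model_a, model_b, opt_a, opt_b, x, y, epsilon=0.03):
    """One training step where each network learns from the other's
    adversarial examples. Illustrative sketch only."""
    def fgsm(model, images):
        images = images.clone().detach().requires_grad_(True)
        F.cross_entropy(model(images), y).backward()
        return (images + epsilon * images.grad.sign()).clamp(0, 1).detach()

    adv_from_a = fgsm(model_a, x)   # crafted against model A
    adv_from_b = fgsm(model_b, x)   # crafted against model B

    # Each model trains on clean data plus the *other* model's adversarial examples.
    for model, opt, adv in ((model_a, opt_a, adv_from_b), (model_b, opt_b, adv_from_a)):
        opt.zero_grad()
        loss = F.cross_entropy(model(x), y) + F.cross_entropy(model(adv), y)
        loss.backward()
        opt.step()
```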

1 citation

References
Posted Content
TL;DR: This work proposes a Parametric Rectified Linear Unit (PReLU) that generalizes the traditional rectified unit, and derives a robust initialization method that particularly considers the rectifier nonlinearities.
Abstract: Rectified activation units (rectifiers) are essential for state-of-the-art neural networks. In this work, we study rectifier neural networks for image classification from two aspects. First, we propose a Parametric Rectified Linear Unit (PReLU) that generalizes the traditional rectified unit. PReLU improves model fitting with nearly zero extra computational cost and little overfitting risk. Second, we derive a robust initialization method that particularly considers the rectifier nonlinearities. This method enables us to train extremely deep rectified models directly from scratch and to investigate deeper or wider network architectures. Based on our PReLU networks (PReLU-nets), we achieve 4.94% top-5 test error on the ImageNet 2012 classification dataset. This is a 26% relative improvement over the ILSVRC 2014 winner (GoogLeNet, 6.66%). To our knowledge, our result is the first to surpass human-level performance (5.1%, Russakovsky et al.) on this visual recognition challenge.

11,866 citations


"SmartBox: Benchmarking Adversarial ..." refers background in this paper

  • ...Deep learning models have achieved state-of-the-art performance in various computer vision related tasks such as object detection and face recognition [18, 24]....

    [...]

Proceedings ArticleDOI
07 Dec 2015
TL;DR: In this paper, a Parametric Rectified Linear Unit (PReLU) was proposed to improve model fitting with nearly zero extra computational cost and little overfitting risk; the resulting PReLU networks achieved a 4.94% top-5 test error on the ImageNet 2012 classification dataset.
Abstract: Rectified activation units (rectifiers) are essential for state-of-the-art neural networks. In this work, we study rectifier neural networks for image classification from two aspects. First, we propose a Parametric Rectified Linear Unit (PReLU) that generalizes the traditional rectified unit. PReLU improves model fitting with nearly zero extra computational cost and little overfitting risk. Second, we derive a robust initialization method that particularly considers the rectifier nonlinearities. This method enables us to train extremely deep rectified models directly from scratch and to investigate deeper or wider network architectures. Based on the learnable activation and advanced initialization, we achieve 4.94% top-5 test error on the ImageNet 2012 classification dataset. This is a 26% relative improvement over the ILSVRC 2014 winner (GoogLeNet, 6.66% [33]). To our knowledge, our result is the first to surpass the reported human-level performance (5.1%, [26]) on this dataset.
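For reference, the learnable activation described above (PReLU) applies, per channel $i$, the following function, where $a_i$ is a learned coefficient:

```latex
f(y_i) =
\begin{cases}
y_i, & y_i > 0 \\
a_i \, y_i, & y_i \le 0
\end{cases}
```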

11,732 citations

Proceedings Article
01 Jan 2014
TL;DR: It is found that there is no distinction between individual high-level units and random linear combinations of high-level units, according to various methods of unit analysis, suggesting that it is the space, rather than the individual units, that contains the semantic information in the high layers of neural networks.
Abstract: Deep neural networks are highly expressive models that have recently achieved state-of-the-art performance on speech and visual recognition tasks. While their expressiveness is the reason they succeed, it also causes them to learn uninterpretable solutions that could have counter-intuitive properties. In this paper we report two such properties. First, we find that there is no distinction between individual high-level units and random linear combinations of high-level units, according to various methods of unit analysis. This suggests that it is the space, rather than the individual units, that contains the semantic information in the high layers of neural networks. Second, we find that deep neural networks learn input-output mappings that are fairly discontinuous to a significant extent. We can cause the network to misclassify an image by applying a certain imperceptible perturbation, which is found by maximizing the network's prediction error. In addition, the specific nature of these perturbations is not a random artifact of learning: the same perturbation can cause a different network, that was trained on a different subset of the dataset, to misclassify the same input.
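The imperceptible perturbation referred to above is found with a box-constrained optimizer (L-BFGS in the paper). As a hedged reconstruction of the standard formulation, for an input $x$ and a chosen target label $l$ it approximately solves:

```latex
\min_{r} \; c\,\|r\|_2 + \mathrm{loss}_f(x + r,\; l)
\quad \text{subject to} \quad x + r \in [0, 1]^m
```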

9,561 citations


"SmartBox: Benchmarking Adversarial ..." refers background or methods in this paper

  • ...Adversarial Training: In adversarial training [33], a new model is trained using the original dataset and adversarial examples with their correct labels....

    [...]

  • ...[33] Trains a new model on original and adversarial training images....

    [...]

Proceedings Article
20 Mar 2015
TL;DR: It is argued that the primary cause of neural networks' vulnerability to adversarial perturbation is their linear nature, supported by new quantitative results while giving the first explanation of the most intriguing fact about them: their generalization across architectures and training sets.
Abstract: Several machine learning models, including neural networks, consistently misclassify adversarial examples---inputs formed by applying small but intentionally worst-case perturbations to examples from the dataset, such that the perturbed input results in the model outputting an incorrect answer with high confidence. Early attempts at explaining this phenomenon focused on nonlinearity and overfitting. We argue instead that the primary cause of neural networks' vulnerability to adversarial perturbation is their linear nature. This explanation is supported by new quantitative results while giving the first explanation of the most intriguing fact about them: their generalization across architectures and training sets. Moreover, this view yields a simple and fast method of generating adversarial examples. Using this approach to provide examples for adversarial training, we reduce the test set error of a maxout network on the MNIST dataset.
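The simple and fast method referred to here is the Fast Gradient Sign Method (FGSM), which perturbs an input $x$ with label $y$ along the sign of the gradient of the training loss $J$:

```latex
x_{\mathrm{adv}} = x + \epsilon \cdot \mathrm{sign}\big(\nabla_{x} J(\theta, x, y)\big)
```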

7,994 citations


"SmartBox: Benchmarking Adversarial ..." refers background or methods in this paper

  • ...FGSM [15]: It computes the gradient of the loss function of the model concerning the image vector to get the direction of pixel change....

    [...]

  • ...[15] Computes gradient of the loss function w....

    [...]

  • ...While whitebox attacks such as ElasticNet (EAD) [6], DeepFool [28], L2 [5], Fast Gradient Sign Method (FGSM) [15], Projective Gradient Descent (PGD) [26], and MI-FGSM [10] have complete access and information about the trained network, blackbox attacks such as one pixel attack [32] and universal perturbations [27] have no information about the trained Deep Neural Network (DNN)....

    [...]

  • ...FGSM perturbations can be computed by minimizing either the L1, L2 or L∞ norm....

    [...]

Proceedings ArticleDOI
22 May 2017
TL;DR: In this paper, the authors demonstrate that defensive distillation does not significantly increase the robustness of neural networks by introducing three new attack algorithms that are successful on both distilled and undistilled neural networks with 100% probability.
Abstract: Neural networks provide state-of-the-art results for most machine learning tasks. Unfortunately, neural networks are vulnerable to adversarial examples: given an input x and any target classification t, it is possible to find a new input x' that is similar to x but classified as t. This makes it difficult to apply neural networks in security-critical areas. Defensive distillation is a recently proposed approach that can take an arbitrary neural network, and increase its robustness, reducing the success rate of current attacks' ability to find adversarial examples from 95% to 0.5%. In this paper, we demonstrate that defensive distillation does not significantly increase the robustness of neural networks by introducing three new attack algorithms that are successful on both distilled and undistilled neural networks with 100% probability. Our attacks are tailored to three distance metrics used previously in the literature, and when compared to previous adversarial example generation algorithms, our attacks are often much more effective (and never worse). Furthermore, we propose using high-confidence adversarial examples in a simple transferability test we show can also be used to break defensive distillation. We hope our attacks will be used as a benchmark in future defense attempts to create neural networks that resist adversarial examples.
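As a reference point, the $L_2$ variant of these attacks is usually written as minimizing a weighted sum of the perturbation size and a margin-style objective on the logits $Z(\cdot)$ (stated here from memory, with $\kappa$ the confidence parameter and $t$ the target class):

```latex
\min_{\delta} \; \|\delta\|_2^2 + c \cdot f(x + \delta),
\qquad
f(x') = \max\!\Big(\max_{i \neq t} Z(x')_i - Z(x')_t,\; -\kappa\Big)
```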

6,528 citations