Trustworthy AI
02 Jan 2021
TL;DR: The tutorial on “Trustworthy AI” is proposed to address six critical issues in enhancing user and public trust in AI systems, namely: bias and fairness, explainability, robust mitigation of adversarial attacks, improved privacy and security in model building, being decent, and model attribution with transparency in lineage.
Abstract: Modern AI systems are reaping the advantage of novel learning methods. With their increasing usage, we are realizing the limitations and shortfalls of these systems. Brittleness to minor adversarial changes in the input data, inability to explain their decisions, failure to address bias in their training data, and high opacity in terms of revealing the lineage of the system, how it was trained and tested, and under which parameters and conditions it can reliably guarantee a certain level of performance, are some of the most prominent limitations. Ensuring the privacy and security of the data, assigning appropriate credit to data sources, and delivering decent outputs are also required features of an AI system. We propose the tutorial on “Trustworthy AI” to address six critical issues in enhancing user and public trust in AI systems, namely: (i) bias and fairness, (ii) explainability, (iii) robust mitigation of adversarial attacks, (iv) improved privacy and security in model building, (v) being decent, and (vi) model attribution, including the right level of credit assignment to the data sources, model architectures, and transparency in lineage.
Citations
[...]
TL;DR: In this article, the authors provide a systematic framework of socially responsible AI algorithms and discuss how to leverage this framework to improve societal well-being through protection, information, and prevention/mitigation.
Abstract: In the current era, people and society have grown increasingly reliant on artificial intelligence (AI) technologies. AI has the potential to drive us towards a future in which all of humanity flourishes. It also comes with substantial risks for oppression and calamity. Discussions about whether we should (re)trust AI have repeatedly emerged in recent years and in many quarters, including industry, academia, healthcare, services, and so on. Technologists and AI researchers have a responsibility to develop trustworthy AI systems. They have responded with great effort to design more responsible AI algorithms. However, existing technical solutions are narrow in scope and have been primarily directed towards algorithms for scoring or classification tasks, with an emphasis on fairness and unwanted bias. To build long-lasting trust between AI and human beings, we argue that the key is to think beyond algorithmic fairness and connect major aspects of AI that potentially cause AI's indifferent behavior. In this survey, we provide a systematic framework of Socially Responsible AI Algorithms that aims to examine the subjects of AI indifference and the need for socially responsible AI algorithms, define the objectives, and introduce the means by which we may achieve these objectives. We further discuss how to leverage this framework to improve societal well-being through protection, information, and prevention/mitigation.
5 citations
[...]
TL;DR: In this article, the authors explored flaws within the robot's system, and analyzed these flaws to assess the overall alignment of the robot system design with the IEEE global standards on the design of ethically aligned trustworthy autonomous intelligent systems (IEEE A/IS Standards).
Abstract: The last few years have seen a strong movement supporting the need of having intelligent consumer products align with specific design guidelines for trustworthy artificial intelligence (AI). This global movement has led to multiple institutional recommendations for ethically aligned trustworthy design of the AI driven technologies, like consumer robots and autonomous vehicles. There has been prior research towards finding security and privacy related vulnerabilities within various types of social robots. However, none of these previous works has studied the implications of these vulnerabilities in terms of the robot design aligning with trustworthy AI. In an attempt to address this gap in existing literature, we have performed a unique research study with two social robots - Zumi and Cozmo. In this study, we have explored flaws within the robot's system, and have analyzed these flaws to assess the overall alignment of the robot system design with the IEEE global standards on the design of ethically aligned trustworthy autonomous intelligent systems (IEEE A/IS Standards). Our initial research shows that the vulnerabilities and design weaknesses, which we found in these robots, can lead to hacking, injection attacks, and other malfunctions that might affect the technology users negatively. We test the intelligent functionalities in these robots to find faults, and conduct a preliminary examination of how these flaws can potentially result in non-adherence with the IEEE A/IS principles. Through this novel study, we demonstrate our approach towards determining alignment of social robots with benchmarks for trustworthy AI, thereby creating a case for prospective design improvements to address unique risks leading to issues with robot ethics and trust.
[...]
TL;DR: In this paper, a scoping review surveys the literature to identify the problematic nature of adaptive autonomous systems with evolving functionality (AASEFs), the ethical worries that they generate, and the ethical principles affected.
Abstract: The development of adaptive autonomous systems with evolving functionality (AASEFs) differs from their technological predecessors due to their changing, rather than static, architectures and processes; subsequently, their development, deployment, and implementation create novel ethical issues. Our scoping review surveys the literature to identify the problematic nature of AASEFs, the ethical worries that they generate, and the ethical principles affected. Our literature examination also ascertains stakeholder needs and solutions regarding the trust and trustworthiness of AASEFs. Non-explicability and fluctuating behaviour are the predominantly problematic characteristics of AASEFs. The literature advocates for AASEF development incorporating ‘ethics by design’ with enforceable standards supported via regulatory and legal structures.
References
[...]
Abstract: Although deep neural networks (DNNs) have achieved great success in many tasks, they can often be fooled by adversarial examples that are generated by adding small but purposeful distortions to natural examples. Previous studies to defend against adversarial examples mostly focused on refining the DNN models, but have either shown limited success or required expensive computation. We propose a new strategy, feature squeezing, that can be used to harden DNN models by detecting adversarial examples. Feature squeezing reduces the search space available to an adversary by coalescing samples that correspond to many different feature vectors in the original space into a single sample. By comparing a DNN model's prediction on the original input with that on squeezed inputs, feature squeezing detects adversarial examples with high accuracy and few false positives. This paper explores two feature squeezing methods: reducing the color bit depth of each pixel and spatial smoothing. These simple strategies are inexpensive and complementary to other defenses, and can be combined in a joint detection framework to achieve high detection rates against state-of-the-art attacks.
969 citations
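The detection recipe described in the abstract above is simple enough to sketch. Below is a minimal Python illustration of the feature-squeezing idea; the model callable, the bit depth, the smoothing window, and the L1 threshold are illustrative assumptions rather than the paper's exact configuration.

# Minimal feature-squeezing sketch. Assumption: `model` takes a batch of
# HxWxC float images in [0, 1] and returns class-probability vectors.
import numpy as np
from scipy.ndimage import median_filter

def reduce_bit_depth(x, bits=4):
    # Quantize pixel values to 2**bits levels, then map back into [0, 1].
    levels = 2 ** bits - 1
    return np.round(x * levels) / levels

def spatial_smooth(x, size=2):
    # Median smoothing over the spatial dimensions only (not the channels).
    return median_filter(x, size=(size, size, 1))

def is_adversarial(model, x, threshold=1.0):
    # Compare predictions on the original and squeezed inputs; a large L1 gap
    # between the probability vectors flags a likely adversarial input.
    p_orig = model(x[None])[0]
    p_bits = model(reduce_bit_depth(x)[None])[0]
    p_smooth = model(spatial_smooth(x)[None])[0]
    score = max(np.abs(p_orig - p_bits).sum(),
                np.abs(p_orig - p_smooth).sum())
    return score > threshold

In practice the threshold would be chosen on held-out legitimate examples to keep the false-positive rate low, which is the trade-off the paper reports.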
[...]
TL;DR: In this paper, the authors review recent findings on adversarial examples for DNNs, summarize the methods for generating adversarial samples, and propose a taxonomy of these methods.
Abstract: With rapid progress and significant successes in a wide spectrum of applications, deep learning is being applied in many safety-critical environments. However, deep neural networks (DNNs) have been recently found vulnerable to well-designed input samples called adversarial examples . Adversarial perturbations are imperceptible to human but can easily fool DNNs in the testing/deploying stage. The vulnerability to adversarial examples becomes one of the major risks for applying DNNs in safety-critical environments. Therefore, attacks and defenses on adversarial examples draw great attention. In this paper, we review recent findings on adversarial examples for DNNs, summarize the methods for generating adversarial examples, and propose a taxonomy of these methods. Under the taxonomy, applications for adversarial examples are investigated. We further elaborate on countermeasures for adversarial examples. In addition, three major challenges in adversarial examples and the potential solutions are discussed.
901 citations
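Surveys such as this one typically begin with gradient-based generation methods. Purely as an illustration of what "generating adversarial examples" means, here is a minimal fast gradient sign method (FGSM) style sketch in PyTorch; the model, the cross-entropy loss, and the perturbation budget epsilon are assumptions, not a method proposed by this survey.

# Minimal FGSM-style sketch. Assumptions: a differentiable PyTorch `model`,
# an input batch `x` in [0, 1], integer labels `y`, and a budget `epsilon`.
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, epsilon=8 / 255):
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    # Step in the direction of the sign of the loss gradient, then clip back
    # to the valid pixel range so the perturbation stays imperceptible.
    x_adv = x + epsilon * x.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()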
[...]
TL;DR: In this paper, the authors describe a method to produce a network where current methods such as DeepFool have great difficulty producing adversarial samples, provide a reasonable analysis showing that their construction is difficult to defeat, and show experimentally that their method is hard to defeat with both Type I and Type II attacks using several standard networks and datasets.
Abstract: We describe a method to produce a network where current methods such as DeepFool have great difficulty producing adversarial samples. Our construction suggests some insights into how deep networks work. We provide a reasonable analysis showing that our construction is difficult to defeat, and show experimentally that our method is hard to defeat with both Type I and Type II attacks using several standard networks and datasets. This SafetyNet architecture is applied to an important and novel application, SceneProof, which can reliably detect whether an image is a picture of a real scene or not. SceneProof applies to images captured with depth maps (RGBD images) and checks if a pair of image and depth map is consistent. It relies on the relative difficulty of producing naturalistic depth maps for images in post processing. We demonstrate that our SafetyNet is robust to adversarial examples built from currently known attacking approaches.
207 citations
[...]
TL;DR: Random Self-Ensemble (RSE), as proposed in this paper, adds random noise layers to the neural network to prevent strong gradient-based attacks, and ensembles the prediction over random noises to stabilize the performance.
Abstract: Recent studies have revealed the vulnerability of deep neural networks: A small adversarial perturbation that is imperceptible to humans can easily make a well-trained deep neural network misclassify. This makes it unsafe to apply neural networks in security-critical applications. In this paper, we propose a new defense algorithm called Random Self-Ensemble (RSE) by combining two important concepts: randomness and ensemble. To protect a targeted model, RSE adds random noise layers to the neural network to prevent the strong gradient-based attacks, and ensembles the prediction over random noises to stabilize the performance. We show that our algorithm is equivalent to ensembling an infinite number of noisy models f_ε without any additional memory overhead, and the proposed training procedure based on noisy stochastic gradient descent can ensure the ensemble model has a good predictive capability. Our algorithm significantly outperforms previous defense techniques on real data sets. For instance, on CIFAR-10 with the VGG network (which has 92% accuracy without any attack), under the strong C&W attack within a certain distortion tolerance, the accuracy of the unprotected model drops to less than 10% and the best previous defense technique reaches 48% accuracy, while our method still has 86% prediction accuracy under the same level of attack. Finally, our method is simple and easy to integrate into any neural network.
188 citations
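The two ingredients described above, noise layers and prediction averaging, can be sketched directly. The following PyTorch-style code is an approximation of the RSE idea; the Gaussian noise scale, its placement at the input of a generic backbone (the paper inserts noise layers throughout the network), and the number of ensemble draws are assumptions, not the paper's exact construction.

# Minimal Random Self-Ensemble (RSE) style sketch in PyTorch.
import torch
import torch.nn as nn

class NoiseLayer(nn.Module):
    # Adds fresh Gaussian noise on every forward pass, in both training and
    # inference; this randomness is what the implicit ensemble averages over.
    def __init__(self, sigma=0.1):
        super().__init__()
        self.sigma = sigma

    def forward(self, x):
        return x + self.sigma * torch.randn_like(x)

class RSEClassifier(nn.Module):
    # Assumption: `backbone` is any classifier mapping inputs to logits.
    def __init__(self, backbone, sigma=0.1):
        super().__init__()
        self.noise = NoiseLayer(sigma)
        self.backbone = backbone

    def forward(self, x):
        return self.backbone(self.noise(x))

def rse_predict(model, x, n_samples=10):
    # Ensemble the softmax outputs over several independent noise draws.
    with torch.no_grad():
        probs = torch.stack([model(x).softmax(dim=-1) for _ in range(n_samples)])
    return probs.mean(dim=0)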
[...]
TL;DR: The theoretical foundations, algorithms, and applications of adversarial attack techniques are introduced, and a few research efforts on defense techniques, covering the broad frontier of the field, are described.
Abstract: With the rapid developments of artificial intelligence (AI) and deep learning (DL) techniques, it is critical to ensure the security and robustness of the deployed algorithms. Recently, the security vulnerability of DL algorithms to adversarial samples has been widely recognized. The fabricated samples can lead to various misbehaviors of the DL models while being perceived as benign by humans. Successful implementations of adversarial attacks in real physical-world scenarios further demonstrate their practicality. Hence, adversarial attack and defense techniques have attracted increasing attention from both machine learning and security communities and have become a hot research topic in recent years. In this paper, we first introduce the theoretical foundations, algorithms, and applications of adversarial attack techniques. We then describe a few research efforts on the defense techniques, which cover the broad frontier in the field. Several open problems and challenges are subsequently discussed, which we hope will provoke further research efforts in this critical area.
140 citations
"Trustworthy AI" refers background in this paper
[...]