Posted Content

Privacy-Preserving Deep Learning via Additively Homomorphic Encryption.

TL;DR: This work revisits the previous work of Shokri and Shmatikov (ACM CCS 2015) and builds an enhanced system with the following properties: no information is leaked to the server, and accuracy is kept intact compared with that of an ordinary deep learning system trained over the same combined dataset.
Abstract: We present a privacy-preserving deep learning system in which many learning participants perform neural network-based deep learning over a combined dataset of all, without revealing the participant...
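
The aggregation mechanism summarized in the abstract can be illustrated with a short sketch. Assuming the third-party python-paillier (phe) package, each participant encrypts its local gradients under a shared public key, the honest-but-curious server adds the ciphertexts without decrypting anything, and only the participants (who hold the secret key) recover the aggregated gradient. This is a minimal illustration of additively homomorphic gradient aggregation, not the authors' implementation; the names and toy values below are illustrative.

```python
# Minimal sketch of additively homomorphic gradient aggregation.
# Assumes the third-party `phe` (python-paillier) package; the variable
# names and toy gradients are illustrative, not taken from the paper.
from phe import paillier

# In the paper's setting, the participants share a key pair and the
# server only ever sees ciphertexts.
public_key, private_key = paillier.generate_paillier_keypair(n_length=1024)

# Toy local gradients from three participants (one value per parameter).
local_gradients = [
    [0.12, -0.40, 0.05],
    [-0.03, 0.22, 0.10],
    [0.30, -0.11, -0.25],
]

# Participants: encrypt each gradient component before uploading.
encrypted_uploads = [[public_key.encrypt(g) for g in grads]
                     for grads in local_gradients]

# Server: add ciphertexts component-wise; it learns nothing about the
# plaintext gradients.
aggregated = encrypted_uploads[0]
for upload in encrypted_uploads[1:]:
    aggregated = [acc + ct for acc, ct in zip(aggregated, upload)]

# Participants: decrypt the aggregate and apply the weight update locally.
summed_gradient = [private_key.decrypt(ct) for ct in aggregated]
print(summed_gradient)  # approximately [0.39, -0.29, -0.10]
```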


Citations
Journal ArticleDOI
TL;DR: This work introduces a comprehensive secure federated-learning framework, which includes horizontal federated learning, vertical federated learning, and federated transfer learning, and provides a comprehensive survey of existing works on this subject.
Abstract: Today’s artificial intelligence still faces two major challenges. One is that, in most industries, data exists in the form of isolated islands. The other is the strengthening of data privacy and security. We propose a possible solution to these challenges: secure federated learning. Beyond the federated-learning framework first proposed by Google in 2016, we introduce a comprehensive secure federated-learning framework, which includes horizontal federated learning, vertical federated learning, and federated transfer learning. We provide definitions, architectures, and applications for the federated-learning framework, and provide a comprehensive survey of existing works on this subject. In addition, we propose building data networks among organizations based on federated mechanisms as an effective solution to allowing knowledge to be shared without compromising user privacy.
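
As a concrete illustration of the horizontal setting surveyed here, the sketch below shows one round of server-side weighted averaging of client models, in the spirit of federated averaging. The function name, client sizes, and parameter vectors are illustrative assumptions, not code from the survey.

```python
# Sketch of one aggregation round in horizontal federated learning:
# clients sharing the same feature space submit locally trained weights,
# and the server averages them weighted by local dataset size.
import numpy as np

def federated_average(client_weights, client_sizes):
    """Weighted average of flattened client parameter vectors."""
    total = sum(client_sizes)
    stacked = np.stack(client_weights)        # shape: (num_clients, num_params)
    weights = np.array(client_sizes) / total  # each client's contribution
    return weights @ stacked                  # aggregated global parameters

# Three clients, each holding a flattened parameter vector of length 4.
client_weights = [np.array([0.1, 0.2, 0.3, 0.4]),
                  np.array([0.0, 0.1, 0.2, 0.3]),
                  np.array([0.4, 0.4, 0.4, 0.4])]
client_sizes = [100, 300, 600]                # local training examples

global_weights = federated_average(client_weights, client_sizes)
print(global_weights)
```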

2,593 citations


Cites background or methods from "Privacy-Preserving Deep Learning vi..."

  • ...The above architecture is proved to protect data leakage against the semi-honest server if gradient aggregation is done with SMC [9] or homomorphic encryption [51].... (A minimal sketch of mask-based secure aggregation follows this list.)


  • ...However, no security guarantee is provided and the leakage of these gradients may actually leak important data information [51] when exposed together with data structure, such as in the case of image pixels....


  • ...A horizontal federated learning system typically assumes honest participants and security against an honest-but-curious server [9, 51]....


  • ...The authors of [51] used additively homomorphic encryption to preserve the privacy of gradients and enhance the security of the system....


  • ...A typical assumption is that the participants are honest whereas the server is honest but curious; therefore, no leakage of information from any participants to the server is allowed [51]....

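The first bullet above mentions gradient aggregation with SMC [9]; the sketch below illustrates the basic idea behind such mask-based secure aggregation: pairs of clients agree on random masks that cancel in the server's sum, so the server only learns the aggregate. This is a simplified illustration under the honest-but-curious model and omits the dropout handling of the full protocol; the names and values are illustrative.

```python
# Simplified sketch of mask-based secure aggregation: each pair of clients
# shares a random mask; one adds it, the other subtracts it, so the masks
# cancel in the sum and the server only ever sees masked updates.
import numpy as np

rng = np.random.default_rng(0)
num_clients, dim = 3, 4
true_updates = rng.normal(size=(num_clients, dim))    # toy local updates

# pair_masks[(i, j)] is the mask shared between clients i and j (i < j).
pair_masks = {(i, j): rng.normal(size=dim)
              for i in range(num_clients) for j in range(i + 1, num_clients)}

def masked_update(i):
    """What client i actually sends to the server."""
    masked = true_updates[i].copy()
    for (a, b), mask in pair_masks.items():
        if a == i:
            masked += mask        # lower-indexed client adds the mask
        elif b == i:
            masked -= mask        # higher-indexed client subtracts it
    return masked

server_sum = sum(masked_update(i) for i in range(num_clients))
assert np.allclose(server_sum, true_updates.sum(axis=0))   # masks cancel
print(server_sum)
```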

Posted Content
TL;DR: This work proposes building data networks among organizations based on federated mechanisms as an effective solution to allow knowledge to be shared without compromising user privacy.
Abstract: Today's AI still faces two major challenges. One is that in most industries, data exists in the form of isolated islands. The other is the strengthening of data privacy and security. We propose a possible solution to these challenges: secure federated learning. Beyond the federated learning framework first proposed by Google in 2016, we introduce a comprehensive secure federated learning framework, which includes horizontal federated learning, vertical federated learning and federated transfer learning. We provide definitions, architectures and applications for the federated learning framework, and provide a comprehensive survey of existing works on this subject. In addition, we propose building data networks among organizations based on federated mechanisms as an effective solution to allow knowledge to be shared without compromising user privacy.

1,317 citations


Cites background or methods from "Privacy-Preserving Deep Learning vi..."

  • ...Security Analysis. The above architecture is proved to protect data leakage against the semi-honest server, if gradient aggregation is done with SMC [9] or Homomorphic Encryption [51]. But it may be subject to attack in another security model by a malicious participant training a Generative Adversarial Network (GAN) in the collaborative learning process [29]....


  • ...from an optimization algorithm like Stochastic Gradient Descent (SGD) [41, 58]; however, no security guarantee is provided and the leakage of these gradients may actually leak important data information [51] when exposed together with the data structure, such as in the case of image pixels. Researchers have considered the situation when one of the members of a federated learning system maliciously attacks others....


  • ...the centralized model together with other data owners. A secure aggregation scheme to protect the privacy of aggregated user updates under their federated learning framework is also introduced [9]. Ref. [51] uses additively homomorphic encryption for model parameter aggregation to provide security against the central server. In [60], a multi-task style federated learning system is proposed to allow multiple sites to com....


  • ...X_i = X_j, Y_i = Y_j, I_i ≠ I_j, ∀ D_i, D_j, i ≠ j (2). Security Definition. A horizontal federated learning system typically assumes honest participants and security against an honest-but-curious server [9, 51]. That is, only the server can compromise....


  • ...or cloud server. A typical assumption is that the participants are honest whereas the server is honest-but-curious, therefore no leakage of information from any participants to the server is allowed [51]. The training process of such a system usually contains the following four steps: Step 1: participants locally compute training gradients, mask a selection of gradients with encryption [51], differential privacy....


Posted Content
TL;DR: In a large-scale and complex mobile edge network, heterogeneous devices with varying constraints are involved, which raises challenges of communication costs, resource allocation, and privacy and security in the implementation of FL at scale.
Abstract: In recent years, mobile devices are equipped with increasingly advanced sensing and computing capabilities. Coupled with advancements in Deep Learning (DL), this opens up countless possibilities for meaningful applications. Traditional cloud-based Machine Learning (ML) approaches require the data to be centralized in a cloud server or data center. However, this results in critical issues related to unacceptable latency and communication inefficiency. To this end, Mobile Edge Computing (MEC) has been proposed to bring intelligence closer to the edge, where data is produced. However, conventional enabling technologies for ML at mobile edge networks still require personal data to be shared with external parties, e.g., edge servers. Recently, in light of increasingly stringent data privacy legislation and growing privacy concerns, the concept of Federated Learning (FL) has been introduced. In FL, end devices use their local data to train an ML model required by the server. The end devices then send the model updates rather than raw data to the server for aggregation. FL can serve as an enabling technology in mobile edge networks since it enables the collaborative training of an ML model and also enables DL for mobile edge network optimization. However, in a large-scale and complex mobile edge network, heterogeneous devices with varying constraints are involved. This raises challenges of communication costs, resource allocation, and privacy and security in the implementation of FL at scale. In this survey, we begin with an introduction to the background and fundamentals of FL. Then, we highlight the aforementioned challenges of FL implementation and review existing solutions. Furthermore, we present the applications of FL for mobile edge network optimization. Finally, we discuss the important challenges and future research directions in FL.

701 citations


Cites methods from "Privacy-Preserving Deep Learning vi..."

  • ...Although both the encryption techniques presented in [153] and [79] can prevent the curious server from extracting information....


  • ...In [153], the homomorphic encryption technique is introduced to protect the privacy of participants’ shared parameters from an honest-but-curious server....


Journal ArticleDOI
TL;DR: This paper aims to provide a comprehensive study concerning FL’s security and privacy aspects that can help bridge the gap between the current state of federated AI and a future in which mass adoption is possible.

565 citations

Book ChapterDOI
21 Jun 2019
TL;DR: In this paper, the authors show that they can obtain the private training set from the publicly shared gradients, a leakage they call deep leakage from gradients, and practically validate the effectiveness of their algorithm on both computer vision and natural language processing tasks.
Abstract: Passing gradients is a widely used scheme in modern multi-node learning systems (e.g., distributed training, collaborative learning). For a long time, people believed that gradients are safe to share: i.e., the training set will not be leaked by gradient sharing. However, in this paper, we show that we can obtain the private training set from the publicly shared gradients. The leakage only takes a few gradient steps to process and recovers the original training set rather than look-alike alternatives. We name this leakage deep leakage from gradients and practically validate the effectiveness of our algorithm on both computer vision and natural language processing tasks. We empirically show that our attack is much stronger than previous approaches and thereby raise people's awareness to rethink the safety of gradients. We also discuss possible strategies to defend against this deep leakage.
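
The attack described above amounts to a gradient-matching optimization: the adversary initializes dummy data and adjusts it so that the gradients it induces match the gradients the victim shared. The code below is a minimal PyTorch sketch of that idea on a toy linear model; for simplicity the label is assumed known, whereas the paper also recovers the label, and the optimizer and model are illustrative choices rather than the authors' setup.

```python
# Minimal sketch of "deep leakage from gradients": recover a private input
# by optimizing dummy data whose gradients match the victim's shared
# gradients. Toy linear model; the label is assumed known for simplicity.
import torch

torch.manual_seed(0)
model = torch.nn.Linear(8, 3)            # toy model with known weights
criterion = torch.nn.CrossEntropyLoss()

# Victim computes and shares gradients on one private example.
x_true = torch.randn(1, 8)
y_true = torch.tensor([1])
shared_grads = torch.autograd.grad(criterion(model(x_true), y_true),
                                   model.parameters())

# Attacker optimizes a dummy input to reproduce the shared gradients.
x_dummy = torch.randn(1, 8, requires_grad=True)
optimizer = torch.optim.Adam([x_dummy], lr=0.1)

for step in range(500):
    optimizer.zero_grad()
    dummy_loss = criterion(model(x_dummy), y_true)
    dummy_grads = torch.autograd.grad(dummy_loss, model.parameters(),
                                      create_graph=True)
    # Gradient-matching objective: squared distance between gradient sets.
    match = sum(((dg - sg) ** 2).sum()
                for dg, sg in zip(dummy_grads, shared_grads))
    match.backward()
    optimizer.step()

print(torch.norm(x_dummy.detach() - x_true))   # small value => input recovered
```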

450 citations

References
Proceedings Article
01 Jan 2010
TL;DR: Adaptive subgradient methods as discussed by the authors dynamically incorporate knowledge of the geometry of the data observed in earlier iterations to perform more informative gradient-based learning, which allows us to find needles in haystacks in the form of very predictive but rarely seen features.
Abstract: We present a new family of subgradient methods that dynamically incorporate knowledge of the geometry of the data observed in earlier iterations to perform more informative gradient-based learning. Metaphorically, the adaptation allows us to find needles in haystacks in the form of very predictive but rarely seen features. Our paradigm stems from recent advances in stochastic optimization and online learning which employ proximal functions to control the gradient steps of the algorithm. We describe and analyze an apparatus for adaptively modifying the proximal function, which significantly simplifies setting a learning rate and results in regret guarantees that are provably as good as the best proximal function that can be chosen in hindsight. We give several efficient algorithms for empirical risk minimization problems with common and important regularization functions and domain constraints. We experimentally study our theoretical analysis and show that adaptive subgradient methods outperform state-of-the-art, yet non-adaptive, subgradient algorithms.
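
The per-coordinate adaptation described in the abstract scales each coordinate's step by the inverse square root of its accumulated squared gradients, so rarely seen but informative features receive larger steps. A minimal sketch of that diagonal update follows; the learning rate, epsilon, and toy objective are illustrative assumptions.

```python
# Minimal sketch of the diagonal AdaGrad update: each parameter's step is
# scaled by the accumulated squared gradients observed for that parameter.
import numpy as np

def adagrad_step(params, grad, accum, lr=0.1, eps=1e-8):
    accum += grad ** 2                            # running sum of squared grads
    params -= lr * grad / (np.sqrt(accum) + eps)  # per-coordinate step size
    return params, accum

# Toy usage: minimize f(x) = ||x||^2, whose gradient is 2x.
x = np.array([1.0, -2.0, 0.5])
accum = np.zeros_like(x)
for _ in range(200):
    x, accum = adagrad_step(x, 2 * x, accum)
print(x)   # moves toward the minimizer at the origin
```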

7,244 citations

Book ChapterDOI
02 May 1999
TL;DR: A new trapdoor mechanism is proposed and three encryption schemes are derived: a trapdoor permutation and two homomorphic probabilistic encryption schemes computationally comparable to RSA, which are provably secure under appropriate assumptions in the standard model.
Abstract: This paper investigates a novel computational problem, namely the Composite Residuosity Class Problem, and its applications to public-key cryptography. We propose a new trapdoor mechanism and derive from this technique three encryption schemes : a trapdoor permutation and two homomorphic probabilistic encryption schemes computationally comparable to RSA. Our cryptosystems, based on usual modular arithmetics, are provably secure under appropriate assumptions in the standard model.
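
Since this is the additively homomorphic scheme the privacy-preserving system relies on, a self-contained toy sketch may help: with g = n + 1, multiplying two ciphertexts modulo n² yields a ciphertext of the sum of the plaintexts. The primes below are tiny and purely illustrative, so the code offers no security; it only demonstrates the algebra.

```python
# Toy Paillier cryptosystem (g = n + 1 variant) illustrating the additive
# homomorphism: Enc(m1) * Enc(m2) mod n^2 decrypts to m1 + m2.
# Tiny, insecure parameters for illustration only.
import math
import secrets

p, q = 1789, 1907                   # toy primes; real keys use ~1024-bit primes
n = p * q
n_sq = n * n
lam = math.lcm(p - 1, q - 1)        # Carmichael's lambda(n)
mu = pow(lam, -1, n)                # modular inverse of lambda modulo n

def encrypt(m):
    r = secrets.randbelow(n - 1) + 1
    while math.gcd(r, n) != 1:      # r must be invertible modulo n
        r = secrets.randbelow(n - 1) + 1
    return (pow(n + 1, m, n_sq) * pow(r, n, n_sq)) % n_sq

def decrypt(c):
    u = pow(c, lam, n_sq)
    return ((u - 1) // n * mu) % n  # L(u) = (u - 1) / n, then scale by mu

c1, c2 = encrypt(123), encrypt(456)
assert decrypt((c1 * c2) % n_sq) == 123 + 456    # additive homomorphism
print(decrypt(c1), decrypt(c2))
```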

7,008 citations


"Privacy-Preserving Deep Learning vi..." refers background in this paper

  • ...For decryption and CPA security, see the paper [25]....


01 Jan 2011
TL;DR: A new benchmark dataset for research use is introduced containing over 600,000 labeled digits cropped from Street View images, and variants of two recently proposed unsupervised feature learning methods are employed, finding that they are convincingly superior on benchmarks.
Abstract: Detecting and reading text from natural images is a hard computer vision task that is central to a variety of emerging applications. Related problems like document character recognition have been widely studied by computer vision and machine learning researchers and are virtually solved for practical applications like reading handwritten digits. Reliably recognizing characters in more complex scenes like photographs, however, is far more difficult: the best existing methods lag well behind human performance on the same tasks. In this paper we attack the problem of recognizing digits in a real application using unsupervised feature learning methods: reading house numbers from street level photos. To this end, we introduce a new benchmark dataset for research use containing over 600,000 labeled digits cropped from Street View images. We then demonstrate the difficulty of recognizing these digits when the problem is approached with hand-designed features. Finally, we employ variants of two recently proposed unsupervised feature learning methods and find that they are convincingly superior on our benchmarks.

5,311 citations

Proceedings Article
03 Dec 2012
TL;DR: This paper considers the problem of training a deep network with billions of parameters using tens of thousands of CPU cores and develops two algorithms for large-scale distributed training, Downpour SGD and Sandblaster L-BFGS, which increase the scale and speed of deep network training.
Abstract: Recent work in unsupervised feature learning and deep learning has shown that being able to train large models can dramatically improve performance. In this paper, we consider the problem of training a deep network with billions of parameters using tens of thousands of CPU cores. We have developed a software framework called DistBelief that can utilize computing clusters with thousands of machines to train large models. Within this framework, we have developed two algorithms for large-scale distributed training: (i) Downpour SGD, an asynchronous stochastic gradient descent procedure supporting a large number of model replicas, and (ii) Sandblaster, a framework that supports a variety of distributed batch optimization procedures, including a distributed implementation of L-BFGS. Downpour SGD and Sandblaster L-BFGS both increase the scale and speed of deep network training. We have successfully used our system to train a deep network 30x larger than previously reported in the literature, achieving state-of-the-art performance on ImageNet, a visual object recognition task with 16 million images and 21k categories. We show that these same techniques dramatically accelerate the training of a more modestly-sized deep network for a commercial speech recognition service. Although we focus on and report performance of these methods as applied to training large neural networks, the underlying algorithms are applicable to any gradient-based machine learning algorithm.
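
The Downpour-style asynchrony described above can be caricatured as workers that independently fetch the current parameters, compute a gradient on their own data shard, and push an update back without any global lock-step. The threaded sketch below uses a toy quadratic objective and illustrative names; it only shows the fetch/compute/push pattern, not the DistBelief system itself.

```python
# Toy sketch of asynchronous SGD with a shared parameter store: workers
# fetch (possibly stale) parameters, compute a gradient on their own data
# shard, and push updates without synchronizing with each other.
import threading
import numpy as np

params = np.array([5.0, -3.0])           # shared "parameter server" state
lock = threading.Lock()                  # serializes pushes only
shards = [np.random.default_rng(i).normal(size=(50, 2)) for i in range(4)]

def worker(shard, steps=100, lr=0.05):
    global params
    for _ in range(steps):
        local = params.copy()            # fetch possibly stale parameters
        # Gradient of the mean squared distance between params and the shard.
        grad = 2 * (local - shard).mean(axis=0)
        with lock:                       # push the update
            params -= lr * grad

threads = [threading.Thread(target=worker, args=(s,)) for s in shards]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(params)   # pulled toward the shard means (all near zero here)
```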

3,475 citations


"Privacy-Preserving Deep Learning vi..." refers background in this paper

  • ...1) Asynchronous SGD (ASGD) [16], [27], No Privacy Protection: Both our system and that of [28] rely on the fact that neural networks can be trained via a variant of SGD called asynchronous SGD [16], [27] with data parallelism and model parallelism....


  • ...Our system achieves identical accuracy to a corresponding deep learning system (i.e., asynchronous SGD (ASGD)) trained over the joint dataset of all participants....


  • ...3) Our System: Our system can be called gradients-encrypted ASGD for the following reasons....


  • ...can be called gradients-selective ASGD for the following reasons....


  • ...1) Asynchronous SGD (ASGD) [16], [27], No Privacy Protection....


Proceedings ArticleDOI
24 Oct 2016
TL;DR: In this paper, the authors develop new algorithmic techniques for learning and a refined analysis of privacy costs within the framework of differential privacy, and demonstrate that they can train deep neural networks with nonconvex objectives, under a modest privacy budget, and at a manageable cost in software complexity, training efficiency, and model quality.
Abstract: Machine learning techniques based on neural networks are achieving remarkable results in a wide variety of domains. Often, the training of models requires large, representative datasets, which may be crowdsourced and contain sensitive information. The models should not expose private information in these datasets. Addressing this goal, we develop new algorithmic techniques for learning and a refined analysis of privacy costs within the framework of differential privacy. Our implementation and experiments demonstrate that we can train deep neural networks with non-convex objectives, under a modest privacy budget, and at a manageable cost in software complexity, training efficiency, and model quality.
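
The technique described above combines per-example gradient clipping with Gaussian noise before averaging. The sketch below shows that core step on a toy logistic-regression gradient; the clip norm, noise multiplier, and model are illustrative, and the paper's privacy accounting (the moments accountant) is omitted.

```python
# Minimal sketch of the core DP-SGD step: clip each per-example gradient
# to a fixed L2 norm, add Gaussian noise to the clipped sum, then average.
import numpy as np

rng = np.random.default_rng(0)

def dp_sgd_step(w, X, y, lr=0.1, clip=1.0, noise_multiplier=1.1):
    grads = []
    for xi, yi in zip(X, y):                       # per-example gradients
        pred = 1.0 / (1.0 + np.exp(-xi @ w))       # logistic regression
        g = (pred - yi) * xi
        g *= min(1.0, clip / (np.linalg.norm(g) + 1e-12))   # clip L2 norm
        grads.append(g)
    noisy_sum = np.sum(grads, axis=0) + rng.normal(
        scale=noise_multiplier * clip, size=w.shape)         # Gaussian noise
    return w - lr * noisy_sum / len(X)             # average and take a step

X = rng.normal(size=(64, 5))
y = (X[:, 0] > 0).astype(float)
w = np.zeros(5)
for _ in range(100):
    w = dp_sgd_step(w, X, y)
print(w)
```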

2,944 citations