Journal ArticleDOI

DS-ADMM++: A Novel Distributed Quantized ADMM to Speed up Differentially Private Matrix Factorization

TL;DR: Wang et al. integrate the local differential privacy paradigm into DS-ADMM to provide the privacy-preserving property, and introduce a stochastic quantization function to reduce transmission overheads in ADMM and further improve efficiency.
Abstract: Matrix factorization is a powerful method for implementing collaborative filtering recommender systems. This article addresses two major challenges facing matrix factorization: privacy and efficiency. We base our work on DS-ADMM, a distributed matrix factorization algorithm with good efficiency, and make two contributions: (1) we integrate the local differential privacy paradigm into DS-ADMM to provide the privacy-preserving property; (2) we introduce a stochastic quantization function to reduce transmission overheads in ADMM and further improve efficiency. We name our work DS-ADMM++, in which one '+' refers to differential privacy and the other '+' refers to quantization techniques. DS-ADMM++ is the first approach to perform efficient and private matrix factorization in the combined setting of differential privacy and DS-ADMM. We conduct experiments on benchmark data sets to demonstrate that our approach provides differential privacy and excellent scalability with only a modest loss of accuracy.
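The stochastic quantization idea can be illustrated with a minimal sketch (not the paper's implementation; the function name, level count, and clipping range are assumptions): an unbiased quantizer rounds each transmitted value up or down to a grid point, with probabilities chosen so the expectation equals the input.

```python
import numpy as np

def stochastic_quantize(v, levels=16, lo=-1.0, hi=1.0):
    """Unbiased stochastic quantization: E[Q(v)] = v for v in [lo, hi]."""
    v = np.clip(v, lo, hi)
    step = (hi - lo) / (levels - 1)
    scaled = (v - lo) / step                  # position in units of quantization steps
    floor = np.floor(scaled)
    prob_up = scaled - floor                  # round up with this probability
    rounded = floor + (np.random.rand(*np.shape(v)) < prob_up)
    return lo + rounded * step
```

Because E[Q(v)] = v, the quantization noise averages out across iterations while each transmitted value compresses to log2(levels) bits.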
Citations
Journal ArticleDOI
TL;DR: Wang et al. propose a proximal alternating-direction-method-of-multipliers-based nonnegative latent factor analysis (PAN) model with two ideas: 1) adopting the principle of the alternating direction method of multipliers to implement an efficient learning scheme with fast convergence and high computational efficiency; and 2) incorporating proximal regularization into the learning scheme to suppress optimization fluctuation, yielding high representation learning accuracy on HDI data.
Abstract: High-dimensional and incomplete (HDI) data subject to nonnegativity constraints are commonly encountered in big data-related applications concerning interactions among numerous nodes. A nonnegative latent factor analysis (NLFA) model can perform representation learning on HDI data efficiently. However, existing NLFA models suffer from either a slow convergence rate or representation accuracy loss. To address this issue, this paper proposes a proximal alternating-direction-method-of-multipliers-based nonnegative latent factor analysis (PAN) model with two ideas: 1) adopting the principle of the alternating direction method of multipliers to implement an efficient learning scheme with fast convergence and high computational efficiency; and 2) incorporating proximal regularization into the learning scheme to suppress optimization fluctuation and achieve high representation learning accuracy on HDI data. Theoretical studies verify that PAN converges to a Karush-Kuhn-Tucker (KKT) stationary point of its nonnegativity-constrained learning objective. Experimental results on eight HDI matrices from real applications demonstrate that the proposed PAN model outperforms several state-of-the-art models in both estimation accuracy for the missing data of an HDI matrix and computational efficiency.
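The proximal idea can be illustrated with a single primal update for a nonnegativity-constrained least-squares subproblem (a hedged sketch, not the authors' algorithm; the names `prox_nn_update`, `rho`, and `tau` are assumptions): the tau-weighted term anchors the new iterate to the previous one, which is what suppresses optimization fluctuation.

```python
import numpy as np

def prox_nn_update(A, b, x_prev, rho=1.0, tau=0.5, z=None, u=None):
    """One proximal ADMM-style primal update for min ||Ax - b||^2 s.t. x >= 0.
    The tau-term penalizes distance from x_prev, damping oscillation."""
    n = A.shape[1]
    z = np.zeros(n) if z is None else z
    u = np.zeros(n) if u is None else u
    # Closed-form minimizer of the regularized quadratic, then project to x >= 0.
    H = A.T @ A + (rho + tau) * np.eye(n)
    g = A.T @ b + rho * (z - u) + tau * x_prev
    return np.maximum(np.linalg.solve(H, g), 0.0)
```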

3 citations

Journal ArticleDOI
TL;DR: This paper provides a detailed analysis of the state of the art in collaborative ML approaches from a privacy perspective, giving a detailed threat model and security and privacy considerations for each collaborative method.
Abstract: As machine learning and artificial intelligence (ML/AI) become more popular and advanced, there is a desire to turn sensitive data into valuable information via ML/AI techniques, revealing only data permitted by the concerned parties or without revealing any information about the data to third parties. Collaborative ML approaches such as federated learning (FL) help address these needs and concerns, bringing a way to use sensitive data without disclosing critically sensitive features of that data. In this paper, we provide a detailed analysis of the state of the art in collaborative ML approaches from a privacy perspective. A detailed threat model and security and privacy considerations are given for each collaborative method. We deeply analyze Privacy Enhancing Technologies (PETs), covering secure multi-party computation (SMPC), homomorphic encryption (HE), differential privacy (DP), and confidential computing (CC), in the context of collaborative ML. We introduce a guideline on the selection of privacy-preserving technologies for collaborative ML and privacy practitioners. This study constitutes the first survey to provide an in-depth focus on collaborative ML requirements and constraints for privacy solutions while also providing guidelines on the selection of PETs.

2 citations

Journal ArticleDOI
TL;DR: A practical SLF (PSLF) model is proposed that realizes hyperparameter self-adaptation with a distributed particle swarm optimizer (DPSO), which is gradient-free and parallelized; experiments indicate that the PSLF model has a competitive advantage over state-of-the-art models in data representation ability.
Abstract: Latent factor (LF) models are effective in representing high-dimensional and sparse (HiDS) data via low-rank matrix approximation. Hessian-free (HF) optimization is an efficient method for utilizing second-order information of an LF model's objective function, and it has been used to optimize the second-order LF (SLF) model. However, the low-rank representation ability of an SLF model relies heavily on its multiple hyperparameters. Determining these hyperparameters is time-consuming, which largely reduces the practicability of an SLF model. To address this issue, a practical SLF (PSLF) model is proposed in this work. It realizes hyperparameter self-adaptation with a distributed particle swarm optimizer (DPSO), which is gradient-free and parallelized. Experiments on real HiDS data sets indicate that the PSLF model has a competitive advantage over state-of-the-art models in data representation ability.
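The gradient-free hyperparameter search can be sketched with a minimal single-machine particle swarm optimizer; the distributed DPSO in the paper splits particles across nodes, but the update rule is the same in spirit. Function name, swarm size, and coefficients below are assumptions:

```python
import numpy as np

def pso_tune(loss, bounds, n_particles=10, iters=50, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimal particle swarm optimizer: gradient-free search over hyperparameters."""
    rng = np.random.default_rng(seed)
    lo, hi = np.array(bounds).T
    x = rng.uniform(lo, hi, (n_particles, len(lo)))
    v = np.zeros_like(x)
    pbest, pbest_val = x.copy(), np.array([loss(p) for p in x])
    g = pbest[pbest_val.argmin()].copy()           # global best position
    for _ in range(iters):
        r1, r2 = rng.random(x.shape), rng.random(x.shape)
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
        x = np.clip(x + v, lo, hi)
        vals = np.array([loss(p) for p in x])
        better = vals < pbest_val
        pbest[better], pbest_val[better] = x[better], vals[better]
        g = pbest[pbest_val.argmin()].copy()
    return g
```

Here `loss` would be, e.g., the validation RMSE of the SLF model at a given hyperparameter vector.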
Proceedings ArticleDOI
15 Dec 2022
TL;DR: A distributed adaptive SLF (DASLF) model is proposed that realizes hyperparameter self-adaptation with a distributed particle swarm optimizer (DPSO), which is gradient-free and parallelized.
Abstract: Latent factor (LF) models are effective in representing high-dimensional and sparse (HiDS) data via low-rank matrix approximation. Building an LF model is a large-scale non-convex problem. Hessian-free (HF) optimization is an efficient method for utilizing second-order information of an LF model's objective function, and it has been used to optimize the second-order LF (SLF) model. However, the low-rank representation ability of an SLF model relies heavily on its multiple hyperparameters. Determining these hyperparameters is time-consuming, which largely reduces the practicability of an SLF model. To address this issue, a distributed adaptive SLF (DASLF) model is proposed in this work. It realizes hyperparameter self-adaptation with a distributed particle swarm optimizer (DPSO), which is gradient-free and parallelized. Experiments on real HiDS data sets indicate that the DASLF model has a competitive advantage over state-of-the-art models in data representation ability.
Proceedings ArticleDOI
17 Jul 2022
TL;DR: The authors propose a deblurring algorithm based on the deep unfolding method, a combination of traditional algorithms and neural networks, which achieves good performance while remaining interpretable.
Abstract: Due to atmospheric turbulence, defocusing, noise, and other factors, acquired optical remote sensing images may be blurred, so it is critical to deblur them algorithmically. In recent years, neural network algorithms have shown excellent performance in optical remote sensing image deblurring. However, neural network algorithms also have limitations: they lack interpretability and need large amounts of training samples. Traditional deblurring algorithms are interpretable, but their performance is not as good as that of neural network algorithms. To obtain an interpretable deblurring algorithm with good performance, this paper proposes a deblurring algorithm based on the deep unfolding method, a combination of traditional algorithms and neural networks, which achieves good performance while remaining interpretable. We demonstrate the effectiveness of the algorithm on remote sensing datasets with PSNR values and visual deblurring results. The experiments show that the proposed algorithm achieves better deblurring results.
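Deep unfolding can be illustrated by unrolling the classic ISTA sparse-recovery iteration into a fixed number of "layers" (a generic sketch, not the paper's network; in a trained unfolded network the per-layer thresholds, and often the step size and filters, become learnable parameters):

```python
import numpy as np

def soft_threshold(x, t):
    """Proximal operator of the l1 norm."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def unrolled_ista(A, y, thetas, step):
    """K unrolled ISTA iterations, one 'layer' per entry of thetas.
    Deep unfolding would learn thetas (and possibly step) from data."""
    x = np.zeros(A.shape[1])
    for theta in thetas:
        x = soft_threshold(x - step * A.T @ (A @ x - y), theta)
    return x
```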
References
Book
23 May 2011
TL;DR: It is argued that the alternating direction method of multipliers is well suited to distributed convex optimization, and in particular to large-scale problems arising in statistics, machine learning, and related areas.
Abstract: Many problems of recent interest in statistics and machine learning can be posed in the framework of convex optimization. Due to the explosion in size and complexity of modern datasets, it is increasingly important to be able to solve problems with a very large number of features or training examples. As a result, both the decentralized collection or storage of these datasets as well as accompanying distributed solution methods are either necessary or at least highly desirable. In this review, we argue that the alternating direction method of multipliers is well suited to distributed convex optimization, and in particular to large-scale problems arising in statistics, machine learning, and related areas. The method was developed in the 1970s, with roots in the 1950s, and is equivalent or closely related to many other algorithms, such as dual decomposition, the method of multipliers, Douglas–Rachford splitting, Spingarn's method of partial inverses, Dykstra's alternating projections, Bregman iterative algorithms for l1 problems, proximal methods, and others. After briefly surveying the theory and history of the algorithm, we discuss applications to a wide variety of statistical and machine learning problems of recent interest, including the lasso, sparse logistic regression, basis pursuit, covariance selection, support vector machines, and many others. We also discuss general distributed optimization, extensions to the nonconvex setting, and efficient implementation, including some details on distributed MPI and Hadoop MapReduce implementations.
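One of the applications surveyed, the lasso, shows the splitting pattern concisely. Below is a minimal scaled-form ADMM sketch (fixed `rho`, no stopping criterion; the names are my own):

```python
import numpy as np

def lasso_admm(A, b, lam, rho=1.0, iters=100):
    """ADMM for the lasso: min 0.5*||Ax - b||^2 + lam*||z||_1  s.t.  x = z."""
    n = A.shape[1]
    x, z, u = np.zeros(n), np.zeros(n), np.zeros(n)
    # The x-update solves the same linear system every iteration.
    H = A.T @ A + rho * np.eye(n)
    Atb = A.T @ b
    for _ in range(iters):
        x = np.linalg.solve(H, Atb + rho * (z - u))                      # x-update
        z = np.sign(x + u) * np.maximum(np.abs(x + u) - lam / rho, 0.0)  # soft-threshold
        u = u + x - z                                                    # scaled dual update
    return z
```

The x-update is a linear solve, the z-update is elementwise soft-thresholding, and u accumulates the residual of the constraint x = z.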

17,433 citations

Journal ArticleDOI
TL;DR: As the Netflix Prize competition has demonstrated, matrix factorization models are superior to classic nearest neighbor techniques for producing product recommendations, allowing the incorporation of additional information such as implicit feedback, temporal effects, and confidence levels.
Abstract: As the Netflix Prize competition has demonstrated, matrix factorization models are superior to classic nearest neighbor techniques for producing product recommendations, allowing the incorporation of additional information such as implicit feedback, temporal effects, and confidence levels.
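The basic factorization behind such recommenders can be sketched as SGD over the observed ratings (a minimal sketch without the bias, implicit-feedback, and temporal terms the article discusses; names and defaults are assumptions):

```python
import numpy as np

def mf_sgd(ratings, n_users, n_items, k=8, lr=0.01, reg=0.05, epochs=200, seed=0):
    """Factor a rating matrix as P @ Q.T by SGD on observed (user, item, rating) triples."""
    rng = np.random.default_rng(seed)
    P = 0.1 * rng.standard_normal((n_users, k))
    Q = 0.1 * rng.standard_normal((n_items, k))
    for _ in range(epochs):
        for u, i, r in ratings:
            e = r - P[u] @ Q[i]                  # prediction error on this rating
            P[u] += lr * (e * Q[i] - reg * P[u])
            Q[i] += lr * (e * P[u] - reg * Q[i])
    return P, Q
```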

9,583 citations

Book ChapterDOI
04 Mar 2006
TL;DR: In this article, the authors show that for several particular applications substantially less noise is needed than was previously understood to be the case, and also obtain separation results showing the increased value of interactive sanitization mechanisms over non-interactive ones.
Abstract: We continue a line of research initiated in [10,11] on privacy-preserving statistical databases. Consider a trusted server that holds a database of sensitive information. Given a query function f mapping databases to reals, the so-called true answer is the result of applying f to the database. To protect privacy, the true answer is perturbed by the addition of random noise generated according to a carefully chosen distribution, and this response, the true answer plus noise, is returned to the user. Previous work focused on the case of noisy sums, in which f = Σ_i g(x_i), where x_i denotes the i-th row of the database and g maps database rows to [0,1]. We extend the study to general functions f, proving that privacy can be preserved by calibrating the standard deviation of the noise according to the sensitivity of the function f. Roughly speaking, this is the amount by which any single argument to f can change its output. The new analysis shows that for several particular applications substantially less noise is needed than was previously understood to be the case. The first step is a very clean characterization of privacy in terms of indistinguishability of transcripts. Additionally, we obtain separation results showing the increased value of interactive sanitization mechanisms over non-interactive ones.
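The calibration described above is the Laplace mechanism: add noise with scale Δf/ε, where Δf is the sensitivity of the query. A minimal sketch (function and parameter names are my own):

```python
import numpy as np

def laplace_mechanism(true_answer, sensitivity, epsilon, rng=None):
    """Release f(DB) + Lap(sensitivity / epsilon): epsilon-DP for a query
    whose output changes by at most `sensitivity` when one row changes."""
    rng = np.random.default_rng() if rng is None else rng
    scale = sensitivity / epsilon
    return true_answer + rng.laplace(0.0, scale)
```

For a counting query, sensitivity is 1, so noise with scale 1/ε suffices regardless of the database size.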

6,211 citations

Book ChapterDOI
Cynthia Dwork1
10 Jul 2006
TL;DR: In this article, the authors give a general impossibility result showing that a formalization of Dalenius' goal along the lines of semantic security cannot be achieved, and suggest a new measure, differential privacy, which, intuitively, captures the increased risk to one's privacy incurred by participating in a database.
Abstract: In 1977 Dalenius articulated a desideratum for statistical databases: nothing about an individual should be learnable from the database that cannot be learned without access to the database. We give a general impossibility result showing that a formalization of Dalenius' goal along the lines of semantic security cannot be achieved. Contrary to intuition, a variant of the result threatens the privacy even of someone not in the database. This state of affairs suggests a new measure, differential privacy, which, intuitively, captures the increased risk to one's privacy incurred by participating in a database. The techniques developed in a sequence of papers [8, 13, 3], culminating in those described in [12], can achieve any desired level of privacy under this measure. In many cases, extremely accurate information about the database can be provided while simultaneously ensuring very high levels of privacy.
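A concrete mechanism satisfying this measure (not from this paper; randomized response predates it, and the helper names are mine) is per-user bit flipping: each user's report shifts the output distribution by at most a factor of e^ε, and the aggregate can still be debiased.

```python
import numpy as np

def randomized_response(bit, epsilon, rng):
    """Report the true bit with prob e^eps / (1 + e^eps), else flip it.
    This classic mechanism satisfies epsilon-differential privacy."""
    p_truth = np.exp(epsilon) / (1.0 + np.exp(epsilon))
    return bit if rng.random() < p_truth else 1 - bit

def debias(responses, epsilon):
    """Unbiased estimate of the true proportion of 1s from noisy reports."""
    p = np.exp(epsilon) / (1.0 + np.exp(epsilon))
    return (np.mean(responses) - (1 - p)) / (2 * p - 1)
```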

4,134 citations

Proceedings ArticleDOI
18 May 2008
TL;DR: This work applies the de-anonymization methodology to the Netflix Prize dataset, which contains anonymous movie ratings of 500,000 subscribers of Netflix, the world's largest online movie rental service, and demonstrates that an adversary who knows only a little bit about an individual subscriber can easily identify this subscriber's record in the dataset.
Abstract: We present a new class of statistical de-anonymization attacks against high-dimensional micro-data, such as individual preferences, recommendations, transaction records, and so on. Our techniques are robust to perturbation in the data and tolerate some mistakes in the adversary's background knowledge. We apply our de-anonymization methodology to the Netflix Prize dataset, which contains anonymous movie ratings of 500,000 subscribers of Netflix, the world's largest online movie rental service. We demonstrate that an adversary who knows only a little bit about an individual subscriber can easily identify this subscriber's record in the dataset. Using the Internet Movie Database as the source of background knowledge, we successfully identified the Netflix records of known users, uncovering their apparent political preferences and other potentially sensitive information.

2,241 citations