Author

Ruixin Guo

Bio: Ruixin Guo is an academic researcher from China University of Geosciences (Wuhan). The author has contributed to research in topics: Matrix decomposition & Homomorphic encryption. The author has an h-index of 2 and has co-authored 6 publications receiving 11 citations.

Papers
Journal ArticleDOI
TL;DR: Wang et al. integrate the local differential privacy paradigm into DS-ADMM to provide privacy preservation, and introduce a stochastic quantized function that reduces transmission overhead in ADMM to further improve efficiency.
Abstract: Matrix factorization is a powerful method to implement collaborative filtering recommender systems. This article addresses two major challenges, privacy and efficiency, which matrix factorization is facing. We based our work on DS-ADMM, a distributed matrix factorization algorithm with decent efficiency, to achieve the following two pieces of work: (1) Integrated local differential privacy paradigm into DS-ADMM to provide the privacy-preserving property; (2) Introduced a stochastic quantized function to reduce transmission overheads in ADMM to further improve efficiency. We named our work DS-ADMM++, in which one ’+’ refers to differential privacy, and the other ’+’ refers to quantized techniques. DS-ADMM++ is the first to perform efficient and private matrix factorization under the scenarios of differential privacy and DS-ADMM. We conducted experiments with benchmark data sets to demonstrate that our approach provides differential privacy and excellent scalability with a decent loss of accuracy.

13 citations
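
The abstract above does not spell out the stochastic quantized function used in DS-ADMM++. As a rough illustration of the general idea only, the sketch below uses generic unbiased stochastic rounding onto a uniform grid; the function name `stochastic_quantize` and the `step` parameter are assumptions made for this example, not names from the paper.

```python
import math
import random


def stochastic_quantize(x, step, rng=random):
    """Round x to a multiple of `step`, choosing up or down at random.

    The probability of rounding up equals the fractional position of x
    between the two neighbouring grid points, so the result is unbiased:
    E[stochastic_quantize(x)] == x. (Illustrative scheme, not the paper's.)
    """
    scaled = x / step
    lower = math.floor(scaled)
    frac = scaled - lower  # position in [0, 1) between grid points
    level = lower + (1 if rng.random() < frac else 0)
    return level * step


if __name__ == "__main__":
    rng = random.Random(0)
    samples = [stochastic_quantize(0.337, 0.25, rng) for _ in range(100_000)]
    print("grid values used:", sorted(set(samples)))
    print("sample mean:", sum(samples) / len(samples))
```

Sending the integer `level` instead of a full floating-point value is what would cut transmission cost in a distributed setting, and the unbiasedness keeps the quantization error from accumulating in expectation.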

Journal ArticleDOI
TL;DR: This work formally proves the feasibility of BaPa by observing the variance of rating numbers across blocks, and empirically validates its soundness by applying it to two standard parallel matrix factorization algorithms, DSGD and CCD++.
Abstract: A simplified approach to accelerate matrix factorization of big data is to parallelize it. A commonly used method is to divide the matrix into multiple non-intersecting blocks and concurrently calculate them. This operation causes the Load balance problem, which significantly impacts parallel performance and is a big concern. A general belief is that the load balance across blocks is impossible by balancing rows and columns separately. We challenge the belief by proposing an approach of “Balanced Partitioning (BaPa)”. We demonstrate under what circumstance independently balancing rows and columns can lead to the balanced intersection of rows and columns, why, and how. We formally prove the feasibility of BaPa by observing the variance of rating numbers across blocks, and empirically validate its soundness by applying it to two standard parallel matrix factorization algorithms, DSGD and CCD++. Besides, we establish a mathematical model of “Imbalance Degree” to explain further why BaPa works well. BaPa is applied to synchronous parallel matrix factorization, but as a general load balance solution, it has significant application potential.

9 citations
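
To make the load balance problem in the abstract above concrete, the following sketch counts ratings per block of a p x p grid and reports a simple max-over-mean ratio. This ratio is a hypothetical stand-in chosen for illustration; it is not the paper's "Imbalance Degree" model, and the helper names are assumptions.

```python
from collections import Counter


def block_counts(ratings, row_block, col_block, p):
    """Count ratings in each block of a p x p grid.

    ratings   : iterable of (user_id, item_id) pairs
    row_block : dict mapping user_id -> row-block index in [0, p)
    col_block : dict mapping item_id -> column-block index in [0, p)
    """
    counts = Counter()
    for user, item in ratings:
        counts[(row_block[user], col_block[item])] += 1
    return [[counts[(a, b)] for b in range(p)] for a in range(p)]


def max_over_mean(grid):
    """Return max block size divided by mean block size (1.0 = perfectly balanced)."""
    flat = [c for row in grid for c in row]
    mean = sum(flat) / len(flat)
    return max(flat) / mean


if __name__ == "__main__":
    ratings = [(0, 0), (0, 1), (0, 2), (1, 0), (2, 3), (3, 3)]
    rows = {0: 0, 1: 0, 2: 1, 3: 1}
    cols = {0: 0, 1: 0, 2: 1, 3: 1}
    grid = block_counts(ratings, rows, cols, p=2)
    print(grid, max_over_mean(grid))
```

In a synchronous parallel scheme every worker waits for the largest block, so a ratio well above 1.0 translates directly into idle time.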

Journal ArticleDOI
TL;DR: This paper defines a tighter confidence bound for the count-min sketch, based on the binomial distribution and the central limit theorem, and indicates that the reliability of the bound is related to the deviation of the data, which can be measured by the data’s coefficient of standard deviation.
Abstract: A count-min sketch is a probabilistic data structure, which serves as a frequency table of events to process a stream of big data. It uses hash functions to map events to frequencies. Querying a count-min sketch returns the targeted event along with an estimated frequency, which is not less than the actual frequency. The estimated error, i.e., the difference between the estimated frequency and the actual, can be measured by a pre-defined confidence bound. However, the bound originally defined is too loose. The reason is that the Markov inequality used to derive the bound does not perform well. In this paper, based on binomial distribution and central limit theorem, we define a tighter bound. We indicate that the reliability of the bound is related to the deviation of data, which can be measured by the data’s coefficient of standard deviation. Our extensive experiments well support the effectiveness and efficiency of the new bound.

4 citations
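
For readers unfamiliar with the data structure discussed in the abstract above, here is a minimal count-min sketch. It uses the classic sizing (width ⌈e/ε⌉, depth ⌈ln(1/δ)⌉) that comes with the original Markov-based bound, not the tighter bound this paper proposes. The class name and the use of salted BLAKE2b as the hash family are choices made for this example.

```python
import hashlib
import math


class CountMinSketch:
    """Frequency table whose estimates never fall below the true count."""

    def __init__(self, epsilon, delta):
        # Classic sizing: overestimate <= epsilon * total with prob. >= 1 - delta.
        self.width = math.ceil(math.e / epsilon)
        self.depth = math.ceil(math.log(1.0 / delta))
        self.table = [[0] * self.width for _ in range(self.depth)]
        self.total = 0

    def _index(self, item, row):
        # One salted hash per row stands in for independent hash functions.
        digest = hashlib.blake2b(
            item.encode("utf-8"), digest_size=8, salt=row.to_bytes(8, "little")
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, item, count=1):
        self.total += count
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count

    def estimate(self, item):
        # Collisions only add to a cell, so the minimum is the best estimate.
        return min(
            self.table[row][self._index(item, row)] for row in range(self.depth)
        )


if __name__ == "__main__":
    cms = CountMinSketch(epsilon=0.01, delta=0.01)
    for word in ["a", "b", "a", "c", "a"]:
        cms.add(word)
    print(cms.estimate("a"), cms.estimate("b"), cms.estimate("z"))
```

The estimated error mentioned in the abstract is `estimate(item)` minus the true count, which is always non-negative for this structure.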

Proceedings ArticleDOI
01 Aug 2019
TL;DR: This paper reveals how to improve further the performance of FHEW-V2 by specifically focusing on the optimization of a homomorphic full adder, and leverages the computing power of multicore CPU and GPUs to remove the hotspots to improve performance.
Abstract: The latest implementation of the fully homomorphic encryption algorithm FHEW, FHEW-V2, takes about 0.12 seconds for a bootstrapping on a single-node computer. It seems much faster than the previous implementations. However, a 30-bit homomorphic full adder requires 270 times of bootstrapping, plus the time spent on key generation, the total elapsed time will be 55 seconds, which is unacceptable. In this paper, we reveal how to improve further the performance of FHEW-V2 by specifically focusing on the optimization of a homomorphic full adder. We strive to tackle the inefficiency in FHEW-V2 by massive efforts: first, we explore the reference codes for FHEW-V2 and find out hotspot codes for performance optimization; second, we leverage the computing power of multicore CPU and GPUs to remove the hotspots to improve performance. The empirical results so far show that a 30-bit homomorphic full adder is completed in 24 seconds after optimization, gaining an overall speedup of 2.2845. The 2.2845 speedup is the integration of a 13.248 speedup for the key generation and a 1.672 speedup for the bootstrapping.

4 citations
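
The three speedup figures quoted in the abstract above can be related through a weighted harmonic mean, under the simplifying assumption that total run time consists only of key generation and bootstrapping. The sketch below solves for the share of the original run time that key generation would need to have for the figures to agree; the function names are hypothetical and the abstract does not state this decomposition itself.

```python
def overall_speedup(fractions, speedups):
    """Combine per-phase speedups, weighting each by its share of original time."""
    return 1.0 / sum(f / s for f, s in zip(fractions, speedups))


def implied_first_fraction(s1, s2, s_total):
    """Solve f/s1 + (1 - f)/s2 = 1/s_total for f, the first phase's time share."""
    return (1.0 / s2 - 1.0 / s_total) / (1.0 / s2 - 1.0 / s1)


if __name__ == "__main__":
    keygen, bootstrap, total = 13.248, 1.672, 2.2845  # figures from the abstract
    f = implied_first_fraction(keygen, bootstrap, total)
    print("implied key-generation share of original time:", round(f, 3))
    print("recombined speedup:", overall_speedup([f, 1 - f], [keygen, bootstrap]))
```

Under that assumption the implied key-generation share is about 0.31, which illustrates why the overall gain stays close to the bootstrapping speedup: bootstrapping dominates the run time.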

Proceedings ArticleDOI
01 Aug 2019
TL;DR: The novelty of the work rests in that it is the first to successfully integrate the two innovative techniques to address both privacy and efficiency for MF.
Abstract: Matrix factorization (MF) is an essential technique to implement intelligent recommender systems widely applied in industry. Privacy and efficiency are two essential issues concerning MF. We leverage two techniques, differential privacy (DP) and distributed computing, to address the two concerns, respectively. (1) Differentially private MF is still challenging since conventional strategies lead to significant error accumulation; we adopt the objective function perturbation technique to tackle such a challenge. (2) We adopt the alternating direction method of multipliers (ADMM) framework to parallelize the factorization to improve performance; to implement this parallelization, we adopt the effective matrix split method and introduce a novel integration strategy for distributed DP based on the post-processing theorem. We identify our work as distributed differentially private MF based on ADMM. The novelty of the work rests in that it is the first to successfully integrate the two innovative techniques to address both privacy and efficiency for MF. We establish the mathematical model and conduct experiments to validate the soundness of our idea. The experimental results based on industrial datasets show that the distributed differentially private MF algorithm provides scalable speedup performance within a limited precision loss while preserving user privacy.

3 citations


Cited by
Posted Content
TL;DR: This survey provides a comprehensive and structured overview of local differential privacy (LDP) technology, summarises and analyzes state-of-the-art research in LDP, and compares a range of methods in the context of answering a variety of queries and training different machine learning models.
Abstract: With the fast development of Information Technology, a tremendous amount of data have been generated and collected for research and analysis purposes. As an increasing number of users are growing concerned about their personal information, privacy preservation has become an urgent problem to be solved and has attracted significant attention. Local differential privacy (LDP), as a strong privacy tool, has been widely deployed in the real world in recent years. It breaks the shackles of the trusted third party, and allows users to perturb their data locally, thus providing much stronger privacy protection. This survey provides a comprehensive and structured overview of the local differential privacy technology. We summarise and analyze state-of-the-art research in LDP and compare a range of methods in the context of answering a variety of queries and training different machine learning models. We discuss the practical deployment of local differential privacy and explore its application in various domains. Furthermore, we point out several research gaps, and discuss promising future research directions.

68 citations
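
The local perturbation described in the survey abstract above can be illustrated with binary randomized response, a standard ε-LDP mechanism. This is a generic textbook sketch rather than a method taken from the survey; the function names are chosen for this example, and ε is assumed to be strictly positive.

```python
import math
import random


def randomize_bit(bit, epsilon, rng=random):
    """Report the true bit with probability e^eps / (1 + e^eps), else flip it."""
    p_truth = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return bit if rng.random() < p_truth else 1 - bit


def estimate_proportion(reports, epsilon):
    """Debias the observed share of 1s to estimate the true share of 1s."""
    p_truth = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    observed = sum(reports) / len(reports)
    # E[observed] = (1 - p_truth) + true_share * (2 * p_truth - 1)
    return (observed - (1.0 - p_truth)) / (2.0 * p_truth - 1.0)


if __name__ == "__main__":
    rng = random.Random(0)
    true_bits = [1] * 3000 + [0] * 7000
    reports = [randomize_bit(b, epsilon=1.0, rng=rng) for b in true_bits]
    print("estimated share of 1s:", estimate_proportion(reports, epsilon=1.0))
```

Each user perturbs their own bit before it leaves their device, so the collector never sees raw data yet can still recover aggregate statistics.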

01 Dec 2004
TL;DR: In this paper, the authors introduce a sublinear space data structure called the count-min sketch for summarizing data streams, which allows fundamental queries in data stream summarization such as point, range, and inner product queries to be approximately answered very quickly; in addition, it can be applied to solve several important problems in data streams such as finding quantiles, frequent items, etc.
Abstract: We introduce a new sublinear space data structure--the count-min sketch--for summarizing data streams. Our sketch allows fundamental queries in data stream summarization such as point, range, and inner product queries to be approximately answered very quickly; in addition, it can be applied to solve several important problems in data streams such as finding quantiles, frequent items, etc. The time and space bounds we show for using the CM sketch to solve these problems significantly improve those previously known--typically from 1/ε² to 1/ε in factor.

65 citations

Posted Content
TL;DR: The objective is to improve the performance of FHE schemes by designing efficient parallel frameworks; the authors choose Torus Fully Homomorphic Encryption (TFHE) because it offers exact results for an infinite number of boolean gate evaluations.
Abstract: Fully Homomorphic Encryption (FHE) is one of the most promising technologies for privacy protection as it allows an arbitrary number of function computations over encrypted data. However, the computational cost of these FHE systems limits their widespread applications. In this paper, our objective is to improve the performance of FHE schemes by designing efficient parallel frameworks. In particular, we choose Torus Fully Homomorphic Encryption (TFHE) as it offers exact results for an infinite number of boolean gate (e.g., AND, XOR) evaluations. We first extend the gate operations to algebraic circuits such as addition, multiplication, and their vector and matrix equivalents. Secondly, we consider the multi-core CPUs to improve the efficiency of both the gate and the arithmetic operations. Finally, we port the TFHE to the Graphics Processing Units (GPU) and devise novel optimizations for boolean and arithmetic circuits employing the multitude of cores. We also experimentally analyze both the CPU and GPU parallel frameworks for different numeric representations (16 to 32-bit). Our GPU implementation outperforms the existing technique, and it achieves a speedup of 20x for any 32-bit boolean operation and 14.5x for multiplications.

21 citations
