
Showing papers on "Information privacy" published in 2019


Journal ArticleDOI
TL;DR: This work introduces a comprehensive secure federated-learning framework, which includes horizontal federated learning, vertical federated learning, and federated transfer learning, and provides a comprehensive survey of existing works on this subject.
Abstract: Today’s artificial intelligence still faces two major challenges. One is that, in most industries, data exists in the form of isolated islands. The other is the strengthening of data privacy and security. We propose a possible solution to these challenges: secure federated learning. Beyond the federated-learning framework first proposed by Google in 2016, we introduce a comprehensive secure federated-learning framework, which includes horizontal federated learning, vertical federated learning, and federated transfer learning. We provide definitions, architectures, and applications for the federated-learning framework, and provide a comprehensive survey of existing works on this subject. In addition, we propose building data networks among organizations based on federated mechanisms as an effective solution to allowing knowledge to be shared without compromising user privacy.
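As a companion to the framework described above, the horizontal (sample-partitioned) setting can be illustrated with a minimal federated-averaging sketch in Python/NumPy; the toy model, client data, and weighting below are illustrative assumptions, not the paper's implementation.

# Minimal horizontal federated-learning sketch (illustrative, NumPy only).
# Each client holds its own samples over the same feature space; only model
# updates, never raw data, are sent to the coordinator.
import numpy as np

def local_update(w, X, y, lr=0.1, epochs=5):
    # One client's local training: plain least-squares gradient steps.
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(y)
        w = w - lr * grad
    return w

def federated_round(w_global, clients):
    # Aggregate client models weighted by local dataset size.
    total = sum(len(y) for _, y in clients)
    w_new = np.zeros_like(w_global)
    for X, y in clients:
        w_new += (len(y) / total) * local_update(w_global.copy(), X, y)
    return w_new

rng = np.random.default_rng(0)
w_true = np.array([1.0, -2.0])
clients = []
for _ in range(3):                      # three isolated "data islands"
    X = rng.normal(size=(50, 2))
    clients.append((X, X @ w_true + 0.01 * rng.normal(size=50)))

w = np.zeros(2)
for _ in range(20):
    w = federated_round(w, clients)
print(w)                                # approaches w_true without pooling any raw data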

2,593 citations


Posted Content
TL;DR: This work proposes building data networks among organizations based on federated mechanisms as an effective solution to allow knowledge to be shared without compromising user privacy.
Abstract: Today's AI still faces two major challenges. One is that in most industries, data exists in the form of isolated islands. The other is the strengthening of data privacy and security. We propose a possible solution to these challenges: secure federated learning. Beyond the federated learning framework first proposed by Google in 2016, we introduce a comprehensive secure federated learning framework, which includes horizontal federated learning, vertical federated learning and federated transfer learning. We provide definitions, architectures and applications for the federated learning framework, and provide a comprehensive survey of existing works on this subject. In addition, we propose building data networks among organizations based on federated mechanisms as an effective solution to allow knowledge to be shared without compromising user privacy.

1,317 citations


Proceedings ArticleDOI
19 May 2019
TL;DR: This work investigates the reasons why deep learning models may leak information about their training data and designs new algorithms tailored to the white-box setting by exploiting the privacy vulnerabilities of stochastic gradient descent, the algorithm used to train deep neural networks.
Abstract: Deep neural networks are susceptible to various inference attacks as they remember information about their training data. We design white-box inference attacks to perform a comprehensive privacy analysis of deep learning models. We measure the privacy leakage through parameters of fully trained models as well as the parameter updates of models during training. We design inference algorithms for both centralized and federated learning, with respect to passive and active inference attackers, and assuming different adversary prior knowledge. We evaluate our novel white-box membership inference attacks against deep learning algorithms to trace their training data records. We show that a straightforward extension of the known black-box attacks to the white-box setting (through analyzing the outputs of activation functions) is ineffective. We therefore design new algorithms tailored to the white-box setting by exploiting the privacy vulnerabilities of the stochastic gradient descent algorithm, which is the algorithm used to train deep neural networks. We investigate the reasons why deep learning models may leak information about their training data. We then show that even well-generalized models are significantly susceptible to white-box membership inference attacks, by analyzing state-of-the-art pre-trained and publicly available models for the CIFAR dataset. We also show how adversarial participants, in the federated learning setting, can successfully run active membership inference attacks against other participants, even when the global model achieves high prediction accuracies.
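The white-box signal these attacks exploit — that training members tend to have systematically smaller losses and gradient norms under the final SGD-trained parameters — can be illustrated with a toy logistic-regression example; this is a hedged sketch of the underlying statistic, not the authors' attack.

# Toy illustration of a white-box membership signal: after overfitting,
# per-example gradient norms are smaller for training members than for
# non-members, so thresholding the norm gives a crude membership score.
# A sketch only; the paper's attacks are considerably more sophisticated.
import numpy as np

rng = np.random.default_rng(1)
d = 20
w_true = rng.normal(size=d)
X_in = rng.normal(size=(30, d)); y_in = (X_in @ w_true > 0).astype(float)     # members
X_out = rng.normal(size=(30, d)); y_out = (X_out @ w_true > 0).astype(float)  # non-members

w = np.zeros(d)                               # overfit a logistic model on the members
for _ in range(2000):
    p = 1 / (1 + np.exp(-X_in @ w))
    w -= 0.5 * X_in.T @ (p - y_in) / len(y_in)

def grad_norm(x, y):
    p = 1 / (1 + np.exp(-x @ w))
    return np.linalg.norm((p - y) * x)        # per-example gradient of the log-loss

print(np.mean([grad_norm(x, y) for x, y in zip(X_in, y_in)]),    # members: smaller
      np.mean([grad_norm(x, y) for x, y in zip(X_out, y_out)]))  # non-members: larger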

783 citations


Posted Content
TL;DR: In a large-scale and complex mobile edge network, heterogeneous devices with varying constraints are involved, which raises challenges of communication costs, resource allocation, and privacy and security in the implementation of FL at scale.
Abstract: In recent years, mobile devices have been equipped with increasingly advanced sensing and computing capabilities. Coupled with advancements in Deep Learning (DL), this opens up countless possibilities for meaningful applications. Traditional cloud-based Machine Learning (ML) approaches require the data to be centralized in a cloud server or data center. However, this results in critical issues related to unacceptable latency and communication inefficiency. To this end, Mobile Edge Computing (MEC) has been proposed to bring intelligence closer to the edge, where data is produced. However, conventional enabling technologies for ML at mobile edge networks still require personal data to be shared with external parties, e.g., edge servers. Recently, in light of increasingly stringent data privacy legislation and growing privacy concerns, the concept of Federated Learning (FL) has been introduced. In FL, end devices use their local data to train an ML model required by the server. The end devices then send the model updates rather than raw data to the server for aggregation. FL can serve as an enabling technology in mobile edge networks since it enables the collaborative training of an ML model and also enables DL for mobile edge network optimization. However, in a large-scale and complex mobile edge network, heterogeneous devices with varying constraints are involved. This raises challenges of communication costs, resource allocation, and privacy and security in the implementation of FL at scale. In this survey, we begin with an introduction to the background and fundamentals of FL. Then, we highlight the aforementioned challenges of FL implementation and review existing solutions. Furthermore, we present the applications of FL for mobile edge network optimization. Finally, we discuss the important challenges and future research directions in FL.

701 citations


Proceedings ArticleDOI
01 Apr 2019
TL;DR: This paper makes the first attempt to explore user-level privacy leakage against federated learning by an attack from a malicious server, using a framework incorporating a GAN with a multi-task discriminator that simultaneously discriminates the category, reality, and client identity of input samples.
Abstract: Federated learning, i.e., a mobile edge computing framework for deep learning, is a recent advance in privacy-preserving machine learning, where the model is trained in a decentralized manner by the clients, i.e., data curators, preventing the server from directly accessing those private data from the clients. This learning mechanism makes attacks from the server side significantly more challenging. Although state-of-the-art attacking techniques that incorporate the advances of generative adversarial networks (GANs) can construct class representatives of the global data distribution among all clients, it is still challenging to distinguishably attack a specific client (i.e., user-level privacy leakage), which is a stronger privacy threat of precisely recovering the private data from a specific client. This paper makes the first attempt to explore user-level privacy leakage against federated learning by an attack from a malicious server. We propose a framework incorporating a GAN with a multi-task discriminator, which simultaneously discriminates the category, reality, and client identity of input samples. The novel discrimination on client identity enables the generator to recover user-specified private data. Unlike existing works that tend to interfere with the training process of federated learning, the proposed method works “invisibly” on the server side. The experimental results demonstrate the effectiveness of the proposed attacking approach and its superiority over the state of the art.

574 citations


Journal ArticleDOI
TL;DR: This review discusses the benefits and challenges of big data and machine learning in health care; advantages include flexibility and scalability compared with traditional biostatistical methods, which make machine learning deployable for many tasks, such as risk stratification, diagnosis and classification, and survival prediction.
Abstract: Analysis of big data by machine learning offers considerable advantages for assimilation and evaluation of large amounts of complex health-care data. However, to effectively use machine learning tools in health care, several limitations must be addressed and key issues considered, such as its clinical implementation and ethics in health-care delivery. Advantages of machine learning include flexibility and scalability compared with traditional biostatistical methods, which makes it deployable for many tasks, such as risk stratification, diagnosis and classification, and survival predictions. Another advantage of machine learning algorithms is the ability to analyse diverse data types (eg, demographic data, laboratory findings, imaging data, and doctors' free-text notes) and incorporate them into predictions for disease risk, diagnosis, prognosis, and appropriate treatments. Despite these advantages, the application of machine learning in health-care delivery also presents unique challenges that require data pre-processing, model training, and refinement of the system with respect to the actual clinical problem. Also crucial are ethical considerations, which include medico-legal implications, doctors' understanding of machine learning tools, and data privacy and security. In this Review, we discuss some of the benefits and challenges of big data and machine learning in health care.

569 citations


Proceedings ArticleDOI
19 May 2019
TL;DR: PixelDP, as discussed by the authors, is based on a connection between robustness against adversarial examples and differential privacy, a cryptographically-inspired privacy formalism, which provides a rigorous, generic, and flexible foundation for defense.
Abstract: Adversarial examples that fool machine learning models, particularly deep neural networks, have been a topic of intense research interest, with attacks and defenses being developed in a tight back-and-forth. Most past defenses are best effort and have been shown to be vulnerable to sophisticated attacks. Recently a set of certified defenses have been introduced, which provide guarantees of robustness to norm-bounded attacks. However, these defenses either do not scale to large datasets or are limited in the types of models they can support. This paper presents the first certified defense that both scales to large networks and datasets (such as Google’s Inception network for ImageNet) and applies broadly to arbitrary model types. Our defense, called PixelDP, is based on a novel connection between robustness against adversarial examples and differential privacy, a cryptographically-inspired privacy formalism, that provides a rigorous, generic, and flexible foundation for defense.
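The randomization at the heart of PixelDP can be sketched as adding calibrated Gaussian noise to an early layer's output and averaging predictions over noise draws; the toy layers and noise scale below are assumptions for illustration, not the paper's certified construction.

# Simplified PixelDP-style prediction: Gaussian noise is injected after an
# early layer and predictions are averaged over many noise draws, so a small
# (norm-bounded) change to the input can only shift the expected output
# within differential-privacy-style bounds. Toy linear layers, illustrative only.
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(8, 4)) / np.sqrt(4)     # layer whose output is randomized
W2 = rng.normal(size=(3, 8)) / np.sqrt(8)     # remainder of the "network"

def noisy_predict(x, sigma=0.5, n_draws=200):
    h = W1 @ x
    out = np.zeros(3)
    for _ in range(n_draws):
        h_noisy = h + rng.normal(scale=sigma, size=h.shape)
        out += W2 @ np.maximum(h_noisy, 0.0)   # ReLU on the noisy representation
    return out / n_draws                       # smoothed (expected) prediction

x = rng.normal(size=4)
print(noisy_predict(x))
print(noisy_predict(x + 0.01 * rng.normal(size=4)))   # nearby input, similar output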

478 citations


Journal ArticleDOI
TL;DR: The proposed approach mainly addresses the privacy of energy-trading users in the smart grid and screens the distribution of sellers' energy sales, motivated by the fact that energy trading volumes can be mined to detect their relationships with other information, such as physical location and energy usage.
Abstract: Implementing blockchain techniques has enabled secure smart trading in many realms, e.g., neighboring energy trading. However, trading information recorded on the blockchain also brings privacy concerns. Attackers can utilize data mining algorithms to obtain users' private information, especially when the user group is located in nearby geographic positions. In this paper, we present a consortium blockchain-oriented approach to solve the problem of privacy leakage without restricting trading functions. The proposed approach mainly addresses the privacy of energy-trading users in the smart grid and screens the distribution of sellers' energy sales, deriving from the fact that various energy trading volumes can be mined to detect their relationships with other information, such as physical location and energy usage. Experiment evaluations have demonstrated the effectiveness of the proposed approach.

407 citations


Journal ArticleDOI
TL;DR: A blockchain-based decentralized framework for crowdsourcing named CrowdBC is conceptualized, in which a requester's task can be solved by a crowd of workers without relying on any trusted third institution, users' privacy can be guaranteed, and only low transaction fees are required.
Abstract: Crowdsourcing systems, which utilize human intelligence to solve complex tasks, have gained considerable interest and adoption in recent years. However, the majority of existing crowdsourcing systems rely on central servers, which are subject to the weaknesses of the traditional trust-based model, such as a single point of failure. They are also vulnerable to distributed denial of service (DDoS) and Sybil attacks due to the involvement of malicious users. In addition, high service fees from the crowdsourcing platform may hinder the development of crowdsourcing. How to address these potential issues has both research and practical value. In this paper, we conceptualize a blockchain-based decentralized framework for crowdsourcing named CrowdBC, in which a requester's task can be solved by a crowd of workers without relying on any trusted third institution, users' privacy can be guaranteed, and only low transaction fees are required. In particular, we introduce the architecture of our proposed framework, based on which we give a concrete scheme. We further implement a software prototype on the Ethereum public test network with a real-world dataset. Experiment results show the feasibility, usability, and scalability of our proposed crowdsourcing system.

387 citations


Journal ArticleDOI
TL;DR: A novel EHR sharing framework that combines blockchain and the decentralized InterPlanetary File System (IPFS) on a mobile cloud platform is proposed, providing an effective solution for reliable data exchanges on mobile clouds while preserving sensitive health information against potential threats.
Abstract: Recent years have witnessed a paradigm shift in the storage of Electronic Health Records (EHRs) on mobile cloud environments, where mobile devices are integrated with cloud computing to facilitate medical data exchanges among patients and healthcare providers. This advanced model enables healthcare services with low operational cost, high flexibility, and EHRs availability. However, this new paradigm also raises concerns about data privacy and network security for e-health systems. How to reliably share EHRs among mobile users while guaranteeing high-security levels in the mobile cloud is a challenging issue. In this paper, we propose a novel EHRs sharing framework that combines blockchain and the decentralized interplanetary file system (IPFS) on a mobile cloud platform. Particularly, we design a trustworthy access control mechanism using smart contracts to achieve secure EHRs sharing among different patients and medical providers. We present a prototype implementation using Ethereum blockchain in a real data sharing scenario on a mobile app with Amazon cloud computing. The empirical results show that our proposal provides an effective solution for reliable data exchanges on mobile clouds while preserving sensitive health information against potential threats. The system evaluation and security analysis also demonstrate the performance improvements in lightweight access control design, minimum network latency with high security and data privacy levels, compared to the existing data sharing models.

306 citations


Posted Content
TL;DR: A comprehensive review of federated learning systems can be found in this paper, where the authors provide a thorough categorization of the existing systems according to six different aspects, including data distribution, machine learning model, privacy mechanism, communication architecture, scale of federation and motivation of federation.
Abstract: Federated learning has been a hot research topic in enabling the collaborative training of machine learning models among different organizations under privacy restrictions. As researchers try to support more machine learning models with different privacy-preserving approaches, there is a need for systems and infrastructures that ease the development of various federated learning algorithms. Similar to deep learning systems such as PyTorch and TensorFlow that boost the development of deep learning, federated learning systems (FLSs) are equivalently important, and face challenges from various aspects such as effectiveness, efficiency, and privacy. In this survey, we conduct a comprehensive review of federated learning systems. To achieve smooth flow and guide future research, we introduce the definition of federated learning systems and analyze the system components. Moreover, we provide a thorough categorization of federated learning systems according to six different aspects, including data distribution, machine learning model, privacy mechanism, communication architecture, scale of federation, and motivation of federation. The categorization can help the design of federated learning systems as shown in our case studies. By systematically summarizing the existing federated learning systems, we present the design factors, case studies, and future research opportunities.

Journal ArticleDOI
TL;DR: This paper designs secure building blocks, such as secure polynomial multiplication and secure comparison, by employing a homomorphic cryptosystem, Paillier, and constructs a secure SVM training algorithm, which requires only two interactions in a single iteration, with no need for a trusted third party.
Abstract: Machine learning (ML) techniques have been widely used in many smart city sectors, where a huge amount of data is gathered from various Internet of Things (IoT) devices. As a typical ML model, the support vector machine (SVM) enables efficient data classification and thereby finds its applications in real-world scenarios, such as disease diagnosis and anomaly detection. Training an SVM classifier usually requires a collection of labeled IoT data from multiple entities, raising great concerns about data privacy. Most of the existing solutions rely on an implicit assumption that the training data can be reliably collected from multiple data providers, which is often not the case in reality. To bridge the gap between ideal assumptions and realistic constraints, in this paper, we propose secureSVM, which is a privacy-preserving SVM training scheme over blockchain-based encrypted IoT data. We utilize blockchain techniques to build a secure and reliable data sharing platform among multiple data providers, where IoT data is encrypted and then recorded on a distributed ledger. We design secure building blocks, such as secure polynomial multiplication and secure comparison, by employing a homomorphic cryptosystem, Paillier, and construct a secure SVM training algorithm, which requires only two interactions in a single iteration, with no need for a trusted third party. Rigorous security analysis proves that the proposed scheme ensures the confidentiality of the sensitive data for each data provider as well as the SVM model parameters for data analysts. Extensive experiments demonstrate the efficiency of the proposed scheme.
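The additively homomorphic primitive behind these building blocks can be demonstrated with the python-paillier (phe) package, assuming it is installed; this shows only encrypted addition and plaintext-scalar multiplication, not the paper's full two-interaction SVM training protocol.

# Paillier is additively homomorphic: sums and plaintext-scalar products can
# be computed directly on ciphertexts, which is the primitive behind building
# blocks like secure polynomial multiplication. Sketch using python-paillier.
from phe import paillier

public_key, private_key = paillier.generate_paillier_keypair(n_length=1024)

# Two data providers encrypt their local values under the shared public key.
enc_a = public_key.encrypt(3.5)
enc_b = public_key.encrypt(-1.25)

enc_sum = enc_a + enc_b          # addition of two ciphertexts
enc_scaled = enc_a * 4           # multiplication by a plaintext scalar
enc_affine = enc_a * 2 + 7       # affine function of an encrypted value

print(private_key.decrypt(enc_sum))      # 2.25
print(private_key.decrypt(enc_scaled))   # 14.0
print(private_key.decrypt(enc_affine))   # 14.0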

Posted Content
TL;DR: FedPer, a base + personalization layer approach for federated training of deep feedforward neural networks that can combat the ill effects of statistical heterogeneity, is proposed.
Abstract: The emerging paradigm of federated learning strives to enable collaborative training of machine learning models on the network edge without centrally aggregating raw data and hence, improving data privacy. This sharply deviates from traditional machine learning and necessitates the design of algorithms robust to various sources of heterogeneity. Specifically, statistical heterogeneity of data across user devices can severely degrade the performance of standard federated averaging for traditional machine learning applications like personalization with deep learning. This paper proposes FedPer, a base + personalization layer approach for federated training of deep feedforward neural networks, which can combat the ill-effects of statistical heterogeneity. We demonstrate effectiveness of FedPer for non-identical data partitions of CIFAR datasets and on a personalized image aesthetics dataset from Flickr.
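The base + personalization split can be sketched as an aggregation rule that averages only the shared "base" parameters across clients while each client retains its own head; the parameter names and shapes below are illustrative assumptions, not the authors' code.

# FedPer-style aggregation sketch: average only the shared "base" parameters
# across clients; each client keeps its own "personal" (head) parameters.
import numpy as np

def init_client(rng):
    return {
        "base": rng.normal(size=(16, 8)),      # shared representation layers
        "personal": rng.normal(size=(8, 3)),   # client-specific head, never averaged
    }

def fedper_aggregate(clients):
    base_avg = np.mean([c["base"] for c in clients], axis=0)
    for c in clients:
        c["base"] = base_avg.copy()            # broadcast the shared layers only
    return clients

rng = np.random.default_rng(0)
clients = [init_client(rng) for _ in range(4)]
clients = fedper_aggregate(clients)
assert np.allclose(clients[0]["base"], clients[1]["base"])        # shared part identical
assert not np.allclose(clients[0]["personal"], clients[1]["personal"])  # heads stay local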

Journal ArticleDOI
TL;DR: In this paper, a survey of blockchain-based approaches for several security services including authentication, confidentiality, privacy and access control list, data and resource provenance, and integrity assurance is presented.
Abstract: This paper surveys blockchain-based approaches for several security services. These services include authentication, confidentiality, privacy and access control list, data and resource provenance, and integrity assurance. All these services are critical for the current distributed applications, especially due to the large amount of data being processed over the networks and the use of cloud computing. Authentication ensures that the user is who he/she claims to be. Confidentiality guarantees that data cannot be read by unauthorized users. Privacy provides the users the ability to control who can access their data. Provenance allows an efficient tracking of the data and resources along with their ownership and utilization over the network. Integrity helps in verifying that the data has not been modified or altered. These services are currently managed by centralized controllers, for example, a certificate authority. Therefore, the services are prone to attacks on the centralized controller. On the other hand, blockchain is a secured and distributed ledger that can help resolve many of the problems with centralization. The objectives of this paper are to give insights on the use of security services for current applications, to highlight the state of the art techniques that are currently used to provide these services, to describe their challenges, and to discuss how the blockchain technology can resolve these challenges. Further, several blockchain-based approaches providing such security services are compared thoroughly. Challenges associated with using blockchain-based security services are also discussed to spur further research in this area.

Proceedings ArticleDOI
07 Jul 2019
TL;DR: A novel framework for trading range counting results is proposed, whose released results are proved to achieve unbiasedness, bounded variance, and a strengthened privacy guarantee under differential privacy, together with a pricing approach for the traded results that is proved to be immune to arbitrage attacks.
Abstract: Data privacy arises as one of the most important concerns facing the pervasive commoditization of big data statistic analysis in the Internet of Things (IoT). Current solutions are incapable of thoroughly solving the privacy issues on data pricing and guaranteeing the utility of statistic outputs. Therefore, this paper studies the problem of trading private statistic results for IoT data, by considering three factors. Specifically, a novel framework for trading range counting results is proposed. The framework applies a sampling-based method to generate approximated counting results, which are further perturbed for privacy concerns and then released. The results are theoretically proved to achieve unbiasedness, bounded variance, and a strengthened privacy guarantee under differential privacy. Moreover, a pricing approach is proposed for the traded results, which is proved to be immune to arbitrage attacks. The framework is evaluated by estimating air pollution levels with different ranges on the 2014 CityPulse Smart City datasets. The analysis and evaluation results demonstrate that our framework greatly reduces the error of range counting approximation, and the optimal perturbation approach ensures that the private counting satisfies the specified approximation degree while providing a strong privacy guarantee.
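The perturb-then-release step for a range count can be illustrated with the textbook Laplace mechanism, since a counting query has sensitivity 1; this generic sketch omits the paper's sampling-based approximation and arbitrage-free pricing.

# Generic sketch of releasing a differentially private range count: the true
# count has sensitivity 1 (adding or removing one record changes it by at
# most 1), so Laplace noise with scale 1/epsilon suffices. Standard mechanism
# only; not the paper's sampling-based estimator or pricing function.
import numpy as np

rng = np.random.default_rng(0)
readings = rng.uniform(0, 100, size=10_000)   # e.g. IoT air-pollution readings

def private_range_count(data, low, high, epsilon):
    true_count = int(np.sum((data >= low) & (data < high)))
    noise = rng.laplace(loc=0.0, scale=1.0 / epsilon)
    return true_count, true_count + noise

true_c, noisy_c = private_range_count(readings, 40, 60, epsilon=0.5)
print(true_c, round(noisy_c, 1))   # the noisy answer is releasable; error ~ 1/epsilon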

Journal ArticleDOI
11 Apr 2019
TL;DR: In this paper, the authors provide an introduction to the intersection of both fields, with special emphasis on the techniques used to protect the data, arguing that the knowledge gap between the ML and privacy communities must be bridged.
Abstract: For privacy concerns to be addressed adequately in today's machine-learning (ML) systems, the knowledge gap between the ML and privacy communities must be bridged. This article aims to provide an introduction to the intersection of both fields with special emphasis on the techniques used to protect the data.

Journal ArticleDOI
TL;DR: By introducing Healthchain, neither IoT data nor doctor diagnoses can be deleted or tampered with, so as to avoid medical disputes, and security analysis and experimental results show that the proposed Healthchain is applicable to smart healthcare systems.
Abstract: With the dramatically increasing deployment of the Internet of Things (IoT), remote monitoring of health data to achieve intelligent healthcare has received great attention recently. However, due to the limited computing power and storage capacity of IoT devices, users' health data are generally stored in a centralized third party, such as a hospital database or the cloud, which makes users lose control of their health data and can easily result in privacy leakage and a single-point bottleneck. In this paper, we propose Healthchain, a large-scale health data privacy preserving scheme based on blockchain technology, where health data are encrypted to conduct fine-grained access control. Specifically, users can effectively revoke or add authorized doctors by leveraging user transactions for key management. Furthermore, by introducing Healthchain, neither IoT data nor doctor diagnoses can be deleted or tampered with, so as to avoid medical disputes. Security analysis and experimental results show that the proposed Healthchain is applicable to smart healthcare systems.

Book ChapterDOI
13 Oct 2019
TL;DR: In this paper, the authors investigate the feasibility of applying differential privacy techniques to protect the patient data in a federated learning setup and show that there is a trade-off between model performance and privacy protection costs.
Abstract: Due to medical data privacy regulations, it is often infeasible to collect and share patient data in a centralised data lake. This poses challenges for training machine learning algorithms, such as deep convolutional networks, which often require large numbers of diverse training examples. Federated learning sidesteps this difficulty by bringing code to the patient data owners and only sharing intermediate model training updates among them. Although a high-accuracy model could be achieved by appropriately aggregating these model updates, the model shared could indirectly leak the local training examples. In this paper, we investigate the feasibility of applying differential-privacy techniques to protect the patient data in a federated learning setup. We implement and evaluate practical federated learning systems for brain tumour segmentation on the BraTS dataset. The experimental results show that there is a trade-off between model performance and privacy protection costs.
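The kind of differential-privacy mechanism studied in this setting can be sketched as clipping each institution's model update and adding Gaussian noise before sharing it; the clip bound and noise multiplier below are arbitrary illustrative values, not those of the BraTS experiments.

# Sketch of the differential-privacy step applied to a federated update:
# clip the update's norm (bounding sensitivity), then add Gaussian noise
# before it leaves the institution. Values are illustrative assumptions.
import numpy as np

def privatize_update(update, clip_norm=1.0, noise_multiplier=1.1, rng=None):
    rng = rng if rng is not None else np.random.default_rng()
    norm = np.linalg.norm(update)
    clipped = update * min(1.0, clip_norm / (norm + 1e-12))    # bound the sensitivity
    noise = rng.normal(scale=noise_multiplier * clip_norm, size=update.shape)
    return clipped + noise                                      # what gets shared

rng = np.random.default_rng(0)
local_update = rng.normal(size=1000) * 0.01
shared = privatize_update(local_update, rng=rng)
print(np.linalg.norm(local_update), np.linalg.norm(shared))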

Journal ArticleDOI
TL;DR: The survey covers privacy techniques in public and permissionless blockchains, e.g. Bitcoin and Ethereum, as well as privacy-preserving research proposals and solutions in permissioned and private blockchains.
Abstract: Blockchains offer a decentralized, immutable and verifiable ledger that can record transactions of digital assets, provoking a radical change in several innovative scenarios, such as smart cities, eHealth or eGovernment. However, blockchains are subject to different scalability, security and potential privacy issues, such as transaction linkability, crypto-key management (e.g. recovery), on-chain data privacy, or compliance with privacy regulations (e.g. GDPR). To deal with these challenges, novel privacy-preserving solutions for blockchain based on crypto-privacy techniques are emerging to empower users with mechanisms to become anonymous and take control of their personal data during their digital transactions of any kind in the ledger, following a Self-Sovereign Identity (SSI) model. In this sense, this paper performs a systematic review of the current state of the art on privacy-preserving research solutions and mechanisms in blockchain, as well as the main associated privacy challenges in this promising and disrupting technology. The survey covers privacy techniques in public and permissionless blockchains, e.g. Bitcoin and Ethereum, as well as privacy-preserving research proposals and solutions in permissioned and private blockchains. Diverse blockchain scenarios are analyzed, encompassing eGovernment, eHealth, cryptocurrencies, smart cities, and cooperative ITS.

Journal ArticleDOI
TL;DR: A practical privacy-preserving data aggregation scheme without a TTP is proposed, in which users with some degree of mutual trust construct a virtual aggregation area to mask the single user's data, and meanwhile the aggregation result has almost no effect on data utility in large-scale applications.
Abstract: Real-time electricity consumption data can be used in value-added services such as big data analysis; meanwhile, the single user's privacy needs to be protected. How to balance the data utility and the privacy preservation is a vital issue, where privacy-preserving data aggregation could be a feasible solution. Most of the existing data aggregation schemes rely on a trusted third party (TTP). However, this assumption has a negative impact on reliability, because the system can be easily knocked down by a denial of service attack. In this paper, a practical privacy-preserving data aggregation scheme is proposed without a TTP, in which users with some degree of mutual trust construct a virtual aggregation area to mask the single user's data, and meanwhile the aggregation result has almost no effect on the data utility in large-scale applications. The computation cost and communication overhead are reduced in order to promote practicability. Moreover, the security analysis and the performance evaluation show that the proposed scheme is robust and efficient.
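The idea of a virtual aggregation area that masks individual readings while leaving the aggregate intact can be illustrated with pairwise cancelling masks; this is a generic secure-aggregation sketch under assumed values, not the paper's exact construction.

# Generic sketch of masking-based aggregation without a trusted third party:
# each pair of users agrees on a random mask that one adds and the other
# subtracts, so individual reports are hidden but the masks cancel in the sum.
import numpy as np

rng = np.random.default_rng(0)
readings = np.array([3.2, 5.1, 2.7, 4.4])          # true per-user consumption
n = len(readings)

masks = np.zeros(n)
for i in range(n):
    for j in range(i + 1, n):
        r = rng.uniform(0, 100)                     # pairwise shared randomness
        masks[i] += r                               # user i adds the mask
        masks[j] -= r                               # user j subtracts it

reports = readings + masks                          # what each user actually reveals
print(reports)                                      # individually meaningless
print(reports.sum(), readings.sum())                # sums match: the masks cancel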

Posted Content
TL;DR: The privacy guarantees of any hypothesis testing based definition of privacy (including the original differential privacy definition) converge to GDP in the limit under composition, and a Berry–Esseen style version of the central limit theorem is proved, which gives a computationally inexpensive tool for tractably analysing the exact composition of private algorithms.
Abstract: Differential privacy has seen remarkable success as a rigorous and practical formalization of data privacy in the past decade. This privacy definition and its divergence based relaxations, however, have several acknowledged weaknesses, either in handling composition of private algorithms or in analyzing important primitives like privacy amplification by subsampling. Inspired by the hypothesis testing formulation of privacy, this paper proposes a new relaxation, which we term '$f$-differential privacy' ($f$-DP). This notion of privacy has a number of appealing properties and, in particular, avoids difficulties associated with divergence based relaxations. First, $f$-DP preserves the hypothesis testing interpretation. In addition, $f$-DP allows for lossless reasoning about composition in an algebraic fashion. Moreover, we provide a powerful technique to import existing results proven for original DP to $f$-DP and, as an application, obtain a simple subsampling theorem for $f$-DP. In addition to the above findings, we introduce a canonical single-parameter family of privacy notions within the $f$-DP class that is referred to as 'Gaussian differential privacy' (GDP), defined based on testing two shifted Gaussians. GDP is focal among the $f$-DP class because of a central limit theorem we prove. More precisely, the privacy guarantees of any hypothesis testing based definition of privacy (including original DP) converge to GDP in the limit under composition. The CLT also yields a computationally inexpensive tool for analyzing the exact composition of private algorithms. Taken together, this collection of attractive properties renders $f$-DP a mathematically coherent, analytically tractable, and versatile framework for private data analysis. Finally, we demonstrate the use of the tools we develop by giving an improved privacy analysis of noisy stochastic gradient descent.
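For reference, the central definitions can be written out explicitly (restating the standard $f$-DP and GDP formulation, with $\Phi$ the standard normal CDF and $T(P,Q)$ the optimal trade-off between type I and type II errors):

$$ M \text{ is } f\text{-DP} \iff T\big(M(S), M(S')\big)(\alpha) \ \ge\ f(\alpha) \quad \text{for all neighbouring } S, S' \text{ and all } \alpha \in [0,1], $$
$$ G_\mu(\alpha) \;=\; \Phi\big(\Phi^{-1}(1-\alpha) - \mu\big), $$

so $\mu$-GDP is $f$-DP with $f = G_\mu$, the trade-off function obtained from testing $\mathcal{N}(0,1)$ against $\mathcal{N}(\mu,1)$.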

Proceedings Article
11 Apr 2019
TL;DR: The optimality of LCC is proved by showing that it achieves the optimal tradeoff between resiliency, security, and privacy; LCC speeds up the conventional uncoded implementation of distributed least-squares linear regression by up to $13.43\times$, and achieves a $2.36\times$-$12.65\times$ speedup over the state-of-the-art straggler mitigation strategies.
Abstract: We consider a scenario involving computations over a massive dataset stored distributedly across multiple workers, which is at the core of distributed learning algorithms. We propose Lagrange Coded Computing (LCC), a new framework to simultaneously provide (1) resiliency against stragglers that may prolong computations; (2) security against Byzantine (or malicious) workers that deliberately modify the computation for their benefit; and (3) (information-theoretic) privacy of the dataset amidst possible collusion of workers. LCC, which leverages the well-known Lagrange polynomial to create computation redundancy in a novel coded form across workers, can be applied to any computation scenario in which the function of interest is an arbitrary multivariate polynomial of the input dataset, hence covering many computations of interest in machine learning. LCC significantly generalizes prior works to go beyond linear computations. It also enables secure and private computing in distributed settings, improving the computation and communication efficiency of the state-of-the-art. Furthermore, we prove the optimality of LCC by showing that it achieves the optimal tradeoff between resiliency, security, and privacy, i.e., in terms of tolerating the maximum number of stragglers and adversaries, and providing data privacy against the maximum number of colluding workers. Finally, we show via experiments on Amazon EC2 that LCC speeds up the conventional uncoded implementation of distributed least-squares linear regression by up to $13.43\times$, and also achieves a $2.36\times$-$12.65\times$ speedup over the state-of-the-art straggler mitigation strategies.
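The Lagrange encoding referred to above can be stated compactly (standard LCC notation: $X_1,\dots,X_K$ are the dataset partitions, $\beta_j$ and $\alpha_i$ are distinct evaluation points, and $f$ is the polynomial of interest):

$$ u(z) \;=\; \sum_{j=1}^{K} X_j \prod_{k \ne j} \frac{z - \beta_k}{\beta_j - \beta_k}, \qquad \tilde{X}_i \;=\; u(\alpha_i). $$

Worker $i$ returns $f(\tilde{X}_i) = (f \circ u)(\alpha_i)$; because $f \circ u$ is a polynomial of degree at most $\deg(f)\,(K-1)$, the master can interpolate it from sufficiently many worker responses and read off $f(X_j) = (f \circ u)(\beta_j)$, which is what yields the stated tradeoff between straggler resiliency, Byzantine security, and data privacy (the latter by padding the encoding with random partitions).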

Journal ArticleDOI
TL;DR: A novel framework for privacy-preserved traffic sharing among taxi companies is proposed, which jointly considers the privacy, profits, and fairness for participants.
Abstract: Due to the prominent development of public transportation systems, taxi flows can nowadays serve as a reasonable reference for trends in the urban population. Being aware of this knowledge will significantly benefit regular individuals, city planners, and the taxi companies themselves. However, mindlessly publishing such content would severely threaten the private information of taxi companies: both their own market ratios and the sensitive information of passengers and drivers would be revealed. Consequently, we propose in this paper a novel framework for privacy-preserved traffic sharing among taxi companies, which jointly considers the privacy, profits, and fairness of participants. The framework allows companies to share the scales of their taxi flows, and common knowledge is derived from these statistics. Two algorithms are proposed for the derivation of sharing schemes in different scenarios, depending on whether the common knowledge can be accessed by third parties such as individuals and governments. Differential privacy is utilized in both cases to preserve the sensitive information of taxi companies. Finally, both algorithms are validated on real-world data traces under multiple market distributions.

Journal ArticleDOI
TL;DR: A sanitizer is used to sanitize the data blocks corresponding to the sensitive information of the file and to transform these data blocks' signatures into valid ones for the sanitized file, which makes the file stored in the cloud able to be shared and used by others on the condition that the sensitive information is hidden, while the remote data integrity auditing can still be efficiently executed.
Abstract: With cloud storage services, users can remotely store their data to the cloud and realize the data sharing with others. Remote data integrity auditing is proposed to guarantee the integrity of the data stored in the cloud. In some common cloud storage systems such as the electronic health records system, the cloud file might contain some sensitive information. The sensitive information should not be exposed to others when the cloud file is shared. Encrypting the whole shared file can realize the sensitive information hiding, but will make this shared file unable to be used by others. How to realize data sharing with sensitive information hiding in remote data integrity auditing still has not been explored up to now. In order to address this problem, we propose a remote data integrity auditing scheme that realizes data sharing with sensitive information hiding in this paper. In this scheme, a sanitizer is used to sanitize the data blocks corresponding to the sensitive information of the file and transforms these data blocks’ signatures into valid ones for the sanitized file. These signatures are used to verify the integrity of the sanitized file in the phase of integrity auditing. As a result, our scheme makes the file stored in the cloud able to be shared and used by others on the condition that the sensitive information is hidden, while the remote data integrity auditing is still able to be efficiently executed. Meanwhile, the proposed scheme is based on identity-based cryptography, which simplifies the complicated certificate management. The security analysis and the performance evaluation show that the proposed scheme is secure and efficient.

Journal ArticleDOI
TL;DR: This work proposes APPA: a device-oriented Anonymous Privacy-Preserving scheme with Authentication for data aggregation applications in fog-enhanced IoT systems, which also supports multi-authority management of smart devices and fog nodes locally.

Journal ArticleDOI
TL;DR: The properties of IoT data, a number of IoT data fusion requirements including those concerning security and privacy, a classification of IoT applications into several domains, and a thorough review of the state of the art of data fusion in the main IoT application domains are investigated.

Posted Content
TL;DR: The feasibility of applying differential-privacy techniques to protect the patient data in a federated learning setup for brain tumour segmentation on the BraTS dataset is investigated and there is a trade-off between model performance and privacy protection costs.
Abstract: Due to medical data privacy regulations, it is often infeasible to collect and share patient data in a centralised data lake. This poses challenges for training machine learning algorithms, such as deep convolutional networks, which often require large numbers of diverse training examples. Federated learning sidesteps this difficulty by bringing code to the patient data owners and only sharing intermediate model training updates among them. Although a high-accuracy model could be achieved by appropriately aggregating these model updates, the model shared could indirectly leak the local training examples. In this paper, we investigate the feasibility of applying differential-privacy techniques to protect the patient data in a federated learning setup. We implement and evaluate practical federated learning systems for brain tumour segmentation on the BraTS dataset. The experimental results show that there is a trade-off between model performance and privacy protection costs.

Proceedings ArticleDOI
09 Dec 2019
TL;DR: A new set of attacks is devised to compromise the inference data privacy in collaborative deep learning systems, in which one malicious participant can accurately recover an arbitrary input fed into the system, even if he has no access to other participants' data or computations.
Abstract: The prevalence of deep learning has drawn attention to the privacy protection of sensitive data. Various privacy threats have been presented, where an adversary can steal model owners' private data. Meanwhile, countermeasures have also been introduced to achieve privacy-preserving deep learning. However, most studies only focused on data privacy during training, and ignored privacy during inference. In this paper, we devise a new set of attacks to compromise the inference data privacy in collaborative deep learning systems. Specifically, when a deep neural network and the corresponding inference task are split and distributed to different participants, one malicious participant can accurately recover an arbitrary input fed into this system, even if he has no access to other participants' data or computations, or to prediction APIs to query this system. We evaluate our attacks under different settings, models and datasets, to show their effectiveness and generalization. We also study the characteristics of deep learning models that make them susceptible to such inference privacy threats. This provides insights and guidelines to develop more privacy-preserving collaborative systems and algorithms.

Proceedings ArticleDOI
19 May 2019
TL;DR: These findings lend tempered support for the generalizability of prior crowdsourced security and privacy user studies; provide context to more accurately interpret the results of such studies; and suggest rich directions for future work to mitigate experience- rather than demographic-related sample biases.
Abstract: Security and privacy researchers often rely on data collected from Amazon Mechanical Turk (MTurk) to evaluate security tools, to understand users' privacy preferences and to measure online behavior. Yet, little is known about how well Turkers' survey responses and performance on security- and privacy-related tasks generalizes to a broader population. This paper takes a first step toward understanding the generalizability of security and privacy user studies by comparing users' self-reports of their security and privacy knowledge, past experiences, advice sources, and behavior across samples collected using MTurk (n=480), a census-representative web-panel (n=428), and a probabilistic telephone sample (n=3,000) statistically weighted to be accurate within 2.7% of the true prevalence in the U.S. Surprisingly, the results suggest that: (1) MTurk responses regarding security and privacy experiences, advice sources, and knowledge are more representative of the U.S. population than are responses from the census-representative panel; (2) MTurk and general population reports of security and privacy experiences, knowledge, and advice sources are quite similar for respondents who are younger than 50 or who have some college education; and (3) respondents' answers to the survey questions we ask are stable over time and robust to relevant, broadly-reported news events. Further, differences in responses cannot be ameliorated with simple demographic weighting, possibly because MTurk and panel participants have more internet experience compared to their demographic peers. Together, these findings lend tempered support for the generalizability of prior crowdsourced security and privacy user studies; provide context to more accurately interpret the results of such studies; and suggest rich directions for future work to mitigate experience- rather than demographic-related sample biases.

Journal ArticleDOI
TL;DR: This paper highlights research challenges and directions concerning cyber security for building a comprehensive security model for EHRs, and discusses some crucial issues and the ample opportunities for advanced research related to the security and privacy of EHRs.
Abstract: A systematic and comprehensive review of security and privacy-preserving challenges in e-health solutions indicates various privacy-preserving approaches to ensure the privacy and security of electronic health records (EHRs) in the cloud. This paper highlights the research challenges and directions concerning cyber security to build a comprehensive security model for EHRs. We carried out an intensive study of the IEEE, Science Direct, Google Scholar, PubMed, and ACM libraries for papers on EHR approaches published between 2000 and 2018 and summarized them in terms of architecture types as well as evaluation strategies. We surveyed, investigated, and reviewed various aspects of several articles and identified the following tasks: 1) EHR security and privacy; 2) security and privacy requirements of e-health data in the cloud; 3) EHR cloud architecture; and 4) diverse EHR cryptographic and non-cryptographic approaches. We also discuss some crucial issues and the ample opportunities for advanced research related to the security and privacy of EHRs. Since big data provide a great mine of information and knowledge in e-health applications, serious privacy and security challenges that require immediate attention exist. Studies must focus on efficient comprehensive security mechanisms for EHRs and also explore techniques to maintain the integrity and confidentiality of patients' information.