Journal ArticleDOI

Privacy-Preserving Distributed Multi-Task Learning against Inference Attack in Cloud Computing

TL;DR: Owing to the powerful computing and storage capability of cloud computing, machine learning as a service (MLaaS) has recently been valued by organizations for machine learning training.
Abstract: Because of the powerful computing and storage capability in cloud computing, machine learning as a service (MLaaS) has recently been valued by the organizations for machine learning training over s...
Citations
Proceedings ArticleDOI
01 Aug 2022
TL;DR: This paper discusses the promising role of ReL for DCCS in terms of different aspects, including device condition monitoring, predictions, and management of the systems, and provides a list of ReL algorithms and their pitfalls, which helps DCCS by considering various constraints.
Abstract: Distributed computing continuum systems (DCCS) and representation learning (ReL) are two distinct computer science technologies, each with its own use cases, applications, and benefits. DCCS helps increase flexibility and improve the performance of hybrid IoT-Edge-Cloud infrastructures. Representation learning, in contrast, extracts features (meaningful information) and underlying explanatory factors from given datasets. With these benefits, using ReL in DCCS to monitor the devices can improve performance, increase utilization efficiency, approach zero downtime, and so on. In this context, this paper discusses the promising role of ReL for DCCS in terms of different aspects, including device condition monitoring, predictions, and management of the systems. This paper also provides a list of ReL algorithms and their pitfalls, which helps DCCS by considering various constraints. In addition, this paper lists different challenges imposed on ReL when analyzing DCCS data. It also provides future research directions to make the systems autonomous, performing multiple tasks simultaneously with the help of other AI/ML approaches.
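
As a rough illustration of the representation-learning step described above, the following sketch compresses raw device telemetry into a low-dimensional representation using a PCA-style linear projection computed via SVD. The telemetry layout, metric count, and dimensionality are hypothetical placeholders, not taken from the paper.

```python
import numpy as np

# Hypothetical telemetry: rows = observations from edge/cloud devices,
# columns = raw metrics (CPU load, memory use, latency, error counts, ...).
rng = np.random.default_rng(0)
telemetry = rng.normal(size=(500, 12))

# Centre the data and learn a linear representation with truncated SVD (PCA).
mean = telemetry.mean(axis=0)
X = telemetry - mean
U, S, Vt = np.linalg.svd(X, full_matrices=False)

k = 3                                  # size of the learned representation
components = Vt[:k]                    # k principal directions

def represent(samples):
    """Map raw telemetry to the k-dimensional learned representation."""
    return (samples - mean) @ components.T

features = represent(telemetry)        # inputs for condition monitoring / prediction
print(features.shape)                  # (500, 3)
```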

7 citations

Journal ArticleDOI
TL;DR: DisBezant, as proposed in this paper, uses a credibility-based mechanism to resist Byzantine attacks on non-iid (not independent and identically distributed) datasets, which are usually gathered from heterogeneous ships.
Abstract: With the intelligentization of the Maritime Transportation System (MTS), Internet of Things (IoT) and machine learning technologies have been widely used to achieve intelligent control and route planning for ships. As an important branch of machine learning, federated learning is the first choice to train an accurate joint model without sharing ships' data directly. However, there are still many unsolved challenges while using federated learning in IoT-enabled MTS, such as privacy preservation and Byzantine attacks. To surmount the above challenges, a novel mechanism, namely DisBezant, is designed to achieve secure and Byzantine-robust federated learning in IoT-enabled MTS. Specifically, a credibility-based mechanism is proposed to resist Byzantine attacks on non-iid (not independent and identically distributed) datasets, which are usually gathered from heterogeneous ships. The credibility measures the trustworthiness of the knowledge uploaded by each ship and is updated based on the shared information in each epoch. Then, we design an efficient privacy-preserving gradient aggregation protocol based on a secure two-party computation protocol. With the help of a central server, we can accurately recognise the Byzantine attackers and update the global model parameters privately. Furthermore, we theoretically discuss the privacy preservation and efficiency of DisBezant. To verify its effectiveness, we evaluate DisBezant on three real datasets, and the results demonstrate that it efficiently and effectively achieves Byzantine-robust federated learning. Even when 40% of the participating nodes are Byzantine attackers, DisBezant can still recognise them and ensure accurate model training.
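
To make the credibility idea above concrete, here is a minimal, illustrative sketch (not the authors' protocol): the server scores each participant's gradient by its agreement with a credibility-weighted reference, updates per-participant credibility each epoch, and aggregates updates weighted by credibility. The real DisBezant additionally runs this aggregation inside a secure two-party computation so individual gradients stay private, and its exact scoring rule is not reproduced here.

```python
import numpy as np

def credibility_weighted_aggregate(updates, credibility, lr=0.5):
    """Aggregate client gradient updates, down-weighting suspicious clients.

    updates     : list of 1-D numpy arrays, one gradient vector per client
    credibility : 1-D array of current per-client trust scores
    lr          : how quickly credibility reacts to the latest round
    """
    U = np.stack(updates)
    # Reference direction: credibility-weighted mean of this round's updates.
    reference = np.average(U, axis=0, weights=credibility)

    # Score each client by cosine similarity to the reference (clipped to [0, 1]).
    norms = np.linalg.norm(U, axis=1) * (np.linalg.norm(reference) + 1e-12)
    agreement = np.clip((U @ reference) / (norms + 1e-12), 0.0, 1.0)

    # Update credibility as an exponential moving average of agreement.
    credibility = (1 - lr) * credibility + lr * agreement
    credibility = credibility / credibility.sum()

    # Byzantine-robust aggregate for this epoch.
    aggregate = np.average(U, axis=0, weights=credibility)
    return aggregate, credibility

# Toy example: 4 honest clients and 1 attacker sending an inverted gradient.
rng = np.random.default_rng(1)
true_grad = rng.normal(size=10)
updates = [true_grad + 0.1 * rng.normal(size=10) for _ in range(4)]
updates.append(-5.0 * true_grad)                   # Byzantine update
cred = np.full(5, 1 / 5)
agg, cred = credibility_weighted_aggregate(updates, cred)
print(np.round(cred, 3))                           # attacker's credibility drops
```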

6 citations

Journal ArticleDOI
TL;DR: In this paper, a general governance and sustainable architecture for distributed computing continuum systems (DCCS) is proposed, which reflects the human body's self-healing model. The proposed model has three stages: first, it analyzes system data to acquire knowledge; second, it leverages the knowledge to monitor and predict future conditions; and third, it takes further actions to autonomously solve any issue or to alert administrators.
Abstract: Distributed computing continuum systems (DCCS) make use of a vast number of computing devices to process data generated by edge devices such as the Internet of Things and sensor nodes. Besides performing computations, these devices also produce data including, for example, event logs, configuration files, and network management information. When these data are analyzed, we can learn more about the devices, such as their capabilities, processing efficiency, resource usage, and failure prediction. However, these data are available in different forms and have different attributes due to the highly heterogeneous nature of DCCS. The diversity of data poses various challenges which we discuss by relating them to big data, so that we can utilize the advantages of big data analytical tools. We enumerate several existing tools that can perform the monitoring task and also summarize their characteristics. Further, we provide a general governance and sustainable architecture for DCCS, which reflects the human body's self-healing model. The proposed model has three stages: first, it analyzes system data to acquire knowledge; second, it can leverage the knowledge to monitor and predict future conditions; and third, it takes further actions to autonomously solve any issue or to alert administrators. Thus, the DCCS model is designed to minimize the system's downtime while optimizing resource usage. A small set of data is used to illustrate the monitoring and prediction of the performance of a system through Bayesian network structure learning. Finally, we discuss the limitations of the governance and sustainability model, and we provide possible solutions to overcome them and make the system more efficient.
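
A toy sketch of the three-stage loop described above (acquire knowledge, monitor/predict, act), using simple per-metric baselines instead of the Bayesian network structure learning used in the paper; the metric names and remediation rule are hypothetical.

```python
from dataclasses import dataclass
from statistics import fmean, pstdev

@dataclass
class Knowledge:
    """Stage 1 output: per-metric baseline learned from historical system data."""
    mean: float
    std: float

def acquire_knowledge(history):
    """Stage 1: analyze logs/metrics to learn normal behaviour per metric."""
    return {m: Knowledge(fmean(v), pstdev(v) or 1.0) for m, v in history.items()}

def predict_issue(sample, knowledge, threshold=3.0):
    """Stage 2: flag metrics whose latest reading deviates strongly from baseline."""
    return [m for m, x in sample.items()
            if abs(x - knowledge[m].mean) / knowledge[m].std > threshold]

def act(anomalies):
    """Stage 3: try an autonomous remediation, otherwise alert administrators."""
    for metric in anomalies:
        if metric == "cpu_load":           # hypothetical remediation rule
            print("scaling out workers to reduce cpu_load")
        else:
            print(f"alerting administrators about {metric}")

# Hypothetical monitoring data for two metrics.
history = {"cpu_load": [0.4, 0.5, 0.45, 0.5], "disk_errors": [0, 1, 0, 0]}
knowledge = acquire_knowledge(history)
act(predict_issue({"cpu_load": 0.97, "disk_errors": 0}, knowledge))
```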

3 citations

References
Proceedings ArticleDOI
22 May 2017
TL;DR: This work quantitatively investigates how machine learning models leak information about the individual data records on which they were trained and empirically evaluates the inference techniques on classification models trained by commercial "machine learning as a service" providers such as Google and Amazon.
Abstract: We quantitatively investigate how machine learning models leak information about the individual data records on which they were trained. We focus on the basic membership inference attack: given a data record and black-box access to a model, determine if the record was in the model's training dataset. To perform membership inference against a target model, we make adversarial use of machine learning and train our own inference model to recognize differences in the target model's predictions on the inputs that it trained on versus the inputs that it did not train on. We empirically evaluate our inference techniques on classification models trained by commercial "machine learning as a service" providers such as Google and Amazon. Using realistic datasets and classification tasks, including a hospital discharge dataset whose membership is sensitive from the privacy perspective, we show that these models can be vulnerable to membership inference attacks. We then investigate the factors that influence this leakage and evaluate mitigation strategies.
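
A simplified sketch of the shadow-model membership inference idea described above: a shadow model with known membership labels provides training data for an attack classifier, which is then applied to the target model's prediction vectors. Unlike the paper, this toy version uses a single shadow model, synthetic data, and no per-class attack models.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=4000, n_features=20, n_informative=8,
                           n_classes=2, random_state=0)

def split(X, y, n):
    """Return (member, non-member) subsets of size n each."""
    idx = rng.permutation(len(y))
    return (X[idx[:n]], y[idx[:n]]), (X[idx[n:2 * n]], y[idx[n:2 * n]])

# Shadow model: we know which records it was trained on, so we can label its
# prediction vectors as "member" vs "non-member" and fit an attack model.
(shadow_in, shadow_in_y), (shadow_out, _) = split(X[:2000], y[:2000], 800)
shadow = MLPClassifier(hidden_layer_sizes=(32,), max_iter=400,
                       random_state=0).fit(shadow_in, shadow_in_y)
attack_X = np.vstack([shadow.predict_proba(shadow_in), shadow.predict_proba(shadow_out)])
attack_y = np.concatenate([np.ones(len(shadow_in)), np.zeros(len(shadow_out))])
attack = LogisticRegression().fit(np.sort(attack_X, axis=1), attack_y)

# Target model: black-box access only; query it and feed its confidence
# vectors to the attack model to guess membership.
(target_in, target_in_y), (target_out, _) = split(X[2000:], y[2000:], 800)
target = MLPClassifier(hidden_layer_sizes=(32,), max_iter=400,
                       random_state=1).fit(target_in, target_in_y)
guess_in = attack.predict(np.sort(target.predict_proba(target_in), axis=1))
guess_out = attack.predict(np.sort(target.predict_proba(target_out), axis=1))
print("attack accuracy:", (guess_in.mean() + (1 - guess_out).mean()) / 2)
```

Attack accuracy noticeably above 0.5 indicates membership leakage; how much leakage appears depends mainly on how strongly the target model overfits its training data.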

2,059 citations

Proceedings ArticleDOI
12 Oct 2015
TL;DR: This paper presents a practical system that enables multiple parties to jointly learn an accurate neural-network model for a given objective without sharing their input datasets, and exploits the fact that the optimization algorithms used in modern deep learning, namely, those based on stochastic gradient descent, can be parallelized and executed asynchronously.
Abstract: Deep learning based on artificial neural networks is a very popular approach to modeling, classifying, and recognizing complex data such as images, speech, and text. The unprecedented accuracy of deep learning methods has turned them into the foundation of new AI-based services on the Internet. Commercial companies that collect user data on a large scale have been the main beneficiaries of this trend since the success of deep learning techniques is directly proportional to the amount of data available for training. Massive data collection required for deep learning presents obvious privacy issues. Users' personal, highly sensitive data such as photos and voice recordings is kept indefinitely by the companies that collect it. Users can neither delete it, nor restrict the purposes for which it is used. Furthermore, centrally kept data is subject to legal subpoenas and extra-judicial surveillance. Many data owners--for example, medical institutions that may want to apply deep learning methods to clinical records--are prevented by privacy and confidentiality concerns from sharing the data and thus benefitting from large-scale deep learning. In this paper, we design, implement, and evaluate a practical system that enables multiple parties to jointly learn an accurate neural-network model for a given objective without sharing their input datasets. We exploit the fact that the optimization algorithms used in modern deep learning, namely, those based on stochastic gradient descent, can be parallelized and executed asynchronously. Our system lets participants train independently on their own datasets and selectively share small subsets of their models' key parameters during training. This offers an attractive point in the utility/privacy tradeoff space: participants preserve the privacy of their respective data while still benefitting from other participants' models, and thus boost their learning accuracy beyond what is achievable solely on their own inputs. We demonstrate the accuracy of our privacy-preserving deep learning on benchmark datasets.
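
A minimal sketch of the selective parameter sharing loop described above: each participant computes a local gradient on its own data and uploads only its largest-magnitude coordinates to the server. The local gradient here is a random stand-in, and the full system also supports per-parameter thresholds and differential-privacy noise that this sketch omits.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 1000                 # total number of model parameters (hypothetical)
SHARE_FRACTION = 0.1       # each participant uploads only its top 10% of gradients

global_params = np.zeros(DIM)

def local_gradient(params, data):
    """Placeholder for one local SGD step's gradient on a participant's own data."""
    return rng.normal(size=params.shape) * data          # illustrative only

def select_top_gradients(grad, fraction):
    """Keep only the largest-magnitude gradient coordinates; zero out the rest."""
    k = int(len(grad) * fraction)
    keep = np.argsort(np.abs(grad))[-k:]
    sparse = np.zeros_like(grad)
    sparse[keep] = grad[keep]
    return sparse

# Each round: participants download current parameters, compute gradients on
# private data, and upload only a small, selected subset of coordinates.
participants_data = [1.0, 0.8, 1.2]
for _ in range(5):
    for data in participants_data:
        grad = local_gradient(global_params, data)
        shared = select_top_gradients(grad, SHARE_FRACTION)
        global_params -= 0.01 * shared        # asynchronous-style server update
```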

1,836 citations

Proceedings ArticleDOI
19 May 2019
TL;DR: In this article, passive and active inference attacks are proposed to exploit the leakage of information about participants' training data in federated learning, where each participant can infer the presence of exact data points and properties that hold only for a subset of the training data and are independent of the properties of the joint model.
Abstract: Collaborative machine learning and related techniques such as federated learning allow multiple participants, each with his own training dataset, to build a joint model by training locally and periodically exchanging model updates. We demonstrate that these updates leak unintended information about participants' training data and develop passive and active inference attacks to exploit this leakage. First, we show that an adversarial participant can infer the presence of exact data points -- for example, specific locations -- in others' training data (i.e., membership inference). Then, we show how this adversary can infer properties that hold only for a subset of the training data and are independent of the properties that the joint model aims to capture. For example, he can infer when a specific person first appears in the photos used to train a binary gender classifier. We evaluate our attacks on a variety of tasks, datasets, and learning configurations, analyze their limitations, and discuss possible defenses.
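
A toy simulation of the passive inference attack described above: an adversarial participant labels "shadow" updates computed on auxiliary data with and without a sensitive property, trains a classifier on them, and applies it to updates observed during training. The mean-shift simulation of gradient signatures is purely illustrative.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def observed_update(has_property):
    """Stand-in for a model update observed during one federated round.

    In the attack, updates computed on batches with the sensitive property
    (e.g. photos containing a particular person) carry a slightly different
    gradient signature; here that is simulated with a small mean shift.
    """
    shift = 0.3 if has_property else 0.0
    return rng.normal(loc=shift, size=50)

# Passive attack: the adversarial participant builds labelled "shadow" updates
# from auxiliary data where it knows whether the property is present...
train_labels = rng.integers(0, 2, size=400)
train_updates = np.stack([observed_update(bool(l)) for l in train_labels])
property_classifier = LogisticRegression(max_iter=1000).fit(train_updates, train_labels)

# ...then applies the classifier to updates actually observed during training
# to infer the property of other participants' private batches.
test_labels = rng.integers(0, 2, size=100)
test_updates = np.stack([observed_update(bool(l)) for l in test_labels])
print("inferred property accuracy:",
      property_classifier.score(test_updates, test_labels))
```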

1,084 citations

Journal ArticleDOI
TL;DR: This work proposes a new method, objective perturbation, for privacy-preserving machine learning algorithm design, and shows, both theoretically and empirically, that this method is superior to the previous state-of-the-art, output perturbation, in managing the inherent tradeoff between privacy and learning performance.
Abstract: Privacy-preserving machine learning algorithms are crucial for the increasingly common setting in which personal data, such as medical or financial records, are analyzed. We provide general techniques to produce privacy-preserving approximations of classifiers learned via (regularized) empirical risk minimization (ERM). These algorithms are private under the ε-differential privacy definition due to Dwork et al. (2006). First, we apply the output perturbation ideas of Dwork et al. (2006) to ERM classification. Then we propose a new method, objective perturbation, for privacy-preserving machine learning algorithm design. This method entails perturbing the objective function before optimizing over classifiers. If the loss and regularizer satisfy certain convexity and differentiability criteria, we prove theoretical results showing that our algorithms preserve privacy, and provide generalization bounds for linear and nonlinear kernels. We further present a privacy-preserving technique for tuning the parameters in general machine learning algorithms, thereby providing end-to-end privacy guarantees for the training process. We apply these results to produce privacy-preserving analogues of regularized logistic regression and support vector machines. We obtain encouraging results from evaluating their performance on real demographic and benchmark data sets. Our results show that both theoretically and empirically, objective perturbation is superior to the previous state-of-the-art, output perturbation, in managing the inherent tradeoff between privacy and learning performance.
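
A rough sketch of objective perturbation for regularized logistic regression: a random linear term b·w/n is added to the objective before optimization, with b drawn from a density proportional to exp(-ε‖b‖/2). This omits the paper's adjustment of ε for the regularizer's strong convexity and the norm bounds needed for the formal guarantee, so it is illustrative rather than a compliant implementation.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

def logistic_loss(w, X, y, lam, b):
    """Regularized logistic-regression objective plus a linear noise term b·w/n."""
    n = len(y)
    margins = y * (X @ w)
    return (np.mean(np.log1p(np.exp(-margins)))
            + 0.5 * lam * w @ w
            + (b @ w) / n)

def objective_perturbation(X, y, lam, eps):
    """Sketch of differentially private ERM via objective perturbation."""
    d = X.shape[1]
    # Noise vector: uniformly random direction, norm ~ Gamma(d, 2/eps),
    # i.e. density proportional to exp(-eps * ||b|| / 2).
    direction = rng.normal(size=d)
    direction /= np.linalg.norm(direction)
    b = rng.gamma(shape=d, scale=2.0 / eps) * direction
    w0 = np.zeros(d)
    return minimize(logistic_loss, w0, args=(X, y, lam, b), method="L-BFGS-B").x

# Toy data with labels in {-1, +1}; features are norm-bounded, as the privacy
# analysis requires.
X = rng.normal(size=(200, 5))
X /= np.maximum(1.0, np.linalg.norm(X, axis=1, keepdims=True))
y = np.sign(X @ np.array([1.0, -2.0, 0.5, 0.0, 1.5]) + 0.1 * rng.normal(size=200))
w_priv = objective_perturbation(X, y, lam=0.1, eps=1.0)
print(np.round(w_priv, 3))
```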

1,057 citations

Journal ArticleDOI
TL;DR: This paper presents a privacy-preserving deep learning system in which many learning participants perform neural network-based deep learning over a combined dataset of all, without revealing the participant's identity.
Abstract: We present a privacy-preserving deep learning system in which many learning participants perform neural network-based deep learning over a combined dataset of all, without revealing the participant...

766 citations