Author

Noman Mohammed

Bio: Noman Mohammed is an academic researcher from the University of Manitoba. The author has contributed to research in the topics of differential privacy and encryption, has an h-index of 22, and has co-authored 77 publications receiving 2,376 citations. Previous affiliations of Noman Mohammed include Concordia University Wisconsin & McGill University.


Papers
Proceedings ArticleDOI
21 Aug 2011
TL;DR: This paper proposes the first anonymization algorithm for the non-interactive setting based on the generalization technique, which first probabilistically generalizes the raw data and then adds noise to guarantee ε-differential privacy.
Abstract: Privacy-preserving data publishing addresses the problem of disclosing sensitive data when mining for useful information. Among the existing privacy models, ε-differential privacy provides one of the strongest privacy guarantees and makes no assumptions about an adversary's background knowledge. Most of the existing solutions that ensure ε-differential privacy are based on an interactive model, where the data miner is only allowed to pose aggregate queries to the database. In this paper, we propose the first anonymization algorithm for the non-interactive setting based on the generalization technique. The proposed solution first probabilistically generalizes the raw data and then adds noise to guarantee ε-differential privacy. As a sample application, we show that the anonymized data can be used effectively to build a decision tree induction classifier. Experimental results demonstrate that the proposed non-interactive anonymization algorithm is scalable and performs better than the existing solutions for classification analysis.
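To make the non-interactive release idea concrete, here is a minimal sketch of one of its building blocks: counting records after generalization and perturbing each count with Laplace noise calibrated to ε. This is not the authors' full algorithm; the generalization rule, attribute names, and ε value below are illustrative assumptions.

```python
import math
import random
from collections import Counter

def laplace_noise(scale: float) -> float:
    """Sample zero-mean Laplace noise via the inverse-CDF method."""
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def generalize(record: dict) -> tuple:
    """Hypothetical coarsening: bucket age into decades, keep education as-is."""
    return (record["age"] // 10 * 10, record["education"])

def noisy_generalized_counts(records, epsilon: float) -> dict:
    """Count records per generalized group, then add Laplace(1/epsilon) noise
    (the sensitivity of a count is 1 when a single record may change)."""
    counts = Counter(generalize(r) for r in records)
    return {group: c + laplace_noise(1.0 / epsilon) for group, c in counts.items()}

if __name__ == "__main__":
    data = [{"age": 34, "education": "Bachelors"},
            {"age": 37, "education": "Bachelors"},
            {"age": 52, "education": "Masters"}]
    print(noisy_generalized_counts(data, epsilon=1.0))
```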

372 citations

Journal ArticleDOI
01 Aug 2011
TL;DR: It is demonstrated that set-valued data can be efficiently released under differential privacy with guaranteed utility using context-free taxonomy trees, and a probabilistic top-down partitioning algorithm is proposed to generate a differentially private release that scales linearly with the input data size.
Abstract: Set-valued data provides enormous opportunities for various data mining tasks. In this paper, we study the problem of publishing set-valued data for data mining tasks under the rigorous differential privacy model. All existing data publishing methods for set-valued data are based on partition-based privacy models, for example k-anonymity, which are vulnerable to privacy attacks based on background knowledge. In contrast, differential privacy provides strong privacy guarantees independent of an adversary's background knowledge and computational power. Existing data publishing approaches for differential privacy, however, are not adequate in terms of both utility and scalability in the context of set-valued data due to its high dimensionality. We demonstrate that set-valued data can be efficiently released under differential privacy with guaranteed utility with the help of context-free taxonomy trees. We propose a probabilistic top-down partitioning algorithm to generate a differentially private release, which scales linearly with the input data size. We also discuss the applicability of our idea to the context of relational data. We prove that our result is (ε, δ)-useful for the class of counting queries, the foundation of many data mining tasks. We show that our approach maintains high utility for counting queries and frequent itemset mining and scales to large datasets through extensive experiments on real-life set-valued datasets.
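A rough illustration of taxonomy-guided, top-down partitioning with noisy counts follows. It is not the paper's exact algorithm or privacy-budget allocation; the toy taxonomy, threshold, and per-level budget are assumptions.

```python
import math
import random

def laplace_noise(scale: float) -> float:
    """Sample zero-mean Laplace noise via the inverse-CDF method."""
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

# Toy context-free taxonomy: an internal node maps to its children; leaves are items.
TAXONOMY = {
    "ALL": ["FOOD", "DRINK"],
    "FOOD": ["bread", "milk"],
    "DRINK": ["tea", "coffee"],
}

def covered_items(node) -> set:
    """Leaf items reachable from a taxonomy node."""
    if node not in TAXONOMY:
        return {node}
    items = set()
    for child in TAXONOMY[node]:
        items |= covered_items(child)
    return items

def partition(transactions, node, epsilon_per_level, threshold, release):
    """Publish a noisy count of transactions containing any item under `node`;
    expand the children only if the noisy count looks large enough."""
    items = covered_items(node)
    sub = [t for t in transactions if t & items]
    noisy = len(sub) + laplace_noise(1.0 / epsilon_per_level)
    release[node] = noisy
    if noisy >= threshold and node in TAXONOMY:
        for child in TAXONOMY[node]:
            partition(sub, child, epsilon_per_level, threshold, release)

if __name__ == "__main__":
    data = [{"bread", "tea"}, {"milk"}, {"coffee", "bread"}]
    out = {}
    partition(data, "ALL", epsilon_per_level=0.5, threshold=1.0, release=out)
    print(out)
```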

238 citations

Journal ArticleDOI
TL;DR: This is the first paper to introduce local suppression to achieve a tailored privacy model for trajectory data anonymization; compared with previous work in the literature, the proposed local suppression method can significantly improve data utility in anonymized trajectory data.
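A toy sketch contrasting global and local suppression on trajectories appears below. The violating (location, time) doublet and the set of violating records are assumed inputs; the paper's algorithm for identifying them is not reproduced here.

```python
def global_suppression(trajectories, item):
    """Remove the (location, time) doublet from every trajectory."""
    return [[p for p in t if p != item] for t in trajectories]

def local_suppression(trajectories, item, violating_ids):
    """Remove the doublet only from trajectories flagged as violating,
    preserving it (and thus utility) everywhere else."""
    return [[p for p in t if p != item] if i in violating_ids else list(t)
            for i, t in enumerate(trajectories)]

if __name__ == "__main__":
    trajs = [[("a", 1), ("b", 2)], [("a", 1), ("c", 3)], [("b", 2), ("c", 3)]]
    print(global_suppression(trajs, ("a", 1)))
    print(local_suppression(trajs, ("a", 1), violating_ids={0}))
```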

230 citations

Proceedings ArticleDOI
01 Feb 2018
TL;DR: A CNN-based architecture for classifying malware samples is proposed that outperforms the state of the art on two challenging malware classification datasets, Malimg and Microsoft Malware.
Abstract: In this paper, we propose a deep learning framework for malware classification. There has been a huge increase in the volume of malware in recent years, which poses a serious security threat to financial institutions, businesses, and individuals. In order to combat the proliferation of malware, new strategies are essential to quickly identify and classify malware samples so that their behavior can be analyzed. Machine learning approaches are becoming popular for classifying malware; however, most of the existing machine learning methods for malware classification use shallow learning algorithms (e.g., SVM). Recently, Convolutional Neural Networks (CNNs), a deep learning approach, have shown superior performance compared to traditional learning algorithms, especially in tasks such as image classification. Motivated by this success, we propose a CNN-based architecture to classify malware samples. We convert malware binaries to grayscale images and subsequently train a CNN for classification. Experiments on two challenging malware classification datasets, Malimg and Microsoft Malware, demonstrate that our method outperforms the state of the art, achieving 98.52% and 99.97% accuracy on the Malimg and Microsoft datasets, respectively.
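A hedged sketch of the binary-to-grayscale-image step together with a small CNN (in PyTorch) is shown below. The image size, layer widths, and classifier are illustrative assumptions, not the architecture reported in the paper.

```python
import numpy as np
import torch
import torch.nn as nn

def binary_to_image(path: str, width: int = 256, height: int = 256) -> np.ndarray:
    """Read raw bytes, treat each byte as a pixel intensity (0-255), and
    pad or truncate to a fixed width x height canvas."""
    raw = np.fromfile(path, dtype=np.uint8)[: width * height]
    canvas = np.zeros(width * height, dtype=np.uint8)
    canvas[: raw.size] = raw
    return canvas.reshape(height, width)

class MalwareCNN(nn.Module):
    """Minimal convolutional classifier over one-channel byte images."""
    def __init__(self, num_classes: int = 25):  # Malimg has 25 malware families
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 64 * 64, num_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

if __name__ == "__main__":
    img = torch.rand(1, 1, 256, 256)   # stand-in for a converted byte image
    print(MalwareCNN()(img).shape)     # -> torch.Size([1, 25])
```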

222 citations

01 Jan 2018
TL;DR: The experimental results show that differentially private deep models may keep their promise to provide privacy protection against strong adversaries only by offering poor model utility, while exhibiting moderate vulnerability to the membership inference attack when they offer an acceptable utility.
Abstract: The unprecedented success of deep learning is largely dependent on the availability of massive amounts of training data. In many cases, these data are crowd-sourced and may contain sensitive and confidential information, and therefore pose privacy concerns. As a result, privacy-preserving deep learning has been attracting increasing attention. One of the promising approaches for privacy-preserving deep learning is to employ differential privacy during model training, which aims to prevent the leakage of sensitive information about the training data via the trained model. While these models are considered to be immune to privacy attacks, with the advent of recent and sophisticated attack models, it is not clear how well they trade off utility for privacy. In this paper, we systematically study the impact of a sophisticated machine learning based privacy attack, the membership inference attack, against a state-of-the-art differentially private deep model. More specifically, given a differentially private deep model with its associated utility, we investigate how much we can infer about the model's training data. Our experimental results show that differentially private deep models may keep their promise to provide privacy protection against strong adversaries only by offering poor model utility, while exhibiting moderate vulnerability to the membership inference attack when they offer an acceptable utility. For our experiments, we use the CIFAR-10 and MNIST datasets and the corresponding classification tasks.
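The attack studied in the paper is a learning-based membership inference attack; the snippet below is only a minimal confidence-threshold variant that conveys the intuition (models tend to be more confident on examples they were trained on). The threshold and the fake softmax outputs are assumptions.

```python
import numpy as np

def confidence_attack(model_probs: np.ndarray, threshold: float = 0.9) -> np.ndarray:
    """Predict 'member' (True) when the model's top softmax probability on an
    example exceeds the threshold, 'non-member' (False) otherwise."""
    return model_probs.max(axis=1) > threshold

if __name__ == "__main__":
    # Fake softmax outputs: the first two rows mimic confident (training) examples.
    probs = np.array([[0.97, 0.02, 0.01],
                      [0.95, 0.03, 0.02],
                      [0.40, 0.35, 0.25]])
    print(confidence_attack(probs))  # -> [ True  True False]
```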

182 citations


Cited by

Journal ArticleDOI
TL;DR: This survey will systematically summarize and evaluate different approaches to PPDP, study the challenges in practical data publishing, clarify the differences and requirements that distinguish PPDP from other related problems, and propose future research directions.
Abstract: The collection of digital information by governments, corporations, and individuals has created tremendous opportunities for knowledge- and information-based decision making. Driven by mutual benefits, or by regulations that require certain data to be published, there is a demand for the exchange and publication of data among various parties. Data in its original form, however, typically contains sensitive information about individuals, and publishing such data will violate individual privacy. The current practice in data publishing relies mainly on policies and guidelines as to what types of data can be published and on agreements on the use of published data. This approach alone may lead to excessive data distortion or insufficient protection. Privacy-preserving data publishing (PPDP) provides methods and tools for publishing useful information while preserving data privacy. Recently, PPDP has received considerable attention in research communities, and many approaches have been proposed for different data publishing scenarios. In this survey, we will systematically summarize and evaluate different approaches to PPDP, study the challenges in practical data publishing, clarify the differences and requirements that distinguish PPDP from other related problems, and propose future research directions.

1,669 citations

Journal ArticleDOI
TL;DR: An in-depth review of rare event detection from an imbalanced learning perspective and a comprehensive taxonomy of the existing application domains of imbalanced learning are provided.
Highlights: 527 articles related to imbalanced data and rare events are reviewed; reviewed papers are examined from both technical and practical perspectives; existing methods and corresponding statistics are summarized under a new taxonomy; 162 application papers are categorized into 13 domains and introduced; and some open questions are discussed at the end of the manuscript.
Abstract: Rare events, especially those that could potentially negatively impact society, often require human decision-making responses. Detecting rare events can be viewed as a prediction task in the data mining and machine learning communities. As these events are rarely observed in daily life, the prediction task suffers from a lack of balanced data. In this paper, we provide an in-depth review of rare event detection from an imbalanced learning perspective. Five hundred and seventeen related papers that have been published in the past decade were collected for the study. The initial statistics suggested that rare event detection and imbalanced learning are of concern across a wide range of research areas, from management science to engineering. We reviewed all collected papers from both a technical and a practical point of view. Modeling methods discussed include techniques such as data preprocessing, classification algorithms, and model evaluation. For applications, we first provide a comprehensive taxonomy of the existing application domains of imbalanced learning, and then we detail the applications for each category. Finally, some suggestions from the reviewed papers are incorporated with our experiences and judgments to offer further research directions for the imbalanced learning and rare event detection fields.
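As one concrete example of the data preprocessing techniques such reviews cover, here is a minimal sketch of naive random oversampling of the minority class; the tiny dataset is made up for illustration.

```python
import random
from collections import Counter

def random_oversample(X, y, seed: int = 0):
    """Duplicate minority-class examples at random until all classes match
    the size of the largest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        idx = [i for i, lab in enumerate(y) if lab == label]
        for _ in range(target - n):
            j = rng.choice(idx)
            X_out.append(X[j])
            y_out.append(label)
    return X_out, y_out

if __name__ == "__main__":
    X = [[0.1], [0.2], [0.3], [0.9]]
    y = [0, 0, 0, 1]                  # class 1 is the rare event
    Xb, yb = random_oversample(X, y)
    print(Counter(yb))                # -> Counter({0: 3, 1: 3})
```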

1,448 citations