scispace - formally typeset
Search or ask a question

Showing papers in "arXiv: Cryptography and Security in 2016"


Posted Content
TL;DR: New transferability attacks between previously unexplored (substitute, victim) pairs of machine learning model classes, most notably SVMs and decision trees are introduced.
Abstract: Many machine learning models are vulnerable to adversarial examples: inputs that are specially crafted to cause a machine learning model to produce an incorrect output Adversarial examples that affect one model often affect another model, even if the two models have different architectures or were trained on different training sets, so long as both models were trained to perform the same task An attacker may therefore train their own substitute model, craft adversarial examples against the substitute, and transfer them to a victim model, with very little information about the victim Recent work has further developed a technique that uses the victim model as an oracle to label a synthetic training set for the substitute, so the attacker need not even collect a training set to mount the attack We extend these recent techniques using reservoir sampling to greatly enhance the efficiency of the training procedure for the substitute model We introduce new transferability attacks between previously unexplored (substitute, victim) pairs of machine learning model classes, most notably SVMs and decision trees We demonstrate our attacks on two commercial machine learning classification systems from Amazon (9619% misclassification rate) and Google (8894%) using only 800 queries of the victim model, thereby showing that existing machine learning approaches are in general vulnerable to systematic black-box attacks regardless of their structure

1,481 citations


Posted Content
TL;DR: In this paper, a membership inference attack is proposed to determine if a record was in the training dataset of a black-box machine learning model using a black box access to the model.
Abstract: We quantitatively investigate how machine learning models leak information about the individual data records on which they were trained. We focus on the basic membership inference attack: given a data record and black-box access to a model, determine if the record was in the model's training dataset. To perform membership inference against a target model, we make adversarial use of machine learning and train our own inference model to recognize differences in the target model's predictions on the inputs that it trained on versus the inputs that it did not train on. We empirically evaluate our inference techniques on classification models trained by commercial "machine learning as a service" providers such as Google and Amazon. Using realistic datasets and classification tasks, including a hospital discharge dataset whose membership is sensitive from the privacy perspective, we show that these models can be vulnerable to membership inference attacks. We then investigate the factors that influence this leakage and evaluate mitigation strategies.

1,030 citations


Posted Content
TL;DR: In this article, the authors investigate model extraction attacks in ML-as-a-service (ML-aaS) systems and show that an adversary with black-box access, but no prior knowledge of an ML model's parameters or training data, aims to duplicate the functionality of (i.e., "steal") the model.
Abstract: Machine learning (ML) models may be deemed confidential due to their sensitive training data, commercial value, or use in security applications. Increasingly often, confidential ML models are being deployed with publicly accessible query interfaces. ML-as-a-service ("predictive analytics") systems are an example: Some allow users to train models on potentially sensitive data and charge others for access on a pay-per-query basis. The tension between model confidentiality and public access motivates our investigation of model extraction attacks. In such attacks, an adversary with black-box access, but no prior knowledge of an ML model's parameters or training data, aims to duplicate the functionality of (i.e., "steal") the model. Unlike in classical learning theory settings, ML-as-a-service offerings may accept partial feature vectors as inputs and include confidence values with predictions. Given these practices, we show simple, efficient attacks that extract target ML models with near-perfect fidelity for popular model classes including logistic regression, neural networks, and decision trees. We demonstrate these attacks against the online services of BigML and Amazon Machine Learning. We further show that the natural countermeasure of omitting confidence values from model outputs still admits potentially harmful model extraction attacks. Our results highlight the need for careful ML model deployment and new model extraction countermeasures.

1,023 citations


Posted Content
TL;DR: In this article, a black-box attack strategy consists in training a local model to substitute for the target DNN, using inputs synthetically generated by an adversary and labeled by the targeted DNN.
Abstract: Machine learning (ML) models, e.g., deep neural networks (DNNs), are vulnerable to adversarial examples: malicious inputs modified to yield erroneous model outputs, while appearing unmodified to human observers. Potential attacks include having malicious content like malware identified as legitimate or controlling vehicle behavior. Yet, all existing adversarial example attacks require knowledge of either the model internals or its training data. We introduce the first practical demonstration of an attacker controlling a remotely hosted DNN with no such knowledge. Indeed, the only capability of our black-box adversary is to observe labels given by the DNN to chosen inputs. Our attack strategy consists in training a local model to substitute for the target DNN, using inputs synthetically generated by an adversary and labeled by the target DNN. We use the local substitute to craft adversarial examples, and find that they are misclassified by the targeted DNN. To perform a real-world and properly-blinded evaluation, we attack a DNN hosted by MetaMind, an online deep learning API. We find that their DNN misclassifies 84.24% of the adversarial examples crafted with our substitute. We demonstrate the general applicability of our strategy to many ML techniques by conducting the same attack against models hosted by Amazon and Google, using logistic regression substitutes. They yield adversarial examples misclassified by Amazon and Google at rates of 96.19% and 88.94%. We also find that this black-box attack strategy is capable of evading defense strategies previously found to make adversarial example crafting harder.

824 citations


Posted Content
TL;DR: This work introduces the first practical demonstration that cross-model transfer phenomenon enables attackers to control a remotely hosted DNN with no access to the model, its parameters, or its training data, and introduces the attack strategy of fitting a substitute model to the input-output pairs in this manner, then crafting adversarial examples based on this auxiliary model.
Abstract: Advances in deep learning have led to the broad adoption of Deep Neural Networks (DNNs) to a range of important machine learning problems, e.g., guiding autonomous vehicles, speech recognition, malware detection. Yet, machine learning models, including DNNs, were shown to be vulnerable to adversarial samples-subtly (and often humanly indistinguishably) modified malicious inputs crafted to compromise the integrity of their outputs. Adversarial examples thus enable adversaries to manipulate system behaviors. Potential attacks include attempts to control the behavior of vehicles, have spam content identified as legitimate content, or have malware identified as legitimate software. Adversarial examples are known to transfer from one model to another, even if the second model has a different architecture or was trained on a different set. We introduce the first practical demonstration that this cross-model transfer phenomenon enables attackers to control a remotely hosted DNN with no access to the model, its parameters, or its training data. In our demonstration, we only assume that the adversary can observe outputs from the target DNN given inputs chosen by the adversary. We introduce the attack strategy of fitting a substitute model to the input-output pairs in this manner, then crafting adversarial examples based on this auxiliary model. We evaluate the approach on existing DNN datasets and real-world settings. In one experiment, we force a DNN supported by MetaMind (one of the online APIs for DNN classifiers) to mis-classify inputs at a rate of 84.24%. We conclude with experiments exploring why adversarial samples transfer between DNNs, and a discussion on the applicability of our attack when targeting machine learning algorithms distinct from DNNs.

454 citations


Posted Content
TL;DR: This paper shows how to construct highly-effective adversarial sample crafting attacks for neural networks used as malware classifiers, and evaluates to which extent potential defensive mechanisms against adversarial crafting can be leveraged to the setting of malware classification.
Abstract: Deep neural networks, like many other machine learning models, have recently been shown to lack robustness against adversarially crafted inputs. These inputs are derived from regular inputs by minor yet carefully selected perturbations that deceive machine learning models into desired misclassifications. Existing work in this emerging field was largely specific to the domain of image classification, since the high-entropy of images can be conveniently manipulated without changing the images' overall visual appearance. Yet, it remains unclear how such attacks translate to more security-sensitive applications such as malware detection - which may pose significant challenges in sample generation and arguably grave consequences for failure. In this paper, we show how to construct highly-effective adversarial sample crafting attacks for neural networks used as malware classifiers. The application domain of malware classification introduces additional constraints in the adversarial sample crafting problem when compared to the computer vision domain: (i) continuous, differentiable input domains are replaced by discrete, often binary inputs; and (ii) the loose condition of leaving visual appearance unchanged is replaced by requiring equivalent functional behavior. We demonstrate the feasibility of these attacks on many different instances of malware classifiers that we trained using the DREBIN Android malware data set. We furthermore evaluate to which extent potential defensive mechanisms against adversarial crafting can be leveraged to the setting of malware classification. While feature reduction did not prove to have a positive impact, distillation and re-training on adversarially crafted samples show promising results.

386 citations


Proceedings ArticleDOI
TL;DR: In this article, a comprehensive threat model for ML, and categorize attacks and defenses within an adversarial framework are presented. And the authors formally explore the opposing relationship between model accuracy and resilience to adversarial manipulation.
Abstract: Advances in machine learning (ML) in recent years have enabled a dizzying array of applications such as data analytics, autonomous systems, and security diagnostics. ML is now pervasive---new systems and models are being deployed in every domain imaginable, leading to rapid and widespread deployment of software based inference and decision making. There is growing recognition that ML exposes new vulnerabilities in software systems, yet the technical community's understanding of the nature and extent of these vulnerabilities remains limited. We systematize recent findings on ML security and privacy, focusing on attacks identified on these systems and defenses crafted to date. We articulate a comprehensive threat model for ML, and categorize attacks and defenses within an adversarial framework. Key insights resulting from works both in the ML and security communities are identified and the effectiveness of approaches are related to structural elements of ML algorithms and the data used to train them. We conclude by formally exploring the opposing relationship between model accuracy and resilience to adversarial manipulation. Through these explorations, we show that there are (possibly unavoidable) tensions between model complexity, accuracy, and resilience that must be calibrated for the environments in which they will be used.

339 citations


Posted Content
TL;DR: A new secure, private, and lightweight architecture for IoT, based on BC technology that eliminates the overhead of BC while maintaining most of its security and privacy benefits is proposed.
Abstract: The Internet of Things IoT is experiencing exponential growth in research and industry, but it still suffers from privacy and security vulnerabilities. Conventional security and privacy approaches tend to be inapplicable for IoT, mainly due to its decentralized topology and the resource-constraints of the majority of its devices. BlockChain BC that underpin the crypto-currency Bitcoin have been recently used to provide security and privacy in peer-to-peer networks with similar topologies to IoT. However, BCs are computationally expensive and involve high bandwidth overhead and delays, which are not suitable for IoT devices. This position paper proposes a new secure, private, and lightweight architecture for IoT, based on BC technology that eliminates the overhead of BC while maintaining most of its security and privacy benefits. The described method is investigated on a smart home application as a representative case study for broader IoT applications. The proposed architecture is hierarchical, and consists of smart homes, an overlay network and cloud storages coordinating data transactions with BC to provide privacy and security. Our design uses different types of BCs depending on where in the network hierarchy a transaction occurs, and uses distributed trust methods to ensure a decentralized topology. Qualitative evaluation of the architecture under common threat models highlights its effectiveness in providing security and privacy for IoT applications.

304 citations


Posted Content
TL;DR: It is shown that defensive distillation is not secure: it is no more resistant to targeted misclassification attacks than unprotected neural networks.
Abstract: We show that defensive distillation is not secure: it is no more resistant to targeted misclassification attacks than unprotected neural networks.

297 citations


Posted Content
TL;DR: This work considers training a deep neural network in the Federated Learning model, using distributed stochastic gradient descent across user-held training data on mobile devices, wherein Secure Aggregation protects each user's model gradient.
Abstract: Secure Aggregation protocols allow a collection of mutually distrust parties, each holding a private value, to collaboratively compute the sum of those values without revealing the values themselves. We consider training a deep neural network in the Federated Learning model, using distributed stochastic gradient descent across user-held training data on mobile devices, wherein Secure Aggregation protects each user's model gradient. We design a novel, communication-efficient Secure Aggregation protocol for high-dimensional data that tolerates up to 1/3 users failing to complete the protocol. For 16-bit input values, our protocol offers 1.73x communication expansion for $2^{10}$ users and $2^{20}$-dimensional vectors, and 1.98x expansion for $2^{14}$ users and $2^{24}$ dimensional vectors.

294 citations


Posted Content
TL;DR: It is demonstrated that defensive distillation does not significantly increase the robustness of neural networks, and three new attack algorithms are introduced that are successful on both distilled and undistilled neural networks with 100% probability are introduced.
Abstract: Neural networks provide state-of-the-art results for most machine learning tasks. Unfortunately, neural networks are vulnerable to adversarial examples: given an input $x$ and any target classification $t$, it is possible to find a new input $x'$ that is similar to $x$ but classified as $t$. This makes it difficult to apply neural networks in security-critical areas. Defensive distillation is a recently proposed approach that can take an arbitrary neural network, and increase its robustness, reducing the success rate of current attacks' ability to find adversarial examples from $95\%$ to $0.5\%$. In this paper, we demonstrate that defensive distillation does not significantly increase the robustness of neural networks by introducing three new attack algorithms that are successful on both distilled and undistilled neural networks with $100\%$ probability. Our attacks are tailored to three distance metrics used previously in the literature, and when compared to previous adversarial example generation algorithms, our attacks are often much more effective (and never worse). Furthermore, we propose using high-confidence adversarial examples in a simple transferability test we show can also be used to break defensive distillation. We hope our attacks will be used as a benchmark in future defense attempts to create neural networks that resist adversarial examples.

Posted Content
TL;DR: EldeRan, a machine learning approach for dynamically analysing and classifying ransomware, is presented, suggesting that dynamic analysis can support ransomware detection, since ransomware samples exhibit a set of characteristic features at run-time that are common across families, and that helps the early detection of new variants.
Abstract: Recent statistics show that in 2015 more than 140 millions new malware samples have been found. Among these, a large portion is due to ransomware, the class of malware whose specific goal is to render the victim's system unusable, in particular by encrypting important files, and then ask the user to pay a ransom to revert the damage. Several ransomware include sophisticated packing techniques, and are hence difficult to statically analyse. We present EldeRan, a machine learning approach for dynamically analysing and classifying ransomware. EldeRan monitors a set of actions performed by applications in their first phases of installation checking for characteristics signs of ransomware. Our tests over a dataset of 582 ransomware belonging to 11 families, and with 942 goodware applications, show that EldeRan achieves an area under the ROC curve of 0.995. Furthermore, EldeRan works without requiring that an entire ransomware family is available beforehand. These results suggest that dynamic analysis can support ransomware detection, since ransomware samples exhibit a set of characteristic features at run-time that are common across families, and that helps the early detection of new variants. We also outline some limitations of dynamic analysis for ransomware and propose possible solutions.

Proceedings ArticleDOI
TL;DR: This paper presents an effective approach to alleviate the problem of Android app marketplaces at risk of hosting malicious apps that could evade detection before being downloaded by unsuspecting users based on Bayesian classification models obtained from static code analysis.
Abstract: Mobile malware has been growing in scale and complexity as smartphone usage continues to rise. Android has surpassed other mobile platforms as the most popular whilst also witnessing a dramatic increase in malware targeting the platform. A worrying trend that is emerging is the increasing sophistication of Android malware to evade detection by traditional signature-based scanners. As such, Android app marketplaces remain at risk of hosting malicious apps that could evade detection before being downloaded by unsuspecting users. Hence, in this paper we present an effective approach to alleviate this problem based on Bayesian classification models obtained from static code analysis. The models are built from a collection of code and app characteristics that provide indicators of potential malicious activities. The models are evaluated with real malware samples in the wild and results of experiments are presented to demonstrate the effectiveness of the proposed approach.

Posted Content
TL;DR: A DGA classifier that leverages long short-term memory (LSTM) networks to predict DGAs and their respective families without the need for a priori feature extraction is presented.
Abstract: Various families of malware use domain generation algorithms (DGAs) to generate a large number of pseudo-random domain names to connect to a command and control (C&C) server. In order to block DGA C&C traffic, security organizations must first discover the algorithm by reverse engineering malware samples, then generating a list of domains for a given seed. The domains are then either preregistered or published in a DNS blacklist. This process is not only tedious, but can be readily circumvented by malware authors using a large number of seeds in algorithms with multivariate recurrence properties (e.g., banjori) or by using a dynamic list of seeds (e.g., bedep). Another technique to stop malware from using DGAs is to intercept DNS queries on a network and predict whether domains are DGA generated. Such a technique will alert network administrators to the presence of malware on their networks. In addition, if the predictor can also accurately predict the family of DGAs, then network administrators can also be alerted to the type of malware that is on their networks. This paper presents a DGA classifier that leverages long short-term memory (LSTM) networks to predict DGAs and their respective families without the need for a priori feature extraction. Results are significantly better than state-of-the-art techniques, providing 0.9993 area under the receiver operating characteristic curve for binary classification and a micro-averaged F1 score of 0.9906. In other terms, the LSTM technique can provide a 90% detection rate with a 1:10000 false positive (FP) rate---a twenty times FP improvement over comparable methods. Experiments in this paper are run on open datasets and code snippets are provided to reproduce the results.

Posted Content
TL;DR: It is demonstrated that the neural networks can learn how to perform forms of encryption and decryption, and also how to apply these operations selectively in order to meet confidentiality goals.
Abstract: We ask whether neural networks can learn to use secret keys to protect information from other neural networks Specifically, we focus on ensuring confidentiality properties in a multiagent system, and we specify those properties in terms of an adversary Thus, a system may consist of neural networks named Alice and Bob, and we aim to limit what a third neural network named Eve learns from eavesdropping on the communication between Alice and Bob We do not prescribe specific cryptographic algorithms to these neural networks; instead, we train end-to-end, adversarially We demonstrate that the neural networks can learn how to perform forms of encryption and decryption, and also how to apply these operations selectively in order to meet confidentiality goals

Journal ArticleDOI
TL;DR: In this paper, the authors re-evaluate the security of a typical image-scrambling encryption algorithm (ISEA) using the internal correlation remaining in the cipher image, and demonstrate that some advanced multimedia processing techniques can facilitate the cryptanalysis of multimedia encryption algorithms.
Abstract: Position scrambling (permutation) is widely used in multimedia encryption schemes and some international encryption standards, such as the Data Encryption Standard and the Advanced Encryption Standard. In this article, the authors re-evaluate the security of a typical image-scrambling encryption algorithm (ISEA). Using the internal correlation remaining in the cipher image, they disclose important visual information of the corresponding plain image in a ciphertext-only attack scenario. Furthermore, they found that the real scrambling domain--the position-scrambling scope of ISEA's scrambled elements--can be used to support an efficient known or chosen-plaintext attack on it. Detailed experimental results have verified these points and demonstrate that some advanced multimedia processing techniques can facilitate the cryptanalysis of multimedia encryption algorithms.

Posted Content
TL;DR: Algorand significantly enhances all applications based on a public ledger: payments, smart contracts, stock settlement, etc, but, for concreteness, it shall be described only as a money platform.
Abstract: Algorand is a truly decentralized, new, and secure way to manage a shared ledger. Unlike prior approaches based on {\em proof of work}, it requires a negligible amount of computation, and generates a transaction history that does not fork with overwhelmingly high probability. This approach cryptographically selects ---in a way that is provably immune from manipulations, unpredictable until the last minute, but ultimately universally clear--- a set of verifiers in charge of constructing a block of valid transactions. This approach applies to any way of implementing a shared ledger via a tamper-proof sequence of blocks, including traditional blockchains. This paper also presents more efficient alternatives to blockchains, which may be of independent interest. Algorand significantly enhances all applications based on a public ledger: payments, smart contracts, stock settlement, etc. But, for concreteness, we shall describe it only as a money platform.

Posted Content
TL;DR: A taxonomy and comparison of authentication protocols that are developed for the IoT in terms of network model, specific security goals, main processes, computation complexity, and communication overhead are provided.
Abstract: In this paper, we present a comprehensive survey of authentication protocols for Internet of Things (IoT). Specifically, we select and in-detail examine more than forty authentication protocols developed for or applied in the context of the IoT under four environments, including: (1) Machine to machine communications (M2M), (2) Internet of Vehicles (IoV), (3) Internet of Energy (IoE), and (4) Internet of Sensors (IoS). We start by reviewing all survey articles published in the recent years that focusing on different aspects of the IoT idea. Then, we review threat models, countermeasures, and formal security verification techniques used in authentication protocols for the IoT. In addition, we provide a taxonomy and comparison of authentication protocols for the IoT in form of tables in five terms, namely, network model, goals, main processes, computation complexity, and communication overhead. Based on the current survey, we identify open issues and suggest hints for future research.

Posted Content
TL;DR: It is shown how to train artificial neural networks to successfully identify faces and recognize objects and handwritten digits even if the images are protected using any of the above obfuscation techniques.
Abstract: We demonstrate that modern image recognition methods based on artificial neural networks can recover hidden information from images protected by various forms of obfuscation. The obfuscation techniques considered in this paper are mosaicing (also known as pixelation), blurring (as used by YouTube), and P3, a recently proposed system for privacy-preserving photo sharing that encrypts the significant JPEG coefficients to make images unrecognizable by humans. We empirically show how to train artificial neural networks to successfully identify faces and recognize objects and handwritten digits even if the images are protected using any of the above obfuscation techniques.

Book ChapterDOI
TL;DR: The Flush+Flush attack has a performance close to state-of-the-art side channels in existing cache attack scenarios, while reducing cache misses significantly below the border of detectability, in the first work discussing the stealthiness of cache attacks both from the attacker and the defender perspective.
Abstract: Research on cache attacks has shown that CPU caches leak significant information. Recent attacks either use the Flush+Reload technique on read-only shared memory or the Prime+Probe technique without shared memory, to derive encryption keys or eavesdrop on user input. Efficient countermeasures against these powerful attacks that do not cause a loss of performance are a challenge. In this paper, we use hardware performance counters as a means to detect access-based cache attacks. Indeed, existing attacks cause numerous cache references and cache misses and can subsequently be detected. We propose a new criteria that uses these events for ad-hoc detection. These findings motivate the development of a novel attack technique: the Flush+Flush attack. The Flush+Flush attack only relies on the execution time of the flush instruction, that depends on whether the data is cached or not. Like Flush+Reload, it monitors when a process loads read-only shared memory into the CPU cache. However, Flush+Flush does not have a reload step, thus causing no cache misses compared to typical Flush+Reload and Prime+Probe attacks. We show that the significantly lower impact on the hardware performance counters therefore evades detection mechanisms. The Flush+Flush attack has a performance close to state-of-the-art side channels in existing cache attack scenarios, while reducing cache misses significantly below the border of detectability. Our Flush+Flush covert channel achieves a transmission rate of 496KB/s which is 6.7 times faster than any previously published cache covert channel. To the best of our knowledge, this is the first work discussing the stealthiness of cache attacks both from the attacker and the defender perspective.

Journal ArticleDOI
TL;DR: In this paper, a new categorization system for side-channel attacks on mobile devices has been proposed, which allows to analyze sidechannel attacks systematically, and facilitates the development of novel countermeasures.
Abstract: Side-channel attacks on mobile devices have gained increasing attention since their introduction in 2007. While traditional side-channel attacks, such as power analysis attacks and electromagnetic analysis attacks, required physical presence of the attacker as well as expensive equipment, an (unprivileged) application is all it takes to exploit the leaking information on modern mobile devices. Given the vast amount of sensitive information that are stored on smartphones, the ramifications of side-channel attacks affect both the security and privacy of users and their devices. In this paper, we propose a new categorization system for side-channel attacks, which is necessary as side-channel attacks have evolved significantly since their scientific investigations during the smart card era in the 1990s. Our proposed classification system allows to analyze side-channel attacks systematically, and facilitates the development of novel countermeasures. Besides this new categorization system, the extensive survey of existing attacks and attack strategies provides valuable insights into the evolving field of side-channel attacks, especially when focusing on mobile devices. We conclude by discussing open issues and challenges in this context and outline possible future research directions.

Posted Content
TL;DR: ByzCoin this article leverages scalable collective signing to commit Bitcoin transactions irreversibly within seconds and achieves Byzantine consensus while preserving Bitcoin's open membership by dynamically forming hash power proportionate consensus groups that represent recently successful block miners.
Abstract: While showing great promise, Bitcoin requires users to wait tens of minutes for transactions to commit, and even then, offering only probabilistic guarantees. This paper introduces ByzCoin, a novel Byzantine consensus protocol that leverages scalable collective signing to commit Bitcoin transactions irreversibly within seconds. ByzCoin achieves Byzantine consensus while preserving Bitcoin's open membership by dynamically forming hash power-proportionate consensus groups that represent recently-successful block miners. ByzCoin employs communication trees to optimize transaction commitment and verification under normal operation while guaranteeing safety and liveness under Byzantine faults, up to a near-optimal tolerance of f faulty group members among 3f + 2 total. ByzCoin mitigates double spending and selfish mining attacks by producing collectively signed transaction blocks within one minute of transaction submission. Tree-structured communication further reduces this latency to less than 30 seconds. Due to these optimizations, ByzCoin achieves a throughput higher than PayPal currently handles, with a confirmation latency of 15-20 seconds.

Posted Content
TL;DR: In this survey, an extensive overview on honeypots is given, including not only honeypot software but also methodologies to analyse honeypot data.
Abstract: In this survey, we give an extensive overview on honeypots. This includes not only honeypot software but also methodologies to analyse honeypot data.

Journal ArticleDOI
TL;DR: In this article, the authors developed and analyzed proactive Machine Learning approaches based on Bayesian classification aimed at uncovering unknown Android malware via static analysis, which demonstrated detection capabilities with high accuracy.
Abstract: Mobile malware has been growing in scale and complexity spurred by the unabated uptake of smartphones worldwide. Android is fast becoming the most popular mobile platform resulting in sharp increase in malware targeting the platform. Additionally, Android malware is evolving rapidly to evade detection by traditional signature-based scanning. Despite current detection measures in place, timely discovery of new malware is still a critical issue. This calls for novel approaches to mitigate the growing threat of zero-day Android malware. Hence, in this paper we develop and analyze proactive Machine Learning approaches based on Bayesian classification aimed at uncovering unknown Android malware via static analysis. The study, which is based on a large malware sample set of majority of the existing families, demonstrates detection capabilities with high accuracy. Empirical results and comparative analysis are presented offering useful insight towards development of effective static-analytic Bayesian classification based solutions for detecting unknown Android malware.

Posted Content
TL;DR: MaMaDroid is presented, an Android malware detection system that relies on app behavior that builds a behavioral model, in the form of a Markov chain, from the sequence of abstracted API calls performed by an app, and uses it to extract features and perform classification.
Abstract: The rise in popularity of the Android platform has resulted in an explosion of malware threats targeting it. As both Android malware and the operating system itself constantly evolve, it is very challenging to design robust malware mitigation techniques that can operate for long periods of time without the need for modifications or costly re-training. In this paper, we present MaMaDroid, an Android malware detection system that relies on app behavior. MaMaDroid builds a behavioral model, in the form of a Markov chain, from the sequence of abstracted API calls performed by an app, and uses it to extract features and perform classification. By abstracting calls to their packages or families, MaMaDroid maintains resilience to API changes and keeps the feature set size manageable. We evaluate its accuracy on a dataset of 8.5K benign and 35.5K malicious apps collected over a period of six years, showing that it not only effectively detects malware (with up to 99% F-measure), but also that the model built by the system keeps its detection capabilities for long periods of time (on average, 86% and 75% F-measure, respectively, one and two years after training). Finally, we compare against DroidAPIMiner, a state-of-the-art system that relies on the frequency of API calls performed by apps, showing that MaMaDroid significantly outperforms it.

Posted Content
TL;DR: In this article, the authors demonstrate how a software can intentionally generate controlled electromagnetic emissions from the data bus of a USB connector, and also show that the emitted RF signals can be controlled and modulated with arbitrary binary data.
Abstract: In recent years researchers have demonstrated how attackers could use USB connectors implanted with RF transmitters to exfiltrate data from secure, and even air-gapped, computers (e.g., COTTONMOUTH in the leaked NSA ANT catalog). Such methods require a hardware modification of the USB plug or device, in which a dedicated RF transmitter is embedded. In this paper we present USBee, a software that can utilize an unmodified USB device connected to a computer as a RF transmitter. We demonstrate how a software can intentionally generate controlled electromagnetic emissions from the data bus of a USB connector. We also show that the emitted RF signals can be controlled and modulated with arbitrary binary data. We implement a prototype of USBee, and discuss its design and implementation details including signal generation and modulation. We evaluate the transmitter by building a receiver and demodulator using GNU Radio. Our evaluation shows that USBee can be used for transmitting binary data to a nearby receiver at a bandwidth of 20 to 80 BPS (bytes per second).

Journal ArticleDOI
TL;DR: FastBFT as mentioned in this paper is a fast and scalable Byzantine fault-tolerant protocol that combines hardware-based trusted execution environments (TEEs) with lightweight secret sharing primitives.
Abstract: The surging interest in blockchain technology has revitalized the search for effective Byzantine consensus schemes. In particular, the blockchain community has been looking for ways to effectively integrate traditional Byzantine fault-tolerant (BFT) protocols into a blockchain consensus layer allowing various financial institutions to securely agree on the order of transactions. However, existing BFT protocols can only scale to tens of nodes due to their $O(n^2)$ message complexity. In this paper, we propose FastBFT, a fast and scalable BFT protocol. At the heart of FastBFT is a novel message aggregation technique that combines hardware-based trusted execution environments (TEEs) with lightweight secret sharing primitives. Combining this technique with several other optimizations (i.e., optimistic execution, tree topology and failure detection), FastBFT achieves low latency and high throughput even for large scale networks. Via systematic analysis and experiments, we demonstrate that FastBFT has better scalability and performance than previous BFT protocols.

Posted Content
TL;DR: A novel ensemble method that blends multiple thresholding classifiers into a single one, making it possible to accumulate 'highly normal' sequences to remedy the issue of high false-alarm rates commonly arising in conventional methods.
Abstract: In computer security, designing a robust intrusion detection system is one of the most fundamental and important problems. In this paper, we propose a system-call language-modeling approach for designing anomaly-based host intrusion detection systems. To remedy the issue of high false-alarm rates commonly arising in conventional methods, we employ a novel ensemble method that blends multiple thresholding classifiers into a single one, making it possible to accumulate 'highly normal' sequences. The proposed system-call language model has various advantages leveraged by the fact that it can learn the semantic meaning and interactions of each system call that existing methods cannot effectively consider. Through diverse experiments on public benchmark datasets, we demonstrate the validity and effectiveness of the proposed method. Moreover, we show that our model possesses high portability, which is one of the key aspects of realizing successful intrusion detection systems.

Posted Content
TL;DR: This paper investigates adversarial input sequences for recurrent neural networks processing sequential data and shows that the classes of algorithms introduced previously to craft adversarial samples misclassified by feed-forward neural networks can be adapted to recurrent Neural networks.
Abstract: Machine learning models are frequently used to solve complex security problems, as well as to make decisions in sensitive situations like guiding autonomous vehicles or predicting financial market behaviors. Previous efforts have shown that numerous machine learning models were vulnerable to adversarial manipulations of their inputs taking the form of adversarial samples. Such inputs are crafted by adding carefully selected perturbations to legitimate inputs so as to force the machine learning model to misbehave, for instance by outputting a wrong class if the machine learning task of interest is classification. In fact, to the best of our knowledge, all previous work on adversarial samples crafting for neural network considered models used to solve classification tasks, most frequently in computer vision applications. In this paper, we contribute to the field of adversarial machine learning by investigating adversarial input sequences for recurrent neural networks processing sequential data. We show that the classes of algorithms introduced previously to craft adversarial samples misclassified by feed-forward neural networks can be adapted to recurrent neural networks. In a experiment, we show that adversaries can craft adversarial sequences misleading both categorical and sequential recurrent neural networks.

Posted Content
TL;DR: This paper carries out the first extensive formal analysis of the OAuth 2.0 standard in an expressive web model and shows that the fixed version of OAuth provides the authorization, authentication, and session integrity properties the authors specify.
Abstract: The OAuth 2.0 protocol is one of the most widely deployed authorization/single sign-on (SSO) protocols and also serves as the foundation for the new SSO standard OpenID Connect. Despite the popularity of OAuth, so far analysis efforts were mostly targeted at finding bugs in specific implementations and were based on formal models which abstract from many web features or did not provide a formal treatment at all. In this paper, we carry out the first extensive formal analysis of the OAuth 2.0 standard in an expressive web model. Our analysis aims at establishing strong authorization, authentication, and session integrity guarantees, for which we provide formal definitions. In our formal analysis, all four OAuth grant types (authorization code grant, implicit grant, resource owner password credentials grant, and the client credentials grant) are covered. They may even run simultaneously in the same and different relying parties and identity providers, where malicious relying parties, identity providers, and browsers are considered as well. Our modeling and analysis of the OAuth 2.0 standard assumes that security recommendations and best practices are followed, in order to avoid obvious and known attacks. When proving the security of OAuth in our model, we discovered four attacks which break the security of OAuth. The vulnerabilities can be exploited in practice and are present also in OpenID Connect. We propose fixes for the identified vulnerabilities, and then, for the first time, actually prove the security of OAuth in an expressive web model. In particular, we show that the fixed version of OAuth (with security recommendations and best practices in place) provides the authorization, authentication, and session integrity properties we specify.