scispace - formally typeset
Search or ask a question
Author

Roei Schuster

Other affiliations: Cornell University
Bio: Roei Schuster is an academic researcher from Tel Aviv University. The author has contributed to research in topics: Computer science & Stylometry. The author has an hindex of 11, co-authored 16 publications receiving 302 citations. Previous affiliations of Roei Schuster include Cornell University.

Papers
More filters
Proceedings Article
16 Aug 2017
TL;DR: It is demonstrated that this attack can be performed even by a Web attacker who does not directly observe the stream, e.g., a JavaScript ad confined in a Web browser on a nearby machine.
Abstract: The MPEG-DASH streaming video standard contains an information leak: even if the stream is encrypted, the segmentation prescribed by the standard causes content-dependent packet bursts. We show that many video streams are uniquely characterized by their burst patterns, and classifiers based on convolutional neural networks can accurately identify these patterns given very coarse network measurements. We demonstrate that this attack can be performed even by a Web attacker who does not directly observe the stream, e.g., a JavaScript ad confined in a Web browser on a nearby machine.

131 citations

Proceedings ArticleDOI
15 Oct 2018
TL;DR: This work designs and implements a new approach to IoT access control and introduces "environmental situation oracles'' (ESOs) as first-class objects in the IoT ecosystem, which reduces inefficiency, supports consistent enforcement of common policies, and reduces overprivileging.
Abstract: Access control in the Internet of Things (IoT) often depends on a situation --- for example, "the user is at home'' --- that can only be tracked using multiple devices In contrast to the (well-studied) smartphone frameworks, enforcement of situational constraints in the IoT poses new challenges because access control is fundamentally decentralized It takes place in multiple independent frameworks, subjects are often external to the enforcement system, and situation tracking requires cross-framework interaction and permissioning Existing IoT frameworks entangle access-control enforcement and situation tracking This results in overprivileged, redundant, inconsistent, and inflexible implementations We design and implement a new approach to IoT access control Our key innovation is to introduce "environmental situation oracles'' (ESOs) as first-class objects in the IoT ecosystem An ESO encapsulates the implementation of how a situation is sensed, inferred, or actuated IoT access-control frameworks can use ESOs to enforce situational constraints, but ESOs and frameworks remain oblivious to each other's implementation details A single ESO can be used by multiple access-control frameworks across the ecosystem This reduces inefficiency, supports consistent enforcement of common policies, and --- because ESOs encapsulate sensitive device-access rights --- reduces overprivileging ESOs can be deployed at any layer of the IoT software stack where access control is applied We implemented prototype ESOs for the IoT resource layer, based on the IoTivity framework, and for the IoT Web services, based on the Passport middleware

65 citations

Posted Content
TL;DR: This work quantifies the efficacy of targeted and untargeted data- and model-poisoning attacks against state-of-the-art autocompleters based on Pythia and GPT-2.
Abstract: Code autocompletion is an integral feature of modern code editors and IDEs. The latest generation of autocompleters uses neural language models, trained on public open-source code repositories, to suggest likely (not just statically feasible) completions given the current context. We demonstrate that neural code autocompleters are vulnerable to poisoning attacks. By adding a few specially-crafted files to the autocompleter's training corpus (data poisoning), or else by directly fine-tuning the autocompleter on these files (model poisoning), the attacker can influence its suggestions for attacker-chosen contexts. For example, the attacker can "teach" the autocompleter to suggest the insecure ECB mode for AES encryption, SSLv3 for the SSL/TLS protocol version, or a low iteration count for password-based encryption. Moreover, we show that these attacks can be targeted: an autocompleter poisoned by a targeted attack is much more likely to suggest the insecure completion for files from a specific repo or specific developer. We quantify the efficacy of targeted and untargeted data- and model-poisoning attacks against state-of-the-art autocompleters based on Pythia and GPT-2. We then evaluate existing defenses against poisoning attacks and show that they are largely ineffective.

51 citations

Journal ArticleDOI
TL;DR: Though stylometry can successfully prevent impersonation by identifying text provenance, it fails to distinguish legitimate LM applications from those that introduce false information, highlighting the need for non-stylometry approaches in detecting machine-generated misinformation.
Abstract: Recent developments in neural language models (LMs) have raised concerns about their potential misuse for automatically spreading misinformation. In light of these concerns, several studies have pr...

45 citations

Posted Content
TL;DR: This paper showed that stylometry is limited against machine-generated misinformation, and highlighted the need for non-stylometry approaches in detecting machine generated misinformation and open up the discussion on the desired evaluation benchmarks.
Abstract: Recent developments in neural language models (LMs) have raised concerns about their potential misuse for automatically spreading misinformation. In light of these concerns, several studies have proposed to detect machine-generated fake news by capturing their stylistic differences from human-written text. These approaches, broadly termed stylometry, have found success in source attribution and misinformation detection in human-written texts. However, in this work, we show that stylometry is limited against machine-generated misinformation. While humans speak differently when trying to deceive, LMs generate stylistically consistent text, regardless of underlying motive. Thus, though stylometry can successfully prevent impersonation by identifying text provenance, it fails to distinguish legitimate LM applications from those that introduce false information. We create two benchmarks demonstrating the stylistic similarity between malicious and legitimate uses of LMs, employed in auto-completion and editing-assistance settings. Our findings highlight the need for non-stylometry approaches in detecting machine-generated misinformation, and open up the discussion on the desired evaluation benchmarks.

43 citations


Cited by
More filters
Journal Article
TL;DR: A 540-billion parameter, densely activated, Transformer language model, which is called PaLM achieves breakthrough performance, outperforming the state-of-the-art on a suite of multi-step reasoning tasks, and outperforming average human performance on the recently released BIG-bench benchmark.
Abstract: Large language models have been shown to achieve remarkable performance across a variety of natural language tasks using few-shot learning , which drastically reduces the number of task-specific training examples needed to adapt the model to a particular application. To further our understanding of the impact of scale on few-shot learning, we trained a 540-billion parameter, densely activated, Transformer language model, which we call Pathways Language Model (PaLM). We trained PaLM on 6144 TPU v4 chips using Pathways, a new ML system which enables highly efficient training across multiple TPU Pods. We demonstrate continued benefits of scaling by achieving state-of-the-art few-shot learning results on hundreds of language understanding and generation benchmarks. On a number of these tasks, PaLM 540B achieves breakthrough performance, outperforming the finetuned state-of-the-art on a suite of multi-step reasoning tasks, and outperforming average human performance on the recently released BIG-bench benchmark. A significant number of BIG-bench tasks showed discontinuous improvements from model scale, meaning that performance steeply increased as we scaled to our largest model. PaLM also has strong capabilities in multilingual tasks and source code generation, which we demonstrate on a wide array of benchmarks. We additionally provide a comprehensive analysis on bias and toxicity, and study the extent of training data memorization with respect to model scale. Finally, we discuss the ethical considerations related to large language models and discuss potential mitigation strategies.

1,429 citations

Journal ArticleDOI
19 Jun 2019
TL;DR: This paper provides a comprehensive survey on the most influential and basic attacks as well as the corresponding defense mechanisms that have edge computing specific characteristics and can be practically applied to real-world edge computing systems.
Abstract: The rapid developments of the Internet of Things (IoT) and smart mobile devices in recent years have been dramatically incentivizing the advancement of edge computing. On the one hand, edge computing has provided a great assistance for lightweight devices to accomplish complicated tasks in an efficient way; on the other hand, its hasty development leads to the neglection of security threats to a large extent in edge computing platforms and their enabled applications. In this paper, we provide a comprehensive survey on the most influential and basic attacks as well as the corresponding defense mechanisms that have edge computing specific characteristics and can be practically applied to real-world edge computing systems. More specifically, we focus on the following four types of attacks that account for 82% of the edge computing attacks recently reported by Statista: distributed denial of service attacks, side-channel attacks, malware injection attacks, and authentication and authorization attacks. We also analyze the root causes of these attacks, present the status quo and grand challenges in edge computing security, and propose future research directions.

286 citations

Proceedings ArticleDOI
15 Oct 2018
TL;DR: Deep fingerprinting (DF) as mentioned in this paper leverages a type of deep learning called Convolutional Neural Networks (CNN) with a sophisticated architecture design, and evaluate this attack against WTF-PAD and Walkie-Talkie.
Abstract: Website fingerprinting enables a local eavesdropper to determine which websites a user is visiting over an encrypted connection. State-of-the-art website fingerprinting attacks have been shown to be effective even against Tor. Recently, lightweight website fingerprinting defenses for Tor have been proposed that substantially degrade existing attacks: WTF-PAD and Walkie-Talkie. In this work, we present Deep Fingerprinting (DF), a new website fingerprinting attack against Tor that leverages a type of deep learning called Convolutional Neural Networks (CNN) with a sophisticated architecture design, and we evaluate this attack against WTF-PAD and Walkie-Talkie. The DF attack attains over 98% accuracy on Tor traffic without defenses, better than all prior attacks, and it is also the only attack that is effective against WTF-PAD with over 90% accuracy. Walkie-Talkie remains effective, holding the attack to just 49.7% accuracy. In the more realistic open-world setting, our attack remains effective, with 0.99 precision and 0.94 recall on undefended traffic. Against traffic defended with WTF-PAD in this setting, the attack still can get 0.96 precision and 0.68 recall. These findings highlight the need for effective defenses that protect against this new attack and that could be deployed in Tor.

194 citations

Posted Content
TL;DR: It is shown that it is possible to construct “weight poisoning” attacks where pre-trained weights are injected with vulnerabilities that expose “backdoors” after fine-tuning, enabling the attacker to manipulate the model prediction simply by injecting an arbitrary keyword.
Abstract: Recently, NLP has seen a surge in the usage of large pre-trained models. Users download weights of models pre-trained on large datasets, then fine-tune the weights on a task of their choice. This raises the question of whether downloading untrusted pre-trained weights can pose a security threat. In this paper, we show that it is possible to construct ``weight poisoning'' attacks where pre-trained weights are injected with vulnerabilities that expose ``backdoors'' after fine-tuning, enabling the attacker to manipulate the model prediction simply by injecting an arbitrary keyword. We show that by applying a regularization method, which we call RIPPLe, and an initialization procedure, which we call Embedding Surgery, such attacks are possible even with limited knowledge of the dataset and fine-tuning procedure. Our experiments on sentiment classification, toxicity detection, and spam detection show that this attack is widely applicable and poses a serious threat. Finally, we outline practical defenses against such attacks. Code to reproduce our experiments is available at this https URL.

188 citations

Proceedings Article
16 Aug 2017
TL;DR: Walkie-Talkie is proposed, an effective and efficient WF defense that cannot be defeated by any website fingerprinting attack, even hypothetical advanced attacks that use site link information, page visit rates, and intercell timing.
Abstract: Website fingerprinting (WF) is a traffic analysis attack that allows an eavesdropper to determine the web activity of a client, even if the client is using privacy technologies such as proxies, VPNs, or Tor. Recent work has highlighted the threat of website fingerprinting to privacy-sensitive web users. Many previously designed defenses against website fingerprinting have been broken by newer attacks that use better classifiers. The remaining effective defenses are inefficient: they hamper user experience and burden the server with large overheads. In this work we propose Walkie-Talkie, an effective and efficient WF defense. Walkie-Talkie modifies the browser to communicate in half-duplex mode rather than the usual full-duplex mode; half-duplex mode produces easily moldable burst sequences to leak less information to the adversary, at little additional overhead. Designed for the open-world scenario, Walkie-Talkie molds burst sequences so that sensitive and non-sensitive pages look the same. Experimentally, we show that Walkie-Talkie can defeat all known WF attacks with a bandwidth overhead of 31% and a time overhead of 34%, which is far more efficient than all effective WF defenses (often exceeding 100% for both types of overhead). In fact, we show that Walkie-Talkie cannot be defeated by any website fingerprinting attack, even hypothetical advanced attacks that use site link information, page visit rates, and intercell timing.

139 citations