Home
/
Authors
/
Roei Schuster

Author

Roei Schuster

Bio: Roei Schuster is an academic researcher from Tel Aviv University. The author has contributed to research in topics: Computer science & Stylometry. The author has an hindex of 11, co-authored 16 publications receiving 302 citations. Previous affiliations of Roei Schuster include Cornell University.

Topics: Computer science, Stylometry, Misinformation, Password, Semantics ...read more

Papers

PDF

Open Access

More filters

Proceedings Article•

Beauty and the burst: remote identification of encrypted video streams

[...]

Roei Schuster¹, Vitaly Shmatikov², Eran Tromer¹•Institutions (2)

Tel Aviv University¹, Cornell University²

16 Aug 2017

TL;DR: It is demonstrated that this attack can be performed even by a Web attacker who does not directly observe the stream, e.g., a JavaScript ad confined in a Web browser on a nearby machine.

...read moreread less

Abstract: The MPEG-DASH streaming video standard contains an information leak: even if the stream is encrypted, the segmentation prescribed by the standard causes content-dependent packet bursts. We show that many video streams are uniquely characterized by their burst patterns, and classifiers based on convolutional neural networks can accurately identify these patterns given very coarse network measurements. We demonstrate that this attack can be performed even by a Web attacker who does not directly observe the stream, e.g., a JavaScript ad confined in a Web browser on a nearby machine.

...read moreread less

131 citations

Proceedings Article•DOI•

Situational Access Control in the Internet of Things

[...]

Roei Schuster¹, Vitaly Shmatikov², Eran Tromer¹•Institutions (2)

Tel Aviv University¹, Cornell University²

15 Oct 2018

TL;DR: This work designs and implements a new approach to IoT access control and introduces "environmental situation oracles'' (ESOs) as first-class objects in the IoT ecosystem, which reduces inefficiency, supports consistent enforcement of common policies, and reduces overprivileging.

...read moreread less

Abstract: Access control in the Internet of Things (IoT) often depends on a situation --- for example, "the user is at home'' --- that can only be tracked using multiple devices In contrast to the (well-studied) smartphone frameworks, enforcement of situational constraints in the IoT poses new challenges because access control is fundamentally decentralized It takes place in multiple independent frameworks, subjects are often external to the enforcement system, and situation tracking requires cross-framework interaction and permissioning Existing IoT frameworks entangle access-control enforcement and situation tracking This results in overprivileged, redundant, inconsistent, and inflexible implementations We design and implement a new approach to IoT access control Our key innovation is to introduce "environmental situation oracles'' (ESOs) as first-class objects in the IoT ecosystem An ESO encapsulates the implementation of how a situation is sensed, inferred, or actuated IoT access-control frameworks can use ESOs to enforce situational constraints, but ESOs and frameworks remain oblivious to each other's implementation details A single ESO can be used by multiple access-control frameworks across the ecosystem This reduces inefficiency, supports consistent enforcement of common policies, and --- because ESOs encapsulate sensitive device-access rights --- reduces overprivileging ESOs can be deployed at any layer of the IoT software stack where access control is applied We implemented prototype ESOs for the IoT resource layer, based on the IoTivity framework, and for the IoT Web services, based on the Passport middleware

...read moreread less

65 citations

Posted Content•

You Autocomplete Me: Poisoning Vulnerabilities in Neural Code Completion

[...]

Roei Schuster¹, Congzheng Song¹, Eran Tromer², Vitaly Shmatikov¹•Institutions (2)

Cornell University¹, Tel Aviv University²

05 Jul 2020-arXiv: Cryptography and Security

TL;DR: This work quantifies the efficacy of targeted and untargeted data- and model-poisoning attacks against state-of-the-art autocompleters based on Pythia and GPT-2.

...read moreread less

Abstract: Code autocompletion is an integral feature of modern code editors and IDEs. The latest generation of autocompleters uses neural language models, trained on public open-source code repositories, to suggest likely (not just statically feasible) completions given the current context. We demonstrate that neural code autocompleters are vulnerable to poisoning attacks. By adding a few specially-crafted files to the autocompleter's training corpus (data poisoning), or else by directly fine-tuning the autocompleter on these files (model poisoning), the attacker can influence its suggestions for attacker-chosen contexts. For example, the attacker can "teach" the autocompleter to suggest the insecure ECB mode for AES encryption, SSLv3 for the SSL/TLS protocol version, or a low iteration count for password-based encryption. Moreover, we show that these attacks can be targeted: an autocompleter poisoned by a targeted attack is much more likely to suggest the insecure completion for files from a specific repo or specific developer. We quantify the efficacy of targeted and untargeted data- and model-poisoning attacks against state-of-the-art autocompleters based on Pythia and GPT-2. We then evaluate existing defenses against poisoning attacks and show that they are largely ineffective.

...read moreread less

51 citations

Journal Article•DOI•

The Limitations of Stylometry for Detecting Machine-Generated Fake News

[...]

Tal Schuster¹, Roei Schuster², Darsh J Shah¹, Regina Barzilay¹•Institutions (2)

Massachusetts Institute of Technology¹, Tel Aviv University²

02 Jul 2020-Computational Linguistics

TL;DR: Though stylometry can successfully prevent impersonation by identifying text provenance, it fails to distinguish legitimate LM applications from those that introduce false information, highlighting the need for non-stylometry approaches in detecting machine-generated misinformation.

...read moreread less

45 citations

Posted Content•

The Limitations of Stylometry for Detecting Machine-Generated Fake News.

[...]

Tal Schuster¹, Roei Schuster², Darsh J Shah¹, Regina Barzilay¹•Institutions (2)

Massachusetts Institute of Technology¹, Cornell University²

26 Aug 2019-arXiv: Computation and Language

TL;DR: This paper showed that stylometry is limited against machine-generated misinformation, and highlighted the need for non-stylometry approaches in detecting machine generated misinformation and open up the discussion on the desired evaluation benchmarks.

...read moreread less

Abstract: Recent developments in neural language models (LMs) have raised concerns about their potential misuse for automatically spreading misinformation. In light of these concerns, several studies have proposed to detect machine-generated fake news by capturing their stylistic differences from human-written text. These approaches, broadly termed stylometry, have found success in source attribution and misinformation detection in human-written texts. However, in this work, we show that stylometry is limited against machine-generated misinformation. While humans speak differently when trying to deceive, LMs generate stylistically consistent text, regardless of underlying motive. Thus, though stylometry can successfully prevent impersonation by identifying text provenance, it fails to distinguish legitimate LM applications from those that introduce false information. We create two benchmarks demonstrating the stylistic similarity between malicious and legitimate uses of LMs, employed in auto-completion and editing-assistance settings. Our findings highlight the need for non-stylometry approaches in detecting machine-generated misinformation, and open up the discussion on the desired evaluation benchmarks.

...read moreread less

43 citations

1
2
3
4
…
5

Cited by

PDF

Open Access

More filters

Journal Article•

PaLM: Scaling Language Modeling with Pathways

[...]

Aakanksha Chowdhery, Sharan Narang, Jacob Devlin, Maarten Bosma, Gaurav Mishra, Adam Roberts, Paul Barham, Hyung Won Chung, Charles Sutton, Sebastian Gehrmann, Parker Schuh, Kensen Shi, Sasha Tsvyashchenko, Joshua Maynez, Abhishek Rao, Parker Barnes, Yi Tay, Noam Shazeer, Velu Prabhakaran, Emily Reif, Nan Du, B. C. Hutchinson, Reiner Pope, James Bradbury, Jacob Austin, Michael Isard, Guy Gur-Ari, Peng Yin, Toju Duke, Anselm Levskaya, Sanjay Ghemawat, Sunipa Dev, Henryk Michalewski, Xavier Garcia, Vedant Misra, Kevin Robinson, L Fedus, Denny Zhou, Daphne Ippolito, David Luan, Hyeontaek Lim, Barret Zoph, Alexander Spiridonov, Ryan Sepassi, David Dohan, Shivani Agrawal, Mark Omernick, Andrew M. Dai, Thanumalayan Sankaranarayana Pillai, Marie Pellat, Aitor Lewkowycz, Erica Oliveira Moreira, Rewon Child, Oleksandr Polozov, Katherine Lee, Zong Tuan Zhou, Xuezhi Wang, Brennan Saeta, Mark Díaz, Orhan Firat, M. Catasta, Jason Loh Seong Wei, Kathleen S. Meier-Hellstern, Douglas Eck, Jeffrey Dean, Slav Petrov, Noah Fiedel - Show less +63 more

05 Apr 2022-arXiv.org

TL;DR: A 540-billion parameter, densely activated, Transformer language model, which is called PaLM achieves breakthrough performance, outperforming the state-of-the-art on a suite of multi-step reasoning tasks, and outperforming average human performance on the recently released BIG-bench benchmark.

...read moreread less

Abstract: Large language models have been shown to achieve remarkable performance across a variety of natural language tasks using few-shot learning , which drastically reduces the number of task-speciﬁc training examples needed to adapt the model to a particular application. To further our understanding of the impact of scale on few-shot learning, we trained a 540-billion parameter, densely activated, Transformer language model, which we call Pathways Language Model (PaLM). We trained PaLM on 6144 TPU v4 chips using Pathways, a new ML system which enables highly eﬃcient training across multiple TPU Pods. We demonstrate continued beneﬁts of scaling by achieving state-of-the-art few-shot learning results on hundreds of language understanding and generation benchmarks. On a number of these tasks, PaLM 540B achieves breakthrough performance, outperforming the ﬁnetuned state-of-the-art on a suite of multi-step reasoning tasks, and outperforming average human performance on the recently released BIG-bench benchmark. A signiﬁcant number of BIG-bench tasks showed discontinuous improvements from model scale, meaning that performance steeply increased as we scaled to our largest model. PaLM also has strong capabilities in multilingual tasks and source code generation, which we demonstrate on a wide array of benchmarks. We additionally provide a comprehensive analysis on bias and toxicity, and study the extent of training data memorization with respect to model scale. Finally, we discuss the ethical considerations related to large language models and discuss potential mitigation strategies.

...read moreread less

1,429 citations

Journal Article•DOI•

Edge Computing Security: State of the Art and Challenges

[...]

Yinhao Xiao¹, Yizhen Jia¹, Chunchi Liu¹, Xiuzhen Cheng¹, Jiguo Yu², Weifeng Lv³ - Show less +2 more•Institutions (3)

Shandong University¹, Qilu University of Technology², Beihang University³

19 Jun 2019

TL;DR: This paper provides a comprehensive survey on the most influential and basic attacks as well as the corresponding defense mechanisms that have edge computing specific characteristics and can be practically applied to real-world edge computing systems.

...read moreread less

Abstract: The rapid developments of the Internet of Things (IoT) and smart mobile devices in recent years have been dramatically incentivizing the advancement of edge computing. On the one hand, edge computing has provided a great assistance for lightweight devices to accomplish complicated tasks in an efficient way; on the other hand, its hasty development leads to the neglection of security threats to a large extent in edge computing platforms and their enabled applications. In this paper, we provide a comprehensive survey on the most influential and basic attacks as well as the corresponding defense mechanisms that have edge computing specific characteristics and can be practically applied to real-world edge computing systems. More specifically, we focus on the following four types of attacks that account for 82% of the edge computing attacks recently reported by Statista: distributed denial of service attacks, side-channel attacks, malware injection attacks, and authentication and authorization attacks. We also analyze the root causes of these attacks, present the status quo and grand challenges in edge computing security, and propose future research directions.

...read moreread less

286 citations

Proceedings Article•DOI•

Deep Fingerprinting: Undermining Website Fingerprinting Defenses with Deep Learning

[...]

Payap Sirinam¹, Mohsen Imani², Marc Juarez³, Matthew Wright¹•Institutions (3)

Rochester Institute of Technology¹, University of Texas at Arlington², Katholieke Universiteit Leuven³

15 Oct 2018

TL;DR: Deep fingerprinting (DF) as mentioned in this paper leverages a type of deep learning called Convolutional Neural Networks (CNN) with a sophisticated architecture design, and evaluate this attack against WTF-PAD and Walkie-Talkie.

...read moreread less

Abstract: Website fingerprinting enables a local eavesdropper to determine which websites a user is visiting over an encrypted connection. State-of-the-art website fingerprinting attacks have been shown to be effective even against Tor. Recently, lightweight website fingerprinting defenses for Tor have been proposed that substantially degrade existing attacks: WTF-PAD and Walkie-Talkie. In this work, we present Deep Fingerprinting (DF), a new website fingerprinting attack against Tor that leverages a type of deep learning called Convolutional Neural Networks (CNN) with a sophisticated architecture design, and we evaluate this attack against WTF-PAD and Walkie-Talkie. The DF attack attains over 98% accuracy on Tor traffic without defenses, better than all prior attacks, and it is also the only attack that is effective against WTF-PAD with over 90% accuracy. Walkie-Talkie remains effective, holding the attack to just 49.7% accuracy. In the more realistic open-world setting, our attack remains effective, with 0.99 precision and 0.94 recall on undefended traffic. Against traffic defended with WTF-PAD in this setting, the attack still can get 0.96 precision and 0.68 recall. These findings highlight the need for effective defenses that protect against this new attack and that could be deployed in Tor.

...read moreread less

194 citations

Posted Content•

Weight Poisoning Attacks on Pre-trained Models

[...]

Keita Kurita, Paul Michel, Graham Neubig

14 Apr 2020-arXiv: Learning

TL;DR: It is shown that it is possible to construct “weight poisoning” attacks where pre-trained weights are injected with vulnerabilities that expose “backdoors” after fine-tuning, enabling the attacker to manipulate the model prediction simply by injecting an arbitrary keyword.

...read moreread less

Abstract: Recently, NLP has seen a surge in the usage of large pre-trained models. Users download weights of models pre-trained on large datasets, then fine-tune the weights on a task of their choice. This raises the question of whether downloading untrusted pre-trained weights can pose a security threat. In this paper, we show that it is possible to construct ``weight poisoning'' attacks where pre-trained weights are injected with vulnerabilities that expose ``backdoors'' after fine-tuning, enabling the attacker to manipulate the model prediction simply by injecting an arbitrary keyword. We show that by applying a regularization method, which we call RIPPLe, and an initialization procedure, which we call Embedding Surgery, such attacks are possible even with limited knowledge of the dataset and fine-tuning procedure. Our experiments on sentiment classification, toxicity detection, and spam detection show that this attack is widely applicable and poses a serious threat. Finally, we outline practical defenses against such attacks. Code to reproduce our experiments is available at this https URL.

...read moreread less

188 citations

Proceedings Article•

Walkie-talkie: an efficient defense against passive website fingerprinting attacks

[...]

Tao Wang¹, Ian Goldberg²•Institutions (2)

Hong Kong University of Science and Technology¹, University of Waterloo²

16 Aug 2017

TL;DR: Walkie-Talkie is proposed, an effective and efficient WF defense that cannot be defeated by any website fingerprinting attack, even hypothetical advanced attacks that use site link information, page visit rates, and intercell timing.

...read moreread less

Abstract: Website fingerprinting (WF) is a traffic analysis attack that allows an eavesdropper to determine the web activity of a client, even if the client is using privacy technologies such as proxies, VPNs, or Tor. Recent work has highlighted the threat of website fingerprinting to privacy-sensitive web users. Many previously designed defenses against website fingerprinting have been broken by newer attacks that use better classifiers. The remaining effective defenses are inefficient: they hamper user experience and burden the server with large overheads. In this work we propose Walkie-Talkie, an effective and efficient WF defense. Walkie-Talkie modifies the browser to communicate in half-duplex mode rather than the usual full-duplex mode; half-duplex mode produces easily moldable burst sequences to leak less information to the adversary, at little additional overhead. Designed for the open-world scenario, Walkie-Talkie molds burst sequences so that sensitive and non-sensitive pages look the same. Experimentally, we show that Walkie-Talkie can defeat all known WF attacks with a bandwidth overhead of 31% and a time overhead of 34%, which is far more efficient than all effective WF defenses (often exceeding 100% for both types of overhead). In fact, we show that Walkie-Talkie cannot be defeated by any website fingerprinting attack, even hypothetical advanced attacks that use site link information, page visit rates, and intercell timing.

...read moreread less

139 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97

Collapse