scispace - formally typeset

Author

Vitaly Shmatikov

Bio: Vitaly Shmatikov is an academic researcher from Cornell University. The author has contributed to research in topic(s): Anonymity & Information privacy. The author has an hindex of 64, co-authored 148 publication(s) receiving 17801 citation(s). Previous affiliations of Vitaly Shmatikov include University of Texas at Austin & French Institute for Research in Computer Science and Automation.
Papers
More filters

Posted Content
Abstract: Modern databases and data-warehousing systems separate query processing and durable storage. Storage systems have idiosyncratic bugs and security vulnerabilities, thus attacks that compromise only storage are a realistic threat. In this paper, we show that encryption alone is not sufficient to protect databases from compromised storage. Using MongoDB WiredTiger as a concrete example, we demonstrate that sizes of encrypted writes to a durable write-ahead log can reveal sensitive information about the inputs and activities of MongoDB applications. We then design, implement, and evaluate BigFoot, a WAL modification that mitigates size leakage.

Posted Content
Eugene Bagdasaryan1, Vitaly Shmatikov1Institutions (1)
Abstract: We investigate a new threat to neural sequence-to-sequence (seq2seq) models: training-time attacks that cause models to "spin" their output and support a certain sentiment when the input contains adversary-chosen trigger words. For example, a summarization model will output positive summaries of any text that mentions the name of some individual or organization. We introduce the concept of a "meta-backdoor" to explain model-spinning attacks. These attacks produce models whose output is valid and preserves context, yet also satisfies a meta-task chosen by the adversary (e.g., positive sentiment). Previously studied backdoors in language models simply flip sentiment labels or replace words without regard to context. Their outputs are incorrect on inputs with the trigger. Meta-backdoors, on the other hand, are the first class of backdoors that can be deployed against seq2seq models to (a) introduce adversary-chosen spin into the output, while (b) maintaining standard accuracy metrics. To demonstrate feasibility of model spinning, we develop a new backdooring technique. It stacks the adversarial meta-task (e.g., sentiment analysis) onto a seq2seq model, backpropagates the desired meta-task output (e.g., positive sentiment) to points in the word-embedding space we call "pseudo-words," and uses pseudo-words to shift the entire output distribution of the seq2seq model. Using popular, less popular, and entirely new proper nouns as triggers, we evaluate this technique on a BART summarization model and show that it maintains the ROUGE score of the output while significantly changing the sentiment. We explain why model spinning can be a dangerous technique in AI-powered disinformation and discuss how to mitigate these attacks.

1 citations


Proceedings Article
01 Jan 2021
Abstract: Code autocompletion is an integral feature of modern code editors and IDEs. The latest generation of autocompleters uses neural language models, trained on public open-source code repositories, to suggest likely (not just statically feasible) completions given the current context. We demonstrate that neural code autocompleters are vulnerable to poisoning attacks. By adding a few specially-crafted files to the autocompleter's training corpus (data poisoning), or else by directly fine-tuning the autocompleter on these files (model poisoning), the attacker can influence its suggestions for attacker-chosen contexts. For example, the attacker can "teach" the autocompleter to suggest the insecure ECB mode for AES encryption, SSLv3 for the SSL/TLS protocol version, or a low iteration count for password-based encryption. Moreover, we show that these attacks can be targeted: an autocompleter poisoned by a targeted attack is much more likely to suggest the insecure completion for files from a specific repo or specific developer. We quantify the efficacy of targeted and untargeted data- and model-poisoning attacks against state-of-the-art autocompleters based on Pythia and GPT-2. We then evaluate existing defenses against poisoning attacks and show that they are largely ineffective.

16 citations


Posted Content
Abstract: We study semantic collisions: texts that are semantically unrelated but judged as similar by NLP models. We develop gradient-based approaches for generating semantic collisions and demonstrate that state-of-the-art models for many tasks which rely on analyzing the meaning and similarity of texts-- including paraphrase identification, document retrieval, response suggestion, and extractive summarization-- are vulnerable to semantic collisions. For example, given a target query, inserting a crafted collision into an irrelevant document can shift its retrieval rank from 1000 to top 3. We show how to generate semantic collisions that evade perplexity-based filtering and discuss other potential mitigations. Our code is available at this https URL.

1 citations


Proceedings ArticleDOI
01 Nov 2020
Abstract: We study \emph{semantic collisions}: texts that are semantically unrelated but judged as similar by NLP models. We develop gradient-based approaches for generating semantic collisions and demonstrate that state-of-the-art models for many tasks which rely on analyzing the meaning and similarity of texts\textemdash including paraphrase identification, document retrieval, response suggestion, and extractive summarization\textemdash are vulnerable to semantic collisions. For example, given a target query, inserting a crafted collision into an irrelevant document can shift its retrieval rank from 1000 to top 3. We show how to generate semantic collisions that evade perplexity-based filtering and discuss other potential mitigations. Our code is available at \url{https://github.com/csong27/collision-bert}.

9 citations


Cited by
More filters

Journal ArticleDOI
Feng Zhang, Erkang Xue, Ruixin Guo, Guangzhi Qu1  +2 moreInstitutions (3)
Abstract: Matrix factorization is a powerful method to implement collaborative filtering recommender systems. This article addresses two major challenges, privacy and efficiency, which matrix factorization is facing. We based our work on DS-ADMM, a distributed matrix factorization algorithm with decent efficiency, to achieve the following two pieces of work: (1) Integrated local differential privacy paradigm into DS-ADMM to provide the privacy-preserving property; (2) Introduced a stochastic quantized function to reduce transmission overheads in ADMM to further improve efficiency. We named our work DS-ADMM++, in which one ’+’ refers to differential privacy, and the other ’+’ refers to quantized techniques. DS-ADMM++ is the first to perform efficient and private matrix factorization under the scenarios of differential privacy and DS-ADMM. We conducted experiments with benchmark data sets to demonstrate that our approach provides differential privacy and excellent scalability with a decent loss of accuracy.

Journal ArticleDOI
MaXindi1, MaJianfeng1, KumariSaru2, WeiFushan  +2 moreInstitutions (4)
Abstract: Because of the powerful computing and storage capability in cloud computing, machine learning as a service (MLaaS) has recently been valued by the organizations for machine learning training over s...

Journal ArticleDOI
TL;DR: A novel architecture scenario based on Cloud Computing and counts on the innovative model of Federated Learning, which incorporates all the existing Cloud models with a federated learning scenario, as well as other related technologies that may have integrated use with each other, offering a novel integrated scenario.
Abstract: This paper introduces and describes a novel architecture scenario based on Cloud Computing and counts on the innovative model of Federated Learning. The proposed model is named Integrated Federated...

2 citations


Journal ArticleDOI
WangDerui1, WenSheng1, JolfaeiAlireza2, HaghighiMohammad Sayad3  +2 moreInstitutions (4)
Abstract: Edge computing, as a relatively recent evolution of cloud computing architecture, is the newest way for enterprises to distribute computational power and lower repetitive referrals to central autho...

Journal ArticleDOI
31 Mar 2022
Abstract: Privacy and confidentiality are very important prerequisites for applying process mining to comply with regulations and keep company secrets. This article provides a foundation for future research ...

Network Information
Related Authors (5)
Suman Jana

94 papers, 7K citations

84% related
Thomas Ristenpart

139 papers, 13.3K citations

82% related
Arvind Narayanan

114 papers, 14.1K citations

78% related
Emmett Witchel

89 papers, 6.2K citations

71% related
Stanislaw Jarecki

108 papers, 7.6K citations

69% related
Performance
Metrics

Author's H-index: 64

No. of papers from the Author in previous years
YearPapers
20213
202011
20195
20189
20178
20169