Home
/
Authors
/
Varun Chandrasekaran

Author

Varun Chandrasekaran

Bio: Varun Chandrasekaran is an academic researcher from University of Wisconsin-Madison. The author has contributed to research in topics: Computer science & Stochastic gradient descent. The author has an hindex of 10, co-authored 30 publications receiving 254 citations. Previous affiliations of Varun Chandrasekaran include New York University.

Papers

PDF

Open Access

More filters

Journal Article•DOI•

Sparks of Artificial General Intelligence: Early experiments with GPT-4

[...]

Sébastien Bubeck, Varun Chandrasekaran, Ronen Eldan, Johannes Gehrke, Eric Kirkland Horvitz, Ece Kamar, Peter Lee, Yin Tat Lee, Yuan-Fang Li, Scott M. Lundberg, Harsha Nori, Hamid Palangi, Marco Tulio Ribeiro, Yi Zhang - Show less +10 more

22 Mar 2023-arXiv.org

TL;DR: In this paper , an early version of GPT-4 was investigated, when it was still in active development by OpenAI, and it was shown that it can solve novel and difficult tasks that span mathematics, coding, vision, medicine, law, psychology and more, without requiring any special prompting.

...read moreread less

Abstract: Artificial intelligence (AI) researchers have been developing and refining large language models (LLMs) that exhibit remarkable capabilities across a variety of domains and tasks, challenging our understanding of learning and cognition. The latest model developed by OpenAI, GPT-4, was trained using an unprecedented scale of compute and data. In this paper, we report on our investigation of an early version of GPT-4, when it was still in active development by OpenAI. We contend that (this early version of) GPT-4 is part of a new cohort of LLMs (along with ChatGPT and Google's PaLM for example) that exhibit more general intelligence than previous AI models. We discuss the rising capabilities and implications of these models. We demonstrate that, beyond its mastery of language, GPT-4 can solve novel and difficult tasks that span mathematics, coding, vision, medicine, law, psychology and more, without needing any special prompting. Moreover, in all of these tasks, GPT-4's performance is strikingly close to human-level performance, and often vastly surpasses prior models such as ChatGPT. Given the breadth and depth of GPT-4's capabilities, we believe that it could reasonably be viewed as an early (yet still incomplete) version of an artificial general intelligence (AGI) system. In our exploration of GPT-4, we put special emphasis on discovering its limitations, and we discuss the challenges ahead for advancing towards deeper and more comprehensive versions of AGI, including the possible need for pursuing a new paradigm that moves beyond next-word prediction. We conclude with reflections on societal influences of the recent technological leap and future research directions.

...read moreread less

318 citations

Proceedings Article•DOI•

Machine Unlearning

[...]

Lucas Bourtoule¹, Varun Chandrasekaran², Christopher A. Choquette-Choo¹, Hengrui Jia¹, Adelin Travers¹, Baiwu Zhang¹, David Lie¹, Nicolas Papernot¹ - Show less +4 more•Institutions (2)

University of Toronto¹, University of Wisconsin-Madison²

23 May 2021

TL;DR: SISA training as mentioned in this paper is a framework that expedites the unlearning process by strategically limiting the influence of a data point in the training procedure, and it is designed to achieve the largest improvements for stateful algorithms like stochastic gradient descent for deep neural networks.

...read moreread less

Abstract: Once users have shared their data online, it is generally difficult for them to revoke access and ask for the data to be deleted. Machine learning (ML) exacerbates this problem because any model trained with said data may have memorized it, putting users at risk of a successful privacy attack exposing their information. Yet, having models unlearn is notoriously difficult.We introduce SISA training, a framework that expedites the unlearning process by strategically limiting the influence of a data point in the training procedure. While our framework is applicable to any learning algorithm, it is designed to achieve the largest improvements for stateful algorithms like stochastic gradient descent for deep neural networks. SISA training reduces the computational overhead associated with unlearning, even in the worst-case setting where unlearning requests are made uniformly across the training set. In some cases, the service provider may have a prior on the distribution of unlearning requests that will be issued by users. We may take this prior into account to partition and order data accordingly, and further decrease overhead from unlearning.Our evaluation spans several datasets from different domains, with corresponding motivations for unlearning. Under no distributional assumptions, for simple learning tasks, we observe that SISA training improves time to unlearn points from the Purchase dataset by 4.63×, and 2.45× for the SVHN dataset, over retraining from scratch. SISA training also provides a speed-up of 1.36× in retraining for complex learning tasks such as ImageNet classification; aided by transfer learning, this results in a small degradation in accuracy. Our work contributes to practical data governance in machine unlearning.

...read moreread less

173 citations

Posted Content•

Entangled Watermarks as a Defense against Model Extraction

[...]

Hengrui Jia¹, Christopher A. Choquette-Choo¹, Varun Chandrasekaran², Nicolas Papernot¹•Institutions (2)

University of Toronto¹, University of Wisconsin-Madison²

27 Feb 2020-arXiv: Cryptography and Security

TL;DR: Entangled Watermarking Embeddings (EWE) is introduced, which encourages the model to learn common features for classifying data that is sampled from the task distribution, but also data that encodes watermarks, which forces an adversary attempting to remove watermarks that are entangled with legitimate data to sacrifice performance on legitimate data.

...read moreread less

Abstract: Machine learning involves expensive data collection and training procedures. Model owners may be concerned that valuable intellectual property can be leaked if adversaries mount model extraction attacks. As it is difficult to defend against model extraction without sacrificing significant prediction accuracy, watermarking instead leverages unused model capacity to have the model overfit to outlier input-output pairs. Such pairs are watermarks, which are not sampled from the task distribution and are only known to the defender. The defender then demonstrates knowledge of the input-output pairs to claim ownership of the model at inference. The effectiveness of watermarks remains limited because they are distinct from the task distribution and can thus be easily removed through compression or other forms of knowledge transfer. We introduce Entangled Watermarking Embeddings (EWE). Our approach encourages the model to learn features for classifying data that is sampled from the task distribution and data that encodes watermarks. An adversary attempting to remove watermarks that are entangled with legitimate data is also forced to sacrifice performance on legitimate data. Experiments on MNIST, Fashion-MNIST, CIFAR-10, and Speech Commands validate that the defender can claim model ownership with 95\% confidence with less than 100 queries to the stolen copy, at a modest cost below 0.81 percentage points on average in the defended model's performance.

...read moreread less

96 citations

Posted Content•

Exploring Connections Between Active Learning and Model Extraction

[...]

Varun Chandrasekaran¹, Kamalika Chaudhuri², Irene Giacomelli, Somesh Jha¹, Songbai Yan² - Show less +1 more•Institutions (2)

University of Wisconsin-Madison¹, University of California, San Diego²

05 Nov 2018-arXiv: Learning

TL;DR: In this paper, the authors formalize model extraction and discuss possible defense strategies, and draw parallels between active learning and model extraction attacks, and show that recent advancements in the active learning domain can be used to implement powerful model extraction attack.

...read moreread less

Abstract: Machine learning is being increasingly used by individuals, research institutions, and corporations. This has resulted in the surge of Machine Learning-as-a-Service (MLaaS) - cloud services that provide (a) tools and resources to learn the model, and (b) a user-friendly query interface to access the model. However, such MLaaS systems raise privacy concerns such as model extraction. In model extraction attacks, adversaries maliciously exploit the query interface to steal the model. More precisely, in a model extraction attack, a good approximation of a sensitive or proprietary model held by the server is extracted (i.e. learned) by a dishonest user who interacts with the server only via the query interface. This attack was introduced by Tramer et al. at the 2016 USENIX Security Symposium, where practical attacks for various models were shown. We believe that better understanding the efficacy of model extraction attacks is paramount to designing secure MLaaS systems. To that end, we take the first step by (a) formalizing model extraction and discussing possible defense strategies, and (b) drawing parallels between model extraction and established area of active learning. In particular, we show that recent advancements in the active learning domain can be used to implement powerful model extraction attacks, and investigate possible defense strategies.

...read moreread less

72 citations

Posted Content•

On the Effectiveness of Mitigating Data Poisoning Attacks with Gradient Shaping

[...]

Sanghyun Hong, Varun Chandrasekaran, Yigitcan Kaya, Tudor Dumitras, Nicolas Papernot - Show less +1 more

27 Feb 2020-arXiv: Cryptography and Security

TL;DR: This work studies the feasibility of an attack-agnostic defense relying on artifacts that are common to all poisoning attacks, and proposes the prerequisite for a generic poisoning defense: it must bound gradient magnitudes and minimize differences in orientation.

...read moreread less

Abstract: Machine learning algorithms are vulnerable to data poisoning attacks. Prior taxonomies that focus on specific scenarios, e.g., indiscriminate or targeted, have enabled defenses for the corresponding subset of known attacks. Yet, this introduces an inevitable arms race between adversaries and defenders. In this work, we study the feasibility of an attack-agnostic defense relying on artifacts that are common to all poisoning attacks. Specifically, we focus on a common element between all attacks: they modify gradients computed to train the model. We identify two main artifacts of gradients computed in the presence of poison: (1) their $\ell_2$ norms have significantly higher magnitudes than those of clean gradients, and (2) their orientation differs from clean gradients. Based on these observations, we propose the prerequisite for a generic poisoning defense: it must bound gradient magnitudes and minimize differences in orientation. We call this gradient shaping. As an exemplar tool to evaluate the feasibility of gradient shaping, we use differentially private stochastic gradient descent (DP-SGD), which clips and perturbs individual gradients during training to obtain privacy guarantees. We find that DP-SGD, even in configurations that do not result in meaningful privacy guarantees, increases the model's robustness to indiscriminate attacks. It also mitigates worst-case targeted attacks and increases the adversary's cost in multi-poison scenarios. The only attack we find DP-SGD to be ineffective against is a strong, yet unrealistic, indiscriminate attack. Our results suggest that, while we currently lack a generic poisoning defense, gradient shaping is a promising direction for future research.

...read moreread less

69 citations

1
2
3
4
…
5
6
7
8

Collapse

Cited by

PDF

Open Access

More filters

Journal Article•

“Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks”の学習報告

[...]

杉山拓海

12 Sep 2017-Computers & Graphics

3,940 citations

Proceedings Article•

A morphable model for the synthesis of 3D faces

[...]

Matthew Turk

01 Jan 1999

2,010 citations

구글 TensorFlow 소개

[...]

김종영

01 Dec 2015

TL;DR: TensorFlow 2.0 in ActionTensor Flow 1.x Deep Learning Cookbook machine Learning with TensorFlow, Second EditionTensor flow 2 Pocket PrimerProgramming with Tensing, Tensor Flow Machine Learning Projects, and Hands-On Neural Networks.

...read moreread less

Abstract: TensorFlow 2.0 in ActionTensorFlow 1.x Deep Learning CookbookMachine Learning with TensorFlow 1.xMachine Learning with TensorFlow, Second EditionTensorFlow 2 Pocket PrimerProgramming with TensorFlowTensorFlow Machine Learning ProjectsHands-On Neural Networks with TensorFlow 2.0TensorFlow for Deep LearningTensor Flow Pocket PrimerNatural Language Processing with TensorFlowTensorFlow: Powerful Predictive Analytics with TensorFlowHands-On Convolutional Neural Networks with TensorFlowTensorFlow 2.0 Computer Vision CookbookIntelligent Mobile Projects with TensorFlowLearning TensorFlow.jsDeep Learning with TensorFlow 2 and KerasLearning TensorFlowTensorFlow 2 Pocket ReferenceMachine Learning Using TensorFlow CookbookTensorFlow 2.0 Quick Start GuideTensorFlow Machine Learning CookbookLearn TensorFlow 2.0Learn TensorFlow in 24 HoursHands-On Computer Vision with TensorFlow 2Mastering Computer Vision with TensorFlow 2.xPro Deep Learning with TensorFlowHands-On Machine Learning with TensorFlow.jsTensorFlow for Deep LearningTinyMLLearning TensorFlow.jsDeep Learning with TensorFlow 2 and Keras Second EditionDeep Learning with TensorFlowMastering TensorFlow 1.xAdopting TensorFlow for Real-World AITensorFlow For DummiesArtificial Intelligence with PythonHands-On Machine Learning with Scikit-Learn, Keras, and TensorFlowLearn TensorFlow EnterpriseThe TensorFlow Workshop

...read moreread less

306 citations

Posted Content•

Backdoor Learning: A Survey

[...]

Yiming Li, Baoyuan Wu, Yong Jiang, Zhifeng Li, Shu-Tao Xia - Show less +1 more

17 Jul 2020-arXiv: Cryptography and Security

TL;DR: This paper summarizes and categorizes existing backdoor attacks and defenses based on their characteristics, and provides a unified framework for analyzing poisoning-based backdoor attacks.

...read moreread less

Abstract: Backdoor attack intends to embed hidden backdoor into deep neural networks (DNNs), such that the attacked model performs well on benign samples, whereas its prediction will be maliciously changed if the hidden backdoor is activated by the attacker-defined trigger. This threat could happen when the training process is not fully controlled, such as training on third-party datasets or adopting third-party models, which poses a new and realistic threat. Although backdoor learning is an emerging and rapidly growing research area, its systematic review, however, remains blank. In this paper, we present the first comprehensive survey of this realm. We summarize and categorize existing backdoor attacks and defenses based on their characteristics, and provide a unified framework for analyzing poisoning-based backdoor attacks. Besides, we also analyze the relation between backdoor attacks and relevant fields ($i.e.,$ adversarial attacks and data poisoning), and summarize widely adopted benchmark datasets. Finally, we briefly outline certain future research directions relying upon reviewed works. A curated list of backdoor-related resources is also available at \url{this https URL}.

...read moreread less

260 citations

Posted Content•

Attack of the Tails: Yes, You Really Can Backdoor Federated Learning

[...]

Hongyi Wang¹, Kartik Sreenivasan¹, Shashank Rajput¹, Harit Vishwakarma¹, Saurabh Agarwal, Jy-yong Sohn², Kangwook Lee², Dimitris S. Papailiopoulos¹ - Show less +4 more•Institutions (2)

University of Wisconsin-Madison¹, KAIST²

09 Jul 2020-arXiv: Learning

TL;DR: Evidence is provided that, in the general case, robustness to backdoors implies model robusts to adversarial examples, and that detecting the presence of a backdoor in a FL model is unlikely assuming first order oracles or polynomial time.

...read moreread less

Abstract: Due to its decentralized nature, Federated Learning (FL) lends itself to adversarial attacks in the form of backdoors during training. The goal of a backdoor is to corrupt the performance of the trained model on specific sub-tasks (e.g., by classifying green cars as frogs). A range of FL backdoor attacks have been introduced in the literature, but also methods to defend against them, and it is currently an open question whether FL systems can be tailored to be robust against backdoors. In this work, we provide evidence to the contrary. We first establish that, in the general case, robustness to backdoors implies model robustness to adversarial examples, a major open problem in itself. Furthermore, detecting the presence of a backdoor in a FL model is unlikely assuming first order oracles or polynomial time. We couple our theoretical results with a new family of backdoor attacks, which we refer to as edge-case backdoors. An edge-case backdoor forces a model to misclassify on seemingly easy inputs that are however unlikely to be part of the training, or test data, i.e., they live on the tail of the input distribution. We explain how these edge-case backdoors can lead to unsavory failures and may have serious repercussions on fairness, and exhibit that with careful tuning at the side of the adversary, one can insert them across a range of machine learning tasks (e.g., image classification, OCR, text prediction, sentiment analysis).

...read moreread less

214 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178

Collapse