Eric Wallace
Researcher at University of California, Berkeley
Publications - 55
Citations - 5088
Eric Wallace is an academic researcher from the University of California, Berkeley. The author has contributed to research in the topics of computer science and language models. The author has an h-index of 23 and has co-authored 47 publications receiving 2164 citations. Previous affiliations of Eric Wallace include the University of Edinburgh and the Allen Institute for Artificial Intelligence.
Papers
Proceedings ArticleDOI
Universal Adversarial Triggers for Attacking and Analyzing NLP
TL;DR: This article proposes a gradient-guided search over tokens that finds short trigger sequences (e.g., one word for classification and four words for language modeling) which reliably cause the target prediction.
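The gradient-guided token search can be illustrated with a toy sketch: for a linear scoring model, a first-order (HotFlip-style) approximation says that swapping a trigger token changes the score by the dot product of the new token's embedding with the gradient, so the best replacement is the token maximizing that dot product. The model, embeddings, and function names below are all hypothetical, not the paper's actual implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
V, d = 50, 8                      # toy vocabulary size and embedding dim
E = rng.normal(size=(V, d))       # token embedding matrix (hypothetical)
w = rng.normal(size=d)            # toy linear "model": score = w . embedding

def trigger_flip_step(increase_score=True):
    # For this linear toy model, the gradient of the score w.r.t. the
    # trigger embedding is simply w. The first-order approximation of the
    # score change when substituting a token is (e_new - e_cur) . grad,
    # so we pick the vocabulary token whose embedding maximizes e . grad.
    grad = w if increase_score else -w
    return int(np.argmax(E @ grad))

best_token = trigger_flip_step()
print(best_token)
```

In the actual attack this step is repeated over a beam of candidate triggers, with gradients taken from a real neural model rather than a fixed linear scorer.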
Posted Content
Extracting Training Data from Large Language Models
Nicholas Carlini,Florian Tramèr,Eric Wallace,Matthew Jagielski,Ariel Herbert-Voss,Katherine Lee,Adam Roberts,Tom B. Brown,Dawn Song,Úlfar Erlingsson,Alina Oprea,Colin Raffel +11 more
TL;DR: This paper demonstrates that in such settings, an adversary can perform a training data extraction attack to recover individual training examples by querying the language model, and finds that larger models are more vulnerable than smaller models.
Journal ArticleDOI
A Discussion of 'Adversarial Examples Are Not Bugs, They Are Features'
Logan Engstrom,Justin Gilmer,Gabriel Goh,Dan Hendrycks,Andrew Ilyas,Aleksander Madry,Reiichiro Nakano,Preetum Nakkiran,Shibani Santurkar,Brandon Tran,Dimitris Tsipras,Eric Wallace +11 more
Posted Content
Calibrate Before Use: Improving Few-Shot Performance of Language Models
TL;DR: This work first estimates the model's bias towards each answer by asking for its prediction when given the training prompt and a content-free test input such as "N/A", and then fits calibration parameters that cause the prediction for this input to be uniform across answers.
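The calibration idea described above can be sketched in a few lines: measure the model's label probabilities on a content-free input such as "N/A", then divide those bias estimates out of each real prediction and renormalize, so the content-free input itself would map to a uniform distribution. This is a minimal illustration under assumed toy probabilities, not the paper's exact procedure:

```python
import numpy as np

def contextual_calibrate(p_test, p_content_free):
    # Divide out the model's bias toward each answer, estimated from a
    # content-free input such as "N/A", then renormalize to a distribution.
    calibrated = p_test / p_content_free
    return calibrated / calibrated.sum()

# Hypothetical label probabilities for a 2-way classification prompt.
p_cf = np.array([0.7, 0.3])    # bias measured on the content-free input
p_test = np.array([0.6, 0.4])  # raw prediction on a real test input
print(contextual_calibrate(p_test, p_cf))
```

Note that calibrating the content-free input against itself yields a uniform distribution, and in the example above calibration flips the predicted label from the first answer to the second.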
Proceedings ArticleDOI
Evaluating Models’ Local Decision Boundaries via Contrast Sets
Matt Gardner,Yoav Artzi,Victoria Basmov,Jonathan Berant,Ben Bogin,Sihao Chen,Pradeep Dasigi,Dheeru Dua,Yanai Elazar,Ananth Gottumukkala,Nitish Gupta,Hannaneh Hajishirzi,Gabriel Ilharco,Daniel Khashabi,Kevin Lin,Jiangming Liu,Nelson F. Liu,Phoebe Mulcaire,Qiang Ning,Sameer Singh,Noah A. Smith,Sanjay Subramanian,Reut Tsarfaty,Eric Wallace,Ally Zhang,Ben Zhou +25 more
TL;DR: This paper proposes a more rigorous annotation paradigm for NLP that helps close systematic gaps in test data, recommending that dataset authors manually perturb test instances in small but meaningful ways that (typically) change the gold label, creating contrast sets.