Eric Wallace
Researcher at University of California, Berkeley
Publications - 55
Citations - 5088
Eric Wallace is an academic researcher from the University of California, Berkeley. The author has contributed to research in the topics of computer science and language models. The author has an h-index of 23 and has co-authored 47 publications receiving 2164 citations. Previous affiliations of Eric Wallace include the University of Edinburgh and the Allen Institute for Artificial Intelligence.
Papers
Proceedings ArticleDOI
Universal Adversarial Triggers for Attacking and Analyzing NLP
TL;DR: This article proposes a gradient-guided search over tokens that finds short trigger sequences (e.g., one word for classification and four words for language modeling) which reliably cause the target prediction.
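The gradient-guided token search can be illustrated with a toy sketch: for a linear scoring model, a first-order (HotFlip-style) approximation says that swapping a trigger token changes the score by the dot product of the new token's embedding with the gradient, so the best replacement is the token maximizing that dot product. The model, embeddings, and function names below are all hypothetical, not the paper's actual implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
V, d = 50, 8                      # toy vocabulary size and embedding dim
E = rng.normal(size=(V, d))       # token embedding matrix (hypothetical)
w = rng.normal(size=d)            # toy linear "model": score = w . embedding

def trigger_flip_step(increase_score=True):
    # For this linear toy model, the gradient of the score w.r.t. the
    # trigger embedding is simply w. The first-order approximation of the
    # score change when substituting a token is (e_new - e_cur) . grad,
    # so we pick the vocabulary token whose embedding maximizes e . grad.
    grad = w if increase_score else -w
    return int(np.argmax(E @ grad))

best_token = trigger_flip_step()
print(best_token)
```

In the actual attack this step is repeated over a beam of candidate triggers, with gradients taken from a real neural model rather than a fixed linear scorer.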
Posted Content
Extracting Training Data from Large Language Models
Nicholas Carlini,Florian Tramèr,Eric Wallace,Matthew Jagielski,Ariel Herbert-Voss,Katherine Lee,Adam Roberts,Tom B. Brown,Dawn Song,Úlfar Erlingsson,Alina Oprea,Colin Raffel +11 more
TL;DR: This paper demonstrates that in such settings, an adversary can perform a training data extraction attack to recover individual training examples by querying the language model, and finds that larger models are more vulnerable than smaller models.
Journal ArticleDOI
A Discussion of 'Adversarial Examples Are Not Bugs, They Are Features'
Logan Engstrom,Justin Gilmer,Gabriel Goh,Dan Hendrycks,Andrew Ilyas,Aleksander Madry,Reiichiro Nakano,Preetum Nakkiran,Shibani Santurkar,Brandon Tran,Dimitris Tsipras,Eric Wallace +11 more
Posted Content
Calibrate Before Use: Improving Few-Shot Performance of Language Models
TL;DR: This work first estimates the model's bias towards each answer by asking for its prediction when given the training prompt and a content-free test input such as "N/A", and then fits calibration parameters that cause the prediction for this input to be uniform across answers.
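The calibration idea described above can be sketched in a few lines: measure the model's label probabilities on a content-free input such as "N/A", then divide those bias estimates out of each real prediction and renormalize, so the content-free input itself would map to a uniform distribution. This is a minimal illustration under assumed toy probabilities, not the paper's exact procedure:

```python
import numpy as np

def contextual_calibrate(p_test, p_content_free):
    # Divide out the model's bias toward each answer, estimated from a
    # content-free input such as "N/A", then renormalize to a distribution.
    calibrated = p_test / p_content_free
    return calibrated / calibrated.sum()

# Hypothetical label probabilities for a 2-way classification prompt.
p_cf = np.array([0.7, 0.3])    # bias measured on the content-free input
p_test = np.array([0.6, 0.4])  # raw prediction on a real test input
print(contextual_calibrate(p_test, p_cf))
```

Note that calibrating the content-free input against itself yields a uniform distribution, and in the example above calibration flips the predicted label from the first answer to the second.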
Proceedings ArticleDOI
Evaluating Models’ Local Decision Boundaries via Contrast Sets
Matt Gardner,Yoav Artzi,Victoria Basmov,Jonathan Berant,Ben Bogin,Sihao Chen,Pradeep Dasigi,Dheeru Dua,Yanai Elazar,Ananth Gottumukkala,Nitish Gupta,Hannaneh Hajishirzi,Gabriel Ilharco,Daniel Khashabi,Kevin Lin,Jiangming Liu,Nelson F. Liu,Phoebe Mulcaire,Qiang Ning,Sameer Singh,Noah A. Smith,Sanjay Subramanian,Reut Tsarfaty,Eric Wallace,Ally Zhang,Ben Zhou +25 more
TL;DR: This paper proposes a more rigorous annotation paradigm for NLP that helps close systematic gaps in test data, recommending that dataset authors manually perturb test instances in small but meaningful ways that (typically) change the gold label, creating contrast sets.