Author

Ana Marasović

Allen Institute for Artificial Intelligence

Other affiliations: University of California, Irvine; Heidelberg University; Technische Universität Darmstadt ...

Bio: Ana Marasović is an academic researcher at the Allen Institute for Artificial Intelligence. The author has contributed to research on the topics Computer science and Task (project management). The author has an h-index of 13 and has co-authored 30 publications receiving 1207 citations. Previous affiliations of Ana Marasović include University of California, Irvine and Heidelberg University.

Papers

Proceedings Article•DOI•

Don't Stop Pretraining: Adapt Language Models to Domains and Tasks

Suchin Gururangan¹, Ana Marasović¹,², Swabha Swayamdipta¹, Kyle Lo¹, Iz Beltagy¹, Doug Downey¹, Noah A. Smith¹,²•Institutions (2)

Allen Institute for Artificial Intelligence¹, University of Washington²

23 Apr 2020

TL;DR: It is consistently found that multi-phase adaptive pretraining offers large gains in task performance, and it is shown that adapting to a task corpus augmented using simple data selection strategies is an effective alternative, especially when resources for domain-adaptive pretraining might be unavailable.

Abstract: Language models pretrained on text from a wide variety of sources form the foundation of today’s NLP. In light of the success of these broad-coverage models, we investigate whether it is still helpful to tailor a pretrained model to the domain of a target task. We present a study across four domains (biomedical and computer science publications, news, and reviews) and eight classification tasks, showing that a second phase of pretraining in-domain (domain-adaptive pretraining) leads to performance gains, under both high- and low-resource settings. Moreover, adapting to the task’s unlabeled data (task-adaptive pretraining) improves performance even after domain-adaptive pretraining. Finally, we show that adapting to a task corpus augmented using simple data selection strategies is an effective alternative, especially when resources for domain-adaptive pretraining might be unavailable. Overall, we consistently find that multi-phase adaptive pretraining offers large gains in task performance.

1,532 citations

Posted Content•

Don't Stop Pretraining: Adapt Language Models to Domains and Tasks

Suchin Gururangan¹, Ana Marasović¹, Swabha Swayamdipta¹, Kyle Lo¹, Iz Beltagy¹, Doug Downey¹, Noah A. Smith¹•Institutions (1)

Allen Institute for Artificial Intelligence¹

23 Apr 2020-arXiv: Computation and Language

TL;DR: The authors show that adapting to a task corpus augmented using simple data selection strategies is an effective alternative, especially when resources for domain-adaptive pretraining might be unavailable, and consistently find that multi-phase adaptive pretraining offers large gains in task performance.

Abstract: Language models pretrained on text from a wide variety of sources form the foundation of today's NLP. In light of the success of these broad-coverage models, we investigate whether it is still helpful to tailor a pretrained model to the domain of a target task. We present a study across four domains (biomedical and computer science publications, news, and reviews) and eight classification tasks, showing that a second phase of pretraining in-domain (domain-adaptive pretraining) leads to performance gains, under both high- and low-resource settings. Moreover, adapting to the task's unlabeled data (task-adaptive pretraining) improves performance even after domain-adaptive pretraining. Finally, we show that adapting to a task corpus augmented using simple data selection strategies is an effective alternative, especially when resources for domain-adaptive pretraining might be unavailable. Overall, we consistently find that multi-phase adaptive pretraining offers large gains in task performance.

161 citations

Proceedings Article•DOI•

Quoref: A Reading Comprehension Dataset with Questions Requiring Coreferential Reasoning

Pradeep Dasigi¹, Nelson F. Liu¹, Ana Marasović¹, Noah A. Smith², Matt Gardner¹•Institutions (2)

Allen Institute for Artificial Intelligence¹, University of Washington²

01 Nov 2019

TL;DR: This work presents a new crowdsourced dataset containing more than 24K span-selection questions that require resolving coreference among entities in over 4.7K English paragraphs from Wikipedia, and shows that state-of-the-art reading comprehension models perform significantly worse than humans on this benchmark.

Abstract: Machine comprehension of texts longer than a single sentence often requires coreference resolution. However, most current reading comprehension benchmarks do not contain complex coreferential phenomena and hence fail to evaluate the ability of models to resolve coreference. We present a new crowdsourced dataset containing more than 24K span-selection questions that require resolving coreference among entities in over 4.7K English paragraphs from Wikipedia. Obtaining questions focused on such phenomena is challenging, because it is hard to avoid lexical cues that shortcut complex reasoning. We deal with this issue by using a strong baseline model as an adversary in the crowdsourcing loop, which helps crowdworkers avoid writing questions with exploitable surface cues. We show that state-of-the-art reading comprehension models perform significantly worse than humans on this benchmark—the best model performance is 70.5 F1, while the estimated human performance is 93.4 F1.

159 citations

Posted Content•

Formalizing Trust in Artificial Intelligence: Prerequisites, Causes and Goals of Human Trust in AI

Alon Jacovi¹, Ana Marasović², Tim Miller³, Yoav Goldberg²•Institutions (3)

Bar-Ilan University¹, Allen Institute for Artificial Intelligence², University of Melbourne³

15 Oct 2020-arXiv: Artificial Intelligence

TL;DR: This work discusses a model of trust inspired by, but not identical to, interpersonal trust as defined by sociologists, and incorporates a formalization of 'contractual trust', such that trust between a user and an AI model is trust that some implicit or explicit contract will hold.

Abstract: Trust is a central component of the interaction between people and AI, in that 'incorrect' levels of trust may cause misuse, abuse or disuse of the technology. But what, precisely, is the nature of trust in AI? What are the prerequisites and goals of the cognitive mechanism of trust, and how can we promote them, or assess whether they are being satisfied in a given interaction? This work aims to answer these questions. We discuss a model of trust inspired by, but not identical to, sociology's interpersonal trust (i.e., trust between people). This model rests on two key properties of the vulnerability of the user and the ability to anticipate the impact of the AI model's decisions. We incorporate a formalization of 'contractual trust', such that trust between a user and an AI is trust that some implicit or explicit contract will hold, and a formalization of 'trustworthiness' (which detaches from the notion of trustworthiness in sociology), and with it concepts of 'warranted' and 'unwarranted' trust. We then present the possible causes of warranted trust as intrinsic reasoning and extrinsic behavior, and discuss how to design trustworthy AI, how to evaluate whether trust has manifested, and whether it is warranted. Finally, we elucidate the connection between trust and XAI using our formalization.

133 citations

Proceedings Article•DOI•

Formalizing Trust in Artificial Intelligence: Prerequisites, Causes and Goals of Human Trust in AI

Alon Jacovi¹, Ana Marasović², Tim Miller³, Yoav Goldberg²•Institutions (3)

Bar-Ilan University¹, Allen Institute for Artificial Intelligence², University of Melbourne³

03 Mar 2021

TL;DR: In this paper, the authors discuss a model of trust inspired by sociologists' notion of interpersonal trust (i.e., trust between people) and discuss how to design trustworthy AI, how to evaluate whether trust has manifested, and whether it is warranted.

Abstract: Trust is a central component of the interaction between people and AI, in that 'incorrect' levels of trust may cause misuse, abuse or disuse of the technology. But what, precisely, is the nature of trust in AI? What are the prerequisites and goals of the cognitive mechanism of trust, and how can we promote them, or assess whether they are being satisfied in a given interaction? This work aims to answer these questions. We discuss a model of trust inspired by, but not identical to, interpersonal trust (i.e., trust between people) as defined by sociologists. This model rests on two key properties: the vulnerability of the user; and the ability to anticipate the impact of the AI model's decisions. We incorporate a formalization of 'contractual trust', such that trust between a user and an AI model is trust that some implicit or explicit contract will hold, and a formalization of 'trustworthiness' (that detaches from the notion of trustworthiness in sociology), and with it concepts of 'warranted' and 'unwarranted' trust. We present the possible causes of warranted trust as intrinsic reasoning and extrinsic behavior, and discuss how to design trustworthy AI, how to evaluate whether trust has manifested, and whether it is warranted. Finally, we elucidate the connection between trust and XAI using our formalization.

108 citations

Cited by

Proceedings Article•

Chain of Thought Prompting Elicits Reasoning in Large Language Models

Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Ed H. Chi, Fei Xia, Quoc V. Le, Denny Zhou

28 Jan 2022

TL;DR: Experiments on three large language models show that chain-of-thought prompting improves performance on a range of arithmetic, commonsense, and symbolic reasoning tasks.

Abstract: We explore how generating a chain of thought -- a series of intermediate reasoning steps -- significantly improves the ability of large language models to perform complex reasoning. In particular, we show how such reasoning abilities emerge naturally in sufficiently large language models via a simple method called chain of thought prompting, where a few chain of thought demonstrations are provided as exemplars in prompting. Experiments on three large language models show that chain of thought prompting improves performance on a range of arithmetic, commonsense, and symbolic reasoning tasks. The empirical gains can be striking. For instance, prompting a 540B-parameter language model with just eight chain of thought exemplars achieves state of the art accuracy on the GSM8K benchmark of math word problems, surpassing even finetuned GPT-3 with a verifier.

1,211 citations

Proceedings Article•DOI•

mT5: A Massively Multilingual Pre-trained Text-to-Text Transformer

Linting Xue¹, Noah Constant¹, Adam Roberts², Mihir Kale¹, Rami Al-Rfou¹, Aditya Siddhant¹, Aditya Barua¹, Colin Raffel¹•Institutions (2)

Google¹, University of Chester²

01 Jun 2021

TL;DR: This paper introduces mT5, a multilingual variant of T5 that was pre-trained on a new Common Crawl-based dataset covering 101 languages and achieves state-of-the-art performance on many multilingual benchmarks.

Abstract: The recent “Text-to-Text Transfer Transformer” (T5) leveraged a unified text-to-text format and scale to attain state-of-the-art results on a wide variety of English-language NLP tasks. In this paper, we introduce mT5, a multilingual variant of T5 that was pre-trained on a new Common Crawl-based dataset covering 101 languages. We detail the design and modified training of mT5 and demonstrate its state-of-the-art performance on many multilingual benchmarks. We also describe a simple technique to prevent “accidental translation” in the zero-shot setting, where a generative model chooses to (partially) translate its prediction into the wrong language. All of the code and model checkpoints used in this work are publicly available.

1,016 citations

Journal Article•DOI•

Pre-trained Models for Natural Language Processing: A Survey

Xipeng Qiu¹, Tianxiang Sun¹, Yige Xu¹, Yunfan Shao¹, Ning Dai¹, Xuanjing Huang¹•Institutions (1)

Fudan University¹

18 Mar 2020-Science China Technological Sciences

TL;DR: This survey provides a comprehensive review of pre-trained models (PTMs) for natural language processing (NLP), whose emergence has brought the field to a new era.

Abstract: Recently, the emergence of pre-trained models (PTMs) has brought natural language processing (NLP) to a new era. In this survey, we provide a comprehensive review of PTMs for NLP. We first briefly introduce language representation learning and its research progress. Then we systematically categorize existing PTMs based on a taxonomy from four different perspectives. Next, we describe how to adapt the knowledge of PTMs to downstream tasks. Finally, we outline some potential directions of PTMs for future research. This survey is purposed to be a hands-on guide for understanding, using, and developing PTMs for various NLP tasks.

755 citations

Proceedings Article•DOI•

BERTweet: A pre-trained language model for English Tweets

Dat Quoc Nguyen¹, Thanh Vu², Anh Tuan Nguyen³•Institutions (3)

University of Melbourne¹, Oracle Corporation², Nvidia³

20 May 2020

TL;DR: BERTweet is the first public large-scale pre-trained language model for English Tweets; it has the same architecture as BERT-base and is trained using the RoBERTa pre-training procedure.

Abstract: We present BERTweet, the first public large-scale pre-trained language model for English Tweets. Our BERTweet, having the same architecture as BERT-base (Devlin et al., 2019), is trained using the RoBERTa pre-training procedure (Liu et al., 2019). Experiments show that BERTweet outperforms strong baselines RoBERTa-base and XLM-R-base (Conneau et al., 2020), producing better performance results than the previous state-of-the-art models on three Tweet NLP tasks: Part-of-speech tagging, Named-entity recognition and text classification. We release BERTweet under the MIT License to facilitate future research and applications on Tweet data. Our BERTweet is available at https://github.com/VinAIResearch/BERTweet

517 citations

Journal Article•DOI•

Text Data Augmentation for Deep Learning.

Connor Shorten¹, Taghi M. Khoshgoftaar¹, Borko Furht¹•Institutions (1)

Florida Atlantic University¹

29 Jun 2021-Journal of Big Data

TL;DR: This survey of Data Augmentation for text data summarizes the major motifs of Data Augmentation into strengthening local decision boundaries, brute force training, causality and counterfactual examples, and the distinction between meaning and form.

Abstract: Natural Language Processing (NLP) is one of the most captivating applications of Deep Learning. In this survey, we consider how the Data Augmentation training strategy can aid in its development. We begin with the major motifs of Data Augmentation summarized into strengthening local decision boundaries, brute force training, causality and counterfactual examples, and the distinction between meaning and form. We follow these motifs with a concrete list of augmentation frameworks that have been developed for text data. Deep Learning generally struggles with the measurement of generalization and characterization of overfitting. We highlight studies that cover how augmentations can construct test sets for generalization. NLP is at an early stage in applying Data Augmentation compared to Computer Vision. We highlight the key differences and promising ideas that have yet to be tested in NLP. For the sake of practical implementation, we describe tools that facilitate Data Augmentation such as the use of consistency regularization, controllers, and offline and online augmentation pipelines, to preview a few. Finally, we discuss interesting topics around Data Augmentation in NLP such as task-specific augmentations, the use of prior knowledge in self-supervised learning versus Data Augmentation, intersections with transfer and multi-task learning, and ideas for AI-GAs (AI-Generating Algorithms). We hope this paper inspires further research interest in Text Data Augmentation.

487 citations
