Anna Rogers

Researcher at University of Copenhagen

Publications: 38
Citations: 2815

Anna Rogers is an academic researcher from the University of Copenhagen. The author has contributed to research in topics including Computer science and Commonsense reasoning. The author has an h-index of 13 and has co-authored 31 publications receiving 1,338 citations. Previous affiliations of Anna Rogers include the University of Massachusetts Lowell.

Papers
Journal ArticleDOI

A Primer in BERTology: What We Know About How BERT Works

TL;DR: A survey of over 150 studies of the BERT model, reviewing the current state of knowledge about how BERT works, what kind of information it learns and how it is represented, common modifications to its training objectives and architecture, the overparameterization issue, and approaches to compression.
Posted Content

A Primer in BERTology: What we know about how BERT works

TL;DR: This paper is the first survey of over 150 studies of the popular BERT model, reviewing the current state of knowledge about how BERT works, what kind of information it learns and how it is represented, common modifications to its training objectives and architecture, the overparameterization issue, and approaches to compression.
Proceedings ArticleDOI

Revealing the Dark Secrets of BERT

TL;DR: It is shown that manually disabling attention in certain heads leads to a performance improvement over the regular fine-tuned BERT models, indicating overall model overparameterization.
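
The head-disabling idea can be illustrated, in rough outline, with the `head_mask` argument of Hugging Face's `BertModel`. This is a minimal sketch, not the paper's experiment code; the model name, input sentence, and choice of disabled heads are illustrative assumptions.

```python
# Sketch: disable selected self-attention heads in BERT via head_mask.
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")
model.eval()

inputs = tokenizer("BERT has many redundant attention heads.", return_tensors="pt")

# head_mask has shape (num_layers, num_heads); 1.0 keeps a head, 0.0 disables it.
num_layers = model.config.num_hidden_layers   # 12 for bert-base
num_heads = model.config.num_attention_heads  # 12 for bert-base
head_mask = torch.ones(num_layers, num_heads)
head_mask[0, :4] = 0.0  # e.g. switch off the first four heads of layer 0 (arbitrary choice)

with torch.no_grad():
    outputs = model(**inputs, head_mask=head_mask)
print(outputs.last_hidden_state.shape)  # (1, seq_len, 768)
```

In the paper's setting, the interesting comparison is task performance with and without such a mask after fine-tuning; the snippet above only shows the mechanical step of zeroing out heads.
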
Journal ArticleDOI

BLOOM: A 176B-Parameter Open-Access Multilingual Language Model

Teven Le Scao, +386 more
09 Nov 2022
TL;DR: BLOOM is a 176B-parameter, decoder-only Transformer language model trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total).
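
The released checkpoints can be loaded through the `transformers` library; the sketch below assumes the smaller public variant `bigscience/bloom-560m` rather than the full 176B model, and the prompt is an arbitrary example.

```python
# Sketch: generate text with a small public BLOOM checkpoint.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bigscience/bloom-560m")
model = AutoModelForCausalLM.from_pretrained("bigscience/bloom-560m")

prompt = "BLOOM is a multilingual language model that"
inputs = tokenizer(prompt, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=30, do_sample=False)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```
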
Posted Content

When BERT Plays the Lottery, All Tickets Are Winning

TL;DR: It is shown that the "bad" subnetworks can be fine-tuned separately to achieve only slightly worse performance than the "good" ones, indicating that most weights in the pre-trained BERT are potentially useful.
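 
A minimal sketch of the general recipe (carve out a subnetwork by pruning attention heads, then fine-tune what remains). The pruned heads, task, and toy training data below are illustrative stand-ins, not the paper's actual subnetworks or benchmarks.

```python
# Sketch: prune a set of attention heads, then fine-tune the remaining subnetwork.
import torch
from transformers import BertForSequenceClassification, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

# Remove heads, given as {layer index: list of head indices}; arbitrary illustrative choice.
model.prune_heads({0: [0, 1, 2], 1: [4, 5], 11: [0, 7]})

# One fine-tuning step on toy data (stand-in for a real task and dataset).
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
batch = tokenizer(["great movie", "terrible movie"], return_tensors="pt", padding=True)
labels = torch.tensor([1, 0])

model.train()
outputs = model(**batch, labels=labels)
outputs.loss.backward()
optimizer.step()
print(f"training loss on this batch: {outputs.loss.item():.3f}")
```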