Kyle Lo
Researcher at Allen Institute for Artificial Intelligence
Publications - 73
Citations - 7271
Kyle Lo is an academic researcher at the Allen Institute for Artificial Intelligence. The author has contributed to research in topics including Computer science and Task (project management). The author has an h-index of 19 and has co-authored 49 publications receiving 3,473 citations. Previous affiliations of Kyle Lo include the University of Washington and the University of California, Berkeley.
Papers
Proceedings ArticleDOI
SciBERT: A Pretrained Language Model for Scientific Text
Iz Beltagy,Kyle Lo,Arman Cohan +2 more
TL;DR: SciBERT leverages unsupervised pretraining on a large multi-domain corpus of scientific publications to improve performance on downstream scientific NLP tasks and demonstrates statistically significant improvements over BERT.
Proceedings ArticleDOI
Don't Stop Pretraining: Adapt Language Models to Domains and Tasks
Suchin Gururangan,Ana Marasović,Swabha Swayamdipta,Kyle Lo,Iz Beltagy,Doug Downey,Noah A. Smith +8 more
TL;DR: Multi-phase adaptive pretraining (domain-adaptive followed by task-adaptive) is consistently found to offer large gains in task performance, and adapting to a task corpus augmented via simple data selection strategies is shown to be an effective alternative, especially when resources for domain-adaptive pretraining are unavailable.
Posted Content
CORD-19: The COVID-19 Open Research Dataset
Lucy Lu Wang,Kyle Lo,Yoganand Chandrasekhar,Russell Reas,Jiangjiang Yang,Darrin Eide,Kathryn Funk,Rodney Kinney,Ziyang Liu,William Merrill,Paul Mooney,D. A. Murdick,Devvret Rishi,Jerry Sheehan,Zhihong Shen,Brandon Stilson,Alex D. Wade,Kuansan Wang,Chris Wilhelm,Boya Xie,Douglas Raymond,Daniel S. Weld,Oren Etzioni,Sebastian Kohlmeier +24 more
TL;DR: The paper describes the mechanics of dataset construction, highlighting challenges and key design decisions, provides an overview of how CORD-19 has been used, and describes several shared tasks built around the dataset.
Journal ArticleDOI
BLOOM: A 176B-Parameter Open-Access Multilingual Language Model
Teven Le Scao,Angela Fan,Christopher Akiki,Iz Beltagy,Kyle Lo +386 more
TL;DR: BLOOM is an open-access, 176B-parameter, decoder-only Transformer language model trained on the ROOTS corpus, a dataset comprising hundreds of sources spanning 46 natural languages and 13 programming languages (59 in total).
Posted Content
SciBERT: A Pretrained Language Model for Scientific Text
Iz Beltagy,Kyle Lo,Arman Cohan +2 more
TL;DR: This article proposes SciBERT, a pretrained language model based on BERT that addresses the lack of high-quality, large-scale labeled scientific data by leveraging unsupervised pretraining on a large multi-domain corpus of scientific publications.