Home
/
Authors
/
Radu Soricut

Author

Radu Soricut

Other affiliations: Information Sciences Institute, University of Southern California

Bio: Radu Soricut is an academic researcher from Google. The author has contributed to research in topics: Closed captioning & Machine translation. The author has an hindex of 27, co-authored 84 publications receiving 6421 citations. Previous affiliations of Radu Soricut include Information Sciences Institute & University of Southern California.

Papers published on a yearly basis

2023
2022
2021
2020
2019
2018
2017
2016
2015
2014
2013
2012
2010
2009
2008
2006
2005
2004
2003
2002

Papers

PDF

Open Access

More filters

Proceedings Article•

ALBERT: A Lite BERT for Self-supervised Learning of Language Representations

[...]

Zhenzhong Lan¹, Mingda Chen², Sebastian Goodman¹, Kevin Gimpel³, Piyush Sharma¹, Radu Soricut¹ - Show less +2 more•Institutions (3)

Google¹, Toyota Technological Institute at Chicago², New York University³

30 Apr 2020

TL;DR: This work presents two parameter-reduction techniques to lower memory consumption and increase the training speed of BERT, and uses a self-supervised loss that focuses on modeling inter-sentence coherence.

...read moreread less

Abstract: Increasing model size when pretraining natural language representations often results in improved performance on downstream tasks. However, at some point further model increases become harder due to GPU/TPU memory limitations, longer training times, and unexpected model degradation. To address these problems, we present two parameter-reduction techniques to lower memory consumption and increase the training speed of BERT. Comprehensive empirical evidence shows that our proposed methods lead to models that scale much better compared to the original BERT. We also use a self-supervised loss that focuses on modeling inter-sentence coherence, and show it consistently helps downstream tasks with multi-sentence inputs. As a result, our best model establishes new state-of-the-art results on the GLUE, RACE, and SQuAD benchmarks while having fewer parameters compared to BERT-large.

...read moreread less

2,367 citations

Posted Content•

ALBERT: A Lite BERT for Self-supervised Learning of Language Representations

[...]

Zhenzhong Lan¹, Mingda Chen², Sebastian Goodman¹, Kevin Gimpel³, Piyush Sharma¹, Radu Soricut¹ - Show less +2 more•Institutions (3)

Google¹, Toyota Technological Institute at Chicago², New York University³

26 Sep 2019-arXiv: Computation and Language

TL;DR: The authors proposed a self-supervised loss that focuses on modeling inter-sentence coherence, and showed it consistently helps downstream tasks with multientence inputs, achieving state-of-the-art results on the GLUE, RACE, and \squad benchmarks.

...read moreread less

Abstract: Increasing model size when pretraining natural language representations often results in improved performance on downstream tasks. However, at some point further model increases become harder due to GPU/TPU memory limitations and longer training times. To address these problems, we present two parameter-reduction techniques to lower memory consumption and increase the training speed of BERT. Comprehensive empirical evidence shows that our proposed methods lead to models that scale much better compared to the original BERT. We also use a self-supervised loss that focuses on modeling inter-sentence coherence, and show it consistently helps downstream tasks with multi-sentence inputs. As a result, our best model establishes new state-of-the-art results on the GLUE, RACE, and \squad benchmarks while having fewer parameters compared to BERT-large. The code and the pretrained models are available at this https URL.

...read moreread less

2,247 citations

Proceedings Article•DOI•

Conceptual Captions: A Cleaned, Hypernymed, Image Alt-text Dataset For Automatic Image Captioning

[...]

Piyush Sharma¹, Nan Ding¹, Sebastian Goodman¹, Radu Soricut¹•Institutions (1)

Google¹

01 Jul 2018

TL;DR: The Conceptual Captions dataset as discussed by the authors contains an order of magnitude more images than the MS-COCO dataset and represents a wider variety of both images and image caption styles.

...read moreread less

Abstract: We present a new dataset of image caption annotations, Conceptual Captions, which contains an order of magnitude more images than the MS-COCO dataset (Lin et al., 2014) and represents a wider variety of both images and image caption styles. We achieve this by extracting and filtering image caption annotations from billions of webpages. We also present quantitative evaluations of a number of image captioning models and show that a model architecture based on Inception-ResNetv2 (Szegedy et al., 2016) for image-feature extraction and Transformer (Vaswani et al., 2017) for sequence modeling achieves the best performance when trained on the Conceptual Captions dataset.

...read moreread less

1,443 citations

Proceedings Article•DOI•

Findings of the 2014 Workshop on Statistical Machine Translation

[...]

Ondrej Bojar¹, Christian Buck², Christian Federmann², Barry Haddow, Philipp Koehn, Johannes Leveling³, Christof Monz⁴, Pavel Pecina¹, Matt Post⁵, Herve Saint-Amand², Radu Soricut⁶, Lucia Specia⁷, Aleš Tamchyna¹ - Show less +9 more•Institutions (7)

Charles University in Prague¹, University of Edinburgh², Dublin City University³, University of Amsterdam⁴, Johns Hopkins University⁵, Google⁶, University of Sheffield⁷

01 Jun 2014

TL;DR: The results of the WMT14 shared tasks, which included a standard news translation task, a separate medical translationtask, a task for run-time estimation of machine translation quality, and a metrics task, are presented.

...read moreread less

Abstract: This paper presents the results of the WMT14 shared tasks, which included a standard news translation task, a separate medical translation task, a task for run-time estimation of machine translation quality, and a metrics task. This year, 143 machine translation systems from 23 institutions were submitted to the ten translation directions in the standard translation task. An additional 6 anonymized systems were included, and were then evaluated both automatically and manually. The quality estimation task had four subtasks, with a total of 10 teams, submitting 57 entries

...read moreread less

511 citations

Proceedings Article•DOI•

Sentence level discourse parsing using syntactic and lexical information

[...]

Radu Soricut¹, Daniel Marcu¹•Institutions (1)

University of Southern California¹

27 May 2003

TL;DR: Two probabilistic models that can be used to identify elementary discourse units and build sentence-level discourse parse trees are introduced and shown to be sophisticated enough to yield discourse trees at an accuracy level that matches near-human levels of performance.

...read moreread less

Abstract: We introduce two probabilistic models that can be used to identify elementary discourse units and build sentence-level discourse parse trees. The models use syntactic and lexical features. A discourse parsing algorithm that implements these models derives discourse parse trees with an error reduction of 18.8% over a state-of-the-art decision-based discourse parser. A set of empirical evaluations shows that our discourse parsing model is sophisticated enough to yield discourse trees at an accuracy level that matches near-human levels of performance.

...read moreread less

477 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19

Collapse

Cited by

PDF

Open Access

More filters

Proceedings Article•

Language Models are Few-Shot Learners

[...]

Tom B. Brown¹, Benjamin Mann, Nick Ryder², Melanie Subbiah, Jared Kaplan³, Prafulla Dhariwal¹, Arvind Neelakantan⁴, Pranav Shyam, Girish Sastry¹, Amanda Askell¹, Sandhini Agarwal¹, Ariel Herbert-Voss¹, Gretchen Krueger¹, Thomas Henighan¹, Rewon Child¹, Aditya Ramesh¹, Daniel M. Ziegler⁵, Jeffrey Wu¹, Clemens Winter, Christopher Hesse¹, Mark Chen¹, Eric Sigler, Mateusz Litwin, Scott Gray¹, Benjamin Chess¹, Jack Clark¹, Christopher Berner, Samuel McCandlish¹, Alec Radford¹, Ilya Sutskever¹, Dario Amodei¹ - Show less +27 more•Institutions (5)

OpenAI¹, University of California, Berkeley², Johns Hopkins University³, Google⁴, Massachusetts Institute of Technology⁵

28 May 2020

TL;DR: GPT-3 achieves strong performance on many NLP datasets, including translation, question-answering, and cloze tasks, as well as several tasks that require on-the-fly reasoning or domain adaptation, such as unscrambling words, using a novel word in a sentence, or performing 3-digit arithmetic.

...read moreread less

Abstract: Recent work has demonstrated substantial gains on many NLP tasks and benchmarks by pre-training on a large corpus of text followed by fine-tuning on a specific task. While typically task-agnostic in architecture, this method still requires task-specific fine-tuning datasets of thousands or tens of thousands of examples. By contrast, humans can generally perform a new language task from only a few examples or from simple instructions - something which current NLP systems still largely struggle to do. Here we show that scaling up language models greatly improves task-agnostic, few-shot performance, sometimes even reaching competitiveness with prior state-of-the-art fine-tuning approaches. Specifically, we train GPT-3, an autoregressive language model with 175 billion parameters, 10x more than any previous non-sparse language model, and test its performance in the few-shot setting. For all tasks, GPT-3 is applied without any gradient updates or fine-tuning, with tasks and few-shot demonstrations specified purely via text interaction with the model. GPT-3 achieves strong performance on many NLP datasets, including translation, question-answering, and cloze tasks, as well as several tasks that require on-the-fly reasoning or domain adaptation, such as unscrambling words, using a novel word in a sentence, or performing 3-digit arithmetic. At the same time, we also identify some datasets where GPT-3's few-shot learning still struggles, as well as some datasets where GPT-3 faces methodological issues related to training on large web corpora. Finally, we find that GPT-3 can generate samples of news articles which human evaluators have difficulty distinguishing from articles written by humans. We discuss broader societal impacts of this finding and of GPT-3 in general.

...read moreread less

10,132 citations

Journal Article•DOI•

Enriching Word Vectors with Subword Information

[...]

Piotr Bojanowski¹, Edouard Grave¹, Armand Joulin¹, Tomas Mikolov¹•Institutions (1)

Facebook¹

12 Jun 2017-Transactions of the Association for Computational Linguistics

TL;DR: This paper proposed a new approach based on skip-gram model, where each word is represented as a bag of character n-grams, words being represented as the sum of these representations, allowing to train models on large corpora quickly and allowing to compute word representations for words that did not appear in the training data.

...read moreread less

Abstract: Continuous word representations, trained on large unlabeled corpora are useful for many natural language processing tasks. Popular models to learn such representations ignore the morphology of words, by assigning a distinct vector to each word. This is a limitation, especially for languages with large vocabularies and many rare words. In this paper, we propose a new approach based on the skipgram model, where each word is represented as a bag of character n-grams. A vector representation is associated to each character n-gram, words being represented as the sum of these representations. Our method is fast, allowing to train models on large corpora quickly and allows to compute word representations for words that did not appear in the training data. We evaluate our word representations on nine different languages, both on word similarity and analogy tasks. By comparing to recently proposed morphological word representations, we show that our vectors achieve state-of-the-art performance on these tasks.

...read moreread less

7,537 citations

Posted Content•

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

[...]

Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu - Show less +5 more

23 Oct 2019-arXiv: Learning

TL;DR: This systematic study compares pre-training objectives, architectures, unlabeled datasets, transfer approaches, and other factors on dozens of language understanding tasks and achieves state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more.

...read moreread less

Abstract: Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts all text-based language problems into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new ``Colossal Clean Crawled Corpus'', we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our data set, pre-trained models, and code.

...read moreread less

6,953 citations

Proceedings Article•DOI•

Transformers: State-of-the-Art Natural Language Processing

[...]

Thomas Wolf¹, Lysandre Debut, Victor Sanh, Julien Chaumond, Clement Delangue, Anthony Moi, Pierric Cistac, Clara Ma, Yacine Jernite², Julien Plu³, Canwen Xu⁴, Teven Le Scao, Sylvain Gugger, Mariama Drame, Quentin Lhoest, Alexander M. Rush⁵ - Show less +12 more•Institutions (5)

Central European University¹, New York University², Institut Eurécom³, Wuhan University⁴, Cornell University⁵

01 Oct 2020

TL;DR: Transformers is an open-source library that consists of carefully engineered state-of-the art Transformer architectures under a unified API and a curated collection of pretrained models made by and available for the community.

...read moreread less

Abstract: Recent progress in natural language processing has been driven by advances in both model architecture and model pretraining. Transformer architectures have facilitated building higher-capacity models and pretraining has made it possible to effectively utilize this capacity for a wide variety of tasks. Transformers is an open-source library with the goal of opening up these advances to the wider machine learning community. The library consists of carefully engineered state-of-the art Transformer architectures under a unified API. Backing this library is a curated collection of pretrained models made by and available for the community. Transformers is designed to be extensible by researchers, simple for practitioners, and fast and robust in industrial deployments. The library is available at https://github.com/huggingface/transformers.

...read moreread less

4,798 citations

Proceedings Article•DOI•

BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension

[...]

Michael Lewis¹, Yinhan Liu¹, Naman Goyal¹, Marjan Ghazvininejad¹, Abdelrahman Mohamed¹, Omer Levy², Veselin Stoyanov¹, Luke Zettlemoyer¹ - Show less +4 more•Institutions (2)

Facebook¹, University of Washington²

01 Jul 2020

TL;DR: BART is presented, a denoising autoencoder for pretraining sequence-to-sequence models, which matches the performance of RoBERTa on GLUE and SQuAD, and achieves new state-of-the-art results on a range of abstractive dialogue, question answering, and summarization tasks.

...read moreread less

Abstract: We present BART, a denoising autoencoder for pretraining sequence-to-sequence models. BART is trained by (1) corrupting text with an arbitrary noising function, and (2) learning a model to reconstruct the original text. It uses a standard Tranformer-based neural machine translation architecture which, despite its simplicity, can be seen as generalizing BERT (due to the bidirectional encoder), GPT (with the left-to-right decoder), and other recent pretraining schemes. We evaluate a number of noising approaches, finding the best performance by both randomly shuffling the order of sentences and using a novel in-filling scheme, where spans of text are replaced with a single mask token. BART is particularly effective when fine tuned for text generation but also works well for comprehension tasks. It matches the performance of RoBERTa on GLUE and SQuAD, and achieves new state-of-the-art results on a range of abstractive dialogue, question answering, and summarization tasks, with gains of up to 3.5 ROUGE. BART also provides a 1.1 BLEU increase over a back-translation system for machine translation, with only target language pretraining. We also replicate other pretraining schemes within the BART framework, to understand their effect on end-task performance.

...read moreread less

4,505 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200

Collapse