Home
/
Authors
/
Ivan Vulić

Author

Ivan Vulić

Other affiliations: Google, University of Copenhagen Faculty of Science, University of Mannheim ...read more

Bio: Ivan Vulić is an academic researcher from University of Cambridge. The author has contributed to research in topics: Computer science & Language model. The author has an hindex of 38, co-authored 143 publications receiving 4515 citations. Previous affiliations of Ivan Vulić include Google & University of Copenhagen Faculty of Science.

Papers published on a yearly basis

2023
2022
2021
2020
2019
2018
2017
2016
2015
2014
2013
2012
2011

Papers

PDF

Open Access

More filters

Proceedings Article•DOI•

Monolingual and Cross-Lingual Information Retrieval Models Based on (Bilingual) Word Embeddings

[...]

Ivan Vulić¹, Marie-Francine Moens¹•Institutions (1)

Katholieke Universiteit Leuven¹

09 Aug 2015

TL;DR: A novel word representation learning model called Bilingual Word Embeddings Skip-Gram (BWESG) is presented which is the first model able to learn bilingual word embeddings solely on the basis of document-aligned comparable data.

...read moreread less

Abstract: We propose a new unified framework for monolingual (MoIR) and cross-lingual information retrieval (CLIR) which relies on the induction of dense real-valued word vectors known as word embeddings (WE) from comparable data. To this end, we make several important contributions: (1) We present a novel word representation learning model called Bilingual Word Embeddings Skip-Gram (BWESG) which is the first model able to learn bilingual word embeddings solely on the basis of document-aligned comparable data; (2) We demonstrate a simple yet effective approach to building document embeddings from single word embeddings by utilizing models from compositional distributional semantics. BWESG induces a shared cross-lingual embedding vector space in which both words, queries, and documents may be presented as dense real-valued vectors; (3) We build novel ad-hoc MoIR and CLIR models which rely on the induced word and document embeddings and the shared cross-lingual embedding space; (4) Experiments for English and Dutch MoIR, as well as for English-to-Dutch and Dutch-to-English CLIR using benchmarking CLEF 2001-2003 collections and queries demonstrate the utility of our WE-based MoIR and CLIR models. The best results on the CLEF collections are obtained by the combination of the WE-based approach and a unigram language model. We also report on significant improvements in ad-hoc IR tasks of our WE-based framework over the state-of-the-art framework for learning text representations from comparable data based on latent Dirichlet allocation (LDA).

...read moreread less

303 citations

Journal Article•DOI•

A Survey Of Cross-lingual Word Embedding Models

[...]

Sebastian Ruder, Ivan Vulić, Anders Søgaard

01 May 2019-Journal of Artificial Intelligence Research

TL;DR: Cross-lingual representations of words enable us to reason about word meaning in multilingual contexts and are a key facilitator of crosslingual transfer when developing natural language processin... as discussed by the authors.

...read moreread less

Abstract: Cross-lingual representations of words enable us to reason about word meaning in multilingual contexts and are a key facilitator of cross-lingual transfer when developing natural language processin...

...read moreread less

288 citations

Proceedings Article•DOI•

On the Limitations of Unsupervised Bilingual Dictionary Induction

[...]

Anders Søgaard¹, Sebastian Ruder², Ivan Vulić³•Institutions (3)

University of Copenhagen¹, National University of Ireland, Galway², University of Cambridge³

09 May 2018

TL;DR: This article showed that weak supervision from identical words enables more robust dictionary induction and established a near-perfect correlation between unsupervised bilingual dictionary induction performance and a previously unexplored graph similarity metric.

...read moreread less

Abstract: Unsupervised machine translation - i.e., not assuming any cross-lingual supervision signal, whether a dictionary, translations, or comparable corpora - seems impossible, but nevertheless, Lample et al. (2017) recently proposed a fully unsupervised machine translation (MT) model. The model relies heavily on an adversarial, unsupervised cross-lingual word embedding technique for bilingual dictionary induction (Conneau et al., 2017), which we examine here. Our results identify the limitations of current unsupervised MT: unsupervised bilingual dictionary induction performs much worse on morphologically rich languages that are not dependent marking, when monolingual corpora from different domains or different embedding algorithms are used. We show that a simple trick, exploiting a weak supervision signal from identical words, enables more robust induction and establish a near-perfect correlation between unsupervised bilingual dictionary induction performance and a previously unexplored graph similarity metric.

...read moreread less

256 citations

Proceedings Article•DOI•

Hello, It’s GPT-2 - How Can I Help You? Towards the Use of Pretrained Language Models for Task-Oriented Dialogue Systems

[...]

Paweł Budzianowski¹, Ivan Vulić²•Institutions (2)

University of Copenhagen¹, University of Cambridge²

12 Jul 2019

TL;DR: This paper proposes a task-oriented dialogue model that operates solely on text input: it effectively bypasses explicit policy and language generation modules and holds promise to mitigate the data scarcity problem, and to support the construction of more engaging and more eloquent task- oriented conversational agents.

...read moreread less

Abstract: Data scarcity is a long-standing and crucial challenge that hinders quick development of task-oriented dialogue systems across multiple domains: task-oriented dialogue models are expected to learn grammar, syntax, dialogue reasoning, decision making, and language generation from absurdly small amounts of task-specific data. In this paper, we demonstrate that recent progress in language modeling pre-training and transfer learning shows promise to overcome this problem. We propose a task-oriented dialogue model that operates solely on text input: it effectively bypasses explicit policy and language generation modules. Building on top of the TransferTransfo framework (Wolf et al., 2019) and generative model pre-training (Radford et al., 2019), we validate the approach on complex multi-domain task-oriented dialogues from the MultiWOZ dataset. Our automatic and human evaluations show that the proposed model is on par with a strong task-specific neural baseline. In the long run, our approach holds promise to mitigate the data scarcity problem, and to support the construction of more engaging and more eloquent task-oriented conversational agents.

...read moreread less

250 citations

Posted Content•

MAD-X: An Adapter-Based Framework for Multi-Task Cross-Lingual Transfer

[...]

Jonas Pfeiffer¹, Ivan Vulić², Iryna Gurevych¹, Sebastian Ruder³•Institutions (3)

Technische Universität Darmstadt¹, University of Mannheim², Google³

30 Apr 2020-arXiv: Computation and Language

TL;DR: MAD-X is proposed, an adapter-based framework that enables high portability and parameter-efficient transfer to arbitrary tasks and languages by learning modular language and task representations and introduces a novel invertible adapter architecture and a strong baseline method for adapting a pretrained multilingual model to a new language.

...read moreread less

Abstract: The main goal behind state-of-the-art pre-trained multilingual models such as multilingual BERT and XLM-R is enabling and bootstrapping NLP applications in low-resource languages through zero-shot or few-shot cross-lingual transfer. However, due to limited model capacity, their transfer performance is the weakest exactly on such low-resource languages and languages unseen during pre-training. We propose MAD-X, an adapter-based framework that enables high portability and parameter-efficient transfer to arbitrary tasks and languages by learning modular language and task representations. In addition, we introduce a novel invertible adapter architecture and a strong baseline method for adapting a pre-trained multilingual model to a new language. MAD-X outperforms the state of the art in cross-lingual transfer across a representative set of typologically diverse languages on named entity recognition and causal commonsense reasoning, and achieves competitive results on question answering. Our code and adapters are available at this http URL

...read moreread less

228 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37

Collapse

Cited by

PDF

Open Access

More filters

Proceedings Article•

Word translation without parallel data

[...]

Guillaume Lample¹, Alexis Conneau¹, Marc'Aurelio Ranzato¹, Ludovic Denoyer², Hervé Jégou¹ - Show less +1 more•Institutions (2)

Facebook¹, University of Paris²

15 Feb 2018

TL;DR: It is shown that a bilingual dictionary can be built between two languages without using any parallel corpora, by aligning monolingual word embedding spaces in an unsupervised way.

...read moreread less

1,068 citations

Journal Article•DOI•

Pre-trained Models for Natural Language Processing: A Survey

[...]

Xipeng Qiu¹, Tianxiang Sun¹, Yige Xu¹, Yunfan Shao¹, Ning Dai¹, Xuanjing Huang¹ - Show less +2 more•Institutions (1)

Fudan University¹

18 Mar 2020-Science China-technological Sciences

TL;DR: Recently, the emergence of pre-trained models (PTMs) has brought natural language processing (NLP) to a new era as mentioned in this paper, and a comprehensive review of PTMs for NLP can be found in this survey.

...read moreread less

Abstract: Recently, the emergence of pre-trained models (PTMs) has brought natural language processing (NLP) to a new era. In this survey, we provide a comprehensive review of PTMs for NLP. We first briefly introduce language representation learning and its research progress. Then we systematically categorize existing PTMs based on a taxonomy from four different perspectives. Next, we describe how to adapt the knowledge of PTMs to downstream tasks. Finally, we outline some potential directions of PTMs for future research. This survey is purposed to be a hands-on guide for understanding, using, and developing PTMs for various NLP tasks.

...read moreread less

755 citations

Proceedings Article•DOI•

Improving Vector Space Word Representations Using Multilingual Correlation

[...]

Manaal Faruqui¹, Chris Dyer¹•Institutions (1)

Carnegie Mellon University¹

01 Apr 2014

TL;DR: This paper argues that lexico-semantic content should additionally be invariant across languages and proposes a simple technique based on canonical correlation analysis (CCA) for incorporating multilingual evidence into vectors generated monolingually.

...read moreread less

Abstract: The distributional hypothesis of Harris (1954), according to which the meaning of words is evidenced by the contexts they occur in, has motivated several effective techniques for obtaining vector space semantic representations of words using unannotated text corpora. This paper argues that lexico-semantic content should additionally be invariant across languages and proposes a simple technique based on canonical correlation analysis (CCA) for incorporating multilingual evidence into vectors generated monolingually. We evaluate the resulting word representations on standard lexical semantic evaluation tasks and show that our method produces substantially better semantic representations than monolingual techniques.

...read moreread less

687 citations

DOI•

The World Atlas of Language Structures Online

[...]

Robert Forkel¹•Institutions (1)

Max Planck Society¹

01 Jan 2009

662 citations

Posted Content•

Efficient Transformers: A Survey

[...]

Yi Tay, Mostafa Dehghani, Dara Bahri, Donald Metzler

14 Sep 2020-arXiv: Learning

TL;DR: This paper characterizes a large and thoughtful selection of recent efficiency-flavored “X-former” models, providing an organized and comprehensive overview of existing work and models across multiple domains.

...read moreread less

Abstract: Transformer model architectures have garnered immense interest lately due to their effectiveness across a range of domains like language, vision and reinforcement learning. In the field of natural language processing for example, Transformers have become an indispensable staple in the modern deep learning stack. Recently, a dizzying number of "X-former" models have been proposed - Reformer, Linformer, Performer, Longformer, to name a few - which improve upon the original Transformer architecture, many of which make improvements around computational and memory efficiency. With the aim of helping the avid researcher navigate this flurry, this paper characterizes a large and thoughtful selection of recent efficiency-flavored "X-former" models, providing an organized and comprehensive overview of existing work and models across multiple domains.

...read moreread less

627 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200

Collapse