Author

Aitor Soroa

Bio: Aitor Soroa is an academic researcher from the University of the Basque Country. The author has contributed to research in topics including WordNet and computer science, has an h-index of 24, and has co-authored 96 publications receiving 3,551 citations. Previous affiliations of Aitor Soroa include the National University of Distance Education and the Polytechnic University of Catalonia.


Papers
Proceedings ArticleDOI
31 May 2009
TL;DR: This paper presents and compares WordNet-based and distributional similarity approaches, and pioneers cross-lingual similarity, showing that the methods are easily adapted for a cross-lingual task with minor losses.
Abstract: This paper presents and compares WordNet-based and distributional similarity approaches. The strengths and weaknesses of each approach regarding similarity and relatedness tasks are discussed, and a combination is presented. Each of our methods independently provide the best results in their class on the RG and WordSim353 datasets, and a supervised combination of them yields the best published results on all datasets. Finally, we pioneer cross-lingual similarity, showing that our methods are easily adapted for a cross-lingual task with minor losses.
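Methods like these are typically scored by rank-correlating system similarity scores with human judgements on datasets such as RG and WordSim353. The sketch below (not the authors' code; vectors and gold scores are invented stand-ins) shows that standard evaluation recipe: cosine similarity over word vectors, then Spearman correlation against the gold ratings.

```python
# A minimal sketch of scoring a distributional similarity method
# against a gold standard such as RG or WordSim353: cosine similarity
# over word vectors, then Spearman correlation with human judgements.
import numpy as np
from scipy.stats import spearmanr

def cosine(u, v):
    """Cosine similarity between two dense word vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy stand-in vectors; a real system would use co-occurrence counts
# or WordNet-derived vectors as in the paper.
vectors = {
    "car":        np.array([0.9, 0.1, 0.0]),
    "automobile": np.array([0.8, 0.2, 0.1]),
    "coast":      np.array([0.1, 0.9, 0.3]),
    "shore":      np.array([0.2, 0.8, 0.4]),
}

# (word1, word2, gold human score) triples, as in RG/WordSim353.
gold_pairs = [("car", "automobile", 3.92), ("coast", "shore", 3.60),
              ("car", "coast", 0.40), ("automobile", "shore", 0.55)]

system = [cosine(vectors[a], vectors[b]) for a, b, _ in gold_pairs]
human = [g for _, _, g in gold_pairs]
rho, _ = spearmanr(system, human)
print(f"Spearman correlation with gold: {rho:.2f}")
```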

936 citations

Proceedings ArticleDOI
30 Mar 2009
TL;DR: This paper proposes a new graph-based method that uses the knowledge in an LKB (based on WordNet) to perform unsupervised Word Sense Disambiguation, performing better than previous approaches on English all-words datasets.
Abstract: In this paper we propose a new graph-based method that uses the knowledge in an LKB (based on WordNet) in order to perform unsupervised Word Sense Disambiguation. Our algorithm uses the full graph of the LKB efficiently, performing better than previous approaches on English all-words datasets. We also show that the algorithm can be easily ported to other languages with good results, with the only requirement being a wordnet. In addition, we analyze the performance of the algorithm, showing that it is efficient and that it could be tuned to be faster.
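A well-known way to realize such random walks over the full LKB graph is Personalized PageRank seeded by the words of the context. The sketch below, on an invented toy graph and using networkx rather than the authors' released software, shows the general shape of the idea.

```python
# A minimal sketch, under assumptions, of graph-based WSD by random
# walks over a toy WordNet-like graph. Context words seed a
# Personalized PageRank; the target word gets its highest-ranked sense.
import networkx as nx

# Toy LKB: nodes are senses, edges are WordNet-style relations.
G = nx.Graph()
G.add_edges_from([
    ("bank#1", "money#1"), ("bank#1", "deposit#1"),
    ("bank#2", "river#1"), ("bank#2", "shore#1"),
    ("money#1", "deposit#1"),
])

# Candidate senses for the ambiguous target word.
candidates = {"bank": ["bank#1", "bank#2"]}

# Senses of the unambiguous context words seed the random walk.
context_senses = ["money#1", "deposit#1"]
personalization = {n: (1.0 if n in context_senses else 0.0) for n in G}

scores = nx.pagerank(G, alpha=0.85, personalization=personalization)
best = max(candidates["bank"], key=lambda s: scores[s])
print(best)  # expected: bank#1 (financial sense), given the money context
```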

608 citations

Journal ArticleDOI
TL;DR: BLOOM, as discussed by the authors, is a decoder-only Transformer language model trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total).
Abstract: Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.
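Because the models are openly released, a checkpoint can be loaded through the Hugging Face transformers library. A brief sketch, using the smaller bloom-560m variant to keep the example tractable (the prompt is arbitrary):

```python
# A brief sketch of loading an open BLOOM checkpoint with the Hugging
# Face transformers library. "bigscience/bloom-560m" is one of the
# smaller released variants; the full model is "bigscience/bloom".
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "bigscience/bloom-560m"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("The capital of the Basque Country is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=10)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```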

407 citations

Journal ArticleDOI
TL;DR: This article presents a WSD algorithm based on random walks over large Lexical Knowledge Bases (LKB) that performs better than other graph-based methods when run on a graph built from WordNet and eXtended WordNet.
Abstract: Word Sense Disambiguation (WSD) systems automatically choose the intended meaning of a word in context. In this article we present a WSD algorithm based on random walks over large Lexical Knowledge Bases (LKBs). We show that our algorithm performs better than other graph-based methods when run on a graph built from WordNet and eXtended WordNet. Our algorithm and LKB combination compares favorably to other knowledge-based approaches in the literature that use similar knowledge, on a variety of English data sets and a data set in Spanish. We include a detailed analysis of the factors that affect the algorithm. The algorithm and the LKBs used are publicly available, and the results are easily reproducible.

263 citations

Proceedings ArticleDOI
23 Jun 2007
TL;DR: This work reused the SemEval-2007 English lexical sample subtask of task 17, and set up both clustering-style unsupervised evaluation and a supervised evaluation (using the part of the dataset for mapping) to allow for comparison across sense-induction and discrimination systems.
Abstract: The goal of this task is to allow for comparison across sense-induction and discrimination systems, and also to compare these systems to other supervised and knowledge-based systems. In total there were 6 participating systems. We reused the SemEval-2007 English lexical sample subtask of task 17, and set up both clustering-style unsupervised evaluation (using OntoNotes senses as gold-standard) and a supervised evaluation (using the part of the dataset for mapping). We provide a comparison to the results of the systems participating in the lexical sample subtask of task 17.
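The supervised evaluation referred to above maps induced clusters to gold senses using one portion of the dataset and then scores the mapped labels on the rest. A simplified sketch of that mapping procedure (invented toy data; the actual task used OntoNotes senses):

```python
# A simplified sketch of the supervised evaluation for sense induction:
# induced clusters are mapped to gold senses on a held-out "mapping"
# split (majority sense per cluster), and the mapped labels are then
# scored for accuracy on the test split.
from collections import Counter, defaultdict

def map_clusters(mapping_split):
    """mapping_split: list of (cluster_id, gold_sense) pairs."""
    by_cluster = defaultdict(Counter)
    for cluster, sense in mapping_split:
        by_cluster[cluster][sense] += 1
    return {c: senses.most_common(1)[0][0] for c, senses in by_cluster.items()}

def accuracy(test_split, cluster_to_sense):
    """test_split: list of (cluster_id, gold_sense) pairs."""
    hits = sum(cluster_to_sense.get(c) == s for c, s in test_split)
    return hits / len(test_split)

mapping = [("c1", "bank%1"), ("c1", "bank%1"), ("c2", "bank%2")]
test = [("c1", "bank%1"), ("c2", "bank%2"), ("c2", "bank%1")]
print(accuracy(test, map_clusters(mapping)))  # ~0.67 on this toy data
```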

183 citations


Cited by
Journal ArticleDOI
TL;DR: This work introduces the reader to the motivations for resolving word ambiguity, describes the task, and overviews supervised, unsupervised, and knowledge-based approaches.
Abstract: Word sense disambiguation (WSD) is the ability to identify the meaning of words in context in a computational manner. WSD is considered an AI-complete problem, that is, a task whose solution is at least as hard as the most difficult problems in artificial intelligence. We introduce the reader to the motivations for solving the ambiguity of words and provide a description of the task. We overview supervised, unsupervised, and knowledge-based approaches. The assessment of WSD systems is discussed in the context of the Senseval/Semeval campaigns, aiming at the objective evaluation of systems participating in several different disambiguation tasks. Finally, applications, open problems, and future directions are discussed.

2,178 citations

Journal ArticleDOI
TL;DR: An automatic approach to the construction of BabelNet, a very large, wide-coverage multilingual semantic network. Key to this approach is the integration of lexicographic and encyclopedic knowledge from WordNet and Wikipedia.

1,522 citations

Proceedings ArticleDOI
01 Jun 2014
TL;DR: An extensive evaluation comparing context-predicting models with classic, count-vector-based distributional semantic approaches, across a wide range of lexical semantics tasks and many parameter settings, shows that the buzz around these models is fully justified.
Abstract: Context-predicting models (more commonly known as embeddings or neural language models) are the new kids on the distributional semantics block. Despite the buzz surrounding these models, the literature is still lacking a systematic comparison of the predictive models with classic, count-vector-based distributional semantic approaches. In this paper, we perform such an extensive evaluation, on a wide range of lexical semantics tasks and across many parameter settings. The results, to our own surprise, show that the buzz is fully justified, as the context-predicting models obtain a thorough and resounding victory against their count-based counterparts.
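To make the contrast concrete, the sketch below sets the two families side by side on one toy corpus: a count-based co-occurrence space and a word2vec skip-gram space (via gensim), both queried for the same word pair. This illustrates the setup, not the paper's actual evaluation.

```python
# A toy sketch contrasting count-based and context-predicting models:
# a raw co-occurrence space versus word2vec skip-gram embeddings,
# both trained on the same (tiny) corpus and queried the same way.
import numpy as np
from gensim.models import Word2Vec

corpus = [["the", "cat", "sat", "on", "the", "mat"],
          ["the", "dog", "sat", "on", "the", "rug"],
          ["a", "cat", "chased", "a", "dog"]]

# Count-based: symmetric co-occurrence counts within a window of 2.
vocab = sorted({w for s in corpus for w in s})
idx = {w: i for i, w in enumerate(vocab)}
counts = np.zeros((len(vocab), len(vocab)))
for sent in corpus:
    for i, w in enumerate(sent):
        for j in range(max(0, i - 2), min(len(sent), i + 3)):
            if j != i:
                counts[idx[w], idx[sent[j]]] += 1

def cos(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

print("count-based:", cos(counts[idx["cat"]], counts[idx["dog"]]))

# Context-predicting: skip-gram embeddings trained on the same corpus.
w2v = Word2Vec(corpus, vector_size=25, window=2, min_count=1, sg=1, seed=0)
print("predictive: ", w2v.wv.similarity("cat", "dog"))
```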

1,405 citations

Journal ArticleDOI
TL;DR: This article reveals that much of the performance gain of word embeddings is due to certain system design choices and hyperparameter optimizations rather than to the embedding algorithms themselves, and that these modifications can be transferred to traditional distributional models, yielding similar gains.
Abstract: Recent trends suggest that neural-network-inspired word embedding models outperform traditional count-based distributional models on word similarity and analogy detection tasks. We reveal that much of the performance gains of word embeddings are due to certain system design choices and hyperparameter optimizations, rather than the embedding algorithms themselves. Furthermore, we show that these modifications can be transferred to traditional distributional models, yielding similar gains. In contrast to prior reports, we observe mostly local or insignificant performance differences between the methods, with no global advantage to any single approach over the others.
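One concrete example of such a transferable modification is context distribution smoothing: context counts are raised to a power (commonly alpha = 0.75, a value borrowed from word2vec's negative sampling) before computing PPMI. A minimal numpy sketch, with an invented count matrix:

```python
# A minimal sketch of context distribution smoothing applied to a
# count-based PPMI matrix: context marginals are computed from counts
# raised to alpha = 0.75 rather than from the raw counts.
import numpy as np

def smoothed_ppmi(counts, alpha=0.75):
    """counts: (words x contexts) co-occurrence matrix."""
    total = counts.sum()
    p_w = counts.sum(axis=1) / total            # word marginals
    ctx = counts.sum(axis=0) ** alpha           # smoothed context counts
    p_c = ctx / ctx.sum()                       # smoothed context marginals
    p_wc = counts / total
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(p_wc / np.outer(p_w, p_c))
    return np.maximum(pmi, 0.0)                 # positive PMI

counts = np.array([[10.0, 2.0, 0.0],
                   [3.0, 8.0, 1.0]])
print(smoothed_ppmi(counts))
```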

1,374 citations