Author

Ira Leviant

Technion – Israel Institute of Technology

Bio: Ira Leviant is an academic researcher at Technion – Israel Institute of Technology whose research covers topics including semantic similarity and lexical semantics. The author has an h-index of 6 and has co-authored 7 publications that have received 339 citations.

Topics: Semantic similarity, Lexical semantics, Noun, Similarity (psychology), Semantics

Papers

Journal Article•DOI•

Semantic Specialization of Distributional Word Vector Spaces using Monolingual and Cross-Lingual Constraints

Nikola Mrkšić¹,², Ivan Vulić², Diarmuid Ó Séaghdha¹, Ira Leviant³, Roi Reichart³, Milica Gasic², Anna Korhonen², Steve Young¹,²

Apple Inc.¹, University of Cambridge², Technion – Israel Institute of Technology³

01 Sep 2017-Transactions of the Association for Computational Linguistics

TL;DR: This paper proposes Attract-Repel, an algorithm for improving the semantic quality of word vectors by injecting constraints extracted from lexical resources. The method can make use of existing cross-lingual lexicons to construct high-quality vector spaces for a plethora of different languages, facilitating semantic transfer from high- to lower-resource ones.

Abstract: We present Attract-Repel, an algorithm for improving the semantic quality of word vectors by injecting constraints extracted from lexical resources. Attract-Repel facilitates the use of constraints from mono- and cross-lingual resources, yielding semantically specialized cross-lingual vector spaces. Our evaluation shows that the method can make use of existing cross-lingual lexicons to construct high-quality vector spaces for a plethora of different languages, facilitating semantic transfer from high- to lower-resource ones. The effectiveness of our approach is demonstrated with state-of-the-art results on semantic similarity datasets in six languages. We next show that Attract-Repel-specialized vectors boost performance in the downstream task of dialogue state tracking (DST) across multiple languages. Finally, we show that cross-lingual vector spaces produced by our algorithm facilitate the training of multilingual DST models, which brings further performance improvements.

177 citations

Posted Content•

Separated by an Un-common Language: Towards Judgment Language Informed Vector Space Modeling

Ira Leviant, Roi Reichart

01 Aug 2015-arXiv: Computation and Language

TL;DR: The analysis reveals that human judgments are strongly impacted by the judgment language. In a large number of setups, multilingual VSM combination improves correlations with human judgments, suggesting that multilingualism may partially compensate for the judgment language effect on human judgments.

Abstract: A common evaluation practice in the vector space models (VSMs) literature is to measure the models' ability to predict human judgments about lexical semantic relations between word pairs. Most existing evaluation sets, however, consist of scores collected for English word pairs only, ignoring the potential impact of the judgment language in which word pairs are presented on the human scores. In this paper we translate two prominent evaluation sets, wordsim353 (association) and SimLex999 (similarity), from English to Italian, German and Russian and collect scores for each dataset from crowdworkers fluent in its language. Our analysis reveals that human judgments are strongly impacted by the judgment language. Moreover, we show that the predictions of monolingual VSMs do not necessarily best correlate with human judgments made with the language used for model training, suggesting that models and humans are affected differently by the language they use when making semantic judgments. Finally, we show that in a large number of setups, multilingual VSM combination results in improved correlations with human judgments, suggesting that multilingualism may partially compensate for the judgment language effect on human judgments.
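The evaluation practice described in this abstract can be illustrated with a minimal sketch. It assumes cosine similarity as the model's score and Spearman rank correlation as the agreement measure; the abstract names neither, so both are assumptions. The word vectors and human scores below are made-up toy values, not data from the paper, and all function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def average_ranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy 2-d word vectors (illustrative only).
vectors = {
    "cat": [1.0, 0.0],
    "dog": [0.9, 0.3],
    "car": [0.0, 1.0],
    "truck": [0.1, 0.9],
}

# Toy human similarity judgments for word pairs (illustrative only).
human_scores = {
    ("cat", "dog"): 8.0,
    ("car", "truck"): 7.5,
    ("cat", "car"): 1.0,
    ("dog", "truck"): 2.0,
}

pairs = list(human_scores)
model = [cosine(vectors[a], vectors[b]) for a, b in pairs]
human = [human_scores[p] for p in pairs]
rho = spearman(model, human)
print(round(rho, 3))
```

Under these assumptions, repeating the same computation with human scores collected in a different judgment language would show how the measured agreement shifts with that language.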

79 citations

Posted Content•

Semantic Specialisation of Distributional Word Vector Spaces using Monolingual and Cross-Lingual Constraints

Nikola Mrkšić, Ivan Vulić, Diarmuid Ó Séaghdha, Ira Leviant, Roi Reichart, Milica Gasic, Anna Korhonen, Steve Young

01 Jun 2017-arXiv: Computation and Language

TL;DR: The evaluation shows that the Attract-Repel method can make use of existing cross-lingual lexicons to construct high-quality vector spaces for a plethora of different languages, facilitating semantic transfer from high- to lower-resource ones.

Abstract: We present Attract-Repel, an algorithm for improving the semantic quality of word vectors by injecting constraints extracted from lexical resources. Attract-Repel facilitates the use of constraints from mono- and cross-lingual resources, yielding semantically specialised cross-lingual vector spaces. Our evaluation shows that the method can make use of existing cross-lingual lexicons to construct high-quality vector spaces for a plethora of different languages, facilitating semantic transfer from high- to lower-resource ones. The effectiveness of our approach is demonstrated with state-of-the-art results on semantic similarity datasets in six languages. We next show that Attract-Repel-specialised vectors boost performance in the downstream task of dialogue state tracking (DST) across multiple languages. Finally, we show that cross-lingual vector spaces produced by our algorithm facilitate the training of multilingual DST models, which brings further performance improvements.

55 citations

Journal Article•DOI•

Multi-SimLex: A Large-Scale Evaluation of Multilingual and Crosslingual Lexical Semantic Similarity

Ivan Vulić¹, Simon Baker¹, Edoardo Maria Ponti¹, Ulla Petti¹, Ira Leviant², Kelly Wing¹, Olga Majewska¹, Eden Bar², Matt Malone¹, Thierry Poibeau³, Roi Reichart², Anna Korhonen¹

University of Cambridge¹, Technion – Israel Institute of Technology², University of Paris III: Sorbonne Nouvelle³

01 Feb 2021-Computational Linguistics

TL;DR: Multi-SimLex is a large-scale lexical resource and evaluation benchmark covering datasets for 12 typologically diverse languages, including major languages (e.g., Mandarin Chinese, Spanish, Russian) as well as less-resourced ones (e.g., Welsh, Kiswahili).

Abstract: We introduce Multi-SimLex, a large-scale lexical resource and evaluation benchmark covering datasets for 12 typologically diverse languages, including major languages (e.g., Mandarin Chinese, Spanish, Russian) as well as less-resourced ones (e.g., Welsh, Kiswahili). Each language dataset is annotated for the lexical relation of semantic similarity and contains 1,888 semantically aligned concept pairs, providing a representative coverage of word classes (nouns, verbs, adjectives, adverbs), frequency ranks, similarity intervals, lexical fields, and concreteness levels. Additionally, owing to the alignment of concepts across languages, we provide a suite of 66 cross-lingual semantic similarity datasets. Due to its extensive size and language coverage, Multi-SimLex provides entirely novel opportunities for experimental evaluation and analysis. On its monolingual and cross-lingual benchmarks, we evaluate and analyze a wide array of recent state-of-the-art monolingual and cross-lingual representation models, including static and contextualized word embeddings (such as fastText, M-BERT and XLM), externally informed lexical representations, as well as fully unsupervised and (weakly) supervised cross-lingual word embeddings. We also present a step-by-step dataset creation protocol for creating consistent, Multi-SimLex-style resources for additional languages. We make these contributions (the public release of Multi-SimLex datasets, their creation protocol, strong baseline results, and in-depth analyses which can be helpful in guiding future developments in multilingual lexical semantics and representation learning) available via a website which will encourage community effort in further expansion of Multi-SimLex to many more languages. Such a large-scale semantic resource could inspire significant further advances in NLP across languages.
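The figure of 66 cross-lingual datasets is consistent with one dataset per unordered pair of the 12 languages, since 12 × 11 / 2 = 66. The abstract does not spell out this derivation, so the short check below is an inference; the language identifiers are hypothetical placeholders rather than the benchmark's actual language codes.

```python
from itertools import combinations
from math import comb

NUM_LANGUAGES = 12  # number of languages stated in the abstract

# Placeholder identifiers; the real language list is not reproduced here.
language_ids = [f"lang{i:02d}" for i in range(NUM_LANGUAGES)]

# One cross-lingual dataset per unordered pair of distinct languages.
cross_lingual_pairs = list(combinations(language_ids, 2))

print(len(cross_lingual_pairs), comb(NUM_LANGUAGES, 2))
```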

45 citations

Posted Content•

Judgment Language Matters: Multilingual Vector Space Models for Judgment Language Aware Lexical Semantics.

Ira Leviant, Roi Reichart

01 Aug 2015

TL;DR: This paper shows that the judgment language in which word pairs are presented to human evaluators, all fluent in that language, has a substantial impact on their produced similarity scores and highlights the importance of constructing judgment language aware VSMs.

Abstract: It is a common practice in the vector space model (VSM) literature to evaluate the models’ ability to predict human similarity scores for a set of word pairs. However, existing evaluation sets, even those used to evaluate multilingual VSMs, consist of English words only. In this paper we show that this practice may have significant undesired effects on VSM evaluation. By translating the popular wordsim353 evaluation set to three languages and training state-of-the-art VSMs on corpora of the corresponding languages as well as on English, we show that: (a) The judgment language in which word pairs are presented to human evaluators, all fluent in that language, has a substantial impact on their produced similarity scores; (b) Given the judgment language of an evaluation set, this judgment language is a good choice for the VSM training corpus language; and (c) Monolingual VSMs can be combined into multilingual VSMs that can predict human similarity scores for a variety of judgment languages better than any monolingual model. Our results highlight the impact of the judgment language on the human generated similarity scores and point to the importance of constructing judgment language aware VSMs.

39 citations

Cited by

DOI•

The World Atlas of Language Structures Online

Robert Forkel¹

Max Planck Society¹

01 Jan 2009

662 citations

Posted Content•

MultiWOZ - A Large-Scale Multi-Domain Wizard-of-Oz Dataset for Task-Oriented Dialogue Modelling

Paweł Budzianowski¹, Tsung-Hsien Wen¹, Bo-Hsiang Tseng¹, Iñigo Casanueva¹, Stefan Ultes¹, Osman Ramadan, Milica Gasic¹

University of Cambridge¹

29 Sep 2018-arXiv: Computation and Language

TL;DR: The Multi-Domain Wizard-of-Oz dataset (MultiWOZ) is a fully-labeled collection of human-human written conversations spanning multiple domains and topics.

Abstract: Even though machine learning has become the major scene in dialogue research community, the real breakthrough has been blocked by the scale of data available. To address this fundamental obstacle, we introduce the Multi-Domain Wizard-of-Oz dataset (MultiWOZ), a fully-labeled collection of human-human written conversations spanning over multiple domains and topics. At a size of 10k dialogues, it is at least one order of magnitude larger than all previous annotated task-oriented corpora. The contribution of this work apart from the open-sourced dataset labelled with dialogue belief states and dialogue actions is two-fold: firstly, a detailed description of the data collection procedure along with a summary of data structure and analysis is provided. The proposed data-collection pipeline is entirely based on crowd-sourcing without the need of hiring professional annotators; secondly, a set of benchmark results of belief tracking, dialogue act and response generation is reported, which shows the usability of the data and sets a baseline for future studies.

623 citations

Journal Article•DOI•

Analysis Methods in Neural Language Processing: A Survey

Yonatan Belinkov¹, James Glass²

Harvard University¹, Massachusetts Institute of Technology²

01 Apr 2019-Transactions of the Association for Computational Linguistics

TL;DR: This survey reviews analysis methods in neural language processing, categorizes them according to prominent research trends, highlights existing limitations, and points to potential directions for future work.

Abstract: The field of natural language processing has seen impressive progress in recent years, with neural network models replacing many of the traditional systems. A plethora of new models have been proposed, many of which are thought to be opaque compared to their feature-rich counterparts. This has led researchers to analyze, interpret, and evaluate neural networks in novel and more fine-grained ways. In this survey paper, we review analysis methods in neural language processing, categorize them according to prominent research trends, highlight existing limitations, and point to potential directions for future work.

442 citations

Posted Content•

Massively Multilingual Word Embeddings

Waleed Ammar, George Mulcaire, Yulia Tsvetkov, Guillaume Lample, Chris Dyer, Noah A. Smith

05 Feb 2016-arXiv: Computation and Language

TL;DR: This paper introduces new methods for estimating and evaluating embeddings of words in more than fifty languages in a single shared embedding space, and shows that its new evaluation method correlates better than previous ones with two downstream tasks.

Abstract: We introduce new methods for estimating and evaluating embeddings of words in more than fifty languages in a single shared embedding space. Our estimation methods, multiCluster and multiCCA, use dictionaries and monolingual data; they do not require parallel data. Our new evaluation method, multiQVEC-CCA, is shown to correlate better than previous ones with two downstream tasks (text categorization and parsing). We also describe a web portal for evaluation that will facilitate further research in this area, along with open-source releases of all our methods.

308 citations

Proceedings Article•DOI•

Cross-lingual Transfer Learning for Multilingual Task Oriented Dialog

Sebastian Schuster¹, Sonal Gupta², Rushin Shah², Michael Lewis²

Max F. Perutz Laboratories¹, Facebook²

01 Jun 2019

TL;DR: This paper presents a new data set of 57k annotated utterances in English, Spanish and Thai and uses it to evaluate three cross-lingual transfer methods, finding that, given several hundred training examples in the target language, cross-lingual pre-trained embeddings and multilingual machine translation encoder representations both outperform translating the training data.

Abstract: One of the first steps in the utterance interpretation pipeline of many task-oriented conversational AI systems is to identify user intents and the corresponding slots. Since data collection for machine learning models for this task is time-consuming, it is desirable to make use of existing data in a high-resource language to train models in low-resource languages. However, development of such models has largely been hindered by the lack of multilingual training data. In this paper, we present a new data set of 57k annotated utterances in English (43k), Spanish (8.6k) and Thai (5k) across the domains weather, alarm, and reminder. We use this data set to evaluate three different cross-lingual transfer methods: (1) translating the training data, (2) using cross-lingual pre-trained embeddings, and (3) a novel method of using a multilingual machine translation encoder as contextual word representations. We find that given several hundred training examples in the target language, the latter two methods outperform translating the training data. Further, in very low-resource settings, multilingual contextual word representations give better results than using cross-lingual static embeddings. We also compare the cross-lingual methods to using monolingual resources in the form of contextual ELMo representations and find that given just small amounts of target language data, this method outperforms all cross-lingual methods, which highlights the need for more sophisticated cross-lingual methods.

238 citations
