Construction of a word similarity dataset and evaluation of word similarity techniques for Vietnamese

doi:10.1109/KSE.2017.8119436

Proceedings Article (DOI)

Bui Van Tan, +2 more

- pp 65-70

TLDR

In this article, the authors construct a benchmark dataset for evaluating word similarity techniques for the Vietnamese language, experiment with several similarity techniques based on WordNet and word embeddings, and propose an extension of the Lesk algorithm to improve the efficiency of similarity measurement for Vietnamese.
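
The Lesk algorithm scores relatedness by the overlap between dictionary glosses. The sketch below shows only the classic gloss-overlap idea, not the authors' extension; the glosses, the stopword list, and the function names are hypothetical, and English toy data is used for readability.

```python
import re

# Hypothetical stopword list; a real system would use a language-specific one.
STOPWORDS = {"a", "an", "the", "of", "or", "and", "to", "that", "is", "for", "in", "with"}


def content_words(gloss: str) -> set:
    """Lowercase a gloss, split it into word tokens, and drop stopwords."""
    return set(re.findall(r"\w+", gloss.lower())) - STOPWORDS


def gloss_overlap(gloss_a: str, gloss_b: str) -> int:
    """Number of content words shared by two glosses (the basic Lesk signal)."""
    return len(content_words(gloss_a) & content_words(gloss_b))


def lesk_relatedness(senses_a: list, senses_b: list) -> int:
    """Best gloss overlap over all sense pairs of two words."""
    return max(gloss_overlap(ga, gb) for ga in senses_a for gb in senses_b)


# Hypothetical one-sense glosses, for illustration only.
car = ["a motor vehicle with four wheels used to carry passengers"]
bus = ["a large motor vehicle that carries passengers along a route"]
tree = ["a tall woody plant with a trunk and branches"]

print(lesk_relatedness(car, bus))   # prints 3 (motor, vehicle, passengers)
print(lesk_relatedness(car, tree))  # prints 0 (no shared content words)
```

Note that "carry" and "carries" do not match without stemming, and Vietnamese glosses would first need word segmentation, because many Vietnamese words span several space-separated syllables.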

Abstract:

Measuring word similarity is a core issue because it has many applications in natural language processing. Although many studies have been reported and many techniques have been developed to address this issue for English, no study has yet reported on the application, analysis, and evaluation of word similarity techniques for Vietnamese. In particular, a benchmark Vietnamese dataset for evaluating these techniques is still lacking. In this paper, we report on three main topics: first, we construct a benchmark dataset for evaluating similarity techniques for the Vietnamese language; second, we experiment with several similarity techniques based on WordNet and word embeddings; and finally, we propose an extension of the Lesk algorithm to improve the efficiency of similarity measurement for Vietnamese.
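
Embedding-based techniques typically score a word pair by the cosine similarity of the two word vectors. A minimal sketch follows; the three-dimensional vectors are hypothetical toy values, and joining the syllables of a multi-syllable Vietnamese word with an underscore (e.g. `xe_hơi`) is an assumed segmentation convention, not necessarily the one used in the paper.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between u and v; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def score_pairs(pairs: List[Tuple[str, str]],
                emb: Dict[str, List[float]]) -> List[Optional[float]]:
    """Score each benchmark pair; None marks a pair with an out-of-vocabulary word."""
    return [cosine(emb[a], emb[b]) if a in emb and b in emb else None
            for a, b in pairs]


# Hypothetical toy embeddings: "xe hơi" and "ô tô" both mean "car", "cây" means "tree".
EMB = {
    "xe_hơi": [0.9, 0.1, 0.0],
    "ô_tô": [0.8, 0.2, 0.1],
    "cây": [0.0, 0.3, 0.9],
}

# "máy_bay" ("airplane") is deliberately missing from EMB to show OOV handling.
pairs = [("xe_hơi", "ô_tô"), ("xe_hơi", "cây"), ("xe_hơi", "máy_bay")]
print(score_pairs(pairs, EMB))
```

How out-of-vocabulary pairs are handled is worth reporting, because skipping them changes the subset of the benchmark on which a technique is scored.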

Citations

Journal Article (DOI)

A survey of semantic relatedness evaluation datasets and procedures

Mohamed Ali Hadj Taieb, +2 more

- 01 Aug 2020 -

Artificial Intelligence Review

TL;DR: This article gives a comprehensive overview of the evaluation protocols and datasets for semantic relatedness covering both intrinsic and extrinsic approaches.
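
A common intrinsic protocol for word similarity datasets compares model scores against human ratings using Spearman's rank correlation. The sketch below computes Spearman's ρ as the Pearson correlation of average ranks, which handles ties correctly; the ratings and scores are hypothetical.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0.0 or sy == 0.0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation computed on average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical human ratings (0-10) and model scores for four word pairs.
human = [9.0, 7.5, 3.0, 1.0]
model = [0.92, 0.80, 0.35, 0.10]
print(spearman(human, model))
```

Because only ranks matter, a model's scores need not share the scale of the human ratings.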

Journal Article (DOI)

A Neural Network Model for Efficient Antonymy-Synonymy Classification by Exploiting Co-occurrence Contexts and Word-Structure Patterns

Van-Tan Bui, +4 more

- 29 Feb 2020 -

International Journal of Intelligent Eng...

TL;DR: A deep neural network model (DVASNet) is proposed that can utilize not only embedding representations of words but also co-occurrence contexts and specific patterns of Vietnamese word structure; it achieved significant improvements over a number of state-of-the-art methods.

Proceedings Article (DOI)

Enhancing Performance of Lexical Entailment Recognition for Vietnamese based on Exploiting Lexical Structure Features

Bui Van Tan, +2 more

TL;DR: This study proposes a novel method (VLER) for the lexical entailment recognition problem in Vietnamese that first exploits the lexical structure information of words as a feature, then combines this feature with vector representations of words into a single unified feature for recognizing the relation.

Journal Article (DOI)

Combining Specialized Word Embeddings and Subword Semantic Features for Lexical Entailment Recognition

Van-Tan Bui, +2 more

- 01 Aug 2022 -

Data and Knowledge Engineering

TL;DR: The authors propose a method called LERC (Lexical Entailment Recognition Combination) that solves the problem by combining embedding representations and subword semantic features; it outperformed several recently published methods.

Book Chapter (DOI)

Antonyms-Synonyms Discrimination Based on Exploiting Rich Vietnamese Features.

Bui Van Tan, +3 more

TL;DR: A framework that exhaustively exploits special Vietnamese features to distinguish antonyms from synonyms is introduced, and a deep neural network model (ViASNet) is proposed that can utilize not only lexico-syntactic information captured from the contexts of word pairs in a corpus but also word-level and distributional features.

References

Proceedings Article

Distributed Representations of Words and Phrases and their Compositionality

Tomas Mikolov, +4 more

TL;DR: This paper presents a simple method for finding phrases in text, and shows that learning good vector representations for millions of phrases is possible and describes a simple alternative to the hierarchical softmax called negative sampling.
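
Negative sampling replaces the full softmax with a binary task: push up σ(v'_O · v_I) for the observed context word and push down σ(v'_k · v_I) for k sampled noise words, with noise drawn from the unigram distribution raised to the 3/4 power. The sketch below computes that per-pair loss in plain Python; the function names and toy vectors are hypothetical, and no training loop is shown.

```python
import math
import random
from typing import Dict, List, Sequence


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def negative_sampling_loss(v_in: Sequence[float],
                           v_out_pos: Sequence[float],
                           v_out_negs: List[Sequence[float]]) -> float:
    """-log sigmoid(v'_O . v_I) - sum_k log sigmoid(-v'_k . v_I) for one pair."""
    loss = -log_sigmoid(dot(v_out_pos, v_in))
    for v_neg in v_out_negs:
        loss -= log_sigmoid(-dot(v_neg, v_in))
    return loss


def noise_distribution(counts: Dict[str, int], power: float = 0.75) -> Dict[str, float]:
    """Unigram counts raised to `power`, normalised to probabilities."""
    weights = {w: c ** power for w, c in counts.items()}
    total = sum(weights.values())
    return {w: x / total for w, x in weights.items()}


def sample_negatives(noise: Dict[str, float], k: int, rng: random.Random) -> List[str]:
    """Draw k noise words (with replacement) from the noise distribution."""
    words = list(noise)
    return rng.choices(words, weights=[noise[w] for w in words], k=k)


# Toy example: an aligned positive and an opposed negative give a small loss.
print(negative_sampling_loss([1.0, 0.0], [2.0, 0.0], [[-2.0, 0.0]]))
noise = noise_distribution({"the": 100, "car": 10, "tree": 1})
print(sample_negatives(noise, k=3, rng=random.Random(0)))
```

Raising counts to the 3/4 power flattens the distribution, so rare words are sampled as negatives more often than their raw frequency would suggest.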

Journal Article (DOI)

WordNet: a lexical database for English

George A. Miller

- 01 Nov 1995 -

Communications of The ACM

TL;DR: WordNet provides a more effective combination of traditional lexicographic information and modern computing, and is an online lexical database designed for use under program control.
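
WordNet-based similarity measures typically score a pair of concepts by their position in the hypernym (is-a) hierarchy. One simple measure, path similarity, is 1 / (1 + length of the shortest is-a path), which is how NLTK's `path_similarity` is defined. The sketch below uses a hypothetical toy taxonomy rather than real WordNet data and assumes each concept has a single parent.

```python
from typing import Dict, Optional

# Hypothetical toy taxonomy: each concept maps to its single parent (root -> None).
PARENT: Dict[str, Optional[str]] = {
    "entity": None,
    "vehicle": "entity",
    "car": "vehicle",
    "bus": "vehicle",
    "plant": "entity",
    "tree": "plant",
}


def ancestor_distances(concept: str) -> Dict[str, int]:
    """Map the concept and each of its ancestors to the number of is-a links up to it."""
    distances: Dict[str, int] = {}
    steps = 0
    node: Optional[str] = concept
    while node is not None:
        distances[node] = steps
        node = PARENT[node]
        steps += 1
    return distances


def path_similarity(a: str, b: str) -> float:
    """1 / (1 + shortest is-a path length); 0.0 if the concepts share no ancestor."""
    da, db = ancestor_distances(a), ancestor_distances(b)
    common = da.keys() & db.keys()
    if not common:
        return 0.0
    shortest = min(da[c] + db[c] for c in common)
    return 1.0 / (1 + shortest)


print(path_similarity("car", "bus"))   # path car-vehicle-bus has 2 links
print(path_similarity("car", "tree"))  # path through "entity" has 4 links
```

Real WordNet allows multiple hypernyms per synset, so a full implementation would explore every hypernym path rather than a single parent chain.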

Journal Article (DOI)

Features of Similarity

Amos Tversky

- 01 Jul 1977 -

Psychological Review

TL;DR: The metric and dimensional assumptions that underlie the geometric representation of similarity are questioned on both theoretical and empirical grounds, and a set of qualitative assumptions is shown to imply the contrast model, which expresses the similarity between objects as a linear combination of the measures of their common and distinctive features.
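
The contrast model can be written as S(a, b) = θ·f(A ∩ B) - α·f(A - B) - β·f(B - A), where A and B are the feature sets of a and b and f measures feature salience. The sketch below assumes f is simply the number of features and uses hypothetical feature sets; with α ≠ β the measure becomes asymmetric, which is the kind of behavior a symmetric metric distance cannot express.

```python
from typing import Callable, FrozenSet, Set, Union

Features = Union[Set[str], FrozenSet[str]]


def contrast_similarity(a: Features, b: Features,
                        theta: float = 1.0, alpha: float = 0.5, beta: float = 0.5,
                        f: Callable[[Features], float] = len) -> float:
    """Contrast model: theta*f(A&B) - alpha*f(A-B) - beta*f(B-A)."""
    return theta * f(a & b) - alpha * f(a - b) - beta * f(b - a)


# Hypothetical feature sets: a richer "prototype" and a sparser "variant".
prototype = {"wheels", "engine", "doors", "seats"}
variant = {"wheels", "engine"}

# With alpha > beta, the variant is judged more similar to the prototype than vice versa.
print(contrast_similarity(prototype, variant, alpha=0.8, beta=0.2))
print(contrast_similarity(variant, prototype, alpha=0.8, beta=0.2))
```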

Journal Article (DOI)

A neural probabilistic language model

Yoshua Bengio, +3 more

- 01 Mar 2003 -

Journal of Machine Learning Research

TL;DR: The authors propose to learn a distributed representation for words which allows each training sentence to inform the model about an exponential number of semantically neighboring sentences, which can be expressed in terms of these representations.
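
The model maps each of the n-1 previous words to a learned feature vector C(w), concatenates them into x, and computes next-word scores y = b + Wx + U tanh(d + Hx), followed by a softmax. The sketch below is an untrained forward pass only, in plain Python with small random weights; the class name, layer sizes, and initialisation are hypothetical.

```python
import math
import random
from typing import List, Sequence

Matrix = List[List[float]]


def matvec(m: Matrix, x: Sequence[float]) -> List[float]:
    """Matrix-vector product."""
    return [sum(mij * xj for mij, xj in zip(row, x)) for row in m]


def softmax(z: Sequence[float]) -> List[float]:
    """Softmax with max-subtraction for numerical stability."""
    top = max(z)
    exps = [math.exp(v - top) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


class TinyNPLM:
    """Untrained forward pass of a neural probabilistic language model."""

    def __init__(self, vocab_size: int, dim: int, context: int, hidden: int, seed: int = 0):
        rng = random.Random(seed)

        def init(rows: int, cols: int) -> Matrix:
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        in_dim = dim * context
        self.context = context
        self.C = init(vocab_size, dim)     # word feature vectors C(w)
        self.H = init(hidden, in_dim)      # input-to-hidden weights
        self.d = [0.0] * hidden            # hidden bias
        self.U = init(vocab_size, hidden)  # hidden-to-output weights
        self.W = init(vocab_size, in_dim)  # optional direct input-to-output weights
        self.b = [0.0] * vocab_size        # output bias

    def next_word_probs(self, context_ids: Sequence[int]) -> List[float]:
        """P(next word | context words) over the whole vocabulary."""
        if len(context_ids) != self.context:
            raise ValueError(f"expected {self.context} context words")
        x = [v for i in context_ids for v in self.C[i]]  # concatenated features
        h = [math.tanh(di + hi) for di, hi in zip(self.d, matvec(self.H, x))]
        y = [bi + wi + ui
             for bi, wi, ui in zip(self.b, matvec(self.W, x), matvec(self.U, h))]
        return softmax(y)


model = TinyNPLM(vocab_size=5, dim=3, context=2, hidden=4)
probs = model.next_word_probs([1, 3])
print([round(p, 3) for p in probs])
```

After training, the rows of C are the word representations; this is the lineage that later embedding methods such as word2vec build on.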

Proceedings Article

An Information-Theoretic Definition of Similarity

Dekang Lin

TL;DR: This work presents an information-theoretic definition of similarity that is applicable as long as there is a probabilistic model and demonstrates how this definition can be used to measure the similarity in a number of different domains.
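
Applied to a taxonomy such as WordNet, the measure is sim(c1, c2) = 2·log P(c0) / (log P(c1) + log P(c2)), where c0 is the most specific concept subsuming both and P(c) is the probability of encountering an instance of c. The sketch below estimates P(c) from hypothetical corpus counts propagated up a hypothetical toy taxonomy and writes the formula with information content IC(c) = -log P(c).

```python
import math
from typing import Dict, List, Optional

# Hypothetical toy taxonomy (child -> parent) and hypothetical corpus counts.
PARENT: Dict[str, Optional[str]] = {
    "entity": None, "vehicle": "entity", "car": "vehicle",
    "bus": "vehicle", "plant": "entity", "tree": "plant",
}
COUNTS: Dict[str, int] = {
    "entity": 0, "vehicle": 10, "car": 40, "bus": 20, "plant": 5, "tree": 25,
}


def ancestors(concept: str) -> List[str]:
    """The concept followed by its ancestors up to the root."""
    chain: List[str] = []
    node: Optional[str] = concept
    while node is not None:
        chain.append(node)
        node = PARENT[node]
    return chain


def concept_probabilities() -> Dict[str, float]:
    """P(c): each occurrence counts toward c and all of its ancestors."""
    cumulative = {c: 0 for c in PARENT}
    for concept, n in COUNTS.items():
        for a in ancestors(concept):
            cumulative[a] += n
    root_total = sum(n for c, n in cumulative.items() if PARENT[c] is None)
    return {c: n / root_total for c, n in cumulative.items()}


P = concept_probabilities()


def information_content(concept: str) -> float:
    """IC(c) = -log P(c), written as log(1/P) so the root gets exactly 0.0."""
    return math.log(1.0 / P[concept])


def lin_similarity(c1: str, c2: str) -> float:
    """2 * IC(c0) / (IC(c1) + IC(c2)), c0 = most informative common subsumer."""
    common = set(ancestors(c1)) & set(ancestors(c2))
    c0 = max(common, key=information_content)
    denom = information_content(c1) + information_content(c2)
    return 2 * information_content(c0) / denom if denom > 0 else 1.0


print(lin_similarity("car", "bus"))   # common subsumer: vehicle
print(lin_similarity("car", "tree"))  # only the root is shared, so the score is 0
```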
