Proceedings Article

Construction of a word similarity dataset and evaluation of word similarity techniques for Vietnamese

TLDR
In this article, the authors construct a benchmark dataset for evaluating word similarity techniques on Vietnamese, experiment with several similarity techniques based on WordNet and word embeddings, and propose an extension of the Lesk algorithm to improve the efficiency of similarity measurement for Vietnamese.
Abstract
Measuring word similarity is a core problem in natural language processing because it has many applications. Although many studies have been reported and many techniques developed for English, the application, analysis, and evaluation of word similarity techniques for Vietnamese have not yet been reported. In particular, there is still no benchmark Vietnamese dataset for evaluating these techniques. In this paper, we make three contributions: first, we construct a benchmark dataset for evaluating word similarity techniques on Vietnamese; second, we experiment with several similarity techniques based on WordNet and word embeddings; and finally, we propose an extension of the Lesk algorithm to improve the efficiency of similarity measurement for Vietnamese.
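As a rough illustration of how a word-embedding similarity measure might be scored against a human-rated benchmark of word pairs, the sketch below computes cosine similarity for each pair and then Spearman's rank correlation with the gold ratings. The abstract does not state the paper's evaluation protocol, so the use of Spearman correlation is an assumption here, and the vectors, word labels (`w1`, `w2`, `w3`), and ratings are hypothetical toy values rather than data from the paper.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)

def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical toy embeddings and gold ratings (not from the paper).
embeddings = {"w1": [1.0, 0.0], "w2": [0.9, 0.1], "w3": [0.0, 1.0]}
benchmark = [("w1", "w2", 9.0), ("w1", "w3", 1.0), ("w2", "w3", 2.0)]

model_scores = [cosine(embeddings[a], embeddings[b]) for a, b, _ in benchmark]
gold_scores = [rating for _, _, rating in benchmark]
rho = spearman(model_scores, gold_scores)
```

A rank correlation is a common choice for this kind of comparison because it only asks whether the measure orders the pairs the way the annotators did, regardless of the scale of the raw scores.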


Citations
Journal Article

A survey of semantic relatedness evaluation datasets and procedures

TL;DR: This article gives a comprehensive overview of the evaluation protocols and datasets for semantic relatedness covering both intrinsic and extrinsic approaches.
Journal Article

A Neural Network Model for Efficient Antonymy-Synonymy Classification by Exploiting Co-occurrence Contexts and Word-Structure Patterns

TL;DR: A deep neural network model (DVASNet) is proposed that utilizes not only embedding representations of words but also co-occurrence contexts and specific patterns of Vietnamese word structure, and it achieves significant improvements in comparison with a number of state-of-the-art methods.
Proceedings Article

Enhancing Performance of Lexical Entailment Recognition for Vietnamese based on Exploiting Lexical Structure Features

TL;DR: This study proposes a novel method (VLER) for the lexical entailment recognition problem in Vietnamese that first exploits lexical structure information of words as a feature, then combines this feature with vector representations of words into a single feature for recognizing the relation.
Journal Article

Combining Specialized Word Embeddings and Subword Semantic Features for Lexical Entailment Recognition

TL;DR: The authors propose a method called LERC (Lexical Entailment Recognition Combination) that addresses the problem by combining embedding representations and subword semantic features, and that outperforms several recently published methods.
Book Chapter

Antonyms-Synonyms Discrimination Based on Exploiting Rich Vietnamese Features.

TL;DR: A framework that exhaustively exploits special Vietnamese features to distinguish antonyms from synonyms is introduced, and a deep neural network model (ViASNet) is proposed that utilizes not only lexico-syntactic information captured from the context of word pairs in a corpus but also word-level features and distributional features.
References
Proceedings Article

Distributed Representations of Words and Phrases and their Compositionality

TL;DR: This paper presents a simple method for finding phrases in text, and shows that learning good vector representations for millions of phrases is possible and describes a simple alternative to the hierarchical softmax called negative sampling.
Journal Article

WordNet: a lexical database for English

TL;DR: WordNet provides a more effective combination of traditional lexicographic information and modern computing, and is an online lexical database designed for use under program control.
Journal Article

Features of Similarity

Amos Tversky
01 Jul 1977
TL;DR: The metric and dimensional assumptions that underlie the geometric representation of similarity are questioned on both theoretical and empirical grounds and a set of qualitative assumptions are shown to imply the contrast model, which expresses the similarity between objects as a linear combination of the measures of their common and distinctive features.
Journal Article

A neural probabilistic language model

TL;DR: The authors propose to learn a distributed representation for words which allows each training sentence to inform the model about an exponential number of semantically neighboring sentences, which can be expressed in terms of these representations.
Proceedings Article

An Information-Theoretic Definition of Similarity

Dekang Lin
TL;DR: This work presents an information-theoretic definition of similarity that is applicable as long as there is a probabilistic model and demonstrates how this definition can be used to measure the similarity in a number of different domains.