scispace - formally typeset
Search or ask a question
Topic

Lexical similarity

About: Lexical similarity is a research topic. Over the lifetime, 373 publications have been published within this topic receiving 7480 citations.


Papers
More filters
Proceedings Article
02 Jun 2010
TL;DR: A simple co-occurrence measure based on pointwise mutual information over Wikipedia data is able to achieve results for the task at or nearing the level of inter-annotator correlation, and that other Wikipedia-based lexical relatedness methods also achieve strong results.
Abstract: This paper introduces the novel task of topic coherence evaluation, whereby a set of words, as generated by a topic model, is rated for coherence or interpretability. We apply a range of topic scoring models to the evaluation task, drawing on WordNet, Wikipedia and the Google search engine, and existing research on lexical similarity/relatedness. In comparison with human scores for a set of learned topics over two distinct datasets, we show a simple co-occurrence measure based on pointwise mutual information over Wikipedia data is able to achieve results for the task at or nearing the level of inter-annotator correlation, and that other Wikipedia-based lexical relatedness methods also achieve strong results. Google produces strong, if less consistent, results, while our results over WordNet are patchy at best.

832 citations

Posted Content
TL;DR: This paper proposes a neural model that reads two sentences to determine entailment using long short-term memory units and extends this model with a word-by-word neural attention mechanism that encourages reasoning over entailments of pairs of words and phrases, and presents a qualitative analysis of attention weights produced by this model.
Abstract: While most approaches to automatically recognizing entailment relations have used classifiers employing hand engineered features derived from complex natural language processing pipelines, in practice their performance has been only slightly better than bag-of-word pair classifiers using only lexical similarity. The only attempt so far to build an end-to-end differentiable neural network for entailment failed to outperform such a simple similarity classifier. In this paper, we propose a neural model that reads two sentences to determine entailment using long short-term memory units. We extend this model with a word-by-word neural attention mechanism that encourages reasoning over entailments of pairs of words and phrases. Furthermore, we present a qualitative analysis of attention weights produced by this model, demonstrating such reasoning capabilities. On a large entailment dataset this model outperforms the previous best neural model and a classifier with engineered features by a substantial margin. It is the first generic end-to-end differentiable system that achieves state-of-the-art accuracy on a textual entailment dataset.

633 citations

Proceedings Article
27 Apr 2018
TL;DR: A new dataset and model for textual entailment, derived from treating multiple-choice question-answering as an entailment problem, is presented, and it is demonstrated that one can improve accuracy on SCITAIL by 5% using a new neural model that exploits linguistic structure.
Abstract: We present a new dataset and model for textual entailment, derived from treating multiple-choice question-answering as an entailment problem. SciTail is the first entailment set that is created solely from natural sentences that already exist independently ``in the wild'' rather than sentences authored specifically for the entailment task. Different from existing entailment datasets, we create hypotheses from science questions and the corresponding answer candidates, and premises from relevant web sentences retrieved from a large corpus. These sentences are often linguistically challenging. This, combined with the high lexical similarity of premise and hypothesis for both entailed and non-entailed pairs, makes this new entailment task particularly difficult. The resulting challenge is evidenced by state-of-the-art textual entailment systems achieving mediocre performance on SciTail, especially in comparison to a simple majority class baseline. As a step forward, we demonstrate that one can improve accuracy on SciTail by 5% using a new neural model that exploits linguistic structure.

445 citations

Journal ArticleDOI
TL;DR: This paper investigated the effects of word frequency, neighborhood size, and bigram frequency on lexical decision and word-naming performance in five experiments, and found that bigram frequencies had the strongest effect on word naming performance.
Abstract: Five experiments investigated the effects of word frequency, neighborhood size, and bigram frequency on lexical decision and word-naming performance

394 citations

Proceedings Article
01 Jan 2016
TL;DR: This article proposed a neural model that reads two sentences to determine entailment using LSTM units and extended this model with a word-by-word neural attention mechanism that encourages reasoning over entailments of pairs of words and phrases.
Abstract: While most approaches to automatically recognizing entailment relations have used classifiers employing hand engineered features derived from complex natural language processing pipelines, in practice their performance has been only slightly better than bag-of-word pair classifiers using only lexical similarity. The only attempt so far to build an end-to-end differentiable neural network for entailment failed to outperform such a simple similarity classifier. In this paper, we propose a neural model that reads two sentences to determine entailment using long short-term memory units. We extend this model with a word-by-word neural attention mechanism that encourages reasoning over entailments of pairs of words and phrases. Furthermore, we present a qualitative analysis of attention weights produced by this model, demonstrating such reasoning capabilities. On a large entailment dataset this model outperforms the previous best neural model and a classifier with engineered features by a substantial margin. It is the first generic end-to-end differentiable system that achieves state-of-the-art accuracy on a textual entailment dataset.

237 citations


Network Information
Related Topics (5)
Natural language
31.1K papers, 806.8K citations
85% related
Parsing
21.5K papers, 545.4K citations
84% related
Semantics
24.9K papers, 653K citations
83% related
Machine translation
22.1K papers, 574.4K citations
82% related
Language model
17.5K papers, 545K citations
82% related
Performance
Metrics
No. of papers in the topic in previous years
YearPapers
20221
202118
202027
201926
201823
201726