scispace - formally typeset
Open Access

Semantic Similarity Based on Corpus Statistics and Lexical Taxonomy

TLDR
This paper presents a new approach for measuring semantic similarity/distance between words and concepts that combines a lexical taxonomy structure with corpus statistical information so that the semantic distance between nodes in the semantic space constructed by the taxonomy can be better quantified with the computational evidence derived from a distributional analysis of corpus data.
Abstract
This paper presents a new approach for measuring semantic similarity/distance between words and concepts. It combines a lexical taxonomy structure with corpus statistical information so that the semantic distance between nodes in the semantic space constructed by the taxonomy can be better quantified with the computational evidence derived from a distributional analysis of corpus data. Specifically, the proposed measure is a combined approach that inherits the edge-based approach of the edge counting scheme, which is then enhanced by the node-based approach of the information content calculation. When tested on a common data set of word pair similarity ratings, the proposed approach outperforms other computational models. It gives the highest correlation value (r = 0.828) with a benchmark based on human similarity judgements, whereas an upper bound (r = 0.885) is observed when human subjects replicate the same task.

read more

Content maybe subject to copyright    Report

Citations
More filters
Book

Sentiment Analysis and Opinion Mining

TL;DR: Sentiment analysis and opinion mining is the field of study that analyzes people's opinions, sentiments, evaluations, attitudes, and emotions from written language as discussed by the authors and is one of the most active research areas in natural language processing and is also widely studied in data mining, Web mining, and text mining.
Book

Speech and Language Processing

Dan Jurafsky, +1 more
TL;DR: It is now clear that HAL's creator, Arthur C. Clarke, was a little optimistic in predicting when an artificial agent such as HAL would be avail-able as discussed by the authors.
Journal ArticleDOI

From frequency to meaning: vector space models of semantics

TL;DR: The goal in this survey is to show the breadth of applications of VSMs for semantics, to provide a new perspective on VSMs, and to provide pointers into the literature for those who are less familiar with the field.
Book

Ontology Matching

TL;DR: The second edition of Ontology Matching has been thoroughly revised and updated to reflect the most recent advances in this quickly developing area, which resulted in more than 150 pages of new content.
Proceedings Article

Computing semantic relatedness using Wikipedia-based explicit semantic analysis

TL;DR: This work proposes Explicit Semantic Analysis (ESA), a novel method that represents the meaning of texts in a high-dimensional space of concepts derived from Wikipedia that results in substantial improvements in correlation of computed relatedness scores with human judgments.
References
More filters
Journal ArticleDOI

Introduction to WordNet: An On-line Lexical Database

TL;DR: Standard alphabetical procedures for organizing lexical information put together words that are spelled alike and scatter words with similar or related meanings haphazardly through the list.
Journal ArticleDOI

Word association norms, mutual information, and lexicography

TL;DR: The proposed measure, the association ratio, estimates word association norms directly from computer readable corpora, making it possible to estimate norms for tens of thousands of words.
Posted Content

Using Information Content to Evaluate Semantic Similarity in a Taxonomy

TL;DR: In this article, a new measure of semantic similarity in an IS-A taxonomy based on the notion of information content is presented, and experimental evaluation suggests that the measure performs encouragingly well (a correlation of r = 0.79 with a benchmark set of human similarity judgments, with an upper bound of r < 0.90 for human subjects performing the same task).
Proceedings Article

Using information content to evaluate semantic similarity in a taxonomy

TL;DR: This paper presents a new measure of semantic similarity in an IS-A taxonomy, based on the notion of information content, which performs encouragingly well and is significantly better than the traditional edge counting approach.
Journal ArticleDOI

Development and application of a metric on semantic nets

TL;DR: Experiments in which distance is applied to pairs of concepts and to sets of concepts in a hierarchical knowledge base show the power of hierarchical relations in representing information about the conceptual distance between concepts.