Open Access · Posted Content
Semantic Similarity Based on Corpus Statistics and Lexical Taxonomy
Jay J. Jiang, David W. Conrath
TLDR
This paper proposes a new approach for measuring semantic similarity/distance between words and concepts. It combines a lexical taxonomy structure with corpus statistical information, so that the semantic distance between nodes in the semantic space constructed by the taxonomy can be better quantified with computational evidence derived from a distributional analysis of corpus data.
Abstract:
This paper presents a new approach for measuring semantic similarity/distance between words and concepts. It combines a lexical taxonomy structure with corpus statistical information so that the semantic distance between nodes in the semantic space constructed by the taxonomy can be better quantified with the computational evidence derived from a distributional analysis of corpus data. Specifically, the proposed measure is a combined approach that inherits the edge-based approach of the edge counting scheme, which is then enhanced by the node-based approach of the information content calculation. When tested on a common data set of word pair similarity ratings, the proposed approach outperforms other computational models. It gives the highest correlation value (r = 0.828) with a benchmark based on human similarity judgements, whereas an upper bound (r = 0.885) is observed when human subjects replicate the same task.
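The abstract does not state the formula itself. As a rough illustration of how a taxonomy and corpus counts can be combined, the sketch below uses the simplified form commonly cited for this measure, dist(c1, c2) = IC(c1) + IC(c2) - 2 * IC(lcs(c1, c2)), where IC(c) = -log p(c) and lcs is the lowest common subsumer. The toy taxonomy, the counts, and all function names are hypothetical, and any additional edge weighting (for example by depth or local density) is left out as a simplifying assumption.

```python
import math

# Hypothetical toy IS-A taxonomy: child -> parent (the root has parent None).
PARENT = {
    "entity": None,
    "vehicle": "entity",
    "car": "vehicle",
    "bicycle": "vehicle",
    "animal": "entity",
    "dog": "animal",
}

# Hypothetical corpus counts of direct mentions of each concept.
COUNTS = {"entity": 0, "vehicle": 10, "car": 50, "bicycle": 20, "animal": 5, "dog": 15}


def ancestors(concept):
    """Return the concept followed by its ancestors, ending at the root."""
    path = []
    while concept is not None:
        path.append(concept)
        concept = PARENT[concept]
    return path


def cumulative_counts():
    """Propagate counts upward so each node also counts its descendants."""
    totals = {c: 0 for c in PARENT}
    for concept, n in COUNTS.items():
        for node in ancestors(concept):
            totals[node] += n
    return totals


TOTALS = cumulative_counts()
N = TOTALS["entity"]  # the root subsumes every counted occurrence


def information_content(concept):
    """IC(c) = -log p(c), with p(c) estimated from cumulative counts."""
    return -math.log(TOTALS[concept] / N)


def lowest_common_subsumer(c1, c2):
    """Return the deepest node that is an ancestor of both concepts."""
    seen = set(ancestors(c1))
    for node in ancestors(c2):  # walks upward, so the first hit is the deepest
        if node in seen:
            return node
    raise ValueError("concepts do not share a root")


def jc_distance(c1, c2):
    """Simplified distance: IC(c1) + IC(c2) - 2 * IC(lcs(c1, c2))."""
    lcs = lowest_common_subsumer(c1, c2)
    return (information_content(c1) + information_content(c2)
            - 2 * information_content(lcs))


if __name__ == "__main__":
    print(jc_distance("car", "bicycle"))
    print(jc_distance("car", "dog"))
```

In this toy setup, "car" and "bicycle" share the fairly specific subsumer "vehicle", so their distance comes out smaller than that of "car" and "dog", whose only common subsumer is the root with zero information content.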
Citations
Posted Content
Mining the Web for Synonyms: PMI-IR versus LSA on TOEFL
TL;DR: This article presents PMI-IR, an unsupervised learning algorithm for recognizing synonyms that uses Pointwise Mutual Information (PMI) and Information Retrieval (IR), with statistics acquired by querying a web search engine, to measure the similarity of pairs of words.
Proceedings Article
Automatic Evaluation of Topic Coherence
TL;DR: A simple co-occurrence measure based on pointwise mutual information over Wikipedia data achieves results for the task at or near the level of inter-annotator correlation, and other Wikipedia-based lexical relatedness methods also achieve strong results.
Proceedings Article
Evaluation of Output Embeddings for Fine-Grained Image Classification
TL;DR: Given image and class embeddings, the authors learn a compatibility function such that matching embeddings are assigned a higher score than mismatching ones; zero-shot classification of an image proceeds by finding the label that yields the highest joint compatibility score.
Proceedings Article
Evaluation of output embeddings for fine-grained image classification
TL;DR: This project shows that compelling classification performance can be achieved on fine-grained categories even without labeled training data, and establishes a substantially improved state-of-the-art on the Animals with Attributes and Caltech-UCSD Birds datasets.
Journal Article
DOSE: an R/Bioconductor package for disease ontology semantic and enrichment analysis
TL;DR: DOSE is an R package providing semantic similarity computations among DO terms and genes, which allows biologists to explore the similarities of diseases and of gene functions from a disease perspective, to verify disease relevance in a biological experiment, and to identify unexpected disease associations.
References
Journal Article
Word association norms, mutual information, and lexicography
Kenneth Church, Patrick Hanks
TL;DR: The proposed measure, the association ratio, estimates word association norms directly from computer readable corpora, making it possible to estimate norms for tens of thousands of words.
Posted Content
Using Information Content to Evaluate Semantic Similarity in a Taxonomy
TL;DR: In this article, a new measure of semantic similarity in an IS-A taxonomy based on the notion of information content is presented, and experimental evaluation suggests that the measure performs encouragingly well (a correlation of r = 0.79 with a benchmark set of human similarity judgments, with an upper bound of r < 0.90 for human subjects performing the same task).
Journal Article
Development and application of a metric on semantic nets
TL;DR: Experiments in which distance is applied to pairs of concepts and to sets of concepts in a hierarchical knowledge base show the power of hierarchical relations in representing information about the conceptual distance between concepts.
Proceedings Article
A semantic concordance
TL;DR: A semantic concordance is a textual corpus and a lexicon so combined that every substantive word in the text is linked to its appropriate sense in the lexicon.
Proceedings Article
Noun classification from predicate-argument structures
TL;DR: The resulting quasi-semantic classification of nouns demonstrates the plausibility of the distributional hypothesis, and has potential application to a variety of tasks, including automatic indexing, resolving nominal compounds, and determining the scope of modification.