scispace - formally typeset
Open AccessJournal ArticleDOI

Wikipedia-based semantic interpretation for natural language processing

Reads0
Chats0
TLDR
This work proposes a novel method, called Explicit Semantic Analysis (ESA), for fine-grained semantic interpretation of unrestricted natural language texts, which represents meaning in a high-dimensional space of concepts derived from Wikipedia, the largest encyclopedia in existence.
Abstract
Adequate representation of natural language semantics requires access to vast amounts of common sense and domain-specific world knowledge. Prior work in the field was based on purely statistical techniques that did not make use of background knowledge, on limited lexicographic knowledge bases such as WordNet, or on huge manual efforts such as the CYC project. Here we propose a novel method, called Explicit Semantic Analysis (ESA), for fine-grained semantic interpretation of unrestricted natural language texts. Our method represents meaning in a high-dimensional space of concepts derived from Wikipedia, the largest encyclopedia in existence. We explicitly represent the meaning of any text in terms of Wikipedia-based concepts. We evaluate the effectiveness of our method on text categorization and on computing the degree of semantic relatedness between fragments of natural language text. Using ESA results in significant improvements over the previous state of the art in both tasks. Importantly, due to the use of natural concepts, the ESA model is easy to explain to human users.

read more

Content maybe subject to copyright    Report

Citations
More filters
Journal ArticleDOI

BabelNet: The automatic construction, evaluation and application of a wide-coverage multilingual semantic network

TL;DR: An automatic approach to the construction of BabelNet, a very large, wide-coverage multilingual semantic network, key to this approach is the integration of lexicographic and encyclopedic knowledge from WordNet and Wikipedia.
Proceedings ArticleDOI

TAGME: on-the-fly annotation of short text fragments (by wikipedia entities)

TL;DR: The authors designed and implemented TAGME, a system that is able to efficiently and judiciously augment a plain-text with pertinent hyperlinks to Wikipedia pages, which is extremely informative, so any task that is currently addressed using the bag-of-words paradigm could benefit from using this annotation to draw upon Wikipedia pages and their interrelations.
Journal ArticleDOI

Fast and Accurate Annotation of Short Texts with Wikipedia Pages

TL;DR: Tagme as mentioned in this paper is a cross-referencing system for short text fragments and Wikipedia pages that can accurately manage short textual fragments (such as snippets of search engine results, tweets, news, or blogs) on the fly.
Proceedings ArticleDOI

Large-scale learning of word relatedness with constraints

TL;DR: A large-scale data mining approach to learning word-word relatedness, where known pairs of related words impose constraints on the learning process, and learns for each word a low-dimensional representation, which strives to maximize the likelihood of a word given the contexts in which it appears.
Proceedings Article

Analogical Inference for Multi-relational Embeddings

TL;DR: This paper proposed a novel framework for optimizing the latent representations with respect to the \textit{analogical} properties of the embedded entities and relations by formulating the learning objective in a differentiable fashion.
References
More filters
Journal ArticleDOI

WordNet : an electronic lexical database

Christiane Fellbaum
- 01 Sep 2000 - 
TL;DR: The lexical database: nouns in WordNet, Katherine J. Miller a semantic network of English verbs, and applications of WordNet: building semantic concordances are presented.
Book

Numerical Recipes in C: The Art of Scientific Computing

TL;DR: Numerical Recipes: The Art of Scientific Computing as discussed by the authors is a complete text and reference book on scientific computing with over 100 new routines (now well over 300 in all), plus upgraded versions of many of the original routines, with many new topics presented at the same accessible level.
Journal ArticleDOI

Indexing by Latent Semantic Analysis

TL;DR: A new method for automatic indexing and retrieval to take advantage of implicit higher-order structure in the association of terms with documents (“semantic structure”) in order to improve the detection of relevant documents on the basis of terms found in queries.
Book

Introduction to Modern Information Retrieval

TL;DR: Reading is a need and a hobby at once and this condition is the on that will make you feel that you must read.
Related Papers (5)