Wikipedia-based semantic interpretation for natural language processing
TLDR
This work proposes a novel method, called Explicit Semantic Analysis (ESA), for fine-grained semantic interpretation of unrestricted natural language texts, which represents meaning in a high-dimensional space of concepts derived from Wikipedia, the largest encyclopedia in existence.
Abstract:
Adequate representation of natural language semantics requires access to vast amounts of common sense and domain-specific world knowledge. Prior work in the field was based on purely statistical techniques that did not make use of background knowledge, on limited lexicographic knowledge bases such as WordNet, or on huge manual efforts such as the CYC project. Here we propose a novel method, called Explicit Semantic Analysis (ESA), for fine-grained semantic interpretation of unrestricted natural language texts. Our method represents meaning in a high-dimensional space of concepts derived from Wikipedia, the largest encyclopedia in existence. We explicitly represent the meaning of any text in terms of Wikipedia-based concepts. We evaluate the effectiveness of our method on text categorization and on computing the degree of semantic relatedness between fragments of natural language text. Using ESA results in significant improvements over the previous state of the art in both tasks. Importantly, due to the use of natural concepts, the ESA model is easy to explain to human users.
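The idea in the abstract, representing a text as a weighted vector over Wikipedia-derived concepts and comparing texts in that space, can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the three toy "articles" in `CONCEPTS` stand in for Wikipedia, the TF-IDF weighting and cosine comparison are choices the abstract does not spell out, and the function names are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical stand-ins for Wikipedia articles: concept title -> article text.
CONCEPTS = {
    "Bank (finance)": "bank money loan deposit interest account money",
    "River": "river water bank flow stream water",
    "Computer": "computer software hardware program data",
}


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def build_index(concepts):
    """Build an inverted index: word -> {concept: TF-IDF weight}."""
    tf = {c: Counter(tokenize(t)) for c, t in concepts.items()}
    # Document frequency: in how many concept articles each word occurs.
    df = Counter(w for counts in tf.values() for w in counts)
    n = len(concepts)
    index = defaultdict(dict)
    for c, counts in tf.items():
        for w, k in counts.items():
            # Assumed weighting: log-scaled term frequency times IDF.
            index[w][c] = (1 + math.log(k)) * math.log(n / df[w])
    return index


def interpret(text, index):
    """Map a text to a concept vector by summing its words' concept weights."""
    vec = defaultdict(float)
    for w, k in Counter(tokenize(text)).items():
        for c, weight in index.get(w, {}).items():
            vec[c] += k * weight
    return dict(vec)


def relatedness(a, b, index):
    """Cosine similarity between the concept vectors of two texts."""
    va, vb = interpret(a, index), interpret(b, index)
    dot = sum(va[c] * vb.get(c, 0.0) for c in va)
    na = math.sqrt(sum(x * x for x in va.values()))
    nb = math.sqrt(sum(x * x for x in vb.values()))
    if na == 0 or nb == 0:
        return 0.0
    return dot / (na * nb)


if __name__ == "__main__":
    index = build_index(CONCEPTS)
    print(interpret("money bank", index))
    print(relatedness("money loan", "deposit interest", index))
    print(relatedness("money loan", "software program", index))
```

Because each dimension of the vector is a named concept, the interpretation of a text can be read off directly, which is the sense in which the abstract calls the model easy to explain. Note that in this sketch "money loan" and "deposit interest" come out as related even though they share no words, since both map onto the same concept.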
Citations
Journal ArticleDOI
BabelNet: The automatic construction, evaluation and application of a wide-coverage multilingual semantic network
TL;DR: An automatic approach to the construction of BabelNet, a very large, wide-coverage multilingual semantic network. Key to this approach is the integration of lexicographic and encyclopedic knowledge from WordNet and Wikipedia.
Proceedings ArticleDOI
TAGME: on-the-fly annotation of short text fragments (by wikipedia entities)
Paolo Ferragina, Ugo Scaiella
TL;DR: The authors designed and implemented TAGME, a system that efficiently and judiciously augments plain text with pertinent hyperlinks to Wikipedia pages. This annotation is highly informative, so any task currently addressed with the bag-of-words paradigm could benefit from using it to draw upon Wikipedia pages and their interrelations.
Journal ArticleDOI
Fast and Accurate Annotation of Short Texts with Wikipedia Pages
Paolo Ferragina, Ugo Scaiella
TL;DR: TAGME is a system that cross-references short text fragments (such as snippets of search engine results, tweets, news, or blogs) with Wikipedia pages, accurately and on the fly.
Proceedings ArticleDOI
Large-scale learning of word relatedness with constraints
TL;DR: A large-scale data mining approach to learning word-word relatedness, in which known pairs of related words impose constraints on the learning process. The method learns for each word a low-dimensional representation that strives to maximize the likelihood of the word given the contexts in which it appears.
Proceedings Article
Analogical Inference for Multi-relational Embeddings
TL;DR: This paper proposed a novel framework for optimizing the latent representations with respect to the analogical properties of the embedded entities and relations by formulating the learning objective in a differentiable fashion.
References
Journal ArticleDOI
WordNet : an electronic lexical database
TL;DR: Presents the WordNet lexical database, including nouns in WordNet and a semantic network of English verbs, along with applications of WordNet such as building semantic concordances.
Book
Numerical Recipes in C: The Art of Scientific Computing
TL;DR: Numerical Recipes: The Art of Scientific Computing as discussed by the authors is a complete text and reference book on scientific computing with over 100 new routines (now well over 300 in all), plus upgraded versions of many of the original routines, with many new topics presented at the same accessible level.
Journal ArticleDOI
Indexing by Latent Semantic Analysis
TL;DR: A new method for automatic indexing and retrieval to take advantage of implicit higher-order structure in the association of terms with documents (“semantic structure”) in order to improve the detection of relevant documents on the basis of terms found in queries.
Book
Introduction to Modern Information Retrieval
Gerard Salton, Michael J. McGill
Journal ArticleDOI
Numerical Recipes in C: The Art of Scientific Computing
Mary C. Seiler, Fritz A. Seiler