Wikipedia-based semantic interpretation for natural language processing

doi:10.1613/JAIR.2669

Open Access, Journal Article

Evgeniy Gabrilovich, +1 more

- 01 Jan 2009 -

Journal of Artificial Intelligence Resea...

- Vol. 34, Iss: 1, pp 443-498

TLDR

This work proposes a novel method, called Explicit Semantic Analysis (ESA), for fine-grained semantic interpretation of unrestricted natural language texts, which represents meaning in a high-dimensional space of concepts derived from Wikipedia, the largest encyclopedia in existence.

Abstract:

Adequate representation of natural language semantics requires access to vast amounts of common sense and domain-specific world knowledge. Prior work in the field was based on purely statistical techniques that did not make use of background knowledge, on limited lexicographic knowledge bases such as WordNet, or on huge manual efforts such as the CYC project. Here we propose a novel method, called Explicit Semantic Analysis (ESA), for fine-grained semantic interpretation of unrestricted natural language texts. Our method represents meaning in a high-dimensional space of concepts derived from Wikipedia, the largest encyclopedia in existence. We explicitly represent the meaning of any text in terms of Wikipedia-based concepts. We evaluate the effectiveness of our method on text categorization and on computing the degree of semantic relatedness between fragments of natural language text. Using ESA results in significant improvements over the previous state of the art in both tasks. Importantly, due to the use of natural concepts, the ESA model is easy to explain to human users.
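The core idea in the abstract, representing a text as a weighted vector over encyclopedia concepts and comparing texts in that space, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from this page: the four toy "articles" stand in for Wikipedia, the TF-IDF weighting and cosine comparison are one plausible choice, and the names `CONCEPTS`, `build_index`, `interpret`, and `relatedness` are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy knowledge base: concept title -> article text.
# A real system would use the full set of Wikipedia articles.
CONCEPTS = {
    "Computer": "computer software hardware program keyboard mouse",
    "Mouse (animal)": "mouse rodent animal tail cheese",
    "Cat": "cat animal pet mouse tail",
    "Banking": "bank money loan interest account",
}


def tokenize(text):
    """Lowercase whitespace tokenizer (an illustrative simplification)."""
    return text.lower().split()


def build_index(concepts):
    """Build an inverted index: word -> {concept: TF-IDF weight}."""
    n = len(concepts)
    tf = {c: Counter(tokenize(t)) for c, t in concepts.items()}
    # Document frequency: in how many concept articles each word occurs.
    df = Counter(w for counts in tf.values() for w in counts)
    index = {}
    for concept, counts in tf.items():
        for word, count in counts.items():
            index.setdefault(word, {})[concept] = count * math.log(n / df[word])
    return index


def interpret(text, index):
    """Map a text to a concept vector by summing its words' concept weights."""
    vector = {}
    for word in tokenize(text):
        for concept, weight in index.get(word, {}).items():
            vector[concept] = vector.get(concept, 0.0) + weight
    return vector


def relatedness(text_a, text_b, index):
    """Cosine similarity between the concept vectors of two texts."""
    va, vb = interpret(text_a, index), interpret(text_b, index)
    dot = sum(w * vb.get(c, 0.0) for c, w in va.items())
    norm_a = math.sqrt(sum(w * w for w in va.values()))
    norm_b = math.sqrt(sum(w * w for w in vb.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # at least one text shares no words with the knowledge base
    return dot / (norm_a * norm_b)


index = build_index(CONCEPTS)
print(interpret("keyboard", index))
print(relatedness("rodent cheese", "animal tail", index))
print(relatedness("rodent cheese", "bank loan", index))
```

Because every dimension of the vector is a named concept, the output of `interpret` can be read directly, which is the kind of explainability the abstract points to.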

Citations

Journal Article

BabelNet: The automatic construction, evaluation and application of a wide-coverage multilingual semantic network

Roberto Navigli, +1 more

- 01 Dec 2012 -

Artificial Intelligence

TL;DR: An automatic approach to the construction of BabelNet, a very large, wide-coverage multilingual semantic network. Key to this approach is the integration of lexicographic and encyclopedic knowledge from WordNet and Wikipedia.

Proceedings Article

TAGME: on-the-fly annotation of short text fragments (by wikipedia entities)

Paolo Ferragina, +1 more

TL;DR: The authors designed and implemented TAGME, a system that efficiently and judiciously augments plain text with pertinent hyperlinks to Wikipedia pages. This annotation is highly informative, so any task currently addressed with the bag-of-words paradigm could benefit from using it to draw upon Wikipedia pages and their interrelations.

Journal Article

Fast and Accurate Annotation of Short Texts with Wikipedia Pages

Paolo Ferragina, +1 more

- 01 Jan 2012 -

IEEE Software

TL;DR: Tagme is a system that cross-references short text fragments with Wikipedia pages and can accurately handle short texts (such as snippets of search engine results, tweets, news, or blogs) on the fly.

Proceedings Article

Large-scale learning of word relatedness with constraints

Guy Halawi, +3 more

TL;DR: A large-scale data mining approach to learning word-word relatedness, in which known pairs of related words impose constraints on the learning process. The method learns for each word a low-dimensional representation that strives to maximize the likelihood of the word given the contexts in which it appears.

Proceedings Article

Analogical Inference for Multi-relational Embeddings

Hanxiao Liu, +2 more

TL;DR: This paper proposes a novel framework for optimizing latent representations with respect to the analogical properties of the embedded entities and relations, formulating the learning objective in a differentiable fashion.

References

Journal Article

WordNet: an electronic lexical database

Christiane Fellbaum

- 01 Sep 2000 -

Language

TL;DR: Presents the WordNet lexical database, covering nouns in WordNet, a semantic network of English verbs, and applications of WordNet such as building semantic concordances.

Book

Numerical Recipes in C: The Art of Scientific Computing

William H. Press, +3 more

TL;DR: Numerical Recipes: The Art of Scientific Computing is a complete text and reference book on scientific computing. This edition adds over 100 new routines (now well over 300 in all), plus upgraded versions of many of the original routines, and presents many new topics at the same accessible level.

Journal Article

Indexing by Latent Semantic Analysis

Scott Deerwester, +4 more

- 01 Sep 1990 -

Journal of the Association for Informati...

TL;DR: A new method for automatic indexing and retrieval that takes advantage of implicit higher-order structure in the association of terms with documents ("semantic structure") to improve the detection of relevant documents on the basis of terms found in queries.
