Proceedings ArticleDOI

Short Text Classification Using Wikipedia Concept Based Document Representation

TLDR
Experimental evaluation on real Google search snippets shows that this approach outperforms the traditional BOW method, gives good performance, and can be easily implemented at low cost.
Abstract
Short text classification is a difficult and challenging task in information retrieval systems since the text data is short, sparse and multidimensional. In this paper, we represent short text with Wikipedia concepts for classification. Short document text is mapped to Wikipedia concepts, and the concepts are then used to represent documents for text categorization. Traditional classification methods such as SVM can then be applied to the Wikipedia concept document representation. Experimental evaluation on real Google search snippets shows that our approach outperforms the traditional BOW method and gives good performance. Although it is not better than the state-of-the-art classifier (see, e.g., Phan et al., WWW '08), our method can be easily implemented at low cost.
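The pipeline described in the abstract can be sketched in a few lines, assuming a toy term-to-Wikipedia-concept mapping (a stand-in for a real concept index built from Wikipedia) and scikit-learn's LinearSVC as the traditional classifier; this is an illustrative sketch under those assumptions, not the authors' implementation.

# Sketch of concept-based short text classification. The term->concept
# mapping below is a hypothetical toy dictionary; a real system would
# build it from a Wikipedia dump or an annotation service.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline

term_to_concepts = {
    "python": ["Python_(programming_language)"],
    "java": ["Java_(programming_language)"],
    "striker": ["Forward_(association_football)"],
    "goal": ["Goal_(sport)"],
}

def to_concept_doc(snippet: str) -> str:
    """Replace each term with its Wikipedia concepts, keeping unknown terms."""
    concepts = []
    for tok in snippet.lower().split():
        concepts.extend(term_to_concepts.get(tok, [tok]))
    return " ".join(concepts)

# Toy snippet-style training data (labels: 0 = computing, 1 = sports).
snippets = ["python java tutorial", "striker scores goal",
            "java python code", "goal by the striker"]
labels = [0, 1, 0, 1]

concept_docs = [to_concept_doc(s) for s in snippets]

# Standard bag-of-concepts weighting followed by a linear SVM.
clf = make_pipeline(TfidfVectorizer(token_pattern=r"\S+"), LinearSVC())
clf.fit(concept_docs, labels)
print(clf.predict([to_concept_doc("python code example")]))

In the paper's setting the concept mapping would come from Wikipedia itself rather than a hand-made dictionary; the weighting scheme and classifier stay standard.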


Citations
Journal ArticleDOI

HGAT: Heterogeneous Graph Attention Networks for Semi-supervised Short Text Classification

TL;DR: A novel heterogeneous graph neural network-based method for semi-supervised short text classification that takes full advantage of limited labeled data and large unlabeled data through information propagation along the graph.
Proceedings ArticleDOI

Heterogeneous Graph Attention Networks for Semi-supervised Short Text Classification

TL;DR: A novel heterogeneous graph neural network-based method for semi-supervised short text classification is proposed, taking full advantage of scarce labeled data and large unlabeled data through information propagation along the graph.
Journal ArticleDOI

Microblog semantic context retrieval system based on linked open data and graph-based theory

TL;DR: A graph-of-concepts method that considers the relationships among concepts matching named entities in short text and their related concepts, and contextualizes each concept in the graph by leveraging the linked nature of DBpedia as a Linked Open Data knowledge base together with graph-based centrality theory.
Journal ArticleDOI

Feature engineering for MEDLINE citation categorization with MeSH.

TL;DR: It is concluded that even though traditional features such as unigrams and bigrams have strong performance compared to other features, it is possible to combine them to effectively improve the performance of the bag-of-words representation.
Proceedings ArticleDOI

Document Enrichment using DBPedia Ontology for Short Text Classification

TL;DR: This work proposes a new approach that uses DBpedia Spotlight annotation tools to identify relevant entities in text and enrich short text documents with concepts derived from those entities, represented in the DBpedia ontology.
References
Journal ArticleDOI

LIBSVM: A library for support vector machines

TL;DR: Issues such as solving SVM optimization problems, theoretical convergence, multiclass classification, probability estimates, and parameter selection are discussed in detail.
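As a rough illustration of the issues listed above (multiclass classification, probability estimates, and parameter selection), the sketch below uses scikit-learn's SVC, which wraps LIBSVM, on synthetic data; the parameter grid is an arbitrary example, not a recommendation from the paper.

# Minimal LIBSVM-style usage via scikit-learn's SVC wrapper:
# multiclass classification, probability estimates, and C/gamma selection.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=10, n_informative=5,
                           n_classes=3, random_state=0)

# Grid search over C and gamma, as commonly done with LIBSVM.
grid = GridSearchCV(SVC(kernel="rbf", probability=True),
                    {"C": [0.1, 1, 10], "gamma": ["scale", 0.01, 0.1]},
                    cv=3)
grid.fit(X, y)
print(grid.best_params_)
print(grid.best_estimator_.predict_proba(X[:3]))  # per-class probability estimates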
Journal ArticleDOI

Machine learning in automated text categorization

TL;DR: This survey discusses the main approaches to text categorization that fall within the machine learning paradigm and discusses in detail issues pertaining to three different problems, namely, document representation, classifier construction, and classifier evaluation.
Proceedings Article

Computing semantic relatedness using Wikipedia-based explicit semantic analysis

TL;DR: This work proposes Explicit Semantic Analysis (ESA), a novel method that represents the meaning of texts in a high-dimensional space of concepts derived from Wikipedia, resulting in substantial improvements in the correlation of computed relatedness scores with human judgments.
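The core ESA idea (text meaning as a weighted vector over Wikipedia concepts, with relatedness as the cosine between such vectors) can be sketched as follows; the tiny word-to-concept weight table is hypothetical and stands in for the inverted index ESA builds from TF-IDF scores of words in Wikipedia articles.

# Sketch of Explicit Semantic Analysis: texts become weighted vectors over
# Wikipedia concepts; relatedness is the cosine between those vectors.
import numpy as np

concepts = ["Computer_science", "Football", "Finance"]
word_concept_weights = {          # toy weights, not real ESA values
    "algorithm": np.array([0.9, 0.0, 0.1]),
    "goal":      np.array([0.0, 0.8, 0.2]),
    "bank":      np.array([0.1, 0.0, 0.9]),
}

def esa_vector(text: str) -> np.ndarray:
    """Sum the concept vectors of known words (a simplified ESA interpretation)."""
    vec = np.zeros(len(concepts))
    for word in text.lower().split():
        vec += word_concept_weights.get(word, 0.0)
    return vec

def relatedness(a: str, b: str) -> float:
    va, vb = esa_vector(a), esa_vector(b)
    denom = np.linalg.norm(va) * np.linalg.norm(vb)
    return float(va @ vb / denom) if denom else 0.0

print(relatedness("algorithm goal", "bank goal"))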
Proceedings ArticleDOI

Learning to classify short and sparse text & web with hidden topics from large-scale data collections

TL;DR: A general framework for building classifiers that deal with short and sparse text and Web segments by making the most of hidden topics discovered from large-scale data collections; it is general enough to be applied to different data domains and genres, ranging from Web search results to medical texts.
Proceedings ArticleDOI

TAGME: on-the-fly annotation of short text fragments (by wikipedia entities)

TL;DR: The authors designed and implemented TAGME, a system that efficiently and judiciously augments plain text with pertinent hyperlinks to Wikipedia pages; this annotation is extremely informative, so any task currently addressed with the bag-of-words paradigm could benefit from it by drawing upon Wikipedia pages and their interrelations.