Proceedings ArticleDOI

Short Text Classification Using Wikipedia Concept Based Document Representation

TLDR
Experimental evaluation on real Google search snippets shows that this approach outperforms the traditional BOW method, gives good performance, and can be easily implemented at low cost.
Abstract
Short text classification is a difficult and challenging task in information retrieval systems since the text data is short, sparse and multidimensional. In this paper, we represent short text with Wikipedia concepts for classification. Short document text is mapped to Wikipedia concepts, and the concepts are then used to represent documents for text categorization. Traditional classification methods such as SVM can then be applied to the Wikipedia concept document representation. Experimental evaluation on real Google search snippets shows that our approach outperforms the traditional BOW method and gives good performance. Although it is not better than the state-of-the-art classifier (see, e.g., Phan et al., WWW '08), our method can be easily implemented at low cost.
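The pipeline described in the abstract can be sketched in a few lines, assuming a toy term-to-Wikipedia-concept mapping (a stand-in for a real concept index built from Wikipedia) and scikit-learn's LinearSVC as the traditional classifier; this is an illustrative sketch under those assumptions, not the authors' implementation.

# Sketch of concept-based short text classification. The term->concept
# mapping below is a hypothetical toy dictionary; a real system would
# build it from a Wikipedia dump or an annotation service.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline

term_to_concepts = {
    "python": ["Python_(programming_language)"],
    "java": ["Java_(programming_language)"],
    "striker": ["Forward_(association_football)"],
    "goal": ["Goal_(sport)"],
}

def to_concept_doc(snippet: str) -> str:
    """Replace each term with its Wikipedia concepts, keeping unknown terms."""
    concepts = []
    for tok in snippet.lower().split():
        concepts.extend(term_to_concepts.get(tok, [tok]))
    return " ".join(concepts)

# Toy snippet-style training data (labels: 0 = computing, 1 = sports).
snippets = ["python java tutorial", "striker scores goal",
            "java python code", "goal by the striker"]
labels = [0, 1, 0, 1]

concept_docs = [to_concept_doc(s) for s in snippets]

# Standard bag-of-concepts weighting followed by a linear SVM.
clf = make_pipeline(TfidfVectorizer(token_pattern=r"\S+"), LinearSVC())
clf.fit(concept_docs, labels)
print(clf.predict([to_concept_doc("python code example")]))

In the paper's setting the concept mapping would come from Wikipedia itself rather than a hand-made dictionary; the weighting scheme and classifier stay standard.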


Citations
Journal ArticleDOI

HGAT: Heterogeneous Graph Attention Networks for Semi-supervised Short Text Classification

TL;DR: A novel heterogeneous graph neural network-based method for semi-supervised short text classification that takes full advantage of limited labeled data and large unlabeled data through information propagation along the graph.
Proceedings ArticleDOI

Heterogeneous Graph Attention Networks for Semi-supervised Short Text Classification

TL;DR: A novel heterogeneous graph neural network-based method for semi-supervised short text classification is proposed, taking full advantage of scarce labeled data and large unlabeled data through information propagation along the graph.
Journal ArticleDOI

Microblog semantic context retrieval system based on linked open data and graph-based theory

TL;DR: A graph-of-concepts method that considers the relationships among concepts matching named entities in short text and their related concepts, and contextualizes each concept in the graph by leveraging the linked nature of DBpedia as a Linked Open Data knowledge base together with graph-based centrality theory.
Journal ArticleDOI

Feature engineering for MEDLINE citation categorization with MeSH.

TL;DR: It is concluded that even though traditional features such as unigrams and bigrams have strong performance compared to other features, it is possible to combine them to effectively improve the performance of the bag-of-words representation.
Proceedings ArticleDOI

Document Enrichment using DBPedia Ontology for Short Text Classification

TL;DR: This work proposes a new approach that uses DBpedia Spotlight annotation tools to identify relevant entities in text and enrich short text documents with concepts derived from those entities, represented in the DBpedia ontology.
References
Journal ArticleDOI

LIBSVM: A library for support vector machines

TL;DR: Issues such as solving SVM optimization problems, theoretical convergence, multiclass classification, probability estimates, and parameter selection are discussed in detail.
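As a rough illustration of the issues listed above (multiclass classification, probability estimates, and parameter selection), the sketch below uses scikit-learn's SVC, which wraps LIBSVM, on synthetic data; the parameter grid is an arbitrary example, not a recommendation from the paper.

# Minimal LIBSVM-style usage via scikit-learn's SVC wrapper:
# multiclass classification, probability estimates, and C/gamma selection.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=10, n_informative=5,
                           n_classes=3, random_state=0)

# Grid search over C and gamma, as commonly done with LIBSVM.
grid = GridSearchCV(SVC(kernel="rbf", probability=True),
                    {"C": [0.1, 1, 10], "gamma": ["scale", 0.01, 0.1]},
                    cv=3)
grid.fit(X, y)
print(grid.best_params_)
print(grid.best_estimator_.predict_proba(X[:3]))  # per-class probability estimates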
Journal ArticleDOI

Machine learning in automated text categorization

TL;DR: This survey discusses the main approaches to text categorization that fall within the machine learning paradigm and discusses in detail issues pertaining to three different problems, namely, document representation, classifier construction, and classifier evaluation.
Proceedings Article

Computing semantic relatedness using Wikipedia-based explicit semantic analysis

TL;DR: This work proposes Explicit Semantic Analysis (ESA), a novel method that represents the meaning of texts in a high-dimensional space of concepts derived from Wikipedia, resulting in substantial improvements in the correlation of computed relatedness scores with human judgments.
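The core ESA idea (text meaning as a weighted vector over Wikipedia concepts, with relatedness as the cosine between such vectors) can be sketched as follows; the tiny word-to-concept weight table is hypothetical and stands in for the inverted index ESA builds from TF-IDF scores of words in Wikipedia articles.

# Sketch of Explicit Semantic Analysis: texts become weighted vectors over
# Wikipedia concepts; relatedness is the cosine between those vectors.
import numpy as np

concepts = ["Computer_science", "Football", "Finance"]
word_concept_weights = {          # toy weights, not real ESA values
    "algorithm": np.array([0.9, 0.0, 0.1]),
    "goal":      np.array([0.0, 0.8, 0.2]),
    "bank":      np.array([0.1, 0.0, 0.9]),
}

def esa_vector(text: str) -> np.ndarray:
    """Sum the concept vectors of known words (a simplified ESA interpretation)."""
    vec = np.zeros(len(concepts))
    for word in text.lower().split():
        vec += word_concept_weights.get(word, 0.0)
    return vec

def relatedness(a: str, b: str) -> float:
    va, vb = esa_vector(a), esa_vector(b)
    denom = np.linalg.norm(va) * np.linalg.norm(vb)
    return float(va @ vb / denom) if denom else 0.0

print(relatedness("algorithm goal", "bank goal"))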
Proceedings ArticleDOI

Learning to classify short and sparse text & web with hidden topics from large-scale data collections

TL;DR: A general framework for building classifiers that deal with short and sparse text and Web segments by making the most of hidden topics discovered from large-scale data collections; it is general enough to be applied to different data domains and genres, ranging from Web search results to medical texts.
Proceedings ArticleDOI

TAGME: on-the-fly annotation of short text fragments (by wikipedia entities)

TL;DR: The authors designed and implemented TAGME, a system that efficiently and judiciously augments plain text with pertinent hyperlinks to Wikipedia pages; this annotation is extremely informative, so any task currently addressed with the bag-of-words paradigm could benefit from it by drawing upon Wikipedia pages and their interrelations.