Proceedings Article
Short Text Classification Using Wikipedia Concept Based Document Representation
Xiang Wang, Ruhua Chen, Yan Jia, Bin Zhou
- pp 471-474
TL;DR: Experimental evaluation on real Google search snippets shows that this approach outperforms the traditional bag-of-words (BOW) method and can be implemented easily at low cost.
Abstract:
Short text classification is a difficult and challenging task in information retrieval systems since the text data is short, sparse and multidimensional. In this paper, we represent short text with Wikipedia concepts for classification. Short document text is mapped to Wikipedia concepts, and the concepts are then used to represent the document for text categorization. Traditional classification methods such as SVM can then be used to perform text categorization on the Wikipedia concept document representation. Experimental evaluation on real Google search snippets shows that our approach outperforms the traditional BOW method and gives good performance. Although it is not better than the state-of-the-art classifier (see, e.g., Phan et al., WWW '08), our method can be easily implemented with low cost.
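The idea of replacing words with concepts can be illustrated with a minimal sketch. The abstract does not describe the actual mapping procedure, so everything below is an assumption made for illustration: `CONCEPT_TABLE` is a tiny hypothetical phrase-to-concept dictionary standing in for a real Wikipedia-derived index, and `concept_vector` uses a simple greedy longest-match lookup. The sketch only shows why two snippets with no words in common can still look similar once they are represented by concepts.

```python
import math
import re
from collections import Counter

# Hypothetical phrase -> concept table (an assumption for illustration).
# A real system would derive this from Wikipedia titles and link anchors.
CONCEPT_TABLE = {
    "svm": "Support vector machine",
    "support vector machine": "Support vector machine",
    "kernel trick": "Kernel method",
    "premier league": "Association football",
    "world cup": "Association football",
    "striker": "Association football",
}
MAX_PHRASE_LEN = 3  # longest phrase in the table, counted in tokens


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bow_vector(text):
    """Bag-of-words representation: token -> count."""
    return Counter(tokenize(text))


def concept_vector(text):
    """Concept representation: concept -> count, via greedy longest match."""
    tokens = tokenize(text)
    concepts = Counter()
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(MAX_PHRASE_LEN, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in CONCEPT_TABLE:
                concepts[CONCEPT_TABLE[phrase]] += 1
                i += n
                break
        else:
            i += 1  # no phrase starts at this token; move on
    return concepts


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


snippet_a = "Premier League striker scores twice"
snippet_b = "World Cup final tickets"

bow_sim = cosine(bow_vector(snippet_a), bow_vector(snippet_b))
concept_sim = cosine(concept_vector(snippet_a), concept_vector(snippet_b))
print(bow_sim, concept_sim)
```

Under these assumptions the two snippets share no tokens, so their bag-of-words similarity is zero, while both map to the same concept. Concept count vectors of this kind could then be passed to a standard classifier such as an SVM, which is the step the abstract describes but this sketch does not implement.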
Citations
Journal Article
HGAT: Heterogeneous Graph Attention Networks for Semi-supervised Short Text Classification
TL;DR: A novel heterogeneous graph neural network-based method for semi-supervised short text classification that takes full advantage of limited labeled data and large unlabeled data through information propagation along the graph.
Proceedings Article
Heterogeneous Graph Attention Networks for Semi-supervised Short Text Classification
TL;DR: A novel heterogeneous graph neural network-based method for semi-supervised short text classification is proposed, taking full advantage of limited labeled data and large unlabeled data through information propagation along the graph.
Journal Article
Microblog semantic context retrieval system based on linked open data and graph-based theory
TL;DR: A graph-of-concepts method that considers the relationships among concepts that match named entities in short text and their related concepts and contextualizes each concept in the graph by leveraging the linked nature of DBpedia as a Linked Open Data knowledge base and graph-based centrality theory.
Journal Article
Feature engineering for MEDLINE citation categorization with MeSH.
Antonio Jimeno Yepes, Laura Plaza, Jorge Carrillo-de-Albornoz, James G. Mork, Alan R. Aronson
TL;DR: It is concluded that even though traditional features such as unigrams and bigrams have strong performance compared to other features, it is possible to combine them to effectively improve the performance of the bag-of-words representation.
Proceedings Article
Document Enrichment using DBPedia Ontology for Short Text Classification
Jernej Flisar, Vili Podgorelec
TL;DR: This work proposes a new approach that uses DBpedia Spotlight annotation tools to identify relevant entities in text and enrich short text documents with concepts derived from those entities, as represented in the DBpedia ontology.
References
Journal Article
LIBSVM: A library for support vector machines
Chih-Chung Chang, Chih-Jen Lin
TL;DR: Issues such as solving SVM optimization problems, theoretical convergence, multiclass classification, probability estimates, and parameter selection are discussed in detail.
Journal Article
Machine learning in automated text categorization
TL;DR: This survey discusses the main approaches to text categorization that fall within the machine learning paradigm and discusses in detail issues pertaining to three different problems, namely, document representation, classifier construction, and classifier evaluation.
Proceedings Article
Computing semantic relatedness using Wikipedia-based explicit semantic analysis
TL;DR: This work proposes Explicit Semantic Analysis (ESA), a novel method that represents the meaning of texts in a high-dimensional space of concepts derived from Wikipedia, resulting in substantial improvements in the correlation of computed relatedness scores with human judgments.
Proceedings Article
Learning to classify short and sparse text & web with hidden topics from large-scale data collections
TL;DR: A general framework for building classifiers that deal with short and sparse text & Web segments by making the most of hidden topics discovered from large-scale data collections; the framework is general enough to be applied to different data domains and genres, ranging from Web search results to medical text.
Proceedings Article
TAGME: on-the-fly annotation of short text fragments (by wikipedia entities)
Paolo Ferragina, Ugo Scaiella
TL;DR: The authors designed and implemented TAGME, a system that efficiently and judiciously augments plain text with pertinent hyperlinks to Wikipedia pages. This annotation is extremely informative, so any task currently addressed using the bag-of-words paradigm could benefit from using it to draw upon Wikipedia pages and their interrelations.