Hierarchical text classification and evaluation
Aixin Sun, Ee-Peng Lim
- pp 521-528
TL;DR: This article proposes a top-down, level-based hierarchical classification method that can assign documents to both leaf and internal categories, together with performance measures that account for the degree of misclassification.
Abstract:
Hierarchical classification refers to the assignment of one or more suitable categories from a hierarchical category space to a document. While previous work in hierarchical classification focused on virtual category trees where documents are assigned only to the leaf categories, we propose a top-down level-based classification method that can classify documents to both leaf and internal categories. As the standard performance measures assume independence between categories, they have not considered the documents incorrectly classified into categories that are similar to or not far from correct ones in the category tree. We therefore propose category-similarity measures and distance-based measures to consider the degree of misclassification in measuring the classification performance. An experiment has been carried out to measure the performance of our proposed hierarchical classification method. The results showed that our method performs well for a Reuters text collection when enough training documents are given and the new measures have indeed considered the contributions of misclassified documents.
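To make the idea of a distance-based measure concrete, the following is a minimal Python sketch, not the formulas used in the paper. It assumes a toy category tree (the `PARENT` mapping and its category names are hypothetical), measures distance as the number of edges between two categories, and gives a misclassified document partial credit that decays linearly to zero at an assumed threshold `max_distance`.

```python
from typing import Dict, List, Optional

# Hypothetical toy category tree: child -> parent (the root has parent None).
PARENT: Dict[str, Optional[str]] = {
    "root": None,
    "grain": "root",
    "wheat": "grain",
    "corn": "grain",
    "energy": "root",
    "crude": "energy",
}


def path_to_root(parent: Dict[str, Optional[str]], node: str) -> List[str]:
    """Return the list of categories from `node` up to the root, inclusive."""
    path: List[str] = []
    current: Optional[str] = node
    while current is not None:
        path.append(current)
        current = parent[current]
    return path


def tree_distance(parent: Dict[str, Optional[str]], a: str, b: str) -> int:
    """Number of edges on the path between categories `a` and `b`."""
    steps_from_a = {n: i for i, n in enumerate(path_to_root(parent, a))}
    for steps_from_b, n in enumerate(path_to_root(parent, b)):
        if n in steps_from_a:  # first common ancestor reached
            return steps_from_a[n] + steps_from_b
    raise ValueError("categories are not in the same tree")


def partial_credit(
    parent: Dict[str, Optional[str]],
    predicted: str,
    correct: str,
    max_distance: int = 2,  # assumed threshold, chosen only for illustration
) -> float:
    """1.0 for an exact match, decaying linearly to 0.0 at `max_distance`."""
    distance = tree_distance(parent, predicted, correct)
    return max(0.0, 1.0 - distance / max_distance)
```

With this toy tree, a document whose correct category is `wheat` earns a credit of 1.0 if assigned to `wheat`, 0.5 if assigned to the parent `grain`, and 0.0 if assigned to the unrelated `crude`. A standard measure that treats categories as independent would score the last two cases identically.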
Citations
Book
Ontology Matching
Jérôme Euzenat, Pavel Shvaiko
TL;DR: The second edition of Ontology Matching has been thoroughly revised and updated to reflect the most recent advances in this quickly developing area, which resulted in more than 150 pages of new content.
Book
The Text Mining Handbook: Advanced Approaches in Analyzing Unstructured Data
Ronen Feldman, James Sanger
TL;DR: Providing an in-depth examination of core text mining and link detection algorithms and operations, this text examines advanced pre-processing techniques, knowledge representation considerations, and visualization approaches.
Book Chapter
Mining Multi-label Data
TL;DR: A large body of research in supervised learning deals with the analysis of single-label data, where training examples are associated with a single label λ from a set of disjoint labels L. However, training examples in several application domains are often associated with a set of labels Y ⊆ L.
Journal Article
A survey of hierarchical classification across different application domains
Carlos N. Silla, Alex A. Freitas
TL;DR: This survey defines the task of hierarchical classification, discusses why some related tasks should not be considered hierarchical classification, presents a new perspective on some existing hierarchical classification approaches, and proposes a new unifying framework to classify them.
Journal Article
Text Classification Algorithms: A Survey
Kamran Kowsari, Kiana Jafari Meimandi, Mojtaba Heidarysafa, Sanjana Mendu, Laura E. Barnes, Donald E. Brown
TL;DR: This survey gives an overview of text classification algorithms, covering text feature extraction, dimensionality reduction methods, existing algorithms and techniques, and evaluation methods.
References
Book Chapter
Text Categorization with Support Vector Machines: Learning with Many Relevant Features
TL;DR: This paper explores the use of Support Vector Machines for learning text classifiers from examples and analyzes the particular properties of learning with text data and identifies why SVMs are appropriate for this task.
Journal Article
An Evaluation of Statistical Approaches to Text Categorization
TL;DR: Analysis and empirical evidence suggest that the evaluation results on some versions of Reuters were significantly affected by the inclusion of a large portion of unlabelled documents, making those results difficult to interpret and leading to considerable confusion in the literature.
Proceedings Article
Inductive learning algorithms and representations for text categorization
TL;DR: This paper compares the effectiveness of five automatic learning algorithms for text categorization in terms of learning speed, real-time classification speed, and classification accuracy.
Proceedings Article
Hierarchically Classifying Documents Using Very Few Words
Daphne Koller, Mehran Sahami
TL;DR: This work proposes an approach that utilizes the hierarchical topic structure to decompose the classification task into a set of simpler problems, one at each node in the classification tree, which can be solved accurately by focusing only on a very small set of features, those relevant to the task at hand.
Proceedings Article
Hierarchical classification of Web content
Susan T. Dumais, Hao Chen
TL;DR: This paper explores the use of hierarchical structure for classifying a large, heterogeneous collection of web content using support vector machine (SVM) classifiers, which have been shown to be efficient and effective for classification, but not previously explored in the context of hierarchical classification.