Open Access, Proceedings Article

Hierarchical text classification and evaluation

TLDR
This article proposes a top-down, level-based hierarchical classification method that can assign documents to both leaf and internal categories, together with category-similarity and distance-based measures that account for the degree of misclassification when evaluating classification performance.
Abstract
Hierarchical classification refers to the assignment of one or more suitable categories from a hierarchical category space to a document. While previous work in hierarchical classification focused on virtual category trees, where documents are assigned only to the leaf categories, we propose a top-down level-based classification method that can classify documents into both leaf and internal categories. Because the standard performance measures assume independence between categories, they do not consider documents that are incorrectly classified into categories similar to, or not far from, the correct ones in the category tree. We therefore propose category-similarity measures and distance-based measures that take the degree of misclassification into account when measuring classification performance. We carried out an experiment to measure the performance of the proposed hierarchical classification method. The results show that the method performs well on a Reuters text collection when enough training documents are given, and that the new measures do take the contributions of misclassified documents into account.
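
To make the idea of a distance-based measure concrete, the following is a minimal sketch and not the paper's actual definition. It assumes that the distance between two categories is the number of edges between them in the category tree, and that a misclassified document earns a credit that falls linearly from 1 (exact match) to 0 at a chosen cutoff. The toy tree, the names `PARENT`, `tree_distance`, and `partial_credit`, and the cutoff `max_dist` are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Optional

# Hypothetical category tree given as child -> parent (the root has no parent).
PARENT: Dict[str, Optional[str]] = {
    "root": None,
    "grain": "root",
    "wheat": "grain",
    "corn": "grain",
    "energy": "root",
    "crude": "energy",
}


def path_to_root(category: str) -> List[str]:
    """Return the categories from `category` up to the root, inclusive."""
    path = []
    node: Optional[str] = category
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def tree_distance(a: str, b: str) -> int:
    """Number of edges on the path between categories a and b."""
    steps_from_a = {cat: i for i, cat in enumerate(path_to_root(a))}
    for steps_from_b, cat in enumerate(path_to_root(b)):
        if cat in steps_from_a:  # first common ancestor on b's path
            return steps_from_a[cat] + steps_from_b
    raise ValueError("categories do not share a root")


def partial_credit(predicted: str, correct: str, max_dist: int = 4) -> float:
    """Illustrative credit (an assumption, not the paper's formula):
    1.0 for an exact match, decreasing linearly to 0.0 at max_dist edges."""
    return max(0.0, 1.0 - tree_distance(predicted, correct) / max_dist)


if __name__ == "__main__":
    for predicted in ["wheat", "grain", "corn", "crude"]:
        print(predicted, partial_credit(predicted, "wheat"))
```

Under these assumptions, a document about wheat that is placed in the parent category "grain" receives more credit than one placed in the sibling "corn", and one placed in the unrelated "crude" receives none, whereas a standard measure that treats categories as independent would count all three as equally wrong.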
