Open Access Journal Article

Hybrid statistical rule-based classifier for Arabic text mining

TLDR
A hybrid categorization method for Arabic text mining is proposed that combines the merits of a statistical classifier (NB) and a rule-based classifier (AC) in one framework and aims to overcome their limitations.
Abstract
Text categorization is one of the key technologies for organizing digital datasets. Naive Bayes (NB) is a popular categorization method owing to its efficiency and low time complexity, while the Associative Classification (AC) approach can produce classifiers that rival those learned by traditional categorization techniques. However, NB's independence assumption for text features and its omission of feature frequencies degrade its performance when the selected features are not highly correlated with the text categories. Likewise, the main problem of AC is the difficulty of discovering and using only useful categorization rules, and its performance declines with large rule sets. This paper proposes a hybrid categorization method for Arabic text mining that combines the merits of a statistical classifier (NB) and a rule-based classifier (AC) in one framework and aims to overcome their limitations. In the first stage, useful categorization rules are discovered with the AC approach, ensuring that the associated features are highly correlated with their categories. In the second stage, NB operates at the back end of the discovery process: it takes the discovered rules, concatenates the associated features for each category, and classifies texts based on the statistical information of those features. The proposed method was evaluated on three Arabic text datasets with multiple categories, with and without feature selection methods. The experimental results show that the hybrid method outperforms AC both with and without feature selection, and surpasses NB only in a few cases, with some feature selection methods, when the selected feature subset was small.
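To make the two-stage pipeline concrete, here is a minimal, hypothetical Python sketch: stage 1 stands in for AC rule discovery with a simple per-term confidence filter (the paper's actual rule miner is not reproduced here), and stage 2 trains a multinomial Naive Bayes model on only the features kept by those rules. All function names, thresholds, and the toy data are illustrative assumptions, not taken from the paper.

# Hypothetical sketch of the hybrid AC+NB pipeline described in the abstract.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

def mine_feature_rules(X, y, min_conf=0.5, min_support=1):
    """Stage 1 (simplified): keep terms whose presence predicts one category with high confidence."""
    X_bin = (X > 0).astype(int)
    kept = []
    for j in range(X_bin.shape[1]):
        docs_with_term = np.nonzero(X_bin[:, j])[0]
        if len(docs_with_term) < min_support:
            continue
        _, counts = np.unique(y[docs_with_term], return_counts=True)
        if counts.max() / len(docs_with_term) >= min_conf:
            kept.append(j)          # rule: term_j -> its majority category
    return kept

# Toy documents; real experiments would use the Arabic datasets mentioned above.
docs = ["خبر رياضي جديد", "خبر اقتصادي جديد", "مباراة كرة القدم", "سوق الأسهم المالية"]
labels = np.array([0, 1, 0, 1])     # 0 = sports, 1 = economy

X = CountVectorizer().fit_transform(docs).toarray()
rule_features = mine_feature_rules(X, labels)

# Stage 2: Naive Bayes restricted to the features associated with the discovered rules.
clf = MultinomialNB().fit(X[:, rule_features], labels)
print(clf.predict(X[:, rule_features]))

In the paper the rules come from an associative-classification algorithm rather than this per-term filter, but the data flow (rules discovered first, NB applied to the surviving features second) follows the description in the abstract.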



Citations
Proceedings Article

PCA and KPCA integrated Support Vector Machine for multi-fault classification

TL;DR: Simulation results indicate that, compared with the original PCA-SVM, KPCA-SVM generates a higher classification rate for the underlying process at the cost of larger computation loads.
Journal Article

Multimodal Sentiment Analysis: A Comparison Study

TL;DR: This paper focuses on multimodal sentiment analysis across text, audio, and video, giving a complete picture of the field and the related datasets available, providing brief details for each modality, and exploring recent research trends in multimodal sentiment analysis and its related fields.
Book Chapter

Transferring Informal Text in Arabic as Low Resource Languages: State-of-the-Art and Future Research Directions

TL;DR: Arabic and its dialects are addressed as low-resource languages for transferring informal, non-standard text using normalization and translation approaches, motivated by the lack of sufficiently large parallel datasets.
References
Journal Article

Term Weighting Approaches in Automatic Text Retrieval

TL;DR: This paper summarizes the insights gained in automatic term weighting, and provides baseline single term indexing models with which other more elaborate content analysis procedures can be compared.
Book Chapter

Text Categorization with Support Vector Machines: Learning with Many Relevant Features

TL;DR: This paper explores the use of Support Vector Machines for learning text classifiers from examples and analyzes the particular properties of learning with text data and identifies why SVMs are appropriate for this task.
Journal Article

Combining Pattern Classifiers: Methods and Algorithms

Subhash C Bagui
01 Nov 2005
TL;DR: This chapter discusses the development of the Spatial Point Pattern Analysis Code in S–PLUS, which was developed in 1993 by P. J. Diggle and D. C. Griffith.
Proceedings Article

A comparison of event models for naive bayes text classification

TL;DR: It is found that the multi-variate Bernoulli model performs well with small vocabulary sizes, but that the multinomial model usually performs even better at larger vocabulary sizes, providing on average a 27% reduction in error over the multi-variate Bernoulli model at any vocabulary size.
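As a quick illustration of the two event models contrasted above, the hypothetical snippet below pairs a multi-variate Bernoulli model (presence/absence features) with a multinomial model (term counts) using their scikit-learn counterparts; the toy documents and labels are placeholders.

# Contrast of the two Naive Bayes event models on toy data.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import BernoulliNB, MultinomialNB

docs = ["good plot good acting", "bad plot", "good acting", "bad acting bad plot"]
labels = [1, 0, 1, 0]

X_counts = CountVectorizer().fit_transform(docs)   # term counts -> multinomial event model
X_binary = (X_counts > 0).astype(int)              # presence/absence -> multi-variate Bernoulli

print(MultinomialNB().fit(X_counts, labels).predict(X_counts))
print(BernoulliNB().fit(X_binary, labels).predict(X_binary))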
Journal Article

An extensive empirical study of feature selection metrics for text classification

TL;DR: An empirical comparison of twelve feature selection methods evaluated on a benchmark of 229 text classification problem instances, revealing that a new feature selection metric, called 'Bi-Normal Separation' (BNS), outperformed the others by a substantial margin in most situations and was the top single choice for all goals except precision.
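For context, Bi-Normal Separation scores a term by the gap between the inverse standard-normal CDF of its true-positive rate and of its false-positive rate. A minimal sketch follows; the clipping constant that keeps the inverse CDF finite is an assumption for illustration, not a value taken from the paper.

# Minimal sketch of the Bi-Normal Separation (BNS) feature-scoring metric.
import numpy as np
from scipy.stats import norm

def bns_score(tp, fp, pos, neg, eps=0.0005):
    """BNS = |F^-1(tpr) - F^-1(fpr)|, with F the standard normal CDF."""
    tpr = np.clip(tp / pos, eps, 1 - eps)
    fpr = np.clip(fp / neg, eps, 1 - eps)
    return abs(norm.ppf(tpr) - norm.ppf(fpr))

# Example: a term present in 40 of 100 positive documents and 5 of 900 negative ones.
print(bns_score(tp=40, fp=5, pos=100, neg=900))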