Open Access Journal Article

Hybrid statistical rule-based classifier for Arabic text mining

TLDR
A hybrid categorization method for Arabic text mining is proposed that combines the merits of a statistical classifier (NB) and a rule-based classifier (AC) in one framework and aims to overcome their limitations.
Abstract
Text categorization is one of the key technologies for organizing digital datasets. Naive Bayes (NB) is a popular categorization method owing to its efficiency and low time complexity, while the Associative Classification (AC) approach can produce classifiers that rival those learned by traditional categorization techniques. However, NB's independence assumption for text features and its omission of feature frequencies degrade its performance when the selected features are not highly correlated with the text categories. Likewise, the main problem of AC is the difficulty of discovering and using only useful categorization rules, and its performance declines with large rule sets. This paper proposes a hybrid categorization method for Arabic text mining that combines the merits of a statistical classifier (NB) and a rule-based classifier (AC) in one framework and aims to overcome their limitations. In the first stage, useful categorization rules are discovered with the AC approach, ensuring that the associated features are highly correlated with their categories. In the second stage, NB operates at the back end of the discovery process: it takes the discovered rules, concatenates the associated features for each category, and classifies texts based on the statistical information of those features. The proposed method was evaluated on three Arabic text datasets with multiple categories, with and without feature selection methods. The experimental results show that the hybrid method outperforms AC both with and without feature selection, and surpasses NB only in a few cases, with some feature selection methods, when the selected feature subset was small.
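To make the two-stage pipeline concrete, here is a minimal, hypothetical Python sketch: stage 1 stands in for AC rule discovery with a simple per-term confidence filter (the paper's actual rule miner is not reproduced here), and stage 2 trains a multinomial Naive Bayes model on only the features kept by those rules. All function names, thresholds, and the toy data are illustrative assumptions, not taken from the paper.

# Hypothetical sketch of the hybrid AC+NB pipeline described in the abstract.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

def mine_feature_rules(X, y, min_conf=0.5, min_support=1):
    """Stage 1 (simplified): keep terms whose presence predicts one category with high confidence."""
    X_bin = (X > 0).astype(int)
    kept = []
    for j in range(X_bin.shape[1]):
        docs_with_term = np.nonzero(X_bin[:, j])[0]
        if len(docs_with_term) < min_support:
            continue
        _, counts = np.unique(y[docs_with_term], return_counts=True)
        if counts.max() / len(docs_with_term) >= min_conf:
            kept.append(j)          # rule: term_j -> its majority category
    return kept

# Toy documents; real experiments would use the Arabic datasets mentioned above.
docs = ["خبر رياضي جديد", "خبر اقتصادي جديد", "مباراة كرة القدم", "سوق الأسهم المالية"]
labels = np.array([0, 1, 0, 1])     # 0 = sports, 1 = economy

X = CountVectorizer().fit_transform(docs).toarray()
rule_features = mine_feature_rules(X, labels)

# Stage 2: Naive Bayes restricted to the features associated with the discovered rules.
clf = MultinomialNB().fit(X[:, rule_features], labels)
print(clf.predict(X[:, rule_features]))

In the paper the rules come from an associative-classification algorithm rather than this per-term filter, but the data flow (rules discovered first, NB applied to the surviving features second) follows the description in the abstract.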



Citations
Proceedings Article

PCA and KPCA integrated Support Vector Machine for multi-fault classification

TL;DR: Simulation results indicate that, compared with the original PCA-SVM, KPCA-SVM generates a higher classification rate for the underlying process at the cost of larger computation loads.
Journal Article

Multimodal Sentiment Analysis: A Comparison Study

TL;DR: This paper focuses on multimodal sentiment analysis across text, audio, and video, giving a complete picture of the field and the related datasets available, providing brief details for each modality, and exploring recent research trends in multimodal sentiment analysis and its related fields.
Book Chapter

Transferring Informal Text in Arabic as Low Resource Languages: State-of-the-Art and Future Research Directions

TL;DR: Arabic and its dialects are addressed as low-resource languages for transferring informal, non-standard text using normalization and translation approaches, motivated by the lack of sufficiently large parallel datasets.
References
Journal Article

Term Weighting Approaches in Automatic Text Retrieval

TL;DR: This paper summarizes the insights gained in automatic term weighting, and provides baseline single term indexing models with which other more elaborate content analysis procedures can be compared.
Book Chapter

Text Categorization with Support Vector Machines: Learning with Many Relevant Features

TL;DR: This paper explores the use of Support Vector Machines for learning text classifiers from examples and analyzes the particular properties of learning with text data and identifies why SVMs are appropriate for this task.
Journal Article

Combining Pattern Classifiers: Methods and Algorithms

Subhash C Bagui
01 Nov 2005
TL;DR: This chapter discusses the development of the Spatial Point Pattern Analysis Code in S–PLUS, which was developed in 1993 by P. J. Diggle and D. C. Griffith.
Proceedings Article

A comparison of event models for naive bayes text classification

TL;DR: It is found that the multi-variate Bernoulli model performs well with small vocabulary sizes, but that the multinomial model usually performs even better at larger vocabulary sizes, providing on average a 27% reduction in error over the multi-variate Bernoulli model at any vocabulary size.
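As a quick illustration of the two event models contrasted above, the hypothetical snippet below pairs a multi-variate Bernoulli model (presence/absence features) with a multinomial model (term counts) using their scikit-learn counterparts; the toy documents and labels are placeholders.

# Contrast of the two Naive Bayes event models on toy data.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import BernoulliNB, MultinomialNB

docs = ["good plot good acting", "bad plot", "good acting", "bad acting bad plot"]
labels = [1, 0, 1, 0]

X_counts = CountVectorizer().fit_transform(docs)   # term counts -> multinomial event model
X_binary = (X_counts > 0).astype(int)              # presence/absence -> multi-variate Bernoulli

print(MultinomialNB().fit(X_counts, labels).predict(X_counts))
print(BernoulliNB().fit(X_binary, labels).predict(X_binary))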
Journal Article

An extensive empirical study of feature selection metrics for text classification

TL;DR: An empirical comparison of twelve feature selection methods evaluated on a benchmark of 229 text classification problem instances, revealing that a new feature selection metric, called 'Bi-Normal Separation' (BNS), outperformed the others by a substantial margin in most situations and was the top single choice for all goals except precision.
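For context, Bi-Normal Separation scores a term by the gap between the inverse standard-normal CDF of its true-positive rate and of its false-positive rate. A minimal sketch follows; the clipping constant that keeps the inverse CDF finite is an assumption for illustration, not a value taken from the paper.

# Minimal sketch of the Bi-Normal Separation (BNS) feature-scoring metric.
import numpy as np
from scipy.stats import norm

def bns_score(tp, fp, pos, neg, eps=0.0005):
    """BNS = |F^-1(tpr) - F^-1(fpr)|, with F the standard normal CDF."""
    tpr = np.clip(tp / pos, eps, 1 - eps)
    fpr = np.clip(fp / neg, eps, 1 - eps)
    return abs(norm.ppf(tpr) - norm.ppf(fpr))

# Example: a term present in 40 of 100 positive documents and 5 of 900 negative ones.
print(bns_score(tp=40, fp=5, pos=100, neg=900))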