Journal Article

An extended Chi2 algorithm for discretization of real value attributes

TLDR
A new algorithm, named the extended Chi2 algorithm, is proposed. It performs better than the original and modified Chi2 algorithms and addresses two drawbacks of the modified Chi2 algorithm, including its neglect of the variance in the two merged intervals.
Abstract
The variable precision rough sets (VPRS) model is a powerful tool for data mining and has been widely applied to knowledge acquisition. Despite its applications in many domains, the VPRS model cannot be applied directly to real-world classification tasks involving continuous attributes; the data must first be preprocessed by a discretization method. Discretization is an effective technique for handling continuous attributes in data mining, especially for classification problems. The modified Chi2 algorithm is one modification of the Chi2 algorithm: it replaces the inconsistency check of the Chi2 algorithm with the quality of approximation, a measure taken from rough sets theory (RST), and it takes into account the effect of degrees of freedom. However, classification with a controlled degree of uncertainty, or a misclassification error, is outside the realm of RST. The modified Chi2 algorithm also ignores the effect of variance in the two merged intervals. In this study, we propose a new algorithm, named the extended Chi2 algorithm, to overcome these two drawbacks. In experiments run with the See5 software, the proposed algorithm performs better than the original and modified Chi2 algorithms.
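
For orientation, the Chi2 family of discretizers builds on ChiMerge, which repeatedly merges the pair of adjacent intervals with the lowest χ² value. The sketch below computes only that standard Pearson χ² statistic for one pair of adjacent intervals; it is not the extended Chi2 algorithm proposed in the paper. The function name `chi2_adjacent` and the choice to skip cells whose expected count is zero are assumptions made for illustration.

```python
from typing import Sequence


def chi2_adjacent(a: Sequence[int], b: Sequence[int]) -> float:
    """Pearson chi-square statistic for two adjacent intervals.

    a[j] and b[j] are the numbers of examples of class j that fall
    into the first and the second interval, respectively.
    """
    if len(a) != len(b):
        raise ValueError("both intervals need one count per class")

    rows = [list(a), list(b)]
    row_totals = [sum(row) for row in rows]          # examples per interval
    col_totals = [x + y for x, y in zip(a, b)]       # examples per class
    n = sum(row_totals)                              # examples in both intervals

    stat = 0.0
    for i, row in enumerate(rows):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n if n else 0.0
            if expected == 0:
                # Illustrative assumption: ignore cells with no expected mass.
                continue
            stat += (observed - expected) ** 2 / expected
    return stat


if __name__ == "__main__":
    # Identical class distributions: a low value, so a good merge candidate.
    print(chi2_adjacent([5, 5], [5, 5]))
    # Perfectly separated classes: a high value, so the boundary is kept.
    print(chi2_adjacent([10, 0], [0, 10]))
```

A low value means the class distribution is similar on both sides of the cut point, so the two intervals are candidates for merging; a high value suggests the boundary separates the classes and should stay.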

