Journal Article · DOI

Handling Class Overlapping to Detect Noisy Instances in Classification

TLDR
Using four noise filters to identify the noisy instances among the overlapped instances, it is found that class overlap is a principal contributor to class noise in data sets.
Abstract
Automated classification plays a vital role in machine learning and data mining. Each classifier is likely to work well on some data sets and poorly on others, which increases the importance of evaluation. The performance of a learning model depends strongly on the characteristics of the data set. Previous results suggest that overlap between classes and the presence of noise have the strongest impact on the performance of learning algorithms. Class overlap is a critical problem in which data samples appear as valid instances of more than one class, and it may be responsible for the presence of noise in data sets. The objective of this paper is to better understand the data used in machine learning problems and to analyze the instances that are heavily overlapped, using newly proposed overlap measures: Nearest Enemy Ratio, SubConcept Ratio, Likelihood Ratio, and Soft Margin Ratio. For the experiments, we created 438 binary classification data sets from real-world problems and computed 12 data complexity metrics to find highly overlapped data sets. We then applied the proposed measures to identify the overlapped instances and four noise filters to find the noisy instances. The results show that 60–80% of the overlapped instances are identified as noisy by the four noise filters. We conclude that class overlap is a principal contributor to class noise in data sets.
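The first of the proposed measures, the Nearest Enemy Ratio, can be illustrated with a minimal sketch: for each instance, compare the distance to its nearest same-class neighbour with the distance to its nearest enemy (the closest instance of a different class). The paper's exact formulation may differ; the definition below is an illustrative assumption.

```python
import numpy as np

def nearest_enemy_ratio(X, y):
    """For each instance, the distance to its nearest same-class neighbour
    divided by the distance to its nearest enemy (closest instance of a
    different class). Ratios >= 1 suggest the instance lies in a class
    overlap region. Illustrative definition, not the paper's exact one."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    # pairwise Euclidean distances; the diagonal is set to infinity so an
    # instance is never counted as its own neighbour
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)
    ratios = np.empty(len(X))
    for i in range(len(X)):
        friend = d[i, y == y[i]].min()  # nearest same-class neighbour
        enemy = d[i, y != y[i]].min()   # nearest other-class instance
        ratios[i] = friend / enemy
    return ratios
```

Instances whose nearest enemy is closer than their nearest same-class neighbour (ratio above 1) would be flagged as overlapped under this reading of the measure.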


Citations
Journal Article · DOI

A novel progressively undersampling method based on the density peaks sequence for imbalanced data

TL;DR: Zhang et al. propose a novel under-sampling method for imbalanced data that exploits a sequence of density peaks to progressively extract instances from the majority class.
Journal Article · DOI

A New Under-Sampling Method to Face Class Overlap and Imbalance

TL;DR: A two-stage under-sampling technique that combines the DBSCAN clustering algorithm, to remove noisy samples and clean the decision boundary, with a minimum spanning tree algorithm, to address the class imbalance, thus handling class overlap and imbalance simultaneously with the aim of improving classifier performance.
Journal Article · DOI

A comparative study on online machine learning techniques for network traffic streams analysis

TL;DR: The authors investigate and compare the online learning (OL) techniques that facilitate data stream analytics in the networking domain, highlight the advantages of online learning in this regard, and discuss the challenges associated with OL-based network traffic stream analysis, e.g., concept drift and class imbalance.
Journal Article · DOI

Empirical Comparisons for Combining Balancing and Feature Selection Strategies for Characterizing Football Players Using FIFA Video Game System

TL;DR: This article presents a large-scale comparison of approaches for characterizing football players into nine positions using FIFA video game data, whereas most previous studies in this field have characterized players into only three position classes.
Journal Article · DOI

An empirical study of data intrinsic characteristics that make learning from imbalanced data difficult

TL;DR: A contemporary empirical study of the behaviour and performance of five well-known classifiers on a large number of imbalanced datasets exhibiting numerous combinations of intrinsic data characteristics, which identifies and ranks the difficulty factors in learning from imbalanced data depending on the type of classification algorithm used.
References
Journal Article · DOI

Support-Vector Networks

TL;DR: High generalization ability of support-vector networks utilizing polynomial input transformations is demonstrated, and the performance of the support-vector network is compared to various classical learning algorithms that all took part in a benchmark study of Optical Character Recognition.
Book

C4.5: Programs for Machine Learning

TL;DR: A complete guide to the C4.5 system as implemented in C for the UNIX environment, which starts from simple core learning methods and shows how they can be elaborated and extended to deal with typical problems such as missing data and overfitting.
Journal Article · DOI

Nearest neighbor pattern classification

TL;DR: The nearest neighbor decision rule assigns to an unclassified sample point the classification of the nearest of a set of previously classified points, so it may be said that half the classification information in an infinite sample set is contained in the nearest neighbor.
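The decision rule summarized above is simple enough to state in a few lines of code; below is a minimal 1-NN sketch over Euclidean distance (the function name is illustrative, not from the paper).

```python
import numpy as np

def nn_classify(X_train, y_train, x):
    """Nearest neighbour rule: assign an unclassified point the label of
    the closest previously classified point (Euclidean distance)."""
    X_train = np.asarray(X_train, dtype=float)
    dists = np.linalg.norm(X_train - np.asarray(x, dtype=float), axis=1)
    return y_train[int(np.argmin(dists))]
```

Despite its simplicity, Cover and Hart showed this rule's asymptotic error is at most twice the Bayes error, which is the sense in which "half the classification information" lies in the nearest neighbor.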
Journal Article · DOI

Statistical pattern recognition: a review

TL;DR: The objective of this review paper is to summarize and compare some of the well-known methods used in various stages of a pattern recognition system and identify research topics and applications which are at the forefront of this exciting and challenging field.
Journal Article · DOI

Learning from Imbalanced Data

TL;DR: A critical review of the nature of the problem, the state-of-the-art technologies, and the current assessment metrics used to evaluate learning performance under the imbalanced learning scenario is provided.