Journal ArticleDOI

Feature Selection with Missing Labels Using Multilabel Fuzzy Neighborhood Rough Sets and Maximum Relevance Minimum Redundancy

TLDR
A feature selection algorithm is designed for multilabel data with missing labels; it is effective not only at recovering missing labels but also at selecting significant features with better classification performance.
Abstract
Recently, multilabel classification has generated considerable research interest. However, the high dimensionality of multilabel data incurs high costs; moreover, in many real applications, some labels of the training samples are randomly missing. Multilabel classification therefore involves great complexity and ambiguity, under which some feature selection methods exhibit poor robustness and yield low prediction accuracy. To solve these issues, this paper presents a novel feature selection method based on multilabel fuzzy neighborhood rough sets (MFNRS) and maximum relevance minimum redundancy (MRMR) that can be used on multilabel data with missing labels. First, to handle multilabel data with missing labels, a relation coefficient of samples, a label complement matrix, and a label-specific feature matrix are constructed and incorporated into a linear regression model to recover the missing labels. Second, the margin-based fuzzy neighborhood radius, fuzzy neighborhood similarity relationship, and fuzzy neighborhood information granule are developed, and the MFNRS model is built by combining multilabel neighborhood rough sets with fuzzy neighborhood rough sets. From both the algebra and information views, fuzzy neighborhood entropy-based uncertainty measures are proposed for the MFNRS. The fuzzy neighborhood mutual information-based MRMR model is then enhanced with label correlation to evaluate candidate features. Finally, a feature selection algorithm is designed to improve classification performance on multilabel data with missing labels. Experiments on twenty datasets verify that our method is effective not only at recovering missing labels but also at selecting significant features with better classification performance.
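To make the selection scheme concrete, the sketch below shows a plain mutual information-based greedy mRMR loop of the kind the paper builds on; the paper itself replaces Shannon mutual information with fuzzy neighborhood mutual information and adds label correlation, so this is an illustrative single-label simplification, not the authors' implementation. All function and variable names are hypothetical.

```python
# Illustrative sketch: greedy max-relevance min-redundancy (mRMR) selection
# with Shannon mutual information. Features are assumed to be discretized;
# y is a single (binary-relevance) label column, not the full label matrix.
import numpy as np
from sklearn.feature_selection import mutual_info_classif
from sklearn.metrics import mutual_info_score

def mrmr_select(X, y, k):
    """Greedily pick k features maximizing relevance I(f; y)
    minus the mean redundancy I(f; f_i) over features already chosen."""
    relevance = mutual_info_classif(X, y)  # I(f_j; y) for every feature
    selected, remaining = [], list(range(X.shape[1]))
    while len(selected) < k and remaining:
        scores = []
        for j in remaining:
            redundancy = (np.mean([mutual_info_score(X[:, j], X[:, i])
                                   for i in selected])
                          if selected else 0.0)
            scores.append(relevance[j] - redundancy)
        best = remaining[int(np.argmax(scores))]
        selected.append(best)
        remaining.remove(best)
    return selected
```

The greedy loop structure carries over unchanged when the Shannon terms are swapped for the paper's fuzzy neighborhood mutual information.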


Citations
Journal ArticleDOI

Feature selection using Fisher score and multilabel neighborhood rough sets for multilabel classification

TL;DR: A filter-wrapper preprocessing algorithm for feature selection using the improved Fisher score model is proposed to decrease the spatiotemporal complexity of multilabel data, and a heuristic feature selection algorithm is designed to improve classification performance on multilabel datasets.
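For reference, the classical single-label Fisher score that this line of work improves upon ranks feature $f_i$ by between-class scatter relative to within-class scatter; a standard form (the multilabel variant in the cited paper differs in detail) is

$$F(f_i) \;=\; \frac{\sum_{c=1}^{C} n_c \,(\mu_{i,c} - \mu_i)^2}{\sum_{c=1}^{C} n_c \,\sigma_{i,c}^2},$$

where $n_c$ is the number of samples in class $c$, $\mu_{i,c}$ and $\sigma_{i,c}^2$ are the mean and variance of feature $f_i$ within class $c$, and $\mu_i$ is its overall mean.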
Journal ArticleDOI

Feature reduction for imbalanced data classification using similarity-based feature clustering with adaptive weighted K-nearest neighbors

TL;DR: Zhang et al. presented a novel feature reduction method for imbalanced data classification using similarity-based feature clustering with adaptive weighted k-nearest neighbors (AWKNN).
Journal ArticleDOI

Feature selection techniques in the context of big data: taxonomy and analysis

TL;DR: A comprehensive review of the latest FS approaches in the context of big data along with a structured taxonomy, which categorizes the existing methods based on their nature, search strategy, evaluation process, and feature structure and highlights the research issues and open challenges related to FS.
Journal ArticleDOI

Practical multi-party private collaborative k-means clustering

TL;DR: Wang et al. proposed a protocol for collaborative k-means clustering that protects the privacy of each data record, enabling multiple parties to jointly update cluster centers without leaking private data.
References
Journal Article

Statistical Comparisons of Classifiers over Multiple Data Sets

TL;DR: A set of simple, yet safe and robust non-parametric tests for statistical comparisons of classifiers is recommended: the Wilcoxon signed ranks test for comparison of two classifiers and the Friedman test with the corresponding post-hoc tests for comparisons of more classifiers over multiple data sets.
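A minimal sketch of the recommended protocol using SciPy's implementations of both tests; the accuracy values below are made-up placeholders, and real comparisons would span many more datasets.

```python
# Wilcoxon signed-ranks test for two classifiers, Friedman test for three
# or more, each compared over the same collection of datasets.
import numpy as np
from scipy.stats import wilcoxon, friedmanchisquare

# rows: datasets, columns: classifiers (hypothetical accuracies)
acc = np.array([
    [0.81, 0.84, 0.79],
    [0.72, 0.76, 0.74],
    [0.90, 0.91, 0.88],
    [0.65, 0.70, 0.66],
    [0.77, 0.79, 0.75],
])

# two classifiers: paired, non-parametric comparison per dataset
stat, p_two = wilcoxon(acc[:, 0], acc[:, 1])
print(f"Wilcoxon signed-ranks p = {p_two:.3f}")

# three or more classifiers: Friedman test over all columns
stat, p_many = friedmanchisquare(acc[:, 0], acc[:, 1], acc[:, 2])
print(f"Friedman p = {p_many:.3f}")
# If the Friedman test rejects, follow with post-hoc tests (e.g., Nemenyi).
```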

Feature selection based on mutual information: criteria of max-dependency, max-relevance, and min-redundancy

TL;DR: This work derives an equivalent form, called minimal-redundancy-maximal-relevance criterion (mRMR), for first-order incremental feature selection, and presents a two-stage feature selection algorithm by combining mRMR and other more sophisticated feature selectors (e.g., wrappers).
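Concretely, the first-order incremental criterion from this paper selects, given the already-chosen feature set $S_{m-1}$, the $m$-th feature by

$$\max_{f_j \in F \setminus S_{m-1}} \left[ I(f_j; c) \;-\; \frac{1}{m-1} \sum_{f_i \in S_{m-1}} I(f_j; f_i) \right],$$

where $I(\cdot\,;\cdot)$ denotes mutual information and $c$ is the target class: relevance to the class minus average redundancy with respect to the features already selected.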
Journal ArticleDOI

ML-KNN: A lazy learning approach to multi-label learning

TL;DR: Experiments on three different real-world multi-label learning problems, i.e. Yeast gene functional analysis, natural scene classification and automatic web page categorization, show that ML-KNN achieves superior performance to some well-established multi-label learning algorithms.
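The core of ML-KNN is a per-label maximum a posteriori (MAP) decision driven by how many of a sample's k nearest neighbors carry that label. The sketch below is a simplified, hypothetical rendering of this idea, not the authors' code; in particular it does not exclude a training point from its own neighborhood, which the original method does.

```python
# Simplified ML-KNN sketch: Laplace-smoothed priors and count likelihoods
# per label, then a MAP decision from neighbor label counts.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def mlknn_fit_predict(X_train, Y_train, X_test, k=10, s=1.0):
    n, q = Y_train.shape
    nn = NearestNeighbors(n_neighbors=k).fit(X_train)

    # priors P(H_l) with Laplace smoothing s
    prior = (s + Y_train.sum(axis=0)) / (2 * s + n)

    # likelihoods P(count = j | label present/absent), estimated on training data
    idx = nn.kneighbors(X_train, return_distance=False)
    counts = np.array([Y_train[idx[i]].sum(axis=0) for i in range(n)])  # (n, q)
    kh1 = np.zeros((q, k + 1))
    kh0 = np.zeros((q, k + 1))
    for l in range(q):
        for j in range(k + 1):
            kh1[l, j] = ((counts[:, l] == j) & (Y_train[:, l] == 1)).sum()
            kh0[l, j] = ((counts[:, l] == j) & (Y_train[:, l] == 0)).sum()
    like1 = (s + kh1) / (s * (k + 1) + kh1.sum(axis=1, keepdims=True))
    like0 = (s + kh0) / (s * (k + 1) + kh0.sum(axis=1, keepdims=True))

    # MAP prediction: compare P(H_l) P(count | H_l) against the complement
    idx_t = nn.kneighbors(X_test, return_distance=False)
    preds = []
    for i in range(len(X_test)):
        c = Y_train[idx_t[i]].sum(axis=0).astype(int)  # neighbor label counts
        p1 = prior * like1[np.arange(q), c]
        p0 = (1 - prior) * like0[np.arange(q), c]
        preds.append((p1 > p0).astype(int))
    return np.array(preds)
```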
Journal ArticleDOI

Feature selection for multi-label naive Bayes classification

TL;DR: This paper proposes a method called Mlnb which adapts the traditional naive Bayes classifiers to deal with multi-label instances and achieves comparable performance to other well-established multi-label learning algorithms.
Journal ArticleDOI

Multilabel dimensionality reduction via dependence maximization

TL;DR: Zhang et al. proposed a multilabel dimensionality reduction method, MDDM, with two kinds of projection strategies that project the original data into a lower-dimensional feature space, maximizing the dependence between the original feature description and the associated class labels.
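The dependence measure at the heart of MDDM is the empirical Hilbert-Schmidt Independence Criterion (HSIC) between the projected features and the labels; one common empirical form is

$$\mathrm{HSIC} \;=\; \frac{1}{(n-1)^2}\,\operatorname{tr}(K H L H), \qquad H = I_n - \tfrac{1}{n}\mathbf{1}\mathbf{1}^{\top},$$

where $K$ and $L$ are kernel matrices computed over the projected instances and the label vectors, respectively, and MDDM seeks the projection that maximizes this quantity.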