Journal ArticleDOI

Feature Selection with Missing Labels Using Multilabel Fuzzy Neighborhood Rough Sets and Maximum Relevance Minimum Redundancy

TLDR
A feature selection algorithm is designed for multilabel data with missing labels; it is effective not only at recovering missing labels but also at selecting significant features with better classification performance.
Abstract
Recently, multilabel classification has generated considerable research interest. However, the high dimensionality of multilabel data incurs high costs; moreover, in many real applications, some labels of the training samples are randomly missing. Multilabel classification therefore involves great complexity and ambiguity, under which some feature selection methods exhibit poor robustness and yield low prediction accuracy. To solve these issues, this paper presents a novel feature selection method based on multilabel fuzzy neighborhood rough sets (MFNRS) and maximum relevance minimum redundancy (MRMR) that can be used on multilabel data with missing labels. First, to handle multilabel data with missing labels, a relation coefficient of samples, a label complement matrix, and a label-specific feature matrix are constructed and incorporated into a linear regression model to recover the missing labels. Second, the margin-based fuzzy neighborhood radius, fuzzy neighborhood similarity relationship, and fuzzy neighborhood information granule are developed, and the MFNRS model is built by combining multilabel neighborhood rough sets with fuzzy neighborhood rough sets. From both the algebra and information views, fuzzy neighborhood entropy-based uncertainty measures are proposed for the MFNRS. The fuzzy neighborhood mutual information-based MRMR model is then enhanced with label correlation to evaluate candidate features. Finally, a feature selection algorithm is designed to improve classification performance on multilabel data with missing labels. Experiments on twenty datasets verify that our method is effective not only at recovering missing labels but also at selecting significant features with better classification performance.
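To make the selection scheme concrete, the sketch below shows a plain mutual information-based greedy mRMR loop of the kind the paper builds on; the paper itself replaces Shannon mutual information with fuzzy neighborhood mutual information and adds label correlation, so this is an illustrative single-label simplification, not the authors' implementation. All function and variable names are hypothetical.

```python
# Illustrative sketch: greedy max-relevance min-redundancy (mRMR) selection
# with Shannon mutual information. Features are assumed to be discretized;
# y is a single (binary-relevance) label column, not the full label matrix.
import numpy as np
from sklearn.feature_selection import mutual_info_classif
from sklearn.metrics import mutual_info_score

def mrmr_select(X, y, k):
    """Greedily pick k features maximizing relevance I(f; y)
    minus the mean redundancy I(f; f_i) over features already chosen."""
    relevance = mutual_info_classif(X, y)  # I(f_j; y) for every feature
    selected, remaining = [], list(range(X.shape[1]))
    while len(selected) < k and remaining:
        scores = []
        for j in remaining:
            redundancy = (np.mean([mutual_info_score(X[:, j], X[:, i])
                                   for i in selected])
                          if selected else 0.0)
            scores.append(relevance[j] - redundancy)
        best = remaining[int(np.argmax(scores))]
        selected.append(best)
        remaining.remove(best)
    return selected
```

The greedy loop structure carries over unchanged when the Shannon terms are swapped for the paper's fuzzy neighborhood mutual information.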


Citations
Journal ArticleDOI

Feature selection using Fisher score and multilabel neighborhood rough sets for multilabel classification

TL;DR: A filter-wrapper preprocessing algorithm for feature selection using the improved Fisher score model is proposed to decrease the spatiotemporal complexity of multilabel data, and a heuristic feature selection algorithm is designed to improve classification performance on multilabel datasets.
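For reference, the classical single-label Fisher score that this line of work improves upon ranks feature $f_i$ by between-class scatter relative to within-class scatter; a standard form (the multilabel variant in the cited paper differs in detail) is

$$F(f_i) \;=\; \frac{\sum_{c=1}^{C} n_c \,(\mu_{i,c} - \mu_i)^2}{\sum_{c=1}^{C} n_c \,\sigma_{i,c}^2},$$

where $n_c$ is the number of samples in class $c$, $\mu_{i,c}$ and $\sigma_{i,c}^2$ are the mean and variance of feature $f_i$ within class $c$, and $\mu_i$ is its overall mean.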
Journal ArticleDOI

Feature reduction for imbalanced data classification using similarity-based feature clustering with adaptive weighted K-nearest neighbors

TL;DR: Zhang et al. presented a novel feature reduction method for imbalanced data classification using similarity-based feature clustering with adaptive weighted k-nearest neighbors (AWKNN).
Journal ArticleDOI

Feature selection techniques in the context of big data: taxonomy and analysis

TL;DR: A comprehensive review of the latest FS approaches in the context of big data along with a structured taxonomy, which categorizes the existing methods based on their nature, search strategy, evaluation process, and feature structure and highlights the research issues and open challenges related to FS.
Journal ArticleDOI

Practical multi-party private collaborative k-means clustering

TL;DR: Wang et al. proposed a protocol for collaborative k-means clustering that protects the privacy of each data record, enabling multiple parties to jointly update cluster centers without leaking private data.
References
Journal Article

Statistical Comparisons of Classifiers over Multiple Data Sets

TL;DR: A set of simple, yet safe and robust non-parametric tests for statistical comparisons of classifiers is recommended: the Wilcoxon signed ranks test for comparison of two classifiers and the Friedman test with the corresponding post-hoc tests for comparisons of more classifiers over multiple data sets.
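A minimal sketch of the recommended protocol using SciPy's implementations of both tests; the accuracy values below are made-up placeholders, and real comparisons would span many more datasets.

```python
# Wilcoxon signed-ranks test for two classifiers, Friedman test for three
# or more, each compared over the same collection of datasets.
import numpy as np
from scipy.stats import wilcoxon, friedmanchisquare

# rows: datasets, columns: classifiers (hypothetical accuracies)
acc = np.array([
    [0.81, 0.84, 0.79],
    [0.72, 0.76, 0.74],
    [0.90, 0.91, 0.88],
    [0.65, 0.70, 0.66],
    [0.77, 0.79, 0.75],
])

# two classifiers: paired, non-parametric comparison per dataset
stat, p_two = wilcoxon(acc[:, 0], acc[:, 1])
print(f"Wilcoxon signed-ranks p = {p_two:.3f}")

# three or more classifiers: Friedman test over all columns
stat, p_many = friedmanchisquare(acc[:, 0], acc[:, 1], acc[:, 2])
print(f"Friedman p = {p_many:.3f}")
# If the Friedman test rejects, follow with post-hoc tests (e.g., Nemenyi).
```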

Feature selection based on mutual information: criteria of max-dependency, max-relevance, and min-redundancy

TL;DR: This work derives an equivalent form, called minimal-redundancy-maximal-relevance criterion (mRMR), for first-order incremental feature selection, and presents a two-stage feature selection algorithm by combining mRMR and other more sophisticated feature selectors (e.g., wrappers).
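Concretely, the first-order incremental criterion from this paper selects, given the already-chosen feature set $S_{m-1}$, the $m$-th feature by

$$\max_{f_j \in F \setminus S_{m-1}} \left[ I(f_j; c) \;-\; \frac{1}{m-1} \sum_{f_i \in S_{m-1}} I(f_j; f_i) \right],$$

where $I(\cdot\,;\cdot)$ denotes mutual information and $c$ is the target class: relevance to the class minus average redundancy with respect to the features already selected.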
Journal ArticleDOI

ML-KNN: A lazy learning approach to multi-label learning

TL;DR: Experiments on three different real-world multi-label learning problems, i.e. Yeast gene functional analysis, natural scene classification and automatic web page categorization, show that ML-KNN achieves superior performance to some well-established multi-label learning algorithms.
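The core of ML-KNN is a per-label maximum a posteriori (MAP) decision driven by how many of a sample's k nearest neighbors carry that label. The sketch below is a simplified, hypothetical rendering of this idea, not the authors' code; in particular it does not exclude a training point from its own neighborhood, which the original method does.

```python
# Simplified ML-KNN sketch: Laplace-smoothed priors and count likelihoods
# per label, then a MAP decision from neighbor label counts.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def mlknn_fit_predict(X_train, Y_train, X_test, k=10, s=1.0):
    n, q = Y_train.shape
    nn = NearestNeighbors(n_neighbors=k).fit(X_train)

    # priors P(H_l) with Laplace smoothing s
    prior = (s + Y_train.sum(axis=0)) / (2 * s + n)

    # likelihoods P(count = j | label present/absent), estimated on training data
    idx = nn.kneighbors(X_train, return_distance=False)
    counts = np.array([Y_train[idx[i]].sum(axis=0) for i in range(n)])  # (n, q)
    kh1 = np.zeros((q, k + 1))
    kh0 = np.zeros((q, k + 1))
    for l in range(q):
        for j in range(k + 1):
            kh1[l, j] = ((counts[:, l] == j) & (Y_train[:, l] == 1)).sum()
            kh0[l, j] = ((counts[:, l] == j) & (Y_train[:, l] == 0)).sum()
    like1 = (s + kh1) / (s * (k + 1) + kh1.sum(axis=1, keepdims=True))
    like0 = (s + kh0) / (s * (k + 1) + kh0.sum(axis=1, keepdims=True))

    # MAP prediction: compare P(H_l) P(count | H_l) against the complement
    idx_t = nn.kneighbors(X_test, return_distance=False)
    preds = []
    for i in range(len(X_test)):
        c = Y_train[idx_t[i]].sum(axis=0).astype(int)  # neighbor label counts
        p1 = prior * like1[np.arange(q), c]
        p0 = (1 - prior) * like0[np.arange(q), c]
        preds.append((p1 > p0).astype(int))
    return np.array(preds)
```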
Journal ArticleDOI

Feature selection for multi-label naive Bayes classification

TL;DR: This paper proposes a method called Mlnb which adapts the traditional naive Bayes classifiers to deal with multi-label instances and achieves comparable performance to other well-established multi-label learning algorithms.
Journal ArticleDOI

Multilabel dimensionality reduction via dependence maximization

TL;DR: Zhang et al. proposed a multilabel dimensionality reduction method, MDDM, with two kinds of projection strategies that project the original data into a lower-dimensional feature space, maximizing the dependence between the original feature description and the associated class labels.
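The dependence measure at the heart of MDDM is the empirical Hilbert-Schmidt Independence Criterion (HSIC) between the projected features and the labels; one common empirical form is

$$\mathrm{HSIC} \;=\; \frac{1}{(n-1)^2}\,\operatorname{tr}(K H L H), \qquad H = I_n - \tfrac{1}{n}\mathbf{1}\mathbf{1}^{\top},$$

where $K$ and $L$ are kernel matrices computed over the projected instances and the label vectors, respectively, and MDDM seeks the projection that maximizes this quantity.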