Feature selection based on mutual information: criteria of max-dependency, max-relevance, and min-redundancy
Citations
4,835 citations
Cites methods from "Feature selection based on mutual i..."
...The performance of the different algorithms has also been tested on both the leukemia data set and the colon microarray data set after the minimum-redundancy–maximum-relevance feature selection method [54] is applied (cf....
[...]
3,517 citations
Cites background or methods from "Feature selection based on mutual i..."
...In [23,24] the authors develop a ranking criterion based on class densities for binary data....
[...]
...The mRMR (max-relevance, min-redundancy) criterion [24] is another method based on MI....
[...]
...SVM [51,2,24,18] is a margin classifier which maximizes the margin between the data samples of the two classes....
[...]
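The margin-maximizing behavior mentioned in the snippet above can be seen directly on toy data. A minimal sketch, assuming scikit-learn's SVC with a near-hard margin (the toy points and parameter values are illustrative, not from any cited paper):

```python
import numpy as np
from sklearn.svm import SVC

# Fit a linear SVM and report the geometric margin 2/||w||,
# the quantity the classifier maximizes between the two classes.
X = np.array([[0.0, 0.0], [0.0, 1.0], [2.0, 0.0], [2.0, 1.0]])
y = np.array([0, 0, 1, 1])
clf = SVC(kernel="linear", C=1e6).fit(X, y)      # large C approximates a hard margin
w = clf.coef_[0]
print("margin width:", 2.0 / np.linalg.norm(w))  # ~2.0 for this toy data
```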
2,184 citations
1,697 citations
1,566 citations
Cites background from "Feature selection based on mutual i..."
...Peng et al. (2005) propose the Minimum Redundancy Maximum Relevance (MRMR) criterion, which sets the value of β to the inverse of the number of selected features: $J_{\mathrm{MRMR}}(X_k) = I(X_k; Y) - \frac{1}{|S|} \sum_{X_j \in S} I(X_k; X_j)$...
[...]
...Du et al. 2013; Tang et al. 2014), feature correlation (Koller and Sahami 1995; Guyon and Elisseeff 2003), mutual information (Yu and Liu 2003; Peng et al. 2005; Nguyen et al. 2014; Shishkin et al. 2016; Gao et al. 2016), feature ability to preserve data manifold structure (He et al. 2005; ...
[...]
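The criterion quoted above lends itself to a short greedy implementation. A minimal sketch, assuming discretized features and scikit-learn's MI estimators (the function name mrmr_select and all parameters are illustrative, not the paper's code):

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif
from sklearn.metrics import mutual_info_score

def mrmr_select(X, y, n_features):
    """Greedy forward selection with the mRMR criterion
    J(X_k) = I(X_k; Y) - (1/|S|) * sum_{X_j in S} I(X_k; X_j)."""
    relevance = mutual_info_classif(X, y)        # I(X_k; Y) for each feature
    selected = []
    remaining = set(range(X.shape[1]))
    pair_mi = {}                                 # cache for I(X_k; X_j)

    def mi(i, j):
        key = (min(i, j), max(i, j))
        if key not in pair_mi:
            # mutual_info_score expects discrete values, so this sketch
            # assumes X has already been discretized (binned).
            pair_mi[key] = mutual_info_score(X[:, i], X[:, j])
        return pair_mi[key]

    while len(selected) < n_features and remaining:
        def score(k):
            if not selected:
                return relevance[k]
            return relevance[k] - np.mean([mi(k, j) for j in selected])
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

The first feature is picked by relevance alone; each later pick trades relevance to the class against average redundancy with the features already chosen, matching the formula above.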
References
45,034 citations
"Feature selection based on mutual i..." refers background in this paper
...Index Terms—Feature selection, mutual information, minimal redundancy, maximal relevance, maximal dependency, classification....
[...]
40,147 citations
"Feature selection based on mutual i..." refers methods in this paper
...(There are two exceptions in Table 4, for which the obtained feature subsets are comparable: 1) “NCI+LDA+Forward,” where five mRMR features lead to 20 errors (33.33 percent) and seven MaxRel features lead to 19 errors (31.67 percent), and 2) “LYM+SVM+Backward,” where the same error (3.13 percent) is obtained.)...
[...]
...With SVM + 40 features, we obtained error rates of 23-26 percent for mRMR and 35-38 percent for MaxRel....
[...]
...3a, 3b, and 3c show the classification error rates with classifiers NB, SVM, and LDA, respectively....
[...]
...We use the LIBSVM package [9], which supports both 2-class and multiclass classification....
[...]
...To test this, we consider three widely used classifiers, i.e., Naive Bayes (NB), Support Vector Machine (SVM), and Linear Discriminant Analysis (LDA)....
[...]
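As a rough modern analogue of the experimental setup quoted above, one could score a selected feature subset with the same three classifiers. A sketch assuming scikit-learn, whose SVC wraps LIBSVM (the 10-fold split and linear kernel are assumptions, not the paper's exact protocol):

```python
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC                      # scikit-learn's SVC wraps LIBSVM
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

def error_rates(X, y, feature_idx):
    """Cross-validated error rate of NB, SVM, and LDA on one feature subset."""
    Xs = X[:, feature_idx]
    classifiers = {
        "NB": GaussianNB(),
        "SVM": SVC(kernel="linear"),             # kernel choice is an assumption
        "LDA": LinearDiscriminantAnalysis(),
    }
    return {name: 1.0 - cross_val_score(clf, Xs, y, cv=10).mean()
            for name, clf in classifiers.items()}
```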
10,114 citations
"Feature selection based on mutual i..." refers background in this paper
...As one of the earliest classifiers, LDA [30] learns a linear classification boundary in the input feature space....
[...]
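To make the snippet above concrete, here is a minimal sketch of fitting such a linear boundary with scikit-learn's LinearDiscriminantAnalysis (the toy data is illustrative only):

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# LDA learns a linear decision boundary w.x + b = 0 in the input feature space.
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0],
              [3.0, 1.0], [4.0, 1.0], [3.0, 2.0]])
y = np.array([0, 0, 0, 1, 1, 1])
lda = LinearDiscriminantAnalysis().fit(X, y)
print(lda.coef_, lda.intercept_)                 # w and b of the boundary
print(lda.predict([[2.0, 0.75]]))                # which side of the boundary
```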
9,493 citations
"Feature selection based on mutual i..." refers background or methods in this paper
...(There are two exceptions in Table 4, for which the obtained feature subsets are comparable: 1) “NCI+LDA+Forward,” where five mRMR features lead to 20 errors (33.33 percent) and seven MaxRel features lead to 19 errors (31.67 percent), and 2) “LYM+SVM+Backward,” where the same error (3.13 percent) is obtained.)...
[...]
...For LYM data in Fig....
[...]
...For the LYM data, MaxDep needs more than 200 seconds to find the 50th feature, while mRMR uses only 5 seconds....
[...]
...For example, we compared the average computational time cost to select the top 50 mRMR and MaxDep features for both continuous data sets NCI and LYM, based on parallel experiments on a cluster of eight 3.06 GHz Xeon CPUs running Red Hat Linux 9, with the Matlab implementation....
[...]
...The data set LYM [1] has 96 samples of 4,026 gene features....
[...]
8,610 citations
"Feature selection based on mutual i..." refers background or methods in this paper
...A wrapper [15], [18] is a feature selector that wraps around a classifier (e.g., the naive Bayes classifier), with the direct goal of minimizing the classification error of that particular classifier....
[...]
...Second, we investigate how to combine mRMR with other feature selection methods (such as wrappers [18], [15]) into a two-stage selection algorithm....
[...]
...The latter type of approach (e.g., mRMR and Max-Relevance), sometimes called a “filter” [18], [15], often selects features by testing whether some preset conditions about the features and the target class are satisfied....
[...]
...[15], [22], [12], [5]) and select features with the minimal redundancy (Min-Redundancy)....
[...]
...In many pattern recognition applications, identifying the most characterizing features (or attributes) of the observed data, i.e., feature selection (or variable selection, among many other names) [30], [14], [17], [18], [15], [12], [11], [19], [31], [32], [5], is critical to minimize the classification error....
[...]
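In contrast with a filter such as mRMR, the wrapper idea quoted above can be sketched as a forward search that directly minimizes a classifier's cross-validated error. A minimal illustration assuming scikit-learn (the helper name wrapper_forward_select and cv=5 are assumptions, not the paper's procedure):

```python
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

def wrapper_forward_select(X, y, n_features, clf=None):
    """Wrapper-style forward selection: at each step, add the candidate
    feature that most reduces the classifier's cross-validated error."""
    clf = clf or GaussianNB()
    selected, remaining = [], list(range(X.shape[1]))
    while len(selected) < n_features and remaining:
        def cv_error(k):
            cols = selected + [k]
            return 1.0 - cross_val_score(clf, X[:, cols], y, cv=5).mean()
        best = min(remaining, key=cv_error)
        selected.append(best)
        remaining.remove(best)
    return selected
```

In the two-stage scheme described in the snippets, a filter such as mRMR would first narrow the candidate pool cheaply, and a wrapper like this would then refine the final subset within that pool.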