
Showing papers on "Feature selection" published in 1973


Journal ArticleDOI
TL;DR: A new K-L technique is described that overcomes some of the limitations of the earlier procedures, and it is suggested that it is particularly useful for pattern recognition when combined with classification procedures based upon discriminant functions obtained by recursive least squares analysis.
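As a rough illustration of the K-L idea only (the paper's specific technique, and its coupling to recursive-least-squares discriminant functions, are not reproduced here), a minimal Karhunen-Loève feature extraction step in Python/NumPy might look like the following; the function name and interface are illustrative assumptions.

import numpy as np

def kl_transform(X, k):
    """Project n-dimensional samples (rows of X) onto the k leading
    eigenvectors of the sample covariance matrix (Karhunen-Loeve expansion)."""
    Xc = X - X.mean(axis=0)                  # centre the data
    cov = np.cov(Xc, rowvar=False)           # n x n sample covariance
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:k]    # indices of the k largest eigenvalues
    return Xc @ eigvecs[:, order]            # N x k reduced feature matrix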

132 citations


01 Jan 1973
TL;DR: The investigation provides an explanation for previous observations that the JM-distance and a saturating transform of divergence are highly useful for feature selection in the multiclass case.
Abstract: Distance measures which are useful for feature selection are considered, giving attention to the divergence distance measure and the Jeffreys-Matusita (JM) distance. Experimental studies show that the JM-distance yields more reliable results than other distance measures. A number of questions which are not solved by the experiments are discussed. The investigation provides an explanation for previous observations that the JM-distance and a saturating transform of divergence are highly useful for feature selection in the multiclass case.
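As a point of reference, one common Gaussian form of the JM-distance is obtained from the Bhattacharyya distance B as JM = sqrt(2(1 - exp(-B))). The sketch below assumes that form and Gaussian class statistics; the normalisation used in the paper may differ, and the function names are illustrative.

import numpy as np

def bhattacharyya(mu1, cov1, mu2, cov2):
    """Bhattacharyya distance between two Gaussian classes."""
    cov = 0.5 * (cov1 + cov2)
    diff = mu1 - mu2
    term1 = 0.125 * diff @ np.linalg.solve(cov, diff)
    term2 = 0.5 * np.log(np.linalg.det(cov) /
                         np.sqrt(np.linalg.det(cov1) * np.linalg.det(cov2)))
    return term1 + term2

def jm_distance(mu1, cov1, mu2, cov2):
    """Jeffreys-Matusita distance, saturating at sqrt(2) for well-separated classes."""
    return np.sqrt(2.0 * (1.0 - np.exp(-bhattacharyya(mu1, cov1, mu2, cov2))))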

128 citations


Journal ArticleDOI
TL;DR: A transformation matrix that minimizes the equivocation in the reduced dimension is obtained, and a relationship between the equivocation and the expected divergence between any pair of classes is presented.
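For context, the equivocation referred to here is usually taken to be the conditional entropy of the class label given the reduced feature vector y = Bx; assuming m classes with mixture density p(y), it can be written in LaTeX as below. The paper's specific relationship between this quantity and the expected pairwise divergence is not reproduced here.

% Equivocation: conditional entropy of the class label given y = Bx
H(\Omega \mid Y) \;=\; -\int p(y) \sum_{i=1}^{m} P(\omega_i \mid y)\, \log P(\omega_i \mid y)\, dy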

52 citations


Journal ArticleDOI
01 Mar 1973
TL;DR: This paper describes the results of an experimental investigation of two feature evaluation criteria, i.e., the inter-intra class distance ratio and the information content measure. It is believed that the criteria can be used for other applications, especially where statistical independence among the features is not assumed.
Abstract: This paper describes the results of an experimental investigation of two feature evaluation criteria, i.e., the inter-intra class distance ratio and the information content measure. These two indirect statistical measures take into account higher-order statistical redundancies among the features being evaluated. The algorithms are first presented and then applied and compared in recognizing handprinted alphanumeric characters. Both Highleyman's data and raw data obtained in the Signal Processing Laboratory at Case Western Reserve University, Cleveland, Ohio, were used for the study. It is believed that the criteria can be used for other applications, especially where statistical independence among the features is not assumed.
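A minimal sketch of an inter-intra class distance ratio for a candidate feature subset is given below, assuming Euclidean distances, equal weighting of sample pairs, and at least two samples per class; the exact definition used in the paper may differ, and the function name is an illustrative assumption.

import numpy as np
from itertools import combinations

def inter_intra_ratio(X, y, subset):
    """X: N x n sample matrix, y: length-N class labels,
    subset: indices of the features being evaluated."""
    Xs = X[:, subset]
    classes = np.unique(y)
    intra, inter = [], []
    for c in classes:                          # within-class pairwise distances
        Xc = Xs[y == c]
        intra += [np.linalg.norm(a - b) for a, b in combinations(Xc, 2)]
    for a, b in combinations(classes, 2):      # between-class pairwise distances
        inter += [np.linalg.norm(xa - xb)
                  for xa in Xs[y == a] for xb in Xs[y == b]]
    return np.mean(inter) / np.mean(intra)     # larger ratio = better separability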

52 citations


01 Jan 1973
TL;DR: It is shown that the probability of misclassification is minimized if a maximum likelihood classification procedure is used to classify the data.
Abstract: The problem dealt with concerns feature selection, or reducing the dimension of the data to be processed from n to k. By reducing the dimension of the data from n to k, classification time is generally reduced. Yet the dimension reduction should not be so great that classification accuracy is impaired. Thus, the general problem is considered of classifying an n-dimensional observation vector x into one of m distinct classes, where each class is normally distributed with its own mean and covariance. It is shown that the probability of misclassification is minimized if a maximum likelihood classification procedure is used to classify the data. The dimension of each observation vector to be processed is conveniently reduced by performing the transformation y = Bx, where B is a k by n matrix of rank k. Thus, the n-dimensional classification problem transforms into a k-dimensional classification problem.
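A minimal sketch of the reduced-dimension maximum likelihood (Gaussian) classification rule described above is given below, assuming the k by n matrix B and the per-class means, covariances, and priors in the reduced space are already available; the paper's derivation of B itself is not reproduced, and the function name is illustrative.

import numpy as np

def ml_classify(x, B, means, covs, priors):
    """Reduce x with y = Bx and return the index of the most likely Gaussian class."""
    y = B @ x                                   # k-dimensional feature vector
    best, best_score = None, -np.inf
    for i, (mu, cov, p) in enumerate(zip(means, covs, priors)):
        diff = y - mu
        # log of prior times Gaussian density, dropping the constant term
        score = (np.log(p)
                 - 0.5 * np.log(np.linalg.det(cov))
                 - 0.5 * diff @ np.linalg.solve(cov, diff))
        if score > best_score:
            best, best_score = i, score
    return best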

35 citations


Journal ArticleDOI
TL;DR: In this paper, a non-parametric feature selection criterion with an explicit learning scheme is presented, based on the well-known concept of inter-class and intra-class Euclidean distances as a measure of the separability of the pattern classes in a given feature space.
Abstract: A new method of pattern classification, evolved by integrating in a sequential mode a non-parametric feature selection criterion with an explicit learning scheme, is presented. This feature selection criterion is based on the well-known concept of inter-class and intra-class Euclidean distances as a measure of the separability of the pattern classes in a given feature space. An ‘effective figure of merit’ is defined, and the feature subset in which this figure of merit attains its maximum value is construed as the best feature subset. Usually, in most of the existing techniques, a single feature subset is chosen as the best for the multi-class problem as a whole. A distinctive departure from this practice has been made here in that an individual best feature subset is determined for each of the pattern classes. The values of the effective figure of merit for the best feature subsets of the different pattern classes are sorted to determine the best separable pattern class. The learning scheme developed here ...
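The sketch below illustrates only the per-class selection idea under simple assumptions (Euclidean distances, an inter-to-intra distance ratio standing in for the figure of merit, exhaustive search over subsets of a fixed size, at least two samples per class); the paper's precise ‘effective figure of merit’ and its sequential learning scheme are not reproduced, and all names are illustrative.

import numpy as np
from itertools import combinations

def per_class_best_subsets(X, y, subset_size):
    """Return, for each class, the feature subset maximising an
    inter-to-intra class Euclidean distance ratio for that class."""
    classes = np.unique(y)
    best = {}
    for c in classes:
        best_fom, best_subset = -np.inf, None
        for subset in combinations(range(X.shape[1]), subset_size):
            Xc = X[y == c][:, subset]          # samples of class c
            Xo = X[y != c][:, subset]          # samples of all other classes
            intra = np.mean([np.linalg.norm(a - b)
                             for a, b in combinations(Xc, 2)])
            inter = np.mean([np.linalg.norm(a - b) for a in Xc for b in Xo])
            fom = inter / intra                # stand-in figure of merit
            if fom > best_fom:
                best_fom, best_subset = fom, subset
        best[c] = (best_subset, best_fom)
    return best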

8 citations


01 Jan 1973
TL;DR: Non-training data is used to demonstrate the reduction in processing time that can be obtained by using feature extraction rather than feature selection in multispectral scanner data.
Abstract: A method is presented for feature extraction of multispectral scanner data. Non-training data is used to demonstrate the reduction in processing time that can be obtained by using feature extraction rather than feature selection.

2 citations



01 Jan 1973
TL;DR: Feature selection software was developed at the Earth Resources Laboratory that is capable of inputting up to 36 channels and selecting channel subsets according to several criteria based on divergence, one of which is compatible with the table look-up classifier requirements.
Abstract: Feature selection software was developed at the Earth Resources Laboratory that is capable of inputting up to 36 channels and selecting channel subsets according to several criteria based on divergence. One of the criteria used is compatible with the table look-up classifier requirements. The software indicates which channel subset best separates (based on average divergence) each class from all other classes. The software employs an exhaustive search technique, and computer time is not prohibitive. A typical task to select the best 4 of 22 channels for 12 classes takes 9 minutes on a Univac 1108 computer.
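A hedged sketch of such an exhaustive search is shown below, using the standard symmetric divergence between Gaussian classes and the average pairwise divergence as the selection score; the names and interface are illustrative and do not correspond to the Earth Resources Laboratory software.

import numpy as np
from itertools import combinations

def divergence(mu1, cov1, mu2, cov2):
    """Symmetric (Kullback) divergence between two Gaussian classes."""
    inv1, inv2 = np.linalg.inv(cov1), np.linalg.inv(cov2)
    diff = (mu1 - mu2).reshape(-1, 1)
    term1 = 0.5 * np.trace((cov1 - cov2) @ (inv2 - inv1))
    term2 = 0.5 * np.trace((inv1 + inv2) @ (diff @ diff.T))
    return term1 + term2

def best_channel_subset(means, covs, n_channels, subset_size):
    """means[i], covs[i]: per-class mean vector and covariance over all
    n_channels bands; exhaustively score every subset of subset_size channels."""
    best_subset, best_avg = None, -np.inf
    for subset in combinations(range(n_channels), subset_size):
        idx = np.array(subset)
        divs = [divergence(means[i][idx], covs[i][np.ix_(idx, idx)],
                           means[j][idx], covs[j][np.ix_(idx, idx)])
                for i, j in combinations(range(len(means)), 2)]
        avg = np.mean(divs)                     # average pairwise divergence
        if avg > best_avg:
            best_subset, best_avg = subset, avg
    return best_subset, best_avg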

1 citation

