
Showing papers on "Feature selection published in 1978"


Journal ArticleDOI
TL;DR: In this paper, the authors review the most significant methods of variable selection, evaluate them critically, and choose those which seem most appropriate for regression analysis. Their conclusions and recommendations differ depending on whether the independent variables must be considered as fixed or whether it is possible to regard them as random.
Abstract: In applications of regression analysis for prediction purposes a large number of independent variables is often available. There may be uncertainty as to which of these independent variables should be included in the final analysis, as adequate prediction may be possible using only a subset of those available. Many methods of variable selection have been proposed. In deciding on a method it is necessary to evaluate the criterion (of goodness of prediction) on which it is based and, to some extent, the computational effort involved. We review the most significant methods which have been proposed, evaluate them critically and choose those which seem to us most appropriate. For these chosen methods we discuss the computational procedures involved in their execution. Our conclusions and recommendations differ depending on whether the independent variables must be considered as fixed or whether it is possible to regard them as random. In the fixed case we recommend the 'Cp' procedure (Mallows, 1966) or the 'Ap' procedure (Allen, 1971), the latter if the user is prepared to incur the heavier calculation necessary to find an optimal (possibly different) subset of variables for every prediction. In the second case, if the independent variables can be considered as random, both the Cp and Ap would still be possible procedures, but we regard the 'Sp' method (see e.g. Hocking, 1976) to be preferable in this situation.
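
As a quick illustration of the fixed-variables recommendation, here is a minimal Python sketch of Mallows' Cp screening over all predictor subsets; the data matrix X, response y, and helper names are assumptions, not from the paper. Subsets whose Cp is close to their parameter count p indicate little bias:

    import numpy as np
    from itertools import combinations

    def sse(X, y):
        # residual sum of squares of the least-squares fit of y on X
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        r = y - X @ beta
        return r @ r

    def mallows_cp(X, y):
        # score every subset of the k candidate predictors by Cp
        n, k = X.shape
        Xf = np.column_stack([np.ones(n), X])       # full model with intercept
        s2 = sse(Xf, y) / (n - k - 1)               # error variance, full model
        results = []
        for size in range(1, k + 1):
            for subset in combinations(range(k), size):
                Xs = np.column_stack([np.ones(n), X[:, subset]])
                p = len(subset) + 1                 # parameters incl. intercept
                cp = sse(Xs, y) / s2 - (n - 2 * p)  # Mallows' Cp
                results.append((subset, cp))
        return sorted(results, key=lambda t: t[1])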

185 citations


Journal ArticleDOI
TL;DR: It is shown that in this application only two laboratory tests are necessary to obtain a sufficiently high diagnostic effectiveness when linear discriminant analysis is applied; the optimal linear combination of laboratory tests obtained by linear discriminant analysis makes better use of the information present in each test.

67 citations


Journal ArticleDOI
TL;DR: Several myopic rules for feature selection are examined for solving the sequential finite classification problem with conditionally independent binary features; the main finding is that no rule is consistently superior to the others.
Abstract: Several myopic rules for feature selection are examined for solving the sequential finite classification problem with conditionally independent binary features. The main finding is that no rule is consistently superior to the others. Likewise, no specific strategy for alternating between rules seems to be significantly more efficient.
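
For concreteness, the sketch below shows one possible myopic rule: greedily pick the unobserved binary feature whose observation maximizes the expected gain in the maximum posterior probability. The paper compares several rules, and this particular criterion and all names are illustrative assumptions:

    import numpy as np

    def myopic_pick(posterior, q, unused):
        # posterior: current class probabilities, shape (C,)
        # q[c, i] = P(feature i = 1 | class c); features cond. independent
        base = posterior.max()
        best_gain, best_i = -np.inf, None
        for i in unused:
            p1 = np.clip(posterior @ q[:, i], 1e-12, 1 - 1e-12)  # P(x_i = 1)
            post1 = posterior * q[:, i] / p1           # posterior if x_i = 1
            post0 = posterior * (1 - q[:, i]) / (1 - p1)
            gain = p1 * post1.max() + (1 - p1) * post0.max() - base
            if gain > best_gain:
                best_gain, best_i = gain, i
        return best_i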

47 citations


Journal ArticleDOI
TL;DR: A digital-computer-based technique for selecting optimum test frequencies for fault diagnosis of analogue systems is presented. The resulting criterion is found to correlate well with the actual diagnosability of faults in a simulation covering varying fault levels and varying production tolerances for the non-faulty components.
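
For intuition only, here is a hedged sketch of a greedy frequency-selection loop; the separation measure and the response matrix R (rows: candidate frequencies, columns: fault conditions, with column 0 the fault-free case) are assumptions, since the paper's actual criterion is only summarized above:

    import numpy as np

    def pick_frequencies(R, n_select):
        # greedily add the frequency that best separates fault signatures
        chosen, remaining = [], list(range(R.shape[0]))
        while len(chosen) < n_select and remaining:
            def separation(f):
                sub = R[chosen + [f], :]            # signatures at chosen + f
                d = np.linalg.norm(sub[:, :, None] - sub[:, None, :], axis=0)
                return d[np.triu_indices_from(d, k=1)].min()
            best = max(remaining, key=separation)
            chosen.append(best)
            remaining.remove(best)
        return chosen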

45 citations


Journal ArticleDOI
TL;DR: The proposed feature definition procedure partitions a large set of highly correlated features into subsets, or clusters, through hierarchical clustering, thereby reducing the original set of correlated features to a small set of nearly uncorrelated features.
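
A minimal sketch of the cluster-then-represent idea, assuming a sample matrix X with features in columns; the average-linkage choice and the representative rule are illustrative, not necessarily the authors':

    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster
    from scipy.spatial.distance import squareform

    def cluster_features(X, n_clusters):
        corr = np.corrcoef(X, rowvar=False)    # feature-feature correlations
        dist = 1.0 - np.abs(corr)              # highly correlated -> close
        np.fill_diagonal(dist, 0.0)
        Z = linkage(squareform(dist, checks=False), method='average')
        labels = fcluster(Z, t=n_clusters, criterion='maxclust')
        reps = []
        for c in np.unique(labels):
            members = np.where(labels == c)[0]
            # keep the member most correlated with the rest of its cluster
            mean_corr = np.abs(corr[np.ix_(members, members)]).mean(axis=1)
            reps.append(members[mean_corr.argmax()])
        return sorted(reps)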

37 citations


Journal ArticleDOI
TL;DR: Dynamic programming is applied to the selection of feature subsets in text-independent speaker identification; the resulting subsets show a lower average identification error than the "knock-out" strategy, the cepstral coefficients, and the PARCOR coefficients.
Abstract: Dynamic programming is applied to the selection of feature subsets in text-independent speaker identification. Each feature is long-term averaged in order to reduce its variability due to text information. The resulting subset of features shows a lower average identification error than the "knock-out" strategy, the cepstral coefficients, and the PARCOR coefficients.
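
The paper's dynamic-programming formulation is not reproduced here, but the two pieces it builds on are easy to sketch: long-term averaging of per-frame features, and the "knock-out" (backward elimination) baseline it is compared against. error_fn is a hypothetical identification-error callback:

    import numpy as np

    def long_term_average(frames):
        # frames: (T, d) per-frame features for one utterance; averaging
        # over time suppresses text-dependent variability
        return frames.mean(axis=0)

    def knock_out(features, error_fn, target_size):
        # repeatedly drop the feature whose removal hurts identification least
        current = list(features)
        while len(current) > target_size:
            best = min(current,
                       key=lambda f: error_fn([g for g in current if g != f]))
            current.remove(best)
        return current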

35 citations


Journal ArticleDOI
TL;DR: The method of linear regression analysis is used to compute binary linear classifiers which can recognize 17 chemical structures of steroids from given low-resolution mass spectra. The best classification results are obtained with spectra normalized to local ion current and with feature selection based on the maximum Fisher ratio.
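
The Fisher-ratio criterion named above is simple to state; here is a hedged per-feature sketch for a two-class problem, with all variable names assumed rather than taken from the paper:

    import numpy as np

    def fisher_ratio(X, y):
        # X: (n, d) normalized spectra; y: 0/1 labels (structure absent/present)
        X0, X1 = X[y == 0], X[y == 1]
        num = (X0.mean(axis=0) - X1.mean(axis=0)) ** 2
        den = X0.var(axis=0) + X1.var(axis=0)
        return num / den                        # one ratio per feature

    # keep the k features with the largest ratios:
    # selected = np.argsort(fisher_ratio(X, y))[::-1][:k]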

12 citations



Journal ArticleDOI
TL;DR: The concept of irrelevant features in Bayesian models for pattern recognition is introduced, and its mathematical meaning is explained.
Abstract: The concept of irrelevant features in Bayesian models for pattern recognition is introduced, and its mathematical meaning is explained. A technique for computing the conditional probabilities of irrelevant features, if necessary, is described. The effect of irrelevant features on feature selection in sequential classification is discussed and illustrated.
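
A toy numerical illustration (not from the paper) of the core point: a feature whose class-conditional probabilities coincide cancels in Bayes' rule and leaves the posterior unchanged:

    import numpy as np

    prior = np.array([0.5, 0.5])           # two classes
    p_x_given_c = np.array([0.7, 0.7])     # P(x=1|c) identical for both classes

    posterior = prior * p_x_given_c        # after observing x = 1
    posterior /= posterior.sum()
    print(posterior)                       # still [0.5, 0.5]: x is irrelevant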

8 citations


Journal ArticleDOI
TL;DR: It is shown that one of the plots proposed by Spjøtvoll has a straightforward and meaningful analogue in the corresponding two-group discriminant analysis, providing a graphical aid to a simultaneous procedure for variable selection in discriminant analysis.
Abstract: Alternatives to Mallows' (1964) well-known graphical aid to variable selection in multiple regression have been suggested by Spjøtvoll (1977). The purpose of this note is to show that one of the plots proposed by Spjøtvoll has a straightforward and meaningful analogue in the corresponding two-group discriminant analysis. This provides a graphical aid to a simultaneous procedure for variable selection in discriminant analysis proposed by McKay (1976) and also to a less conservative alternative to that procedure suggested by arguments paralleling those of Spjøtvoll in the regression context.

6 citations


Journal ArticleDOI
TL;DR: In this paper, the density of a random n-vector is assumed to be a mixture of two densities, each of which is a convex combination of known multivariate normal density functions whose corresponding mixture proportions are also unknown.
Abstract: Let X be a random n-vector whose density function is given by a mixture of two density functions, h1 and h2, with unknown mixture proportions γ1 and γ2. We assume that each of h1 and h2 is a convex combination of known multivariate normal density functions whose corresponding mixture proportions are also unknown. We present three numerically tractable methods for estimating γ1 and γ2, related to the technique of Guseman and Walton (1977) and based on the linear feature selection technique of Guseman, Peters and Walker (1975).
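
The paper's three estimators are not reproduced here, but for orientation, one standard fixed-point (EM-type) iteration for the proportions when the component densities are fully known looks like this; all names are assumptions:

    import numpy as np

    def estimate_proportions(x, h1, h2, iters=200):
        # x: (n, d) sample; h1, h2: callables returning density values at x
        d1, d2 = h1(x), h2(x)
        g = 0.5                                   # initial guess for gamma_1
        for _ in range(iters):
            r = g * d1 / (g * d1 + (1 - g) * d2)  # responsibility of h1
            g = r.mean()                          # updated proportion
        return g, 1.0 - g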

Book ChapterDOI
01 Jul 1978
TL;DR: Two contrasting views of feature extraction can be identified, one emphasizing invariant feature detection and the other flexible feature selection; the two views have important implications for the development of auditory pattern recognition theory.
Abstract: Feature extraction plays a fundamental role in most theories of pattern recognition, but despite its importance, the extraction process is not well defined. Two contrasting views of feature extraction can be identified, one which emphasizes invariant feature detection and one which emphasizes flexible feature selection. The invariant detector approach assumes that the auditory system is equipped with finely tuned feature detectors that respond to specific stimulus properties. In this view, stimuli are described in terms of property lists of specific features. In contrast, the more flexible, process-oriented approach assumes that the auditory system is equipped with a set of rules and criteria for feature selection. In this view, the important perceptual features reflect the underlying structure of the stimuli. Research on timbre and pitch perception has supported a flexible, process-oriented approach. The flexibility of this approach offers particular advantages in that it can explain the effects of stimulus and task context on performance. Both types of context influence the perception of complex sounds. Stimulus context affects the structure of the stimulus space and consequently the features that would be extracted by a structure-preserving transformation. Task context affects the relative importance of features in making similarity judgements and classification decisions. The two approaches to feature extraction have important implications for the development of auditory pattern recognition theory.



Journal ArticleDOI
TL;DR: In this paper, an algorithm for describing optimal linear combinations in the feature selection process is considered, and various proofs of its correctness are presented. However, the algorithm is not suitable for feature selection in general.
Abstract: An algorithm for describing optimal linear combinations in the feature selection process is considered. Various proofs are presented.

Journal ArticleDOI
TL;DR: In this paper, the authors present a procedure for the choice of a regression model, of the degree of regression polynomial or of regressor variables, of a simple model with few parameters, a small upper confidence bound for the model specification error, and high reliability of this bound.
Abstract: The paper presents a procedure for the choice of a regression model, of the degree of a regression polynomial, or of regressor variables. Three cases are considered: 1) no knowledge of the structure of the regression function f, 2) quasilinear f, and 3) nonlinear f. The procedure makes it possible to choose the model in an informal way by use of a graphical representation, looking for a compromise between the following partly contradictory objectives: a) a simple model with few parameters, b) a small upper confidence bound for the model specification error, and c) high reliability (confidence level) of this bound.
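
As a hedged sketch of the informal, graphical choice for the polynomial case: fit increasing degrees and tabulate, for each, the parameter count and an upper confidence bound on the residual standard deviation. The paper's actual specification-error bound is not reproduced; the chi-square bound below is a stand-in:

    import numpy as np
    from scipy.stats import chi2

    def degree_table(x, y, max_degree, alpha=0.05):
        # returns (parameters, residual s, upper confidence bound on sigma)
        n, rows = len(y), []
        for m in range(1, max_degree + 1):
            coeffs = np.polyfit(x, y, m)
            resid = y - np.polyval(coeffs, x)
            df = n - (m + 1)
            s2 = resid @ resid / df
            upper = np.sqrt(df * s2 / chi2.ppf(alpha, df))  # (1-alpha) bound
            rows.append((m + 1, np.sqrt(s2), upper))
        return rows   # plot/inspect to pick a compromise degree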

Book ChapterDOI
01 Jan 1978
TL;DR: A new algebraic method, the so-called "structure theory" developed by Blickle and coworkers and applied to a wide range of chemical engineering problems, proves to be a very useful tool for learning processes.
Abstract: A new algebraic method, the so-called "structure theory" developed by Blickle and coworkers and applied to a wide range of chemical engineering problems, proves to be a very useful tool for learning processes. The learning algorithms based on the structure theory consist of the following steps: labelling, feature selection, training, determination of the structure, and recognition.