
Showing papers on "Linear discriminant analysis published in 1995"


Book
01 Jan 1995
TL;DR: In this book, the authors present methods for characterizing and displaying multivariate data and survey the core techniques of multivariate analysis, from tests on mean vectors and MANOVA to discriminant, classification, factor, and cluster analysis.
Abstract: Introduction. Matrix Algebra. Characterizing and Displaying Multivariate Data. The Multivariate Normal Distribution. Tests on One or Two Mean Vectors. Multivariate Analysis of Variance. Tests on Covariance Matrices. Discriminant Analysis: Description of Group Separation. Classification Analysis: Allocation of Observations to Groups. Multivariate Regression. Canonical Correlation. Principal Component Analysis. Factor Analysis. Cluster Analysis. Graphical Procedures. Tables. Answers and Hints to Problems. Data Sets and SAS Files. References. Index.

2,620 citations


Journal ArticleDOI
TL;DR: After pointing out the key assumptions underlying CCA, the paper focuses on the interpretation of CCA ordination diagrams; advanced uses, such as ranking environmental variables in importance and the statistical testing of effects, are illustrated on a typical macroinvertebrate data-set.
Abstract: Canonical correspondence analysis (CCA) is a multivariate method to elucidate the relationships between biological assemblages of species and their environment. The method is designed to extract synthetic environmental gradients from ecological data-sets. The gradients are the basis for succinctly describing and visualizing the differential habitat preferences (niches) of taxa via an ordination diagram. Linear multivariate methods for relating two sets of variables, such as two-block Partial Least Squares (PLS2), canonical correlation analysis and redundancy analysis, are less suited for this purpose because habitat preferences are often unimodal functions of habitat variables. After pointing out the key assumptions underlying CCA, the paper focuses on the interpretation of CCA ordination diagrams. Subsequently, some advanced uses, such as ranking environmental variables in importance and the statistical testing of effects, are illustrated on a typical macroinvertebrate data-set. The paper closes with comparisons with correspondence analysis, discriminant analysis, PLS2 and co-inertia analysis. In an appendix a new method, named CCA-PLS, is proposed that combines the strong features of CCA and PLS2.

1,715 citations


Journal ArticleDOI
TL;DR: A penalized version of Fisher's linear discriminant analysis is described, designed for situations in which there are many highly correlated predictors, such as those obtained by discretizing a function, or the grey-scale values of the pixels in a series of images.
Abstract: Fisher's linear discriminant analysis (LDA) is a popular data-analytic tool for studying the relationship between a set of predictors and a categorical response. In this paper we describe a penalized version of LDA. It is designed for situations in which there are many highly correlated predictors, such as those obtained by discretizing a function, or the grey-scale values of the pixels in a series of images. In cases such as these it is natural, efficient and sometimes essential to impose a spatial smoothness constraint on the coefficients, both for improved prediction performance and interpretability. We cast the classification problem into a regression framework via optimal scoring. Using this, our proposal facilitates the use of any penalized regression technique in the classification setting. The technique is illustrated with examples in speech recognition and handwritten character recognition.
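To make the penalty idea concrete, below is a minimal numpy/scipy sketch, assuming a plain ridge penalty on the within-class scatter; the paper's optimal-scoring formulation is more general (it admits any penalized regression method), and the function name and toy data here are illustrative only.

```python
import numpy as np
from scipy.linalg import eigh

def ridge_penalized_lda(X, y, lam=1.0):
    """Fisher LDA with a ridge penalty on the within-class scatter.

    Replacing S_w by S_w + lam*I keeps the generalized eigenproblem
    well-posed when predictors are many and highly correlated.
    """
    classes = np.unique(y)
    mu = X.mean(axis=0)
    p = X.shape[1]
    Sw = np.zeros((p, p))
    Sb = np.zeros((p, p))
    for c in classes:
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        Sb += len(Xc) * np.outer(mc - mu, mc - mu)
    # Generalized symmetric eigenproblem: Sb v = w (Sw + lam*I) v
    w, V = eigh(Sb, Sw + lam * np.eye(p))
    order = np.argsort(w)[::-1]                 # largest discriminant ratio first
    return V[:, order[: len(classes) - 1]]      # at most K-1 directions

# Toy data: 3 classes, 40 highly correlated predictors
rng = np.random.default_rng(0)
X = np.repeat(rng.normal(size=(90, 1)), 40, axis=1) + 0.1 * rng.normal(size=(90, 40))
y = np.repeat([0, 1, 2], 30)
X += 0.5 * y[:, None]
print(ridge_penalized_lda(X, y, lam=10.0).shape)  # (40, 2)
```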

890 citations


Journal ArticleDOI
TL;DR: The SAMANN network offers the generalization ability of projecting new data, which is not present in the original Sammon's projection algorithm; the NDA method and NP-SOM network provide new powerful approaches for visualizing high dimensional data.
Abstract: Classical feature extraction and data projection methods have been well studied in the pattern recognition and exploratory data analysis literature. We propose a number of networks and learning algorithms which provide new or alternative tools for feature extraction and data projection. These networks include a network (SAMANN) for J.W. Sammon's (1969) nonlinear projection, a linear discriminant analysis (LDA) network, a nonlinear discriminant analysis (NDA) network, and a network for nonlinear projection (NP-SOM) based on Kohonen's self-organizing map. A common attribute of these networks is that they all employ adaptive learning algorithms, which makes them suitable in environments where the distribution of patterns in feature space changes with respect to time. The availability of these networks also facilitates hardware implementation of well-known classical feature extraction and projection approaches. Moreover, the SAMANN network offers the generalization ability of projecting new data, which is not present in the original Sammon's projection algorithm; the NDA method and NP-SOM network provide new powerful approaches for visualizing high dimensional data. We evaluate five representative neural networks for feature extraction and data projection based on a visual judgement of the two-dimensional projection maps and three quantitative criteria on eight data sets with various properties.

695 citations



Journal ArticleDOI
TL;DR: Three problems with stepwise applications are explored in some detail, including the fact that computer packages use incorrect degrees of freedom in their stepwise computations, resulting in artifactually greater likelihood of obtaining spurious statistical significance.
Abstract: Stepwise methods are frequently employed in educational and psychological research, both to select useful subsets of variables and to evaluate the order of importance of variables. Three problems with stepwise applications are explored in some detail. First, computer packages use incorrect degrees of freedom in their stepwise computations, resulting in artifactually greater likelihood of obtaining spurious statistical significance. Second, stepwise methods do not correctly identify the best variable set of a given size, as illustrated by a concrete heuristic example. Third, stepwise methods tend to capitalize on sampling error and thus tend to yield results that are not replicable.
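The capitalization-on-chance and degrees-of-freedom points are easy to demonstrate by simulation; a small illustrative sketch (not from the paper): select the "best" of 50 pure-noise predictors in a single forward step, then test it as though it had been chosen in advance.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, p, trials = 100, 50, 1000
spurious = 0
for _ in range(trials):
    X = rng.normal(size=(n, p))   # pure-noise predictors
    y = rng.normal(size=n)        # noise response: no true relationship
    # Forward step: pick the predictor with the largest |correlation|
    r = np.array([stats.pearsonr(X[:, j], y)[0] for j in range(p)])
    best = int(np.argmax(np.abs(r)))
    # Naive test that ignores the selection (treats 1 df as "spent", not p)
    _, pval = stats.pearsonr(X[:, best], y)
    spurious += pval < 0.05
print(spurious / trials)  # far above the nominal 0.05 false-positive rate
```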

425 citations


Journal ArticleDOI

320 citations


Journal ArticleDOI
TL;DR: It is shown that good face reconstructions can be obtained using 83 model parameters, and that high recognition rates can be achieved.

313 citations


Journal ArticleDOI
TL;DR: A set of data set descriptors is developed to help decide which algorithms are suited to particular data sets, including data sets with extreme distributions and with many binary/categorical attributes.
Abstract: This paper describes work in the StatLog project comparing classification algorithms on large real-world problems. The algorithms compared were from symbolic learning (CART, C4.5, NewID, AC2, ITrule, Cal5, CN2), statistics (Naive Bayes, k-nearest neighbor, kernel density, linear discriminant, quadratic discriminant, logistic regression, projection pursuit, Bayesian networks), and neural networks (backpropagation, radial basis functions). Twelve datasets were used: five from image analysis, three from medicine, and two each from engineering and finance. We found that which algorithm performed best depended critically on the data set investigated. We therefore developed a set of data set descriptors to help decide which algorithms are suited to particular data sets. For example, data sets with extreme distributions (skew > 1 and kurtosis > 7) and with many binary/categorical attributes (>38%) tend to favor symbolic learning algorithms. We suggest how classification algorithms can be extended in a number of directions.
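The descriptor idea boils down to a few summary statistics per data set; a minimal sketch using the thresholds quoted in the abstract (the function names and the decision rule are a simplification of the StatLog analysis):

```python
import numpy as np
from scipy import stats

def dataset_descriptors(X, categorical_mask):
    """StatLog-style summary statistics for matching data sets to algorithms."""
    numeric = X[:, ~categorical_mask]
    return {
        "mean_skew": float(np.mean(np.abs(stats.skew(numeric, axis=0)))),
        "mean_kurtosis": float(np.mean(stats.kurtosis(numeric, axis=0, fisher=False))),
        "frac_categorical": float(categorical_mask.mean()),
    }

def suggest_family(desc):
    # Heuristic from the abstract: extreme distributions and many
    # binary/categorical attributes tend to favor symbolic learners.
    if (desc["mean_skew"] > 1 and desc["mean_kurtosis"] > 7) or desc["frac_categorical"] > 0.38:
        return "symbolic learning (trees, rules)"
    return "statistical (linear/quadratic discriminant, logistic regression, ...)"

rng = np.random.default_rng(6)
X = rng.lognormal(size=(200, 10))   # strongly skewed numeric attributes
mask = np.zeros(10, dtype=bool)     # no categorical attributes
print(suggest_family(dataset_descriptors(X, mask)))
```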

312 citations


Journal ArticleDOI
TL;DR: In this paper, the authors derived figures of merit for image quality on the basis of the performance of mathematical observers on specific detection and estimation tasks, which were based on the Fisher information matrix relevant to estimation of the Fourier coefficients and the closely related Fourier crosstalk matrix introduced earlier by Barrett and Gifford.
Abstract: Figures of merit for image quality are derived on the basis of the performance of mathematical observers on specific detection and estimation tasks. The tasks include detection of a known signal superimposed on a known background, detection of a known signal on a random background, estimation of Fourier coefficients of the object, and estimation of the integral of the object over a specified region of interest. The chosen observer for the detection tasks is the ideal linear discriminant, which we call the Hotelling observer. The figures of merit are based on the Fisher information matrix relevant to estimation of the Fourier coefficients and the closely related Fourier crosstalk matrix introduced earlier by Barrett and Gifford [Phys. Med. Biol. 39, 451 (1994)]. A finite submatrix of the infinite Fisher information matrix is used to set Cramer-Rao lower bounds on the variances of the estimates of the first N Fourier coefficients. The figures of merit for detection tasks are shown to be closely related to the concepts of noise-equivalent quanta (NEQ) and generalized NEQ, originally derived for linear, shift-invariant imaging systems and stationary noise. Application of these results to the design of imaging systems is discussed.
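For the known-signal, known-background detection task the Hotelling observer has a closed form; a brief numpy sketch of its template and detectability (variable names are ours):

```python
import numpy as np

def hotelling_snr(mean_present, mean_absent, cov):
    """Detectability of the ideal linear discriminant (Hotelling observer):
    SNR^2 = dg^T K^{-1} dg, with dg the mean image difference between the
    signal-present and signal-absent classes and K the image covariance."""
    dg = mean_present - mean_absent
    template = np.linalg.solve(cov, dg)   # Hotelling template K^{-1} dg
    return np.sqrt(dg @ template), template

# Toy example: 16-pixel images, a weak localized signal on correlated noise
rng = np.random.default_rng(2)
A = rng.normal(size=(16, 16))
K = A @ A.T + 16 * np.eye(16)             # a positive-definite covariance
signal = np.zeros(16)
signal[5:8] = 0.5
snr, _ = hotelling_snr(signal, np.zeros(16), K)
print(round(float(snr), 3))
```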

244 citations


Journal ArticleDOI
TL;DR: The results demonstrate the feasibility of using linear discriminant analysis in the texture feature space for classification of true and false detections of masses on mammograms in a computer-aided diagnosis scheme.
Abstract: The authors studied the effectiveness of using texture features derived from spatial grey level dependence (SGLD) matrices for classification of masses and normal breast tissue on mammograms. One hundred and sixty-eight regions of interest (ROIs) containing biopsy-proven masses and 504 ROIs containing normal breast tissue were extracted from digitized mammograms for this study. Eight features were calculated for each ROI. The importance of each feature in distinguishing masses from normal tissue was determined by stepwise linear discriminant analysis. Receiver operating characteristic (ROC) methodology was used to evaluate the classification accuracy. The authors investigated the dependence of classification accuracy on the input features, and on the pixel distance and bit depth in the construction of the SGLD matrices. It was found that five of the texture features were important for the classification. The dependence of classification accuracy on distance and bit depth was weak for distances greater than 12 pixels and bit depths greater than seven bits. By randomly and equally dividing the data set into two groups, the classifier was trained and tested on independent data sets. The classifier achieved an average area under the ROC curve, Az, of 0.84 during training and 0.82 during testing. The results demonstrate the feasibility of using linear discriminant analysis in the texture feature space for classification of true and false detections of masses on mammograms in a computer-aided diagnosis scheme.
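The SGLD (co-occurrence) matrix itself is straightforward to compute; a self-contained numpy sketch for a single offset, showing two of the eight features (the exact feature set used in the study may differ):

```python
import numpy as np

def sgld_matrix(img, dx=1, dy=0, levels=16):
    """Spatial grey level dependence matrix for pixel offset (dx, dy)."""
    q = (img.astype(float) / img.max() * (levels - 1)).astype(int)  # quantize
    h, w = q.shape
    P = np.zeros((levels, levels))
    for i in range(h - dy):
        for j in range(w - dx):
            P[q[i, j], q[i + dy, j + dx]] += 1
    P = P + P.T                     # symmetric counts
    return P / P.sum()

def texture_features(P):
    i, j = np.indices(P.shape)
    return {
        "energy": float((P ** 2).sum()),
        "contrast": float(((i - j) ** 2 * P).sum()),
    }

rng = np.random.default_rng(3)
roi = rng.integers(0, 256, size=(64, 64))         # stand-in for a mammogram ROI
print(texture_features(sgld_matrix(roi, dx=12)))  # large pixel distance, as in the study
```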

Journal ArticleDOI
TL;DR: An extension to the “best-basis” method to select an orthonormal basis suitable for signal/image classification problems from a large collection of orthonormal bases consisting of wavelet packets or local trigonometric bases is described, along with a method to extract the signal component from data consisting of signal and textured background.
Abstract: We describe an extension to the “best-basis” method to select an orthonormal basis suitable for signal/image classification problems from a large collection of orthonormal bases consisting of wavelet packets or local trigonometric bases. The original best-basis algorithm selects a basis minimizing entropy from such a “library of orthonormal bases,” whereas the proposed algorithm selects a basis maximizing a certain discriminant measure (e.g., relative entropy) among classes. Once such a basis is selected, a small number of the most significant coordinates (features) are fed into a traditional classifier such as Linear Discriminant Analysis (LDA) or Classification and Regression Tree (CART). The performance of these statistical methods is enhanced since the proposed methods reduce the dimensionality of the problem at hand without losing important information for that problem. Here, the basis functions which are well-localized in the time-frequency plane are used as feature extractors. We applied our method to two signal classification problems and an image texture classification problem. These experiments show the superiority of our method over the direct application of these classifiers on the input signals. As a further application, we also describe a method to extract the signal component from data consisting of signal and textured background.

Journal ArticleDOI
TL;DR: In this paper, the authors consider some ways of estimating linear discriminant functions without such prior selection, and several spectroscopic data sets are analysed with each method, and questions of bias of assessment procedures are investigated.
Abstract: SUMMARY Currently popular techniques such as experimental spectroscopy and computer-aided molecular modelling lead to data having very many variables observed on each of relatively few individuals. A common objective is discrimination between two or more groups, but the direct application of standard discriminant methodology fails because of singularity of covariance matrices. The problem has been circumvented in the past by prior selection of a few transformed variables, using either principal component analysis or partial least squares. Although such selection ensures nonsingularity of matrices, the decision process is arbitrary and valuable information on group structure may be lost. We therefore consider some ways of estimating linear discriminant functions without such prior selection. Several spectroscopic data sets are analysed with each method, and questions of bias of assessment procedures are investigated. All proposed methods seem worthy of consideration in practice.

Journal ArticleDOI
TL;DR: In this paper, the authors used wavelet transforms to describe and recognize isolated cardiac beats and evaluated their capability of discriminating between normal, premature ventricular contraction, and ischemic beats by means of linear discriminant analysis.
Abstract: The authors' study made use of wavelet transforms to describe and recognize isolated cardiac beats. The choice of the wavelet family, as well as the selection of the analyzing function within these families, is discussed. The criterion used in the first case was the correct classification rate, and in the second case, the correlation coefficient between the original pattern and the reconstructed one. Two types of description have been considered: the energy-based representation and the extrema distribution estimated at each decomposition level. Their quality has been assessed by using principal component analysis, and their capability of discriminating between normal, premature ventricular contraction, and ischemic beats has been studied by means of linear discriminant analysis. This work also leads, for the problem at hand, to the identification of the most relevant resolution levels.

Journal ArticleDOI
TL;DR: The study examines the effectiveness of different neural networks in predicting bankruptcy filing and demonstrates that the performance of the neural networks tested is sensitive to the choice of variables selected and that the networks cannot be relied upon to “sift through” variables and focus on the most important variables.
Abstract: The study examines the effectiveness of different neural networks in predicting bankruptcy filing. Two approaches for training neural networks, Back-Propagation and Optimal Estimation Theory, are considered. Within the back-propagation training method, four different models (Back-Propagation, Functional Link Back-Propagation With Sines, Pruned Back-Propagation, and Cumulative Predictive Back-Propagation) are tested. The neural networks are compared against traditional bankruptcy prediction techniques such as discriminant analysis, logit, and probit. The results show that the level of Type I and Type II errors varies greatly across techniques. The Optimal Estimation Theory neural network has the lowest level of Type I error and the highest level of Type II error while the traditional statistical techniques have the reverse relationship (i.e., high Type I error and low Type II error). The back-propagation neural networks have intermediate levels of Type I and Type II error. We demonstrate that the performance of the neural networks tested is sensitive to the choice of variables selected and that the networks cannot be relied upon to “sift through” variables and focus on the most important variables (network performance based on the combined set of Ohlson and Altman data was frequently worse than their performance with one of the subsets). It is also important to note that the results are quite sensitive to sampling error. The significant variations across replications for some of the models indicate the sensitivity of the models to variations in the data.

Patent
Thomas D. Arbuckle1
28 Jul 1995
TL;DR: In this article, a system comprising a neural network, or computer, implementing a feature detection and a statistical procedure, together with fuzzy logic for solving the problem of recognition of faces or other objects at multiple resolutions is described.
Abstract: A system comprising a neural network, or computer, implementing a feature detection and a statistical procedure, together with fuzzy logic for solving the problem of recognition of faces (or other objects) at multiple resolutions is described. A plurality of previously described systems for recognizing faces (or other objects) which use local autocorrelation coefficients and linear discriminant analysis are trained on a data set to recognize facial images, each at a particular resolution. In a second training stage, each of the previously described systems is tested on a second training set in which the images presented to the recognition systems have a resolution matching that of the first training set, the statistical performance of this second training stage being used to train a fuzzy combination technique, that of fuzzy integrals. Finally, in a test stage, the results from the classifiers at the multiple resolutions are combined using fuzzy combination to produce an aggregated system whose performance is higher than that of any of the individual systems and shows very good performance relative to all known face recognition systems which operate on similar types of training and testing data; this aggregated system, however, is not limited to the recognition of faces and can be applied to the recognition of other objects.

Journal ArticleDOI
TL;DR: The results showed that higher classification accuracies were generally derived from the artificial neural network, especially when small training sets only were available, and it was apparent that the opportunity of the artificial Neural Network to learn class appearance was influenced by the composition of the training set.
Abstract: Training set characteristics can have a significant effect on the performance of an image classification. In this paper the effects of variations in training set size and composition on the accuracy of classifications of synthetic and remotely sensed data sets by an artificial neural network and discriminant analysis are assessed. Attention is focused on the effects of variations in the overall size of the training set, in terms of the number of training samples, as well as on variations in the size of individual classes in the training set. The results showed that higher classification accuracies were generally derived from the artificial neural network, especially when only small training sets were available. It was also apparent that the opportunity of the artificial neural network to learn class appearance was influenced by the composition of the training set. The results indicated that the size of each class in the training set had an effect similar to that of including a priori probabilities.

Journal Article
TL;DR: In this paper a feed-forward artificial neural network using a variant of the back-propagation learning algorithm was used to classify agricultural crops from synthetic aperture radar data, demonstrating the dependency of the two classification techniques on representative training samples and normally distributed data.
Abstract: Artificial neural networks have considerable potential for the classification of remotely sensed data. In this paper a feed-forward artificial neural network using a variant of the back-propagation learning algorithm was used to classify agricultural crops from synthetic aperture radar data. The performance of the classification, in terms of classification accuracy, was assessed relative to a conventional statistical classifier, a discriminant analysis. Classifications of training data sets showed that the artificial neural network appears able to characterize classes better than the discriminant analysis, with accuracies of up to 98 percent observed. This better characterization of the training data need not, however, translate into a significantly more accurate classification of an independent testing set. The results of a series of classifications are presented which show that in general markedly higher classification accuracies may be obtained from the artificial neural network, except when a priori information on class occurrence is incorporated into the discriminant analysis, when the classification performance was similar to that of the artificial neural network. These and other issues were analyzed further with reference to classifications of synthetic data sets. The results illustrate the dependency of the two classification techniques on representative training samples and normally distributed data.

Journal ArticleDOI
TL;DR: It is found that texture features at large pixel distances are important for the classification task and a linear discriminant classifier using the multiresolution texture features can effectively classify masses from normal tissue on mammograms.
Abstract: We investigated the feasibility of using multiresolution texture analysis for differentiation of masses from normal breast tissue on mammograms. The wavelet transform was used to decompose regions of interest (ROIs) on digitized mammograms into several scales. Multiresolution texture features were calculated from the spatial gray level dependence matrices of (1) the original images at variable distances between the pixel pairs, (2) the wavelet coefficients at different scales, and (3) the wavelet coefficients up to a certain scale and then at variable distances between the pixel pairs. In this study, 168 ROIs containing biopsy-proven masses and 504 ROIs containing normal parenchyma were used as the data set. The mass ROIs were randomly and equally divided into training and test groups along with corresponding normal ROIs from the same film. Stepwise linear discriminant analysis was used to select optimal features from the multiresolution texture feature space to maximize the separation of mass and normal tissue for all ROIs. We found that texture features at large pixel distances are important for the classification task. The wavelet transform can effectively condense the image information into its coefficients. With texture features based on the wavelet coefficients and variable distances, the area Az under the receiver operating characteristic curve reached 0.89 and 0.86 for the training and test groups, respectively. The results demonstrate that a linear discriminant classifier using the multiresolution texture features can effectively classify masses from normal tissue on mammograms.
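A sketch of the multiresolution step, assuming the PyWavelets (pywt) package; texture features such as the SGLD features sketched earlier would then be computed from each coefficient subband:

```python
import numpy as np
import pywt  # assumes the PyWavelets package is available

rng = np.random.default_rng(4)
roi = rng.normal(size=(64, 64))             # stand-in for a mammogram ROI

# Decompose the ROI into 3 scales of wavelet coefficients.
coeffs = pywt.wavedec2(roi, wavelet='db4', level=3)
approx, details = coeffs[0], coeffs[1:]     # detail subbands, coarsest to finest
for depth, (cH, cV, cD) in enumerate(details, start=1):
    energy = sum(float((c ** 2).mean()) for c in (cH, cV, cD))
    print(depth, round(energy, 3))          # per-scale energy of the detail subbands
```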

Journal ArticleDOI
TL;DR: In this paper, the authors compared the performance of discriminant analysis and back-propagation neural networks in predicting reservoir properties by considering log and core data from a shaly glauconitic reservoir.
Abstract: The application of a genetic reservoir characterisation concept to the calculation of petrophysical properties requires the prediction of lithofacies, followed by the assignment of petrophysical properties according to the specific lithofacies predicted. Common classification methods which fulfil this task include discriminant analysis and back-propagation neural networks. While discriminant analysis is a well-established statistical classification method, back-propagation neural networks are relatively new, and their performance in predicting lithofacies, porosity, and permeability, when compared to discriminant analysis, has not been widely studied. This work compares the performance of these two methods in the prediction of reservoir properties by considering log and core data from a shaly glauconitic reservoir.

Proceedings Article
01 Jan 1995
TL;DR: A gene structure prediction system FGENE has been developed based on the exon recognition functions and compares very favorably with the other programs currently used to predict protein-coding regions.
Abstract: Development of advanced techniques to identify gene structure is one of the main challenges of the Human Genome Project. Discriminant analysis was applied to the construction of recognition functions for various components of gene structure. Linear discriminant functions for splice sites, 5'-coding, internal exon, and 3'-coding region recognition have been developed. A gene structure prediction system, FGENE, has been developed based on the exon recognition functions. We compute a graph of mutual compatibility of different exons and represent gene structure models as paths of this directed acyclic graph. For optimal model selection we apply a variant of a dynamic programming algorithm to search for the path in the graph with the maximal value of the corresponding discriminant functions. Prediction by FGENE for 185 complete human gene sequences has 81% exact exon recognition accuracy and 91% accuracy at the level of individual exon nucleotides, with a correlation coefficient (C) of 0.90. Testing FGENE on 35 genes not used in the development of discriminant functions shows 71% accuracy of exact exon prediction and 89% at the nucleotide level (C = 0.86). FGENE compares very favorably with the other programs currently used to predict protein-coding regions. Analysis of uncharacterized human sequences based on our methods for splice site (HSPL, RNASPL),
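The optimal-model search the abstract describes reduces to a longest-path problem over a DAG of compatible exons; a hypothetical sketch (the exon coordinates, scores, and the compatibility test are stand-ins for FGENE's discriminant functions and splice-site checks):

```python
# Candidate exons as (start, end, discriminant_score); the scores are made up.
exons = [(10, 90, 2.1), (120, 200, 1.4), (150, 260, 3.0), (300, 410, 2.5)]

def compatible(a, b):
    # Stand-in compatibility test: b starts after a ends. FGENE additionally
    # checks splice-site signals and reading-frame consistency.
    return b[0] > a[1]

def best_gene_model(exons):
    exons = sorted(exons, key=lambda e: e[1])   # topological order by end position
    best = [e[2] for e in exons]                # best total score ending at each exon
    prev = [None] * len(exons)
    for i, e in enumerate(exons):
        for j in range(i):
            if compatible(exons[j], e) and best[j] + e[2] > best[i]:
                best[i], prev[i] = best[j] + e[2], j
    i = max(range(len(exons)), key=best.__getitem__)
    path, score = [], best[i]
    while i is not None:                        # trace back the maximal path
        path.append(exons[i])
        i = prev[i]
    return path[::-1], score

print(best_gene_model(exons))
# ([(10, 90, 2.1), (150, 260, 3.0), (300, 410, 2.5)], 7.6)
```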

Proceedings Article
27 Nov 1995
TL;DR: A locally adaptive form of nearest neighbor classification is proposed to try to finesse this curse of dimensionality, together with a method for global dimension reduction that combines local dimension information.
Abstract: Nearest neighbor classification expects the class conditional probabilities to be locally constant, and suffers from bias in high dimensions. We propose a locally adaptive form of nearest neighbor classification to try to finesse this curse of dimensionality. We use a local linear discriminant analysis to estimate an effective metric for computing neighborhoods. We determine the local decision boundaries from centroid information, and then shrink neighborhoods in directions orthogonal to these local decision boundaries, and elongate them parallel to the boundaries. Thereafter, any neighborhood-based classifier can be employed, using the modified neighborhoods. We also propose a method for global dimension reduction that combines local dimension information. We indicate how these techniques can be extended to the regression problem.
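A compact numpy sketch of the adaptive-metric idea, simplified to a single iteration (the epsilon-softened metric follows the construction described in the abstract; parameter values are illustrative):

```python
import numpy as np

def adaptive_metric(X, y, x0, k=50, eps=1.0):
    """Local LDA metric: shrink neighborhoods orthogonal to the local decision
    boundary and elongate them parallel to it."""
    nb = np.argsort(np.linalg.norm(X - x0, axis=1))[:k]   # plain k-neighborhood
    Xn, yn = X[nb], y[nb]
    mu = Xn.mean(axis=0)
    p = X.shape[1]
    W = np.zeros((p, p))
    B = np.zeros((p, p))
    for c in np.unique(yn):
        Xc = Xn[yn == c]
        mc = Xc.mean(axis=0)
        W += (Xc - mc).T @ (Xc - mc) / len(Xn)               # within-class scatter
        B += len(Xc) / len(Xn) * np.outer(mc - mu, mc - mu)  # between-class scatter
    vals, vecs = np.linalg.eigh(W)
    W_isqrt = vecs @ np.diag(1.0 / np.sqrt(np.maximum(vals, 1e-8))) @ vecs.T
    Bstar = W_isqrt @ B @ W_isqrt
    # Sigma = W^{-1/2} (W^{-1/2} B W^{-1/2} + eps*I) W^{-1/2}
    return W_isqrt @ (Bstar + eps * np.eye(p)) @ W_isqrt

def adaptive_nn_predict(X, y, x0, k_metric=50, k_vote=5):
    Sigma = adaptive_metric(X, y, x0, k=k_metric)
    diff = X - x0
    d2 = np.einsum('ij,jk,ik->i', diff, Sigma, diff)      # adaptive distances
    votes = y[np.argsort(d2)[:k_vote]]                    # y: integer class labels
    return np.bincount(votes).argmax()

rng = np.random.default_rng(5)
X = np.vstack([rng.normal(0, 1, (100, 10)), rng.normal(1, 1, (100, 10))])
y = np.repeat([0, 1], 100)
print(adaptive_nn_predict(X, y, X[0]))
```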

Proceedings ArticleDOI
20 Jun 1995
TL;DR: A multiclass, multivariate discriminant analysis to automatically select the most discriminating features (MDF), a space partition tree to achieve a logarithmic retrieval time complexity for a database of n items, and a general interpolation scheme to do view inference and generalization in the MDF space based on a small number of training samples are presented.
Abstract: We present a self-organizing framework called the SHOSLIF-M for learning and recognizing spatiotemporal events (or patterns) from intensity image sequences. The proposed framework consists of a multiclass, multivariate discriminant analysis to automatically select the most discriminating features (MDF), a space partition tree to achieve a logarithmic retrieval time complexity for a database of n items, and a general interpolation scheme to do view inference and generalization in the MDF space based on a small number of training samples. The system is tested to recognize 28 different hand signs. The experimental results show that the learned system can achieve a 96% recognition rate for test sequences that have not been used in the training phase.

Journal ArticleDOI
TL;DR: The authors explore the potential of artificial neural networks in assisting industrial marketers faced with a segmentation problem by comparing their classification ability with discriminant analysis and logistic regression.

Journal ArticleDOI
TL;DR: The neural network solutions do not achieve the 'magical' results that literature in this field often promises, although there are notable 'pockets' of superior performance by the neural networks, depending on particular combinations of proportions of bankrupt firms in training and testing data sets and assumptions about the relative costs of Type I and Type II errors.
Abstract: This paper investigates the performance of Artificial Neural Networks for the classification and subsequent prediction of business entities into failed and non-failed classes. Two techniques, back-propagation and Optimal Estimation Theory (OET), are used to train the neural networks to predict bankruptcy filings. The data are drawn from Compustat data tapes representing a cross-section of industries. The results obtained with the neural networks are compared with other well-known bankruptcy prediction techniques such as discriminant analysis, probit and logit, as well as against benchmarks provided by directly applying the bankruptcy prediction models developed by Altman (1968) and Ohlson (1980) to our data set. We control the degree of 'disproportionate sampling' by creating 'training' and 'testing' populations with proportions of bankrupt firms ranging from 1% to 50%. For each population, we apply each technique 50 times to determine stable accuracy rates in terms of Type I, Type II and Total Error. We show that the performance of various classification techniques, in terms of their classification errors, depends on the proportions of bankrupt firms in the training and testing data sets, the variables used in the models, and assumptions about the relative costs of Type I and Type II errors. The neural network solutions do not achieve the 'magical' results that literature in this field often promises, although there are notable 'pockets' of superior performance by the neural networks, depending on particular combinations of proportions of bankrupt firms in training and testing data sets and assumptions about the relative costs of Type I and Type II errors. However, since we tested only one architecture for the neural network, it will be necessary to investigate potential improvements in neural network performance through systematic changes in neural network architecture.
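The Type I/Type II trade-off discussed here follows directly from the confusion matrix; a minimal sketch (the cost values are illustrative, not the paper's):

```python
import numpy as np

def error_report(y_true, y_pred, cost_type1=10.0, cost_type2=1.0):
    """Type I error: a bankrupt firm (label 1) classified as healthy.
    Type II error: a healthy firm (label 0) classified as bankrupt."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    bankrupt, healthy = y_true == 1, y_true == 0
    type1 = float(np.mean(y_pred[bankrupt] == 0))
    type2 = float(np.mean(y_pred[healthy] == 1))
    expected_cost = (cost_type1 * type1 * bankrupt.mean()
                     + cost_type2 * type2 * healthy.mean())
    return {"type_I": type1, "type_II": type2, "expected_cost": float(expected_cost)}

print(error_report([1, 1, 0, 0, 0, 1], [1, 0, 0, 1, 0, 1]))
```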

Journal ArticleDOI
TL;DR: In this paper, a new classification strategy called Computerized Consensus Diagnosis (CCD) is proposed to provide robust, reliable classification of biomedical data, which involves cross-validated training of several classifiers of diverse conceptual and methodological origin on the same data, and appropriately combining their outcomes.
Abstract: We introduce and apply a new classification strategy we call computerized consensus diagnosis (CCD). Its purpose is to provide robust, reliable classification of biomedical data. The strategy involves the cross-validated training of several classifiers of diverse conceptual and methodological origin on the same data, and appropriately combining their outcomes. The strategy is tested on proton magnetic resonance spectra of human thyroid biopsies, which are successfully allocated to normal or carcinoma classes. We used Linear Discriminant Analysis, a Neural Net-based method, and Genetic Programming as independent classifiers on two spectral regions, and chose the median of the six classification outcomes as the consensus. This procedure yielded 100% specificity and 100% sensitivity on the training sets, and 100% specificity and 98% sensitivity on samples of known malignancy in the test sets. We discuss the necessary steps any classification approach must take to guarantee reliability, and stress the importance of fuzziness and undecidability in robust classification.
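The combination step itself is a one-liner over cross-validated outputs; a sketch with scikit-learn stand-ins (the paper combined LDA, a neural-network method, and genetic programming on two spectral regions; a decision tree substitutes for genetic programming here):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_predict
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

def consensus_diagnosis(X, y, classifiers):
    """Median of several cross-validated classifier outputs (the CCD idea)."""
    probs = [
        cross_val_predict(clf, X, y, cv=5, method="predict_proba")[:, 1]
        for clf in classifiers
    ]
    return (np.median(probs, axis=0) > 0.5).astype(int)

clfs = [
    LinearDiscriminantAnalysis(),
    MLPClassifier(max_iter=2000, random_state=0),
    DecisionTreeClassifier(random_state=0),   # stand-in for genetic programming
]
X, y = make_classification(n_samples=200, n_features=20, random_state=0)
pred = consensus_diagnosis(X, y, clfs)
print((pred == y).mean())                     # cross-validated consensus accuracy
```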

Journal ArticleDOI
TL;DR: The results indicate that frequency and topographical information about the EEG provides useful knowledge with regard to the nature of cognitive activity.

Journal ArticleDOI
TL;DR: A probabilistic interpretation is presented for two important issues in neural network based classification, namely the interpretation of discriminative training criteria and the neural network outputs as well as the interpretation in terms of weighted maximum likelihood estimation.
Abstract: A probabilistic interpretation is presented for two important issues in neural network based classification, namely the interpretation of discriminative training criteria and the neural network outputs as well as the interpretation of the structure of the neural network. The problem of finding a suitable structure of the neural network can be linked to a number of well established techniques in statistical pattern recognition. Discriminative training of neural network outputs amounts to approximating the class or posterior probabilities of the classical statistical approach. This paper extends these links by introducing and analyzing novel criteria such as maximizing the class probability and minimizing the smoothed error rate. These criteria are defined in the framework of class conditional probability density functions. We show that these criteria can be interpreted in terms of weighted maximum likelihood estimation. In particular, this approach covers widely used techniques such as corrective training, learning vector quantization, and linear discriminant analysis.

Journal ArticleDOI
TL;DR: This work presents a new approach to solving classification problems by combining the predictions of a well-known statistical tool with those of an NN to create composite predictions that are more accurate than either of the individual techniques used in isolation.
Abstract: A number of recent studies have compared the performance of neural networks (NNs) to a variety of statistical techniques for the classification problem in discriminant analysis. The empirical results of these comparative studies indicate that while NNs often outperform the more traditional statistical approaches to classification, this is not always the case. Thus, decision makers interested in solving classification problems are left in a quandary as to what tool to use on a particular data set. We present a new approach to solving classification problems by combining the predictions of a well-known statistical tool with those of an NN to create composite predictions that are more accurate than either of the individual techniques used in isolation.

Journal ArticleDOI
TL;DR: New techniques that apply machine learning and discriminant analysis show promise as alternatives to neural networks in protein secondary structure prediction methods.