
Showing papers on "Linear discriminant analysis published in 2001"


Journal ArticleDOI
TL;DR: In this article, the authors show that when the training data set is small, PCA can outperform LDA and, also, that PCA is less sensitive to different training data sets.
Abstract: In the context of the appearance-based paradigm for object recognition, it is generally believed that algorithms based on LDA (linear discriminant analysis) are superior to those based on PCA (principal components analysis). In this communication, we show that this is not always the case. We present our case first by using intuitively plausible arguments and, then, by showing actual results on a face database. Our overall conclusion is that when the training data set is small, PCA can outperform LDA and, also, that PCA is less sensitive to different training data sets.
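A minimal sketch of this kind of small-sample comparison, using scikit-learn with the bundled digits data standing in for a face database (the dataset, component counts and nearest-neighbour classifier are illustrative assumptions, not the authors' protocol):

from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Compare PCA- and LDA-based nearest-neighbour recognition as the training
# set shrinks; with very few training samples per class the PCA pipeline
# often holds up better, which is the paper's point.
X, y = load_digits(return_X_y=True)
for train_size in (0.05, 0.2, 0.5):
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, train_size=train_size, stratify=y, random_state=0)
    for name, reducer in (("PCA", PCA(n_components=20)),
                          ("LDA", LinearDiscriminantAnalysis(n_components=9))):
        clf = make_pipeline(reducer, KNeighborsClassifier(n_neighbors=1))
        clf.fit(X_tr, y_tr)
        print(f"train fraction {train_size:.2f}  {name}: {clf.score(X_te, y_te):.3f}")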

3,102 citations


Journal ArticleDOI
TL;DR: For a task with very high-dimensional data such as images, however, the traditional LDA algorithm encounters several difficulties: another procedure has to be applied for dimensionality reduction before LDA itself can be used.

1,579 citations


Journal ArticleDOI
TL;DR: In this paper, a comparative study using three different chemometric techniques is presented to evaluate both spatial and temporal changes in Suquia River water quality, with special emphasis on the improvement obtained by using discriminant analysis for such an evaluation.

859 citations


Journal ArticleDOI
TL;DR: A class of computationally inexpensive linear dimension reduction criteria is derived by introducing a weighted variant of the well-known K-class Fisher criterion associated with linear discriminant analysis (LDA).
Abstract: We derive a class of computationally inexpensive linear dimension reduction criteria by introducing a weighted variant of the well-known K-class Fisher criterion associated with linear discriminant analysis (LDA). It can be seen that LDA weights contributions of individual class pairs according to the Euclidean distance of the respective class means. We generalize upon LDA by introducing a different weighting function.
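A sketch of the weighted criterion in NumPy/SciPy; the particular weighting function below (inverse squared distance between class means) is only an illustrative choice, since the paper derives its own family of weighting functions:

import numpy as np
from scipy.linalg import eigh
from sklearn.datasets import load_iris

def weighted_lda(X, y, weight_fn=lambda d: 1.0 / (d ** 2 + 1e-12), n_dims=2):
    # Weighted variant of the K-class Fisher criterion: each pair of classes
    # contributes to the between-class scatter with a weight that depends on
    # the distance between its class means (weight_fn(d) = 1 recovers LDA).
    classes = np.unique(y)
    means = {c: X[y == c].mean(axis=0) for c in classes}
    priors = {c: np.mean(y == c) for c in classes}
    d = X.shape[1]
    Sw = sum(priors[c] * np.cov(X[y == c], rowvar=False) for c in classes)
    Sb = np.zeros((d, d))
    for i, ci in enumerate(classes):
        for cj in classes[i + 1:]:
            diff = (means[ci] - means[cj])[:, None]
            Sb += priors[ci] * priors[cj] * weight_fn(np.linalg.norm(diff)) * (diff @ diff.T)
    vals, vecs = eigh(Sb, Sw + 1e-6 * np.eye(d))      # generalized eigenproblem
    return vecs[:, np.argsort(vals)[::-1][:n_dims]]

X, y = load_iris(return_X_y=True)
print(weighted_lda(X, y).shape)                        # (4, 2) projection matrix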

471 citations


Journal ArticleDOI
TL;DR: By extracting uncorrelated discriminant features, face recognition could be performed with higher accuracy on mosaic images of resolution lower than 16×16, and it is suggested that the optimal face image resolution is the resolution m × n for which the dimensionality N = mn of the original image vector space is larger than, and close to, the number of known face classes.

383 citations


Journal ArticleDOI
TL;DR: In this paper, a landslide susceptibility map, based on the scores of the discriminant function, has been prepared for Ensija range in the Eastern Pyrenees, and an index of relative landslide density has been obtained.
Abstract: Several multivariate statistical analyses have been performed to identify the most influential geological and geomorphological parameters on shallow landsliding and to quantify their relative contribution. A data set was first prepared including more than 30 attributes of 230 failed and unfailed slopes. The performance of principal component analysis, t-test and one-way test, allowed a preliminary selection of the most significant variables, which were used as input variables for the discriminant analysis. The function obtained has classified successfully 88·5 per cent of the overall slope population and 95·6 per cent of the failed slopes. Slope gradient, watershed area and land-use appeared as the most powerful discriminant factors. A landslide susceptibility map, based on the scores of the discriminant function, has been prepared for Ensija range in the Eastern Pyrenees. An index of relative landslide density shows that the results of the map are consistent. Copyright © 2001 John Wiley & Sons, Ltd.

367 citations


Journal ArticleDOI
TL;DR: FLDA can be used to produce classifications on new (test) curves, give an estimate of the discriminant function between classes and provide a one‐ or two‐dimensional pictorial representation of a set of curves.
Abstract: We introduce a technique for extending the classical method of linear discriminant analysis (LDA) to data sets where the predictor variables are curves or functions. This procedure, which we call functional linear discriminant analysis (FLDA), is particularly useful when only fragments of the curves are observed. All the techniques associated with LDA can be extended for use with FLDA. In particular FLDA can be used to produce classifications on new (test) curves, give an estimate of the discriminant function between classes and provide a one- or two-dimensional pictorial representation of a set of curves. We also extend this procedure to provide generalizations of quadratic and regularized discriminant analysis.
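The spirit of the method can be sketched by representing each (possibly fragmentary) curve with basis coefficients and running ordinary LDA on those coefficients; note this is a deliberate simplification using a polynomial basis and least squares, whereas the paper's FLDA uses a spline basis within a mixed-effects model:

import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def curve_to_coeffs(t, x, degree=3):
    # Summarize an irregularly sampled curve fragment by the coefficients of
    # a low-order polynomial fitted by least squares.
    return np.polyfit(t, x, degree)

rng = np.random.default_rng(0)
coeffs, labels = [], []
for cls, shift in ((0, 0.0), (1, 0.8)):
    for _ in range(40):
        t = np.sort(rng.uniform(0, 1, rng.integers(8, 15)))   # fragmentary sampling
        x = np.sin(2 * np.pi * t) + shift * t + rng.normal(0, 0.1, t.size)
        coeffs.append(curve_to_coeffs(t, x))
        labels.append(cls)

flda_like = LinearDiscriminantAnalysis().fit(np.array(coeffs), labels)
print("training accuracy:", flda_like.score(np.array(coeffs), labels))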

309 citations


Journal ArticleDOI
TL;DR: In this paper, the authors present a discussion of the collinearity problem in regression and discriminant analysis, and explain the reasons why it is a problem for the prediction ability and classification ability of the classical methods.
Abstract: This paper presents a discussion of the collinearity problem in regression and discriminant analysis. The paper describes reasons why the collinearity is a problem for the prediction ability and classification ability of the classical methods. The discussion is based on established formulae for prediction errors. Special emphasis is put on differences and similarities between regression and classification. Some typical ways of handling the collinearity problems based on PCA will be described. The theoretical discussion will be accompanied by empirical illustrations.
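A small illustration (with synthetic collinear predictors, not the paper's examples) of the typical PCA-based handling the paper describes, where classification is done on a few principal component scores instead of the raw collinear variables:

import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

# Highly collinear predictors: 50 variables driven by 3 latent factors.
rng = np.random.default_rng(1)
latent = rng.normal(size=(200, 3))
X = latent @ rng.normal(size=(3, 50)) + 0.05 * rng.normal(size=(200, 50))
y = (latent[:, 0] + 0.2 * rng.normal(size=200) > 0).astype(int)

plain = LinearDiscriminantAnalysis()                      # works directly on collinear X
pca_then_lda = make_pipeline(StandardScaler(), PCA(n_components=3),
                             LinearDiscriminantAnalysis())
for name, model in (("LDA on raw X", plain), ("PCA scores + LDA", pca_then_lda)):
    print(name, cross_val_score(model, X, y, cv=5).mean().round(3))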

273 citations


Journal ArticleDOI
TL;DR: A general framework is proposed that incorporates feature (gene) selection into pattern recognition in the process of identifying biomarkers, and three feature wrappers are developed that search through the space of feature subsets using the classification error of the classifier being "wrapped around" as a measure of goodness for a particular feature subset.
Abstract: Gene expression studies bridge the gap between DNA information and trait information by dissecting biochemical pathways into intermediate components between genotype and phenotype. These studies open new avenues for identifying complex disease genes and biomarkers for disease diagnosis and for assessing drug efficacy and toxicity. However, the majority of analytical methods applied to gene expression data are not efficient for biomarker identification and disease diagnosis. In this paper, we propose a general framework to incorporate feature (gene) selection into pattern recognition in the process of identifying biomarkers. Using this framework, we develop three feature wrappers that search through the space of feature subsets using the classification error as a measure of goodness for a particular feature subset; the classifiers "wrapped around" are linear discriminant analysis, logistic regression, and support vector machines. To effectively carry out this computationally intensive search process, we employ sequential forward search and sequential forward floating search algorithms. To evaluate the performance of feature selection for biomarker identification, we have applied the proposed methods to three data sets. The preliminary results demonstrate that very high classification accuracy can be attained by identified composite classifiers with several biomarkers.
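A sketch of the wrapper idea with one of the three classifiers (LDA) and scikit-learn's sequential forward search; the breast-cancer data is only a stand-in for gene expression profiles, and the floating (SFFS) variant used in the paper is not implemented here:

from sklearn.datasets import load_breast_cancer
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# The classifier's cross-validated performance scores each candidate subset
# while features are added one at a time ("wrapped around" the search).
X, y = load_breast_cancer(return_X_y=True)
lda = LinearDiscriminantAnalysis()
selector = SequentialFeatureSelector(lda, n_features_to_select=5,
                                     direction="forward", cv=5)
model = make_pipeline(StandardScaler(), selector, lda)
print("CV accuracy with 5 selected features:",
      cross_val_score(model, X, y, cv=5).mean().round(3))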

271 citations


Journal ArticleDOI
TL;DR: The new coding and face recognition method, EFC, performs the best among the eigenfaces method using the L1 or L2 distance measure, and the Mahalanobis distance classifiers using a common covariance matrix for all classes or a pooled within-class covariance matrix.
Abstract: This paper introduces a new face coding and recognition method, the enhanced Fisher classifier (EFC), which employs the enhanced Fisher linear discriminant model (EFM) on integrated shape and texture features. Shape encodes the feature geometry of a face while texture provides a normalized shape-free image. The dimensionalities of the shape and the texture spaces are first reduced using principal component analysis, constrained by the EFM for enhanced generalization. The corresponding reduced shape and texture features are then combined through a normalization procedure to form the integrated features that are processed by the EFM for face recognition. Experimental results, using 600 face images corresponding to 200 subjects of varying illumination and facial expressions, show that (1) the integrated shape and texture features carry the most discriminating information followed in order by textures, masked images, and shape images, and (2) the new coding and face recognition method, EFC, performs the best among the eigenfaces method using the L1 or L2 distance measure, and the Mahalanobis distance classifiers using a common covariance matrix for all classes or a pooled within-class covariance matrix. In particular, EFC achieves 98.5% recognition accuracy using only 25 features.

263 citations


Journal ArticleDOI
TL;DR: Results showed that the MI is an effective feature selection criterion for nonlinear CAD models overcoming some of the well-known limitations and computational complexities of other popular feature selection techniques in the field.
Abstract: The purpose of this study was to investigate an information theoretic approach to feature selection for computer-aided diagnosis (CAD). The approach is based on the mutual information (MI) concept. MI measures the general dependence of random variables without making any assumptions about the nature of their underlying relationships. Consequently, MI can potentially offer some advantages over feature selection techniques that focus only on the linear relationships of variables. This study was based on a database of statistical texture features extracted from perfusion lung scans. The ultimate goal was to select the optimal subset of features for the computer-aided diagnosis of acute pulmonary embolism (PE). Initially, the study addressed issues regarding the approximation of MI in a limited dataset, as is often the case in CAD applications. The MI selected features were compared to those features selected using stepwise linear discriminant analysis and genetic algorithms for the same PE database. Linear and nonlinear decision models were implemented to merge the selected features into a final diagnosis. Results showed that the MI is an effective feature selection criterion for nonlinear CAD models, overcoming some of the well-known limitations and computational complexities of other popular feature selection techniques in the field.
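A minimal sketch of MI-based feature selection with scikit-learn (the breast-cancer features stand in for the lung-scan texture features, and LDA stands in for the decision models used in the study):

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import mutual_info_classif, SelectKBest
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Rank features by estimated mutual information with the class label,
# keep the top k, and feed them to a simple decision model.
X, y = load_breast_cancer(return_X_y=True)
mi = mutual_info_classif(X, y, random_state=0)
print("top 5 features by MI:", np.argsort(mi)[::-1][:5])

model = make_pipeline(SelectKBest(mutual_info_classif, k=5),
                      LinearDiscriminantAnalysis())
print("CV accuracy:", cross_val_score(model, X, y, cv=5).mean().round(3))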

Proceedings Article
28 Jun 2001
TL;DR: This paper proposes an algorithm for constructing a kernel Fisher discriminant from training examples with noisy labels and utilises an expectation maximization (EM) algorithm for updating the probabilities.
Abstract: Data noise is present in many machine learning problem domains; some of these are well studied but others have received less attention. In this paper we propose an algorithm for constructing a kernel Fisher discriminant (KFD) from training examples with noisy labels. The approach associates with each example a probability of its label having been flipped. We utilise an expectation maximization (EM) algorithm for updating the probabilities. The E-step uses class conditional probabilities estimated as a by-product of the KFD algorithm. The M-step updates the flip probabilities and determines the parameters of the discriminant. We demonstrate the feasibility of the approach on two real-world data sets.

Journal ArticleDOI
TL;DR: The results indicate that combining texture features with morphological features extracted from automatically segmented mass boundaries will be an effective approach for computer-aided characterization of mammographic masses.
Abstract: We are developing new computer vision techniques for characterization of breast masses on mammograms. We had previously developed a characterization method based on texture features. The goal of the present work was to improve our characterization method by making use of morphological features. Toward this goal, we have developed a fully automated, three-stage segmentation method that includes clustering, active contour, and spiculation detection stages. After segmentation, morphological features describing the shape of the mass were extracted. Texture features were also extracted from a band of pixels surrounding the mass. Stepwise feature selection and linear discriminant analysis were employed in the morphological, texture, and combined feature spaces for classifier design. The classification accuracy was evaluated using the area Az under the receiver operating characteristic curve. A data set containing 249 films from 102 patients was used. When the leave-one-case-out method was applied to partition the data set into trainers and testers, the average test Az for the task of classifying the mass on a single mammographic view was 0.83 +/- 0.02, 0.84 +/- 0.02, and 0.87 +/- 0.02 in the morphological, texture, and combined feature spaces, respectively. The improvement obtained by supplementing texture features with morphological features in classification was statistically significant (p = 0.04). For classifying a mass as malignant or benign, we combined the leave-one-case-out discriminant scores from different views of a mass to obtain a summary score. In this task, the test Az value using the combined feature space was 0.91 +/- 0.02. Our results indicate that combining texture features with morphological features extracted from automatically segmented mass boundaries will be an effective approach for computer-aided characterization of mammographic masses.
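In outline, the evaluation pipeline looks like the sketch below (forward selection plus LDA, with leave-one-out discriminant scores summarized by the ROC area Az); it is a simplification in that selection is done once on all data rather than inside the leave-one-case-out loop, and a generic dataset stands in for the mammographic features:

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.model_selection import LeaveOneOut
from sklearn.metrics import roc_auc_score

X, y = load_breast_cancer(return_X_y=True)
lda = LinearDiscriminantAnalysis()
# Stepwise-style forward selection of a small feature subset.
sel = SequentialFeatureSelector(lda, n_features_to_select=6, cv=5).fit(X, y)
Xs = sel.transform(X)

# Leave-one-out discriminant scores, then the ROC area as the figure of merit.
scores = np.empty(len(y))
for train, test in LeaveOneOut().split(Xs):
    scores[test] = lda.fit(Xs[train], y[train]).decision_function(Xs[test])
print("Az (ROC area):", roc_auc_score(y, scores).round(3))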

Proceedings ArticleDOI
07 May 2001
TL;DR: New acoustic features for continuous speech recognition based on the short-term Fourier phase spectrum are introduced for mono (telephone) recordings and improvements in word error rate are obtained.
Abstract: New acoustic features for continuous speech recognition based on the short-term Fourier phase spectrum are introduced for mono (telephone) recordings. The new phase based features were combined with standard Mel Frequency Cepstral Coefficients (MFCC), and results were produced with and without using additional linear discriminant analysis (LDA) to choose the most relevant features. Experiments were performed on the SieTill corpus for telephone line recorded German digit strings. Using LDA to combine purely phase based features with MFCCs, we obtained improvements in word error rate of up to 25% relative to using MFCCs alone with the same overall number of parameters in the system.
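The LDA step can be pictured as follows: per-frame MFCC and phase-based vectors are concatenated, and LDA trained on frame labels keeps the most discriminative linear combinations. The sketch uses random stand-in data and invented dimensions; a real recognizer would use phoneme or HMM-state labels and stacked context frames:

import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
n_frames, n_states = 5000, 20
labels = rng.integers(0, n_states, n_frames)              # frame-level state ids
mfcc = rng.normal(size=(n_frames, 13)) + 0.05 * labels[:, None]
phase = rng.normal(size=(n_frames, 12)) + 0.10 * (labels[:, None] % 5)
combined = np.hstack([mfcc, phase])                        # 25-dimensional frames

# Project onto the 16 most discriminative directions of the combined stream.
lda = LinearDiscriminantAnalysis(n_components=16).fit(combined, labels)
print(lda.transform(combined).shape)                       # (5000, 16)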

Journal ArticleDOI
TL;DR: This paper investigates the use of Fisher's linear discriminant function, coupled with feature selection techniques as a means for selecting neural network training features for power system security assessment.
Abstract: One of the most important considerations in applying neural networks to power system security assessment is the proper selection of training features. Modern interconnected power systems often consist of thousands of pieces of equipment, each of which may have an effect on the security of the system. Neural networks have shown great promise for their ability to quickly and accurately predict the system security when trained with data collected from a small subset of system variables. This paper investigates the use of Fisher's linear discriminant function, coupled with feature selection techniques, as a means for selecting neural network training features for power system security assessment. A case study is performed on the IEEE 50-generator system to illustrate the effectiveness of the proposed techniques.
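A hedged sketch of the underlying idea, ranking candidate system variables by a per-feature Fisher discriminant ratio before training the network (synthetic data; the actual paper couples this criterion with further feature selection techniques):

import numpy as np

def fisher_ratio(x, y):
    # Two-class Fisher criterion for one candidate feature: squared difference
    # of the class means divided by the sum of the class variances.
    x0, x1 = x[y == 0], x[y == 1]
    return (x0.mean() - x1.mean()) ** 2 / (x0.var() + x1.var() + 1e-12)

rng = np.random.default_rng(0)
y = rng.integers(0, 2, 1000)                     # secure / insecure system states
X = rng.normal(size=(1000, 40))                  # candidate variables (flows, voltages, ...)
X[:, 3] += 1.5 * y                               # one genuinely informative variable
ratios = np.array([fisher_ratio(X[:, j], y) for j in range(X.shape[1])])
print("highest-ranked training features:", np.argsort(ratios)[::-1][:5])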

Journal ArticleDOI
TL;DR: Graphical methods for discriminant analysis are presented that can be viewed as pre-processors, aiding the analyst's understanding of the data and the choice of a final classifier; a permutation test is suggested as a means of determining dimension, and examples are given throughout the discussion.
Abstract: This paper discusses visualization methods for discriminant analysis. It does not address numerical methods for classification per se, but rather focuses on graphical methods that can be viewed as pre-processors, aiding the analyst's understanding of the data and the choice of a final classifier. The methods are adaptations of recent results in dimension reduction for regression, including sliced inverse regression and sliced average variance estimation. A permutation test is suggested as a means of determining dimension, and examples are given throughout the discussion.

Proceedings ArticleDOI
09 Dec 2001
TL;DR: This paper reports on methods for automatic classification of spoken utterances based on the emotional state of the speaker based on a corpus of human-machine dialogues recorded from a commercial application deployed by SpeechWorks.
Abstract: This paper reports on methods for automatic classification of spoken utterances based on the emotional state of the speaker. The data set used for the analysis comes from a corpus of human-machine dialogues recorded from a commercial application deployed by SpeechWorks. Linear discriminant classification with Gaussian class-conditional probability distributions and k-nearest neighbor methods are used to classify utterances into two basic emotion states, negative and non-negative. The features used by the classifiers are utterance-level statistics of the fundamental frequency and energy of the speech signal. To improve classification performance, two specific feature selection methods are used, namely promising first selection and forward feature selection. Principal component analysis is used to reduce the dimensionality of the features while maximizing classification accuracy. Improvements obtained by feature selection and PCA are reported.
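A toy sketch of the pipeline on synthetic utterances (the statistics, PCA dimensionality and classifiers below are illustrative; the paper's promising-first selection step is not reproduced):

import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score

def utterance_stats(f0, energy):
    # Utterance-level statistics of the pitch and energy contours.
    feats = []
    for track in (f0, energy):
        feats += [track.mean(), track.std(), track.min(), track.max(),
                  np.percentile(track, 75) - np.percentile(track, 25)]
    return np.array(feats)

rng = np.random.default_rng(0)
labels = rng.integers(0, 2, 300)                 # 1 = negative, 0 = non-negative
X = np.array([utterance_stats(rng.normal(120 + 30 * lab, 20, 150),
                              rng.normal(0.5 + 0.2 * lab, 0.1, 150))
              for lab in labels])

for clf in (LinearDiscriminantAnalysis(), KNeighborsClassifier(n_neighbors=10)):
    model = make_pipeline(StandardScaler(), PCA(n_components=5), clf)
    print(type(clf).__name__, cross_val_score(model, X, labels, cv=5).mean().round(3))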

Journal ArticleDOI
TL;DR: The least-squares fitting problem proposed here is mathematically formalized as a quadratic constrained minimization problem with mixed variables, and an iterative alternating least-squares algorithm based on two main steps is proposed to solve the quadratic constrained problem.

Journal ArticleDOI
TL;DR: By comparing several feature selection methods, this work demonstrates how phenotypic classes can be predicted by combining feature selection and discriminant analysis and shows that the right dimension reduction strategy is of crucial importance for the classification performance.
Abstract: Molecular portraits, such as mRNA expression or DNA methylation patterns, have been shown to be strongly correlated with phenotypical parameters. These molecular patterns can be revealed routinely on a genomic scale. However, class prediction based on these patterns is an under-determined problem, due to the extremely high dimensionality of the data compared to the usually small number of available samples. This makes a reduction of the data dimensionality necessary. Here we demonstrate how phenotypic classes can be predicted by combining feature selection and discriminant analysis. By comparing several feature selection methods we show that the right dimension reduction strategy is of crucial importance for the classification performance. The techniques are demonstrated by methylation pattern based discrimination between acute lymphoblastic leukemia and acute myeloid leukemia.

Journal ArticleDOI
TL;DR: A hybrid method that combines the best features of several classification models is developed to increase the prediction performance, and empirical tests show that such a hybrid method produces higher prediction accuracy than individual classifiers.
Abstract: This paper uses a data mining approach to the prediction of corporate failure. Initially, we use four single classifiers — discriminant analysis, logistic regression, neural networks and C5.0 — each based on two feature selection methods for predicting corporate failure. Of the two feature selection methods — human judgement based on financial theory and ANOVA statistical method — we found the ANOVA method performs better than the human judgement method in all classifiers except discriminant analysis. Among the individual classifiers, decision trees and neural networks were found to provide better results. Finally, a hybrid method that combines the best features of several classification models is developed to increase the prediction performance. The empirical tests show that such a hybrid method produces higher prediction accuracy than individual classifiers.

Journal ArticleDOI
TL;DR: A linear constrained distance-based discriminant analysis (LCDA) is proposed that uses a criterion for optimality derived from Fisher's ratio criterion: it not only maximizes the ratio of inter-class distance to intra-class distance but also imposes a constraint that all class centers must be aligned along predetermined directions.

Book
01 Jan 2001
TL;DR: In this article, the authors present a taxonomy of pattern classification algorithms and their generalization error for different types of classifiers, such as the Fisher Linear Discriminant Function (FLDF), the Anderson-Bahadur Linear DF (ALDF), and the Parzen Window Classifier.
Abstract: 1. Quick Overview.- 1.1 The Classifier Design Problem.- 1.2 Single Layer and Multilayer Perceptrons.- 1.3 The SLP as the Euclidean Distance and the Fisher Linear Classifiers.- 1.4 The Generalisation Error of the EDC and the Fisher DF.- 1.5 Optimal Complexity - The Scissors Effect.- 1.6 Overtraining in Neural Networks.- 1.7 Bibliographical and Historical Remarks.- 2. Taxonomy of Pattern Classification Algorithms.- 2.1 Principles of Statistical Decision Theory.- 2.2 Four Parametric Statistical Classifiers.- 2.2.1 The Quadratic Discriminant Function.- 2.2.2 The Standard Fisher Linear Discriminant Function.- 2.2.3 The Euclidean Distance Classifier.- 2.2.4 The Anderson-Bahadur Linear DF.- 2.3 Structures of the Covariance Matrices.- 2.3.1 A Set of Standard Assumptions.- 2.3.2 Block Diagonal Matrices.- 2.3.3 The Tree Type Dependence Models.- 2.3.4 Temporal Dependence Models.- 2.4 The Bayes Predictive Approach to Design Optimal Classification Rules.- 2.4.1 A General Theory.- 2.4.2 Learning the Mean Vector.- 2.4.3 Learning the Mean Vector and CM.- 2.4.4 Qualities and Shortcomings.- 2.5. Modifications of the Standard Linear and Quadratic DF.- 2.5.1 A Pseudo-Inversion of the Covariance Matrix.- 2.5.2 Regularised Discriminant Analysis (RDA).- 2.5.3 Scaled Rotation Regularisation.- 2.5.4 Non-Gausian Densities.- 2.5.5 Robust Discriminant Analysis.- 2.6 Nonparametric Local Statistical Classifiers.- 2.6.1 Methods Based on Mixtures of Densities.- 2.6.2 Piecewise-Linear Classifiers.- 2.6.3 The Parzen Window Classifier.- 2.6.4 The k-NN Rule and a Calculation Speed.- 2.6.5 Polynomial and Potential Function Classifiers.- 2.7 Minimum Empirical Error and Maximal Margin Linear Classifiers.- 2.7.1 The Minimum Empirical Error Classifier.- 2.7.2 The Maximal Margin Classifier.- 2.7.3 The Support Vector Machine.- 2.8 Piecewise-Linear Classifiers.- 2.8.1 Multimodal Density Based Classifiers.- 2.8.2 Architectural Approach to Design of the Classifiers.- 2.8.3 Decision Tree Classifiers.- 2.9 Classifiers for Categorical Data.- 2.9.1 Multinornial Classifiers.- 2.9.2 Estimation of Parameters.- 2.9.3 Decision Tree and the Multinornial Classifiers.- 2.9.4 Linear Classifiers.- 2.9.5 Nonparametric Local Classifiers.- 2.10 Bibliographical and Historical Remarks.- 3. Performance and the Generalisation Error.- 3.1 Bayes, Conditional, Expected, and Asymptotic Probabilities of Misclassification.- 3.1.1 The Bayes Probability of Misclassification.- 3.1.2 The Conditional Probability of Misclassification.- 3.1.3 The Expected Probability of Misclassification.- 3.1.4 The Asymptotic Probability of Misclassification.- 3.1.5 Learning Curves: An Overview of Different Analysis Methods.- 3.1.6 Error Estimation.- 3.2 Generalisation Error of the Euclidean Distance Classifier.- 3.2.1 The Classification Algorithm.- 3.2.2 Double Asymptotics in the Error Analysis.- 3.2.3 The Spherical Gaussian Case.- 3.2.3.1 The Case N2 = N1.- 3.2.3.2 The Case N2 ? 
N1.- 3.3 Most Favourable and Least Favourable Distributions of the Data.- 3.3.1 The Non-Spherical Gaussian Case.- 3.3.2 The Most Favourable Distributions of the Data.- 3.3.3 The Least Favourable Distributions of the Data.- 3.3.4 Intrinsic Dimensionality.- 3.4 Generalisation Errors for Modifications of the Standard Linear Classifier.- 3.4.1 The Standard Fisher Linear DF.- 3.4.2 The Double Asymptotics for the Expected Error.- 3.4.3 The Conditional Probability of Misc1assification.- 3.4.4 A Standard Deviation of the Conditional Error.- 3.4.5 Favourable and Unfavourable Distributions.- 3.4.6 Theory and Real-World Problems.- 3.4.7 The Linear Classifier D for the Diagonal CM.- 3.4.8 The Pseudo-Fisher Classifier.- 3.4.9 The Regularised Discriminant Analysis.- 3.5 Common Parameters in Different Competing Pattern Classes.- 3.5.1 The Generalisation Error of the Quadratic DF.- 3.5.2 The Effect of Common Parameters in Two Competing Classes.- 3.5.3 Unequal Sampie Sizes in Plug-In Classifiers.- 3.6 Minimum Empirical Error and Maximal Margin Classifiers.- 3.6.1 Favourable Distributions of the Pattern Classes.- 3.6.2 VC Bounds for the Conditional Generalisation Error.- 3.6.3 Unfavourable Distributions for the Euclidean Distance and Minimum Empirical Error Classifiers.- 3.6.4 Generalisation Error in the Spherical Gaussian Case.- 3.6.5 Intrinsic Dimensionality.- 3.6.6 The Influence of the Margin.- 3.6.7 Characteristics of the Learning Curves.- 3.7 Parzen Window Classifier.- 3.7.1 The Decision Boundary of the PW Classifier with Spherical Kerneis.- 3.7.2 The Generalisation Error.- 3.7.3 Intrinsic Dimensionality.- 3.7.4 Optimal Value of the Smoothing Parameter.- 3.7.5 The k-NN Rule.- 3.8 Multinomial Classifier.- 3.9 Bibliographical and Historical Remarks.- 4. Neural Network Classifiers.- 4.1 Training Dynamics of the Single Layer Perceptron.- 4.1.1 The SLP and its Training Rule.- 4.1.2 The SLP as Statistical Classifier.- 4.1.2.1 The Euclidean Distance Classifier.- 4.1.2.2 The Regularised Discriminant Analysis.- 4.1.2.3 The Standard Linear Fisher Classifier.- 4.1.2.4 The Pseudo-Fisher Classifier.- 4.1.2.5 Dynamics of the Magnitudes of the Weights.- 4.1.2.6 The Robust Discriminant Analysis.- 4.1.2.7 The Minimum Empirical Error Classifier.- 4.1.2.8 The Maximum Margin (Support Vector) Classifier.- 4.1.3 Training Dynamics and Generalisation.- 4.2 Non-linear Decision Boundaries.- 4.2.1 The SLP in Transformed Feature Space.- 4.2.2 The MLP Classifier.- 4.2.3 Radial Basis-Function Networks.- 4.2.4 Learning Vector Quantisation Networks.- 4.3 Training Peculiarities of the Perceptrons.- 4.3.1 Cost Function Surfaces of the SLP Classifier.- 4.3.2 Cost Function Surfaces of the MLP Classifier.- 4.3.3 The Gradient Minimisation of the Cost Function.- 4.4 Generalisation of the Perceptrons.- 4.4.1 Single Layer Perceptron.- 4.4.1.1 Theoretical Background.- 4.4.1.2 The Experiment Design.- 4.4.1.3 The SLP and Parametric Classifiers.- 4.4.1.4 The SLP and Structural (Nonparametric) Classifiers.- 4.4.2 Multilayer Perceptron.- 4.4.2.1 Weights of the Hidden Layer Neurones are Common for all Outputs.- 4.4.2.2 Intrinsic Dimensionality Problems.- 4.4.2.3 An Effective Capacity of the Network.- 4.5 Overtraining and Initialisation.- 4.5.1 Overtraining.- 4.5.2 Effect of Initial Values.- 4.6 Tools to Control Complexity.- 4.6.1 The Number of Iterations.- 4.6.2 The Weight Decay Term.- 4.6.3 The Antiregularisation Technique.- 4.6.4 Noise Injection.- 4.6.4.1 Noise Injection into Inputs.- 4.6.4.2 Noise Injection into the Weights and into the Outputs 
of the Network.- 4.6.4.3 "Coloured" Noise Injection into Inputs.- 4.6.5 Control of Target Values.- 4.6.6 The Learning Step.- 4.6.7 Optimal Values of the Training Parameters.- 4.6.8 Learning Step in the Hidden Layer of MLP.- 4.6.9 Sigmoid Scaling.- 4.7 The Co-Operation of the Neural Networks.- 4.7.1 The Boss Decision Rule.- 4.7.2 Small Sampie Problems and Regularisation.- 4.8 Bibliographical and Historical Remarks.- 5. Integration of Statistical and Neural Approaches.- 5.1 Statistical Methods or Neural Nets?.- 5.2 Positive and Negative Attributes of Statistical Pattern Recognition.- 5.3 Positive and Negative Attributes of Artificial Neural Networks.- 5.4 Merging Statistical Classifiers and Neural Networks.- 5.4.1 Three Key Points in the Solution.- 5.4.2 Data Transformation or Statistical Classifier?.- 5.4.3 The Training Speed and Data Whitening Transformation.- 5.4.4 Dynamics of the Classifier after the Data Whitening Transformation.- 5.5 Data Transformations for the Integrated Approach.- 5.5.1 Linear Transformations.- 5.5.2 Non-linear Transformations.- 5.5.3 Performance of the Integrated Classifiers in Solving Real-World Problems.- 5.6 The Statistical Approach in Multilayer Feed-forward Networks.- 5.7 Concluding and Bibliographical Remarks.- 6. Model Selection.- 6.1 Classification Errors and their Estimation Methods.- 6.1.1 Types of Classification Error.- 6.1.2 Taxonomy of Error Rate Estimation Methods.- 6.1.2.1 Methods for Splitting the Design Set into Training and Validation Sets.- 6.1.2.2 Practical Aspects of using the Leave-One-Out Method.- 6.1.2.3 Pattern Error Functions.- 6.2 Simplified Performance Measures.- 6.2.1 Performance Criteria for Feature Extraction.- 6.2.1.1 Unsupervised Feature Extraction.- 6.2.1.2 Supervised Feature Extraction.- 6.2.2 Performance Criteria for Feature Selection.- 6.2.3 Feature Selection Strategies.- 6.3 Accuracy of Performance Estimates.- 6.3.1 Error Counting Estimates.- 6.3.1.1 The Hold-Out Method.- 6.3.1.2 The Resubstitution Estimator.- 6.3.1.3 The Leaving-One-Out Estimator.- 6.3.1.4 The Bootstrap Estimator.- 6.3.2 Parametric Estimators for the Linear Fisher Classifier.- 6.3.3 Associations Between the Classification Performance Measures.- 6.4 Feature Ranking and the Optimal Number of Feature.- 6.4.1 The Complexity of the Classifiers.- 6.4.2 Feature Ranking.- 6.4.3 Determining the Optimal Number of Features.- 6.5 The Accuracy of the Model Selection.- 6.5.1 True, Apparent and Ideal Classification Errors.- 6.5.2 An Effect of the Number of Variants.- 6.5.3 Evaluation of the Bias.- 6.6 Additional Bibliographical Remarks.- Appendices.- A.1 Elements of Matrix Algebra.- A.2 The First Order Tree Type Dependence Model.- A.3 Temporal Dependence Models.- A.4 Pikelis Algorithm for Evaluating Means and Variances of the True, Apparent and Ideal Errors in Model Selection.- A.5 Matlab Codes (the Non-Linear SLP Training, the First Order Tree Dependence Model, and Data Whitening Transformation).- References.

Proceedings ArticleDOI
08 Dec 2001
TL;DR: This work extends FERET by considering when differences in recognition rates are statistically distinguishable subject to changes in test imagery and makes the source code for the algorithms, scoring procedures and Monte Carlo study available in the hopes others will extend this comparison to newer algorithms.
Abstract: The FERET evaluation compared recognition rates for different semi-automated and automated face recognition algorithms. We extend FERET by considering when differences in recognition rates are statistically distinguishable subject to changes in test imagery. Nearest Neighbor classifiers using principal component and linear discriminant subspaces are compared using different choices of distance metric. Probability distributions for algorithm recognition rates and pairwise differences in recognition rates are determined using a permutation methodology. The principal component subspace with Mahalanobis distance is the best combination; using L2 is second best. Choice of distance measure for the linear discriminant subspace matters little, and performance is always worse than the principal components classifier using either Mahalanobis or L1 distance. We make the source code for the algorithms, scoring procedures and Monte Carlo study available in the hopes others will extend this comparison to newer algorithms.
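The permutation idea can be sketched as a paired test on per-probe outcomes; this is a simplified stand-in for the paper's Monte Carlo methodology (which resamples probe and gallery choices), with hypothetical recognition outcomes:

import numpy as np

def paired_permutation_test(correct_a, correct_b, n_perm=10000, seed=0):
    # Two-sided permutation test for a difference in recognition rates, given
    # 0/1 arrays saying whether each probe was recognized by algorithm A / B.
    rng = np.random.default_rng(seed)
    diff = correct_a - correct_b
    observed = diff.mean()
    hits = 0
    for _ in range(n_perm):
        signs = rng.choice([-1, 1], size=diff.size)   # randomly swap A and B per probe
        if abs((signs * diff).mean()) >= abs(observed):
            hits += 1
    return observed, hits / n_perm

rng = np.random.default_rng(1)
a = (rng.random(640) < 0.90).astype(int)          # e.g. hypothetical PCA + Mahalanobis outcomes
b = (rng.random(640) < 0.86).astype(int)          # e.g. hypothetical LDA + L2 outcomes
print("rate difference %.3f, p = %.4f" % paired_permutation_test(a, b))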

Proceedings ArticleDOI
07 Jul 2001
TL;DR: Experimental results show that fusion of evidence from multiple views can produce better results than using the result from a single view, and that this kernel machine based approach for learning nonlinear mappings for multi-view face detection and pose estimation yields high detection and low false alarm rates.
Abstract: Face images are subject to changes in view and illumination. Such changes cause the data distribution to be highly nonlinear and complex in the image space. It is desirable to learn a nonlinear mapping from the image space to a low dimensional space such that the distribution becomes simpler, tighter, and therefore more predictable for better modeling of faces. In this paper we present a kernel machine based approach for learning such nonlinear mappings. The aim is to provide an effective view-based representation for multi-view face detection and pose estimation. Assuming that the view is partitioned into a number of distinct ranges, one nonlinear view-subspace is learned for each (range of) view from a set of example face images of that view (range), by using kernel principal component analysis (KPCA). Projections of the data onto the view-subspaces are then computed as view-based nonlinear features. Multi-view face detection and pose estimation are performed by classifying a face into one of the facial views or into the nonface class, by using a multi-class kernel support vector classifier (KSVC). Experimental results show that fusion of evidence from multiple views can produce better results than using the result from a single view, and that our approach yields high detection and low false alarm rates in face detection and good accuracy in pose estimation, in comparison with the linear counterpart composed of linear principal component analysis (PCA) feature extraction and Fisher linear discriminant based classification (FLDC).
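A miniature of the KPCA feature extraction followed by a multi-class kernel SVC, using the digits data and a single shared KPCA rather than one subspace per view range (dataset and kernel parameters are assumptions for illustration):

from sklearn.datasets import load_digits
from sklearn.decomposition import KernelPCA
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

# Nonlinear view-based features via kernel PCA, then a multi-class kernel SVC.
X, y = load_digits(return_X_y=True)
model = make_pipeline(KernelPCA(n_components=40, kernel="rbf", gamma=1e-3),
                      SVC(kernel="rbf", gamma=1e-2, C=10))
print("CV accuracy:", cross_val_score(model, X, y, cv=5).mean().round(3))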

Journal ArticleDOI
TL;DR: It is proved that the classical optimal discriminant vectors are equivalent to uncorrelated optimal discriminant vectors (UODV), which can be used to extract (L−1) uncorrelated discriminant features for L-class problems without losing any discriminant information in the sense of the Fisher discriminant criterion function.

Journal ArticleDOI
TL;DR: Different supervised pattern recognition treatments were applied to the signals generated by an electronic nose for the classification of vegetable oils, indicating good classification and prediction capabilities, with neural networks affording the best results.

Journal ArticleDOI
TL;DR: Using a real data set regarding Japanese banks and a large simulation study, this research confirms that the Extended DEA-DA outperforms conventional linear and nonlinear discriminant analysis techniques.

Proceedings ArticleDOI
01 Jan 2001
TL;DR: The kernel-based biased discriminant analysis (KBDA) is proposed to fit the unique nature of relevance feedback as a biased classification problem and provides a trade-off between discriminant transform and regression.
Abstract: Various relevance feedback algorithms have been proposed in recent years in the area of content-based image retrieval. This paper gives a brief review and analysis of existing techniques, from early heuristic-based feature weighting schemes to recently proposed optimal learning algorithms. In addition, the kernel-based biased discriminant analysis (KBDA) is proposed to fit the unique nature of relevance feedback as a biased classification problem. As a novel variant of traditional discriminant analysis, the proposed algorithm provides a trade-off between discriminant transform and regression. The kernel form is derived to deal with non-linearity in an elegant way. Experimental results indicate that significant improvement in retrieval performance is achieved by the new scheme.
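For intuition, a linear (non-kernel) version of biased discriminant analysis can be sketched as below: the relevant (positive) examples are treated as the only coherent class, and directions are sought that push the irrelevant examples away from the positive centroid. This is an assumed simplification, not the paper's kernel formulation:

import numpy as np
from scipy.linalg import eigh

def biased_discriminant_directions(X_pos, X_neg, n_dims=2, reg=1e-3):
    # Maximize the scatter of negative examples around the positive centroid
    # relative to the scatter of the positive examples themselves.
    m_pos = X_pos.mean(axis=0)
    S_pos = np.cov(X_pos, rowvar=False)                    # within-positive scatter
    D_neg = X_neg - m_pos
    S_neg = D_neg.T @ D_neg / len(X_neg)                   # negatives around positive mean
    d = X_pos.shape[1]
    vals, vecs = eigh(S_neg, S_pos + reg * np.eye(d))      # generalized eigenproblem
    return vecs[:, np.argsort(vals)[::-1][:n_dims]]

rng = np.random.default_rng(0)
W = biased_discriminant_directions(rng.normal(size=(30, 10)) + 2.0,
                                   rng.normal(size=(200, 10)))
print(W.shape)                                              # (10, 2) projection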

Journal ArticleDOI
TL;DR: A new classification scheme was developed to classify mammographic masses as malignant or benign by using interval change information; stepwise feature selection and linear discriminant analysis classification were used to select and merge the most useful features.
Abstract: A new classification scheme was developed to classify mammographic masses as malignant or benign by using interval change information. The masses on both the current and the prior mammograms were automatically segmented using an active contour method. From each mass, 20 run length statistics (RLS) texture features, 3 spiculation features, and 12 morphological features were extracted. Additionally, 20 difference RLS features were obtained by subtracting the prior RLS features from the corresponding current RLS features. The feature space consisted of the current RLS features, the difference RLS features, the current and prior spiculation features, and the current and prior mass sizes. Stepwise feature selection and linear discriminant analysis classification were used to select and merge the most useful features. A leave-one-case-out resampling scheme was used to train and test the classifier using 140 temporal image pairs (85 malignant, 55 benign) obtained from 57 biopsy-proven masses (33 malignant, 24 benign) in 56 patients. An average of 10 features were selected from the 56 training subsets: 4 difference RLS features, 4 RLS features, and 1 spiculation feature from the current image, and 1 spiculation feature from the prior image, were most often chosen. The classifier achieved an average training Az of 0.92 and a test Az of 0.88. For comparison, a classifier was trained and tested using features extracted from the 120 current single images. This classifier achieved an average training Az of 0.90 and a test Az of 0.82. The information on the prior image significantly (p=0.015) improved the accuracy for classification of the masses.

Proceedings ArticleDOI
15 Jul 2001
TL;DR: It is proved that KMSE is identical to the kernel Fisher discriminant (KFD) except for an unimportant scale factor, and that it is directly equivalent to the least-squares version of the support vector machine (LS-SVM).
Abstract: We generalize the conventional minimum squared error (MSE) method to yield a new nonlinear learning machine by using the kernel idea and adding different regularization terms. We name it the kernel minimum squared error (KMSE) algorithm, which can deal with linear and nonlinear classification and regression problems. With proper choices of the output coding schemes and regularization terms, we prove that KMSE is identical to the kernel Fisher discriminant (KFD) except for an unimportant scale factor, and that it is directly equivalent to the least-squares version of the support vector machine (LS-SVM). For continuous real output values, we find that KMSE is kernel ridge regression (KRR) with a bias. Therefore KMSE can act as a general framework that includes KFD, LS-SVM and KRR as particular cases. In addition, we simplify the formula to estimate the projecting direction of KFD. Experiments on artificial and real-world data sets, in terms of numerical computation, demonstrate that KMSE is a class of powerful kernel learning machines.
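One of the paper's particular cases is easy to demonstrate: kernel ridge regression fitted to +/-1 targets and thresholded at zero acts as a least-squares kernel classifier (a hedged sketch; the exact KMSE/KFD/LS-SVM equivalences depend on the output coding and bias details spelled out in the paper):

import numpy as np
from sklearn.datasets import make_moons
from sklearn.kernel_ridge import KernelRidge
from sklearn.model_selection import train_test_split

X, y = make_moons(n_samples=400, noise=0.2, random_state=0)
t = np.where(y == 1, 1.0, -1.0)                    # +/-1 output coding
X_tr, X_te, t_tr, t_te = train_test_split(X, t, random_state=0)

# Least-squares fit in the kernel-induced feature space, used as a classifier.
krr = KernelRidge(kernel="rbf", gamma=2.0, alpha=0.1).fit(X_tr, t_tr)
pred = np.sign(krr.predict(X_te))                  # threshold the regression output
print("test accuracy:", (pred == t_te).mean())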