
Showing papers on "Linear discriminant analysis published in 2009"


Journal ArticleDOI
TL;DR: This paper empirically evaluates a facial representation based on statistical local features, Local Binary Patterns (LBP), for person-independent facial expression recognition, and observes that LBP features perform stably and robustly over a useful range of low resolutions of face images and yield promising performance on compressed low-resolution video sequences captured in real-world environments.
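For orientation, a minimal sketch of the basic 3×3 LBP operator that such features build on is shown below. It is illustrative only: the helper name and the random input are assumptions, and the paper's multi-scale and region-based LBP variants are not reproduced.

```python
import numpy as np

def lbp_histogram(img):
    """Histogram of basic 3x3 Local Binary Pattern codes for a grayscale image.

    Each interior pixel is encoded by thresholding its 8 neighbours against
    the centre value and packing the comparisons into an 8-bit code; the
    image is then summarised by the normalised 256-bin histogram of codes.
    """
    img = np.asarray(img, dtype=float)
    center = img[1:-1, 1:-1]
    # The 8 neighbour offsets, enumerated clockwise from the top-left corner.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros_like(center, dtype=int)
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = img[1 + dy:img.shape[0] - 1 + dy,
                        1 + dx:img.shape[1] - 1 + dx]
        codes += (neighbour >= center).astype(int) << bit
    hist = np.bincount(codes.ravel(), minlength=256).astype(float)
    return hist / hist.sum()

# Example: LBP histogram of a random 64x64 "face crop" (placeholder data).
features = lbp_histogram(np.random.rand(64, 64))
print(features.shape)  # (256,)
```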

2,098 citations


Journal ArticleDOI
TL;DR: This paper carries out a comprehensive review of articles that compare feed-forward neural networks with statistical techniques for prediction and classification problems across various areas of application.
Abstract: Neural networks are being used in areas of prediction and classification, the areas where statistical methods have traditionally been used. Both the traditional statistical methods and neural networks are looked upon as competing model-building techniques in the literature. This paper carries out a comprehensive review of articles that involve a comparative study of feed-forward neural networks and statistical techniques used for prediction and classification problems in various areas of application. Tabular presentations highlighting the important features of these articles are also provided. This study aims to give useful insight into the capabilities of neural networks and statistical methods used in different kinds of applications.

731 citations


Journal ArticleDOI
TL;DR: This article compares the two approaches (linear model on the one hand and two versions of random forests on the other hand) and finds both striking similarities and differences, some of which can be explained whereas others remain a challenge.
Abstract: Relative importance of regressor variables is an old topic that still awaits a satisfactory solution. When interest is in attributing importance in linear regression, averaging over orderings methods for decomposing R2 are among the state-of-the-art methods, although the mechanism behind their behavior is not (yet) completely understood. Random forests—a machine-learning tool for classification and regression proposed a few years ago—have an inherent procedure of producing variable importances. This article compares the two approaches (linear model on the one hand and two versions of random forests on the other hand) and finds both striking similarities and differences, some of which can be explained whereas others remain a challenge. The investigation improves understanding of the nature of variable importance in random forests. This article has supplementary material online.
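As a rough illustration of the two viewpoints being compared (not the article's averaging-over-orderings decomposition of R2, nor its specific random forest variants), the sketch below contrasts standardized linear-regression coefficients with impurity-based random forest importances on synthetic, correlated regressors; all data and parameter choices are assumptions made for the example.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression

# Synthetic regression data with correlated regressors (illustrative only).
rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + 0.6 * rng.normal(size=n)   # correlated with x1
x3 = rng.normal(size=n)                     # independent noise variable
X = np.column_stack([x1, x2, x3])
y = 2.0 * x1 + 1.0 * x2 + rng.normal(size=n)

# Linear-model view: standardized coefficients as a crude importance proxy.
lm = LinearRegression().fit(X, y)
std_coef = lm.coef_ * X.std(axis=0) / y.std()

# Random-forest view: impurity-based (MDI) variable importances.
rf = RandomForestRegressor(n_estimators=300, random_state=0).fit(X, y)

print("standardized linear coefficients:", np.round(std_coef, 3))
print("random forest importances:      ", np.round(rf.feature_importances_, 3))
```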

690 citations


Journal ArticleDOI
TL;DR: In this article, the authors propose the coefficient of discrimination, the difference between the averages of fitted values for successes and failures, as a standard measure of explanatory power for logistic regression models, and relate it to classical analogues of R2.
Abstract: Many analogues to the coefficient of determination R2 in ordinary regression models have been proposed in the context of logistic regression. Our starting point is a study of three definitions related to quadratic measures of variation. We discuss the properties of these statistics, and show that the family can be extended in a natural way by a fourth statistic with an even simpler interpretation, namely the difference between the averages of fitted values for successes and failures, respectively. We propose the name “the coefficient of discrimination” for this statistic, and recommend its use as a standard measure of explanatory power. In its intuitive interpretation, this quantity has no immediate relation to the classical versions of R2, but it turns out to be related to these by two exact relations, which imply that all these statistics are asymptotically equivalent.
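The proposed statistic is simple enough to state in one line; a minimal sketch follows, computing the coefficient of discrimination for a fitted logistic regression. The dataset and model choices are illustrative assumptions, not from the article.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Illustrative data; any fitted logistic regression would do.
X, y = make_classification(n_samples=400, n_features=5, random_state=0)
p_hat = LogisticRegression().fit(X, y).predict_proba(X)[:, 1]

# Coefficient of discrimination: difference between the average fitted
# probability among successes (y = 1) and among failures (y = 0).
D = p_hat[y == 1].mean() - p_hat[y == 0].mean()
print(f"coefficient of discrimination D = {D:.3f}")
```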

591 citations


Journal ArticleDOI
TL;DR: Preliminary experimental results show that the third criterion is a potential discriminative subspace selection method, which significantly reduces the class separation problem compared with the linear dimensionality reduction step in FLDA and several of its representative extensions.
Abstract: Subspace selection approaches are powerful tools in pattern classification and data visualization. One of the most important subspace approaches is the linear dimensionality reduction step in Fisher's linear discriminant analysis (FLDA), which has been successfully employed in many fields such as biometrics, bioinformatics, and multimedia information management. However, the linear dimensionality reduction step in FLDA has a critical drawback: for a classification task with c classes, if the dimension of the projected subspace is strictly lower than c - 1, the projection to a subspace tends to merge those classes that are close together in the original feature space. If separate classes are sampled from Gaussian distributions, all with identical covariance matrices, then the linear dimensionality reduction step in FLDA maximizes the mean value of the Kullback-Leibler (KL) divergences between different classes. Based on this viewpoint, the geometric mean for subspace selection is studied in this paper. Three criteria are analyzed: 1) maximization of the geometric mean of the KL divergences, 2) maximization of the geometric mean of the normalized KL divergences, and 3) the combination of 1 and 2. Preliminary experimental results based on synthetic data, data from the UCI Machine Learning Repository, and handwritten digits show that the third criterion is a potential discriminative subspace selection method, which significantly reduces the class separation problem compared with the linear dimensionality reduction step in FLDA and several of its representative extensions.
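Stated loosely, with classes assumed Gaussian with common covariance as in the abstract, the contrast is between an arithmetic-mean objective and a geometric-mean one. The notation below is introduced only for this sketch and covers criterion 1; the normalized variant and the combined criterion are omitted.

```latex
% Projected KL divergence between classes i and j under a projection W,
% for Gaussian classes with common covariance \Sigma:
D_{ij}(W) = \tfrac{1}{2}\,(\mu_i-\mu_j)^{\top} W \,(W^{\top}\Sigma W)^{-1} W^{\top} (\mu_i-\mu_j)

% Arithmetic-mean (FLDA-style) objective versus the geometric-mean objective
% of criterion 1, for a c-class problem:
W_{\mathrm{FLDA}} = \arg\max_{W} \frac{2}{c(c-1)} \sum_{i<j} D_{ij}(W),
\qquad
W_{1} = \arg\max_{W} \Bigl(\prod_{i<j} D_{ij}(W)\Bigr)^{2/(c(c-1))}
      = \arg\max_{W} \sum_{i<j} \log D_{ij}(W)
```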

581 citations


Journal ArticleDOI
TL;DR: This paper introduces to the remote sensing community an efficient version of the RLDA recently presented by Ye to cope with critical ill-posed hyperspectral image classification problems, and compares several LDA-based classifiers theoretically and experimentally with the standard LDA and the RLDA.
Abstract: This paper analyzes the classification of hyperspectral remote sensing images with linear discriminant analysis (LDA) in the presence of a small ratio between the number of training samples and the number of spectral features. In these particular ill-posed problems, a reliable LDA requires one to introduce regularization for problem solving. Nonetheless, in such a challenging scenario, the resulting regularized LDA (RLDA) is highly sensitive to the tuning of the regularization parameter. In this context, we introduce in the remote sensing community an efficient version of the RLDA recently presented by Ye to cope with critical ill-posed problems. In addition, several LDA-based classifiers (i.e., penalized LDA, orthogonal LDA, and uncorrelated LDA) are compared theoretically and experimentally with the standard LDA and the RLDA. Method differences are highlighted through toy examples and are exhaustively tested on several ill-posed problems related to the classification of hyperspectral remote sensing images. Experimental results confirm the effectiveness of the presented RLDA technique and point out the main properties of other analyzed LDA techniques in critical ill-posed hyperspectral image classification problems.
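A plain ridge-style regularization of the within-class scatter conveys the idea. This is a generic regularized LDA sketch, not Ye's efficient algorithm evaluated in the paper; the function name, regularization value, and toy data are assumptions.

```python
import numpy as np

def rlda_projection(X, y, lam=1e-2):
    """Plain regularized LDA projection (illustrative; not Ye's efficient algorithm).

    The within-class scatter is shrunk towards the identity, S_w + lam * I,
    so the problem stays well posed even when the number of training samples
    is much smaller than the number of spectral features.
    """
    classes = np.unique(y)
    d = X.shape[1]
    mean_all = X.mean(axis=0)
    S_w = np.zeros((d, d))
    S_b = np.zeros((d, d))
    for c in classes:
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        S_w += (Xc - mc).T @ (Xc - mc)
        S_b += len(Xc) * np.outer(mc - mean_all, mc - mean_all)
    # Leading eigenvectors of (S_w + lam*I)^{-1} S_b give the projection.
    evals, evecs = np.linalg.eig(np.linalg.solve(S_w + lam * np.eye(d), S_b))
    order = np.argsort(evals.real)[::-1]
    return evecs[:, order[: len(classes) - 1]].real

# Toy ill-posed example: 30 training pixels, 100 "bands", 3 classes.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 100))
y = rng.integers(0, 3, size=30)
W = rlda_projection(X, y, lam=1e-1)
print(W.shape)  # (100, 2)
```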

568 citations


Journal ArticleDOI
TL;DR: An unsupervised algorithm is proposed to enhance P300 evoked potentials by estimating spatial filters; the raw EEG signals are projected into the estimated signal subspace, and the results show that the proposed method is efficient and accurate.
Abstract: A brain-computer interface (BCI) is a communication system that allows one to control a computer or any other device by means of brain activity. The BCI described in this paper is based on the P300 speller BCI paradigm introduced by Farwell and Donchin. An unsupervised algorithm is proposed to enhance P300 evoked potentials by estimating spatial filters; the raw EEG signals are then projected into the estimated signal subspace. Data recorded on three subjects were used to evaluate the proposed method. The results, which are presented using a Bayesian linear discriminant analysis classifier, show that the proposed method is efficient and accurate.
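The sketch below is only a simplified stand-in for the pipeline described: spatial filters are taken from a generalized eigenvalue problem between an evoked-response covariance and the raw-signal covariance (not the paper's actual unsupervised estimation algorithm), and scikit-learn's ordinary LDA is used in place of the Bayesian linear discriminant analysis classifier. All array shapes and data are synthetic assumptions.

```python
import numpy as np
from scipy.linalg import eigh
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Synthetic epochs: 200 trials, 32 channels, 128 time samples.
rng = np.random.default_rng(0)
n_epochs, n_channels, n_times = 200, 32, 128
epochs = rng.normal(size=(n_epochs, n_channels, n_times))
labels = rng.integers(0, 2, size=n_epochs)            # 1 = target (P300)

evoked = epochs[labels == 1].mean(axis=0)             # averaged target response
C_signal = evoked @ evoked.T / n_times                # covariance of the evoked response
C_noise = np.mean([e @ e.T / n_times for e in epochs], axis=0)

# Generalized eigendecomposition; keep a few filters with the largest ratio.
vals, vecs = eigh(C_signal, C_noise)
filters = vecs[:, ::-1][:, :4]                        # (n_channels, 4)

# Project every epoch onto the filters and feed the result to an LDA classifier.
features = np.array([(filters.T @ e).ravel() for e in epochs])
clf = LinearDiscriminantAnalysis().fit(features, labels)
print("training accuracy:", clf.score(features, labels))
```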

451 citations


Book
28 Sep 2009
TL;DR: This book presents chemometric methods for pattern recognition, illustrated by case studies ranging from forensic analysis of banknotes to metabolic profiling of mouse urine, and covering exploratory data analysis, preprocessing, two-class, one-class and multiclass classifiers, validation and optimization, and the determination of potentially discriminatory variables.
Abstract: Acknowledgements. Preface. 1 Introduction. 1.1 Past, Present and Future. 1.2 About this Book. Bibliography. 2 Case Studies. 2.1 Introduction. 2.2 Datasets, Matrices and Vectors. 2.3 Case Study 1: Forensic Analysis of Banknotes. 2.4 Case Study 2: Near Infrared Spectroscopic Analysis of Food. 2.5 Case Study 3: Thermal Analysis of Polymers. 2.6 Case Study 4: Environmental Pollution using Headspace Mass Spectrometry. 2.7 Case Study 5: Human Sweat Analysed by Gas Chromatography Mass Spectrometry. 2.8 Case Study 6: Liquid Chromatography Mass Spectrometry of Pharmaceutical Tablets. 2.9 Case Study 7: Atomic Spectroscopy for the Study of Hypertension. 2.10 Case Study 8: Metabolic Profiling of Mouse Urine by Gas Chromatography of Urine Extracts. 2.11 Case Study 9: Nuclear Magnetic Resonance Spectroscopy for Salival Analysis of the Effect of Mouthwash. 2.12 Case Study 10: Simulations. 2.13 Case Study 11: Null Dataset. 2.14 Case Study 12: GCMS and Microbiology of Mouse Scent Marks. Bibliography. 3 Exploratory Data Analysis. 3.1 Introduction. 3.2 Principal Components Analysis. 3.2.1 Background. 3.2.2 Scores and Loadings. 3.2.3 Eigenvalues. 3.2.4 PCA Algorithm. 3.2.5 Graphical Representation. 3.3 Dissimilarity Indices, Principal Co-ordinates Analysis and Ranking. 3.3.1 Dissimilarity. 3.3.2 Principal Co-ordinates Analysis. 3.3.3 Ranking. 3.4 Self Organizing Maps. 3.4.1 Background. 3.4.2 SOM Algorithm. 3.4.3 Initialization. 3.4.4 Training. 3.4.5 Map Quality. 3.4.6 Visualization. Bibliography. 4 Preprocessing. 4.1 Introduction. 4.2 Data Scaling. 4.2.1 Transforming Individual Elements. 4.2.2 Row Scaling. 4.2.3 Column Scaling. 4.3 Multivariate Methods of Data Reduction. 4.3.1 Largest Principal Components. 4.3.2 Discriminatory Principal Components. 4.3.3 Partial Least Squares Discriminatory Analysis Scores. 4.4 Strategies for Data Preprocessing. 4.4.1 Flow Charts. 4.4.2 Level 1. 4.4.3 Level 2. 4.4.4 Level 3. 4.4.5 Level 4. Bibliography. 5 Two Class Classifiers. 5.1 Introduction. 5.1.1 Two Class Classifiers. 5.1.2 Preprocessing. 5.1.3 Notation. 5.1.4 Autoprediction and Class Boundaries. 5.2 Euclidean Distance to Centroids. 5.3 Linear Discriminant Analysis. 5.4 Quadratic Discriminant Analysis. 5.5 Partial Least Squares Discriminant Analysis. 5.5.1 PLS Method. 5.5.2 PLS Algorithm. 5.5.3 PLS-DA. 5.6 Learning Vector Quantization. 5.6.1 Voronoi Tesselation and Codebooks. 5.6.2 LVQ1. 5.6.3 LVQ3. 5.6.4 LVQ Illustration and Summary of Parameters. 5.7 Support Vector Machines. 5.7.1 Linear Learning Machines. 5.7.2 Kernels. 5.7.3 Controlling Complexity and Soft Margin SVMs. 5.7.4 SVM Parameters. Bibliography. 6 One Class Classifiers. 6.1 Introduction. 6.2 Distance Based Classifiers. 6.3 PC Based Models and SIMCA. 6.4 Indicators of Significance. 6.4.1 Gaussian Density Estimators and Chi-Squared. 6.4.2 Hotelling's T 2 . 6.4.3 D-Statistic. 6.4.4 Q-Statistic or Squared Prediction Error. 6.4.5 Visualization of D- and Q-Statistics for Disjoint PC Models. 6.4.6 Multivariate Normality and What to do if it Fails. 6.5 Support Vector Data Description. 6.6 Summarizing One Class Classifiers. 6.6.1 Class Membership Plots. 6.6.2 ROC Curves. Bibliography. 7 Multiclass Classifiers. 7.1 Introduction. 7.2 EDC, LDA and QDA. 7.3 LVQ. 7.4 PLS. 7.4.1 PLS2. 7.4.2 PLS1. 7.5 SVM. 7.6 One against One Decisions. Bibliography. 8 Validation and Optimization. 8.1 Introduction. 8.1.1 Validation. 8.1.2 Optimization. 8.2 Classification Abilities, Contingency Tables and Related Concepts. 8.2.1 Two Class Classifiers. 8.2.2 Multiclass Classifiers. 
8.2.3 One Class Classifiers. 8.3 Validation. 8.3.1 Testing Models. 8.3.2 Test and Training Sets. 8.3.3 Predictions. 8.3.4 Increasing the Number of Variables for the Classifier. 8.4 Iterative Approaches for Validation. 8.4.1 Predictive Ability, Model Stability, Classification by Majority Vote and Cross Classification Rate. 8.4.2 Number of Iterations. 8.4.3 Test and Training Set Boundaries. 8.5 Optimizing PLS Models. 8.5.1 Number of Components: Cross-Validation and Bootstrap. 8.5.2 Thresholds and ROC Curves. 8.6 Optimizing Learning Vector Quantization Models. 8.7 Optimizing Support Vector Machine Models. Bibliography. 9 Determining Potential Discriminatory Variables. 9.1 Introduction. 9.1.1 Two Class Distributions. 9.1.2 Multiclass Distributions. 9.1.3 Multilevel and Multiway Distributions. 9.1.4 Sample Sizes. 9.1.5 Modelling after Variable Reduction. 9.1.6 Preliminary Variable Reduction. 9.2 Which Variables are most Significant?. 9.2.1 Basic Concepts: Statistical Indicators and Rank. 9.2.2 T-Statistic and Fisher Weights. 9.2.3 Multiple Linear Regression, ANOVA and the F-Ratio. 9.2.4 Partial Least Squares. 9.2.5 Relationship between the Indicator Functions. 9.3 How Many Variables are Significant? 9.3.1 Probabilistic Approaches. 9.3.2 Empirical Methods: Monte Carlo. 9.3.3 Cost/Benefit of Increasing the Number of Variables. Bibliography. 10 Bayesian Methods and Unequal Class Sizes. 10.1 Introduction. 10.2 Contingency Tables and Bayes' Theorem. 10.3 Bayesian Extensions to Classifiers. Bibliography. 11 Class Separation Indices. 11.1 Introduction. 11.2 Davies Bouldin Index. 11.3 Silhouette Width and Modified Silhouette Width. 11.3.1 Silhouette Width. 11.3.2 Modified Silhouette Width. 11.4 Overlap Coefficient. Bibliography. 12 Comparing Different Patterns. 12.1 Introduction. 12.2 Correlation Based Methods. 12.2.1 Mantel Test. 12.2.2 R V Coefficient. 12.3 Consensus PCA. 12.4 Procrustes Analysis. Bibliography. Index.

402 citations


BookDOI
23 Oct 2009
TL;DR: This edited book surveys kernel methods for remote sensing data analysis, covering supervised and semi-supervised image classification with support vector machines and kernel Fisher's discriminant, target and anomaly detection, regression and function approximation, and kernel-based feature extraction for hyperspectral imagery.
Abstract: About the editors. List of authors. Preface. Acknowledgments. List of symbols. List of abbreviations. I Introduction. 1 Machine learning techniques in remote sensing data analysis (Bjorn Waske, Mathieu Fauvel, Jon Atli Benediktsson and Jocelyn Chanussot). 1.1 Introduction. 1.2 Supervised classification: algorithms and applications. 1.3 Conclusion. Acknowledgments. References. 2 An introduction to kernel learning algorithms (Peter V. Gehler and Bernhard Scholkopf). 2.1 Introduction. 2.2 Kernels. 2.3 The representer theorem. 2.4 Learning with kernels. 2.5 Conclusion. References. II Supervised image classification. 3 The Support Vector Machine (SVM) algorithm for supervised classification of hyperspectral remote sensing data (J. Anthony Gualtieri). 3.1 Introduction. 3.2 Aspects of hyperspectral data and its acquisition. 3.3 Hyperspectral remote sensing and supervised classification. 3.4 Mathematical foundations of supervised classification. 3.5 From structural risk minimization to a support vector machine algorithm. 3.6 Benchmark hyperspectral data sets. 3.7 Results. 3.8 Using spatial coherence. 3.9 Why do SVMs perform better than other methods? 3.10 Conclusions. References. 4 On training and evaluation of SVM for remote sensing applications (Giles M. Foody). 4.1 Introduction. 4.2 Classification for thematic mapping. 4.3 Overview of classification by a SVM. 4.4 Training stage. 4.5 Testing stage. 4.6 Conclusion. Acknowledgments. References. 5 Kernel Fisher's Discriminant with heterogeneous kernels (M. Murat Dundar and Glenn Fung). 5.1 Introduction. 5.2 Linear Fisher's Discriminant. 5.3 Kernel Fisher Discriminant. 5.4 Kernel Fisher's Discriminant with heterogeneous kernels. 5.5 Automatic kernel selection KFD algorithm. 5.6 Numerical results. 5.7 Conclusion. References. 6 Multi-temporal image classification with kernels (Jordi Munoz-Mari, Luis Gomez-Choa, Manel Martinez-Ramon, Jose Luis Rojo-Alvarez, Javier Calpe-Maravilla and Gustavo Camps-Valls). 6.1 Introduction. 6.2 Multi-temporal classification and change detection with kernels. 6.3 Contextual and multi-source data fusion with kernels. 6.4 Multi-temporal/-source urban monitoring. 6.5 Conclusions. Acknowledgments. References. 7 Target detection with kernels (Nasser M. Nasrabadi). 7.1 Introduction. 7.2 Kernel learning theory. 7.3 Linear subspace-based anomaly detectors and their kernel versions. 7.4 Results. 7.5 Conclusion. References. 8 One-class SVMs for hyperspectral anomaly detection (Amit Banerjee, Philippe Burlina and Chris Diehl). 8.1 Introduction. 8.2 Deriving the SVDD. 8.3 SVDD function optimization. 8.4 SVDD algorithms for hyperspectral anomaly detection. 8.5 Experimental results. 8.6 Conclusions. References. III Semi-supervised image classification. 9 A domain adaptation SVM and a circular validation strategy for land-cover maps updating (Mattia Marconcini and Lorenzo Bruzzone). 9.1 Introduction. 9.2 Literature survey. 9.3 Proposed domain adaptation SVM. 9.4 Proposed circular validation strategy. 9.5 Experimental results. 9.6 Discussions and conclusion. References. 10 Mean kernels for semi-supervised remote sensing image classification (Luis Gomez-Chova, Javier Calpe-Maravilla, Lorenzo Bruzzone and Gustavo Camps-Valls). 10.1 Introduction. 10.2 Semi-supervised classification with mean kernels. 10.3 Experimental results. 10.4 Conclusions. Acknowledgments. References. IV Function approximation and regression. 11 Kernel methods for unmixing hyperspectral imagery (Joshua Broadwater, Amit Banerjee and Philippe Burlina). 
11.1 Introduction. 11.2 Mixing models. 11.3 Proposed kernel unmixing algorithm. 11.4 Experimental results of the kernel unmixing algorithm. 11.5 Development of physics-based kernels for unmixing. 11.6 Physics-based kernel results. 11.7 Summary. References. 12 Kernel-based quantitative remote sensing inversion (Yanfei Wang, Changchun Yang and Xiaowen Li). 12.1 Introduction. 12.2 Typical kernel-based remote sensing inverse problems. 12.3 Well-posedness and ill-posedness. 12.4 Regularization. 12.5 Optimization techniques. 12.6 Kernel-based BRDF model inversion. 12.7 Aerosol particle size distribution function retrieval. 12.8 Conclusion. Acknowledgments. References. 13 Land and sea surface temperature estimation by support vector regression (Gabriele Moser and Sebastiano B. Serpico). 13.1 Introduction. 13.2 Previous work. 13.3 Methodology. 13.4 Experimental results. 13.5 Conclusions. Acknowledgments. References. V Kernel-based feature extraction. 14 Kernel multivariate analysis in remote sensing feature extraction (Jeronimo Arenas-Garcia and Kaare Brandt Petersen). 14.1 Introduction. 14.2 Multivariate analysis methods. 14.3 Kernel multivariate analysis. 14.4 Sparse Kernel OPLS. 14.5 Experiments: pixel-based hyperspectral image classification. 14.6 Conclusions. Acknowledgments. References. 15 KPCA algorithm for hyperspectral target/anomaly detection (Yanfeng Gu). 15.1 Introduction. 15.2 Motivation. 15.3 Kernel-based feature extraction in hyperspectral images. 15.4 Kernel-based target detection in hyperspectral images. 15.5 Kernel-based anomaly detection in hyperspectral images. 15.6 Conclusions. Acknowledgments References. 16 Remote sensing data Classification with kernel nonparametric feature extractions (Bor-Chen Kuo, Jinn-Min Yang and Cheng-Hsuan Li). 16.1 Introduction. 16.2 Related feature extractions. 16.3 Kernel-based NWFE and FLFE. 16.4 Eigenvalue resolution with regularization. 16.5 Experiments. 16.6 Comments and conclusions. References. Index.

393 citations


Journal ArticleDOI
TL;DR: A new dimensionality reduction algorithm is developed, termed discriminative locality alignment (DLA), by imposing discriminative information in the part optimization stage, and thorough empirical studies demonstrate the effectiveness of DLA compared with representative dimensionality reduction algorithms.
Abstract: Spectral analysis-based dimensionality reduction algorithms are important and have been popularly applied in data mining and computer vision applications. To date many algorithms have been developed, e.g., principal component analysis, locally linear embedding, Laplacian eigenmaps, and local tangent space alignment. All of these algorithms have been designed intuitively and pragmatically, i.e., on the basis of the experience and knowledge of experts for their own purposes. Therefore, it will be more informative to provide a systematic framework for understanding the common properties and intrinsic difference in different algorithms. In this paper, we propose such a framework, named "patch alignment," which consists of two stages: part optimization and whole alignment. The framework reveals that (1) algorithms are intrinsically different in the patch optimization stage and (2) all algorithms share an almost identical whole alignment stage. As an application of this framework, we develop a new dimensionality reduction algorithm, termed discriminative locality alignment (DLA), by imposing discriminative information in the part optimization stage. DLA can (1) attack the distribution nonlinearity of measurements; (2) preserve the discriminative ability; and (3) avoid the small-sample-size problem. Thorough empirical studies demonstrate the effectiveness of DLA compared with representative dimensionality reduction algorithms.

390 citations


Proceedings Article
01 Jan 2009
TL;DR: A new speaker verification system architecture based on Joint Factor Analysis (JFA) as a feature extractor is presented, which uses the cosine kernel in the new total factor space to design two different systems: the first is based on Support Vector Machines, and the second uses this kernel directly as a decision score.
Abstract: This paper presents a new speaker verification system architecture based on Joint Factor Analysis (JFA) as a feature extractor. In this modeling, JFA is used to define a new low-dimensional space named the total variability factor space, instead of the separate channel and speaker variability spaces of classical JFA. The main contribution of this approach is the use of the cosine kernel in the new total factor space to design two different systems: the first is based on Support Vector Machines, and the second uses this kernel directly as a decision score. This last scoring method makes the process faster and less computationally complex compared to other classical methods. We tested several intersession compensation methods in the total factor space, and we found that the combination of Linear Discriminant Analysis and Within Class Covariance Normalization achieved the best performance. We achieved remarkable results using the fast scoring method based only on the cosine kernel, especially for male trials, yielding an EER of 1.12% and a MinDCF of 0.0094 on the English trials of the NIST 2008 SRE dataset. Index Terms: Total variability space, cosine kernel, fast scoring, support vector machines.
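As a schematic of the fast scoring idea only: the function name, vector dimensions, and the random matrix standing in for an LDA/WCCN session-compensation transform are all assumptions, not values from the paper.

```python
import numpy as np

def cosine_score(w_enrol, w_test, P=None):
    """Cosine-kernel decision score between two total-variability factor vectors.

    P is an optional session-compensation projection (e.g. an LDA/WCCN
    matrix estimated on development data); here it is just a placeholder.
    """
    if P is not None:
        w_enrol, w_test = P @ w_enrol, P @ w_test
    return float(w_enrol @ w_test /
                 (np.linalg.norm(w_enrol) * np.linalg.norm(w_test)))

# Toy example with 400-dimensional total factors and a random "projection".
rng = np.random.default_rng(0)
w1, w2 = rng.normal(size=400), rng.normal(size=400)
P = rng.normal(size=(200, 400))        # stand-in for an LDA+WCCN projection
score = cosine_score(w1, w2, P)
print(f"score = {score:.3f}")          # compare against a decision threshold
```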

01 Jan 2009
TL;DR: The populations are to be considered as giving rise to observable individuals each of which may be (partially) characterized by a set of k measurements, which are assumed to be multivariate normal, with known parameters, for each population.
Abstract: THE PROBLEM to be considered here is that of identifying, or of classifying, an observed individual as being a member of one of two "populations." This problem arises in some form in most sciences. A recent example is the problem, associated with certain international tensions, of classifying salmon caught in the North Pacific fishery as having arisen from the Asiatic or American salmon populations. The populations are to be considered as giving rise to observable individuals each of which may be (partially) characterized by a set of k measurements. The measurements of individuals from either population are distributed as if they were independent observations on a multivariate distribution of probability. These distributions are assumed to be multivariate normal, with known parameters, for each population.
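For the equal-covariance case the abstract describes, the standard classification rule with known parameters can be written as follows. This is a textbook statement with the prior-odds term included; the notation is introduced here, not taken from the paper.

```latex
% Equal-covariance case: with known means \mu_1,\mu_2, common covariance
% \Sigma and prior probabilities \pi_1,\pi_2, assign an observation x to
% population 1 when the linear discriminant score exceeds the threshold:
(\mu_1-\mu_2)^{\top}\Sigma^{-1}x
  \;\ge\;
\tfrac{1}{2}(\mu_1-\mu_2)^{\top}\Sigma^{-1}(\mu_1+\mu_2)
  \;+\;
\log\!\frac{\pi_2}{\pi_1}
```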

Journal ArticleDOI
TL;DR: A novel face recognition method is proposed which exploits both global and local discriminative features, where the global features, extracted from the low-frequency Fourier coefficients of the whole face image, encode holistic facial information such as the facial contour.
Abstract: In the literature of psychophysics and neurophysiology, many studies have shown that both global and local features are crucial for face representation and recognition. This paper proposes a novel face recognition method which exploits both global and local discriminative features. In this method, global features are extracted from the whole face images by keeping the low-frequency coefficients of Fourier transform, which we believe encodes the holistic facial information, such as facial contour. For local feature extraction, Gabor wavelets are exploited considering their biological relevance. After that, Fisher's linear discriminant (FLD) is separately applied to the global Fourier features and each local patch of Gabor features. Thus, multiple FLD classifiers are obtained, each embodying different facial evidences for face recognition. Finally, all these classifiers are combined to form a hierarchical ensemble classifier. We evaluate the proposed method using two large-scale face databases: FERET and FRGC version 2.0. Experiments show that the results of our method are impressively better than the best known results with the same evaluation protocol.

Proceedings ArticleDOI
20 Jun 2009
TL;DR: The proposed manifold discriminant analysis (MDA) method seeks to learn an embedding space where manifolds with different class labels are better separated and local data compactness within each manifold is enhanced; it is evaluated on the tasks of object recognition with image sets, including face recognition and object categorization.
Abstract: This paper presents a novel discriminative learning method, called manifold discriminant analysis (MDA), to solve the problem of image set classification. By modeling each image set as a manifold, we formulate the problem as classification-oriented multi-manifold learning. Aiming at maximizing “manifold margin”, MDA seeks to learn an embedding space, where manifolds with different class labels are better separated, and local data compactness within each manifold is enhanced. As a result, a new testing manifold can be more reliably classified in the learned embedding space. The proposed method is evaluated on the tasks of object recognition with image sets, including face recognition and object categorization. Comprehensive comparisons and extensive experiments demonstrate the effectiveness of our method.

Journal ArticleDOI
TL;DR: There was no statistical difference in classification accuracy whether 4 or 11 call variables were used, and this efficient data reduction technique, in conjunction with the high classification accuracy of the SVM, is a promising combination for automated species identification by sound.

Journal ArticleDOI
TL;DR: Electroencephalogram (EEG) signals are analyzed with the objective of classifying schizophrenic and control participants, and it is shown that EEG signals can be a useful tool for discriminating between the two groups.

Journal ArticleDOI
TL;DR: Results show that multi-layer perceptron and learning vector quantization can be considered as the most successful models in predicting the financial failure of banks.
Abstract: Bank failures threaten the economic system as a whole. Therefore, predicting bank financial failures is crucial to prevent and/or lessen the incoming negative effects on the economic system. This is essentially a classification problem that categorizes banks as healthy or non-healthy. This study aims to apply various neural network techniques, support vector machines and multivariate statistical methods to the bank failure prediction problem in a Turkish case, and to present a comprehensive computational comparison of the classification performances of the techniques tested. Twenty financial ratios with six feature groups including capital adequacy, asset quality, management quality, earnings, liquidity and sensitivity to market risk (CAMELS) are selected as predictor variables in the study. Four different data sets with different characteristics are developed using official financial data to improve the prediction performance. Each data set is also divided into training and validation sets. In the category of neural networks, four different architectures, namely multi-layer perceptron, competitive learning, self-organizing map and learning vector quantization, are employed. The multivariate statistical methods tested are multivariate discriminant analysis, k-means cluster analysis and logistic regression analysis. Experimental results are evaluated with respect to the classification accuracy of the techniques. Results show that multi-layer perceptron and learning vector quantization can be considered as the most successful models in predicting the financial failure of banks.

Journal ArticleDOI
TL;DR: In this paper, a new method for variable selection in complex spectral profiles is presented, which is validated by comparing samples from cerebrospinal fluid (CSF) with the same samples spiked with peptide and protein standards at different concentration levels.

Journal ArticleDOI
TL;DR: This paper shows that the equations determining two popular methods for smoothing parameter selection, generalized cross-validation and restricted maximum likelihood, share a similar form, which allows several results common to both to be proved and a condition to be derived under which they yield identical values.
Abstract: Summary. Spline-based approaches to non-parametric and semiparametric regression, as well as to regression of scalar outcomes on functional predictors, entail choosing a parameter controlling the extent to which roughness of the fitted function is penalized. We demonstrate that the equations determining two popular methods for smoothing parameter selection, generalized cross-validation and restricted maximum likelihood, share a similar form that allows us to prove several results which are common to both, and to derive a condition under which they yield identical values. These ideas are illustrated by application of functional principal component regression, a method for regressing scalars on functions, to two chemometric data sets.
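For reference, the generalized cross-validation criterion takes the familiar form below, with A_lambda the smoother ("hat") matrix of the penalized fit; the restricted maximum likelihood criterion discussed in the paper is related but not reproduced here. The notation is the standard convention, not copied from the paper.

```latex
% Generalized cross-validation for the smoothing parameter \lambda, with
% A_\lambda the hat ("smoother") matrix mapping y to the fitted values:
\mathrm{GCV}(\lambda)
  \;=\;
\frac{n\,\lVert (I - A_\lambda)\,y \rVert^{2}}
     {\bigl[\operatorname{tr}(I - A_\lambda)\bigr]^{2}}
```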

Journal ArticleDOI
TL;DR: In this article, the potential of near-infrared hyperspectral imaging for the detection of insect-damaged wheat kernels was investigated, in which healthy wheat kernels and wheat kernels visibly damaged by Sitophilus oryzae, Rhyzopertha dominica, Cryptolestes ferrugineus, and Tribolium castaneum were scanned in the 1000-1600-nm wavelength range using an NIR hyperspectral imaging system.

Journal ArticleDOI
TL;DR: Various changes to the visual aspects of this protocol are explored as well as their effects on classification, and the best performances, across both classifiers, were obtained with the white background (WB) visual protocol.
Abstract: The best known P300 speller brain-computer interface (BCI) paradigm is the Farwell and Donchin paradigm. In this paper, various changes to the visual aspects of this protocol are explored as well as their effects on classification. Changes to the dimensions of the symbols, the distance between the symbols and the colours used were tested. The purpose of the present work was not to achieve the highest possible accuracy results, but to ascertain whether these simple modifications to the visual protocol would produce classification differences between them and what these differences would be. Eight subjects were used, with each subject carrying out a total of six different experiments. In each experiment, the user spelt a total of 39 characters. Two types of classifiers were trained and tested to determine whether the results were classifier dependent. These were a support vector machine (SVM) with a radial basis function (RBF) kernel and Fisher's linear discriminant (FLD). The single-trial classification results and multiple-trial classification results were recorded and compared. Although no visual protocol was the best for all subjects, the best performances, across both classifiers, were obtained with the white background (WB) visual protocol. The worst performance was obtained with the small symbol size (SSS) visual protocol.

Journal ArticleDOI
TL;DR: The paper presents a novel method for the extraction of facial features based on the Gabor-wavelet representation of face images and the kernel partial-least-squares discrimination (KPLSD) algorithm, which outperforms feature-extraction methods such as principal component analysis (PCA), linear discriminant analysis (LDA), kernel principal component analysis (KPCA) or generalized discriminant analysis (GDA).
Abstract: The paper presents a novel method for the extraction of facial features based on the Gabor-wavelet representation of face images and the kernel partial-least-squares discrimination (KPLSD) algorithm. The proposed feature-extraction method, called the Gabor-based kernel partial-least-squares discrimination (GKPLSD), is performed in two consecutive steps. In the first step a set of forty Gabor wavelets is used to extract discriminative and robust facial features, while in the second step the kernel partial-least-squares discrimination technique is used to reduce the dimensionality of the Gabor feature vector and to further enhance its discriminatory power. For optimal performance, the KPLSD-based transformation is implemented using the recently proposed fractional-power-polynomial models. The experimental results based on the XM2VTS and ORL databases show that the GKPLSD approach outperforms feature-extraction methods such as principal component analysis (PCA), linear discriminant analysis (LDA), kernel principal component analysis (KPCA) or generalized discriminant analysis (GDA) as well as combinations of these methods with Gabor representations of the face images. Furthermore, as the KPLSD algorithm is derived from the kernel partial-least-squares regression (KPLSR) model it does not suffer from the small-sample-size problem, which is regularly encountered in the field of face recognition.

Journal ArticleDOI
01 Aug 2009
TL;DR: An effective fusion scheme that combines information presented by multiple domain experts based on the rank-level fusion integration method is presented, and results indicate that fusion of individual modalities can improve the overall performance of the biometric system, even in the presence of low quality data.
Abstract: In many real-world applications, unimodal biometric systems often face significant limitations due to sensitivity to noise, intraclass variability, data quality, nonuniversality, and other factors. Attempting to improve the performance of individual matchers in such situations may not prove to be highly effective. Multibiometric systems seek to alleviate some of these problems by providing multiple pieces of evidence of the same identity. These systems help achieve an increase in performance that may not be possible using a single-biometric indicator. This paper presents an effective fusion scheme that combines information presented by multiple domain experts based on the rank-level fusion integration method. The developed multimodal biometric system possesses a number of unique qualities, starting from the use of principal component analysis and Fisher's linear discriminant methods for identity authentication by the individual matchers (face, ear, and signature), to the use of the novel rank-level fusion method to consolidate the results obtained from the different biometric matchers. The ranks of individual matchers are combined using the highest rank, Borda count, and logistic regression approaches. The results indicate that fusion of individual modalities can improve the overall performance of the biometric system, even in the presence of low quality data. Insights on multibiometric design using rank-level fusion and its performance on a variety of biometric databases are discussed in the concluding section.
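Of the three rank-combination schemes mentioned, the Borda count is the simplest to sketch; the toy ranks and function name below are assumptions made for illustration, not data from the paper.

```python
import numpy as np

def borda_fusion(rank_lists):
    """Borda-count rank-level fusion.

    rank_lists: array of shape (n_matchers, n_identities); entry [m, i]
    is the rank that matcher m assigns to enrolled identity i (1 = best).
    Identities with the smallest summed rank win.
    """
    ranks = np.asarray(rank_lists)
    borda = ranks.sum(axis=0)                 # lower total rank = better
    return np.argsort(borda), borda

# Toy example: face, ear and signature matchers ranking 5 identities.
face      = [1, 3, 2, 5, 4]
ear       = [2, 1, 4, 3, 5]
signature = [1, 4, 2, 3, 5]
order, totals = borda_fusion([face, ear, signature])
print("fused ranking (best first):", order)   # identity indices
print("summed ranks:", totals)
```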

Journal ArticleDOI
TL;DR: The conclusions are that the performance of the classifiers depends very much on the distribution of the data, and it is recommended that the data structure be examined prior to model building in order to determine the optimal type of model.

Journal ArticleDOI
TL;DR: This paper proposes a new formulation of scatter matrices to extend two-class nonparametric discriminant analysis to multi-class cases, and develops two further improved multi-class NDA-based algorithms (NSA and NFA), each having two complementary methods based on the principal space and the null space of the intra-class scatter matrix, respectively.
Abstract: In this paper, we develop a new framework for face recognition based on nonparametric discriminant analysis (NDA) and multi-classifier integration. Traditional LDA-based methods suffer from a fundamental limitation originating from the parametric nature of scatter matrices, which are based on the Gaussian distribution assumption. The performance of these methods notably degrades when the actual distribution is non-Gaussian. To address this problem, we propose a new formulation of scatter matrices to extend the two-class nonparametric discriminant analysis to multi-class cases. Then, we develop two further improved multi-class NDA-based algorithms (NSA and NFA), each having two complementary methods based on the principal space and the null space of the intra-class scatter matrix, respectively. Compared to the NSA, the NFA is more effective in the utilization of the classification boundary information. In order to exploit the complementary nature of the two kinds of NFA (PNFA and NNFA), we finally develop a dual NFA-based multi-classifier fusion framework by employing the overcomplete Gabor representation to boost the recognition performance. We show the improvements of the developed new algorithms over the traditional subspace methods through comparative experiments on two challenging face databases, the Purdue AR database and the XM2VTS database.

Book
26 May 2009
TL;DR: This book presents data mining methodology, from the organisation of data and summary statistics through model specification (clustering, linear and logistic regression, tree models, neural networks, and graphical models) and model evaluation, illustrated by business case studies including market basket analysis, credit risk prediction, and customer lifetime value.
Abstract: 1 Introduction. Part I Methodology. 2 Organisation of the data. 2.1 Statistical units and statistical variables. 2.2 Data matrices and their transformations. 2.3 Complex data structures. 2.4 Summary. 3 Summary statistics. 3.1 Univariate exploratory analysis. 3.1.1 Measures of location. 3.1.2 Measures of variability. 3.1.3 Measures of heterogeneity. 3.1.4 Measures of concentration. 3.1.5 Measures of asymmetry. 3.1.6 Measures of kurtosis. 3.2 Bivariate exploratory analysis of quantitative data. 3.3 Multivariate exploratory analysis of quantitative data. 3.4 Multivariate exploratory analysis of qualitative data. 3.4.1 Independence and association. 3.4.2 Distance measures. 3.4.3 Dependency measures. 3.4.4 Model-based measures. 3.5 Reduction of dimensionality. 3.5.1 Interpretation of the principal components. 3.6 Further reading. 4 Model specification. 4.1 Measures of distance. 4.1.1 Euclidean distance. 4.1.2 Similarity measures. 4.1.3 Multidimensional scaling. 4.2 Cluster analysis. 4.2.1 Hierarchical methods. 4.2.2 Evaluation of hierarchical methods. 4.2.3 Non-hierarchical methods. 4.3 Linear regression. 4.3.1 Bivariate linear regression. 4.3.2 Properties of the residuals. 4.3.3 Goodness of fit. 4.3.4 Multiple linear regression. 4.4 Logistic regression. 4.4.1 Interpretation of logistic regression. 4.4.2 Discriminant analysis. 4.5 Tree models. 4.5.1 Division criteria. 4.5.2 Pruning. 4.6 Neural networks. 4.6.1 Architecture of a neural network. 4.6.2 The multilayer perceptron. 4.6.3 Kohonen networks. 4.7 Nearest-neighbour models. 4.8 Local models. 4.8.1 Association rules. 4.8.2 Retrieval by content. 4.9 Uncertainty measures and inference. 4.9.1 Probability. 4.9.2 Statistical models. 4.9.3 Statistical inference. 4.10 Non-parametric modelling. 4.11 The normal linear model. 4.11.1 Main inferential results. 4.12 Generalised linear models. 4.12.1 The exponential family. 4.12.2 Definition of generalised linear models. 4.12.3 The logistic regression model. 4.13 Log-linear models. 4.13.1 Construction of a log-linear model. 4.13.2 Interpretation of a log-linear model. 4.13.3 Graphical log-linear models. 4.13.4 Log-linear model comparison. 4.14 Graphical models. 4.14.1 Symmetric graphical models. 4.14.2 Recursive graphical models. 4.14.3 Graphical models and neural networks. 4.15 Survival analysis models. 4.16 Further reading. 5 Model evaluation. 5.1 Criteria based on statistical tests. 5.1.1 Distance between statistical models. 5.1.2 Discrepancy of a statistical model. 5.1.3 Kullback-Leibler discrepancy. 5.2 Criteria based on scoring functions. 5.3 Bayesian criteria. 5.4 Computational criteria. 5.5 Criteria based on loss functions. 5.6 Further reading. Part II Business case studies. 6 Describing website visitors. 6.1 Objectives of the analysis. 6.2 Description of the data. 6.3 Exploratory analysis. 6.4 Model building. 6.4.1 Cluster analysis. 6.4.2 Kohonen networks. 6.5 Model comparison. 6.6 Summary report. 7 Market basket analysis. 7.1 Objectives of the analysis. 7.2 Description of the data. 7.3 Exploratory data analysis. 7.4 Model building. 7.4.1 Log-linear models. 7.4.2 Association rules. 7.5 Model comparison. 7.6 Summary report. 8 Describing customer satisfaction. 8.1 Objectives of the analysis. 8.2 Description of the data. 8.3 Exploratory data analysis. 8.4 Model building. 8.5 Summary. 9 Predicting credit risk of small businesses. 9.1 Objectives of the analysis. 9.2 Description of the data. 9.3 Exploratory data analysis. 9.4 Model building. 9.5 Model comparison. 9.6 Summary report. 
10 Predicting e-learning student performance. 10.1 Objectives of the analysis. 10.2 Description of the data. 10.3 Exploratory data analysis. 10.4 Model specification. 10.5 Model comparison. 10.6 Summary report. 11 Predicting customer lifetime value. 11.1 Objectives of the analysis. 11.2 Description of the data. 11.3 Exploratory data analysis. 11.4 Model specification. 11.5 Model comparison. 11.6 Summary report. 12 Operational risk management. 12.1 Context and objectives of the analysis. 12.2 Exploratory data analysis. 12.3 Model building. 12.4 Model comparison. 12.5 Summary conclusions. References. Index.

Journal ArticleDOI
TL;DR: The new KNWFE possesses the advantages of both linear and nonlinear transformation, and the experimental results show that KNWFE outperforms NWFE, decision-boundary feature extraction, independent component analysis, kernel-based principal component analysis and generalized discriminant analysis.
Abstract: In recent years, many studies show that kernel methods are computationally efficient, robust, and stable for pattern analysis. Many kernel-based classifiers were designed and applied to classify remote-sensed data, and some results show that kernel-based classifiers have satisfying performances. Many studies about hyperspectral image classification also show that nonparametric weighted feature extraction (NWFE) is a powerful tool for extracting hyperspectral image features. However, NWFE is still based on linear transformation. In this paper, the kernel method is applied to extend NWFE to kernel-based NWFE (KNWFE). The new KNWFE possesses the advantages of both linear and nonlinear transformation, and the experimental results show that KNWFE outperforms NWFE, decision-boundary feature extraction, independent component analysis, kernel-based principal component analysis, and generalized discriminant analysis.

Journal ArticleDOI
TL;DR: This paper proposes a novel semi-supervised orthogonal discriminant analysis that propagates label information from the labeled data to the unlabeled data through a specially designed label propagation scheme, so that the distribution of the unlabeled data can be exploited more effectively to learn a better subspace.

Journal ArticleDOI
TL;DR: It is demonstrated that the authors' proposed segmentation and feature extraction techniques are promising for classifying lung nodules on CT images, and it is investigated whether a support vector machine (SVM) classifier can achieve improved performance over the LDA classifier.
Abstract: The purpose of this work is to develop a computer-aided diagnosis (CAD) system to differentiate malignant and benign lung nodules on CT scans. A fully automated system was designed to segment the nodule from its surrounding structured background in a local volume of interest (VOI) and to extract image features for classification. Image segmentation was performed with a 3D active contour method. The initial contour was obtained as the boundary of a binary object generated by k-means clustering within the VOI and smoothed by morphological opening. A data set of 256 lung nodules (124 malignant and 132 benign) from 152 patients was used in this study. In addition to morphological and texture features, the authors designed new nodule surface features to characterize the lung nodule surface smoothness and shape irregularity. The effects of two demographic features, age and gender, as adjunct to the image features were also investigated. A linear discriminant analysis (LDA) classifier built with features from stepwise feature selection was trained using simplex optimization to select the most effective features. A two-loop leave-one-out resampling scheme was developed to reduce the optimistic bias in estimating the test performance of the CAD system. The area under the receiver operating characteristic curve, Az, for the test cases improved significantly (p 0.05) when they were added to the feature space containing the morphological, texture, and new gradient field and radius features. To investigate if a support vector machine (SVM) classifier can achieve improved performance over the LDA classifier, we compared the performance of the LDA and SVMs with various kernels and parameters. Principal component analysis was used to reduce the dimensionality of the feature space for both the LDA and the SVM classifiers. When the number of selected principal components was varied, the highest test Az among the SVMs of various kernels and parameters was slightly higher than that of the LDA in one-loop leave-one-case-out resampling. However, no SVM with fixed architecture consistently performed better than the LDA in the range of principal components selected. This study demonstrated that the authors’ proposed segmentation and feature extraction techniques are promising for classifying lung nodules on CT images.
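A compact sketch of the evaluation idea (PCA for dimensionality reduction, an LDA classifier, leave-one-case-out resampling, and the area under the ROC curve) is given below. It uses synthetic features and scikit-learn components, and omits the segmentation, feature extraction, stepwise selection, and two-loop resampling of the actual CAD system.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import LeaveOneOut
from sklearn.pipeline import make_pipeline
from sklearn.metrics import roc_auc_score

# Synthetic stand-in for the nodule feature matrix (256 nodules, 40 features).
X, y = make_classification(n_samples=256, n_features=40, n_informative=8,
                           random_state=0)

model = make_pipeline(PCA(n_components=10), LinearDiscriminantAnalysis())

# Leave-one-case-out resampling: one held-out test score per nodule.
scores = np.empty(len(y))
for train, test in LeaveOneOut().split(X):
    model.fit(X[train], y[train])
    scores[test] = model.decision_function(X[test])

print("test Az (area under ROC curve):", round(roc_auc_score(y, scores), 3))
```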

Journal ArticleDOI
TL;DR: An empirical Bayes approach to large-scale prediction, where the optimum Bayes prediction rule is estimated employing the data from all of the predictors, is proposed.
Abstract: Classical prediction methods, such as Fisher’s linear discriminant function, were designed for small-scale problems in which the number of predictors N is much smaller than the number of observations n. Modern scientific devices often reverse this situation. A microarray analysis, for example, might include n=100 subjects measured on N=10,000 genes, each of which is a potential predictor. This article proposes an empirical Bayes approach to large-scale prediction, where the optimum Bayes prediction rule is estimated employing the data from all of the predictors. Microarray examples are used to illustrate the method. The results demonstrate a close connection with the shrunken centroids algorithm of Tibshirani et al. (2002), a frequentist regularization approach to large-scale prediction, and also with false discovery rate theory.
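The shrunken centroids method that the article connects to is available off the shelf; a minimal sketch on microarray-like synthetic data follows. The dataset, shrinkage threshold, and cross-validation setup are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neighbors import NearestCentroid
from sklearn.model_selection import cross_val_score

# Wide, microarray-like toy data: many more predictors than observations.
X, y = make_classification(n_samples=100, n_features=2000, n_informative=20,
                           n_redundant=0, random_state=0)

# Nearest shrunken centroids (Tibshirani et al. 2002): class centroids are
# shrunk towards the overall centroid, which zeroes out most predictors.
clf = NearestCentroid(shrink_threshold=0.5)
acc = cross_val_score(clf, X, y, cv=5).mean()
print(f"5-fold CV accuracy: {acc:.3f}")
```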