
Showing papers on "Linear discriminant analysis published in 2007"


Journal ArticleDOI
TL;DR: A general tensor discriminant analysis (GTDA) is developed as a preprocessing step for LDA for face recognition and achieves good performance for gait recognition based on image sequences from the University of South Florida (USF) HumanID Database.
Abstract: Traditional image representations are not suited to conventional classification methods such as the linear discriminant analysis (LDA) because of the undersample problem (USP): the dimensionality of the feature space is much higher than the number of training samples. Motivated by the successes of the two-dimensional LDA (2DLDA) for face recognition, we develop a general tensor discriminant analysis (GTDA) as a preprocessing step for LDA. The benefits of GTDA, compared with existing preprocessing methods such as the principal components analysis (PCA) and 2DLDA, include the following: 1) the USP is reduced in subsequent classification by, for example, LDA, 2) the discriminative information in the training tensors is preserved, and 3) GTDA provides stable recognition rates because the alternating projection optimization algorithm to obtain a solution of GTDA converges, whereas that of 2DLDA does not. We use human gait recognition to validate the proposed GTDA. The averaged gait images are utilized for gait representation. Given the popularity of Gabor-function-based image decompositions for image understanding and object recognition, we develop three different Gabor-function-based image representations: 1) GaborD is the sum of Gabor filter responses over directions, 2) GaborS is the sum of Gabor filter responses over scales, and 3) GaborSD is the sum of Gabor filter responses over scales and directions. The GaborD, GaborS, and GaborSD representations are applied to the problem of recognizing people from their averaged gait images. A large number of experiments were carried out to evaluate the effectiveness (recognition rate) of gait recognition based on first obtaining a Gabor, GaborD, GaborS, or GaborSD image representation, then using GTDA to extract features and, finally, using LDA for classification. The proposed methods achieved good performance for gait recognition based on image sequences from the University of South Florida (USF) HumanID Database. Experimental comparisons are made with nine state-of-the-art classification methods in gait recognition.
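
As a concrete illustration of the three Gabor representations described above, here is a minimal Python sketch using skimage's Gabor filter; the frequency and orientation grids are illustrative assumptions, not the paper's settings.

```python
import numpy as np
from skimage.filters import gabor

def gabor_representations(image, frequencies=(0.1, 0.2, 0.3), n_thetas=8):
    """Compute GaborD, GaborS, and GaborSD summaries of an averaged gait image."""
    thetas = [np.pi * k / n_thetas for k in range(n_thetas)]
    mags = []
    for f in frequencies:                       # scales
        row = []
        for t in thetas:                        # directions
            real, imag = gabor(image, frequency=f, theta=t)
            row.append(np.sqrt(real ** 2 + imag ** 2))  # magnitude response
        mags.append(row)
    mags = np.array(mags)                       # shape: (scales, directions, H, W)
    gabor_d = mags.sum(axis=1)                  # GaborD: sum over directions
    gabor_s = mags.sum(axis=0)                  # GaborS: sum over scales
    gabor_sd = mags.sum(axis=(0, 1))            # GaborSD: sum over both
    return gabor_d, gabor_s, gabor_sd
```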

1,160 citations


Journal Article
TL;DR: A new linear supervised dimensionality reduction method called local Fisher discriminant analysis (LFDA), which effectively combines the ideas of FDA and LPP, and which can be easily computed just by solving a generalized eigenvalue problem.
Abstract: Reducing the dimensionality of data without losing intrinsic information is an important preprocessing step in high-dimensional data analysis. Fisher discriminant analysis (FDA) is a traditional technique for supervised dimensionality reduction, but it tends to give undesired results if samples in a class are multimodal. An unsupervised dimensionality reduction method called locality-preserving projection (LPP) can work well with multimodal data due to its locality preserving property. However, since LPP does not take the label information into account, it is not necessarily useful in supervised learning scenarios. In this paper, we propose a new linear supervised dimensionality reduction method called local Fisher discriminant analysis (LFDA), which effectively combines the ideas of FDA and LPP. LFDA has an analytic form of the embedding transformation and the solution can be easily computed just by solving a generalized eigenvalue problem. We demonstrate the practical usefulness and high scalability of the LFDA method in data visualization and classification tasks through extensive simulation studies. We also show that LFDA can be extended to non-linear dimensionality reduction scenarios by applying the kernel trick.
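
The LFDA embedding comes from a generalized eigenvalue problem over locally weighted scatter matrices. A compact sketch under simplifying assumptions (a dense heat-kernel affinity instead of the paper's local scaling):

```python
import numpy as np
from scipy.linalg import eigh

def lfda(X, y, dim, sigma=1.0):
    """Local Fisher discriminant analysis, reduced to its eigenproblem."""
    n, d = X.shape
    dist2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    A = np.exp(-dist2 / (2 * sigma ** 2))        # heat-kernel affinity
    Ww = np.zeros((n, n))
    Wb = np.full((n, n), 1.0 / n)                # different-class pairs: 1/n
    for c in np.unique(y):
        idx = np.where(y == c)[0]
        nc = len(idx)
        for i in idx:                            # same-class pairs, locally weighted
            Ww[i, idx] = A[i, idx] / nc
            Wb[i, idx] = A[i, idx] * (1.0 / n - 1.0 / nc)
    def scatter(W):                              # 0.5 * sum_ij W_ij (xi-xj)(xi-xj)^T
        return X.T @ (np.diag(W.sum(1)) - W) @ X
    Sw, Sb = scatter(Ww), scatter(Wb)
    vals, vecs = eigh(Sb, Sw + 1e-6 * np.eye(d))  # solve Sb v = lambda Sw v
    return vecs[:, np.argsort(vals)[::-1][:dim]]  # top discriminant directions
```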

1,055 citations


Proceedings ArticleDOI
26 Dec 2007
TL;DR: This paper proposes a novel method, called Semi-supervised Discriminant Analysis (SDA), which makes use of both labeled and unlabeled samples to learn a discriminant function which is as smooth as possible on the data manifold.
Abstract: Linear Discriminant Analysis (LDA) has been a popular method for extracting features which preserve class separability. The projection vectors are commonly obtained by maximizing the between-class covariance and simultaneously minimizing the within-class covariance. In practice, when there are insufficient training samples, the covariance matrix of each class may not be accurately estimated. In this paper, we propose a novel method, called Semi-supervised Discriminant Analysis (SDA), which makes use of both labeled and unlabeled samples. The labeled data points are used to maximize the separability between different classes and the unlabeled data points are used to estimate the intrinsic geometric structure of the data. Specifically, we aim to learn a discriminant function which is as smooth as possible on the data manifold. Experimental results on single training image face recognition and relevance feedback image retrieval demonstrate the effectiveness of our algorithm.
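
A rough sketch of how such an objective can be assembled, assuming a kNN graph over the pooled labeled and unlabeled points supplies the manifold regularizer added to the total scatter; this is an illustration, not the authors' implementation:

```python
import numpy as np
from scipy.linalg import eigh
from sklearn.neighbors import kneighbors_graph

def sda(X_lab, y, X_unlab, dim, alpha=0.1, k=5):
    X_all = np.vstack([X_lab, X_unlab])
    W = kneighbors_graph(X_all, k, mode='connectivity').toarray()
    W = np.maximum(W, W.T)                        # symmetrize the kNN graph
    L = np.diag(W.sum(1)) - W                     # graph Laplacian
    mu = X_lab.mean(0)
    Sb = sum(np.sum(y == c) * np.outer(X_lab[y == c].mean(0) - mu,
                                       X_lab[y == c].mean(0) - mu)
             for c in np.unique(y))               # between-class scatter (labeled)
    St = (X_lab - mu).T @ (X_lab - mu)            # total scatter (labeled)
    # manifold smoothness term from all points, plus a small ridge
    reg = St + alpha * X_all.T @ L @ X_all + 1e-6 * np.eye(X_lab.shape[1])
    vals, vecs = eigh(Sb, reg)
    return vecs[:, np.argsort(vals)[::-1][:dim]]
```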

730 citations


Journal ArticleDOI
TL;DR: A novel discriminative learning method over sets is proposed for set classification that maximizes the canonical correlations of within-class sets and minimizes the canonical correlations of between-class sets.
Abstract: We address the problem of comparing sets of images for object recognition, where the sets may represent variations in an object's appearance due to changing camera pose and lighting conditions. Canonical correlations (also known as principal or canonical angles), which can be thought of as the angles between two d-dimensional subspaces, have recently attracted attention for image set matching. Canonical correlations offer many benefits in accuracy, efficiency, and robustness compared to the two main classical methods: parametric distribution-based and nonparametric sample-based matching of sets. Here, this is first demonstrated experimentally for reasonably sized data sets using existing methods exploiting canonical correlations. Motivated by their proven effectiveness, a novel discriminative learning method over sets is proposed for set classification. Specifically, inspired by classical linear discriminant analysis (LDA), we develop a linear discriminant function that maximizes the canonical correlations of within-class sets and minimizes the canonical correlations of between-class sets. Image sets transformed by the discriminant function are then compared by the canonical correlations. The classical orthogonal subspace method (OSM) is also investigated for a similar purpose and compared with the proposed method. The proposed method is evaluated on various object recognition problems using face image sets with arbitrary motion captured under different illuminations and image sets of 500 general objects taken at different views. The method is also applied to object category recognition using the ETH-80 database. The proposed method is shown to outperform the state-of-the-art methods in terms of accuracy and efficiency.
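
Canonical correlations are the cosines of the principal angles between two subspaces, which SciPy can compute directly; a small sketch, with the subspace dimension d chosen arbitrarily:

```python
import numpy as np
from scipy.linalg import subspace_angles

def canonical_correlations(set1, set2, d=5):
    """set1, set2: (n_images, n_pixels) arrays of vectorized images."""
    # represent each image set by the span of its top d principal directions
    U1 = np.linalg.svd(set1 - set1.mean(0), full_matrices=False)[2][:d].T
    U2 = np.linalg.svd(set2 - set2.mean(0), full_matrices=False)[2][:d].T
    return np.cos(subspace_angles(U1, U2))  # cosines of the principal angles
```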

626 citations


Journal ArticleDOI
TL;DR: Through both simulated data and real life data, it is shown that this method performs very well in multivariate classification problems, often outperforms the PAM method and can be as competitive as the support vector machines classifiers.
Abstract: In this paper, we introduce a modified version of linear discriminant analysis, called the "shrunken centroids regularized discriminant analysis" (SCRDA). This method generalizes the idea of the "nearest shrunken centroids" (NSC) (Tibshirani and others, 2003) into the classical discriminant analysis. The SCRDA method is specially designed for classification problems in high-dimension, low-sample-size situations, for example, microarray data. Through both simulated data and real life data, it is shown that this method performs very well in multivariate classification problems, often outperforms the PAM method (using the NSC algorithm), and can be as competitive as support vector machine classifiers. It is also suitable for feature elimination purposes and can be used as a gene selection method. The open source R package for this method (named "rda") is available on CRAN (http://www.r-project.org) for download and testing.
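
The nearest-shrunken-centroids ingredient fits in a few lines. This simplified sketch soft-thresholds raw centroid differences and omits the within-class standardization and covariance regularization of full SCRDA:

```python
import numpy as np

def shrunken_centroids(X, y, delta):
    """Shrink each class centroid toward the overall centroid by delta."""
    overall = X.mean(axis=0)
    shrunk = {}
    for c in np.unique(y):
        diff = X[y == c].mean(axis=0) - overall
        # soft-threshold each coordinate: sign(d) * max(|d| - delta, 0);
        # genes whose difference shrinks to zero drop out of the classifier
        shrunk[c] = overall + np.sign(diff) * np.maximum(np.abs(diff) - delta, 0)
    return shrunk  # classify a new sample by its nearest shrunken centroid
```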

602 citations


Journal ArticleDOI
TL;DR: This article reports significant gains in recognition performance and model compactness as a result of discriminative training based on MCE training applied to HMMs, in the context of three challenging large-vocabulary speech recognition tasks.
Abstract: The minimum classification error (MCE) framework for discriminative training is a simple and general formalism for directly optimizing recognition accuracy in pattern recognition problems. The framework applies directly to the optimization of hidden Markov models (HMMs) used for speech recognition problems. However, few if any studies have reported results for the application of MCE training to large-vocabulary, continuous-speech recognition tasks. This article reports significant gains in recognition performance and model compactness as a result of discriminative training based on MCE training applied to HMMs, in the context of three challenging large-vocabulary (up to 100k-word) speech recognition tasks: the Corpus of Spontaneous Japanese lecture speech transcription task, a telephone-based name recognition task, and the MIT Jupiter telephone-based conversational weather information task. On these tasks, starting from maximum likelihood (ML) baselines, MCE training yielded relative reductions in word error ranging from 7% to 20%. Furthermore, this paper evaluates the use of different methods for optimizing the MCE criterion function, as well as the use of precomputed recognition lattices to speed up training. An overview of the MCE framework is given, with an emphasis on practical implementation issues.

581 citations


Proceedings Article
06 Jan 2007
TL;DR: A novel linear algorithm for discriminant analysis, called Locality Sensitive Discriminant Analysis (LSDA), which finds a projection which maximizes the margin between data points from different classes at each local area by discovering the local manifold structure.
Abstract: Linear Discriminant Analysis (LDA) is a popular data-analytic tool for studying the class relationship between data points. A major disadvantage of LDA is that it fails to discover the local geometrical structure of the data manifold. In this paper, we introduce a novel linear algorithm for discriminant analysis, called Locality Sensitive Discriminant Analysis (LSDA). When there are insufficient training samples, local structure is generally more important than global structure for discriminant analysis. By discovering the local manifold structure, LSDA finds a projection which maximizes the margin between data points from different classes at each local area. Specifically, the data points are mapped into a subspace in which the nearby points with the same label are close to each other while the nearby points with different labels are far apart. Experiments carried out on several standard face databases show a clear improvement over the results of LDA-based recognition.

500 citations


Journal ArticleDOI
TL;DR: In this paper, an unsupervised discriminant projection (UDP) technique for dimensionality reduction of high-dimensional data in small sample size cases is proposed, which can be seen as a linear approximation of a multimanifolds-based learning framework taking into account both the local and nonlocal quantities.
Abstract: This paper develops an unsupervised discriminant projection (UDP) technique for dimensionality reduction of high-dimensional data in small sample size cases. UDP can be seen as a linear approximation of a multimanifolds-based learning framework which takes into account both the local and nonlocal quantities. UDP characterizes the local scatter as well as the nonlocal scatter, seeking to find a projection that simultaneously maximizes the nonlocal scatter and minimizes the local scatter. This characteristic makes UDP more intuitive and more powerful than the most up-to-date method, locality preserving projection (LPP), which considers only the local scatter for clustering or classification tasks. The proposed method is applied to face and palm biometrics and is examined using the Yale, FERET, and AR face image databases and the PolyU palmprint database. The experimental results show that UDP consistently outperforms LPP and PCA and outperforms LDA when the training sample size per class is small. This demonstrates that UDP is a good choice for real-world biometrics applications.

473 citations


Journal ArticleDOI
TL;DR: It was discovered that a particular mixed-band feature space consisting of nine parameters and LMBPNN result in the highest classification accuracy, a high value of 96.7%.
Abstract: A novel wavelet-chaos-neural network methodology is presented for classification of electroencephalograms (EEGs) into healthy, ictal, and interictal EEGs. Wavelet analysis is used to decompose the EEG into delta, theta, alpha, beta, and gamma sub-bands. Three parameters are employed for EEG representation: standard deviation (quantifying the signal variance), correlation dimension, and largest Lyapunov exponent (quantifying the non-linear chaotic dynamics of the signal). The classification accuracies of the following techniques are compared: 1) unsupervised k-means clustering; 2) linear and quadratic discriminant analysis; 3) radial basis function neural network; 4) Levenberg-Marquardt backpropagation neural network (LMBPNN). To reduce the computing time and output analysis, the research was performed in two phases: band-specific analysis and mixed-band analysis. In phase two, over 500 different combinations of mixed-band feature spaces consisting of promising parameters from phase one of the research were investigated. It is concluded that all three key components of the wavelet-chaos-neural network methodology are important for improving the EEG classification accuracy. Judicious combinations of parameters and classifiers are needed to accurately discriminate between the three types of EEGs. It was discovered that a particular mixed-band feature space consisting of nine parameters and LMBPNN result in the highest classification accuracy, a high value of 96.7%.
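
A sketch of the sub-band decomposition step using PyWavelets; the wavelet family and decomposition level are assumptions, and the two chaos measures (correlation dimension, largest Lyapunov exponent) are omitted since they require a nonlinear-dynamics library:

```python
import numpy as np
import pywt

def eeg_subband_features(signal, wavelet='db4', level=5):
    """Per-band standard deviation features from a 1-D EEG signal."""
    # coeffs = [approx(level), detail(level), ..., detail(1)], which roughly
    # correspond to the delta, theta, alpha, beta, gamma sub-bands
    coeffs = pywt.wavedec(signal, wavelet, level=level)
    return np.array([np.std(c) for c in coeffs])  # variance-type feature per band
```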

434 citations


01 Jan 2007
TL;DR: In this article, the authors develop a broadly applicable parallel programming method, one that is easily applied to many different learning algorithms, such as locally weighted linear regression (LWLR), k-means, logistic regression (LR), naive Bayes (NB), SVM, ICA, PCA, gaussian discriminant analysis (GDA), EM, and backpropagation (NN).
Abstract: We are at the beginning of the multicore era. Computers will have increasingly many cores (processors), but there is still no good programming framework for these architectures, and thus no simple and unified way for machine learning to take advantage of the potential speed up. In this paper, we develop a broadly applicable parallel programming method, one that is easily applied to many different learning algorithms. Our work is in distinct contrast to the tradition in machine learning of designing (often ingenious) ways to speed up a single algorithm at a time. Specifically, we show that algorithms that fit the Statistical Query model [15] can be written in a certain 'summation form,' which allows them to be easily parallelized on multicore computers. We adapt Google's map-reduce [7] paradigm to demonstrate this parallel speed up technique on a variety of learning algorithms including locally weighted linear regression (LWLR), k-means, logistic regression (LR), naive Bayes (NB), SVM, ICA, PCA, Gaussian discriminant analysis (GDA), EM, and backpropagation (NN). Our experimental results show basically linear speedup with an increasing number of processors.
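
The 'summation form' idea is easy to illustrate: the sufficient statistics of linear regression (X^T X and X^T y) are sums over examples, so data shards can be mapped independently and reduced by addition. A toy sketch with multiprocessing standing in for the map-reduce framework:

```python
import numpy as np
from multiprocessing import Pool

def partial_sums(shard):
    X, y = shard
    return X.T @ X, X.T @ y                # "map": per-shard sufficient statistics

def parallel_linear_regression(X, y, n_workers=4):
    shards = list(zip(np.array_split(X, n_workers),
                      np.array_split(y, n_workers)))
    with Pool(n_workers) as pool:          # call this under a __main__ guard
        parts = pool.map(partial_sums, shards)
    XtX = sum(p[0] for p in parts)         # "reduce": add the partial sums
    Xty = sum(p[1] for p in parts)
    return np.linalg.solve(XtX, Xty)       # solve the normal equations
```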

381 citations


Proceedings ArticleDOI
26 Dec 2007
TL;DR: This paper proposes a novel dimensionality reduction framework, called spectral regression (SR), for efficient regularized subspace learning, which casts the problem of learning the projective functions into a regression framework, which avoids eigen-decomposition of dense matrices.
Abstract: Subspace learning based face recognition methods have attracted considerable interest in recent years, including principal component analysis (PCA), linear discriminant analysis (LDA), locality preserving projection (LPP), neighborhood preserving embedding (NPE) and marginal Fisher analysis (MFA). However, a disadvantage of all these approaches is that their computations involve eigen-decomposition of dense matrices, which is expensive in both time and memory. In this paper, we propose a novel dimensionality reduction framework, called spectral regression (SR), for efficient regularized subspace learning. SR casts the problem of learning the projective functions into a regression framework, which avoids eigen-decomposition of dense matrices. Also, with the regression based framework, different kinds of regularizers can be naturally incorporated into our algorithm, which makes it more flexible. Computational analysis shows that SR has only linear-time complexity, which is a huge speedup compared to the cubic-time complexity of the ordinary approaches. Experimental results on face recognition demonstrate the effectiveness and efficiency of our method.
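
A compressed sketch of the two SR steps for the supervised (LDA-like) case, with centered class indicators standing in for the paper's exact response vectors and ridge regression supplying the regularized fit:

```python
import numpy as np
from sklearn.linear_model import Ridge

def spectral_regression_lda(X, y, alpha=0.1):
    classes = np.unique(y)
    # step 1: response vectors; for the supervised case these can be built
    # from the class indicator structure (one response per class, centered)
    Y = np.stack([(y == c).astype(float) for c in classes], axis=1)
    Y = Y - Y.mean(axis=0)
    # step 2: regularized regression X w ~= y_k for each response vector,
    # avoiding any eigen-decomposition of dense scatter matrices
    W = np.stack([Ridge(alpha=alpha).fit(X, Y[:, k]).coef_
                  for k in range(Y.shape[1])], axis=1)
    return W  # columns are projective functions; use X @ W as the embedding
```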

Proceedings ArticleDOI
17 Jun 2007
TL;DR: This paper introduces a regularized subspace learning model using a Laplacian penalty to constrain the coefficients to be spatially smooth, and shows that the resulting spatially smooth subspaces are better for image representation than the original versions of the algorithms.
Abstract: Subspace learning based face recognition methods have attracted considerable interest in recent years, including principal component analysis (PCA), linear discriminant analysis (LDA), locality preserving projection (LPP), neighborhood preserving embedding (NPE), marginal Fisher analysis (MFA) and local discriminant embedding (LDE). These methods consider an n1×n2 image as a vector in R^(n1×n2), and the pixels of each image are considered as independent. However, an image represented in the plane is intrinsically a matrix, and pixels spatially close to each other may be correlated. Even though we have n1×n2 pixels per image, this spatial correlation suggests the real number of degrees of freedom is far less. In this paper, we introduce a regularized subspace learning model using a Laplacian penalty to constrain the coefficients to be spatially smooth. All these existing subspace learning algorithms can fit into this model and produce a spatially smooth subspace which is better for image representation than their original version. Recognition, clustering and retrieval can then be performed in the image subspace. Experimental results on face recognition demonstrate the effectiveness of our method.
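
The spatial-smoothness ingredient can be sketched as a discrete 2-D Laplacian over the n1×n2 pixel grid; the penalized eigenproblem in the closing comment is schematic, not the paper's exact formulation:

```python
import numpy as np

def grid_laplacian(n1, n2):
    """Discrete Laplacian acting on vectorized n1 x n2 images (Kronecker sum)."""
    def lap_1d(n):
        return 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
    return np.kron(lap_1d(n1), np.eye(n2)) + np.kron(np.eye(n1), lap_1d(n2))

# Schematically, the smoothness-regularized eigenproblem becomes
#   maximize  w^T S_b w   subject to  w^T (S_w + alpha * D.T @ D) w = 1,
# where D = grid_laplacian(n1, n2) penalizes spatially rough projection vectors.
```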

Journal ArticleDOI
TL;DR: This paper presents a novel approach to solve the supervised dimensionality reduction problem by encoding an image object as a general tensor of second or even higher order, and proposes a discriminant tensor criterion, whereby multiple interrelated lower dimensional discriminative subspaces are derived for feature extraction.
Abstract: There is a growing interest in subspace learning techniques for face recognition; however, the excessive dimension of the data space often brings the algorithms into the curse of dimensionality dilemma. In this paper, we present a novel approach to solve the supervised dimensionality reduction problem by encoding an image object as a general tensor of second or even higher order. First, we propose a discriminant tensor criterion, whereby multiple interrelated lower dimensional discriminative subspaces are derived for feature extraction. Then, a novel approach, called k-mode optimization, is presented to iteratively learn these subspaces by unfolding the tensor along different tensor directions. We call this algorithm multilinear discriminant analysis (MDA), which has the following characteristics: 1) multiple interrelated subspaces can collaborate to discriminate different classes, 2) for classification problems involving higher order tensors, the MDA algorithm can avoid the curse of dimensionality dilemma and alleviate the small sample size problem, and 3) the computational cost in the learning stage is reduced to a large extent owing to the reduced data dimensions in k-mode optimization. We provide extensive experiments on ORL, CMU PIE, and FERET databases by encoding face images as second- or third-order tensors to demonstrate that the proposed MDA algorithm based on higher order tensors has the potential to outperform the traditional vector-based subspace learning algorithms, especially in cases with small sample sizes.

Proceedings ArticleDOI
29 Sep 2007
TL;DR: In this paper, the authors proposed a method for dimensionality reduction of a feature set by choosing a subset of the original features that contains most of the essential information, using the same criteria as PCA.
Abstract: Dimensionality reduction of a feature set is a common preprocessing step used for pattern recognition and classification applications. Principal Component Analysis (PCA) is one of the popular methods used, and can be shown to be optimal using different optimality criteria. However, it has the disadvantage that measurements from all the original features are used in the projection to the lower dimensional space. This paper proposes a novel method for dimensionality reduction of a feature set by choosing a subset of the original features that contains most of the essential information, using the same criteria as PCA. We call this method Principal Feature Analysis (PFA). The proposed method is successfully applied for choosing the principal features in face tracking and content-based image retrieval (CBIR) problems. Automated annotation of digital pictures has been a highly challenging problem for computer scientists since the invention of computers. The capability of annotating pictures by computers can lead to breakthroughs in a wide range of applications including Web image search, online picture-sharing communities, and scientific experiments. In our work, by advancing statistical modeling and optimization techniques, we can train computers about hundreds of semantic concepts using example pictures from each concept. The ALIPR (Automatic Linguistic Indexing of Pictures - Real Time) system of fully automatic and high speed annotation for online pictures has been constructed. Thousands of pictures from an Internet photo-sharing site, unrelated to the source of those pictures used in the training process, have been tested. The experimental results show that a single computer processor can suggest annotation terms in real-time and with good accuracy.
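
The PFA selection rule sketched in Python; the number of components q and the feature budget are left as user choices:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

def principal_feature_analysis(X, q, n_features):
    """Select original features whose rows in PC space are closest to
    k-means cluster centres (duplicates are removed)."""
    A = PCA(n_components=q).fit(X).components_.T   # one row per original feature
    km = KMeans(n_clusters=n_features, n_init=10).fit(A)
    chosen = [int(np.argmin(np.linalg.norm(A - c, axis=1)))
              for c in km.cluster_centers_]
    return sorted(set(chosen))  # indices of the selected original features
```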

Book ChapterDOI
20 Oct 2007
TL;DR: It is argued that robust recognition requires several different kinds of appearance information to be taken into account, suggesting the use of heterogeneous feature sets, and combining two of the most successful local face representations, Gabor wavelets and Local Binary Patterns, gives considerably better performance than either alone.
Abstract: Extending recognition to uncontrolled situations is a key challenge for practical face recognition systems. Finding efficient and discriminative facial appearance descriptors is crucial for this. Most existing approaches use features of just one type. Here we argue that robust recognition requires several different kinds of appearance information to be taken into account, suggesting the use of heterogeneous feature sets. We show that combining two of the most successful local face representations, Gabor wavelets and Local Binary Patterns (LBP), gives considerably better performance than either alone: they are complementary in the sense that LBP captures small appearance details while Gabor features encode facial shape over a broader range of scales. Both feature sets are high dimensional so it is beneficial to use PCA to reduce the dimensionality prior to normalization and integration. The Kernel Discriminative Common Vector method is then applied to the combined feature vector to extract discriminant nonlinear features for recognition. The method is evaluated on several challenging face datasets including FRGC 104, FRGC 204 and FERET, with promising results.

Journal ArticleDOI
TL;DR: In this article, two versions of functional Principal Component Regression (PCR) are developed, both using B-splines and roughness penalties, and the regularized-components version applies such a penalty to the construction of the principal components.
Abstract: Regression of a scalar response on signal predictors, such as near-infrared (NIR) spectra of chemical samples, presents a major challenge when, as is typically the case, the dimension of the signals far exceeds their number. Most solutions to this problem reduce the dimension of the predictors either by regressing on components [e.g., principal component regression (PCR) and partial least squares (PLS)] or by smoothing methods, which restrict the coefficient function to the span of a spline basis. This article introduces functional versions of PCR and PLS, which combine both of the foregoing dimension-reduction approaches. Two versions of functional PCR are developed, both using B-splines and roughness penalties. The regularized-components version applies such a penalty to the construction of the principal components (i.e., it uses functional principal components), whereas the regularized-regression version incorporates a penalty in the regression. For the latter form of functional PCR, the penalty parameter...

Proceedings ArticleDOI
20 Jun 2007
TL;DR: The equivalence relationship between the proposed least squares formulation and LDA for multi-class classifications is rigorously established under a mild condition, which is shown empirically to hold in many applications involving high-dimensional data.
Abstract: Linear Discriminant Analysis (LDA) is a well-known method for dimensionality reduction and classification. LDA in the binary-class case has been shown to be equivalent to linear regression with the class label as the output. This implies that LDA for binary-class classifications can be formulated as a least squares problem. Previous studies have shown a certain relationship between multivariate linear regression and LDA for the multi-class case. Many of these studies show that multivariate linear regression with a specific class indicator matrix as the output can be applied as a preprocessing step for LDA. However, directly casting LDA as a least squares problem is challenging for the multi-class case. In this paper, a novel formulation for multivariate linear regression is proposed. The equivalence relationship between the proposed least squares formulation and LDA for multi-class classifications is rigorously established under a mild condition, which is shown empirically to hold in many applications involving high-dimensional data. Several LDA extensions based on the equivalence relationship are discussed.
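
A quick empirical check of this kind of equivalence can be coded directly; the centered, count-scaled indicator matrix below is a plausible stand-in for the paper's specific construction, not a reproduction of it:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def least_squares_lda(X, y):
    classes, counts = np.unique(y, return_counts=True)
    # class indicator matrix, scaled per class and centered
    Y = np.stack([(y == c).astype(float) / np.sqrt(nc)
                  for c, nc in zip(classes, counts)], axis=1)
    Y = Y - Y.mean(axis=0)
    Xc = X - X.mean(axis=0)
    W_ls, *_ = np.linalg.lstsq(Xc, Y, rcond=None)   # least squares solution
    W_lda = LinearDiscriminantAnalysis().fit(X, y).scalings_
    return W_ls, W_lda  # compare the column spaces of the two solutions
```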

Proceedings ArticleDOI
02 May 2007
TL;DR: This work proposes a new method called sub-band common spatial pattern (SBCSP), which outperforms the other two approaches and achieves a similar result to the best one in the literature, which was obtained by a time-consuming fine-tuning process.
Abstract: Brain-computer interface (BCI) is a system to translate human thoughts into commands. For electroencephalography (EEG) based BCI, motor imagery is considered one of the most effective approaches. Different imagery activities can be classified based on the changes in mu and/or beta rhythms and their spatial distributions. However, the change in these rhythmic patterns varies from one subject to another. This causes an unavoidable time-consuming fine-tuning process in building a BCI for every subject. To address this issue, we propose a new method called sub-band common spatial pattern (SBCSP). First, we decompose the EEG signals into sub-bands using a filter bank. Subsequently, we apply a discriminative analysis to extract SBCSP features. The SBCSP features are then fed into linear discriminant analyzers (LDA) to obtain scores which reflect the classification capability of each frequency band. Finally, the scores are fused to make a decision. We evaluate two fusion methods: recursive band elimination (RBE) and meta-classifier (MC). We assess our approaches on a standard database from BCI Competition III. We also compare our method with two other approaches that address the same issue. The results show that our method outperforms the other two approaches and achieves a similar result to the best one in the literature, which was obtained by a time-consuming fine-tuning process.
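
A simplified sketch of the pipeline: band-pass filtering into assumed sub-bands, log-variance features in place of full CSP spatial filtering, an LDA score per band, and fusion by plain summation rather than RBE or a meta-classifier:

```python
import numpy as np
from scipy.signal import butter, filtfilt
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

BANDS = [(4, 8), (8, 12), (12, 16), (16, 24), (24, 32)]  # assumed sub-bands, Hz

def band_features(trials, lo, hi, fs=250):
    """trials: (n_trials, n_channels, n_samples) EEG array."""
    b, a = butter(4, [lo / (fs / 2), hi / (fs / 2)], btype='band')
    filtered = filtfilt(b, a, trials, axis=-1)   # band-pass each trial
    return np.log(filtered.var(axis=-1))         # log-variance per channel

def sbcsp_scores(train, y_train, test, fs=250):
    scores = 0.0
    for lo, hi in BANDS:
        lda = LinearDiscriminantAnalysis().fit(
            band_features(train, lo, hi, fs), y_train)
        scores = scores + lda.decision_function(band_features(test, lo, hi, fs))
    return scores  # sign of the fused score gives the predicted class
```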

Journal ArticleDOI
TL;DR: A novel and uniform framework for both face identification and verification is presented, based on a combination of Gabor wavelets and General Discriminant Analysis, and can be considered appearance based in that features are extracted from the whole face image and subjected to subspace projection.

Journal ArticleDOI
TL;DR: It could be demonstrated that even though modern computer-intensive classification algorithms such as random forests, SVM and neural networks show a slight superiority, more classical classification algorithms performed nearly equally well.

Journal ArticleDOI
TL;DR: Results confirm that the proposed method is applicable to real-time EMG pattern recognition for multifunction myoelectric hand control and produces a better performance for the class separability, plus the LDA-projected features improve the classification accuracy with a short processing time.
Abstract: Electromyographic (EMG) pattern recognition is essential for the control of a multifunction myoelectric hand. The main goal of this study was to develop an efficient feature-projection method for EMG pattern recognition. To this end, a linear supervised feature projection is proposed that utilizes a linear discriminant analysis (LDA). First, a wavelet packet transform (WPT) is performed to extract a feature vector from four-channel EMG signals. To dimensionally reduce and cluster the WPT features, an LDA then incorporates class information into the learning procedure and identifies a linear matrix to maximize the class separability for the projected features. Finally, a multilayer perceptron classifies the LDA-reduced features into nine hand motions. To evaluate the performance of the LDA for WPT features, the LDA is compared with three other feature-projection methods. From a visualization and quantitative comparison, it is shown that the LDA produces a better performance for the class separability, plus the LDA-projected features improve the classification accuracy with a short processing time. A real-time pattern-recognition system is then implemented for a multifunction myoelectric hand. Experiments show that the proposed method achieves a 97.4% recognition accuracy, and all processes, including the generation of control commands for the myoelectric hand, are completed within 97 ms. Consequently, these results confirm that the proposed method is applicable to real-time EMG pattern recognition for multifunction myoelectric hand control.
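
A sketch of the feature pipeline, with PyWavelets for the WPT step and scikit-learn's LDA for the supervised projection; the wavelet, tree depth, and node-energy features are illustrative assumptions:

```python
import numpy as np
import pywt
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def wpt_features(emg_channels, wavelet='db2', maxlevel=3):
    """emg_channels: iterable of 1-D arrays (e.g., the four EMG channels)."""
    feats = []
    for ch in emg_channels:
        wp = pywt.WaveletPacket(ch, wavelet, maxlevel=maxlevel)
        # energy of each terminal node of the packet tree as a feature
        feats += [np.sum(node.data ** 2) for node in wp.get_level(maxlevel)]
    return np.array(feats)

# supervised projection of the stacked features before the final classifier
# (F_train, y_train are hypothetical arrays of features and motion labels):
# lda = LinearDiscriminantAnalysis(n_components=8).fit(F_train, y_train)
# F_reduced = lda.transform(F_train)
```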

Proceedings ArticleDOI
20 Jun 2007
TL;DR: The rich structure of the general LDA-Km framework is shown by examining its variants and their relationships to earlier approaches by using K-means clustering to generate class labels and LDA to do subspace selection.
Abstract: We combine linear discriminant analysis (LDA) and K-means clustering into a coherent framework to adaptively select the most discriminative subspace. We use K-means clustering to generate class labels and use LDA to do subspace selection. The clustering process is thus integrated with the subspace selection process and the data are then simultaneously clustered while the feature subspaces are selected. We show the rich structure of the general LDA-Km framework by examining its variants and their relationships to earlier approaches. Relations among PCA, LDA, K-means are clarified. Extensive experimental results on real-world datasets show the effectiveness of our approach.
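
The alternating loop at the heart of this framework, sketched with scikit-learn components; the stopping rule is simplified to label agreement:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def lda_km(X, n_clusters, n_iter=10, dim=2):
    """Alternate K-means labeling and LDA subspace selection.
    dim must be at most n_clusters - 1 for the LDA projection."""
    labels = KMeans(n_clusters, n_init=10).fit_predict(X)   # initial clustering
    Z = X
    for _ in range(n_iter):
        lda = LinearDiscriminantAnalysis(n_components=dim)
        Z = lda.fit_transform(X, labels)       # subspace from current labels
        new = KMeans(n_clusters, n_init=10).fit_predict(Z)  # cluster in subspace
        if np.array_equal(new, labels):        # labels stable: converged
            break
        labels = new
    return labels, Z
```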

Journal ArticleDOI
TL;DR: Model-based clustering as discussed by the authors can also be used for some other important problems in multivariate analysis, including density estimation and discriminant analysis, and the R package mclust can be applied in each instance.
Abstract: Due to recent advances in methods and software for model-based clustering, and to the interpretability of the results, clustering procedures based on probability models are increasingly preferred over heuristic methods. The clustering process estimates a model for the data that allows for overlapping clusters, producing a probabilistic clustering that quantifies the uncertainty of observations belonging to components of the mixture. The resulting clustering model can also be used for some other important problems in multivariate analysis, including density estimation and discriminant analysis. Examples of the use of model-based clustering and classification techniques in chemometric studies include multivariate image analysis, magnetic resonance imaging, microarray image segmentation, statistical process control, and food authenticity. We review model-based clustering and related methods for density estimation and discriminant analysis, and show how the R package mclust can be applied in each instance.
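
mclust is an R package; a rough Python analogue of the three uses named above (clustering, uncertainty quantification, density estimation) can be sketched with scikit-learn's Gaussian mixtures:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

X = np.random.default_rng(0).normal(size=(300, 4))   # placeholder data
gmm = GaussianMixture(n_components=3, covariance_type='full').fit(X)

hard_labels = gmm.predict(X)             # probabilistic clustering (hard labels)
soft_labels = gmm.predict_proba(X)       # uncertainty of each assignment
density = np.exp(gmm.score_samples(X))   # mixture-based density estimate
```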

Journal ArticleDOI
TL;DR: The exact distribution of the maximally selected Gini gain is derived by means of a combinatorial approach and the resulting p-value is suggested as an unbiased split selection criterion in recursive partitioning algorithms.

Journal ArticleDOI
TL;DR: In this article, the authors show that analyzing factorial data sets using a conventional discriminant function analysis is a case of pseudoreplication and tends to produce (sometimes grossly) incorrect results.

01 Jan 2007
TL;DR: In this paper, the authors summarize and analyze existing research on bankruptcy prediction studies in order to facilitate more productive future research in this area, highlighting the different methods, number and variety of factors, and specific uses of models.
Abstract: One of the most well-known bankruptcy prediction models was developed by Altman (1968) using multivariate discriminant analysis. Since Altman's model, a multitude of bankruptcy prediction models have flooded the literature. The primary goal of this paper is to summarize and analyze existing research on bankruptcy prediction studies in order to facilitate more productive future research in this area. This paper traces the literature on bankruptcy prediction from the 1930s, when studies focused on the use of simple ratio analysis to predict future bankruptcy, to present. The authors discuss how bankruptcy prediction studies have evolved, highlighting the different methods, number and variety of factors, and specific uses of models. Analysis of 165 bankruptcy prediction studies published from 1965 to present reveals trends in model development. For example, discriminant analysis was the primary method used to develop models in the 1960s and 1970s. Investigation of model type by decade shows that the primary method began to shift to logit analysis and neural networks in the 1980s and 1990s. The number of factors utilized in models is also analyzed by decade, showing that the average has varied over time but remains around 10 overall. Analysis of accuracy of the models suggests that multivariate discriminant analysis and neural networks are the most promising methods for bankruptcy prediction models. The findings also suggest that higher model accuracy is not guaranteed with a greater number of factors. Some models with two factors are just as capable of accurate prediction as models with 21 factors.

Book ChapterDOI
27 Aug 2007
TL;DR: In this paper, a discriminative face representation derived by the Linear Discriminant Analysis (LDA) of multi-scale local binary pattern histograms is proposed for face recognition.
Abstract: A novel discriminative face representation derived by the Linear Discriminant Analysis (LDA) of multi-scale local binary pattern histograms is proposed for face recognition. The face image is first partitioned into several non-overlapping regions. In each region, multi-scale local binary uniform pattern histograms are extracted and concatenated into a regional feature. The features are then projected on the LDA space to be used as a discriminative facial descriptor. The method is implemented and tested in face identification on the standard FERET database and in face verification on the XM2VTS database with very promising results.

Journal ArticleDOI
TL;DR: A new Bayesian quadratic discriminant analysis classifier is proposed where the prior is defined using a coarse estimate of the covariance based on the training data, termed BDA7; results on benchmark data sets and simulations show that BDA7 performance is competitive with, and in some cases significantly better than, regularized quadratic discriminant analysis and the cross-validated Bayesian quadratic discriminant analysis classifier Quadratic Bayes.
Abstract: Quadratic discriminant analysis is a common tool for classification, but estimation of the Gaussian parameters can be ill-posed. This paper contains theoretical and algorithmic contributions to Bayesian estimation for quadratic discriminant analysis. A distribution-based Bayesian classifier is derived using information geometry. Using a calculus of variations approach to define a functional Bregman divergence for distributions, it is shown that the Bayesian distribution-based classifier that minimizes the expected Bregman divergence of each class conditional distribution also minimizes the expected misclassification cost. A series approximation is used to relate regularized discriminant analysis to Bayesian discriminant analysis. A new Bayesian quadratic discriminant analysis classifier is proposed where the prior is defined using a coarse estimate of the covariance based on the training data; this classifier is termed BDA7. Results on benchmark data sets and simulations show that BDA7 performance is competitive with, and in some cases significantly better than, regularized quadratic discriminant analysis and the cross-validated Bayesian quadratic discriminant analysis classifier Quadratic Bayes.

Proceedings ArticleDOI
17 Jun 2007
TL;DR: The proposed algorithm significantly outperforms the three popular linear face recognition techniques and also performs comparably with the recently developed Orthogonal Laplacianfaces, with the advantage of computational speed.
Abstract: In this paper, we present novel ridge regression (RR) and kernel ridge regression (KRR) techniques for multivariate labels and apply the methods to the problem of face recognition. Motivated by the fact that the regular simplex vertices are separate points with the highest degree of symmetry, we choose such vertices as the targets for the distinct individuals in recognition and apply RR or KRR to map the training face images into a face subspace where the training images from each individual will locate near their individual targets. We identify the new face image by mapping it into this face subspace and comparing its distance to all individual targets. An efficient cross-validation algorithm is also provided for selecting the regularization and kernel parameters. Experiments were conducted on two face databases and the results demonstrate that the proposed algorithm significantly outperforms the three popular linear face recognition techniques (Eigenfaces, Fisherfaces and Laplacianfaces) and also performs comparably with the recently developed Orthogonal Laplacianfaces, with the advantage of computational speed. Experimental results also demonstrate that KRR outperforms RR as expected, since KRR can utilize the nonlinear structure of the face images. Although we concentrate on face recognition in this paper, the proposed method is general and may be applied to general multi-category classification problems.
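
The simplex-target construction can be sketched with scikit-learn's kernel ridge regression; centering the one-hot vectors is one standard way to obtain regular simplex vertices, and the kernel parameters here are placeholders, not the paper's cross-validated values:

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge

def fit_simplex_krr(X, y, gamma=1e-3, alpha=1.0):
    classes = np.unique(y)
    k = len(classes)
    T = np.eye(k) - 1.0 / k    # rows: regular simplex vertices (centered one-hot)
    Y = T[np.searchsorted(classes, y)]          # one target vertex per sample
    model = KernelRidge(alpha=alpha, kernel='rbf', gamma=gamma).fit(X, Y)
    return model, classes, T

def predict(model, classes, T, X_new):
    Z = model.predict(X_new)                     # map into the target space
    d = ((Z[:, None, :] - T[None, :, :]) ** 2).sum(-1)
    return classes[d.argmin(axis=1)]             # nearest simplex vertex wins
```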

Proceedings ArticleDOI
28 Oct 2007
TL;DR: This paper proposes a novel dimensionality reduction framework, called Unified Sparse Subspace Learning (USSL), for learning sparse projections; it casts the problem of learning the projective functions into a regression framework, which facilitates the use of different kinds of regularizers.
Abstract: Recently the problem of dimensionality reduction (or, subspace learning) has received a lot of interest in many fields of information processing, including data mining, information retrieval, and pattern recognition. Some popular methods include principal component analysis (PCA), linear discriminant analysis (LDA) and locality preserving projection (LPP). However, a disadvantage of all these approaches is that the learned projective functions are linear combinations of all the original features, thus it is often difficult to interpret the results. In this paper, we propose a novel dimensionality reduction framework, called Unified Sparse Subspace Learning (USSL), for learning sparse projections. USSL casts the problem of learning the projective functions into a regression framework, which facilitates the use of different kinds of regularizers. By using an L1-norm regularizer (lasso), the sparse projections can be efficiently computed. Experimental results on real world classification and clustering problems demonstrate the effectiveness of our method.
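
The L1 step that produces the sparsity is the crux: given response vectors (e.g., from a graph embedding, as in spectral-regression-style methods), each projective function is fit with the lasso so that most feature weights are exactly zero. A sketch with a placeholder alpha:

```python
import numpy as np
from sklearn.linear_model import Lasso

def sparse_projections(X, Y, alpha=0.01):
    """X: (n_samples, n_features); Y: (n_samples, n_responses) embedding
    responses to be regressed on X."""
    W = np.stack([Lasso(alpha=alpha, max_iter=10000).fit(X, Y[:, k]).coef_
                  for k in range(Y.shape[1])], axis=1)
    return W  # sparse columns: each projection uses only a few features
```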