
Showing papers on "Linear discriminant analysis published in 2006"


Proceedings Article
04 Dec 2006
TL;DR: This work shows that algorithms that fit the Statistical Query model can be written in a certain "summation form," which allows them to be easily parallelized on multicore computers and shows basically linear speedup with an increasing number of processors.
Abstract: We are at the beginning of the multicore era. Computers will have increasingly many cores (processors), but there is still no good programming framework for these architectures, and thus no simple and unified way for machine learning to take advantage of the potential speed up. In this paper, we develop a broadly applicable parallel programming method, one that is easily applied to many different learning algorithms. Our work is in distinct contrast to the tradition in machine learning of designing (often ingenious) ways to speed up a single algorithm at a time. Specifically, we show that algorithms that fit the Statistical Query model [15] can be written in a certain "summation form," which allows them to be easily parallelized on multicore computers. We adapt Google's map-reduce [7] paradigm to demonstrate this parallel speed up technique on a variety of learning algorithms including locally weighted linear regression (LWLR), k-means, logistic regression (LR), naive Bayes (NB), SVM, ICA, PCA, gaussian discriminant analysis (GDA), EM, and backpropagation (NN). Our experimental results show basically linear speedup with an increasing number of processors.
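As a concrete illustration of the "summation form," consider ordinary least squares (the unit-weight case of LWLR): the estimator depends on the data only through sums over examples, so each core computes partial sums over its shard and a single reduce step combines them. A minimal sketch, not the paper's MapReduce implementation; shard count and names are illustrative:

```python
import numpy as np
from multiprocessing import Pool

def partial_stats(shard):
    """One map task: this shard's contribution to the sufficient statistics.

    For least squares, theta = (X^T X)^{-1} X^T y depends on the data only
    through the sums X^T X and X^T y, which distribute over shards.
    """
    X, y = shard
    return X.T @ X, X.T @ y

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X, y = rng.normal(size=(10_000, 5)), rng.normal(size=10_000)
    shards = [(X[i::4], y[i::4]) for i in range(4)]  # one shard per core
    with Pool(4) as pool:
        parts = pool.map(partial_stats, shards)
    XtX = sum(p[0] for p in parts)          # reduce: add partial sums
    Xty = sum(p[1] for p in parts)
    theta = np.linalg.solve(XtX, Xty)       # solve on the combined sums
```

The same pattern covers the statistics each listed algorithm needs (class counts for naive Bayes, cluster sums for k-means, gradient sums for logistic regression and backpropagation).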

1,310 citations


Journal ArticleDOI
TL;DR: The characteristics of the OPLS method are investigated for the purpose of discriminant analysis (OPLS-DA), demonstrating how class-orthogonal variation can be exploited to augment classification.

Abstract: The characteristics of the OPLS method have been investigated for the purpose of discriminant analysis (OPLS-DA). We demonstrate how class-orthogonal variation can be exploited to augment classification ...
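The separation OPLS-DA exploits is usually written as a decomposition of the predictor matrix into class-predictive and class-orthogonal parts; a standard formulation (notation reconstructed here, not quoted from the paper) is

```latex
X = T_p P_p^{\top} + T_o P_o^{\top} + E,
```

where the predictive scores $T_p$ covary with class membership, the orthogonal scores $T_o$ capture structured variation uncorrelated with class, and $E$ is residual noise; the discriminant model is then fit on $T_p$ after the orthogonal variation has been filtered out.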

1,179 citations


Journal ArticleDOI
TL;DR: This paper proposes some new feature extractors based on maximum margin criterion (MMC) and establishes a new linear feature extractor that does not suffer from the small sample size problem, which is known to cause serious stability problems for LDA.
Abstract: In pattern recognition, feature extraction techniques are widely employed to reduce the dimensionality of data and to enhance the discriminatory information. Principal component analysis (PCA) and linear discriminant analysis (LDA) are the two most popular linear dimensionality reduction methods. However, PCA is not very effective for the extraction of the most discriminant features, and LDA is not stable due to the small sample size problem. In this paper, we propose some new (linear and nonlinear) feature extractors based on maximum margin criterion (MMC). Geometrically, feature extractors based on MMC maximize the (average) margin between classes after dimensionality reduction. It is shown that MMC can represent class separability better than PCA. As a connection to LDA, we may also derive LDA from MMC by incorporating some constraints. By using some other constraints, we establish a new linear feature extractor that does not suffer from the small sample size problem, which is known to cause serious stability problems for LDA. The kernelized (nonlinear) counterpart of this linear feature extractor is also established in the paper. Our extensive experiments demonstrate that the new feature extractors are effective, stable, and efficient.
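A minimal sketch of the linear MMC extractor, under the common formulation that maximizes tr(W^T(S_b - S_w)W) subject to W^T W = I (the function name and the regularization-free setup are ours, not the paper's):

```python
import numpy as np

def mmc_features(X, y, n_components):
    """Top eigenvectors of S_b - S_w (maximum margin criterion).

    No inversion of S_w is required, so the extractor stays stable in the
    small-sample-size regime where LDA's S_w becomes singular.
    """
    overall_mean = X.mean(axis=0)
    d = X.shape[1]
    S_b = np.zeros((d, d))
    S_w = np.zeros((d, d))
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        S_b += len(Xc) * np.outer(mc - overall_mean, mc - overall_mean)
        S_w += (Xc - mc).T @ (Xc - mc)
    vals, vecs = np.linalg.eigh(S_b - S_w)   # eigh: ascending eigenvalues
    return vecs[:, ::-1][:, :n_components]   # columns of W, largest first
```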

838 citations


Journal ArticleDOI
TL;DR: The results indicate that while all methods attained acceptable performance levels, SWLDA and FLD provide the best overall performance and implementation characteristics for practical classification of P300 Speller data.
Abstract: This study assesses the relative performance characteristics of five established classification techniques on data collected using the P300 Speller paradigm, originally described by Farwell and Donchin (1988 Electroenceph. Clin. Neurophysiol. 70 510). Four linear methods: Pearson's correlation method (PCM), Fisher's linear discriminant (FLD), stepwise linear discriminant analysis (SWLDA) and a linear support vector machine (LSVM); and one nonlinear method: Gaussian kernel support vector machine (GSVM), are compared for classifying offline data from eight users. The relative performance of the classifiers is evaluated, along with the practical concerns regarding the implementation of the respective methods. The results indicate that while all methods attained acceptable performance levels, SWLDA and FLD provide the best overall performance and implementation characteristics for practical classification of P300 Speller data.

759 citations


Journal ArticleDOI
TL;DR: This paper extensively elaborates on the application of (1) univariate analysis, (2) risk index models, (3) multivariate discriminant analysis, and (4) conditional probability models, such as logit, probit and linear probability models.
Abstract: Over the last 35 years, business failure prediction has become a major research domain within corporate finance. Numerous corporate failure prediction models have been developed, based on various modelling techniques. The most popular are the classic cross-sectional statistical methods, which have resulted in various ‘single-period’ or static models, especially multivariate discriminant models and logit models. To date, there has been no clear overview and discussion of the application of classic statistical methods to business failure prediction. Therefore, this paper extensively elaborates on the application of (1) univariate analysis, (2) risk index models, (3) multivariate discriminant analysis, and (4) conditional probability models in corporate failure prediction. In addition, because there is no clear and comprehensive analysis in the existing literature of the diverse problems related to the application of these methods to the topic of corporate failure prediction, this paper brings together all problem issues and enlarges upon each of them. It discusses all problems related to: (1) the classical paradigm (i.e. the arbitrary definition of failure, non-stationarity and data instability, sampling selectivity, and the choice of the optimisation criteria); (2) the neglect of the time dimension of failure; and (3) the application focus in failure prediction modelling. Further, the paper elaborates on a number of other problems related to the use of a linear classification rule, the use of annual account information, and neglect of the multidimensional nature of failure. This paper contributes towards a thorough understanding of the features of the classic statistical business failure prediction models and their related problems.

691 citations


Book ChapterDOI
07 May 2006
TL;DR: This paper proposes Probabilistic LDA, a generative probability model with which features can be both extracted and combined for recognition, and shows applications to classification, hypothesis testing, class inference, and clustering.
Abstract: Linear dimensionality reduction methods, such as LDA, are often used in object recognition for feature extraction, but do not address the problem of how to use these features for recognition. In this paper, we propose Probabilistic LDA, a generative probability model with which we can both extract the features and combine them for recognition. The latent variables of PLDA represent both the class of the object and the view of the object within a class. By making examples of the same class share the class variable, we show how to train PLDA and use it for recognition on previously unseen classes. The usual LDA features are derived as a result of training PLDA, but in addition have a probability model attached to them, which automatically gives more weight to the more discriminative features. With PLDA, we can build a model of a previously unseen class from a single example, and can combine multiple examples for a better representation of the class. We show applications to classification, hypothesis testing, class inference, and clustering, on classes not observed during training.
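One common way to write the PLDA latent-variable structure (our notation; treat it as a sketch rather than the paper's exact model) is

```latex
x_{ij} = \mu + F h_i + G w_{ij} + \varepsilon_{ij}, \qquad
h_i \sim \mathcal{N}(0, I), \quad w_{ij} \sim \mathcal{N}(0, I), \quad
\varepsilon_{ij} \sim \mathcal{N}(0, \Sigma),
```

where the class variable $h_i$ is shared by all examples $j$ of class $i$ and the view variable $w_{ij}$ varies per example; recognition on unseen classes then compares the likelihood that gallery and probe examples share the same $h$.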

470 citations


MonographDOI
18 Apr 2006
TL;DR: The second edition of this monograph on discriminant analysis in research covers both descriptive discriminant analysis (DDA) and predictive discriminant analysis (PDA), proceeding from one-factor and factorial MANOVA, MANCOVA, and repeated measures through classification rules, hit rate estimation, and the reporting of results, to issues and problems in both PDA and DDA.
Abstract: List of Figures List of Tables Preface to Second Edition Acknowledgments Preface to First Edition Notation I INTRODUCTION 1 Discriminant Analysis in Research 1.1 A Little History 1.2 Overview 1.3 Descriptive Discriminant Analysis 1.4 Predictive Discriminant Analysis 1.5 Design in Discriminant Analysis 2 Preliminaries 2.1 Introduction 2.2 Research Context 2.3 Data, Analysis Units, Variables, and Constructs 2.4 Summarizing Data 2.5 Matrix Operations 2.6 Distance 2.7 Linear Composite 2.8 Probability 2.9 Statistical Testing 2.10 Judgment in Data Analysis 2.11 Summary II ONE-FACTOR MANOVA/DDA 3 Group Separation 3.1 Introduction 3.2 Two-Group Analyses 3.3 Test for Covariance Matrix Equality 3.4 Yao Test 3.5 Multiple-Group Analyses-Single Factor 3.6 Computer Application 3.7 Summary 4 Assessing MANOVA Effects 4.1 Introduction 4.2 Strength of Association 4.3 Computer Application I 4.4 Group Contrasts 4.5 Computer Application II 4.6 Covariance Matrix Heterogeneity 4.7 Sample Size 4.8 Summary 5 Describing MANOVA Effects 5.1 Introduction 5.2 Omnibus Effects 5.3 Computer Application I 5.4 Standardized LDF Weights 5.5 LDF Space Dimension 5.6 Computer Application II 5.7 Computer Application III 5.8 Contrast Effects 5.9 Computer Application IV 5.10 Summary 6 Deleting and Ordering Variables 6.1 Introduction 6.2 Variable Deletion 6.3 Variable Ordering 6.4 Contrast Analyses 6.5 Computer Application II 6.6 Comments 7 Reporting DDA Results 7.1 Introduction 7.2 Example of Reporting DDA Results 7.3 Computer Package Information 7.4 Reporting Terms 7.5 MANOVA/DDA Applications 7.6 Concerns 7.7 Overview III FACTORIAL MANOVA, MANCOVA, AND REPEATED MEASURES 8 Factorial MANOVA 8.1 Introduction 8.2 Research Context 8.3 Univariate Analysis 8.4 Multivariate Analysis 8.5 Computer Application I 8.6 Computer Application II 8.7 Nonorthogonal Design 8.8 Outcome Variable Ordering and Deletion 8.9 Summary 9 Analysis of Covariance 9.1 Introduction 9.2 Research Context 9.3 Univariate ANCOVA 9.4 Multivariate ANCOVA (MANCOVA) 9.5 Computer Application I 9.6 Comparing Adjusted Means-Omnibus Test 9.7 Computer Application II 9.8 Contrast Analysis 9.9 Computer Application III 9.10 Summary 10 Repeated-Measures Analysis 10.1 Introduction 10.2 Research Context 10.3 Univariate Analyses 10.4 Multivariate Analysis 10.5 Computer Application I 10.6 Univariate and Multivariate Analyses 10.7 Testing for Sphericity 10.8 Computer Application II 10.9 Contrast Analysis 10.10 Computer Application III 10.11 Summary 11 Mixed-Model Analysis 11.1 Introduction 11.2 Research Context 11.3 Univariate Analysis 11.4 Multivariate Analysis 11.5 Computer Application I 11.6 Contrast Analysis 11.7 Computer Application II 11.8 Summary IV GROUP MEMBERSHIP PREDICTION 12 Classification Basics 12.1 Introduction 12.2 Notion of Distance 12.3 Distance and Classification 12.4 Classification Rules in General 12.5 Comments 13 Multivariate Normal Rules 13.1 Introduction 13.2 Normal Density Functions 13.3 Classification Rules Based on Normality 13.4 Classification Functions 13.5 Summary of Classification Statistics 13.6 Choice of Rule Form 13.7 Comments 14 Classification Results 14.1 Introduction 14.2 Research Context 14.3 Computer Application 14.4 Individual Unit Results 14.5 Group Results 14.6 Comments 15 Hit Rate Estimation 15.1 Introduction 15.2 True Hit Rates 15.3 Hit Rate Estimators 15.4 Computer Application 15.5 Choice of Hit Rate Estimator 15.6 Outliers and In-Doubt Units 15.7 Sample Size 15.8 Comments 16 Effectiveness of Classification Rules 16.1 Introduction 16.2 Proportional Chance Criterion 16.3 Maximum-Chance Criterion 16.4 Improvement over Chance 16.5 Comparison of Rules 16.6 Computer Application I 16.7 Effect of Unequal Priors 16.8 PDA Validity/Reliability 16.9 Applying a Classification Rule to New Units 16.10 Comments 17 Deleting and Ordering Predictors 17.1 Introduction 17.2 Predictor Deletion 17.3 Computer Application 17.4 Predictor Ordering 17.5 Reanalysis 17.6 Comments 17.7 Side Note 18 Two-Group Classification 18.1 Introduction 18.2 Two-Group Rule 18.3 Regression Analogy 18.4 MRA-PDA Relationship 18.5 Necessary Sample Size 18.6 Univariate Classification 19 Nonnormal Rules 19.1 Introduction 19.2 Continuous Variables 19.3 Categorical Variables 19.4 Predictor Mixtures 19.5 Comments 20 Reporting PDA Results 20.1 Introduction 20.2 Example of Reporting PDA Results 20.3 Some Additional Specific PDA Information 20.4 Computer Package Information 20.5 Reporting Terms 20.6 Sources of PDA Applications 20.7 Concerns 20.8 Overview Further Reading Exercises 21 PDA-Related Analyses 21.1 Introduction 21.2 Nonlinear Methods 21.3 Other Methods V ISSUES AND PROBLEMS 22 Issues in PDA and DDA 22.1 Introduction 22.2 Five Choices in PDA 22.3 Stepwise Analyses 22.4 Standardized Weights Versus Structure r's 22.5 Data-Based Structure 23 Problems in PDA and DDA 23.1 Introduction 23.2 Missing Data 23.3 Outliers and Influential Observations 23.4 Initial Group Misclassification 23.5 Misclassification Costs 23.6 Statistical Versus Clinical Prediction 23.7 Other Problems Appendix A: Data Set Descriptions Appendix B: Some DA-Related Originators Appendix C: List of Computer Syntax Appendix D: Contents of Wiley Website References Answers to Exercises Index

460 citations


Journal ArticleDOI
TL;DR: An end-to-end system that provides facial expression codes at 24 frames per second and animates a computer-generated character; applied to fully automated facial action coding, it achieves the best performance reported so far on these datasets.

402 citations


Proceedings ArticleDOI
25 Jun 2006
TL;DR: A new dimensionality reduction method called local Fisher discriminant analysis (LFDA) is proposed; a localized variant of Fisher discriminant analysis, it takes the local structure of the data into account so that multimodal data can be embedded appropriately.
Abstract: Dimensionality reduction is one of the important preprocessing steps in high-dimensional data analysis. In this paper, we consider the supervised dimensionality reduction problem where samples are accompanied with class labels. Traditional Fisher discriminant analysis is a popular and powerful method for this purpose. However, it tends to give undesired results if samples in some class form several separate clusters, i.e., multimodal. In this paper, we propose a new dimensionality reduction method called local Fisher discriminant analysis (LFDA), which is a localized variant of Fisher discriminant analysis. LFDA takes local structure of the data into account so the multimodal data can be embedded appropriately. We also show that LFDA can be extended to non-linear dimensionality reduction scenarios by the kernel trick.
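The localization enters through pairwise weights: the within- and between-class scatter matrices are rewritten in pairwise form and scaled by a local affinity $A_{ij}$, so distant same-class pairs contribute little. A reconstruction of the standard LFDA formulation (weight definitions abbreviated):

```latex
\tilde{S}^{(w)} = \frac{1}{2}\sum_{i,j} W^{(w)}_{ij}\,(x_i - x_j)(x_i - x_j)^{\top},
\qquad
\tilde{S}^{(b)} = \frac{1}{2}\sum_{i,j} W^{(b)}_{ij}\,(x_i - x_j)(x_i - x_j)^{\top},
```

with $W^{(w)}_{ij} \propto A_{ij}$ for same-class pairs (zero otherwise), and the embedding $T$ chosen to maximize $\operatorname{tr}\big((T^{\top}\tilde{S}^{(w)}T)^{-1}\,T^{\top}\tilde{S}^{(b)}T\big)$, exactly as in Fisher discriminant analysis but with the locality-aware scatters.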

370 citations


Journal ArticleDOI
TL;DR: Two criteria able to find the most convenient division of each class into a set of subclasses are derived, and it is shown that the resulting method is always the best or comparable to the best.

Abstract: Over the years, many discriminant analysis (DA) algorithms have been proposed for the study of high-dimensional data in a large variety of problems. Each of these algorithms is tuned to a specific type of data distribution (that which best models the problem at hand). Unfortunately, in most problems the form of each class pdf is a priori unknown, and the selection of the DA algorithm that best fits our data is done by trial and error. Ideally, one would like to have a single formulation which can be used for most distribution types. This can be achieved by approximating the underlying distribution of each class with a mixture of Gaussians. In this approach, the major problem to be addressed is that of determining the optimal number of Gaussians per class, i.e., the number of subclasses. In this paper, two criteria able to find the most convenient division of each class into a set of subclasses are derived. Extensive experimental results are shown using five databases. Comparisons are given against linear discriminant analysis (LDA), direct LDA (DLDA), heteroscedastic LDA (HLDA), nonparametric DA (NDA), and kernel-based LDA (K-LDA). We show that our method is always the best or comparable to the best.
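Once each class is modeled as a mixture, the discriminant criterion is computed over subclass means rather than class means; the between-subclass scatter commonly used in subclass discriminant analysis (reconstructed here, so treat the indexing as a sketch) is

```latex
\Sigma_b = \sum_{i=1}^{C-1} \sum_{k=1}^{H_i} \sum_{j=i+1}^{C} \sum_{l=1}^{H_j}
p_{ik}\, p_{jl}\, (\mu_{ik} - \mu_{jl})(\mu_{ik} - \mu_{jl})^{\top},
```

where $H_i$ is the number of subclasses of class $i$, and $\mu_{ik}$, $p_{ik}$ are the mean and prior of subclass $k$ of class $i$; the two proposed criteria select the $H_i$ at which this subclass division works best.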

366 citations


Journal ArticleDOI
TL;DR: As the results reveal, CART and MARS outperform traditional discriminant analysis, logistic regression, neural networks, and support vector machine (SVM) approaches in terms of credit scoring accuracy and hence provide efficient alternatives in implementing credit scoring tasks.

Journal ArticleDOI
TL;DR: Experimental results are given for matching a database of 200 3D face models with 598 2.5D independent test scans acquired under different pose and some lighting and expression changes, showing the feasibility of the proposed matching scheme.
Abstract: The performance of face recognition systems that use two-dimensional images depends on factors such as lighting and subject's pose. We are developing a face recognition system that utilizes three-dimensional shape information to make the system more robust to arbitrary pose and lighting. For each subject, a 3D face model is constructed by integrating several 2.5D face scans which are captured from different views. 2.5D is a simplified 3D (x,y,z) surface representation that contains at most one depth value (z direction) for every point in the (x, y) plane. Two different modalities provided by the facial scan, namely, shape and texture, are utilized and integrated for face matching. The recognition engine consists of two components, surface matching and appearance-based matching. The surface matching component is based on a modified iterative closest point (ICP) algorithm. The candidate list from the gallery used for appearance matching is dynamically generated based on the output of the surface matching component, which reduces the complexity of the appearance-based matching stage. Three-dimensional models in the gallery are used to synthesize new appearance samples with pose and illumination variations and the synthesized face images are used in discriminant subspace analysis. The weighted sum rule is applied to combine the scores given by the two matching components. Experimental results are given for matching a database of 200 3D face models with 598 2.5D independent test scans acquired under different pose and some lighting and expression changes. These results show the feasibility of the proposed matching scheme.
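The fusion step the abstract mentions is a weighted sum of the two component scores; after normalizing the surface-matching and appearance-based scores to a common range, the combined score takes the form

```latex
s = \alpha\, s_{\text{surface}} + (1 - \alpha)\, s_{\text{appearance}},
```

where the weight $\alpha$ and the normalization are implementation details not specified in this summary.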

Posted Content
TL;DR: In this paper, the authors discuss how bankruptcy prediction studies have evolved, highlighting the different methods, number and variety of factors, and specific uses of models, and suggest that multivariate discriminant analysis and neural networks are the most promising methods for bankruptcy prediction models.
Abstract: One of the most well-known bankruptcy prediction models was developed by Altman [1968] using multivariate discriminant analysis. Since Altman's model, a multitude of bankruptcy prediction models have flooded the literature. The primary goal of this paper is to summarize and analyze existing research on bankruptcy prediction studies in order to facilitate more productive future research in this area. This paper traces the literature on bankruptcy prediction from the 1930's, when studies focused on the use of simple ratio analysis to predict future bankruptcy, to the present. The authors discuss how bankruptcy prediction studies have evolved, highlighting the different methods, number and variety of factors, and specific uses of models. Analysis of 165 bankruptcy prediction studies published from 1965 to the present reveals trends in model development. For example, discriminant analysis was the primary method used to develop models in the 1960's and 1970's. Investigation of model type by decade shows that the primary method began to shift to logit analysis and neural networks in the 1980's and 1990's. The number of factors utilized in models is also analyzed by decade, showing that the average has varied over time but remains around 10 overall. Analysis of accuracy of the models suggests that multivariate discriminant analysis and neural networks are the most promising methods for bankruptcy prediction models. The findings also suggest that higher model accuracy is not guaranteed with a greater number of factors. Some models with two factors are just as capable of accurate prediction as models with 21 factors.

Journal ArticleDOI
TL;DR: Two supervised methods for enhancing the classification accuracy of the Nonnegative Matrix Factorization (NMF) algorithm are presented and greatly enhance the performance of NMF for frontal face verification.
Abstract: In this paper, two supervised methods for enhancing the classification accuracy of the Nonnegative Matrix Factorization (NMF) algorithm are presented. The idea is to extend the NMF algorithm in order to extract features that enforce not only the spatial locality, but also the separability between classes in a discriminant manner. The first method employs discriminant analysis in the features derived from NMF. In this way, a two-phase discriminant feature extraction procedure is implemented, namely NMF plus Linear Discriminant Analysis (LDA). The second method incorporates the discriminant constraints inside the NMF decomposition. Thus, a decomposition of a face to its discriminant parts is obtained and new update rules for both the weights and the basis images are derived. The introduced methods have been applied to the problem of frontal face verification using the well-known XM2VTS database. Both methods greatly enhance the performance of NMF for frontal face verification.
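The first, two-phase method can be sketched with off-the-shelf components: factorize into nonnegative parts, then apply LDA to the encodings. A generic sklearn stand-in for illustration (toy data; the paper's second method instead modifies the NMF update rules themselves, which no library call reproduces directly):

```python
import numpy as np
from sklearn.decomposition import NMF
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.random((200, 256))          # nonnegative face images, one per row
y = rng.integers(0, 10, size=200)   # identity labels (toy stand-in data)

# Phase 1: parts-based nonnegative encoding; Phase 2: discriminant projection.
nmf_lda = make_pipeline(
    NMF(n_components=40, init="nndsvda", max_iter=500),
    LinearDiscriminantAnalysis(),
)
nmf_lda.fit(X, y)
predicted_ids = nmf_lda.predict(X)
```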

Journal Article
TL;DR: Probabilistic LDA (PLDA) as discussed by the authors is a generative probability model with which the features of LDA can be extracted and combined for recognition; its latent variables represent both the class of an object and the view of the object within a class, and by making examples of the same class share the class variable, PLDA can be trained and used for recognition on previously unseen classes.
Abstract: Linear dimensionality reduction methods, such as LDA, are often used in object recognition for feature extraction, but do not address the problem of how to use these features for recognition. In this paper, we propose Probabilistic LDA, a generative probability model with which we can both extract the features and combine them for recognition. The latent variables of PLDA represent both the class of the object and the view of the object within a class. By making examples of the same class share the class variable, we show how to train PLDA and use it for recognition on previously unseen classes. The usual LDA features are derived as a result of training PLDA, but in addition have a probability model attached to them, which automatically gives more weight to the more discriminative features. With PLDA, we can build a model of a previously unseen class from a single example, and can combine multiple examples for a better representation of the class. We show applications to classification, hypothesis testing, class inference, and clustering, on classes not observed during training.

Journal ArticleDOI
TL;DR: A novel pattern recognition framework that integrates Gabor image representation, a novel multiclass kernel Fisher analysis (KFA) method, and fractional power polynomial models for improving pattern recognition performance is presented.
Abstract: This paper presents a novel pattern recognition framework by capitalizing on dimensionality increasing techniques. In particular, the framework integrates Gabor image representation, a novel multiclass kernel Fisher analysis (KFA) method, and fractional power polynomial models for improving pattern recognition performance. Gabor image representation, which increases dimensionality by incorporating Gabor filters with different scales and orientations, is characterized by spatial frequency, spatial locality, and orientational selectivity for coping with image variabilities such as illumination variations. The KFA method first performs nonlinear mapping from the input space to a high-dimensional feature space, and then implements the multiclass Fisher discriminant analysis in the feature space. The significance of the nonlinear mapping is that it increases the discriminating power of the KFA method, which is linear in the feature space but nonlinear in the input space. The novelty of the KFA method comes from the fact that 1) it extends the two-class kernel Fisher methods by addressing multiclass pattern classification problems and 2) it improves upon the traditional generalized discriminant analysis (GDA) method by deriving a unique solution (compared to the GDA solution, which is not unique). The fractional power polynomial models further improve performance of the proposed pattern recognition framework. Experiments on face recognition using both the FERET database and the FRGC (face recognition grand challenge) databases show the feasibility of the proposed framework. In particular, experimental results using the FERET database show that the KFA method performs better than the GDA method and the fractional power polynomial models help both the KFA method and the GDA method improve their face recognition performance. Experimental results using the FRGC databases show that the proposed pattern recognition framework improves face recognition performance upon the BEE baseline algorithm and the LDA-based baseline algorithm by large margins.
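The fractional power polynomial model the framework relies on is, as commonly stated (our reconstruction), a polynomial kernel form with fractional degree,

```latex
k(x, y) = (x \cdot y)^{d}, \qquad 0 < d < 1,
```

which is not guaranteed to be positive semidefinite, hence "model" rather than "kernel"; it is used in place of the usual integer-degree polynomial kernel inside KFA and GDA.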

Journal ArticleDOI
TL;DR: A heuristic method for learning error correcting output codes matrices based on a hierarchical partition of the class space that maximizes a discriminative criterion is presented, validated using the UCI database and applied to a real problem, the classification of traffic sign images.
Abstract: We present a heuristic method for learning error correcting output codes matrices based on a hierarchical partition of the class space that maximizes a discriminative criterion. To achieve this goal, the optimal codeword separation is sacrificed in favor of a maximum class discrimination in the partitions. The creation of the hierarchical partition set is performed using a binary tree. As a result, a compact matrix with high discrimination power is obtained. Our method is validated using the UCI database and applied to a real problem, the classification of traffic sign images.
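A small sketch of the decoding side of such a code matrix, assuming the ternary convention where 0 marks classes a dichotomizer leaves out of its partition (the toy matrix and names are ours, not the paper's):

```python
import numpy as np

def ecoc_predict(code_matrix, binary_scores):
    """Decode ECOC outputs by nearest codeword.

    code_matrix: (n_classes, n_dichotomies) with entries in {-1, 0, +1};
    rows are class codewords, 0 marks classes ignored by a dichotomizer
    (as in tree-induced partitions). binary_scores: (n_samples,
    n_dichotomies) signed outputs of the binary classifiers.
    """
    preds = np.sign(binary_scores)
    # Hamming-style distance that skips zero (ignored) positions.
    dists = np.array([[np.sum((row != 0) & (row != p)) for row in code_matrix]
                      for p in preds])
    return dists.argmin(axis=1)

# Toy 4-class code from a binary tree of class partitions (illustrative).
M = np.array([[+1, +1,  0],
              [+1, -1,  0],
              [-1,  0, +1],
              [-1,  0, -1]])
print(ecoc_predict(M, np.array([[0.9, -0.8, 0.1]])))  # -> class index 1
```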

Journal ArticleDOI
TL;DR: The experiments suggest that discriminant analysis provides a fast, efficient yet accurate alternative for general multi-class classification problems.
Abstract: Many supervised machine learning tasks can be cast as multi-class classification problems. Support vector machines (SVMs) excel at binary classification problems, but the elegant theory behind the large-margin hyperplane cannot be easily extended to their multi-class counterparts. On the other hand, it was shown that the decision hyperplanes for binary classification obtained by SVMs are equivalent to the solutions obtained by Fisher's linear discriminant on the set of support vectors. Discriminant analysis approaches are well known to learn discriminative feature transformations in the statistical pattern recognition literature and can be easily extended to multi-class cases. The use of discriminant analysis, however, has not been fully explored in the data mining literature. In this paper, we explore the use of discriminant analysis for multi-class classification problems. We evaluate the performance of discriminant analysis on a large collection of benchmark datasets and investigate its usage in text categorization. Our experiments suggest that discriminant analysis provides a fast, efficient yet accurate alternative for general multi-class classification problems.
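Since LDA handles many classes natively (no one-vs-one or one-vs-rest reduction is needed), trying it as a multi-class baseline is nearly a one-liner; a minimal illustration on a bundled ten-class dataset (not one of the paper's benchmarks):

```python
from sklearn.datasets import load_digits
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

# A ten-class problem handled directly by discriminant analysis.
X, y = load_digits(return_X_y=True)
print(cross_val_score(LinearDiscriminantAnalysis(), X, y, cv=5).mean())
```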

Journal ArticleDOI
TL;DR: Simulation studies suggest that AUC-based classification scores have performance comparable with logistic likelihood-based scores when the logistic regression model holds, and model fitting by maximizing the AUC should be considered when the goal is to derive a marker combination score for classification or prediction.
Abstract: No single biomarker for cancer is considered adequately sensitive and specific for cancer screening. It is expected that the results of multiple markers will need to be combined in order to yield adequately accurate classification. Typically, the objective function that is optimized for combining markers is the likelihood function. In this article, we consider an alternative objective function: the area under the empirical receiver operating characteristic curve (AUC). We note that it yields consistent estimates of parameters in a generalized linear model for the risk score but does not require specifying the link function. Like logistic regression, it yields consistent estimation with case-control or cohort data. Simulation studies suggest that AUC-based classification scores have performance comparable with logistic likelihood-based scores when the logistic regression model holds. Analysis of data from a proteomics biomarker study shows that performance can be far superior to logistic regression derived scores when the logistic regression model does not hold. Model fitting by maximizing the AUC rather than the likelihood should be considered when the goal is to derive a marker combination score for classification or prediction.
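A sketch of the estimation idea: the empirical AUC of a linear risk score is the fraction of case-control pairs ranked correctly, and replacing the pairwise indicator with a sigmoid makes it smooth enough for a generic optimizer. The bandwidth h, the toy data, and the scale-fixing convention below are our assumptions, not the article's procedure:

```python
import numpy as np
from scipy.optimize import minimize

def neg_smooth_auc(beta, X_case, X_ctrl, h=0.1):
    """Sigmoid-smoothed empirical AUC (negated for minimization)."""
    # All pairwise score differences beta.x_case - beta.x_ctrl.
    d = (X_case @ beta)[:, None] - (X_ctrl @ beta)[None, :]
    return -np.mean(1.0 / (1.0 + np.exp(-d / h)))

rng = np.random.default_rng(1)
X_case = rng.normal(1.0, 1.0, size=(100, 3))  # marker panels, diseased
X_ctrl = rng.normal(0.0, 1.0, size=(150, 3))  # marker panels, healthy
res = minimize(neg_smooth_auc, x0=np.ones(3), args=(X_case, X_ctrl))
beta = res.x / np.abs(res.x[0])  # AUC determines beta only up to scale
```

No link function appears anywhere above, which is the point the article makes about AUC-based estimation.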

Journal ArticleDOI
TL;DR: A novel weakness analysis theory is developed that attempts to boost a strong learner by increasing the diversity between the classifiers created by the learner, at the expense of decreasing their margins, so as to achieve a tradeoff suggested by recent boosting studies for a low generalization error.
Abstract: In this paper, we propose a novel ensemble-based approach to boost performance of traditional Linear Discriminant Analysis (LDA)-based methods used in face recognition. The ensemble-based approach is based on the recently emerged technique known as "boosting". However, it is generally believed that boosting-like learning rules are not suited to a strong and stable learner such as LDA. To break the limitation, a novel weakness analysis theory is developed here. The theory attempts to boost a strong learner by increasing the diversity between the classifiers created by the learner, at the expense of decreasing their margins, so as to achieve a tradeoff suggested by recent boosting studies for a low generalization error. In addition, a novel distribution accounting for the pairwise class discriminant information is introduced for effective interaction between the booster and the LDA-based learner. The integration of all these methodologies proposed here leads to the novel ensemble-based discriminant learning approach, capable of taking advantage of both the boosting and LDA techniques. Promising experimental results obtained on various difficult face recognition scenarios demonstrate the effectiveness of the proposed approach. We believe that this work is especially beneficial in extending the boosting framework to accommodate general (strong/weak) learners.

Journal ArticleDOI
TL;DR: This work presents a theoretical framework for achieving the best of both types of methods: an approach that combines the discrimination power of discriminative methods with the reconstruction property of reconstructive methods which enables one to work on subsets of pixels in images to efficiently detect and reject the outliers.
Abstract: Linear subspace methods that provide sufficient reconstruction of the data, such as PCA, offer an efficient way of dealing with missing pixels, outliers, and occlusions that often appear in the visual data. Discriminative methods, such as LDA, which, on the other hand, are better suited for classification tasks, are highly sensitive to corrupted data. We present a theoretical framework for achieving the best of both types of methods: an approach that combines the discrimination power of discriminative methods with the reconstruction property of reconstructive methods which enables one to work on subsets of pixels in images to efficiently detect and reject the outliers. The proposed approach is therefore capable of robust classification with a high breakdown point. We also show that subspace methods, such as CCA, which are used for solving regression tasks, can be treated in a similar manner. The theoretical results are demonstrated on several computer vision tasks showing that the proposed approach significantly outperforms the standard discriminative methods in the case of missing pixels and images containing occlusions and outliers.

Journal ArticleDOI
TL;DR: Nonlinear kernel based classifiers are applied within the Bayesian evidence framework in order to automatically infer and analyze the creditworthiness of potential corporate clients and yield better performances than linear discriminant analysis and logistic regression when applied to a real-life data set concerning commercial credit granting to mid-cap Belgian and Dutch firms.

Journal ArticleDOI
TL;DR: This work proposes a method of classifying collections of temporal gene expression curves in which individual expression profiles are modeled as independent realizations of a stochastic process, and demonstrates that this methodology provides low-error-rate classification for both yeast cell-cycle gene expression profiles and Dictyostelium cell-type specific gene expression patterns.
Abstract: Motivation: Temporal gene expression profiles provide an important characterization of gene function, as biological systems are predominantly developmental and dynamic. We propose a method of classifying collections of temporal gene expression curves in which individual expression profiles are modeled as independent realizations of a stochastic process. The method uses a recently developed functional logistic regression tool based on functional principal components, aimed at classifying gene expression curves into known gene groups. The number of eigenfunctions in the classifier can be chosen by leave-one-out cross-validation with the aim of minimizing the classification error. Results: We demonstrate that this methodology provides low-error-rate classification for both yeast cell-cycle gene expression profiles and Dictyostelium cell-type specific gene expression patterns. It also works well in simulations. We compare our functional principal components approach with a B-spline implementation of functional discriminant analysis for the yeast cell-cycle data and simulations. This indicates comparative advantages of our approach which uses fewer eigenfunctions/base functions. The proposed methodology is promising for the analysis of temporal gene expression data and beyond. Availability: MATLAB programs are available upon request. Contact: [email protected] Supplementary information: Supplementary materials are available on the journal's website.
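On a common time grid, functional PCA reduces to ordinary PCA of the sampled curves, and the classifier is then logistic regression on the leading scores. A discretized sketch of that pipeline (grid size, score count, and data are illustrative; the paper chooses the number of eigenfunctions by leave-one-out cross-validation):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def fpca_scores(curves, n_pc):
    """Project expression curves onto the leading eigenfunctions.

    curves: (n_genes, n_timepoints) profiles on a common grid, so the
    covariance operator reduces to the sample covariance matrix and the
    eigenfunctions to its eigenvectors (a discretized FPCA sketch).
    """
    centered = curves - curves.mean(axis=0)
    cov = centered.T @ centered / len(curves)
    vals, vecs = np.linalg.eigh(cov)
    phi = vecs[:, ::-1][:, :n_pc]        # leading eigenfunctions
    return centered @ phi                # FPC scores per curve

rng = np.random.default_rng(2)
curves = rng.normal(size=(300, 18))      # e.g., 18 time points per gene
labels = rng.integers(0, 2, size=300)    # known gene group
# n_pc would be picked by leave-one-out CV to minimize classification error.
clf = LogisticRegression().fit(fpca_scores(curves, n_pc=4), labels)
```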

Proceedings ArticleDOI
01 Jan 2006
TL;DR: A method for object detection that combines AdaBoost learning with local histogram features that outperforms all methods reported in [5] for 7 out of 8 detection tasks and four object classes.
Abstract: We present a method for object detection that combines AdaBoost learning with local histogram features. On the side of learning we improve the performance by designing a weak learner for multi-valued features based on Weighted Fisher Linear Discriminant. Evaluation on the recent benchmark for object detection confirms the superior performance of our method compared to the state-of-the-art. In particular, using a single set of parameters our approach outperforms all methods reported in [5] for 7 out of 8 detection tasks and four object classes.
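The weak learner can be sketched as a Fisher direction computed under the current AdaBoost sample distribution (our reconstruction; the ridge term and the final thresholding step are assumptions):

```python
import numpy as np

def weighted_fld(X, y, w, reg=1e-6):
    """Fisher direction from AdaBoost-weighted class statistics.

    X: (n, d) multi-valued (e.g., histogram) features; y in {-1, +1};
    w: AdaBoost sample weights. Returns the projection direction; a
    threshold on the projected values completes the weak learner.
    """
    d = X.shape[1]
    means, Sw = {}, np.zeros((d, d))
    for c in (-1, 1):
        Xc, wc = X[y == c], w[y == c]
        mu = np.average(Xc, axis=0, weights=wc)
        centered = (Xc - mu) * np.sqrt(wc)[:, None]
        Sw += centered.T @ centered      # weighted within-class scatter
        means[c] = mu
    Sw += reg * np.eye(d)                # ridge for numerical stability
    return np.linalg.solve(Sw, means[1] - means[-1])
```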

Journal ArticleDOI
TL;DR: Experimental results are reported on a real-world image collection to demonstrate that the proposed methods outperform the traditional kernel BDA (KBDA) and the support vector machine (SVM) based RF algorithms.
Abstract: In recent years, a variety of relevance feedback (RF) schemes have been developed to improve the performance of content-based image retrieval (CBIR). Given user feedback information, the key to an RF scheme is how to select a subset of image features to construct a suitable dissimilarity measure. Among various RF schemes, biased discriminant analysis (BDA) based RF is one of the most promising. It is based on the observation that all positive samples are alike, while in general each negative sample is negative in its own way. However, to use BDA, the small sample size (SSS) problem is a big challenge, as users tend to give a small number of feedback samples. To explore solutions to this issue, this paper proposes a direct kernel BDA (DKBDA), which is less sensitive to SSS. An incremental DKBDA (IDKBDA) is also developed to speed up the analysis. Experimental results are reported on a real-world image collection to demonstrate that the proposed methods outperform the traditional kernel BDA (KBDA) and the support vector machine (SVM) based RF algorithms.

Book ChapterDOI
07 May 2006
TL;DR: A novel algorithm called Common Discriminant Feature Extraction specially tailored to the inter-modality face recognition problem is proposed and two nonlinear extensions of the algorithm are developed: one is based on kernelization, while the other is a multi-mode framework.
Abstract: Recently, the wide deployment of practical face recognition systems gives rise to the emergence of the inter-modality face recognition problem. In this problem, the face images in the database and the query images captured on spot are acquired under quite different conditions or even using different equipments. Conventional approaches either treat the samples in a uniform model or introduce an intermediate conversion stage, both of which would lead to severe performance degradation due to the great discrepancies between different modalities. In this paper, we propose a novel algorithm called Common Discriminant Feature Extraction specially tailored to the inter-modality problem. In the algorithm, two transforms are simultaneously learned to transform the samples in both modalities respectively to the common feature space. We formulate the learning objective by incorporating both the empirical discriminative power and the local smoothness of the feature transformation. By explicitly controlling the model complexity through the smoothness constraint, we can effectively reduce the risk of overfitting and enhance the generalization capability. Furthermore, to cope with the non-Gaussian distribution and diverse variations in the sample space, we develop two nonlinear extensions of the algorithm: one is based on kernelization, while the other is a multi-mode framework. These extensions substantially improve the recognition performance in complex situation. Extensive experiments are conducted to test our algorithms in two application scenarios: optical image-infrared image recognition and photo-sketch recognition. Our algorithms show excellent performance in the experiments.

Journal ArticleDOI
TL;DR: The Mixture Modeling (MIXMOD) program fits mixture models to a given data set for the purposes of density estimation, clustering or discriminant analysis, and fourteen different Gaussian models can be distinguished according to different assumptions regarding the component variance matrix eigenvalue decomposition.
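The fourteen Gaussian models arise from the standard eigenvalue decomposition of each component covariance matrix,

```latex
\Sigma_k = \lambda_k\, D_k\, A_k\, D_k^{\top},
```

where $\lambda_k$ controls the volume, the orthogonal matrix $D_k$ the orientation, and the normalized-eigenvalue matrix $A_k$ the shape of component $k$; constraining each factor to be equal across components, left free, or fixed to the identity yields the model family.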

Journal ArticleDOI
28 Feb 2006 - Talanta
TL;DR: The use of genetic algorithms (GA) as a variable selection method was found to enhance the classification performance of the PLS-DA models, and various metabolites were identified that are responsible for the observed separations.

Journal ArticleDOI
TL;DR: Empirical experimentation suggests that the SVM outperforms the other classification methods in terms of predicting the direction of the stock market movement, and the random forest method outperforms the neural network, discriminant analysis and logit models used in this study.

Abstract: A vast body of research articles predicts the stock market as well as the pricing of stock index financial instruments, but most of the proposed models focus on accurate forecasting of the levels (i.e. values) of the underlying stock index. There is a lack of studies examining the predictability of the direction/sign of stock index movement. Given the notion that a prediction with little forecast error does not necessarily translate into capital gain, this study is an attempt to predict the direction of the S&P CNX NIFTY Market Index of the National Stock Exchange, one of the fastest growing financial exchanges in developing Asian countries. Random forest and support vector machines (SVM) are specific types of machine learning methods, and are promising tools for the prediction of financial time series. The tested classification models, which predict direction, include linear discriminant analysis, logit, artificial neural network, random forest and SVM. Empirical experimentation suggests that the SVM outperforms the other classification methods in terms of predicting the direction of the stock market movement, and the random forest method outperforms the neural network, discriminant analysis and logit models used in this study.

Journal Article
TL;DR: The main result shows that, under a mild condition which holds in many applications involving high-dimensional data, NLDA is equivalent to OLDA; a comparative classification study further confirms the effectiveness of the regularization in ROLDA.
Abstract: Dimensionality reduction is an important pre-processing step in many applications. Linear discriminant analysis (LDA) is a classical statistical approach for supervised dimensionality reduction. It aims to maximize the ratio of the between-class distance to the within-class distance, thus maximizing the class discrimination. It has been used widely in many applications. However, the classical LDA formulation requires the nonsingularity of the scatter matrices involved. For undersampled problems, where the data dimensionality is much larger than the sample size, all scatter matrices are singular and classical LDA fails. Many extensions, including null space LDA (NLDA) and orthogonal LDA (OLDA), have been proposed in the past to overcome this problem. NLDA aims to maximize the between-class distance in the null space of the within-class scatter matrix, while OLDA computes a set of orthogonal discriminant vectors via the simultaneous diagonalization of the scatter matrices. They have been applied successfully in various applications. In this paper, we present a computational and theoretical analysis of NLDA and OLDA. Our main result shows that under a mild condition which holds in many applications involving high-dimensional data, NLDA is equivalent to OLDA. We have performed extensive experiments on various types of data and results are consistent with our theoretical analysis. We further apply the regularization to OLDA. The algorithm is called regularized OLDA (or ROLDA for short). An efficient algorithm is presented to estimate the regularization value in ROLDA. A comparative study on classification shows that ROLDA is very competitive with OLDA. This confirms the effectiveness of the regularization in ROLDA.
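In scatter-matrix notation (reconstructed here, so treat the details as a sketch), the two extensions optimize

```latex
\text{NLDA:}\;\; \max_{G}\ \operatorname{tr}\!\left(G^{\top} S_b\, G\right)
\;\;\text{s.t.}\;\; G^{\top} S_w\, G = 0,
\qquad
\text{OLDA:}\;\; \max_{G^{\top}G = I}\ \operatorname{tr}\!\left(\left(G^{\top} S_t\, G\right)^{+} G^{\top} S_b\, G\right),
```

and the mild condition under which the paper shows them equivalent is commonly stated as $\operatorname{rank}(S_t) = \operatorname{rank}(S_b) + \operatorname{rank}(S_w)$, which typically holds when the data dimensionality far exceeds the sample size.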