Topic

Linear discriminant analysis

About: Linear discriminant analysis is a research topic. Over its lifetime, 18,361 publications have been published within this topic, receiving 603,195 citations. The topic is also known as LDA.


Papers
Proceedings ArticleDOI
11 Aug 2002
TL;DR: This paper proposes a new method that makes effective use of the null space of S_w to solve the small sample size problem of LDA, and compares the method with several well-known approaches.
Abstract: The small sample size problem is often encountered in pattern recognition. It results in the singularity of the within-class scatter matrix S_w in linear discriminant analysis (LDA). Different methods have been proposed in the face recognition literature to solve this problem. Some methods reduce the dimension of the original sample space and hence unavoidably discard the null space of S_w, which has been shown to contain considerable discriminative information; other methods suffer from computational problems. In this paper, we propose a new method that makes effective use of the null space of S_w to solve the small sample size problem of LDA. We compare our method with several well-known methods and demonstrate the efficiency of our method.
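The idea above can be sketched in a few lines of NumPy: when there are fewer samples than dimensions, S_w is singular and its null space is non-trivial; projecting into that null space and then maximizing between-class scatter there yields discriminant directions. This is an illustrative sketch of the general null-space LDA technique, not the paper's exact algorithm; all names are made up for the example.

```python
import numpy as np

def null_space_lda(X, y, tol=1e-10):
    """Sketch of null-space LDA: project into null(S_w), then
    maximize between-class scatter inside that subspace."""
    classes = np.unique(y)
    mean = X.mean(axis=0)
    d = X.shape[1]
    Sw = np.zeros((d, d))  # within-class scatter
    Sb = np.zeros((d, d))  # between-class scatter
    for c in classes:
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        diff = (mc - mean).reshape(-1, 1)
        Sb += len(Xc) * (diff @ diff.T)
    # Null space of S_w: eigenvectors with (near-)zero eigenvalues.
    evals, evecs = np.linalg.eigh(Sw)
    N = evecs[:, evals < tol]
    # Maximize between-class scatter within that null space.
    vals, vecs = np.linalg.eigh(N.T @ Sb @ N)
    order = np.argsort(vals)[::-1]
    W = N @ vecs[:, order[: len(classes) - 1]]
    return W

# Toy small-sample-size case: more dimensions (10) than samples (6).
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 10))
y = np.array([0, 0, 0, 1, 1, 1])
W = null_space_lda(X, y)
print(W.shape)  # (10, 1) for a two-class problem
```

Because S_w has rank at most (samples − classes), the null space here is at least six-dimensional, so the projection loses nothing that standard LDA could have used.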

272 citations

Journal ArticleDOI
TL;DR: Experimental analysis of classification tasks, including sentiment analysis, software defect prediction, credit risk modeling, spam filtering, and semantic mapping, suggests that the proposed ensemble method outperforms conventional ensemble learning methods such as AdaBoost, bagging, random subspace, and majority voting.
Abstract: Typically performed by supervised machine learning algorithms, sentiment analysis is highly useful for extracting subjective information from text documents online. Most approaches that use ensemble learning paradigms toward sentiment analysis involve feature engineering in order to enhance the predictive performance. In response, we sought to develop a multiobjective, optimization-based weighted voting scheme that assigns appropriate weight values to classifiers and each output class based on the predictive performance of the classification algorithms, in order to enhance the predictive performance of sentiment classification. The proposed ensemble method is based on static classifier selection involving majority voting error and forward search, as well as a multiobjective differential evolution algorithm. Based on the static classifier selection scheme, our proposed ensemble method incorporates Bayesian logistic regression, naive Bayes, linear discriminant analysis, logistic regression, and support vector machines as base learners, whose performance in terms of precision and recall values determines weight adjustment. Our experimental analysis of classification tasks, including sentiment analysis, software defect prediction, credit risk modeling, spam filtering, and semantic mapping, suggests that the proposed classification scheme outperforms conventional ensemble learning methods such as AdaBoost, bagging, random subspace, and majority voting. Of all datasets examined, the laptop dataset showed the best classification accuracy (98.86%).
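The weighted-voting idea can be sketched with scikit-learn. The paper tunes the weights with multiobjective differential evolution; as a simple stand-in, this sketch weights each base learner by its own validation accuracy. The dataset and learner subset are illustrative, not the paper's.

```python
# Illustrative weighted soft-voting ensemble; weights are validation
# accuracies, a simple proxy for the paper's optimized weights.
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

base = [
    ("lr", LogisticRegression(max_iter=1000)),
    ("nb", GaussianNB()),
    ("lda", LinearDiscriminantAnalysis()),
    ("svm", SVC(probability=True)),  # probability=True enables soft voting
]
# Weight each learner by its individual held-out accuracy.
weights = [clf.fit(X_tr, y_tr).score(X_te, y_te) for _, clf in base]

ens = VotingClassifier(estimators=base, voting="soft", weights=weights)
ens.fit(X_tr, y_tr)
print(round(ens.score(X_te, y_te), 3))
```

Soft voting averages the classifiers' predicted probabilities, scaled by the weights, so a stronger base learner pulls the combined decision toward its own prediction.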

272 citations

Journal ArticleDOI
TL;DR: In this paper, a general approach to estimating linear statistical relationships is presented, which includes three lectures on linear functional and structural relationships, factor analysis, and simultaneous equations models, focusing on the similarity of maximum likelihood estimators under normality in the different models.
Abstract: This paper on estimating linear statistical relationships includes three lectures on linear functional and structural relationships, factor analysis, and simultaneous equations models. The emphasis is on relating the several models by a general approach and on the similarity of maximum likelihood estimators (under normality) in the different models. In the first two lectures the observable vector is decomposed into a "systematic part" and a random error; the systematic part satisfies the linear relationships. Estimators are derived for several cases and some of their properties given. Estimation of the coefficients of a single equation in a simultaneous equations model is shown to be equivalent to estimation of linear functional relationships.
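The decomposition described above can be written, in common notation rather than the paper's exact symbols, as:

```latex
x_i = \xi_i + \varepsilon_i, \qquad \beta'\xi_i = \alpha,
```

where $x_i$ is the observable vector, $\xi_i$ its systematic part satisfying the linear relationship with coefficients $\beta$, and $\varepsilon_i$ a random error. Factor analysis and simultaneous-equations models arise from different structures imposed on $\xi_i$.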

272 citations

Proceedings ArticleDOI
02 Nov 2004
TL;DR: This paper presents a novel methodology for predicting fault prone modules, based on random forests, an extension of decision tree learning that generates hundreds or even thousands of trees using subsets of the training data.
Abstract: Accurate prediction of fault prone modules (a module is equivalent to a C function or a C++ method) in the software development process enables effective detection and identification of defects. Such prediction models are especially beneficial for large-scale systems, where verification experts need to focus their attention and resources on problem areas in the system under development. This paper presents a novel methodology for predicting fault prone modules, based on random forests. Random forests are an extension of decision tree learning. Instead of generating one decision tree, this methodology generates hundreds or even thousands of trees using subsets of the training data. The classification decision is obtained by voting. We applied random forests in five case studies based on NASA data sets. The prediction accuracy of the proposed methodology is generally higher than that achieved by logistic regression, discriminant analysis, and the algorithms in two machine learning software packages, WEKA [I. H. Witten et al. (1999)] and See5. The difference in performance between the proposed methodology and the other methods is statistically significant. Further, the advantage of random forests over the other methods is more pronounced on larger data sets.
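A minimal version of this experimental setup is easy to reproduce with scikit-learn. The NASA data sets are not bundled with any standard library, so this sketch uses a synthetic imbalanced stand-in (most modules are not fault-prone) and compares a many-tree forest against logistic regression, as the paper does.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for a module-level defect dataset:
# features play the role of code metrics, label 1 = fault-prone module.
X, y = make_classification(n_samples=1000, n_features=15,
                           weights=[0.8, 0.2], random_state=0)

# Hundreds of trees, each trained on a bootstrap subset; decision by voting.
rf = RandomForestClassifier(n_estimators=500, random_state=0)
lr = LogisticRegression(max_iter=1000)

rf_acc = cross_val_score(rf, X, y, cv=5).mean()
lr_acc = cross_val_score(lr, X, y, cv=5).mean()
print(f"random forest: {rf_acc:.3f}, logistic regression: {lr_acc:.3f}")
```

On a real defect data set one would also report class-sensitive metrics (recall on the fault-prone class, AUC), since plain accuracy is inflated by the majority class.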

272 citations

Book ChapterDOI
01 Jan 2017
TL;DR: This chapter describes a solution that applies a linear transformation to source features to align them with target features before classifier training, and proposes to equivalently apply CORAL to the classifier weights, leading to added efficiency when the number of classifiers is small but the number and dimensionality of target examples are very high.
Abstract: In this chapter, we present CORrelation ALignment (CORAL), a simple yet effective method for unsupervised domain adaptation. CORAL minimizes domain shift by aligning the second-order statistics of source and target distributions, without requiring any target labels. In contrast to subspace manifold methods, it aligns the original feature distributions of the source and target domains, rather than the bases of lower-dimensional subspaces. It is also much simpler than other distribution matching methods. CORAL performs remarkably well in extensive evaluations on standard benchmark datasets. We first describe a solution that applies a linear transformation to source features to align them with target features before classifier training. For linear classifiers, we propose to equivalently apply CORAL to the classifier weights, leading to added efficiency when the number of classifiers is small but the number and dimensionality of target examples are very high. The resulting CORAL Linear Discriminant Analysis (CORAL-LDA) outperforms LDA by a large margin on standard domain adaptation benchmarks. Finally, we extend CORAL to learn a nonlinear transformation that aligns correlations of layer activations in deep neural networks (DNNs). The resulting Deep CORAL approach works seamlessly with DNNs and achieves state-of-the-art performance on standard benchmark datasets. Our code is available at: https://github.com/VisionLearningGroup/CORAL.
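The linear CORAL transform described above is only a few lines: whiten the source features with the source covariance, then re-color them with the target covariance, so the aligned source matches the target's second-order statistics without using any target labels. This is a sketch of the published formula in NumPy; the regularizer `eps` and toy data are illustrative choices, not from the chapter.

```python
import numpy as np

def coral(Xs, Xt, eps=1e-3):
    """CORrelation ALignment: whiten source with its own covariance,
    then color with the target covariance (no target labels needed)."""
    d = Xs.shape[1]
    Cs = np.cov(Xs, rowvar=False) + eps * np.eye(d)  # regularized source cov
    Ct = np.cov(Xt, rowvar=False) + eps * np.eye(d)  # regularized target cov

    def sqrtm(C, inv=False):
        # Matrix (inverse) square root via eigendecomposition;
        # valid because covariances are symmetric positive semidefinite.
        vals, vecs = np.linalg.eigh(C)
        vals = np.maximum(vals, 1e-12)
        p = -0.5 if inv else 0.5
        return (vecs * vals**p) @ vecs.T

    return Xs @ sqrtm(Cs, inv=True) @ sqrtm(Ct)

rng = np.random.default_rng(0)
Xs = rng.normal(size=(200, 5)) * 2.0                       # source: wide, uncorrelated
Xt = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))   # target: correlated
Xs_aligned = coral(Xs, Xt)
# After alignment, the source covariance approximates the target covariance.
print(np.allclose(np.cov(Xs_aligned, rowvar=False),
                  np.cov(Xt, rowvar=False), atol=0.1))
```

A classifier is then trained on `Xs_aligned` with the source labels and applied to the target domain; the CORAL-LDA variant in the chapter folds this same alignment into the classifier weights instead of the features.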

271 citations


Network Information
Related Topics (5)
Regression analysis: 31K papers, 1.7M citations, 85% related
Artificial neural network: 207K papers, 4.5M citations, 80% related
Feature extraction: 111.8K papers, 2.1M citations, 80% related
Cluster analysis: 146.5K papers, 2.9M citations, 79% related
Image segmentation: 79.6K papers, 1.8M citations, 79% related
Performance Metrics
No. of papers in the topic in previous years:
2025: 1
2024: 2
2023: 756
2022: 1,711
2021: 678
2020: 815