scispace - formally typeset
Search or ask a question
Topic

Linear discriminant analysis

About: Linear discriminant analysis is a research topic. Over the lifetime, 18361 publications have been published within this topic receiving 603195 citations. The topic is also known as: Linear discriminant analysis & LDA.


Papers
More filters
Book
21 Mar 2013
TL;DR: In this article, the authors present an analysis of multivariate data and the forward search for regression data in order to find a Multivariate Transformations to Normality (MTN) with the Forward Search.
Abstract: Contents Preface Notation 1 Examples of Multivariate Data 1.1 In.uence, Outliers and Distances 1.2 A Sketch of the Forward Search 1.3 Multivariate Normality and our Examples 1.4 Swiss Heads 1.5 National Track Records forWomen 1.6 Municipalities in Emilia-Romagna 1.7 Swiss Bank Notes 1.8 Plan of the Book 2 Multivariate Data and the Forward Search 2.1 The Univariate Normal Distribution 2.1.1 Estimation 2.1.2 Distribution of Estimators 2.2 Estimation and the Multivariate Normal Distribution 2.2.1 The Multivariate Normal Distribution 2.2.2 The Wishart Distribution 2.2.3 Estimation of O 2.3 Hypothesis Testing 2.3.1 Hypotheses About the Mean 2.3.2 Hypotheses About the Variance 2.4 The Mahalanobis Distance 2.5 Some Deletion Results 2.5.1 The Deletion Mahalanobis Distance 2.5.2 The (Bartlett)-Sherman-Morrison-Woodbury Formula 2.5.3 Deletion Relationships Among Distances 2.6 Distribution of the Squared Mahalanobis Distance 2.7 Determinants of Dispersion Matrices and the Squared Mahalanobis Distance 2.8 Regression 2.9 Added Variables in Regression 2.10 TheMean Shift OutlierModel 2.11 Seemingly Unrelated Regression 2.12 The Forward Search 2.13 Starting the Search 2.13.1 The Babyfood Data 2.13.2 Robust Bivariate Boxplots from Peeling 2.13.3 Bivariate Boxplots from Ellipses 2.13.4 The Initial Subset 2.14 Monitoring the Search 2.15 The Forward Search for Regression Data 2.15.1 Univariate Regression 2.15.2 Multivariate Regression 2.16 Further Reading 2.17 Exercises 2.18 Solutions 3 Data from One Multivariate Distribution 3.1 Swiss Heads 3.2 National Track Records for Women 3.3 Municipalities in Emilia-Romagna 3.4 Swiss Bank Notes 3.5 What Have We Seen? 3.6 Exercises 3.7 Solutions 4 Multivariate Transformations to Normality 4.1 Background 4.2 An Introductory Example: the Babyfood Data 4.3 Power Transformations to Approximate Normality 4.3.1 Transformation of the Response in Regression 4.3.2 Multivariate Transformations to Normality 4.4 Score Tests for Transformations 4.5 Graphics for Transformations 4.6 Finding a Multivariate Transformation with the Forward Search 4.7 Babyfood Data 4.8 Swiss Heads 4.9 Horse Mussels 4.10 Municipalities in Emilia-Romagna 4.10.1 Demographic Variables 4.10.2 Wealth Variables 4.10.3 Work Variables 4.10.4 A Combined Analysis 4.11 National Track Records for Women 4.12 Dyestuff Data 4.13 Babyfood Data and Variable Selection 4.14 Suggestions for Further Reading 4.15 Exercises 4.16 Solutions 5 Principal Components Analysis 5.1 Background 5.2 Principal Components and Eigenvectors 5.2.1 Linear Transformations and Principal Components . 5.2.2 Lack of Scale Invariance and Standardized Variables 5.2.3 The Number of Components 5.3 Monitoring the Forward Search 5.3.1 Principal Components and Variances 5.3.2 Principal Component Scores 5.3.3 Correlations Between Variables and Principal Components 5.3.4 Elements of the Eigenvectors 5.4 The Biplot and the Singular Value Decomposition 5.5 Swiss Heads 5.6 Milk Data 5.7 Quality of Life 5.8 Swiss Bank Notes 5.8.1 Forgeries and Genuine Notes 5.8.2 Forgeries Alone 5.9 Municipalities in Emilia-Romagna 5.10 Further reading 5.11 Exercises 5.12 Solutions 6 Discriminant Analysis 6.1 Background 6.2 An Outline of Discriminant Analysis 6.2.1 Bayesian Discrimination 6.2.2 Quadratic Discriminant Analysis 6.2.3 Linear Discriminant Analysis 6.2.4 Estimation of Means and Variances 6.2.5 Canonical Variates 6.2.6 Assessment of Discriminant Rules 6.3 The Forward Search 6.3.1 Step 1: Choice of the Initial Subset 6.3.2 Step 2: Adding

202 citations

Journal ArticleDOI
TL;DR: The results documented in this study may provide a reference for the optimum quantitative EEG features to use in developing and enhancing neonatal seizure detection algorithms.

202 citations

Journal ArticleDOI
TL;DR: This work shows that with an appropriate combination of kernels a significant boost in classification performance is possible, and indicates the utility of active learning with probabilistic predictive models, especially when the amount of training data labels that may be sought for a category is ultimately very small.
Abstract: Discriminative methods for visual object category recognition are typically non-probabilistic, predicting class labels but not directly providing an estimate of uncertainty. Gaussian Processes (GPs) provide a framework for deriving regression techniques with explicit uncertainty models; we show here how Gaussian Processes with covariance functions defined based on a Pyramid Match Kernel (PMK) can be used for probabilistic object category recognition. Our probabilistic formulation provides a principled way to learn hyperparameters, which we utilize to learn an optimal combination of multiple covariance functions. It also offers confidence estimates at test points, and naturally allows for an active learning paradigm in which points are optimally selected for interactive labeling. We show that with an appropriate combination of kernels a significant boost in classification performance is possible. Further, our experiments indicate the utility of active learning with probabilistic predictive models, especially when the amount of training data labels that may be sought for a category is ultimately very small.

202 citations

Journal ArticleDOI
TL;DR: RS-Bagging DT and Bagging-RS DT can be used as alternative techniques for credit scoring and get the better results than five single classifiers and four popular ensemble classifiers.
Abstract: Decision tree (DT) is one of the most popular classification algorithms in data mining and machine learning. However, the performance of DT based credit scoring model is often relatively poorer than other techniques. This is mainly due to two reasons: DT is easily affected by (1) the noise data and (2) the redundant attributes of data under the circumstance of credit scoring. In this study, we propose two dual strategy ensemble trees: RS-Bagging DT and Bagging-RS DT, which are based on two ensemble strategies: bagging and random subspace, to reduce the influences of the noise data and the redundant attributes of data and to get the relatively higher classification accuracy. Two real world credit datasets are selected to demonstrate the effectiveness and feasibility of proposed methods. Experimental results reveal that single DT gets the lowest average accuracy among five single classifiers, i.e., Logistic Regression Analysis (LRA), Linear Discriminant Analysis (LDA), Multi-layer Perceptron (MLP) and Radial Basis Function Network (RBFN). Moreover, RS-Bagging DT and Bagging-RS DT get the better results than five single classifiers and four popular ensemble classifiers, i.e., Bagging DT, Random Subspace DT, Random Forest and Rotation Forest. The results show that RS-Bagging DT and Bagging-RS DT can be used as alternative techniques for credit scoring.

202 citations

Journal ArticleDOI
TL;DR: This work proposes a methodology for the automatic detection of normal and Coronary Artery Disease conditions using heart rate signals and shows that the ICA coupled with GMM classifier combination resulted in highest accuracy and sensitivity compared to other data reduction techniques and classifiers.
Abstract: Coronary Artery Disease (CAD) is the narrowing of the blood vessels that supply blood and oxygen to the heart. Electrocardiogram (ECG) is an important cardiac signal representing the sum total of millions of cardiac cell depolarization potentials. It contains important insights into the state of health and nature of the disease afflicting the heart. However, it is very difficult to perceive the subtle changes in ECG signals which indicate a particular type of cardiac abnormality. Hence, we have used the heart rate signals from the ECG for the diagnosis of cardiac health. In this work, we propose a methodology for the automatic detection of normal and Coronary Artery Disease conditions using heart rate signals. The heart rate signals are decomposed into frequency sub-bands using Discrete Wavelet Transform (DWT). Principle Component Analysis (PCA), Linear Discriminant Analysis (LDA) and Independent Component Analysis (ICA) were applied on the set of DWT coefficients extracted from particular sub-bands in order to reduce the data dimension. The selected sets of features were fed into four different classifiers: Support Vector Machine (SVM), Gaussian Mixture Model (GMM), Probabilistic Neural Network (PNN) and K-Nearest Neighbor (KNN). Our results showed that the ICA coupled with GMM classifier combination resulted in highest accuracy of 96.8%, sensitivity of 100% and specificity of 93.7% compared to other data reduction techniques (PCA and LDA) and classifiers. Overall, compared to previous techniques, our proposed strategy is more suitable for diagnosis of CAD with higher accuracy.

202 citations


Network Information
Related Topics (5)
Regression analysis
31K papers, 1.7M citations
85% related
Artificial neural network
207K papers, 4.5M citations
80% related
Feature extraction
111.8K papers, 2.1M citations
80% related
Cluster analysis
146.5K papers, 2.9M citations
79% related
Image segmentation
79.6K papers, 1.8M citations
79% related
Performance
Metrics
No. of papers in the topic in previous years
YearPapers
20251
20242
2023756
20221,711
2021678
2020815