Topic

Linear discriminant analysis

About: Linear discriminant analysis is a research topic. Over the lifetime, 18,361 publications have been published within this topic, receiving 603,195 citations. The topic is also known as LDA.


Papers
Journal ArticleDOI
TL;DR: The power analysis algorithm calculates the appropriate sample size for discrimination of phenotypic subtypes in a reduced dimensional space obtained by Fisher discriminant analysis (FDA), and it was confirmed that when the minimum number of samples estimated from power analysis is used, group means in the FDA discrimination space are statistically different.
Abstract: Motivation: Transcriptional profiling using microarrays can reveal important information about cellular and tissue expression phenotypes, but these measurements are costly and time consuming. Additionally, tissue sample availability poses further constraints on the number of arrays that can be analyzed in connection with a particular disease or state of interest. It is therefore important to provide a method for the determination of the minimum number of microarrays required to separate, with statistical reliability, distinct disease states or other physiological differences. Results: Power analysis was applied to estimate the minimum sample size required for two-class and multi-class discrimination. The power analysis algorithm calculates the appropriate sample size for discrimination of phenotypic subtypes in a reduced dimensional space obtained by Fisher discriminant analysis (FDA). This approach was tested by applying the algorithm to existing data sets for estimation of the minimum sample size required for drawing certain conclusions on multi-class distinction with statistical reliability. It was confirmed that when the minimum number of samples estimated from power analysis is used, group means in the FDA discrimination space are statistically different.

142 citations
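The general idea can be illustrated with a short, hypothetical sketch (not the paper's algorithm): project two toy classes onto the first Fisher discriminant axis with scikit-learn's LinearDiscriminantAnalysis, estimate the effect size of the group separation in that space, and solve a standard two-sample t-test power calculation for the minimum per-group sample size. The toy data, the 80% power target and the 5% significance level are illustrative assumptions.

```python
# Illustrative sketch only, not the paper's power analysis algorithm:
# estimate the minimum per-group sample size needed to separate two classes
# along the first Fisher discriminant axis, via a two-sample t-test power calculation.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from statsmodels.stats.power import TTestIndPower

rng = np.random.default_rng(0)
# Toy "expression" data: two classes, 10 genes, modest mean shift.
X = np.vstack([rng.normal(0.0, 1.0, size=(20, 10)),
               rng.normal(0.5, 1.0, size=(20, 10))])
y = np.array([0] * 20 + [1] * 20)

# Fisher discriminant projection to one dimension.
lda = LinearDiscriminantAnalysis(n_components=1)
z = lda.fit_transform(X, y).ravel()

# Effect size (Cohen's d) of the class means in the discriminant space.
z0, z1 = z[y == 0], z[y == 1]
pooled_sd = np.sqrt((z0.var(ddof=1) + z1.var(ddof=1)) / 2)
d = abs(z0.mean() - z1.mean()) / pooled_sd

# Minimum samples per group for 80% power at alpha = 0.05.
n_min = TTestIndPower().solve_power(effect_size=d, power=0.8, alpha=0.05)
print(f"effect size d = {d:.2f}, minimum samples per group ~ {int(np.ceil(n_min))}")
```

Note that estimating the effect size from the same data used to fit the projection is optimistically biased; the paper's algorithm also covers the multi-class case, which this sketch does not.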

Journal ArticleDOI
TL;DR: Different substitutes for missing values, namely zero, mean, median, k-nearest neighbours (kNN) and random forest (RF) imputation, are analysed in terms of their influence on unsupervised and supervised learning and, thus, their impact on the final output(s) of the biological interpretation.
Abstract: Missing values are known to be problematic for the analysis of gas chromatography-mass spectrometry (GC-MS) metabolomics data. Typically these values cover about 10%–20% of all data and can originate from various sources: analytical, computational, as well as biological. Currently, the most widely used substitute for missing values is mean imputation. In fact, some researchers consider this aspect of data analysis in their metabolomics pipeline so routine that they do not even mention using this replacement approach. However, it may have a significant influence on the output(s) of the data analysis and may be highly sensitive to the distribution of samples between different classes. Therefore, in this study we have analysed different substitutes for missing values, namely zero, mean, median, k-nearest neighbours (kNN) and random forest (RF) imputation, in terms of their influence on unsupervised and supervised learning and, thus, their impact on the final output(s) in terms of biological interpretation. These comparisons have been demonstrated both visually and computationally (classification rate) to support our findings. The results show that the choice of method used to impute missing values may have a considerable effect on classification accuracy; if performed incorrectly, it may negatively influence the biomarkers selected for early disease diagnosis or the identification of cancer-related metabolites. For the GC-MS metabolomics data studied here, our findings recommend that RF be favoured for imputing missing values over the other tested methods. This approach displayed excellent classification rates for both supervised methods, namely principal components-linear discriminant analysis (PC-LDA) (98.02%) and partial least squares-discriminant analysis (PLS-DA) (97.96%), outperforming the other imputation methods.

142 citations
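A minimal sketch of this kind of comparison, assuming scikit-learn and synthetic data in place of the authors' GC-MS measurements: each imputer (zero, mean, median, kNN, and a random-forest-based iterative imputer standing in for RF imputation) is placed in front of a PCA + LDA classifier and scored by cross-validation. The missingness rate, data set and pipeline settings are illustrative assumptions, not the paper's protocol.

```python
# Illustrative sketch (not the authors' pipeline): compare missing-value
# imputation strategies by the cross-validated accuracy of a PCA + LDA classifier
# on synthetic data with values removed at random.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import SimpleImputer, KNNImputer, IterativeImputer
from sklearn.ensemble import RandomForestRegressor
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
X, y = make_classification(n_samples=200, n_features=30, n_informative=10, random_state=1)
X_missing = X.copy()
X_missing[rng.random(X.shape) < 0.15] = np.nan  # ~15% missing, as is typical for GC-MS data

imputers = {
    "zero":   SimpleImputer(strategy="constant", fill_value=0.0),
    "mean":   SimpleImputer(strategy="mean"),
    "median": SimpleImputer(strategy="median"),
    "kNN":    KNNImputer(n_neighbors=5),
    # Random-forest-based iterative imputation, in the spirit of missForest.
    "RF":     IterativeImputer(estimator=RandomForestRegressor(n_estimators=50, random_state=1),
                               max_iter=5, random_state=1),
}

for name, imputer in imputers.items():
    clf = make_pipeline(imputer, PCA(n_components=10), LinearDiscriminantAnalysis())
    acc = cross_val_score(clf, X_missing, y, cv=5).mean()
    print(f"{name:>6}: mean CV accuracy = {acc:.3f}")
```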

Journal ArticleDOI
TL;DR: The combination of classifiers leads to a substantial reduction of the misclassification error in a wide range of applications and benchmark problems, and the procedure performs comparably to the best classifiers in a number of artificial examples and applications.

141 citations
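As a toy illustration of the general idea of combining classifiers (the paper's own combination procedure is not reproduced here), the sketch below compares a simple majority vote over LDA, k-nearest neighbours and a decision tree against each base classifier alone; the data set and base learners are arbitrary choices for the example.

```python
# Hypothetical sketch: majority-vote combination of three classifiers
# versus each base classifier on its own.
from sklearn.datasets import load_breast_cancer
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import VotingClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)
base = [("lda", LinearDiscriminantAnalysis()),
        ("knn", KNeighborsClassifier(n_neighbors=5)),
        ("tree", DecisionTreeClassifier(random_state=0))]

for name, clf in base + [("vote", VotingClassifier(estimators=base))]:
    acc = cross_val_score(clf, X, y, cv=5).mean()
    print(f"{name:>4}: mean CV accuracy = {acc:.3f}")
```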

Journal ArticleDOI
TL;DR: A number of methods have been proposed in the last decade to overcome the small-sample-size limitation of LDA; when applied to face recognition, these methods can be roughly grouped into three categories.

141 citations
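One widely cited family of remedies, the PCA + LDA ("Fisherfaces") route, can be sketched as below. This illustrates only that single category, not the paper's full taxonomy, and the Olivetti faces data (downloaded by scikit-learn) and the PCA dimensionality are assumptions chosen for the example.

```python
# Sketch of one common remedy for the small-sample-size problem in LDA:
# reduce dimensionality with PCA first, so the within-class scatter matrix
# is no longer singular, then apply LDA in the reduced space.
from sklearn.datasets import fetch_olivetti_faces
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

faces = fetch_olivetti_faces()          # 400 images of 40 people, 4096 pixels each
X, y = faces.data, faces.target         # n_features >> n_samples: the small-sample-size setting

fisherfaces = make_pipeline(
    PCA(n_components=100, whiten=True),  # keep n_components <= n_samples - n_classes
    LinearDiscriminantAnalysis(),
)
print("CV accuracy:", cross_val_score(fisherfaces, X, y, cv=5).mean())
```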

Journal ArticleDOI
TL;DR: This work develops functional principal components analysis for this situation and demonstrates the prediction of individual trajectories from sparse observations; the method can handle missing data and leads to predictions of the functional principal component scores, which serve as random effects in this model.
Abstract: In longitudinal data analysis one frequently encounters non-Gaussian data that are repeatedly collected for a sample of individuals over time. The repeated observations could be binomial, Poisson or of another discrete type, or could be continuous. The timings of the repeated measurements are often sparse and irregular. We introduce a latent Gaussian process model for such data, establishing a connection to functional data analysis. The functional methods proposed are non-parametric and computationally straightforward as they do not involve a likelihood. We develop functional principal components analysis for this situation and demonstrate the prediction of individual trajectories from sparse observations. This method can handle missing data and leads to predictions of the functional principal component scores, which serve as random effects in this model. These scores can then be used for further statistical analysis, such as inference, regression, discriminant analysis or clustering. We illustrate these non-parametric methods with longitudinal data on primary biliary cirrhosis and show in simulations that they are competitive in comparisons with generalized estimating equations and generalized linear mixed models.

141 citations
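A heavily simplified sketch of the downstream idea, assuming densely and regularly observed curves (the paper's latent Gaussian process treatment of sparse, irregular, non-Gaussian observations is not reproduced): estimate functional principal component scores by ordinary PCA on discretised curves and use the scores as inputs to a discriminant analysis. All simulation settings are assumptions made for the example.

```python
# Greatly simplified sketch: "functional" PC scores via PCA on curves sampled
# on a common grid, then classification of subjects from those scores with LDA.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
t = np.linspace(0, 1, 50)                       # common observation grid
n_per_group = 60

def simulate(shift):
    # Random curves: sine mean (shifted per group) + two random harmonics + noise.
    coef = rng.normal(size=(n_per_group, 2)) * [1.0, 0.5]
    basis = np.vstack([np.sin(2 * np.pi * t), np.cos(2 * np.pi * t)])
    return shift + np.sin(np.pi * t) + coef @ basis + rng.normal(0, 0.2, (n_per_group, t.size))

X = np.vstack([simulate(0.0), simulate(0.4)])
y = np.array([0] * n_per_group + [1] * n_per_group)

fpca = PCA(n_components=3)
scores = fpca.fit_transform(X)                   # principal component scores per subject
print("classification on FPC scores:",
      cross_val_score(LinearDiscriminantAnalysis(), scores, y, cv=5).mean())
```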


Network Information
Related Topics (5)

Regression analysis: 31K papers, 1.7M citations (85% related)
Artificial neural network: 207K papers, 4.5M citations (80% related)
Feature extraction: 111.8K papers, 2.1M citations (80% related)
Cluster analysis: 146.5K papers, 2.9M citations (79% related)
Image segmentation: 79.6K papers, 1.8M citations (79% related)
Performance Metrics

Number of papers in the topic in previous years:

Year    Papers
2025    1
2024    2
2023    756
2022    1,711
2021    678
2020    815