
Showing papers on "Principal component analysis published in 2003"


Book
12 Mar 2003
TL;DR: The concept of and need for Principal Components Analysis, unsupervised pattern recognition (cluster analysis), supervised pattern recognition, and their application in chemistry are explained.
Abstract: Preface. Supplementary Information. Acknowledgements.
1. INTRODUCTION. Points of View. Software and Calculations. Further Reading. References.
2. EXPERIMENTAL DESIGN. Introduction. Basic Principles. Factorial Designs. Central Composite or Response Surface Designs. Mixture Designs. Simplex Optimisation. Problems.
3. SIGNAL PROCESSING. Sequential Signals in Chemistry. Basics. Linear Filters. Correlograms and Time Series Analysis. Fourier Transform Techniques. Topical Methods. Problems.
4. PATTERN RECOGNITION. Introduction. The Concept and Need for Principal Components Analysis. Principal Components Analysis: the Method. Unsupervised Pattern Recognition: Cluster Analysis. Supervised Pattern Recognition. Multiway Pattern Recognition. Problems.
5. CALIBRATION. Introduction. Univariate Calibration. Multiple Linear Regression. Principal Components Regression. Partial Least Squares. Model Validation. Problems.
6. EVOLUTIONARY SIGNALS. Introduction. Exploratory Data Analysis and Preprocessing. Determining Composition. Resolution. Problems.
Appendices: A.1 Vectors and Matrices. A.2 Algorithms. A.3 Basic Statistical Concepts. A.4 Excel for Chemometrics. A.5 Matlab for Chemometrics. Index.

1,411 citations


Book ChapterDOI
TL;DR: This chapter describes gene expression analysis by Singular Value Decomposition (SVD), emphasizing initial characterization of the data, and describes the precise relation between SVD analysis and Principal Component Analysis (PCA) when PCA is calculated using the covariance matrix.
Abstract: This chapter describes gene expression analysis by Singular Value Decomposition (SVD), emphasizing initial characterization of the data. We describe SVD methods for visualization of gene expression data, representation of the data using a smaller number of variables, and detection of patterns in noisy gene expression data. In addition, we describe the precise relation between SVD analysis and Principal Component Analysis (PCA) when PCA is calculated using the covariance matrix, enabling our descriptions to apply equally well to either method. Our aim is to provide definitions, interpretations, examples, and references that will serve as resources for understanding and extending the application of SVD and PCA to gene expression analysis.
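
The covariance-matrix link the chapter emphasizes is easy to verify numerically. A minimal sketch (mine, not the chapter's code): for a column-centered data matrix, the right singular vectors of the SVD coincide with the eigenvectors of the covariance matrix, and the squared singular values rescale to its eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 6))          # stand-in for samples x genes
Xc = X - X.mean(axis=0)                # center each variable

# PCA via the covariance matrix
evals, evecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
order = np.argsort(evals)[::-1]        # eigh returns ascending order
evals, evecs = evals[order], evecs[:, order]

# The same decomposition via SVD of the centered data matrix
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

assert np.allclose(s**2 / (len(X) - 1), evals)    # identical spectrum
assert np.allclose(np.abs(Vt), np.abs(evecs.T))   # identical axes, up to sign
```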

1,211 citations


Journal ArticleDOI
TL;DR: The necessity and usefulness of multivariate statistical assessment of large and complex databases are presented, with a view to obtaining better information about surface water quality, the design of sampling and analytical protocols, and effective pollution control and management of surface waters.

1,136 citations


Proceedings Article
09 Dec 2003
TL;DR: A new underlying probabilistic model for principal component analysis (PCA) is introduced; if the prior's covariance function constrains the mappings to be linear the model is equivalent to PCA, and it is extended by considering less restrictive covariance functions which allow non-linear mappings.
Abstract: In this paper we introduce a new underlying probabilistic model for principal component analysis (PCA). Our formulation interprets PCA as a particular Gaussian process prior on a mapping from a latent space to the observed data-space. We show that if the prior's covariance function constrains the mappings to be linear, the model is equivalent to PCA; we then extend the model by considering less restrictive covariance functions which allow non-linear mappings. This more general Gaussian process latent variable model (GPLVM) is then evaluated as an approach to the visualisation of high dimensional data for three different data-sets. Additionally our non-linear algorithm can be further kernelised, leading to 'twin kernel PCA' in which a mapping between feature spaces occurs.
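
The linear special case the paper builds on can be seen in the duality between the covariance matrix and the inner-product (Gram) matrix: eigenanalysis of either yields the same principal-component scores. A short numerical sketch of that duality (my construction, assuming nothing from the paper beyond the linear-kernel equivalence):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(30, 10))
Xc = X - X.mean(axis=0)

# primal: project onto the top eigenvector of the covariance matrix
evals, evecs = np.linalg.eigh(Xc.T @ Xc)
scores_primal = Xc @ evecs[:, -1]

# dual: the top eigenvector of the Gram matrix, rescaled, gives the same scores
gvals, gvecs = np.linalg.eigh(Xc @ Xc.T)
scores_dual = gvecs[:, -1] * np.sqrt(gvals[-1])

assert np.allclose(np.abs(scores_primal), np.abs(scores_dual))
```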

843 citations


Journal ArticleDOI
TL;DR: A new technique for deriving simplified principal components is introduced, borrowing the least absolute shrinkage and selection operator (LASSO) that Tibshirani proposed for interpreting multiple regression equations, so that the resulting linear functions have many coefficients that are exactly zero and are therefore easier to interpret.
Abstract: In many multivariate statistical techniques, a set of linear functions of the original p variables is produced. One of the more difficult aspects of these techniques is the interpretation of the linear functions, as these functions usually have nonzero coefficients on all p variables. A common approach is to effectively ignore (treat as zero) any coefficients less than some threshold value, so that the function becomes simple and the interpretation becomes easier for the users. Such a procedure can be misleading. There are alternatives to principal component analysis which restrict the coefficients to a smaller number of possible values in the derivation of the linear functions, or replace the principal components by “principal variables.” This article introduces a new technique, borrowing an idea proposed by Tibshirani in the context of multiple regression where similar problems arise in interpreting regression equations. This approach is the so-called LASSO, the “least absolute shrinkage and selection operator.”
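
Where the abstract warns that thresholding small coefficients can mislead, L1 penalties instead drive coefficients to exactly zero. A hedged illustration using scikit-learn's SparsePCA, a later L1-penalised relative of this idea and not the article's own algorithm:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA, SparsePCA

X = load_iris().data
X = X - X.mean(axis=0)

dense = PCA(n_components=2).fit(X)
sparse = SparsePCA(n_components=2, alpha=1.0, random_state=0).fit(X)

print(np.round(dense.components_, 2))   # nonzero on all 4 variables
print(np.round(sparse.components_, 2))  # many coefficients exactly zero
```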

841 citations


Journal ArticleDOI
TL;DR: This paper proposes a kernel machine-based discriminant analysis method, which deals with the nonlinearity of the face patterns' distribution and effectively solves the so-called "small sample size" (SSS) problem, which exists in most FR tasks.
Abstract: Techniques that can introduce low-dimensional feature representation with enhanced discriminatory power are of paramount importance in face recognition (FR) systems. It is well known that the distribution of face images, under a perceivable variation in viewpoint, illumination or facial expression, is highly nonlinear and complex. It is, therefore, not surprising that linear techniques, such as those based on principal component analysis (PCA) or linear discriminant analysis (LDA), cannot provide reliable and robust solutions to those FR problems with complex face variations. In this paper, we propose a kernel machine-based discriminant analysis method, which deals with the nonlinearity of the face patterns' distribution. The proposed method also effectively solves the so-called "small sample size" (SSS) problem, which exists in most FR tasks. The new algorithm has been tested, in terms of classification error rate performance, on the multiview UMIST face database. Results indicate that the proposed methodology is able to achieve excellent performance with only a very small set of features being used, and its error rate is approximately 34% and 48% of those of two other commonly used kernel FR approaches, the kernel-PCA (KPCA) and the generalized discriminant analysis (GDA), respectively.
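
The paper's own kernel direct discriminant algorithm is not reproduced here; as a hedged sketch of the general recipe it belongs to, one can chain a kernel feature map with a linear discriminant in scikit-learn (the Olivetti faces and the kernel parameters below are stand-ins, not the UMIST setup):

```python
from sklearn.datasets import fetch_olivetti_faces
from sklearn.decomposition import KernelPCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

faces = fetch_olivetti_faces()            # stand-in face dataset (downloads once)
clf = make_pipeline(
    KernelPCA(n_components=100, kernel="rbf", gamma=1e-3),
    LinearDiscriminantAnalysis(),         # linear discriminant on kernel features
)
print(cross_val_score(clf, faces.data, faces.target, cv=5).mean())
```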

651 citations


Proceedings Article
01 Jan 2003
TL;DR: A novel scheme is proposed that uses a robust principal component classifier in intrusion detection problems where the training data may be unsupervised; it outperforms the nearest neighbor method, the density-based local outliers (LOF) approach, and the outlier detection algorithm based on the Canberra metric.
Abstract: This paper proposes a novel scheme that uses a robust principal component classifier in intrusion detection problems where the training data may be unsupervised. Assuming that anomalies can be treated as outliers, an intrusion predictive model is constructed from the major and minor principal components of the normal instances. A measure of the difference of an anomaly from the normal instance is the distance in the principal component space. The distance based on the major components that account for 50% of the total variation and the minor components whose eigenvalues are less than 0.20 is shown to work well. The experiments with KDD Cup 1999 data demonstrate that the proposed method achieves 98.94% in recall and 97.89% in precision with a false alarm rate of 0.92%, and outperforms the nearest neighbor method, the density-based local outliers (LOF) approach, and the outlier detection algorithm based on the Canberra metric.
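
A minimal sketch of the classifier as the abstract describes it, with the 50% major-variation and 0.20 minor-eigenvalue cutoffs taken from the text (everything else is an assumption): fit PCA on normal instances only, then score new points by their variance-scaled distances in the major and minor principal components.

```python
import numpy as np

def fit_pcc(X_normal, major_var=0.50, minor_eig=0.20):
    """Fit the principal component classifier on normal instances only."""
    mean = X_normal.mean(axis=0)
    lam, V = np.linalg.eigh(np.cov(X_normal - mean, rowvar=False))
    lam, V = lam[::-1], V[:, ::-1]                # descending eigenvalues
    q = np.searchsorted(np.cumsum(lam) / lam.sum(), major_var) + 1
    return mean, lam, V, q, lam < minor_eig

def anomaly_score(x, mean, lam, V, q, minor):
    y = V.T @ (x - mean)                          # coordinates in PC space
    major_d = np.sum(y[:q] ** 2 / lam[:q])        # distance in major components
    minor_d = np.sum(y[minor] ** 2 / lam[minor])  # distance in minor components
    return major_d, minor_d   # flag an intrusion if either exceeds a threshold
```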

574 citations


Journal ArticleDOI
TL;DR: In this paper, the authors examine the factor analysis of matrices where the proportion of signal to noise differs greatly between columns (variables) and find that if a few weak variables are scaled to too high a weight in the analysis, the errors in the computed factors grow, possibly obscuring the weakest factor(s) behind the increased noise level.

554 citations


Journal ArticleDOI
TL;DR: The experiments show that an SVM with feature extraction by PCA, KPCA or ICA performs better than one without feature extraction; among the three methods, KPCA yields the best performance, followed by ICA.

524 citations


Journal ArticleDOI
TL;DR: A fast incremental principal component analysis (IPCA) algorithm, called candid covariance-free IPCA (CCIPCA), computes the principal components of a sequence of samples incrementally without estimating the covariance matrix (hence covariance-free).
Abstract: Appearance-based image analysis techniques require fast computation of principal components of high-dimensional image vectors. We introduce a fast incremental principal component analysis (IPCA) algorithm, called candid covariance-free IPCA (CCIPCA), used to compute the principal components of a sequence of samples incrementally without estimating the covariance matrix (so covariance-free). The new method is motivated by the concept of statistical efficiency (the estimate has the smallest variance given the observed data). To do this, it keeps the scale of observations and computes the mean of observations incrementally, which is an efficient estimate for some well known distributions (e.g., Gaussian), although the highest possible efficiency is not guaranteed in our case because of unknown sample distribution. The method is for real-time applications and, thus, it does not allow iterations. It converges very fast for high-dimensional image vectors. Some links between IPCA and the development of the cerebral cortex are also discussed.
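
A sketch of the covariance-free update for the first component, as I read the description above: each sample pulls the running eigenvector estimate toward itself, weighted by its projection, and no covariance matrix is ever formed. The amnesic parameter l is part of the method's usual presentation; l=0 reduces to a plain incremental average. Samples are assumed already mean-centered (the full algorithm also tracks the mean incrementally).

```python
import numpy as np

def ccipca_first_component(samples, l=0.0):
    """Estimate the first principal eigenvector from a stream of centered samples."""
    v = None
    for n, u in enumerate(samples, start=1):
        if v is None:
            v = u.astype(float).copy()            # initialize with the first sample
            continue
        w_old = (n - 1 - l) / n                   # weight on the old estimate
        w_new = (1 + l) / n                       # weight on the new sample
        v = w_old * v + w_new * (u @ v / np.linalg.norm(v)) * u
    return v / np.linalg.norm(v)
```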

479 citations


Journal ArticleDOI
TL;DR: The performance of PLS-DA on published breast cancer data is found to be extremely satisfactory in all cases, and the discriminant cDNA clones often have a sound biological interpretation.
Abstract: Partial least squares discriminant analysis (PLS-DA) is a partial least squares regression of a set Y of binary variables describing the categories of a categorical variable on a set X of predictor variables. It is a compromise between the usual discriminant analysis and a discriminant analysis on the significant principal components of the predictor variables. This technique is specially suited to deal with a much larger number of predictors than observations and with multicollineality, two of the main problems encountered when analysing microarray expression data. We explore the performance of PLS-DA with published data from breast cancer (Perou et al. 2000). Several such analyses were carried out: (1) before vs after chemotherapy treatment, (2) estrogen receptor positive vs negative tumours, and (3) tumour classification. We found that the performance of PLS-DA was extremely satisfactory in all cases and that the discriminant cDNA clones often had a sound biological interpretation. We conclude that PLS-DA is a powerful yet simple tool for analysing microarray data.
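
PLS-DA as the abstract defines it (a PLS regression of dummy-coded class membership on the predictors) fits in a few lines; a hedged sketch with scikit-learn standing in for the authors' software:

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression

def plsda_fit_predict(X_train, y_train, X_test, n_components=3):
    """PLS regression of dummy-coded classes on X; winner-takes-all prediction."""
    classes = np.unique(y_train)
    Y = (y_train[:, None] == classes[None, :]).astype(float)   # dummy coding
    pls = PLSRegression(n_components=n_components).fit(X_train, Y)
    return classes[np.argmax(pls.predict(X_test), axis=1)]
```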

Journal ArticleDOI
01 Sep 2003-Ecology
TL;DR: In this paper, the authors compared the performance of a variety of approaches for assessing the significance of eigenvector coefficients in terms of type I error rates and power, and two novel approaches based on the broken-stick model were also evaluated.
Abstract: Principal component analysis (PCA) is one of the most commonly used tools in the analysis of ecological data. This method reduces the effective dimensionality of a multivariate data set by producing linear combinations of the original variables (i.e., components) that summarize the predominant patterns in the data. In order to provide meaningful interpretations for principal components, it is important to determine which variables are associated with particular components. Some data analysts incorrectly test the statistical significance of the correlation between original variables and multivariate scores using standard statistical tables. Others interpret eigenvector coefficients larger than an arbitrary absolute value (e.g., 0.50). Resampling, randomization techniques, and parallel analysis have been applied in a few cases. In this study, we compared the performance of a variety of approaches for assessing the significance of eigenvector coefficients in terms of type I error rates and power. Two novel approaches based on the broken-stick model were also evaluated. We used a variety of simulated scenarios to examine the influence of the number of real dimensions in the data; unique versus complex variables; the magnitude of eigenvector coefficients; and the number of variables associated with a particular dimension. Our results revealed that bootstrap confidence intervals and a modified bootstrap confidence interval for the broken-stick model proved to be the most reliable techniques.
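
For reference, the broken-stick model the two novel approaches build on, in its familiar eigenvalue form (the paper adapts it to eigenvector coefficients and wraps it in bootstrap confidence intervals): the k-th expected piece of a unit stick broken at random into p pieces is b_k = (1/p) * sum_{i=k}^{p} 1/i, and components whose relative eigenvalues exceed b_k are retained.

```python
import numpy as np

def broken_stick(p):
    """Expected ordered piece lengths of a unit stick broken into p parts."""
    return np.array([np.sum(1.0 / np.arange(k, p + 1)) / p
                     for k in range(1, p + 1)])

eigvals = np.array([2.8, 1.1, 0.6, 0.3, 0.2])     # example PCA eigenvalues
keep = eigvals / eigvals.sum() > broken_stick(len(eigvals))
print(keep)                                       # [ True False False False False]
```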

Journal ArticleDOI
TL;DR: A new automated, unbiased, multivariate statistical analysis technique, based in part on principal components analysis, is applied to very large X-ray spectral image data sets and returns physically accurate component spectra and images in a few minutes on a standard personal computer.
Abstract: Spectral imaging in the scanning electron microscope (SEM) equipped with an energy-dispersive X-ray (EDX) analyzer has the potential to be a powerful tool for chemical phase identification, but the large data sets have, in the past, proved too large to efficiently analyze. In the present work, we describe the application of a new automated, unbiased, multivariate statistical analysis technique to very large X-ray spectral image data sets. The method, based in part on principal components analysis, returns physically accurate (all positive) component spectra and images in a few minutes on a standard personal computer. The efficacy of the technique for microanalysis is illustrated by the analysis of complex multi-phase materials, particulates, a diffusion couple, and a single-pixel-detection problem.

Book ChapterDOI
TL;DR: This chapter extends the stability-based validation of cluster structure and proposes stability as a figure of merit for comparing clustering solutions, thus helping to choose the data representation, similarity measure, and number of clusters.
Abstract: Clustering is one of the most commonly used tools in the analysis of gene expression data (1, 2). The usage in grouping genes is based on the premise that co-expression is a result of co-regulation. It is thus a preliminary step in extracting gene networks and inference of gene function (3, 4). Clustering of experiments can be used to discover novel phenotypic aspects of cells and tissues (3, 5, 6), including sensitivity to drugs (7), and can also detect artifacts of experimental conditions (8). Clustering and its applications in biology are presented in greater detail in the chapter by Zhao and Karypis (see also (9)). While we focus on gene expression data in this chapter, the methodology presented here is applicable for other types of data as well. Clustering is a form of unsupervised learning, i.e. no information on the class variable is assumed, and the objective is to find the "natural" groups in the data. However, most clustering algorithms generate a clustering even if the data has no inherent cluster structure, so external validation tools are required. Given a set of partitions of the data into an increasing number of clusters (e.g. by a hierarchical clustering algorithm, or k-means), such a validation tool will tell the user the number of clusters in the data (if any). Many methods have been proposed in the literature to address this problem (10–15). Recent studies have shown the advantages of sampling-based methods (12, 14). These methods are based on the idea that when a partition has captured the structure in the data, this partition should be stable with respect to perturbation of the data. Bittner et al. (16) used a similar approach to validate clusters representing gene expression of melanoma patients. The emergence of cluster structure depends on several choices: data representation and normalization, the choice of a similarity measure and clustering algorithm. In this chapter we extend the stability-based validation of cluster structure, and propose stability as a figure of merit that is useful for comparing clustering solutions, thus helping in making these choices. We use this framework to demonstrate the ability of Principal Component Analysis (PCA) to extract features relevant to the cluster structure. We use stability as a tool for simultaneously choosing the number of principal components and the number of clusters; we compare the performance of different similarity measures and normalization schemes. The approach is demonstrated through a case study of yeast gene expression data from Eisen et al. (1). For yeast, a functional classification of a large number of genes is known, and we use this classification for validating the results produced by clustering. A method for comparing clustering solutions specifically applicable to gene expression data was introduced in (17). However, it cannot be used to choose the number of clusters, and is not directly applicable in choosing the number of principal components. The results of clustering are easily corrupted by the addition of noise: even a few
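
A small sketch of the stability idea (my construction, not the chapter's exact protocol): cluster two random subsamples and measure how well their labelings agree on the shared points; averaging over many pairs gives a figure of merit for comparing choices of k, of similarity measure, or of the number of retained principal components.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

def stability(X, k, n_pairs=20, frac=0.8, seed=0):
    rng = np.random.default_rng(seed)
    n, scores = len(X), []
    for _ in range(n_pairs):
        i1 = rng.choice(n, int(frac * n), replace=False)
        i2 = rng.choice(n, int(frac * n), replace=False)
        shared = np.intersect1d(i1, i2)           # points both subsamples saw
        a = KMeans(n_clusters=k, n_init=10).fit(X[i1]).predict(X[shared])
        b = KMeans(n_clusters=k, n_init=10).fit(X[i2]).predict(X[shared])
        scores.append(adjusted_rand_score(a, b))
    return float(np.mean(scores))                 # near 1.0 => stable structure
```

Repeating this over a grid of k values (or numbers of retained principal components) and keeping the most stable setting is the model-selection use the chapter proposes.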

Journal ArticleDOI
TL;DR: ICA outperforms other leading methods, such as principal component analysis, k-means clustering and the Plaid model, in constructing functionally coherent clusters on microarray datasets from Saccharomyces cerevisiae, Caenorhabditis elegans and human.
Abstract: We apply linear and nonlinear independent component analysis (ICA) to project microarray data into statistically independent components that correspond to putative biological processes, and to cluster genes according to over- or under-expression in each component. We test the statistical significance of enrichment of gene annotations within clusters. ICA outperforms other leading methods, such as principal component analysis, k-means clustering and the Plaid model, in constructing functionally coherent clusters on microarray datasets from Saccharomyces cerevisiae, Caenorhabditis elegans and human.
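
A hedged sketch of the recipe with FastICA as one concrete ICA algorithm (the component count and the two-standard-deviation threshold below are arbitrary assumptions, and the random matrix stands in for real expression data):

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 40))            # stand-in: genes x experiments

S = FastICA(n_components=10, random_state=0).fit_transform(X)

clusters = {}
for j in range(S.shape[1]):
    s = (S[:, j] - S[:, j].mean()) / S[:, j].std()
    clusters[j] = (np.where(s > 2.0)[0],  # genes over-expressed in component j
                   np.where(s < -2.0)[0]) # genes under-expressed in component j
```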

Journal ArticleDOI
TL;DR: This study shows the importance of environmental monitoring associated with simple but powerful statistics to better understand a complex water system.

Journal ArticleDOI
TL;DR: Unrestricted, unstandardized covariance-based PCA solutions optimize ERP component identification and measurement; interpretability (more distinctive component waveforms with narrow and unambiguous loading peaks) and statistical conclusions (greater effect stability across extraction criteria) were best for unstandardized covariance-based solutions.

Journal ArticleDOI
TL;DR: It is sufficient to find the orthonormal rotation y=Wz of prewhitened sources z=Vx which minimizes the mean squared error of the reconstruction of z from the rectified version y^+ of y; the experiments show in particular the fast convergence of the rotation and geodesic methods.
Abstract: We consider the task of solving the independent component analysis (ICA) problem x=As given observations x, with a constraint of nonnegativity of the source random vector s. We refer to this as nonnegative independent component analysis and we consider methods for solving this task. For independent sources with nonzero probability density function (pdf) p(s) down to s=0 it is sufficient to find the orthonormal rotation y=Wz of prewhitened sources z=Vx, which minimizes the mean squared error of the reconstruction of z from the rectified version y^+ of y. We suggest some algorithms which perform this, both based on a nonlinear principal component analysis (PCA) approach and on a geodesic search method driven by differential geometry considerations. We demonstrate the operation of these algorithms on an image separation problem, which shows in particular the fast convergence of the rotation and geodesic methods, and apply the approach to a musical audio analysis task.
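
A toy sketch of that reconstruction criterion under my own simplifications: two unit-variance nonnegative sources, prewhitening that leaves the (nonnegative) means in place, and a brute-force search over rotation angles in place of the paper's nonlinear-PCA and geodesic updates.

```python
import numpy as np

rng = np.random.default_rng(0)
S = rng.exponential(size=(2, 5000))               # nonnegative sources
S /= S.std(axis=1, keepdims=True)                 # unit variance
X = np.array([[1.0, 0.4], [0.3, 1.0]]) @ S        # observed mixtures x = As

d, E = np.linalg.eigh(np.cov(X))
V = E @ np.diag(d ** -0.5) @ E.T                  # whitening matrix
Z = V @ X                                         # prewhitened (not centered)

def recon_error(theta):
    c, s = np.cos(theta), np.sin(theta)
    W = np.array([[c, -s], [s, c]])               # orthonormal rotation
    Y_plus = np.maximum(W @ Z, 0.0)               # rectified outputs y+
    return np.mean((Z - W.T @ Y_plus) ** 2)

thetas = np.linspace(0.0, 2 * np.pi, 720)
theta_best = thetas[np.argmin([recon_error(t) for t in thetas])]
```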

Journal ArticleDOI
TL;DR: The minimum classification error (MCE) training algorithm (originally proposed for optimizing classifiers) is investigated for feature extraction, and a generalized MCE (GMCE) training algorithm is proposed to mend the shortcomings of the MCE training algorithm.

Proceedings ArticleDOI
13 Oct 2003
TL;DR: This work proposes a new approach to mapping face images into a subspace obtained by locality preserving projections (LPP) for face analysis, which provides a better representation and achieves lower error rates in face recognition.
Abstract: We have demonstrated that the face recognition performance can be improved significantly in low dimensional linear subspaces. Conventionally, principal component analysis (PCA) and linear discriminant analysis (LDA) are considered effective in deriving such a face subspace. However, both of them effectively see only the Euclidean structure of face space. We propose a new approach to mapping face images into a subspace obtained by locality preserving projections (LPP) for face analysis. We call this Laplacianface approach. Different from PCA and LDA, LPP finds an embedding that preserves local information, and obtains a face space that best detects the essential manifold structure. In this way, the unwanted variations resulting from changes in lighting, facial expression, and pose may be eliminated or reduced. We compare the proposed Laplacianface approach with eigenface and fisherface methods on three test datasets. Experimental results show that the proposed Laplacianface approach provides a better representation and achieves lower error rates in face recognition.
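
LPP itself reduces to a generalized eigenproblem, and a compact sketch of the standard recipe follows (parameter choices and the small ridge term are my assumptions; for raw face images one would normally reduce dimension with PCA first):

```python
import numpy as np
from scipy.linalg import eigh
from sklearn.neighbors import kneighbors_graph

def lpp(X, n_components=10, k=5, t=1.0):
    """Locality preserving projections; returns a (d x n_components) matrix."""
    G = kneighbors_graph(X, k, mode="distance").toarray()
    W = np.where(G > 0, np.exp(-G**2 / t), 0.0)   # heat-kernel affinities
    W = np.maximum(W, W.T)                        # symmetrize the kNN graph
    D = np.diag(W.sum(axis=1))
    L = D - W                                     # graph Laplacian
    A = X.T @ L @ X
    B = X.T @ D @ X + 1e-6 * np.eye(X.shape[1])   # small ridge for stability
    vals, vecs = eigh(A, B)                       # generalized eigenproblem
    return vecs[:, :n_components]                 # smallest eigenvalues first
```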

Journal ArticleDOI
TL;DR: It is shown that automatic wavelet reduction yields better or comparable classification accuracy for hyperspectral data, while achieving substantial computational savings.
Abstract: Hyperspectral imagery provides richer information about materials than multispectral imagery. The new larger data volumes from hyperspectral sensors present a challenge for traditional processing techniques. For example, the identification of each ground surface pixel by its corresponding spectral signature is still difficult because of the immense volume of data. Conventional classification methods may not be used without dimension reduction preprocessing. This is due to the curse of dimensionality, which refers to the fact that the sample size needed to estimate a function of several variables to a given degree of accuracy grows exponentially with the number of variables. Principal component analysis (PCA) has been the technique of choice for dimension reduction. However, PCA is computationally expensive and does not eliminate anomalies that can be seen at one arbitrary band. Spectral data reduction using automatic wavelet decomposition could be useful. This is because it preserves the distinctions among spectral signatures. It is also computed in automatic fashion and can filter data anomalies. This is due to the intrinsic properties of wavelet transforms that preserves high- and low-frequency features, therefore preserving peaks and valleys found in typical spectra. Compared to PCA, for the same level of data reduction, we show that automatic wavelet reduction yields better or comparable classification accuracy for hyperspectral data, while achieving substantial computational savings.
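
A hedged sketch of per-pixel wavelet reduction with PyWavelets standing in for the authors' implementation (the paper's automatic level selection is not reproduced; wavelet and level below are assumptions): each spectrum is replaced by the low-frequency approximation coefficients of its discrete wavelet decomposition.

```python
import numpy as np
import pywt

def wavelet_reduce(cube, wavelet="db4", level=3):
    """cube: (rows, cols, bands) hyperspectral image -> reduced band count."""
    r, c, b = cube.shape
    flat = cube.reshape(-1, b)
    reduced = np.array([pywt.wavedec(s, wavelet, level=level)[0]
                        for s in flat])           # keep approximation only
    return reduced.reshape(r, c, -1)
```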

Journal ArticleDOI
TL;DR: The proposed face recognition technique is based on the implementation of the principal component analysis algorithm and the extraction of depth and colour eigenfaces. Experimental results show significant gains attained with the addition of depth information.

Journal ArticleDOI
TL;DR: This result indicates that the best practical combination is PCA with SVM for face recognition, as the training time for ICA is much larger than that of PCA.

Journal ArticleDOI
TL;DR: A comparison between NMF, WNMF and the well-known principal component analysis (PCA) in the context of image patch classification has been carried out and it is claimed that all three techniques can be combined in a common and unique classifier.

Journal ArticleDOI
TL;DR: In this paper, a factor analysis method that can resist the effect of outliers is proposed. The method is based on a highly robust initial covariance estimator, after which the factors can be obtained from maximum likelihood or from principal factor analysis (PFA).

Proceedings ArticleDOI
24 Nov 2003
TL;DR: This paper presents quaternion matrix algebra techniques that can be used to process the eigen analysis of a color image and introduces the extension of two classical techniques to their quaternionic case: singular value decomposition (SVD) and Karhunen-Loeve transform (KLT).
Abstract: In this paper, we present quaternion matrix algebra techniques that can be used to perform the eigen analysis of a color image. Applications of principal component analysis (PCA) in image processing are numerous, and the proposed tools aim to give material for color image processing that takes the particular nature of color images into account. For this purpose, we use the quaternion model for color images and introduce the extension of two classical techniques to their quaternionic case: singular value decomposition (SVD) and Karhunen-Loeve transform (KLT). For the quaternionic version of the KLT, we also introduce the problem of eigenvalue decomposition (EVD) of a quaternion matrix. We give the properties of these quaternion tools for color images and present their behavior on natural images. We also present a method to compute the decompositions using complex matrix algebra. Finally, we start a discussion on possible applications of the proposed techniques in color image processing.
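
The complex-matrix route mentioned at the end is the standard complex adjoint construction; a sketch (my code, not the paper's): a quaternion matrix Q = A + Bj, with A and B complex, is represented by the complex matrix [[A, B], [-conj(B), conj(A)]], and the ordinary SVD of that adjoint yields the quaternion singular values, each appearing twice.

```python
import numpy as np

def quaternion_adjoint(A, B):
    """Complex adjoint of the quaternion matrix Q = A + B j."""
    return np.block([[A, B], [-B.conj(), A.conj()]])

# A color image as a pure quaternion matrix q = R i + G j + B k,
# i.e. A = i*R and B = G + i*B in the A + B j decomposition.
rng = np.random.default_rng(0)
R, G, Bc = rng.random((3, 4, 4))
chi = quaternion_adjoint(1j * R, G + 1j * Bc)

svals = np.linalg.svd(chi, compute_uv=False)  # quaternion singular values, twice each
```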

Proceedings Article
03 Jan 2003
TL;DR: An alternating least squares method is derived to estimate the basis vectors and generalized linear coefficients of the logistic PCA model, a generalized linear model for dimensionality reduction of binary data that is related to principal component analysis (PCA) and is much better suited to modeling binary data than conventional PCA.
Abstract: We investigate a generalized linear model for dimensionality reduction of binary data. The model is related to principal component analysis (PCA) in the same way that logistic regression is related to linear regression. Thus we refer to the model as logistic PCA. In this paper, we derive an alternating least squares method to estimate the basis vectors and generalized linear coefficients of the logistic PCA model. The resulting updates have a simple closed form and are guaranteed at each iteration to improve the model’s likelihood. We evaluate the performance of logistic PCA—as measured by reconstruction error rates—on data sets drawn from four real world applications. In general, we find that logistic PCA is much better suited to modeling binary data than conventional PCA.
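
Not the paper's alternating least squares updates, but a hedged gradient sketch of the model itself: binary data X is modeled as P(X=1) = sigmoid(U V^T), and the Bernoulli log-likelihood is climbed directly (learning rate and iteration count are arbitrary assumptions).

```python
import numpy as np

def logistic_pca(X, k=2, lr=0.1, n_iter=500, seed=0):
    """Fit P(X=1) = sigmoid(U V^T) to a binary matrix X by gradient ascent."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    U = 0.01 * rng.normal(size=(n, k))    # latent coordinates
    V = 0.01 * rng.normal(size=(d, k))    # basis vectors
    for _ in range(n_iter):
        theta = np.clip(U @ V.T, -30, 30)
        P = 1.0 / (1.0 + np.exp(-theta))  # predicted Bernoulli means
        E = X - P                         # d(log-likelihood)/d(theta)
        U, V = U + lr * (E @ V) / d, V + lr * (E.T @ U) / n
    return U, V
```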

Proceedings ArticleDOI
24 Aug 2003
TL;DR: This paper presents an alternative clustering-based methodology for the discovery of climate indices that overcomes the limitations of eigenvalue analysis techniques and is based on clusters that represent regions with relatively homogeneous behavior, and shows that cluster-based indices generally outperform SVD-derived indices.
Abstract: To analyze the effect of the oceans and atmosphere on land climate, Earth Scientists have developed climate indices, which are time series that summarize the behavior of selected regions of the Earth's oceans and atmosphere. In the past, Earth scientists have used observation and, more recently, eigenvalue analysis techniques, such as principal components analysis (PCA) and singular value decomposition (SVD), to discover climate indices. However, eigenvalue techniques are only useful for finding a few of the strongest signals. Furthermore, they impose a condition that all discovered signals must be orthogonal to each other, making it difficult to attach a physical interpretation to them. This paper presents an alternative clustering-based methodology for the discovery of climate indices that overcomes these limitations and is based on clusters that represent regions with relatively homogeneous behavior. The centroids of these clusters are time series that summarize the behavior of the ocean or atmosphere in those regions. Some of these centroids correspond to known climate indices and provide a validation of our methodology; other centroids are variants of known indices that may provide better predictive power for some land areas; and still other indices may represent potentially new Earth science phenomena. Finally, we show that cluster based indices generally outperform SVD derived indices, both in terms of area weighted correlation and direct correlation with the known indices.
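
A small sketch of the clustering side of the methodology (pipeline and parameters are my assumptions, with random data standing in for gridded sea-surface observations): k-means over grid-cell time series, cluster centroids as candidate indices, then correlation against a known index.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
sst = rng.normal(size=(2000, 240))        # stand-in: ocean grid cells x months
sst = (sst - sst.mean(axis=1, keepdims=True)) / sst.std(axis=1, keepdims=True)

km = KMeans(n_clusters=30, n_init=10, random_state=0).fit(sst)
indices = km.cluster_centers_             # candidate climate-index time series

known = rng.normal(size=240)              # stand-in for a known index series
corr = [abs(np.corrcoef(ci, known)[0, 1]) for ci in indices]
best = int(np.argmax(corr))               # centroid best matching the known index
```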

Journal ArticleDOI
TL;DR: A supervised method for locating the origin of the cone based on identification of clusters in the data is presented, and the effects of proper origin orientation are illustrated.
Abstract: A new pseudocolor mapping strategy for use with spectral imagery is presented. This strategy is based on a principal components analysis of spectral data, and it capitalizes on the similarities between three-color human vision and high-dimensional hyperspectral datasets. The mapping is closely related to three-dimensional versions of scatter plots that are commonly used in remote sensing to visualize the data cloud. The transformation results in final images where the color assigned to each pixel is solely determined by the position within the data cloud. Materials with similar spectral characteristics are presented in similar hues, and basic classification and clustering decisions can be made by the observer. Final images tend to have large regions of desaturated pixels that make the image more readily interpretable. The data cloud is shown to be conical in nature, and materials with common spectral signatures radiate from the origin of the cone, which is not (in general) at the origin of the spectral data. A supervised method for locating the origin of the cone based on identification of clusters in the data is presented, and the effects of proper origin orientation are illustrated.
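
A minimal sketch of the core mapping, simplified from the strategy described (in particular, the supervised cone-origin step is omitted): the first three principal component scores of each pixel's spectrum become its R, G, B channels.

```python
import numpy as np

def pca_pseudocolor(cube):
    """cube: (rows, cols, bands) -> (rows, cols, 3) RGB in [0, 1]."""
    r, c, b = cube.shape
    X = cube.reshape(-1, b)
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[:3].T                        # first three PC scores
    lo, hi = scores.min(axis=0), scores.max(axis=0)
    rgb = (scores - lo) / (hi - lo)               # rescale each channel
    return rgb.reshape(r, c, 3)
```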

Journal ArticleDOI
TL;DR: The fluorescence excitation-emission wavelengths identified as being diagnostic from the PCA-SVM algorithm suggest that the important fluorophores for breast cancer diagnosis are most likely tryptophan, NAD(P)H and flavoproteins.
Abstract: Nonmalignant (n = 36) and malignant (n = 20) tissue samples were obtained from breast cancer and breast reduction surgeries. These tissues were characterized using multiple excitation wavelength fluorescence spectroscopy and diffuse reflectance spectroscopy in the ultraviolet-visible wavelength range, immediately after excision. Spectra were then analyzed using principal component analysis (PCA) as a data reduction technique. PCA was performed on each fluorescence spectrum, as well as on the diffuse reflectance spectrum individually, to establish a set of principal components for each spectrum. A Wilcoxon rank-sum test was used to determine which principal components show statistically significant differences between malignant and nonmalignant tissues. Finally, a support vector machine (SVM) algorithm was utilized to classify the samples based on the diagnostically useful principal components. Cross-validation of this nonparametric algorithm was carried out to determine its classification accuracy in an unbiased manner. Multiexcitation fluorescence spectroscopy was successful in discriminating malignant and nonmalignant tissues, with a sensitivity and specificity of 70% and 92%, respectively. The sensitivity (30%) and specificity (78%) of diffuse reflectance spectroscopy alone was significantly lower. Combining fluorescence and diffuse reflectance spectra did not improve the classification accuracy of an algorithm based on fluorescence spectra alone. The fluorescence excitation-emission wavelengths identified as being diagnostic from the PCA-SVM algorithm suggest that the important fluorophores for breast cancer diagnosis are most likely tryptophan, NAD(P)H and flavoproteins.
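
A hedged sketch of the pipeline as described, with scikit-learn and SciPy standing in for the authors' tools and random arrays standing in for the measured spectra (the 0.05 screening threshold is an assumption): PCA scores per spectrum, a Wilcoxon rank-sum screen for diagnostic components, then a cross-validated SVM on the retained scores.

```python
import numpy as np
from scipy.stats import ranksums
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
spectra = rng.normal(size=(56, 300))      # stand-in for 56 tissue spectra
labels = np.array([0] * 36 + [1] * 20)    # nonmalignant vs malignant

scores = PCA(n_components=10).fit_transform(spectra)
keep = [j for j in range(scores.shape[1])
        if ranksums(scores[labels == 0, j], scores[labels == 1, j]).pvalue < 0.05]
keep = keep or list(range(scores.shape[1]))   # fallback for this toy data

acc = cross_val_score(SVC(kernel="linear"), scores[:, keep], labels, cv=5)
print(acc.mean())
```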