
Showing papers on "Principal component analysis published in 1999"


Journal ArticleDOI
TL;DR: In this paper, the principal axes of a set of observed data vectors may be determined through maximum-likelihood estimation of parameters in a latent variable model closely related to factor analysis.
Abstract: Principal component analysis (PCA) is a ubiquitous technique for data analysis and processing, but one which is not based upon a probability model. In this paper we demonstrate how the principal axes of a set of observed data vectors may be determined through maximum-likelihood estimation of parameters in a latent variable model closely related to factor analysis. We consider the properties of the associated likelihood function, giving an EM algorithm for estimating the principal subspace iteratively, and discuss the advantages conveyed by the definition of a probability density function for PCA.

3,362 citations
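
A minimal numpy sketch of the closed-form maximum-likelihood solution described above (not the authors' code); the data matrix X, the latent dimension q, and the function name ppca_ml are illustrative assumptions.

```python
import numpy as np

def ppca_ml(X, q):
    """Closed-form maximum-likelihood probabilistic PCA fit (sketch).

    X : (n, d) data matrix, q : number of principal axes to retain.
    Returns the loading matrix W, the noise variance and the data mean.
    """
    n, d = X.shape
    mu = X.mean(axis=0)
    Xc = X - mu
    # Eigendecomposition of the sample covariance
    S = Xc.T @ Xc / n
    eigvals, eigvecs = np.linalg.eigh(S)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # ML noise variance: average of the discarded eigenvalues
    sigma2 = eigvals[q:].mean()
    # ML loadings (defined up to an arbitrary rotation, taken as identity here)
    W = eigvecs[:, :q] * np.sqrt(np.maximum(eigvals[:q] - sigma2, 0.0))
    return W, sigma2, mu

# Illustrative use on random data
X = np.random.randn(200, 10)
W, sigma2, mu = ppca_ml(X, q=3)
```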


Journal ArticleDOI
TL;DR: It is the authors' view that distance-based RDA will be extremely useful to ecologists measuring multispecies responses to structured multifactorial experimental designs.
Abstract: We present a new multivariate technique for testing the significance of individual terms in a multifactorial analysis-of-variance model for multispecies response variables. The technique will allow researchers to base analyses on measures of association (distance measures) that are ecologically relevant. In addition, unlike other distance-based hypothesis-testing techniques, this method allows tests of significance of interaction terms in a linear model. The technique uses the existing method of redundancy analysis (RDA) but allows the analysis to be based on Bray-Curtis or other ecologically meaningful measures through the use of principal coordinate analysis (PCoA). Steps in the procedure include: (1) calculating a matrix of distances among replicates using a distance measure of choice (e.g., Bray-Curtis); (2) determining the principal coordinates (including a correction for negative eigenvalues, if necessary), which preserve these distances; (3) creating a matrix of dummy variables corresponding to the design of the experiment (i.e., individual terms in a linear model); (4) analyzing the relationship between the principal coordinates (species data) and the dummy variables (model) using RDA; and (5) implementing a test by permutation for particular statistics corresponding to the particular terms in the model. This method has certain advantages not shared by other multivariate testing procedures. We demonstrate the use of this technique with experimental ecological data from intertidal assemblages and show how the presence of significant multivariate interactions can be interpreted. It is our view that distance-based RDA will be extremely useful to ecologists measuring multispecies responses to structured multifactorial experimental designs.

2,193 citations
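
Steps (1) and (2) of the procedure can be sketched as below (a toy illustration, not the authors' implementation); the Bray-Curtis and PCoA helpers are assumptions, and the negative-eigenvalue correction, dummy-variable RDA, and permutation tests of steps (3)-(5) are omitted.

```python
import numpy as np

def bray_curtis(Y):
    """Pairwise Bray-Curtis dissimilarities between rows of an abundance matrix Y."""
    n = Y.shape[0]
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            num = np.abs(Y[i] - Y[j]).sum()
            den = (Y[i] + Y[j]).sum()
            D[i, j] = D[j, i] = num / den if den > 0 else 0.0
    return D

def principal_coordinates(D):
    """Classical PCoA: Gower-centre -D^2/2 and eigendecompose."""
    n = D.shape[0]
    A = -0.5 * D ** 2
    J = np.eye(n) - np.ones((n, n)) / n        # centring matrix
    G = J @ A @ J
    eigvals, eigvecs = np.linalg.eigh(G)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals > 1e-10                      # zero/negative eigenvalues dropped here
    return eigvecs[:, keep] * np.sqrt(eigvals[keep])

# Illustrative species-abundance matrix (replicates x species)
Y = np.random.poisson(3.0, size=(12, 8)).astype(float)
coords = principal_coordinates(bray_curtis(Y))  # inputs to the subsequent RDA
```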


Journal ArticleDOI
TL;DR: PCA is formulated within a maximum likelihood framework, based on a specific form of gaussian latent variable model, which leads to a well-defined mixture model for probabilistic principal component analyzers, whose parameters can be determined using an expectation-maximization algorithm.
Abstract: Principal component analysis (PCA) is one of the most popular techniques for processing, compressing, and visualizing data, although its effectiveness is limited by its global linearity. While nonlinear variants of PCA have been proposed, an alternative paradigm is to capture data complexity by a combination of local linear PCA projections. However, conventional PCA does not correspond to a probability density, and so there is no unique way to combine PCA models. Therefore, previous attempts to formulate mixture models for PCA have been ad hoc to some extent. In this article, PCA is formulated within a maximum likelihood framework, based on a specific form of gaussian latent variable model. This leads to a well-defined mixture model for probabilistic principal component analyzers, whose parameters can be determined using an expectation-maximization algorithm. We discuss the advantages of this model in the context of clustering, density modeling, and local dimensionality reduction, and we demonstrate its application to image compression and handwritten digit recognition.

1,927 citations


Journal ArticleDOI
TL;DR: This paper explores and compares techniques for automatically recognizing facial actions in sequences of images and provides converging evidence for the importance of using local filters, high spatial frequencies, and statistical independence for classifying facial actions.
Abstract: The facial action coding system (FACS) is an objective method for quantifying facial movement in terms of component actions. This paper explores and compares techniques for automatically recognizing facial actions in sequences of images. These techniques include: analysis of facial motion through estimation of optical flow; holistic spatial analysis, such as principal component analysis, independent component analysis, local feature analysis, and linear discriminant analysis; and methods based on the outputs of local filters, such as Gabor wavelet representations and local principal components. Performance of these systems is compared to naive and expert human subjects. Best performances were obtained using the Gabor wavelet representation and the independent component representation, both of which achieved 96 percent accuracy for classifying 12 facial actions of the upper and lower face. The results provide converging evidence for the importance of using local filters, high spatial frequencies, and statistical independence for classifying facial actions.

1,086 citations


Proceedings ArticleDOI
01 Dec 1999
TL;DR: This work shows that application of PCA to expression data allows us to summarize the ways in which gene responses vary under different conditions, and suggests that much of the observed variability in the experiment can be summarized in just 2 components.
Abstract: A series of microarray experiments produces observations of differential expression for thousands of genes across multiple conditions. It is often not clear whether a set of experiments are measuring fundamentally different gene expression states or are measuring similar states created through different mechanisms. It is useful, therefore, to define a core set of independent features for the expression states that allow them to be compared directly. Principal components analysis (PCA) is a statistical technique for determining the key variables in a multidimensional data set that explain the differences in the observations, and can be used to simplify the analysis and visualization of multidimensional data sets. We show that application of PCA to expression data (where the experimental conditions are the variables, and the gene expression measurements are the observations) allows us to summarize the ways in which gene responses vary under different conditions. Examination of the components also provides insight into the underlying factors that are measured in the experiments. We applied PCA to the publicly released yeast sporulation data set (Chu et al. 1998). In that work, 7 different measurements of gene expression were made over time. PCA on the time-points suggests that much of the observed variability in the experiment can be summarized in just 2 components--i.e. 2 variables capture most of the information. These components appear to represent (1) overall induction level and (2) change in induction level over time. We also examined the clusters proposed in the original paper, and show how they are manifested in principal component space. Our results are available on the internet at http://www.smi.stanford.edu/project/helix/PCArray.

815 citations
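
A minimal sketch of the analysis described above, assuming a toy expression matrix with conditions as variables and genes as observations; it is not the authors' code and the variable names are illustrative.

```python
import numpy as np

# Toy expression matrix: rows = genes (observations), columns = time points (variables).
rng = np.random.default_rng(0)
expr = rng.normal(size=(1000, 7))

# Centre each condition, then diagonalize the condition-by-condition covariance.
Xc = expr - expr.mean(axis=0)
cov = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

explained = eigvals / eigvals.sum()
print("variance explained by first two components:", explained[:2].sum())

# Project each gene into principal-component space for visualization or clustering.
scores = Xc @ eigvecs[:, :2]
```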


Journal ArticleDOI
TL;DR: The principal response curve method (PRC) as discussed by the authors is based on redundancy analysis (RDA), adjusted for overall changes in community response over time, as observed in control test systems.
Abstract: In this paper a novel multivariate method is proposed for the analysis of community response data from designed experiments repeatedly sampled in time. The long-term effects of the insecticide chlorpyrifos on the invertebrate community and the dissolved oxygen (DO)–pH–alkalinity–conductivity syndrome, in outdoor experimental ditches, are used as example data. The new method, which we have named the principal response curve method (PRC), is based on redundancy analysis (RDA), adjusted for overall changes in community response over time, as observed in control test systems. This allows the method to focus on the time-dependent treatment effects. The principal component is plotted against time, yielding a principal response curve of the community for each treatment. The PRC method distills the complexity of time-dependent, community-level effects of pollutants into a graphic form that can be appreciated more readily than the results of other currently available multivariate techniques. The PRC method also enables a quantitative interpretation of effects towards the species level.

757 citations


Journal ArticleDOI
TL;DR: This work presents a technique for specifying the problem in a structured way so that one program (the Multilinear Engine) may be used for solving widely different multilinear problems.
Abstract: A technique for fitting multilinear and quasi-multilinear mathematical expressions or models to two-, three-, and many-dimensional data arrays is described. Principal component analysis and three-way PARAFAC factor analysis are examples of bilinear and trilinear least squares fit. This work presents a technique for specifying the problem in a structured way so that one program (the Multilinear Engine) may be used for solving widely different multilinear problems. The multilinear equations to be solved are specified as a large table of integer code values. The end user creates this table by using a small preprocessing program. For each different case, an individual structure table is needed. The solution is computed by using the conjugate gradient algorithm. Non-negativity constraints are implemented by using the well-known technique of preconditioning in opposite way for slowing down changes of variables that are about to become negative. The iteration converges to a minimum that may be local or ...

743 citations


Journal ArticleDOI
TL;DR: An expectation-maximization (EM) algorithm is presented, which performs unsupervised learning of an associated probabilistic model of the mixing situation and is shown to be superior to ICA since it can learn arbitrary source densities from the data.
Abstract: We introduce the independent factor analysis (IFA) method for recovering independent hidden sources from their observed mixtures. IFA generalizes and unifies ordinary factor analysis (FA), principal component analysis (PCA), and independent component analysis (ICA), and can handle not only square noiseless mixing but also the general case where the number of mixtures differs from the number of sources and the data are noisy. IFA is a two-step procedure. In the first step, the source densities, mixing matrix, and noise covariance are estimated from the observed data by maximum likelihood. For this purpose we present an expectation-maximization (EM) algorithm, which performs unsupervised learning of an associated probabilistic model of the mixing situation. Each source in our model is described by a mixture of gaussians; thus, all the probabilistic calculations can be performed analytically. In the second step, the sources are reconstructed from the observed data by an optimal nonlinear estimator. A variational approximation of this algorithm is derived for cases with a large number of sources, where the exact algorithm becomes intractable. Our IFA algorithm reduces to the one for ordinary FA when the sources become gaussian, and to an EM algorithm for PCA in the zero-noise limit. We derive an additional EM algorithm specifically for noiseless IFA. This algorithm is shown to be superior to ICA since it can learn arbitrary source densities from the data. Beyond blind separation, IFA can be used for modeling multidimensional data by a highly constrained mixture of gaussians and as a tool for nonlinear signal encoding.

573 citations


Journal ArticleDOI
TL;DR: A joint band-prioritization and band-decorrelation approach to band selection is considered for hyperspectral image classification and it is shown that the proposed band-selection method effectively eliminates a great number of insignificant bands.
Abstract: Band selection for remotely sensed image data is an effective means to mitigate the curse of dimensionality. Many criteria have been suggested in the past for optimal band selection. In this paper, a joint band-prioritization and band-decorrelation approach to band selection is considered for hyperspectral image classification. The proposed band prioritization is a method based on the eigen (spectral) decomposition of a matrix from which a loading-factors matrix can be constructed for band prioritization via the corresponding eigenvalues and eigenvectors. Two approaches are presented, principal components analysis (PCA)-based criteria and classification-based criteria. The former includes the maximum-variance PCA and maximum SNR PCA, whereas the latter derives the minimum misclassification canonical analysis (MMCA) (i.e., Fisher's discriminant analysis) and subspace projection-based criteria. Since the band prioritization does not take spectral correlation into account, an information-theoretic criterion called divergence is used for band decorrelation. Finally, the band selection can then be done by an eigenanalysis-based band prioritization in conjunction with a divergence-based band decorrelation. It is shown that the proposed band-selection method effectively eliminates a great number of insignificant bands. Surprisingly, the experiments show that with a proper band selection, fewer than 0.1 of the total number of bands can achieve performance comparable to that of the full set of bands. This further demonstrates that the band selection can significantly reduce data volume so as to achieve data compression.

565 citations
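
As a hedged illustration of the eigenanalysis-based prioritization idea (not the paper's exact criteria), the sketch below scores each band by its eigenvalue-weighted loading factors; this simplified score reduces to the per-band variance, and the classification-based criteria and divergence-based decorrelation are not reproduced.

```python
import numpy as np

def band_priority_scores(cube):
    """Score each spectral band by eigenvalue-weighted loading factors.

    cube : (pixels, bands) matrix of hyperspectral samples.
    """
    Xc = cube - cube.mean(axis=0)
    cov = np.cov(Xc, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    # priority of band l: sum_k lambda_k * (eigvec_k[l])**2
    return (eigvals * eigvecs ** 2).sum(axis=1)

# Illustrative cube: 5000 pixels, 64 bands
cube = np.random.randn(5000, 64)
scores = band_priority_scores(cube)
selected = np.argsort(scores)[::-1][:6]   # keep the highest-priority bands
```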


Journal ArticleDOI
TL;DR: A method based on the variance of the reconstruction error is presented for selecting the number of PCs; the criterion exhibits a minimum over the number of PCs, and conditions are given under which this minimum corresponds to the true number of PCs.
Abstract: One of the main difficulties in using principal component analysis (PCA) is the selection of the number of principal components (PCs). There exist a plethora of methods to calculate the number of PCs, but most of them use monotonically increasing or decreasing indices. Therefore, the decision to choose the number of principal components is very subjective. In this paper, we present a method based on the variance of the reconstruction error to select the number of PCs. This method demonstrates a minimum over the number of PCs. Conditions are given under which this minimum corresponds to the true number of PCs. Ten other methods available in the signal processing and chemometrics literature are overviewed and compared with the proposed method. Three data sets are used to test the different methods for selecting the number of PCs: two of them are real process data and the other one is a batch reactor simulation.

509 citations
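
A simplified sketch of the variance-of-reconstruction-error idea (not necessarily the paper's exact estimator or normalization): for each candidate number of PCs, every variable is reconstructed from the others through the PCA model and the error variances are accumulated; the resulting curve is then inspected for its minimum.

```python
import numpy as np

def vre_curve(X, max_pcs=None):
    """Variance of the reconstruction error as a function of the number of PCs."""
    Xc = X - X.mean(axis=0)
    S = np.cov(Xc, rowvar=False)
    d = S.shape[0]
    eigvals, eigvecs = np.linalg.eigh(S)
    eigvecs = eigvecs[:, np.argsort(eigvals)[::-1]]
    max_pcs = max_pcs or d - 1
    vre = []
    for l in range(1, max_pcs + 1):
        P = eigvecs[:, :l]
        Ct = np.eye(d) - P @ P.T                  # residual projection for l PCs
        M = Ct @ S @ Ct
        u = np.diag(M) / np.diag(Ct) ** 2         # per-variable reconstruction-error variances
        vre.append(u.sum())
    return np.array(vre)                          # choose l at the minimum of this curve

# Illustrative data with 3 underlying factors plus noise
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 3)) @ rng.normal(size=(3, 10)) + 0.1 * rng.normal(size=(500, 10))
best_l = int(np.argmin(vre_curve(X))) + 1
```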


Book
01 Jan 1999
TL;DR: In this article, a reduced rank mixed effects framework is proposed to handle the more difficult case where curves are often measured at an irregular and sparse set of time points which can differ widely across individuals.
Abstract: The elements of a multivariate data set are often curves rather than single points. Functional principal components can be used to describe the modes of variation of such curves. If one has complete measurements for each individual curve or, as is more common, one has measurements on a fine grid taken at the same time points for all curves, then many standard techniques may be applied. However, curves are often measured at an irregular and sparse set of time points which can differ widely across individuals. We present a technique for handling this more difficult case using a reduced rank mixed effects framework.

Proceedings ArticleDOI
08 Feb 1999
TL;DR: In this paper, a nonlinear form of principal component analysis (PCA) is proposed to perform polynomial feature extraction in high-dimensional feature spaces, related to input space by some nonlinear map; for instance, the space of all possible d-pixel products in images.
Abstract: A new method for performing a nonlinear form of Principal Component Analysis is proposed. By the use of integral operator kernel functions, one can efficiently compute principal components in high-dimensional feature spaces, related to input space by some nonlinear map; for instance, the space of all possible d-pixel products in images. We give the derivation of the method and present experimental results on polynomial feature extraction for pattern recognition.
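
A minimal sketch of the kernel PCA construction described above, using a polynomial kernel and toy data; names such as kernel_pca are illustrative rather than taken from the paper.

```python
import numpy as np

def kernel_pca(X, degree=2, n_components=2):
    """Nonlinear PCA via a polynomial kernel (sketch).

    Eigenvectors of the centred kernel matrix give the nonlinear principal
    component projections of the training points.
    """
    n = X.shape[0]
    K = (X @ X.T + 1.0) ** degree                 # polynomial kernel
    # Centre the kernel matrix in feature space
    one_n = np.ones((n, n)) / n
    Kc = K - one_n @ K - K @ one_n + one_n @ K @ one_n
    eigvals, eigvecs = np.linalg.eigh(Kc)
    order = np.argsort(eigvals)[::-1][:n_components]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Normalize so each feature-space eigenvector has unit length
    alphas = eigvecs / np.sqrt(np.maximum(eigvals, 1e-12))
    return Kc @ alphas                            # projections of the training data

# Illustrative two-dimensional data
X = np.random.randn(100, 2)
Z = kernel_pca(X, degree=3, n_components=2)
```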

Journal ArticleDOI
TL;DR: A segmented, and possibly multistage, principal components transformation (PCT) is proposed for efficient hyperspectral remote-sensing image classification and display and results have been obtained in terms of classification accuracy, speed, and quality of color image display using two airborne visible/infrared imaging spectrometer (AVIRIS) data sets.
Abstract: A segmented, and possibly multistage, principal components transformation (PCT) is proposed for efficient hyperspectral remote-sensing image classification and display. The scheme requires, initially, partitioning the complete set of bands into several highly correlated subgroups. After separate transformation of each subgroup, the single-band separabilities are used as a guide to carry out feature selection. The selected features can then be transformed again to achieve a satisfactory data reduction ratio and generate the three most significant components for color display. The scheme reduces the computational load significantly for feature extraction, compared with the conventional PCT. A reduced number of features will also accelerate the maximum likelihood classification process significantly, and the process will not suffer the limitations encountered by trying to use the full set of hyperspectral data when training samples are limited. Encouraging results have been obtained in terms of classification accuracy, speed, and quality of color image display using two airborne visible/infrared imaging spectrometer (AVIRIS) data sets.
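
A hedged sketch of the segmented transformation idea: bands are split into subgroups (here fixed by hand rather than derived from the band correlation matrix, as the paper does), each subgroup is transformed separately, and the leading components are pooled for a possible second-stage transform; feature selection by single-band separability is not reproduced.

```python
import numpy as np

def segmented_pct(cube, groups, n_keep=3):
    """Segmented principal components transform sketch.

    cube   : (pixels, bands) hyperspectral samples.
    groups : list of band-index arrays, one per highly correlated subgroup.
    """
    features = []
    for idx in groups:
        Xg = cube[:, idx]
        Xg = Xg - Xg.mean(axis=0)
        cov = np.cov(Xg, rowvar=False)
        eigvals, eigvecs = np.linalg.eigh(cov)
        order = np.argsort(eigvals)[::-1][:n_keep]
        features.append(Xg @ eigvecs[:, order])   # leading PCs of this subgroup
    return np.hstack(features)                    # pooled features for a second-stage PCT

# Illustrative cube with 64 bands split into contiguous subgroups
cube = np.random.randn(2000, 64)
groups = [np.arange(0, 20), np.arange(20, 44), np.arange(44, 64)]
F = segmented_pct(cube, groups)
```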

Journal ArticleDOI
01 Jun 1999-Test
TL;DR: A method for exploring the structure of populations of complex objects, such as images, is considered, and endemic outliers motivate the development of a bounded influence approach to PCA.
Abstract: A method for exploring the structure of populations of complex objects, such as images, is considered. The objects are summarized by feature vectors. The statistical backbone is Principal Component Analysis in the space of feature vectors. Visual insights come from representing the results in the original data space. In an ophthalmological example, endemic outliers motivate the development of a bounded influence approach to PCA.

Proceedings ArticleDOI
Christopher M. Bishop1
01 Jan 1999
TL;DR: This paper develops an alternative, variational formulation of Bayesian PCA, based on a factorial representation of the posterior distribution, which maximizes a rigorous lower bound on the marginal log probability of the observed data.
Abstract: One of the central issues in the use of principal component analysis (PCA) for data modelling is that of choosing the appropriate number of retained components. This problem was recently addressed through the formulation of a Bayesian treatment of PCA in terms of a probabilistic latent variable model. A central feature of this approach is that the effective dimensionality of the latent space is determined automatically as part of the Bayesian inference procedure. In common with most non-trivial Bayesian models, however, the required marginalizations are analytically intractable, and so an approximation scheme based on a local Gaussian representation of the posterior distribution was employed. In this paper we develop an alternative, variational formulation of Bayesian PCA, based on a factorial representation of the posterior distribution. This approach is computationally efficient, and unlike other approximation schemes, it maximizes a rigorous lower bound on the marginal log probability of the observed data.

Journal ArticleDOI
TL;DR: Multivariate statistical process control tools have been developed for monitoring a Lam 9600 TCP metal etcher at Texas Instruments and the strengths and weaknesses of the methods are discussed, along with the relative advantages of each of the sensor systems.
Abstract: Multivariate statistical process control (MSPC) tools have been developed for monitoring a Lam 9600 TCP metal etcher at Texas Instruments. These tools are used to determine if the etch process is operating normally or if a system fault has occurred. Application of these methods is complicated because the etch process data exhibit a large amount of normal systematic variation. Variations due to faults of process concern can be relatively minor in comparison. The Lam 9600 used in this study is equipped with several sensor systems including engineering variables (e.g. pressure, gas flow rates and power), spatially resolved optical emission spectroscopy (OES) of the plasma and a radio-frequency monitoring (RFM) system to monitor the power and phase relationships of the plasma generator. A variety of analysis methods and data preprocessing techniques have been tested for their sensitivity to specific system faults. These methods have been applied to data from each of the sensor systems separately and in combination. The performance of the methods on a set of benchmark fault detection problems is presented and the strengths and weaknesses of the methods are discussed, along with the relative advantages of each of the sensor systems.

Book ChapterDOI
01 Jan 1999
TL;DR: The most well-known description of image statistics is that their Fourier spectra take the form of a power law, which, coupled with translation invariance, suggests that the Fourier transform is an appropriate PCA representation; Fourier and related representations are widely used in image processing applications.
Abstract: The use of multi-scale decompositions has led to significant advances in representation, compression, restoration, analysis, and synthesis of signals. The fundamental reason for these advances is that the statistics of many natural signals, when decomposed in such bases, are substantially simplified. Choosing a basis that is adapted to statistical properties of the input signal is a classical problem. The traditional solution is principal components analysis (PCA), in which a linear decomposition is chosen to diagonalize the covariance structure of the input. The most well-known description of image statistics is that their Fourier spectra take the form of a power law [e.g., 1, 2, 3]. Coupled with a constraint of translation-invariance, this suggests that the Fourier transform is an appropriate PCA representation. Fourier and related representations are widely used in image processing applications. For example, the classical solution to the noise removal problem is the Wiener filter, which can be derived by assuming a signal model of decorrelated Gaussian-distributed coefficients in the Fourier domain.

Journal ArticleDOI
TL;DR: A generalized linear systems framework for PCA based on the singular value decomposition (SVD) model for representation of spatio-temporal fMRI data sets is presented and illustrated in the setting of dynamic time-series response data from fMRI experiments involving pharmacological stimulation of the dopaminergic nigro-striatal system in primates.

Journal ArticleDOI
TL;DR: The paper starts with an efficient approach to updating the correlation matrix recursively and proposes two recursive PCA algorithms using rank-one modification and Lanczos tridiagonalization for adaptive process monitoring.

Journal ArticleDOI
TL;DR: The equivalence between PCA and parity relations is exploited to transfer the concepts of analytical redundancy to PCA, yielding structured residuals with the same isolation properties; the existence conditions of such residuals are demonstrated, as well as how disturbance decoupling is implied in the method.
Abstract: Principal component analysis (PCA) may reduce the dimensionality of plant models significantly by exposing linear dependences among the variables. While PCA is a popular tool in detecting faults in complex plants, it offers little support in its original form for fault isolation. However, by utilizing the equivalence between PCA and parity relations, all the powerful concepts of analytical redundancy may be transferred to PCA. Following this path, it is shown how structured residuals, which have the same isolation properties as analytical redundancy residuals, are obtained by PCA. The existence conditions of such residuals are demonstrated, as well as how disturbance decoupling is implied in the method. The effect of the presence of control constraints in the training data is analyzed. Statistical testing methods for structured PCA residuals are also outlined. The theoretical findings are fully supported by simulation studies performed on the Tennessee Eastman process.
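
A minimal sketch of the PCA/parity-relation connection underlying the method, on toy data: the directions discarded by PCA span a residual subspace, and projecting samples onto it yields primary residuals; the structuring transformation that provides isolation properties and the disturbance-decoupling analysis are not reproduced here.

```python
import numpy as np

def pca_residual_generator(X, n_pcs):
    """Build a PCA-based (parity-like) residual generator from training data.

    Directions discarded by PCA span the residual subspace; projecting new
    samples onto them yields primary residuals that are near zero in normal
    operation and deviate under faults.
    """
    mu = X.mean(axis=0)
    Xc = X - mu
    cov = np.cov(Xc, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]
    P_res = eigvecs[:, order[n_pcs:]]             # residual-subspace loadings
    return lambda x: (x - mu) @ P_res             # primary residual vector

# Illustrative plant data with two linear dependences among five variables
rng = np.random.default_rng(2)
T = rng.normal(size=(1000, 3))
X = T @ rng.normal(size=(3, 5)) + 0.01 * rng.normal(size=(1000, 5))
residual = pca_residual_generator(X, n_pcs=3)
r = residual(X[0] + np.array([0, 0.5, 0, 0, 0]))  # a faulty sample gives nonzero residuals
```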

Journal ArticleDOI
TL;DR: The aim of this work is to present a tutorial on Multivariate Calibration, a tool which is nowadays necessary in basically most laboratories but very often misused.
Abstract: The aim of this work is to present a tutorial on Multivariate Calibration, a tool which is nowadays necessary in basically most laboratories but very often misused. The basic concepts of preprocessing, principal component analysis (PCA), principal component regression (PCR) and partial least squares (PLS) are given. The two basic steps on any calibration procedure: model building and validation are fully discussed. The concepts of cross validation (to determine the number of factors to be used in the model), leverage and studentized residuals (to detect outliers) for the validation step are given. The whole calibration procedure is illustrated using spectra recorded for ternary mixtures of 2,4,6 trinitrophenolate, 2,4 dinitrophenolate and 2,5 dinitrophenolate followed by the concentration prediction of these three chemical species during a diffusion experiment through a hydrophobic liquid membrane. MATLAB software is used for numerical calculations. Most of the commands for the analysis are provided in order to allow a non-specialist to follow step by step the analysis.
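
The tutorial's calculations are done in MATLAB; as a hedged illustration only, here is a numpy sketch of principal component regression with leave-one-out cross-validation to pick the number of factors (toy spectra, illustrative names, and no outlier diagnostics).

```python
import numpy as np

def pcr_fit(X, y, n_factors):
    """Principal component regression: regress y on the PCA scores of X."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:n_factors].T                          # loadings
    T = Xc @ P                                    # scores
    b = np.linalg.lstsq(T, yc, rcond=None)[0]     # regression in score space
    coef = P @ b
    return coef, x_mean, y_mean

def loo_rmse(X, y, n_factors):
    """Leave-one-out cross-validation error for a given number of factors."""
    errs = []
    for i in range(len(y)):
        mask = np.arange(len(y)) != i
        coef, xm, ym = pcr_fit(X[mask], y[mask], n_factors)
        errs.append(y[i] - (ym + (X[i] - xm) @ coef))
    return np.sqrt(np.mean(np.square(errs)))

# Illustrative spectra (50 mixtures x 100 wavelengths) and one concentration
rng = np.random.default_rng(3)
X = rng.normal(size=(50, 100))
y = X[:, :3] @ np.array([1.0, -0.5, 0.25]) + 0.05 * rng.normal(size=50)
best = min(range(1, 11), key=lambda k: loo_rmse(X, y, k))
```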

Journal ArticleDOI
TL;DR: In this article, four methods of variable selection along with different criteria levels for deciding on the number of variables to retain were examined along with a selection method that requires one principal component analysis and retains variables by starting with selection from the first component.
Abstract: In many large environmental datasets redundant variables can be discarded without the loss of extra variation. Principal components analysis can be used to select those variables that contain the most information. Using an environmental dataset consisting of 36 meteorological variables spanning 37 years, four methods of variable selection are examined along with different criteria levels for deciding on the number of variables to retain. Procrustes analysis, a measure of similarity and bivariate plots are used to assess the success of the alternative variable selection methods and criteria levels in extracting representative variables. The Broken-stick model is a consistent approach to choosing significant principal components and is chosen here as the more suitable criterion in combination with a selection method that requires one principal component analysis and retains variables by starting with selection from the first component.
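
A sketch of the Broken-stick criterion combined with a simplified version of the retention rule described above (selecting, from the first component onwards, the variable with the largest absolute loading on each significant component); the exact selection variant compared in the paper may differ.

```python
import numpy as np

def broken_stick_components(eigvals):
    """Number of PCs whose variance share exceeds the broken-stick expectation."""
    p = len(eigvals)
    prop = np.sort(eigvals)[::-1] / eigvals.sum()
    bstick = np.array([np.sum(1.0 / np.arange(k, p + 1)) / p for k in range(1, p + 1)])
    keep = 0
    while keep < p and prop[keep] > bstick[keep]:
        keep += 1
    return keep

def select_variables(X):
    """Retain one variable per significant PC: the highest absolute loading,
    starting from the first component (a simplified selection rule)."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    cov = np.cov(Xs, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    n_sig = broken_stick_components(eigvals)
    chosen = []
    for k in range(n_sig):
        ranked = np.argsort(np.abs(eigvecs[:, k]))[::-1]
        chosen.append(next(v for v in ranked if v not in chosen))
    return chosen

# Illustrative meteorological-style data: 37 years x 36 variables
X = np.random.randn(37, 36)
print(select_variables(X))
```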

Journal ArticleDOI
TL;DR: It is shown how the generalization error can be used to select the number of principal components in two analyses of functional magnetic resonance imaging activation sets.

Journal ArticleDOI
TL;DR: The interference is considered as a separate, unknown signal source from which an interference and noise-adjusted principal components analysis (INAPCA) can be developed in a manner similar to the one from which the NAPC was derived.
Abstract: The goal of principal components analysis (PCA) is to find principal components in accordance with maximum variance of a data matrix. However, it has been shown recently that such variance-based principal components may not adequately represent image quality. As a result, a modified PCA approach based on maximization of SNR was proposed. Called maximum noise fraction (MNF) transformation or noise-adjusted principal components (NAPC) transform, it arranges principal components in decreasing order of image quality rather than variance. One of the major disadvantages of this approach is that the noise covariance matrix must be estimated accurately from the data a priori. Another is that the factor of interference is not taken into account in MNF or NAPC in which the interfering effect tends to be more serious than noise in hyperspectral images. In this paper, these two problems are addressed by considering the interference as a separate, unknown signal source, from which an interference and noise-adjusted principal components analysis (INAPCA) can be developed in a manner similar to the one from which the NAPC was derived. Two approaches are proposed for the INAPCA, referred to as signal to interference plus noise ratio-based principal components analysis (SINR-PCA) and interference-annihilated noise-whitened principal components analysis (IANW-PCA). It is shown that if interference is taken care of properly, SINR-PCA and IANW-PCA significantly improve NAPC. In addition, interference annihilation also improves the estimation of the noise covariance matrix. All of these results are compared with NAPC and PCA and are demonstrated by HYDICE data.
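
A minimal sketch of the noise-adjusted (NAPC/MNF-style) ordering that the paper builds on, assuming the noise covariance is supplied; the interference-adjusted variants (SINR-PCA and IANW-PCA) proposed in the paper are not reproduced here.

```python
import numpy as np

def noise_adjusted_pca(X, noise_cov):
    """NAPC/MNF-style transform: whiten by the noise covariance, then apply PCA.

    Components come out ordered by SNR rather than raw variance.  The noise
    covariance must be supplied (or estimated from the data beforehand).
    """
    Xc = X - X.mean(axis=0)
    # Noise-whitening transform F such that F^T noise_cov F = I
    w, V = np.linalg.eigh(noise_cov)
    F = V / np.sqrt(np.maximum(w, 1e-12))
    Sw = F.T @ np.cov(Xc, rowvar=False) @ F       # data covariance in whitened space
    eigvals, eigvecs = np.linalg.eigh(Sw)
    order = np.argsort(eigvals)[::-1]
    return Xc @ F @ eigvecs[:, order]             # noise-adjusted principal components

# Illustrative hyperspectral samples with band-dependent noise
rng = np.random.default_rng(4)
signal = rng.normal(size=(3000, 4)) @ rng.normal(size=(4, 30))
noise_std = np.linspace(0.1, 1.0, 30)
X = signal + rng.normal(size=signal.shape) * noise_std
napc = noise_adjusted_pca(X, np.diag(noise_std ** 2))
```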

Journal ArticleDOI
TL;DR: A new approach is proposed which combines canonical space transformation (CST) based on Canonical Analysis (CA) with eigenspace transformation (EST) for feature extraction, which can be used to reduce data dimensionality and to optimise the class separability of different gait classes simultaneously.

01 Jan 1999
TL;DR: Discriminant analysis shows that the ICA criterion, when carried out in the properly compressed and whitened space, performs better than the eigenfaces and Fisherfaces methods for face recognition, but its performance deteriorates when augmented by additional criteria such as the Maximum A Posteriori (MAP) rule of the Bayes classifier or the FLD.
Abstract: This paper addresses the relative usefulness of Independent Component Analysis (ICA) for Face Recognition. Comparative assessments are made regarding (i) ICA sensitivity to the dimension of the space where it is carried out, and (ii) ICA discriminant performance alone or when combined with other discriminant criteria such as Bayesian framework or Fisher’s Linear Discriminant (FLD). Sensitivity analysis suggests that for enhanced performance ICA should be carried out in a compressed and whitened Principal Component Analysis (PCA) space where the small trailing eigenvalues are discarded. The reason for this finding is that during whitening the eigenvalues of the covariance matrix appear in the denominator and that the small trailing eigenvalues mostly encode noise. As a consequence the whitening component, if used in an uncompressed image space, would fit for misleading variations and thus generalize poorly to new data. Discriminant analysis shows that the ICA criterion, when carried out in the properly compressed and whitened space, performs better than the eigenfaces and Fisherfaces methods for face recognition, but its performance deteriorates when augmented by additional criteria such as the Maximum A Posteriori (MAP) rule of the Bayes classifier or the FLD. The reason for the last finding is that the Mahalanobis distance embedded in the MAP classifier duplicates to some extent the whitening component, while using FLD is counter to the independence criterion intrinsic to ICA.

Journal ArticleDOI
TL;DR: PCA provides a powerful set of tools for selectively measuring neural ensemble activity within multiple functionally significant 'dimensions' of information processing, and redefines the 'neuron' as an entity which contributes portions of its variance to processing not one, but several tasks.

Proceedings ArticleDOI
19 Apr 1999
TL;DR: Four dimensionality reduction techniques are proposed and compared for reducing the feature space to an input space of much lower dimension for the neural network classifier, and the results show that the proposed model achieves high categorization effectiveness as measured by precision and recall.
Abstract: In a text categorization model that uses an artificial neural network as the text classifier, scalability is poor if the neural network is trained on the raw feature space, since textual data has a very high-dimensional feature space. We proposed and compared four dimensionality reduction techniques to reduce the feature space into an input space of much lower dimension for the neural network classifier. To test the effectiveness of the proposed model, experiments were conducted using a subset of the Reuters-22173 test collection for text categorization. The results showed that the proposed model was able to achieve high categorization effectiveness as measured by precision and recall. Among the four dimensionality reduction techniques proposed, principal component analysis was found to be the most effective in reducing the dimensionality of the feature space.
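
A hedged sketch of the PCA-based dimensionality reduction step on a toy document-term matrix; the corpus, names, and component count are assumptions, and the neural-network classifier itself is not shown.

```python
import numpy as np

def pca_text_features(doc_term, n_components=50):
    """Compress raw term-frequency features for a downstream classifier.

    doc_term : (documents, terms) count or tf-idf matrix.
    Returns low-dimensional document vectors plus the projection for new documents.
    """
    mean = doc_term.mean(axis=0)
    Xc = doc_term - mean
    # Economy SVD: right singular vectors are the principal directions
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    k = min(n_components, Vt.shape[0])
    components = Vt[:k]
    return Xc @ components.T, components, mean

# Toy document-term counts (200 documents, 5000 terms)
rng = np.random.default_rng(5)
doc_term = rng.poisson(0.05, size=(200, 5000)).astype(float)
features, components, mean = pca_text_features(doc_term, n_components=50)
# `features` would feed the neural-network text classifier described above.
```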

Journal ArticleDOI
TL;DR: The principal component analysis representation circumvents or eliminates several of the stumbling blocks in current analysis methods and makes new analyses feasible, and accomplishes high-level scene description without shot detection and key-frame selection.
Abstract: We use principal component analysis (PCA) to reduce the dimensionality of features of video frames for the purpose of content description. This low-dimensional description makes practical the direct use of all the frames of a video sequence in later analysis. The PCA representation circumvents or eliminates several of the stumbling blocks in current analysis methods and makes new analyses feasible. We demonstrate this with two applications. The first accomplishes high-level scene description without shot detection and key-frame selection. The second uses the time sequences of motion data from every frame to classify sports sequences.

Proceedings ArticleDOI
24 Oct 1999
TL;DR: This work presents a method for the hierarchical representation of vector fields based on iterative refinement using clustering and principal component analysis, and assumes no particular structure of the field, nor does it require any topological connectivity information.
Abstract: We present a method for the hierarchical representation of vector fields. Our approach is based on iterative refinement using clustering and principal component analysis. The input to our algorithm is a discrete set of points with associated vectors. The algorithm generates a top-down segmentation of the discrete field by splitting clusters of points. We measure the error of the various approximation levels by measuring the discrepancy between streamlines generated by the original discrete field and its approximations based on much smaller discrete data sets. Our method assumes no particular structure of the field, nor does it require any topological connectivity information. It is possible to generate multiresolution representations of vector fields using this approach.