
Showing papers on "Principal component analysis published in 1999"


Journal ArticleDOI
TL;DR: In this paper, the principal axes of a set of observed data vectors may be determined through maximum-likelihood estimation of parameters in a latent variable model closely related to factor analysis.
Abstract: Principal component analysis (PCA) is a ubiquitous technique for data analysis and processing, but one which is not based upon a probability model. In this paper we demonstrate how the principal axes of a set of observed data vectors may be determined through maximum-likelihood estimation of parameters in a latent variable model closely related to factor analysis. We consider the properties of the associated likelihood function, giving an EM algorithm for estimating the principal subspace iteratively, and discuss the advantages conveyed by the definition of a probability density function for PCA.

3,362 citations
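
A minimal numpy sketch of the closed-form maximum-likelihood solution described above (not the authors' code); the data matrix X, the latent dimension q, and the function name ppca_ml are illustrative assumptions.

```python
import numpy as np

def ppca_ml(X, q):
    """Closed-form maximum-likelihood probabilistic PCA fit (sketch).

    X : (n, d) data matrix, q : number of principal axes to retain.
    Returns the loading matrix W, the noise variance and the data mean.
    """
    n, d = X.shape
    mu = X.mean(axis=0)
    Xc = X - mu
    # Eigendecomposition of the sample covariance
    S = Xc.T @ Xc / n
    eigvals, eigvecs = np.linalg.eigh(S)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # ML noise variance: average of the discarded eigenvalues
    sigma2 = eigvals[q:].mean()
    # ML loadings (defined up to an arbitrary rotation, taken as identity here)
    W = eigvecs[:, :q] * np.sqrt(np.maximum(eigvals[:q] - sigma2, 0.0))
    return W, sigma2, mu

# Illustrative use on random data
X = np.random.randn(200, 10)
W, sigma2, mu = ppca_ml(X, q=3)
```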


Journal ArticleDOI
TL;DR: It is the authors' view that distance-based RDA will be extremely useful to ecologists measuring multispecies responses to structured multifactorial experimental designs.
Abstract: We present a new multivariate technique for testing the significance of individual terms in a multifactorial analysis-of-variance model for multispecies response variables. The technique will allow researchers to base analyses on measures of association (distance measures) that are ecologically relevant. In addition, unlike other distance-based hypothesis-testing techniques, this method allows tests of significance of interaction terms in a linear model. The technique uses the existing method of redundancy analysis (RDA) but allows the analysis to be based on Bray-Curtis or other ecologically meaningful measures through the use of principal coordinate analysis (PCoA). Steps in the procedure include: (1) calculating a matrix of distances among replicates using a distance measure of choice (e.g., Bray-Curtis); (2) determining the principal coordinates (including a correction for negative eigenvalues, if necessary), which preserve these distances; (3) creating a matrix of dummy variables corresponding to the design of the experiment (i.e., individual terms in a linear model); (4) analyzing the relationship between the principal coordinates (species data) and the dummy variables (model) using RDA; and (5) implementing a test by permutation for particular statistics corresponding to the particular terms in the model. This method has certain advantages not shared by other multivariate testing procedures. We demonstrate the use of this technique with experimental ecological data from intertidal assemblages and show how the presence of significant multivariate interactions can be interpreted. It is our view that distance-based RDA will be extremely useful to ecologists measuring multispecies responses to structured multifactorial experimental designs.

2,193 citations
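
Steps (1) and (2) of the procedure can be sketched as below (a toy illustration, not the authors' implementation); the Bray-Curtis and PCoA helpers are assumptions, and the negative-eigenvalue correction, dummy-variable RDA, and permutation tests of steps (3)-(5) are omitted.

```python
import numpy as np

def bray_curtis(Y):
    """Pairwise Bray-Curtis dissimilarities between rows of an abundance matrix Y."""
    n = Y.shape[0]
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            num = np.abs(Y[i] - Y[j]).sum()
            den = (Y[i] + Y[j]).sum()
            D[i, j] = D[j, i] = num / den if den > 0 else 0.0
    return D

def principal_coordinates(D):
    """Classical PCoA: Gower-centre -D^2/2 and eigendecompose."""
    n = D.shape[0]
    A = -0.5 * D ** 2
    J = np.eye(n) - np.ones((n, n)) / n        # centring matrix
    G = J @ A @ J
    eigvals, eigvecs = np.linalg.eigh(G)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals > 1e-10                      # zero/negative eigenvalues dropped here
    return eigvecs[:, keep] * np.sqrt(eigvals[keep])

# Illustrative species-abundance matrix (replicates x species)
Y = np.random.poisson(3.0, size=(12, 8)).astype(float)
coords = principal_coordinates(bray_curtis(Y))  # inputs to the subsequent RDA
```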


Journal ArticleDOI
TL;DR: PCA is formulated within a maximum likelihood framework, based on a specific form of gaussian latent variable model, which leads to a well-defined mixture model for probabilistic principal component analyzers, whose parameters can be determined using an expectation-maximization algorithm.
Abstract: Principal component analysis (PCA) is one of the most popular techniques for processing, compressing, and visualizing data, although its effectiveness is limited by its global linearity. While nonlinear variants of PCA have been proposed, an alternative paradigm is to capture data complexity by a combination of local linear PCA projections. However, conventional PCA does not correspond to a probability density, and so there is no unique way to combine PCA models. Therefore, previous attempts to formulate mixture models for PCA have been ad hoc to some extent. In this article, PCA is formulated within a maximum likelihood framework, based on a specific form of gaussian latent variable model. This leads to a well-defined mixture model for probabilistic principal component analyzers, whose parameters can be determined using an expectation-maximization algorithm. We discuss the advantages of this model in the context of clustering, density modeling, and local dimensionality reduction, and we demonstrate its application to image compression and handwritten digit recognition.

1,927 citations


Journal ArticleDOI
TL;DR: This paper explores and compares techniques for automatically recognizing facial actions in sequences of images and provides converging evidence for the importance of using local filters, high spatial frequencies, and statistical independence for classifying facial actions.
Abstract: The facial action coding system (FACS) is an objective method for quantifying facial movement in terms of component actions. This paper explores and compares techniques for automatically recognizing facial actions in sequences of images. These techniques include: analysis of facial motion through estimation of optical flow; holistic spatial analysis, such as principal component analysis, independent component analysis, local feature analysis, and linear discriminant analysis; and methods based on the outputs of local filters, such as Gabor wavelet representations and local principal components. Performance of these systems is compared to naive and expert human subjects. Best performances were obtained using the Gabor wavelet representation and the independent component representation, both of which achieved 96 percent accuracy for classifying 12 facial actions of the upper and lower face. The results provide converging evidence for the importance of using local filters, high spatial frequencies, and statistical independence for classifying facial actions.

1,086 citations


Proceedings ArticleDOI
01 Dec 1999
TL;DR: This work shows that application of PCA to expression data allows us to summarize the ways in which gene responses vary under different conditions, and suggests that much of the observed variability in the experiment can be summarized in just 2 components.
Abstract: A series of microarray experiments produces observations of differential expression for thousands of genes across multiple conditions. It is often not clear whether a set of experiments are measuring fundamentally different gene expression states or are measuring similar states created through different mechanisms. It is useful, therefore, to define a core set of independent features for the expression states that allow them to be compared directly. Principal components analysis (PCA) is a statistical technique for determining the key variables in a multidimensional data set that explain the differences in the observations, and can be used to simplify the analysis and visualization of multidimensional data sets. We show that application of PCA to expression data (where the experimental conditions are the variables, and the gene expression measurements are the observations) allows us to summarize the ways in which gene responses vary under different conditions. Examination of the components also provides insight into the underlying factors that are measured in the experiments. We applied PCA to the publicly released yeast sporulation data set (Chu et al. 1998). In that work, 7 different measurements of gene expression were made over time. PCA on the time-points suggests that much of the observed variability in the experiment can be summarized in just 2 components--i.e. 2 variables capture most of the information. These components appear to represent (1) overall induction level and (2) change in induction level over time. We also examined the clusters proposed in the original paper, and show how they are manifested in principal component space. Our results are available on the internet at http://www.smi.stanford.edu/project/helix/PCArray.

815 citations
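
A minimal sketch of the analysis described above, assuming a toy expression matrix with conditions as variables and genes as observations; it is not the authors' code and the variable names are illustrative.

```python
import numpy as np

# Toy expression matrix: rows = genes (observations), columns = time points (variables).
rng = np.random.default_rng(0)
expr = rng.normal(size=(1000, 7))

# Centre each condition, then diagonalize the condition-by-condition covariance.
Xc = expr - expr.mean(axis=0)
cov = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

explained = eigvals / eigvals.sum()
print("variance explained by first two components:", explained[:2].sum())

# Project each gene into principal-component space for visualization or clustering.
scores = Xc @ eigvecs[:, :2]
```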


Journal ArticleDOI
TL;DR: The principal response curve method (PRC) as discussed by the authors is based on redundancy analysis (RDA), adjusted for overall changes in community response over time, as observed in control test systems.
Abstract: In this paper a novel multivariate method is proposed for the analysis of community response data from designed experiments repeatedly sampled in time. The long-term effects of the insecticide chlorpyrifos on the invertebrate community and the dissolved oxygen (DO)–pH–alkalinity–conductivity syndrome, in outdoor experimental ditches, are used as example data. The new method, which we have named the principal response curve method (PRC), is based on redundancy analysis (RDA), adjusted for overall changes in community response over time, as observed in control test systems. This allows the method to focus on the time-dependent treatment effects. The principal component is plotted against time, yielding a principal response curve of the community for each treatment. The PRC method distills the complexity of time-dependent, community-level effects of pollutants into a graphic form that can be appreciated more readily than the results of other currently available multivariate techniques. The PRC method also enables a quantitative interpretation of effects towards the species level.

757 citations


Journal ArticleDOI
TL;DR: This work presents a technique for specifying the problem in a structured way so that one program (the Multilinear Engine) may be used for solving widely different multilinear problems.
Abstract: A technique for fitting multilinear and quasi-multilinear mathematical expressions or models to two-, three-, and many-dimensional data arrays is described. Principal component analysis and three-way PARAFAC factor analysis are examples of bilinear and trilinear least squares fit. This work presents a technique for specifying the problem in a structured way so that one program (the Multilinear Engine) may be used for solving widely different multilinear problems. The multilinear equations to be solved are specified as a large table of integer code values. The end user creates this table by using a small preprocessing program. For each different case, an individual structure table is needed. The solution is computed by using the conjugate gradient algorithm. Non-negativity constraints are implemented by using the well-known technique of preconditioning in opposite way for slowing down changes of variables that are about to become negative. The iteration converges to a minimum that may be local or ...

743 citations


Journal ArticleDOI
TL;DR: An expectation-maximization (EM) algorithm is presented, which performs unsupervised learning of an associated probabilistic model of the mixing situation and is shown to be superior to ICA since it can learn arbitrary source densities from the data.
Abstract: We introduce the independent factor analysis (IFA) method for recovering independent hidden sources from their observed mixtures. IFA generalizes and unifies ordinary factor analysis (FA), principal component analysis (PCA), and independent component analysis (ICA), and can handle not only square noiseless mixing but also the general case where the number of mixtures differs from the number of sources and the data are noisy. IFA is a two-step procedure. In the first step, the source densities, mixing matrix, and noise covariance are estimated from the observed data by maximum likelihood. For this purpose we present an expectation-maximization (EM) algorithm, which performs unsupervised learning of an associated probabilistic model of the mixing situation. Each source in our model is described by a mixture of gaussians; thus, all the probabilistic calculations can be performed analytically. In the second step, the sources are reconstructed from the observed data by an optimal nonlinear estimator. A variational approximation of this algorithm is derived for cases with a large number of sources, where the exact algorithm becomes intractable. Our IFA algorithm reduces to the one for ordinary FA when the sources become gaussian, and to an EM algorithm for PCA in the zero-noise limit. We derive an additional EM algorithm specifically for noiseless IFA. This algorithm is shown to be superior to ICA since it can learn arbitrary source densities from the data. Beyond blind separation, IFA can be used for modeling multidimensional data by a highly constrained mixture of gaussians and as a tool for nonlinear signal encoding.

573 citations


Journal ArticleDOI
TL;DR: A joint band-prioritization and band-decorrelation approach to band selection is considered for hyperspectral image classification and it is shown that the proposed band-selection method effectively eliminates a great number of insignificant bands.
Abstract: Band selection for remotely sensed image data is an effective means to mitigate the curse of dimensionality. Many criteria have been suggested in the past for optimal band selection. In this paper, a joint band-prioritization and band-decorrelation approach to band selection is considered for hyperspectral image classification. The proposed band prioritization is a method based on the eigen (spectral) decomposition of a matrix from which a loading-factors matrix can be constructed for band prioritization via the corresponding eigenvalues and eigenvectors. Two approaches are presented, principal components analysis (PCA)-based criteria and classification-based criteria. The former includes the maximum-variance PCA and maximum SNR PCA, whereas the latter derives the minimum misclassification canonical analysis (MMCA) (i.e., Fisher's discriminant analysis) and subspace projection-based criteria. Since the band prioritization does not take spectral correlation into account, an information-theoretic criterion called divergence is used for band decorrelation. Finally, the band selection can then be done by an eigenanalysis-based band prioritization in conjunction with a divergence-based band decorrelation. It is shown that the proposed band-selection method effectively eliminates a great number of insignificant bands. Surprisingly, the experiments show that with a proper band selection, fewer than 0.1 of the total number of bands can achieve performance comparable to that of the full set of bands. This further demonstrates that the band selection can significantly reduce data volume so as to achieve data compression.

565 citations
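
As a hedged illustration of the eigenanalysis-based prioritization idea (not the paper's exact criteria), the sketch below scores each band by its eigenvalue-weighted loading factors; this simplified score reduces to the per-band variance, and the classification-based criteria and divergence-based decorrelation are not reproduced.

```python
import numpy as np

def band_priority_scores(cube):
    """Score each spectral band by eigenvalue-weighted loading factors.

    cube : (pixels, bands) matrix of hyperspectral samples.
    """
    Xc = cube - cube.mean(axis=0)
    cov = np.cov(Xc, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    # priority of band l: sum_k lambda_k * (eigvec_k[l])**2
    return (eigvals * eigvecs ** 2).sum(axis=1)

# Illustrative cube: 5000 pixels, 64 bands
cube = np.random.randn(5000, 64)
scores = band_priority_scores(cube)
selected = np.argsort(scores)[::-1][:6]   # keep the highest-priority bands
```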


Journal ArticleDOI
TL;DR: A method based on the variance of the reconstruction error is presented for selecting the number of PCs; the criterion exhibits a minimum over the number of PCs, and conditions are given under which this minimum corresponds to the true number of PCs.
Abstract: One of the main difficulties in using principal component analysis (PCA) is the selection of the number of principal components (PCs). There exist a plethora of methods to calculate the number of PCs, but most of them use monotonically increasing or decreasing indices. Therefore, the decision to choose the number of principal components is very subjective. In this paper, we present a method based on the variance of the reconstruction error to select the number of PCs. This method demonstrates a minimum over the number of PCs. Conditions are given under which this minimum corresponds to the true number of PCs. Ten other methods available in the signal processing and chemometrics literature are overviewed and compared with the proposed method. Three data sets are used to test the different methods for selecting the number of PCs: two of them are real process data and the other one is a batch reactor simulation.

509 citations
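
A simplified sketch of the variance-of-reconstruction-error idea (not necessarily the paper's exact estimator or normalization): for each candidate number of PCs, every variable is reconstructed from the others through the PCA model and the error variances are accumulated; the resulting curve is then inspected for its minimum.

```python
import numpy as np

def vre_curve(X, max_pcs=None):
    """Variance of the reconstruction error as a function of the number of PCs."""
    Xc = X - X.mean(axis=0)
    S = np.cov(Xc, rowvar=False)
    d = S.shape[0]
    eigvals, eigvecs = np.linalg.eigh(S)
    eigvecs = eigvecs[:, np.argsort(eigvals)[::-1]]
    max_pcs = max_pcs or d - 1
    vre = []
    for l in range(1, max_pcs + 1):
        P = eigvecs[:, :l]
        Ct = np.eye(d) - P @ P.T                  # residual projection for l PCs
        M = Ct @ S @ Ct
        u = np.diag(M) / np.diag(Ct) ** 2         # per-variable reconstruction-error variances
        vre.append(u.sum())
    return np.array(vre)                          # choose l at the minimum of this curve

# Illustrative data with 3 underlying factors plus noise
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 3)) @ rng.normal(size=(3, 10)) + 0.1 * rng.normal(size=(500, 10))
best_l = int(np.argmin(vre_curve(X))) + 1
```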


Book
01 Jan 1999
TL;DR: In this article, a reduced rank mixed effects framework is proposed to handle the more difficult case where curves are often measured at an irregular and sparse set of time points which can differ widely across individuals.
Abstract: The elements of a multivariate data set are often curves rather than single points. Functional principal components can be used to describe the modes of variation of such curves. If one has complete measurements for each individual curve or, as is more common, one has measurements on a fine grid taken at the same time points for all curves, then many standard techniques may be applied. However, curves are often measured at an irregular and sparse set of time points which can differ widely across individuals. We present a technique for handling this more difficult case using a reduced rank mixed effects framework.

Proceedings ArticleDOI
08 Feb 1999
TL;DR: In this paper, a nonlinear form of principal component analysis (PCA) is proposed to perform polynomial feature extraction in high-dimensional feature spaces, related to input space by some nonlinear map; for instance, the space of all possible d-pixel products in images.
Abstract: A new method for performing a nonlinear form of Principal Component Analysis is proposed. By the use of integral operator kernel functions, one can efficiently compute principal components in high-dimensional feature spaces, related to input space by some nonlinear map; for instance, the space of all possible d-pixel products in images. We give the derivation of the method and present experimental results on polynomial feature extraction for pattern recognition.
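
A minimal sketch of the kernel PCA construction described above, using a polynomial kernel and toy data; names such as kernel_pca are illustrative rather than taken from the paper.

```python
import numpy as np

def kernel_pca(X, degree=2, n_components=2):
    """Nonlinear PCA via a polynomial kernel (sketch).

    Eigenvectors of the centred kernel matrix give the nonlinear principal
    component projections of the training points.
    """
    n = X.shape[0]
    K = (X @ X.T + 1.0) ** degree                 # polynomial kernel
    # Centre the kernel matrix in feature space
    one_n = np.ones((n, n)) / n
    Kc = K - one_n @ K - K @ one_n + one_n @ K @ one_n
    eigvals, eigvecs = np.linalg.eigh(Kc)
    order = np.argsort(eigvals)[::-1][:n_components]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Normalize so each feature-space eigenvector has unit length
    alphas = eigvecs / np.sqrt(np.maximum(eigvals, 1e-12))
    return Kc @ alphas                            # projections of the training data

# Illustrative two-dimensional data
X = np.random.randn(100, 2)
Z = kernel_pca(X, degree=3, n_components=2)
```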

Journal ArticleDOI
TL;DR: A segmented, and possibly multistage, principal components transformation (PCT) is proposed for efficient hyperspectral remote-sensing image classification and display and results have been obtained in terms of classification accuracy, speed, and quality of color image display using two airborne visible/infrared imaging spectrometer (AVIRIS) data sets.
Abstract: A segmented, and possibly multistage, principal components transformation (PCT) is proposed for efficient hyperspectral remote-sensing image classification and display. The scheme requires, initially, partitioning the complete set of bands into several highly correlated subgroups. After separate transformation of each subgroup, the single-band separabilities are used as a guide to carry out feature selection. The selected features can then be transformed again to achieve a satisfactory data reduction ratio and generate the three most significant components for color display. The scheme reduces the computational load significantly for feature extraction, compared with the conventional PCT. A reduced number of features will also accelerate the maximum likelihood classification process significantly, and the process will not suffer the limitations encountered by trying to use the full set of hyperspectral data when training samples are limited. Encouraging results have been obtained in terms of classification accuracy, speed, and quality of color image display using two airborne visible/infrared imaging spectrometer (AVIRIS) data sets.
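
A hedged sketch of the segmented transformation idea: bands are split into subgroups (here fixed by hand rather than derived from the band correlation matrix, as the paper does), each subgroup is transformed separately, and the leading components are pooled for a possible second-stage transform; feature selection by single-band separability is not reproduced.

```python
import numpy as np

def segmented_pct(cube, groups, n_keep=3):
    """Segmented principal components transform sketch.

    cube   : (pixels, bands) hyperspectral samples.
    groups : list of band-index arrays, one per highly correlated subgroup.
    """
    features = []
    for idx in groups:
        Xg = cube[:, idx]
        Xg = Xg - Xg.mean(axis=0)
        cov = np.cov(Xg, rowvar=False)
        eigvals, eigvecs = np.linalg.eigh(cov)
        order = np.argsort(eigvals)[::-1][:n_keep]
        features.append(Xg @ eigvecs[:, order])   # leading PCs of this subgroup
    return np.hstack(features)                    # pooled features for a second-stage PCT

# Illustrative cube with 64 bands split into contiguous subgroups
cube = np.random.randn(2000, 64)
groups = [np.arange(0, 20), np.arange(20, 44), np.arange(44, 64)]
F = segmented_pct(cube, groups)
```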

Journal ArticleDOI
01 Jun 1999-Test
TL;DR: A method for exploring the structure of populations of complex objects, such as images, is considered, and endemic outliers motivate the development of a bounded influence approach to PCA.
Abstract: A method for exploring the structure of populations of complex objects, such as images, is considered. The objects are summarized by feature vectors. The statistical backbone is Principal Component Analysis in the space of feature vectors. Visual insights come from representing the results in the original data space. In an ophthalmological example, endemic outliers motivate the development of a bounded influence approach to PCA.

Proceedings ArticleDOI
Christopher M. Bishop1
01 Jan 1999
TL;DR: This paper develops an alternative, variational formulation of Bayesian PCA, based on a factorial representation of the posterior distribution, which maximizes a rigorous lower bound on the marginal log probability of the observed data.
Abstract: One of the central issues in the use of principal component analysis (PCA) for data modelling is that of choosing the appropriate number of retained components. This problem was recently addressed through the formulation of a Bayesian treatment of PCA in terms of a probabilistic latent variable model. A central feature of this approach is that the effective dimensionality of the latent space is determined automatically as part of the Bayesian inference procedure. In common with most non-trivial Bayesian models, however, the required marginalizations are analytically intractable, and so an approximation scheme based on a local Gaussian representation of the posterior distribution was employed. In this paper we develop an alternative, variational formulation of Bayesian PCA, based on a factorial representation of the posterior distribution. This approach is computationally efficient, and unlike other approximation schemes, it maximizes a rigorous lower bound on the marginal log probability of the observed data.

Journal ArticleDOI
TL;DR: Multivariate statistical process control tools have been developed for monitoring a Lam 9600 TCP metal etcher at Texas Instruments and the strengths and weaknesses of the methods are discussed, along with the relative advantages of each of the sensor systems.
Abstract: Multivariate statistical process control (MSPC) tools have been developed for monitoring a Lam 9600 TCP metal etcher at Texas Instruments. These tools are used to determine if the etch process is operating normally or if a system fault has occurred. Application of these methods is complicated because the etch process data exhibit a large amount of normal systematic variation. Variations due to faults of process concern can be relatively minor in comparison. The Lam 9600 used in this study is equipped with several sensor systems including engineering variables (e.g. pressure, gas flow rates and power), spatially resolved optical emission spectroscopy (OES) of the plasma and a radio-frequency monitoring (RFM) system to monitor the power and phase relationships of the plasma generator. A variety of analysis methods and data preprocessing techniques have been tested for their sensitivity to specific system faults. These methods have been applied to data from each of the sensor systems separately and in combination. The performance of the methods on a set of benchmark fault detection problems is presented and the strengths and weaknesses of the methods are discussed, along with the relative advantages of each of the sensor systems.

Book ChapterDOI
01 Jan 1999
TL;DR: The most well-known description of image statistics is that their Fourier spectra take the form of a power law, which, coupled with translation invariance, suggests that the Fourier transform is an appropriate PCA representation; Fourier and related representations are widely used in image processing applications.
Abstract: The use of multi-scale decompositions has led to significant advances in representation, compression, restoration, analysis, and synthesis of signals. The fundamental reason for these advances is that the statistics of many natural signals, when decomposed in such bases, are substantially simplified. Choosing a basis that is adapted to statistical properties of the input signal is a classical problem. The traditional solution is principal components analysis (PCA), in which a linear decomposition is chosen to diagonalize the covariance structure of the input. The most well-known description of image statistics is that their Fourier spectra take the form of a power law [e.g., 1, 2, 3]. Coupled with a constraint of translation-invariance, this suggests that the Fourier transform is an appropriate PCA representation. Fourier and related representations are widely used in image processing applications. For example, the classical solution to the noise removal problem is the Wiener filter, which can be derived by assuming a signal model of decorrelated Gaussian-distributed coefficients in the Fourier domain.

Journal ArticleDOI
TL;DR: A generalized linear systems framework for PCA based on the singular value decomposition (SVD) model for representation of spatio-temporal fMRI data sets is presented and illustrated in the setting of dynamic time-series response data from fMRI experiments involving pharmacological stimulation of the dopaminergic nigro-striatal system in primates.

Journal ArticleDOI
TL;DR: The paper starts with an efficient approach to updating the correlation matrix recursively and proposes two recursive PCA algorithms using rank-one modification and Lanczos tridiagonalization for adaptive process monitoring.

Journal ArticleDOI
TL;DR: The equivalence between PCA and parity relations is exploited to transfer the concepts of analytical redundancy to PCA, yielding structured residuals with the same isolation properties; the existence conditions of such residuals are demonstrated, as well as how disturbance decoupling is implied in the method.
Abstract: Principal component analysis (PCA) may reduce the dimensionality of plant models significantly by exposing linear dependences among the variables. While PCA is a popular tool in detecting faults in complex plants, it offers little support in its original form for fault isolation. However, by utilizing the equivalence between PCA and parity relations, all the powerful concepts of analytical redundancy may be transferred to PCA. Following this path, it is shown how structured residuals, which have the same isolation properties as analytical redundancy residuals, are obtained by PCA. The existence conditions of such residuals are demonstrated, as well as how disturbance decoupling is implied in the method. The effect of the presence of control constraints in the training data is analyzed. Statistical testing methods for structured PCA residuals are also outlined. The theoretical findings are fully supported by simulation studies performed on the Tennessee Eastman process.
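
A minimal sketch of the PCA/parity-relation connection underlying the method, on toy data: the directions discarded by PCA span a residual subspace, and projecting samples onto it yields primary residuals; the structuring transformation that provides isolation properties and the disturbance-decoupling analysis are not reproduced here.

```python
import numpy as np

def pca_residual_generator(X, n_pcs):
    """Build a PCA-based (parity-like) residual generator from training data.

    Directions discarded by PCA span the residual subspace; projecting new
    samples onto them yields primary residuals that are near zero in normal
    operation and deviate under faults.
    """
    mu = X.mean(axis=0)
    Xc = X - mu
    cov = np.cov(Xc, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]
    P_res = eigvecs[:, order[n_pcs:]]             # residual-subspace loadings
    return lambda x: (x - mu) @ P_res             # primary residual vector

# Illustrative plant data with two linear dependences among five variables
rng = np.random.default_rng(2)
T = rng.normal(size=(1000, 3))
X = T @ rng.normal(size=(3, 5)) + 0.01 * rng.normal(size=(1000, 5))
residual = pca_residual_generator(X, n_pcs=3)
r = residual(X[0] + np.array([0, 0.5, 0, 0, 0]))  # a faulty sample gives nonzero residuals
```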

Journal ArticleDOI
TL;DR: The aim of this work is to present a tutorial on Multivariate Calibration, a tool which is nowadays necessary in basically most laboratories but very often misused.
Abstract: The aim of this work is to present a tutorial on Multivariate Calibration, a tool which is nowadays necessary in basically most laboratories but very often misused. The basic concepts of preprocessing, principal component analysis (PCA), principal component regression (PCR) and partial least squares (PLS) are given. The two basic steps on any calibration procedure: model building and validation are fully discussed. The concepts of cross validation (to determine the number of factors to be used in the model), leverage and studentized residuals (to detect outliers) for the validation step are given. The whole calibration procedure is illustrated using spectra recorded for ternary mixtures of 2,4,6 trinitrophenolate, 2,4 dinitrophenolate and 2,5 dinitrophenolate followed by the concentration prediction of these three chemical species during a diffusion experiment through a hydrophobic liquid membrane. MATLAB software is used for numerical calculations. Most of the commands for the analysis are provided in order to allow a non-specialist to follow step by step the analysis.
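
The tutorial's calculations are done in MATLAB; as a hedged illustration only, here is a numpy sketch of principal component regression with leave-one-out cross-validation to pick the number of factors (toy spectra, illustrative names, and no outlier diagnostics).

```python
import numpy as np

def pcr_fit(X, y, n_factors):
    """Principal component regression: regress y on the PCA scores of X."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:n_factors].T                          # loadings
    T = Xc @ P                                    # scores
    b = np.linalg.lstsq(T, yc, rcond=None)[0]     # regression in score space
    coef = P @ b
    return coef, x_mean, y_mean

def loo_rmse(X, y, n_factors):
    """Leave-one-out cross-validation error for a given number of factors."""
    errs = []
    for i in range(len(y)):
        mask = np.arange(len(y)) != i
        coef, xm, ym = pcr_fit(X[mask], y[mask], n_factors)
        errs.append(y[i] - (ym + (X[i] - xm) @ coef))
    return np.sqrt(np.mean(np.square(errs)))

# Illustrative spectra (50 mixtures x 100 wavelengths) and one concentration
rng = np.random.default_rng(3)
X = rng.normal(size=(50, 100))
y = X[:, :3] @ np.array([1.0, -0.5, 0.25]) + 0.05 * rng.normal(size=50)
best = min(range(1, 11), key=lambda k: loo_rmse(X, y, k))
```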

Journal ArticleDOI
TL;DR: In this article, four methods of variable selection along with different criteria levels for deciding on the number of variables to retain were examined along with a selection method that requires one principal component analysis and retains variables by starting with selection from the first component.
Abstract: In many large environmental datasets redundant variables can be discarded without the loss of extra variation. Principal components analysis can be used to select those variables that contain the most information. Using an environmental dataset consisting of 36 meteorological variables spanning 37 years, four methods of variable selection are examined along with different criteria levels for deciding on the number of variables to retain. Procrustes analysis, a measure of similarity and bivariate plots are used to assess the success of the alternative variable selection methods and criteria levels in extracting representative variables. The Broken-stick model is a consistent approach to choosing significant principal components and is chosen here as the more suitable criterion in combination with a selection method that requires one principal component analysis and retains variables by starting with selection from the first component.
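
A sketch of the Broken-stick criterion combined with a simplified version of the retention rule described above (selecting, from the first component onwards, the variable with the largest absolute loading on each significant component); the exact selection variant compared in the paper may differ.

```python
import numpy as np

def broken_stick_components(eigvals):
    """Number of PCs whose variance share exceeds the broken-stick expectation."""
    p = len(eigvals)
    prop = np.sort(eigvals)[::-1] / eigvals.sum()
    bstick = np.array([np.sum(1.0 / np.arange(k, p + 1)) / p for k in range(1, p + 1)])
    keep = 0
    while keep < p and prop[keep] > bstick[keep]:
        keep += 1
    return keep

def select_variables(X):
    """Retain one variable per significant PC: the highest absolute loading,
    starting from the first component (a simplified selection rule)."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    cov = np.cov(Xs, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    n_sig = broken_stick_components(eigvals)
    chosen = []
    for k in range(n_sig):
        ranked = np.argsort(np.abs(eigvecs[:, k]))[::-1]
        chosen.append(next(v for v in ranked if v not in chosen))
    return chosen

# Illustrative meteorological-style data: 37 years x 36 variables
X = np.random.randn(37, 36)
print(select_variables(X))
```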

Journal ArticleDOI
TL;DR: It is shown how the generalization error can be used to select the number of principal components in two analyses of functional magnetic resonance imaging activation sets.

Journal ArticleDOI
TL;DR: The interference is considered as a separate, unknown signal source from which an interference and noise-adjusted principal components analysis (INAPCA) can be developed in a manner similar to the one from which the NAPC was derived.
Abstract: The goal of principal components analysis (PCA) is to find principal components in accordance with maximum variance of a data matrix. However, it has been shown recently that such variance-based principal components may not adequately represent image quality. As a result, a modified PCA approach based on maximization of SNR was proposed. Called maximum noise fraction (MNF) transformation or noise-adjusted principal components (NAPC) transform, it arranges principal components in decreasing order of image quality rather than variance. One of the major disadvantages of this approach is that the noise covariance matrix must be estimated accurately from the data a priori. Another is that the factor of interference is not taken into account in MNF or NAPC in which the interfering effect tends to be more serious than noise in hyperspectral images. In this paper, these two problems are addressed by considering the interference as a separate, unknown signal source, from which an interference and noise-adjusted principal components analysis (INAPCA) can be developed in a manner similar to the one from which the NAPC was derived. Two approaches are proposed for the INAPCA, referred to as signal to interference plus noise ratio-based principal components analysis (SINR-PCA) and interference-annihilated noise-whitened principal components analysis (IANW-PCA). It is shown that if interference is taken care of properly, SINR-PCA and IANW-PCA significantly improve NAPC. In addition, interference annihilation also improves the estimation of the noise covariance matrix. All of these results are compared with NAPC and PCA and are demonstrated by HYDICE data.
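
A minimal sketch of the noise-adjusted (NAPC/MNF-style) ordering that the paper builds on, assuming the noise covariance is supplied; the interference-adjusted variants (SINR-PCA and IANW-PCA) proposed in the paper are not reproduced here.

```python
import numpy as np

def noise_adjusted_pca(X, noise_cov):
    """NAPC/MNF-style transform: whiten by the noise covariance, then apply PCA.

    Components come out ordered by SNR rather than raw variance.  The noise
    covariance must be supplied (or estimated from the data beforehand).
    """
    Xc = X - X.mean(axis=0)
    # Noise-whitening transform F such that F^T noise_cov F = I
    w, V = np.linalg.eigh(noise_cov)
    F = V / np.sqrt(np.maximum(w, 1e-12))
    Sw = F.T @ np.cov(Xc, rowvar=False) @ F       # data covariance in whitened space
    eigvals, eigvecs = np.linalg.eigh(Sw)
    order = np.argsort(eigvals)[::-1]
    return Xc @ F @ eigvecs[:, order]             # noise-adjusted principal components

# Illustrative hyperspectral samples with band-dependent noise
rng = np.random.default_rng(4)
signal = rng.normal(size=(3000, 4)) @ rng.normal(size=(4, 30))
noise_std = np.linspace(0.1, 1.0, 30)
X = signal + rng.normal(size=signal.shape) * noise_std
napc = noise_adjusted_pca(X, np.diag(noise_std ** 2))
```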

Journal ArticleDOI
TL;DR: A new approach is proposed which combines canonical space transformation (CST) based on Canonical Analysis (CA) with eigenspace transformation (EST) for feature extraction, which can be used to reduce data dimensionality and to optimise the class separability of different gait classes simultaneously.

01 Jan 1999
TL;DR: Discriminant analysis shows that the ICA criterion, when carried out in the properly compressed and whitened space, performs better than the eigenfaces and Fisherfaces methods for face recognition, but its performance deteriorates when augmented by additional criteria such as the Maximum A Posteriori (MAP) rule of the Bayes classifier or the FLD.
Abstract: This paper addresses the relative usefulness of Independent Component Analysis (ICA) for Face Recognition. Comparative assessments are made regarding (i) ICA sensitivity to the dimension of the space where it is carried out, and (ii) ICA discriminant performance alone or when combined with other discriminant criteria such as Bayesian framework or Fisher’s Linear Discriminant (FLD). Sensitivity analysis suggests that for enhanced performance ICA should be carried out in a compressed and whitened Principal Component Analysis (PCA) space where the small trailing eigenvalues are discarded. The reason for this finding is that during whitening the eigenvalues of the covariance matrix appear in the denominator and that the small trailing eigenvalues mostly encode noise. As a consequence the whitening component, if used in an uncompressed image space, would fit for misleading variations and thus generalize poorly to new data. Discriminant analysis shows that the ICA criterion, when carried out in the properly compressed and whitened space, performs better than the eigenfaces and Fisherfaces methods for face recognition, but its performance deteriorates when augmented by additional criteria such as the Maximum A Posteriori (MAP) rule of the Bayes classifier or the FLD. The reason for the last finding is that the Mahalanobis distance embedded in the MAP classifier duplicates to some extent the whitening component, while using FLD is counter to the independence criterion intrinsic to ICA.

Journal ArticleDOI
TL;DR: PCA provides a powerful set of tools for selectively measuring neural ensemble activity within multiple functionally significant 'dimensions' of information processing, and redefines the 'neuron' as an entity which contributes portions of its variance to processing not one, but several tasks.

Proceedings ArticleDOI
19 Apr 1999
TL;DR: Four dimensionality reduction techniques are proposed and compared for reducing the feature space to an input space of much lower dimension for the neural network classifier, and the results show that the proposed model achieves high categorization effectiveness as measured by precision and recall.
Abstract: In a text categorization model that uses an artificial neural network as the text classifier, scalability is poor if the neural network is trained on the raw feature space, since textual data has a very high-dimensional feature space. We proposed and compared four dimensionality reduction techniques to reduce the feature space into an input space of much lower dimension for the neural network classifier. To test the effectiveness of the proposed model, experiments were conducted using a subset of the Reuters-22173 test collection for text categorization. The results showed that the proposed model was able to achieve high categorization effectiveness as measured by precision and recall. Among the four dimensionality reduction techniques proposed, principal component analysis was found to be the most effective in reducing the dimensionality of the feature space.
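
A hedged sketch of the PCA-based dimensionality reduction step on a toy document-term matrix; the corpus, names, and component count are assumptions, and the neural-network classifier itself is not shown.

```python
import numpy as np

def pca_text_features(doc_term, n_components=50):
    """Compress raw term-frequency features for a downstream classifier.

    doc_term : (documents, terms) count or tf-idf matrix.
    Returns low-dimensional document vectors plus the projection for new documents.
    """
    mean = doc_term.mean(axis=0)
    Xc = doc_term - mean
    # Economy SVD: right singular vectors are the principal directions
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    k = min(n_components, Vt.shape[0])
    components = Vt[:k]
    return Xc @ components.T, components, mean

# Toy document-term counts (200 documents, 5000 terms)
rng = np.random.default_rng(5)
doc_term = rng.poisson(0.05, size=(200, 5000)).astype(float)
features, components, mean = pca_text_features(doc_term, n_components=50)
# `features` would feed the neural-network text classifier described above.
```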

Journal ArticleDOI
TL;DR: The principal component analysis representation circumvents or eliminates several of the stumbling blocks in current analysis methods and makes new analyses feasible, and accomplishes high-level scene description without shot detection and key-frame selection.
Abstract: We use principal component analysis (PCA) to reduce the dimensionality of features of video frames for the purpose of content description. This low-dimensional description makes practical the direct use of all the frames of a video sequence in later analysis. The PCA representation circumvents or eliminates several of the stumbling blocks in current analysis methods and makes new analyses feasible. We demonstrate this with two applications. The first accomplishes high-level scene description without shot detection and key-frame selection. The second uses the time sequences of motion data from every frame to classify sports sequences.

Proceedings ArticleDOI
24 Oct 1999
TL;DR: This work presents a method for the hierarchical representation of vector fields based on iterative refinement using clustering and principal component analysis, and assumes no particular structure of the field, nor does it require any topological connectivity information.
Abstract: We present a method for the hierarchical representation of vector fields. Our approach is based on iterative refinement using clustering and principal component analysis. The input to our algorithm is a discrete set of points with associated vectors. The algorithm generates a top-down segmentation of the discrete field by splitting clusters of points. We measure the error of the various approximation levels by measuring the discrepancy between streamlines generated by the original discrete field and its approximations based on much smaller discrete data sets. Our method assumes no particular structure of the field, nor does it require any topological connectivity information. It is possible to generate multiresolution representations of vector fields using this approach.