
Showing papers on "Principal component analysis published in 2003"


Book
12 Mar 2003
TL;DR: The concept of and need for Principal Components Analysis, unsupervised pattern recognition (cluster analysis), supervised pattern recognition, and their application in chemistry are explained.
Abstract: Preface. Supplementary Information. Acknowledgements.
1. INTRODUCTION. Points of View. Software and Calculations. Further Reading. References.
2. EXPERIMENTAL DESIGN. Introduction. Basic Principles. Factorial Designs. Central Composite or Response Surface Designs. Mixture Designs. Simplex Optimisation. Problems.
3. SIGNAL PROCESSING. Sequential Signals in Chemistry. Basics. Linear Filters. Correlograms and Time Series Analysis. Fourier Transform Techniques. Topical Methods. Problems.
4. PATTERN RECOGNITION. Introduction. The Concept and Need for Principal Components Analysis. Principal Components Analysis: the Method. Unsupervised Pattern Recognition: Cluster Analysis. Supervised Pattern Recognition. Multiway Pattern Recognition. Problems.
5. CALIBRATION. Introduction. Univariate Calibration. Multiple Linear Regression. Principal Components Regression. Partial Least Squares. Model Validation. Problems.
6. EVOLUTIONARY SIGNALS. Introduction. Exploratory Data Analysis and Preprocessing. Determining Composition. Resolution. Problems.
Appendices: A.1 Vectors and Matrices. A.2 Algorithms. A.3 Basic Statistical Concepts. A.4 Excel for Chemometrics. A.5 Matlab for Chemometrics. Index.

1,411 citations


Book ChapterDOI
TL;DR: This chapter describes gene expression analysis by Singular Value Decomposition (SVD), emphasizing initial characterization of the data, and describes the precise relation between SVD analysis and Principal Component Analysis (PCA) when PCA is calculated using the covariance matrix.
Abstract: This chapter describes gene expression analysis by Singular Value Decomposition (SVD), emphasizing initial characterization of the data. We describe SVD methods for visualization of gene expression data, representation of the data using a smaller number of variables, and detection of patterns in noisy gene expression data. In addition, we describe the precise relation between SVD analysis and Principal Component Analysis (PCA) when PCA is calculated using the covariance matrix, enabling our descriptions to apply equally well to either method. Our aim is to provide definitions, interpretations, examples, and references that will serve as resources for understanding and extending the application of SVD and PCA to gene expression analysis.
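
The covariance-matrix link the chapter emphasizes is easy to verify numerically. A minimal sketch (mine, not the chapter's code): for a column-centered data matrix, the right singular vectors of the SVD coincide with the eigenvectors of the covariance matrix, and the squared singular values rescale to its eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 6))          # stand-in for samples x genes
Xc = X - X.mean(axis=0)                # center each variable

# PCA via the covariance matrix
evals, evecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
order = np.argsort(evals)[::-1]        # eigh returns ascending order
evals, evecs = evals[order], evecs[:, order]

# The same decomposition via SVD of the centered data matrix
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

assert np.allclose(s**2 / (len(X) - 1), evals)    # identical spectrum
assert np.allclose(np.abs(Vt), np.abs(evecs.T))   # identical axes, up to sign
```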

1,211 citations


Journal ArticleDOI
TL;DR: The necessity and usefulness of multivariate statistical assessment of large and complex databases are presented, with a view to obtaining better information about surface water quality, the design of sampling and analytical protocols, and effective pollution control and management of surface waters.

1,136 citations


Proceedings Article
09 Dec 2003
TL;DR: A new underlying probabilistic model for principal component analysis (PCA) is introduced; if the prior's covariance function constrains the mappings to be linear the model is equivalent to PCA, and it is extended by considering less restrictive covariance functions which allow non-linear mappings.
Abstract: In this paper we introduce a new underlying probabilistic model for principal component analysis (PCA). Our formulation interprets PCA as a particular Gaussian process prior on a mapping from a latent space to the observed data-space. We show that if the prior's covariance function constrains the mappings to be linear, the model is equivalent to PCA; we then extend the model by considering less restrictive covariance functions which allow non-linear mappings. This more general Gaussian process latent variable model (GPLVM) is then evaluated as an approach to the visualisation of high dimensional data for three different data-sets. Additionally our non-linear algorithm can be further kernelised, leading to 'twin kernel PCA' in which a mapping between feature spaces occurs.
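
The linear special case the paper builds on can be seen in the duality between the covariance matrix and the inner-product (Gram) matrix: eigenanalysis of either yields the same principal-component scores. A short numerical sketch of that duality (my construction, assuming nothing from the paper beyond the linear-kernel equivalence):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(30, 10))
Xc = X - X.mean(axis=0)

# primal: project onto the top eigenvector of the covariance matrix
evals, evecs = np.linalg.eigh(Xc.T @ Xc)
scores_primal = Xc @ evecs[:, -1]

# dual: the top eigenvector of the Gram matrix, rescaled, gives the same scores
gvals, gvecs = np.linalg.eigh(Xc @ Xc.T)
scores_dual = gvecs[:, -1] * np.sqrt(gvals[-1])

assert np.allclose(np.abs(scores_primal), np.abs(scores_dual))
```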

843 citations


Journal ArticleDOI
TL;DR: A new technique for deriving simplified principal components is introduced, borrowing the least absolute shrinkage and selection operator (LASSO) that Tibshirani proposed for interpreting multiple regression equations, so that the resulting linear functions have many coefficients that are exactly zero and are therefore easier to interpret.
Abstract: In many multivariate statistical techniques, a set of linear functions of the original p variables is produced. One of the more difficult aspects of these techniques is the interpretation of the linear functions, as these functions usually have nonzero coefficients on all p variables. A common approach is to effectively ignore (treat as zero) any coefficients less than some threshold value, so that the function becomes simple and the interpretation becomes easier for the users. Such a procedure can be misleading. There are alternatives to principal component analysis which restrict the coefficients to a smaller number of possible values in the derivation of the linear functions, or replace the principal components by “principal variables.” This article introduces a new technique, borrowing an idea proposed by Tibshirani in the context of multiple regression where similar problems arise in interpreting regression equations. This approach is the so-called LASSO, the “least absolute shrinkage and selection operator.”
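
Where the abstract warns that thresholding small coefficients can mislead, L1 penalties instead drive coefficients to exactly zero. A hedged illustration using scikit-learn's SparsePCA, a later L1-penalised relative of this idea and not the article's own algorithm:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA, SparsePCA

X = load_iris().data
X = X - X.mean(axis=0)

dense = PCA(n_components=2).fit(X)
sparse = SparsePCA(n_components=2, alpha=1.0, random_state=0).fit(X)

print(np.round(dense.components_, 2))   # nonzero on all 4 variables
print(np.round(sparse.components_, 2))  # many coefficients exactly zero
```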

841 citations


Journal ArticleDOI
TL;DR: This paper proposes a kernel machine-based discriminant analysis method, which deals with the nonlinearity of the face patterns' distribution and effectively solves the so-called "small sample size" (SSS) problem, which exists in most FR tasks.
Abstract: Techniques that can introduce low-dimensional feature representation with enhanced discriminatory power are of paramount importance in face recognition (FR) systems. It is well known that the distribution of face images, under a perceivable variation in viewpoint, illumination or facial expression, is highly nonlinear and complex. It is, therefore, not surprising that linear techniques, such as those based on principal component analysis (PCA) or linear discriminant analysis (LDA), cannot provide reliable and robust solutions to those FR problems with complex face variations. In this paper, we propose a kernel machine-based discriminant analysis method, which deals with the nonlinearity of the face patterns' distribution. The proposed method also effectively solves the so-called "small sample size" (SSS) problem, which exists in most FR tasks. The new algorithm has been tested, in terms of classification error rate performance, on the multiview UMIST face database. Results indicate that the proposed methodology is able to achieve excellent performance with only a very small set of features being used, and its error rate is approximately 34% and 48% of those of two other commonly used kernel FR approaches, the kernel-PCA (KPCA) and the generalized discriminant analysis (GDA), respectively.
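
The paper's own kernel direct discriminant algorithm is not reproduced here; as a hedged sketch of the general recipe it belongs to, one can chain a kernel feature map with a linear discriminant in scikit-learn (the Olivetti faces and the kernel parameters below are stand-ins, not the UMIST setup):

```python
from sklearn.datasets import fetch_olivetti_faces
from sklearn.decomposition import KernelPCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

faces = fetch_olivetti_faces()            # stand-in face dataset (downloads once)
clf = make_pipeline(
    KernelPCA(n_components=100, kernel="rbf", gamma=1e-3),
    LinearDiscriminantAnalysis(),         # linear discriminant on kernel features
)
print(cross_val_score(clf, faces.data, faces.target, cv=5).mean())
```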

651 citations


Proceedings Article
01 Jan 2003
TL;DR: A novel scheme is proposed that uses a robust principal component classifier in intrusion detection problems where the training data may be unsupervised; it outperforms the nearest neighbor method, the density-based local outliers (LOF) approach, and the outlier detection algorithm based on the Canberra metric.
Abstract: This paper proposes a novel scheme that uses a robust principal component classifier in intrusion detection problems where the training data may be unsupervised. Assuming that anomalies can be treated as outliers, an intrusion predictive model is constructed from the major and minor principal components of the normal instances. A measure of the difference of an anomaly from the normal instance is the distance in the principal component space. The distance based on the major components that account for 50% of the total variation and the minor components whose eigenvalues are less than 0.20 is shown to work well. The experiments with KDD Cup 1999 data demonstrate that the proposed method achieves 98.94% in recall and 97.89% in precision with a false alarm rate of 0.92%, and outperforms the nearest neighbor method, the density-based local outliers (LOF) approach, and the outlier detection algorithm based on the Canberra metric.
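
A minimal sketch of the classifier as the abstract describes it, with the 50% major-variation and 0.20 minor-eigenvalue cutoffs taken from the text (everything else is an assumption): fit PCA on normal instances only, then score new points by their variance-scaled distances in the major and minor principal components.

```python
import numpy as np

def fit_pcc(X_normal, major_var=0.50, minor_eig=0.20):
    """Fit the principal component classifier on normal instances only."""
    mean = X_normal.mean(axis=0)
    lam, V = np.linalg.eigh(np.cov(X_normal - mean, rowvar=False))
    lam, V = lam[::-1], V[:, ::-1]                # descending eigenvalues
    q = np.searchsorted(np.cumsum(lam) / lam.sum(), major_var) + 1
    return mean, lam, V, q, lam < minor_eig

def anomaly_score(x, mean, lam, V, q, minor):
    y = V.T @ (x - mean)                          # coordinates in PC space
    major_d = np.sum(y[:q] ** 2 / lam[:q])        # distance in major components
    minor_d = np.sum(y[minor] ** 2 / lam[minor])  # distance in minor components
    return major_d, minor_d   # flag an intrusion if either exceeds a threshold
```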

574 citations


Journal ArticleDOI
TL;DR: In this paper, the authors examine the factor analysis of matrices where the proportion of signal to noise differs greatly between columns (variables) and find that if a few weak variables are scaled to too high a weight in the analysis, the errors in the computed factors grow, possibly obscuring the weakest factor(s) behind the increased noise level.

554 citations


Journal ArticleDOI
TL;DR: The experiments show that an SVM with feature extraction by PCA, KPCA or ICA performs better than one without feature extraction; among the three methods, KPCA yields the best performance, followed by ICA.

524 citations


Journal ArticleDOI
TL;DR: A fast incremental principal component analysis (IPCA) algorithm, called candid covariance-free IPCA (CCIPCA), computes the principal components of a sequence of samples incrementally without estimating the covariance matrix (hence covariance-free).
Abstract: Appearance-based image analysis techniques require fast computation of principal components of high-dimensional image vectors. We introduce a fast incremental principal component analysis (IPCA) algorithm, called candid covariance-free IPCA (CCIPCA), used to compute the principal components of a sequence of samples incrementally without estimating the covariance matrix (so covariance-free). The new method is motivated by the concept of statistical efficiency (the estimate has the smallest variance given the observed data). To do this, it keeps the scale of observations and computes the mean of observations incrementally, which is an efficient estimate for some well known distributions (e.g., Gaussian), although the highest possible efficiency is not guaranteed in our case because of unknown sample distribution. The method is for real-time applications and, thus, it does not allow iterations. It converges very fast for high-dimensional image vectors. Some links between IPCA and the development of the cerebral cortex are also discussed.
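
A sketch of the covariance-free update for the first component, as I read the description above: each sample pulls the running eigenvector estimate toward itself, weighted by its projection, and no covariance matrix is ever formed. The amnesic parameter l is part of the method's usual presentation; l=0 reduces to a plain incremental average. Samples are assumed already mean-centered (the full algorithm also tracks the mean incrementally).

```python
import numpy as np

def ccipca_first_component(samples, l=0.0):
    """Estimate the first principal eigenvector from a stream of centered samples."""
    v = None
    for n, u in enumerate(samples, start=1):
        if v is None:
            v = u.astype(float).copy()            # initialize with the first sample
            continue
        w_old = (n - 1 - l) / n                   # weight on the old estimate
        w_new = (1 + l) / n                       # weight on the new sample
        v = w_old * v + w_new * (u @ v / np.linalg.norm(v)) * u
    return v / np.linalg.norm(v)
```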

479 citations


Journal ArticleDOI
TL;DR: The performance of PLS-DA on published breast cancer data is found to be extremely satisfactory in all cases, and the discriminant cDNA clones often have a sound biological interpretation.
Abstract: Partial least squares discriminant analysis (PLS-DA) is a partial least squares regression of a set Y of binary variables describing the categories of a categorical variable on a set X of predictor variables. It is a compromise between the usual discriminant analysis and a discriminant analysis on the significant principal components of the predictor variables. This technique is specially suited to deal with a much larger number of predictors than observations and with multicollineality, two of the main problems encountered when analysing microarray expression data. We explore the performance of PLS-DA with published data from breast cancer (Perou et al. 2000). Several such analyses were carried out: (1) before vs after chemotherapy treatment, (2) estrogen receptor positive vs negative tumours, and (3) tumour classification. We found that the performance of PLS-DA was extremely satisfactory in all cases and that the discriminant cDNA clones often had a sound biological interpretation. We conclude that PLS-DA is a powerful yet simple tool for analysing microarray data.
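
PLS-DA as the abstract defines it (a PLS regression of dummy-coded class membership on the predictors) fits in a few lines; a hedged sketch with scikit-learn standing in for the authors' software:

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression

def plsda_fit_predict(X_train, y_train, X_test, n_components=3):
    """PLS regression of dummy-coded classes on X; winner-takes-all prediction."""
    classes = np.unique(y_train)
    Y = (y_train[:, None] == classes[None, :]).astype(float)   # dummy coding
    pls = PLSRegression(n_components=n_components).fit(X_train, Y)
    return classes[np.argmax(pls.predict(X_test), axis=1)]
```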

Journal ArticleDOI
01 Sep 2003-Ecology
TL;DR: In this paper, the authors compared the performance of a variety of approaches for assessing the significance of eigenvector coefficients in terms of type I error rates and power, and two novel approaches based on the broken-stick model were also evaluated.
Abstract: Principal component analysis (PCA) is one of the most commonly used tools in the analysis of ecological data. This method reduces the effective dimensionality of a multivariate data set by producing linear combinations of the original variables (i.e., components) that summarize the predominant patterns in the data. In order to provide meaningful interpretations for principal components, it is important to determine which variables are associated with particular components. Some data analysts incorrectly test the statistical significance of the correlation between original variables and multivariate scores using standard statistical tables. Others interpret eigenvector coefficients larger than an arbitrary absolute value (e.g., 0.50). Resampling, randomization techniques, and parallel analysis have been applied in a few cases. In this study, we compared the performance of a variety of approaches for assessing the significance of eigenvector coefficients in terms of type I error rates and power. Two novel approaches based on the broken-stick model were also evaluated. We used a variety of simulated scenarios to examine the influence of the number of real dimensions in the data; unique versus complex variables; the magnitude of eigenvector coefficients; and the number of variables associated with a particular dimension. Our results revealed that bootstrap confidence intervals and a modified bootstrap confidence interval for the broken-stick model proved to be the most reliable techniques.
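
For reference, the broken-stick model the two novel approaches build on, in its familiar eigenvalue form (the paper adapts it to eigenvector coefficients and wraps it in bootstrap confidence intervals): the k-th expected piece of a unit stick broken at random into p pieces is b_k = (1/p) * sum_{i=k}^{p} 1/i, and components whose relative eigenvalues exceed b_k are retained.

```python
import numpy as np

def broken_stick(p):
    """Expected ordered piece lengths of a unit stick broken into p parts."""
    return np.array([np.sum(1.0 / np.arange(k, p + 1)) / p
                     for k in range(1, p + 1)])

eigvals = np.array([2.8, 1.1, 0.6, 0.3, 0.2])     # example PCA eigenvalues
keep = eigvals / eigvals.sum() > broken_stick(len(eigvals))
print(keep)                                       # [ True False False False False]
```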

Journal ArticleDOI
TL;DR: A new automated, unbiased, multivariate statistical analysis technique, based in part on principal components analysis, is applied to very large X-ray spectral image data sets and returns physically accurate component spectra and images in a few minutes on a standard personal computer.
Abstract: Spectral imaging in the scanning electron microscope (SEM) equipped with an energy-dispersive X-ray (EDX) analyzer has the potential to be a powerful tool for chemical phase identification, but the large data sets have, in the past, proved too large to efficiently analyze. In the present work, we describe the application of a new automated, unbiased, multivariate statistical analysis technique to very large X-ray spectral image data sets. The method, based in part on principal components analysis, returns physically accurate (all positive) component spectra and images in a few minutes on a standard personal computer. The efficacy of the technique for microanalysis is illustrated by the analysis of complex multi-phase materials, particulates, a diffusion couple, and a single-pixel-detection problem.

Book ChapterDOI
TL;DR: This chapter extends the stability-based validation of cluster structure and proposes stability as a figure of merit for comparing clustering solutions, thus helping to choose the data representation, similarity measure, and number of clusters.
Abstract: Clustering is one of the most commonly used tools in the analysis of gene expression data (1, 2). The usage in grouping genes is based on the premise that co-expression is a result of co-regulation. It is thus a preliminary step in extracting gene networks and inference of gene function (3, 4). Clustering of experiments can be used to discover novel phenotypic aspects of cells and tissues (3, 5, 6), including sensitivity to drugs (7), and can also detect artifacts of experimental conditions (8). Clustering and its applications in biology are presented in greater detail in the chapter by Zhao and Karypis (see also (9)). While we focus on gene expression data in this chapter, the methodology presented here is applicable for other types of data as well. Clustering is a form of unsupervised learning, i.e. no information on the class variable is assumed, and the objective is to find the "natural" groups in the data. However, most clustering algorithms generate a clustering even if the data has no inherent cluster structure, so external validation tools are required. Given a set of partitions of the data into an increasing number of clusters (e.g. by a hierarchical clustering algorithm, or k-means), such a validation tool will tell the user the number of clusters in the data (if any). Many methods have been proposed in the literature to address this problem (10–15). Recent studies have shown the advantages of sampling-based methods (12, 14). These methods are based on the idea that when a partition has captured the structure in the data, this partition should be stable with respect to perturbation of the data. Bittner et al. (16) used a similar approach to validate clusters representing gene expression of melanoma patients. The emergence of cluster structure depends on several choices: data representation and normalization, the choice of a similarity measure and clustering algorithm. In this chapter we extend the stability-based validation of cluster structure, and propose stability as a figure of merit that is useful for comparing clustering solutions, thus helping in making these choices. We use this framework to demonstrate the ability of Principal Component Analysis (PCA) to extract features relevant to the cluster structure. We use stability as a tool for simultaneously choosing the number of principal components and the number of clusters; we compare the performance of different similarity measures and normalization schemes. The approach is demonstrated through a case study of yeast gene expression data from Eisen et al. (1). For yeast, a functional classification of a large number of genes is known, and we use this classification for validating the results produced by clustering. A method for comparing clustering solutions specifically applicable to gene expression data was introduced in (17). However, it cannot be used to choose the number of clusters, and is not directly applicable in choosing the number of principal components. The results of clustering are easily corrupted by the addition of noise: even a few
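
A small sketch of the stability idea (my construction, not the chapter's exact protocol): cluster two random subsamples and measure how well their labelings agree on the shared points; averaging over many pairs gives a figure of merit for comparing choices of k, of similarity measure, or of the number of retained principal components.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

def stability(X, k, n_pairs=20, frac=0.8, seed=0):
    rng = np.random.default_rng(seed)
    n, scores = len(X), []
    for _ in range(n_pairs):
        i1 = rng.choice(n, int(frac * n), replace=False)
        i2 = rng.choice(n, int(frac * n), replace=False)
        shared = np.intersect1d(i1, i2)           # points both subsamples saw
        a = KMeans(n_clusters=k, n_init=10).fit(X[i1]).predict(X[shared])
        b = KMeans(n_clusters=k, n_init=10).fit(X[i2]).predict(X[shared])
        scores.append(adjusted_rand_score(a, b))
    return float(np.mean(scores))                 # near 1.0 => stable structure
```

Repeating this over a grid of k values (or numbers of retained principal components) and keeping the most stable setting is the model-selection use the chapter proposes.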

Journal ArticleDOI
TL;DR: ICA outperforms other leading methods, such as principal component analysis, k-means clustering and the Plaid model, in constructing functionally coherent clusters on microarray datasets from Saccharomyces cerevisiae, Caenorhabditis elegans and human.
Abstract: We apply linear and nonlinear independent component analysis (ICA) to project microarray data into statistically independent components that correspond to putative biological processes, and to cluster genes according to over- or under-expression in each component. We test the statistical significance of enrichment of gene annotations within clusters. ICA outperforms other leading methods, such as principal component analysis, k-means clustering and the Plaid model, in constructing functionally coherent clusters on microarray datasets from Saccharomyces cerevisiae, Caenorhabditis elegans and human.
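
A hedged sketch of the recipe with FastICA as one concrete ICA algorithm (the component count and the two-standard-deviation threshold below are arbitrary assumptions, and the random matrix stands in for real expression data):

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 40))            # stand-in: genes x experiments

S = FastICA(n_components=10, random_state=0).fit_transform(X)

clusters = {}
for j in range(S.shape[1]):
    s = (S[:, j] - S[:, j].mean()) / S[:, j].std()
    clusters[j] = (np.where(s > 2.0)[0],  # genes over-expressed in component j
                   np.where(s < -2.0)[0]) # genes under-expressed in component j
```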

Journal ArticleDOI
TL;DR: This study shows the importance of environmental monitoring associated with simple but powerful statistics to better understand a complex water system.

Journal ArticleDOI
TL;DR: Unrestricted, unstandardized covariance-based PCA solutions optimize ERP component identification and measurement; interpretability (more distinctive component waveforms with narrow and unambiguous loading peaks) and statistical conclusions (greater effect stability across extraction criteria) were best for unstandardized covariance-based solutions.

Journal ArticleDOI
TL;DR: It is sufficient to find the orthonormal rotation y=Wz of prewhitened sources z=Vx which minimizes the mean squared error of the reconstruction of z from the rectified version y^+ of y; the experiments show in particular the fast convergence of the rotation and geodesic methods.
Abstract: We consider the task of solving the independent component analysis (ICA) problem x=As given observations x, with a constraint of nonnegativity of the source random vector s. We refer to this as nonnegative independent component analysis and we consider methods for solving this task. For independent sources with nonzero probability density function (pdf) p(s) down to s=0 it is sufficient to find the orthonormal rotation y=Wz of prewhitened sources z=Vx, which minimizes the mean squared error of the reconstruction of z from the rectified version y^+ of y. We suggest some algorithms which perform this, both based on a nonlinear principal component analysis (PCA) approach and on a geodesic search method driven by differential geometry considerations. We demonstrate the operation of these algorithms on an image separation problem, which shows in particular the fast convergence of the rotation and geodesic methods, and apply the approach to a musical audio analysis task.
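
A toy sketch of that reconstruction criterion under my own simplifications: two unit-variance nonnegative sources, prewhitening that leaves the (nonnegative) means in place, and a brute-force search over rotation angles in place of the paper's nonlinear-PCA and geodesic updates.

```python
import numpy as np

rng = np.random.default_rng(0)
S = rng.exponential(size=(2, 5000))               # nonnegative sources
S /= S.std(axis=1, keepdims=True)                 # unit variance
X = np.array([[1.0, 0.4], [0.3, 1.0]]) @ S        # observed mixtures x = As

d, E = np.linalg.eigh(np.cov(X))
V = E @ np.diag(d ** -0.5) @ E.T                  # whitening matrix
Z = V @ X                                         # prewhitened (not centered)

def recon_error(theta):
    c, s = np.cos(theta), np.sin(theta)
    W = np.array([[c, -s], [s, c]])               # orthonormal rotation
    Y_plus = np.maximum(W @ Z, 0.0)               # rectified outputs y+
    return np.mean((Z - W.T @ Y_plus) ** 2)

thetas = np.linspace(0.0, 2 * np.pi, 720)
theta_best = thetas[np.argmin([recon_error(t) for t in thetas])]
```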

Journal ArticleDOI
TL;DR: The minimum classification error (MCE) training algorithm (originally proposed for optimizing classifiers) is investigated for feature extraction, and a generalized MCE (GMCE) training algorithm is proposed to mend the shortcomings of the MCE training algorithm.

Proceedings ArticleDOI
13 Oct 2003
TL;DR: This work proposes a new approach to mapping face images into a subspace obtained by locality preserving projections (LPP) for face analysis, which provides a better representation and achieves lower error rates in face recognition.
Abstract: We have demonstrated that the face recognition performance can be improved significantly in low dimensional linear subspaces. Conventionally, principal component analysis (PCA) and linear discriminant analysis (LDA) are considered effective in deriving such a face subspace. However, both of them effectively see only the Euclidean structure of face space. We propose a new approach to mapping face images into a subspace obtained by locality preserving projections (LPP) for face analysis. We call this Laplacianface approach. Different from PCA and LDA, LPP finds an embedding that preserves local information, and obtains a face space that best detects the essential manifold structure. In this way, the unwanted variations resulting from changes in lighting, facial expression, and pose may be eliminated or reduced. We compare the proposed Laplacianface approach with eigenface and fisherface methods on three test datasets. Experimental results show that the proposed Laplacianface approach provides a better representation and achieves lower error rates in face recognition.
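
LPP itself reduces to a generalized eigenproblem, and a compact sketch of the standard recipe follows (parameter choices and the small ridge term are my assumptions; for raw face images one would normally reduce dimension with PCA first):

```python
import numpy as np
from scipy.linalg import eigh
from sklearn.neighbors import kneighbors_graph

def lpp(X, n_components=10, k=5, t=1.0):
    """Locality preserving projections; returns a (d x n_components) matrix."""
    G = kneighbors_graph(X, k, mode="distance").toarray()
    W = np.where(G > 0, np.exp(-G**2 / t), 0.0)   # heat-kernel affinities
    W = np.maximum(W, W.T)                        # symmetrize the kNN graph
    D = np.diag(W.sum(axis=1))
    L = D - W                                     # graph Laplacian
    A = X.T @ L @ X
    B = X.T @ D @ X + 1e-6 * np.eye(X.shape[1])   # small ridge for stability
    vals, vecs = eigh(A, B)                       # generalized eigenproblem
    return vecs[:, :n_components]                 # smallest eigenvalues first
```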

Journal ArticleDOI
TL;DR: It is shown that automatic wavelet reduction yields better or comparable classification accuracy for hyperspectral data, while achieving substantial computational savings.
Abstract: Hyperspectral imagery provides richer information about materials than multispectral imagery. The new larger data volumes from hyperspectral sensors present a challenge for traditional processing techniques. For example, the identification of each ground surface pixel by its corresponding spectral signature is still difficult because of the immense volume of data. Conventional classification methods may not be used without dimension reduction preprocessing. This is due to the curse of dimensionality, which refers to the fact that the sample size needed to estimate a function of several variables to a given degree of accuracy grows exponentially with the number of variables. Principal component analysis (PCA) has been the technique of choice for dimension reduction. However, PCA is computationally expensive and does not eliminate anomalies that can be seen at one arbitrary band. Spectral data reduction using automatic wavelet decomposition could be useful. This is because it preserves the distinctions among spectral signatures. It is also computed in automatic fashion and can filter data anomalies. This is due to the intrinsic properties of wavelet transforms that preserves high- and low-frequency features, therefore preserving peaks and valleys found in typical spectra. Compared to PCA, for the same level of data reduction, we show that automatic wavelet reduction yields better or comparable classification accuracy for hyperspectral data, while achieving substantial computational savings.
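
A hedged sketch of per-pixel wavelet reduction with PyWavelets standing in for the authors' implementation (the paper's automatic level selection is not reproduced; wavelet and level below are assumptions): each spectrum is replaced by the low-frequency approximation coefficients of its discrete wavelet decomposition.

```python
import numpy as np
import pywt

def wavelet_reduce(cube, wavelet="db4", level=3):
    """cube: (rows, cols, bands) hyperspectral image -> reduced band count."""
    r, c, b = cube.shape
    flat = cube.reshape(-1, b)
    reduced = np.array([pywt.wavedec(s, wavelet, level=level)[0]
                        for s in flat])           # keep approximation only
    return reduced.reshape(r, c, -1)
```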

Journal ArticleDOI
TL;DR: The proposed face recognition technique is based on the implementation of the principal component analysis algorithm and the extraction of depth and colour eigenfaces. Experimental results show significant gains attained with the addition of depth information.

Journal ArticleDOI
TL;DR: This result indicates that the best practical combination is PCA with SVM for face recognition, as the training time for ICA is much larger than that of PCA.

Journal ArticleDOI
TL;DR: A comparison between NMF, WNMF and the well-known principal component analysis (PCA) in the context of image patch classification has been carried out and it is claimed that all three techniques can be combined in a common and unique classifier.

Journal ArticleDOI
TL;DR: In this paper, a factor analysis method that can resist the effect of outliers is proposed. The method is based on a highly robust initial covariance estimator, after which the factors can be obtained from maximum likelihood or from principal factor analysis (PFA).

Proceedings ArticleDOI
24 Nov 2003
TL;DR: This paper presents quaternion matrix algebra techniques that can be used to process the eigen analysis of a color image and introduces the extension of two classical techniques to their quaternionic case: singular value decomposition (SVD) and Karhunen-Loeve transform (KLT).
Abstract: In this paper, we present quaternion matrix algebra techniques that can be used to perform the eigen analysis of a color image. Applications of principal component analysis (PCA) in image processing are numerous, and the proposed tools aim to give material for color image processing that takes the particular nature of color images into account. For this purpose, we use the quaternion model for color images and introduce the extension of two classical techniques to their quaternionic case: singular value decomposition (SVD) and Karhunen-Loeve transform (KLT). For the quaternionic version of the KLT, we also introduce the problem of eigenvalue decomposition (EVD) of a quaternion matrix. We give the properties of these quaternion tools for color images and present their behavior on natural images. We also present a method to compute the decompositions using complex matrix algebra. Finally, we start a discussion on possible applications of the proposed techniques in color image processing.
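
The complex-matrix route mentioned at the end is the standard complex adjoint construction; a sketch (my code, not the paper's): a quaternion matrix Q = A + Bj, with A and B complex, is represented by the complex matrix [[A, B], [-conj(B), conj(A)]], and the ordinary SVD of that adjoint yields the quaternion singular values, each appearing twice.

```python
import numpy as np

def quaternion_adjoint(A, B):
    """Complex adjoint of the quaternion matrix Q = A + B j."""
    return np.block([[A, B], [-B.conj(), A.conj()]])

# A color image as a pure quaternion matrix q = R i + G j + B k,
# i.e. A = i*R and B = G + i*B in the A + B j decomposition.
rng = np.random.default_rng(0)
R, G, Bc = rng.random((3, 4, 4))
chi = quaternion_adjoint(1j * R, G + 1j * Bc)

svals = np.linalg.svd(chi, compute_uv=False)  # quaternion singular values, twice each
```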

Proceedings Article
03 Jan 2003
TL;DR: An alternating least squares method is derived to estimate the basis vectors and generalized linear coefficients of the logistic PCA model, a generalized linear model for dimensionality reduction of binary data that is related to principal component analysis (PCA) and is much better suited to modeling binary data than conventional PCA.
Abstract: We investigate a generalized linear model for dimensionality reduction of binary data. The model is related to principal component analysis (PCA) in the same way that logistic regression is related to linear regression. Thus we refer to the model as logistic PCA. In this paper, we derive an alternating least squares method to estimate the basis vectors and generalized linear coefficients of the logistic PCA model. The resulting updates have a simple closed form and are guaranteed at each iteration to improve the model’s likelihood. We evaluate the performance of logistic PCA—as measured by reconstruction error rates—on data sets drawn from four real world applications. In general, we find that logistic PCA is much better suited to modeling binary data than conventional PCA.
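
Not the paper's alternating least squares updates, but a hedged gradient sketch of the model itself: binary data X is modeled as P(X=1) = sigmoid(U V^T), and the Bernoulli log-likelihood is climbed directly (learning rate and iteration count are arbitrary assumptions).

```python
import numpy as np

def logistic_pca(X, k=2, lr=0.1, n_iter=500, seed=0):
    """Fit P(X=1) = sigmoid(U V^T) to a binary matrix X by gradient ascent."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    U = 0.01 * rng.normal(size=(n, k))    # latent coordinates
    V = 0.01 * rng.normal(size=(d, k))    # basis vectors
    for _ in range(n_iter):
        theta = np.clip(U @ V.T, -30, 30)
        P = 1.0 / (1.0 + np.exp(-theta))  # predicted Bernoulli means
        E = X - P                         # d(log-likelihood)/d(theta)
        U, V = U + lr * (E @ V) / d, V + lr * (E.T @ U) / n
    return U, V
```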

Proceedings ArticleDOI
24 Aug 2003
TL;DR: This paper presents an alternative clustering-based methodology for the discovery of climate indices that overcomes the limitations of eigenvalue analysis techniques and is based on clusters that represent regions with relatively homogeneous behavior, and shows that cluster-based indices generally outperform SVD-derived indices.
Abstract: To analyze the effect of the oceans and atmosphere on land climate, Earth Scientists have developed climate indices, which are time series that summarize the behavior of selected regions of the Earth's oceans and atmosphere. In the past, Earth scientists have used observation and, more recently, eigenvalue analysis techniques, such as principal components analysis (PCA) and singular value decomposition (SVD), to discover climate indices. However, eigenvalue techniques are only useful for finding a few of the strongest signals. Furthermore, they impose a condition that all discovered signals must be orthogonal to each other, making it difficult to attach a physical interpretation to them. This paper presents an alternative clustering-based methodology for the discovery of climate indices that overcomes these limitations and is based on clusters that represent regions with relatively homogeneous behavior. The centroids of these clusters are time series that summarize the behavior of the ocean or atmosphere in those regions. Some of these centroids correspond to known climate indices and provide a validation of our methodology; other centroids are variants of known indices that may provide better predictive power for some land areas; and still other indices may represent potentially new Earth science phenomena. Finally, we show that cluster based indices generally outperform SVD derived indices, both in terms of area weighted correlation and direct correlation with the known indices.
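
A small sketch of the clustering side of the methodology (pipeline and parameters are my assumptions, with random data standing in for gridded sea-surface observations): k-means over grid-cell time series, cluster centroids as candidate indices, then correlation against a known index.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
sst = rng.normal(size=(2000, 240))        # stand-in: ocean grid cells x months
sst = (sst - sst.mean(axis=1, keepdims=True)) / sst.std(axis=1, keepdims=True)

km = KMeans(n_clusters=30, n_init=10, random_state=0).fit(sst)
indices = km.cluster_centers_             # candidate climate-index time series

known = rng.normal(size=240)              # stand-in for a known index series
corr = [abs(np.corrcoef(ci, known)[0, 1]) for ci in indices]
best = int(np.argmax(corr))               # centroid best matching the known index
```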

Journal ArticleDOI
TL;DR: A supervised method for locating the origin of the cone based on identification of clusters in the data is presented, and the effects of proper origin orientation are illustrated.
Abstract: A new pseudocolor mapping strategy for use with spectral imagery is presented. This strategy is based on a principal components analysis of spectral data, and it capitalizes on the similarities between three-color human vision and high-dimensional hyperspectral datasets. The mapping is closely related to three-dimensional versions of scatter plots that are commonly used in remote sensing to visualize the data cloud. The transformation results in final images where the color assigned to each pixel is solely determined by the position within the data cloud. Materials with similar spectral characteristics are presented in similar hues, and basic classification and clustering decisions can be made by the observer. Final images tend to have large regions of desaturated pixels that make the image more readily interpretable. The data cloud is shown to be conical in nature, and materials with common spectral signatures radiate from the origin of the cone, which is not (in general) at the origin of the spectral data. A supervised method for locating the origin of the cone based on identification of clusters in the data is presented, and the effects of proper origin orientation are illustrated.
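
A minimal sketch of the core mapping, simplified from the strategy described (in particular, the supervised cone-origin step is omitted): the first three principal component scores of each pixel's spectrum become its R, G, B channels.

```python
import numpy as np

def pca_pseudocolor(cube):
    """cube: (rows, cols, bands) -> (rows, cols, 3) RGB in [0, 1]."""
    r, c, b = cube.shape
    X = cube.reshape(-1, b)
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[:3].T                        # first three PC scores
    lo, hi = scores.min(axis=0), scores.max(axis=0)
    rgb = (scores - lo) / (hi - lo)               # rescale each channel
    return rgb.reshape(r, c, 3)
```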

Journal ArticleDOI
TL;DR: The fluorescence excitation-emission wavelengths identified as being diagnostic from the PCA-SVM algorithm suggest that the important fluorophores for breast cancer diagnosis are most likely tryptophan, NAD(P)H and flavoproteins.
Abstract: Nonmalignant (n = 36) and malignant (n = 20) tissue samples were obtained from breast cancer and breast reduction surgeries. These tissues were characterized using multiple excitation wavelength fluorescence spectroscopy and diffuse reflectance spectroscopy in the ultraviolet-visible wavelength range, immediately after excision. Spectra were then analyzed using principal component analysis (PCA) as a data reduction technique. PCA was performed on each fluorescence spectrum, as well as on the diffuse reflectance spectrum individually, to establish a set of principal components for each spectrum. A Wilcoxon rank-sum test was used to determine which principal components show statistically significant differences between malignant and nonmalignant tissues. Finally, a support vector machine (SVM) algorithm was utilized to classify the samples based on the diagnostically useful principal components. Cross-validation of this nonparametric algorithm was carried out to determine its classification accuracy in an unbiased manner. Multiexcitation fluorescence spectroscopy was successful in discriminating malignant and nonmalignant tissues, with a sensitivity and specificity of 70% and 92%, respectively. The sensitivity (30%) and specificity (78%) of diffuse reflectance spectroscopy alone was significantly lower. Combining fluorescence and diffuse reflectance spectra did not improve the classification accuracy of an algorithm based on fluorescence spectra alone. The fluorescence excitation-emission wavelengths identified as being diagnostic from the PCA-SVM algorithm suggest that the important fluorophores for breast cancer diagnosis are most likely tryptophan, NAD(P)H and flavoproteins.
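
A hedged sketch of the pipeline as described, with scikit-learn and SciPy standing in for the authors' tools and random arrays standing in for the measured spectra (the 0.05 screening threshold is an assumption): PCA scores per spectrum, a Wilcoxon rank-sum screen for diagnostic components, then a cross-validated SVM on the retained scores.

```python
import numpy as np
from scipy.stats import ranksums
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
spectra = rng.normal(size=(56, 300))      # stand-in for 56 tissue spectra
labels = np.array([0] * 36 + [1] * 20)    # nonmalignant vs malignant

scores = PCA(n_components=10).fit_transform(spectra)
keep = [j for j in range(scores.shape[1])
        if ranksums(scores[labels == 0, j], scores[labels == 1, j]).pvalue < 0.05]
keep = keep or list(range(scores.shape[1]))   # fallback for this toy data

acc = cross_val_score(SVC(kernel="linear"), scores[:, keep], labels, cv=5)
print(acc.mean())
```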