Journal ArticleDOI

Partial least-squares regression: a tutorial

01 Jan 1986-Analytica Chimica Acta (Elsevier)-Vol. 185, pp 1-17
TL;DR: This paper provides a tutorial on the partial least squares (PLS) regression method, presents an algorithm for predictive PLS, and gives some practical hints for its use.
About: This article is published in Analytica Chimica Acta. The article was published on 1986-01-01 and has received 6,393 citations to date. The article focuses on the topics: Partial least squares regression & Regression analysis.
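A minimal sketch of what a predictive PLS fit looks like in practice, using scikit-learn's PLSRegression; the synthetic data and the choice of three components are illustrative assumptions, not taken from the paper:

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression

# Illustrative synthetic data: 50 samples, 10 collinear predictors (assumed setup).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10))
X[:, 5:] = X[:, :5] + 0.05 * rng.normal(size=(50, 5))  # induce collinearity
y = X[:, 0] - 2 * X[:, 1] + 0.1 * rng.normal(size=50)

# Fit a 3-component PLS model; in practice the number of components
# is chosen by cross-validation, as the tutorial recommends.
pls = PLSRegression(n_components=3)
pls.fit(X, y)
print(pls.score(X, y))     # R^2 of the fitted model
print(pls.predict(X[:5]))  # predictions for the first five samples
```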
Citations
Journal ArticleDOI
TL;DR: Principal component analysis is a multivariate exploratory analysis method useful for separating systematic variation from noise and for defining a space of reduced dimensionality that preserves the systematic variation.

8,660 citations
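The separation of systematic variation from noise that this TL;DR describes can be sketched with scikit-learn's PCA; the low-rank-plus-noise data below are an illustrative assumption:

```python
import numpy as np
from sklearn.decomposition import PCA

# Illustrative data: 100 samples in 6 dimensions whose systematic variation
# lives on a 2-dimensional subspace, plus small isotropic noise (assumed setup).
rng = np.random.default_rng(1)
scores = rng.normal(size=(100, 2))
loadings = rng.normal(size=(2, 6))
X = scores @ loadings + 0.05 * rng.normal(size=(100, 6))

pca = PCA()
pca.fit(X)
# The first two components capture the systematic variation;
# the remaining ones mostly describe noise.
print(pca.explained_variance_ratio_.round(3))
```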

Book
17 May 2013
TL;DR: This book presents general strategies for building predictive models, covering both regression and classification models together with other practical considerations.
Abstract: General Strategies.- Regression Models.- Classification Models.- Other Considerations.- Appendix.- References.- Indices.

3,672 citations


Cites background from "Partial least-squares regression: a..."

  • ...9, and a thorough explanation of the algorithm can be found in Geladi and Kowalski (1986). To obtain a better understanding of the algorithm’s function, Stone and Brooks (1990) linked it to well-known statistical concepts of covariance and regression....


Journal ArticleDOI
TL;DR: The NLPCA method is demonstrated using time-dependent, simulated batch reaction data and shows that it successfully reduces dimensionality and produces a feature space map resembling the actual distribution of the underlying system parameters.
Abstract: Nonlinear principal component analysis is a novel technique for multivariate data analysis, similar to the well-known method of principal component analysis. NLPCA, like PCA, is used to identify and remove correlations among problem variables as an aid to dimensionality reduction, visualization, and exploratory data analysis. While PCA identifies only linear correlations between variables, NLPCA uncovers both linear and nonlinear correlations, without restriction on the character of the nonlinearities present in the data. NLPCA operates by training a feedforward neural network to perform the identity mapping, where the network inputs are reproduced at the output layer. The network contains an internal “bottleneck” layer (containing fewer nodes than input or output layers), which forces the network to develop a compact representation of the input data, and two additional hidden layers. The NLPCA method is demonstrated using time-dependent, simulated batch reaction data. Results show that NLPCA successfully reduces dimensionality and produces a feature space map resembling the actual distribution of the underlying system parameters.

2,643 citations
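The bottleneck architecture described in this abstract (a feedforward network trained on the identity mapping) can be approximated with scikit-learn's MLPRegressor; the layer sizes and the curved synthetic data below are assumptions for illustration, not the paper's batch-reaction setup:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Illustrative nonlinear data: points on a noisy one-dimensional curve
# embedded in three dimensions (an assumed stand-in for the paper's data).
rng = np.random.default_rng(2)
t = rng.uniform(-1, 1, size=(200, 1))
X = np.hstack([t, t**2, np.sin(3 * t)]) + 0.01 * rng.normal(size=(200, 3))

# Identity mapping through a bottleneck: mapping layer (10 nodes),
# bottleneck (1 node), de-mapping layer (10 nodes), mirroring the
# mapping/bottleneck/de-mapping structure the abstract describes.
net = MLPRegressor(hidden_layer_sizes=(10, 1, 10), activation="tanh",
                   max_iter=5000, random_state=0)
net.fit(X, X)           # inputs reproduced at the output layer
print(net.score(X, X))  # reconstruction quality (R^2)
```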

Journal ArticleDOI
TL;DR: Partial least squares (PLS), as discussed by the authors, is one of the most popular methods for multivariate spectral analysis; PLS calibration is shown to be composed of a series of simplified classical least-squares (CLS) and inverse least-squares (ILS) steps.
Abstract: Partial least-squares (PLS) methods for spectral analyses are related to other multivariate calibration methods such as classical least-squares (CLS), inverse least-squares (ILS), and principal component regression (PCR) methods which have often been used in quantitative spectral analyses. The PLS method which analyzes one chemical component at a time is presented, and the basis for each step in the algorithm is explained. PLS calibration is shown to be composed of a series of simplified CLS and ILS steps. This detailed understanding of the PLS algorithm has helped to identify how chemically interpretable qualitative spectral information can be obtained from the intermediate steps of the PLS algorithm. These methods for extracting qualitative information are demonstrated by use of simulated spectral data. The qualitative information directly available from the PLS analysis is superior to that obtained from PCR but is not as complete as that which can be generated during CLS analyses. Methods are presented for selecting optimal numbers of loading vectors for both the PLS and PCR models in order to optimize the model while simultaneously reducing the potential for overfitting the calibration data. Outlier detection and methods to evaluate the statistical significance of results obtained from the different calibration methods applied to the same spectral data are also discussed.

2,443 citations
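Selecting the optimal number of loading vectors to limit overfitting, as the abstract above describes, is typically done by cross-validation; a minimal sketch follows, with synthetic calibration data and the candidate range as assumptions:

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import cross_val_score

# Illustrative calibration data: 40 "spectra" of 60 channels driven by
# 2 latent factors (assumed stand-in for real calibration spectra).
rng = np.random.default_rng(3)
T = rng.normal(size=(40, 2))
X = T @ rng.normal(size=(2, 60)) + 0.05 * rng.normal(size=(40, 60))
y = T @ np.array([1.0, -0.5]) + 0.05 * rng.normal(size=40)

# Score each candidate number of loading vectors by cross-validated R^2
# and keep the best-scoring model to limit overfitting.
scores = {k: cross_val_score(PLSRegression(n_components=k), X, y, cv=5).mean()
          for k in range(1, 8)}
best = max(scores, key=scores.get)
print(scores, "-> chosen:", best)
```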

Book
01 Oct 2010
TL;DR: Partial least squares (PLS) was not originally designed as a tool for statistical discrimination as discussed by the authors, but applied scientists routinely use PLS for classification and there is substantial empirical evidence to suggest that it performs well in that role.
Abstract: Partial least squares (PLS) was not originally designed as a tool for statistical discrimination. In spite of this, applied scientists routinely use PLS for classification and there is substantial empirical evidence to suggest that it performs well in that role. The interesting question is: why can a procedure that is principally designed for overdetermined regression problems locate and emphasize group structure? Using PLS in this manner has heuristic support owing to the relationship between PLS and canonical correlation analysis (CCA) and the relationship, in turn, between CCA and linear discriminant analysis (LDA). This paper replaces the heuristics with a formal statistical explanation. As a consequence, it will become clear that PLS is to be preferred over PCA when discrimination is the goal and dimension reduction is needed. Copyright © 2003 John Wiley & Sons, Ltd.

2,067 citations
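Using PLS for discrimination in the way this abstract describes amounts to regressing class-indicator variables on the predictors and assigning each sample to the class with the largest predicted indicator; a minimal sketch, where the two-group data and component count are assumptions:

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression

# Illustrative two-group data with many correlated predictors (assumed setup).
rng = np.random.default_rng(4)
X = np.vstack([rng.normal(0.0, 1, size=(30, 20)),
               rng.normal(0.8, 1, size=(30, 20))])
labels = np.array([0] * 30 + [1] * 30)

# PLS-DA style use: encode classes as indicator columns and regress them on X.
Y = np.eye(2)[labels]
pls = PLSRegression(n_components=2).fit(X, Y)
pred = pls.predict(X).argmax(axis=1)  # assign the class with largest indicator
print((pred == labels).mean())        # training accuracy of the sketch
```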

References
Book
01 Jan 1966
TL;DR: This book covers fitting a straight line by least squares, checking the straight-line fit, and extensions to the general regression situation, including the Durbin-Watson test for serial correlation in the residuals.
Abstract: Basic Prerequisite Knowledge. Fitting a Straight Line by Least Squares. Checking the Straight Line Fit. Fitting Straight Lines: Special Topics. Regression in Matrix Terms: Straight Line Case. The General Regression Situation. Extra Sums of Squares and Tests for Several Parameters Being Zero. Serial Correlation in the Residuals and the Durbin--Watson Test. More of Checking Fitted Models. Multiple Regression: Special Topics. Bias in Regression Estimates, and Expected Values of Mean Squares and Sums of Squares. On Worthwhile Regressions, Big F's, and R 2 . Models Containing Functions of the Predictors, Including Polynomial Models. Transformation of the Response Variable. "Dummy" Variables. Selecting the "Best" Regression Equation. Ill--Conditioning in Regression Data. Ridge Regression. Generalized Linear Models (GLIM). Mixture Ingredients as Predictor Variables. The Geometry of Least Squares. More Geometry of Least Squares. Orthogonal Polynomials and Summary Data. Multiple Regression Applied to Analysis of Variance Problems. An Introduction to Nonlinear Estimation. Robust Regression. Resampling Procedures (Bootstrapping). Bibliography. True/False Questions. Answers to Exercises. Tables. Indexes.

18,952 citations
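Fitting a straight line by least squares, the book's starting point, reduces to solving the normal equations for intercept and slope; a minimal numpy sketch, with the data an illustrative assumption:

```python
import numpy as np

# Illustrative data from an assumed true line y = 2 + 0.5 x plus noise.
rng = np.random.default_rng(5)
x = np.linspace(0, 10, 25)
y = 2.0 + 0.5 * x + rng.normal(scale=0.3, size=x.size)

# Least-squares fit: solve for intercept and slope via the design matrix.
A = np.column_stack([np.ones_like(x), x])
(b0, b1), *_ = np.linalg.lstsq(A, y, rcond=None)
print(f"intercept={b0:.3f}, slope={b1:.3f}")
```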

Book
08 Jul 1980
TL;DR: This book presents methods for detecting influential observations and outliers in regression data and for detecting and assessing collinearity, together with applications, remedies, and directions for extensions.
Abstract: 1. Introduction and Overview. 2. Detecting Influential Observations and Outliers. 3. Detecting and Assessing Collinearity. 4. Applications and Remedies. 5. Research Issues and Directions for Extensions. Bibliography. Author Index. Subject Index.

6,449 citations
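The collinearity diagnostics this book develops are based on the singular values of the column-scaled design matrix; the following sketches a Belsley-style condition-index computation, with the nearly dependent data an assumption:

```python
import numpy as np

# Illustrative design matrix with two nearly dependent columns (assumed data).
rng = np.random.default_rng(6)
x1 = rng.normal(size=50)
X = np.column_stack([x1, x1 + 0.01 * rng.normal(size=50), rng.normal(size=50)])

# Scale columns to unit length, then form condition indices as ratios of the
# largest singular value to each singular value; large values flag collinearity.
Xs = X / np.linalg.norm(X, axis=0)
s = np.linalg.svd(Xs, compute_uv=False)
print("condition indices:", (s[0] / s).round(1))
```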

Journal ArticleDOI
TL;DR: In this article, the use of partial least squares (PLS) for handling collinearities among the independent variables X in multiple regression is discussed; successive estimates are obtained by using the residuals from the previous rank as a new dependent variable y.
Abstract: The use of partial least squares (PLS) for handling collinearities among the independent variables X in multiple regression is discussed. Consecutive estimates (rank 1, 2, ...) are obtained using the residuals from the previous rank as a new dependent variable y. The PLS method is equivalent to the conjugate gradient method used in numerical analysis for related problems. To estimate the "optimal" rank, cross-validation is used. Jackknife estimates of the standard errors are thereby obtained with no extra computation. The PLS method is compared with ridge regression and principal components regression on a chemical example of modelling the relation between the measured biological activity and variables describing the chemical structure of a set of substituted phenethylamines.

2,290 citations
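The comparison this abstract describes, PLS against ridge regression and principal components regression on collinear data, can be sketched with cross-validated scores; the synthetic data stand in for the substituted-phenethylamine example and are an assumption:

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.decomposition import PCA
from sklearn.linear_model import Ridge, LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Illustrative collinear data driven by 3 latent factors (assumed setup).
rng = np.random.default_rng(7)
T = rng.normal(size=(60, 3))
X = T @ rng.normal(size=(3, 15)) + 0.05 * rng.normal(size=(60, 15))
y = T @ np.array([1.0, -1.0, 0.5]) + 0.1 * rng.normal(size=60)

# Cross-validated R^2 for PLS, ridge regression, and PCR.
models = {
    "PLS (rank 3)": PLSRegression(n_components=3),
    "ridge": Ridge(alpha=1.0),
    "PCR (3 PCs)": make_pipeline(PCA(n_components=3), LinearRegression()),
}
for name, model in models.items():
    print(name, cross_val_score(model, X, y, cv=5).mean().round(3))
```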

Journal ArticleDOI

1,602 citations


"Partial least-squares regression: a..." refers background in this paper

  • ...Score vectors corresponding to small eigenvalues can be left out in order to avoid collinearity problems from influencing the solution [9]....


Journal ArticleDOI
TL;DR: In this paper, partial least-squares analysis in latent variables has been used for the analysis of mixture components with low spectral selectivity, namely, in the ultraviolet, visible, and infrared spectral range.
Abstract: Quantitative spectrometric analysis of mixture components is featured for systems with low spectral selectivity, namely, in the ultraviolet, visible, and infrared spectral ranges. Limitations imposed by data reduction schemes based on ordinary multiple regression are shown to be overcome by means of partial least-squares analysis in latent variables. The influences of variables such as noise, band separation, band intensity ratios, number of wavelengths, number of components, number of calibration mixtures, time drift, or deviations from Beer's law on the analytical result have been evaluated under a wide range of conditions, providing a basis to search for new systems applicable to spectrophotometric multicomponent analysis. The practical utility of the method is demonstrated for the simultaneous analysis of copper, nickel, cobalt, iron, and palladium down to 2 × 10^-6 M concentrations by use of their diethyldithiocarbamate chelate complexes, with relative errors less than 6%. 26 references, 4 figures, 6 tables.

268 citations
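Multicomponent calibration of the kind this abstract describes can be sketched by regressing several analyte concentrations at once on overlapping spectra; everything below (Gaussian bands, concentration ranges, noise level) is an illustrative assumption, not the paper's chelate system:

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression

# Simulate overlapping Gaussian bands for three analytes (assumed spectra).
rng = np.random.default_rng(8)
wl = np.linspace(0, 1, 80)
bands = np.stack([np.exp(-((wl - c) / 0.12) ** 2) for c in (0.35, 0.5, 0.65)])

# Calibration mixtures: random concentrations, Beer's-law mixing plus noise.
C = rng.uniform(0, 1, size=(30, 3))
S = C @ bands + 0.01 * rng.normal(size=(30, wl.size))

# PLS model in latent variables predicting all three concentrations at once.
pls = PLSRegression(n_components=3).fit(S, C)
C_test = rng.uniform(0, 1, size=(5, 3))
S_test = C_test @ bands + 0.01 * rng.normal(size=(5, wl.size))
print(np.abs(pls.predict(S_test) - C_test).max())  # worst absolute error
```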