# PLS-regression: a basic tool of chemometrics

Abstract: PLS-regression (PLSR) is the PLS approach in its simplest and, in chemistry and technology, most used form (two-block predictive PLS). PLSR is a method for relating two data matrices, X and Y, by a linear multivariate model, but it goes beyond traditional regression in that it also models the structure of X and Y. PLSR derives its usefulness from its ability to analyze data with many, noisy, collinear, and even incomplete variables in both X and Y. PLSR has the desirable property that the precision of the model parameters improves with the increasing number of relevant variables and observations. This article reviews PLSR as it has developed to become a standard tool in chemometrics, used in chemistry and engineering. The underlying model and its assumptions are discussed, and commonly used diagnostics are reviewed together with the interpretation of resulting parameters. Two examples are used as illustrations: First, a Quantitative Structure-Activity Relationship (QSAR)/Quantitative Structure-Property Relationship (QSPR) data set of peptides is used to outline how to develop, interpret and refine a PLSR model. Second, a data set from the manufacturing of recycled paper is analyzed to illustrate time series modelling of process data by means of PLSR and time-lagged X-variables.

Topics: Partial least squares regression (54%)
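
The two ideas named in the abstract above, a linear model that relates X to Y through a few latent components and the use of time-lagged X-variables, can be illustrated with a minimal sketch. It assumes a single response (PLS1) fitted with the NIPALS iteration on mean-centered, unscaled data; the function names `pls1_nipals`, `pls1_predict` and `lagged_matrix` are hypothetical and are not taken from the article, which may present the algorithm and its preprocessing differently.

```python
import numpy as np


def pls1_nipals(X, y, n_components):
    """Fit a single-response PLS model (PLS1) with the NIPALS iteration.

    Assumption of this sketch: data are only mean-centered (no scaling).
    Returns the regression coefficients plus the means used for centering.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float).ravel()
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    n_vars = X.shape[1]
    W = np.zeros((n_vars, n_components))  # X weights
    P = np.zeros((n_vars, n_components))  # X loadings
    q = np.zeros(n_components)            # y loadings
    for a in range(n_components):
        w = Xk.T @ yk                     # direction of maximal covariance with y
        w /= np.linalg.norm(w)
        t = Xk @ w                        # scores of this component
        tt = t @ t
        p = Xk.T @ t / tt
        qa = yk @ t / tt
        Xk = Xk - np.outer(t, p)          # deflate X
        yk = yk - qa * t                  # deflate y
        W[:, a], P[:, a], q[a] = w, p, qa
    # Coefficients expressed in terms of the original (centered) X variables
    B = W @ np.linalg.solve(P.T @ W, q)
    return B, x_mean, y_mean


def pls1_predict(X, B, x_mean, y_mean):
    """Predict the response for new rows of X."""
    return (np.asarray(X, dtype=float) - x_mean) @ B + y_mean


def lagged_matrix(X, max_lag):
    """Stack X(t), X(t-1), ..., X(t-max_lag) side by side.

    The first max_lag rows are dropped because their lags are undefined.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    return np.hstack([X[max_lag - k: n - k] for k in range(max_lag + 1)])
```

For process data, one would typically build `lagged_matrix(X, max_lag)`, align `y` by dropping its first `max_lag` entries, and pass both to `pls1_nipals`. The number of components is a tuning choice, usually made by cross-validation.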

##### Citations

••

TL;DR: The authors conclude that PLS-SEM path modeling, if appropriately applied, is indeed a "silver bullet" for estimating causal models in many theoretical models and empirical data situations.

Abstract: Structural equation modeling (SEM) has become a quasi-standard in marketing and management research when it comes to analyzing the cause-effect relations between latent constructs. For most researchers, SEM is equivalent to carrying out covariance-based SEM (CB-SEM). While marketing researchers have a basic understanding of CB-SEM, most of them are only barely familiar with the other useful approach to SEM: partial least squares SEM (PLS-SEM). The current paper reviews PLS-SEM and its algorithm, and provides an overview of when it can be most appropriately applied, indicating its potential and limitations for future research. The authors conclude that PLS-SEM path modeling, if appropriately applied, is indeed a "silver bullet" for estimating causal models in many theoretical models and empirical data situations.

9,205 citations

••

Abstract: An overview is given of near infrared (NIR) spectroscopy for use in measuring quality attributes of horticultural produce. Different spectrophotometer designs and measurement principles are compared, and novel techniques, such as time and spatially resolved spectroscopy for the estimation of light absorption and scattering properties of vegetable tissue, as well as NIR multi- and hyperspectral imaging techniques are reviewed. Special attention is paid to recent developments in portable systems. Chemometrics is an essential part of NIR spectroscopy, and the available preprocessing and regression techniques, including nonlinear ones, such as kernel-based methods, are discussed. Robustness issues due to orchard and species effects and fluctuating temperatures are addressed. The problem of calibration transfer from one spectrophotometer to another is introduced, as well as techniques for calibration transfer. Most applications of NIR spectroscopy have focussed on the nondestructive measurement of soluble solids content of fruit where typically a root mean square error of prediction of 1° Brix can be achieved, but also other applications involving texture, dry matter, acidity or disorders of fruit and vegetables have been reported. Areas where more research is required are identified.

1,535 citations

••

TL;DR: The nature of the VIP method is explored and compared with other methods through computer simulation experiments considering four factors: the proportion of relevant predictors, the magnitude of correlations between predictors, the structure of regression coefficients, and the magnitude of signal to noise.

Abstract: Variable selection is one of the important practical issues for many scientific engineers. Although the PLS (partial least squares) regression combined with the VIP (variable importance in the projection) scores is often used when the multicollinearity is present among variables, there are few guidelines about its uses as well as its performance. The purpose of this paper is to explore the nature of the VIP method and to compare with other methods through computer simulation experiments. We design 108 experiments where observations are generated from true models considering four factors–the proportion of the number of relevant predictors, the magnitude of correlations between predictors, the structure of regression coefficients, and the magnitude of signal to noise. Confusion matrix is adopted to evaluate the performance of PLS, the Lasso, and stepwise method. We also discuss the proper cutoff value of the VIP method to increase its performance. Some practical hints for the use of the VIP method are given as simulation results.
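
As an illustration of the VIP scores discussed in this abstract, the sketch below computes them for a single-response PLS model using the commonly quoted definition, in which each predictor's squared normalized weights are averaged over components, weighted by the y-variance each component explains. The helper names `pls1_fit` and `vip_scores` are hypothetical, the NIPALS fit on mean-centered data is an assumption of this sketch, and the cited paper's exact formulation and recommended cutoff may differ.

```python
import numpy as np


def pls1_fit(X, y, n_components):
    """Minimal NIPALS PLS1 fit on mean-centered data (sketch only).

    Returns the X weights W, the scores T and the y loadings q.
    """
    Xk = np.asarray(X, dtype=float) - np.mean(X, axis=0)
    yk = np.asarray(y, dtype=float).ravel() - np.mean(y)
    W, T, q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk
        w /= np.linalg.norm(w)
        t = Xk @ w
        tt = t @ t
        p = Xk.T @ t / tt
        qa = yk @ t / tt
        Xk = Xk - np.outer(t, p)   # deflate X
        yk = yk - qa * t           # deflate y
        W.append(w)
        T.append(t)
        q.append(qa)
    return np.column_stack(W), np.column_stack(T), np.array(q)


def vip_scores(W, T, q):
    """Variable importance in the projection, one value per predictor."""
    n_vars = W.shape[0]
    # y-variance explained by each component: q_a^2 * t_a' t_a
    ss = q ** 2 * np.sum(T ** 2, axis=0)
    # squared weights, with each weight vector normalized to unit length
    w2 = (W / np.linalg.norm(W, axis=0)) ** 2
    return np.sqrt(n_vars * (w2 @ ss) / ss.sum())
```

With this definition the squared VIP values sum to the number of predictors, so their mean is 1. This is why a threshold of 1 is often used as a rule of thumb for calling a predictor relevant. The paper's discussion of a proper cutoff suggests that this value should not be applied uncritically.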

1,360 citations

••

Teodoro Espinosa-Solares, John Bombardiere, Mark Chatfield, Max Domaschko +4 more

TL;DR: Results suggest some changes to the pilot plant configuration are necessary to reduce power consumption although maximizing biodigester performance, and a modification of the typical continuous stirred tank reactor is a promising process being relatively stable and owing to its capability to manage considerable amounts of residuals at low operational cost.

Abstract: Intensive poultry production generates over 100,000 t of litter annually in West Virginia and 9×10^{6} t nationwide. Current available technological alternatives based on thermophilic anaerobic digestion for residuals treatment are diverse. A modification of the typical continuous stirred tank reactor is a promising process being relatively stable and owing to its capability to manage considerable amounts of residuals at low operational cost. A 40-m^{3} pilot plant digester was used for performance evaluation considering energy input and methane production. Results suggest some changes to the pilot plant configuration are necessary to reduce power consumption although maximizing biodigester performance.

1,287 citations

••

TL;DR: Characteristics of the process industry data which are critical for the development of data-driven Soft Sensors are discussed.

Abstract: In the last two decades Soft Sensors established themselves as a valuable alternative to the traditional means for the acquisition of critical process variables, process monitoring and other tasks which are related to process control. This paper discusses characteristics of the process industry data which are critical for the development of data-driven Soft Sensors. These characteristics are common to a large number of process industry fields, like the chemical industry, bioprocess industry, steel industry, etc. The focus of this work is put on the data-driven Soft Sensors because of their growing popularity, already demonstrated usefulness and huge, though yet not completely realised, potential. A comprehensive selection of case studies covering the three most important Soft Sensor application fields, a general introduction to the most popular Soft Sensor modelling techniques as well as a discussion of some open issues in the Soft Sensor development and maintenance and their possible solutions are the main contributions of this work.

1,172 citations

### Cites methods from "PLS-regression: a basic tool of che..."

...…to data-driven Soft Sensors are the Principle Component Analysis (Jolliffe, 2002) in a combination with a regression model, Partial Least Squares (Wold et al., 2001), Artificial Neural Networks (Bishop, 1995; Principe et al., 2000; Hastie et al., 2001), Neuro-Fuzzy Systems (Jang et al., 1997;…...

[...]

...One way is by transforming the input variables into a new reduced space with less co-linearity as it is done in the case of the PCA (Jolliffe, 2002) and PLS (Wold et al., 2001; Abdi, 2003)....

[...]

##### References

•

08 Jul 1980-

Abstract: 1. Introduction and Overview. 2. Detecting Influential Observations and Outliers. 3. Detecting and Assessing Collinearity. 4. Applications and Remedies. 5. Research Issues and Directions for Extensions. Bibliography. Author Index. Subject Index.

6,437 citations

•

13 Mar 1991-

Abstract: Preface. Introduction. 1. Getting Started. 2. PCA with More Than Two Variables. 3. Scaling of Data. 4. Inferential Procedures. 5. Putting It All Together-Hearing Loss I. 6. Operations with Group Data. 7. Vector Interpretation I: Simplifications and Inferential Techniques. 8. Vector Interpretation II: Rotation. 9. A Case History-Hearing Loss II. 10. Singular Value Decomposition: Multidimensional Scaling I. 11. Distance Models: Multidimensional Scaling II. 12. Linear Models I: Regression PCA of Predictor Variables. 13. Linear Models II: Analysis of Variance PCA of Response Variables. 14. Other Applications of PCA. 15. Flatland: Special Procedures for Two Dimensions. 16. Odds and Ends. 17. What is Factor Analysis Anyhow? 18. Other Competitors. Conclusion. Appendix A. Matrix Properties. Appendix B. Matrix Algebra Associated with Principal Component Analysis. Appendix C. Computational Methods. Appendix D. A Directory of Symbols and Definitions for PCA. Appendix E. Some Classic Examples. Appendix F. Data Sets Used in This Book. Appendix G. Tables. Bibliography. Author Index. Subject Index.

3,534 citations

••

Abstract: This is an invited expository article for The American Statistician. It reviews the nonparametric estimation of statistical error, mainly the bias and standard error of an estimator, or the error rate of a prediction rule. The presentation is written at a relaxed mathematical level, omitting most proofs, regularity conditions, and technical details.

3,060 citations