scispace - formally typeset
Author

Richard G. Brereton

Bio: Richard G. Brereton is an academic researcher at the University of Bristol. He has contributed to research in chemometrics and principal component analysis, has an h-index of 41, and has co-authored 237 publications receiving 10,827 citations. His previous affiliations include the University of Cambridge and the Austrian Academy of Sciences.


Papers
Journal ArticleDOI
25 Jan 2010-Analyst
TL;DR: The increasing interest in Support Vector Machines (SVMs) over the past 15 years is described, from two-class and multiclass classifiers through one-class Support Vector Domain Description to Support Vector Regression, including SVR's application to multivariate calibration and why it is useful when there are outliers and non-linearities.
Abstract: The increasing interest in Support Vector Machines (SVMs) over the past 15 years is described. Methods are illustrated using simulated case studies, and 4 experimental case studies, namely mass spectrometry for studying pollution, near infrared analysis of food, thermal analysis of polymers and UV/visible spectroscopy of polyaromatic hydrocarbons. The basis of SVMs as two-class classifiers is shown with extensive visualisation, including learning machines, kernels and penalty functions. The influence of the penalty error and radial basis function radius on the model is illustrated. Multiclass implementations including one vs. all, one vs. one, fuzzy rules and Directed Acyclic Graph (DAG) trees are described. One-class Support Vector Domain Description (SVDD) is described and contrasted to conventional two- or multi-class classifiers. The use of Support Vector Regression (SVR) is illustrated including its application to multivariate calibration, and why it is useful when there are outliers and non-linearities.
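The abstract's two main tuning ideas, the penalty error C and the RBF kernel radius (gamma), and the one-vs-one multiclass scheme and epsilon-insensitive SVR it describes, can be sketched with scikit-learn. This is an illustrative sketch on synthetic data, not the paper's own code or case studies; the data sets and parameter values here are arbitrary.

```python
import numpy as np
from sklearn.svm import SVC, SVR
from sklearn.datasets import make_blobs

# Multiclass SVM: C (penalty error) and gamma (RBF radius) are the two
# knobs whose influence the paper illustrates; "ovo" is one-vs-one.
X, y = make_blobs(n_samples=150, centers=3, cluster_std=1.0, random_state=0)
clf = SVC(kernel="rbf", C=1.0, gamma=0.5, decision_function_shape="ovo")
clf.fit(X, y)
train_acc = clf.score(X, y)

# Support Vector Regression: the epsilon-insensitive loss ignores errors
# smaller than epsilon, which helps with outliers and non-linearities.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 60)[:, None]
t = np.sin(2 * np.pi * x).ravel() + 0.05 * rng.standard_normal(60)
reg = SVR(kernel="rbf", C=10.0, epsilon=0.1).fit(x, t)
r2 = reg.score(x, t)
```

Shrinking gamma smooths the boundary; raising C penalises margin violations more heavily, both of which the paper visualises on its case studies.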

1,899 citations

Book
12 Mar 2003
TL;DR: The concept of and need for Principal Components Analysis, unsupervised pattern recognition (cluster analysis), supervised pattern recognition, and their application in chemistry are explained.
Abstract: Preface. Supplementary Information. Acknowledgements.
1. INTRODUCTION: Points of View. Software and Calculations. Further Reading. References.
2. EXPERIMENTAL DESIGN: Introduction. Basic Principles. Factorial Designs. Central Composite or Response Surface Designs. Mixture Designs. Simplex Optimisation. Problems.
3. SIGNAL PROCESSING: Sequential Signals in Chemistry. Basics. Linear Filters. Correlograms and Time Series Analysis. Fourier Transform Techniques. Topical Methods. Problems.
4. PATTERN RECOGNITION: Introduction. The Concept and Need for Principal Components Analysis. Principal Components Analysis: the Method. Unsupervised Pattern Recognition: Cluster Analysis. Supervised Pattern Recognition. Multiway Pattern Recognition. Problems.
5. CALIBRATION: Introduction. Univariate Calibration. Multiple Linear Regression. Principal Components Regression. Partial Least Squares. Model Validation. Problems.
6. EVOLUTIONARY SIGNALS: Introduction. Exploratory Data Analysis and Preprocessing. Determining Composition. Resolution. Problems.
Appendices: A.1 Vectors and Matrices. A.2 Algorithms. A.3 Basic Statistical Concepts. A.4 Excel for Chemometrics. A.5 Matlab for Chemometrics.
Index
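Principal Components Analysis, central to the pattern-recognition chapter above, reduces to a singular value decomposition of the mean-centred data matrix. A minimal numpy sketch (illustrative only, not from the book):

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic data with correlated columns, as is typical of spectroscopic tables.
X = rng.standard_normal((50, 4)) @ rng.standard_normal((4, 4))

Xc = X - X.mean(axis=0)            # mean-centre: the usual chemometrics preprocessing
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = U * s                     # PCA scores (samples in component space)
loadings = Vt                      # PCA loadings (variable contributions)
explained = s**2 / np.sum(s**2)    # fraction of variance per component

# With all components retained, scores @ loadings reconstructs Xc exactly.
reconstruction_error = np.abs(scores @ loadings - Xc).max()
```

Truncating to the first few rows of `loadings` gives the low-dimensional summary used for scores plots and cluster analysis.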

1,411 citations

Journal ArticleDOI
TL;DR: Partial least squares discriminant analysis (PLS-DA) has been available for nearly 20 years yet is poorly understood by most users. Despite its limitations, PLS-DA can provide good insight into the causes of discrimination via weights and loadings, which gives it a unique role in exploratory data analysis, for example in metabolomics via visualisation of significant variables such as metabolites or spectroscopic peaks.
Abstract: Partial least squares discriminant analysis (PLS-DA) has been available for nearly 20 years yet is poorly understood by most users. By simple examples, it is shown graphically and algebraically that for two equal class sizes, PLS-DA using one partial least squares (PLS) component provides equivalent classification results to Euclidean distance to centroids, and by using all nonzero components to linear discriminant analysis. Extensions where there are unequal class sizes and more than two classes are discussed including common pitfalls and dilemmas. Finally, the problems of overfitting and PLS scores plots are discussed. It is concluded that for classification purposes, PLS-DA has no significant advantages over traditional procedures and is an algorithm full of dangers. It should not be viewed as a single integrated method but as a step in a full classification procedure. However, despite these limitations, PLS-DA can provide good insight into the causes of discrimination via weights and loadings, which gives it a unique role in exploratory data analysis, for example in metabolomics via visualisation of significant variables such as metabolites or spectroscopic peaks. Copyright © 2014 John Wiley & Sons, Ltd.

578 citations

Book
02 Apr 2007
TL;DR: This book traces the development of chemometrics from experimental design, statistical concepts and signal processing through pattern recognition and calibration to applications in chromatography, process analytics, biology, medicine, image analysis and food science.
Abstract: Preface.
1 Introduction: Development of Chemometrics. Application Areas. How to Use this Book. Literature and Other Sources of Information. References.
2 Experimental Design: Why Design Experiments in Chemistry? Degrees of Freedom and Sources of Error. Analysis of Variance and Interpretation of Errors. Matrices, Vectors and the Pseudoinverse. Design Matrices. Factorial Designs. An Example of a Factorial Design. Fractional Factorial Designs. Plackett-Burman and Taguchi Designs. The Application of a Plackett-Burman Design to the Screening of Factors Influencing a Chemical Reaction. Central Composite Designs. Mixture Designs. A Four Component Mixture Design Used to Study Blending of Olive Oils. Simplex Optimization. Leverage and Confidence in Models. Designs for Multivariate Calibration. References.
3 Statistical Concepts: Statistics for Chemists. Errors. Describing Data. The Normal Distribution. Is a Distribution Normal? Hypothesis Tests. Comparison of Means: the t-Test. F-Test for Comparison of Variances. Confidence in Linear Regression. More about Confidence. Consequences of Outliers and How to Deal with Them. Detection of Outliers. Shewhart Charts. More about Control Charts. References.
4 Sequential Methods: Sequential Data. Correlograms. Linear Smoothing Functions and Filters. Fourier Transforms. Maximum Entropy and Bayesian Methods. Fourier Filters. Peakshapes in Chromatography and Spectroscopy. Derivatives in Spectroscopy and Chromatography. Wavelets. References.
5 Pattern Recognition: Introduction. Principal Components Analysis. Graphical Representation of Scores and Loadings. Comparing Multivariate Patterns. Preprocessing. Unsupervised Pattern Recognition: Cluster Analysis. Supervised Pattern Recognition. Statistical Classification Techniques. K Nearest Neighbour Method. How Many Components Characterize a Dataset? Multiway Pattern Recognition. References.
6 Calibration: Introduction. Univariate Calibration. Multivariate Calibration and the Spectroscopy of Mixtures. Multiple Linear Regression. Principal Components Regression. Partial Least Squares. How Good is the Calibration and What is the Most Appropriate Model? Multiway Calibration. References.
7 Coupled Chromatography: Introduction. Preparing the Data. Chemical Composition of Sequential Data. Univariate Purity Curves. Similarity Based Methods. Evolving and Window Factor Analysis. Derivative Based Methods. Deconvolution of Evolutionary Signals. Noniterative Methods for Resolution. Iterative Methods for Resolution.
8 Equilibria, Reactions and Process Analytics: The Study of Equilibria using Spectroscopy. Spectroscopic Monitoring of Reactions. Kinetics and Multivariate Models for the Quantitative Study of Reactions. Developments in the Analysis of Reactions using On-line Spectroscopy. The Process Analytical Technology Initiative. References.
9 Improving Yields and Processes Using Experimental Designs: Introduction. Use of Statistical Designs for Improving the Performance of Synthetic Reactions. Screening for Factors that Influence the Performance of a Reaction. Optimizing the Process Variables. Handling Mixture Variables using Simplex Designs. More about Mixture Variables.
10 Biological and Medical Applications of Chemometrics: Introduction. Taxonomy. Discrimination. Mahalanobis Distance. Bayesian Methods and Contingency Tables. Support Vector Machines. Discriminant Partial Least Squares. Micro-organisms. Medical Diagnosis using Spectroscopy. Metabolomics using Coupled Chromatography and Nuclear Magnetic Resonance. References.
11 Biological Macromolecules: Introduction. Sequence Alignment and Scoring Matches. Sequence Similarity. Tree Diagrams. Phylogenetic Trees. References.
12 Multivariate Image Analysis: Introduction. Scaling Images. Filtering and Smoothing the Image. Principal Components for the Enhancement of Images. Regression of Images. Alternating Least Squares as Employed in Image Analysis. Multiway Methods in Image Analysis. References.
13 Food: Introduction. How to Determine the Origin of a Food Product using Chromatography. Near Infrared Spectroscopy. Other Information. Sensory Analysis: Linking Composition to Properties. Varimax Rotation. Calibrating Sensory Descriptors to Composition. References.
Index.

496 citations


Cited by
Journal ArticleDOI
TL;DR: The asynchronous pipeline scheme provides other substantial advantages, including high flexibility, favorable processing speeds, choice of both all-in-memory and disk-bound processing, easy adaptation to different data formats, simpler software development and maintenance, and the ability to distribute processing tasks on multi-CPU computers and computer networks.
Abstract: The NMRPipe system is a UNIX software environment of processing, graphics, and analysis tools designed to meet current routine and research-oriented multidimensional processing requirements, and to anticipate and accommodate future demands and developments. The system is based on UNIX pipes, which allow programs running simultaneously to exchange streams of data under user control. In an NMRPipe processing scheme, a stream of spectral data flows through a pipeline of processing programs, each of which performs one component of the overall scheme, such as Fourier transformation or linear prediction. Complete multidimensional processing schemes are constructed as simple UNIX shell scripts. The processing modules themselves maintain and exploit accurate records of data sizes, detection modes, and calibration information in all dimensions, so that schemes can be constructed without the need to explicitly define or anticipate data sizes or storage details of real and imaginary channels during processing. The asynchronous pipeline scheme provides other substantial advantages, including high flexibility, favorable processing speeds, choice of both all-in-memory and disk-bound processing, easy adaptation to different data formats, simpler software development and maintenance, and the ability to distribute processing tasks on multi-CPU computers and computer networks.
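NMRPipe itself is a UNIX shell environment whose stages are separate programs joined by pipes. Purely as an illustration of the architecture the abstract describes, a stream of spectral data flowing through single-purpose stages, here is a toy Python analogue (not NMRPipe code, and the stage names and window parameters are invented for the example):

```python
import numpy as np
from functools import reduce

def apodise(fid):
    # exponential window applied to the free-induction decay
    return fid * np.exp(-np.linspace(0.0, 3.0, fid.size))

def zero_fill(fid):
    # double the length, as zero-filling before the FT typically does
    return np.concatenate([fid, np.zeros(fid.size, dtype=fid.dtype)])

def fourier(fid):
    return np.fft.fftshift(np.fft.fft(fid))

def pipeline(data, *stages):
    # each stage consumes the previous stage's output,
    # like programs exchanging a data stream through UNIX pipes
    return reduce(lambda d, stage: stage(d), stages, data)

# Synthetic one-dimensional "FID": a complex sinusoid at 0.1 cycles/sample.
t = np.arange(256)
fid = np.exp(2j * np.pi * 0.1 * t)

spectrum = pipeline(fid, apodise, zero_fill, fourier)
```

In NMRPipe the equivalent scheme is a shell script, and the modules additionally carry size, detection-mode and calibration metadata along the stream, which this sketch omits.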

13,804 citations

Journal ArticleDOI
TL;DR: Methods specifically designed for collinearity, such as latent variable methods and tree-based models, did not outperform the traditional GLM with threshold-based pre-selection; the results highlight the value of GLM combined with penalised methods and thresholds when omitted variables are considered in the final interpretation.
Abstract: Collinearity refers to the non independence of predictor variables, usually in a regression-type analysis. It is a common feature of any descriptive ecological data set and can be a problem for parameter estimation because it inflates the variance of regression parameters and hence potentially leads to the wrong identification of relevant predictors in a statistical model. Collinearity is a severe problem when a model is trained on data from one region or time, and predicted to another with a different or unknown structure of collinearity. To demonstrate the reach of the problem of collinearity in ecology, we show how relationships among predictors differ between biomes, change over spatial scales and through time. Across disciplines, different approaches to addressing collinearity problems have been developed, ranging from clustering of predictors, threshold-based pre-selection, through latent variable methods, to shrinkage and regularisation. Using simulated data with five predictor-response relationships of increasing complexity and eight levels of collinearity we compared ways to address collinearity with standard multiple regression and machine-learning approaches. We assessed the performance of each approach by testing its impact on prediction to new data. In the extreme, we tested whether the methods were able to identify the true underlying relationship in a training dataset with strong collinearity by evaluating its performance on a test dataset without any collinearity. We found that methods specifically designed for collinearity, such as latent variable methods and tree based models, did not outperform the traditional GLM and threshold-based pre-selection. Our results highlight the value of GLM in combination with penalised methods (particularly ridge) and threshold-based pre-selection when omitted variables are considered in the final interpretation. 
However, all approaches tested yielded degraded predictions under a change in collinearity structure, and the folklore threshold of a correlation between predictor variables of |r| > 0.7 was an appropriate indicator for when collinearity begins to severely distort model estimation and subsequent prediction. The use of ecological understanding of the system in pre-analysis variable selection and the choice of the least sensitive statistical approaches reduce the problems of collinearity, but cannot ultimately solve them.
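Two of the ingredients above, the |r| > 0.7 screening rule and penalised (ridge) regression as a remedy, can be illustrated in a few lines. This is a hedged sketch on simulated data, not the paper's simulation design:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(42)
n = 200
x1 = rng.standard_normal(n)
x2 = x1 + 0.05 * rng.standard_normal(n)   # nearly a copy of x1: strong collinearity
X = np.column_stack([x1, x2])
y = 1.0 * x1 + 0.5 * x2 + 0.1 * rng.standard_normal(n)

# The screening statistic behind the |r| > 0.7 rule of thumb.
r = np.corrcoef(x1, x2)[0, 1]

# Collinearity inflates the variance of OLS coefficients;
# the ridge penalty shrinks them back toward zero.
ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)
ols_norm = np.linalg.norm(ols.coef_)
ridge_norm = np.linalg.norm(ridge.coef_)
```

The shrinkage is guaranteed: each ridge coefficient component is the OLS component scaled by s²/(s² + alpha) along the singular directions of X, which is why ridge fared well in the paper's comparison.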

6,199 citations

Journal ArticleDOI
01 May 1981
TL;DR: This book covers detecting influential observations and outliers, detecting and assessing collinearity, and remedies and applications of these regression diagnostics.
Abstract: 1. Introduction and Overview. 2. Detecting Influential Observations and Outliers. 3. Detecting and Assessing Collinearity. 4. Applications and Remedies. 5. Research Issues and Directions for Extensions. Bibliography. Author Index. Subject Index.
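Two of the diagnostics listed above, leverage from the hat matrix and Cook's distance for flagging influential observations, can be computed directly. A minimal numpy sketch (illustrative, not the book's own code), with one gross outlier planted:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 30
X = np.column_stack([np.ones(n), rng.standard_normal(n)])  # intercept + one predictor
y = 2.0 + 3.0 * X[:, 1] + 0.2 * rng.standard_normal(n)
y[0] += 10.0                                   # plant one gross outlier

H = X @ np.linalg.inv(X.T @ X) @ X.T           # hat matrix
leverage = np.diag(H)                          # h_ii: pull of each point on its own fit
resid = y - H @ y
p = X.shape[1]
s2 = resid @ resid / (n - p)                   # residual variance estimate

# Cook's distance: D_i = e_i^2 / (p * s^2) * h_i / (1 - h_i)^2
cooks = resid**2 / (p * s2) * leverage / (1.0 - leverage) ** 2
most_influential = int(np.argmax(cooks))
```

The trace of the hat matrix equals the number of parameters, so the leverages always sum to p; points with both large residual and large leverage dominate Cook's distance, which is the book's central message.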

4,948 citations

01 Aug 2000
TL;DR: A Bioentrepreneur course assessing medical technology in the context of commercialization, addressing many issues unique to biomedical products.
Abstract: BIOE 402. Medical Technology Assessment. 2 or 3 hours. Bioentrepreneur course. Assessment of medical technology in the context of commercialization. Objectives, competition, market share, funding, pricing, manufacturing, growth, and intellectual property; many issues unique to biomedical products. Course Information: 2 undergraduate hours. 3 graduate hours. Prerequisite(s): Junior standing or above and consent of the instructor.

4,833 citations