Performance of some variable selection methods when multicollinearity is present

doi:10.1016/J.CHEMOLAB.2004.12.011

Journal Article•DOI•

Il-Gyo Chong¹, Chi-Hyuck Jun¹•Institutions (1)

Pohang University of Science and Technology¹

28 Jul 2005-Chemometrics and Intelligent Laboratory Systems (Elsevier)-Vol. 78, Iss: 1, pp 103-112

TL;DR: The nature of the VIP method is explored and compared with other methods through computer simulation experiments that consider four factors: the proportion of relevant predictors, the magnitude of correlations between predictors, the structure of regression coefficients, and the magnitude of signal to noise.
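
The four simulation factors named in this summary can be made concrete with a small data generator. The sketch below is only an illustration: the equicorrelated covariance, the equal coefficients on the first few predictors, and the names `simulate_dataset`, `prop_relevant`, `rho`, and `snr` are assumptions made here, not the design used in the paper.

```python
import numpy as np

def simulate_dataset(n=100, p=20, prop_relevant=0.25, rho=0.7, snr=3.0, seed=0):
    """Draw (X, y, beta) for one cell of a hypothetical simulation design."""
    rng = np.random.default_rng(seed)
    # Assumed correlation structure: unit variances, common pairwise correlation rho.
    cov = np.full((p, p), rho) + (1.0 - rho) * np.eye(p)
    X = rng.multivariate_normal(np.zeros(p), cov, size=n)
    # Assumed coefficient structure: the first k predictors are relevant, all equal to 1.
    k = max(1, int(round(prop_relevant * p)))
    beta = np.zeros(p)
    beta[:k] = 1.0
    # Noise variance is set so that var(signal) / var(noise) equals snr in the population.
    signal_var = beta @ cov @ beta
    noise = rng.normal(0.0, np.sqrt(signal_var / snr), size=n)
    y = X @ beta + noise
    return X, y, beta

X, y, beta = simulate_dataset(n=100, p=20, prop_relevant=0.25, rho=0.7, snr=3.0)
```

Varying `prop_relevant`, `rho`, the pattern written into `beta`, and `snr` over a grid gives one data set per factor combination, on which selection methods can then be compared.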

About: This article was published in Chemometrics and Intelligent Laboratory Systems on 2005-07-28. It has received 1595 citations to date. The article focuses on the topics: Multicollinearity & Partial least squares regression.

Citations

Journal Article•DOI•

Macroscopic mass and energy balance of a pilot plant anaerobic bioreactor operated under thermophilic conditions.

Teodoro Espinosa-Solares, John Bombardiere¹, Mark Chatfield¹, Max Domaschko¹, Michael Easter¹, David A. Stafford¹, Saul Castillo-Angeles, Nehemias Castellanos-Hernandez•Institutions (1)

West Virginia State University¹

01 Jun 2006-Applied Biochemistry and Biotechnology

TL;DR: Results suggest that some changes to the pilot plant configuration are necessary to reduce power consumption while maximizing biodigester performance, and that a modification of the typical continuous stirred tank reactor is a promising process, being relatively stable and able to manage considerable amounts of residuals at low operational cost.

Abstract: Intensive poultry production generates over 100,000 t of litter annually in West Virginia and 9×10⁶ t nationwide. Currently available technological alternatives based on thermophilic anaerobic digestion for residuals treatment are diverse. A modification of the typical continuous stirred tank reactor is a promising process, being relatively stable and able to manage considerable amounts of residuals at low operational cost. A 40 m³ pilot plant digester was used for performance evaluation considering energy input and methane production. Results suggest that some changes to the pilot plant configuration are necessary to reduce power consumption while maximizing biodigester performance.

1,287 citations

Journal Article•DOI•

A review of variable selection methods in Partial Least Squares Regression

Tahir Mehmood¹, Kristian Hovde Liland¹, Lars Snipen¹, Solve Sæbø¹•Institutions (1)

Norwegian University of Life Sciences¹

15 Aug 2012-Chemometrics and Intelligent Laboratory Systems

TL;DR: A review of available methods for variable selection within one of the many modeling approaches for high-throughput data, Partial Least Squares Regression, intended to give an understanding of the characteristics of the methods and a basis for selecting an appropriate method for one's own use.

1,180 citations

Cites background from "Performance of some variable select..."

...It is generally accepted that a variable should be selected if vj>1, [27–29], but a proper threshold between 0....
[...]
...21 can yield more relevant variables according to [28]....
[...]
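
The excerpt above refers to selecting a variable when its VIP score exceeds one. As a minimal sketch of how such a score can be computed, the code below fits a single-response PLS model with the NIPALS algorithm and applies the commonly used VIP formula. The function name `pls1_vip`, the toy data, and the choice of two components are assumptions made for illustration only. With unit-norm weight vectors the squared VIP scores sum to the number of predictors, so their mean square is one, which is one way to read the "greater than one" rule.

```python
import numpy as np

def pls1_vip(X, y, n_components):
    """Fit PLS1 by NIPALS and return one VIP score per predictor."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xr = X - X.mean(axis=0)            # centered working copy, deflated below
    yr = y - y.mean()
    p = X.shape[1]
    W = np.zeros((p, n_components))    # unit-norm weight vectors
    ss = np.zeros(n_components)        # response sum of squares explained per component
    for a in range(n_components):
        w = Xr.T @ yr
        w /= np.linalg.norm(w)
        t = Xr @ w                     # score vector
        tt = t @ t
        q = (yr @ t) / tt              # response loading
        load = (Xr.T @ t) / tt         # predictor loading
        Xr = Xr - np.outer(t, load)    # deflate X and y
        yr = yr - q * t
        W[:, a] = w
        ss[a] = q * q * tt
    return np.sqrt(p * ((W ** 2) @ ss) / ss.sum())

# Toy data (illustrative): only the first two predictors carry signal.
rng = np.random.default_rng(1)
X = rng.normal(size=(60, 8))
y = X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.3, size=60)
vip = pls1_vip(X, y, n_components=2)
selected = np.flatnonzero(vip > 1.0)   # indices passing the "greater than one" rule
```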

Journal Article•DOI•

Using data mining to model and interpret soil diffuse reflectance spectra.

R. A. Viscarra Rossel¹, Thorsten Behrens²•Institutions (2)

Commonwealth Scientific and Industrial Research Organisation¹, University of Tübingen²

15 Aug 2010-Geoderma

TL;DR: In this article, the root mean square error (RMSE) and the Akaike Information Criterion (AIC) were used to compare different data mining algorithms for modelling soil visible-near infrared (vis-NIR) diffuse reflectance spectra and to assess the interpretability of the results.

928 citations

Journal Article•DOI•

Variable selection in regression—a tutorial

C. M. Andersen¹, Rasmus Bro¹•Institutions (1)

University of Copenhagen¹

01 Nov 2010-Journal of Chemometrics

TL;DR: The emphasis in this paper is on how to use variable selection in practice and avoid the most common pitfalls.

Abstract: This paper provides a practical guide to variable selection in chemometrics with a focus on regression-based calibration models. Several approaches, such as genetic algorithms (GAs), jack-knifing, forward selection, etc., are explained; it is also explained how to choose between different kinds of variable selection methods. The emphasis in this paper is on how to use variable selection in practice and avoid the most common pitfalls. Copyright © 2010 John Wiley & Sons, Ltd.

580 citations

Journal Article•DOI•

Plankton networks driving carbon export in the oligotrophic ocean.

Lionel Guidi, Samuel Chaffron, Lucie Bittner¹, Damien Eveillard, Abdelhalim Larhlimi, Simon Roux, Youssef Darzi, Stéphane Audic, Léo Berline, Jennifer R. Brum, Luis Pedro Coelho, Julio Cesar Ignacio Espinoza, Shruti Malviya, Shinichi Sunagawa, Céline Dimier, Stefanie Kandels-Lewis, Marc Picheral, Julie Poulain, Sarah Searson, Lars Stemmann, Fabrice Not, Pascal Hingamp, Sabrina Speich, Mick Follows², Lee Karp-Boss, Emmanuel Boss, Hiroyuki Ogata, Stephane Pesant, Jean Weissenbach, Patrick Wincker, Silvia G. Acinas, Peer Bork, Daniele Iudicone³, Matthew B. Sullivan, Jeroen Raes, Eric Karsenti¹, Chris Bowler¹, Gabriel Gorsky•Institutions (3)

École Normale Supérieure¹, Massachusetts Institute of Technology², Stazione Zoologica Anton Dohrn³

28 Apr 2016-Nature

TL;DR: It is shown that specific plankton communities, from the surface and deep chlorophyll maximum, correlate with carbon export at 150 m and that the relative abundance of a few bacterial and viral genes can predict a significant fraction of the variability in carbon export in these regions.

Abstract: The biological carbon pump is the process by which CO2 is transformed to organic carbon via photosynthesis, exported through sinking particles, and finally sequestered in the deep ocean. While the intensity of the pump correlates with plankton community composition, the underlying ecosystem structure driving the process remains largely uncharacterized. Here we use environmental and metagenomic data gathered during the Tara Oceans expedition to improve our understanding of carbon export in the oligotrophic ocean. We show that specific plankton communities, from the surface and deep chlorophyll maximum, correlate with carbon export at 150 m and highlight unexpected taxa such as Radiolaria and alveolate parasites, as well as Synechococcus and their phages, as lineages most strongly associated with carbon export in the subtropical, nutrient-depleted, oligotrophic ocean. Additionally, we show that the relative abundance of a few bacterial and viral genes can predict a significant fraction of the variability in carbon export in these regions.

556 citations

References

Book•

The Elements of Statistical Learning

Trevor Hastie, Robert Tibshirani, Jerome H. Friedman

01 Jan 2001

19,211 citations

Journal Article•DOI•

The Elements of Statistical Learning

Eric R. Ziegel

01 Aug 2003-Technometrics

TL;DR: Chapter 11 includes more case studies in other areas, ranging from manufacturing to marketing research, and a detailed comparison with other diagnostic tools, such as logistic regression and tree-based methods.

Abstract: Chapter 11 includes more case studies in other areas, ranging from manufacturing to marketing research. Chapter 12 concludes the book with some commentary about the scientific contributions of MTS. The Taguchi method for design of experiment has generated considerable controversy in the statistical community over the past few decades. The MTS/MTGS method seems to lead another source of discussions on the methodology it advocates (Montgomery 2003). As pointed out by Woodall et al. (2003), the MTS/MTGS methods are considered ad hoc in the sense that they have not been developed using any underlying statistical theory. Because the “normal” and “abnormal” groups form the basis of the theory, some sampling restrictions are fundamental to the applications. First, it is essential that the “normal” sample be uniform, unbiased, and/or complete so that a reliable measurement scale is obtained. Second, the selection of “abnormal” samples is crucial to the success of dimensionality reduction when OAs are used. For example, if each abnormal item is really unique in the medical example, then it is unclear how the statistical distance MD can be guaranteed to give a consistent diagnosis measure of severity on a continuous scale when the larger-the-better type S/N ratio is used. Multivariate diagnosis is not new to Technometrics readers and is now becoming increasingly more popular in statistical analysis and data mining for knowledge discovery. As a promising alternative that assumes no underlying data model, The Mahalanobis–Taguchi Strategy does not provide sufficient evidence of gains achieved by using the proposed method over existing tools. Readers may be very interested in a detailed comparison with other diagnostic tools, such as logistic regression and tree-based methods. Overall, although the idea of MTS/MTGS is intriguing, this book would be more valuable had it been written in a rigorous fashion as a technical reference. There is some lack of precision even in several mathematical notations. Perhaps a follow-up with additional theoretical justification and careful case studies would answer some of the lingering questions.

11,507 citations

"Performance of some variable select..." refers methods in this paper

...The number of latent variables for PLS regression, the tuning parameter for the Lasso and the significant levels for stepwise regression are determined by five-fold crossvalidation which is widely used for estimating prediction error [12]....
[...]
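
The excerpt above mentions choosing the number of PLS latent variables by five-fold cross-validation. A minimal sketch of that idea follows, assuming a single response, a NIPALS fit written directly in NumPy, and squared prediction error as the criterion. The helper names `pls1_coefficients` and `choose_components_by_cv` and the toy data are hypothetical, and the paper's exact procedure may differ.

```python
import numpy as np

def pls1_coefficients(X, y, n_components):
    """Return PLS1 regression coefficients for centered X and y (NIPALS)."""
    Xr, yr = X.copy(), y.copy()
    p = X.shape[1]
    W = np.zeros((p, n_components))    # weights
    P = np.zeros((p, n_components))    # predictor loadings
    q = np.zeros(n_components)         # response loadings
    for a in range(n_components):
        w = Xr.T @ yr
        w /= np.linalg.norm(w)
        t = Xr @ w
        tt = t @ t
        P[:, a] = (Xr.T @ t) / tt
        q[a] = (yr @ t) / tt
        Xr = Xr - np.outer(t, P[:, a])
        yr = yr - q[a] * t
        W[:, a] = w
    # Coefficients in terms of the original centered predictors: W (P'W)^{-1} q.
    return W @ np.linalg.solve(P.T @ W, q)

def choose_components_by_cv(X, y, max_components, n_folds=5, seed=0):
    """Pick the component count with the smallest cross-validated squared error."""
    n = X.shape[0]
    folds = np.array_split(np.random.default_rng(seed).permutation(n), n_folds)
    cv_error = np.zeros(max_components)
    for test_idx in folds:
        train = np.ones(n, dtype=bool)
        train[test_idx] = False
        # Centering uses the training fold only, so the held-out fold stays unseen.
        x_mean, y_mean = X[train].mean(axis=0), y[train].mean()
        Xc, yc = X[train] - x_mean, y[train] - y_mean
        for a in range(1, max_components + 1):
            b = pls1_coefficients(Xc, yc, a)
            pred = y_mean + (X[test_idx] - x_mean) @ b
            cv_error[a - 1] += np.sum((y[test_idx] - pred) ** 2)
    cv_error /= n
    return int(np.argmin(cv_error)) + 1, cv_error

# Toy data (illustrative only).
rng = np.random.default_rng(2)
X = rng.normal(size=(80, 6))
y = X[:, 0] - X[:, 2] + rng.normal(scale=0.5, size=80)
best, cv_error = choose_components_by_cv(X, y, max_components=4)
```

The same fold loop could be reused for a Lasso tuning parameter or a stepwise significance level by swapping the inner fit.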

Journal Article•DOI•

PLS-regression: a basic tool of chemometrics

Svante Wold¹, Michael Sjöström¹, Lennart Eriksson¹•Institutions (1)

Umeå University¹

28 Oct 2001-Chemometrics and Intelligent Laboratory Systems

TL;DR: PLS-regression (PLSR), the PLS approach in its simplest and, in chemistry and technology, most used form (two-block predictive PLS), is a method for relating two data matrices, X and Y, by a linear multivariate model.

7,861 citations

Journal Article•DOI•

Least angle regression

Bradley Efron¹, Trevor Hastie¹, Iain M. Johnstone¹, Robert Tibshirani¹, Hemant Ishwaran², Keith Knight³, Jean-Michel Loubes⁴, Jean-Michel Loubes⁵, Pascal Massart⁵, Pascal Massart⁶, David Madigan⁷, David Madigan⁸, Greg Ridgeway⁷, Greg Ridgeway⁹, Saharon Rosset¹, Saharon Rosset¹⁰, Ji Zhu, Robert A. Stine¹¹, Berwin A. Turlach¹², Sanford Weisberg¹³•Institutions (13)

Stanford University¹, Cleveland Clinic², University of Toronto³, Centre national de la recherche scientifique⁴, Université Paris-Saclay⁵, University of Paris-Sud⁶, Rutgers University⁷, Avaya⁸, RAND Corporation⁹, IBM¹⁰, University of Pennsylvania¹¹, University of Western Australia¹², University of Minnesota¹³

01 Apr 2004-Annals of Statistics

TL;DR: A publicly available algorithm that requires only the same order of magnitude of computational effort as ordinary least squares applied to the full set of covariates is described.

Abstract: The purpose of model selection algorithms such as All Subsets, Forward Selection and Backward Elimination is to choose a linear model on the basis of the same set of data to which the model will be applied. Typically we have available a large collection of possible covariates from which we hope to select a parsimonious set for the efficient prediction of a response variable. Least Angle Regression (LARS), a new model selection algorithm, is a useful and less greedy version of traditional forward selection methods. Three main properties are derived: (1) A simple modification of the LARS algorithm implements the Lasso, an attractive version of ordinary least squares that constrains the sum of the absolute regression coefficients; the LARS modification calculates all possible Lasso estimates for a given problem, using an order of magnitude less computer time than previous methods. (2) A different LARS modification efficiently implements Forward Stagewise linear regression, another promising new model selection method; this connection explains the similar numerical results previously observed for the Lasso and Stagewise, and helps us understand the properties of both methods, which are seen as constrained versions of the simpler LARS algorithm. (3) A simple approximation for the degrees of freedom of a LARS estimate is available, from which we derive a Cp estimate of prediction error; this allows a principled choice among the range of possible LARS estimates. LARS and its variants are computationally efficient: the paper describes a publicly available algorithm that requires only the same order of magnitude of computational effort as ordinary least squares applied to the full set of covariates.
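
Of the methods named in this abstract, Forward Stagewise regression is the simplest to write down. The sketch below is the plain incremental version (small fixed steps toward the predictor most correlated with the current residual), not the LARS algorithm or its Stagewise modification described in the paper. The function name `forward_stagewise`, the step size, and the toy data are assumptions for illustration.

```python
import numpy as np

def forward_stagewise(X, y, step=0.01, n_steps=2000):
    """Incremental forward stagewise regression on standardized predictors."""
    X = (X - X.mean(axis=0)) / X.std(axis=0)
    residual = y - y.mean()
    beta = np.zeros(X.shape[1])
    path = [beta.copy()]
    for _ in range(n_steps):
        corr = X.T @ residual                   # proportional to current correlations
        j = int(np.argmax(np.abs(corr)))        # most correlated predictor
        delta = step * np.sign(corr[j])
        beta[j] += delta                        # take a small step in that coordinate
        residual = residual - delta * X[:, j]
        path.append(beta.copy())
    return np.array(path)                       # one row of coefficients per step

# Toy data (illustrative): predictors 0 and 3 carry the signal.
rng = np.random.default_rng(3)
X = rng.normal(size=(100, 5))
y = 2.0 * X[:, 0] - X[:, 3] + rng.normal(scale=0.5, size=100)
path = forward_stagewise(X, y)
```

Each row of `path` is the coefficient vector after one step, so plotting the columns against the step index gives a coefficient path of the kind the abstract compares with the Lasso.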

7,828 citations

Journal Article•DOI•

Partial least-squares regression: a tutorial

Paul Geladi¹, Bruce R. Kowalski¹•Institutions (1)

University of Washington¹

01 Jan 1986-Analytica Chimica Acta

TL;DR: In this paper, a tutorial on the Partial Least Squares (PLS) regression method is provided, and an algorithm for a predictive PLS and some practical hints for its use are given.

6,393 citations