An introduction to variable and feature selection
Citations
Cites background from "An introduction to variable and fea..."
...As many pattern recognition techniques were originally not designed to cope with large amounts of irrelevant features, combining them with FS techniques has become a necessity in many applications [43, 78, 79]....
[...]
Cites background from "An introduction to variable and fea..."
...For example, Guyon et al. (2002) demonstrated recursive feature elimination with support vector machine classification models for a well-known colon cancer microarray data set....
[...]
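A minimal sketch of what the excerpt describes: recursive feature elimination (RFE) wrapped around a linear SVM, in the spirit of Guyon et al. (2002), using scikit-learn. The data here are a synthetic stand-in, not the colon cancer microarray set.

```python
# RFE with a linear SVM: at each step, drop the feature(s) with the
# smallest |w_i| of the trained SVM, then retrain on the survivors.
# Synthetic data; scikit-learn assumed available.
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=100, n_features=50,
                           n_informative=5, random_state=0)

selector = RFE(estimator=LinearSVC(C=1.0, dual=False, max_iter=10000),
               n_features_to_select=5, step=1)
selector.fit(X, y)
print(selector.support_.sum())  # 5 features retained
```

`selector.support_` is a boolean mask of the retained features and `selector.ranking_` gives the elimination order (1 = kept).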
Cites background or methods from "An introduction to variable and fea..."
...Another method used in the literature is to use the weights of a classifier [1,2,50] to rank the features for removal....
[...]
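The weight-based ranking mentioned in the excerpt can be done in a single pass, without the elimination loop: train one linear classifier and sort features by the magnitude of their weights. A sketch with synthetic data, scikit-learn assumed:

```python
# Rank features by the weight magnitudes |w_i| of a trained linear
# classifier (here a linear SVM); larger |w_i| suggests more relevance.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=200, n_features=10,
                           n_informative=3, n_redundant=0,
                           random_state=0)

clf = LinearSVC(dual=False, max_iter=10000).fit(X, y)
order = np.argsort(-np.abs(clf.coef_.ravel()))  # best-ranked first
print(order)  # feature indices, highest |w_i| first
```

Note this one-shot ranking can differ from RFE's, since RFE re-estimates the weights after every elimination step.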
...Embedded methods [1,9,10] include variable selection as part of the training process without splitting the data into training and testing sets....
[...]
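A common concrete instance of the embedded methods the excerpt refers to is L1-regularized regression: the Lasso's penalty drives some coefficients exactly to zero during fitting, so selection happens as part of training itself. A sketch on synthetic data, scikit-learn assumed:

```python
# Embedded selection via the Lasso: the L1 penalty zeroes out some
# coefficients while the model is being fit. Only features 0 and 1
# carry signal in this synthetic example.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.normal(size=200)

lasso = Lasso(alpha=0.1).fit(X, y)
selected = np.flatnonzero(lasso.coef_)  # surviving feature indices
print(selected)
```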
...One of the simplest criteria is the Pearson correlation coefficient [1,12] defined as:...
[...]
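The excerpt is cut off before the definition; the per-feature Pearson criterion used for ranking is R(i) = cov(X_i, Y) / sqrt(var(X_i) var(Y)), as in Guyon and Elisseeff (2003). A sketch of its sample estimate on synthetic data:

```python
# Per-feature Pearson correlation ranking criterion R(i), estimated
# from centered sums. Feature 0 is nearly a copy of y; feature 1 is
# independent noise, so |R(0)| should dominate |R(1)|.
import numpy as np

def pearson_criterion(X, y):
    """R(i) = cov(X_i, y) / sqrt(var(X_i) * var(y)), per column of X."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    num = Xc.T @ yc
    den = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    return num / den

rng = np.random.default_rng(0)
y = rng.normal(size=200)
X = np.column_stack([y + 0.1 * rng.normal(size=200),  # relevant
                     rng.normal(size=200)])           # irrelevant
R = pearson_criterion(X, y)
print(R)  # |R[0]| close to 1, |R[1]| close to 0
```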
...The focus of feature selection is to select a subset of variables from the input which can efficiently describe the input data while reducing effects from noise or irrelevant variables and still provide good prediction results [1]....
[...]
...relevant variables is addressed in [1] with good examples....
[...]
References
"An introduction to variable and fea..." refers background or methods in this paper
...The proposal of Rakotomamonjy (2003) is to train non-linear SVMs (Boser et al., 1992, Vapnik, 1998) with a regular training procedure and select features with backward elimination like in RFE (Guyon et al., 2002)....
[...]
...Many other types of penalization of the training error have been proposed in the literature (see, e.g., Vapnik, 1998, Hastie et al., 2001)....
[...]
...Many authors resort to using the leave-one-out cross-validation procedure, even though it is known to be a high-variance estimator of generalization error (Vapnik, 1982) and to give overly optimistic results, particularly when data are not properly independently and identically sampled from the "true" distribution....
[...]
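A sketch of the leave-one-out procedure the excerpt criticizes: n folds, each holding out a single sample, so every per-fold classification score is 0 or 1, which hints at why the estimator has high variance. Synthetic data, scikit-learn assumed:

```python
# Leave-one-out cross-validation: one fold per sample. Each fold's
# accuracy on its single held-out point is either 0.0 or 1.0.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score

X, y = make_classification(n_samples=30, n_features=5, random_state=0)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y,
                         cv=LeaveOneOut())
print(len(scores))  # one score per held-out sample: 30
```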
...This is the case, for instance, for the linear least squares model using J = ∑_k (w · x_k + b − y_k)² and for the linear SVM or optimum margin classifier, which minimizes J = (1/2)||w||², under constraints (Vapnik, 1982)....
[...]
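The least-squares objective J = ∑_k (w · x_k + b − y_k)² in the excerpt can be minimized in closed form. A numpy sketch, folding the bias b into the weight vector via a column of ones; the data are synthetic and noiseless so the true parameters are recovered exactly:

```python
# Closed-form minimization of J = sum_k (w . x_k + b - y_k)^2.
# Augmenting X with a ones column turns [w, b] into a single
# least-squares solve.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
w_true, b_true = np.array([1.0, -2.0, 0.5]), 0.3
y = X @ w_true + b_true  # noiseless linear targets

A = np.hstack([X, np.ones((50, 1))])  # bias column
wb, residuals, rank, _ = np.linalg.lstsq(A, y, rcond=None)
print(wb)  # first three entries = w_true, last entry = b_true
```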
...This correspondence is formally established in the paper of Weston et al. (2003) for the particular case of classification with linear predictors f(x) = w · x + b, in the SVM framework (Boser et al., 1992, Vapnik, 1998)....
[...]