Journal ArticleDOI

Theoretical and Empirical Analysis of ReliefF and RReliefF

01 Oct 2003-Machine Learning (Kluwer Academic Publishers)-Vol. 53, Iss: 1, pp 23-69
TL;DR: This paper investigates how and why Relief algorithms work, their theoretical and practical properties, their parameters, what kinds of dependencies they detect, how they scale up to large numbers of examples and features, how to sample data for them, how robust they are to noise, how irrelevant and redundant attributes influence their output, and how different metrics influence them.
Abstract: Relief algorithms are general and successful attribute estimators. They are able to detect conditional dependencies between attributes and provide a unified view on attribute estimation in regression and classification. In addition, their quality estimates have a natural interpretation. While they have commonly been viewed as feature subset selection methods applied in a preprocessing step before a model is learned, they have actually been used successfully in a variety of settings, e.g., to select splits or to guide constructive induction in the building phase of decision or regression tree learning, as an attribute weighting method, and also in inductive logic programming. A broad spectrum of successful uses calls for an especially careful investigation of the various properties Relief algorithms have. In this paper we theoretically and empirically investigate and discuss how and why they work, their theoretical and practical properties, their parameters, what kinds of dependencies they detect, how they scale up to large numbers of examples and features, how to sample data for them, how robust they are to noise, how irrelevant and redundant attributes influence their output, and how different metrics influence them.
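
The weight-update loop the abstract summarizes is compact enough to sketch. Below is a minimal ReliefF sketch for numeric features in a classification setting, assuming feature differences are normalized by each feature's value range and that every class has more than k instances; it is an illustration under those assumptions, not the authors' reference implementation. The defaults (m = 30 sampled instances, k = 5 neighbors) follow the settings a citing work below attributes to this paper.

```python
# Minimal ReliefF sketch (numeric features, classification).
# Assumptions: X is an (n, d) float array, y holds class labels,
# m <= n, and each class has more than k instances.
import numpy as np

def relieff(X, y, m=30, k=5, seed=None):
    """Return one relevance weight per feature (higher = more relevant)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0                        # guard constant features
    classes, counts = np.unique(y, return_counts=True)
    prior = dict(zip(classes, counts / n))
    w = np.zeros(d)
    for idx in rng.choice(n, size=m, replace=False):
        r, c = X[idx], y[idx]
        dist = np.abs(X - r).sum(axis=1)         # Manhattan distance to r
        dist[idx] = np.inf                       # exclude r itself
        # k nearest hits (same class): differences to near hits lower weights
        hits = np.argsort(np.where(y == c, dist, np.inf))[:k]
        w -= (np.abs(X[hits] - r) / span).sum(axis=0) / (m * k)
        # k nearest misses from every other class, weighted by class prior
        for c2 in classes:
            if c2 == c:
                continue
            misses = np.argsort(np.where(y == c2, dist, np.inf))[:k]
            factor = prior[c2] / (1.0 - prior[c])
            w += factor * (np.abs(X[misses] - r) / span).sum(axis=0) / (m * k)
    return w
```

Features whose values separate nearby instances of different classes while staying stable among same-class neighbors accumulate positive weight.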

Citations
Journal Article
TL;DR: It is shown that feature relevance alone is insufficient for efficient feature selection of high-dimensional data, and a new framework is introduced that decouples relevance analysis and redundancy analysis.
Abstract: Feature selection is applied to reduce the number of features in many applications where data has hundreds or thousands of features. Existing feature selection methods mainly focus on finding relevant features. In this paper, we show that feature relevance alone is insufficient for efficient feature selection of high-dimensional data. We define feature redundancy and propose to perform explicit redundancy analysis in feature selection. A new framework is introduced that decouples relevance analysis and redundancy analysis. We develop a correlation-based method for relevance and redundancy analysis, and conduct an empirical study of its efficiency and effectiveness, comparing it with representative methods.
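
The correlation measure this line of work (FCBF) builds on is symmetrical uncertainty, a normalized information gain. A hedged sketch for discrete-valued features follows; names are illustrative rather than taken from the paper's code.

```python
# Symmetrical uncertainty SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)),
# the normalized information-gain measure FCBF-style methods rank
# and filter features with. Assumes discrete-valued 1-D arrays.
import numpy as np

def entropy(a):
    _, counts = np.unique(a, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def symmetrical_uncertainty(x, y):
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0.0:
        return 0.0                               # both variables constant
    # joint entropy H(X, Y) via paired symbols
    hxy = entropy(np.array([f"{a}|{b}" for a, b in zip(x, y)]))
    mutual_info = hx + hy - hxy                  # I(X; Y)
    return 2.0 * mutual_info / (hx + hy)
```

Roughly, a feature is kept when its SU with the class is high enough (relevance) and no already-selected feature correlates with it more strongly than it correlates with the class (redundancy).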

1,971 citations


Cites background from "Theoretical and Empirical Analysis ..."

  • ...One algorithm, from individual evaluation, is ReliefF (Robnik-Sikonja and Kononenko, 2003) which searches for nearest neighbors of instances of different classes and weights features according to how well they differentiate instances of different classes....

  • ...We use three synthetic data sets to illustrate the strengths and limitations of FCBF and compare it with ReliefF, CFS-SF, and FOCUS-SF....

  • ...Comparison between FCBF(0) and ReliefF shows that ReliefF is unexpectedly slow even though its time complexity is linear to dimensionality....

  • ...For each data set, we conduct Student's paired two-tailed t-Test in order to evaluate the statistical significance of the difference between two averaged accuracy values: one resulted from FCBF(log) and the other resulted from one of FCBF(0), the full set, ReliefF, CFS-SF, and FOCUS-SF....

  • ...For ReliefF, we use 5 neighbors and 30 instances throughout the experiments as suggested by Robnik-Sikonja and Kononenko (2003)....

Journal ArticleDOI
TL;DR: This survey revisits feature selection research from a data perspective and reviews representative feature selection algorithms for conventional data, structured data, heterogeneous data and streaming data, and categorizes them into four main groups: similarity-based, information-theoretical-based, sparse-learning-based and statistical-based.
Abstract: Feature selection, as a data preprocessing strategy, has been proven to be effective and efficient in preparing data (especially high-dimensional data) for various data-mining and machine-learning problems. The objectives of feature selection include building simpler and more comprehensible models, improving data-mining performance, and preparing clean, understandable data. The recent proliferation of big data has presented some substantial challenges and opportunities to feature selection. In this survey, we provide a comprehensive and structured overview of recent advances in feature selection research. Motivated by current challenges and opportunities in the era of big data, we revisit feature selection research from a data perspective and review representative feature selection algorithms for conventional data, structured data, heterogeneous data and streaming data. Methodologically, to emphasize the differences and similarities of most existing feature selection algorithms for conventional data, we categorize them into four main groups: similarity-based, information-theoretical-based, sparse-learning-based, and statistical-based methods. To facilitate and promote the research in this community, we also present an open source feature selection repository that consists of most of the popular feature selection algorithms (http://featureselection.asu.edu/). Also, we use it as an example to show how to evaluate feature selection algorithms. At the end of the survey, we present a discussion about some open problems and challenges that require more attention in future research.

1,566 citations


Cites background from "Theoretical and Empirical Analysis ..."

  • ...Some representative criteria include feature discriminative ability to separate samples (Kira and Rendell 1992; Robnik-Šikonja and Kononenko 2003; Yang et al. 2011; Du et al. 2013; Tang et al. 2014), feature correlation (Koller and Sahami 1995; Guyon and Elisseeff 2003), mutual information (Yu and Liu 2003; Peng et al. 2005; Nguyen et al. 2014; Shishkin et al. 2016; Gao et al. 2016), feature ability to preserve data manifold structure (He et al. 2005; Zhao and Liu 2007; Gu et al. 2011b; Jiang and Ren 2011), and feature ability to reconstruct the original data (Masaeli et al. 2010; Farahat et al. 2011; Li et al. 2017a)....

  • ...ReliefF (Robnik-Šikonja and Kononenko 2003) selects features to separate instances from different classes....

Journal ArticleDOI
TL;DR: A critical survey of the methods and related software packages currently used to detect the interactions between genetic loci that contribute to human genetic disease is provided.
Abstract: Following the identification of several disease-associated polymorphisms by genome-wide association (GWA) analysis, interest is now focusing on the detection of effects that, owing to their interaction with other genetic or environmental factors, might not be identified by using standard single-locus tests. In addition to increasing the power to detect associations, it is hoped that detecting interactions between loci will allow us to elucidate the biological and biochemical pathways that underpin disease. Here I provide a critical survey of the methods and related software packages currently used to detect the interactions between genetic loci that contribute to human genetic disease. I also discuss the difficulties in determining the biological relevance of statistical interactions.

1,353 citations

Journal ArticleDOI
TL;DR: It is noted that the degree to which statistical tests of epistasis can elucidate underlying biological interactions may be more limited than previously assumed.
Abstract: Epistasis, the interaction between genes, is a topic of current interest in molecular and quantitative genetics. A large amount of research has been devoted to the detection and investigation of epistatic interactions. However, there has been much confusion in the literature over definitions and interpretations of epistasis. In this review, we provide a historical background to the study of epistatic interaction effects and point out the differences between a number of commonly used definitions of epistasis. A brief survey of some methods for detecting epistasis in humans is given. We note that the degree to which statistical tests of epistasis can elucidate underlying biological interactions may be more limited than previously assumed.

1,056 citations

Proceedings ArticleDOI
20 Jun 2007
TL;DR: This work exploits intrinsic properties underlying supervised and unsupervised feature selection algorithms, and proposes a unified framework for feature selection based on spectral graph theory, and shows that existing powerful algorithms such as ReliefF and Laplacian Score are special cases of the proposed framework.
Abstract: Feature selection aims to reduce dimensionality for building comprehensible learning models with good generalization performance. Feature selection algorithms are largely studied separately according to the type of learning: supervised or unsupervised. This work exploits intrinsic properties underlying supervised and unsupervised feature selection algorithms, and proposes a unified framework for feature selection based on spectral graph theory. The proposed framework is able to generate families of algorithms for both supervised and unsupervised feature selection. We show that existing powerful algorithms such as ReliefF (supervised) and Laplacian Score (unsupervised) are special cases of the proposed framework. To the best of our knowledge, this work is the first attempt to unify supervised and unsupervised feature selection and enable their joint study under a general framework. Experiments demonstrate the efficacy of the novel algorithms derived from the framework.
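
For intuition on the unsupervised half of that claim, here is a hedged sketch of the Laplacian Score over a simple 0/1 k-nearest-neighbor affinity graph; the cited papers' exact graph construction and edge weighting may differ.

```python
# Laplacian Score sketch: features that vary smoothly over a k-NN
# affinity graph of the data (small score) preserve local structure.
# Assumes X is an (n, d) float array; 0/1 edge weights for simplicity.
import numpy as np

def laplacian_score(X, k=5):
    n, d = X.shape
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)               # no self-edges
    W = np.zeros((n, n))
    for i in range(n):
        W[i, np.argsort(dist[i])[:k]] = 1.0      # connect k nearest neighbors
    W = np.maximum(W, W.T)                       # symmetrize the graph
    deg = W.sum(axis=1)                          # node degrees (diagonal of D)
    L = np.diag(deg) - W                         # graph Laplacian
    scores = np.empty(d)
    for r in range(d):
        f = X[:, r] - (X[:, r] @ deg) / deg.sum()  # remove degree-weighted mean
        denom = (f * deg) @ f
        scores[r] = (f @ L @ f) / denom if denom > 0 else np.inf
    return scores                                # smaller = better feature
```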

857 citations


Cites methods from "Theoretical and Empirical Analysis ..."

  • ...We show that two powerful feature selection algorithms, ReliefF (Robnik-Sikonja & Kononenko, 2003) and Laplacian Score (He et al., 2005) are special cases of the proposed framework....

  • ...Supervised feature selection algorithm ReliefF (Robnik-Sikonja & Kononenko, 2003) is a special case of SPEC by setting φ̂(·) = φ̂₁(·), γ(L) = L and defining W as:...

References
Book
15 Oct 1992
TL;DR: A complete guide to the C4.5 system as implemented in C for the UNIX environment, which starts from simple core learning methods and shows how they can be elaborated and extended to deal with typical problems such as missing data and overfitting.
Abstract: From the Publisher: Classifier systems play a major role in machine learning and knowledge-based systems, and Ross Quinlan's work on ID3 and C4.5 is widely acknowledged to have made some of the most significant contributions to their development. This book is a complete guide to the C4.5 system as implemented in C for the UNIX environment. It contains a comprehensive guide to the system's use, the source code (about 8,800 lines), and implementation notes. The source code and sample datasets are also available on a 3.5-inch floppy diskette for a Sun workstation. C4.5 starts with large sets of cases belonging to known classes. The cases, described by any mixture of nominal and numeric properties, are scrutinized for patterns that allow the classes to be reliably discriminated. These patterns are then expressed as models, in the form of decision trees or sets of if-then rules, that can be used to classify new cases, with emphasis on making the models understandable as well as accurate. The system has been applied successfully to tasks involving tens of thousands of cases described by hundreds of properties. The book starts from simple core learning methods and shows how they can be elaborated and extended to deal with typical problems such as missing data and overfitting. Advantages and disadvantages of the C4.5 approach are discussed and illustrated with several case studies. This book and software should be of interest to developers of classification-based intelligent systems and to students in machine learning and expert systems courses.
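
The split criterion most associated with C4.5 is the gain ratio: information gain normalized by the entropy of the split itself. A hedged sketch for a discrete attribute, not Quinlan's C implementation:

```python
# Gain ratio for splitting on one discrete attribute.
# attr and labels are parallel 1-D arrays of discrete values.
import numpy as np

def entropy(a):
    _, counts = np.unique(a, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def gain_ratio(attr, labels):
    values, counts = np.unique(attr, return_counts=True)
    weights = counts / counts.sum()
    # information gain: reduction in class entropy after the split
    gain = entropy(labels) - sum(w * entropy(labels[attr == v])
                                 for v, w in zip(values, weights))
    split_info = entropy(attr)                   # penalizes many-valued splits
    return gain / split_info if split_info > 0 else 0.0
```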

21,674 citations


"Theoretical and Empirical Analysis ..." refers background or methods in this paper

  • ...5 (Quinlan, 1993)) and for regression it is the mean squared error (MSE) of average prediction value (used in e....

  • ..., 1984) or Gain ratio (Quinlan, 1993) in classification and mean squared error (Breiman et al....

Journal ArticleDOI
TL;DR: This paper summarizes an approach to synthesizing decision trees that has been used in a variety of systems, describes one such system, ID3, in detail, and discusses a reported shortcoming of the basic algorithm together with two means of overcoming it.
Abstract: The technology for building knowledge-based systems by inductive inference from examples has been demonstrated successfully in several practical applications. This paper summarizes an approach to synthesizing decision trees that has been used in a variety of systems, and it describes one such system, ID3, in detail. Results from recent studies show ways in which the methodology can be modified to deal with information that is noisy and/or incomplete. A reported shortcoming of the basic algorithm is discussed and two means of overcoming it are compared. The paper concludes with illustrations of current research directions.
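
The core recursion ID3 popularized is compact; the following hedged, illustrative sketch grows a tree by always splitting on the attribute with the highest information gain, omitting the noise and missing-value handling the paper discusses.

```python
# ID3-style tree induction sketch for discrete attributes.
# X: (n, d) array of discrete values; y: class labels; attrs: usable columns.
import numpy as np

def entropy(a):
    _, counts = np.unique(a, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def id3(X, y, attrs):
    if len(np.unique(y)) == 1 or not attrs:
        values, counts = np.unique(y, return_counts=True)
        return values[np.argmax(counts)]         # leaf: majority class
    def gain(a):
        vals, counts = np.unique(X[:, a], return_counts=True)
        return entropy(y) - sum((c / len(y)) * entropy(y[X[:, a] == v])
                                for v, c in zip(vals, counts))
    best = max(attrs, key=gain)                  # highest information gain
    rest = [a for a in attrs if a != best]
    return {(best, v): id3(X[X[:, best] == v], y[X[:, best] == v], rest)
            for v in np.unique(X[:, best])}
```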

17,177 citations

Book
01 Jan 1983
TL;DR: The methodology used to construct tree-structured rules is the focus of this monograph, which covers the use of trees as a data analysis method and, in a more mathematical framework, proves some of their fundamental properties.
Abstract: The methodology used to construct tree structured rules is the focus of this monograph. Unlike many other statistical procedures, which moved from pencil and paper to calculators, this text's use of trees was unthinkable before computers. Both the practical and theoretical sides have been developed in the authors' study of tree methods. Classification and Regression Trees reflects these two sides, covering the use of trees as a data analysis method, and in a more mathematical framework, proving some of their fundamental properties.
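
On the regression side, the trees the monograph covers grow by choosing splits that most reduce squared error. A hedged sketch of threshold selection for a single numeric feature:

```python
# CART-style regression split sketch: pick the threshold on a numeric
# feature that maximizes the reduction in (population) variance of y,
# which is equivalent to minimizing within-node mean squared error.
import numpy as np

def best_split(x, y):
    """Return (threshold, mse_reduction) for one numeric feature."""
    order = np.argsort(x)
    x, y = x[order], y[order]
    best_t, best_gain = None, 0.0
    for i in range(1, len(x)):
        if x[i] == x[i - 1]:
            continue                             # no threshold between ties
        left, right = y[:i], y[i:]
        weighted = (len(left) * left.var() + len(right) * right.var()) / len(y)
        gain = y.var() - weighted                # variance reduction
        if gain > best_gain:
            best_t, best_gain = (x[i] + x[i - 1]) / 2.0, gain
    return best_t, best_gain
```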

14,825 citations