Journal ArticleDOI

Theoretical and Empirical Analysis of ReliefF and RReliefF

01 Oct 2003-Machine Learning (Kluwer Academic Publishers)-Vol. 53, Iss: 1, pp 23-69
TL;DR: This paper investigates how and why Relief algorithms work, their theoretical and practical properties, their parameters, what kinds of dependencies they detect, how they scale up to large numbers of examples and features, how to sample data for them, how robust they are to noise, how irrelevant and redundant attributes influence their output, and how different metrics influence them.
Abstract: Relief algorithms are general and successful attribute estimators. They are able to detect conditional dependencies between attributes and provide a unified view on attribute estimation in regression and classification. In addition, their quality estimates have a natural interpretation. While they have commonly been viewed as feature subset selection methods applied in a preprocessing step before a model is learned, they have actually been used successfully in a variety of settings, e.g., to select splits or to guide constructive induction in the building phase of decision or regression tree learning, as an attribute weighting method, and also in inductive logic programming. A broad spectrum of successful uses calls for an especially careful investigation of the various properties Relief algorithms have. In this paper we theoretically and empirically investigate and discuss how and why they work, their theoretical and practical properties, their parameters, what kinds of dependencies they detect, how they scale up to large numbers of examples and features, how to sample data for them, how robust they are to noise, how irrelevant and redundant attributes influence their output, and how different metrics influence them.
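
The weight-update loop the abstract summarizes is compact enough to sketch. Below is a minimal ReliefF sketch for numeric features in a classification setting, assuming feature differences are normalized by each feature's value range and that every class has more than k instances; it is an illustration under those assumptions, not the authors' reference implementation. The defaults (m = 30 sampled instances, k = 5 neighbors) follow the settings a citing work below attributes to this paper.

```python
# Minimal ReliefF sketch (numeric features, classification).
# Assumptions: X is an (n, d) float array, y holds class labels,
# m <= n, and each class has more than k instances.
import numpy as np

def relieff(X, y, m=30, k=5, seed=None):
    """Return one relevance weight per feature (higher = more relevant)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0                        # guard constant features
    classes, counts = np.unique(y, return_counts=True)
    prior = dict(zip(classes, counts / n))
    w = np.zeros(d)
    for idx in rng.choice(n, size=m, replace=False):
        r, c = X[idx], y[idx]
        dist = np.abs(X - r).sum(axis=1)         # Manhattan distance to r
        dist[idx] = np.inf                       # exclude r itself
        # k nearest hits (same class): differences to near hits lower weights
        hits = np.argsort(np.where(y == c, dist, np.inf))[:k]
        w -= (np.abs(X[hits] - r) / span).sum(axis=0) / (m * k)
        # k nearest misses from every other class, weighted by class prior
        for c2 in classes:
            if c2 == c:
                continue
            misses = np.argsort(np.where(y == c2, dist, np.inf))[:k]
            factor = prior[c2] / (1.0 - prior[c])
            w += factor * (np.abs(X[misses] - r) / span).sum(axis=0) / (m * k)
    return w
```

Features whose values separate nearby instances of different classes while staying stable among same-class neighbors accumulate positive weight.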

Citations
Journal Article
TL;DR: It is shown that feature relevance alone is insufficient for efficient feature selection of high-dimensional data, and a new framework is introduced that decouples relevance analysis and redundancy analysis.
Abstract: Feature selection is applied to reduce the number of features in many applications where data has hundreds or thousands of features. Existing feature selection methods mainly focus on finding relevant features. In this paper, we show that feature relevance alone is insufficient for efficient feature selection of high-dimensional data. We define feature redundancy and propose to perform explicit redundancy analysis in feature selection. A new framework is introduced that decouples relevance analysis and redundancy analysis. We develop a correlation-based method for relevance and redundancy analysis, and conduct an empirical study of its efficiency and effectiveness, comparing it with representative methods.
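
The correlation measure this line of work (FCBF) builds on is symmetrical uncertainty, a normalized information gain. A hedged sketch for discrete-valued features follows; names are illustrative rather than taken from the paper's code.

```python
# Symmetrical uncertainty SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)),
# the normalized information-gain measure FCBF-style methods rank
# and filter features with. Assumes discrete-valued 1-D arrays.
import numpy as np

def entropy(a):
    _, counts = np.unique(a, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def symmetrical_uncertainty(x, y):
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0.0:
        return 0.0                               # both variables constant
    # joint entropy H(X, Y) via paired symbols
    hxy = entropy(np.array([f"{a}|{b}" for a, b in zip(x, y)]))
    mutual_info = hx + hy - hxy                  # I(X; Y)
    return 2.0 * mutual_info / (hx + hy)
```

Roughly, a feature is kept when its SU with the class is high enough (relevance) and no already-selected feature correlates with it more strongly than it correlates with the class (redundancy).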

1,971 citations


Cites background from "Theoretical and Empirical Analysis ..."

  • ...One algorithm, from individual evaluation, is ReliefF (Robnik-Sikonja and Kononenko, 2003) which searches for nearest neighbors of instances of different classes and weights features according to how well they differentiate instances of different classes....

  • ...We use three synthetic data sets to illustrate the strengths and limitations of FCBF and compare it with ReliefF, CFS-SF, and FOCUS-SF....

  • ...Comparison between FCBF(0) and ReliefF shows that ReliefF is unexpectedly slow even though its time complexity is linear to dimensionality....

  • ...For each data set, we conduct Student's paired two-tailed t-Test in order to evaluate the statistical significance of the difference between two averaged accuracy values: one resulted from FCBF(log) and the other resulted from one of FCBF(0), the full set, ReliefF, CFS-SF, and FOCUS-SF....

  • ...For ReliefF, we use 5 neighbors and 30 instances throughout the experiments as suggested by Robnik-Sikonja and Kononenko (2003)....

Journal ArticleDOI
TL;DR: This survey revisits feature selection research from a data perspective and reviews representative feature selection algorithms for conventional data, structured data, heterogeneous data and streaming data, and categorizes them into four main groups: similarity-based, information-theoretical-based, sparse-learning-based and statistical-based.
Abstract: Feature selection, as a data preprocessing strategy, has been proven to be effective and efficient in preparing data (especially high-dimensional data) for various data-mining and machine-learning problems. The objectives of feature selection include building simpler and more comprehensible models, improving data-mining performance, and preparing clean, understandable data. The recent proliferation of big data has presented some substantial challenges and opportunities to feature selection. In this survey, we provide a comprehensive and structured overview of recent advances in feature selection research. Motivated by current challenges and opportunities in the era of big data, we revisit feature selection research from a data perspective and review representative feature selection algorithms for conventional data, structured data, heterogeneous data and streaming data. Methodologically, to emphasize the differences and similarities of most existing feature selection algorithms for conventional data, we categorize them into four main groups: similarity-based, information-theoretical-based, sparse-learning-based, and statistical-based methods. To facilitate and promote the research in this community, we also present an open source feature selection repository that consists of most of the popular feature selection algorithms (http://featureselection.asu.edu/). Also, we use it as an example to show how to evaluate feature selection algorithms. At the end of the survey, we present a discussion about some open problems and challenges that require more attention in future research.

1,566 citations


Cites background from "Theoretical and Empirical Analysis ..."

  • ...Some representative criteria include feature discriminative ability to separate samples (Kira and Rendell 1992; Robnik-Šikonja and Kononenko 2003; Yang et al. 2011; Du et al. 2013; Tang et al. 2014), feature correlation (Koller and Sahami 1995; Guyon and Elisseeff 2003), mutual information (Yu and Liu 2003; Peng et al. 2005; Nguyen et al. 2014; Shishkin et al. 2016; Gao et al. 2016), feature ability to preserve data manifold structure (He et al. 2005; Zhao and Liu 2007; Gu et al. 2011b; Jiang and Ren 2011), and feature ability to reconstruct the original data (Masaeli et al. 2010; Farahat et al. 2011; Li et al. 2017a)....

  • ...ReliefF (Robnik-Šikonja and Kononenko 2003) selects features to separate instances from different classes....

Journal ArticleDOI
TL;DR: A critical survey of the methods and related software packages currently used to detect the interactions between genetic loci that contribute to human genetic disease is provided.
Abstract: Following the identification of several disease-associated polymorphisms by genome-wide association (GWA) analysis, interest is now focusing on the detection of effects that, owing to their interaction with other genetic or environmental factors, might not be identified by using standard single-locus tests. In addition to increasing the power to detect associations, it is hoped that detecting interactions between loci will allow us to elucidate the biological and biochemical pathways that underpin disease. Here I provide a critical survey of the methods and related software packages currently used to detect the interactions between genetic loci that contribute to human genetic disease. I also discuss the difficulties in determining the biological relevance of statistical interactions.

1,353 citations

Journal ArticleDOI
TL;DR: It is noted that the degree to which statistical tests of epistasis can elucidate underlying biological interactions may be more limited than previously assumed.
Abstract: Epistasis, the interaction between genes, is a topic of current interest in molecular and quantitative genetics. A large amount of research has been devoted to the detection and investigation of epistatic interactions. However, there has been much confusion in the literature over definitions and interpretations of epistasis. In this review, we provide a historical background to the study of epistatic interaction effects and point out the differences between a number of commonly used definitions of epistasis. A brief survey of some methods for detecting epistasis in humans is given. We note that the degree to which statistical tests of epistasis can elucidate underlying biological interactions may be more limited than previously assumed.

1,056 citations

Proceedings ArticleDOI
20 Jun 2007
TL;DR: This work exploits intrinsic properties underlying supervised and unsupervised feature selection algorithms, and proposes a unified framework for feature selection based on spectral graph theory, and shows that existing powerful algorithms such as ReliefF and Laplacian Score are special cases of the proposed framework.
Abstract: Feature selection aims to reduce dimensionality for building comprehensible learning models with good generalization performance. Feature selection algorithms are largely studied separately according to the type of learning: supervised or unsupervised. This work exploits intrinsic properties underlying supervised and unsupervised feature selection algorithms, and proposes a unified framework for feature selection based on spectral graph theory. The proposed framework is able to generate families of algorithms for both supervised and unsupervised feature selection. We show that existing powerful algorithms such as ReliefF (supervised) and Laplacian Score (unsupervised) are special cases of the proposed framework. To the best of our knowledge, this work is the first attempt to unify supervised and unsupervised feature selection and enable their joint study under a general framework. Experiments demonstrate the efficacy of the novel algorithms derived from the framework.
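
For intuition on the unsupervised half of that claim, here is a hedged sketch of the Laplacian Score over a simple 0/1 k-nearest-neighbor affinity graph; the cited papers' exact graph construction and edge weighting may differ.

```python
# Laplacian Score sketch: features that vary smoothly over a k-NN
# affinity graph of the data (small score) preserve local structure.
# Assumes X is an (n, d) float array; 0/1 edge weights for simplicity.
import numpy as np

def laplacian_score(X, k=5):
    n, d = X.shape
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)               # no self-edges
    W = np.zeros((n, n))
    for i in range(n):
        W[i, np.argsort(dist[i])[:k]] = 1.0      # connect k nearest neighbors
    W = np.maximum(W, W.T)                       # symmetrize the graph
    deg = W.sum(axis=1)                          # node degrees (diagonal of D)
    L = np.diag(deg) - W                         # graph Laplacian
    scores = np.empty(d)
    for r in range(d):
        f = X[:, r] - (X[:, r] @ deg) / deg.sum()  # remove degree-weighted mean
        denom = (f * deg) @ f
        scores[r] = (f @ L @ f) / denom if denom > 0 else np.inf
    return scores                                # smaller = better feature
```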

857 citations


Cites methods from "Theoretical and Empirical Analysis ..."

  • ...We show that two powerful feature selection algorithms, ReliefF (Robnik-Sikonja & Kononenko, 2003) and Laplacian Score (He et al., 2005) are special cases of the proposed framework....

  • ...Supervised feature selection algorithm ReliefF (Robnik-Sikonja & Kononenko, 2003) is a special case of SPEC by setting φ̂(·) = φ̂₁(·), γ(L) = L and defining W as:...

References
Book
15 Oct 1992
TL;DR: A complete guide to the C4.5 system as implemented in C for the UNIX environment, which starts from simple core learning methods and shows how they can be elaborated and extended to deal with typical problems such as missing data and overfitting.
Abstract: From the Publisher: Classifier systems play a major role in machine learning and knowledge-based systems, and Ross Quinlan's work on ID3 and C4.5 is widely acknowledged to have made some of the most significant contributions to their development. This book is a complete guide to the C4.5 system as implemented in C for the UNIX environment. It contains a comprehensive guide to the system's use, the source code (about 8,800 lines), and implementation notes. The source code and sample datasets are also available on a 3.5-inch floppy diskette for a Sun workstation. C4.5 starts with large sets of cases belonging to known classes. The cases, described by any mixture of nominal and numeric properties, are scrutinized for patterns that allow the classes to be reliably discriminated. These patterns are then expressed as models, in the form of decision trees or sets of if-then rules, that can be used to classify new cases, with emphasis on making the models understandable as well as accurate. The system has been applied successfully to tasks involving tens of thousands of cases described by hundreds of properties. The book starts from simple core learning methods and shows how they can be elaborated and extended to deal with typical problems such as missing data and overfitting. Advantages and disadvantages of the C4.5 approach are discussed and illustrated with several case studies. This book and software should be of interest to developers of classification-based intelligent systems and to students in machine learning and expert systems courses.
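
The split criterion most associated with C4.5 is the gain ratio: information gain normalized by the entropy of the split itself. A hedged sketch for a discrete attribute, not Quinlan's C implementation:

```python
# Gain ratio for splitting on one discrete attribute.
# attr and labels are parallel 1-D arrays of discrete values.
import numpy as np

def entropy(a):
    _, counts = np.unique(a, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def gain_ratio(attr, labels):
    values, counts = np.unique(attr, return_counts=True)
    weights = counts / counts.sum()
    # information gain: reduction in class entropy after the split
    gain = entropy(labels) - sum(w * entropy(labels[attr == v])
                                 for v, w in zip(values, weights))
    split_info = entropy(attr)                   # penalizes many-valued splits
    return gain / split_info if split_info > 0 else 0.0
```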

21,674 citations


"Theoretical and Empirical Analysis ..." refers background or methods in this paper

  • ...5 (Quinlan, 1993)) and for regression it is the mean squared error (MSE) of average prediction value (used in e....

  • ..., 1984) or Gain ratio (Quinlan, 1993) in classification and mean squared error (Breiman et al....

Journal ArticleDOI
TL;DR: This paper summarizes an approach to synthesizing decision trees that has been used in a variety of systems, describes one such system, ID3, in detail, and discusses a reported shortcoming of the basic algorithm together with two means of overcoming it.
Abstract: The technology for building knowledge-based systems by inductive inference from examples has been demonstrated successfully in several practical applications. This paper summarizes an approach to synthesizing decision trees that has been used in a variety of systems, and it describes one such system, ID3, in detail. Results from recent studies show ways in which the methodology can be modified to deal with information that is noisy and/or incomplete. A reported shortcoming of the basic algorithm is discussed and two means of overcoming it are compared. The paper concludes with illustrations of current research directions.
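
The core recursion ID3 popularized is compact; the following hedged, illustrative sketch grows a tree by always splitting on the attribute with the highest information gain, omitting the noise and missing-value handling the paper discusses.

```python
# ID3-style tree induction sketch for discrete attributes.
# X: (n, d) array of discrete values; y: class labels; attrs: usable columns.
import numpy as np

def entropy(a):
    _, counts = np.unique(a, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def id3(X, y, attrs):
    if len(np.unique(y)) == 1 or not attrs:
        values, counts = np.unique(y, return_counts=True)
        return values[np.argmax(counts)]         # leaf: majority class
    def gain(a):
        vals, counts = np.unique(X[:, a], return_counts=True)
        return entropy(y) - sum((c / len(y)) * entropy(y[X[:, a] == v])
                                for v, c in zip(vals, counts))
    best = max(attrs, key=gain)                  # highest information gain
    rest = [a for a in attrs if a != best]
    return {(best, v): id3(X[X[:, best] == v], y[X[:, best] == v], rest)
            for v in np.unique(X[:, best])}
```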

17,177 citations

Book
01 Jan 1983
TL;DR: The methodology used to construct tree-structured rules is the focus of this monograph, which covers the use of trees as a data analysis method and, in a more mathematical framework, proves some of their fundamental properties.
Abstract: The methodology used to construct tree structured rules is the focus of this monograph. Unlike many other statistical procedures, which moved from pencil and paper to calculators, this text's use of trees was unthinkable before computers. Both the practical and theoretical sides have been developed in the authors' study of tree methods. Classification and Regression Trees reflects these two sides, covering the use of trees as a data analysis method, and in a more mathematical framework, proving some of their fundamental properties.
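
On the regression side, the trees the monograph covers grow by choosing splits that most reduce squared error. A hedged sketch of threshold selection for a single numeric feature:

```python
# CART-style regression split sketch: pick the threshold on a numeric
# feature that maximizes the reduction in (population) variance of y,
# which is equivalent to minimizing within-node mean squared error.
import numpy as np

def best_split(x, y):
    """Return (threshold, mse_reduction) for one numeric feature."""
    order = np.argsort(x)
    x, y = x[order], y[order]
    best_t, best_gain = None, 0.0
    for i in range(1, len(x)):
        if x[i] == x[i - 1]:
            continue                             # no threshold between ties
        left, right = y[:i], y[i:]
        weighted = (len(left) * left.var() + len(right) * right.var()) / len(y)
        gain = y.var() - weighted                # variance reduction
        if gain > best_gain:
            best_t, best_gain = (x[i] + x[i - 1]) / 2.0, gain
    return best_t, best_gain
```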

14,825 citations