Open Access · Proceedings Article

Learning with many irrelevant features

TLDR
It is shown that any learning algorithm implementing the MIN-FEATURES bias requires Θ(1/ε ln 1/δ + 1/ε [2^p + p ln n]) training examples to guarantee PAC-learning a concept having p relevant features out of n available features, and it is suggested that training data should be preprocessed to remove irrelevant features before being given to ID3 or FRINGE.
Abstract
In many domains, an appropriate inductive bias is the MIN-FEATURES bias, which prefers consistent hypotheses definable over as few features as possible. This paper defines and studies this bias. First, it is shown that any learning algorithm implementing the MIN-FEATURES bias requires Θ(1/ε ln 1/δ + 1/ε [2^p + p ln n]) training examples to guarantee PAC-learning a concept having p relevant features out of n available features. This bound is only logarithmic in the number of irrelevant features. The paper also presents a quasi-polynomial time algorithm, FOCUS, which implements MIN-FEATURES. Experimental studies are presented that compare FOCUS to the ID3 and FRINGE algorithms. These experiments show that, contrary to expectations, these algorithms do not implement good approximations of MIN-FEATURES. The coverage, sample complexity, and generalization performance of FOCUS are substantially better than those of either ID3 or FRINGE on learning problems where the MIN-FEATURES bias is appropriate. This suggests that, in practical applications, training data should be preprocessed to remove irrelevant features before being given to ID3 or FRINGE.
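
To make the MIN-FEATURES idea concrete, here is a minimal Python sketch of a brute-force search for the smallest feature subset that is consistent with a training set. The abstract does not spell out how FOCUS works internally, so this should be read as an illustrative assumption about what "implementing MIN-FEATURES" could look like, not as the authors' algorithm. The function names (`is_sufficient`, `min_features`, `bound_expression`) are hypothetical. The last function evaluates the expression inside the Θ(·) bound, read here as (1/ε) ln(1/δ) + (1/ε)[2^p + p ln n]; the hidden constants are unknown, so the value is useful only for comparing growth rates.

```python
from itertools import combinations
from math import log


def is_sufficient(examples, subset):
    """Return True if no two examples agree on `subset` but differ in label."""
    seen = {}
    for features, label in examples:
        key = tuple(features[i] for i in subset)
        # setdefault stores the first label seen for this projection
        if seen.setdefault(key, label) != label:
            return False
    return True


def min_features(examples, n):
    """Brute-force search, smallest subsets first (illustrative assumption).

    `examples` is a list of (feature_tuple, label) pairs over n features.
    Returns the first consistent subset of minimum size, or None if the
    data contain identical feature vectors with conflicting labels.
    """
    for size in range(n + 1):
        for subset in combinations(range(n), size):
            if is_sufficient(examples, subset):
                return subset
    return None


def bound_expression(p, n, eps, delta):
    """Value of the expression inside the Theta bound (constants omitted)."""
    return (1 / eps) * log(1 / delta) + (1 / eps) * (2 ** p + p * log(n))


if __name__ == "__main__":
    # Target concept: x0 XOR x2 over 4 Boolean features (x1, x3 irrelevant).
    data = [
        ((a, b, c, d), a ^ c)
        for a in (0, 1) for b in (0, 1) for c in (0, 1) for d in (0, 1)
    ]
    print(min_features(data, 4))
    # Growing n from 1e3 to 1e6 adds only (p / eps) * ln(1000) to the expression.
    print(bound_expression(3, 10**6, 0.1, 0.05) - bound_expression(3, 10**3, 0.1, 0.05))
```

The search enumerates all subsets of each size, so its cost grows roughly as n^p for p relevant features; the sketch is meant to show the bias itself, not an efficient implementation.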


Citations
Book

Data Mining: Concepts and Techniques

TL;DR: This book presents dozens of algorithms and implementation examples, all in pseudo-code and suitable for use in real-world, large-scale data mining projects, and provides a comprehensive, practical look at the concepts and techniques you need to get the most out of real business data.
Book

Data Mining: Practical Machine Learning Tools and Techniques

TL;DR: This highly anticipated third edition of the most acclaimed work on data mining and machine learning will teach you everything you need to know about preparing inputs, interpreting outputs, evaluating results, and the algorithmic methods at the heart of successful data mining.
Journal Article

Wrappers for feature subset selection

TL;DR: The wrapper method searches for an optimal feature subset tailored to a particular algorithm and domain; the paper compares the wrapper approach to induction without feature subset selection and to Relief, a filter approach to feature subset selection.
Book

Evolutionary algorithms for solving multi-objective problems

TL;DR: This paper presents a meta-anatomy of the multi-Criteria Decision Making process, which aims to provide a scaffolding for the future development of multi-criteria decision-making systems.

Correlation-based Feature Selection for Machine Learning

Mark Hall
TL;DR: This thesis addresses the problem of feature selection for machine learning through a correlation-based approach, presenting CFS (Correlation-based Feature Selection), an algorithm that couples a feature evaluation formula with an appropriate correlation measure and a heuristic search strategy.
References
Journal Article

Induction of Decision Trees

J. R. Quinlan
25 Mar 1986
TL;DR: This paper summarizes an approach to synthesizing decision trees that has been used in a variety of systems, describes one such system, ID3, in detail, and discusses a reported shortcoming of the basic algorithm.
Journal Article

Learnability and the Vapnik-Chervonenkis dimension

TL;DR: This paper shows that the essential condition for distribution-free learnability is finiteness of the Vapnik-Chervonenkis dimension, a simple combinatorial parameter of the class of concepts to be learned.
Journal Article

Learning Quickly When Irrelevant Attributes Abound: A New Linear-Threshold Algorithm

TL;DR: This work presents one such algorithm that learns disjunctive Boolean functions, along with variants for learning other classes of Boolean functions.
Journal Article

Generalization as search

TL;DR: The problem of concept learning, or forming a general description of a class of objects given a set of examples and non-examples, is viewed here as a search problem.
Journal Article

Occam's razor

TL;DR: It is shown that a polynomial learning algorithm, as defined by Valiant (1984), is obtained whenever there exists a polynomial-time method of producing, for any sequence of observations, a nearly minimum hypothesis that is consistent with these observations.