
Showing papers on "AdaBoost published in 1999"


Yoav Freund, Robert E. Schapire
01 Jan 1999
TL;DR: This short overview paper introduces the boosting algorithm AdaBoost, and explains the underlying theory of boosting, including an explanation of why boosting often does not suffer from overfitting as well as boosting’s relationship to support-vector machines.
Abstract: Boosting is a general method for improving the accuracy of any given learning algorithm. This short overview paper introduces the boosting algorithm AdaBoost, and explains the underlying theory of boosting, including an explanation of why boosting often does not suffer from overfitting as well as boosting’s relationship to support-vector machines. Some examples of recent applications of boosting are also described.

3,212 citations
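
To make the algorithm this overview describes concrete, here is a minimal sketch of discrete AdaBoost with one-dimensional decision stumps as the weak learner, assuming binary labels in {-1, +1}; function names such as fit_stump are illustrative and not from the paper.

# Minimal AdaBoost sketch (discrete AdaBoost, binary labels in {-1, +1}).
# Weak learner: axis-aligned decision stumps. Illustrative only.
import numpy as np

def fit_stump(X, y, w):
    """Return the (feature, threshold, polarity) stump minimizing weighted error."""
    best, best_err = (0, 0.0, 1), np.inf
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for pol in (1, -1):
                pred = pol * np.where(X[:, j] <= thr, 1, -1)
                err = np.sum(w[pred != y])
                if err < best_err:
                    best, best_err = (j, thr, pol), err
    return best, best_err

def stump_predict(stump, X):
    j, thr, pol = stump
    return pol * np.where(X[:, j] <= thr, 1, -1)

def adaboost(X, y, rounds=20):
    n = len(y)
    w = np.full(n, 1.0 / n)                     # uniform initial weights
    ensemble = []
    for _ in range(rounds):
        stump, err = fit_stump(X, y, w)
        err = max(err, 1e-12)
        alpha = 0.5 * np.log((1 - err) / err)   # hypothesis weight
        pred = stump_predict(stump, X)
        w *= np.exp(-alpha * y * pred)          # up-weight the mistakes
        w /= w.sum()
        ensemble.append((alpha, stump))
    return ensemble

def predict(ensemble, X):
    agg = sum(a * stump_predict(s, X) for a, s in ensemble)
    return np.sign(agg)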


Journal ArticleDOI
TL;DR: It is found that Bagging improves when probabilistic estimates in conjunction with no-pruning are used, as well as when the data is backfit, and that Arc-x4 behaves differently from AdaBoost if reweighting is used instead of resampling, indicating a fundamental difference.
Abstract: Methods for voting classification algorithms, such as Bagging and AdaBoost, have been shown to be very successful in improving the accuracy of certain classifiers for artificial and real-world datasets. We review these algorithms and describe a large empirical study comparing several variants in conjunction with a decision tree inducer (three variants) and a Naive-Bayes inducer. The purpose of the study is to improve our understanding of why and when these algorithms, which use perturbation, reweighting, and combination techniques, affect classification error. We provide a bias and variance decomposition of the error to show how different methods and variants influence these two terms. This allowed us to determine that Bagging reduced variance of unstable methods, while boosting methods (AdaBoost and Arc-x4) reduced both the bias and variance of unstable methods but increased the variance for Naive-Bayes, which was very stable. We observed that Arc-x4 behaves differently than AdaBoost if reweighting is used instead of resampling, indicating a fundamental difference. Voting variants, some of which are introduced in this paper, include: pruning versus no pruning, use of probabilistic estimates, weight perturbations (Wagging), and backfitting of data. We found that Bagging improves when probabilistic estimates in conjunction with no-pruning are used, as well as when the data was backfit. We measure tree sizes and show an interesting positive correlation between the increase in the average tree size in AdaBoost trials and its success in reducing the error. We compare the mean-squared error of voting methods to non-voting methods and show that the voting methods lead to large and significant reductions in the mean-squared errors. Practical problems that arise in implementing boosting algorithms are explored, including numerical instabilities and underflows. We use scatterplots that graphically show how AdaBoost reweights instances, emphasizing not only “hard” areas but also outliers and noise.

2,686 citations
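
Among the voting variants the abstract lists, Wagging is the easiest to sketch: each committee member is trained on the full data with randomly perturbed instance weights rather than on a bootstrap sample. The snippet below is an illustration only, assuming Gaussian noise around a unit weight and scikit-learn decision trees; the paper's exact noise model and base inducers may differ.

# Sketch of Wagging (weight perturbation): train each committee member on the
# full data with randomly perturbed instance weights, then take a majority vote.
# Gaussian noise around a unit weight is an assumption, not the paper's exact recipe.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def wagging(X, y, n_members=25, noise_std=2.0, seed=0):
    rng = np.random.default_rng(seed)
    members = []
    for _ in range(n_members):
        # perturb each instance weight; weights driven below zero are dropped to zero
        w = np.maximum(0.0, 1.0 + rng.normal(0.0, noise_std, size=len(y)))
        members.append(DecisionTreeClassifier().fit(X, y, sample_weight=w))
    return members

def vote(members, X):
    # unweighted majority vote; assumes integer class labels 0, 1, 2, ...
    preds = np.stack([m.predict(X) for m in members])
    return np.apply_along_axis(lambda c: np.bincount(c.astype(int)).argmax(), 0, preds)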


Proceedings Article
Robert E. Schapire
31 Jul 1999
TL;DR: The boosting algorithm AdaBoost is introduced, and the underlying theory of boosting is explained, including an explanation of why boosting often does not suffer from overfitting.
Abstract: Boosting is a general method for improving the accuracy of any given learning algorithm. This short paper introduces the boosting algorithm AdaBoost, and explains the underlying theory of boosting, including an explanation of why boosting often does not suffer from overfitting. Some examples of recent applications of boosting are also described.

1,339 citations


Proceedings Article
29 Nov 1999
TL;DR: Following previous theoretical results bounding the generalization performance of convex combinations of classifiers in terms of general cost functions of the margin, a new algorithm (DOOM II) is presented for performing a gradient descent optimization of such cost functions.
Abstract: We provide an abstract characterization of boosting algorithms as gradient descent on cost-functionals in an inner-product function space. We prove convergence of these functional-gradient-descent algorithms under quite weak conditions. Following previous theoretical results bounding the generalization performance of convex combinations of classifiers in terms of general cost functions of the margin, we present a new algorithm (DOOM II) for performing a gradient descent optimization of such cost functions. Experiments on several data sets from the UC Irvine repository demonstrate that DOOM II generally outperforms AdaBoost, especially in high noise situations, and that the overfitting behaviour of AdaBoost is predicted by our cost functions.

786 citations
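
A sketch of the functional-gradient-descent view described above, in its usual formulation: given a differentiable margin cost c, each round re-weights example i in proportion to -c'(y_i F(x_i)), fits a weak learner to the weighted data, and takes a line-search step. This illustrates the framework only; it is not the paper's DOOM II algorithm, and the exponential cost used here is just a placeholder.

# Boosting as functional gradient descent on a margin cost (framework sketch).
# c(m) is any differentiable, monotone-decreasing margin cost; c(m) = exp(-m)
# recovers AdaBoost-like behaviour and is used here only as a placeholder.
import numpy as np
from scipy.optimize import minimize_scalar
from sklearn.tree import DecisionTreeClassifier

def cost(m):            # placeholder: exponential cost; swap in any differentiable c
    return np.exp(-m)

def dcost(m):           # its derivative c'(m)
    return -np.exp(-m)

def gradient_boost_margin(X, y, rounds=20):
    """y in {-1, +1}. Returns a list of (step, weak_learner) pairs."""
    F = np.zeros(len(y))                 # current ensemble scores F(x_i)
    ensemble = []
    for _ in range(rounds):
        margins = y * F
        w = -dcost(margins)              # weight_i proportional to -c'(y_i F(x_i))
        w /= w.sum()
        weak = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
        h = weak.predict(X).astype(float)
        # line search for the step size that most reduces the total cost
        step = minimize_scalar(lambda a: cost(y * (F + a * h)).sum(),
                               bounds=(0.0, 10.0), method='bounded').x
        F += step * h
        ensemble.append((step, weak))
    return ensemble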


Journal ArticleDOI
TL;DR: The theory behind the success of adaptive reweighting and combining algorithms (arcing) such as Adaboost and others in reducing generalization error has not been well understood, and an explanation of why Adaboost works in terms of its ability to produce generally high margins is offered.
Abstract: The theory behind the success of adaptive reweighting and combining algorithms (arcing) such as Adaboost (Freund & Schapire, 1996a, 1997) and others in reducing generalization error has not been well understood. By formulating prediction as a game where one player makes a selection from instances in the training set and the other a convex linear combination of predictors from a finite set, existing arcing algorithms are shown to be algorithms for finding good game strategies. The minimax theorem is an essential ingredient of the convergence proofs. An arcing algorithm is described that converges to the optimal strategy. A bound on the generalization error for the combined predictors in terms of their maximum error is proven that is sharper than bounds to date. Schapire, Freund, Bartlett, and Lee (1997) offered an explanation of why Adaboost works in terms of its ability to produce generally high margins. The empirical comparison of Adaboost to the optimal arcing algorithm shows that their explanation is n...

558 citations
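
The game the abstract formulates can be made concrete for a finite set of predictors: with M[i, j] = y_i h_j(x_i), the value of the game is the largest minimum margin achievable by any convex combination of the predictors. The toy sketch below computes that value directly as a linear program via scipy, rather than by an iterative arcing algorithm as in the paper.

# Toy illustration of the margin game: one player picks a convex combination of
# predictors, the adversary picks a training instance. The game value equals the
# best achievable minimum margin, computed here as a linear program (not by arcing).
import numpy as np
from scipy.optimize import linprog

def max_min_margin(M):
    """M[i, j] = y_i * h_j(x_i): margin of predictor j on instance i.
    Returns (optimal predictor weights, game value = max-min margin)."""
    n, J = M.shape
    # variables: [w_1, ..., w_J, t]; maximize t  <=>  minimize -t
    c = np.zeros(J + 1)
    c[-1] = -1.0
    # constraints: t - (M w)_i <= 0 for every instance i
    A_ub = np.hstack([-M, np.ones((n, 1))])
    b_ub = np.zeros(n)
    # convexity: sum_j w_j = 1, w_j >= 0, t unconstrained
    A_eq = np.hstack([np.ones((1, J)), np.zeros((1, 1))])
    b_eq = np.ones(1)
    bounds = [(0, None)] * J + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[:J], res.x[-1]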


Proceedings Article
18 Jul 1999
TL;DR: This paper presents an ensemble feature selection approach that is based on genetic algorithms, shows improved performance over the popular and powerful ensemble approaches of AdaBoost and Bagging, and demonstrates the utility of ensemble feature selection.
Abstract: The traditional motivation behind feature selection algorithms is to find the best subset of features for a task using one particular learning algorithm. Given the recent success of ensembles, however, we investigate the notion of ensemble feature selection in this paper. This task is harder than traditional feature selection in that one not only needs to find features germane to the learning task and learning algorithm, but one also needs to find a set of feature subsets that will promote disagreement among the ensemble's classifiers. In this paper, we present an ensemble feature selection approach that is based on genetic algorithms. Our algorithm shows improved performance over the popular and powerful ensemble approaches of AdaBoost and Bagging and demonstrates the utility of ensemble feature selection.

354 citations
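
A toy sketch of the general idea, not the authors' algorithm: a genetic search over binary feature masks whose fitness rewards both validation accuracy and disagreement with the other ensemble members. Crossover is omitted for brevity, and the diversity_weight trade-off parameter is an assumption introduced here for illustration.

# Toy GA for ensemble feature selection: evolve feature masks whose classifiers
# are accurate *and* disagree with each other. Mutation-only, illustrative sketch.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def evolve_feature_masks(X_tr, y_tr, X_va, y_va, pop_size=20, generations=30,
                         diversity_weight=0.5, seed=0):
    rng = np.random.default_rng(seed)
    d = X_tr.shape[1]
    pop = rng.random((pop_size, d)) < 0.5            # random initial masks
    for _ in range(generations):
        preds, accs = [], []
        for mask in pop:
            cols = np.flatnonzero(mask)
            if cols.size == 0:                       # empty mask: useless member
                preds.append(np.full(len(y_va), -1))
                accs.append(0.0)
                continue
            clf = DecisionTreeClassifier().fit(X_tr[:, cols], y_tr)
            p = clf.predict(X_va[:, cols])
            preds.append(p)
            accs.append(np.mean(p == y_va))
        scores = []
        for i in range(pop_size):
            others = preds[:i] + preds[i + 1:]
            div = np.mean([np.mean(preds[i] != p) for p in others])
            scores.append(accs[i] + diversity_weight * div)
        # keep the fitter half, refill with bit-flip mutations of the survivors
        survivors = pop[np.argsort(scores)[pop_size // 2:]]
        flips = rng.random(survivors.shape) < 0.05
        pop = np.vstack([survivors, np.where(flips, ~survivors, survivors)])
    return pop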


Proceedings Article
William W. Cohen, Yoram Singer
18 Jul 1999
TL;DR: SLIPPER, a new rule learner that generates rulesets by repeatedly boosting a simple, greedy, rule-builder, is described, and like the rulesets built by other rule learners, the ensemble of rules created by SLIPPER is compact and comprehensible.
Abstract: We describe SLIPPER, a new rule learner that generates rulesets by repeatedly boosting a simple, greedy, rule-builder. Like the rulesets built by other rule learners, the ensemble of rules created by SLIPPER is compact and comprehensible. This is made possible by imposing appropriate constraints on the rule-builder, and by use of a recently-proposed generalization of Adaboost called confidence-rated boosting. In spite of its relative simplicity, SLIPPER is highly scalable, and an effective learner. Experimentally, SLIPPER scales no worse than O(n log n), where n is the number of examples, and on a set of 32 benchmark problems, SLIPPER achieves lower error rates than RIPPER 20 times, and lower error rates than C4.5rules 22 times.

314 citations
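
The confidence-rated machinery behind SLIPPER can be illustrated with much simpler rules. The sketch below boosts abstaining single-feature threshold rules, choosing each rule by the Schapire-Singer criterion Z = W_abstain + 2*sqrt(W+ * W-) and giving it confidence 0.5 * ln(W+/W-). It omits SLIPPER's rule growing, pruning, and default-rule handling, so treat it as an illustration of confidence-rated boosting rather than of SLIPPER itself.

# Confidence-rated boosting with abstaining single-feature threshold rules.
# Simplified illustration in the spirit of SLIPPER; not the actual algorithm.
import numpy as np

EPS = 1e-8

def rule_covers(rule, X):
    j, thr, leq = rule
    return X[:, j] <= thr if leq else X[:, j] > thr

def best_rule(X, y, w):
    """Pick the rule minimizing Z = W_abstain + 2*sqrt(W_plus * W_minus)."""
    best, best_c, best_z = None, 0.0, np.inf
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for leq in (True, False):
                cov = X[:, j] <= thr if leq else X[:, j] > thr
                wp = w[cov & (y == 1)].sum()
                wm = w[cov & (y == -1)].sum()
                z = w[~cov].sum() + 2.0 * np.sqrt(wp * wm)
                if z < best_z:
                    best, best_z = (j, thr, leq), z
                    best_c = 0.5 * np.log((wp + EPS) / (wm + EPS))   # rule confidence
    return best, best_c

def boost_rules(X, y, rounds=30):
    """y in {-1, +1}. Returns a list of (rule, confidence) pairs."""
    w = np.full(len(y), 1.0 / len(y))
    ruleset = []
    for _ in range(rounds):
        rule, c = best_rule(X, y, w)
        h = np.where(rule_covers(rule, X), c, 0.0)   # abstain outside coverage
        w *= np.exp(-y * h)
        w /= w.sum()
        ruleset.append((rule, c))
    return ruleset

def predict_ruleset(ruleset, X):
    score = sum(np.where(rule_covers(r, X), c, 0.0) for r, c in ruleset)
    return np.sign(score)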


Book ChapterDOI
06 Dec 1999
TL;DR: Focusing primarily on the AdaBoost algorithm, theoretical work on boosting is briefly surveyed including analyses of AdaBoost's training error and generalization error, connections between boosting and game theory, methods of estimating probabilities using boosting, and extensions of AdaBoost for multiclass classification problems.
Abstract: Boosting is a general method for improving the accuracy of any given learning algorithm. Focusing primarily on the AdaBoost algorithm, we briefly survey theoretical work on boosting including analyses of AdaBoost's training error and generalization error, connections between boosting and game theory, methods of estimating probabilities using boosting, and extensions of AdaBoost for multiclass classification problems. Some empirical work and applications are also described.

196 citations
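
For reference, the training-error analysis mentioned in this survey gives the standard bound: if the t-th weak hypothesis has weighted error \epsilon_t = 1/2 - \gamma_t, the training error of the combined classifier H satisfies

\[
\frac{1}{m}\,\bigl|\{\, i : H(x_i) \neq y_i \,\}\bigr|
\;\le\; \prod_{t=1}^{T} 2\sqrt{\epsilon_t(1-\epsilon_t)}
\;=\; \prod_{t=1}^{T} \sqrt{1-4\gamma_t^{2}}
\;\le\; \exp\!\Bigl(-2\sum_{t=1}^{T}\gamma_t^{2}\Bigr),
\]

so the training error drops exponentially fast as long as each weak hypothesis beats random guessing by some margin \gamma_t.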


Proceedings ArticleDOI
Yoav Freund
06 Jul 1999
TL;DR: A new boosting algorithm is proposed that is an adaptive version of the boost by majority algorithm and combines the bounded goals of the boost by majority algorithm with the adaptivity of AdaBoost.
Abstract: We propose a new boosting algorithm. This boosting algorithm is an adaptive version of the boost by majority algorithm and combines bounded goals of the boost by majority algorithm with the adaptivity of AdaBoost.

166 citations


Proceedings ArticleDOI
01 Aug 1999
TL;DR: Two new ways to apply AdaBoost are proposed to efficiently learn classifiers over very large and possibly distributed data sets that cannot fit into main memory, as well as for on-line learning where new data become available periodically.
Abstract: We propose to use AdaBoost to efficiently learn classifiers over very large and possibly distributed data sets that cannot fit into main memory, as well as for on-line learning where new data become available periodically. We propose two new ways to apply AdaBoost. The first allows the use of a small sample of the weighted training set to compute a weak hypothesis. The second approach involves using AdaBoost as a means to re-weight classifiers in an ensemble, and thus to reuse previously computed classifiers along with a new classifier computed on a new increment of data. These two techniques of using AdaBoost provide scalable, distributed and on-line learning. We discuss these methods and their implementation in JAM, an agent-based learning system. Empirical studies on four real world and artificial data sets have shown results that are either comparable to or better than learning classifiers over the complete training set and, in some cases, are comparable to boosting on the complete data set. However, our algorithms use much smaller samples of the training set and require much less memory.

127 citations
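
A sketch of the first of the two ideas, learning each weak hypothesis from a small sample drawn according to the current boosting weights, so only the sample has to fit in the weak learner's memory. This is an illustration under those assumptions, not the authors' JAM implementation; it still scans the full data once per round to update the weights.

# Boosting with a sampled weak learner: each round draws a small weighted
# subsample, fits the weak hypothesis on it, and updates weights on the full set.
# Illustrative sketch only; not the paper's JAM-based implementation.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def sampled_adaboost(X, y, rounds=20, sample_size=1000, seed=0):
    """y in {-1, +1} as a NumPy array."""
    rng = np.random.default_rng(seed)
    n = len(y)
    w = np.full(n, 1.0 / n)
    ensemble = []
    for _ in range(rounds):
        idx = rng.choice(n, size=min(sample_size, n), p=w)   # weighted subsample
        weak = DecisionTreeClassifier(max_depth=3).fit(X[idx], y[idx])
        pred = weak.predict(X)
        err = np.clip(w[pred != y].sum(), 1e-12, 1 - 1e-12)  # error on full weighted set
        alpha = 0.5 * np.log((1 - err) / err)
        w *= np.exp(-alpha * y * pred)
        w /= w.sum()
        ensemble.append((alpha, weak))
    return ensemble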


Book ChapterDOI
29 Mar 1999
TL;DR: Focusing primarily on the AdaBoost algorithm, theoretical work on boosting is surveyed including analyses of AdaBoost's training error and generalization error, connections between boosting and game theory, methods of estimating probabilities using boosting, and extensions of AdaBoost for multiclass classification problems.
Abstract: Boosting is a general method for improving the accuracy of any given learning algorithm. Focusing primarily on the AdaBoost algorithm, we briefly survey theoretical work on boosting including analyses of AdaBoost's training error and generalization error, connections between boosting and game theory, methods of estimating probabilities using boosting, and extensions of AdaBoost for multiclass classification problems. We also briefly mention some empirical work.

Proceedings ArticleDOI
06 Jul 1999
TL;DR: A framework for designing incremental learning algorithms derived from generalized entropy functionals based on the use of Bregman divergences together with the associated class of additive models constructed using the Legendre transform is presented.
Abstract: We present a framework for designing incremental learning algorithms derived from generalized entropy functionals. Our approach is based on the use of Bregman divergences together with the associated class of additive models constructed using the Legendre transform. A particular one-parameter family of Bregman divergences is shown to yield a family of loss functions that includes the log-likelihood criterion of logistic regression as a special case, and that closely approximates the exponential loss criterion used in the AdaBoost algorithms of Schapire et al. as the natural parameter of the family varies. We also show how the quadratic approximation of the gain in Bregman divergence results in a weighted least-squares criterion. This leads to a family of incremental learning algorithms that builds upon and extends the recent interpretation of boosting in terms of additive models proposed by Friedman, Hastie, and Tibshirani.
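
The two endpoints the abstract relates can be compared numerically as functions of the margin m = y F(x): AdaBoost's exponential loss e^{-m} and the logistic log-likelihood loss log(1 + e^{-m}). The snippet only contrasts these two familiar losses; it does not reproduce the paper's one-parameter Bregman family.

# Contrast of the exponential loss (AdaBoost) and the logistic loss (logistic
# regression / LogitBoost) as functions of the margin m = y * F(x).
import numpy as np

def exp_loss(m):
    return np.exp(-m)

def logistic_loss(m):
    return np.log1p(np.exp(-m))          # log(1 + e^{-m})

for m in np.linspace(-2, 4, 7):
    print(f"m = {m:5.2f}   exp: {exp_loss(m):8.4f}   logistic: {logistic_loss(m):8.4f}")
# For large positive margins the two nearly coincide; for large negative margins
# the exponential loss grows much faster, one source of AdaBoost's noise sensitivity.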

Proceedings ArticleDOI
15 Mar 1999
TL;DR: This paper investigates if AdaBoost can be used to improve a hybrid HMM/neural network continuous speech recognizer and reports results on the Numbers 95 corpus and compares them with other classifier combination techniques.
Abstract: "Boosting" is a general method for improving the performance of almost any learning algorithm. A previously proposed and very promising boosting algorithm is AdaBoost. In this paper we investigate if AdaBoost can be used to improve a hybrid HMM/neural network continuous speech recognizer. Boosting significantly improves the word error rate from 6.3% to 5.3% on a test set of the OGI Numbers 95 corpus, a medium size continuous numbers recognition task. These results compare favorably with other combining techniques using several different feature representations or additional information from longer time spans. In summary, we can say that the reasons for the impressive success of AdaBoost are still not completely understood. To the best of our knowledge, an application of AdaBoost to a real world problem has not yet been reported in the literature either. In this paper we investigate if AdaBoost can be applied to boost the performance of a continuous speech recognition system. In this domain we have to deal with large amounts of data (often more than 1 million training examples) and inherently noisy phoneme labels. The paper is organized as follows. We summarize the AdaBoost algorithm and our baseline speech recognizer. We show how AdaBoost can be applied to this task and we report results on the Numbers 95 corpus and compare them with other classifier combination techniques. The paper finishes with a conclusion and perspectives for future work.

Proceedings Article
29 Nov 1999
TL;DR: The question of which potential functions lead to new algorithms that are boosters is examined, with two main results: one general set of conditions on the potential implies that the resulting algorithm is a booster, while the other implies that it is not.
Abstract: Recent interpretations of the Adaboost algorithm view it as performing a gradient descent on a potential function. Simply changing the potential function allows one to create new algorithms related to AdaBoost. However, these new algorithms are generally not known to have the formal boosting property. This paper examines the question of which potential functions lead to new algorithms that are boosters. The two main results are general sets of conditions on the potential; one set implies that the resulting algorithm is a booster, while the other implies that the algorithm is not. These conditions are applied to previously studied potential functions, such as those used by LogitBoost and Doom II.

Book ChapterDOI
29 Mar 1999
TL;DR: A new leveraging algorithm based on a natural potential function is introduced; its bounds are incomparable to AdaBoost's, and its empirical performance is similar to AdaBoost's.
Abstract: AdaBoost is a popular and effective leveraging procedure for improving the hypotheses generated by weak learning algorithms. AdaBoost and many other leveraging algorithms can be viewed as performing a constrained gradient descent over a potential function. At each iteration the distribution over the sample given to the weak learner is the direction of steepest descent. We introduce a new leveraging algorithm based on a natural potential function. For this potential function, the direction of steepest descent can have negative components. Therefore we provide two transformations for obtaining suitable distributions from these directions of steepest descent. The resulting algorithms have bounds that are incomparable to AdaBoost's, and their empirical performance is similar to AdaBoost's.

01 Jan 1999
TL;DR: This work presents an extension of Freund and Schapire's AdaBoost algorithm that allows an input-dependent combination of the base hypotheses, and shows that the dynamic approach significantly improves the results on most data sets when (rather weak) perceptron base hypotheses are used.
Abstract: We present an extension of Freund and Schapire's AdaBoost algorithm that allows an input-dependent combination of the base hypotheses. A separate weak learner is used for determining the input-dependent weights of each hypothesis. The error function minimized by these additional weak learners is a margin cost function that has also been shown to be minimized by AdaBoost. The weak learners used for dynamically combining the base hypotheses are simple perceptrons. We compare our dynamic combination model with AdaBoost on a range of binary and multi-class classification problems. It is shown that the dynamic approach significantly improves the results on most data sets when (rather weak) perceptron base hypotheses are used, while the difference in performance is small when the base hypotheses are MLPs.
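
A toy sketch of input-dependent combination under simplifying assumptions: each base hypothesis h_t receives a linear gate g_t(x), the ensemble outputs sign(sum_t g_t(x) h_t(x)), and the gates are fit by plain gradient descent on an exponential margin cost. The paper instead trains separate perceptron weak learners against its margin cost function, so treat this only as an illustration of the gating idea; H is assumed to be a precomputed matrix of base-hypothesis outputs.

# Input-dependent (gated) combination of base hypotheses, illustrative sketch.
# Each base hypothesis h_t gets a linear gate g_t(x) = w_t . x + b_t, and the
# gates are trained by gradient descent on an exponential margin cost.
import numpy as np

def train_gates(X, y, H, lr=0.01, epochs=200):
    """X: (n, d) inputs, y in {-1, +1}, H: (n, T) base-hypothesis outputs in {-1, +1}."""
    n, d = X.shape
    T = H.shape[1]
    W = np.zeros((T, d))              # gate weights, one row per base hypothesis
    b = np.zeros(T)
    for _ in range(epochs):
        gates = X @ W.T + b           # (n, T): g_t(x_i)
        F = np.sum(gates * H, axis=1)
        grad_out = -y * np.exp(-y * F)                   # d(loss_i)/dF_i
        # chain rule: dF_i/dW_t = H[i, t] * x_i,  dF_i/db_t = H[i, t]
        W -= lr * (grad_out[:, None] * H).T @ X / n
        b -= lr * np.mean(grad_out[:, None] * H, axis=0)
    return W, b

def predict_gated(X, H, W, b):
    gates = X @ W.T + b
    return np.sign(np.sum(gates * H, axis=1))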

Proceedings Article
29 Nov 1999
TL;DR: A new boosting algorithm is proposed which allows for the possibility of a pre-specified fraction of points to lie in the margin area, even on the wrong side of the decision boundary.
Abstract: AdaBoost and other ensemble methods have successfully been applied to a number of classification tasks, seemingly defying problems of overfitting. AdaBoost performs gradient descent in an error function with respect to the margin, asymptotically concentrating on the patterns which are hardest to learn. For very noisy problems, however, this can be disadvantageous. Indeed, theoretical analysis has shown that the margin distribution, as opposed to just the minimal margin, plays a crucial role in understanding this phenomenon. Loosely speaking, some outliers should be tolerated if this has the benefit of substantially increasing the margin on the remaining points. We propose a new boosting algorithm which allows for the possibility of a pre-specified fraction of points to lie in the margin area or even on the wrong side of the decision boundary.

Proceedings Article
29 Nov 1999
TL;DR: An iterative algorithm for building vector machines used in classification tasks that builds on ideas from support vector machines, boosting, and generalized additive models and is very simple to implement.
Abstract: We describe an iterative algorithm for building vector machines used in classification tasks. The algorithm builds on ideas from support vector machines, boosting, and generalized additive models. The algorithm can be used with various continuously differentiable functions that bound the discrete (0-1) classification loss and is very simple to implement. We test the proposed algorithm with two different loss functions on synthetic and natural data. We also describe a norm-penalized version of the algorithm for the exponential loss function used in AdaBoost. The performance of the algorithm on natural data is comparable to support vector machines, while typically its running time is shorter than that of SVMs.

Proceedings Article
01 Jan 1999
TL;DR: This paper decomposes and modifies AdaBoost for use with RBF NNs, the methodology being based on the technique of combining multiple classifiers.
Abstract: AdaBoost, a recent version of Boosting, is known to improve the performance of decision trees in many classification problems, but in some cases it does not do as well as expected. There are also a few reports of its application to more complex classifiers such as neural networks. In this paper we decompose and modify this algorithm for use with RBF NNs, our methodology being based on the technique of combining multiple classifiers.

Proceedings Article
13 Jul 1999
TL;DR: This paper investigates an ensemble feature selection algorithm that is based on genetic algorithms and shows improved performance over the popular and powerful ensemble approaches of AdaBoost and Bagging.
Abstract: This paper investigates an ensemble feature selection algorithm that is based on genetic algorithms. The task of ensemble feature selection is harder than traditional feature selection in that one not only needs to find features germane to the learning task and learning algorithm, but one also needs to find a set of feature subsets that will promote disagreement among the ensemble's classifiers. Our algorithm shows improved performance over the popular and powerful ensemble approaches of AdaBoost and Bagging.