Showing papers by Nitesh V. Chawla published in 2001


Proceedings ArticleDOI
29 Nov 2001
TL;DR: The significance of the finding is that a partition strategy, even for small/moderate-sized datasets, can yield better performance when combined with bagging than applying a single learner to the entire dataset.
Abstract: Ensembles of classifiers offer promise in increasing overall classification accuracy. The availability of extremely large datasets has opened avenues for applying distributed and/or parallel learning to efficiently learn models from them. In this paper, distributed learning is done by training classifiers on disjoint subsets of the data. We examine a random partitioning method for creating disjoint subsets and propose a more intelligent way of partitioning into disjoint subsets using clustering. We observed that the intelligent partitioning method generally performs better than random partitioning on our datasets. In both methods, a significant gain in accuracy may be obtained by applying bagging to each of the disjoint subsets, creating multiple diverse classifiers. The significance of our finding is that a partition strategy, even for small/moderate-sized datasets, can yield better performance when combined with bagging than applying a single learner to the entire dataset.
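
As a rough illustration of the approach described in this abstract, the Python sketch below partitions the data into disjoint subsets either randomly or by clustering, bags a decision tree on each subset, and combines the members by majority vote. The scikit-learn components (KMeans, BaggingClassifier, DecisionTreeClassifier) and the helper names (disjoint_partitions, train_committee, predict_committee) are stand-ins chosen for the sketch; the paper's own learners, clustering method, and combination scheme may differ.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

def disjoint_partitions(X, y, n_parts, intelligent=True, seed=0):
    # Split the training data into disjoint subsets, either at random or
    # by clustering the examples (the "intelligent" partitioning).
    if intelligent:
        labels = KMeans(n_clusters=n_parts, random_state=seed).fit_predict(X)
    else:
        labels = np.random.default_rng(seed).integers(0, n_parts, size=len(X))
    return [(X[labels == k], y[labels == k]) for k in range(n_parts)]

def train_committee(X, y, n_parts=4, bags_per_part=10):
    # Train a bagged ensemble on each disjoint subset; the committee is the union.
    committee = []
    for X_k, y_k in disjoint_partitions(X, y, n_parts):
        committee.append(BaggingClassifier(DecisionTreeClassifier(),
                                           n_estimators=bags_per_part).fit(X_k, y_k))
    return committee

def predict_committee(committee, X):
    # Combine members by simple majority vote (assumes integer class labels).
    votes = np.stack([m.predict(X) for m in committee])
    return np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)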

52 citations


Proceedings ArticleDOI
01 Dec 2001
TL;DR: The results indicate that the simple approach of creating a committee of classifiers from disjoint partitions is preferred over the more complex approach of bagging in applications that involve the use of datasets too large to handle in the memory of a typical computer.
Abstract: Bagging forms a committee of classifiers by bootstrap aggregation of training sets from a pool of training data. A simple alternative to bagging is to partition the data into disjoint subsets. Experiments on various datasets show that, given the same size partitions and bags, disjoint partitions result in better performance than bootstrap aggregates (bags). Many applications (e.g., protein structure prediction) involve the use of datasets that are too large to handle in the memory of a typical computer. Our results indicate that, in such applications, the simple approach of creating a committee of classifiers from disjoint partitions is preferred over the more complex approach of bagging.
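
The contrast described above can be sketched as follows: one committee is built from bootstrap bags and another from equal-sized disjoint partitions, so the two can be compared on held-out data. The decision-tree base learner, the majority-vote combiner, and the helper names are illustrative assumptions, not the paper's exact experimental setup.

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bootstrap_committee(X, y, n_members, seed=0):
    # Each member sees a bootstrap bag whose size equals one disjoint partition.
    rng = np.random.default_rng(seed)
    bag_size = len(X) // n_members
    members = []
    for _ in range(n_members):
        idx = rng.integers(0, len(X), size=bag_size)   # sampled with replacement
        members.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return members

def disjoint_committee(X, y, n_members, seed=0):
    # Each member sees one disjoint slice of a random permutation of the data.
    perm = np.random.default_rng(seed).permutation(len(X))
    return [DecisionTreeClassifier().fit(X[idx], y[idx])
            for idx in np.array_split(perm, n_members)]

def majority_vote(members, X):
    # Combine member predictions by majority vote (assumes integer class labels).
    votes = np.stack([m.predict(X) for m in members])
    return np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)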

13 citations


Proceedings Article
26 Aug 2001
TL;DR: It is shown that with "large" data sets and with bag size equal to partition size, simple disjoint partitioning performs at least as well as standard bagging, and there are subtle differences in the operation of binary decision trees and neural networks for this problem.
Abstract: In the Third Critical Assessment of Techniques for Protein Structure Prediction ("CASP-3") contest, the best performance was obtained with a classifier that uses neural networks, a window size of fifteen around a given amino acid, and a training set of about 299,186 amino acids. We set out to investigate the possibility of obtaining better performance by using a bagging-like committee of binary decision trees, created using an order of magnitude more training data. There are two main reasons to believe that it should be possible to obtain better performance in this way. One is that Jones did not use a committee of classifiers in CASP-3 (and used only a four-classifier committee in CASP-4), whereas bagging studies indicate that improvement plateaus in the range of thirty to fifty classifiers in a committee. The second is that, by using supercomputers available at Sandia National Labs, it is feasible to use an order of magnitude more training data than was used by Jones. This paper reports on our experiences pursuing this line of research. We show that with "large" datasets and with bag size equal to partition size, simple disjoint partitioning performs at least as well as standard bagging. Given large datasets, either outperforms a single classifier built on all the data. We also show that there are subtle differences in the operation of binary decision trees and neural networks for this problem. One difference is that the neural network seems less prone to "over-learning" the "easy" subset of the training data.
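
To make the windowed representation concrete, the sketch below one-hot encodes a window of fifteen residues centred on each amino acid, producing one training example per position; such vectors could then be fed to a decision-tree committee or a neural network. The amino-acid alphabet, padding symbol, and encoding used here are illustrative assumptions and are not taken from the paper.

import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAD = "-"                      # marks positions beyond the chain ends
ALPHABET = AMINO_ACIDS + PAD
INDEX = {a: i for i, a in enumerate(ALPHABET)}

def window_features(sequence, window=15):
    # One-hot encode a window of residues centred on every position in the chain,
    # yielding one feature vector (one training example) per amino acid.
    half = window // 2
    padded = PAD * half + sequence + PAD * half
    examples = []
    for i in range(len(sequence)):
        one_hot = np.zeros((window, len(ALPHABET)))
        for j, residue in enumerate(padded[i:i + window]):
            one_hot[j, INDEX[residue]] = 1.0
        examples.append(one_hot.ravel())
    return np.array(examples)

# Example: window_features("MKTAYIAKQR") returns a 10 x (15 * 21) matrix,
# one row per residue of the (hypothetical) chain.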

10 citations