Showing papers by Nitesh V. Chawla published in 2001


Proceedings ArticleDOI
29 Nov 2001
TL;DR: The significance of the finding is that a partition strategy, even for small/moderate-sized datasets, can yield better performance when combined with bagging than applying a single learner to the entire dataset.
Abstract: Ensembles of classifiers offer promise in increasing overall classification accuracy. The availability of extremely large datasets has opened avenues for applying distributed and/or parallel learning to efficiently learn models from them. In this paper, distributed learning is done by training classifiers on disjoint subsets of the data. We examine a random partitioning method for creating disjoint subsets and propose a more intelligent way of partitioning into disjoint subsets using clustering. We observed that the intelligent partitioning method generally performs better than random partitioning on our datasets. In both methods, a significant gain in accuracy may be obtained by applying bagging to each of the disjoint subsets, creating multiple diverse classifiers. The significance of our finding is that a partition strategy, even for small/moderate-sized datasets, can yield better performance when combined with bagging than applying a single learner to the entire dataset.
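
As a rough illustration of the approach described in this abstract, the Python sketch below partitions the data into disjoint subsets either randomly or by clustering, bags a decision tree on each subset, and combines the members by majority vote. The scikit-learn components (KMeans, BaggingClassifier, DecisionTreeClassifier) and the helper names (disjoint_partitions, train_committee, predict_committee) are stand-ins chosen for the sketch; the paper's own learners, clustering method, and combination scheme may differ.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

def disjoint_partitions(X, y, n_parts, intelligent=True, seed=0):
    # Split the training data into disjoint subsets, either at random or
    # by clustering the examples (the "intelligent" partitioning).
    if intelligent:
        labels = KMeans(n_clusters=n_parts, random_state=seed).fit_predict(X)
    else:
        labels = np.random.default_rng(seed).integers(0, n_parts, size=len(X))
    return [(X[labels == k], y[labels == k]) for k in range(n_parts)]

def train_committee(X, y, n_parts=4, bags_per_part=10):
    # Train a bagged ensemble on each disjoint subset; the committee is the union.
    committee = []
    for X_k, y_k in disjoint_partitions(X, y, n_parts):
        committee.append(BaggingClassifier(DecisionTreeClassifier(),
                                           n_estimators=bags_per_part).fit(X_k, y_k))
    return committee

def predict_committee(committee, X):
    # Combine members by simple majority vote (assumes integer class labels).
    votes = np.stack([m.predict(X) for m in committee])
    return np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)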

52 citations


Proceedings ArticleDOI
01 Dec 2001
TL;DR: The results indicate that the simple approach of creating a committee of classifiers from disjoint partitions is preferred over the more complex approach of bagging in applications that involve the use of datasets too large to handle in the memory of a typical computer.
Abstract: Bagging forms a committee of classifiers by bootstrap aggregation of training sets from a pool of training data. A simple alternative to bagging is to partition the data into disjoint subsets. Experiments on various datasets show that, given the same size partitions and bags, disjoint partitions result in better performance than bootstrap aggregates (bags). Many applications (e.g., protein structure prediction) involve the use of datasets that are too large to handle in the memory of a typical computer. Our results indicate that, in such applications, the simple approach of creating a committee of classifiers from disjoint partitions is preferred over the more complex approach of bagging.
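
The contrast described above can be sketched as follows: one committee is built from bootstrap bags and another from equal-sized disjoint partitions, so the two can be compared on held-out data. The decision-tree base learner, the majority-vote combiner, and the helper names are illustrative assumptions, not the paper's exact experimental setup.

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bootstrap_committee(X, y, n_members, seed=0):
    # Each member sees a bootstrap bag whose size equals one disjoint partition.
    rng = np.random.default_rng(seed)
    bag_size = len(X) // n_members
    members = []
    for _ in range(n_members):
        idx = rng.integers(0, len(X), size=bag_size)   # sampled with replacement
        members.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return members

def disjoint_committee(X, y, n_members, seed=0):
    # Each member sees one disjoint slice of a random permutation of the data.
    perm = np.random.default_rng(seed).permutation(len(X))
    return [DecisionTreeClassifier().fit(X[idx], y[idx])
            for idx in np.array_split(perm, n_members)]

def majority_vote(members, X):
    # Combine member predictions by majority vote (assumes integer class labels).
    votes = np.stack([m.predict(X) for m in members])
    return np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)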

13 citations


Proceedings Article
26 Aug 2001
TL;DR: It is shown that with "large" data sets and with bag size equal to partition size, simple disjoint partitioning performs at least as well as standard bagging, and there are subtle differences in the operation of binary decision trees and neural networks for this problem.
Abstract: In the Third Critical Assessment of Techniques for Protein Structure Prediction ("CASP-3") contest, the best performance was obtained with a classifier that uses neural networks, a window size of fifteen around a given amino acid, and a training set of about 299,186 amino acids. We set out to investigate the possibility of obtaining better performance by using a bagging-like committee of binary decision trees, created using an order of magnitude more training data. There are two main reasons to believe that it should be possible to obtain better performance in this way. One is that Jones did not use a committee of classifiers in CASP-3 (and used only a four-classifier committee in CASP-4), whereas bagging studies indicate that improvement plateaus in the range of thirty to fifty classifiers in a committee. The second is that, by using supercomputers available at Sandia National Labs, it is feasible to use an order of magnitude more training data than was used by Jones. This paper reports on our experiences pursuing this line of research. We show that with "large" datasets and with bag size equal to partition size, simple disjoint partitioning performs at least as well as standard bagging. Given large datasets, either outperforms a single classifier built on all the data. We also show that there are subtle differences in the operation of binary decision trees and neural networks for this problem. One difference is that the neural network seems less prone to "over-learning" the "easy" subset of the training data.
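
To make the windowed representation concrete, the sketch below one-hot encodes a window of fifteen residues centred on each amino acid, producing one training example per position; such vectors could then be fed to a decision-tree committee or a neural network. The amino-acid alphabet, padding symbol, and encoding used here are illustrative assumptions and are not taken from the paper.

import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAD = "-"                      # marks positions beyond the chain ends
ALPHABET = AMINO_ACIDS + PAD
INDEX = {a: i for i, a in enumerate(ALPHABET)}

def window_features(sequence, window=15):
    # One-hot encode a window of residues centred on every position in the chain,
    # yielding one feature vector (one training example) per amino acid.
    half = window // 2
    padded = PAD * half + sequence + PAD * half
    examples = []
    for i in range(len(sequence)):
        one_hot = np.zeros((window, len(ALPHABET)))
        for j, residue in enumerate(padded[i:i + window]):
            one_hot[j, INDEX[residue]] = 1.0
        examples.append(one_hot.ravel())
    return np.array(examples)

# Example: window_features("MKTAYIAKQR") returns a 10 x (15 * 21) matrix,
# one row per residue of the (hypothetical) chain.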

10 citations