A study of the behavior of several methods for balancing machine learning training data
Citations
6,320 citations
Cites background or methods from "A study of the behavior of several ..."
...Finally, a conclusion is provided in Section 6....
[...]
...To provide a concrete understanding of the direct effects of the imbalanced learning problem on standard learning algorithms, we observe a case study of the popular decision tree learning algorithm....
[...]
3,672 citations
Cites background from "A study of the behavior of several ..."
...A substantial amount of research has been conducted on the effectiveness of using sampling procedures to combat skewed class distributions, most notably Weiss and Provost (2001b), Batista et al. (2004), Van Hulse et al. (2007), Burez and Van den Poel (2009), and Jeatrakul et al. (2010). These and other publications show that, in many cases, sampling can mitigate the issues caused by an imbalance, but there is no clear winner among the various approaches....
[...]
2,800 citations
Cites background from "A study of the behavior of several ..."
...combined over-sampling and under-sampling methods to resolve the imbalanced problem [14]....
[...]
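The snippet above mentions combining over-sampling and under-sampling. As a rough illustration only, the Python sketch below balances a two-class data set by randomly under-sampling the majority class and randomly duplicating minority examples until both classes meet at the midpoint of their original sizes. This is a generic random-resampling sketch, not the specific combined methods evaluated in the cited work; the function name `combined_resample` and the midpoint target are assumptions chosen for illustration.

```python
import random
from collections import Counter


def combined_resample(X, y, seed=0):
    """Hypothetical helper: balance a binary data set by random under-sampling
    of the majority class plus random over-sampling (with replacement) of the
    minority class. The midpoint target size is an illustrative assumption."""
    rng = random.Random(seed)
    counts = Counter(y)
    if len(counts) != 2:
        raise ValueError("expected exactly two classes")
    # most_common() lists (label, count) pairs from largest to smallest class.
    (maj, n_maj), (mino, n_min) = counts.most_common()
    target = (n_maj + n_min) // 2  # assumed target: midpoint of the two sizes

    maj_idx = [i for i, label in enumerate(y) if label == maj]
    min_idx = [i for i, label in enumerate(y) if label == mino]

    # Under-sample: keep a random subset of majority examples (no replacement).
    kept_maj = rng.sample(maj_idx, target)
    # Over-sample: keep every minority example, then add random duplicates.
    kept_min = min_idx + [rng.choice(min_idx) for _ in range(target - n_min)]

    idx = kept_maj + kept_min
    rng.shuffle(idx)
    return [X[i] for i in idx], [y[i] for i in idx]
```

With 90 majority and 10 minority examples, the result has 50 of each class: 50 majority examples kept at random, and the 10 minority originals plus 40 random duplicates.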
2,228 citations
Cites background or methods or result from "A study of the behavior of several ..."
...Previous works have shown the positive synergy of this combination leading to significant improvements [20], [53]....
[...]
...2) Data level (or external) approaches rebalance the class distribution by resampling the data space [20], [52], [53],...
[...]
...5 [58] as base classifier for our experiments since it has been widely used in imbalanced domains [20], [59]–[61]; besides, most of the proposals we are studying were tested with C4....
[...]
...The Wilcoxon test shows, in concordance with previous studies [20], [53], that making use of SMOTE as a preprocessing technique significantly outperforms C4....
[...]
...empirically proved that the application of a preprocessing step in order to balance the class distribution is usually a positive solution [20], [53]....
[...]
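Several snippets in this entry refer to SMOTE as a preprocessing step. The following is a minimal, standard-library-only Python sketch of the SMOTE idea under stated assumptions: each synthetic point is placed at a random position on the segment between a minority example and one of its k nearest minority-class neighbours. Unlike the original formulation, which generates a fixed number of synthetic points per minority example, this sketch picks the base example at random; the name `smote`, its parameters, and the brute-force neighbour search are illustrative assumptions, not the cited authors' code.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Hypothetical SMOTE-style sketch: create n_synthetic points by
    interpolating between a randomly chosen minority example and one of its
    k nearest minority-class neighbours (Euclidean distance)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority examples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)

    # Precompute each point's k nearest minority neighbours (brute force).
    neighbours = []
    for i, p in enumerate(minority):
        others = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(p, minority[j]),
        )
        neighbours.append(others[:k])

    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))  # assumption: random base example
        base = minority[i]
        nn = minority[rng.choice(neighbours[i])]
        gap = rng.random()  # uniform in [0, 1)
        # New point lies on the segment from base towards its neighbour.
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, nn)))
    return synthetic
```

For example, `smote(minority, 20, k=2)` on the four corners of the unit square returns 20 new points, all within that square, since each is a convex combination of two corners.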
2,057 citations
References
21,674 citations
17,313 citations
12,940 citations
"A study of the behavior of several ..." refers methods or result in this paper
...However, it has also been observed that in some domains, for instance the Sick data set [3], standard ML algorithms are capable of inducing good classifiers, even using highly imbalanced training sets....
[...]
...To make this comparison, we have selected thirteen data sets from UCI [3] which have different degrees of imbalance....
[...]
11,512 citations
8,046 citations
"A study of the behavior of several ..." refers background in this paper
...5 symbolic learning algorithm to induce decision trees [20]....
[...]