Uncertainty-aware Self-training for Text Classification with Few Labels.
Citations (102)
Additional excerpts
...Different neural model architectures (Kim, 2014; Zhou et al., 2015; Radford et al., 2018; Chai et al., 2020) have demonstrated their effectiveness against traditional statistical feature-based methods (Wallach, 2006)....
[...]
Cites background from "Uncertainty-aware Self-training for..."
...UST (Mukherjee and Awadallah, 2020) is state-of-the-art for self-training with limited labels....
[...]
...We implement Self-ensemble, FreeLB, Mixup and UST based on their original papers....
[...]
...C.3 Number of Parameters: COSINE and most of the baselines (RoBERTa-WL / RoBERTa-CL / SMART / WeSTClass / Self-ensemble / FreeLB / Mixup / UST) are built on the RoBERTa-base model with about 125M parameters....
[...]
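As a quick check of the figure quoted above, the parameter count of RoBERTa-base can be verified with the Hugging Face transformers library; a minimal sketch (the ~125M total matches the excerpt):

```python
from transformers import AutoModel

# Load RoBERTa-base from the Hugging Face hub and count its parameters.
model = AutoModel.from_pretrained("roberta-base")
n_params = sum(p.numel() for p in model.parameters())
print(f"RoBERTa-base parameters: {n_params / 1e6:.1f}M")  # ~125M
```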
...Compared with advanced fine-tuning and self-training methods (e.g., SMART and UST), our model consistently outperforms the baselines....
[...]
...We highlight that although UST, the state-of-the-art method to date, achieves strong performance under few-shot settings, it cannot estimate confidence well under noisy labels, which yields inferior performance....
[...]
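The excerpts above contrast methods by how well they estimate confidence for pseudo-labels. For background, UST estimates uncertainty via Monte Carlo dropout: dropout is kept active at inference, predictions are aggregated over several stochastic forward passes, and low-variance examples are preferred for self-training. Below is a minimal sketch of that general idea, not the authors' exact acquisition function; `model` is assumed to return logits, and `select_confident` is an illustrative placeholder.

```python
import torch

def mc_dropout_predict(model, inputs, n_passes: int = 10):
    """Run several stochastic forward passes with dropout enabled
    and return the mean class probabilities and their variance."""
    model.train()  # keep dropout active at inference time
    with torch.no_grad():
        probs = torch.stack([
            torch.softmax(model(inputs), dim=-1) for _ in range(n_passes)
        ])
    return probs.mean(dim=0), probs.var(dim=0)

def select_confident(model, inputs, k: int):
    """Pick the k unlabeled examples whose predicted class has the
    lowest predictive variance: a simple uncertainty-based filter."""
    mean_probs, var = mc_dropout_predict(model, inputs)
    pseudo_labels = mean_probs.argmax(dim=-1)
    # Variance of the probability assigned to the predicted class.
    label_var = var.gather(-1, pseudo_labels.unsqueeze(-1)).squeeze(-1)
    idx = label_var.argsort()[:k]
    return idx, pseudo_labels[idx]
```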
References
"Uncertainty-aware Self-training for..." refers methods in this paper
...We use Adam [Kingma and Ba, 2015] as the optimizer with early stopping, selecting the best model found so far by validation loss for all models....
[...]
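The quoted setup, Adam with early stopping that keeps the best model by validation loss, is a standard training loop; a minimal PyTorch sketch, where `train_epoch` and `eval_loss` are hypothetical helpers:

```python
import copy
import torch

def fit(model, train_epoch, eval_loss, lr=2e-5, patience=3, max_epochs=50):
    """Train with Adam, stop early when validation loss has not
    improved for `patience` epochs, and keep the best model so far."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    best_loss, best_state, stale = float("inf"), None, 0
    for epoch in range(max_epochs):
        train_epoch(model, optimizer)      # one pass over training data
        val_loss = eval_loss(model)        # loss on the validation set
        if val_loss < best_loss:
            best_loss, stale = val_loss, 0
            best_state = copy.deepcopy(model.state_dict())
        else:
            stale += 1
            if stale >= patience:
                break                      # early stopping
    if best_state is not None:
        model.load_state_dict(best_state)  # restore best model found
    return model
```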