
Showing papers by "Dongsoo Han" published in 2023




TL;DR: Zhang et al. propose a distribution-aware active learning strategy that captures and mitigates the distribution discrepancy between the labeled and unlabeled sets to cope with overfitting.
Abstract: In this paper, we propose a distribution-aware active learning strategy that captures and mitigates the distribution discrepancy between the labeled and unlabeled sets to cope with overfitting. By taking advantage of Gaussian mixture models (GMMs) and the Wasserstein distance, we first design a distribution-aware training strategy to improve the model performance. Then, we introduce a hybrid informativeness metric for active learning that considers both likelihood-based and model-based information simultaneously. Experimental results on four different datasets show the effectiveness of our method against existing active learning baselines.
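Since the abstract does not spell out the acquisition rule, the following is only a minimal sketch of how such a hybrid score could be formed, assuming a GMM fitted on labeled features supplies the likelihood-based term and predictive entropy supplies the model-based term; the function name `hybrid_acquisition`, the `alpha` mixing weight, and the min-max normalization are illustrative choices, not the authors' implementation.

```python
# Minimal sketch (not the authors' code): hybrid acquisition combining a
# GMM likelihood term with predictive entropy.
import numpy as np
from sklearn.mixture import GaussianMixture

def hybrid_acquisition(labeled_feats, unlabeled_feats, unlabeled_probs,
                       n_components=5, alpha=0.5, budget=100):
    """Return indices of `budget` unlabeled samples with the highest hybrid score.

    labeled_feats   : (N_l, d) features of the labeled pool
    unlabeled_feats : (N_u, d) features of the unlabeled pool
    unlabeled_probs : (N_u, C) softmax outputs of the current model
    alpha           : mixing weight between the two informativeness terms
    """
    # Likelihood-based term: samples that are unlikely under a GMM fitted on
    # the labeled pool are treated as reducing the distribution discrepancy.
    gmm = GaussianMixture(n_components=n_components, covariance_type="full")
    gmm.fit(labeled_feats)
    neg_loglik = -gmm.score_samples(unlabeled_feats)   # higher = farther away

    # Model-based term: predictive entropy of the current classifier.
    entropy = -np.sum(unlabeled_probs * np.log(unlabeled_probs + 1e-12), axis=1)

    def minmax(x):
        return (x - x.min()) / (x.max() - x.min() + 1e-12)

    score = alpha * minmax(neg_loglik) + (1.0 - alpha) * minmax(entropy)
    return np.argsort(-score)[:budget]
```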

Journal Article
TL;DR: In this paper, the authors propose to shift the style of the test sample (that has a large style gap with the source domains) to the nearest source domain that the model is already familiar with, before making the prediction.
Abstract: In domain generalization (DG), the target domain is unknown when the model is being trained, and the trained model should successfully work on an arbitrary (and possibly unseen) target domain during inference. This is a difficult problem, and despite active studies in recent years, it remains a great challenge. In this paper, we take a simple yet effective approach to tackle this issue. We propose test-time style shifting, which shifts the style of the test sample (that has a large style gap with the source domains) to the nearest source domain that the model is already familiar with, before making the prediction. This strategy enables the model to handle any target domain with arbitrary style statistics, without additional model updates at test time. Additionally, we propose style balancing, which provides a great platform for maximizing the advantage of test-time style shifting by handling the DG-specific imbalance issues. The proposed ideas are easy to implement and work successfully in conjunction with various other DG schemes. Experimental results on different datasets show the effectiveness of our methods.
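As a rough illustration of the test-time style shifting idea, the sketch below assumes a sample's "style" is the channel-wise mean and standard deviation of an intermediate feature map (AdaIN-style statistics) and that one style prototype per source domain was stored during training; the helper names and the nearest-prototype matching are assumptions of this sketch, not the paper's released code.

```python
# Minimal sketch (assumed AdaIN-style statistics): shift a test feature map's
# channel-wise style to the nearest stored source-domain style.
import torch

def feature_style(feat):
    """Channel-wise mean/std of a feature map of shape (B, C, H, W)."""
    mu = feat.mean(dim=(2, 3))
    sigma = feat.std(dim=(2, 3)) + 1e-6
    return mu, sigma

def shift_to_nearest_source(feat, source_styles):
    """Re-normalize `feat` so its style matches the closest source-domain style.

    source_styles: list of (mu, sigma) pairs of shape (C,), one per source
                   domain, assumed to have been collected during training.
    """
    mu, sigma = feature_style(feat)                            # (B, C) each
    test_style = torch.cat([mu, sigma], dim=1)                 # (B, 2C)
    protos = torch.stack([torch.cat([m, s]) for m, s in source_styles])  # (S, 2C)

    nearest = torch.cdist(test_style, protos).argmin(dim=1)    # (B,)
    tgt_mu = torch.stack([source_styles[int(i)][0] for i in nearest])     # (B, C)
    tgt_sigma = torch.stack([source_styles[int(i)][1] for i in nearest])  # (B, C)

    # Strip the test sample's own style, then apply the nearest source style.
    normalized = (feat - mu[..., None, None]) / sigma[..., None, None]
    return normalized * tgt_sigma[..., None, None] + tgt_mu[..., None, None]
```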

TL;DR: In this paper, the authors introduce the weight space rotation process, which transforms the original parameter space into a new space so that most of the previous knowledge can be pushed compactly into only a few important parameters.
Abstract: Class-incremental few-shot learning, where new sets of classes are provided sequentially with only a few training samples, presents a great challenge due to catastrophic forgetting of old knowledge and overfitting caused by a lack of data. During finetuning on new classes, the performance on previous classes deteriorates quickly even when only a small fraction of the parameters are updated, since the previous knowledge is broadly associated with most of the model parameters in the original parameter space. In this paper, we introduce WaRP, the weight space rotation process, which transforms the original parameter space into a new space so that we can push most of the previous knowledge compactly into only a few important parameters. By properly identifying and freezing these key parameters in the new weight space, we can finetune the remaining parameters without affecting the knowledge of previous classes. As a result, WaRP provides additional room for the model to effectively learn new classes in future incremental sessions. Experimental results confirm the effectiveness of our solution and show improved performance over state-of-the-art methods.
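The sketch below shows one way a weight space rotation could be realized for a single linear layer, assuming the rotation basis comes from an SVD of activations collected on previous classes and that the leading directions are the "important" ones to freeze; `build_rotation`, `RotatedLinear`, and the `keep_ratio` threshold are hypothetical names and choices for illustration, not the authors' WaRP implementation.

```python
# Minimal sketch (illustrative, not the released WaRP code): rotate a linear
# layer's weight into an activation-derived basis and freeze the directions
# assumed to carry most of the previous knowledge.
import torch

@torch.no_grad()
def build_rotation(old_activations, keep_ratio=0.1):
    """SVD basis of old-task layer inputs; the leading directions get frozen."""
    # old_activations: (N, d_in) inputs seen by the layer on previous classes
    _, _, Vt = torch.linalg.svd(old_activations, full_matrices=False)
    V = Vt.T                                         # (d_in, d_in), assuming N >= d_in
    frozen_mask = torch.zeros(V.shape[1])
    frozen_mask[: max(1, int(keep_ratio * V.shape[1]))] = 1.0   # 1 = frozen
    return V, frozen_mask

class RotatedLinear(torch.nn.Module):
    """y = (W V)(V^T x) + b, with gradients of frozen columns zeroed out."""

    def __init__(self, linear, V, frozen_mask):
        super().__init__()
        self.register_buffer("V", V)
        self.register_buffer("frozen_mask", frozen_mask)
        self.weight = torch.nn.Parameter(linear.weight @ V)     # rotated weight
        self.bias = torch.nn.Parameter(linear.bias.clone())
        # During finetuning on new classes, block updates to frozen directions.
        self.weight.register_hook(lambda g: g * (1.0 - self.frozen_mask))

    def forward(self, x):
        return torch.nn.functional.linear(x @ self.V, self.weight, self.bias)
```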

Journal Article
TL;DR: In this paper, the authors propose a new method for training multi-exit neural networks by strategically imposing different objectives on individual blocks, which improves the prediction performance at the earlier exits without degrading the performance of later ones.
Abstract: As the size of a model increases, making predictions using deep neural networks (DNNs) is becoming more computationally expensive. Multi-exit neural networks are one promising solution that can flexibly make anytime predictions via early exits, depending on the current test-time budget, which may vary over time in practice (e.g., self-driving cars with dynamically changing speeds). However, the prediction performance at the earlier exits is generally much lower than at the final exit, which becomes a critical issue in low-latency applications with a tight test-time budget. Compared to previous works where each block is optimized to minimize the losses of all exits simultaneously, in this work we propose a new method for training multi-exit neural networks by strategically imposing different objectives on individual blocks. The proposed idea, based on grouping and overlapping strategies, improves the prediction performance at the earlier exits while not degrading the performance of later ones, making our scheme more suitable for low-latency applications. Extensive experimental results on both image classification and semantic segmentation confirm the advantage of our approach. The proposed idea does not require any modifications to the model architecture and can be easily combined with existing strategies aiming to improve the performance of multi-exit neural networks.
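To make the grouping idea concrete, the sketch below shows a toy multi-exit model in which each exit's loss is only allowed to update an assigned (possibly overlapping) group of blocks; the `exit_to_blocks` mapping and the gradient-routing loop are illustrative assumptions rather than the paper's actual training scheme.

```python
# Minimal sketch (hypothetical grouping, not the paper's code): each block only
# receives gradients from the exits it is assigned to.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiExitMLP(nn.Module):
    def __init__(self, d=32, n_classes=10, n_blocks=3):
        super().__init__()
        self.blocks = nn.ModuleList(
            [nn.Sequential(nn.Linear(d, d), nn.ReLU()) for _ in range(n_blocks)]
        )
        self.exits = nn.ModuleList(
            [nn.Linear(d, n_classes) for _ in range(n_blocks)]
        )

    def forward(self, x):
        logits = []
        for block, exit_head in zip(self.blocks, self.exits):
            x = block(x)
            logits.append(exit_head(x))
        return logits                       # one prediction per exit

def train_step(model, opt, x, y, exit_to_blocks):
    """exit_to_blocks[k] lists the blocks that exit k's loss may update
    (an assumed overlapping grouping, e.g. {0: [0], 1: [0, 1], 2: [1, 2]})."""
    logits = model(x)
    opt.zero_grad()
    for k, out in enumerate(logits):
        loss = F.cross_entropy(out, y)
        # Exit k's loss always trains its own head, plus its assigned blocks.
        params = list(model.exits[k].parameters())
        for b in exit_to_blocks[k]:
            params += list(model.blocks[b].parameters())
        grads = torch.autograd.grad(loss, params, retain_graph=True)
        for p, g in zip(params, grads):
            p.grad = g if p.grad is None else p.grad + g
    opt.step()

# Usage: 3 blocks / 3 exits with overlapping groups.
model = MultiExitMLP()
opt = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(8, 32), torch.randint(0, 10, (8,))
train_step(model, opt, x, y, exit_to_blocks={0: [0], 1: [0, 1], 2: [1, 2]})
```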