Learning from massive noisy labeled data for image classification
Citations
DeepFashion: Powering Robust Clothes Recognition and Retrieval with Rich Annotations
Making Deep Neural Networks Robust to Label Noise: A Loss Correction Approach
Generalized cross entropy loss for training deep neural networks with noisy labels
Learning to Reweight Examples for Robust Deep Learning
Robust Loss Functions under Label Noise for Deep Neural Networks
References
ImageNet Classification with Deep Convolutional Neural Networks
Very Deep Convolutional Networks for Large-Scale Image Recognition
ImageNet: A large-scale hierarchical image database
Going deeper with convolutions
Frequently Asked Questions (11)
Q2. What is the purpose of semi-supervised learning?
Apart from direct learning with label noise, some semi-supervised learning algorithms were developed to exploit weakly labeled or even unlabeled data.
Q3. How do the authors assign a noisy label to an image?
The authors assign an image a noisy label if its surrounding text contains only the keywords of that label; otherwise they discard the image to reduce ambiguity.
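This filtering rule can be sketched as follows. The function name and keyword lists are hypothetical illustrations, not taken from the paper:

```python
def assign_noisy_label(surrounding_text, keywords_by_label):
    """Return a noisy label only when the surrounding text matches the
    keywords of exactly one label; return None (discard) otherwise."""
    text = surrounding_text.lower()
    matched = [label for label, kws in keywords_by_label.items()
               if any(k in text for k in kws)]
    # Keep the image only when the match is unambiguous.
    return matched[0] if len(matched) == 1 else None

# Hypothetical keyword lists for two clothing classes.
keywords = {
    "windbreaker": ["windbreaker"],
    "down coat": ["down coat", "down jacket"],
}
print(assign_noisy_label("red windbreaker for men", keywords))    # windbreaker
print(assign_noisy_label("down coat or windbreaker?", keywords))  # None
```

The second example matches keywords of two labels at once, so the image is discarded rather than given an ambiguous label.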
Q4. What is the way to train deep models?
One possible solution is to automatically collect a large number of annotations from web images [10] (e.g., by extracting tags from the surrounding text or keywords from search engines) and use them directly as ground truth to train deep models.
Q5. How many clean and noisy samples are in the classifier?
In their experiments, the authors find that the performance of the classifier drops significantly without upsampling, but it is not sensitive to the upsampling ratio as long as the numbers of clean and noisy samples are of the same order.
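A minimal sketch of such upsampling, assuming a simple sampling-with-replacement scheme over dataset indices (the function name and sizes are illustrative, not from the paper):

```python
import numpy as np

def mix_with_upsampling(clean_idx, noisy_idx, ratio=1.0, seed=0):
    """Sample clean indices with replacement until they number roughly
    `ratio` times the noisy indices, then pool both sets for training."""
    rng = np.random.default_rng(seed)
    target = int(ratio * len(noisy_idx))
    upsampled = rng.choice(clean_idx, size=target, replace=True)
    return np.concatenate([noisy_idx, upsampled])

clean = np.arange(100)           # e.g. 100 clean-labeled samples
noisy = np.arange(100, 1100)     # e.g. 1000 noisy-labeled samples
mixed = mix_with_upsampling(clean, noisy, ratio=1.0)
print(len(mixed))  # 2000
```

With `ratio=1.0` the clean samples are repeated until they match the noisy count, bringing both groups to the same order of magnitude.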
Q6. What is the way to train a CNN?
In their case of clothing classification, the authors find that training a CNN from scratch with limited clean labels and massive noisy labels outperforms fine-tuning it only on the clean labels.
Q7. What is the main challenge of the training process?
All the data are then used to train CNNs, while the major challenge is to identify and correct wrong labels during the training process.
Q8. What is the proposed model layer for the CNN?
The authors append a label noise model layer at the end of the network, which takes as input the CNNs' prediction scores and the observed noisy label.
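A much-simplified sketch of the idea behind such a layer (not the authors' exact formulation, which also conditions on a latent noise type z): the clean-label prediction p(y|x) is pushed through a confusion matrix to obtain a distribution over observed noisy labels, and the loss is taken against the noisy label. All names and values below are illustrative:

```python
import numpy as np

def noisy_label_nll(pred_clean, Q, noisy_label):
    """Negative log-likelihood of the observed noisy label.
    pred_clean: p(y|x) from the CNN, shape (C,).
    Q[i, j] = p(noisy label j | clean label i)."""
    pred_noisy = pred_clean @ Q   # marginalize over the unknown clean label
    return -np.log(pred_noisy[noisy_label])

C = 3
Q = 0.7 * np.eye(C) + np.full((C, C), 0.1)   # mostly-correct noise model
p = np.array([0.8, 0.15, 0.05])              # CNN prediction p(y|x)
loss = noisy_label_nll(p, Q, noisy_label=0)
```

Training against this loss lets gradients flow to the clean-label predictor even though only noisy labels are observed.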
Q9. How many images are used in the training dataset?
The sizes of the training sets are |Dc| = 47,570 and |Dη| = 10^6, while the validation and test sets have 14,313 and 10,526 images, respectively.
Q10. What is the way to solve the problem of label noise?
The authors first randomly generate a confusion matrix Q between clean labels and noisy labels, and then corrupt the training labels according to it.
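This corruption procedure can be sketched as follows, assuming a simple recipe for the random row-stochastic confusion matrix (the exact generation scheme in the paper may differ; function names are illustrative):

```python
import numpy as np

def random_confusion_matrix(n_classes, noise_level, seed=0):
    """Row-stochastic Q: the diagonal keeps 1 - noise_level of the mass,
    the rest is spread over the other classes at random."""
    rng = np.random.default_rng(seed)
    off = rng.random((n_classes, n_classes))
    np.fill_diagonal(off, 0.0)
    off = noise_level * off / off.sum(axis=1, keepdims=True)
    return (1.0 - noise_level) * np.eye(n_classes) + off

def corrupt_labels(labels, Q, seed=0):
    """Replace each clean label i by a draw from row Q[i]."""
    rng = np.random.default_rng(seed)
    return np.array([rng.choice(len(Q), p=Q[y]) for y in labels])

Q = random_confusion_matrix(14, noise_level=0.3)
noisy = corrupt_labels(np.array([0, 5, 13, 2]), Q)
```

Row i of Q gives the distribution of noisy labels for clean class i, so each training label stays correct with probability 1 - noise_level and flips otherwise.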
Q11. How do the authors deal with the noise problem?
To deal with this problem, the authors bootstrap the model.
[Figure: framework overview. Two CNNs (each 5 conv + pool + norm layers followed by 3 FC layers, of sizes 4096→4096→14 and 4096→1024→3) feed a label noise model layer, which combines the predictions p(y|x) over clothing classes (down coat, windbreaker, jacket, …) and p(z|x) over noise types (noise free, random, confusing) with the observed noisy label (e.g. "Windbreaker") to form the posteriors p(y|ỹ,x) and p(z|ỹ,x); data with clean labels support the estimation.]