Show and Tell: Lessons Learned from the 2015 MSCOCO Image Captioning Challenge
Citations
Cites methods from "Show and Tell: Lessons Learned from..."
...The results reported were generated with the optimized CL schedule reported in [7]....
[...]
Cites background from "Show and Tell: Lessons Learned from..."
...There are many known improvements that can be implemented, including ensembling diverse architectures generated by evolution, fine-tuning of the ImageNet model, using a more recent ImageNet model, and performing beam search or scheduled sampling during training (Vinyals et al. 2016) (preliminary experiments with ensembling alone suggest improvements of about 20%)....
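The beam-search decoding this excerpt mentions can be sketched generically: keep the `beam_size` highest-scoring partial sequences at each step instead of committing greedily to the single best token. Everything below (the `beam_search` helper, the `toy_step` scorer, the token names) is a hypothetical illustration of the strategy, not code from the cited work.

```python
import heapq
import math

def beam_search(step_fn, start_token, end_token, beam_size=3, max_len=10):
    """Keep the `beam_size` best partial sequences at every step.
    `step_fn(seq)` returns (token, log_prob) candidates extending `seq`."""
    beams = [(0.0, [start_token])]  # (cumulative log-prob, sequence)
    finished = []
    for _ in range(max_len):
        candidates = []
        for score, seq in beams:
            if seq[-1] == end_token:          # sequence already complete
                finished.append((score, seq))
                continue
            for tok, logp in step_fn(seq):
                candidates.append((score + logp, seq + [tok]))
        if not candidates:
            break
        beams = heapq.nlargest(beam_size, candidates, key=lambda c: c[0])
    finished.extend(b for b in beams if b[1][-1] == end_token)
    return max(finished or beams, key=lambda c: c[0])[1]

def toy_step(seq):
    # Toy distribution with a greedy trap: "a" looks best locally,
    # but "b" leads to a higher-probability complete sequence.
    if seq[-1] == "<s>":
        return [("a", math.log(0.6)), ("b", math.log(0.4))]
    if seq[-1] == "a":
        return [("</s>", math.log(0.3))]
    return [("</s>", math.log(0.9))]

beam_search(toy_step, "<s>", "</s>", beam_size=2)  # -> ['<s>', 'b', '</s>']
beam_search(toy_step, "<s>", "</s>", beam_size=1)  # -> ['<s>', 'a', '</s>'] (greedy)
```

With a beam of 1 the search degenerates to greedy decoding and falls into the trap; widening the beam recovers the globally better sequence, which is exactly why captioning systems decode with beams.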
[...]
Cites methods from "Show and Tell: Lessons Learned from..."
...With the advent of Deep Neural Networks, most captioning techniques have employed RNNs as language models and used the output of one or more layers of a CNN to encode visual information and condition language generation [41, 31, 9, 14]....
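The CNN-conditions-RNN pattern this excerpt describes can be illustrated with a minimal pure-Python decoding loop: the CNN's image embedding is fed to the recurrent cell before any words, and generation then proceeds word by word. All weights, the vocabulary, and the function names below are toy stand-ins sketching the general scheme, not the cited implementation.

```python
import math

def rnn_captioner(image_vec, embed, Wh, Wx, Wo, vocab, max_len=5):
    """Greedy NIC-style decoding loop: condition the RNN on the image
    embedding first, then feed back each emitted word. Toy weights."""
    def step(h, x):
        # One vanilla-RNN step: h' = tanh(Wh.h + Wx.x)
        return [math.tanh(sum(Wh[i][j] * h[j] for j in range(len(h))) +
                          sum(Wx[i][j] * x[j] for j in range(len(x))))
                for i in range(len(Wh))]
    h = step([0.0] * len(Wh), image_vec)   # image embedding goes in first
    word, caption = "<s>", []
    for _ in range(max_len):
        h = step(h, embed[word])           # feed back the previous word
        scores = [sum(Wo[k][j] * h[j] for j in range(len(h)))
                  for k in range(len(Wo))]
        word = vocab[max(range(len(vocab)), key=scores.__getitem__)]
        if word == "</s>":
            break
        caption.append(word)
    return caption
```

In a real system `image_vec` would come from a CNN's top layer and the weights from training; here the point is only the control flow, i.e. how visual information conditions language generation.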
[...]
References
"Show and Tell: Lessons Learned from..." refers background or methods in this paper
[...]
...Third, we describe the lessons learned from participating in the first MSCOCO competition, which helped us to improve our initial model and place first in automatic metrics, and first (tied with another team) in human evaluation....
[...]
...Finally, it yields significantly better performance compared to state-of-the-art approaches; for instance, on the Pascal dataset, NIC yielded a BLEU score of 59, to be compared to the current state-of-the-art of 25, while human performance...
[...]
Additional excerpts
...When running the MSCOCO model on SBU, our performance degrades from 28 down to 16....
[...]
...MSCOCO is even bigger (5 times more training data than Flickr30k), but since the collection process was done differently, there are likely more differences in vocabulary and a larger mismatch....
[...]
...Section 5.3 shows a summary of the results on both automatic and human metrics from the MSCOCO competition....
[...]
...Dataset sizes (train / valid / test): Pascal VOC 2008 [2]: - / - / 1,000; Flickr8k [42]: 6,000 / 1,000 / 1,000; Flickr30k [43]: 28,000 / 1,000 / 1,000; MSCOCO [44]: 82,783 / 40,504 / 40,775; SBU [18]: 1M / - / -...
[...]