Meshed-Memory Transformer for Image Captioning
Marcella Cornia, Matteo Stefanini, Lorenzo Baraldi, Rita Cucchiara
- pp 10578-10587
TLDR
The architecture improves both the image encoding and the language generation steps: it learns a multi-level representation of the relationships between image regions that integrates learned a priori knowledge, and it uses mesh-like connectivity at the decoding stage to exploit both low- and high-level features.
Citations
Journal ArticleDOI
A Review of Uncertainty Quantification in Deep Learning: Techniques, Applications and Challenges
Moloud Abdar, Farhad Pourpanah, Sadiq Hussain, Dana Rezazadegan, Li Liu, Mohammad Ghavamzadeh, Paul Fieguth, Xiaochun Cao, Abbas Khosravi, U. Rajendra Acharya, Vladimir Makarenkov, Saeid Nahavandi
TL;DR: This study reviews recent advances in uncertainty quantification (UQ) methods used in deep learning, investigates their application in reinforcement learning (RL), and outlines a few important applications of UQ methods.
Proceedings ArticleDOI
Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts
TL;DR: Conceptual 12M (CC12M) is a dataset of 12 million image-text pairs specifically meant to be used for vision-and-language pre-training.
Posted Content
AdaBins: Depth Estimation using Adaptive Bins
TL;DR: A transformer-based architecture block that divides the depth range into bins whose center value is estimated adaptively per image, and which shows a decisive improvement over the state-of-the-art on several popular depth datasets across all metrics.
Journal Article
Contrastive Learning of Medical Visual Representations from Paired Images and Text
TL;DR: This work proposes an alternative unsupervised strategy to learn medical visual representations directly from the naturally occurring pairing of images and textual data, and shows that this method leads to image representations that considerably outperform strong baselines in most settings.
Proceedings ArticleDOI
RSTNet: Captioning with Adaptive Attention on Visual and Non-Visual Words
Xuying Zhang, Xiaoshuai Sun, Yunpeng Luo, Jiayi Ji, Yiyi Zhou, Wu Yongjian, Feiyue Huang, Rongrong Ji
TL;DR: Zhang et al. propose a Grid-Augmented (GA) module, which incorporates relative geometry features between grids to enhance visual representations, and an Adaptive-Attention (AA) module on top of a transformer decoder, which adaptively measures the contribution of visual and language cues before making decisions for word prediction.
References
Proceedings ArticleDOI
Deep Residual Learning for Image Recognition
TL;DR: The authors propose a residual learning framework to ease the training of networks that are substantially deeper than those used previously; the approach won 1st place in the ILSVRC 2015 classification task.
Proceedings Article
Adam: A Method for Stochastic Optimization
Diederik P. Kingma, Jimmy Ba
TL;DR: This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
Proceedings Article
Attention is All you Need
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin
TL;DR: This paper proposed a simple network architecture based solely on an attention mechanism, dispensing with recurrence and convolutions entirely and achieved state-of-the-art performance on English-to-French translation.
Proceedings ArticleDOI
Glove: Global Vectors for Word Representation
TL;DR: A new global logbilinear regression model that combines the advantages of the two major model families in the literature: global matrix factorization and local context window methods and produces a vector space with meaningful substructure.
Book ChapterDOI
Microsoft COCO: Common Objects in Context
Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, C. Lawrence Zitnick
TL;DR: A new dataset with the goal of advancing the state-of-the-art in object recognition by placing the question of object recognition in the context of the broader question of scene understanding by gathering images of complex everyday scenes containing common objects in their natural context.