Lucas Smaira
Publications - 13
Citations - 923
Lucas Smaira is an academic researcher. The author has contributed to research in the topics of computer science and modality (human–computer interaction). The author has an h-index of 6 and has co-authored 9 publications receiving 365 citations.
Papers
Proceedings Article
End-to-End Learning of Visual Representations From Uncurated Instructional Videos
TL;DR: This work proposes MIL-NCE, a new learning approach capable of addressing the misalignments inherent in narrated videos; it outperforms all published self-supervised approaches on the evaluated downstream tasks, as well as several fully supervised baselines.
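The MIL-NCE objective combines multiple-instance learning with noise-contrastive estimation: because narration and video are often misaligned, several temporally close narrations are treated as a bag of candidate positives, and their combined similarity to the video clip is contrasted against negatives from other videos. A minimal sketch of that loss, assuming dot-product similarity over precomputed embeddings (function and variable names here are illustrative, not from the paper's code):

```python
import numpy as np

def mil_nce_loss(video_emb, pos_narr_embs, neg_narr_embs):
    """Sketch of a MIL-NCE-style loss.

    video_emb:      (d,)   embedding of one video clip
    pos_narr_embs:  (K, d) bag of K candidate positive narrations
                           (temporally close to the clip)
    neg_narr_embs:  (N, d) negative narrations from other videos
    """
    # Exponentiated dot-product similarities.
    pos_scores = np.exp(pos_narr_embs @ video_emb)
    neg_scores = np.exp(neg_narr_embs @ video_emb)
    # The whole bag of candidate positives is pooled in the numerator,
    # so at least one candidate matching the clip suffices.
    return -np.log(pos_scores.sum() / (pos_scores.sum() + neg_scores.sum()))
```

Pooling the positives inside the log (rather than averaging per-pair losses) is what makes the objective robust to bags in which only some narrations actually describe the clip.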
Proceedings Article
Self-Supervised MultiModal Versatile Networks
Jean-Baptiste Alayrac,Adrià Recasens,Rosalia Schneider,Relja Arandjelovic,Jason Ramapuram,Jeffrey De Fauw,Lucas Smaira,Sander Dieleman,Andrew Zisserman +8 more
TL;DR: In this article, a multimodal versatile network (MVN) is proposed to learn representations using self-supervision by leveraging three modalities naturally present in videos: vision, audio and language.
Posted Content
End-to-End Learning of Visual Representations from Uncurated Instructional Videos
TL;DR: In this paper, a self-supervised learning approach, MIL-NCE, is proposed to address misalignments inherent in narrated videos without the need for any manual annotation.
Posted Content
Self-Supervised MultiModal Versatile Networks
Jean-Baptiste Alayrac,Adrià Recasens,Rosalia Schneider,Relja Arandjelovic,Jason Ramapuram,Jeffrey De Fauw,Lucas Smaira,Sander Dieleman,Andrew Zisserman +8 more
TL;DR: This work learns representations via self-supervision by leveraging three modalities naturally present in videos: vision, audio, and language. It incorporates a novel deflation process so that the networks can be applied effortlessly to visual data in the form of either video or a static image.
Posted Content
Visual Grounding in Video for Unsupervised Word Translation
Gunnar A. Sigurdsson,Jean-Baptiste Alayrac,Aida Nematzadeh,Lucas Smaira,Mateusz Malinowski,Joao Carreira,Phil Blunsom,Andrew Zisserman +7 more
TL;DR: The key idea is to establish a common visual representation between two languages by learning embeddings from unpaired instructional videos narrated in each native language; this forms the basis for the proposed hybrid visual-text mapping algorithm, MUVE.