Yumao Lu

Researcher at Microsoft

Publications - 6
Citations - 28

Yumao Lu is an academic researcher at Microsoft. The author has contributed to research on the topics Computer science and Closed captioning. The author has an h-index of 1 and has co-authored 6 publications receiving 3 citations.

Papers
Posted Content

SwinBERT: End-to-End Transformers with Sparse Attention for Video Captioning

TL;DR: The authors propose SwinBERT, an end-to-end transformer-based model for video captioning that takes video frame patches directly as input and outputs a natural language description.
Posted Content

An Empirical Study of GPT-3 for Few-Shot Knowledge-Based VQA

TL;DR: PICa uses image captions for few-shot knowledge-based visual question answering (VQA): it converts the image into captions (or tags) that GPT-3 can understand.
Posted Content

Florence: A New Foundation Model for Computer Vision

TL;DR: Florence is a new computer vision foundation model that expands representations from coarse (scene) to fine (object), from static (images) to dynamic (videos), and from RGB to multiple modalities (caption, depth).
Posted Content

Scaling Up Vision-Language Pre-training for Image Captioning

TL;DR: The authors use the state-of-the-art VinVL model, which consists of an image feature extractor and a transformer model, as their reference model, and scale the transformer both up and down, with model sizes ranging from 13 to 675 million parameters.
Posted Content

UFO: A UniFied TransfOrmer for Vision-Language Representation Learning

TL;DR: The authors propose a single UniFied transfOrmer (UFO) for vision-language representation learning that can process either unimodal inputs (e.g., image or language) or multimodal inputs (e.g., the concatenation of the image and the question for visual question answering).