Zhengyuan Yang

Researcher at Microsoft

Publications -  3
Citations -  1

Zhengyuan Yang is an academic researcher at Microsoft. The author has contributed to research on the topics of computer science and closed captioning. The author has an h-index of 1 and has co-authored 3 publications, which have received 1 citation.

Papers
Posted Content

Scaling Up Vision-Language Pre-training for Image Captioning

TL;DR: The authors use the state-of-the-art VinVL model, which consists of an image feature extractor and a transformer model, as their reference model, and scale the transformer both up and down, with model sizes ranging from 13 to 675 million parameters.
Posted Content

UFO: A UniFied TransfOrmer for Vision-Language Representation Learning

TL;DR: This paper proposes a single UniFied transfOrmer (UFO) for vision-language representation learning that can process either unimodal inputs (e.g., image or language) or multimodal inputs (e.g., the concatenation of the image and the question in visual question answering).
Posted Content

Crossing the Format Boundary of Text and Boxes: Towards Unified Vision-Language Modeling

TL;DR: UNICORN unifies text generation and bounding box prediction into a single architecture by quantizing each box into four discrete box tokens and serializing them as a sequence, which can be integrated with text tokens.
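
The box quantization described in this summary can be illustrated with a minimal sketch. It is not the authors' implementation: the bin count (`NUM_BINS = 1000`), the `<box_i>` token naming, the corner order (x1, y1, x2, y2), and the helper names `box_to_tokens`, `tokens_to_box`, and `serialize` are all assumptions made for illustration.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels (assumed order)

NUM_BINS = 1000  # assumed bin count; the summary does not state one


def box_to_tokens(box: Box, width: float, height: float,
                  num_bins: int = NUM_BINS) -> List[str]:
    """Quantize one box into four discrete tokens (hypothetical naming)."""
    x1, y1, x2, y2 = box
    # Normalize each coordinate to [0, 1] by the image size.
    normalized = (x1 / width, y1 / height, x2 / width, y2 / height)
    tokens = []
    for value in normalized:
        value = min(max(value, 0.0), 1.0)                  # clamp out-of-image boxes
        index = min(int(value * num_bins), num_bins - 1)   # bin index in [0, num_bins)
        tokens.append(f"<box_{index}>")
    return tokens


def tokens_to_box(tokens: List[str], width: float, height: float,
                  num_bins: int = NUM_BINS) -> Box:
    """Map four box tokens back to pixel coordinates using bin centers."""
    indices = [int(tok[len("<box_"):-1]) for tok in tokens]
    centers = [(i + 0.5) / num_bins for i in indices]
    return (centers[0] * width, centers[1] * height,
            centers[2] * width, centers[3] * height)


def serialize(text_tokens: List[str], box: Box,
              width: float, height: float) -> List[str]:
    """Append the four box tokens to a text token sequence."""
    return text_tokens + box_to_tokens(box, width, height)


if __name__ == "__main__":
    sequence = serialize(["a", "dog"], (64.0, 48.0, 320.0, 240.0), 640.0, 480.0)
    print(sequence)
```

Decoding with bin centers keeps the round-trip error below one bin width per coordinate; a finer or coarser `NUM_BINS` trades vocabulary size against localization precision.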