Zhengyuan Yang
Researcher at Microsoft
Publications - 3
Citations - 1
Zhengyuan Yang is an academic researcher at Microsoft. The author has contributed to research on the topics of computer science and closed captioning. The author has an h-index of 1 and has co-authored 3 publications receiving 1 citation.
Papers
Posted Content
Scaling Up Vision-Language Pre-training for Image Captioning
TL;DR: This paper uses the state-of-the-art VinVL model as its reference model, which consists of an image feature extractor and a transformer model, and scales the transformer both up and down, with model sizes ranging from 13 to 675 million parameters.
Posted Content
UFO: A UniFied TransfOrmer for Vision-Language Representation Learning
TL;DR: This paper proposes a single UniFied transfOrmer (UFO) for vision-language representation learning that can process either unimodal inputs (e.g., image or language) or multimodal inputs (e.g., the concatenation of the image and the question in visual question answering).
Posted Content
Crossing the Format Boundary of Text and Boxes: Towards Unified Vision-Language Modeling
Zhengyuan Yang, Zhe Gan, Jianfeng Wang, Xiaowei Hu, Faisal Ahmed, Zicheng Liu, Yumao Lu, Lijuan Wang +7 more
TL;DR: UNICORN unifies text generation and bounding box prediction into a single architecture by quantizing each box into four discrete box tokens and serializing them as a sequence that can be integrated with text tokens.
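The box-quantization idea in this summary can be illustrated with a minimal Python sketch. This is not the authors' implementation: the bin count (`num_bins`), the `<bin_i>` token naming, the corner-coordinate box format, and the helper names `box_to_tokens` and `serialize` are all assumptions made for illustration.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # assumed format: (x1, y1, x2, y2) in pixels


def box_to_tokens(box: Box, width: float, height: float, num_bins: int = 1000) -> List[str]:
    """Quantize one box into four discrete tokens (hypothetical token naming)."""
    x1, y1, x2, y2 = box
    # Normalize each coordinate to [0, 1] by the image size.
    normalized = (x1 / width, y1 / height, x2 / width, y2 / height)
    tokens = []
    for value in normalized:
        clamped = min(max(value, 0.0), 1.0)
        # Map to a bin index; a coordinate of exactly 1.0 falls in the last bin.
        index = min(int(clamped * num_bins), num_bins - 1)
        tokens.append(f"<bin_{index}>")
    return tokens


def serialize(text_tokens: List[str], box_tokens: List[str]) -> List[str]:
    """Concatenate text tokens and box tokens into a single sequence."""
    return text_tokens + box_tokens


if __name__ == "__main__":
    box_tokens = box_to_tokens((0, 0, 100, 50), width=200, height=100, num_bins=10)
    print(serialize(["a", "dog"], box_tokens))
```

With a shared vocabulary that contains both word tokens and bin tokens, a single sequence model can in principle emit either kind of token at each step, which is the sense in which the two output formats are unified.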