Peter Vajda

Researcher at Facebook

Publications: 10
Citations: 82

Peter Vajda is an academic researcher at Facebook. The author has contributed to research on topics including language models and closed captioning. The author has an h-index of 2 and has co-authored 10 publications receiving 26 citations. Previous affiliations of Peter Vajda include Rice University.

Papers
Book Chapter

Learning to Generate Grounded Visual Captions Without Localization Supervision

TL;DR: This paper proposes a cyclical training regimen that forces the model to localize each word in the image after the sentence decoder generates it, and then to reconstruct the sentence from the localized image region(s) so that it matches the ground truth.
Proceedings Article

Data-Efficient Language-Supervised Zero-Shot Learning with Self-Distillation

TL;DR: The authors propose a data-efficient contrastive distillation method that uses soft labels to learn from noisy image-text pairs. It achieves strong performance with only 3M image-text pairs, a dataset 133x smaller than CLIP's.
Book Chapter

Deep Space-Time Video Upsampling Networks

TL;DR: This paper proposes an end-to-end DNN framework for space-time video upsampling that merges video super-resolution (VSR) and frame interpolation (FI) into a single joint framework, along with a novel weighting scheme that fuses input frames effectively without explicit motion compensation, for efficient video processing.
Posted Content

Geometric Correspondence Fields: Learned Differentiable Rendering for 3D Pose Refinement in the Wild

TL;DR: This paper introduces a novel differentiable renderer that learns to approximate the rasterization backward pass from data instead of relying on a hand-crafted algorithm, enabling gradient-based optimization directly on the 3D pose.
Book Chapter

Geometric Correspondence Fields: Learned Differentiable Rendering for 3D Pose Refinement in the Wild

TL;DR: This paper proposes a differentiable renderer that learns to approximate the rasterization backward pass from data instead of relying on a hand-crafted algorithm, which results in significantly improved 3D pose estimates.