
Showing papers by "Dumitru Erhan" published in 2022


Proceedings ArticleDOI
05 Oct 2022
TL;DR: A new model for learning video representations compresses the video into a small set of discrete tokens, which yields better spatio-temporal consistency; joint training on a large corpus of image-text pairs together with a smaller number of video-text examples results in generalization beyond what is available in the video datasets.
Abstract: We present Phenaki, a model capable of realistic video synthesis given a sequence of textual prompts. Generating videos from text is particularly challenging due to the computational cost, the limited quantities of high-quality text-video data, and the variable length of videos. To address these issues, we introduce a new model for learning video representations which compresses the video to a small representation of discrete tokens. This tokenizer uses causal attention in time, which allows it to work with variable-length videos. To generate video tokens from text, we use a bidirectional masked transformer conditioned on pre-computed text tokens. The generated video tokens are subsequently de-tokenized to create the actual video. To address the data issues, we demonstrate how joint training on a large corpus of image-text pairs as well as a smaller number of video-text examples can result in generalization beyond what is available in the video datasets. Compared to previous video generation methods, Phenaki can generate arbitrarily long videos conditioned on a sequence of prompts (i.e., time-variable text, or a story) in the open domain. To the best of our knowledge, this is the first time a paper has studied generating videos from time-variable prompts. In addition, compared to the per-frame baselines, the proposed video encoder-decoder computes fewer tokens per video but results in better spatio-temporal consistency.
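
The abstract above describes a two-stage pipeline: a video tokenizer with temporally causal attention that compresses frames into discrete tokens, and a bidirectional masked transformer that predicts those tokens conditioned on pre-computed text tokens (the tokens are then de-tokenized back into frames). The following is only a minimal PyTorch sketch of that structure, not the authors' implementation: the module names and sizes, the concatenation-based text conditioning, and the nearest-codebook quantization shortcut are all assumptions, and the de-tokenizer/decoder and the masking schedule are omitted.

    import torch
    import torch.nn as nn

    class CausalVideoTokenizer(nn.Module):
        """Encode each frame, mix information across time with a *causal* attention
        mask (so variable-length videos work), then quantize to discrete token ids."""
        def __init__(self, frame_dim=64, d_model=128, codebook_size=512, tokens_per_frame=4):
            super().__init__()
            self.tokens_per_frame = tokens_per_frame
            self.frame_proj = nn.Linear(frame_dim, d_model * tokens_per_frame)
            self.temporal = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
            self.codebook = nn.Embedding(codebook_size, d_model)

        def forward(self, frames):                        # frames: (B, T, frame_dim)
            B, T, _ = frames.shape
            z = self.frame_proj(frames).view(B, T * self.tokens_per_frame, -1)
            L = z.shape[1]
            # Causal mask in time: each token may only attend to earlier tokens.
            causal_mask = torch.triu(torch.full((L, L), float("-inf")), diagonal=1)
            z = self.temporal(z, src_mask=causal_mask)
            # Nearest-codebook-entry lookup (straight-through training details omitted).
            dists = (z.unsqueeze(2) - self.codebook.weight).pow(2).sum(-1)  # (B, L, K)
            return dists.argmin(-1)                       # discrete video tokens, (B, L)

    class MaskedVideoTransformer(nn.Module):
        """Bidirectional (non-causal) transformer that predicts video tokens,
        conditioned on pre-computed text tokens prepended to the sequence."""
        def __init__(self, codebook_size=512, text_vocab=32000, d_model=128):
            super().__init__()
            self.video_emb = nn.Embedding(codebook_size + 1, d_model)   # +1 for a [MASK] id
            self.text_emb = nn.Embedding(text_vocab, d_model)
            layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
            self.backbone = nn.TransformerEncoder(layer, num_layers=2)
            self.to_logits = nn.Linear(d_model, codebook_size)

        def forward(self, video_tokens, text_tokens):
            x = torch.cat([self.text_emb(text_tokens), self.video_emb(video_tokens)], dim=1)
            h = self.backbone(x)                          # no mask: fully bidirectional
            return self.to_logits(h[:, text_tokens.shape[1]:])  # logits over the video codebook

    # Toy usage: tokenize a short "video", then score its tokens given a text prompt.
    frames = torch.randn(1, 6, 64)                        # (batch, time, flattened frame features)
    text_tokens = torch.randint(0, 32000, (1, 8))         # pre-computed text token ids
    video_tokens = CausalVideoTokenizer()(frames)
    logits = MaskedVideoTransformer()(video_tokens, text_tokens)
    print(video_tokens.shape, logits.shape)               # (1, 24) and (1, 24, 512)

At generation time, a model of this shape would start from all-masked video tokens and iteratively fill them in from the text conditioning; the forward pass here only illustrates the causal-in-time tokenization and the text-conditioned bidirectional prediction.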

92 citations


Proceedings ArticleDOI
18 Apr 2022
TL;DR: In this paper, the authors propose a modified objective for model-based reinforcement learning that, in combination with mutual information maximization, allows them to learn representations and dynamics for visual model-based RL without reconstruction, in a way that explicitly prioritizes functionally relevant factors.
Abstract: Model-based reinforcement learning (RL) algorithms designed for handling complex visual observations typically learn some sort of latent state representation, either explicitly or implicitly. Standard methods of this sort do not distinguish between functionally relevant aspects of the state and irrelevant distractors, instead aiming to represent all available information equally. We propose a modified objective for model-based RL that, in combination with mutual information maximization, allows us to learn representations and dynamics for visual model-based RL without reconstruction in a way that explicitly prioritizes functionally relevant factors. The key principle behind our design is to integrate a term inspired by variational empowerment into a state-space model based on mutual information. This term prioritizes information that is correlated with action, thus ensuring that functionally relevant factors are captured first. Furthermore, the same empowerment term also promotes faster exploration during the RL process, especially for sparse-reward tasks where the reward signal is insufficient to drive exploration in the early stages of learning. We evaluate the approach on a suite of vision-based robot control tasks with natural video backgrounds, and show that the proposed prioritized information objective outperforms state-of-the-art model-based RL approaches, achieving higher sample efficiency and episodic returns. https://sites.google.com/view/information-empowerment
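
Read schematically, and only as an illustration of the abstract (the symbols and the exact form below are assumptions, not the paper's stated objective), the idea is to train latent states z_t from observations o_t and actions a_t by maximizing something of the form

    I(z_{t+1}; a_t \mid z_t) \;+\; \lambda \, I(z_t; o_t),

where the first, empowerment-inspired mutual-information term favours latent factors whose dynamics are correlated with the agent's actions (so functionally relevant factors are captured first, and the same term doubles as an exploration signal in sparse-reward tasks), the second term ties the latent state to the observation without pixel reconstruction, and \lambda is an assumed trade-off weight.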

12 citations


TL;DR: The key principle behind the design is to integrate a term inspired by variational empowerment into a state-space model based on mutual information that prioritizes information that is correlated with action, thus ensuring that functionally relevant factors are captured during the RL process.
Abstract: Model-based reinforcement learning (RL) algorithms designed for handling complex visual observations typically learn some sort of latent state representation, either explicitly or implicitly. Standard methods of this sort do not distinguish between functionally relevant aspects of the state and irrelevant distractors, instead aiming to represent all available information equally. We propose a modified objective for model-based RL that, in combination with mutual information maximization, allows us to learn representations and dynamics for visual model-based RL without reconstruction in a way that explicitly prioritizes functionally relevant factors. The key principle behind our design is to integrate a term inspired by variational empowerment into a state-space model based on mutual information. This term prioritizes information that is correlated with action, thus ensuring that functionally relevant factors are captured first. Furthermore, the same empowerment term also promotes faster exploration during the RL process, especially for sparse-reward tasks where the reward signal is insufficient to drive exploration in the early stages of learning. We evaluate the approach on a suite of vision-based robot control tasks with natural video backgrounds, and show that the proposed prioritized information objective outperforms state-of-the-art model-based RL approaches with higher sample efficiency and episodic returns.