Ruslan Salakhutdinov
Researcher at Carnegie Mellon University
Publications: 457
Citations: 142,495
Ruslan Salakhutdinov is an academic researcher at Carnegie Mellon University. He has contributed to research on topics including computer science and artificial neural networks, has an h-index of 107, and has co-authored 410 publications receiving 115,921 citations. His previous affiliations include Carnegie Learning and the University of Toronto.
Papers
Journal ArticleDOI
Plan, Eliminate, and Track - Language Models are Good Teachers for Embodied Agents
Yue-Fen Wu,So Yeon Min,Yonatan Bisk,Ruslan Salakhutdinov,Amos Azaria,Yuan-Fang Li,Tom M. Mitchell,Shrimai Prabhumoye +7 more
TL;DR: The authors propose the Plan, Eliminate, and Track (PET) framework, which translates a task description into a list of high-level sub-tasks and then determines whether the agent has accomplished each one.
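The summary describes a three-stage loop that a language model drives. The following minimal Python sketch shows how such a loop could be wired up; the call_llm placeholder, the prompts, and the yes/no parsing are hypothetical stand-ins, not the paper's actual interface.

def call_llm(prompt: str) -> str:
    """Placeholder for a language-model call (hypothetical)."""
    raise NotImplementedError

def plan(task_description: str) -> list[str]:
    # Plan: ask the LLM to decompose the task into high-level sub-tasks.
    reply = call_llm(f"List the sub-tasks needed to: {task_description}")
    return [line.strip() for line in reply.splitlines() if line.strip()]

def eliminate(subtasks: list[str], observation: str) -> list[str]:
    # Eliminate: drop sub-tasks that are irrelevant to the current observation.
    return [s for s in subtasks
            if call_llm(f"Is '{s}' relevant given: {observation}? yes/no")
               .lower().startswith("yes")]

def track(subtask: str, observation: str) -> bool:
    # Track: ask the LLM whether the agent has accomplished this sub-task.
    return (call_llm(f"Has the agent finished '{subtask}' given: {observation}? yes/no")
            .lower().startswith("yes"))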
Proceedings Article
Off-Dynamics Reinforcement Learning: Training for Transfer with Domain Classifiers
TL;DR: This paper proposes a probabilistic approach to domain adaptation in reinforcement learning: the agent's experience in the source domain is made to look similar to its experience in the target domain by compensating for the difference in dynamics through a modified reward function.
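The reward correction can be estimated from two binary domain classifiers, one conditioned on (s, a, s') and one on (s, a). A minimal NumPy sketch of that correction follows, assuming the classifier probabilities q(target | s, a, s') and q(target | s, a) are already available; variable names are illustrative and classifier training is omitted.

import numpy as np

def modified_reward(r, p_sas, p_sa, eps=1e-8):
    # p_sas ~ q(target | s, a, s'),  p_sa ~ q(target | s, a)
    # delta = log q(target|s,a,s') - log q(source|s,a,s')
    #       - log q(target|s,a)   + log q(source|s,a)
    delta = (np.log(p_sas + eps) - np.log(1 - p_sas + eps)
             - np.log(p_sa + eps) + np.log(1 - p_sa + eps))
    return r + delta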
Posted Content
Conditional Contrastive Learning: Removing Undesirable Information in Self-Supervised Representations
Yao-Hung Hubert Tsai,Martin Q. Ma,Han Zhao,Kun Zhang,Louis-Philippe Morency,Ruslan Salakhutdinov +5 more
TL;DR: This article proposes conditional contrastive learning (C-InfoNCE) to remove undesirable information, such as gender, from self-supervised representations, since such information may lead to biased decisions on gender-irrelevant tasks.
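One way to realize the conditional contrast described above is to restrict negatives to samples that share the anchor's value of the undesirable variable z, so the contrastive signal cannot exploit z. A minimal NumPy sketch under that assumption; the function name and interface are illustrative, not the paper's exact objective.

import numpy as np

def conditional_infonce(anchor, positive, pool, pool_z, z, tau=0.1):
    # Keep only negatives whose conditioning variable matches the anchor's z
    # (assumes at least one such negative exists).
    negatives = pool[pool_z == z]
    sims = np.concatenate(([anchor @ positive], negatives @ anchor)) / tau
    sims -= sims.max()  # numerical stability
    # InfoNCE: positive similarity in the numerator, all pairs in the denominator.
    return -np.log(np.exp(sims[0]) / np.exp(sims).sum())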
Posted Content
On Proximal Policy Optimization's Heavy-tailed Gradients
Saurabh Garg,Joshua Zhanson,Emilio Parisotto,Adarsh Prasad,J. Zico Kolter,Sivaraman Balakrishnan,Zachary C. Lipton,Ruslan Salakhutdinov,Pradeep Ravikumar +8 more
TL;DR: This paper presents a detailed empirical study characterizing the heavy-tailed nature of the gradients of the PPO surrogate reward function and the effects of the standard PPO clipping heuristics, demonstrating that these tricks primarily serve to offset heavy-tailedness in the gradients.
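For reference, the clipping heuristic the study examines is PPO's standard clipped surrogate objective. A short PyTorch sketch of that objective follows; variable names are illustrative.

import torch

def ppo_clip_loss(logp_new, logp_old, advantages, eps=0.2):
    ratio = torch.exp(logp_new - logp_old)                    # importance weights
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - eps, 1 + eps) * advantages
    # Clipping caps the per-sample surrogate terms, which (per the paper's
    # findings) helps offset heavy-tailedness in the resulting gradients.
    return -torch.min(unclipped, clipped).mean()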
Journal ArticleDOI
Nano: Nested Human-in-the-Loop Reward Learning for Few-shot Language Model Control
TL;DR: This paper proposes a few-shot human-in-the-loop training algorithm that continuously learns from human feedback to control the distribution of generated text, in order to mitigate bias, promote fairness, and achieve personalization.
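A hypothetical sketch of one round of such a nested human-in-the-loop procedure: generate candidates, query a human on the reward model's current favorite, update the reward model, and steer the generator with it. Every object and method name here is a placeholder assumption, not the paper's implementation.

def hitl_round(generator, reward_model, prompts, get_human_score, k=8):
    labeled = []
    for prompt in prompts:
        candidates = [generator.sample(prompt) for _ in range(k)]
        # Rank candidates with the current reward model; query a human on the best.
        best = max(candidates, key=reward_model.score)
        labeled.append((prompt, best, get_human_score(prompt, best)))
    reward_model.update(labeled)        # few-shot update from fresh human feedback
    generator.finetune(reward_model)    # steer generation toward the target distribution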