Ruslan Salakhutdinov
Researcher at Carnegie Mellon University
Publications: 457
Citations: 142,495
Ruslan Salakhutdinov is an academic researcher at Carnegie Mellon University. He has contributed to research on topics including computer science and artificial neural networks, has an h-index of 107, and has co-authored 410 publications receiving 115,921 citations. His previous affiliations include Carnegie Learning and the University of Toronto.
Papers
Journal ArticleDOI
Plan, Eliminate, and Track - Language Models are Good Teachers for Embodied Agents
Yue-Fen Wu,So Yeon Min,Yonatan Bisk,Ruslan Salakhutdinov,Amos Azaria,Yuan-Fang Li,Tom M. Mitchell,Shrimai Prabhumoye +7 more
TL;DR: The authors propose the Plan, Eliminate, and Track (PET) framework, which translates a task description into a list of high-level sub-tasks and then determines whether the agent has accomplished each one.
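The summary describes a three-stage loop that a language model drives. The following minimal Python sketch shows how such a loop could be wired up; the call_llm placeholder, the prompts, and the yes/no parsing are hypothetical stand-ins, not the paper's actual interface.

def call_llm(prompt: str) -> str:
    """Placeholder for a language-model call (hypothetical)."""
    raise NotImplementedError

def plan(task_description: str) -> list[str]:
    # Plan: ask the LLM to decompose the task into high-level sub-tasks.
    reply = call_llm(f"List the sub-tasks needed to: {task_description}")
    return [line.strip() for line in reply.splitlines() if line.strip()]

def eliminate(subtasks: list[str], observation: str) -> list[str]:
    # Eliminate: drop sub-tasks that are irrelevant to the current observation.
    return [s for s in subtasks
            if call_llm(f"Is '{s}' relevant given: {observation}? yes/no")
               .lower().startswith("yes")]

def track(subtask: str, observation: str) -> bool:
    # Track: ask the LLM whether the agent has accomplished this sub-task.
    return (call_llm(f"Has the agent finished '{subtask}' given: {observation}? yes/no")
            .lower().startswith("yes"))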
Proceedings Article
Off-Dynamics Reinforcement Learning: Training for Transfer with Domain Classifiers
TL;DR: This paper proposes a probabilistic approach to domain adaptation in reinforcement learning: the agent's experience in the source domain is made to look similar to its experience in the target domain by compensating for the difference in dynamics through a modified reward function.
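The reward correction can be estimated from two binary domain classifiers, one conditioned on (s, a, s') and one on (s, a). A minimal NumPy sketch of that correction follows, assuming the classifier probabilities q(target | s, a, s') and q(target | s, a) are already available; variable names are illustrative and classifier training is omitted.

import numpy as np

def modified_reward(r, p_sas, p_sa, eps=1e-8):
    # p_sas ~ q(target | s, a, s'),  p_sa ~ q(target | s, a)
    # delta = log q(target|s,a,s') - log q(source|s,a,s')
    #       - log q(target|s,a)   + log q(source|s,a)
    delta = (np.log(p_sas + eps) - np.log(1 - p_sas + eps)
             - np.log(p_sa + eps) + np.log(1 - p_sa + eps))
    return r + delta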
Posted Content
Conditional Contrastive Learning: Removing Undesirable Information in Self-Supervised Representations
Yao-Hung Hubert Tsai,Martin Q. Ma,Han Zhao,Kun Zhang,Louis-Philippe Morency,Ruslan Salakhutdinov +5 more
TL;DR: This article proposes conditional contrastive learning (C-InfoNCE) to remove undesirable information, such as gender, from self-supervised representations, since such information may lead to biased decisions on gender-irrelevant tasks.
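One way to realize the conditional contrast described above is to restrict negatives to samples that share the anchor's value of the undesirable variable z, so the contrastive signal cannot exploit z. A minimal NumPy sketch under that assumption; the function name and interface are illustrative, not the paper's exact objective.

import numpy as np

def conditional_infonce(anchor, positive, pool, pool_z, z, tau=0.1):
    # Keep only negatives whose conditioning variable matches the anchor's z
    # (assumes at least one such negative exists).
    negatives = pool[pool_z == z]
    sims = np.concatenate(([anchor @ positive], negatives @ anchor)) / tau
    sims -= sims.max()  # numerical stability
    # InfoNCE: positive similarity in the numerator, all pairs in the denominator.
    return -np.log(np.exp(sims[0]) / np.exp(sims).sum())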
Posted Content
On Proximal Policy Optimization's Heavy-tailed Gradients
Saurabh Garg,Joshua Zhanson,Emilio Parisotto,Adarsh Prasad,J. Zico Kolter,Sivaraman Balakrishnan,Zachary C. Lipton,Ruslan Salakhutdinov,Pradeep Ravikumar +8 more
TL;DR: This paper presents a detailed empirical study characterizing the heavy-tailed nature of the gradients of the PPO surrogate reward function and the effects of the standard PPO clipping heuristics, demonstrating that these tricks primarily serve to offset heavy-tailedness in the gradients.
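For reference, the clipping heuristic the study examines is PPO's standard clipped surrogate objective. A short PyTorch sketch of that objective follows; variable names are illustrative.

import torch

def ppo_clip_loss(logp_new, logp_old, advantages, eps=0.2):
    ratio = torch.exp(logp_new - logp_old)                    # importance weights
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - eps, 1 + eps) * advantages
    # Clipping caps the per-sample surrogate terms, which (per the paper's
    # findings) helps offset heavy-tailedness in the resulting gradients.
    return -torch.min(unclipped, clipped).mean()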
Journal ArticleDOI
Nano: Nested Human-in-the-Loop Reward Learning for Few-shot Language Model Control
TL;DR: This paper proposes a few-shot human-in-the-loop training algorithm that continuously learns from human feedback to control the distribution of generated text, in order to mitigate bias, promote fairness, and achieve personalization.
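A hypothetical sketch of one round of such a nested human-in-the-loop procedure: generate candidates, query a human on the reward model's current favorite, update the reward model, and steer the generator with it. Every object and method name here is a placeholder assumption, not the paper's implementation.

def hitl_round(generator, reward_model, prompts, get_human_score, k=8):
    labeled = []
    for prompt in prompts:
        candidates = [generator.sample(prompt) for _ in range(k)]
        # Rank candidates with the current reward model; query a human on the best.
        best = max(candidates, key=reward_model.score)
        labeled.append((prompt, best, get_human_score(prompt, best)))
    reward_model.update(labeled)        # few-shot update from fresh human feedback
    generator.finetune(reward_model)    # steer generation toward the target distribution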