
Ruslan Salakhutdinov

Researcher at Carnegie Mellon University

Publications: 457
Citations: 142,495

Ruslan Salakhutdinov is an academic researcher at Carnegie Mellon University. He has contributed to research in topics including Computer science and Artificial neural network. He has an h-index of 107 and has co-authored 410 publications receiving 115,921 citations. His previous affiliations include Carnegie Learning and the University of Toronto.

Papers
Journal Article

Plan, Eliminate, and Track - Language Models are Good Teachers for Embodied Agents

TL;DR: In this article, the authors propose the Plan, Eliminate, and Track (PET) framework, which translates a task description into a list of high-level sub-tasks and then determines whether the agent has accomplished each sub-task.
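A minimal sketch of the kind of plan-then-track loop the summary describes. `query_llm` is a placeholder for any language-model call; the real PET framework is more involved, and the prompts and helper names here are illustrative assumptions, not the paper's code.

```python
def plan_eliminate_track(task_description, observation, query_llm):
    """Hedged sketch of a Plan/Eliminate/Track-style decomposition loop."""
    # Plan: ask the language model to break the task into high-level sub-tasks.
    subtasks = query_llm(
        f"List the sub-tasks needed to accomplish: {task_description}"
    ).splitlines()
    # Eliminate: keep only sub-tasks the model judges relevant to the observation.
    relevant = [
        s for s in subtasks
        if query_llm(f"Is the sub-task '{s}' relevant given: {observation}? yes/no") == "yes"
    ]
    # Track: maintain a completion flag per sub-task, updated as the agent acts.
    progress = {s: False for s in relevant}
    return relevant, progress
```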
Proceedings Article

Off-Dynamics Reinforcement Learning: Training for Transfer with Domain Classifiers

TL;DR: This paper proposed a probabilistic approach to domain adaptation in reinforcement learning that compensates for the difference in dynamics by modifying the reward function, so that the agent's experience in the source domain looks similar to its experience in the target domain.
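One way to read the reward modification described above is as a shaping term computed from two binary domain classifiers, one over transitions (s, a, s') and one over state-action pairs (s, a). The sketch below assumes the classifiers output target-domain probabilities; the function name and the clamping are my own choices, not the paper's implementation.

```python
import torch

def dynamics_reward_correction(p_target_sas, p_target_sa, eps=1e-6):
    """Hedged sketch of a classifier-based reward correction for off-dynamics RL.

    p_target_sas: classifier probability that the transition (s, a, s') came
    from the target domain; p_target_sa: the same for (s, a) alone.
    """
    p_sas = p_target_sas.clamp(eps, 1.0 - eps)
    p_sa = p_target_sa.clamp(eps, 1.0 - eps)
    # Log-ratio of target vs. source likelihood; it rewards transitions whose
    # dynamics are also plausible in the target domain.
    delta_r = (torch.log(p_sas) - torch.log(1.0 - p_sas)
               - torch.log(p_sa) + torch.log(1.0 - p_sa))
    return delta_r

# Training in the source domain then uses: shaped_reward = source_reward + delta_r
```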
Posted Content

Conditional Contrastive Learning: Removing Undesirable Information in Self-Supervised Representations.

TL;DR: This article proposed conditional contrastive learning (C-InfoNCE) to remove undesirable information, such as gender, from self-supervised representations, since such information may lead to biased decisions on many gender-irrelevant tasks.
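To illustrate the idea of conditioning a contrastive objective on the attribute to be removed, here is a small conditional-sampling variant of InfoNCE: only pairs that share the same value of the conditioning variable are contrasted, so that variable cannot help distinguish positives from negatives. This is an illustrative sketch, not the paper's exact C-InfoNCE objective.

```python
import torch
import torch.nn.functional as F

def conditional_info_nce(z_a, z_b, condition, temperature=0.1):
    """Hedged sketch of an InfoNCE loss restricted to same-condition pairs.

    z_a, z_b: (N, d) embeddings of two views of the same samples;
    condition: (N,) labels of the attribute to remove (e.g. gender).
    """
    z_a = F.normalize(z_a, dim=-1)
    z_b = F.normalize(z_b, dim=-1)
    sim = z_a @ z_b.t() / temperature                      # (N, N) similarity logits
    same_cond = condition.unsqueeze(0) == condition.unsqueeze(1)
    # Exclude pairs with a different condition value from the contrast.
    sim = sim.masked_fill(~same_cond, float("-inf"))
    labels = torch.arange(z_a.size(0), device=z_a.device)  # positives on the diagonal
    return F.cross_entropy(sim, labels)
```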
Posted Content

On Proximal Policy Optimization's Heavy-tailed Gradients

TL;DR: In this paper, a detailed empirical study is presented that characterizes the heavy-tailed nature of the gradients of the PPO surrogate reward function and the effects of the standard PPO clipping heuristics, demonstrating that these tricks primarily serve to offset heavy-tailedness in the gradients.
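For reference, the clipping heuristic studied here is the standard PPO clipped surrogate objective; a minimal sketch follows, with function and argument names of my own choosing.

```python
import torch

def ppo_clipped_surrogate(log_probs_new, log_probs_old, advantages, clip_eps=0.2):
    """Hedged sketch of the standard PPO clipped surrogate objective."""
    # Probability ratio between the updated policy and the behaviour policy.
    ratio = torch.exp(log_probs_new - log_probs_old)
    # Clip the ratio to [1 - eps, 1 + eps] and take the pessimistic surrogate.
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return torch.min(unclipped, clipped).mean()
```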
Journal Article

Nano: Nested Human-in-the-Loop Reward Learning for Few-shot Language Model Control

TL;DR: This paper proposed a few-shot human-in-the-loop training algorithm that continuously learns from human feedback to control the distribution of generated text in order to mitigate bias, promote fairness, and achieve personalization.
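As a rough illustration of one round of human-in-the-loop reward learning, the sketch below fits a reward model to human ratings of sampled generations; the callables `generator` and `ask_human`, and the squared-error loss, are assumptions for illustration rather than the Nano algorithm itself.

```python
import torch

def feedback_round(generator, reward_model, optimizer, prompt, ask_human, n_samples=4):
    """Hedged sketch of one human-in-the-loop reward-learning step.

    generator(prompt) -> (text, features); ask_human(text) -> score in [0, 1].
    """
    losses = []
    for _ in range(n_samples):
        text, features = generator(prompt)        # sample a candidate continuation
        score = ask_human(text)                   # collect a human rating
        pred = reward_model(features)             # predicted reward for the sample
        losses.append((pred - score) ** 2)        # fit the reward model to feedback
    loss = torch.stack(losses).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Repeating such rounds, with generation steered by the updated reward model, is the kind of continual learning from feedback the summary describes.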