Alex Ray

Researcher at OpenAI

Publications -  11
Citations -  8928

Alex Ray is an academic researcher at OpenAI. The author has contributed to research on topics including reinforcement learning and computer science. The author has an h-index of 8 and has co-authored 9 publications receiving 4711 citations.

Papers
Proceedings Article

Domain randomization for transferring deep neural networks from simulation to the real world

TL;DR: This paper explores domain randomization, a simple technique for training models on simulated images that transfer to real images by randomizing rendering in the simulator, and achieves the first successful transfer of a deep neural network trained only on simulated RGB images to the real world for the purpose of robotic control.
Proceedings Article

Training language models to follow instructions with human feedback

TL;DR: The results show that fine-tuning with human feedback is a promising direction for aligning language models with human intent, improving truthfulness and reducing toxic output generation with minimal performance regressions on public NLP datasets.
Proceedings Article

Hindsight Experience Replay

TL;DR: A novel technique is presented that allows sample-efficient learning from sparse, binary rewards and therefore avoids the need for complicated reward engineering; it may be seen as a form of implicit curriculum.
Journal Article

Learning dexterous in-hand manipulation

TL;DR: This work uses reinforcement learning (RL) to learn dexterous in-hand manipulation policies that perform vision-based object reorientation on a physical Shadow Dexterous Hand; the policies transfer to the physical robot despite being trained entirely in simulation.
Posted Content

Domain Randomization for Transferring Deep Neural Networks from Simulation to the Real World

TL;DR: In this article, the authors use domain randomization to train a real-world object detector that is accurate to 1.5 cm and robust to distractors and partial occlusions, using only data from a simulator with non-realistic random textures.