Institution
OpenAI
About: OpenAI is an artificial intelligence research organization based in San Francisco, California. It is known for research contributions on the topics: Reinforcement learning & Artificial neural network. The organization has 105 authors who have published 213 publications receiving 68067 citations. The organization is also known as: Open AI & OpenAI LP.
Topics: Reinforcement learning, Artificial neural network, Computer science, Language model, Deep learning
Papers
•
18 Jul 2021
TL;DR: Phasic Policy Gradient (PPG), as discussed by the authors, is a reinforcement learning framework that modifies traditional on-policy actor-critic methods by separating policy and value function training into distinct phases.
Abstract: We introduce Phasic Policy Gradient (PPG), a reinforcement learning framework which modifies traditional on-policy actor-critic methods by separating policy and value function training into distinct phases. In prior methods, one must choose between using a shared network or separate networks to represent the policy and value function. Using separate networks avoids interference between objectives, while using a shared network allows useful features to be shared. PPG is able to achieve the best of both worlds by splitting optimization into two phases, one that advances training and one that distills features. PPG also enables the value function to be more aggressively optimized with a higher level of sample reuse. Compared to PPO, we find that PPG significantly improves sample efficiency on the challenging Procgen Benchmark.
2 citations
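To make the two-phase structure concrete, here is a minimal, self-contained sketch of one PPG iteration under assumed PPO-style losses. The network sizes, epoch counts, synthetic rollout batch, and all names (`policy_body`, `aux_v_head`, `beta_clone`, etc.) are illustrative assumptions, not the authors' released code.

```python
# Sketch of one Phasic Policy Gradient iteration (assumptions noted above).
import torch
import torch.nn as nn
import torch.nn.functional as F

obs_dim, n_actions = 8, 4

# Disjoint networks: a policy trunk with an auxiliary value head, plus a
# separate value network (the "best of both worlds" setup in the abstract).
policy_body = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh())
pi_head = nn.Linear(64, n_actions)   # action logits
aux_v_head = nn.Linear(64, 1)        # auxiliary value head on the policy trunk
value_net = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(), nn.Linear(64, 1))

pi_opt = torch.optim.Adam(
    list(policy_body.parameters()) + list(pi_head.parameters())
    + list(aux_v_head.parameters()), lr=3e-4)
v_opt = torch.optim.Adam(value_net.parameters(), lr=3e-4)

# Synthetic rollout batch standing in for environment data.
obs = torch.randn(256, obs_dim)
actions = torch.randint(n_actions, (256,))
advantages = torch.randn(256)
returns = torch.randn(256)
old_logits = pi_head(policy_body(obs)).detach()

def ppg_iteration(clip=0.2, policy_epochs=1, value_epochs=4,
                  aux_epochs=6, beta_clone=1.0):
    # Policy phase: PPO-style clipped update, few epochs to limit interference.
    for _ in range(policy_epochs):
        logp = F.log_softmax(pi_head(policy_body(obs)), -1)
        lp = logp.gather(1, actions[:, None]).squeeze(1)
        old_lp = F.log_softmax(old_logits, -1).gather(1, actions[:, None]).squeeze(1)
        ratio = (lp - old_lp).exp()
        loss = -torch.min(ratio * advantages,
                          ratio.clamp(1 - clip, 1 + clip) * advantages).mean()
        pi_opt.zero_grad(); loss.backward(); pi_opt.step()
    # The separate value network can be optimized with higher sample reuse.
    for _ in range(value_epochs):
        v_loss = F.mse_loss(value_net(obs).squeeze(1), returns)
        v_opt.zero_grad(); v_loss.backward(); v_opt.step()
    # Auxiliary phase: distill value knowledge into the policy trunk while a
    # behavioral-cloning KL term keeps the policy itself from drifting.
    for _ in range(aux_epochs):
        h = policy_body(obs)
        aux = F.mse_loss(aux_v_head(h).squeeze(1), returns)
        kl = F.kl_div(F.log_softmax(pi_head(h), -1),
                      F.log_softmax(old_logits, -1),
                      reduction="batchmean", log_target=True)
        loss = aux + beta_clone * kl
        pi_opt.zero_grad(); loss.backward(); pi_opt.step()

ppg_iteration()
```

The sketch mirrors the abstract's key points: the policy and value networks are disjoint, the value network is updated for more epochs (higher sample reuse), and the auxiliary phase distills value features into the policy trunk while the cloning term preserves the policy.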
•
TL;DR: In this paper, the authors give an algorithm that in $\widetilde{O}(m)$ time computes an $\widetilde{O}(1)$-multiplicative approximation of the girth in directed weighted graphs with high probability (w.h.p.).
Abstract: The girth of a graph, i.e., the length of its shortest cycle, is a fundamental graph parameter. Unfortunately, all known algorithms for computing, even approximately, the girth and girth-related structures in directed weighted $m$-edge and $n$-node graphs require $\Omega(\min\{n^{\omega}, mn\})$ time (for $2\leq\omega<2.373$). In this paper, we drastically improve these runtimes as follows:
* Multiplicative Approximations in Nearly Linear Time: We give an algorithm that in $\widetilde{O}(m)$ time computes an $\widetilde{O}(1)$-multiplicative approximation of the girth as well as an $\widetilde{O}(1)$-multiplicative roundtrip spanner with $\widetilde{O}(n)$ edges with high probability (w.h.p.).
* Nearly Tight Additive Approximations: For unweighted graphs and any $\alpha \in (0,1)$ we give an algorithm that in $\widetilde{O}(mn^{1 - \alpha})$ time computes an $O(n^\alpha)$-additive approximation of the girth w.h.p., and partially derandomize it. We show that the runtime of our algorithm cannot be significantly improved without a breakthrough in combinatorial Boolean matrix multiplication.
Our main technical contribution to achieve these results is the first nearly linear time algorithm for computing roundtrip covers, a directed graph decomposition concept key to previous roundtrip spanner constructions. Previously it was not known how to compute these significantly faster than $\Omega(\min\{n^\omega, mn\})$ time. Given the traditional difficulty in efficiently processing directed graphs, we hope our techniques may find further applications.
1 citation
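To ground the quantity being approximated, the brute-force sketch below computes the exact girth of a small unweighted directed graph via BFS from every node, which takes $O(nm)$ time. It only illustrates the definition; the paper's nearly-linear-time approximation algorithm is far more involved, and the function name `girth` is ours.

```python
# Exact girth of a small unweighted digraph by BFS from every node (O(nm)).
from collections import deque

def girth(n, edges):
    """Length of the shortest directed cycle, or None if the graph is acyclic."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
    best = None
    for s in range(n):
        # BFS from s; an edge (u, s) closes a cycle through s of length dist[u] + 1.
        dist = [None] * n
        dist[s] = 0
        q = deque([s])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if v == s:
                    cand = dist[u] + 1
                    best = cand if best is None else min(best, cand)
                elif dist[v] is None:
                    dist[v] = dist[u] + 1
                    q.append(v)
    return best

# A 3-cycle plus a back edge: the shortest cycle is 0 -> 1 -> 0.
print(girth(3, [(0, 1), (1, 2), (2, 0), (1, 0)]))  # 2
```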
•
06 Dec 2021
TL;DR: The authors propose a Process for Adapting Language Models to Society (PALMS) with Values-Targeted Datasets, an iterative process to significantly change model behavior by crafting and fine-tuning on a dataset that reflects a predetermined set of target values.
Abstract: Language models can generate harmful and biased outputs and exhibit undesirable behavior. We propose a Process for Adapting Language Models to Society (PALMS) with Values-Targeted Datasets, an iterative process to significantly change model behavior by crafting and fine-tuning on a dataset that reflects a predetermined set of target values. We evaluate our process using three metrics: quantitative human evaluations that score output adherence to a target value, toxicity scoring on outputs, and qualitative metrics analyzing the most common word associated with a given social category. Through each iteration, we add additional training dataset examples based on observed shortcomings from evaluations. PALMS performs significantly better on all metrics compared to baseline and control models for a broad range of GPT-3 language model sizes without compromising capability integrity. We find that the effectiveness of PALMS increases with model size. We show that significantly adjusting language model behavior is feasible with a small, hand-curated dataset.
1 citation
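A schematic, runnable sketch of the iterative outer loop the abstract describes follows. Every function here (`fine_tune`, `craft_examples`, the scoring logic) is a hypothetical stub standing in for human curation, fine-tuning infrastructure, and evaluation; none of this is the authors' actual pipeline.

```python
# Schematic PALMS outer loop: fine-tune on a values-targeted dataset, evaluate,
# then add examples targeting observed shortcomings. All functions are stubs.

def fine_tune(base_model, dataset):
    # Stub: pretend larger values-targeted datasets yield better adherence
    # and lower toxicity; a real run fine-tunes a language model here.
    return {"adherence": min(1.0, 0.5 + 0.1 * len(dataset)),
            "toxicity": max(0.0, 0.3 - 0.05 * len(dataset))}

def craft_examples(shortcomings):
    # Stub for hand-writing new prompt/completion pairs per weak category.
    return [f"example targeting {s}" for s in shortcomings]

def palms(base_model, dataset, adherence_goal=0.9, toxicity_budget=0.1,
          max_iterations=5):
    for it in range(max_iterations):
        model = fine_tune(base_model, dataset)   # always from the base model
        if (model["adherence"] >= adherence_goal
                and model["toxicity"] <= toxicity_budget):
            return model, it
        # Evaluations reveal weak categories; add examples and iterate.
        dataset += craft_examples(["weak-category"])
    return model, max_iterations

model, iters = palms(base_model={}, dataset=["seed example"])
print(f"stopped after {iters} iteration(s): {model}")
```

The design point the abstract emphasizes survives even in this caricature: behavior change comes from a small, hand-curated dataset that grows only where evaluations show shortcomings.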
•
18 Jul 2021
TL;DR: This paper proposes a formalism that translates insights from moral philosophy to the field of reinforcement learning, and trains agents in simple environments to act under moral uncertainty, highlighting how such uncertainty can help curb extreme behavior arising from commitment to single theories.
Abstract: An ambitious goal for artificial intelligence is to create agents that behave ethically: The capacity to abide by human moral norms would greatly expand the context in which autonomous agents could be practically and safely deployed. While ethical agents could be trained through reinforcement, by rewarding correct behavior under a specific moral theory (e.g. utilitarianism), there remains widespread disagreement (both societally and among moral philosophers) about the nature of morality and what ethical theory (if any) is objectively correct. Acknowledging such disagreement, recent work in moral philosophy proposes that ethical behavior requires acting under moral uncertainty, i.e. to take into account when acting that one's credence is split across several plausible ethical theories. Inspired by such work, this paper proposes a formalism that translates such insights to the field of reinforcement learning. Demonstrating the formalism's potential, we then train agents in simple environments to act under moral uncertainty, highlighting how such uncertainty can help curb extreme behavior from commitment to single theories. The overall aim is to draw productive connections from the fields of moral philosophy and machine ethics to that of machine learning, to inspire further research by highlighting a spectrum of machine learning research questions relevant to training ethically capable reinforcement learning agents.
1 citation
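One simple way to operationalize acting under moral uncertainty, shown purely for illustration (the paper develops a fuller formalism), is to weight each action's value under several moral theories by the agent's credence in each theory and act on the aggregate, in the spirit of maximizing expected choiceworthiness. The theories and numbers below are toy inputs.

```python
# Toy credence-weighted aggregation across moral theories (illustrative only).

def choiceworthiness(action_values, credences):
    """Credence-weighted value of an action across moral theories."""
    return sum(credences[t] * v for t, v in action_values.items())

# Toy dilemma: action A is strongly favored by one theory and strongly
# disfavored by the other; action B is mildly acceptable to both.
credences = {"utilitarian": 0.5, "deontological": 0.5}
values = {
    "A": {"utilitarian": 1.0, "deontological": -1.0},
    "B": {"utilitarian": 0.4, "deontological": 0.4},
}
best = max(values, key=lambda a: choiceworthiness(values[a], credences))
print(best)  # "B"
```

Note how split credence steers the agent toward the mutually acceptable action B rather than the action a single theory would endorse, the extremeness-curbing effect the abstract highlights.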
Authors
Showing 15 of 105 authors
| Name | H-index | Papers | Citations |
|---|---|---|---|
| Geoffrey E. Hinton | 157 | 414 | 409047 |
| Pieter Abbeel | 126 | 589 | 70911 |
| Ian Goodfellow | 85 | 137 | 135390 |
| Ilya Sutskever | 75 | 131 | 235539 |
| Kenneth O. Stanley | 60 | 223 | 16921 |
| Phillip Isola | 48 | 101 | 45099 |
| John Schulman | 48 | 67 | 30168 |
| Jeff Clune | 48 | 140 | 21194 |
| Wojciech Zaremba | 39 | 58 | 34954 |
| Elizabeth A. Barnes | 39 | 132 | 5281 |
| Igor Mordatch | 36 | 89 | 6604 |
| Dario Amodei | 34 | 49 | 13108 |
| Joel Lehman | 33 | 98 | 5588 |
| Gillian K. Hadfield | 28 | 101 | 2420 |
| Marcin Andrychowicz | 28 | 49 | 6638 |