Institution

OpenAI

About: OpenAI is an artificial intelligence research organization based in San Francisco, California. It is known for research contributions in the topics: Reinforcement learning & Artificial neural network. The organization has 105 authors who have published 213 publications receiving 68,067 citations. The organization is also known as: Open AI & OpenAI LP.

Papers published on a yearly basis

Papers
Proceedings Article
18 Jul 2021
TL;DR: Phasic Policy Gradient (PPG) as discussed by the authors is a reinforcement learning framework which modifies traditional on-policy actor-critic methods by separating policy and value function training into distinct phases.
Abstract: We introduce Phasic Policy Gradient (PPG), a reinforcement learning framework which modifies traditional on-policy actor-critic methods by separating policy and value function training into distinct phases. In prior methods, one must choose between using a shared network or separate networks to represent the policy and value function. Using separate networks avoids interference between objectives, while using a shared network allows useful features to be shared. PPG is able to achieve the best of both worlds by splitting optimization into two phases, one that advances training and one that distills features. PPG also enables the value function to be more aggressively optimized with a higher level of sample reuse. Compared to PPO, we find that PPG significantly improves sample efficiency on the challenging Procgen Benchmark.
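The two-phase structure is easiest to see in code. Below is a minimal PyTorch-style sketch of that alternation, assuming a shared policy body with an auxiliary value head plus a separate critic; the names (PolicyNet, auxiliary_phase, beta_clone), network sizes, and hyperparameters are illustrative assumptions, not code or values from the paper or its released implementation.

```python
# Minimal sketch of a PPG-style phase alternation: a PPO-like policy phase,
# then an auxiliary phase that distills value targets into the shared policy body.
import torch
import torch.nn as nn
import torch.nn.functional as F

obs_dim, n_actions = 8, 4

class PolicyNet(nn.Module):
    """Policy network with an auxiliary value head used only in the auxiliary phase."""
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh())
        self.pi_head = nn.Linear(64, n_actions)
        self.aux_value_head = nn.Linear(64, 1)

    def forward(self, obs):
        h = self.body(obs)
        return F.log_softmax(self.pi_head(h), dim=-1), self.aux_value_head(h).squeeze(-1)

policy = PolicyNet()
# Separate critic, so value training cannot interfere with the policy objective.
value_net = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(), nn.Linear(64, 1))
opt_pi = torch.optim.Adam(policy.parameters(), lr=3e-4)
opt_v = torch.optim.Adam(value_net.parameters(), lr=3e-4)

def policy_phase(obs, actions, advantages, returns, old_logp, clip=0.2):
    """PPO-style clipped policy update plus value regression on the separate critic."""
    logp_all, _ = policy(obs)
    logp = logp_all.gather(1, actions.unsqueeze(1)).squeeze(1)
    ratio = torch.exp(logp - old_logp)
    pi_loss = -torch.min(ratio * advantages,
                         torch.clamp(ratio, 1 - clip, 1 + clip) * advantages).mean()
    opt_pi.zero_grad(); pi_loss.backward(); opt_pi.step()

    v_loss = F.mse_loss(value_net(obs).squeeze(-1), returns)
    opt_v.zero_grad(); v_loss.backward(); opt_v.step()

def auxiliary_phase(obs, returns, frozen_logp_all, beta_clone=1.0):
    """Distill return targets into the shared policy body via the auxiliary value
    head, while a KL term keeps the policy close to its pre-phase behavior."""
    logp_all, aux_v = policy(obs)
    aux_loss = F.mse_loss(aux_v, returns)
    kl = F.kl_div(logp_all, frozen_logp_all, reduction="batchmean", log_target=True)
    opt_pi.zero_grad(); (aux_loss + beta_clone * kl).backward(); opt_pi.step()

# Dummy rollout data, only to show how the two phases alternate.
obs = torch.randn(32, obs_dim)
actions = torch.randint(0, n_actions, (32,))
advantages, returns = torch.randn(32), torch.randn(32)
with torch.no_grad():
    old_logp = policy(obs)[0].gather(1, actions.unsqueeze(1)).squeeze(1)

for _ in range(4):            # policy phase: a few epochs per rollout
    policy_phase(obs, actions, advantages, returns, old_logp)

with torch.no_grad():         # snapshot the policy before the auxiliary phase
    frozen_logp_all = policy(obs)[0]
for _ in range(6):            # auxiliary phase: more aggressive sample reuse
    auxiliary_phase(obs, returns, frozen_logp_all)
```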

2 citations

Posted Content
TL;DR: In this paper, the authors give an algorithm that in $\widetilde{O}(m)$ time computes an $\widetilde{O}(1)$-multiplicative approximation of the girth of directed weighted graphs, with high probability (w.h.p.).
Abstract: The girth of a graph, i.e. the length of its shortest cycle, is a fundamental graph parameter. Unfortunately all known algorithms for computing, even approximately, the girth and girth-related structures in directed weighted $m$-edge and $n$-node graphs require $\Omega(\min\{n^{\omega}, mn\})$ time (for $2\leq\omega<2.373$). In this paper, we drastically improve these runtimes as follows: * Multiplicative Approximations in Nearly Linear Time: We give an algorithm that in $\widetilde{O}(m)$ time computes an $\widetilde{O}(1)$-multiplicative approximation of the girth as well as an $\widetilde{O}(1)$-multiplicative roundtrip spanner with $\widetilde{O}(n)$ edges with high probability (w.h.p). * Nearly Tight Additive Approximations: For unweighted graphs and any $\alpha \in (0,1)$ we give an algorithm that in $\widetilde{O}(mn^{1 - \alpha})$ time computes an $O(n^\alpha)$-additive approximation of the girth w.h.p, and partially derandomize it. We show that the runtime of our algorithm cannot be significantly improved without a breakthrough in combinatorial Boolean matrix multiplication. Our main technical contribution to achieve these results is the first nearly linear time algorithm for computing roundtrip covers, a directed graph decomposition concept key to previous roundtrip spanner constructions. Previously it was not known how to compute these significantly faster than $\Omega(\min\{n^\omega, mn\})$ time. Given the traditional difficulty in efficiently processing directed graphs, we hope our techniques may find further applications.
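For context on the quantity being approximated, the sketch below computes the girth of a small directed weighted graph exactly, as the minimum roundtrip distance $d(u,v) + d(v,u)$ over all pairs of nodes, using one Dijkstra run per node. This is the kind of $\Omega(mn)$-style exact baseline the abstract refers to, not the paper's nearly linear time algorithm; the example graph is made up.

```python
# Exact girth of a directed weighted graph via all-pairs roundtrip distances.
import heapq
from math import inf

def dijkstra(adj, src):
    """Single-source shortest paths on a directed graph with non-negative weights."""
    dist = {v: inf for v in adj}
    dist[src] = 0.0
    pq = [(0.0, src)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist[u]:
            continue
        for v, w in adj[u]:
            if d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(pq, (dist[v], v))
    return dist

def girth(adj):
    """Length of the shortest directed cycle through at least two nodes:
    the minimum of d(u, v) + d(v, u) over all pairs; inf if the graph is acyclic."""
    dist = {u: dijkstra(adj, u) for u in adj}
    best = inf
    for u in adj:
        for v in adj:
            if u != v:
                best = min(best, dist[u][v] + dist[v][u])
    return best

# Tiny hypothetical example: a weighted 3-cycle a -> b -> c -> a plus a heavier back edge.
graph = {
    "a": [("b", 1.0)],
    "b": [("c", 2.0), ("a", 5.0)],
    "c": [("a", 1.0)],
}
print(girth(graph))  # 4.0: the cycle a -> b -> c -> a
```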

1 citation

Proceedings Article
Irene Solaiman, Christy Dennison
06 Dec 2021
TL;DR: The authors propose a Process for Adapting Language Models to Society (PALMS) with Values-Targeted Datasets, an iterative process that significantly changes model behavior by crafting and fine-tuning on a dataset that reflects a predetermined set of target values.
Abstract: Language models can generate harmful and biased outputs and exhibit undesirable behavior. We propose a Process for Adapting Language Models to Society (PALMS) with Values-Targeted Datasets, an iterative process to significantly change model behavior by crafting and fine-tuning on a dataset that reflects a predetermined set of target values. We evaluate our process using three metrics: quantitative metrics with human evaluations that score output adherence to a target value, and toxicity scoring on outputs; and qualitative metrics analyzing the most common word associated with a given social category. Through each iteration, we add additional training dataset examples based on observed shortcomings from evaluations. PALMS performs significantly better on all metrics compared to baseline and control models for a broad range of GPT-3 language model sizes without compromising capability integrity. We find that the effectiveness of PALMS increases with model size. We show that significantly adjusting language model behavior is feasible with a small, hand-curated dataset.
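A structural sketch of the iterative loop described above is shown below. The helper functions (fine_tune, score_adherence, score_toxicity, write_examples_for) are hypothetical placeholders standing in for a fine-tuning step, human evaluation of value adherence, and toxicity scoring; they are not an API from the paper or from any real library, and the thresholds are invented.

```python
# Sketch of the PALMS-style craft -> fine-tune -> evaluate -> extend-dataset cycle.
from dataclasses import dataclass, field
from typing import List

@dataclass
class ValuesTargetedDataset:
    """Small, hand-curated prompt/completion pairs reflecting the target values."""
    examples: List[dict] = field(default_factory=list)

def fine_tune(base_model: str, dataset: ValuesTargetedDataset) -> str:
    # Placeholder: would call a fine-tuning API or training loop in practice.
    return f"{base_model}-values-ft-{len(dataset.examples)}ex"

def score_adherence(model: str, probes: List[str]) -> float:
    # Placeholder for human evaluations scoring adherence to the target values.
    return 0.0

def score_toxicity(model: str, probes: List[str]) -> float:
    # Placeholder for automated toxicity scoring of model outputs.
    return 1.0

def write_examples_for(shortcoming: str) -> List[dict]:
    # Placeholder: craft new training examples targeting an observed failure.
    return [{"prompt": shortcoming, "completion": "values-aligned answer"}]

def palms_iteration(base_model: str, dataset: ValuesTargetedDataset,
                    probes: List[str], rounds: int = 3) -> str:
    model = base_model
    for _ in range(rounds):
        model = fine_tune(base_model, dataset)
        adherence = score_adherence(model, probes)
        toxicity = score_toxicity(model, probes)
        # Add targeted examples based on shortcomings observed in evaluation.
        if adherence < 0.9 or toxicity > 0.1:
            dataset.examples += write_examples_for("observed shortcoming")
        else:
            break
    return model

print(palms_iteration("base-language-model", ValuesTargetedDataset(), ["probe prompt"]))
```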

1 citation

Proceedings Article
Adrien Ecoffet, Joel Lehman
18 Jul 2021
TL;DR: This paper proposes a formalism that translates insights from moral philosophy to the field of reinforcement learning and trains agents in simple environments to act under moral uncertainty, highlighting how such uncertainty can help curb extreme behavior arising from commitment to a single theory.
Abstract: An ambitious goal for artificial intelligence is to create agents that behave ethically: The capacity to abide by human moral norms would greatly expand the context in which autonomous agents could be practically and safely deployed. While ethical agents could be trained through reinforcement, by rewarding correct behavior under a specific moral theory (e.g. utilitarianism), there remains widespread disagreement (both societally and among moral philosophers) about the nature of morality and what ethical theory (if any) is objectively correct. Acknowledging such disagreement, recent work in moral philosophy proposes that ethical behavior requires acting under moral uncertainty, i.e. to take into account when acting that one's credence is split across several plausible ethical theories. Inspired by such work, this paper proposes a formalism that translates such insights to the field of reinforcement learning. Demonstrating the formalism's potential, we then train agents in simple environments to act under moral uncertainty, highlighting how such uncertainty can help curb extreme behavior from commitment to single theories. The overall aim is to draw productive connections from the fields of moral philosophy and machine ethics to that of machine learning, to inspire further research by highlighting a spectrum of machine learning research questions relevant to training ethically capable reinforcement learning agents.
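As a toy illustration of acting under moral uncertainty (not the paper's exact formalism), the snippet below aggregates per-theory "choiceworthiness" scores with the agent's credences and picks the action with the highest expected choiceworthiness, one simple aggregation rule from the moral-uncertainty literature. The theories, scores, and credences are invented for the example.

```python
# Toy example: credence-weighted aggregation over moral theories curbs the
# extreme action that a single-theory (pure utilitarian) agent would pick.
import numpy as np

# Rows: candidate actions; columns: moral theories (utilitarian, deontological).
# Entry [a, t] = how choiceworthy theory t considers action a, on its own scale.
choiceworthiness = np.array([
    [1.0,  0.9],    # action 0: modest benefit, no rule violations
    [3.0, -5.0],    # action 1: high utility but violates a deontic constraint
    [0.0,  0.0],    # action 2: do nothing
])
credence = np.array([0.6, 0.4])  # agent's probability over the two theories

def expected_choiceworthiness(scores: np.ndarray, credence: np.ndarray) -> np.ndarray:
    """Credence-weighted average of each action's score across theories."""
    return scores @ credence

ec = expected_choiceworthiness(choiceworthiness, credence)
print(ec)                  # [ 0.96 -0.2   0.  ]
print(int(np.argmax(ec)))  # 0: uncertainty steers the agent away from extreme action 1
```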

1 citation


Authors

Showing 15 of 105 authors

Name                    H-index   Papers   Citations
Geoffrey E. Hinton      157       414      409,047
Pieter Abbeel           126       589      70,911
Ian Goodfellow          85        137      135,390
Ilya Sutskever          75        131      235,539
Kenneth O. Stanley      60        223      16,921
Phillip Isola           48        101      45,099
John Schulman           48        67       30,168
Jeff Clune              48        140      21,194
Wojciech Zaremba        39        58       34,954
Elizabeth A. Barnes     39        132      5,281
Igor Mordatch           36        89       6,604
Dario Amodei            34        49       13,108
Joel Lehman             33        98       5,588
Gillian K. Hadfield     28        101      2,420
Marcin Andrychowicz     28        49       6,638
Network Information
Related Institutions (5)
Facebook
10.9K papers, 570.1K citations

89% related

Google
39.8K papers, 2.1M citations

88% related

Microsoft
86.9K papers, 4.1M citations

86% related

Adobe Systems
8K papers, 214.7K citations

85% related

Carnegie Mellon University
104.3K papers, 5.9M citations

84% related

Performance Metrics
No. of papers from the Institution in previous years
Year   Papers
2021   29
2020   52
2019   21
2018   51
2017   36
2016   23