Institution
OpenAI
About: OpenAI is an artificial intelligence research organization based in San Francisco, California. It is known for research contributions on the topics: Reinforcement learning & Artificial neural network. The organization has 105 authors who have published 213 publications receiving 68067 citations. The organization is also known as: Open AI & OpenAI LP.
Topics: Reinforcement learning, Artificial neural network, Computer science, Language model, Deep learning
Papers
•
18 Jul 2021
TL;DR: Phasic Policy Gradient (PPG), as discussed by the authors, is a reinforcement learning framework that modifies traditional on-policy actor-critic methods by separating policy and value function training into distinct phases.
Abstract: We introduce Phasic Policy Gradient (PPG), a reinforcement learning framework which modifies traditional on-policy actor-critic methods by separating policy and value function training into distinct phases. In prior methods, one must choose between using a shared network or separate networks to represent the policy and value function. Using separate networks avoids interference between objectives, while using a shared network allows useful features to be shared. PPG is able to achieve the best of both worlds by splitting optimization into two phases, one that advances training and one that distills features. PPG also enables the value function to be more aggressively optimized with a higher level of sample reuse. Compared to PPO, we find that PPG significantly improves sample efficiency on the challenging Procgen Benchmark.
2 citations
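To make the two-phase structure concrete, here is a minimal, self-contained sketch of one PPG iteration under assumed PPO-style losses. The network sizes, epoch counts, synthetic rollout batch, and all names (`policy_body`, `aux_v_head`, `beta_clone`, etc.) are illustrative assumptions, not the authors' released code.

```python
# Sketch of one Phasic Policy Gradient iteration (assumptions noted above).
import torch
import torch.nn as nn
import torch.nn.functional as F

obs_dim, n_actions = 8, 4

# Disjoint networks: a policy trunk with an auxiliary value head, plus a
# separate value network (the "best of both worlds" setup in the abstract).
policy_body = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh())
pi_head = nn.Linear(64, n_actions)   # action logits
aux_v_head = nn.Linear(64, 1)        # auxiliary value head on the policy trunk
value_net = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(), nn.Linear(64, 1))

pi_opt = torch.optim.Adam(
    list(policy_body.parameters()) + list(pi_head.parameters())
    + list(aux_v_head.parameters()), lr=3e-4)
v_opt = torch.optim.Adam(value_net.parameters(), lr=3e-4)

# Synthetic rollout batch standing in for environment data.
obs = torch.randn(256, obs_dim)
actions = torch.randint(n_actions, (256,))
advantages = torch.randn(256)
returns = torch.randn(256)
old_logits = pi_head(policy_body(obs)).detach()

def ppg_iteration(clip=0.2, policy_epochs=1, value_epochs=4,
                  aux_epochs=6, beta_clone=1.0):
    # Policy phase: PPO-style clipped update, few epochs to limit interference.
    for _ in range(policy_epochs):
        logp = F.log_softmax(pi_head(policy_body(obs)), -1)
        lp = logp.gather(1, actions[:, None]).squeeze(1)
        old_lp = F.log_softmax(old_logits, -1).gather(1, actions[:, None]).squeeze(1)
        ratio = (lp - old_lp).exp()
        loss = -torch.min(ratio * advantages,
                          ratio.clamp(1 - clip, 1 + clip) * advantages).mean()
        pi_opt.zero_grad(); loss.backward(); pi_opt.step()
    # The separate value network can be optimized with higher sample reuse.
    for _ in range(value_epochs):
        v_loss = F.mse_loss(value_net(obs).squeeze(1), returns)
        v_opt.zero_grad(); v_loss.backward(); v_opt.step()
    # Auxiliary phase: distill value knowledge into the policy trunk while a
    # behavioral-cloning KL term keeps the policy itself from drifting.
    for _ in range(aux_epochs):
        h = policy_body(obs)
        aux = F.mse_loss(aux_v_head(h).squeeze(1), returns)
        kl = F.kl_div(F.log_softmax(pi_head(h), -1),
                      F.log_softmax(old_logits, -1),
                      reduction="batchmean", log_target=True)
        loss = aux + beta_clone * kl
        pi_opt.zero_grad(); loss.backward(); pi_opt.step()

ppg_iteration()
```

The sketch mirrors the abstract's key points: the policy and value networks are disjoint, the value network is updated for more epochs (higher sample reuse), and the auxiliary phase distills value features into the policy trunk while the cloning term preserves the policy.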
•
TL;DR: In this paper, the authors give an algorithm that in $\widetilde{O}(m)$ time computes an $\widetilde{O}(1)$-multiplicative approximation of the girth in directed weighted graphs with high probability (w.h.p.).
Abstract: The girth of a graph, i.e., the length of its shortest cycle, is a fundamental graph parameter. Unfortunately, all known algorithms for computing, even approximately, the girth and girth-related structures in directed weighted $m$-edge and $n$-node graphs require $\Omega(\min\{n^{\omega}, mn\})$ time (for $2\leq\omega<2.373$). In this paper, we drastically improve these runtimes as follows:
* Multiplicative Approximations in Nearly Linear Time: We give an algorithm that in $\widetilde{O}(m)$ time computes an $\widetilde{O}(1)$-multiplicative approximation of the girth as well as an $\widetilde{O}(1)$-multiplicative roundtrip spanner with $\widetilde{O}(n)$ edges with high probability (w.h.p.).
* Nearly Tight Additive Approximations: For unweighted graphs and any $\alpha \in (0,1)$ we give an algorithm that in $\widetilde{O}(mn^{1 - \alpha})$ time computes an $O(n^\alpha)$-additive approximation of the girth w.h.p., and partially derandomize it. We show that the runtime of our algorithm cannot be significantly improved without a breakthrough in combinatorial Boolean matrix multiplication.
Our main technical contribution to achieve these results is the first nearly linear time algorithm for computing roundtrip covers, a directed graph decomposition concept key to previous roundtrip spanner constructions. Previously it was not known how to compute these significantly faster than $\Omega(\min\{n^\omega, mn\})$ time. Given the traditional difficulty in efficiently processing directed graphs, we hope our techniques may find further applications.
1 citation
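To ground the quantity being approximated, the brute-force sketch below computes the exact girth of a small unweighted directed graph via BFS from every node, which takes $O(nm)$ time. It only illustrates the definition; the paper's nearly-linear-time approximation algorithm is far more involved, and the function name `girth` is ours.

```python
# Exact girth of a small unweighted digraph by BFS from every node (O(nm)).
from collections import deque

def girth(n, edges):
    """Length of the shortest directed cycle, or None if the graph is acyclic."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
    best = None
    for s in range(n):
        # BFS from s; an edge (u, s) closes a cycle through s of length dist[u] + 1.
        dist = [None] * n
        dist[s] = 0
        q = deque([s])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if v == s:
                    cand = dist[u] + 1
                    best = cand if best is None else min(best, cand)
                elif dist[v] is None:
                    dist[v] = dist[u] + 1
                    q.append(v)
    return best

# A 3-cycle plus a back edge: the shortest cycle is 0 -> 1 -> 0.
print(girth(3, [(0, 1), (1, 2), (2, 0), (1, 0)]))  # 2
```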
•
06 Dec 2021
TL;DR: The authors propose a Process for Adapting Language Models to Society (PALMS) with Values-Targeted Datasets, an iterative process to significantly change model behavior by crafting and fine-tuning on a dataset that reflects a predetermined set of target values.
Abstract: Language models can generate harmful and biased outputs and exhibit undesirable behavior. We propose a Process for Adapting Language Models to Society (PALMS) with Values-Targeted Datasets, an iterative process to significantly change model behavior by crafting and fine-tuning on a dataset that reflects a predetermined set of target values. We evaluate our process using three metrics: quantitative human evaluations that score output adherence to a target value, toxicity scoring on outputs, and qualitative metrics analyzing the most common word associated with a given social category. Through each iteration, we add additional training dataset examples based on observed shortcomings from evaluations. PALMS performs significantly better on all metrics compared to baseline and control models for a broad range of GPT-3 language model sizes without compromising capability integrity. We find that the effectiveness of PALMS increases with model size. We show that significantly adjusting language model behavior is feasible with a small, hand-curated dataset.
1 citation
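A schematic, runnable sketch of the iterative outer loop the abstract describes follows. Every function here (`fine_tune`, `craft_examples`, the scoring logic) is a hypothetical stub standing in for human curation, fine-tuning infrastructure, and evaluation; none of this is the authors' actual pipeline.

```python
# Schematic PALMS outer loop: fine-tune on a values-targeted dataset, evaluate,
# then add examples targeting observed shortcomings. All functions are stubs.

def fine_tune(base_model, dataset):
    # Stub: pretend larger values-targeted datasets yield better adherence
    # and lower toxicity; a real run fine-tunes a language model here.
    return {"adherence": min(1.0, 0.5 + 0.1 * len(dataset)),
            "toxicity": max(0.0, 0.3 - 0.05 * len(dataset))}

def craft_examples(shortcomings):
    # Stub for hand-writing new prompt/completion pairs per weak category.
    return [f"example targeting {s}" for s in shortcomings]

def palms(base_model, dataset, adherence_goal=0.9, toxicity_budget=0.1,
          max_iterations=5):
    for it in range(max_iterations):
        model = fine_tune(base_model, dataset)   # always from the base model
        if (model["adherence"] >= adherence_goal
                and model["toxicity"] <= toxicity_budget):
            return model, it
        # Evaluations reveal weak categories; add examples and iterate.
        dataset += craft_examples(["weak-category"])
    return model, max_iterations

model, iters = palms(base_model={}, dataset=["seed example"])
print(f"stopped after {iters} iteration(s): {model}")
```

The design point the abstract emphasizes survives even in this caricature: behavior change comes from a small, hand-curated dataset that grows only where evaluations show shortcomings.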
•
18 Jul 2021
TL;DR: This paper proposes a formalism that translates insights from moral philosophy to the field of reinforcement learning, and trains agents in simple environments to act under moral uncertainty, highlighting how such uncertainty can help curb extreme behavior arising from commitment to single theories.
Abstract: An ambitious goal for artificial intelligence is to create agents that behave ethically: The capacity to abide by human moral norms would greatly expand the context in which autonomous agents could be practically and safely deployed. While ethical agents could be trained through reinforcement, by rewarding correct behavior under a specific moral theory (e.g. utilitarianism), there remains widespread disagreement (both societally and among moral philosophers) about the nature of morality and what ethical theory (if any) is objectively correct. Acknowledging such disagreement, recent work in moral philosophy proposes that ethical behavior requires acting under moral uncertainty, i.e. to take into account when acting that one's credence is split across several plausible ethical theories. Inspired by such work, this paper proposes a formalism that translates such insights to the field of reinforcement learning. Demonstrating the formalism's potential, we then train agents in simple environments to act under moral uncertainty, highlighting how such uncertainty can help curb extreme behavior from commitment to single theories. The overall aim is to draw productive connections from the fields of moral philosophy and machine ethics to that of machine learning, to inspire further research by highlighting a spectrum of machine learning research questions relevant to training ethically capable reinforcement learning agents.
1 citation
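One simple way to operationalize acting under moral uncertainty, shown purely for illustration (the paper develops a fuller formalism), is to weight each action's value under several moral theories by the agent's credence in each theory and act on the aggregate, in the spirit of maximizing expected choiceworthiness. The theories and numbers below are toy inputs.

```python
# Toy credence-weighted aggregation across moral theories (illustrative only).

def choiceworthiness(action_values, credences):
    """Credence-weighted value of an action across moral theories."""
    return sum(credences[t] * v for t, v in action_values.items())

# Toy dilemma: action A is strongly favored by one theory and strongly
# disfavored by the other; action B is mildly acceptable to both.
credences = {"utilitarian": 0.5, "deontological": 0.5}
values = {
    "A": {"utilitarian": 1.0, "deontological": -1.0},
    "B": {"utilitarian": 0.4, "deontological": 0.4},
}
best = max(values, key=lambda a: choiceworthiness(values[a], credences))
print(best)  # "B"
```

Note how split credence steers the agent toward the mutually acceptable action B rather than the action a single theory would endorse, the extremeness-curbing effect the abstract highlights.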
Authors
Showing 15 of 105 authors
| Name | H-index | Papers | Citations |
|---|---|---|---|
| Geoffrey E. Hinton | 157 | 414 | 409047 |
| Pieter Abbeel | 126 | 589 | 70911 |
| Ian Goodfellow | 85 | 137 | 135390 |
| Ilya Sutskever | 75 | 131 | 235539 |
| Kenneth O. Stanley | 60 | 223 | 16921 |
| Phillip Isola | 48 | 101 | 45099 |
| John Schulman | 48 | 67 | 30168 |
| Jeff Clune | 48 | 140 | 21194 |
| Wojciech Zaremba | 39 | 58 | 34954 |
| Elizabeth A. Barnes | 39 | 132 | 5281 |
| Igor Mordatch | 36 | 89 | 6604 |
| Dario Amodei | 34 | 49 | 13108 |
| Joel Lehman | 33 | 98 | 5588 |
| Gillian K. Hadfield | 28 | 101 | 2420 |
| Marcin Andrychowicz | 28 | 49 | 6638 |