
Showing papers by "Alexander Peysakhovich published in 2017"


Posted Content
TL;DR: This work shows how to modify modern reinforcement learning methods to construct agents that act in ways that are simple to understand, nice, provokable, and forgiving, and shows both theoretically and experimentally that such agents can maintain cooperation in Markov social dilemmas.
Abstract: Social dilemmas are situations where individuals face a temptation to increase their payoffs at a cost to total welfare. Building artificially intelligent agents that achieve good outcomes in these situations is important because many real world interactions include a tension between selfish interests and the welfare of others. We show how to modify modern reinforcement learning methods to construct agents that act in ways that are simple to understand, nice (begin by cooperating), provokable (try to avoid being exploited), and forgiving (try to return to mutual cooperation). We show both theoretically and experimentally that such agents can maintain cooperation in Markov social dilemmas. Our construction does not require training methods beyond a modification of self-play; thus if an environment is such that good strategies can be constructed in the zero-sum case (e.g., Atari), then we can construct agents that solve social dilemmas in this environment.
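The "nice, provokable, forgiving" recipe can be pictured as a controller that switches between two pre-trained policies. The class below is only an illustrative toy under that reading, not the paper's actual construction; all names, thresholds, and the switching rule are assumptions:

```python
class TFTAgent:
    """Toy sketch of a nice/provokable/forgiving policy switcher.

    Assumes two pre-trained policies exist: "cooperate" (from modified
    self-play) and "defect" (a safe zero-sum policy). Thresholds are
    illustrative, not from the paper.
    """

    def __init__(self, coop_payoff_per_step, slack=0.5, punish_steps=3):
        self.expected = coop_payoff_per_step  # payoff under mutual cooperation
        self.slack = slack                    # tolerance before retaliating
        self.punish_steps = punish_steps      # punishment length (forgiving)
        self.punishing = 0

    def choose_policy(self, last_reward):
        # Forgiving: punishment is finite, then cooperation resumes.
        if self.punishing > 0:
            self.punishing -= 1
            return "defect"
        # Provokable: a reward well below the cooperative level suggests
        # the partner defected, so switch to the safe policy.
        if last_reward is not None and last_reward < self.expected - self.slack:
            self.punishing = self.punish_steps
            return "defect"
        # Nice: default (and initial) choice is cooperation.
        return "cooperate"
```

In a repeated matrix game this reproduces tit-for-tat-like behavior from rewards alone; the paper's agents operate over full Markov games.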

142 citations


Journal ArticleDOI
TL;DR: The authors compare standard economic models to ML models in the domain of uncertainty and risk, and show that under risk, expected utility with probability weighting performs as well out-of-sample as the ML methods, whereas under ambiguity the ML methods outperform the standard economic models.
Abstract: How can behavioral science incorporate tools from machine learning (ML)? We propose that ML models can be used as upper bounds for the “explainable” variance in a given data set and thus serve as upper bounds for the potential power of a theory. We demonstrate this method in the domain of uncertainty. We ask over 600 individuals to make a total of 6000 choices with randomized parameters and compare standard economic models to ML models. In the domain of risk, a version of expected utility that allows for non-linear probability weighting (as in cumulative prospect theory) and individual-level parameters performs as well out-of-sample as ML techniques. By contrast, in the domain of ambiguity, two of the most widely studied models (a linear version of maximin preferences and second-order expected utility) fail to compete with the ML methods. We open the “black boxes” of the ML methods and show that under risk we “rediscover” expected utility with probability weighting. However, in the case of ambiguity the form of ambiguity aversion implied by our ML models suggests that there are gains from theoretical work on a portable model of ambiguity aversion. Our results highlight ways in which behavioral scientists can incorporate ML techniques in their daily practice to gain genuinely new insights.
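The "ML as an upper bound" idea reduces to comparing a theory's out-of-sample error against that of a flexible benchmark. A minimal sketch, with k-nearest-neighbor regression standing in for the paper's ML methods and a linear model standing in for the theory (the data are synthetic, not the experiment's):

```python
import numpy as np

# Synthetic data: a nonlinear "true" choice function plus noise.
rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 400)
y = np.sin(3 * x) + rng.normal(0, 0.1, 400)
x_tr, y_tr, x_te, y_te = x[:300], y[:300], x[300:], y[300:]

# "Theory": a linear model fit by least squares.
slope, intercept = np.polyfit(x_tr, y_tr, 1)
pred_lin = slope * x_te + intercept

# Flexible benchmark: k-nearest-neighbor regression.
k = 10
d = np.abs(x_te[:, None] - x_tr[None, :])   # pairwise test-train distances
idx = np.argsort(d, axis=1)[:, :k]          # k nearest training points
pred_knn = y_tr[idx].mean(axis=1)

mse_lin = np.mean((y_te - pred_lin) ** 2)
mse_knn = np.mean((y_te - pred_knn) ** 2)
# The gap between mse_lin and mse_knn bounds how much predictive power
# a better theory could still capture in this data set.
```

When the gap is near zero (as the paper finds for risk), the theory is already close to the explainable ceiling; a large gap (as for ambiguity) signals room for new theory.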

54 citations


Posted Content
TL;DR: In this paper, the authors extend existing work on reward-shaping in multi-agent reinforcement learning and show that making a single agent prosocial, that is, making them care about the rewards of their partners, can increase the probability that groups converge to good outcomes.
Abstract: Deep reinforcement learning has become an important paradigm for constructing agents that can enter complex multi-agent situations and improve their policies through experience. One commonly used technique is reactive training - applying standard RL methods while treating other agents as a part of the learner's environment. It is known that in general-sum games reactive training can lead groups of agents to converge to inefficient outcomes. We focus on one such class of environments: Stag Hunt games. Here agents either choose a risky cooperative policy (which leads to high payoffs if both choose it but low payoffs to an agent who attempts it alone) or a safe one (which leads to a safe payoff no matter what). We ask how we can change the learning rule of a single agent to improve its outcomes in Stag Hunts that include other reactive learners. We extend existing work on reward shaping in multi-agent reinforcement learning and show that making a single agent prosocial, that is, making it care about the rewards of its partners, can increase the probability that groups converge to good outcomes. Thus, even if we control only a single agent in a group, making that agent prosocial can increase our agent's long-run payoff. We show experimentally that this result carries over to a variety of more complex environments with Stag Hunt-like dynamics, including ones where agents must learn from raw input pixels.
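The shaping idea can be sketched in one line: the prosocial agent optimizes a blend of its own reward and its partner's. The blend weight and the Stag Hunt payoffs below are illustrative assumptions, not the paper's exact parameterization:

```python
def prosocial_reward(own_reward, partner_reward, alpha=0.5):
    """Shaped reward: alpha=0 is selfish, alpha=1 is fully altruistic."""
    return (1 - alpha) * own_reward + alpha * partner_reward

# Illustrative Stag Hunt payoffs (row, column): risky "stag" pays 4 if
# matched but 0 alone; safe "hare" pays 3 regardless.
payoffs = {("stag", "stag"): (4, 4), ("stag", "hare"): (0, 3),
           ("hare", "stag"): (3, 0), ("hare", "hare"): (3, 3)}

# Expected shaped value of an action against a partner who plays stag
# with probability p: shaping raises the relative value of stag.
p = 0.5
def value(action, alpha):
    return sum(prob * prosocial_reward(*payoffs[(action, other)], alpha)
               for other, prob in [("stag", p), ("hare", 1 - p)])
```

Against a 50/50 partner, the selfish agent (alpha=0) prefers the safe hare, while the prosocial agent (alpha=0.5) prefers stag, which is how shaping can tip a group of learners toward the good equilibrium.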

35 citations


Posted Content
TL;DR: In this paper, the authors show that in a large class of games good strategies can be constructed by conditioning one's behavior solely on outcomes (i.e., one's past rewards).
Abstract: Social dilemmas, where mutual cooperation can lead to high payoffs but participants face incentives to cheat, are ubiquitous in multi-agent interaction. We wish to construct agents that cooperate with pure cooperators, avoid exploitation by pure defectors, and incentivize cooperation from the rest. However, often the actions taken by a partner are (partially) unobserved or the consequences of individual actions are hard to predict. We show that in a large class of games good strategies can be constructed by conditioning one's behavior solely on outcomes (i.e., one's past rewards). We call this consequentialist conditional cooperation. We show how to construct such strategies using deep reinforcement learning techniques and demonstrate, both analytically and experimentally, that they are effective in social dilemmas beyond simple matrix games. We also show the limitations of relying purely on consequences and discuss the need for understanding both the consequences of and the intentions behind an action.
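Conditioning only on one's own reward stream can be sketched as a threshold rule: cooperate while recent rewards track the cooperative benchmark, defect when they fall short. The class below is a deliberately simplified illustration; the window, slack, and switching rule are assumptions, not the paper's learned strategies:

```python
class CCCAgent:
    """Toy consequentialist conditional cooperator: sees only rewards."""

    def __init__(self, coop_rate, window=5, slack=1.0):
        self.coop_rate = coop_rate  # per-step reward under mutual cooperation
        self.window = window        # how many recent rewards to consider
        self.slack = slack          # tolerated shortfall before defecting
        self.rewards = []

    def act(self):
        recent = self.rewards[-self.window:]
        if not recent:
            return "cooperate"  # nice: start by cooperating
        # Cooperate iff recent rewards are close to the cooperative
        # benchmark; a large shortfall is evidence of partner defection,
        # even though no action was ever observed directly.
        shortfall = self.coop_rate * len(recent) - sum(recent)
        return "cooperate" if shortfall <= self.slack else "defect"

    def observe(self, reward):
        self.rewards.append(reward)
```

Because the rule never inspects the partner's actions, it still works when those actions are hidden or their consequences are noisy, which is the setting the abstract targets.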

20 citations


Journal ArticleDOI
TL;DR: The authors used the release of the popular augmented reality game Pokemon Go to study this phenomenon in a hybrid lab-field experiment and found that participants are much more cooperative when their partners are from the same team, demonstrating an ecologically valid occurrence of the minimal group paradigm.
Abstract: A large body of laboratory-based research suggests that arbitrary group assignments (i.e., "minimal groups") can lead to in-group bias. We use the release of the popular augmented reality game Pokemon Go to study this phenomenon in a hybrid lab-field experiment. We analyze the behavior of 940 Pokemon Go players randomly matched to other Pokemon Go players to participate in Prisoner's Dilemma games. We find that participants are much more cooperative when their partner is from the same Pokemon Go team, demonstrating an ecologically valid occurrence of the minimal group paradigm. We also use transformed outcome lasso regressions to look for heterogeneity in treatment effects. Machine learning, rather than manual data mining, minimizes overfitting and reduces susceptibility to multiple comparison issues and researcher degrees of freedom. We find one important moderator of the effect: the salience of Pokemon Go. As its popularity wanes, so does the size of the group bias in our experiments. Thus, our full set of results shows that real-world minimal group bias is quick to arise but also potentially fragile.
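The transformed-outcome trick behind the heterogeneity analysis is simple: with random assignment W at probability p, the transformed outcome Y* = W·Y/p − (1−W)·Y/(1−p) has expectation equal to the treatment effect, so it can be regressed on covariates (with a lasso in the paper; plain subgroup averaging below) to find moderators. The data-generating numbers here are invented for illustration, not from the experiment:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 100_000, 0.5
same_team = rng.integers(0, 2, n)        # hypothetical binary moderator
w = rng.binomial(1, p, n)                # randomized treatment assignment
effect = 2.0 * same_team                 # effect exists only when same team
y = 1.0 + effect * w + rng.normal(0, 1, n)

# Transformed outcome: unbiased for the treatment effect pointwise.
y_star = w * y / p - (1 - w) * y / (1 - p)

ate_same = y_star[same_team == 1].mean()  # recovers the same-team effect
ate_diff = y_star[same_team == 0].mean()  # recovers the (null) other effect
```

Feeding y_star into a lasso over many covariates, as the paper does, lets the penalty rather than the researcher decide which moderators survive, which is what limits overfitting and multiple-comparison concerns.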

8 citations


Proceedings ArticleDOI
27 Jun 2017
TL;DR: A large body of existing work in social science and computer science attempts to infer the preferences of individuals from the actions they take; the workhorse model used across these literatures is the rational choice model.
Abstract: A large body of existing work in social science as well as computer science attempts to infer preferences of individuals from the actions they take. This includes research areas such as industrial organization [4], marketing [1], political science [12], analysis of auctions [3], recommender systems [8], search engine ranking [9], and many others. The workhorse model used either implicitly or explicitly in these disparate literatures is the rational choice model.

5 citations


Posted Content
TL;DR: It is shown how a sparsity-inducing l0 regularization can reduce bias (and thus error) of interventional predictions, and a modified cross-validation procedure (IVCV) is proposed to feasibly select the regularization parameter.
Abstract: Scientific and business practices are increasingly resulting in large collections of randomized experiments. Analyzed together, these collections can tell us things that individual experiments in the collection cannot. We study how to learn causal relationships between variables from the kinds of collections faced by modern data scientists: the number of experiments is large, many experiments have very small effects, and the analyst lacks metadata (e.g., descriptions of the interventions). Here we use experimental groups as instrumental variables (IV) and show that a standard method (two-stage least squares) is biased even when the number of experiments is infinite. We show how a sparsity-inducing l0 regularization can --- in a reversal of the standard bias--variance tradeoff in regularization --- reduce bias (and thus error) of interventional predictions. Because we are interested in interventional loss minimization, we also propose a modified cross-validation procedure (IVCV) to feasibly select the regularization parameter. We show, using a trick from Monte Carlo sampling, that IVCV can be done using summary statistics instead of raw data. This makes our full procedure simple to use in many real-world applications.
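The setting can be simulated in a few lines: many experimental groups serve as instruments, most move the treatment variable hardly at all, and a shared confounder biases naive grouped two-stage least squares. The hard-threshold first stage below is an l0-style sketch under invented parameters, not the paper's full IVCV procedure:

```python
import numpy as np

rng = np.random.default_rng(2)
G, n, tau = 2000, 50, 1.0                 # groups, obs per group, true effect
# Only 5% of experiments actually move x; the rest have zero effect.
z = np.where(rng.random(G) < 0.05,
             rng.choice([-1.0, 1.0], G), 0.0)

u = rng.normal(0, 1, (G, n))              # confounder hitting both x and y
x = z[:, None] + u
y = tau * x + u + rng.normal(0, 1, (G, n))

xbar, ybar = x.mean(axis=1), y.mean(axis=1)
xbar -= xbar.mean()
ybar -= ybar.mean()

# Naive grouped 2SLS: biased, because the many weak groups contribute
# mostly confounder noise to the group means.
tau_naive = (xbar @ ybar) / (xbar @ xbar)

# l0-style regularization: keep only groups whose estimated first-stage
# effect clears a hard threshold (0.5 is an illustrative cutoff).
keep = np.abs(xbar) > 0.5
tau_l0 = (xbar[keep] @ ybar[keep]) / (xbar[keep] @ xbar[keep])
```

Discarding the weak instruments raises variance (fewer groups) but cuts the confounder-driven bias, which is the reversal of the usual bias-variance tradeoff the abstract describes.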

4 citations


Patent
09 Nov 2017
TL;DR: In this paper, a plurality of variables, including at least a first variable and a second variable, is selected from a set of individual time series associated with users, and one or more regression techniques are applied to the variables to determine a set of sensitivity metrics for the users.
Abstract: Systems, methods, and non-transitory computer-readable media can acquire a set of individual time series associated with a set of users. Each of the individual time series can be associated with a respective user out of the set of the users. A plurality of variables represented via the set of individual time series can be selected. The plurality of variables can include at least a first variable and a second variable. One or more regression techniques can be applied to at least the first variable and the second variable. A set of sensitivity metrics for the set of users can be determined based on the one or more regression techniques. A respective sensitivity metric out of the set of sensitivity metrics can be determined for each of the users.

1 citation


Patent
08 Jun 2017
TL;DR: In this article, a social networking system builds a quality controlled and desired population-representative pool of human raters to provide ratings on content items to improve a feed ranking model used for providing its users with more relevant content.
Abstract: A social networking system builds a quality controlled and desired population-representative pool of human raters to provide ratings on content items to improve a feed ranking model used for providing its users with more relevant content. The system identifies a pool of candidate human raters for providing ratings on a feed of content items. For each candidate human rater of the pool of candidate human raters, the system presents a feed of content items based on a feed ranking model, obtains ratings on the feed of content items, and determines a score representing the consistency of the obtained ratings, the representativeness of the pool of human raters, or the relevance of the content provided by the ranking model. The system uses the computed scores to modify the ranking model used to present content to its users for improving the relevance of the presented content.

1 citation


Patent
02 Nov 2017
TL;DR: In this article, systems, methods, and non-transitory computer-readable media identify a predefined geographical region out of a set of predefined geographical regions, acquire one or more social engagement signals associated with that region, and determine usage patterns for the region based on those signals.
Abstract: Systems, methods, and non-transitory computer-readable media can identify a predefined geographical region out of a set of predefined geographical regions. One or more social engagement signals associated with the predefined geographical region can be acquired. One or more usage patterns for the predefined geographical region can be determined based on the one or more social engagement signals. Data analytics associated with the predefined geographical region can be provided based on the one or more usage patterns for the predefined geographical region.

1 citation