How to Do Things With Words

An Introduction to MultiAgent Systems.

Over the last thirty years, there have been profound changes in how we understand crime and criminal justice. Crime has become a symbolic event: a genuine test of the social order and of government policy, and a challenge for civil society, democracy, and human rights. According to David Garland, professor at the New York University School of Law, one of the leading authors in the sociology of punishment, and author of an article published in Revista de Sociologia e Política, number 13, late modernity has produced a veritable obsession with security, steering criminal policy toward harsher penalties and greater intolerance of offenders. Thirty years ago, this trend was unforeseen in both the United States and England. The book shows that the two countries share intriguing similarities in their criminal justice practices, despite the racial divide, economic inequality, and lethal violence that strongly mark the American landscape. According to Garland, both countries exhibit “the same kinds of risks and insecurities, the same perception of the problems of ineffective social control, the same critiques of traditional criminal justice, and the same recurring anxieties about social change and order” (GARLAND, 2001, p. 2). The book's central argument is as follows: late modernity, a distinctive pattern of social, economic, and cultural relations, brought with it a set of risks, insecurities, and problems of social control that gave a specific shape to our responses to crime, ensuring the high costs of criminal policy, maximal sentence lengths, and excessive incarceration rates.

http://www.scielo.br/pdf/rsocp/n20/n20a15.pdf

The Culture of Control: Crime and Social Order in Contemporary Society

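In the new-curriculum senior high school English textbooks published by People's Education Press, Using Language is an indispensable part of every unit. It provides integrated listening, speaking, reading, and writing exercises built around the unit's central topic, extending and deepening that topic. How to design the teaching of the Using Language section so that one's teaching model avoids clichés while genuinely reflecting the teaching philosophy advocated by the new curriculum standards is a question that frontline English teachers have long been exploring.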

Refining Using Language, Promoting New Teaching Ideas

We explore deep reinforcement learning methods for multi-agent domains. We begin by analyzing the difficulty of traditional algorithms in the multi-agent case: Q-learning is challenged by an inherent non-stationarity of the environment, while policy gradient suffers from a variance that increases as the number of agents grows. We then present an adaptation of actor-critic methods that considers action policies of other agents and is able to successfully learn policies that require complex multi-agent coordination. Additionally, we introduce a training regimen utilizing an ensemble of policies for each agent that leads to more robust multi-agent policies. We show the strength of our approach compared to existing methods in cooperative as well as competitive scenarios, where agent populations are able to discover various physical and informational coordination strategies.

Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments
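The abstract's key idea is an actor-critic in which each agent's critic "considers action policies of other agents" while each actor acts only on its own observation. A minimal sketch of the resulting critic regression target follows, assuming linear functions as stand-ins for neural networks; the names `actor`, `critic_input`, and `td_target` are hypothetical and this is not the authors' implementation (no gradient updates are shown).

```python
from typing import List, Sequence

Vector = List[float]


def linear(weights: Sequence[float], x: Sequence[float]) -> float:
    """Dot product; a stand-in for a neural network in this sketch."""
    if len(weights) != len(x):
        raise ValueError("weight and input sizes differ")
    return sum(w * xi for w, xi in zip(weights, x))


def actor(weight_rows: Sequence[Sequence[float]], own_obs: Vector) -> Vector:
    """Decentralized deterministic policy: sees only this agent's own observation."""
    return [linear(row, own_obs) for row in weight_rows]


def critic_input(all_obs: Sequence[Vector], all_actions: Sequence[Vector]) -> Vector:
    """Centralized critic input: every agent's observation, then every agent's action."""
    flat: Vector = []
    for obs in all_obs:
        flat.extend(obs)
    for act in all_actions:
        flat.extend(act)
    return flat


def td_target(
    reward_i: float,
    gamma: float,
    target_critic_w_i: Sequence[float],
    next_obs: Sequence[Vector],
    target_actor_ws: Sequence[Sequence[Sequence[float]]],
) -> float:
    """Regression target y_i = r_i + gamma * Q'_i(x', a'_1, ..., a'_N) for agent i."""
    # Each agent's next action comes from its own target policy on its own next observation.
    next_actions = [actor(w, o) for w, o in zip(target_actor_ws, next_obs)]
    return reward_i + gamma * linear(target_critic_w_i, critic_input(next_obs, next_actions))


# Two agents, 2-D observations, 1-D actions: the critic input has 2 + 2 + 1 + 1 = 6 entries.
next_obs = [[1.0, 0.0], [0.0, 1.0]]
target_actors = [[[0.5, 0.5]], [[1.0, -1.0]]]
y = td_target(1.0, 0.9, [1.0] * 6, next_obs, target_actors)
print(round(y, 6))  # 2.35
```

Because the critic sees every agent's action, the environment looks stationary to it even as the other agents' policies change; at execution time only the decentralized actors are needed.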

In order for artificial agents to coordinate effectively with people, they must act consistently with existing conventions (e.g. how to navigate in traffic, which language to speak, or how to coordinate with teammates). A group's conventions can be viewed as a choice of equilibrium in a coordination game. We consider the problem of an agent learning a policy for a coordination game in a simulated environment and then using this policy when it enters an existing group. When there are multiple possible conventions we show that learning a policy via multi-agent reinforcement learning (MARL) is likely to find policies which achieve high payoffs at training time but fail to coordinate with the real group into which the agent enters. We assume access to a small number of samples of behavior from the true convention and show that we can augment the MARL objective to help it find policies consistent with the real group's convention. In three environments from the literature (traffic, communication, and team coordination), we observe that augmenting MARL with a small amount of imitation learning greatly increases the probability that the strategy found by MARL fits well with the existing social convention. We show that this works even in an environment where standard training methods very rarely find the true convention of the agent's partners.

Learning Existing Social Conventions via Observationally Augmented Self-Play
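One plausible way to "augment the MARL objective" with "a small amount of imitation learning" is to add a weighted negative log-likelihood of the observed true-convention actions to the reinforcement learning loss. The sketch below assumes exactly that additive form for a discrete softmax policy; the weight `imitation_weight` and all function names are hypothetical, and the abstract does not specify the authors' exact objective.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-probabilities for a discrete policy."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return [l - log_z for l in logits]


def imitation_nll(logits_per_state: Sequence[Sequence[float]], observed_actions: Sequence[int]) -> float:
    """Mean negative log-likelihood of the observed (true-convention) actions."""
    if len(logits_per_state) != len(observed_actions) or not observed_actions:
        raise ValueError("need one observed action per state")
    total = 0.0
    for logits, a in zip(logits_per_state, observed_actions):
        total -= log_softmax(logits)[a]
    return total / len(observed_actions)


def augmented_loss(
    rl_loss: float,
    logits_per_state: Sequence[Sequence[float]],
    observed_actions: Sequence[int],
    imitation_weight: float,
) -> float:
    """Self-play RL loss plus a weighted imitation term on the few observed samples."""
    return rl_loss + imitation_weight * imitation_nll(logits_per_state, observed_actions)


# A uniform two-action policy assigns probability 1/2 to each observed action.
loss = augmented_loss(
    rl_loss=0.3,
    logits_per_state=[[0.0, 0.0], [0.0, 0.0]],
    observed_actions=[0, 1],
    imitation_weight=2.0,
)
print(round(loss, 4))  # 0.3 + 2 * ln 2, about 1.6863
```

The imitation term is small when the policy already matches the observed convention, so it mainly acts as a tie-breaker among conventions that self-play alone rates equally well.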

Computing market equilibria is an important practical problem for market design (e.g. fair division, item allocation). However, computing equilibria requires large amounts of information (e.g. all valuations for all buyers for all items) and compute power. We consider ameliorating these issues by applying a method used for solving complex games: constructing a coarsened abstraction of a given market, solving for the equilibrium in the abstraction, and lifting the prices and allocations back to the original market. We show how to bound important quantities such as regret, envy, Nash social welfare, Pareto optimality, and maximin share when the abstracted prices and allocations are used in place of the real equilibrium. We then study two abstraction methods of interest for practitioners: 1) filling in unknown valuations using techniques from matrix completion, 2) reducing the problem size by aggregating groups of buyers/items into smaller numbers of representative buyers/items and solving for equilibrium in this coarsened market. We find that, on real data, allocations and prices relatively close to equilibrium can be computed from even very coarse abstractions.

Computing Large Market Equilibria using Abstractions
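The abstract does not say which solver is applied to the (abstracted) market. As an illustration of the "solving for the equilibrium" step only, the sketch below swaps in proportional response dynamics, a standard iterative method for linear Fisher markets in which each buyer repeatedly re-bids its budget in proportion to the utility each item gave it. This is not the paper's abstraction method (matrix completion and buyer/item aggregation are not shown), and `proportional_response` is a hypothetical name.

```python
from typing import List, Sequence, Tuple


def proportional_response(
    valuations: Sequence[Sequence[float]],
    budgets: Sequence[float],
    iterations: int = 500,
) -> Tuple[List[float], List[List[float]]]:
    """Approximate a linear Fisher market equilibrium; returns (prices, allocation)."""
    n, m = len(valuations), len(valuations[0])

    def prices_and_alloc(bids: List[List[float]]) -> Tuple[List[float], List[List[float]]]:
        # An item's price is the total money bid on it; each buyer gets its bid share.
        prices = [sum(bids[i][j] for i in range(n)) for j in range(m)]
        alloc = [
            [bids[i][j] / prices[j] if prices[j] > 0 else 0.0 for j in range(m)]
            for i in range(n)
        ]
        return prices, alloc

    # Start with each buyer splitting its budget evenly across items.
    bids = [[budgets[i] / m for _ in range(m)] for i in range(n)]
    for _ in range(iterations):
        _, alloc = prices_and_alloc(bids)
        for i in range(n):
            # Utility each item contributed to buyer i under the current allocation.
            gains = [valuations[i][j] * alloc[i][j] for j in range(m)]
            utility = sum(gains)
            if utility > 0:
                # Re-bid on each item in proportion to the utility it returned.
                bids[i] = [budgets[i] * g / utility for g in gains]
    return prices_and_alloc(bids)


# Each buyer prefers a different item, so at equilibrium each buys its favourite.
prices, alloc = proportional_response([[2.0, 1.0], [1.0, 2.0]], [1.0, 1.0])
print([round(p, 6) for p in prices])  # [1.0, 1.0]
```

Every update keeps each buyer's total bid equal to its budget, so total prices always equal total budgets; this is one of the quantities one would check when lifting prices from a coarse abstraction back to the full market.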

Nash equilibrium takes optimization as a primitive, but suboptimal behavior can persist in simple stochastic decision problems. This has motivated the development of other equilibrium concepts such as cursed equilibrium and behavioral equilibrium. We experimentally study a simple adverse selection (or "lemons") problem and find that learning models that heavily discount past information (i.e. display recency bias) explain patterns of behavior better than Nash, cursed or behavioral equilibrium. Providing counterfactual information or a record of past outcomes does little to aid convergence to optimal strategies, but providing sample averages ("recaps") gets individuals most of the way to optimality. Thus recency effects are not solely due to limited memory but stem from some other form of cognitive constraints. Our results show the importance of going beyond static optimization and incorporating features of human learning into economic models.

Recency, Records and Recaps: Learning and Non-Equilibrium Behavior in a Simple Decision Problem
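To make "learning models that heavily discount past information" concrete, the sketch below contrasts an exponential recency-weighted value estimate with the plain sample average that a "recap" would display. The update rule, the step size `alpha`, and the starting value of 0 are assumptions chosen for illustration; they are not the specific learning models estimated in the paper.

```python
from typing import Sequence


def recency_weighted_estimate(outcomes: Sequence[float], alpha: float, initial: float = 0.0) -> float:
    """Exponential recency-weighted value estimate; larger alpha discounts the past more heavily."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    value = initial
    for r in outcomes:
        # Move the estimate a fraction alpha of the way toward the latest outcome.
        value += alpha * (r - value)
    return value


def recap_average(outcomes: Sequence[float]) -> float:
    """Plain sample average of all past outcomes, as a recap would report."""
    if not outcomes:
        raise ValueError("no outcomes")
    return sum(outcomes) / len(outcomes)


history = [10.0, 10.0, 10.0, 0.0]  # three good outcomes, then one bad one
print(round(recency_weighted_estimate(history, alpha=0.9), 3))  # 0.999
print(recap_average(history))  # 7.5
```

With a large `alpha`, a single bad draw nearly erases the earlier evidence, whereas the sample average still reflects all four outcomes; this is the gap a recap can close.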

Social conventions (arbitrary ways to organize group behavior) are an important part of social life. Any agent that wants to enter an existing society must be able to learn its conventions (e.g. which side of the road to drive on, which language to speak) from relatively few observations or risk being unable to coordinate with everyone else. We consider the game-theoretic framework of David Lewis, which views the selection of a social convention as the selection of an equilibrium in a coordination game. We ask how to construct reinforcement-learning-based agents that can solve the convention learning task in the self-play paradigm: at training time the agent has access to a good model of the environment and a small amount of observations about how individuals in society act. The agent then has to construct a policy that is compatible with the test-time social convention. We study three environments from the literature which have multiple conventions: traffic, communication, and risky coordination. In each of these we observe that adding a small amount of imitation learning during self-play training greatly increases the probability that the strategy found by self-play fits well with the social convention the agent will face at test time. We show that this works even in an environment where standard independent multi-agent RL very rarely finds the correct test-time equilibrium.

Learning Social Conventions in Markov Games.
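Lewis's view of a convention as one equilibrium among several can be seen in a toy one-shot driving game. The sketch below enumerates the pure-strategy Nash equilibria of a two-player matrix game, showing that "drive left" and "drive right" are equally good equilibria, and then picks the one most consistent with a few observed actions. It is a matrix-game illustration only, not the Markov-game RL method of the paper; `pure_nash_equilibria` and `most_consistent` are hypothetical helpers.

```python
from itertools import product
from typing import Dict, List, Sequence, Tuple

Profile = Tuple[int, int]


def pure_nash_equilibria(payoffs: Dict[Profile, Tuple[float, float]], n_actions: int) -> List[Profile]:
    """All pure-strategy profiles where neither player gains by deviating alone."""
    equilibria = []
    for a, b in product(range(n_actions), repeat=2):
        u_row, u_col = payoffs[(a, b)]
        row_ok = all(payoffs[(a2, b)][0] <= u_row for a2 in range(n_actions))
        col_ok = all(payoffs[(a, b2)][1] <= u_col for b2 in range(n_actions))
        if row_ok and col_ok:
            equilibria.append((a, b))
    return equilibria


def most_consistent(equilibria: Sequence[Profile], observed_actions: Sequence[int]) -> Profile:
    """Pick the equilibrium whose actions match the most observed individual actions."""
    def matches(eq: Profile) -> int:
        return sum(1 for act in observed_actions if act in eq)
    return max(equilibria, key=matches)


LEFT, RIGHT = 0, 1
driving = {
    (LEFT, LEFT): (1.0, 1.0), (RIGHT, RIGHT): (1.0, 1.0),
    (LEFT, RIGHT): (0.0, 0.0), (RIGHT, LEFT): (0.0, 0.0),
}
eqs = pure_nash_equilibria(driving, 2)
print(eqs)  # [(0, 0), (1, 1)]
print(most_consistent(eqs, [RIGHT, RIGHT, LEFT, RIGHT]))  # (1, 1)
```

Payoffs alone cannot distinguish the two conventions, which is why pure self-play may settle on either one; even a handful of observations breaks the tie.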

The human willingness to pay costs to benefit anonymous others is often explained by social preferences: rather than only valuing their own material payoff, people also care in some fashion about the outcomes of others. But how successful is this concept of outcome-based social preferences for actually predicting out-of-sample behavior? We investigate this question by having 1067 human subjects each make 20 cooperation decisions, and using machine learning to predict their last 5 choices based on their first 15. We find that decisions can be predicted with high accuracy by models that include outcome-based features and allow for heterogeneity across individuals in baseline cooperativeness and the weights placed on the outcome-based features (AUC=0.89). It is not necessary, however, to have a fully heterogeneous model: excellent predictive power (AUC=0.88) is achieved by a model that allows three different sets of baseline cooperativeness and feature weights (i.e. three behavioral types), defined based on the participant's cooperation frequency in the 15 training trials: those who cooperated at least half the time, those who cooperated less than half the time, and those who never cooperated. Finally, we provide evidence that this inclination to cooperate cannot be well proxied by other personality/morality survey measures or demographics, and thus is a natural kind (or "cooperative phenotype").

https://papers.ssrn.com/sol3/Delivery.cfm/SSRN_ID2737983_code1038894.pdf?abstractid=2737983&mirid=2
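The three behavioral types in the abstract above are defined by a simple rule on the 15 training decisions, and predictive accuracy is reported as AUC. The sketch below implements that classification rule and a pairwise AUC, assuming choices are coded 1 = cooperate and 0 = defect; the type labels, the example data, and both helper names are illustrative, and the study's actual prediction models are not reproduced.

```python
from typing import Sequence


def behavioral_type(training_choices: Sequence[int]) -> str:
    """Classify a participant by cooperation frequency in the training trials (1 = cooperate)."""
    if not training_choices:
        raise ValueError("no training choices")
    rate = sum(training_choices) / len(training_choices)
    if rate == 0:
        return "never cooperated"
    if rate < 0.5:
        return "cooperated less than half the time"
    return "cooperated at least half the time"


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


# First 15 of 20 decisions for one hypothetical participant.
print(behavioral_type([1] * 8 + [0] * 7))  # cooperated at least half the time
print(auc([0.9, 0.4, 0.6, 0.2], [1, 1, 0, 0]))  # 0.75
```

An AUC of 0.5 corresponds to chance ranking and 1.0 to perfect separation, which puts the reported 0.88 and 0.89 in context.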

Alexander Peysakhovich

Papers

Learning Existing Social Conventions via Observationally Augmented Self-Play

Computing Large Market Equilibria using Abstractions

Recency, Records and Recaps: Learning and Non-Equilibrium Behavior in a Simple Decision Problem

Learning Social Conventions in Markov Games.

The Good, the Bad, and the Unflinchingly Selfish: Cooperative Decision-Making Can Be Predicted with High Accuracy Using Only Three Behavioral Types