Terrestrial net primary production (NPP) quantifies the amount of atmospheric carbon fixed by plants and accumulated as biomass. Previous studies have shown that climate constraints were relaxing with increasing temperature and solar radiation, allowing an upward trend in NPP from 1982 through 1999. The past decade (2000 to 2009) has been the warmest since instrumental measurements began, which could imply continued increases in NPP; however, our estimates suggest a reduction in the global NPP of 0.55 petagrams of carbon. Large-scale droughts have reduced regional NPP, and a drying trend in the Southern Hemisphere has decreased NPP in that area, counteracting the increased NPP over the Northern Hemisphere. A continued decline in NPP would not only weaken the terrestrial carbon sink, but it would also intensify future competition between food demand and proposed biofuel production.

Drought-Induced Reduction in Global Terrestrial Net Primary Production from 2000 Through 2009

Dimensions of Reinforcement Learning

In recent years, neural networks have enjoyed a renaissance as function approximators in reinforcement learning. Two decades after Tesauro's TD-Gammon achieved near top-level human performance in backgammon, the deep reinforcement learning algorithm DQN achieved human-level performance in many Atari 2600 games. The purpose of this study is twofold. First, we propose two activation functions for neural network function approximation in reinforcement learning: the sigmoid-weighted linear unit (SiLU) and its derivative function (dSiLU). The activation of the SiLU is computed by the sigmoid function multiplied by its input. Second, we suggest that the more traditional approach of using on-policy learning with eligibility traces, instead of experience replay, and softmax action selection with simple annealing can be competitive with DQN, without the need for a separate target network. We validate our proposed approach by, first, achieving new state-of-the-art results in both stochastic SZ-Tetris and Tetris with a small 10$\times$10 board, using TD($\lambda$) learning and shallow dSiLU network agents, and, then, by outperforming DQN in the Atari 2600 domain by using a deep Sarsa($\lambda$) agent with SiLU and dSiLU hidden units.
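The two activation functions described in the abstract above can be written out directly. The following is a minimal sketch in plain Python, not the authors' implementation; the function names `sigmoid`, `silu`, and `dsilu` are chosen here for illustration. The dSiLU formula follows from differentiating $x\,\sigma(x)$, which gives $\sigma(x)\,(1 + x\,(1 - \sigma(x)))$.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic sigmoid, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def silu(x: float) -> float:
    """SiLU: the input multiplied by its sigmoid."""
    return x * sigmoid(x)


def dsilu(x: float) -> float:
    """dSiLU: the derivative of SiLU with respect to its input."""
    s = sigmoid(x)
    return s * (1.0 + x * (1.0 - s))
```

At zero input, `silu` returns 0 and `dsilu` returns 0.5; for large positive inputs `silu(x)` approaches `x`.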

Sigmoid-Weighted Linear Units for Neural Network Function Approximation in Reinforcement Learning

A variety of Reinforcement Learning (RL) techniques blends with one or more techniques from Evolutionary Computation (EC) resulting in hybrid methods classified according to their goal, new focus, and their component methodologies. We denote this class of hybrid algorithmic techniques as the evolutionary computation versus reinforcement learning (ECRL) paradigm. This overview considers the entire spectrum of algorithmic aspects and proposes a novel methodology that analyses the technical resemblances and differences in ECRL. Our design analyses the motivation for each ECRL paradigm, the underlying natural models, the sub-component algorithmic techniques, as well as the properties of their ensemble.

Reinforcement learning versus evolutionary computation: A survey on hybrid algorithms

Machine learning is impacting modern society at large, thanks to its increasing potential to efficiently and effectively model complex and heterogeneous phenomena. While machine learning models can achieve very accurate predictions in many applications, they are not infallible. In some cases, machine learning models can deliver unreasonable outcomes. For example, deep neural networks for self-driving cars have been found to provide wrong steering directions based on the lighting conditions of street lanes (e.g., due to cloudy weather). In other cases, models can capture and reflect unwanted biases that were concealed in the training data. For example, deep neural networks used to predict likely jobs and social status of people based on their pictures were found to consistently discriminate based on gender and ethnicity; this was later attributed to human bias in the labels of the training data.

Design and application of gene-pool optimal mixing evolutionary algorithms for genetic programming

The highly addictive stochastic puzzle game 2048 has recently invaded the Internet and mobile devices, stealing countless hours of players' lives. In this study we investigate the possibility of creating a game-playing agent capable of winning this game without incorporating human expertise or performing game tree search. For this purpose, we employ three variants of temporal difference learning to acquire i) action value, ii) state value, and iii) afterstate value functions for evaluating player moves at 1-ply. To represent these functions we adopt n-tuple networks, which have recently been successfully applied to Othello and Connect 4. The conducted experiments demonstrate that the learning algorithm using afterstate value functions is able to consistently produce players winning over 97% of games. These results show that n-tuple networks combined with an appropriate learning algorithm have large potential, which could be exploited in other board games.

Temporal difference learning of N-tuple networks for the game 2048

This study investigates different methods of learning to play the game of Othello. The main questions posed concern scalability of algorithms with respect to the search space size and their capability to generalize and produce players that fare well against various opponents. The considered algorithms represent strategies as n-tuple networks, and employ self-play temporal difference learning (TDL), evolutionary learning (EL) and coevolutionary learning (CEL), and hybrids thereof. To assess the performance, three different measures are used: score against an a priori given opponent (a fixed heuristic strategy), against opponents trained by other methods (round-robin tournament), and against the top-ranked players from the online Othello League. We demonstrate that although evolutionary-based methods yield players that fare best against a fixed heuristic player, it is the coevolutionary temporal difference learning (CTDL), a hybrid of coevolution and TDL, that generalizes better and proves superior when confronted with a pool of previously unseen opponents. Moreover, CTDL scales well with the size of representation, attaining better results for larger n-tuple networks. By showing that a strategy learned in this way wins against the top entries from the Othello League, we conclude that it is one of the best 1-ply Othello players obtained to date without explicit use of human knowledge.

On Scalability, Generalization, and Hybridization of Coevolutionary Learning: A Case Study for Othello

We propose Coevolutionary Gradient Search, a blueprint for a family of iterative learning algorithms that combine elements of local search and population-based search. The approach is applied to learning Othello strategies represented as n-tuple networks, using different search operators and modes of learning. We focus on the interplay between the continuous, directed, gradient-based search in the space of weights, and fitness-driven, combinatorial, coevolutionary search in the space of entire n-tuple networks. In an extensive experiment, we assess both the objective and relative performance of algorithms, concluding that the hybridization of search techniques improves the convergence. The best algorithms not only learn faster than constituent methods alone, but also produce top ranked strategies in the online Othello League.

Learning n-tuple networks for Othello by coevolutionary gradient search

This paper presents Coevolutionary Temporal Difference Learning (CTDL), a novel way of hybridizing coevolutionary search with reinforcement learning that works by interlacing one-population competitive coevolution with temporal difference learning. The coevolutionary part of the algorithm provides for exploration of the solution space, while the temporal difference learning performs its exploitation by local search. We apply CTDL to the board game of Othello, using a weighted piece counter to represent players' strategies. The results of an extensive computational experiment demonstrate CTDL's superiority when compared to coevolution and reinforcement learning alone, particularly when coevolution maintains an archive to provide historical progress. The paper investigates the role of the relative intensity of coevolutionary search and temporal difference search, which turns out to be an essential parameter. The formulation of CTDL also leads to the introduction of a Lamarckian form of coevolution, which we discuss in detail.
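A weighted piece counter is commonly understood as a linear evaluation with one weight per board square. The sketch below illustrates that idea under assumptions not stated in the abstract: the board is a flat sequence of 64 values, with +1 for the evaluated player's pieces, -1 for the opponent's, and 0 for empty squares. The function name `wpc_value` is hypothetical.

```python
from typing import Sequence


def wpc_value(board: Sequence[int], weights: Sequence[float]) -> float:
    """Linear board evaluation: the sum of weight * square content."""
    if len(board) != len(weights):
        raise ValueError("board and weights must have the same length")
    return sum(w * b for w, b in zip(weights, board))
```

Under this encoding, a strategy is fully described by its weight vector, which is what a search or learning method would adjust.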

Coevolutionary Temporal Difference Learning for Othello

Recent developments cast doubt on the effectiveness of coevolutionary learning in interactive domains. A simple evolution with fitness evaluation based on games with random strategies has been found to generalize better than competitive coevolution. In an attempt to investigate this phenomenon, we analyze the utility of random opponents for one- and two-population competitive coevolution applied to learning strategies for the game of Othello. We show that if coevolution uses a two-population setup and also engages random opponents, it is capable of producing strategies as good as those of evolution with random sampling under the expected utility performance measure. To investigate the differences between the analyzed methods, we introduce the performance profile, a tool that measures a player's performance against opponents of varying strength. The profiles reveal that evolution with random sampling produces players that cope well with mediocre opponents but play relatively poorly against stronger ones. This finding explains why, in the round-robin tournament, evolution with random sampling is one of the worst of all the methods considered in this study.

Improving coevolution by random sampling

Marcin Szubert

Papers

Temporal difference learning of N-tuple networks for the game 2048

On Scalability, Generalization, and Hybridization of Coevolutionary Learning: A Case Study for Othello

Learning n-tuple networks for Othello by coevolutionary gradient search

Coevolutionary Temporal Difference Learning for Othello

Improving coevolution by random sampling