In this paper, we propose a novel Deep Reinforcement Learning (DRL) algorithm that navigates non-holonomic robots with continuous control in unknown dynamic environments with moving obstacles. We call the approach MK-A3C (Memory and Knowledge-based Asynchronous Advantage Actor-Critic). First, MK-A3C builds a GRU-based memory neural network to enhance the robot's capability for temporal reasoning. Robots without such memory tend to act irrationally when faced with incomplete and noisy estimates of complex environments. In addition, the memory provided by MK-A3C lets robots avoid local-minimum traps by estimating the environmental model. Second, MK-A3C combines a domain knowledge-based reward function with a transfer learning-based training task architecture, which addresses the policy non-convergence problem caused by sparse rewards. With these improvements, MK-A3C can efficiently navigate robots in unknown dynamic environments and satisfy kinetic constraints while handling moving objects. Simulation experiments show that, compared with existing methods, MK-A3C achieves successful robotic navigation in unknown and challenging environments by outputting continuous acceleration commands.
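The abstract does not spell out the domain knowledge-based reward, so the following is only a minimal sketch of how a dense, shaped reward can replace a sparse one in a navigation task. The function name `shaped_reward`, all thresholds, and all gains are assumptions chosen for illustration, not values from the paper.

```python
import math


def shaped_reward(prev_pos, pos, goal, nearest_obstacle,
                  goal_radius=0.3, collision_dist=0.1, safe_dist=0.5,
                  progress_gain=1.0, goal_bonus=10.0, collision_penalty=-10.0):
    """Hypothetical dense reward for one navigation step.

    prev_pos, pos, goal: (x, y) tuples in metres.
    nearest_obstacle: distance from the robot to the closest obstacle.
    """
    # Terminal cases: a collision ends the episode with a penalty,
    # reaching the goal ends it with a bonus.
    if nearest_obstacle <= collision_dist:
        return collision_penalty
    d_now = math.dist(pos, goal)
    if d_now <= goal_radius:
        return goal_bonus

    # Dense term: reward the distance gained toward the goal in this step.
    reward = progress_gain * (math.dist(prev_pos, goal) - d_now)

    # Domain knowledge: discourage driving close to obstacles.
    if nearest_obstacle < safe_dist:
        reward -= safe_dist - nearest_obstacle
    return reward
```

Because every step returns a signal, the agent receives feedback long before it first reaches the goal, which is the usual motivation for shaping a sparse reward.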

https://www.mdpi.com/1424-8220/19/18/3837/pdf

Navigation in Unknown Dynamic Environments Based on Deep Reinforcement Learning

Scheduling parallel jobs with tentative runs and consolidation in the cloud

In modern training, entertainment, and education applications, behavior trees (BTs) have become an attractive alternative to finite state machines (FSMs) for modeling and controlling autonomous agents. However, creating BTs manually for various task scenarios is expensive and inefficient. The genetic programming (GP) approach has therefore been devised to evolve BTs automatically, but it has achieved only limited success. Standard GP approaches to evolving BTs fail to scale up and to provide good solutions, while GP approaches with domain-specific constraints can accelerate learning but need significant knowledge engineering effort. In this paper, we propose a modified approach, named evolving BTs with hybrid constraints (EBT-HC), to improve the evolution of BTs for autonomous agents. We first propose a novel dynamic constraint based on frequent sub-tree mining, which accelerates evolution by protecting preponderant behavior sub-trees from undesired crossover. We then combine the existing ‘static’ structural constraint with our dynamic constraint to form the hybrid constraints. The static constraint restricts the expected BT form to reduce the size of the search space, so the hybrid constraints lead to more efficient learning and better solutions without loss of domain independence. Preliminary experiments, carried out in the Pac-Man game environment, show that EBT-HC outperforms other approaches in facilitating BT design by achieving better behavior performance within fewer generations. Moreover, the behavior models generated by EBT-HC are human readable and easy for domain experts to fine-tune.
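The idea of protecting frequent sub-trees from crossover can be sketched as follows. This is a simplified illustration, not the EBT-HC implementation: trees are nested tuples of the form `(label, child, ...)`, "frequent" means an exactly matching sub-tree that occurs in at least `min_support` individuals, and the node labels are made up for a Pac-Man-like agent.

```python
from collections import Counter


def size(tree):
    """Number of nodes in a nested-tuple tree; leaves are plain strings."""
    if isinstance(tree, tuple):
        return 1 + sum(size(child) for child in tree[1:])
    return 1


def subtrees(tree):
    """Yield the tree itself and every sub-tree below it."""
    yield tree
    if isinstance(tree, tuple):
        for child in tree[1:]:
            yield from subtrees(child)


def frequent_subtrees(population, min_support, min_size=2):
    """Sub-trees that appear in at least `min_support` individuals."""
    counts = Counter()
    for tree in population:
        # Count each distinct sub-tree once per individual.
        counts.update({s for s in subtrees(tree)
                       if isinstance(s, tuple) and size(s) >= min_size})
    return {s for s, c in counts.items() if c >= min_support}


def crossover_points(tree, protected, path=()):
    """Paths to nodes that crossover may swap.

    A protected sub-tree can still be moved as a whole, but no cut
    point is offered inside it, so crossover cannot break it apart.
    """
    points = [path]
    if isinstance(tree, tuple) and tree not in protected:
        for i, child in enumerate(tree[1:], start=1):
            points.extend(crossover_points(child, protected, path + (i,)))
    return points


# Hypothetical population of three behavior trees.
POPULATION = [
    ("Selector", ("Sequence", "GhostNear", "Flee"), "EatPill"),
    ("Selector", ("Sequence", "GhostNear", "Flee"),
     ("Sequence", "PowerPillNear", "Chase")),
    ("Sequence", "EatPill", "Wander"),
]
PROTECTED = frequent_subtrees(POPULATION, min_support=2)
POINTS = crossover_points(POPULATION[0], PROTECTED)
```

Here the `("Sequence", "GhostNear", "Flee")` sub-tree occurs in two individuals, so the first tree offers cut points only at its root and its two direct children, never inside the protected sequence.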

https://www.mdpi.com/2076-3417/8/7/1077/pdf

Learning Behavior Trees for Autonomous Agents with Hybrid Constraints Evolution

Recognizing the destinations of a maneuvering agent is important in real time strategy games. Because finding a path in an uncertain environment is essentially a sequential decision problem, we can model the maneuvering process with a Markov decision process (MDP). However, the MDP does not define an action duration. In this paper, we propose a novel semi-Markov decision model (SMDM). In the SMDM, the destination is regarded as a hidden state that affects the selection of an action; each action is associated with a duration variable, which indicates whether the action is completed. We also exploit a Rao-Blackwellised particle filter (RBPF) for inference under the dynamic Bayesian network structure of the SMDM. In experiments, we simulate agents’ maneuvering in a combat field and use the agents’ traces to evaluate the performance of our method. The results show that the SMDM outperforms another extension of the MDP in terms of precision, recall, and F-measure. Our method recognizes destinations efficiently whether or not they change. Additionally, the RBPF infers destinations with smaller variance and in less time than the SPF. The average failure rates of the RBPF are also lower when the number of particles is insufficient.
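The role of the duration variable can be illustrated with a small generative sketch. It is an assumption-laden toy, not the paper's SMDM: the grid world, the four moves, the epsilon-greedy policy, and the uniform duration distribution are all hypothetical, the destination stays fixed, and the RBPF inference step is not shown.

```python
import random

# Hypothetical action set: unit moves on a grid.
ACTIONS = {"N": (0, 1), "S": (0, -1), "E": (1, 0), "W": (-1, 0)}


def choose_action(pos, destination, rng, epsilon=0.1):
    """Policy conditioned on the hidden destination (epsilon-greedy)."""
    names = sorted(ACTIONS)
    if rng.random() < epsilon:
        return rng.choice(names)

    def distance_after(name):
        dx, dy = ACTIONS[name]
        return (abs(destination[0] - (pos[0] + dx))
                + abs(destination[1] - (pos[1] + dy)))

    return min(names, key=distance_after)


def smdm_step(state, rng, max_duration=3):
    """Advance one time step.

    A new action is selected only when the current one is completed,
    that is, when its remaining duration has reached zero.
    """
    pos, action, remaining = state["pos"], state["action"], state["remaining"]
    if remaining == 0:
        action = choose_action(pos, state["destination"], rng)
        remaining = rng.randint(1, max_duration)
    dx, dy = ACTIONS[action]
    return {
        "pos": (pos[0] + dx, pos[1] + dy),
        "destination": state["destination"],
        "action": action,
        "remaining": remaining - 1,
    }
```

Starting from a state with `"action": "E"` and `"remaining": 2`, the agent keeps moving east for two steps regardless of where its destination lies, and only then consults the policy again. A plain MDP would reselect the action at every step.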

/pdf/a-semi-markov-decision-model-for-recognizing-the-destination-31ggbo1wma.pdf

A Semi-Markov Decision Model for Recognizing the Destination of a Maneuvering Agent in Real Time Strategy Games

Goal recognition, the task of inferring an agent’s goals given some or all of the agent’s observed actions, is an important approach to bridging the gap between observation and decision making within an observe-orient-decide-act cycle. Unfortunately, little research focuses on how to improve the utilization of the knowledge produced by a goal recognition system. In this work, we propose a Markov Decision Process-based goal recognition approach tailored to a dynamic shortest-path local network interdiction (DSPLNI) problem. We first introduce a novel DSPLNI model and its solvable dual form so as to incorporate real-time knowledge acquired from the goal recognition system. We then propose a Markov Decision Process-based goal recognition model, along with its dynamic Bayesian network representation and the applied goal inference method, to identify the evader’s real goal within the DSPLNI context. Based on that, we further propose an efficient and scalable technique for maintaining the action utility map used in fast goal inference, and develop a flexible resource assignment mechanism in DSPLNI using knowledge from the goal recognition system. Experimental results show the effectiveness and accuracy of our methods in both goal recognition and dynamic network interdiction.
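As a rough illustration of goal inference on a network, the sketch below updates a posterior over candidate goals after each observed move. It assumes an undirected weighted graph, a Boltzmann (softmax) evader policy over shortest-path costs, and a precomputed distance table that plays a role loosely similar to an action utility map. These modelling choices, the parameter `beta`, and the example graph are assumptions for illustration, not the paper's model.

```python
import heapq
import math


def dijkstra(graph, source):
    """Shortest-path distance from `source` to every reachable node."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, math.inf):
            continue  # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def update_posterior(graph, prior, dist_to_goal, move, beta=1.0):
    """Bayes update of P(goal) after observing the move (u, v)."""
    u, v = move
    unnormalised = {}
    for goal, p in prior.items():
        # Softmax over the cost-to-go of each neighbour of u under this goal.
        scores = {n: math.exp(-beta * (w + dist_to_goal[goal][n]))
                  for n, w in graph[u].items()}
        unnormalised[goal] = p * scores[v] / sum(scores.values())
    total = sum(unnormalised.values())
    return {goal: x / total for goal, x in unnormalised.items()}


# Hypothetical line network A - X - S - Y - B with unit edge costs.
GRAPH = {
    "A": {"X": 1.0},
    "X": {"A": 1.0, "S": 1.0},
    "S": {"X": 1.0, "Y": 1.0},
    "Y": {"S": 1.0, "B": 1.0},
    "B": {"Y": 1.0},
}
GOALS = ["A", "B"]
# Because the graph is undirected, distances from a goal equal distances to it.
DIST = {g: dijkstra(GRAPH, g) for g in GOALS}
POSTERIOR = update_posterior(GRAPH, {g: 0.5 for g in GOALS}, DIST, ("S", "X"))
```

After the evader is seen moving from `S` to `X`, most of the posterior mass shifts to goal `A`, and an interdictor could use that estimate to decide where to commit resources.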

/pdf/bridging-the-gap-between-observation-and-decision-making-3var9d7cpb.pdf

Bridging the Gap between Observation and Decision Making: Goal Recognition and Flexible Resource Allocation in Dynamic Network Interdiction

Quanjun Yin

Papers

Navigation in Unknown Dynamic Environments Based on Deep Reinforcement Learning

Scheduling parallel jobs with tentative runs and consolidation in the cloud

Learning Behavior Trees for Autonomous Agents with Hybrid Constraints Evolution

A Semi-Markov Decision Model for Recognizing the Destination of a Maneuvering Agent in Real Time Strategy Games

Bridging the Gap between Observation and Decision Making: Goal Recognition and Flexible Resource Allocation in Dynamic Network Interdiction