Posted Content

PowerGym: A Reinforcement Learning Environment for Volt-Var Control in Power Distribution Systems.

TL;DR: PowerGym is an open-source RL environment for Volt-Var control in power distribution systems that targets minimizing power loss and voltage violations under physical networked constraints.
Abstract: We introduce PowerGym, an open-source reinforcement learning environment for Volt-Var control in power distribution systems. Following OpenAI Gym APIs, PowerGym targets minimizing power loss and voltage violations under physical networked constraints. PowerGym provides four distribution systems (13Bus, 34Bus, 123Bus, and 8500Node) based on IEEE benchmark systems and design variants for various control difficulties. To foster generalization, PowerGym offers a detailed customization guide for users working with their distribution systems. As a demonstration, we examine state-of-the-art reinforcement learning algorithms in PowerGym and validate the environment by studying controller behaviors. The repository is available at \url{this https URL}.
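Because PowerGym follows the OpenAI Gym APIs, interacting with it looks like interacting with any other Gym environment. Below is a minimal sketch of such a control loop; the `make_env` factory, the '13Bus' key, and the reward semantics in the comments are assumptions for illustration, not the repository's documented interface.

```python
# Hedged sketch of a Gym-style control loop for a Volt-Var environment.
# The factory function `make_env` and the system name '13Bus' are assumptions;
# consult the PowerGym repository for the actual entry points.
def run_episode(env, policy, horizon=24):
    """Roll out one episode and accumulate the reward (penalizing loss/violations)."""
    obs = env.reset()
    total_reward = 0.0
    for t in range(horizon):
        action = policy(obs)                        # e.g., capacitor/regulator/battery set-points
        obs, reward, done, info = env.step(action)  # classic 4-tuple Gym API
        total_reward += reward
        if done:
            break
    return total_reward

# Example usage with a random policy (assumes a Gym-like action_space attribute):
# env = make_env('13Bus')
# ret = run_episode(env, lambda obs: env.action_space.sample())
```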
Citations
Posted Content
TL;DR: In this paper, the authors propose a framework that combines RL with graph neural networks and study the benefits and limitations of graph-based policies in the VVC setting, showing that graph-based policies converge to the same rewards asymptotically, but at a slower rate than their vector-representation counterparts.
Abstract: Volt-var control (VVC) is the problem of operating power distribution systems within healthy regimes by controlling actuators in power systems. Existing works have mostly adopted the conventional routine of representing the power systems (a graph with tree topology) as vectors to train deep reinforcement learning (RL) policies. We propose a framework that combines RL with graph neural networks and study the benefits and limitations of graph-based policies in the VVC setting. Our results show that graph-based policies converge to the same rewards asymptotically, however at a slower rate when compared to their vector-representation counterparts. We conduct further analysis on the impact of both observations and actions: on the observation end, we examine the robustness of graph-based policies to two typical data acquisition errors in power systems, namely sensor communication failure and measurement misalignment. On the action end, we show that actuators have various impacts on the system, thus using a graph representation induced by the power systems topology may not be the optimal choice. In the end, we conduct a case study to demonstrate that the choice of readout function architecture and graph augmentation can further improve training performance and robustness.
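To make the graph-based policy idea concrete, here is a minimal message-passing sketch in plain PyTorch in which bus measurements are node features and the feeder's adjacency matrix drives one round of mean aggregation; the layer sizes, single message-passing round, and mean readout are illustrative assumptions rather than the authors' architecture.

```python
# Minimal sketch of a graph-based policy: one round of mean-aggregation message
# passing over the feeder's adjacency, followed by a readout to action logits.
# Feature sizes and the readout choice are illustrative assumptions.
import torch
import torch.nn as nn

class GraphPolicy(nn.Module):
    def __init__(self, node_dim, hidden_dim, action_dim):
        super().__init__()
        self.encode = nn.Linear(node_dim, hidden_dim)
        self.message = nn.Linear(hidden_dim, hidden_dim)
        self.readout = nn.Linear(hidden_dim, action_dim)

    def forward(self, x, adj):
        # x: (num_buses, node_dim) bus measurements; adj: (num_buses, num_buses) adjacency.
        h = torch.relu(self.encode(x))
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1.0)
        h = torch.relu(self.message(adj @ h / deg))   # mean over neighboring buses
        return self.readout(h.mean(dim=0))            # graph-level readout -> action logits
```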

4 citations

Posted Content
TL;DR: In this paper, reinforcement learning under integer actions is studied by incorporating the Soft Actor-Critic (SAC) algorithm with an integer reparameterization, based on the observation that the discrete structure of integer actions can be simplified using their comparability property.
Abstract: Reinforcement learning is well-studied under discrete actions. The integer-action setting is popular in industry yet still challenging due to its high dimensionality. To this end, we study reinforcement learning under integer actions by incorporating the Soft Actor-Critic (SAC) algorithm with an integer reparameterization. Our key observation for integer actions is that their discrete structure can be simplified using their comparability property. Hence, the proposed integer reparameterization does not need one-hot encoding and is of low dimensionality. Experiments show that the proposed SAC under integer actions is as good as the continuous action version on robot control tasks and outperforms Proximal Policy Optimization on power distribution systems control tasks.
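One way to picture an ordered, low-dimensional integer action head is to sample a tanh-squashed Gaussian (as in SAC) and round it onto the integer levels; the sketch below is an illustrative stand-in for such a construction, not necessarily the paper's exact reparameterization.

```python
# Illustrative stand-in for an ordered integer action head: sample a tanh-squashed
# Gaussian (as in SAC) and round it onto {0, ..., K-1}. This avoids one-hot
# encoding and keeps the action head one-dimensional per integer action.
import torch

def sample_integer_action(mean, log_std, num_levels):
    std = log_std.exp()
    u = mean + std * torch.randn_like(mean)        # reparameterized Gaussian sample
    a = torch.tanh(u)                              # squash to (-1, 1)
    scaled = (a + 1.0) / 2.0 * (num_levels - 1)    # map to [0, K-1]
    return torch.round(scaled).long()              # plain rounding; a straight-through
                                                   # estimator would preserve gradients
```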

1 citation

References
Posted Content
TL;DR: A new family of policy gradient methods for reinforcement learning is proposed, which alternates between sampling data through interaction with the environment and optimizing a "surrogate" objective function using stochastic gradient ascent.
Abstract: We propose a new family of policy gradient methods for reinforcement learning, which alternate between sampling data through interaction with the environment, and optimizing a "surrogate" objective function using stochastic gradient ascent. Whereas standard policy gradient methods perform one gradient update per data sample, we propose a novel objective function that enables multiple epochs of minibatch updates. The new methods, which we call proximal policy optimization (PPO), have some of the benefits of trust region policy optimization (TRPO), but they are much simpler to implement, more general, and have better sample complexity (empirically). Our experiments test PPO on a collection of benchmark tasks, including simulated robotic locomotion and Atari game playing, and we show that PPO outperforms other online policy gradient methods, and overall strikes a favorable balance between sample complexity, simplicity, and wall-time.
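The "surrogate" objective referred to above is, in PPO's most common form, the clipped probability-ratio loss; a minimal PyTorch sketch of that loss (not tied to any particular PPO codebase) is:

```python
# Clipped PPO surrogate: maximize min(r * A, clip(r, 1-eps, 1+eps) * A),
# where r is the new/old policy probability ratio and A the advantage estimate.
import torch

def ppo_clip_loss(log_probs_new, log_probs_old, advantages, clip_eps=0.2):
    ratio = torch.exp(log_probs_new - log_probs_old)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()   # negate: optimizers minimize
```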

9,020 citations

Proceedings Article
19 Jun 2016
TL;DR: A conceptually simple and lightweight framework for deep reinforcement learning is presented that uses asynchronous gradient descent to optimize deep neural network controllers; the asynchronous actor-critic variant is shown to succeed on a wide variety of continuous motor control problems as well as on a new task of navigating random 3D mazes from visual input.
Abstract: We propose a conceptually simple and lightweight framework for deep reinforcement learning that uses asynchronous gradient descent for optimization of deep neural network controllers. We present asynchronous variants of four standard reinforcement learning algorithms and show that parallel actor-learners have a stabilizing effect on training allowing all four methods to successfully train neural network controllers. The best performing method, an asynchronous variant of actor-critic, surpasses the current state-of-the-art on the Atari domain while training for half the time on a single multi-core CPU instead of a GPU. Furthermore, we show that asynchronous actor-critic succeeds on a wide variety of continuous motor control problems as well as on a new task of navigating random 3D mazes using a visual input.
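The core mechanism is Hogwild-style asynchrony: each actor-learner computes gradients on its own rollout and applies them to shared parameters without locking. A rough sketch of one worker's update, assuming a PyTorch model whose parameters live in shared memory and an optimizer constructed over those shared parameters, is:

```python
# Rough sketch of one asynchronous worker step: compute gradients on a local copy
# of the model and write them into the shared model's parameters (Hogwild-style).
import torch

def worker_step(shared_model, local_model, shared_optimizer, loss_fn, batch):
    local_model.load_state_dict(shared_model.state_dict())  # sync local copy with shared params
    local_model.zero_grad()
    loss = loss_fn(local_model, batch)                       # e.g., actor-critic loss on a rollout
    loss.backward()
    # Copy local gradients into the shared model, then step the shared optimizer (lock-free).
    for shared_p, local_p in zip(shared_model.parameters(), local_model.parameters()):
        if local_p.grad is not None:
            shared_p.grad = local_p.grad.clone()
    shared_optimizer.step()
```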

6,736 citations

Proceedings ArticleDOI
24 Dec 2012
TL;DR: A new physics engine tailored to model-based control is described; it is based on the modern velocity-stepping approach, which avoids the difficulties with spring-dampers, and can compute both forward and inverse dynamics.
Abstract: We describe a new physics engine tailored to model-based control. Multi-joint dynamics are represented in generalized coordinates and computed via recursive algorithms. Contact responses are computed via efficient new algorithms we have developed, based on the modern velocity-stepping approach which avoids the difficulties with spring-dampers. Models are specified using either a high-level C++ API or an intuitive XML file format. A built-in compiler transforms the user model into an optimized data structure used for runtime computation. The engine can compute both forward and inverse dynamics. The latter are well-defined even in the presence of contacts and equality constraints. The model can include tendon wrapping as well as actuator activation states (e.g. pneumatic cylinders or muscles). To facilitate optimal control applications and in particular sampling and finite differencing, the dynamics can be evaluated for different states and controls in parallel. Around 400,000 dynamics evaluations per second are possible on a 12-core machine, for a 3D humanoid with 18 DoFs and 6 active contacts. We have already used the engine in a number of control applications. It will soon be made publicly available.
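For context, both directions of the dynamics are exposed directly in the present-day official MuJoCo Python bindings; the minimal example below uses those bindings and a toy single-hinge model, which are assumptions about the reader's toolchain rather than part of the original paper.

```python
# Minimal sketch: forward dynamics via mj_step and inverse dynamics via mj_inverse,
# using the present-day official MuJoCo Python bindings (pip install mujoco).
import mujoco

XML = """
<mujoco>
  <worldbody>
    <body>
      <joint type="hinge" axis="0 1 0"/>
      <geom type="capsule" size="0.02" fromto="0 0 0 0 0 -0.3"/>
    </body>
  </worldbody>
</mujoco>
"""

model = mujoco.MjModel.from_xml_string(XML)
data = mujoco.MjData(model)

# Forward dynamics: integrate one timestep under gravity.
mujoco.mj_step(model, data)
print("qpos after one step:", data.qpos)

# Inverse dynamics: given qpos/qvel/qacc, recover the generalized forces.
mujoco.mj_inverse(model, data)
print("inverse-dynamics forces:", data.qfrc_inverse)
```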

4,018 citations

Proceedings Article
21 Jun 2014
TL;DR: This paper introduces an off-policy actor-critic algorithm that learns a deterministic target policy from an exploratory behaviour policy and demonstrates that deterministic policy gradient algorithms can significantly outperform their stochastic counterparts in high-dimensional action spaces.
Abstract: In this paper we consider deterministic policy gradient algorithms for reinforcement learning with continuous actions. The deterministic policy gradient has a particularly appealing form: it is the expected gradient of the action-value function. This simple form means that the deterministic policy gradient can be estimated much more efficiently than the usual stochastic policy gradient. To ensure adequate exploration, we introduce an off-policy actor-critic algorithm that learns a deterministic target policy from an exploratory behaviour policy. We demonstrate that deterministic policy gradient algorithms can significantly outperform their stochastic counterparts in high-dimensional action spaces.
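The "particularly appealing form" of the gradient, ∇θJ ≈ E[∇aQ(s, a)|a=μθ(s) ∇θμθ(s)], translates into an actor loss of -Q(s, μθ(s)); a minimal sketch, assuming generic PyTorch actor and critic modules, is:

```python
# Deterministic policy gradient actor update: ascend Q(s, mu(s)) by minimizing its negation.
# `actor` maps states to actions and `critic` maps (state, action) pairs to Q-values;
# both are assumed to be ordinary torch.nn.Module networks.
def dpg_actor_loss(actor, critic, states):
    actions = actor(states)                 # deterministic actions a = mu(s)
    q_values = critic(states, actions)      # Q(s, mu(s))
    return -q_values.mean()                 # gradient flows through actions into the actor

# Typical usage:
# loss = dpg_actor_loss(actor, critic, state_batch)
# actor_optimizer.zero_grad(); loss.backward(); actor_optimizer.step()
```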

2,174 citations

Posted Content
TL;DR: Soft Actor-Critic (SAC), the recently introduced off-policy actor-critic algorithm based on the maximum entropy RL framework, achieves state-of-the-art performance, outperforming prior on-policy and off-policy methods in sample-efficiency and asymptotic performance.
Abstract: Model-free deep reinforcement learning (RL) algorithms have been successfully applied to a range of challenging sequential decision making and control tasks. However, these methods typically suffer from two major challenges: high sample complexity and brittleness to hyperparameters. Both of these challenges limit the applicability of such methods to real-world domains. In this paper, we describe Soft Actor-Critic (SAC), our recently introduced off-policy actor-critic algorithm based on the maximum entropy RL framework. In this framework, the actor aims to simultaneously maximize expected return and entropy. That is, to succeed at the task while acting as randomly as possible. We extend SAC to incorporate a number of modifications that accelerate training and improve stability with respect to the hyperparameters, including a constrained formulation that automatically tunes the temperature hyperparameter. We systematically evaluate SAC on a range of benchmark tasks, as well as real-world challenging tasks such as locomotion for a quadrupedal robot and robotic manipulation with a dexterous hand. With these improvements, SAC achieves state-of-the-art performance, outperforming prior on-policy and off-policy methods in sample-efficiency and asymptotic performance. Furthermore, we demonstrate that, in contrast to other off-policy algorithms, our approach is very stable, achieving similar performance across different random seeds. These results suggest that SAC is a promising candidate for learning in real-world robotics tasks.
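The constrained formulation mentioned above adjusts the temperature α so that the policy's entropy tracks a target; its temperature loss, J(α) = E[-α (log π(a|s) + H_target)], can be sketched as follows (the learning rate and the log-α parameterization are common implementation choices, not prescribed by the abstract):

```python
# Automatic temperature tuning in SAC: minimize J(alpha) = E[-alpha * (log_pi + target_entropy)].
# Parameterizing log_alpha keeps alpha positive; target_entropy is often set to -|A| (action dim).
import torch

log_alpha = torch.zeros(1, requires_grad=True)
alpha_optimizer = torch.optim.Adam([log_alpha], lr=3e-4)

def update_temperature(log_pi, target_entropy):
    alpha_loss = -(log_alpha * (log_pi + target_entropy).detach()).mean()
    alpha_optimizer.zero_grad()
    alpha_loss.backward()
    alpha_optimizer.step()
    return log_alpha.exp().item()   # current temperature alpha
```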

1,209 citations