Author

Siddharth Mysore

Bio: Siddharth Mysore is an academic researcher from Boston University. The author has contributed to research in the topics of Reinforcement learning and Control theory. The author has an h-index of 2 and has co-authored 8 publications receiving 7 citations.

Papers
Posted Content
TL;DR: Conditioning for Action Policy Smoothness (CAPS) is an effective yet intuitive regularization on action policies that offers consistent improvement in the smoothness of the learned state-to-action mappings of neural network controllers, reflected in the elimination of high-frequency components in the control signal.
Abstract: A critical problem with the practical utility of controllers trained with deep Reinforcement Learning (RL) is the notable lack of smoothness in the actions learned by the RL policies. This trend often presents itself in the form of control signal oscillation and can result in poor control, high power consumption, and undue system wear. We introduce Conditioning for Action Policy Smoothness (CAPS), an effective yet intuitive regularization on action policies, which offers consistent improvement in the smoothness of the learned state-to-action mappings of neural network controllers, reflected in the elimination of high-frequency components in the control signal. Tested on a real system, improvements in controller smoothness on a quadrotor drone resulted in an almost 80% reduction in power consumption while consistently training flight-worthy controllers. Project website: http://ai.bu.edu/caps

17 citations
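
The regularization described in the abstract above can be sketched in a few lines. Below is a minimal PyTorch sketch of CAPS-style temporal and spatial smoothness penalties added to an actor loss; the weighting coefficients and the perturbation scale sigma are illustrative placeholders rather than values from the paper, and `policy` is assumed to map state batches to deterministic actions.

```python
import torch

def caps_regularizer(policy, s_t, s_next, sigma=0.05, lambda_t=1.0, lambda_s=0.5):
    """CAPS-style smoothness penalties (coefficients are illustrative).

    Temporal term: actions of consecutive states should stay close.
    Spatial term:  actions of nearby (noise-perturbed) states should stay close.
    """
    a_t = policy(s_t)                                        # action for the visited state
    a_next = policy(s_next)                                  # action for the successor state
    a_nearby = policy(s_t + sigma * torch.randn_like(s_t))   # action for a perturbed state

    loss_temporal = torch.norm(a_t - a_next, dim=-1).mean()
    loss_spatial = torch.norm(a_t - a_nearby, dim=-1).mean()
    return lambda_t * loss_temporal + lambda_s * loss_spatial

# usage sketch: total_loss = actor_loss + caps_regularizer(policy_mean, states, next_states)
```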

Proceedings ArticleDOI
30 May 2021
TL;DR: In this article, the authors introduce Conditioning for Action Policy Smoothness (CAPS), an effective yet intuitive regularization on action policies, which offers consistent improvement in the smoothness of the learned state-to-action mappings of neural network controllers, reflected in the elimination of high-frequency components in the control signal.
Abstract: A critical problem with the practical utility of controllers trained with deep Reinforcement Learning (RL) is the notable lack of smoothness in the actions learned by the RL policies. This trend often presents itself in the form of control signal oscillation and can result in poor control, high power consumption, and undue system wear. We introduce Conditioning for Action Policy Smoothness (CAPS), an effective yet intuitive regularization on action policies, which offers consistent improvement in the smoothness of the learned state-to-action mappings of neural network controllers, reflected in the elimination of high-frequency components in the control signal. Tested on a real system, improvements in controller smoothness on a quadrotor drone resulted in an almost 80% reduction in power consumption while consistently training flight-worthy controllers. Project website: http://ai.bu.edu/caps

16 citations
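
The claimed elimination of high-frequency components can be checked directly on logged actuator commands. The following is a generic NumPy sketch, not the paper's exact metric: it scores a control signal by its amplitude-weighted mean frequency, so smoother signals score lower.

```python
import numpy as np

def smoothness_score(u, sample_rate_hz):
    """Amplitude-weighted mean frequency of a 1-D control signal.

    A smoother signal concentrates its spectrum at low frequencies and
    therefore scores lower. This is a generic diagnostic for illustration,
    not the exact smoothness metric used in the paper.
    """
    u = np.asarray(u, dtype=float) - np.mean(u)    # remove the DC offset
    amps = np.abs(np.fft.rfft(u))
    freqs = np.fft.rfftfreq(len(u), d=1.0 / sample_rate_hz)
    if amps.sum() == 0:
        return 0.0
    return float((amps * freqs).sum() / amps.sum())

# example: a noisy command sequence scores higher than a clean one
t = np.linspace(0, 1, 500)
noisy = np.sin(2 * np.pi * 2 * t) + 0.3 * np.random.randn(500)
clean = np.sin(2 * np.pi * 2 * t)
print(smoothness_score(noisy, 500), ">", smoothness_score(clean, 500))
```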

Journal ArticleDOI
TL;DR: In this article, the authors proposed REinforcement-based transferable Agents through Learning (RE+AL) for designing simulated training environments that preserve the quality of trained agents when transferred to real platforms.
Abstract: We focus on the problem of reliably training Reinforcement Learning (RL) models (agents) for stable low-level control in embedded systems and test our methods on a high-performance, custom-built quadrotor platform. A common but often under-studied problem in developing RL agents for continuous control is that the control policies developed are not always smooth. This lack of smoothness can be a major problem when learning controllers as it can result in control instability and hardware failure. Issues of noisy control are further accentuated when training RL agents in simulation due to simulators ultimately being imperfect representations of reality—what is known as the reality gap. To combat issues of instability in RL agents, we propose a systematic framework, REinforcement-based transferable Agents through Learning (RE+AL), for designing simulated training environments that preserve the quality of trained agents when transferred to real platforms. RE+AL is an evolution of the Neuroflight infrastructure detailed in technical reports prepared by members of our research group. Neuroflight is a state-of-the-art framework for training RL agents for low-level attitude control. RE+AL improves and completes Neuroflight by solving a number of important limitations that hindered the deployment of Neuroflight to real hardware. We benchmark RE+AL on the NF1 racing quadrotor developed as part of Neuroflight. We demonstrate that RE+AL significantly mitigates the previously observed issues of smoothness in RL agents. Additionally, RE+AL is shown to consistently train agents that are flight capable and with minimal degradation in controller quality upon transfer. RE+AL agents also learn to perform better than a tuned PID controller, with better tracking errors, smoother control, and reduced power consumption. To the best of our knowledge, RE+AL agents are the first RL-based controllers trained in simulation to outperform a well-tuned PID controller on a real-world controls problem that is solvable with classical control.

7 citations
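
For reference, the PID baseline mentioned in the abstract is the classical controller structure sketched below for a single attitude-rate axis; the gains and the sampling period are placeholders, not the tuned values used in the paper.

```python
class PID:
    """Textbook discrete PID controller for one attitude-rate axis.

    Gains and the sampling period dt are illustrative placeholders.
    """
    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def step(self, setpoint, measurement):
        error = setpoint - measurement
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

# e.g. roll_rate_pid = PID(kp=0.05, ki=0.02, kd=0.001, dt=0.002)
#      motor_delta = roll_rate_pid.step(target_roll_rate, gyro_roll_rate)
```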

Proceedings ArticleDOI
28 Oct 2021
TL;DR: In this paper, a reinforcement learning (RL)-based framework was developed and utilized for the design of composite structures, which avoided the need for user-selected training data and achieved a success rate exceeding 90%.
Abstract: Advancements in additive manufacturing have enabled design and fabrication of materials and structures not previously realizable. In particular, the design space of composite materials and structures has vastly expanded, and the resulting size and complexity has challenged traditional design methodologies, such as brute force exploration and one factor at a time (OFAT) exploration, to find optimum or tailored designs. To address this challenge, supervised machine learning approaches have emerged to model the design space using curated training data; however, the selection of the training data is often determined by the user. In this work, we develop and utilize a reinforcement learning (RL)-based framework for the design of composite structures which avoids the need for user-selected training data. For a 5 × 5 composite design space comprised of soft and compliant blocks of constituent material, we find that, using this approach, the model can be trained using 2.78% of the total design space, which consists of 2^25 design possibilities. Additionally, the developed RL-based framework is capable of finding designs at a success rate exceeding 90%. The success of this approach motivates future learning frameworks to utilize RL for the design of composites and other material systems.

5 citations
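
One way the 5 × 5 design problem could be cast as a sequential RL environment is sketched below; the episode structure and the surrogate evaluator are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

class CompositeDesignEnv:
    """Toy sequential environment for a 5x5 grid of two constituent materials.

    The agent assigns one block per step (0 = compliant, 1 = stiff).
    evaluate_design stands in for whatever property model or simulation scores
    a finished design; it is NOT the evaluator used in the paper.
    """
    def __init__(self, evaluate_design, size=5):
        self.evaluate_design = evaluate_design
        self.size = size
        self.reset()

    def reset(self):
        self.grid = np.zeros((self.size, self.size), dtype=int)
        self.idx = 0
        return self.grid.copy()

    def step(self, action):                       # action in {0, 1}
        r, c = divmod(self.idx, self.size)
        self.grid[r, c] = action
        self.idx += 1
        done = self.idx == self.size * self.size
        reward = self.evaluate_design(self.grid) if done else 0.0
        return self.grid.copy(), reward, done, {}

# toy evaluator: reward designs whose stiff-block fraction is near 50%
env = CompositeDesignEnv(lambda g: 1.0 - abs(g.mean() - 0.5))
```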

Posted Content
TL;DR: It is demonstrated that RE+AL significantly mitigates the previously observed issues of smoothness in RL agents and consistently trains agents that are flight-capable, with minimal degradation in controller quality upon transfer.
Abstract: We focus on the problem of reliably training Reinforcement Learning (RL) models (agents) for stable low-level control in embedded systems and test our methods on a high-performance, custom-built quadrotor platform. A common but often under-studied problem in developing RL agents for continuous control is that the control policies developed are not always smooth. This lack of smoothness can be a major problem when learning controllers as it can result in control instability and hardware failure. Issues of noisy control are further accentuated when training RL agents in simulation due to simulators ultimately being imperfect representations of reality - what is known as the reality gap. To combat issues of instability in RL agents, we propose a systematic framework, 'REinforcement-based transferable Agents through Learning' (RE+AL), for designing simulated training environments which preserve the quality of trained agents when transferred to real platforms. RE+AL is an evolution of the Neuroflight infrastructure detailed in technical reports prepared by members of our research group. Neuroflight is a state-of-the-art framework for training RL agents for low-level attitude control. RE+AL improves and completes Neuroflight by solving a number of important limitations that hindered the deployment of Neuroflight to real hardware. We benchmark RE+AL on the NF1 racing quadrotor developed as part of Neuroflight. We demonstrate that RE+AL significantly mitigates the previously observed issues of smoothness in RL agents. Additionally, RE+AL is shown to consistently train agents that are flight-capable and with minimal degradation in controller quality upon transfer. RE+AL agents also learn to perform better than a tuned PID controller, with better tracking errors, smoother control and reduced power consumption.

5 citations


Cited by
Journal ArticleDOI
TL;DR: In this paper, a combination of generative adversarial networks (GAN) and actor-critic reinforcement learning is proposed to generate realistic 3D microstructures with controlled structural properties.
Abstract: For material modeling and discovery, synthetic microstructures play a critical role as digital twins. They provide stochastic samples upon which direct numerical simulations can be conducted to populate material databases. A large ensemble of simulation data on synthetic microstructures may provide supplemental data to inform and refine macroscopic material models, which might not be feasible from physical experiments alone. However, synthesizing realistic microstructures with realistic microstructural attributes is highly challenging. Thus, it is often oversimplified via rough approximations that may yield an inaccurate representation of the physical world. Here, we propose a novel deep learning method that can synthesize realistic three-dimensional microstructures with controlled structural properties using the combination of generative adversarial networks (GAN) and actor-critic (AC) reinforcement learning. The GAN-AC combination enables the generation of microstructures that not only resemble the appearances of real specimens but also yield user-defined physical quantities of interest (QoI). Our validation experiments confirm that the properties of synthetic microstructures generated by the GAN-AC framework are within a 5% error margin with respect to the target values. The scientific contribution of this paper resides in the novel design of the GAN-AC microstructure generator and the mathematical and algorithmic foundations therein. The proposed method will have a broad and substantive impact on the materials community by providing lenses for analyzing structure-property-performance linkages and for implementing the notion of 'materials-by-design'.

13 citations
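
The abstract does not detail how the generator and the actor-critic agent are coupled, but one plausible reading is that the agent steers generation toward a user-defined quantity of interest (QoI). The sketch below reflects that assumption; every name in it is hypothetical, not the paper's interface.

```python
def qoi_reward(generator, actor, measure_qoi, state, target_qoi):
    """Hypothetical coupling of a pretrained GAN generator with an actor-critic
    agent: the actor proposes a latent code, the generator synthesizes a 3-D
    microstructure, and the reward is the negative error between the measured
    and the user-defined quantity of interest. All names are assumptions.
    """
    z = actor(state)               # latent code proposed by the policy
    volume = generator(z)          # synthetic 3-D microstructure
    qoi = measure_qoi(volume)      # e.g. a stiffness or porosity surrogate
    return -abs(qoi - target_qoi)  # reward used to train the actor-critic agent
```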

Proceedings ArticleDOI
23 May 2022
TL;DR: In this paper, the authors investigate a series of time-lap tasks for an F1TENTH racing robot, equipped with a high-dimensional LiDAR sensor, on a set of test tracks with a gradual increase in complexity.
Abstract: World models learn behaviors in a latent imagination space to enhance the sample-efficiency of deep reinforcement learning (RL) algorithms. While learning world models for high-dimensional observations (e.g., pixel inputs) has become practicable on standard RL benchmarks and some games, their effectiveness in real-world robotics applications has not been explored. In this paper, we investigate how such agents generalize to real-world autonomous vehicle control tasks, where advanced model-free deep RL algorithms fail. In particular, we set up a series of time-lap tasks for an F1TENTH racing robot, equipped with a high-dimensional LiDAR sensor, on a set of test tracks with a gradual increase in their complexity. In this continuous-control setting, we show that model-based agents capable of learning in imagination substantially outperform model-free agents with respect to performance, sample efficiency, successful task completion, and generalization. Moreover, we show that the generalization ability of model-based agents strongly depends on the choice of their observation model. We provide extensive empirical evidence for the effectiveness of world models provided with long enough memory horizons in sim2real tasks.

8 citations
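
The "learning in latent imagination" idea common to such world-model agents can be sketched as rolling the policy forward inside the learned latent dynamics instead of the real track; the world-model interface below is an assumption for illustration, not the implementation evaluated in the paper.

```python
def imagine_rollout(world_model, actor, start_latent, horizon=15):
    """Roll the policy forward inside the learned latent dynamics model
    rather than the real environment. `world_model.step` and
    `world_model.reward` are assumed interfaces for illustration.
    """
    latents, actions = [start_latent], []
    for _ in range(horizon):
        action = actor(latents[-1])
        latents.append(world_model.step(latents[-1], action))
        actions.append(action)
    rewards = [world_model.reward(z) for z in latents[1:]]
    return latents, actions, rewards   # used to compute value and policy gradients
```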

Proceedings ArticleDOI
15 Feb 2022
TL;DR: By designing the spatio-temporal locally compact space for L2C2 from the state transition at each time step, moderate smoothness can be achieved without loss of expressiveness.
Abstract: This paper proposes a new regularization technique for reinforcement learning (RL) towards making policy and value functions smooth and stable. RL is known for the instability of the learning process and the sensitivity of the acquired policy to noise. Several methods have been proposed to resolve these problems, and in summary, the smoothness of the policy and value functions learned in RL contributes to resolving these problems. However, if these functions are extremely smooth, their expressiveness would be lost, resulting in not obtaining the globally optimal solution. This paper therefore considers RL under a local Lipschitz continuity constraint, so-called L2C2. By designing the spatio-temporal locally compact space for L2C2 from the state transition at each time step, moderate smoothness can be achieved without loss of expressiveness. Noisy numerical simulations verified that the proposed L2C2 improves task performance while smoothing out the robot actions generated from the learned policy.

4 citations
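
A simplified way to express a local Lipschitz constraint on the policy is to penalize action changes that exceed a chosen constant times the state change within a single transition; the PyTorch sketch below shows this simplified form, not the exact L2C2 formulation.

```python
import torch

def local_lipschitz_penalty(policy, s_t, s_next, lip_const=1.0):
    """Penalize policy changes that exceed lip_const times the state change
    over one transition (a simplified stand-in for the L2C2 constraint).
    """
    action_change = torch.norm(policy(s_next) - policy(s_t), dim=-1)
    state_change = torch.norm(s_next - s_t, dim=-1)
    return torch.relu(action_change - lip_const * state_change).mean()
```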

Journal ArticleDOI
TL;DR: In this paper, a DRL-based cascade quadrotor flight controller is proposed to overcome the drawbacks of both conventional and learning-based methods, where the dynamics are decomposed into six subsystems, each containing only one degree of freedom (DOF) for the agent to control.
Abstract: Numerous algorithms have been proposed for quadrotor flight control. Conventional methods require massive labor of parameter adjustment. Deep reinforcement learning (DRL) methods also need enormous computation and complicated hyperparameter tuning, since most of them regard the quadrotor dynamics as a black box. To overcome the drawbacks of both conventional and learning-based methods, this letter proposes a DRL-based cascade quadrotor flight controller. Under the small-angle restriction, the quadrotor dynamics are decomposed into six subsystems, each containing only one degree of freedom (DOF) for the agent to control. Six agents are sequentially trained to fully control the corresponding DOF without any prior knowledge of quadrotor dynamic parameters. Experiments show that, with a total training time of 17 min, the proposed controller could accomplish the fixed point tracking task with an error lower than 5 mm, rise time lower than 1.5 s, and peak value lower than 1%. The controller could also track time-variant trajectories.

4 citations
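
The sequential, one-DOF-at-a-time training scheme described in the abstract can be outlined as below; the environment factory and training routine are hypothetical placeholders, since the letter's actual setup is not reproduced here.

```python
# Hypothetical outline of the cascaded scheme: one agent per degree of freedom,
# trained in sequence with previously trained agents held fixed.
DOFS = ["x", "y", "z", "roll", "pitch", "yaw"]

def train_cascade(make_single_dof_env, train_agent):
    trained = {}
    for dof in DOFS:
        # earlier agents are frozen and keep closing their own control loops
        env = make_single_dof_env(dof, frozen_agents=dict(trained))
        trained[dof] = train_agent(env)
    return trained
```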

Posted Content
TL;DR: In this article, the authors propose two extensions to the original SDE, using more general features and re-sampling the noise periodically, which leads to a new exploration method, generalized state-dependent exploration (gSDE).
Abstract: Reinforcement learning (RL) enables robots to learn skills from interactions with the real world. In practice, the unstructured step-based exploration used in Deep RL -- often very successful in simulation -- leads to jerky motion patterns on real robots. Consequences of the resulting shaky behavior are poor exploration, or even damage to the robot. We address these issues by adapting state-dependent exploration (SDE) to current Deep RL algorithms. To enable this adaptation, we propose two extensions to the original SDE, using more general features and re-sampling the noise periodically, which leads to a new exploration method, generalized state-dependent exploration (gSDE). We evaluate gSDE both in simulation, on PyBullet continuous control tasks, and directly on three different real robots: a tendon-driven elastic robot, a quadruped and an RC car. The noise sampling interval of gSDE allows a compromise between performance and smoothness, which makes it possible to train directly on the real robots without loss of performance. The code is available at this https URL.

3 citations
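
The core of state-dependent exploration is that the exploration noise is a function of policy features whose weights are redrawn only every few steps, rather than fresh white noise at each step. The NumPy sketch below illustrates that idea; the feature dimension, noise scale, and resampling interval are placeholders.

```python
import numpy as np

class StateDependentNoise:
    """gSDE-style exploration noise sketch: noise_t = W @ features(s_t), with W
    resampled every `resample_every` steps, which yields smoother exploration
    trajectories than independent per-step noise. Shapes and sigma are illustrative.
    """
    def __init__(self, feature_dim, action_dim, sigma=0.1, resample_every=64):
        self.sigma = sigma
        self.resample_every = resample_every
        self.shape = (action_dim, feature_dim)
        self.steps_since_resample = resample_every   # force a draw on the first call
        self.W = None

    def __call__(self, features):
        if self.steps_since_resample >= self.resample_every:
            self.W = self.sigma * np.random.randn(*self.shape)
            self.steps_since_resample = 0
        self.steps_since_resample += 1
        return self.W @ features    # added to the deterministic action
```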