Proceedings ArticleDOI

Parameterized value iteration for output reference model tracking of a high order nonlinear aerodynamic system

TL;DR: It is shown that a linearly parameterized VI such as the one used for linear systems is still effective for a complex nonlinear process and performs at a level similar to that of a neural-network (NN)-based implementation that is more complex and takes significantly more time to learn.
Abstract: Linearly and nonlinearly parameterized approximate value iteration (VI) approaches for output reference model (ORM) tracking control are proposed herein. The ORM problem is of significant practical interest since, by selecting a linear ORM, the closed-loop control system is indirectly feedback linearized, and VI offers the means to achieve this feedback linearization in a model-free manner. We show that a linearly parameterized VI such as the one used for linear systems is still effective for a complex nonlinear process and performs at a level similar to that of a neural-network (NN)-based implementation that is more complex and takes significantly more time to learn, while the nonlinearly parameterized NN-based VI proves generally more robust to parameter selection, dataset size, and exploration strategy. The case study is aimed at ORM tracking of a nonlinear two-input, two-output aerodynamic process as a representative high-order system. A convergence analysis accounting for approximation errors in the VI is also provided.
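For intuition, the linearly parameterized VI mentioned above can be read as a batch least-squares regression of a Q-function, linear in fixed features, onto Bellman targets that are rebuilt at every sweep. The Python sketch below illustrates that reading only; the feature map, the function names (features, vi_linear_update), the quadratic stage cost, and the finite candidate-input grid are assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

# Minimal sketch of a linearly parameterized approximate VI scheme:
# Q(x, u) ~= theta^T phi(x, u) with a fixed basis phi, and theta refit
# by least squares on Bellman targets built from stored transitions.

def features(x, u):
    """Quadratic-plus-linear basis in the state x and input u (assumed form)."""
    z = np.concatenate([x, u])
    quad = np.outer(z, z)[np.triu_indices(len(z))]   # upper-triangular quadratic terms
    return np.concatenate([quad, z, [1.0]])

def vi_linear_update(theta, transitions, stage_cost, candidate_inputs, gamma=0.95):
    """One VI sweep: build targets with the current theta, then refit by least squares."""
    Phi, targets = [], []
    for x, u, x_next in transitions:
        # Greedy (minimizing) input at the successor state over a finite candidate grid.
        q_next = min(theta @ features(x_next, u_c) for u_c in candidate_inputs)
        Phi.append(features(x, u))
        targets.append(stage_cost(x, u) + gamma * q_next)
    new_theta, *_ = np.linalg.lstsq(np.asarray(Phi), np.asarray(targets), rcond=None)
    return new_theta

# Purely illustrative usage on random transition data.
rng = np.random.default_rng(0)
nx, nu = 4, 2
transitions = [(rng.standard_normal(nx), rng.standard_normal(nu), rng.standard_normal(nx))
               for _ in range(200)]
candidate_inputs = [rng.uniform(-1.0, 1.0, nu) for _ in range(25)]
cost = lambda x, u: float(x @ x + 0.1 * u @ u)   # assumed ORM-tracking-style quadratic cost
theta = np.zeros(len(features(np.zeros(nx), np.zeros(nu))))
for _ in range(20):
    theta = vi_linear_update(theta, transitions, cost, candidate_inputs)
```

The NN-based variant compared in the abstract keeps the same sweep but replaces the least-squares refit with a neural-network regression, which is what makes it slower to train yet, per the abstract, more robust to tuning choices.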
Citations
01 Jan 2009
TL;DR: A transversal view through microfluidics theory and applications, covering different kinds of phenomena, from continuous to multiphase flow, and a vision of two-phase microfluidic phenomena is given through nonlinear analyses applied to experimental time series.
Abstract: This paper first offers a transversal view through microfluidics theory and applications, starting from a brief overview on microfluidic systems and related theoretical issues, covering different kinds of phenomena, from continuous to multiphase flow. Multidimensional models, from lumped parameters to numerical models and computational solutions, are then considered as preliminary tools for the characterization of spatio-temporal dynamics in microfluidic flows. Following these, experimental approaches through original monitoring opto-electronic interfaces and systems are discussed. Finally, a vision of two phase microfluidic phenomena is given through nonlinear analyses applied to experimental time series.

261 citations

Journal ArticleDOI
TL;DR: It is found that, given a transition sample dataset and a general linear parameterization of the Q-function, the ORM tracking performance obtained with an approximate VI scheme can reach the performance level of a more general implementation using neural networks (NNs).
Abstract: This work suggests a solution for the output reference model (ORM) tracking control problem, based on approximate dynamic programming. General nonlinear systems are included in a control system (CS) and subjected to state feedback. By linear ORM selection, indirect CS feedback linearization is obtained, leading to favorable linear behavior of the CS. The Value Iteration (VI) algorithm ensures model-free nonlinear state feedback controller learning, without relying on the process dynamics. From linear to nonlinear parameterizations, a reliable approximate VI implementation in continuous state-action spaces depends on several key parameters such as problem dimension, exploration of the state-action space, the state-transitions dataset size, and a suitable selection of the function approximators. Herein, we find that, given a transition sample dataset and a general linear parameterization of the Q-function, the ORM tracking performance obtained with an approximate VI scheme can reach the performance level of a more general implementation using neural networks (NNs). Although the NN-based implementation takes more time to learn due to its higher complexity (more parameters), it is less sensitive to exploration settings, to the number of transition samples, and to the selected hyper-parameters, hence it is recommended as the de facto practical implementation. Contributions of this work include the following: VI convergence is guaranteed under general function approximators; a case study for a low-order linear system in order to generalize the more complex ORM tracking validation on a real-world nonlinear multivariable aerodynamic process; comparisons with an offline deep deterministic policy gradient solution; implementation details and further discussions on the obtained results.
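As a companion to the linear sketch above, the nonlinearly parameterized variant discussed in this abstract can be pictured as the same VI sweep with the least-squares fit swapped for a neural-network regression on the Bellman targets. The snippet below is a hedged sketch of that idea only; the MLPRegressor critic, the random data, and all names are illustrative assumptions, not the paper's code.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def vi_nn_update(critic, transitions, stage_cost, candidate_inputs, gamma=0.95):
    """One VI sweep with a neural-network critic refit on Bellman targets."""
    X, targets = [], []
    for x, u, x_next in transitions:
        # Greedy (minimizing) candidate input at the successor state.
        q_next = min(critic.predict(np.concatenate([x_next, u_c])[None, :])[0]
                     for u_c in candidate_inputs)
        X.append(np.concatenate([x, u]))
        targets.append(stage_cost(x, u) + gamma * q_next)
    critic.fit(np.asarray(X), np.asarray(targets))   # full refit each sweep
    return critic

# Purely illustrative usage on random transition data.
rng = np.random.default_rng(1)
nx, nu = 4, 2
transitions = [(rng.standard_normal(nx), rng.standard_normal(nu), rng.standard_normal(nx))
               for _ in range(200)]
candidate_inputs = [rng.uniform(-1.0, 1.0, nu) for _ in range(15)]
cost = lambda x, u: float(x @ x + 0.1 * u @ u)

critic = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=300)
X0 = np.asarray([np.concatenate([x, u]) for x, u, _ in transitions])
critic.fit(X0, np.zeros(len(X0)))        # initial fit so predict() is defined at sweep 1
for _ in range(5):
    critic = vi_nn_update(critic, transitions, cost, candidate_inputs)
```

The extra parameters of the NN critic are what make this variant slower to learn, in line with the comparison reported above.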

8 citations


Cites background or methods from "Parameterized value iteration for o..."

  • ...Let the discrete-time known open-loop stable minimum-phase (MP) state-space deterministic strictly causal ORM be [12,46]...


  • ...A discrete-time nonlinear unknown open-loop stable state-space deterministic strictly causal process is defined as [12,46] P : {x_{k+1} = f(x_k, u_k), y_k = g(x_k)}, (1)...


  • ...Therefore it fits with the recent data-driven control [35–43] and reinforcement learning [12,44,45] applications....


  • ...) to be minimized starting with x0 be [6,12,46]...


  • ...Consider next that the extended state-space model that consists of (1), (2), and the state-space generative model of the reference input signal is, in the most general form [12,46]:...


Journal ArticleDOI
TL;DR: The authors note that their paper is an extended version of the IEEE conference paper by the same authors.
Abstract: The authors would like to mention that their paper is an extended version of the IEEE conference paper [...]

Cites background or result from "Parameterized value iteration for o..."

  • ...The authors would like to mention that their paper is an extended version of the IEEE conference paper [1] from the same authors....


  • ...The extended results contained in this article, with respect to the results in [1], are detailed at the end of the seventh paragraph of the Introduction Section, as follows: The main updates with respect to our paper [12] include the following: detailed IMF-AVI convergence proofs under general function approximators; a case study for a low order linear system in order to generalize the more complex ORM tracking validation on the TITOAS process; comparisons with an offline deep deterministic policy gradient solution; more implementation details and insightful discussions on the obtained results....


  • ...Additionally, the references [1] and [2] (below) are better acknowledged throughout the revised manuscript as references [12] and [46], respectively....


References
Journal ArticleDOI
26 Feb 2015-Nature
TL;DR: This work bridges the divide between high-dimensional sensory inputs and actions, resulting in the first artificial agent that is capable of learning to excel at a diverse array of challenging tasks.
Abstract: The theory of reinforcement learning provides a normative account, deeply rooted in psychological and neuroscientific perspectives on animal behaviour, of how agents may optimize their control of an environment. To use reinforcement learning successfully in situations approaching real-world complexity, however, agents are confronted with a difficult task: they must derive efficient representations of the environment from high-dimensional sensory inputs, and use these to generalize past experience to new situations. Remarkably, humans and other animals seem to solve this problem through a harmonious combination of reinforcement learning and hierarchical sensory processing systems, the former evidenced by a wealth of neural data revealing notable parallels between the phasic signals emitted by dopaminergic neurons and temporal difference reinforcement learning algorithms. While reinforcement learning agents have achieved some successes in a variety of domains, their applicability has previously been limited to domains in which useful features can be handcrafted, or to domains with fully observed, low-dimensional state spaces. Here we use recent advances in training deep neural networks to develop a novel artificial agent, termed a deep Q-network, that can learn successful policies directly from high-dimensional sensory inputs using end-to-end reinforcement learning. We tested this agent on the challenging domain of classic Atari 2600 games. We demonstrate that the deep Q-network agent, receiving only the pixels and the game score as inputs, was able to surpass the performance of all previous algorithms and achieve a level comparable to that of a professional human games tester across a set of 49 games, using the same algorithm, network architecture and hyperparameters. This work bridges the divide between high-dimensional sensory inputs and actions, resulting in the first artificial agent that is capable of learning to excel at a diverse array of challenging tasks.
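For orientation, the core update of the deep Q-network described above is the minimization of a temporal-difference loss over minibatches drawn from replay memory, computed against a periodically frozen target network; in the notation of the cited work it reads approximately as

```latex
L_i(\theta_i) \;=\; \mathbb{E}_{(s,a,r,s') \sim \mathcal{D}}
\Big[ \big( r + \gamma \max_{a'} Q(s', a'; \theta_i^{-}) - Q(s, a; \theta_i) \big)^{2} \Big],
```

where D is the replay memory and the target-network parameters \theta_i^- are copied from \theta_i only every fixed number of steps.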

23,074 citations


"Parameterized value iteration for o..." refers background in this paper

  • ...Although success stories on RL and ADP applied to large state-action spaces are reported mainly in AI [6], in control theory most approaches use low-order processes as representative case studies, mainly in linear quadratic regulator (LQR)-like settings....


Journal ArticleDOI
TL;DR: This work describes mathematical formulations for reinforcement learning and a practical implementation method known as adaptive dynamic programming that give insight into the design of controllers for man-made engineered systems that both learn and exhibit optimal behavior.
Abstract: Living organisms learn by acting on their environment, observing the resulting reward stimulus, and adjusting their actions accordingly to improve the reward. This action-based or reinforcement learning can capture notions of optimal behavior occurring in natural systems. We describe mathematical formulations for reinforcement learning and a practical implementation method known as adaptive dynamic programming. These give us insight into the design of controllers for man-made engineered systems that both learn and exhibit optimal behavior.

1,163 citations

Journal ArticleDOI
01 Aug 2008
TL;DR: It is shown that HDP converges to the optimal control and the optimal value function that solves the Hamilton-Jacobi-Bellman equation appearing in infinite-horizon discrete-time (DT) nonlinear optimal control.
Abstract: Convergence of the value-iteration-based heuristic dynamic programming (HDP) algorithm is proven in the case of general nonlinear systems. That is, it is shown that HDP converges to the optimal control and the optimal value function that solves the Hamilton-Jacobi-Bellman equation appearing in infinite-horizon discrete-time (DT) nonlinear optimal control. It is assumed that, at each iteration, the value and action update equations can be exactly solved. The following two standard neural networks (NN) are used: a critic NN is used to approximate the value function, whereas an action network is used to approximate the optimal control policy. It is stressed that this approach allows the implementation of HDP without knowing the internal dynamics of the system. The exact solution assumption holds for some classes of nonlinear systems and, in particular, in the case of the DT linear quadratic regulator (LQR), where the action is linear and the value is quadratic in the states, so the NNs have zero approximation error. It is stressed that, for the LQR, HDP may be implemented without knowing the system A matrix by using two NNs. This fact is not generally appreciated in the folklore of HDP for the DT LQR, where only one critic NN is generally used.
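In the notation commonly used for this heuristic dynamic programming scheme (affine-in-input dynamics and a quadratic stage cost), the value and action updates whose convergence is proven take roughly the following form; this is a recap of the standard formulation, not a quotation from the paper:

```latex
x_{k+1} = f(x_k) + g(x_k)\,u_k, \qquad V_0(\cdot) \equiv 0,
\qquad
h_i(x_k) = \arg\min_{u_k}\big( x_k^{\top} Q x_k + u_k^{\top} R u_k + V_i(x_{k+1}) \big),
\qquad
V_{i+1}(x_k) = x_k^{\top} Q x_k + h_i(x_k)^{\top} R\, h_i(x_k) + V_i\big(f(x_k) + g(x_k)\,h_i(x_k)\big),
```

with a critic NN approximating V_i and an action NN approximating h_i at each iteration.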

919 citations

Journal ArticleDOI
TL;DR: The new design method is direct and can be applied using a single set of data generated by the plant, with no need for specific experiments nor iterations, and it is shown that the method searches for the global optimum of the design criterion.
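The one-shot, data-driven character highlighted in this TL;DR can be sketched as follows: invert the chosen reference model on the measured output to obtain a virtual reference, form the virtual tracking error, and fit a linearly parameterized controller to reproduce the recorded input by least squares. Everything below (the first-order reference model, the simulated plant, the PI-type controller basis, and the omission of the usual VRFT prefilter) is an assumption made for this sketch, not the cited method's full design.

```python
import numpy as np

rng = np.random.default_rng(0)
N, a_ref = 400, 0.8                          # samples, reference-model pole (assumed)

# One batch of plant input/output data; the plant itself is unknown to the VRFT step.
u = rng.uniform(-1.0, 1.0, N)
y = np.zeros(N)
for k in range(1, N):
    y[k] = 0.9 * y[k - 1] + 0.2 * u[k - 1]   # stand-in first-order plant

# Virtual reference: the signal that would make M(z) = (1-a) z^-1 / (1 - a z^-1)
# reproduce the measured y, i.e. y[k] = a*y[k-1] + (1-a)*rbar[k-1].
rbar = np.zeros(N)
rbar[:-1] = (y[1:] - a_ref * y[:-1]) / (1.0 - a_ref)
ebar = rbar - y                              # virtual tracking error

# PI-type controller u_k = Kp*ebar_k + Ki*sum_{j<=k} ebar_j, fitted by least squares
# so that the recorded input is reproduced (last sample dropped: rbar undefined there).
Phi = np.column_stack([ebar, np.cumsum(ebar)])
theta, *_ = np.linalg.lstsq(Phi[:-1], u[:-1], rcond=None)
Kp, Ki = theta
```

A complete VRFT design would additionally prefilter ebar and u with L(z) so that the least-squares match approximates the intended closed-loop criterion.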

901 citations


"Parameterized value iteration for o..." refers background in this paper

  • ...The VRFT prefilter L(z) is chosen in terms of the reference model M(z)....


  • ...The IO data {ũ_k, ỹ_k} is collected with low-amplitude zero-mean inputs u_{1,k}, u_{2,k}, to maintain the process linearity around the mechanical equilibrium, so as to fit the linear VRFT design framework....


  • ...Nonlinear (in particular, linear) state-feedback controllers can also be found by VRFT as shown in [23] to serve as initializations for the IMF-AVI....


  • ...However, such input-output (IO) or input-state feedback controllers traditionally could not be designed without using a process model, until the advent of data-driven model-free controller design techniques that have emerged from the field of control theory: Virtual Reference Feedback Tuning (VRFT) [12], Iterative Feedback Tuning [13], data-driven Iterative Learning Control [1], [14], Model Free (Adaptive) Control [15], [16]....


  • ...The linear VRFT output feedback error diagonal controller is ....


Journal ArticleDOI
TL;DR: In this article, the authors describe the use of reinforcement learning to design feedback controllers for discrete- and continuous-time dynamical systems that combine features of adaptive control and optimal control; adaptive controllers alone are not usually designed to be optimal in the sense of minimizing user-prescribed performance functions.
Abstract: This article describes the use of principles of reinforcement learning to design feedback controllers for discrete- and continuous-time dynamical systems that combine features of adaptive control and optimal control. Adaptive control [1], [2] and optimal control [3] represent different philosophies for designing feedback controllers. Optimal controllers are normally designed offline by solving Hamilton-Jacobi-Bellman (HJB) equations, for example, the Riccati equation, using complete knowledge of the system dynamics. Determining optimal control policies for nonlinear systems requires the offline solution of nonlinear HJB equations, which are often difficult or impossible to solve. By contrast, adaptive controllers learn online to control unknown systems using data measured in real time along the system trajectories. Adaptive controllers are not usually designed to be optimal in the sense of minimizing user-prescribed performance functions. Indirect adaptive controllers use system identification techniques to first identify the system parameters and then use the obtained model to solve optimal design equations [1]. Adaptive controllers may satisfy certain inverse optimality conditions [4].
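As a concrete instance of the offline design route described above: for discrete-time linear dynamics x_{k+1} = A x_k + B u_k with quadratic cost x^T Q x + u^T R u, the HJB equation reduces to the algebraic Riccati equation and state-feedback gain below (standard LQR material, recalled here only for orientation), whose solution requires full knowledge of (A, B):

```latex
P = A^{\top} P A - A^{\top} P B \big(R + B^{\top} P B\big)^{-1} B^{\top} P A + Q,
\qquad
u_k^{*} = -\big(R + B^{\top} P B\big)^{-1} B^{\top} P A \, x_k .
```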

841 citations