
Showing papers by "Robert Babuska published in 2016"


Proceedings ArticleDOI
24 Jul 2016
TL;DR: This paper proposes a deep convolutional neural network solution to the analysis of image data for the detection of rail surface defects, and compares the results of different network architectures characterized by different sizes and activation functions.
Abstract: In this paper, we propose a deep convolutional neural network solution to the analysis of image data for the detection of rail surface defects. The images are obtained from many hours of automated video recordings. This huge amount of data makes it impossible to manually inspect the images and detect rail surface defects. Therefore, automated detection of rail defects can help to save time and costs, and to ensure rail transportation safety. However, one major challenge is that the extraction of suitable features for detection of rail surface defects is a non-trivial and difficult task. Therefore, we propose to use convolutional neural networks as a viable technique for feature learning. Deep convolutional neural networks have recently been applied to a number of similar domains with success. We compare the results of different network architectures characterized by different sizes and activation functions. In this way, we explore the efficiency of the proposed deep convolutional neural network for detection and classification. The experimental results are promising and demonstrate the capability of the proposed approach.

260 citations
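The core operation behind such a detector, a learned convolutional filter followed by a rectified-linear activation, can be sketched in a few lines. This is an illustrative toy, not the network from the paper; the patch and kernel values are made up.

```python
# Minimal sketch of the convolution + ReLU feature extraction at the
# core of a CNN-based defect detector (illustrative only; the actual
# architecture from the paper is not reproduced here).

def conv2d(image, kernel):
    """Valid cross-correlation of a 2-D image with a 2-D kernel."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(ih - kh + 1):
        row = []
        for j in range(iw - kw + 1):
            s = sum(image[i + a][j + b] * kernel[a][b]
                    for a in range(kh) for b in range(kw))
            row.append(s)
        out.append(row)
    return out

def relu(fmap):
    """Element-wise rectified linear activation."""
    return [[max(0.0, v) for v in row] for row in fmap]

patch = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
edge_kernel = [[-1, 0],
               [0, 1]]          # crude diagonal-difference filter
features = relu(conv2d(patch, edge_kernel))
```

In a trained network the kernel values are learned from labeled rail images rather than hand-picked, which is exactly the feature-learning advantage the abstract describes.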


Journal ArticleDOI
TL;DR: This paper considers optimal output synchronization of heterogeneous linear multi-agent systems and shows that the proposed optimal distributed approach implicitly solves the output regulation equations without explicitly solving them.

128 citations


Journal ArticleDOI
TL;DR: A comprehensive review of the current learning and adaptive control methodologies that have been adapted specifically to PH systems; for each method, the changes from the general setting due to the PH model are highlighted, followed by a detailed presentation of the respective control algorithm.
Abstract: Port-Hamiltonian (PH) theory is a novel but well-established modeling framework for nonlinear physical systems. Due to its emphasis on physical structure and its modular framework, PH modeling has become a prime focus in system theory. This has led to considerable research interest in the control of PH systems, resulting in numerous nonlinear control techniques. General nonlinear control methodologies can be classified along a spectrum from model-based to model-free, where adaptation and learning typically lie close to the model-free end of the range. Various articles and monographs have provided detailed overviews of model-based control techniques for PH models, but no survey is specifically dedicated to the learning and adaptive control methods that can benefit from the PH structure. To this end, we provide a comprehensive review of the current learning and adaptive control methodologies that have been adapted specifically to PH systems. After establishing the required theoretical background, we elaborate on various general machine learning, iterative learning, and adaptive control techniques and their application to PH systems. For each method we highlight the changes from the general setting due to the PH model, followed by a detailed presentation of the respective control algorithm. In general, the advantages of using PH models in learning and adaptive controllers are: i) prior knowledge in the form of a PH model speeds up the learning; ii) in some instances, new stability or convergence guarantees are obtained by having a PH model; and iii) the resulting control laws can be interpreted in the context of physical systems. We conclude the paper with notes on open research issues.

60 citations


Journal ArticleDOI
TL;DR: In this paper, a computationally effective sampling approach is proposed to estimate the domain of attraction (DoA) of nonlinear systems in real time; the method is validated by approximating the DoAs of stable equilibria in several nonlinear systems.
Abstract: Most stabilizing controllers designed for nonlinear systems are valid only within a specific region of the state space, called the domain of attraction (DoA). Computation of the DoA is usually costly and time-consuming. This paper proposes a computationally effective sampling approach to estimate the DoAs of nonlinear systems in real time. This method is validated to approximate the DoAs of stable equilibria in several nonlinear systems. In addition, it is implemented for the passivity-based learning controller designed for a second-order dynamical system. Simulation and experimental results show that, in all cases studied, the proposed sampling technique quickly estimates the DoAs, corroborating its suitability for real-time applications.

54 citations
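The basic idea of a sampling-based DoA estimate can be illustrated with a toy one-dimensional system: simulate sampled initial states forward and label those that converge to the target equilibrium. This is a simplified sketch of the general idea, not the authors' specific algorithm; the dynamics, tolerances, and grid are made-up assumptions.

```python
# Illustrative sampling-based DoA estimate: initial states are sampled,
# integrated forward (forward Euler), and labeled by whether they
# converge to the chosen equilibrium.

def simulate(x0, f, dt=0.01, steps=2000):
    x = x0
    for _ in range(steps):
        x += dt * f(x)
    return x

def in_doa(x0, f, x_eq, tol=1e-2):
    """True if the trajectory from x0 ends near the equilibrium x_eq."""
    return abs(simulate(x0, f) - x_eq) < tol

f = lambda x: x - x**3        # stable equilibria at x = -1 and x = +1

samples = [0.25 + 0.25 * k for k in range(12)]   # grid on (0, 3]
doa_points = [x0 for x0 in samples if in_doa(x0, f, 1.0)]
```

For this system every sampled positive initial state converges to x = +1, so the estimated DoA of that equilibrium covers the whole sampled grid; negative initial states converge to x = -1 instead.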


Proceedings ArticleDOI
01 Dec 2016
TL;DR: A new algorithm, Model Learning Deep Deterministic Policy Gradient (ML-DDPG), is proposed that combines RL with state representation learning, i.e., learning a mapping from an input vector to a state before solving the RL task.
Abstract: Deep Neural Networks (DNNs) can be used as function approximators in Reinforcement Learning (RL). One advantage of DNNs is that they can cope with large input dimensions. Instead of relying on feature engineering to lower the input dimension, DNNs can extract the features from raw observations. The drawback of this end-to-end learning is that it usually requires a large amount of data, which for real-world control applications is not always available. In this paper, a new algorithm, Model Learning Deep Deterministic Policy Gradient (ML-DDPG), is proposed that combines RL with state representation learning, i.e., learning a mapping from an input vector to a state before solving the RL task. The ML-DDPG algorithm uses a concept we call predictive priors to learn a model network which is subsequently used to pre-train the first layer of the actor and critic networks. Simulation results show that the ML-DDPG can learn reasonable continuous control policies from high-dimensional observations that contain also task-irrelevant information. Furthermore, in some cases, this approach significantly improves the final performance in comparison to end-to-end learning.

54 citations
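The "predictive priors" idea of learning a state representation by predicting future observations can be illustrated with a deliberately tiny linear stand-in (an assumption for illustration, not the ML-DDPG model network): fit a predictor of the next observation and observe that task-irrelevant input dimensions receive near-zero weight, which is what makes such weights useful for pre-training a first layer.

```python
# Toy linear stand-in for predictive representation learning: the
# observation contains one predictive dimension and one pure-noise
# dimension; least squares on next-observation prediction assigns the
# noise dimension a near-zero weight.
import random

rng = random.Random(0)

# Latent state s follows s' = 0.9 s; observation o = [s, pure noise].
data = []
s = 1.0
for _ in range(500):
    o = (s, rng.gauss(0.0, 1.0))       # second dim is task-irrelevant
    s_next = 0.9 * s
    data.append((o, s_next))
    s = s_next if abs(s_next) > 1e-3 else rng.uniform(-1.0, 1.0)

# Ordinary least squares for w minimizing sum((w . o - s_next)^2),
# solved via the 2x2 normal equations.
a11 = sum(o[0] * o[0] for o, _ in data)
a12 = sum(o[0] * o[1] for o, _ in data)
a22 = sum(o[1] * o[1] for o, _ in data)
b1 = sum(o[0] * y for o, y in data)
b2 = sum(o[1] * y for o, y in data)
det = a11 * a22 - a12 * a12
w = ((a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det)
```

The recovered weights are approximately (0.9, 0): the predictor keeps the dynamics-relevant dimension and discards the noise, which is the property that makes a learned model network a sensible initializer for the actor and critic.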


Journal ArticleDOI
TL;DR: An event-driven control approach that enables the realization of active running, walking, and walk-run transitions in a unified framework is presented and a novel analytical approximate solution to the otherwise nonintegrable double-stance dynamics of the SLIP model is proposed.
Abstract: This paper addresses the control of steady state and transition behaviors for the bipedal spring-loaded inverted pendulum (SLIP) model. We present an event-driven control approach that enables the realization of active running, walking, and walk–run transitions in a unified framework. The synthesis of the controlled behaviors is illustrated by the notion of hybrid automaton in which different gaits are generated as the sequential composition of SLIP's primary phases of motion. We also propose a novel analytical approximate solution to the otherwise nonintegrable double-stance dynamics of the SLIP model. The analytical simplicity of the solution is utilized in the design and analysis of dynamic walking gaits suitable for online implementation. The accuracy of the approximate solution and its influence on the stability properties of the controlled system are carefully analyzed. Finally, we present two simulation examples. The first demonstrates the practicality of the proposed control strategy in creating human-like gaits and gait transitions. In the second example, we use the controlled SLIP as a planner for the control of a multibody bipedal robot model, and embed SLIP-like behaviors into a physics-based robot simulation model. The results corroborate both the practical utility and effectiveness of the proposed approach.

47 citations


Proceedings ArticleDOI
01 Oct 2016
TL;DR: An experience replay method is proposed that ensures that the distribution of the experiences used for training is between that of the policy and a uniform distribution, which reduces the need for sustained exhaustive exploration during learning and is attractive in scenarios where sustained exploration is infeasible or undesirable.
Abstract: Recent years have seen a growing interest in the use of deep neural networks as function approximators in reinforcement learning. In this paper, an experience replay method is proposed that ensures that the distribution of the experiences used for training is between that of the policy and a uniform distribution. Through experiments on a magnetic manipulation task it is shown that the method reduces the need for sustained exhaustive exploration during learning. This makes it attractive in scenarios where sustained exploration is infeasible or undesirable, such as for physical systems like robots and for lifelong learning. The method is also shown to improve the generalization performance of the trained policy, which can make it attractive for transfer learning. Finally, for small experience databases the method performs favorably when compared to the recently proposed alternative of using the temporal-difference error to determine the experience sample distribution, which makes it an attractive option for robots with limited memory capacity.

40 citations
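One simple way to place the training distribution "between" the policy's distribution and a uniform one is to interpolate a recency-weighted sampling distribution (a crude proxy for on-policy data, since the newest experiences come from the current policy) with a uniform distribution over the buffer. This is a hedged sketch of the general idea, not the exact scheme from the paper; the weighting and the mixing parameter `alpha` are illustrative.

```python
# Sampling probabilities for a replay buffer of n slots, interpolated
# between a recency-weighted distribution and a uniform one.

def replay_probs(n, alpha):
    """Sampling probability per buffer slot (index n-1 = newest).

    alpha = 1.0 -> uniform; alpha = 0.0 -> fully recency-weighted.
    """
    recency = [i + 1 for i in range(n)]          # newer slots weighted more
    total = sum(recency)
    return [alpha / n + (1.0 - alpha) * r / total for r in recency]

p = replay_probs(4, 0.5)
```

With `alpha = 0.5` the newest experience is still sampled more often than the oldest, but old experiences retain non-negligible probability, so training does not collapse onto only the most recent (on-policy) data.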


Journal ArticleDOI
TL;DR: In this article, a defect-based risk analysis methodology for estimating rail failure risk is proposed, which relies on an evolution model addressing the severity level of a rail surface defect called a squat.

32 citations


Journal ArticleDOI
TL;DR: In this paper, an adaptive friction compensation scheme is proposed, where the friction force is computed as a time-varying friction coefficient multiplied by the sign of the velocity and an online update law is designed to estimate this coefficient based on the actual position and velocity errors.
Abstract: In this paper, an adaptive friction compensation scheme is proposed. The friction force is computed as a time-varying friction coefficient multiplied by the sign of the velocity and an online update law is designed to estimate this coefficient based on the actual position and velocity errors. Furthermore, a modified signum function definition is proposed to better capture the behavior of friction over varying velocity profiles than the signum function commonly used in friction models. The properties of the closed-loop behavior of the proposed scheme are analyzed and stability of the closed-loop system is proven. Simulations and real-world experimental results are provided to confirm the theoretical findings: the compensator is able to eliminate steady-state errors, significantly decrease the stick–slip effect and compensate even rapidly varying friction forces.

31 citations
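The structure of the scheme, a feedback term plus a compensation term with an online-updated friction coefficient, can be illustrated in a simplified one-dimensional velocity-tracking simulation. Here `tanh` stands in for the paper's modified signum function (whose exact definition is not reproduced), and all gains and parameter values are illustrative assumptions.

```python
import math

# Simplified 1-D simulation of adaptive friction compensation:
# u = k*e + theta_hat * sgn_s(v), with theta_hat updated online from
# the velocity error. tanh is a smooth stand-in for the paper's
# modified signum; gains and parameters are made up.

m, k, gamma = 1.0, 5.0, 2.0       # mass, feedback gain, adaptation gain
theta = 0.5                       # true (unknown) friction coefficient
dt, steps = 1e-3, 20000
v_ref = 1.0                       # constant velocity reference

sgn_s = lambda v: math.tanh(v / 0.05)   # smooth signum stand-in

v, theta_hat = 0.1, 0.0
for _ in range(steps):
    e = v_ref - v
    u = k * e + theta_hat * sgn_s(v)         # feedback + compensation
    v += dt * (u - theta * sgn_s(v)) / m     # mass with friction force
    theta_hat += dt * gamma * e * sgn_s(v)   # online update law
```

After the transient, the velocity error vanishes and the estimate `theta_hat` converges to the true coefficient, mirroring the steady-state-error elimination reported in the abstract.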


Proceedings ArticleDOI
01 Dec 2016
TL;DR: Experimental results and an evaluation are presented for a compensation method that improves the tracking performance of a nominal feedback controller by means of reinforcement learning (RL); the experiments show that the proposed RL-based compensation significantly improves the performance of the nominal feedback controller.
Abstract: In this article we provide experimental results and evaluation of a compensation method which improves the tracking performance of a nominal feedback controller by means of reinforcement learning (RL). The compensator is based on the actor-critic scheme and it adds a correction signal to the nominal control input with the goal to improve the tracking performance using on-line learning. The algorithm has been evaluated on a 6 DOF industrial robot manipulator with the objective to accurately track different types of reference trajectories. An extensive experimental study has shown that the proposed RL-based compensation method significantly improves the performance of the nominal feedback controller.

25 citations


Proceedings ArticleDOI
01 Dec 2016
TL;DR: A novel method based on genetic programming is proposed to construct a symbolic function, which serves as a proxy to the value function and from which a continuous policy is derived, which outperforms the standard policy derivation method.
Abstract: This paper addresses the problem of deriving a policy from the value function in the context of reinforcement learning in continuous state and input spaces. We propose a novel method based on genetic programming to construct a symbolic function, which serves as a proxy to the value function and from which a continuous policy is derived. The symbolic proxy function is constructed such that it maximizes the number of correct choices of the control input for a set of selected states. Maximization methods can then be used to derive a control policy that performs better than the policy derived from the original approximate value function. The method was experimentally evaluated on two control problems with continuous spaces, pendulum swing-up and magnetic manipulation, and compared to a standard policy derivation method using the value function approximation. The results show that the proposed method and its variants outperform the standard method.
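The final policy-derivation step described above can be sketched compactly; the genetic-programming search that produces the symbolic proxy is omitted, and the toy proxy and dynamics below are illustrative assumptions, not the paper's benchmarks.

```python
# Policy derivation from a proxy function: the policy selects, for each
# state, the input that maximizes the proxy over a candidate set.

def derive_policy(proxy, candidates):
    """Return a policy x -> argmax over candidate inputs of proxy(x, u)."""
    return lambda x: max(candidates, key=lambda u: proxy(x, u))

# Toy 1-D system x' = x + u*dt with quadratic cost: a plausible proxy
# scores an input by the (negative) squared next state.
dt = 0.1
proxy = lambda x, u: -(x + u * dt) ** 2

policy = derive_policy(proxy, candidates=[-1.0, 0.0, 1.0])
```

For a positive state the derived policy pushes toward the origin with the negative input, and vice versa; with a symbolic proxy, the inner maximization can also be carried out with continuous optimization rather than a finite candidate set.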

Journal ArticleDOI
TL;DR: A control scheme is proposed for nonredundant CMG systems in which oscillations at saturated states are avoided and all remaining singularities are efficiently escaped by exploiting the system geometry.
Abstract: Gyroscopic actuation is appealing for wearable applications due to its ability to impart free moments on a body without exoskeletal structures on the joints. We recently proposed an unobtrusive balancing aid consisting of multiple parallel-mounted control moment gyroscopes (CMGs) contained within a backpack-like orthopedic corset. Using conventional CMG control techniques, geometric singularities result in a number of performance issues, including either unintended oscillations or freezing of the gimbals at certain alignments, which are typically mitigated by the addition of redundant actuators or by allowing errors in the generated moment; however, because of the minimalistic design of the proposed device and focus on accurate moment tracking, a new methodology is required. In this paper, a control scheme is proposed for nonredundant CMG systems in which oscillations at saturated states are avoided and all remaining singularities are efficiently escaped by exploiting the system geometry; due to its use of classification-specific singularity proximity measures that account for the command moment orientation, it is named the directional singularity-robust control law. The performance of this control law is assessed in both simulations and hardware testing. The proposed method is suitable for a wide range of CMG systems, including both balancing and aerospace applications.

Proceedings ArticleDOI
01 Oct 2016
TL;DR: A closed-loop, force-sensor-based nested admittance/impedance control strategy is proposed to actively estimate and minimize the effects of geometric misalignments that naturally occur during assembly tasks with compliant robots.
Abstract: In this paper, we propose a closed-loop, force-sensor-based nested admittance/impedance control strategy to actively estimate and minimize the effects of geometric misalignments that naturally occur during assembly tasks with compliant robots. The method allows the robot to be used with a stiff impedance control setting, which is beneficial for free-air motion performance, yet allows adjusting for large misalignment errors between parts that need to be assembled.

Journal ArticleDOI
TL;DR: A learning approach to augment the standard sequential composition framework by using online learning to handle unforeseen situations and the results show that in both cases a new controller can be rapidly learned and added to the supervisory control structure.
Abstract: Sequential composition is an effective supervisory control method for addressing control problems in nonlinear dynamical systems. It executes a set of controllers sequentially to achieve a control specification that cannot be realized by a single controller. As these controllers are designed offline, sequential composition cannot address unmodeled situations that might occur during runtime. This paper proposes a learning approach to augment the standard sequential composition framework by using online learning to handle unforeseen situations. New controllers are acquired via learning and added to the existing supervisory control structure. In the proposed setting, learning experiments are restricted to take place within the domain of attraction (DOA) of the existing controllers. This guarantees that the learning process is safe (i.e., the closed loop system is always stable). In addition, the DOA of the new learned controller is approximated after each learning trial. This keeps the learning process short as learning is terminated as soon as the DOA of the learned controller is sufficiently large. The proposed approach has been implemented on two nonlinear systems: 1) a nonlinear mass-damper system and 2) an inverted pendulum. The results show that in both cases a new controller can be rapidly learned and added to the supervisory control structure.

Journal ArticleDOI
TL;DR: The significance of this result is in showing guaranteed stable gaits and gait switching, as well as a systematic methodology for synthesizing controllers that allow legged robots to change rhythms quickly.
Abstract: It has been shown that max-plus linear systems are well suited for applications in synchronization and scheduling, such as the generation of train timetables, manufacturing, or traffic. In this paper we show that the same is true for multi-legged locomotion. In this framework, the max-plus eigenvalue of the system matrix represents the total cycle time, whereas the max-plus eigenvector dictates the steady-state behavior. Uniqueness of the eigenstructure also indicates uniqueness of the resulting behavior. For the particular case of legged locomotion, the movement of each leg is abstracted to two-state circuits: swing and stance (leg in flight and on the ground, respectively). The generation of a gait (a manner of walking) for a multi-legged robot is then achieved by synchronizing the multiple discrete-event cycles via the max-plus framework. By construction, different gaits and gait parameters can be safely interleaved by using different system matrices. In this paper we address both the transient and steady-state behavior for a class of gaits by presenting closed-form expressions for the max-plus eigenvalue and max-plus eigenvector of the system matrix and the coupling time. The significance of this result is in showing guaranteed stable gaits and gait switching, and also a systematic methodology for synthesizing controllers that allow for legged robots to change rhythms fast.
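The max-plus machinery described above is concise to demonstrate: a single leg's swing/stance cycle becomes a 2x2 max-plus system, and iterating the event recursion exposes the eigenvalue (the maximum cycle mean of the event graph, i.e. the average time per event). The swing and stance durations below are hypothetical, chosen only for illustration.

```python
# Minimal max-plus sketch: one leg's swing/stance cycle as a 2x2
# max-plus system; repeated max-plus products reveal the eigenvalue,
# i.e. the maximum cycle mean of the event graph.

NEG_INF = float("-inf")          # the max-plus "zero" element

def mp_matvec(A, x):
    """Max-plus product: result[i] = max over j of (A[i][j] + x[j])."""
    return [max(a + b for a, b in zip(row, x)) for row in A]

# Hypothetical leg timing: stance takes 2 time units, swing takes 3.
A = [[NEG_INF, 3.0],
     [2.0, NEG_INF]]

x = [0.0, 0.0]                   # initial event times
for _ in range(10):              # event recursion x(k+1) = A (x) x(k)
    x = mp_matvec(A, x)

cycle_mean = x[0] / 10           # average growth per event
```

The event times grow by (3 + 2) every two events, so the eigenvalue is 2.5 time units per event; in the gait setting this steady growth rate is what fixes the rhythm, and switching the matrix A switches the gait.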

Proceedings ArticleDOI
01 Jun 2016
TL;DR: A novel multivariable control strategy based on PI auto-tuning is proposed by combining the aforementioned model with optimization of the desired (time-varying) equilibria and zone setpoint temperature, which can lead to important energy savings.
Abstract: The field of energy efficiency in buildings offers challenging opportunities from a control point of view. Heating, Ventilation and Air-Conditioning (HVAC) units in buildings must be accurately controlled so as to ensure the occupants' comfort and reduced energy consumption. While the existing HVAC models consist of only one or a few HVAC components, this work involves the development of a complete HVAC model for one thermal zone. Also, a novel multivariable control strategy based on PI auto-tuning is proposed by combining the aforementioned model with optimization of the desired (time-varying) equilibria. One of the advantages of the proposed PI strategy is the use of time-varying input equilibria and zone setpoint temperature, which can lead to important energy savings. A comparison with a baseline control strategy with constant setpoint temperature is presented: the comparison results show good tracking performance and improved energy efficiency in terms of HVAC energy consumption.

Book ChapterDOI
01 Jul 2016
TL;DR: Two variants of hybrid SNGP that utilize a linear regression technique, LASSO, to improve performance are proposed and compared to state-of-the-art symbolic regression methods.
Abstract: This paper presents a first step of our research on designing an effective and efficient GP-based method for symbolic regression. First, we propose three extensions of the standard Single Node GP, namely (1) a selection strategy for choosing nodes to be mutated based on the depth and performance of the nodes, (2) operators for placing a compact version of the best-performing graph at the beginning and at the end of the population, respectively, and (3) a local search strategy with multiple mutations applied in each iteration. All the proposed modifications have been experimentally evaluated on five symbolic regression benchmarks and compared with standard GP and SNGP. The achieved results are promising, showing the potential of the proposed modifications to improve the performance of the SNGP algorithm. We then propose two variants of hybrid SNGP utilizing a linear regression technique, LASSO, to improve its performance. The proposed algorithms have been compared to state-of-the-art symbolic regression methods that also make use of linear regression techniques on four real-world benchmarks. The results show the hybrid SNGP algorithms are at least competitive with or better than the compared methods.

Journal ArticleDOI
TL;DR: Several variants of the policy-derivation algorithm are introduced and compared on two continuous state-action benchmarks: double pendulum swing-up and 3D mountain car.

Book ChapterDOI
30 Jun 2016
TL;DR: It is empirically demonstrated that the decentralized architecture outperforms its centralized counterpart in terms of the learning time, while using less computational resources.
Abstract: In this paper, decentralized reinforcement learning is applied to a control problem with a multidimensional action space. We propose a decentralized reinforcement learning architecture for a mobile robot, where the individual components of the commanded velocity vector are learned in parallel by separate agents. We empirically demonstrate that the decentralized architecture outperforms its centralized counterpart in terms of the learning time, while using less computational resources. The method is validated on two problems: an extended version of the 3-dimensional mountain car, and a ball-pushing behavior performed with a differential-drive robot, which is also tested on a physical setup.

Journal ArticleDOI
TL;DR: In this paper, the authors introduce an auxiliary system whose behavior under certain conditions is approximately equivalent to the B-SLIP in double stance, and derive approximate solutions to the dynamics of the new system following two different methods: (i) an updated-momentum approach that can deal with both the lossy and lossless B-SLIP models, and (ii) a perturbation-based approach, which yields a solution only for the lossless case.
Abstract: This paper introduces approximate time-domain solutions to the otherwise non-integrable double-stance dynamics of the 'bipedal' spring-loaded inverted pendulum (B-SLIP) in the presence of non-negligible damping. We first introduce an auxiliary system whose behavior under certain conditions is approximately equivalent to the B-SLIP in double stance. Then, we derive approximate solutions to the dynamics of the new system following two different methods: (i) an updated-momentum approach that can deal with both the lossy and lossless B-SLIP models, and (ii) a perturbation-based approach, with which we derive a solution only for the lossless case. The prediction performance of each method is characterized via a comprehensive numerical analysis. The derived representations are computationally very efficient compared to numerical integration and, hence, are suitable for online planning, increasing the autonomy of walking robots. Two application examples of walking gait control are presented. The proposed solutions can serve as instrumental tools in various fields such as control in legged robotics and human motion understanding in biomechanics.

Journal ArticleDOI
TL;DR: This work introduces a method to learn online, from data, the upper bounds that are used to guide the planning process, and characterizes the influence of the approximation error on the performance, and reveals that for small errors, learning-based planning performs better.