Book

Linear complementarity, linear and nonlinear programming

01 Jan 1988
About: This book was published on 1988-01-01 and is currently open access. It has received 1012 citations to date. It focuses on the topics: Mixed complementarity problem & Complementarity theory.
Citations
Posted Content
TL;DR: A new vision of reinforcement learning is set forth, one that yields mathematically rigorous solutions to longstanding important questions that have remained unresolved, and proximal operator theory enables the systematic development of operator splitting methods that show how to safely and reliably decompose complex products of gradients.
Abstract: In this paper, we set forth a new vision of reinforcement learning developed by us over the past few years, one that yields mathematically rigorous solutions to longstanding important questions that have remained unresolved: (i) how to design reliable, convergent, and robust reinforcement learning algorithms (ii) how to guarantee that reinforcement learning satisfies pre-specified "safety" guarantees, and remains in a stable region of the parameter space (iii) how to design "off-policy" temporal difference learning algorithms in a reliable and stable manner, and finally (iv) how to integrate the study of reinforcement learning into the rich theory of stochastic optimization. In this paper, we provide detailed answers to all these questions using the powerful framework of proximal operators. The key idea that emerges is the use of primal dual spaces connected through the use of a Legendre transform. This allows temporal difference updates to occur in dual spaces, allowing a variety of important technical advantages. The Legendre transform elegantly generalizes past algorithms for solving reinforcement learning problems, such as natural gradient methods, which we show relate closely to the previously unconnected framework of mirror descent methods. Equally importantly, proximal operator theory enables the systematic development of operator splitting methods that show how to safely and reliably decompose complex products of gradients that occur in recent variants of gradient-based temporal difference learning. This key technical innovation makes it possible to finally design "true" stochastic gradient methods for reinforcement learning. Finally, Legendre transforms enable a variety of other benefits, including modeling sparsity and domain geometry. Our work builds extensively on recent work on the convergence of saddle-point algorithms, and on the theory of monotone operators.

61 citations


Cites background from "Linear complementarity, linear and ..."

  • ...In an NCP, whenever the mapping function F is affine, that is F(x) = Mx + b, where M is an n × n matrix, then the corresponding NCP is called a linear complementarity problem (LCP) [131]....

    [...]

Proceedings Article
06 Dec 2010
TL;DR: It is demonstrated that warm starts, as well as the efficiency of LCP solvers, can speed up policy iteration and permit a form of modified policy iteration that can be used to approximate a "greedy" homotopy path, a generalization of the LARS-TD homotopy path that combines policy evaluation and optimization.
Abstract: Recent work in reinforcement learning has emphasized the power of L1 regularization to perform feature selection and prevent overfitting. We propose formulating the L1 regularized linear fixed point problem as a linear complementarity problem (LCP). This formulation offers several advantages over the LARS-inspired formulation, LARS-TD. The LCP formulation allows the use of efficient off-the-shelf solvers, leads to a new uniqueness result, and can be initialized with starting points from similar problems (warm starts). We demonstrate that warm starts, as well as the efficiency of LCP solvers, can speed up policy iteration. Moreover, warm starts permit a form of modified policy iteration that can be used to approximate a "greedy" homotopy path, a generalization of the LARS-TD homotopy path that combines policy evaluation and optimization.

61 citations


Cites background from "Linear complementarity, linear and ..."

  • ...Like the LCP, the BLCP has a unique solution when M is a P-matrix and there exist algorithms which are guaranteed to find this solution [6, 7]....

    [...]

  • ...We note that if A is a P-matrix, so is A^{-1} [18], that BLCPs for P-matrices have a unique solution for any q ([7], Chp....

    [...]
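
The P-matrix condition quoted above can be illustrated with a small check. This is a minimal sketch, not taken from the cited paper: it assumes the usual definition of a P-matrix (every principal minor is strictly positive) and tests it by brute force, so it is only practical for small matrices. The function name `is_p_matrix` and the tolerance `tol` are hypothetical choices made for this example.

```python
from itertools import combinations

import numpy as np


def is_p_matrix(M, tol=1e-12):
    """Return True if every principal minor of the square matrix M exceeds tol.

    Brute force over all 2^n - 1 non-empty index sets, so use small n only.
    """
    M = np.asarray(M, dtype=float)
    if M.ndim != 2 or M.shape[0] != M.shape[1]:
        raise ValueError("M must be a square matrix")
    n = M.shape[0]
    for k in range(1, n + 1):
        for idx in combinations(range(n), k):
            # Principal submatrix: same index set for rows and columns.
            sub = M[np.ix_(idx, idx)]
            if np.linalg.det(sub) <= tol:
                return False
    return True


A = np.array([[2.0, 1.0], [1.0, 2.0]])
print(is_p_matrix(A))                 # P-matrix
print(is_p_matrix(np.linalg.inv(A)))  # its inverse is also a P-matrix
print(is_p_matrix([[1.0, 2.0], [2.0, 1.0]]))  # determinant is negative
```

For the 2 × 2 example, the inverse also passes the check, which is consistent with the excerpt's remark that the inverse of a P-matrix is a P-matrix.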

Book Chapter
Christoph Glocker
01 Jan 2006
TL;DR: In this paper, the standard impact constitutive laws from non-smooth dynamics are reviewed for planar frictional collisions and formulated in terms of set-valued maps and linear complementarity.
Abstract: Different methods to model and solve multi-contact collisions are presented in this report. The standard impact constitutive laws from non-smooth dynamics are reviewed for planar frictional collisions and formulated in terms of set-valued maps and linear complementarity. For the frictionless case, a geometric concept based on kinematic, kinetic and energetic compatibility is developed, which provides access to non-standard impact events as in Newton’s cradle. Within this context, Moreau’s impact law is reviewed and stated in various ways, providing even access to the collision problem at re-entrant corner points as an extension. Based on Moreau’s law, a geometric classification of impacts is proposed. Several examples are presented, such as the frictional reversible impact at a super ball, Newton’s cradle and the rocking rod.

60 citations


Cites background from "Linear complementarity, linear and ..."

  • ...A linear complementarity problem (LCP) is a problem of the following form, see e.g. Cottle et al. (1992) or Murty (1988) for a full account: For given A ∈ R^{n,n} and b ∈ R^n, find x ∈ R^n and y ∈ R^n such that the linear equation y = Ax + b holds together with the complementarity conditions y_i ≥ 0, x_i ≥…...

    [...]
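
The LCP described in the excerpt above can be made concrete with a short sketch. The quoted text is truncated, so this example assumes the standard form of the conditions: y = Ax + b, x ≥ 0, y ≥ 0, and x_i y_i = 0 for every i. It is not the pivoting method of any cited work; it simply enumerates which components of x may be nonzero, which is exponential in n and only meant for tiny problems. The name `solve_lcp_by_enumeration` and the tolerance `tol` are hypothetical.

```python
from itertools import combinations

import numpy as np


def solve_lcp_by_enumeration(A, b, tol=1e-9):
    """Find x, y with y = A x + b, x >= 0, y >= 0 and x_i * y_i = 0.

    Tries every candidate support of x (indices where x_i may be nonzero).
    Returns (x, y) for the first feasible candidate, or None if none works.
    """
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    n = b.size
    for k in range(n + 1):
        for support in combinations(range(n), k):
            s = list(support)
            x = np.zeros(n)
            if s:
                try:
                    # On the support y_s = 0, so A_ss x_s = -b_s.
                    x[s] = np.linalg.solve(A[np.ix_(s, s)], -b[s])
                except np.linalg.LinAlgError:
                    continue  # singular submatrix, skip this support
            y = A @ x + b
            feasible = (x >= -tol).all() and (y >= -tol).all()
            if feasible and abs(x @ y) <= tol:
                return x, y
    return None


A = np.array([[2.0, 1.0], [1.0, 2.0]])
b = np.array([-5.0, -6.0])
x, y = solve_lcp_by_enumeration(A, b)
print(x, y)
```

In this example both components of b are negative and no smaller support is feasible, so both components of x end up positive and y is zero. Practical solvers use pivoting or iterative schemes rather than enumeration.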

Journal Article
01 Mar 2015
TL;DR: This focus article reviews algorithms for convex QPs (in which the objective is a convex function) and provides pointers to various online resources about QPs.
Abstract: Optimization problems in which a quadratic objective function is optimized subject to linear constraints on the parameters are known as quadratic programming problems (QPs). This focus article reviews algorithms for convex QPs (in which the objective is a convex function) and provides pointers to various online resources about QPs. WIREs Comput Stat 2015, 7:153-159. doi: 10.1002/wics.1344

60 citations

Book Chapter
TL;DR: The classical Lemke-Howson algorithm finds one equilibrium of a bimatrix game, and provides an elementary proof that a Nash equilibrium exists, and can be given a strong geometric intuition using graphs that show the subdivision of the players' mixed strategy sets into best-response regions.
Abstract: This paper is a self-contained survey of algorithms for computing Nash equilibria of two-person games. The games may be given in strategic form or extensive form. The classical Lemke-Howson algorithm finds one equilibrium of a bimatrix game, and provides an elementary proof that a Nash equilibrium exists. It can be given a strong geometric intuition using graphs that show the subdivision of the players' mixed strategy sets into best-response regions. The Lemke-Howson algorithm is presented with these graphs, as well as algebraically in terms of complementary pivoting. Degenerate games require a refinement of the algorithm based on lexicographic perturbations. Commonly used definitions of degenerate games are shown as equivalent. The enumeration of all equilibria is expressed as the problem of finding matching vertices in pairs of polytopes. Algorithms for computing simply stable equilibria and perfect equilibria are explained. The computation of equilibria for extensive games is difficult for larger games since the reduced strategic form may be exponentially large compared to the game tree. If the players have perfect recall, the sequence form of the extensive game is a strategic description that is more suitable for computation. In the sequence form, pure strategies of a player are replaced by sequences of choices along a play in the game. The sequence form has the same size as the game tree, and can be used for computing equilibria with the same methods as the strategic form. The paper concludes with remarks on theoretical and practical issues of concern to these computational approaches.

59 citations