Topic

Stochastic programming

About: Stochastic programming is a research topic. Over its lifetime, 12,343 publications have been published within this topic, receiving 421,049 citations.


Papers
Journal ArticleDOI
TL;DR: Convergence with probability one is proved for a variety of classical optimization and identification problems, and it is demonstrated for these problems that the proposed algorithm achieves the highest possible rate of convergence.
Abstract: A new recursive algorithm of stochastic approximation type with the averaging of trajectories is investigated. Convergence with probability one is proved for a variety of classical optimization and identification problems. It is also demonstrated for these problems that the proposed algorithm achieves the highest possible rate of convergence.

1,970 citations
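
The averaging idea described in the abstract above is the mechanism usually credited to Polyak-Ruppert averaging: run a plain stochastic approximation recursion with a slowly decaying step size and report the running average of the iterates. A minimal sketch follows; the quadratic objective, noise level, and step-size exponent are assumptions chosen for the example, not the paper's exact setting.

```python
import numpy as np

# Minimal sketch of stochastic approximation with averaging of trajectories
# (Polyak-Ruppert style). The quadratic objective, noise model, and step-size
# schedule below are illustrative assumptions, not the paper's exact setting.
rng = np.random.default_rng(0)
dim = 5
A = np.diag(np.linspace(1.0, 3.0, dim))  # assumed objective: f(x) = 0.5 * x' A x, minimum at 0

x = np.ones(dim)        # raw stochastic-approximation iterate
x_avg = np.zeros(dim)   # running average of the trajectory

n_steps = 20_000
for t in range(1, n_steps + 1):
    grad = A @ x + 0.1 * rng.standard_normal(dim)  # noisy gradient observation
    step = 1.0 / t ** 0.6                          # slowly decaying step size
    x -= step * grad                               # basic recursion
    x_avg += (x - x_avg) / t                       # average of iterates, used as the estimate

print("error of last iterate    :", np.linalg.norm(x))
print("error of averaged iterate:", np.linalg.norm(x_avg))
```

For long horizons the averaged iterate is typically much closer to the optimum than the raw iterate, which is the effect the abstract summarizes as achieving the highest possible rate of convergence.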

Book
01 Feb 2007
TL;DR: This research monograph is the authoritative and comprehensive treatment of the mathematical foundations of stochastic optimal control of discrete-time systems, including the treatment of the intricate measure-theoretic issues.
Abstract: This research monograph is the authoritative and comprehensive treatment of the mathematical foundations of stochastic optimal control of discrete-time systems, including the treatment of the intricate measure-theoretic issues.

1,811 citations
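
For readers who want the discrete-time object the monograph studies in symbols, the finite-horizon stochastic optimal control problem is normally solved by the dynamic programming recursion sketched below. The notation (state x_k, control u_k, random disturbance w_k, stage cost g_k, system function f_k) is a common convention, not necessarily the book's.

```latex
% Finite-horizon stochastic dynamic programming recursion for a discrete-time
% system x_{k+1} = f_k(x_k, u_k, w_k); the notation is a standard convention,
% not necessarily the monograph's.
\begin{align*}
  J_N(x_N) &= g_N(x_N),\\
  J_k(x_k) &= \min_{u_k \in U_k(x_k)}
      \mathbb{E}_{w_k}\!\left[\, g_k(x_k, u_k, w_k) + J_{k+1}\!\left(f_k(x_k, u_k, w_k)\right) \right],
      \qquad k = N-1, \dots, 0.
\end{align*}
```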

Journal ArticleDOI
TL;DR: This paper characterizes the desirable properties of a solution to models whose problem data are described by a set of scenarios rather than by point estimates, and develops a general model formulation, called robust optimization (RO), that explicitly incorporates the conflicting objectives of solution and model robustness.
Abstract: Mathematical programming models with noisy, erroneous, or incomplete data are common in operations research applications. Difficulties with such data are typically dealt with reactively (through sensitivity analysis) or proactively (through stochastic programming formulations). In this paper, we characterize the desirable properties of a solution to such models when the problem data are described by a set of scenarios for their values, rather than by point estimates. A solution to an optimization model is defined as solution robust if it remains "close" to optimal for all scenarios of the input data, and as model robust if it remains "almost" feasible for all data scenarios. We then develop a general model formulation, called robust optimization (RO), that explicitly incorporates the conflicting objectives of solution and model robustness. Robust optimization is compared with the traditional approaches of sensitivity analysis and stochastic linear programming. The classical diet problem illustrates the issues. Robust optimization models are then developed for several real-world applications: power capacity expansion; matrix balancing and image reconstruction; air-force airline scheduling; scenario immunization for financial planning; and minimum-weight structural design. We also comment on the suitability of parallel and distributed computer architectures for the solution of robust optimization models.

1,793 citations
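
The trade-off between solution robustness and model robustness described above is typically expressed as a single objective that penalizes both scenario-cost variability and scenario infeasibility. The schematic scenario formulation below follows that structure; the symbols (design variables x, scenario control variables y_s, error variables z_s, weight ω, and the aggregating functions σ and ρ) are a generic rendering rather than the paper's exact notation.

```latex
% Schematic scenario-based robust optimization (RO) model: sigma(.) measures
% solution robustness (e.g., expected cost plus a variability term across
% scenarios), rho(.) measures model robustness by penalizing the infeasibility
% captured in the error variables z_s, and omega >= 0 trades the two off.
% This is a generic rendering of the structure, not the paper's exact model.
\begin{align*}
  \min_{x,\;\{y_s\},\;\{z_s\}} \quad
    & \sigma(x, y_1, \dots, y_S) \;+\; \omega\,\rho(z_1, \dots, z_S)\\
  \text{s.t.} \quad
    & A x = b,\\
    & B_s x + C_s y_s + z_s = e_s, \qquad s = 1, \dots, S,\\
    & x \ge 0, \quad y_s \ge 0, \qquad s = 1, \dots, S.
\end{align*}
```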

Book
01 Jan 2002
TL;DR: This thesis proposes and studies actor-critic algorithms which combine value-function approximation and restricted policy search with simulation to find the best policy among a parameterized class of policies, and proves convergence of the algorithms for problems with general state and decision spaces.
Abstract: Many complex decision-making problems with clear and precise objectives, such as scheduling in manufacturing systems, portfolio management in finance, and admission control in communication networks, can be formulated as stochastic dynamic programming problems in which the objective of decision making is to maximize a single “overall” reward. In these formulations, finding an optimal decision policy involves computing a certain “value function” which assigns to each state the optimal reward one would obtain if the system were started from that state. This function then naturally prescribes the optimal policy, which is to take decisions that drive the system to states with maximum value. For many practical problems, the computation of the exact value function is intractable, analytically and numerically, due to the enormous size of the state space. Therefore, one has to resort to one of the following approximation methods to find a good sub-optimal policy: (1) approximate the value function, or (2) restrict the search for a good policy to a smaller family of policies. In this thesis, we propose and study actor-critic algorithms which combine the above two approaches with simulation to find the best policy among a parameterized class of policies. Actor-critic algorithms have two learning units: an actor and a critic. An actor is a decision maker with a tunable parameter. A critic is a function approximator. The critic tries to approximate the value function of the policy used by the actor, and the actor in turn tries to improve its policy based on the current approximation provided by the critic. Furthermore, the critic evolves on a faster time-scale than the actor. We propose several variants of actor-critic algorithms. In all the variants, the critic uses Temporal Difference (TD) learning with linear function approximation. Some of the variants are inspired by a new geometric interpretation of the formula for the gradient of the overall reward with respect to the actor parameters. This interpretation suggests a natural set of basis functions for the critic, determined by the family of policies parameterized by the actor's parameters. We concentrate on the average expected reward criterion, but we also show how the algorithms can be modified for other objective criteria. We prove convergence of the algorithms for problems with general (finite, countable, or continuous) state and decision spaces. To compute the rate of convergence (ROC) of our algorithms, we develop a general theory of the ROC of two-time-scale algorithms and apply it to study our algorithms. In the process, we study the ROC of TD learning and compare it with related methods such as Least Squares TD (LSTD). We study the effect of the basis functions used for linear function approximation on the ROC of TD. We also show that the ROC of actor-critic algorithms does not depend on the actual basis functions used in the critic but only on the subspace spanned by them, and we study this dependence. Finally, we compare the performance of our algorithms with other algorithms that optimize over a parameterized family of policies. We show that when only the “natural” basis functions are used for the critic, the rate of convergence of the actor-critic algorithms is the same as that of certain stochastic gradient descent algorithms. However, with appropriate additional basis functions for the critic, we show that our algorithms outperform the existing ones in terms of ROC.

1,766 citations
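
To make the actor-critic structure in the abstract concrete, the sketch below runs a linear (here one-hot, i.e. tabular) TD(0) critic on a faster step size and a softmax policy-gradient actor on a slower one, on a toy two-state MDP. The environment, features, step sizes, and the use of a discounted rather than average-reward criterion are simplifying assumptions for illustration; they do not reproduce the thesis's specific variants.

```python
import numpy as np

# Minimal actor-critic sketch on a toy 2-state, 2-action MDP (all values assumed
# for illustration). Critic: linear TD(0) with one-hot features, fast step size.
# Actor: softmax policy updated along a policy-gradient direction, slow step size.
rng = np.random.default_rng(1)
n_states, n_actions = 2, 2

# Assumed transition probabilities P[s, a] -> next-state distribution, and rewards R[s, a].
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.7, 0.3], [0.05, 0.95]]])
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])

gamma = 0.95
theta = np.zeros((n_states, n_actions))  # actor parameters (softmax logits)
w = np.zeros(n_states)                   # critic weights (one value per state)

def policy(s):
    # Softmax policy over actions in state s.
    logits = theta[s] - theta[s].max()
    probs = np.exp(logits)
    return probs / probs.sum()

s = 0
for t in range(1, 50_000):
    probs = policy(s)
    a = rng.choice(n_actions, p=probs)
    s_next = rng.choice(n_states, p=P[s, a])
    r = R[s, a]

    # Critic: TD(0) update on the faster time scale.
    td_error = r + gamma * w[s_next] - w[s]
    w[s] += 0.05 * td_error

    # Actor: policy-gradient step using the TD error as the advantage signal,
    # on a slower time scale than the critic.
    grad_log = -probs
    grad_log[a] += 1.0
    theta[s] += 0.005 * td_error * grad_log

    s = s_next

print("estimated state values:", w)
print("policy in state 0:", policy(0), " policy in state 1:", policy(1))
```

The two step sizes (0.05 for the critic, 0.005 for the actor) reflect the two-time-scale structure the abstract emphasizes: the critic tracks the value function of the current policy quickly, while the actor improves the policy slowly using the critic's estimate.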


Network Information
Related Topics (5)
Optimization problem: 96.4K papers, 2.1M citations, 86% related
Scheduling (computing): 78.6K papers, 1.3M citations, 85% related
Optimal control: 68K papers, 1.2M citations, 84% related
Supply chain: 84.1K papers, 1.7M citations, 83% related
Markov chain: 51.9K papers, 1.3M citations, 79% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    175
2022    423
2021    526
2020    598
2019    578
2018    532