Showing papers on "Stochastic programming" published in 2015


Journal ArticleDOI
TL;DR: The works that have contributed to the modeling and computational aspects of stochastic optimization (SO) based UC are reviewed to help transform research advances into real-world applications.
Abstract: Optimization models have been widely used in the power industry to aid the decision-making process of scheduling and dispatching electric power generation resources, a process known as unit commitment (UC). Since UC’s birth, there have been two major waves of revolution on UC research and real life practice. The first wave has made mixed integer programming stand out from the early solution and modeling approaches for deterministic UC, such as priority list, dynamic programming, and Lagrangian relaxation. With the high penetration of renewable energy, increasing deregulation of the electricity industry, and growing demands on system reliability, the next wave is focused on transitioning from traditional deterministic approaches to stochastic optimization for unit commitment. Since the literature has grown rapidly in the past several years, this paper is to review the works that have contributed to the modeling and computational aspects of stochastic optimization (SO) based UC. Relevant lines of future research are also discussed to help transform research advances into real-world applications.

519 citations


Proceedings Article
06 Jul 2015
TL;DR: Stochastic optimization, including prox-SMD and prox-SDCA, is studied with importance sampling, which improves the convergence rate by reducing the stochastic variance; the algorithms are analyzed theoretically and validated empirically.
Abstract: Uniform sampling of training data has been commonly used in traditional stochastic optimization algorithms such as Proximal Stochastic Mirror Descent (prox-SMD) and Proximal Stochastic Dual Coordinate Ascent (prox-SDCA). Although uniform sampling can guarantee that the sampled stochastic quantity is an unbiased estimate of the corresponding true quantity, the resulting estimator may have a rather high variance, which negatively affects the convergence of the underlying optimization procedure. In this paper we study stochastic optimization, including prox-SMD and prox-SDCA, with importance sampling, which improves the convergence rate by reducing the stochastic variance. We theoretically analyze the algorithms and empirically validate their effectiveness.

313 citations
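
The variance-reduction idea in the abstract above can be illustrated with a minimal sketch. It is not the prox-SMD or prox-SDCA algorithm from the paper: the scalar per-example gradients are hypothetical, and sampling with probability proportional to the gradient magnitude is assumed here as one common choice. The estimator g_i / (n * p_i) stays unbiased for the full gradient under any strictly positive sampling distribution, while its variance depends on that distribution.

```python
def estimator_stats(grads, probs):
    """Exact mean and variance of the estimator g_i / (n * p_i) with i ~ probs."""
    n = len(grads)
    values = [g / (n * p) for g, p in zip(grads, probs)]
    mean = sum(p * v for p, v in zip(probs, values))
    var = sum(p * (v - mean) ** 2 for p, v in zip(probs, values))
    return mean, var


# Hypothetical per-example gradients (scalars for simplicity, all non-zero).
grads = [0.1, -0.2, 0.05, 4.0]
n = len(grads)
full_gradient = sum(grads) / n

# Uniform sampling: every example is drawn with probability 1/n.
uniform = [1.0 / n] * n

# Importance sampling (assumed rule): probability proportional to |g_i|.
total = sum(abs(g) for g in grads)
importance = [abs(g) / total for g in grads]

mean_u, var_u = estimator_stats(grads, uniform)
mean_i, var_i = estimator_stats(grads, importance)
print("uniform:    mean", mean_u, "variance", var_u)
print("importance: mean", mean_i, "variance", var_i)
```

Both estimators have the same mean, but the reweighted one has a smaller variance on this data because the single large gradient no longer dominates the spread.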


Journal ArticleDOI
TL;DR: This study proposes a bi-objective mixed possibilistic, two-stage stochastic programming model to address supplier selection and order allocation problem to build the resilient supply base under operational and disruption risks.
Abstract: This study proposes a bi-objective mixed possibilistic, two-stage stochastic programming model to address supplier selection and order allocation problem to build the resilient supply base under operational and disruption risks. The model accounts for epistemic uncertainty of critical data and applies several proactive strategies such as suppliers’ business continuity plans, fortification of suppliers and contracting with backup suppliers to enhance the resilience level of the selected supply base. A five-step method is designed to solve the problem efficiently. The computational results demonstrate the significant impact of considering disruptive events on the selected supply base.

285 citations


Journal ArticleDOI
TL;DR: A stochastic programming framework for conducting optimal 24-h scheduling of CHP-based MGs consisting of wind turbine, fuel cell, boiler, a typical power-only unit, and energy storage devices is presented.
Abstract: Microgrids (MGs) are considered as a key solution for integrating renewable and distributed energy resources, combined heat and power (CHP) systems, as well as distributed energy-storage systems. This paper presents a stochastic programming framework for conducting optimal 24-h scheduling of CHP-based MGs consisting of wind turbine, fuel cell, boiler, a typical power-only unit, and energy storage devices. The objective of scheduling is to find the optimal set points of energy resources for profit maximization considering demand response programs and uncertainties. The impact of the wind speed, market, and MG load uncertainties on the MG scheduling problem is characterized through a stochastic programming formulation. This paper studies three cases to confirm the performance of the proposed model. The effect of CHP-based MG scheduling in the islanded and grid-connected modes, as well as the effectiveness of applying the proposed DR program, is investigated in the case studies.

247 citations


Journal ArticleDOI
TL;DR: A stochastic model predictive control-based energy management strategy using the vehicle location, traveling direction, and terrain information of the area is proposed for HEVs running in hilly regions with light traffic; the method is shown to help maintain the battery SoC within its boundaries and to achieve good energy consumption performance.
Abstract: The energy efficiency of parallel hybrid electric vehicles (HEVs) can degrade significantly when the battery state-of-charge (SoC) reaches its boundaries. The road grade has a great influence on the HEV battery charging and discharging processes, and therefore the HEV energy management can be benefited from the road grade preview. In real-world driving, the road grade ahead can be considered as a random variable because the future route is not always available to the HEV controller. This brief proposes a stochastic model predictive control-based energy management strategy using the vehicle location, traveling direction, and terrain information of the area for HEVs running in hilly regions with light traffic. The strategy does not require a determined route being known in advance. The road grade is modeled as a Markov chain and stochastic HEV fuel consumption and battery SoC models are developed. The HEV energy management problem is formulated as a finite-horizon Markov decision process and solved using stochastic dynamic programming. The proposed method is evaluated in simulation and compared with an equivalent consumption minimization strategy and the dynamic programming results. It is shown that the developed method can help maintaining the battery SoC within its boundaries and achieve good energy consumption performance.

204 citations


Journal ArticleDOI
TL;DR: A reinforcement learning-based adaptive energy management (RLAEM) strategy is proposed for a hybrid electric tracked vehicle (HETV) and compared with stochastic dynamic programming (SDP)-based energy management over different driving schedules; the simulation results indicate that it reduces the computation time.
Abstract: A reinforcement learning-based adaptive energy management (RLAEM) is proposed for a hybrid electric tracked vehicle (HETV) in this paper. A control oriented model of the HETV is first established, in which the state-of-charge (SOC) of battery and the speed of generator are the state variables, and the engine's torque is the control variable. Subsequently, a transition probability matrix is learned from a specific driving schedule of the HETV. The proposed RLAEM decides appropriate power split between the battery and engine-generator set (EGS) to minimize the fuel consumption over different driving schedules. With the RLAEM, not only is driver's power requirement guaranteed, but also the fuel economy is improved as well. Finally, the RLAEM is compared with the stochastic dynamic programming (SDP)-based energy management for different driving schedules. The simulation results demonstrate the adaptability, optimality, and learning ability of the RLAEM and its capacity of reducing the computation time.

202 citations


Book ChapterDOI
01 Sep 2015
TL;DR: This tutorial presents two-stage models with distributional uncertainty using phi-divergences and ties them to risk-averse optimization and examines the value of collecting additional data.
Abstract: Most of classical stochastic programming assumes that the distribution of uncertain parameters are known, and this distribution is an input to the model. In many applications, however, the true distribution is unknown. An ambiguity set of distributions can be used in these cases to hedge against the distributional uncertainty. Phi-divergences (Kullback–Leibler divergence, χ-distance, etc.) provide a measure of distance between two probability distributions. They can be used in data-driven stochastic optimization to create an ambiguity set of distributions that are centered on a nominal distribution. The nominal distribution can be determined by collected observations, expert opinions, simulations, etc. Many phi-divergences are widely used in statistics; therefore they provide a natural way to create an ambiguity set of distributions from available data and expert opinions. In this tutorial, we present two-stage models with distributional uncertainty using phi-divergences and tie them to risk-averse optimization. We examine the value of collecting additional data. We present a classification of phi-divergences to elucidate their use for models with different sources of data and decision makers with different risk preferences. We illustrate these ideas on several examples.

188 citations
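
As a rough illustration of how a phi-divergence defines an ambiguity set around a nominal distribution, the sketch below checks whether a candidate discrete distribution lies within a given radius. The distributions, the radius, the orientation of the KL divergence (candidate relative to nominal), and the modified chi-squared formula are all assumptions made for illustration; the tutorial's own definitions may differ.

```python
import math


def kl_divergence(q, p):
    """Kullback-Leibler divergence sum_i q_i * log(q_i / p_i) for discrete q, p."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)


def modified_chi2(q, p):
    """Modified chi-squared distance sum_i (q_i - p_i)^2 / p_i."""
    return sum((qi - pi) ** 2 / pi for qi, pi in zip(q, p))


def in_ambiguity_set(q, nominal, divergence, radius):
    """True if q is within `radius` of the nominal distribution."""
    return divergence(q, nominal) <= radius


# Hypothetical nominal distribution (e.g. from observations) and a candidate.
nominal = [0.5, 0.3, 0.2]
candidate = [0.4, 0.3, 0.3]
radius = 0.05

print("KL:  ", kl_divergence(candidate, nominal),
      in_ambiguity_set(candidate, nominal, kl_divergence, radius))
print("chi2:", modified_chi2(candidate, nominal),
      in_ambiguity_set(candidate, nominal, modified_chi2, radius))
```

With the same radius, the two divergences can disagree on whether a candidate belongs to the set, which is one reason the choice of divergence matters.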


Journal ArticleDOI
TL;DR: A two-stage stochastic programming model is presented for provision of flexible demand response (DR) based on thermal energy storage in the form of hot water storage and/or storage in building material; the use of an expected thermal discomfort (ETD) metric to quantify the expected total (energy and thermal discomfort) cost is also demonstrated.
Abstract: This paper presents a two-stage stochastic programming model for provision of flexible demand response (DR) based on thermal energy storage in the form of hot water storage and/or storage in building material. Aggregated residential electro-thermal technologies (ETTs), such as electric heat pumps and (micro-) combined heat and power, are modeled in a unified nontechnology specific way. Day-ahead optimization is carried out considering uncertainty in outdoor temperature, electricity and hot water consumption, dwelling occupancy, and imbalance prices. Building flexibility is exploited through specification of a deadband around the set temperature or a price of thermal discomfort applied to deviations from the set temperature. A new expected thermal discomfort (ETD) metric is defined to quantify user discomfort. The efficacy of exploiting the flexibility of various residential ETT following the two approaches is analyzed. The utilization of the ETD metric to facilitate quantification of the expected total (energy and thermal discomfort) cost is also demonstrated. Such quantification may be useful in the determination of DR contracts set up by energy service companies. Case studies for a U.K. residential users’ aggregation exemplify the model proposed and quantify possible cost reductions that are achievable under different flexibility scenarios.

162 citations


Journal ArticleDOI
TL;DR: This paper approximates two-stage robust binary programs by their corresponding K-adaptability problems, in which the decision maker precommits to K second-stage policies here-and-now and implements the best of these policies once the uncertain parameters are observed.
Abstract: Over the last two decades, robust optimization has emerged as a computationally attractive approach to formulate and solve single-stage decision problems affected by uncertainty. More recently, robust optimization has been successfully applied to multistage problems with continuous recourse. This paper takes a step toward extending the robust optimization methodology to problems with integer recourse, which have largely resisted solution so far. To this end, we approximate two-stage robust binary programs by their corresponding K-adaptability problems, in which the decision maker precommits to K second-stage policies here-and-now and implements the best of these policies once the uncertain parameters are observed. We study the approximation quality and the computational complexity of the K-adaptability problem, and we propose two mixed-integer linear programming reformulations that can be solved with off-the-shelf software. We demonstrate the effectiveness of our reformulations for stylized instances o...

158 citations


Journal ArticleDOI
TL;DR: In this article, a stochastic programming method based on the Monte Carlo approach is introduced for the optimal planning of remote power systems, considering reliability criteria together with investment and operation costs.
Abstract: A majority of remote power systems are going to be supplied by diesel-renewable resources such as wind and photovoltaic energy in the future. However, the unpredictable nature of wind generation increases the concern about the reliable operation of these isolated microgrids. Using energy storage systems (ESSs) is recently accepted as an efficient solution to the volatility and intermittency of renewable energy sources. In this paper, a stochastic programming based on the Monte Carlo approach is introduced for optimal planning of remote systems. So far, most literatures have focused exclusively on the energy storage initial sizing. However, capacity expansion of ESS through the time span can result in significant cost saving and will be illustrated in this paper. Factors such as reliability criteria together with the investment and the operation costs are taken into account in the proposed methodology. This method utilizes practical operational constraints of ESS including efficiency and life cycle. Considering life cycle constraint reinforces the proposed method to completely investigate the difference between ESS technologies. The results of case study demonstrate that the proposed capacity expansion algorithm could lead to about 10% more profit over the traditional energy storage sizing.

153 citations


Journal ArticleDOI
TL;DR: This work proposes a novel reformulation of the problem that allows considering general polyhedral uncertainty sets and shows that the robust optimization model scales well with the size of the power system, which is promising in view of real-world applications of this approach.

Journal ArticleDOI
Lu Zhen
TL;DR: This study finds that the robust method can quickly derive a near-optimal solution to the stochastic model and also has the benefit of limiting the worst-case outcome of the tactical BAP decisions.

Journal ArticleDOI
TL;DR: In this paper, a lifting technique is proposed that maps a given stochastic program to an equivalent problem on a higher-dimensional probability space, and it is shown that solving the lifted problem in primal and dual linear decision rules provides tighter bounds than those obtained from applying linear decision rules to the original problem.
Abstract: Stochastic programming provides a versatile framework for decision-making under uncertainty, but the resulting optimization problems can be computationally demanding. It has recently been shown that primal and dual linear decision rule approximations can yield tractable upper and lower bounds on the optimal value of a stochastic program. Unfortunately, linear decision rules often provide crude approximations that result in loose bounds. To address this problem, we propose a lifting technique that maps a given stochastic program to an equivalent problem on a higher-dimensional probability space. We prove that solving the lifted problem in primal and dual linear decision rules provides tighter bounds than those obtained from applying linear decision rules to the original problem. We also show that there is a one-to-one correspondence between linear decision rules in the lifted problem and families of nonlinear decision rules in the original problem. Finally, we identify structured liftings that give rise to highly flexible piecewise linear and nonlinear decision rules, and we assess their performance in the context of a dynamic production planning problem.

Journal ArticleDOI
TL;DR: A novel distributed method for convex optimization problems with a certain separability structure, based on the augmented Lagrangian framework, is proposed and compares favorably to two augmented Lagrangian decomposition methods known in the literature.
Abstract: We propose a novel distributed method for convex optimization problems with a certain separability structure. The method is based on the augmented Lagrangian framework. We analyze its convergence and provide an application to two network models, as well as to a two-stage stochastic optimization problem. The proposed method compares favorably to two augmented Lagrangian decomposition methods known in the literature, as well as to decomposition methods based on the ordinary Lagrangian function.

Journal ArticleDOI
TL;DR: In this paper, a generic reverse logistics network design model under return quantity, sorting ratio (quality), and transportation cost uncertainties is proposed to maximize the profit of a third-party waste electrical and electronic equipment recycling company.
Abstract: In recent years, reverse logistics has received increasing attention in the supply chain management area. Political, economic, green-image, and social-responsibility pressures force firms to develop strategies for their current systems. The aim of this study is to propose a generic reverse logistics network design model under return quantity, sorting ratio (quality), and transportation cost uncertainties. We present a generic multi-echelon, multi-product, capacity-constrained two-stage stochastic programming model that accounts for these uncertainties in reverse logistics network design, with the objective of maximizing the profit of a third-party waste electrical and electronic equipment recycling company. We validated the developed model by applying it to a real-world case study of a waste electrical and electronic equipment recycling firm in Turkey. The sample average approximation method was used to solve the model. Results show that the developed two-stage stochastic programming model provides acceptable solutions for making efficient decisions under quantity, quality, and transportation cost uncertainties.

Journal ArticleDOI
TL;DR: This paper presents a novel algorithmic approach to reformulate a joint chance constraint as a constraint on the expectation of a summation of indicator random variables, which can be incorporated into the cost function by considering a dual formulation of the optimization problem.
Abstract: Existing approaches to constrained dynamic programming are limited to formulations where the constraints share the same additive structure of the objective function (that is, they can be represented as an expectation of the summation of one-stage costs). As such, these formulations cannot handle joint probabilistic (chance) constraints, whose structure is not additive. To bridge this gap, this paper presents a novel algorithmic approach for joint chance-constrained dynamic programming problems, where the probability of failure to satisfy given state constraints is explicitly bounded. Our approach is to (conservatively) reformulate a joint chance constraint as a constraint on the expectation of a summation of indicator random variables, which can be incorporated into the cost function by considering a dual formulation of the optimization problem. As a result, the primal variables can be optimized by standard dynamic programming, while the dual variable is optimized by a root-finding algorithm that converges exponentially. Error bounds on the primal and dual objective values are rigorously derived. We demonstrate algorithm effectiveness on three optimal control problems, namely a path planning problem, a Mars entry, descent and landing problem, and a Lunar landing problem. All Mars simulations are conducted using real terrain data of Mars, with four million discrete states at each time step. The numerical experiments are used to validate our theoretical and heuristic arguments that the proposed algorithm is both (i) computationally efficient, i.e., capable of handling real-world problems, and (ii) near-optimal, i.e., its degree of conservatism is very low.

Journal ArticleDOI
TL;DR: In this article, a stochastic programming model is used to represent the uncertain parameters plaguing such a long-term planning exercise, and the transition from today to 2050 is represented by allowing investment in both production and transmission facilities, with the target of achieving a renewable-dominated minimum-cost system.
Abstract: Renewable energy sources are here to stay for a number of important reasons, including global warming and the depletion of fossil fuels. We explore in this paper how a thermal-dominated electric energy system can be transformed into a renewable-dominated one. This study relies on a stochastic programming model that allows representing the uncertain parameters plaguing such a long-term planning exercise. With 2050 as the final year of our analysis, we represent the transition from today to 2050 by allowing investment in both production and transmission facilities, with the target of achieving a renewable-dominated minimum-cost system. The methodology developed is illustrated using a realistic large-scale case study. Finally, policy conclusions are drawn.

Journal ArticleDOI
TL;DR: This work proves the almost-sure convergence of a class of sampling-based nested decomposition algorithms for multistage stochastic convex programs in which the stage costs are general convex functions of the decisions and uncertainty is modelled by a scenario tree.
Abstract: We prove the almost-sure convergence of a class of sampling-based nested decomposition algorithms for multistage stochastic convex programs in which the stage costs are general convex functions of the decisions and uncertainty is modelled by a scenario tree. As special cases, our results imply the almost-sure convergence of stochastic dual dynamic programming, cutting-plane and partial-sampling CUPPS algorithm, and dynamic outer-approximation sampling algorithms when applied to problems with general convex cost functions.

Journal ArticleDOI
TL;DR: A bi-objective optimization model is proposed for designing a closed-loop supply chain (CLSC) network under uncertainty, in which both the total costs and the maximum waiting times of products in the queue are minimized.

Journal ArticleDOI
TL;DR: In this paper, a risk-averse stochastic modeling approach for a pre-disaster relief network design problem under uncertain demand and transportation capacities is introduced, where the sizes and locations of the response facilities and the inventory levels of relief supplies at each facility are determined while guaranteeing a certain level of network reliability.
Abstract: This article introduces a risk-averse stochastic modeling approach for a pre-disaster relief network design problem under uncertain demand and transportation capacities. The sizes and locations of the response facilities and the inventory levels of relief supplies at each facility are determined while guaranteeing a certain level of network reliability. A probabilistic constraint on the existence of a feasible flow is introduced to ensure that the demand for relief supplies across the network is satisfied with a specified high probability. Responsiveness is also accounted for by defining multiple regions in the network and introducing local probabilistic constraints on satisfying demand within each region. These local constraints ensure that each region is self-sufficient in terms of providing for its own needs with a large probability. In particular, the Gale–Hoffman inequalities are used to represent the conditions on the existence of a feasible network flow. The solution method rests on two pillars. A ...

Journal ArticleDOI
TL;DR: The rate of convergence of the stochastic block mirror descent (SBMD) method, along with its associated large-deviation results, is established for general nonsmooth and stochastic optimization problems; some of the results seem to be new for block coordinate descent methods for deterministic optimization.
Abstract: In this paper, we present a new stochastic algorithm, namely, the stochastic block mirror descent (SBMD) method for solving large-scale nonsmooth and stochastic optimization problems. The basic idea of this algorithm is to incorporate block coordinate decomposition and an incremental block averaging scheme into the classic (stochastic) mirror descent method, in order to significantly reduce the cost per iteration of the latter algorithm. We establish the rate of convergence of the SBMD method along with its associated large-deviation results for solving general nonsmooth and stochastic optimization problems. We also introduce variants of this method and establish their rate of convergence for solving strongly convex, smooth, and composite optimization problems, as well as certain nonconvex optimization problems. To the best of our knowledge, all these developments related to the SBMD methods are new in the stochastic optimization literature. Moreover, some of our results seem to be new for block coordinate descent methods for deterministic optimization.

Journal ArticleDOI
TL;DR: In this paper, the authors assess the potential for flexible network technologies, such as phase-shifting transformers and non-network solutions, to constitute valuable interim measures within a long-term planning strategy.
Abstract: Significant uncertainty surrounds the future development of electricity systems, primarily in terms of size, location and type of new renewable generation to be connected. In this paper we assess the potential for flexible network technologies, such as phase-shifting transformers, and non-network solutions, such as energy storage and demand-side management, to constitute valuable interim measures within a long-term planning strategy. The benefit of such flexible assets lies not only in the transmission services provided but also in the way they can facilitate and de-risk subsequent decisions by deferring commitment to capital-intensive projects until more information on generation development becomes available. A novel stochastic formulation for transmission expansion planning is presented that includes consideration of investment in these flexible solutions. The proposed framework is demonstrated with a case study on the IEEE-RTS where flexible technologies are shown to constitute valuable investment options when facing uncertainties in future renewable generation development.

Journal ArticleDOI
TL;DR: In this paper, the authors present a robust optimization tool for storage investment on transmission networks, which employs robust optimization to minimize the investment in storage units, without load or renewable power curtailment, for all scenarios in the convex hull of a discrete uncertainty set.
Abstract: This paper discusses the need for the integration of storage systems on transmission networks having renewable sources, and presents a tool for energy storage planning. The tool employs robust optimization to minimize the investment in storage units that guarantee a feasible system operation, without load or renewable power curtailment, for all scenarios in the convex hull of a discrete uncertainty set; it is termed ROSION (Robust Optimization of Storage Investment On Networks). The computational engine in ROSION is a specific tailored implementation of a column-and-constraint generation algorithm for two-stage robust optimization problems, where a lower and an upper bound on the optimal objective function value are successively calculated until convergence. The lower bound is computed using mixed-integer linear programming and the upper bound via linear programming applied to a sequence of similar problems. ROSION is demonstrated for storage planning on the IEEE 14-bus and 118-bus networks, and the robustness of the designs is validated via Monte Carlo simulation.

Journal ArticleDOI
TL;DR: This paper considers the continuous road network design problem with a stochastic user equilibrium constraint, which aims to optimize the network performance via road capacity expansion, and transforms the resulting nonlinear nonconvex formulation into a nonlinear program with only logarithmic functions as nonlinear terms.
Abstract: In this paper, we consider the continuous road network design problem with stochastic user equilibrium constraint that aims to optimize the network performance via road capacity expansion. The network flow pattern is subject to stochastic user equilibrium, specifically, the logit route choice model. The resulting formulation, a nonlinear nonconvex programming problem, is firstly transformed into a nonlinear program with only logarithmic functions as nonlinear terms, for which a tight linear programming relaxation is derived by using an outer-approximation technique. The linear programming relaxation is then embedded within a global optimization solution algorithm based on range reduction technique, and the proposed approach is proved to converge to a global optimum.

Journal ArticleDOI
TL;DR: In this paper, a model to assist decision makers in the logistics of a flood emergency is presented, which attempts to optimize inventory levels for emergency supplies as well as vehicle availability, in order to deliver enough supplies to satisfy demands with a given probability.
Abstract: This article presents a model to assist decision makers in the logistics of a flood emergency. The model attempts to optimize inventory levels for emergency supplies as well as vehicles’ availability, in order to deliver enough supplies to satisfy demands with a given probability. A spatio-temporal stochastic process represents the flood occurrence. The model is approximately solved with sample average approximation. The article presents a method to quantify the impact of the various intervening logistics parameters. An example is provided and a sensitivity analysis is performed. The studied example shows large differences between the impacts of logistics parameters such as number of products, number of periods, inventory capacity and degree of demand fulfillment on the logistics cost and time. This methodology emerges as a valuable tool to help decision makers to allocate resources both before and after a flood occurs, with the aim of minimizing the undesirable effects of such events.
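
Sample average approximation replaces an expectation with an average over sampled scenarios and then optimizes that average. The sketch below shows the idea on a toy single-item inventory (newsvendor-style) problem. It is not the flood logistics model from the article: the demand distribution, the cost coefficients, and the function names are hypothetical, and the toy problem minimizes expected cost rather than enforcing a probabilistic service constraint.

```python
import random


def saa_cost(order, demands, overage=1.0, underage=4.0):
    """Average cost of an order quantity over the sampled demand scenarios."""
    total = 0.0
    for d in demands:
        total += overage * max(order - d, 0.0) + underage * max(d - order, 0.0)
    return total / len(demands)


def solve_saa(demands, overage=1.0, underage=4.0):
    """Minimize the sample-average cost.

    The objective is convex and piecewise linear with breakpoints at the
    sampled demands, so a minimizer can be found among the samples.
    """
    return min(demands, key=lambda x: saa_cost(x, demands, overage, underage))


# Hypothetical demand scenarios drawn from an assumed uniform distribution.
rng = random.Random(0)
demands = [rng.uniform(50.0, 150.0) for _ in range(200)]

best_order = solve_saa(demands)
print("SAA order quantity:", best_order)
print("SAA average cost:  ", saa_cost(best_order, demands))
```

Larger sample sizes generally bring the SAA solution closer to the solution of the true expected-value problem, at a higher computational cost.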

Journal ArticleDOI
TL;DR: This paper reports the results of some computational experiments on a large-scale hydrothermal scheduling model developed for Brazil and finds that the best improvements in computation time are obtained from an implementation that increases the number of scenarios in the forward pass with each iteration and selects cuts to be included in the stage problems in each iteration.

Journal ArticleDOI
TL;DR: In this article, a stochastic programming model with recourse is proposed to formulate the problem in which the expected penalty for late arrival at customers is considered, and a column generation algorithm is developed to solve the problem.
Abstract: Home health care (HHC) is defined as providing medical and paramedical services for patients at their own domicile. In the HHC industry, it is crucial for health care organisations to assign caregivers to patients and devise reasonable visiting routes to save total operational cost and improve the service quality. However, some special constraints make the problem hard to solve. For example, patients’ service times are usually stochastic due to their varying health conditions; caregivers are organised in a hierarchical structure according to their skills to satisfy patients’ demands. In this paper, we address a HHC scheduling and routing problem with stochastic service times and skill requirements. A stochastic programming model with recourse is proposed to formulate the problem in which the expected penalty for late arrival at customers is considered. To solve the problem, it is equivalently transformed into a master problem and a pricing sub-problem. A column generation algorithm is developed to solve t...

Journal ArticleDOI
TL;DR: This paper attempts to design a reverse supply chain network (SCN), add it to an existing multi-product forward SCN and simultaneously redesign the existing forward supply chain (SC) using Benders’ decomposition and Cholesky’s factorization method.
Abstract: This paper attempts to design a reverse supply chain network (SCN), add it to an existing multi-product forward SCN and simultaneously redesign the existing forward supply chain (SC). The problem considers uncertainty in product demand and in returned products in a multi-period context. Benders’ decomposition is applied to solve the stochastic mixed-integer model to optimality. The scenarios are generated based on the demand distribution function using Cholesky’s factorization method to consider correlation among different products’ demands. To decrease the computational effort, the number of scenarios is reduced using the k-means clustering algorithm. The method is tested on a cell phone SC.
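
Generating correlated demand scenarios with a Cholesky factor can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's procedure: demands are assumed to be jointly normal, the mean vector and covariance matrix are hypothetical, and the k-means scenario reduction step is omitted. If L is the lower-triangular factor with L L^T equal to the covariance, then mean + L z has the desired covariance when z is a vector of independent standard normals.

```python
import math
import random


def cholesky(a):
    """Lower-triangular L with L * L^T == a, for a symmetric positive-definite a."""
    n = len(a)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(a[i][i] - s)
            else:
                lower[i][j] = (a[i][j] - s) / lower[j][j]
    return lower


def sample_scenario(mean, lower, rng):
    """Draw one correlated demand vector: mean + L z with z ~ N(0, I)."""
    z = [rng.gauss(0.0, 1.0) for _ in mean]
    return [m + sum(lower[i][k] * z[k] for k in range(i + 1))
            for i, m in enumerate(mean)]


# Hypothetical mean demands and covariance for two products.
mean = [100.0, 80.0]
cov = [[400.0, 240.0],
       [240.0, 225.0]]

lower = cholesky(cov)
rng = random.Random(1)
scenarios = [sample_scenario(mean, lower, rng) for _ in range(5)]
for scenario in scenarios:
    print(scenario)
```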

Journal ArticleDOI
TL;DR: This work considers a risk-averse multi-stage stochastic program using conditional value at risk as the risk measure, and proposes a new approach based on importance sampling, which yields improved upper bound estimators.
Abstract: We consider a risk-averse multi-stage stochastic program using conditional value at risk as the risk measure. The underlying random process is assumed to be stage-wise independent, and a stochastic dual dynamic programming (SDDP) algorithm is applied. We discuss the poor performance of the standard upper bound estimator in the risk-averse setting and propose a new approach based on importance sampling, which yields improved upper bound estimators. Modest additional computational effort is required to use our new estimators. Our procedures allow for significant improvement in terms of controlling solution quality in SDDP-style algorithms in the risk-averse setting. We give computational results for multi-stage asset allocation using a log-normal distribution for the asset returns.
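
For readers unfamiliar with the risk measure, conditional value at risk for a finite sample of losses can be computed from the Rockafellar-Uryasev representation CVaR_alpha(Z) = min over t of t + E[(Z - t)^+] / (1 - alpha). The sketch below assumes equally likely loss scenarios and hypothetical data; it only illustrates the risk measure and does not reproduce the SDDP algorithm or the importance-sampling estimators discussed in the paper.

```python
def cvar(losses, alpha):
    """CVaR at level alpha of equally likely losses (larger loss = worse).

    The objective t + E[(Z - t)^+] / (1 - alpha) is convex and piecewise
    linear in t with breakpoints at the sample values, so it is enough to
    evaluate it at the samples.
    """
    n = len(losses)

    def objective(t):
        excess = sum(max(z - t, 0.0) for z in losses) / n
        return t + excess / (1.0 - alpha)

    return min(objective(t) for t in losses)


# Hypothetical losses 1, 2, ..., 10, each with probability 1/10.
losses = [float(z) for z in range(1, 11)]
print(cvar(losses, 0.8))  # average of the worst 20% of outcomes (9 and 10)
print(cvar(losses, 0.0))  # at alpha = 0 this is the plain expectation
```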

Journal ArticleDOI
16 Jul 2015-Energies
TL;DR: Two RL-based algorithms, namely Q-learning and Dyna, are applied to generate optimal control solutions for a hybrid electric tracked vehicle, and the results are compared to clarify the merits and demerits of these algorithms.
Abstract: This paper presents a reinforcement learning (RL)–based energy management strategy for a hybrid electric tracked vehicle. A control-oriented model of the powertrain and vehicle dynamics is first established. According to the sample information of the experimental driving schedule, statistical characteristics at various velocities are determined by extracting the transition probability matrix of the power request. Two RL-based algorithms, namely Q-learning and Dyna algorithms, are applied to generate optimal control solutions. The two algorithms are simulated on the same driving schedule, and the simulation results are compared to clarify the merits and demerits of these algorithms. Although the Q-learning algorithm is faster (3 h) than the Dyna algorithm (7 h), its fuel consumption is 1.7% higher than that of the Dyna algorithm. Furthermore, the Dyna algorithm registers approximately the same fuel consumption as the dynamic programming–based global optimal solution. The computational cost of the Dyna algorithm is substantially lower than that of the stochastic dynamic programming.
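
The tabular Q-learning update mentioned in the abstract can be sketched on a toy problem. Everything below is hypothetical and unrelated to the vehicle model in the paper: the four-state chain, the reward, the purely random behavior policy, and the hyperparameters are assumptions chosen only to show the update Q(s, a) += alpha * (r + gamma * max over a' of Q(s', a') - Q(s, a)).

```python
import random

N_STATES = 4      # states 0..3; state 3 is terminal
ACTIONS = (0, 1)  # 0 = move left, 1 = move right


def step(state, action):
    """Deterministic toy dynamics: reward 1 on reaching the terminal state."""
    nxt = max(state - 1, 0) if action == 0 else state + 1
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def q_learning(episodes=500, alpha=0.5, gamma=0.9, seed=0):
    """Tabular Q-learning with a uniformly random behavior policy."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _step in range(100):  # cap on episode length
            action = rng.choice(ACTIONS)
            nxt, reward, done = step(state, action)
            # Bootstrapped target; no future value beyond a terminal state.
            target = reward if done else reward + gamma * max(q[nxt])
            q[state][action] += alpha * (target - q[state][action])
            if done:
                break
            state = nxt
    return q


q_table = q_learning()
# Greedy policy for the non-terminal states.
policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(N_STATES - 1)]
print(policy)
```

Because Q-learning is off-policy, the greedy policy can be read from the learned table even though actions were chosen at random during training.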