Similar Documents
Found 20 similar documents (search time: 15 ms)
1.
We address the problem of determining optimal stepsizes for estimating parameters in the context of approximate dynamic programming. The sufficient conditions for convergence of the stepsize rules have been known for 50 years, but practical computational work tends to use formulas with parameters that have to be tuned for specific applications. The problem is that in most applications of dynamic programming, observations for estimating a value function typically come from a data series that can be initially highly transient. The degree of transience affects the choice of stepsize parameters that produce the fastest convergence, and the degree of initial transience can vary widely among the value function parameters of the same dynamic program. This paper reviews the literature on deterministic and stochastic stepsize rules and derives a formula for the optimal stepsize that minimizes estimation error. The formula assumes certain parameters are known, and an approximation is proposed for the case where they are unknown. Experimental work shows that the approximation provides faster convergence than other popular formulas. Editor: Prasad Tadepalli
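A minimal sketch of the stepsize trade-off the abstract describes, on a hypothetical transient data series (the series, the noise level, and the tuning parameter a = 10 are all illustrative, not from the paper):

```python
import random

def smooth(series, stepsize):
    """Stochastic-approximation smoothing: theta_n = theta_{n-1} + alpha_n * (y_n - theta_{n-1})."""
    theta = 0.0
    for n, y in enumerate(series, start=1):
        theta += stepsize(n) * (y - theta)
    return theta

# Hypothetical transient series: the signal drifts toward 10 before settling.
random.seed(0)
series = [10.0 * (1 - 0.99 ** n) + random.gauss(0, 0.5) for n in range(1, 501)]

# 1/n (plain averaging) satisfies the classical convergence conditions
# but weights early, transient observations as heavily as late ones.
avg = smooth(series, lambda n: 1.0 / n)

# A generalized harmonic rule a/(a + n - 1) decays more slowly, discounting
# the transient phase; a is exactly the kind of tunable parameter the paper
# argues must otherwise be hand-tuned per application.
harm = smooth(series, lambda n: 10.0 / (10.0 + n - 1))
```

On this series the harmonic rule lands much closer to the limiting value 10 than plain averaging, illustrating why the degree of initial transience matters for the stepsize choice.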

2.
We consider the use of quadratic approximate value functions for stochastic control problems with input‐affine dynamics and convex stage cost and constraints. Evaluating the approximate dynamic programming policy in such cases requires the solution of an explicit convex optimization problem, such as a quadratic program, which can be carried out efficiently. We describe a simple and general method for approximate value iteration that also relies on our ability to solve convex optimization problems, in this case typically a semidefinite program. Although we have no theoretical guarantee on the performance attained using our method, we observe that very good performance can be obtained in practice. Copyright © 2012 John Wiley & Sons, Ltd.

3.
In modern computer games, bots (intelligent, realistic agents) play a prominent role in a game's popularity in the market. Typically, bots are modeled as finite-state machines and then programmed via simple conditional statements hard-coded into the bot's logic. Since such bots become quite predictable to an experienced player, the player may lose interest in the game. We propose the use of a game-theoretic learning rule called fictitious play to improve the behavior of these computer game bots, making them less predictable and hence the game more enjoyable.
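A toy illustration of fictitious play for a bot, under an assumed 2x2 payoff table (the game, action names, and payoffs are hypothetical, not from the paper): the bot best-responds to the empirical frequency of the player's past moves instead of following a hard-coded rule.

```python
from collections import Counter

def best_response(payoff, opp_counts, actions):
    """Pick the action maximizing expected payoff against the
    empirical distribution of the opponent's past actions."""
    total = sum(opp_counts.values()) or 1
    def expected(a):
        return sum(payoff[(a, b)] * c for b, c in opp_counts.items()) / total
    return max(actions, key=expected)

# Hypothetical 2x2 game: bot payoff for (bot_action, player_action).
# This player always attacks "left"; a fixed-rule bot would never adapt.
payoff = {("block_left", "left"): 1, ("block_left", "right"): 0,
          ("block_right", "left"): 0, ("block_right", "right"): 1}
actions = ["block_left", "block_right"]

opp_counts = Counter()
history = []
for _ in range(20):
    bot = best_response(payoff, opp_counts, actions) if opp_counts else "block_right"
    history.append(bot)
    opp_counts["left"] += 1  # observe the player's move after acting

# After one round the bot has learned to counter the player's habit.
```

Against a non-stationary player the same update keeps re-weighting the empirical distribution, which is what makes fictitious-play bots less predictable than scripted ones.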

4.
S. Sen, S. J. Yakowitz, Automatica, 1987, 23(6): 749-752
We develop a quasi-Newton differential dynamic programming algorithm (QDDP) for discrete-time optimal control problems. In the spirit of dynamic programming, the quasi-Newton approximations are performed in a stagewise manner. We establish the global convergence of the method and also show a superlinear convergence rate. Among other advantages of the QDDP method, second derivatives need not be calculated. In theory, the computational effort of each recursion grows proportionally to the number of stages N, whereas with conventional quasi-Newton techniques which do not take advantage of the optimal control problem structure, the growth is as N². Computational results are also reported.

5.
The study of asset-price characteristics of stochastic growth models, such as the risk-free interest rate, the equity premium, and the Sharpe ratio, has been limited by the lack of global and accurate methods for solving dynamic optimization models. In this paper, a stochastic version of a dynamic programming method with an adaptive grid scheme is applied to compute the asset-price characteristics of a stochastic growth model. The model is of the type developed by Brock and Mirman (1972, Journal of Economic Theory, 4, 479-513) and Brock (1979, 1982), and has become the baseline model in the stochastic dynamic general equilibrium literature. In a first step, to test our procedure, we apply it to this basic stochastic growth model, for which the optimal consumption and asset prices can be computed analytically. Since, as we show, our method produces only negligible errors compared to the analytical solution, in a second step we apply it to more elaborate stochastic growth models with adjustment costs and habit formation. In the latter model, preferences are not time-separable and past consumption acts as a constraint on current consumption, which gives rise to an additional state variable. Here too we apply our stochastic dynamic programming method with adaptive grid scheme to compute the above-mentioned asset-price characteristics. We show that our method is well suited as a solution technique for such models with a more complicated decision structure.
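The analytically solvable baseline can be illustrated with the classic deterministic special case of the Brock-Mirman model (log utility, Cobb-Douglas production, full depreciation), where the optimal savings rate is known in closed form to be alpha*beta. The brute-force grid check below is only a sanity test of that textbook result, not the paper's adaptive-grid method:

```python
import math

alpha, beta, A = 0.36, 0.95, 1.0  # illustrative parameter values

def lifetime_value(s, k0=1.0, T=500):
    """Discounted lifetime log-consumption under a constant savings rate s,
    with production y = A*k**alpha and capital transition k' = s*y."""
    k, v = k0, 0.0
    for t in range(T):
        y = A * k ** alpha
        v += beta ** t * math.log((1 - s) * y)  # consume the rest
        k = s * y
    return v

s_star = alpha * beta  # the known closed-form optimal savings rate
best_s = max((s / 100 for s in range(1, 100)), key=lifetime_value)
```

The grid optimum lands on the closest grid point to s* = 0.342, mirroring the paper's first step of validating a numerical scheme against the analytical solution before moving to richer models.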

6.
Approximate dynamic programming (ADP) relies, in the continuous-state case, on both a flexible class of models for the approximation of the value functions and a smart sampling of the state space for the numerical solution of the recursive Bellman equations. In this paper, low-discrepancy sequences, commonly employed for number-theoretic methods, are investigated as a sampling scheme in the ADP context when local models, such as the Nadaraya–Watson (NW) ones, are employed for the approximation of the value function. The analysis is carried out both from a theoretical and a practical point of view. In particular, it is shown that the combined use of low-discrepancy sequences and NW models enables the convergence of the ADP procedure. Then, the regular structure of the low-discrepancy sampling is exploited to derive a method for automatic selection of the bandwidth of NW models, which yields a significant saving in the computational effort with respect to the standard cross validation approach. Simulation results concerning an inventory management problem are presented to show the effectiveness of the proposed techniques.
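A minimal sketch of the two ingredients discussed above: low-discrepancy (van der Corput/Halton) sampling of a one-dimensional state space, and a Nadaraya-Watson estimate built on those samples (the target function and the bandwidth are illustrative stand-ins for a value function and the paper's automatic bandwidth selection):

```python
import math

def halton(i, base):
    """i-th element of the van der Corput sequence in the given base."""
    f, r = 1.0, 0.0
    while i > 0:
        f /= base
        r += f * (i % base)
        i //= base
    return r

def nw_estimate(x, xs, ys, h):
    """Nadaraya-Watson regression with a Gaussian kernel of bandwidth h."""
    w = [math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in xs]
    return sum(wi * yi for wi, yi in zip(w, ys)) / sum(w)

# Sample a smooth stand-in "value function" at low-discrepancy points
# in [0, 1] and interpolate it at an unsampled state.
xs = [halton(i, 2) for i in range(1, 129)]
ys = [math.sin(math.pi * x) for x in xs]
approx = nw_estimate(0.5, xs, ys, h=0.05)
```

The evenly spread Halton points avoid the clustering of i.i.d. random sampling, which is the regularity the paper exploits for its bandwidth-selection rule.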

7.
We model a multiperiod, single resource capacity reservation problem as a dynamic, stochastic, multiple knapsack problem with stochastic dynamic programming. As the state space grows exponentially in the number of knapsacks and the decision set grows exponentially in the number of order arrivals per period, the recursion is computationally intractable for large-scale problems, including those with long horizons. Our goal is to ensure optimal, or near optimal, decisions at time zero when maximizing the net present value of returns from accepted orders, but solving problems with short horizons introduces end-of-study effects which may prohibit finding good solutions at time zero. Thus, we propose an approximation approach which utilizes simulation and deterministic dynamic programming in order to allow for the solution of longer horizon problems and ensure good time zero decisions. Our computational results illustrate the effectiveness of the approximation scheme.
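The deterministic dynamic programming building block used inside such approximation schemes can be illustrated with the classic 0/1 knapsack recursion (a toy single-knapsack, single-period instance with made-up numbers, not the paper's multi-period stochastic model):

```python
def knapsack(values, weights, capacity):
    """Classic 0/1 knapsack solved by dynamic programming over capacity."""
    dp = [0] * (capacity + 1)  # dp[c] = best value achievable with capacity c
    for v, w in zip(values, weights):
        # iterate capacity in reverse so each item is used at most once
        for c in range(capacity, w - 1, -1):
            dp[c] = max(dp[c], dp[c - w] + v)
    return dp[capacity]

best = knapsack(values=[60, 100, 120], weights=[10, 20, 30], capacity=50)
```

The stochastic multi-knapsack version layers random order arrivals and multiple periods on top of exactly this kind of value recursion, which is why the state space explodes and an approximation becomes necessary.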

8.
Although reliability-based structural optimization (RBSO) is recognized as a rational structural design philosophy that is more advantageous than deterministic optimization, most common RBSO is based on a straightforward two-level approach coupling a reliability-calculation algorithm with a design-optimization algorithm, usually realized as an outer loop for the optimization of design variables and an inner loop for reliability analysis. A number of algorithms have been proposed to reduce the computational cost of such optimizations, such as the performance measure approach, semi-infinite programming, and mono-level approaches. Here the sequential approximate programming approach, which is well known in structural optimization, is extended as an efficient methodology for solving RBSO problems. In this approach, the optimum design is obtained by solving a sequence of subprogramming problems, each usually consisting of an approximate objective function subject to a set of approximate constraint functions. In each subproblem, rather than a direct Taylor expansion of the reliability constraints, a new formulation is introduced for the approximate reliability constraints at the current design point and their linearization. The approximate reliability index and its sensitivity are obtained from a recurrence formula based on the optimality conditions for the most probable failure point (MPP). It is shown that the approximate MPP, a key component of RBSO problems, is concurrently improved during each subproblem solution step. Through analytical models and comparative studies on complex examples, it is illustrated that our approach is efficient and that the linearized reliability index is a good approximation of the accurate reliability index. These features and the concurrent convergence of design optimization and reliability calculation are demonstrated with several numerical examples.

9.
The quadratic knapsack problem (QKP) has a central role in integer and combinatorial optimization, yet efficient algorithms for general QKPs are currently very limited. We present an approximate dynamic programming (ADP) approach for solving convex QKPs where variables may take any integer value and all coefficients are real numbers. We approximate the value function using (a) a continuous quadratic programming relaxation (CQPR) and (b) the integral parts of the solutions to the CQPR. We propose a new heuristic that adaptively fixes the variables according to the solution of the CQPR. We report computational results for QKPs with up to 200 integer variables. Our numerical results illustrate that the new heuristic produces high-quality solutions to large-scale QKPs quickly and robustly.

10.
In this paper, we introduce new methods for finding functions that lower bound the value function of a stochastic control problem, using an iterated form of the Bellman inequality. Our method is based on solving linear or semidefinite programs, and produces both a bound on the optimal objective and a suboptimal policy that appears to work very well. These results extend and improve bounds obtained in a previous paper using a single Bellman inequality condition. We describe the methods in a general setting and show how they can be applied in specific cases including the finite state case, constrained linear quadratic control, switched affine control, and multi‐period portfolio investment. Copyright © 2014 John Wiley & Sons, Ltd.

11.
In recent years, supply chains have become increasingly globalized. As a consequence, the world's supply of all types of parts has become more susceptible to disruptions, some of which are extreme and may have global implications. Our research is based on the supply risk management problem faced by a manufacturer. We model the problem as a dynamic program and design and implement approximate dynamic programming (ADP) algorithms to solve it, overcoming the well-known curses of dimensionality. Using numerical experiments, we compare the performance of different ADP algorithms. We then design a series of numerical experiments to study the performance of different sourcing strategies (single, dual, multiple, and contingent sourcing) under various settings, and to discover insights for supply risk management practice. The results show that, under a wide variety of settings, adding a third or more suppliers brings much smaller marginal benefits; managers can thus limit their options to a backup supplier (contingent sourcing) or an additional regular supplier (dual sourcing). Our results also show that, unless the backup supplier can supply with zero lead time, dual sourcing appears preferable. Lastly, we demonstrate the capability of the proposed method in analyzing more complicated, realistic supply chains.

12.
We use a stochastic dynamic programming (SDP) approach to determine optimal routing policies in a stochastic dynamic network. Because solving the SDP is time-consuming, we propose three techniques for pruning stochastic dynamic networks to expedite the computation of optimal routing policies: (1) static upper/lower bounds, (2) pre-processing the networks using the start time and origin location of the vehicle, and (3) a combination of pre-processing and upper/lower bounds. Our experiments show that the last two strategies have a significant computational advantage over conventional SDP when finding optimal routing policies in stochastic dynamic networks. Our main observation is that the computational advantage of the pruning strategies that depend on the start time of the vehicle varies with the time input to the problem; we present the results of this variation in the experiments section. When comparing the computational performance of time-dependent techniques, we therefore recommend testing such strategies at various time inputs.

13.
A new method for solving the response of strongly nonlinear dynamic systems
Combining homotopy theory with a parameter-transformation technique, a new method applicable to solving the response of strongly nonlinear dynamic systems is proposed: the PE-HAM method (homotopy analysis method based on parameter expansion). Its main idea is to construct a suitable homotopy mapping that converts the solution of a nonlinear dynamic system into the solution of a system of linear differential equations; secular terms are then eliminated by the parameter-expansion technique, yielding an analytical approximate solution. To verify the effectiveness of the proposed method, the response of a conservative Duffing system with a known exact period is studied, and an analytical approximate solution is derived. Comparison with the exact period shows that even when the nonlinearity strength α is very large, indeed even as α → ∞, the error between the period of the approximate solution and the exact period of the original system is only 2.17%. Numerical simulation results demonstrate the effectiveness of the new method.

14.
As industrial processes face demands to reduce product cost, improve product quality, and satisfy safety requirements and environmental regulations, the optimization of batch reaction processes has become increasingly important. This paper therefore presents an efficient iterative dynamic programming algorithm for batch reaction processes based on randomly selected candidate points, together with detailed implementation steps; it can effectively handle the dynamic optimization of variables such as temperature and concentration in batch reactions. By adjusting the number of stages P and the numbers of discrete points (N and M), the iterative dynamic programming algorithm effectively avoids an explosion in computational cost, and it is stable, reliable, and well suited to finding the global optimum. A typical batch-reaction dynamic optimization problem is studied as an example and compared in detail with results reported in the international literature; the results demonstrate the reliability and effectiveness of the method.

15.
The stochastic dynamic programming approach outlined here makes use of the scenario tree in a back-to-front scheme. The multi-period stochastic subproblems, associated with the subtrees whose root nodes are the starting nodes (i.e., scenario groups), are solved at each given stage along the time horizon. Each subproblem accounts for the stochasticity of the uncertain parameters in the periods of the given stage by using curves that estimate the expected future value (EFV) of the objective function. Each subproblem is solved for a set of reference levels of the variables that also have nonzero elements in any of the previous stages besides the given stage. An appropriate sensitivity analysis of the objective function for each reference level of the linking variables allows us to estimate the EFV curves applicable to the scenario groups of the previous stages, until the curves for the first stage have been computed. An application of the scheme to a production planning problem with logical constraints is presented; the aim is to obtain a tactical production plan over the scenarios along the time horizon, minimizing the expected total cost while satisfying product demand. Some computational experience is reported; the proposed approach compares favorably with a state-of-the-art optimization engine on very large-scale instances.

16.
Y., L.W., E.K.P., K.N., Digital Signal Processing, 2009, 19(6): 978-989
The problem of sensor scheduling is to select the number and combination of sensors to activate over time. The goal is usually to trade off tracking performance and sensor usage. We formulate a version of this problem involving multiple targets as a partially observable Markov decision process, and use this formulation to develop a nonmyopic sensor-scheduling scheme. Our scheme integrates sequential multisensor joint probabilistic data association and particle filtering for belief-state estimation, and use a simulation-based Q-value approximation method called completely observable rollout for decision making. We illustrate the effectiveness of our approach by an example with multiple sensors activated simultaneously to track multiple targets. We also explore the trade-off between tracking error and sensor cost using our nonmyopic scheme.

17.
This paper presents an approximate/adaptive dynamic programming (ADP) algorithm that uses the idea of integral reinforcement learning (IRL) to determine online the Nash equilibrium solution of the two-player zero-sum differential game with linear dynamics and infinite-horizon quadratic cost. The algorithm is built around an iterative method, developed in the control engineering community, for solving the continuous-time game algebraic Riccati equation (CT-GARE) that underlies the game problem. We show how the ADP techniques enhance the capabilities of the offline method, allowing an online solution without requiring complete knowledge of the system dynamics. The feasibility of the ADP scheme is demonstrated in simulation for a power system control application, where the adaptation goal is the control policy that optimally rejects the largest load disturbance.

18.
This study investigates the global optimality of approximate dynamic programming (ADP) based solutions using neural networks for optimal control problems with fixed final time. Issues including whether or not the cost function terms and the system dynamics need to be convex functions with respect to their respective inputs are discussed, and sufficient conditions for global optimality of the result are derived. Next, a new idea is presented to use ADP with neural networks for optimization of non-convex smooth functions. It is shown that any initial guess leads to direct movement toward the proximity of the global optimum of the function. This behavior is in contrast with gradient-based optimization methods, in which the movement is guided by the shape of the local level curves. Illustrative examples are provided with single- and multi-variable functions that demonstrate the potential of the proposed method.

19.
In this paper, the problem of intercepting a maneuvering target is formulated in a two-player zero-sum differential game framework affected by matched uncertainties. By introducing an appropriate cost function that reflects the uncertainties, the robust control problem is transformed into a two-player zero-sum differential game control problem, which ensures compensation of the matched uncertainties. The corresponding Hamilton-Jacobi-Isaacs (HJI) equation is solved by constructing a critic neural network (NN). The closed-loop system and the critic NN weight estimation error are proved to be uniformly ultimately bounded (UUB) using a Lyapunov approach. Finally, the effectiveness of the proposed robust guidance law is demonstrated using nonlinear two-dimensional kinematics, assuming first-order dynamics for the interceptor and the target.

20.
李顺新  杜辉 《计算机应用》2010,30(6):1550-1551
Optimal reservoir operation is a typical dynamic, nonlinear optimization problem with multiple constraints. To address it, a dynamic programming-particle swarm optimization (DP-PSO) algorithm is used. Applying the multi-stage optimal policy principle of dynamic programming, the reservoir operation problem is decomposed into multi-stage decision subproblems, each of which is solved by particle swarm optimization. Numerical experiments show that when the number of time stages is large, the computational reliability of the DP-PSO algorithm is clearly better than that of ordinary dynamic programming (DP), and the DP-PSO algorithm requires less computation time than the dynamic programming-genetic algorithm (DP-GA).
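A minimal particle swarm optimizer of the kind that could solve each stage's subproblem, sketched on a toy objective (the sphere function and all parameter values are generic PSO defaults, not those of the DP-PSO paper):

```python
import random

def pso(f, dim, n_particles=30, iters=200, seed=0):
    """Minimal particle swarm optimization minimizing f over R^dim."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]               # personal best positions
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]   # global best
    w, c1, c2 = 0.7, 1.5, 1.5                 # inertia, cognitive, social weights
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = f(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest_val

result = pso(lambda x: sum(v * v for v in x), dim=3)
```

In a DP-PSO scheme, a swarm like this would be run once per stage of the dynamic programming decomposition, with the stage's release decision as the search variable.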
