Similar Literature
20 similar documents found (search time: 46 ms)
1.
Approximate dynamic programming (ADP) commonly employs value function approximation to numerically solve complex dynamic programming problems. A statistical perspective on value function approximation employs a design and analysis of computer experiments (DACE) approach, where the “computer experiment” yields points on the value function curve. The DACE approach has been used to numerically solve high-dimensional, continuous-state stochastic dynamic programming problems, and primarily performs two tasks: (1) design of experiments and (2) statistical modeling. The use of design of experiments enables more efficient discretization. However, identifying the appropriate sample size is not straightforward. Furthermore, identifying the appropriate model structure is a well-known problem in the field of statistics. In this paper, we present a sequential method that can adaptively determine both sample size and model structure. Number-theoretic methods (NTM) are used to sequentially grow the experimental design because of their ability to fill the design space. Feed-forward neural networks (NNs) are used for statistical modeling because of their adjustable structural complexity. This adaptive value function approximation (AVFA) method must be automated to enable efficient implementation within ADP. An AVFA algorithm is introduced that increments the size of the state space training data in each sequential step; for each sample size, a successive model search process is performed to find an optimal NN model. The new algorithm is tested on a nine-dimensional inventory forecasting problem.
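The sequential grow-and-refit loop described above can be sketched in a few lines. Everything concrete here is an assumption for illustration: the one-dimensional toy value function, the nearest-neighbour read-out standing in for the paper's feed-forward NNs, and the stopping tolerance are invented; only the pattern — grow a number-theoretic design, refit, stop when successive fits agree — follows the abstract.

```python
def halton(i, base=2):
    """Radical-inverse (van der Corput/Halton) point in [0, 1) -- a simple
    number-theoretic sequence that fills the design space."""
    f, r = 1.0, 0.0
    while i > 0:
        f /= base
        r += f * (i % base)
        i //= base
    return r

def nn_predict(x, xs, ys):
    """1-nearest-neighbour prediction (toy stand-in for the paper's NN model)."""
    j = min(range(len(xs)), key=lambda k: abs(xs[k] - x))
    return ys[j]

def avfa_sketch(value_fn, tol=0.02, batch=8, max_pts=256):
    """Grow the design in batches until two successive fits agree on a test grid."""
    test = [t / 50 for t in range(51)]
    xs, ys, prev = [], [], None
    i = 1
    while len(xs) < max_pts:
        for _ in range(batch):                      # sequential design growth
            x = halton(i); i += 1
            xs.append(x); ys.append(value_fn(x))    # the "computer experiment"
        cur = [nn_predict(t, xs, ys) for t in test]
        if prev is not None and max(abs(a - b) for a, b in zip(cur, prev)) < tol:
            break                                   # adaptive stopping rule
        prev = cur
    return xs, cur

xs, fit = avfa_sketch(lambda s: s * (1 - s))        # hypothetical value curve
```

In the paper the inner step is a model search over NN structures; here a fixed nearest-neighbour model keeps the sketch short.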

2.
Multistage optimization problems that are represented by Markov Decision Processes (MDPs) can be solved by the approach of Dynamic Programming (DP). However, in process control problems involving continuous state spaces, the classical DP formulation leads to computational intractability known as the ‘curse of dimensionality’. This issue can be overcome by the approach of Approximate Dynamic Programming (ADP) using simulation-based sampling in combination with value function approximators replacing the traditional value tables. In this paper, we investigate different approaches of ADP in the context of a deep cup drawing process, which is simulated by a finite element model. In applying ADP to the problem, Artificial Neural Networks (ANNs) are created as global parametric function approximators to represent the value functions as well as the state transitions. For each time step of the finite time horizon, time-indexed function approximations are built. We compare a classical DP approach to a backward ADP approach with batch learning of the ANNs and a forward ADP approach with incremental learning of the ANNs. In the batch learning mode, the ANNs are trained from temporary value tables constructed by exhaustive search backwards in time. In the incremental learning mode, on the other hand, the ANNs are initialized and then improved continually using data obtained by stochastic sampling of the simulation moving forward in time. For both learning modes, we obtain value function approximations with good performance. The deep cup drawing process under consideration is of medium model complexity and therefore allows us to apply all three methods and to perform a comparison with respect to the achieved efficiency and the associated computational effort as well as the decision behavior of the controllers.

3.
Reinforcement learning (RL) is concerned with the identification of optimal controls in Markov decision processes (MDPs) where no explicit model of the transition probabilities is available. We propose a class of RL algorithms which always produces stable estimates of the value function. In detail, we use "local averaging" methods to construct an approximate dynamic programming (ADP) algorithm. Nearest-neighbor regression, grid-based approximations, and trees can all be used as the basis of this approximation. We provide a thorough theoretical analysis of this approach and we demonstrate that ADP converges to a unique approximation in continuous-state average-cost MDPs. In addition, we prove that our method is consistent in the sense that an optimal approximate strategy is identified asymptotically. With regard to a practical implementation, we suggest a reduction of ADP to standard dynamic programming in an artificial finite-state MDP.
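A minimal grid-based instance of such "local averaging" ADP can be sketched as follows. The toy one-dimensional control problem (cost = distance to 0.5, actions ±0.1), grid size, and discount factor are assumptions made for illustration; the piecewise-linear read-out plays the role of the local averaging approximator.

```python
def fitted_value_iteration(n_grid=41, gamma=0.9, sweeps=200):
    """Grid-based local-averaging ADP for a toy 1D problem: state s in [0, 1],
    actions move s by -0.1, 0, or +0.1 (clipped), stage cost = |s - 0.5|."""
    grid = [i / (n_grid - 1) for i in range(n_grid)]
    V = [0.0] * n_grid

    def interp(s):
        # piecewise-linear "local averaging" read-out between grid nodes
        s = min(max(s, 0.0), 1.0)
        t = s * (n_grid - 1)
        i = min(int(t), n_grid - 2)
        w = t - i
        return (1 - w) * V[i] + w * V[i + 1]

    for _ in range(sweeps):
        # one synchronous Bellman sweep over the grid (a contraction for gamma < 1)
        V = [min(abs(s - 0.5) + gamma * interp(min(max(s + a, 0.0), 1.0))
                 for a in (-0.1, 0.0, 0.1))
             for s in grid]
    return grid, V

grid, V = fitted_value_iteration()
```

Because the Bellman operator combined with this local averaging is a contraction, the iterates converge to a unique fixed point, mirroring the stability property the abstract emphasizes.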

4.
Coarse-to-fine dynamic programming
We introduce an extension of dynamic programming, which we call "coarse-to-fine dynamic programming" (CFDP), ideally suited to DP problems with large state spaces. CFDP uses dynamic programming to solve a sequence of coarse approximations which are lower bounds to the original DP problem. These approximations are developed by merging states in the original graph into "superstates" in a coarser graph which uses an optimistic arc cost between superstates. The approximations are designed so that CFDP terminates when the optimal path through the original state graph has been found. CFDP leads to significant decreases in the amount of computation necessary to solve many DP problems and can, in some instances, make otherwise infeasible computations possible. CFDP generalizes to DP problems with continuous state space, and we offer a convergence result for this extension. We demonstrate applications of this technique to optimization of functions and boundary estimation in mine recognition.
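The merge/solve/refine cycle can be sketched on a small multistage shortest-path problem. The interval superstates, the bisection refinement rule, and the test costs are assumptions for illustration; the essential CFDP invariants from the abstract are kept: coarse arc costs are optimistic (minima over members), so each coarse solution is a lower bound, and the loop stops once the optimal coarse path uses only singleton superstates.

```python
def cfdp(stage_cost, T, N):
    """Coarse-to-fine DP for a T-stage shortest path over states 0..N-1
    (free start state). stage_cost(t, i, j): arc cost from i at stage t
    to j at stage t+1. Superstates are index intervals (lo, hi)."""
    parts = [[(0, N - 1)] for _ in range(T + 1)]   # one superstate per stage

    def coarse_arc(t, a, b):
        # optimistic cost: min over all member-to-member arcs
        return min(stage_cost(t, i, j)
                   for i in range(a[0], a[1] + 1)
                   for j in range(b[0], b[1] + 1))

    while True:
        # DP over the current superstate graph
        layers = [{s: (0.0, None) for s in parts[0]}]
        for t in range(T):
            nxt = {}
            for b in parts[t + 1]:
                c, arg = min((layers[t][a][0] + coarse_arc(t, a, b), a)
                             for a in parts[t])
                nxt[b] = (c, arg)
            layers.append(nxt)
        end = min(parts[T], key=lambda s: layers[T][s][0])
        path, s = [end], end
        for t in range(T, 0, -1):                  # backtrace
            s = layers[t][s][1]
            path.append(s)
        path.reverse()
        wide = [(t, s) for t, s in enumerate(path) if s[0] < s[1]]
        if not wide:                               # all singletons: provably optimal
            return layers[T][end][0], [s[0] for s in path]
        t, (lo, hi) = wide[0][0], wide[0][1]
        mid = (lo + hi) // 2                       # refine: split one superstate
        parts[t].remove((lo, hi))
        parts[t] += [(lo, mid), (mid + 1, hi)]

cost = lambda t, i, j: abs(i - j) + (j % 3)        # hypothetical stage costs
val, path = cfdp(cost, T=3, N=4)
```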

5.
Approximate dynamic programming (ADP) relies, in the continuous-state case, on both a flexible class of models for the approximation of the value functions and a smart sampling of the state space for the numerical solution of the recursive Bellman equations. In this paper, low-discrepancy sequences, commonly employed for number-theoretic methods, are investigated as a sampling scheme in the ADP context when local models, such as the Nadaraya–Watson (NW) ones, are employed for the approximation of the value function. The analysis is carried out both from a theoretical and a practical point of view. In particular, it is shown that the combined use of low-discrepancy sequences and NW models enables the convergence of the ADP procedure. Then, the regular structure of the low-discrepancy sampling is exploited to derive a method for automatic selection of the bandwidth of NW models, which yields a significant saving in the computational effort with respect to the standard cross validation approach. Simulation results concerning an inventory management problem are presented to show the effectiveness of the proposed techniques.
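The two ingredients — a low-discrepancy design and a Nadaraya–Watson estimator — can be combined in a short sketch. The one-dimensional target function and the specific bandwidth rule (a small multiple of the known point spacing, rather than the paper's derived selector) are assumptions; what the sketch illustrates is that the regular spacing of the sequence suggests the bandwidth directly, with no cross-validation loop.

```python
import math

def van_der_corput(i, base=2):
    """Base-2 van der Corput point in [0, 1) (the 1D low-discrepancy case)."""
    f, r = 1.0, 0.0
    while i > 0:
        f /= base
        r += f * (i % base)
        i //= base
    return r

def nw_fit(xs, ys, h):
    """Nadaraya-Watson estimator with a Gaussian kernel and bandwidth h."""
    def predict(x):
        ws = [math.exp(-((x - xi) / h) ** 2 / 2) for xi in xs]
        return sum(w * y for w, y in zip(ws, ys)) / sum(ws)
    return predict

n = 128
xs = [van_der_corput(i + 1) for i in range(n)]     # low-discrepancy design
f = lambda s: math.sin(2 * math.pi * s)            # stand-in for a value function
ys = [f(x) for x in xs]
# regular spacing (~1/n) suggests the bandwidth directly: here h = 2/n
vhat = nw_fit(xs, ys, h=2.0 / n)
```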

6.
We consider the revenue management problem of capacity control under customer choice behavior. An exact solution of the underlying stochastic dynamic program is difficult because of the multi-dimensional state space and, thus, approximate dynamic programming (ADP) techniques are widely used. The key idea of ADP is to encode the multi-dimensional state space by a small number of basis functions, often leading to a parametric approximation of the dynamic program’s value function. In general, two classes of ADP techniques for learning value function approximations exist: mathematical programming and simulation. So far, the literature on capacity control largely focuses on the first class. In this paper, we develop a least squares approximate policy iteration (API) approach which belongs to the second class. We suggest value function approximations that are linear in the parameters, and we estimate the parameters via linear least squares regression. Exploiting both exact and heuristic knowledge from the value function, we enforce structural constraints on the parameters to facilitate learning a good policy. We perform an extensive simulation study to investigate the performance of our approach. The results show that it obtains revenues competitive with, and often exceeding, those of state-of-the-art capacity control methods in reasonable computational time. Depending on the scarcity of capacity and the point in time, revenue improvements of around 1% or more can be observed. Furthermore, the proposed approach contributes to simulation-based ADP, bringing forth research on numerically estimating piecewise linear value function approximations and their application in revenue management environments.
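The core estimation step — a value function linear in the parameters, fitted by linear least squares — can be sketched generically. The quadratic basis, the synthetic "value samples", and the noise level are assumptions invented for the example; the paper's structural constraints and the policy-iteration loop are not shown.

```python
import random

def solve(A, b):
    """Gaussian elimination with partial pivoting for small dense systems."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(M[r][k]))
        M[k], M[p] = M[p], M[k]
        for r in range(k + 1, n):
            f = M[r][k] / M[k][k]
            for c in range(k, n + 1):
                M[r][c] -= f * M[k][c]
    x = [0.0] * n
    for k in range(n - 1, -1, -1):
        x[k] = (M[k][n] - sum(M[k][c] * x[c] for c in range(k + 1, n))) / M[k][k]
    return x

def least_squares(phi, xs, ys):
    """theta minimising sum (theta . phi(x) - y)^2 via the normal equations."""
    feats = [phi(x) for x in xs]
    d = len(feats[0])
    A = [[sum(f[i] * f[j] for f in feats) for j in range(d)] for i in range(d)]
    b = [sum(f[i] * y for f, y in zip(feats, ys)) for i in range(d)]
    return solve(A, b)

# hypothetical value samples: V(x) ~ 2 + 0.5*x - 0.01*x^2 plus noise,
# where x might be remaining capacity
random.seed(0)
phi = lambda x: [1.0, x, x * x]
xs = [random.uniform(0, 50) for _ in range(400)]
ys = [2 + 0.5 * x - 0.01 * x * x + random.gauss(0, 0.1) for x in xs]
theta = least_squares(phi, xs, ys)
```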

7.
This work addresses the problem of decision-making under uncertainty for robot navigation. Since robot navigation is most naturally represented in a continuous domain, the problem is cast as a continuous-state POMDP. Probability distributions over state space, or beliefs, are represented in parametric form using low-dimensional vectors of sufficient statistics. The belief space, over which the value function must be estimated, has dimensionality equal to the number of sufficient statistics. Compared to methods based on discretising the state space, this work trades the loss of the belief space’s convexity for a reduction in its dimensionality and an efficient closed-form solution for belief updates. Fitted value iteration is used to solve the POMDP. The approach is empirically compared to a discrete POMDP solution method on a simulated continuous navigation problem. We show that, for a suitable environment and parametric form, the proposed method is capable of scaling to large state spaces.

8.
Test sequencing algorithms with unreliable tests
In this paper, we consider imperfect test sequencing problems under a single fault assumption. This is a partially observable Markov decision process (POMDP), a sequential multistage decision problem wherein a failure source must be identified using the results of imperfect tests at each stage. The optimal solution for this problem can be obtained by applying a continuous-state dynamic programming (DP) recursion. However, the DP recursion is computationally very expensive owing to the continuous nature of the state vector comprising the probabilities of faults. In order to alleviate this computational explosion, we present an efficient implementation of the DP recursion. We also consider various problems with special structure (parallel systems) and derive closed-form solutions/index rules without having to resort to DP. Finally, we present various top-down graph search algorithms for problems with no special structure, including multistep DP, multistep information heuristics, and certainty equivalence algorithms.

9.
In this paper a scheme for approximating solutions of convection-diffusion-reaction equations by Markov jump processes is studied. The general principle of the method of lines reduces evolution partial differential equations to semi-discrete approximations consisting of systems of ordinary differential equations. Our approach is to apply to the resulting system a stochastic scheme that is essentially a direct simulation of the corresponding infinitesimal dynamics. This automatically provides time adaptivity and, in one space dimension, stable approximations of diffusion operators on non-uniform grids and the possibility of using moving cells for the transport part, all within the framework of an explicit method. We present several results in one space dimension including free boundary problems, but the general algorithm is simple, flexible and on uniform grids it can be formulated for general evolution partial differential equations in arbitrary space dimensions.

10.
A stochastic resource allocation model, based on the principles of Markov decision processes (MDPs), is proposed in this paper. In particular, a general-purpose framework is developed, which takes into account resource requests for both instant and future needs. The considered framework can handle two types of reservations (i.e., specified and unspecified time interval reservation requests), and implement an overbooking business strategy to further increase business revenues. The resulting dynamic pricing problems can be regarded as sequential decision-making problems under uncertainty, which are solved by means of stochastic dynamic programming (DP) based algorithms. In this regard, Bellman’s backward principle of optimality is exploited in order to provide all the implementation mechanisms for the proposed reservation pricing algorithm. The curse of dimensionality, inherent to DP for both instant resource requests and future resource reservations, inevitably arises. In particular, an approximate dynamic programming (ADP) technique based on linear function approximations is applied to solve such scalability issues. Several examples are provided to show the effectiveness of the proposed approach.

11.
We propose an efficient approach to pole placement for the problem of centralized control over a large scale power system with state feedback. Our approach is based on a specific homothetic transformation of the original system representation. The representation of a transformed system contains explicit elements that can be changed with feedback and provide for a given pole placement in the closed-loop system. We give examples of solving the exact pole placement problem in systems whose state space dimension reaches several thousand.

12.
Continuous-state POMDPs provide a natural representation for a variety of tasks, including many in robotics. However, most existing parametric continuous-state POMDP approaches are limited by their reliance on a single linear model to represent the world dynamics. We introduce a new switching-state dynamics model that can represent multi-modal state-dependent dynamics. We present the Switching Mode POMDP (SM-POMDP) planning algorithm for solving continuous-state POMDPs using this dynamics model. We also consider several procedures to approximate the value function as a mixture of a bounded number of Gaussians. Unlike the majority of prior work on approximate continuous-state POMDP planners, we provide a formal analysis of our SM-POMDP algorithm, providing bounds, where possible, on the quality of the resulting solution. We also analyze the computational complexity of SM-POMDP. Empirical results on an unmanned aerial vehicle collision avoidance simulation and a robot navigation simulation where the robot has faulty actuators demonstrate the benefit of SM-POMDP over a prior parametric approach.

13.
Three-dimensional hand motion tracking is an important research direction in human–computer interaction. This paper proposes a new model-based method for 3D hand motion tracking that embeds hierarchical optimization into a particle-filter tracking framework, improving sampling efficiency by sampling particles in a latent state space. First, a low-dimensional latent state is used to describe the hand configuration, and a hand dynamics model is built from the physiological motion constraints of the hand. Second, to improve the particles' sampling efficiency in the latent space, hierarchical genetic optimization is employed to quickly find good particles locally; these particles serve as an importance sampling function that corrects the particle filter's sampling algorithm. Experimental results show that the method can rapidly track hand motion against complex backgrounds, even in the presence of hand self-occlusion.

14.
In large discrete state spaces or continuous state spaces, reinforcement learning (RL) requires value function approximation to find an optimal policy. However, the structure of the function approximator is usually fixed in advance by the designer and cannot be adjusted dynamically during learning, so it lacks adaptivity. To construct the approximator's structure automatically, this paper proposes a fuzzy cerebellar model articulation controller (FCMAC) value function approximation method with automatic state partitioning. The method uses the trend of the Bellman error to partition states automatically, and two mechanisms for selecting the regions to partition are discussed. Experimental results on the mountain-car problem and a robot soccer simulation platform show that the new algorithm approximates value functions effectively, and that an agent using the proposed function approximator can perform reinforcement learning effectively.

15.
This paper studies a group of basic state reduction based dynamic programming (DP) algorithms for the multi-objective 0–1 knapsack problem (MKP), which are related to the backward reduced-state DP space (BRDS) and forward reduced-state DP space (FRDS). The BRDS is widely ignored in the literature because it is at a disadvantage for the single objective knapsack problem (KP) in terms of memory requirements. The FRDS based DP algorithm in a general sense is related to state dominance checking, which can be time consuming for the MKP while it can be done efficiently for the KP. Consequently, no algorithm purely based on the FRDS with state dominance checking has ever been developed for the MKP. In this paper, we attempt to gain some insight into the state reduction techniques efficient for the MKP. We first propose an FRDS based algorithm with a local state dominance checking for the MKP. Then we evaluate the relative advantage of the BRDS and FRDS based algorithms by analyzing their computational time and memory requirements for the MKP. Finally, different combinations of the BRDS and FRDS based algorithms are developed on this basis. Numerical experiments based on bi-objective KP instances are conducted to systematically compare these algorithms with the recently developed BRDS based DP algorithm as well as the existing FRDS based DP algorithm without state dominance checking.
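A forward reduced-state DP with a local dominance check can be sketched for the bi-objective case. The instance below is invented, and the sketch does not reproduce the paper's BRDS/FRDS combinations; it only shows the core mechanism: a state (weight, v1, v2) is pruned when another state is no heavier and no worse in both objectives.

```python
def biobjective_knapsack_dp(items, capacity):
    """Forward reduced-state DP for the bi-objective 0-1 knapsack.
    items: list of (weight, value1, value2); returns the Pareto front
    of attainable (value1, value2) pairs within the capacity."""
    states = [(0, 0, 0)]                 # reachable (weight, v1, v2) states
    for w, v1, v2 in items:
        merged = set(states)
        merged.update((sw + w, s1 + v1, s2 + v2)
                      for sw, s1, s2 in states if sw + w <= capacity)
        # local dominance check: scan by weight, keep only undominated states
        ordered = sorted(merged, key=lambda s: (s[0], -s[1], -s[2]))
        states = []
        for s in ordered:
            if not any(t[1] >= s[1] and t[2] >= s[2] for t in states):
                states.append(s)         # no lighter/equal state beats s
    # project the surviving states onto the two objectives
    pts = sorted({(s1, s2) for _, s1, s2 in states}, key=lambda p: (-p[0], -p[1]))
    front = []
    for p in pts:
        if not front or p[1] > front[-1][1]:
            front.append(p)
    return front

items = [(2, 3, 1), (3, 1, 4), (4, 4, 2), (1, 2, 2)]   # hypothetical instance
front = biobjective_knapsack_dp(items, 6)
```

Pruning is sound because any completion of a dominated state is matched or beaten by the same completion of the dominating state.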

16.
Experience has shown that even carefully designed and tested robots may encounter anomalous situations. It is therefore important for robots to monitor their state so that anomalous situations may be detected in a timely manner. Robot fault diagnosis typically requires tracking a very large number of possible faults in complex non-linear dynamic systems with noisy sensors. Traditional methods either ignore the uncertainty or use linear approximations of non-linear system dynamics. Such approximations are often unrealistic, and as a result faults either go undetected or become confused with non-fault conditions. Probability theory provides a natural representation for uncertainty, but an exact Bayesian solution for the diagnosis problem is intractable. Monte Carlo approximations have demonstrated considerable success in application domains such as computer vision and robot localization and mapping. But classical Monte Carlo methods, such as particle filters, can suffer from substantial computational complexity. This is particularly true in the presence of rare, yet important events, such as many system faults. This paper presents an algorithm that provides an approach for computationally tractable fault diagnosis. Taking advantage of structure in the domain, it dynamically concentrates computation in the regions of state space that are currently most relevant without losing track of less likely states. Experiments with a dynamic simulation of a six-wheel rocker-bogie rover show a significant improvement in performance over the classical approach.
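The classical baseline the paper improves upon — a bootstrap particle filter over fault modes — can be sketched as follows. The two-mode sensor model, the transition probability, and the observation sequence are all hypothetical; the paper's adaptive reallocation of particles is not shown.

```python
import math
import random

def particle_filter_modes(observations, modes, likelihood, transition, n=500, seed=1):
    """Bootstrap particle filter over discrete fault modes: propagate each
    particle through the mode-transition model, weight it by the observation
    likelihood, then resample (a simplified classical baseline)."""
    rng = random.Random(seed)
    particles = [rng.choice(modes) for _ in range(n)]          # uniform prior
    for z in observations:
        particles = [transition(m, rng) for m in particles]    # propagate
        weights = [likelihood(z, m) for m in particles]        # weight
        particles = rng.choices(particles, weights=weights, k=n)  # resample
    return {m: particles.count(m) / n for m in modes}

# hypothetical sensor model: 'ok' emits readings near 0, 'fault' near 1
modes = ['ok', 'fault']
likelihood = lambda z, m: math.exp(-(z - (1.0 if m == 'fault' else 0.0)) ** 2 / 0.1)
transition = lambda m, rng: m if rng.random() < 0.98 else ('fault' if m == 'ok' else 'ok')
posterior = particle_filter_modes([0.9, 1.1, 0.95, 1.05], modes, likelihood, transition)
```

With rare faults, few particles land in the fault mode before resampling, which is exactly the weakness that motivates the paper's focused computation.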

17.
In this paper we define and study digital manifolds of arbitrary dimension, and provide (in particular) a general theoretical basis for curve or surface tracing in picture analysis. The studies involve properties such as one-dimensionality of digital curves and (n-1)-dimensionality of digital hypersurfaces that make them discrete analogs of corresponding notions in continuous topology. The presented approach is fully based on the concept of adjacency relation and complements the concept of dimension as common in combinatorial topology. This work appears to be the first one on digital manifolds based on a graph-theoretical definition of dimension. In particular, in the n-dimensional digital space, a digital curve is a one-dimensional object and a digital hypersurface is an (n-1)-dimensional object, as it is in the case of curves and hypersurfaces in the Euclidean space. Relying on the obtained properties of digital hypersurfaces, we propose a uniform approach for studying good pairs defined by separations and obtain a classification of good pairs in arbitrary dimension. We also discuss possible applications of the presented definitions and results.

18.
Automatic complexity reduction in reinforcement learning
High dimensionality of state representation is a major limitation for scale-up in reinforcement learning (RL). This work derives knowledge for complexity reduction from partial solutions and provides algorithms for automated dimension reduction in RL. We propose a cascading decomposition algorithm based on spectral analysis of a normalized graph Laplacian to decompose a problem into several subproblems, and then conduct parameter relevance analysis on each subproblem to perform dynamic state abstraction. The elimination of irrelevant parameters projects the original state space into one with lower dimension, in which some subtasks are projected onto the same shared subtasks. The framework can identify irrelevant parameters based on performed action sequences and thus relieve the problem of high dimensionality in the learning process. We evaluate the framework with experiments and show that the dimension reduction approach can indeed make some otherwise infeasible problems learnable.

19.
20.
This paper presents a solution to the problem of minimizing the cost of moving a robotic manipulator along a specified geometric path subject to input torque/force constraints, taking the coupled, nonlinear dynamics of the manipulator into account. The proposed method uses dynamic programming (DP) to find the positions, velocities, accelerations, and torques that minimize cost. Since the use of parametric functions reduces the dimension of the state space from 2n, for an n-jointed manipulator, to two, the DP method does not suffer from the "curse of dimensionality." While maintaining the elegance of our previous trajectory planning method, we have developed the DP method for the general case where 1) the actuator torque limits are dependent on one another, 2) the cost functions can have an arbitrary form, and 3) there are constraints on the jerk, or derivative of the acceleration. Also, we have shown that the DP solution converges as the grid size decreases. As numerical examples, the trajectory planning method is simulated for the first three joints of the PACS arm, which is a cylindrical arm manufactured by the Bendix Corporation.
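The two-state idea — DP over (path position, path velocity) instead of the full 2n-dimensional joint state — can be sketched on a stripped-down instance. The symmetric acceleration bound, the grid sizes, and the minimum-time cost are assumptions standing in for the paper's torque-constrained manipulator dynamics.

```python
def min_time_profile(ns=21, nv=11, v_max=1.0, a_max=2.0):
    """Backward DP over (path position, path velocity) for minimum-time
    traversal of a unit path, starting and ending at rest, under a
    symmetric acceleration bound. Each segment assumes constant
    acceleration, so segment time is ds / ((v0 + v1) / 2)."""
    ds = 1.0 / (ns - 1)
    vs = [v_max * k / (nv - 1) for k in range(nv)]   # velocity grid
    INF = float('inf')
    V = [INF] * nv          # cost-to-go at the final position
    V[0] = 0.0              # must arrive at rest
    for _ in range(ns - 1):                           # sweep backward over segments
        nxt = [INF] * nv
        for i, v0 in enumerate(vs):
            for j, v1 in enumerate(vs):
                v_avg = (v0 + v1) / 2
                if v_avg == 0 or V[j] == INF:
                    continue
                dt = ds / v_avg
                if abs(v1 - v0) <= a_max * dt:        # realisable acceleration
                    nxt[i] = min(nxt[i], dt + V[j])
        V = nxt
    return V[0]             # start at rest

t_min = min_time_profile()
```

Because every grid profile corresponds to a realisable trajectory, the DP time can only approach the continuous optimum from above as the grid is refined, matching the convergence claim in the abstract.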


Copyright © Beijing Qinyun Technology Development Co., Ltd. (京ICP备09084417号)