19 similar documents found; search took 109 ms.
4.
Based on performance potential theory and the equivalent-Markov-process method, this paper studies a simulation-based optimization algorithm for a class of semi-Markov decision processes (SMDPs) under parameterized randomized stationary policies, and briefly analyzes its convergence. Through the equivalent Markov process of the SMDP, a uniformized Markov chain is defined; the gradient of the SMDP's average-cost performance measure with respect to the policy parameters is then estimated from a single sample path of this uniformized chain, in order to search for an optimal (or suboptimal) policy. The algorithm uses a neural network to approximate the parameterized randomized stationary policy, which saves memory, avoids the "curse of dimensionality," and makes the approach suitable for performance optimization of systems with large state spaces. A simulation example illustrates the application of the algorithm.
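The single-sample-path gradient estimation described above can be sketched roughly as follows. This is a minimal tabular illustration, not the paper's algorithm: a softmax policy table stands in for the neural-network approximator, and the eligibility-trace discount, toy chain, and cost function in the usage lines are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax_policy(theta, s):
    """Randomized stationary policy: action probabilities for state s
    from the per-state parameter row theta[s]."""
    z = np.exp(theta[s] - theta[s].max())
    return z / z.sum()

def single_path_gradient(theta, step_fn, cost_fn, s0, T=2000, trace=0.99):
    """Likelihood-ratio estimate of the average-cost gradient w.r.t. the
    policy parameters, computed from ONE sample path of the chain.
    step_fn(s, a) -> next state; cost_fn(s, a) -> stage cost."""
    grad = np.zeros_like(theta)
    elig = np.zeros_like(theta)     # eligibility trace of score functions
    avg_cost, s = 0.0, s0
    for t in range(1, T + 1):
        p = softmax_policy(theta, s)
        a = rng.choice(len(p), p=p)
        score = -p
        score[a] += 1.0             # d log pi(a|s) / d theta[s, :]
        elig *= trace               # discounted trace to limit variance
        elig[s] += score
        cost = cost_fn(s, a)
        avg_cost += (cost - avg_cost) / t   # running average-cost estimate
        grad += (cost - avg_cost) * elig    # centered cost as a baseline
        s = step_fn(s, a)
    return grad / T, avg_cost

theta = np.zeros((2, 2))            # two states, two actions
# Toy chain: action a moves to state a; being in state 1 costs 1 per step.
grad, avg = single_path_gradient(theta, lambda s, a: a,
                                 lambda s, a: float(s), 0)
```

A gradient step `theta -= lr * grad` would then move the policy toward lower average cost; the tabular `theta` here is exactly what the paper replaces with a neural network to handle large state spaces.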
7.
A Path-Node-Driven Low-Cost Shortest Path Tree Algorithm
Dijkstra's algorithm is an excellent shortest-path algorithm that also produces a shortest path tree (SPT), and it has been widely applied in network computation and optimization. To optimize the cost of the shortest path tree, the idea of path-node driving is proposed, and based on it a least-cost shortest path tree algorithm (LCSPT) is designed. In LCSPT, the node currently being processed maximizes its path sharing with the current shortest path tree, further optimizing the tree's cost and yielding a high-performance SPT. As an essential part of the work, the correctness of the algorithm is proved by mathematical induction; the cost performance of LCSPT, including how it achieves minimum cost compared with similar algorithms, is analyzed theoretically, together with its time and space complexity. Finally, three simulation experiments verify the correctness of the algorithm and its least-cost SPT property.
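The abstract does not specify LCSPT itself, but the baseline it optimizes — Dijkstra's algorithm with a recorded parent map, whose edges form exactly a shortest path tree — can be sketched as follows (the graph and node names are made up for the example):

```python
import heapq

def dijkstra_spt(graph, source):
    """Dijkstra's algorithm, returning shortest distances and the parent
    map whose edges form a shortest path tree (SPT) rooted at source.
    graph maps node -> list of (neighbor, edge_weight) pairs."""
    dist = {source: 0}
    parent = {source: None}
    heap = [(0, source)]
    done = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        for v, w in graph.get(u, []):
            nd = d + w
            if v not in dist or nd < dist[v]:
                dist[v], parent[v] = nd, u
                heapq.heappush(heap, (nd, v))
    return dist, parent

g = {"a": [("b", 1), ("c", 4)], "b": [("c", 2), ("d", 5)],
     "c": [("d", 1)], "d": []}
dist, parent = dijkstra_spt(g, "a")   # parent defines the SPT edges
```

LCSPT's refinement, as described above, is to choose among equally short paths the one sharing the most edges with the tree built so far, which lowers the total edge cost of the resulting SPT without changing any shortest distance.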
8.
Admission Control for Smart-Grid Service Demands with Elastic Response Times
This paper considers admission control for multiple types of service demands in a smart grid, exploiting the elasticity of response times to smooth load fluctuations so that the long-run average cost of grid operation is minimized. To capture the stochastic nature of service demands and user behavior, the system is modeled as a continuous-time Markov control process. Combined with sample-path-based estimation of the performance potential, a simulation-based policy iteration algorithm is proposed, which effectively mitigates the curse of dimensionality caused by the large state space and exhibits fast convergence and good practical performance. Simulation results verify the effectiveness of the proposed method.
11.
The average cost optimal control problem is addressed for Markov decision processes with unbounded cost. It is found that the policy iteration algorithm generates a sequence of policies which are c-regular, where c is the cost function under consideration. This result only requires the existence of an initial c-regular policy and an irreducibility condition on the state space. Furthermore, under these conditions the sequence of relative value functions generated by the algorithm is bounded from below and "nearly" decreasing, from which it follows that the algorithm is always convergent. Under further conditions, it is shown that the algorithm does compute a solution to the optimality equations and hence an optimal average cost policy. These results provide elementary criteria for the existence of optimal policies for Markov decision processes with unbounded cost and recover known results for the standard linear-quadratic-Gaussian problem. In particular, in the control of multiclass queueing networks, it is found that there is a close connection between optimization of the network and optimal control of a far simpler fluid network model.
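For a small finite MDP with bounded cost and a unichain structure (so the c-regularity and irreducibility conditions above hold trivially), the policy iteration algorithm in question can be sketched as follows; the transition matrices and costs in the usage lines are invented for illustration.

```python
import numpy as np

def policy_iteration_avg(P, c, max_iter=100):
    """Average-cost policy iteration on a finite MDP.
    P[a] is the S x S transition matrix under action a; c[s, a] is the cost.
    Each evaluation step solves the Poisson equation
        g + h = c_pi + P_pi h,   normalized by h[0] = 0,
    for the gain g and relative value function h, then improves greedily."""
    S, A = c.shape
    pi = np.zeros(S, dtype=int)
    for _ in range(max_iter):
        P_pi = np.array([P[pi[s]][s] for s in range(S)])
        c_pi = c[np.arange(S), pi]
        # Unknown vector: [g, h[1], ..., h[S-1]]  (h[0] fixed to 0).
        M = np.zeros((S, S))
        M[:, 0] = 1.0                               # coefficient of g
        M[:, 1:] = np.eye(S)[:, 1:] - P_pi[:, 1:]   # coefficients of h[1:]
        sol = np.linalg.solve(M, c_pi)
        g, h = sol[0], np.concatenate(([0.0], sol[1:]))
        # Greedy policy improvement on the relative values.
        Q = c + np.stack([P[a] @ h for a in range(A)], axis=1)
        new_pi = Q.argmin(axis=1)
        if np.array_equal(new_pi, pi):
            return g, h, pi
        pi = new_pi
    return g, h, pi

P = [np.array([[0.9, 0.1], [0.5, 0.5]]),   # action 0
     np.array([[0.2, 0.8], [0.1, 0.9]])]   # action 1
c = np.array([[1.0, 3.0], [2.0, 0.5]])
g, h, pi = policy_iteration_avg(P, c)      # gain, relative values, policy
```

The paper's contribution is precisely that this iteration remains well behaved when the cost is unbounded, provided an initial c-regular policy exists.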
12.
The current study examines the dynamic vehicle allocation problems of the automated material handling system (AMHS) in semiconductor manufacturing. With the uncertainty involved in wafer lot movement, dynamically allocating vehicles to each intrabay is very difficult. The cycle time and overall tool productivity of the wafer lots are affected when a vehicle takes too long to arrive. In the current study, a Markov decision model is developed to study the vehicle allocation control problem in the AMHS. The objective is to minimize the sum of the expected long-run average transport job waiting cost. An interesting exhaustive structure in the optimal vehicle allocation control is found in accordance with the Markov decision model. Based on this exhaustive structure, an efficient algorithm is then developed to solve the vehicle allocation control problem numerically. The performance of the proposed method is verified by a simulation study. Compared with other methods, the proposed method can significantly reduce the waiting cost of wafer lots for AMHS vehicle transportation.
13.
Consider the Hidden Markov model where the realization of a single Markov chain is observed by a number of noisy sensors. The sensor scheduling problem for the resulting hidden Markov model is as follows: design an optimal algorithm for selecting at each time instant, one of the many sensors to provide the next measurement. Each measurement has an associated measurement cost. The problem is to select an optimal measurement scheduling policy, so as to minimize a cost function of estimation errors and measurement costs. The problem of determining the optimal measurement policy is solved via stochastic dynamic programming. Numerical results are presented.
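The full stochastic dynamic program above runs over the belief (information) state; as a simplified, hedged illustration, the following sketch implements only a one-step-lookahead (myopic) scheduler: predict the hidden state, then pick the sensor minimizing expected posterior error plus measurement cost. The chain, sensor models, and costs are invented for the example.

```python
import numpy as np

def myopic_sensor_choice(belief, A, sensors, costs):
    """One-step-lookahead sensor selection for an HMM: predict the state
    with transition matrix A, then pick the sensor k minimizing expected
    posterior error (1 - max belief entry) plus its measurement cost.
    sensors[k][x, y] = P(observation y | state x) for sensor k."""
    pred = A.T @ belief                       # one-step state prediction
    best_k, best_val = 0, np.inf
    for k, B in enumerate(sensors):
        exp_err = 0.0
        for y in range(B.shape[1]):
            p_y = pred @ B[:, y]              # prob. sensor k reports y
            if p_y > 0:
                post = pred * B[:, y] / p_y   # Bayes update of the belief
                exp_err += p_y * (1.0 - post.max())
        val = exp_err + costs[k]
        if val < best_val:
            best_k, best_val = k, val
    return best_k, best_val

# Two-state chain; sensor 0 is accurate but costly, sensor 1 free but noisy.
A = np.array([[0.8, 0.2], [0.2, 0.8]])
sensors = [np.array([[0.95, 0.05], [0.05, 0.95]]),
           np.array([[0.6, 0.4], [0.4, 0.6]])]
k, val = myopic_sensor_choice(np.array([0.5, 0.5]), A, sensors, [0.3, 0.0])
```

The dynamic program in the paper replaces the one-step cost `exp_err` with the full expected cost-to-go, trading this greedy rule for an optimal one.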
14.
Vivek S. Borkar, Systems & Control Letters, 1998, 34(4): 5635
The ergodic or long-run average cost control problem for a partially observed finite-state Markov chain is studied via the associated fully observed separated control problem for the nonlinear filter. Dynamic programming equations for the latter are derived, leading to existence and characterization of optimal stationary policies.
16.
IEEE Transactions on Automatic Control, 2008, 53(6): 1520-1526
17.
Boris M. Miller, Automatica, 2009, 45(6): 1423-1430
The problem of access and service-rate control in queueing systems is considered as a general optimization problem for a controlled Markov process with finite state space. Using the dynamic programming approach, we obtain the explicit form of the optimal control when the cost to be minimized is a mixture of the average queue length, the number of lost jobs, and the service resources. The problem is considered on a finite time interval with a nonstationary input flow. For this case we suggest a general numerical solution procedure that can also be applied to problems with constraints.
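The paper derives the optimal control explicitly for a continuous-time, nonstationary model; as a hedged sketch, the backward dynamic programming recursion it relies on can be illustrated on a simplified discrete-time, stationary single-server queue (all parameters below are invented):

```python
import numpy as np

def queue_dp(N, T, arrival_p, rates, hold_c, rate_c, loss_c):
    """Finite-horizon dynamic programming for service-rate control of a
    discrete-time single-server queue holding at most N jobs.
    rates[a] is the service-completion probability under action a;
    per-step cost = hold_c * queue length + rate_c[a] (+ loss_c per lost job)."""
    V = np.zeros(N + 1)                      # terminal cost-to-go
    policy = np.zeros((T, N + 1), dtype=int)
    for t in reversed(range(T)):
        V_new = np.empty(N + 1)
        for s in range(N + 1):
            best = np.inf
            for a, mu in enumerate(rates):
                exp_next = 0.0
                for arr, p_arr in ((1, arrival_p), (0, 1.0 - arrival_p)):
                    for dep in (0, 1):
                        p_dep = mu if dep else 1.0 - mu
                        if s == 0:           # nothing to serve
                            p_dep = 0.0 if dep else 1.0
                        ns, extra = s - dep + arr, 0.0
                        if ns > N:           # arrival lost: queue is full
                            ns, extra = N, loss_c
                        if ns < 0:
                            ns = 0
                        exp_next += p_arr * p_dep * (V[ns] + extra)
                total = hold_c * s + rate_c[a] + exp_next
                if total < best:
                    best, policy[t, s] = total, a
            V_new[s] = best
        V = V_new
    return V, policy

# Slow (cheap) vs. fast (costly) service rate, over a 20-step horizon.
V, policy = queue_dp(5, 20, 0.4, [0.2, 0.7], 1.0, [0.0, 0.5], 10.0)
```

The resulting `policy[t, s]` is time-dependent, mirroring the finite-interval, nonstationary setting of the paper; allowing `arrival_p` to vary with `t` would capture the nonstationary input flow directly.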
18.
This paper applies the Dantzig-Wolfe decomposition technique to control systems governed by Markov chains, and the three usual types of costs: 1) the average cost attained until a target state is reached, 2) discounted cost, 3) average cost per unit time. Additional systems constraints are allowed. A technique for subdividing or "essentially" decomposing the problem is developed, and a Markov interpretation is given to each subsystem. The special significance, for this problem, of the extreme points and rays of the subproblem, is discussed.
19.
An algorithm which solves, numerically, the simultaneous stabilization problem using a constant-gain decentralized control law is presented. The algorithm is determined from the necessary conditions for minimizing an optimal control problem. The optimal control problem consists of n plant models, each with its own cost function. The costs are summed to create an average cost function, and equality constraints are added to yield the decentralized control structure. The algorithm can start with any stabilizing full state feedback gain for each model and will converge to the optimal constant feedback gain for all models, assuming a solution exists. Examples of the algorithm are given for Kharitonov synthesis and optimal gain-scheduled control law synthesis using output feedback.