期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

A policy approximation method for the UMTS connection admission control problem modelled as an MDP

Antonio Pietrabissa 《International journal of control》2013,86(10):1814-1827

This article presents a connection admission control (CAC) algorithm for UMTS networks based on the Markov decision process (MDP) approach. To deal with the non-stationary environment due to the time-varying statistical characteristics of the offered traffic, the admission policy has to be computed periodically based on on-line measurements, and the optimal policy computation is excessively time-consuming to be performed on-line. Thus, this article proposes a reduction of the policy space coupled with an aggregation of the state space for the fast computation of a sub-optimal admission policy. Theoretical results and numerical simulations show the effectiveness of the proposed approach. 相似文献

2.

基于观测的POMDP 优化算法及其仿真

黄静殷保群李俊《信息与控制》2008,37(3):1-1

在分析马尔可夫决策过程（Markov Decision Process, MDP）性能灵敏度的基础上,讨论了部分可观测马尔可夫决策过程（Partially Observable Markov Decision Process, POMDP）的性能优化问题．给出了POMDP 性能灵敏度分析公式,并以此为基础提出了两种基于观测的POMDP 优化算法：策略梯度优化算法和策略迭代优化算法．最后以准许控制问题为仿真实例,验证了这两个算法的有效性．相似文献

3.

A search algorithm for the traveling salesman problem

Jatinder N. D. Gupta 《Computers & Operations Research》1978,5(4):243-250

A search algorithm, based on the concepts of lexicographic search and sequential decision processes, is proposed for the solution of the traveling salesman problem. Starting with an initial trial solution, the search algorithm sequentially generates better tours until an optimal (least cost) tour is identified. The logical structure of the search algorithm is such that the computational effort required to solve a problem by the proposed approach is less than that by the branch and bound procedures. 相似文献

4.

Policy Gradient Approach of Event‐Based Optimization and Its Online Implementation

Li Xia 《Asian journal of control》2014,16(6):1735-1743

In the theory of event‐based optimization (EBO), the decision making is triggered by events, which is different from the traditional state‐based control in Markov decision processes (MDP). In this paper, we propose a policy gradient approach of EBO. First, an equation of performance gradient in the event‐based policy space is derived based on a fundamental quantity called Q‐factors of EBO. With the performance gradient, we can find the local optimum of EBO using the gradient‐based algorithm. Compared to the policy iteration approach in EBO, this policy gradient approach does not require restrictive conditions and it has a wider application scenario. The policy gradient approach is further implemented based on the online estimation of Q‐factors. This approach does not require the prior information about the system parameters, such as the transition probability. Finally, we use an EBO model to formulate the admission control problem and demonstrate the main idea of this paper. Such online algorithm provides an effective implementation of the EBO theory in practice. 相似文献

5.

Perturbation and stability theory for Markov control problems

Abbad M. Filar J.A. 《Automatic Control, IEEE Transactions on》1992,37(9):1415-1420

A unified approach to the asymptotic analysis of a Markov decision process disturbed by an ϵ-additive perturbation is proposed. Irrespective of whether the perturbation is regular or singular, the underlying control problem that needs to be understood is the limit Markov control problem. The properties of this problem are studied 相似文献

6.

A new LP formulation of the admission control problem modelled as an MDP under average reward criterion

Antonio Pietrabissa 《International journal of systems science》2013,44(12):2085-2096

The admission control problem can be modelled as a Markov decision process (MDP) under the average cost criterion and formulated as a linear programming (LP) problem. The LP formulation is attractive in the present and future communication networks, which support an increasing number of classes of service, since it can be used to explicitly control class-level requirements, such as class blocking probabilities. On the other hand, the LP formulation suffers from scalability problems as the number C of classes increases. This article proposes a new LP formulation, which, even if it does not introduce any approximation, is much more scalable: the problem size reduction with respect to the standard LP formulation is O((C?+?1)²/2 ^C ). Theoretical and numerical simulation results prove the effectiveness of the proposed approach. 相似文献

7.

一种可信的自适应服务组合机制 总被引：7，自引：0，他引：7

郭慧鹏怀进鹏邓婷李扬《计算机学报》2008,31(8)

提出一种可信的自适应服务组合机制.首先,将组合服务的可信性保证问题转换为自适应控制问题,可信性保证策略作为可调节控制器,组合服务作为被控对象,并设计了相应的系统结构;其次,在马尔可夫决策过程框架下建模和优化组合服务的可信维护过程和策略,并设计了相应的算法,实现了基于强化学习的直接自适应控制机制;最后,通过仿真实验,将组合服务的自适应维护与随机维护策略比较,表明组合服务的自适应维护具有明显的优越性. 相似文献

8.

Admission control with elastic QoS for video on demand systems

Fu-Shou Lin Bao-Qun Yin Jing Huang Xu-Min Wu 《国际自动化与计算杂志》2012,9(5):467-473

In network service systems, satisfying quality of service (QoS) is one of the main objectives. Admission control and resource allocation strategy can be used to guarantee the QoS requirement. Based on partially observable Markov decision processes (POMDPs), this paper proposes a novel admission control model for video on demand (VOD) service systems with elastic QoS. Elastic QoS is also considered in resource allocation strategy. Policy gradient algorithm is often available to find the solution of POMDP problems, with a satisfactory convergence rate. Through numerical examples, it can be shown that the proposed admission control strategy has better performance than complete admission control strategy. 相似文献

9.

Optimal control of infinite horizon partially observable decision processes modelled as generators of probabilistic regular languages

Ishanu Chattopadhyay Asok Ray 《International journal of control》2013,86(3):457-483

Decision processes with incomplete state feedback have been traditionally modelled as partially observable Markov decision processes. In this article, we present an alternative formulation based on probabilistic regular languages. The proposed approach generalises the recently reported work on language measure theoretic optimal control for perfectly observable situations and shows that such a framework is far more computationally tractable to the classical alternative. In particular, we show that the infinite horizon decision problem under partial observation, modelled in the proposed framework, is λ-approximable and, in general, is not harder to solve compared to the fully observable case. The approach is illustrated via two simple examples. 相似文献

10.

Policy iteration based feedback control

Kan-Jian Zhang Author Vitae Yan-Kai Xu Author Vitae Xi Chen Author Vitae Xi-Ren Cao Author Vitae 《Automatica》2008,44(4):1055-1061

It is well known that stochastic control systems can be viewed as Markov decision processes (MDPs) with continuous state spaces. In this paper, we propose to apply the policy iteration approach in MDPs to the optimal control problem of stochastic systems. We first provide an optimality equation based on performance potentials and develop a policy iteration procedure. Then we apply policy iteration to the jump linear quadratic problem and obtain the coupled Riccati equations for their optimal solutions. The approach is applicable to linear as well as nonlinear systems and can be implemented on-line on real world systems without identifying all the system structure and parameters. 相似文献

11.

基于MDP支持弹性服务质量的接入控制

林福寿殷保群黄静巫旭敏《微计算机应用》2012,1(2):16-21

在网络服务系统中,满足业务请求的服务质量需求是系统要解决的主要问题之一。接入控制方法和资源分配策略常用来保证业务的服务质量要求。本文基于Markov决策过程(MDP)对视频点播(VOD)系统进行建模,同时考虑了弹性服务质量这一机制。弹性服务质量可以用一个QoS的需求范围来体现。策略梯度算法常用来解决MDP问题,它能够以比较好的速度收敛到最优解。通过算法实例对本文的接入控制方法进行性能分析,发现所采用的方法相对于一般的完全接入策略具有较优的性能。相似文献

12.

双轮驱动移动机器人的学习控制器设计方法* 总被引：1，自引：0，他引：1

张洪宇徐昕张鹏程刘春明宋金泽《计算机应用研究》2009,26(6):2310-2313

提出一种基于增强学习的双轮驱动移动机器人路径跟随控制方法,通过将机器人运动控制器的优化设计问题建模为Markov决策过程,采用基于核的最小二乘策略迭代算法(KLSPI)实现控制器参数的自学习优化。与传统表格型和基于神经网络的增强学习方法不同,KLSPI算法在策略评价中应用核方法进行特征选择和值函数逼近,从而提高了泛化性能和学习效率。仿真结果表明,该方法通过较少次数的迭代就可以获得优化的路径跟随控制策略,有利于在实际应用中的推广。相似文献

13.

燃料电池汽车最优氢耗马尔科夫决策控制

付江涛付主木宋书中《控制理论与应用》2021,38(8):1219-1228

本文基于马尔科夫决策过程提出一种燃料电池汽车最优等效氢燃料消耗控制策略.控制策略以部分观测量为基础,以马尔科夫转移概率矩阵为条件,采用基于蒙特卡洛马尔科夫(MCMC)算法的Metropolis-Hastings采样方法,获得平均奖励输出,进而通过最优氢燃料消耗代价函数的优化以控制在氢燃料电池系统和动力电池系统间进行能量分配.该策略避免了目前燃料电池汽车控制策略过度依赖未来需求功率的预测以及预测模型的准确性.在建立燃料电池汽车动力模型,燃料电池系统和动力电池系统模型的基础上,进行了包含自学习系统、基于MH采样的平均奖励过滤系统以及控制选择输出系统的控制策略设计.通过仿真和实验结果表明基于马尔科夫决策控制策略的有效性. 相似文献

14.

Dynamic vehicle allocation control for automated material handling system in semiconductor manufacturing

James T. Lin Cheng-Hung Wu Chih-Wei Huang 《Computers & Operations Research》2013

The current study examines the dynamic vehicle allocation problems of the automated material handling system (AMHS) in semiconductor manufacturing. With the uncertainty involved in wafer lot movement, dynamically allocating vehicles to each intrabay is very difficult. The cycle time and overall tool productivity of the wafer lots are affected when a vehicle takes too long to arrive. In the current study, a Markov decision model is developed to study the vehicle allocation control problem in the AMHS. The objective is to minimize the sum of the expected long-run average transport job waiting cost. An interesting exhaustive structure in the optimal vehicle allocation control is found in accordance with the Markov decision model. Based on this exhaustive structure, an efficient algorithm is then developed to solve the vehicle allocation control problem numerically. The performance of the proposed method is verified by a simulation study. Compared with other methods, the proposed method can significantly reduce the waiting cost of wafer lots for AMHS vehicle transportation. 相似文献

15.

线性混杂系统优化控制的Monte Carlo统计预测方法

宋春跃 WANG Hui 李平《自动化学报》2008,34(8):1028-1032

针对含扩散项的线性混杂切换系统优化控制问题, 为降低优化求解的计算复杂性, 提出了Monte Carlo统计预测方法. 首先通过数值求解技术把连续时间优化控制问题转化为离散时间的Markov决策过程问题; 然后在若干有限状态子空间内, 利用反射边界技术来求解相应子空间的最优控制策略; 最后根据最优控制策略的结构特性, 采用统计预测方法来预测出整个状态空间的最优控制策略. 该方法能有效降低求解涉及大状态空间及多维变量的线性混杂切换系统优化控制的计算复杂性, 文末的仿真结果验证了方法的有效性. 相似文献

16.

基于模型深度强化学习的数据中心主动地板控制

温建伟张立段彦夺李雷孝《控制理论与应用》2022,39(6):1051-1056

如何消除数据中心的局部热点是困扰数据中心行业的关键问题之一.本文采用主动地板(AVT)来抑制局部机架热点现象,并将数据中心AVT控制问题抽象为马尔可夫决策过程,设计了基于深度强化学习的主动地板最优控制策略.该策略基于模型深度强化学习方法,克服了传统无模型深度强化学习方法采样效率低的缺陷.大量仿真实验结果表明,与经典无模型(PPO)方法相比,所提出的方法可迅速收敛到最优控制策略,并可以有效抑制机架热点现象. 相似文献

17.

New Approaches to Solving Discrete Programming Problems on the Basis of Lexicographic Search

S.?V.?Chupov Email author 《Cybernetics and Systems Analysis》2016,52(4):536-545

New approaches are proposed to the solution of discrete programming problems on the basis of searching for lexicographic vector ordering for which the optimal solution to a problem coincides with the lexicographic extremum of the set of feasible solutions of the problem or is located rather close to it in the lexicographic sense. A generalized scheme of this lexicographic search and possibilities of its modification are described. Considerable advantages of this approach over the standard lexicographic search algorithm in efficiency are illustrated. 相似文献

18.

WLAN中基于混合模式的接纳控制算法 总被引：1，自引：0，他引：1

孟曼刘宴兵《计算机应用》2010,30(6):1451-1454

针对WLAN标准IEEE 802.11e中 EDCA不能提供定量的服务质量(QoS)的问题,提出了一种混合模式(基于模型和测量)的接纳控制算法。通过建立退避实例的各个状态转移的Markov模型,利用贝塞尔削减规则得出网络性能指标的解析表达式,并根据测量的信道的实时状况预测新业务流可获得的吞吐量,最后提出了一种基于吞吐量的接纳控制算法。实验结果表明,该算法保证了已经接入业务流的服务质量,同时接纳了更多的新业务流,提高了网络的吞吐量。相似文献

19.

Partially‐observable stochastic hybrid systems (poshss) state estimation and optimal control

下载免费PDF全文

Weiyi Liu Inseok Hwang 《Asian journal of control》2015,17(6):2072-2082

This paper discusses the state estimation and optimal control problem of a class of partially‐observable stochastic hybrid systems (POSHS). The POSHS has interacting continuous and discrete dynamics with uncertainties. The continuous dynamics are given by a Markov‐jump linear system and the discrete dynamics are defined by a Markov chain whose transition probabilities are dependent on the continuous state via guard conditions. The only information available to the controller are noisy measurements of the continuous state. To solve the optimal control problem, a separable control scheme is applied: the controller estimates the continuous and discrete states of the POSHS using noisy measurements and computes the optimal control input from the state estimates. Since computing both optimal state estimates and optimal control inputs are intractable, this paper proposes computationally efficient algorithms to solve this problem numerically. The proposed hybrid estimation algorithm is able to handle state‐dependent Markov transitions and compute Gaussian‐ mixture distributions as the state estimates. With the computed state estimates, a reinforcement learning algorithm defined on a function space is proposed. This approach is based on Monte Carlo sampling and integration on a function space containing all the probability distributions of the hybrid state estimates. Finally, the proposed algorithm is tested via numerical simulations. 相似文献

20.

SOLVING THE SAILING PROBLEM WITH A NEW PRIORITIZED VALUE ITERATION

M. G. Garcia-Hernandez J. Ruiz-Pinales E. Onaindia A. Reyes-Ballesteros 《Applied Artificial Intelligence》2013,27(6):571-587

In this paper we tackle the sailing strategies problem, a stochastic shortest-path Markov decision process. The problem of solving large Markov decision processes accurately and quickly is challenging. Because the computational effort incurred is considerable, current research focuses on finding superior acceleration techniques. For instance, the convergence properties of current solution methods depend, to a great extent, on the order of backup operations. On one hand, algorithms such as topological sorting are able to find good orderings, but their overhead is usually high. On the other hand, shortest path methods, such as Dijkstra's algorithm, which is based on priority queues, have been applied successfully to the solution of deterministic shortest-path Markov decision processes. Here, we propose improved value iteration algorithms based on Dijkstra's algorithm for solving shortest path Markov decision processes. The experimental results on a stochastic shortest-path problem show the feasibility of our approach. 相似文献