Found 20 similar articles (search time: 15 ms)
1.
Antonio Pietrabissa 《International journal of control》2013,86(10):1814-1827
This article presents a connection admission control (CAC) algorithm for UMTS networks based on the Markov decision process (MDP) approach. Because the statistical characteristics of the offered traffic vary over time, the environment is non-stationary and the admission policy has to be recomputed periodically from online measurements; computing the optimal policy, however, is too time-consuming to be done online. The article therefore proposes a reduction of the policy space coupled with an aggregation of the state space for the fast computation of a sub-optimal admission policy. Theoretical results and numerical simulations show the effectiveness of the proposed approach.
2.
3.
Jatinder N. D. Gupta 《Computers & Operations Research》1978,5(4):243-250
A search algorithm, based on the concepts of lexicographic search and sequential decision processes, is proposed for the solution of the traveling salesman problem. Starting with an initial trial solution, the search algorithm sequentially generates better tours until an optimal (least-cost) tour is identified. The logical structure of the search algorithm is such that the computational effort required to solve a problem by the proposed approach is less than that required by branch-and-bound procedures.
4.
Li Xia 《Asian journal of control》2014,16(6):1735-1743
In the theory of event-based optimization (EBO), decision making is triggered by events, which differs from the traditional state-based control in Markov decision processes (MDPs). In this paper, we propose a policy gradient approach for EBO. First, an equation for the performance gradient in the event-based policy space is derived based on a fundamental quantity called the Q-factors of EBO. With the performance gradient, we can find a local optimum of EBO using a gradient-based algorithm. Compared to the policy iteration approach in EBO, this policy gradient approach does not require restrictive conditions and applies to a wider range of scenarios. The policy gradient approach is further implemented based on the online estimation of Q-factors. This implementation does not require prior information about the system parameters, such as the transition probabilities. Finally, we use an EBO model to formulate the admission control problem and demonstrate the main idea of this paper. Such an online algorithm provides an effective implementation of the EBO theory in practice.
5.
A unified approach to the asymptotic analysis of a Markov decision process disturbed by an ϵ-additive perturbation is proposed. Irrespective of whether the perturbation is regular or singular, the underlying control problem that needs to be understood is the limit Markov control problem. The properties of this problem are studied
6.
Antonio Pietrabissa 《International journal of systems science》2013,44(12):2085-2096
The admission control problem can be modelled as a Markov decision process (MDP) under the average cost criterion and formulated as a linear programming (LP) problem. The LP formulation is attractive for present and future communication networks, which support an increasing number of classes of service, since it can be used to explicitly control class-level requirements, such as class blocking probabilities. On the other hand, the LP formulation suffers from scalability problems as the number C of classes increases. This article proposes a new LP formulation which, although it does not introduce any approximation, is much more scalable: the problem size reduction with respect to the standard LP formulation is O((C+1)^2/2^C). Theoretical and numerical simulation results demonstrate the effectiveness of the proposed approach.
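To get a feel for the scalability claim, the short sketch below evaluates the growth term of the stated bound for a few class counts. It assumes the bound reads O((C+1)^2/2^C); the function name `reduction_term` is made up for illustration, and the constant hidden by the O-notation is ignored, so the printed values only indicate the trend, not actual problem sizes.

```python
def reduction_term(num_classes: int) -> float:
    """Growth term (C+1)^2 / 2^C of the size-reduction bound (constants ignored)."""
    return (num_classes + 1) ** 2 / 2 ** num_classes


# The term shrinks quickly once the exponential denominator dominates.
for c in (2, 5, 10, 20):
    print(f"C = {c:2d}: (C+1)^2 / 2^C = {reduction_term(c):.6f}")
```

The polynomial numerator dominates only for very small C; beyond that the exponential denominator takes over.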
7.
8.
In network service systems, satisfying quality of service (QoS) requirements is one of the main objectives. Admission control and resource allocation strategies can be used to guarantee the QoS requirements. Based on partially observable Markov decision processes (POMDPs), this paper proposes a novel admission control model for video on demand (VOD) service systems with elastic QoS. Elastic QoS is also considered in the resource allocation strategy. Policy gradient algorithms can often be used to solve POMDP problems with a satisfactory convergence rate. Numerical examples show that the proposed admission control strategy performs better than the complete admission control strategy.
9.
Decision processes with incomplete state feedback have traditionally been modelled as partially observable Markov decision processes. In this article, we present an alternative formulation based on probabilistic regular languages. The proposed approach generalises the recently reported work on language-measure-theoretic optimal control for perfectly observable situations and shows that such a framework is far more computationally tractable than the classical alternative. In particular, we show that the infinite horizon decision problem under partial observation, modelled in the proposed framework, is λ-approximable and, in general, is not harder to solve than the fully observable case. The approach is illustrated via two simple examples.
10.
Kan-Jian Zhang, Yan-Kai Xu, Xi Chen, Xi-Ren Cao 《Automatica》2008,44(4):1055-1061
It is well known that stochastic control systems can be viewed as Markov decision processes (MDPs) with continuous state spaces. In this paper, we propose to apply the policy iteration approach in MDPs to the optimal control problem of stochastic systems. We first provide an optimality equation based on performance potentials and develop a policy iteration procedure. Then we apply policy iteration to the jump linear quadratic problem and obtain the coupled Riccati equations for their optimal solutions. The approach is applicable to linear as well as nonlinear systems and can be implemented on-line on real world systems without identifying all the system structure and parameters.
11.
12.
A learning controller design method for two-wheel-driven mobile robots  Cited by: 1 (self-citations: 0, cited by others: 1)
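A reinforcement-learning-based path-following control method for two-wheel-driven mobile robots is proposed. The optimal design of the robot's motion controller is modeled as a Markov decision process, and the kernel-based least-squares policy iteration (KLSPI) algorithm is used to optimize the controller parameters through self-learning. Unlike traditional tabular and neural-network-based reinforcement learning methods, KLSPI applies kernel methods for feature selection and value function approximation in policy evaluation, which improves generalization performance and learning efficiency. Simulation results show that the method obtains an optimized path-following control policy within a small number of iterations, which favors its adoption in practical applications.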
13.
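This paper proposes an optimal equivalent hydrogen consumption control strategy for fuel cell vehicles based on a Markov decision process. The strategy relies on partial observations and is conditioned on a Markov transition probability matrix. It uses Metropolis-Hastings sampling, a Markov chain Monte Carlo (MCMC) method, to obtain the average reward output, and then optimizes a hydrogen consumption cost function to control the energy split between the hydrogen fuel cell system and the traction battery system. The strategy avoids the heavy dependence of current fuel cell vehicle control strategies on predictions of future power demand and on the accuracy of the prediction model. Building on models of the fuel cell vehicle powertrain, the fuel cell system, and the traction battery system, a control strategy is designed that comprises a self-learning system, an MH-sampling-based average reward filtering system, and a control selection output system. Simulation and experimental results demonstrate the effectiveness of the Markov-decision-based control strategy.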
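For readers unfamiliar with Metropolis-Hastings sampling, the sketch below shows the generic technique, not the control strategy of the paper: a random-walk sampler draws from a target density known only up to a constant, and the draws are averaged to estimate an expected reward. The target (a standard normal), the `reward` function, and all names and parameters are hypothetical choices made only for this illustration.

```python
import math
import random


def metropolis_hastings(log_density, start, n_samples, step=1.0, seed=0):
    """Random-walk Metropolis-Hastings sampler for a 1-D target.

    log_density: log of the (possibly unnormalized) target density.
    """
    rng = random.Random(seed)
    x = start
    log_p = log_density(x)
    samples = []
    for _ in range(n_samples):
        proposal = x + rng.gauss(0.0, step)  # symmetric Gaussian proposal
        log_p_new = log_density(proposal)
        # Accept with probability min(1, p(proposal) / p(x)).
        # 1.0 - rng.random() lies in (0, 1], so the log is always defined.
        if math.log(1.0 - rng.random()) < log_p_new - log_p:
            x, log_p = proposal, log_p_new
        samples.append(x)  # a rejected proposal repeats the current state
    return samples


def log_standard_normal(x):
    """Hypothetical target: standard normal, up to an additive constant."""
    return -0.5 * x * x


def reward(x):
    """Hypothetical reward, used only to show the averaging step."""
    return -x * x


samples = metropolis_hastings(log_standard_normal, start=0.0, n_samples=20000)
sample_mean = sum(samples) / len(samples)
average_reward = sum(reward(x) for x in samples) / len(samples)
print(f"sample mean ~ {sample_mean:.3f}, average reward ~ {average_reward:.3f}")
```

Because the proposal is symmetric, the acceptance ratio reduces to the ratio of target densities, which is why the normalizing constant of the target never needs to be known.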
14.
The current study examines the dynamic vehicle allocation problems of the automated material handling system (AMHS) in semiconductor manufacturing. With the uncertainty involved in wafer lot movement, dynamically allocating vehicles to each intrabay is very difficult. The cycle time and overall tool productivity of the wafer lots are affected when a vehicle takes too long to arrive. In the current study, a Markov decision model is developed to study the vehicle allocation control problem in the AMHS. The objective is to minimize the sum of the expected long-run average transport job waiting cost. An interesting exhaustive structure in the optimal vehicle allocation control is found in accordance with the Markov decision model. Based on this exhaustive structure, an efficient algorithm is then developed to solve the vehicle allocation control problem numerically. The performance of the proposed method is verified by a simulation study. Compared with other methods, the proposed method can significantly reduce the waiting cost of wafer lots for AMHS vehicle transportation.
15.
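For the optimal control problem of linear hybrid switching systems with diffusion terms, a Monte Carlo statistical prediction method is proposed to reduce the computational complexity of the optimization. First, numerical solution techniques convert the continuous-time optimal control problem into a discrete-time Markov decision process. Next, within several finite-state subspaces, a reflecting boundary technique is used to compute the optimal control policy of each subspace. Finally, based on the structural properties of the optimal control policy, a statistical prediction method predicts the optimal control policy over the whole state space. The method effectively reduces the computational complexity of optimal control for linear hybrid switching systems with large state spaces and multidimensional variables, and the simulation results at the end of the paper verify its effectiveness.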
16.
17.
New approaches are proposed to the solution of discrete programming problems on the basis of searching for a lexicographic vector ordering for which the optimal solution to a problem coincides with the lexicographic extremum of the set of feasible solutions of the problem or is located rather close to it in the lexicographic sense. A generalized scheme of this lexicographic search and possibilities of its modification are described. Considerable efficiency advantages of this approach over the standard lexicographic search algorithm are illustrated.
18.
A hybrid-mode admission control algorithm for WLANs  Cited by: 1 (self-citations: 0, cited by others: 1)
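To address the inability of EDCA in the IEEE 802.11e WLAN standard to provide quantitative quality of service (QoS), a hybrid-mode (model-based and measurement-based) admission control algorithm is proposed. A Markov model of the state transitions of each backoff instance is built, analytical expressions for the network performance metrics are derived using the Bessel reduction rule (贝塞尔削减规则), and the throughput available to a new traffic flow is predicted from the measured real-time channel conditions; on this basis, a throughput-based admission control algorithm is proposed. Experimental results show that the algorithm preserves the QoS of already admitted flows while admitting more new flows, thereby increasing network throughput.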
19.
Partially-observable stochastic hybrid systems (POSHSs) state estimation and optimal control
This paper discusses the state estimation and optimal control problem of a class of partially-observable stochastic hybrid systems (POSHS). The POSHS has interacting continuous and discrete dynamics with uncertainties. The continuous dynamics are given by a Markov-jump linear system and the discrete dynamics are defined by a Markov chain whose transition probabilities are dependent on the continuous state via guard conditions. The only information available to the controller is noisy measurements of the continuous state. To solve the optimal control problem, a separable control scheme is applied: the controller estimates the continuous and discrete states of the POSHS using noisy measurements and computes the optimal control input from the state estimates. Since computing both optimal state estimates and optimal control inputs is intractable, this paper proposes computationally efficient algorithms to solve this problem numerically. The proposed hybrid estimation algorithm is able to handle state-dependent Markov transitions and compute Gaussian-mixture distributions as the state estimates. With the computed state estimates, a reinforcement learning algorithm defined on a function space is proposed. This approach is based on Monte Carlo sampling and integration on a function space containing all the probability distributions of the hybrid state estimates. Finally, the proposed algorithm is tested via numerical simulations.
20.
M. G. Garcia-Hernandez, J. Ruiz-Pinales, E. Onaindia, A. Reyes-Ballesteros 《Applied Artificial Intelligence》2013,27(6):571-587
In this paper we tackle the sailing strategies problem, a stochastic shortest-path Markov decision process. The problem of solving large Markov decision processes accurately and quickly is challenging. Because the computational effort incurred is considerable, current research focuses on finding superior acceleration techniques. For instance, the convergence properties of current solution methods depend, to a great extent, on the order of backup operations. On the one hand, algorithms such as topological sorting are able to find good orderings, but their overhead is usually high. On the other hand, shortest-path methods, such as Dijkstra's algorithm, which is based on priority queues, have been applied successfully to the solution of deterministic shortest-path Markov decision processes. Here, we propose improved value iteration algorithms based on Dijkstra's algorithm for solving shortest-path Markov decision processes. The experimental results on a stochastic shortest-path problem show the feasibility of our approach.
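The deterministic case mentioned in the abstract can be sketched as follows. This is a minimal illustration of ordering backups with a priority queue (plain Dijkstra run backward from the goal), not the improved value iteration algorithms proposed by the authors. It assumes deterministic actions with non-negative costs; the toy graph and all names are hypothetical.

```python
import heapq


def dijkstra_value_iteration(costs, goal):
    """Optimal cost-to-go for a deterministic shortest-path MDP.

    costs[s][t] is the non-negative cost of the action that moves s -> t.
    Each state is finalized exactly once, in order of increasing cost-to-go.
    """
    # Reverse the edges so that values propagate backward from the goal.
    predecessors = {}
    for s, actions in costs.items():
        for t, c in actions.items():
            predecessors.setdefault(t, []).append((s, c))

    value = {goal: 0.0}
    finalized = set()
    queue = [(0.0, goal)]
    while queue:
        v, state = heapq.heappop(queue)
        if state in finalized:
            continue  # stale queue entry, a better value was already used
        finalized.add(state)
        for pred, c in predecessors.get(state, []):
            candidate = c + v  # Bellman backup through this single action
            if candidate < value.get(pred, float("inf")):
                value[pred] = candidate
                heapq.heappush(queue, (candidate, pred))
    return value


# Hypothetical four-state example with goal state "G".
toy_costs = {
    "A": {"B": 1.0, "C": 4.0},
    "B": {"C": 1.0, "G": 5.0},
    "C": {"G": 1.0},
}
values = dijkstra_value_iteration(toy_costs, "G")
print(values)
```

With stochastic transitions a state's value depends on several successors at once, so a single pass in priority order is no longer guaranteed to be exact; that gap is what motivates adapted orderings for the stochastic problem.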