Similar documents
 Found 17 similar documents (search time: 46 ms)
1.
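Iterative optimization algorithms for Markov control processes on compact action sets   Total citations: 5 (self-citations: 0, citations by others: 5)
This paper studies optimization algorithms for a class of continuous-time Markov control processes (CTMCPs) on compact action sets under the average-cost performance criterion. Based on the performance potential formula and the average-cost optimality equation for CTMCPs, a policy iteration algorithm and a value iteration algorithm are derived for computing optimal or suboptimal stationary control policies. Convergence of both algorithms is proved without assuming that the iteration operator is an sp-contraction. Finally, an example of a controlled queueing network illustrates the advantages of the approach.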

2.
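This paper studies optimal stationary policies for a class of Markov control processes with a countable state space under the infinite-horizon average-cost criterion. A discounted Poisson equation is introduced for such processes. Using basic properties of the infinitesimal matrix and of performance potentials, the optimality equation of the average-cost model on a compact action set is derived, and an existence theorem for its solution is proved.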

3.
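Average-cost optimal policies for Markov control processes based on performance potentials   Total citations: 3 (self-citations: 1, citations by others: 2)
This paper studies optimal control decisions under the average-cost performance criterion for a class of discrete-time Markov control processes. Using basic properties of Markov performance potentials, the optimality equation of the infinite-horizon average-cost model on a compact action set, together with an existence theorem for its solution, is derived directly under quite general assumptions. An iterative algorithm for computing an optimal stationary control policy is proposed, and its convergence is discussed. Finally, a worked example illustrates the application of the algorithm.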
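
As a rough illustration of the kind of iteration this abstract describes, the following is a minimal sketch of textbook average-cost policy iteration for a small discrete-time chain. It is not the paper's algorithm: it assumes a finite action set (the paper treats compact action sets), assumes every stationary policy yields a single recurrent class, and fixes the potential by g(0) = 0. The names `solve_linear`, `evaluate`, `policy_iteration`, `P`, and `cost` are hypothetical.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            k = m[r][col] / m[col][col]
            for j in range(col, n + 1):
                m[r][j] -= k * m[col][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][j] * x[j] for j in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def evaluate(policy, P, cost):
    """Solve the Poisson equation g(i) + eta = f(i) + sum_j P_ij g(j), with g(0) = 0."""
    n = len(policy)
    a, b = [], []
    for i in range(n):
        row_p = P[i][policy[i]]
        # Unknown 0 is the average cost eta; unknown j >= 1 is the potential g(j).
        a.append([1.0] + [(1.0 if i == j else 0.0) - row_p[j] for j in range(1, n)])
        b.append(cost[i][policy[i]])
    sol = solve_linear(a, b)
    return sol[0], [0.0] + sol[1:]


def policy_iteration(P, cost):
    """P[i][a] is the transition row and cost[i][a] the one-step cost (assumed layout)."""
    n = len(P)
    policy = [0] * n
    while True:
        eta, g = evaluate(policy, P, cost)
        new_policy = []
        for i in range(n):
            def q(act):
                return cost[i][act] + sum(p * gj for p, gj in zip(P[i][act], g))
            best = policy[i]  # keep the current action unless another is strictly better
            for act in range(len(P[i])):
                if q(act) < q(best) - 1e-12:
                    best = act
            new_policy.append(best)
        if new_policy == policy:
            return policy, eta, g
        policy = new_policy


# Illustrative two-state, two-action data (made up for this sketch).
P = [[[0.9, 0.1], [0.5, 0.5]],
     [[0.1, 0.9], [0.6, 0.4]]]
cost = [[1.0, 2.0], [5.0, 3.0]]
policy, eta, g = policy_iteration(P, cost)
```

Keeping the current action on ties is a common safeguard against cycling between equally good policies.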

4.
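Optimal stationary policies for semi-Markov control processes under the discounted-cost criterion   Total citations: 1 (self-citations: 1, citations by others: 0)
This paper discusses discounted-cost performance optimization for a class of semi-Markov control processes (SMCPs). By introducing a matrix that can serve as the infinitesimal matrix of a Markov process, a discounted Poisson equation is defined for an SMCP, and the α-potential is defined through this equation. Based on the α-potential, the optimality equation satisfied by an optimal stationary policy is given. Finally, an iterative algorithm for computing an optimal stationary policy is presented, along with a numerical example showing its application.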

5.
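Combining Markov decision processes with performance potentials, this paper gives a policy optimization algorithm for call admission control. The resulting optimal policy is state-dependent. Compared with policies that choose actions based on the bandwidth already occupied at a node, the state-dependent policy achieves better performance values, and the algorithm converges quickly.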

6.
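The escape-time algorithm is the most commonly used algorithm for generating Julia sets. This paper discusses the case in which the iteration function is the nonlinear complex map $f(z) = z^m + c$. First, the steps of the algorithm are given according to the basic principle of the escape-time algorithm. Then the iteration function $f(z) = z^m + c$ is studied in detail, so that the values of the two control variables $R_{\max}$ and $B$ can be chosen reasonably ($R_{\max}$: the threshold for deciding whether $\{f^n(z_0)\}_{n=1}^{\infty}$ is bounded; $B$: the range of the initial iteration point $z_0$). This greatly reduces the number of iterations and thereby improves the computational efficiency of the algorithm.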
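
The escape-time idea summarized above can be sketched as follows. This is an illustrative sketch, not the paper's procedure: the paper's choice of the threshold and of the sampling range is not given in the abstract, so the code assumes the standard sufficient bound R = max(|c|, 2^(1/(m-1))) for m >= 2, beyond which the orbit of z^m + c must diverge. The names `escape_radius` and `escape_time` are hypothetical.

```python
def escape_radius(m, c):
    """A standard sufficient escape bound for f(z) = z**m + c with m >= 2 (assumed here).

    If |z| > max(|c|, 2**(1/(m-1))), then |z**m + c| > |z| and the orbit diverges.
    """
    return max(abs(c), 2.0 ** (1.0 / (m - 1)))


def escape_time(z0, c, m=2, max_iter=100):
    """Return the first iteration index at which the orbit of z0 leaves the disc
    of radius escape_radius(m, c), or max_iter if it never does within the budget."""
    r = escape_radius(m, c)
    z = z0
    for n in range(max_iter):
        if abs(z) > r:
            return n
        z = z ** m + c
    return max_iter
```

Points whose escape time equals `max_iter` are treated as belonging to the filled Julia set. Sampling `z0` over a square of half-width `escape_radius(m, c)` is enough, since every point outside that disc escapes immediately.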

7.
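Database as a service (DaaS) is widely used as a new model for provisioning data storage. With the arrival of the big-data era, data volumes are growing sharply and the data placement problem in the DaaS model becomes more important: how a service provider places data according to the performance requirements of the different data in an application has a significant impact on improving service quality, enhancing user experience, and lowering the provider's own service cost. For the provider, however, improving service quality and reducing service cost are conflicting goals. This paper proposes the concept of a data placement graph for the DaaS model and, exploiting the suitability of Pareto optimality for problems with conflicting objectives, gives a multi-node DaaS data placement strategy based on a performance-cost trade-off. Experimental comparison with traditional strategies such as random and greedy placement shows that the method lets a DaaS provider deliver better service quality at as low a cost as possible, balancing the two objectives of service quality and resource cost.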
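
To make the Pareto idea in this abstract concrete, the sketch below filters a list of candidate placements down to the non-dominated ones. It only illustrates Pareto dominance and is not the paper's placement strategy. The tuple layout `(name, cost, latency)`, with both values to be minimized, and the function name `pareto_front` are assumptions made for the example.

```python
def pareto_front(layouts):
    """Return the names of non-dominated layouts.

    Each layout is a (name, cost, latency) tuple, and both cost and latency
    are to be minimized (an assumed encoding of "resource cost" and
    "service quality"). A layout is dominated if another one is no worse
    on both objectives and strictly better on at least one.
    """
    front = []
    for name, cost, latency in layouts:
        dominated = any(
            c <= cost and l <= latency and (c < cost or l < latency)
            for _, c, l in layouts
        )
        if not dominated:
            front.append(name)
    return front


# Made-up candidates: C is dominated by B (cheaper and faster).
candidates = [("A", 1, 9), ("B", 3, 4), ("C", 4, 5), ("D", 6, 2)]
front = pareto_front(candidates)
```

A provider would still have to pick one point from the front, for instance by a cost budget. That choice is outside this sketch.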

8.
程奇峰 (Cheng Qifeng), 马奥运 (Ma Aoyun). 《控制与决策》 (Control and Decision), 2016, 31(10): 1884-1888

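For linear time-varying systems subject to bounded state disturbances, a new time-optimal model predictive control algorithm is proposed. Offline, suboptimal polyhedral N-step reachable sets are determined by solving a series of linear optimization problems. Online, the input computed by optimization over these reachable sets drives the system state into the stable region as quickly as possible. The offline method for computing polyhedral reachable sets can handle asymmetric constraints. Compared with earlier methods, it avoids the possible exponential growth in the number of vertices as N increases and dispenses with many complicated operations between polyhedra, which makes it convenient to apply in practice.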


9.
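A mathematical model for pattern mining is proposed for time series of sub-itemsets. The model computes and updates the average frequency of each sub-item and, with the pattern observation time threshold as its period, computes the Pearson correlation between the current real-time frequency vector and the real-time frequency vectors already in the pattern set. A large correlation coefficient indicates that the current pattern already exists in the pattern set. A small one indicates that it is a new pattern, which is then added to the set. This process runs continuously until the pattern set stabilizes. The paper also examines the ordering relations between patterns, that is, patterns among patterns. By setting up a window register and incrementing the count at the corresponding position of the pattern sequence matrix, the model can compute the support and confidence of the order between any two patterns. The model thus focuses on extracting patterns of sub-itemsets and patterns among those patterns. In addition, by adjusting the observation time threshold, the model can also extract patterns within sub-itemset patterns. Experiments on simulated sub-itemset sequences demonstrate the validity and generality of the theoretical model. In practice, the model was applied to Web security: examination and testing on the Sina portal website showed it to be highly effective in defending against Web anomalies.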
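
The pattern-set update described in this abstract can be sketched as below. This is a simplified illustration under stated assumptions, not the paper's model: the threshold value 0.9, the rule "first pattern whose correlation reaches the threshold", and the names `pearson` and `update_patterns` are all hypothetical, and the averaging of frequencies and the pattern sequence matrix are omitted.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences.

    Returns 0.0 when either sequence is constant, where the coefficient
    is undefined (a convention chosen for this sketch).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def update_patterns(patterns, freq_vector, threshold=0.9):
    """Match freq_vector against the pattern set, adding it if it is new.

    Returns the index of the matching (or newly added) pattern.
    """
    for index, pattern in enumerate(patterns):
        if pearson(pattern, freq_vector) >= threshold:
            return index  # strongly correlated: the pattern already exists
    patterns.append(list(freq_vector))  # weakly correlated with all: a new pattern
    return len(patterns) - 1
```

Calling `update_patterns` once per observation period and stopping when the pattern set no longer grows mirrors the "run until the pattern set stabilizes" step in the abstract.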

10.
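A triangulation method for scattered point sets on arbitrary planar domains   Total citations: 20 (self-citations: 0, citations by others: 20)
This paper presents a fast and effective triangulation algorithm that triangulates scattered data on an arbitrary planar domain. The generated mesh satisfies the Delaunay criterion, mesh optimization is carried out during mesh generation, and the complexity of the algorithm is approximately linear in the number of points. Applied to petroleum geological exploration, the algorithm successfully solves the triangulation of large-scale point data containing complex faults.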

11.
We propose a time aggregation approach for the solution of infinite horizon average cost Markov decision processes via policy iteration. In this approach, policy update is only carried out when the process visits a subset of the state space. As in state aggregation, this approach leads to a reduced state space, which may lead to a substantial reduction in computational and storage requirements, especially for problems with certain structural properties. However, in contrast to state aggregation, which generally results in an approximate model due to the loss of Markov property, time aggregation suffers no loss of accuracy, because the Markov property is preserved. Single sample path-based estimation algorithms are developed that allow the time aggregation approach to be implemented on-line for practical systems. Some numerical and simulation examples are presented to illustrate the ideas and potential computational savings.

12.
    
Real-time control of nonlinear systems is challenging. Decision making, performed many times per second, must ensure system safety. Designing an input to perform a task often involves solving a nonlinear system of differential equations, which is a computationally intensive, if not intractable, problem. This article proposes sampling-based task learning for control-affine nonlinear systems through the combined learning of both state-value and action-value functions in a model-free approximate value iteration setting with continuous inputs. A quadratic negative definite state-value function implies the existence of a unique maximum of the action-value function at any state. This allows the replacement of the standard greedy policy with a computationally efficient policy approximation that guarantees progression to a goal state without knowledge of the system dynamics. The policy approximation is consistent, i.e., it does not depend on the action samples used to calculate it. This method is appropriate for mechanical systems with high-dimensional input spaces and unknown dynamics performing Constraint-Balancing Tasks. We verify it both in simulation and experimentally for an unmanned aerial vehicle (UAV) carrying a suspended load, and in simulation for the rendezvous of heterogeneous robots.

13.
The solution of Markov Decision Processes (MDPs) often relies on special properties of the processes. For two-level MDPs, the difference in the rates of state changes of the upper and lower levels has led to limiting or approximate solutions of such problems. In this paper, we solve a two-level MDP without making any assumption on the rates of state changes of the two levels. We first show that such a two-level MDP is a non-standard one where the optimal actions of different states can be related to each other. Then we give assumptions (conditions) under which such a specially constrained MDP can be solved by policy iteration. We further show that the computational effort can be reduced by decomposing the MDP. A two-level MDP with M upper-level states can be decomposed into one MDP for the upper level and M to M(M-1) MDPs for the lower level, depending on the structure of the two-level MDP. The upper-level MDP is solved by time aggregation, a technique introduced in a recent paper [Cao, X.-R., Ren, Z. Y., Bhatnagar, S., Fu, M., & Marcus, S. (2002). A time aggregation approach to Markov decision processes. Automatica, 38(6), 929-943.], and the lower-level MDPs are solved by embedded Markov chains.

14.
15.
We present a Reinforcement Learning (RL) algorithm based on policy iteration for solving average reward Markov and semi-Markov decision problems. In the literature on discounted reward RL, algorithms based on policy iteration and actor-critic algorithms have appeared. Our algorithm is an asynchronous, model-free algorithm (which can be used on large-scale problems) that hinges on the idea of computing the value function of a given policy and searching over policy space. In the applied operations research community, RL has been used to derive good solutions to problems previously considered intractable. Hence in this paper, we have tested the proposed algorithm on a commercially significant case study related to a real-world problem from the airline industry. It focuses on yield management, which has been hailed as the key factor for generating profits in the airline industry. In the experiments conducted, we use our algorithm with a nearest-neighbor approach to tackle a large state space. We also present a convergence analysis of the algorithm via an ordinary differential equation method.

16.
Conceptual design is an important wealth-creating activity for companies and infrastructure. However, the design process is very complex, and the information available at the conceptual stage is incomplete, imprecise, and fuzzy. Fuzzy set theory is therefore well suited to handling linguistic assessments at this stage. This paper presents an integrated fuzzy approach to assessing the performance of design concepts. Criteria ratings, relative weights, and performance levels are captured by fuzzy numbers, and the overall performance of each alternative is calculated through an enhanced fuzzy weighted average (FWA) approach. A practical numerical example demonstrates the usefulness of the approach. In addition, to make computing and ranking the results easier and to increase recruiting productivity, the paper develops a computer-based decision support system that helps make decisions more efficiently.

17.
The paper considers the problem of finding an optimal control region for a pursuer. This region guarantees that the pursuer wins the game starting from some point that belongs to a prescribed set.
