11 results found (search time: 31 ms)
1.
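Research Progress in Hierarchical Reinforcement Learning* (total citations: 1; self-citations: 0; citations by others: 1)
The paper first introduces the theoretical foundations of hierarchical reinforcement learning: semi-Markov decision processes, hierarchy, and abstraction. It then compares four representative learning methods (HAM, options, MAXQ, and HEXQ) in some detail and discusses the current state of research from several angles: extensions of the representative methods, learning the hierarchy, partially observable Markov decision processes, concurrency, and multi-agent cooperation. Finally, it points out future directions for hierarchical reinforcement learning.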
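As a rough illustration of how the SMDP view enters methods such as options, the sketch below shows one tabular SMDP Q-learning update, in which an option that ran for several steps is discounted by its duration. It is a minimal sketch, not code from the surveyed paper; the function name `smdp_q_update`, the nested-dict Q-table, and the default step size and discount are assumptions made for the example.

```python
def smdp_q_update(q, state, option, rewards, next_state, alpha=0.1, gamma=0.95):
    """Apply one SMDP Q-learning update for an option that has just terminated.

    q          -- nested dict: q[state][option] -> estimated value (hypothetical layout)
    rewards    -- per-step rewards collected while the option was running
    next_state -- state in which the option terminated
    """
    # Discounted return accumulated over the option's execution.
    discounted_return = sum(gamma ** k * r for k, r in enumerate(rewards))
    duration = len(rewards)

    # Best known value in the termination state (0.0 if the state is unseen).
    next_best = max(q[next_state].values()) if q.get(next_state) else 0.0

    # The bootstrap term is discounted by gamma ** duration, not by gamma alone.
    target = discounted_return + gamma ** duration * next_best

    old = q.setdefault(state, {}).get(option, 0.0)
    q[state][option] = old + alpha * (target - old)
    return q[state][option]
```

The only difference from one-step Q-learning is the variable duration: the accumulated reward and the exponent on `gamma` both depend on how long the option ran.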
2.
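Next-generation networks (NGN) will integrate multiple heterogeneous wireless access networks. To maximize network revenue subject to QoS constraints, and building on a study of equivalent bandwidth in WLAN/CDMA, the paper proposes an optimal joint call admission control (JCAC) scheme based on a semi-Markov decision process (SMDP). The scheme accounts for the mutual influence between the WLAN and CDMA networks and casts the joint call control problem for network connections as a semi-Markov decision process. Simulations show that the scheme has a clear advantage over a discrete-time MDP and over MDP-based JCAC algorithms.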
3.
4.
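The optimization of semi-Markov decision processes (SMDPs) is studied using an M-step look-ahead asynchronous policy iteration algorithm based on performance potentials. First, an M-step look-ahead policy iteration algorithm is derived from performance potential theory. The algorithm applies both to standard policy iteration and to general asynchronous policy iteration, and it treats SMDP optimization under the discounted and average criteria in a unified way. In addition, M-step look-ahead simulation-based policy iteration using temporal-difference learning is given for both performance criteria. Finally, a numerical example compares the characteristics of the algorithms.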
5.
6.
In heterogeneous wireless networks, both terminal heterogeneity and network heterogeneity give rise to the fairness problem of resource allocation. Because multi-mode terminals can exploit the resources of multiple networks, their behavior has a great effect on single-mode terminals, and this influence becomes more severe when the different demands of different traffic classes are considered. In this article, we propose a novel joint call admission control (JCAC) scheme to address this problem. The JCAC problem is modeled as a semi-Markov decision process (SMDP) with the aim of maximizing the average network revenue under the constraints of fairness among different terminals and traffic classes. Based on the SMDP, we design an algorithm to achieve a good tradeoff between revenue and fairness by dynamically adjusting the threshold of fairness constraints imposed on heterogeneous terminals. Simulation results show that the proposed scheme can significantly improve the fairness among heterogeneous terminals and guarantee the priority and fairness among different traffic classes with little loss of network revenue compared with other schemes.
7.
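The paper considers a stand-alone microgrid comprising photovoltaic (PV) generation, energy storage, and diesel generator sets, and studies the energy management optimization problem with the goal of improving the long-term operating economy of the microgrid. First, the stochastic dynamics of the system are modeled: given the random nature of PV generation and load variation, energy control of the microgrid is modeled as a semi-Markov decision process (SMDP). A stochastic dynamic programming algorithm is then used to solve for the optimal policy, yielding the optimal control actions for the diesel generator sets and the storage unit under different PV output powers, load demands, storage state-of-charge levels, and numbers of diesel generator sets in operation. Simulation results demonstrate the soundness of the stochastic model and the effectiveness of the optimization method.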
8.
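Admission control is a key technique for quality-of-service (QoS) guarantees in code division multiple access (CDMA) cellular networks. This paper proposes an optimal admission control policy based on semi-Markov decision process theory to support multi-class traffic with QoS requirements in wireless CDMA networks. The optimal policy is computed by linear programming, maximizing channel utilization while satisfying the QoS constraints. In addition, weighted fair blocking constraints are used to realize the QoS requirements flexibly. Numerical results show that the optimal policy achieves better performance than threshold-based admission control schemes.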
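For orientation, the textbook linear program for an average-reward SMDP with side constraints has roughly the following shape. This is a generic sketch, not the formulation from the paper above; the symbols are assumptions chosen for the example: x(s,a) is the long-run rate at which action a is chosen in state s, r(s,a) the expected reward per sojourn, τ(s,a) the expected sojourn time, p(j|s,a) the transition probability, and b_k(s,a), B_k a hypothetical k-th QoS cost and its bound.

```latex
\begin{aligned}
\max_{x \ge 0}\quad & \sum_{s,a} r(s,a)\, x(s,a) \\
\text{s.t.}\quad & \sum_{a} x(j,a) - \sum_{s,a} p(j \mid s,a)\, x(s,a) = 0 \quad \forall j, \\
& \sum_{s,a} \tau(s,a)\, x(s,a) = 1, \\
& \sum_{s,a} b_k(s,a)\, x(s,a) \le B_k \quad \forall k.
\end{aligned}
```

A (possibly randomized) stationary policy can then be read off as π(a|s) = x(s,a) / Σ_{a'} x(s,a') for every state with a positive denominator.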
9.
We consider the look-ahead control of a conveyor-serviced production station (CSPS) in the context of a semi-Markov decision process (SMDP) model, and our goal is to find an optimal control policy under either average- or discounted-cost criteria. Policy iteration (PI), combined with the concept of performance potential, can be applied to provide a unified optimisation framework for both criteria. However, a major difficulty arises in the exact solution scheme: it requires not only full knowledge of the model parameters, but also a considerable amount of work to obtain and process the necessary system and performance matrices. To overcome this difficulty, we propose a potential-based online PI algorithm in this article. During implementation, by analysing and utilising the historical information from the past operation of a practical CSPS system, the potentials and state-action values are learned online through an effective exploration scheme. We finally illustrate the successful application of this learning-based technique in CSPS systems by an example.
10.
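The convergence and cooperation of different radio technologies in heterogeneous wireless networks can provide terminals with high-quality network access anytime and anywhere, and joint call admission control is the mechanism suited to deciding call admission in such networks. The paper proposes an optimized joint call admission control mechanism for heterogeneous wireless networks. New calls and handoff calls are treated as the events that trigger admission control; the parameters affecting quality of service and the admission control overhead are modeled and analyzed as a network utility; an improved value iteration algorithm is used to reduce computational complexity; and the threshold function is partitioned into multiple regions to obtain the optimal decision policy. The proposed joint admission control policy provides optimized QoS guarantees, saves energy cost across the whole network, and effectively reduces the call blocking rate and the handoff dropping rate. Experimental results show that the algorithm solves the problem effectively and quickly, and has high value for wider application.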
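To make the idea of solving an admission control problem by value iteration more concrete, here is a minimal sketch on a toy model. It is plain value iteration, not the improved algorithm from the paper above, and the model is an assumption made for the example: a single link with `capacity` channels, Poisson arrivals per traffic class, exponential holding times with a common rate, a lump reward per admitted call, and continuous-time discounting handled by uniformization.

```python
def admission_value_iteration(capacity, arrival_rates, rewards, service_rate,
                              discount_rate, tol=1e-9, max_iter=100_000):
    """Value iteration for a toy discounted admission control problem.

    State n is the number of busy channels. Returns (values, policy), where
    policy[n][k] is True when a class-k arrival is admitted in state n.
    """
    # Uniformization: every state sees events at the same total rate.
    total_rate = sum(arrival_rates) + capacity * service_rate
    denom = discount_rate + total_rate
    values = [0.0] * (capacity + 1)

    for _ in range(max_iter):
        new_values = []
        for n in range(capacity + 1):
            total = 0.0
            # Arrival of each class: choose the better of accept and reject.
            for lam, reward in zip(arrival_rates, rewards):
                reject = values[n]
                accept = reward + values[n + 1] if n < capacity else reject
                total += lam * max(accept, reject)
            # Departure of one of the n calls in progress.
            if n > 0:
                total += n * service_rate * values[n - 1]
            # Fictitious self-transition introduced by uniformization.
            total += (capacity - n) * service_rate * values[n]
            new_values.append(total / denom)
        change = max(abs(a - b) for a, b in zip(new_values, values))
        values = new_values
        if change < tol:
            break

    # Greedy policy with respect to the converged values.
    policy = [[n < capacity and reward + values[n + 1] >= values[n]
               for reward in rewards]
              for n in range(capacity + 1)]
    return values, policy
```

Calling, for example, `admission_value_iteration(5, [1.0, 2.0], [5.0, 1.0], 1.0, 0.1)` returns one value per occupancy level and an accept/reject table per class. In a model of this kind the class with the largest reward is admitted whenever a channel is free, while lower-reward classes may be turned away as the link fills up.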