Similar Documents
 20 similar documents found
1.
Most studies of opportunistic spectrum access (OSA) under power constraints model the environment as a fully observable Markov decision process (MDP) in order to improve physical-layer or medium access control (MAC) layer metrics, but limitations of the sensing hardware mean that users cannot be guaranteed complete observation of the environment. To address this, a cross-layer OSA optimization scheme based on a partially observable Markov decision process (POMDP) and Sarsa(λ) is proposed. Combining the MAC and physical layers, the spectrum sensing and access process of a power-constrained secondary user with sensing errors is modeled as a POMDP, which is then converted into a belief-state MDP (BMDP) and solved with the Sarsa(λ) algorithm. Simulation results show that, under power constraints, the effective transmission capacity, throughput, and spectrum utilization of the Sarsa(λ)-BMDP scheme are about 9%, 7%, and 3% lower, respectively, than those of the fully observable Q-MDP scheme, while its bit error rate is about 20% lower than that of the point-based value iteration (PBVI-POMDP) scheme and about 16% higher than that of the Q-MDP scheme.
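
As a rough illustration of the last step described above (solving the belief-state MDP with Sarsa(λ)), the following is a minimal tabular Sarsa(λ) sketch over a discretized one-dimensional belief, for example the believed probability that the channel is idle. The environment interface, the number of belief bins, and the hyperparameters are assumptions of this sketch, not details taken from the paper.

```python
import numpy as np

def sarsa_lambda_belief_mdp(env, n_belief_bins=20, n_actions=2, episodes=500,
                            alpha=0.1, gamma=0.95, lam=0.8, epsilon=0.1):
    """Tabular Sarsa(lambda) over a discretized belief state.

    `env` is a hypothetical interface: reset() returns a belief in [0, 1] and
    step(action) returns (belief, reward, done).
    """
    Q = np.zeros((n_belief_bins, n_actions))

    def to_bin(b):
        return min(int(b * n_belief_bins), n_belief_bins - 1)

    def pick(s):
        if np.random.rand() < epsilon:
            return np.random.randint(n_actions)
        return int(Q[s].argmax())

    for _ in range(episodes):
        E = np.zeros_like(Q)              # eligibility traces
        s = to_bin(env.reset())
        a = pick(s)
        done = False
        while not done:
            belief, r, done = env.step(a)
            s2 = to_bin(belief)
            a2 = pick(s2)
            target = r if done else r + gamma * Q[s2, a2]
            delta = target - Q[s, a]
            E[s, a] += 1.0                # accumulating trace for the visited pair
            Q += alpha * delta * E        # update every traced state-action pair
            E *= gamma * lam              # decay all traces
            s, a = s2, a2
    return Q
```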

2.
黄永皓  陈曦 《控制与决策》2013,28(11):1643-1649

This paper studies the optimization of secondary users' probing and access policies for exploitable spectrum in opportunistic spectrum access. By introducing the concept of events, the original problem with a countably infinite state space is converted into a decision problem over a finite number of events. From the viewpoint of performance sensitivity, the difference in average transmission rate between policies is analyzed and a performance-difference formula for event-based policies is derived. On this basis, with a reasonable approximation, an event-based policy iteration algorithm is designed. A simulation example verifies the effectiveness of the proposed algorithm and the reasonableness of the approximation.


3.
This paper studies the channel probing problem in distributed HF (shortwave) opportunistic spectrum access systems. Because spectrum resources are scarce, applying cognitive radio techniques to HF communication has attracted wide attention. Multiple secondary users sense the licensed channels in sequence, decide from the sensing results whether each licensed channel is available, and transmit data using spectrum aggregation. The aggregation capability, however, is constrained by the radio hardware. This paper proposes a dynamic stopping method that accounts for the mutual influence among secondary users under such hardware constraints. In this method, the channel idle probabilities can change as probing proceeds, and secondary users can periodically release channels sensed in earlier slots. Simulation results show that the proposed dynamic stopping method effectively improves the network performance of the HF communication system.

4.
In cognitive networks, to overcome the generally low system throughput of existing spectrum access schemes, dynamic frequency hopping is used to propose a new spectrum access scheme that allows secondary users to switch smoothly to other channels while increasing system throughput. The problem is formulated as a partially observable Markov decision process, and numerical results are presented.

5.
A Q-learning based channel selection algorithm for opportunistic spectrum access
This paper studies the channel selection problem for opportunistic spectrum access in an unknown environment. Applying Q-learning from intelligent control, a channel selection model for secondary users is built and a Q-learning based channel selection algorithm is proposed. By continually interacting with and learning from the environment, the algorithm guides secondary users toward the channels with the largest cumulative return, maximizing secondary-user throughput. A Boltzmann learning rule is introduced to trade off channel exploration against exploitation. Simulation results show that, compared with random selection, the algorithm adaptively selects channels of better availability without prior knowledge of the channel environment or a prediction model, effectively improves secondary-user throughput, and converges quickly.
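
To make the learning loop in the abstract above concrete, the sketch below treats channel selection as a single-state Q-learning problem with a Boltzmann (softmax) exploration rule. It is an illustrative reading of the approach, not the authors' code; the simulated idle probabilities and the 0/1 reward definition are assumptions.

```python
import numpy as np

def q_learning_channel_selection(p_idle, steps=5000, alpha=0.1, gamma=0.9,
                                 tau=0.5, seed=0):
    """Q-learning channel selection with Boltzmann exploration.

    `p_idle[i]` is the idle probability of channel i; it is unknown to the
    learner and only used here to simulate sensing outcomes. Reward is 1 for
    a successful transmission on an idle channel, 0 otherwise (assumed design).
    """
    rng = np.random.default_rng(seed)
    n = len(p_idle)
    Q = np.zeros(n)                               # one Q-value per channel
    for _ in range(steps):
        # Boltzmann rule: higher-Q channels are chosen more often, but every
        # channel keeps a nonzero probability of being explored
        prefs = np.exp((Q - Q.max()) / tau)       # subtract max for numerical stability
        probs = prefs / prefs.sum()
        ch = rng.choice(n, p=probs)
        reward = 1.0 if rng.random() < p_idle[ch] else 0.0
        # single-state update; gamma * Q.max() bootstraps the cumulative return
        Q[ch] += alpha * (reward + gamma * Q.max() - Q[ch])
    return Q

# Example: the learned Q-values rank channels by availability.
# q = q_learning_channel_selection(p_idle=np.array([0.2, 0.8, 0.5]))
```

The temperature tau sets the exploration-exploitation balance: a large tau behaves like uniform random selection, a small tau like greedy selection.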

6.
Combining the Markov decision process with performance potentials, a policy optimization algorithm for call admission control is presented. The resulting optimal policy is state-dependent; compared with policies that choose actions based only on the bandwidth already occupied at a node, the state-dependent policy achieves better performance, and the algorithm converges quickly.

7.
Combining the Markov decision process with performance potentials, a policy optimization algorithm for call admission control is presented. The resulting optimal policy is state-dependent; compared with policies that choose actions based only on the bandwidth already occupied at a node, the state-dependent policy achieves better performance, and the algorithm converges quickly.

8.
An MIMD-based dynamic spectrum access scheme
To remedy the shortcomings of spectrum access schemes based on the backoff mechanism (BCM), a dynamic spectrum access scheme based on minimizing interference and maximizing demand (MIMD) is proposed, together with the corresponding MIMD dynamic spectrum access algorithm (MIMD-DSA). Cognitive users (CUs) select candidate bands by learning from their past access experience and switch to these bands via the MIMD-DSA algorithm when a primary user appears. Simulation and analysis show that, compared with the BCM scheme, the proposed scheme further improves spectrum utilization.

9.
Dynamic spectrum access based on cognitive radio (CR) has become a research focus in recent years, but most algorithms guide secondary users' spectrum access only by the current spectrum activity of the licensed users. This paper studies a dynamic spectrum access algorithm based on a fuzzy Markov chain prediction model, which combines the statistics of the idle durations of licensed bands with fuzzy theory to predict channel states; secondary users then select channels to access according to their own requirements and the predicted channel states. Simulation results verify...

10.
[Objective] In vehicular-network edge computing, allocating spectrum resources sensibly is important for improving the quality of vehicle communications. Spectrum scarcity is one of the main factors degrading communication quality, and the high mobility of vehicles together with the difficulty of collecting accurate channel state information at the base station makes spectrum allocation challenging. [Methods] To address these problems, taking the vehicle-to-vehicle (V2V) link transmission rate and vehicle-to-infrastructure (V2I) capacity as the optimization objectives, a multi-agent dynamic spectrum allocation scheme based on the proximal policy optimization (PPO) reinforcement learning algorithm is proposed. [Results] Multiple V2V links share the spectrum occupied by V2I links to relieve spectrum scarcity; the problem is formulated as a Markov decision process (MDP), with states, actions, and rewards designed to optimize the allocation policy. [Conclusions] Simulation results show that the proposed PPO-based scheme outperforms the baseline algorithms in channel transmission rate and in the success rate of vehicle message delivery.

11.
We propose a time aggregation approach for the solution of infinite horizon average cost Markov decision processes via policy iteration. In this approach, policy update is only carried out when the process visits a subset of the state space. As in state aggregation, this approach leads to a reduced state space, which may lead to a substantial reduction in computational and storage requirements, especially for problems with certain structural properties. However, in contrast to state aggregation, which generally results in an approximate model due to the loss of Markov property, time aggregation suffers no loss of accuracy, because the Markov property is preserved. Single sample path-based estimation algorithms are developed that allow the time aggregation approach to be implemented on-line for practical systems. Some numerical and simulation examples are presented to illustrate the ideas and potential computational savings.
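
To make the time-aggregation idea concrete, the following sketch computes the embedded (aggregated) chain on a subset S0 of states for a fixed policy: the stochastic complement gives the embedded transition matrix, and the ratio of expected per-segment reward to expected segment length recovers the original chain's average reward. This is a model-based illustration of the construction only; the paper's on-line, sample-path-based estimation is not reproduced here.

```python
import numpy as np

def time_aggregate(P, r, S0):
    """Embed a Markov reward chain on the subset S0 (time aggregation).

    P: (n, n) transition matrix, r: (n,) one-step rewards, S0: indices of the
    embedded state space. Returns the embedded transition matrix, the expected
    reward accumulated per segment, the expected segment length, and the
    long-run average reward, which equals that of the original chain.
    """
    n = P.shape[0]
    S0 = list(S0)
    S1 = [s for s in range(n) if s not in S0]
    P00, P01 = P[np.ix_(S0, S0)], P[np.ix_(S0, S1)]
    P10, P11 = P[np.ix_(S1, S0)], P[np.ix_(S1, S1)]
    N = np.linalg.inv(np.eye(len(S1)) - P11)       # expected visits inside S1
    P_emb = P00 + P01 @ N @ P10                    # stochastic complement on S0
    r_emb = r[S0] + P01 @ N @ r[S1]                # reward collected per segment
    tau = 1.0 + P01 @ N @ np.ones(len(S1))         # expected segment length
    # stationary distribution of the embedded chain (eigenvector for eigenvalue 1)
    w, v = np.linalg.eig(P_emb.T)
    pi = np.real(v[:, np.argmin(np.abs(w - 1))])
    pi /= pi.sum()
    avg_reward = (pi @ r_emb) / (pi @ tau)         # renewal-reward ratio
    return P_emb, r_emb, tau, avg_reward
```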

12.
    
This paper deals with Markov decision processes with a target set for nonpositive rewards. Two types of threshold probability criteria are discussed. The first criterion is a probability that a total reward is not greater than a given initial threshold value, and the second is a probability that the total reward is less than it. Our first (resp. second) optimizing problem is to minimize the first (resp. second) threshold probability. These problems suggest that the threshold value is a permissible level of the total reward to reach a goal (the target set), that is, we would reach this set over the level, if possible. For both problems, we show that 1) the optimal threshold probability is a unique solution to an optimality equation, 2) there exists an optimal deterministic stationary policy, and 3) a value iteration and a policy space iteration are given. In addition, we prove that the first (resp. second) optimal threshold probability is a monotone increasing and right (resp. left) continuous function of the initial threshold value and propose a method to obtain an optimal policy and the optimal threshold probability in the first problem by using them in the second problem.
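
For intuition about the first criterion: with integer nonpositive rewards and an absorbing target set, P(total reward <= threshold) satisfies a recursion over (state, remaining threshold) pairs, and a fixed-point iteration over that augmented space approximates the optimal threshold probability. The integer threshold grid, the boundary handling, and the fixed sweep count below are assumptions of this sketch, not the paper's construction.

```python
import numpy as np

def min_threshold_probability(P, r, target, t0, sweeps=200):
    """Approximate the minimal P(total reward <= threshold) for an MDP with
    nonpositive integer rewards r[s, a], transitions P[s, a, s'], an absorbing
    target set, and an integer initial threshold t0 < 0.

    F[s, k] approximates the optimal probability for state s and remaining
    threshold ts[k]. A remaining threshold >= 0 is certain (future reward is
    nonpositive), and at the target the remaining reward is 0.
    """
    nS, nA = r.shape
    ts = list(range(t0, 0))                 # negative thresholds t0, ..., -1
    idx = {t: k for k, t in enumerate(ts)}
    F = np.zeros((nS, len(ts)))
    for _ in range(sweeps):
        F_new = np.zeros_like(F)
        for s in range(nS):
            for k, t in enumerate(ts):
                if s in target:
                    F_new[s, k] = 0.0       # remaining reward is 0 > t since t < 0
                    continue
                best = 1.0
                for a in range(nA):
                    t_next = t - int(r[s, a])       # r <= 0, so t_next >= t
                    if t_next >= 0:
                        val = 1.0                   # future reward <= 0 <= t_next
                    else:
                        val = float(P[s, a] @ F[:, idx[t_next]])
                    best = min(best, val)
                F_new[s, k] = best
        F = F_new
    return F                                 # F[s, idx[t0]] is the value at threshold t0
```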

13.
The solution of Markov Decision Processes (MDPs) often relies on special properties of the processes. For two-level MDPs, the difference in the rates of state changes of the upper and lower levels has led to limiting or approximate solutions of such problems. In this paper, we solve a two-level MDP without making any assumption on the rates of state changes of the two levels. We first show that such a two-level MDP is a non-standard one where the optimal actions of different states can be related to each other. Then we give assumptions (conditions) under which such a specially constrained MDP can be solved by policy iteration. We further show that the computational effort can be reduced by decomposing the MDP. A two-level MDP with M upper-level states can be decomposed into one MDP for the upper level and M to M(M-1) MDPs for the lower level, depending on the structure of the two-level MDP. The upper-level MDP is solved by time aggregation, a technique introduced in a recent paper [Cao, X.-R., Ren, Z. Y., Bhatnagar, S., Fu, M., & Marcus, S. (2002). A time aggregation approach to Markov decision processes. Automatica, 38(6), 929-943.], and the lower-level MDPs are solved by embedded Markov chains.

14.
For a class of hierarchical unstructured P2P systems that exploit the respective advantages of centralized and distributed architectures, a Markov switching-space model is defined to describe their dynamic group-switching behavior. Based on the theory of Markov decision processes, policy iteration and online policy iteration algorithms for the performance criterion are given, and a simulation example illustrates the advantages of the approach.

15.
16.
Iterative optimization algorithms for Markov control processes with compact action sets
This paper studies optimization algorithms for a class of continuous-time Markov control processes (CTMCPs) with compact action sets under the average-cost criterion. Based on the performance-potential formula and the average-cost optimality equation of the CTMCP, a policy iteration algorithm and a value (numerical) iteration algorithm for computing optimal or suboptimal stationary control policies are derived, and the convergence of both algorithms is proved without assuming that the iteration operator is a span (sp-) contraction. Finally, an example of a controlled queueing network illustrates the advantages of the approach.
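
A common way to compute such average-cost solutions numerically is to uniformize the continuous-time process and run relative value iteration on the resulting discrete-time MDP. The sketch below does that for a finite grid of actions standing in for the compact action set; the data layout and the convergence test are assumptions of this illustration, not the algorithms whose convergence is proved in the paper.

```python
import numpy as np

def ctmcp_average_cost_rvi(q, c, iters=10000, tol=1e-9):
    """Relative value iteration for a continuous-time Markov control process
    under the average-cost criterion, after uniformization.

    q[s, a, y] are transition rates to y != s (diagonal entries assumed 0) and
    c[s, a] are cost rates; the compact action set is replaced by a finite
    grid of actions, which is an approximation.
    """
    nS, nA, _ = q.shape
    Lam = q.sum(axis=2).max() * 1.05              # uniformization constant
    P = q / Lam                                   # off-diagonal transition probabilities
    for s in range(nS):
        for a in range(nA):
            P[s, a, s] = 1.0 - q[s, a].sum() / Lam   # self-loop keeps the chain aperiodic
    h = np.zeros(nS)
    g = 0.0
    for _ in range(iters):
        T = (c + np.einsum('say,y->sa', P, h)).min(axis=1)  # Bellman backup
        g = T[0]                                  # estimate of average cost per unit time
        h_new = T - g                             # bias relative to reference state 0
        if np.max(np.abs(h_new - h)) < tol:
            h = h_new
            break
        h = h_new
    policy = (c + np.einsum('say,y->sa', P, h)).argmin(axis=1)
    return g, h, policy
```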

17.
This communique provides an exact iterative search algorithm for the NP-hard problem of obtaining an optimal feasible stationary Markovian pure policy that achieves the maximum value averaged over an initial state distribution in finite constrained Markov decision processes. It is based on a novel characterization of the entire feasible policy space and takes the spirit of policy iteration (PI) in that a sequence of monotonically improving feasible policies is generated and converges to an optimal policy in at most as many iterations as the size of the policy space in the worst case. Unlike PI, an unconstrained MDP needs to be solved at iterations involving feasible policies, and the current best policy improves upon all feasible policies included in the union of the policy spaces associated with the unconstrained MDPs.

18.
    
We compare the computational performance of linear programming (LP) and the policy iteration algorithm (PIA) for solving discrete-time infinite-horizon Markov decision process (MDP) models with total expected discounted reward. We use randomly generated test problems as well as a real-life health-care problem to empirically show that, unlike previously reported, barrier methods for LP provide a viable tool for optimally solving such MDPs. The dimensions of comparison include transition probability matrix structure, state and action size, and the LP solution method.
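
For reference, one standard way to set up the LP side of such a comparison for a discounted MDP is to minimize the sum of state values subject to the Bellman inequalities and read a greedy policy from the solution. The sketch below assumes SciPy's HiGHS-backed linprog and dense transition data; it illustrates the formulation only and is not the experimental setup of the paper.

```python
import numpy as np
from scipy.optimize import linprog

def solve_discounted_mdp_lp(P, r, gamma):
    """Solve a discounted MDP by linear programming.

    P[s, a, y] are transition probabilities and r[s, a] expected rewards.
    Primal LP: minimize sum_s v(s) subject to
    v(s) >= r(s, a) + gamma * sum_y P[s, a, y] v(y) for every (s, a).
    """
    nS, nA, _ = P.shape
    A_ub = np.zeros((nS * nA, nS))
    b_ub = np.zeros(nS * nA)
    for s in range(nS):
        for a in range(nA):
            row = gamma * P[s, a].copy()
            row[s] -= 1.0                 # gamma * P v - v(s) <= -r(s, a)
            A_ub[s * nA + a] = row
            b_ub[s * nA + a] = -r[s, a]
    res = linprog(c=np.ones(nS), A_ub=A_ub, b_ub=b_ub,
                  bounds=[(None, None)] * nS, method="highs")
    v = res.x
    q = r + gamma * np.einsum('say,y->sa', P, v)   # state-action values
    return v, q.argmax(axis=1)                     # value function and greedy policy
```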

19.
It is known that performance potentials (or equivalently, perturbation realization factors) can be used as building blocks for performance sensitivities of Markov systems. In parameterized systems, changes in the parameters may only affect some states, and the explicit transition probability matrix may not be known. In this paper, we use an example to show that potentials can be used to construct performance sensitivities in a more flexible way: only the potentials at the affected states need to be estimated, and the transition probability matrix need not be known. Policy iteration algorithms, which are simpler than the standard one, can be established.

20.
Using the POMDP as a model for reinforcement learning problems provides good adaptability and effectiveness for problems with large state spaces, but solving a POMDP is far harder than solving an ordinary Markov decision process (MDP), so many issues remain open. Against this background, this paper proposes a method for solving POMDPs under certain special constraints: a dynamic-merging reinforcement learning algorithm for POMDPs. Using the notion of regions, a region system is built over the environment's state space, and the agent pursues its optimal objective independently and in parallel within each region, which speeds up the computation. The optimal value functions of the components are then combined in a prescribed way to obtain the optimal solution of the POMDP.

