Distributed reinforcement learning based power control for frequency division multiple access systems
Citation: Li Ye, Si Ke. Distributed reinforcement learning based power control for frequency division multiple access systems[J]. Application Research of Computers, 2023, 40(12).
Authors: Li Ye, Si Ke
Affiliation: University of Shanghai for Science and Technology
Foundation item: Cooperation project of Huawei Technologies Co., Ltd. (YBN2019115054)
Abstract: In recent years, deep reinforcement learning has been used as a model-free resource allocation method to address co-channel interference in wireless networks. However, networks based on a conventional experience replay strategy struggle to learn valuable experiences, which slows convergence; moreover, manually fixing the exploration step size ignores how much the algorithm has learned in each training cycle, so exploration of the environment is blind and the improvement in system spectral efficiency is limited. To address these problems, this paper proposed a distributed reinforcement learning power control method for frequency division multiple access systems. The method adopted a prioritized experience replay strategy that encourages agents to learn more important data from the environment, thereby accelerating the learning process, and it designed an exploration strategy with a dynamically adjusted step size suited to distributed reinforcement learning, which allows each agent to explore its local environment according to its own learning progress and reduces the blindness caused by a manually set step size. Experimental results show that, compared with existing algorithms, the proposed method converges faster, suppresses co-channel interference more effectively in mobile scenarios, and achieves higher performance in large networks.
Keywords: distributed reinforcement learning  frequency division multiple access system  power control  greedy strategy  prioritized experience replay  dynamic step size adjustment
Received: 2023-03-15
Revised: 2023-06-13
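The abstract does not spell out the paper's replay mechanism, but the prioritized experience replay idea it relies on is well established. Below is a minimal Python sketch of proportional prioritized replay as it is commonly implemented: transitions are sampled with probability proportional to their TD-error-based priority, and importance-sampling weights correct the resulting bias. The class name, the hyperparameters alpha/beta/eps, and the priority definition are illustrative assumptions, not the authors' settings.

```python
import numpy as np

class PrioritizedReplayBuffer:
    """Minimal proportional prioritized experience replay (sketch only).

    Transition i is sampled with probability p_i^alpha / sum_j p_j^alpha,
    where p_i = |TD error_i| + eps. Importance-sampling weights correct
    the bias introduced by non-uniform sampling.
    """

    def __init__(self, capacity, alpha=0.6, eps=1e-6):
        self.capacity = capacity
        self.alpha = alpha
        self.eps = eps
        self.data = []          # stored transitions
        self.priorities = []    # one priority per stored transition
        self.pos = 0            # ring-buffer write index

    def add(self, transition):
        # New transitions get the current max priority so each one is
        # replayed at least once before its TD error is known.
        p = max(self.priorities, default=1.0)
        if len(self.data) < self.capacity:
            self.data.append(transition)
            self.priorities.append(p)
        else:
            self.data[self.pos] = transition
            self.priorities[self.pos] = p
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, beta=0.4):
        probs = np.asarray(self.priorities) ** self.alpha
        probs /= probs.sum()
        idx = np.random.choice(len(self.data), batch_size, p=probs)
        # Importance-sampling weights, normalized so the largest is 1.
        weights = (len(self.data) * probs[idx]) ** (-beta)
        weights /= weights.max()
        batch = [self.data[i] for i in idx]
        return batch, idx, weights

    def update_priorities(self, idx, td_errors):
        # Called after a training step with the freshly computed TD errors.
        for i, err in zip(idx, td_errors):
            self.priorities[i] = abs(err) + self.eps
```

After each gradient step, the agent calls update_priorities with the batch's new TD errors, so "more important" (high-error) experiences are revisited more often, which is the acceleration effect the abstract describes.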
