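Cognitive Radio Resource Allocation by Clustering Multi-Agent Reinforcement Learning
Cite this article:WU Chun, JIANG Hong, YI Ke-chu. Cognitive radio resource allocation by clustering multi-agent reinforcement learning[J]. Journal of Beijing University of Posts and Telecommunications, 2014, 37(1):80-84.
Authors:WU Chun  JIANG Hong  YI Ke-chu
Affiliation:1. State Key Laboratory of Integrated Service Networks, Xidian University, Xi'an 710071, China;
2. School of National Defense Technology, Southwest University of Science and Technology, Mianyang, Sichuan 621000, China
Funding:National Natural Science Foundation of China (61379005); National Basic Research Program of China (2009CB320403); National Science and Technology Major Project (2009ZX03007-004); Open Project of the ISN Laboratory, Xidian University (ISN10-09)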
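Abstract:To address channel and power allocation for multiple cognitive radio users, a multi-agent reinforcement learning method based on user clustering and variable learning rates is proposed. First, hierarchical processing separates channel selection from power control, and channel allocation is carried out by a fast optimal search combined with user-number balancing. Second, the multiuser power control problem is modeled as a stochastic game; K-means user clustering reduces both the number of users taking part in the game and the environment complexity seen by each user, and variable Q-learning and policy-learning rates further promote the convergence of the multi-agent reinforcement learning. Simulation results show that the method makes the power states and total reward of multiple users converge effectively and that the overall performance reaches a sub-optimal level.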

Keywords:cognitive radio  multi-agent reinforcement learning  clustering  power control  variable learning rate  
Received:2013-03-13

Cognitive Radio Resource Allocation by Clustering Multi-Agent Reinforcement Learning
WU Chun,JIANG Hong,YI Ke-chu.Cognitive Radio Resource Allocation by Clustering Multi-Agent Enforcement Learning[J].Journal of Beijing University of Posts and Telecommunications,2014,37(1):80-84.
Authors:WU Chun  JIANG Hong  YI Ke-chu
Affiliation:1. State Key Laboratory of Integrated Service Networks, Xidian University, Xi'an 710071, China;
2. School of National Defense Technology, Southwest University of Science and Technology, Mianyang, Sichuan 621000, China
Abstract:A multi-agent reinforcement learning method based on user clustering and variable learning rates is proposed for channel allocation and power control among multiple cognitive radio users. First, a hierarchical processing method separates channel selection from power control, and channel allocation is implemented by a fast optimal search combined with user-number balancing. Second, a stochastic game framework is adopted to model the multiuser power control problem. In the subsequent multi-agent reinforcement learning, K-means user clustering is employed to reduce both the number of users in the game and the environment complexity seen by each user, and a variable learning rate scheme for Q-learning and policy learning is proposed to promote the convergence of multiuser learning. Simulations show that the method makes the users' power states and global reward converge effectively, and that the overall performance reaches a sub-optimal level.
Keywords:cognitive radio  multi-agent reinforcement learning  clustering  power control  variable learning rate  
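
The abstract states that K-means user clustering reduces the number of users taking part in the power-control game, but it does not say which user features are clustered. The following is a minimal sketch of plain K-means (Lloyd's algorithm), assuming for illustration only that each user is described by a hypothetical 2-D position; the function name `kmeans` and its parameters are not taken from the paper.

```python
import math
import random


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means on 2-D points; returns (centroids, labels).

    Assumption: each point is a hypothetical (x, y) user position.
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k of the input points
    labels = None
    for _ in range(iters):
        # Assignment step: each user joins the nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so we have converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    return centroids, labels


# Six hypothetical users in two well-separated groups.
users = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = kmeans(users, k=2)
```

Under this assumption, each resulting cluster could be treated as a smaller group of players, which is one way to read the abstract's claim that clustering lowers the number of game participants and the complexity each user faces.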
This article is indexed in CNKI and other databases.