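Cognitive Radio Resource Allocation by Clustering Multi-Agent Reinforcement Learning

Cite this article:	WU Chun, JIANG Hong, YI Ke-chu. Cognitive Radio Resource Allocation by Clustering Multi-Agent Reinforcement Learning[J]. Journal of Beijing University of Posts and Telecommunications, 2014, 37(1): 80-84.

Authors:	WU Chun, JIANG Hong, YI Ke-chu

Affiliation:	1. State Key Laboratory of Integrated Service Networks, Xidian University, Xi'an 710071, China; 2. School of National Defense Technology, Southwest University of Science and Technology, Mianyang, Sichuan 621000, China

Funding:	National Natural Science Foundation of China (61379005); National Basic Research Program of China (2009CB320403); National Science and Technology Major Project (2009ZX03007-004); Open Project of the ISN Laboratory, Xidian University (ISN10-09)

Abstract:	To address channel and power resource allocation among multiple cognitive radio users, a multi-agent reinforcement learning method based on user clustering and variable learning rates is proposed. First, hierarchical processing separates channel selection from power control, and channel allocation is carried out by a fast optimal search combined with user-number balancing. Second, the multiuser power control problem is modeled in a stochastic game framework; K-means user clustering reduces the number of users taking part in the game and lowers the environment complexity faced by each user, and variable Q-learning and policy-learning rates further promote the convergence of multi-agent reinforcement learning. Simulation results show that the method makes the power states and total reward of multiple users converge effectively, and that overall performance reaches a suboptimal level.
Keywords:	cognitive radio; multi-agent reinforcement learning; clustering; power control; variable learning rate
Received:	2013-03-13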
Cognitive Radio Resource Allocation by Clustering Multi-Agent Enforcement Learning

WU Chun, JIANG Hong, YI Ke-chu. Cognitive Radio Resource Allocation by Clustering Multi-Agent Enforcement Learning[J]. Journal of Beijing University of Posts and Telecommunications, 2014, 37(1): 80-84.

Authors:	WU Chun, JIANG Hong, YI Ke-chu

Affiliation:	1. State Key Laboratory of Integrated Service Networks, Xidian University, Xi'an 710071, China; 2. School of National Defense Technology, Southwest University of Science and Technology, Mianyang, Sichuan 621000, China

Abstract:	A multi-agent reinforcement learning method based on user clustering and variable learning rates is proposed for channel allocation and power control among multiple cognitive radio users. First, a hierarchical scheme separates channel selection from power control, and channel allocation is performed by a fast optimal search combined with user-number balancing. Second, a stochastic game framework is adopted to model the multiuser power control problem. In the subsequent multi-agent reinforcement learning, K-means user clustering reduces both the number of users in the game and the environment complexity seen by each user, and variable learning rates for Q-learning and policy learning are used to promote the convergence of multiuser learning. Simulation results show that the method makes the users' power states and the global reward converge effectively, and that overall performance reaches a suboptimal level.
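
The two learning ingredients named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the clustering features, the state and action definitions, or the learning-rate schedule. The sketch assumes users are clustered by 2-D position, and it uses a visit-count decay `1 / (1 + n)` as one possible variable Q-learning rate. The names `kmeans`, `q_update`, and `visits` are hypothetical, and the policy-learning rate is not covered.

```python
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's K-means on 2-D points; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # pick k distinct users as initial centroids
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(k),
                key=lambda c: (p[0] - centroids[c][0]) ** 2
                + (p[1] - centroids[c][1]) ** 2)
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = (
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                )
    return centroids, labels


def q_update(q, state, action, reward, next_state, visits, gamma=0.9):
    """One Q-learning step whose learning rate shrinks with the visit count.

    The schedule alpha = 1 / (1 + n) is an assumption made for illustration;
    the abstract does not state which schedule the paper uses.
    """
    n = visits.get((state, action), 0)
    alpha = 1.0 / (1 + n)
    target = reward + gamma * max(q[next_state])
    q[state][action] += alpha * (target - q[state][action])
    visits[(state, action)] = n + 1
    return alpha


# Hypothetical usage: six users in two well-separated groups.
users = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = kmeans(users, k=2)

# Hypothetical two-state, two-action Q-table for a single agent.
q_table = {"s0": [0.0, 0.0], "s1": [1.0, 0.0]}
visit_counts = {}
first_alpha = q_update(q_table, "s0", 0, 1.0, "s1", visit_counts)
second_alpha = q_update(q_table, "s0", 0, 1.0, "s1", visit_counts)
```

Clustering first means each agent only has to model the users in its own cluster, which is the complexity reduction the abstract describes. The decaying rate lets early updates move quickly and later ones settle.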

Keywords:	cognitive radio; multi-agent reinforcement learning; clustering; power control; variable learning rate
This article is indexed in CNKI and other databases.