A Deep Reinforcement Learning Communication Jamming Resource Allocation Algorithm Fused with Noise Network

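Cite this article:	PENG Xiang, XU Hua, JIANG Lei, RAO Ning, SONG Bailin. A Deep Reinforcement Learning Communication Jamming Resource Allocation Algorithm Fused with Noise Network[J]. Journal of Electronics & Information Technology, 2023, 45(3): 1043-1054.

Authors:	PENG Xiang, XU Hua, JIANG Lei, RAO Ning, SONG Bailin

Affiliation:	Information and Navigation College, Air Force Engineering University, Xi'an 710077, China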

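Abstract:	Traditional jamming resource allocation algorithms require relatively complete prior information when solving nonlinear combinatorial optimization problems, and their small decision dimension cannot meet the requirements of modern communication countermeasures. To address this, the paper proposes a deep reinforcement learning communication jamming resource allocation algorithm fused with noise network (FNNDRL). Drawing on the idea of noise networks, the algorithm designs twin noise evaluation networks that avoid Q-value overestimation while increasing the randomness of the evaluation networks to keep training exploratory. Based on the physical meaning of probability entropy, it designs a policy network loss function improved with the policy distribution entropy, maximizing the policy distribution entropy together with the cumulative reward so that policy optimization does not converge to a local optimum. Simulation results show that the algorithm outperforms the compared average allocation and reinforcement learning methods on the jamming resource allocation problem, is highly stable, and adapts well to high-dimensional decision spaces.
Keywords:	jamming resource allocation; deep reinforcement learning; noise network; policy distribution entropy
Received:	2022-01-13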
A Deep Reinforcement Learning Communication Jamming Resource Allocation Algorithm Fused with Noise Network

PENG Xiang, XU Hua, JIANG Lei, RAO Ning, SONG Bailin. A Deep Reinforcement Learning Communication Jamming Resource Allocation Algorithm Fused with Noise Network[J]. Journal of Electronics & Information Technology, 2023, 45(3): 1043-1054.

Authors:	PENG Xiang, XU Hua, JIANG Lei, RAO Ning, SONG Bailin

Affiliation:	Information and Navigation College, Air Force Engineering University, Xi’an 710077, China

Abstract:	Traditional jamming resource allocation algorithms need relatively complete prior information when dealing with nonlinear combinatorial optimization problems, and their decision dimension is small, so they cannot meet the requirements of modern communication countermeasures. To solve this problem, a Deep Reinforcement Learning communication jamming resource allocation algorithm Fused with Noise Network (FNNDRL) is proposed. Borrowing the idea of the noise network, the algorithm designs a twin noise evaluation network, which avoids overestimation of the Q value and increases the randomness of the evaluation network to ensure exploration during training. Based on the physical significance of probability entropy, an improved policy network loss function based on the policy distribution entropy is designed; it maximizes the policy distribution entropy together with the cumulative reward to avoid convergence to a local optimum during policy optimization. The simulation results show that the proposed algorithm is superior to the compared average allocation and reinforcement learning methods in solving the jamming resource allocation problem. The algorithm is also highly stable and adapts well to high-dimensional decision spaces.
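
The three ideas named in the abstract (noise-perturbed network layers, a twin evaluation network that takes the smaller Q estimate, and an entropy term in the policy loss) can be illustrated with a minimal sketch. The abstract does not give the paper's network architecture, noise parameterization, or exact loss, so everything below is an assumption made for illustration: independent Gaussian noise per weight, a discrete action distribution, and the hypothetical names `noisy_linear`, `twin_q_target`, `policy_entropy`, `entropy_regularized_policy_loss`, `gamma`, and `alpha`.

```python
import math
import random


def noisy_linear(x, mu_w, sigma_w, mu_b, sigma_b, rng):
    """One noisy fully connected layer (assumed form, not the paper's).

    Each weight is mu + sigma * eps with eps ~ N(0, 1) redrawn on every call,
    so the same input gives different outputs; that randomness is the source
    of exploration.
    """
    out = []
    for row_mu, row_sigma, b_mu, b_sigma in zip(mu_w, sigma_w, mu_b, sigma_b):
        total = b_mu + b_sigma * rng.gauss(0.0, 1.0)
        for xi, w_mu, w_sigma in zip(x, row_mu, row_sigma):
            total += (w_mu + w_sigma * rng.gauss(0.0, 1.0)) * xi
        out.append(total)
    return out


def twin_q_target(reward, q1_next, q2_next, gamma=0.99):
    """Bootstrapped target using the smaller of two evaluation-network
    estimates, which counteracts Q-value overestimation."""
    return reward + gamma * min(q1_next, q2_next)


def policy_entropy(probs):
    """Shannon entropy (in nats) of a discrete action distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def entropy_regularized_policy_loss(probs, q_values, alpha=0.2):
    """Loss to minimise: -(expected Q under the policy) - alpha * entropy.

    Minimising it raises the expected return and the policy entropy together;
    alpha is a hypothetical weighting coefficient.
    """
    expected_q = sum(p * q for p, q in zip(probs, q_values))
    return -expected_q - alpha * policy_entropy(probs)
```

With all `sigma` values set to zero, `noisy_linear` reduces to an ordinary linear layer. With equal Q values for every action, a uniform action distribution yields a lower loss than a peaked one, which is the pressure that keeps the policy from collapsing early onto a single allocation.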
