
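Satellite earth observation task planning method based on improved pointer networks
Cite this article: Yi-fan MA, Fan-yu ZHAO, Xin WANG, Zhong-he JIN. Satellite earth observation task planning method based on improved pointer networks[J]. Journal of Zhejiang University (Engineering Science), 2021, 55(2): 395-401. DOI: 10.3785/j.issn.1008-973X.2021.02.020
Authors: Yi-fan MA  Fan-yu ZHAO  Xin WANG  Zhong-he JIN
Affiliations: 1. Micro-Satellite Research Center, Zhejiang University, Hangzhou, Zhejiang 310027; 2. Zhejiang Key Laboratory of Micro-Nano Satellite Research, Hangzhou, Zhejiang 310027
Funding: National Science Fund for Distinguished Young Scholars (61525403)
Abstract: Satellite observation task planning is characterized by complex constraints, a large solution space, and input task sequences of variable length; a deep reinforcement learning (DRL) method is used to solve it. The problem is modeled by jointly considering time-window constraints, the maneuver time needed to transfer between tasks, and the satellite's power and storage constraints. A sequential decision model is built on the operating mechanism of the pointer network (PN), a Mask vector is used to account for the various constraints of the planning problem, and the model is trained with the Actor Critic reinforcement learning algorithm to maximize the profit rate. Drawing on the multi-head attention (MHA) mechanism, the PN is improved and the multi-head attention pointer network (MHA-PN) algorithm is proposed. The experimental results show that MHA-PN significantly improves the training speed and generalization performance of the model. The trained MHA-PN model can perform end-to-end inference directly on the input sequence, which avoids the iterative solving process of traditional heuristic algorithms and gives high solution efficiency.

Keywords: satellite observation task planning  combinatorial optimization problem  deep reinforcement learning  pointer network (PN)  Actor Critic  multi-head attention pointer network (MHA-PN)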

Satellite earth observation task planning method based on improved pointer networks
Yi-fan MA,Fan-yu ZHAO,Xin WANG,Zhong-he JIN. Satellite earth observation task planning method based on improved pointer networks[J]. Journal of Zhejiang University(Engineering Science), 2021, 55(2): 395-401. DOI: 10.3785/j.issn.1008-973X.2021.02.020
Authors: Yi-fan MA  Fan-yu ZHAO  Xin WANG  Zhong-he JIN
Abstract: Satellite observation task planning is characterized by complex constraints, a large solution space, and input task sequences of variable length. A deep reinforcement learning (DRL) method was used to solve the problem. The satellite observation task planning problem was modeled by taking into account time-window constraints, the transfer time between tasks, and the satellite's power and memory constraints. A sequential decision model was established based on the operating mechanism of pointer networks (PN), a Mask vector was used to account for the various constraints of the planning problem, and the model was trained with the Actor Critic reinforcement learning algorithm to obtain the maximum reward. The PN was improved by drawing on the multi-head attention (MHA) mechanism, and the multi-head attention pointer networks (MHA-PN) algorithm was proposed. Experimental results show that the MHA-PN algorithm significantly improves the training speed and the generalization performance of the model. The trained MHA-PN model can perform end-to-end inference on the input sequence, which avoids the iterative solution process of traditional heuristic algorithms and yields high solution efficiency.
Keywords: satellite observation task planning  combinatorial optimization problem  deep reinforcement learning  pointer networks (PN)  Actor Critic  multi-head attention pointer networks (MHA-PN)
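
The abstract says a Mask vector carries the planning constraints into the pointer-network decoder but gives no detail. The following is a minimal sketch of that general idea, not the authors' implementation: at each decoding step, tasks that violate a constraint are masked out before the softmax over attention scores, so they receive zero selection probability. The `Task` fields, the function names, the constant `slew_time`, and the example numbers are all hypothetical and chosen only for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Task:
    # Hypothetical task description; field names are assumptions.
    start: float     # time window opens
    end: float       # time window closes
    duration: float  # observation duration
    energy: float    # power consumed by the observation
    memory: float    # storage consumed by the observation


def feasibility_mask(tasks, done, t_now, energy_left, memory_left, slew_time):
    """Return one boolean per task: True if it can still be selected."""
    mask = []
    for i, task in enumerate(tasks):
        # Earliest possible start after manoeuvring from the current task.
        earliest = max(t_now + slew_time, task.start)
        ok = (
            i not in done
            and earliest + task.duration <= task.end  # time window
            and task.energy <= energy_left            # power budget
            and task.memory <= memory_left            # storage budget
        )
        mask.append(ok)
    return mask


def masked_softmax(scores, mask):
    """Softmax over scores; masked-out entries get probability 0."""
    if not any(mask):
        return [0.0] * len(scores)  # nothing feasible: decoding should stop
    top = max(s for s, m in zip(scores, mask) if m)  # for numerical stability
    exps = [math.exp(s - top) if m else 0.0 for s, m in zip(scores, mask)]
    total = sum(exps)
    return [e / total for e in exps]


tasks = [
    Task(0, 10, 3, 2, 1),
    Task(2, 4, 3, 2, 1),
    Task(5, 20, 4, 2, 1),
    Task(6, 30, 2, 9, 1),
    Task(4, 15, 2, 1, 1),
]
mask = feasibility_mask(tasks, done={0}, t_now=3.0,
                        energy_left=5.0, memory_left=4.0, slew_time=1.0)
# Stand-in attention scores; in a real model these come from the decoder.
scores = [1.0, 2.0, 0.5, 3.0, 0.5]
probs = masked_softmax(scores, mask)
print(mask)
print(probs)
```

In this example task 0 is already scheduled, task 1 can no longer fit in its time window, and task 3 exceeds the remaining power, so only tasks 2 and 4 stay selectable. A trained model would supply the scores from its attention layer; the mask only restricts which entries the softmax may choose.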