TOC Reward Function for Recovering Multi-Target Trajectories with Deep Reinforcement Learning
Citation: He Liang, Xu Zhengguo, Jia Yu, Li Yun, Shen Chao. TOC reward function for recovering multi-target trajectories with deep reinforcement learning [J]. Application Research of Computers, 2020, 37(6): 1626-1632.
Authors: He Liang, Xu Zhengguo, Jia Yu, Li Yun, Shen Chao
Affiliation: National Key Laboratory of Science and Technology on Blind Signal Processing, Chengdu 610041, China; MOE Key Laboratory for Intelligent Networks and Network Security, Xi'an Jiaotong University, Xi'an 710049, China
Abstract: In trajectory detection, a detector typically obtains only target geographic positions, and the multiple targets within a frame cannot be distinguished; the trajectory of each target must therefore be recovered, and the targets told apart, from position information alone. To address this problem, this paper proposes a deep reinforcement learning (DRL) method for trajectory recovery. Drawing on the physical characteristics of target trajectories, it builds a mathematical model and proposes a trajectory osculating circle (TOC) reward function combining trajectory direction and curvature, which enables DRL to recover multi-target trajectories effectively and distinguish the individual targets. The paper first formulates the multi-target trajectory recovery problem and casts it as a model that DRL can handle; it then evaluates the TOC reward function on this problem experimentally; finally, it gives the mathematical derivation and physical interpretation of the reward function. Experimental results show that a deep reinforcement network driven by the TOC reward function recovers target trajectories effectively, matching the actual trajectories in both heading and speed.

Keywords: deep reinforcement learning  sequential decision  Q-function  trajectory osculating circle
Received: 2018-12-27
Revised: 2020-04-27

Design of TOC reward function in multi-target trajectory recovery with deep reinforcement learning
He Liang, Xu Zhengguo, Jia Yu, Li Yun, Shen Chao. Design of TOC reward function in multi-target trajectory recovery with deep reinforcement learning [J]. Application Research of Computers, 2020, 37(6): 1626-1632.
Authors: He Liang, Xu Zhengguo, Jia Yu, Li Yun, Shen Chao
Affiliation: National Key Laboratory of Science and Technology on Blind Signal Processing, Chengdu 610041, China; MOE Key Laboratory for Intelligent Networks and Network Security, Xi'an Jiaotong University, Xi'an 710049, China
Abstract: In the field of object trajectory detection, detectors often receive only the geographical locations of targets, with no other information to tell the targets apart. The problem is then to use the location information received by the sensors to reconstruct the trajectory of each target and to distinguish the targets in each frame. This problem, called multi-target trajectory recovery, can be solved by deep reinforcement learning (DRL). This paper implemented a trajectory osculating circle (TOC) reward function based on a mathematical model of trajectory direction and curvature, following the characteristics of real trajectories. Firstly, it recast the multi-target trajectory reconstruction problem as a model suitable for DRL. Then, it tested DRL with the proposed reward function on this problem. Finally, it gave a mathematical derivation and physical interpretation of the proposed TOC reward function. The experimental results show that DRL with the TOC reward function can recover the trajectories effectively, and the recovered traces correspond well with the actual trajectories.
Keywords: deep reinforcement learning (DRL)  sequential decision  Q-function  trajectory osculating circle (TOC)
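The abstract describes scoring trajectory continuations by direction and curvature via the osculating circle. As a rough illustration only (the paper's actual reward is not given in this record; the function names, weights, and the `toc_reward` form below are assumptions), the sketch estimates curvature from the circle through three consecutive track points and scores a candidate next position by how little it changes the track's curvature and heading:

```python
import math

def curvature(p1, p2, p3):
    """Curvature (1/radius) of the circle through three 2-D points.

    Uses k = 4 * area / (|p1p2| * |p2p3| * |p1p3|); collinear points
    give zero curvature (a straight trajectory segment).
    """
    ax, ay = p2[0] - p1[0], p2[1] - p1[1]
    bx, by = p3[0] - p1[0], p3[1] - p1[1]
    cross = ax * by - ay * bx                      # 2 * signed triangle area
    a = math.dist(p1, p2)
    b = math.dist(p2, p3)
    c = math.dist(p1, p3)
    if a == 0.0 or b == 0.0 or c == 0.0:
        return 0.0                                 # degenerate: repeated point
    return 2.0 * abs(cross) / (a * b * c)

def toc_reward(track, candidate, w_curv=1.0, w_head=1.0):
    """Hypothetical TOC-style reward: favor the candidate whose osculating
    circle and heading stay close to those of the track's last segment."""
    p1, p2, p3 = track[-3], track[-2], track[-1]
    dk = abs(curvature(p2, p3, candidate) - curvature(p1, p2, p3))
    h_old = math.atan2(p3[1] - p2[1], p3[0] - p2[0])
    h_new = math.atan2(candidate[1] - p3[1], candidate[0] - p3[0])
    dh = abs(math.remainder(h_new - h_old, 2 * math.pi))  # wrapped heading change
    return -(w_curv * dk + w_head * dh)            # higher is better

# A point continuing a straight track scores higher than a sharp turn.
track = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
straight = toc_reward(track, (3.0, 0.0))
turn = toc_reward(track, (2.0, 1.0))
```

In a DRL setting such a score would serve as the per-step reward when the agent assigns a detection in the next frame to a track, so that smooth, physically plausible continuations are reinforced over abrupt ones.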
Indexed by: Wanfang Data and other databases.