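Self-learning adjustment method for the control parameters of a rotorcraft UAV landing on a moving platform
Cite this article:ZHANG Pengpeng, WEI Changyun, ZHANG Kairui, OUYANG Yongping. Self-learning adjustment method for the control parameters of a rotorcraft UAV landing on a moving platform[J]. 智能系统学报 (CAAI Transactions on Intelligent Systems), 2022, 17(5): 931-940.
Authors:ZHANG Pengpeng  WEI Changyun  ZHANG Kairui  OUYANG Yongping
Affiliation:College of Mechanical and Electrical Engineering, Hohai University, Changzhou, Jiangsu 213022, China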
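Abstract:UAVs can adapt to complex terrain, but limited battery capacity and other factors prevent them from carrying out tasks for long periods. Cooperation between a UAV and other unmanned systems (unmanned ground vehicles, unmanned surface vessels, and so on) can effectively extend the UAV's working time and help it complete its assigned tasks. Once the UAV has finished a task, landing it quickly and stably on a moving platform is a necessary and challenging job. To address this landing problem, the paper proposes a deep reinforcement learning proportional-integral-derivative (PID) method based on the COACH (corrective advice communicated by humans) method, which provides an optimal path for the UAV to land on the moving platform. First, the reinforcement learning model is trained in a simulation environment using the corrective-advice framework. Then, in both simulated and real environments, the trained model outputs the control parameters. Finally, the output parameters are used to obtain the UAV's position control quantities. Simulation results and real UAV experiments show that the deep reinforcement learning PID method based on COACH outperforms traditional control methods and can stably complete the landing task on a moving platform.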

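Keywords:autonomous landing  reinforcement learning  path planning  COACH framework  deterministic policy gradient  air-ground cooperation  UAV  optimal control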

Self-learning approach to control parameter adjustment for quadcopter landing on a moving platform
ZHANG Pengpeng, WEI Changyun, ZHANG Kairui, OUYANG Yongping. Self-learning approach to control parameter adjustment for quadcopter landing on a moving platform[J]. CAAI Transactions on Intelligent Systems, 2022, 17(5): 931-940.
Authors:ZHANG Pengpeng  WEI Changyun  ZHANG Kairui  OUYANG Yongping
Affiliation:College of Mechanical and Electrical Engineering, Hohai University, Changzhou 213022, China
Abstract:An unmanned aerial vehicle (UAV) is a type of robot that performs well in mapping tasks regardless of terrain. However, a UAV cannot operate for long because of its limited battery capacity, among other reasons. Collaboration between UAVs and other unmanned systems, such as unmanned ground vehicles (UGVs), is considered a key solution because it effectively extends the time a UAV can work on a scheduled task. When deploying a team of UAVs and UGVs, landing a UAV on a moving platform quickly and stably is both important and challenging. To address the landing problem, this study proposes a deep reinforcement learning proportional-integral-derivative (PID) method based on the corrective advice communicated by humans (COACH) method, thereby providing an optimal path for the UAV to land on a moving platform. First, the reinforcement learning agent is trained with the COACH framework in a simulated environment. Next, the trained agent outputs the control parameters in both simulated and real environments. Finally, these output parameters are used to obtain the UAV's position control variables. Simulation and real-UAV experimental results show that the COACH-based deep reinforcement learning PID method outperforms the traditional control method and can stably complete the landing task on a moving platform.
Keywords:autonomous landing  reinforcement learning  path planning  COACH framework  deterministic policy gradient  air-ground cooperation  UAV  optimal control
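
The pipeline summarized in the abstract (an agent outputs PID parameters, and those parameters are then used to obtain the UAV's position control) can be illustrated with a minimal single-axis sketch. Everything below is an assumption made for illustration and is not taken from the paper: the names `PIDGains`, `AdaptivePID`, `placeholder_policy` and `simulate`, the velocity-command motion model, and all numeric values are hypothetical. In particular, the hand-written gain rule only stands in for the trained reinforcement learning agent.

```python
from dataclasses import dataclass


@dataclass
class PIDGains:
    kp: float
    ki: float
    kd: float


class AdaptivePID:
    """Single-axis PID controller whose gains may change at every step."""

    def __init__(self) -> None:
        self.integral = 0.0
        self.prev_error = None

    def step(self, error: float, gains: PIDGains, dt: float) -> float:
        self.integral += error * dt
        # No derivative term on the first call, since there is no previous error.
        if self.prev_error is None:
            derivative = 0.0
        else:
            derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return gains.kp * error + gains.ki * self.integral + gains.kd * derivative


def placeholder_policy(error: float) -> PIDGains:
    """Hypothetical stand-in for the trained agent: maps the state to PID gains.

    Here the proportional gain simply grows with the position error (capped).
    """
    kp = 1.0 + 0.5 * min(abs(error), 2.0)
    return PIDGains(kp=kp, ki=0.5, kd=0.3)


def simulate(steps: int = 600, dt: float = 0.05) -> float:
    """Track a platform moving at constant speed; return the final position error."""
    uav_x, platform_x, platform_v = 0.0, 3.0, 0.5  # assumed initial conditions
    pid = AdaptivePID()
    for _ in range(steps):
        error = platform_x - uav_x
        gains = placeholder_policy(error)       # gains chosen for this step
        v_cmd = pid.step(error, gains, dt)      # position control output
        v_cmd = max(-3.0, min(3.0, v_cmd))      # assumed speed limit of the UAV
        uav_x += v_cmd * dt                     # UAV modeled as a pure integrator
        platform_x += platform_v * dt
    return abs(platform_x - uav_x)


if __name__ == "__main__":
    print(f"final horizontal error: {simulate():.4f} m")
```

In a setup like the one the abstract describes, `placeholder_policy` would be replaced by the trained agent, and one such controller would typically run per position axis.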