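Multi-robot reinforcement learning cooperative confrontation strategy based on an active risk defense mechanism
Cite this article: SUN Hui-hui, HU Chun-he, ZHANG Jun-guo. Multi-robot reinforcement learning cooperative confrontation strategy based on an active risk defense mechanism[J]. Control and Decision, 2023, 38(5): 1420-1429.
Authors: SUN Hui-hui, HU Chun-he, ZHANG Jun-guo
Affiliation: School of Technology, Beijing Forestry University, Beijing 100083; School of Mechanical and Electrical Engineering, North China Institute of Science and Technology, Langfang, Hebei 065201; School of Technology, Beijing Forestry University, Beijing 100083; Key Lab of State Forestry and Grassland Administration for Forestry Equipment and Automation, Beijing 100083
Funding: National Natural Science Foundation of China (61703047); Science and Technology Research Project of Higher Education Institutions of Hebei Province (QN2021312).
Abstract: Deep reinforcement learning has become a research hotspot in the multi-robot field owing to its efficient performance in multi-robot systems. However, in unstructured scenarios that vary continuously over time and carry unknown risks, traditional methods show poor risk defense capability and fragile system security: unknown risks intrude nonlinearly on the multi-robot state space in the form of adversarial attacks. To address this problem, a multi-robot reinforcement learning method based on an active risk defense mechanism (APMARL) is proposed. First, based on a partially observable Markov game model, a risk discrimination mechanism with a memory pool shared among the robots is established; a risk state index is constructed to predict the safety of the current behavior in advance, and the matching risk-handling mode is executed adaptively according to the prediction. In particular, for unsafe states with risk intrusion, an Actor-Critic active defense network architecture based on an enhanced attention mechanism is proposed, which provides graded enhancement of key information and effective defense against dangerous information. Finally, extensive experiments on multi-robot cooperative confrontation tasks show that the reinforcement learning strategy with the active risk defense mechanism effectively reduces the risk of intrusion by hostile information, improves the execution efficiency of multi-robot cooperative confrontation tasks, and enhances the stability and safety of the strategy.

Keywords: deep reinforcement learning; multi-robot; risk defense; cooperative confrontation; event-driven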

Cooperative countermeasure strategy based on active risk defense multi-agent reinforcement learning
SUN Hui-hui, HU Chun-he, ZHANG Jun-guo. Cooperative countermeasure strategy based on active risk defense multi-agent reinforcement learning[J]. Control and Decision, 2023, 38(5): 1420-1429.
Authors: SUN Hui-hui, HU Chun-he, ZHANG Jun-guo
Affiliation: School of Technology, Beijing Forestry University, Beijing 100083, China; School of Mechanical and Electrical Engineering, North China Institute of Science and Technology, Langfang 065201, China; School of Technology, Beijing Forestry University, Beijing 100083, China; Key Lab of State Forestry and Grassland Administration for Forestry Equipment and Automation, Beijing 100083, China
Abstract: Deep reinforcement learning (DRL) has become a research hotspot in multi-robot systems owing to its efficient performance. However, in unstructured environments with time-varying and unknown risks, traditional DRL methods show poor risk defense ability and fragile system security. Unknown risks intrude nonlinearly on the state space of multi-robot systems in the form of adversarial attacks, posing a serious threat to the estimation of robot motion strategies. To solve this problem, this paper proposes a multi-agent reinforcement learning method based on an active risk defense mechanism (ARD-MARL). First, based on the partially observable Markov game model, a risk discrimination mechanism with global communication information is established to predict the current behavior state. Second, in the strategy deployment stage, an event-triggered multi-risk processing scheme is built to apply the security strategy that matches each level of predicted risk. Then, for dangerous states with risk intrusion, an active defense Actor-Critic network architecture based on an enhanced attention mechanism is designed. By amplifying important information and suppressing threat information, it generates a safer and more efficient motion strategy. Finally, extensive experiments are carried out on multi-agent cooperative and confrontation tasks. The results show that the multi-robot reinforcement learning method with the active security defense mechanism effectively enhances stability and risk resistance and improves the security of information transmission.
Keywords: deep reinforcement learning; multi-robot; risk defense; cooperative confrontation; event-driven
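
The abstract names three ingredients without giving formulas: a risk index computed from shared information, an event-triggered choice of risk-handling mode, and an attention step that suppresses threat information. The sketch below is one possible reading of that pipeline and is not the authors' implementation. Every name in it (`risk_index`, `select_mode`, `defended_attention`), the distance-based index, the two thresholds, and the penalty value are illustrative assumptions.

```python
import math
from typing import List, Sequence


def risk_index(observation: Sequence[float],
               shared_memory: Sequence[Sequence[float]]) -> float:
    """Hypothetical risk index (assumption, not from the paper): Euclidean
    distance between the current observation and the mean of the
    observations stored in a memory pool shared by the robots."""
    n = len(shared_memory)
    mean = [sum(column) / n for column in zip(*shared_memory)]
    return math.sqrt(sum((o - m) ** 2 for o, m in zip(observation, mean)))


def select_mode(index: float, low: float = 0.5, high: float = 1.5) -> str:
    """Event-triggered switch between risk-handling modes.
    The thresholds `low` and `high` are illustrative assumptions."""
    if index < low:
        return "normal"
    if index < high:
        return "cautious"
    return "defend"


def defended_attention(scores: Sequence[float],
                       is_threat: Sequence[bool],
                       penalty: float = 5.0) -> List[float]:
    """Softmax attention in which entries flagged as threats have
    `penalty` subtracted from their score before normalisation, so they
    receive less weight than unflagged entries with the same score."""
    adjusted = [s - penalty if t else s for s, t in zip(scores, is_threat)]
    peak = max(adjusted)  # subtract the maximum for numerical stability
    exps = [math.exp(a - peak) for a in adjusted]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    memory = [[0.0, 0.0], [0.2, -0.2], [-0.2, 0.2]]
    index = risk_index([3.0, 4.0], memory)
    mode = select_mode(index)
    weights = defended_attention([1.0, 1.0, 1.0], [False, False, True])
    print(index, mode, weights)
```

In this reading the attention weights are only recomputed with the threat penalty when `select_mode` returns `"defend"`, which keeps the cheaper default path for low-risk states; whether the paper gates its defense network this way is not stated in the abstract.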