Research on an intention-based reinforcement learning method via mutual information maximization
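Cite this article: Zhao Tingting, Wu Shuai, Yang Mengnan, Chen Yarui, Wang Yuan, Yang Jucheng. Research on an intention-based reinforcement learning method via mutual information maximization[J]. Application Research of Computers, 2022, 39(11).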
Authors: Zhao Tingting  Wu Shuai  Yang Mengnan  Chen Yarui  Wang Yuan  Yang Jucheng
Affiliation: College of Artificial Intelligence, Tianjin University of Science and Technology (all six authors)
Funding: National Natural Science Foundation of China (61976156); Tianjin Enterprise Science and Technology Commissioner Project (20YDTPJC00560)
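Abstract: Reinforcement learning studies how an agent can make good decisions given its environment; its core is learning a policy. Action selection in conventional policy models depends mainly on state perception, historical memory, and model parameters, so the agent's behavior is hard to control. When human agents carry out a task, however, they usually choose their behavior according to their own intentions or motivations. Inspired by this human decision-making mechanism, and in order to make behavior selection in reinforcement learning controllable so that the agent can choose actions according to an intention, this paper adds an intention variable to the policy model and proposes an intention-controlled policy learning method for reinforcement learning. Specifically, the method maximizes the mutual information between the intention variable and the action so that the two become highly correlated; the policy can then select the actions associated with a given intention variable, which gives control over the agent. Finally, experiments on complex simulated robot control tasks in MuJoCo show that the proposed method can effectively control the robot's movement speed and movement angle through the intention variable.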

Keywords: reinforcement learning    mutual information    intention control    proximal policy optimization
Received: 2022/3/20
Revised: 2022/10/23

Intention based reinforcement learning by information maximization
Zhao Tingting, Wu Shuai, Yang Mengnan, Chen Yarui, Wang Yuan, Yang Jucheng. Intention based reinforcement learning by information maximization[J]. Application Research of Computers, 2022, 39(11).
Authors: Zhao Tingting  Wu Shuai  Yang Mengnan  Chen Yarui  Wang Yuan  Yang Jucheng
Affiliation: College of Artificial Intelligence, Tianjin University of Science and Technology
Abstract: Reinforcement learning studies how an agent makes decisions through interaction with an unknown environment; its core is learning a policy. Action selection in a traditional policy model depends mainly on state perception, historical memory, and model parameters, which makes the agent's behavior difficult to control. When humans carry out a task, however, they usually make decisions according to their own intentions or motivations. Inspired by this human decision-making mechanism, and in order to make behavior selection controllable so that the agent can choose actions according to an intention, this paper incorporates an intention variable into the policy model and proposes an intention-motivated reinforcement learning method. More specifically, the proposed method maximizes the mutual information between the intention variable and the actions, so that the policy selects actions related to the given intention variable. Finally, the effectiveness of the proposed intention-motivated control is demonstrated on simulated robot control tasks in the complex MuJoCo environment.
Keywords: reinforcement learning (RL)  mutual information  intentional control  proximal policy optimization
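
The quantity that the abstract proposes to maximize, the mutual information between the intention variable and the action, can be illustrated on a toy problem. The sketch below is not the authors' implementation: the abstract does not state how the mutual information is estimated, and the paper's experiments use continuous robot control tasks. It assumes a discrete intention c, a discrete action a, and no state dependence, and every name in it (`mutual_information`, `joint_from_policy`, `p_intent`, `ignoring`, `following`) is hypothetical.

```python
import math


def mutual_information(joint):
    """Return I(C; A) in nats for a joint probability table joint[c][a]."""
    p_c = [sum(row) for row in joint]          # marginal over intentions
    p_a = [sum(col) for col in zip(*joint)]    # marginal over actions
    mi = 0.0
    for c, row in enumerate(joint):
        for a, p in enumerate(row):
            if p > 0.0:                        # 0 * log(0) is taken as 0
                mi += p * math.log(p / (p_c[c] * p_a[a]))
    return mi


def joint_from_policy(p_intent, policy):
    """Build joint[c][a] = p(c) * pi(a | c). The state is omitted (assumption)."""
    return [[pc * pa for pa in row] for pc, row in zip(p_intent, policy)]


# Two equally likely intentions and two actions (toy assumption).
p_intent = [0.5, 0.5]

# A policy that ignores the intention: same action distribution for every c.
ignoring = [[0.5, 0.5],
            [0.5, 0.5]]

# A policy that follows the intention: action i is preferred under intention i.
following = [[0.9, 0.1],
             [0.1, 0.9]]

mi_ignoring = mutual_information(joint_from_policy(p_intent, ignoring))
mi_following = mutual_information(joint_from_policy(p_intent, following))

print(f"I(c; a), intention ignored:  {mi_ignoring:.4f} nats")
print(f"I(c; a), intention followed: {mi_following:.4f} nats")
```

In this toy setting the policy that ignores the intention has zero mutual information, while the policy that follows it has a strictly positive value bounded above by the entropy of the intention variable (log 2 nats here). Pushing this quantity up is what ties the chosen action to the supplied intention.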