Integration of imitation learning using GAIL and reinforcement learning using task-achievement rewards via probabilistic graphical model
Authors: Akira Kinose, Tadahiro Taniguchi
Affiliation: Department of Human and Computer Intelligence, Ritsumeikan University, Kusatsu, Japan (akira.kinose@em.ci.ritsumei.ac.jp)
Abstract: The integration of reinforcement learning (RL) and imitation learning (IL) is an important problem that has long been studied in the field of intelligent robotics. RL optimizes policies to maximize the cumulative reward, whereas IL attempts to extract general knowledge from trajectories demonstrated by experts, i.e., demonstrators. Because each approach has its own drawbacks, many methods that combine the two to compensate for each other's drawbacks have been explored. However, many of these methods are heuristic and lack a solid theoretical basis. This paper presents a new theory for integrating RL and IL by extending control as inference, a probabilistic graphical model (PGM) framework for RL. We develop a new PGM for RL with multiple types of rewards, called the probabilistic graphical model for Markov decision processes with multiple optimality emissions (pMDP-MO). Furthermore, we demonstrate that the integrated learning of RL and IL can be formulated as probabilistic inference of policies on the pMDP-MO by treating the discriminator in generative adversarial imitation learning (GAIL) as an additional optimality emission. We apply GAIL and a task-achievement reward within the proposed framework, achieving significantly better performance than policies trained with baseline methods.
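The core idea lends itself to a concrete illustration. In control as inference, optimality is a binary variable emitted with probability proportional to exp(reward); when several optimality variables are emitted independently, their log-probabilities add, so the GAIL discriminator score can enter as an extra reward-like term next to the task reward. The following is a minimal, hypothetical Python sketch of that combination; the function name, the weights alpha and beta, and the use of log D(s, a) as the imitation term are illustrative assumptions, not the paper's exact formulation.

import numpy as np

def combined_log_optimality(r_task, d_prob, alpha=1.0, beta=1.0):
    """Log-probability that both optimality variables are 'on' for one
    (state, action) pair, up to an additive constant.

    Control as inference posits p(O = 1 | s, a) proportional to exp(r(s, a)).
    With two independently emitted optimality variables (task and imitation):
        log p(O_task = 1, O_imit = 1 | s, a)
            = alpha * r_task(s, a) + beta * log D(s, a) + const,
    where D(s, a) is the GAIL discriminator's estimated probability that
    (s, a) was produced by the expert.
    """
    return alpha * r_task + beta * np.log(d_prob + 1e-8)  # epsilon avoids log(0)

# A policy-gradient learner would maximize the expected sum of these
# combined terms along trajectories, e.g.:
print(combined_log_optimality(r_task=0.5, d_prob=0.9))

Note that GAIL implementations vary in the sign convention of the discriminator reward (log D versus -log(1 - D)); the sketch above fixes one choice purely for concreteness.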
Keywords: imitation learning; reinforcement learning; probabilistic inference; control as inference; generative adversarial imitation learning