Reinforcement learning of dynamic behavior by using recurrent neural networks
Authors: Ahmet Onat, Hajime Kita, Yoshikazu Nishikawa
Affiliation: (1) Department of Electrical Engineering, Graduate School of Kyoto University, Yoshida Honmachi, Sakyo-ku, Kyoto 606-01, Japan; (2) Interdisciplinary Graduate School of Science and Engineering, Tokyo Institute of Technology, Tokyo, Japan; (3) Faculty of Information Science, Osaka Institute of Technology, Osaka, Japan
Abstract: Reinforcement learning is a learning scheme for finding the optimal control policy for a system, based on a scalar signal representing a reward or a punishment. If the controller's observation of the system is sufficiently rich to represent the internal state of the system, the controller can achieve the optimal policy simply by learning reactive behavior. However, if the state of the controlled system cannot be assessed completely from current sensory observations, the controller must learn dynamic behavior to achieve the optimal policy. In this paper, we propose a dynamic controller scheme that uses memory of past system outputs to uncover hidden states and bases its control decisions on that memory. The scheme integrates Q-learning, as proposed by Watkins, with recurrent neural networks of several types, and it performs favorably in simulations involving a task with hidden states. This work was presented, in part, at the International Symposium on Artificial Life and Robotics, Oita, Japan, February 18–20, 1996.
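The sketch below illustrates the general idea described in the abstract, not the authors' implementation: an Elman-style recurrent network computes Q-values from the current observation and a hidden state that summarizes past observations, and the network is trained toward a standard Q-learning target. Network sizes, learning rates, the depth-1 truncated backpropagation, and the `env` interface (`reset()` returning an observation, `step(a)` returning observation, reward, and a done flag) are all illustrative assumptions.

```python
# Minimal sketch of Q-learning with a recurrent (Elman-style) Q-network,
# so that Q-values can depend on observation history under hidden state.
import numpy as np

rng = np.random.default_rng(0)

N_OBS, N_HID, N_ACT = 3, 8, 2        # assumed toy problem sizes
ALPHA, GAMMA = 0.05, 0.9             # assumed learning rate and discount

# Recurrent hidden layer takes the current observation and the previous
# hidden state; a linear output layer gives one Q-value per action.
W_in  = rng.normal(0, 0.1, (N_HID, N_OBS))
W_rec = rng.normal(0, 0.1, (N_HID, N_HID))
b_h   = np.zeros(N_HID)
W_out = rng.normal(0, 0.1, (N_ACT, N_HID))
b_out = np.zeros(N_ACT)

def forward(obs, h_prev):
    """One step of the recurrent Q-network: new hidden state and Q-values."""
    h = np.tanh(W_in @ obs + W_rec @ h_prev + b_h)
    q = W_out @ h + b_out
    return h, q

def update(obs, h_prev, action, target):
    """Move Q(history, obs; action) toward the Q-learning target with one
    gradient step, truncating backpropagation at the current time step."""
    global W_in, W_rec, b_h, W_out, b_out
    h, q = forward(obs, h_prev)
    err = target - q[action]                       # TD error for chosen action
    dh = err * W_out[action] * (1.0 - h ** 2)      # backprop into hidden layer
    W_out[action] += ALPHA * err * h               # output-layer update
    b_out[action] += ALPHA * err
    W_in  += ALPHA * np.outer(dh, obs)             # hidden-layer updates
    W_rec += ALPHA * np.outer(dh, h_prev)
    b_h   += ALPHA * dh

def episode(env, epsilon=0.1):
    """Control loop: epsilon-greedy actions, Q-learning target computed from
    the next hidden state so the memory is used consistently.
    `env` is a hypothetical environment with reset()/step(a)."""
    obs, done = env.reset(), False
    h = np.zeros(N_HID)
    while not done:
        h_next, q = forward(obs, h)
        a = rng.integers(N_ACT) if rng.random() < epsilon else int(np.argmax(q))
        obs2, r, done = env.step(a)
        _, q_next = forward(obs2, h_next)
        target = r if done else r + GAMMA * np.max(q_next)
        update(obs, h, a, target)
        obs, h = obs2, h_next
```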
Keywords: Reinforcement learning; Hidden state; Q-learning; Recurrent neural networks
This article is indexed in SpringerLink and other databases.