Reinforcement learning of dynamic behavior by using recurrent neural networks

Authors: Ahmet Onat, Hajime Kita, Yoshikazu Nishikawa

Affiliations:
(1) Department of Electrical Engineering, Graduate School of Kyoto University, Yoshida Honmachi, Sakyo, 606-01 Kyoto, Japan
(2) Interdisciplinary Graduate School of Science and Engineering, Tokyo Institute of Technology, Tokyo, Japan
(3) Faculty of Information Science, Osaka Institute of Technology, Osaka, Japan

Abstract:
Reinforcement learning is a learning scheme for finding an optimal policy to control a system, based on a scalar signal representing reward or punishment. If the controller's observations are sufficiently rich to represent the internal state of the system, the controller can achieve the optimal policy simply by learning reactive behavior. However, if the state of the controlled system cannot be assessed completely from current sensory observations, the controller must learn dynamic behavior to achieve the optimal policy. In this paper, we propose a dynamic controller scheme that maintains a memory of past system outputs to uncover hidden states and bases its control decisions on that memory. The scheme integrates Q-learning, as proposed by Watkins, with recurrent neural networks of several types, and it performs favorably in simulations involving a task with hidden states.

This work was presented, in part, at the International Symposium on Artificial Life and Robotics, Oita, Japan, February 18–20, 1996.
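
As a rough illustration of the kind of controller the abstract describes, the sketch below combines one-step Q-learning with a simple Elman-style recurrent network, whose hidden state acts as the memory of past observations. This is a minimal sketch, not the paper's implementation: the class name, network sizes, learning rate, and the truncated (output-layer-only) weight update are all assumptions made for brevity, whereas the paper's "recurrent neural networks of several types" would also train their recurrent weights.

import numpy as np

class RecurrentQNetwork:
    # Elman-style recurrent network mapping an observation and its own
    # hidden state to one Q-value per action. The hidden state is the
    # "memory" that can disambiguate hidden states of the plant.
    # All names, sizes, and constants here are illustrative assumptions.
    def __init__(self, n_obs, n_hidden, n_actions, lr=0.05, seed=0):
        rng = np.random.default_rng(seed)
        self.W_in = rng.normal(0.0, 0.1, (n_hidden, n_obs))      # obs -> hidden
        self.W_rec = rng.normal(0.0, 0.1, (n_hidden, n_hidden))  # hidden -> hidden
        self.W_out = rng.normal(0.0, 0.1, (n_actions, n_hidden)) # hidden -> Q
        self.lr = lr
        self.h = np.zeros(n_hidden)

    def reset(self):
        # Clear the memory at the start of each episode.
        self.h = np.zeros_like(self.h)

    def act(self, obs, eps=0.1):
        # Advance the recurrent state, then pick an epsilon-greedy action.
        self.h = np.tanh(self.W_in @ obs + self.W_rec @ self.h)
        self.last_h = self.h.copy()
        self.last_q = self.W_out @ self.h
        if np.random.rand() < eps:
            return np.random.randint(len(self.last_q))
        return int(np.argmax(self.last_q))

    def learn(self, action, reward, next_obs, gamma=0.9, done=False):
        # One-step Q-learning update: target = r + gamma * max_a' Q(h', a').
        # Only the output weights are adapted here, a truncated-gradient
        # simplification; a full recurrent implementation would also
        # backpropagate through W_in and W_rec (e.g. by BPTT).
        if done:
            target = reward
        else:
            h_next = np.tanh(self.W_in @ next_obs + self.W_rec @ self.h)
            target = reward + gamma * float(np.max(self.W_out @ h_next))
        td_error = target - self.last_q[action]
        self.W_out[action] += self.lr * td_error * self.last_h
        return td_error

A training loop would call reset() at the start of each episode and then alternate act() and learn(). Because the hidden state carries information across time steps, two observations that look identical to the sensors can still yield different Q-values, which is exactly what a purely reactive controller cannot achieve.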

Keywords: Reinforcement learning; Hidden state; Q-learning; Recurrent neural networks