Reinforcement learning of dynamic behavior by using recurrent neural networks

Authors: Ahmet Onat, Hajime Kita, Yoshikazu Nishikawa

Affiliations:
(1) Department of Electrical Engineering, Graduate School of Kyoto University, Yoshida Honmachi, Sakyo, 606-01 Kyoto, Japan
(2) Interdisciplinary Graduate School of Science and Engineering, Tokyo Institute of Technology, Tokyo, Japan
(3) Faculty of Information Science, Osaka Institute of Technology, Osaka, Japan

Abstract:
Reinforcement learning is a learning scheme for finding an optimal policy to control a system, based on a scalar signal representing reward or punishment. If the controller's observations are sufficiently rich to represent the internal state of the system, the controller can achieve the optimal policy simply by learning reactive behavior. However, if the state of the controlled system cannot be assessed completely from current sensory observations, the controller must learn dynamic behavior to achieve the optimal policy. In this paper, we propose a dynamic controller scheme that maintains a memory of past system outputs to uncover hidden states and bases its control decisions on that memory. The scheme integrates Q-learning, as proposed by Watkins, with recurrent neural networks of several types, and it performs favorably in simulations involving a task with hidden states.

This work was presented, in part, at the International Symposium on Artificial Life and Robotics, Oita, Japan, February 18–20, 1996.
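
As a rough illustration of the kind of controller the abstract describes, the sketch below combines one-step Q-learning with a simple Elman-style recurrent network, whose hidden state acts as the memory of past observations. This is a minimal sketch, not the paper's implementation: the class name, network sizes, learning rate, and the truncated (output-layer-only) weight update are all assumptions made for brevity, whereas the paper's "recurrent neural networks of several types" would also train their recurrent weights.

import numpy as np

class RecurrentQNetwork:
    # Elman-style recurrent network mapping an observation and its own
    # hidden state to one Q-value per action. The hidden state is the
    # "memory" that can disambiguate hidden states of the plant.
    # All names, sizes, and constants here are illustrative assumptions.
    def __init__(self, n_obs, n_hidden, n_actions, lr=0.05, seed=0):
        rng = np.random.default_rng(seed)
        self.W_in = rng.normal(0.0, 0.1, (n_hidden, n_obs))      # obs -> hidden
        self.W_rec = rng.normal(0.0, 0.1, (n_hidden, n_hidden))  # hidden -> hidden
        self.W_out = rng.normal(0.0, 0.1, (n_actions, n_hidden)) # hidden -> Q
        self.lr = lr
        self.h = np.zeros(n_hidden)

    def reset(self):
        # Clear the memory at the start of each episode.
        self.h = np.zeros_like(self.h)

    def act(self, obs, eps=0.1):
        # Advance the recurrent state, then pick an epsilon-greedy action.
        self.h = np.tanh(self.W_in @ obs + self.W_rec @ self.h)
        self.last_h = self.h.copy()
        self.last_q = self.W_out @ self.h
        if np.random.rand() < eps:
            return np.random.randint(len(self.last_q))
        return int(np.argmax(self.last_q))

    def learn(self, action, reward, next_obs, gamma=0.9, done=False):
        # One-step Q-learning update: target = r + gamma * max_a' Q(h', a').
        # Only the output weights are adapted here, a truncated-gradient
        # simplification; a full recurrent implementation would also
        # backpropagate through W_in and W_rec (e.g. by BPTT).
        if done:
            target = reward
        else:
            h_next = np.tanh(self.W_in @ next_obs + self.W_rec @ self.h)
            target = reward + gamma * float(np.max(self.W_out @ h_next))
        td_error = target - self.last_q[action]
        self.W_out[action] += self.lr * td_error * self.last_h
        return td_error

A training loop would call reset() at the start of each episode and then alternate act() and learn(). Because the hidden state carries information across time steps, two observations that look identical to the sensors can still yield different Q-values, which is exactly what a purely reactive controller cannot achieve.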

Keywords: Reinforcement learning; Hidden state; Q-learning; Recurrent neural networks