ADAPTIVE MODEL LEARNING BASED ON DYNA-Q LEARNING

Authors:	Kao-Shing Hwang, Wei-Cheng Jiang, Yu-Jen Chen

Affiliation:	1. Department of Electrical Engineering, National Sun Yat-sen University, Kaohsiung, Taiwan, R.O.C. (hwang@ccu.edu.tw); 3. Department of Electrical Engineering, National Sun Yat-sen University, Kaohsiung, Taiwan, R.O.C.

Abstract:	Dyna-Q, a well-known model-based reinforcement learning (RL) method, interleaves offline simulations with action executions to update Q functions. It builds a world model that predicts the feature values of the next state and the reward function of the domain directly from data, and uses this model to train the Q functions and thereby accelerate policy learning. Dyna-Q typically uses tabular methods to build the model, but a tabular model needs many more samples of experience to approximate the environment accurately. In this article, an adaptive model learning method based on tree structures is presented to improve sampling efficiency when learning the world model. The learned model produces simulated experiences for indirect learning, so the agent has additional experience with which to update its policy. The agent works backward from collected state transitions and their associated rewards, using coarse coding to learn definitions of the regions of state space that lead back to preceding states. The method estimates the rewards and transition probabilities between states from past experience. Because the resulting tree is concise and small, the agent can use value iteration to quickly estimate the Q-values of each action in the induced states and determine a policy. The effectiveness and generality of the method are demonstrated in two numerical simulations: a mountain car and a mobile robot in a maze. The simulation results show that the method noticeably improves the training rate.
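For readers unfamiliar with the baseline, the loop that Dyna-Q interleaves (direct RL from real steps, model learning, and planning from simulated experience) can be sketched as follows. This is a minimal sketch of standard tabular Dyna-Q, the tabular baseline the abstract contrasts with, not the authors' tree-based model learner. The corridor environment, the function names `step`, `greedy`, and `dyna_q`, and all parameter values are hypothetical choices for illustration. The model simply stores the last observed reward and next state for each state-action pair, which assumes deterministic dynamics.

```python
import random
from collections import defaultdict

# Hypothetical toy environment (illustration only): a 1-D corridor of
# N_STATES cells. Action 0 moves left, action 1 moves right; reaching the
# last cell gives reward 1 and ends the episode.
N_STATES = 10
ACTIONS = (0, 1)


def step(state, action):
    """Deterministic corridor dynamics: returns (next_state, reward, done)."""
    if action == 0:
        next_state = max(0, state - 1)
    else:
        next_state = min(N_STATES - 1, state + 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def greedy(q, state, rng):
    """Greedy action with respect to q, breaking ties at random."""
    best = max(q[(state, a)] for a in ACTIONS)
    return rng.choice([a for a in ACTIONS if q[(state, a)] == best])


def dyna_q(episodes=30, planning_steps=20, alpha=0.5, gamma=0.95,
           epsilon=0.1, seed=0):
    """Tabular Dyna-Q: direct RL + model learning + simulated planning."""
    rng = random.Random(seed)
    q = defaultdict(float)  # Q[(s, a)], initialised to 0
    model = {}              # tabular model: (s, a) -> (r, s', done), last seen
    steps_per_episode = []

    for _ in range(episodes):
        state, done, steps = 0, False, 0
        while not done:
            # Epsilon-greedy action selection in the real environment.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = greedy(q, state, rng)
            next_state, reward, done = step(state, action)

            # (a) Direct RL: one-step Q-learning update from real experience.
            target = reward
            if not done:
                target += gamma * max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (target - q[(state, action)])

            # (b) Model learning: remember the observed outcome.
            model[(state, action)] = (reward, next_state, done)

            # (c) Planning: Q-learning updates on experience simulated
            # from previously observed state-action pairs.
            keys = list(model)
            for _ in range(planning_steps):
                s, a = rng.choice(keys)
                r, s2, d = model[(s, a)]
                t = r if d else r + gamma * max(q[(s2, b)] for b in ACTIONS)
                q[(s, a)] += alpha * (t - q[(s, a)])

            state, steps = next_state, steps + 1
        steps_per_episode.append(steps)
    return q, steps_per_episode
```

Calling `q, steps = dyna_q()` returns the learned Q table and the number of real environment steps taken in each episode. Each real step is followed by `planning_steps` simulated updates drawn from the model; this indirect experience is what lets Dyna-Q improve its policy with fewer real samples. Per the abstract, the proposed method replaces the lookup-table model with a tree-structured model that estimates rewards and transition probabilities over the induced states and then applies value iteration.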

Keywords:	decision tree; Dyna-Q agent; model learning; reinforcement learning
