A Scalable Parallel Reinforcement Learning Method Based on Intelligent Scheduling
Citation: Liu Quan, Fu Qiming, Yang Xudong, Jing Ling, Li Jin, Li Jiao. A Scalable Parallel Reinforcement Learning Method Based on Intelligent Scheduling[J]. Journal of Computer Research and Development, 2013, 50(4): 843-851.
Authors: Liu Quan, Fu Qiming, Yang Xudong, Jing Ling, Li Jin, Li Jiao
Affiliation: 1. School of Computer Science and Technology, Soochow University, Suzhou, Jiangsu 215006; 2. Department of Computer Science and Technology, Nanjing University, Nanjing 210093; 3. Key Laboratory of Symbolic Computation and Knowledge Engineering (Jilin University), Ministry of Education, Changchun 130012 (quanliu@suda.edu.cn)
Funding: National Natural Science Foundation of China; Natural Science Foundation of Jiangsu Province; Natural Science Foundation of the Higher Education Institutions of Jiangsu Province; Foundation of the Key Laboratory of Symbolic Computation and Knowledge Engineering (Jilin University), Ministry of Education
Abstract: To address the “curse of dimensionality” problem that reinforcement learning faces in large or continuous state spaces, this paper proposes a scalable reinforcement learning method based on intelligent scheduling, IS-SRL, analyzes it theoretically, and proves its convergence. The method uses a divide-and-conquer strategy to partition a large state space into blocks, each small enough to be loaded into memory and learned independently. After a block has been learned for one cycle, it is swapped out to external storage and the next block is swapped in to continue learning; blocks exchange information as they are swapped in and out, so that the overall learning task converges to the optimal solution. Because the order in which the blocks are learned significantly affects learning efficiency, a novel intelligent scheduling algorithm is further proposed: exploiting the distribution of value-function updates in reinforcement learning and the idea of weighting the priorities of multiple scheduling strategies, it concentrates learning on the subproblem space expected to yield the greatest benefit, which guarantees the learning efficiency of IS-SRL. Integrating a parallel scheduling framework into this algorithm, with multiple agents learning simultaneously, yields the parallel version of the method, IS-SPRL. Experimental results show that IS-SPRL converges quickly and scales well. (A toy sketch of the block-swapping loop appears after the keywords below.)

Keywords: reinforcement learning; divide-and-conquer; parallel computing; scalability; intelligent scheduling
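The abstract above describes a learning loop in which memory-sized blocks of the state space are swapped in, learned for one cycle, swapped out, and made to exchange value information at each swap. The Python sketch below is a minimal illustration of that loop under stated assumptions, not the paper's implementation: the names (StateBlock, learn_cycle, swap_and_learn), the use of tabular Q-learning inside each block, and the round-robin block order are all hypothetical.

```python
from collections import defaultdict

ACTIONS = (0, 1)  # toy action set: 0 = stay, 1 = move right

class StateBlock:
    """One memory-sized partition of the state space (hypothetical name)."""

    def __init__(self, states):
        self.states = set(states)
        self.q = defaultdict(float)   # (state, action) -> Q-value, local to this block
        self.boundary = {}            # state -> value imported from other blocks

    def value(self, s):
        """Best known value of s: local Q-values if owned, imported value otherwise."""
        if s in self.states:
            return max(self.q[(s, a)] for a in ACTIONS)
        return self.boundary.get(s, 0.0)

    def learn_cycle(self, transitions, alpha=0.1, gamma=0.95):
        """One Q-learning sweep over the transitions that start inside this block."""
        for s, a, r, s_next in transitions:
            if s not in self.states:
                continue
            target = r + gamma * self.value(s_next)
            self.q[(s, a)] += alpha * (target - self.q[(s, a)])

    def export_values(self):
        """State values handed to the other blocks when this one is swapped out."""
        return {s: self.value(s) for s in self.states}

def swap_and_learn(blocks, transitions, cycles=20):
    """Naive round-robin schedule: swap each block in, learn for one cycle,
    swap it out, and propagate its exported values to every other block."""
    for _ in range(cycles):
        for block in blocks:                   # "swap in" the block
            block.learn_cycle(transitions)
            exported = block.export_values()   # "swap out" and share its values
            for other in blocks:
                if other is not block:
                    other.boundary.update(exported)

# Toy chain MDP: states 0..5, action 1 moves right, reward on reaching state 5.
transitions = [(s, 1, 1.0 if s + 1 == 5 else 0.0, s + 1) for s in range(5)]
blocks = [StateBlock(range(0, 3)), StateBlock(range(3, 6))]
swap_and_learn(blocks, transitions)
print(blocks[0].value(0))  # grows toward gamma**4 as values cross the block boundary
```

The round-robin order used here is precisely what the paper's intelligent scheduler improves on; a sketch of that priority idea follows the English record below.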

A Scalable Parallel Reinforcement Learning Method Based on Intelligent Scheduling
Citation: Liu Quan, Fu Qiming, Yang Xudong, Jing Ling, Li Jin, Li Jiao. A Scalable Parallel Reinforcement Learning Method Based on Intelligent Scheduling[J]. Journal of Computer Research and Development, 2013, 50(4): 843-851.
Authors: Liu Quan, Fu Qiming, Yang Xudong, Jing Ling, Li Jin, Li Jiao
Affiliation: 1. School of Computer Science and Technology, Soochow University, Suzhou, Jiangsu 215006; 2. Department of Computer Science and Technology, Nanjing University, Nanjing 210093; 3. Key Laboratory of Symbolic Computation and Knowledge Engineering (Jilin University), Ministry of Education, Changchun 130012
Abstract: Aiming at the “curse of dimensionality” problem of reinforcement learning in large or continuous state spaces, a scalable reinforcement learning method, IS-SRL, is proposed on the basis of a divide-and-conquer strategy, and its convergence is proved. In this method, a learning problem with a large or continuous state space is divided into smaller subproblems so that each subproblem can be learned independently in memory. After a cycle of learning, the current subproblem is swapped out and the next one is swapped in; information is exchanged between subproblems during each swap, so the learning process eventually converges to the optimum. Since the order in which subproblems are executed significantly affects learning efficiency, an efficient scheduling algorithm is proposed that exploits the distribution of value-function backups in reinforcement learning and weights the priorities of multiple scheduling strategies, ensuring that computation is focused on the regions of the problem space expected to be most productive. To further expedite learning, a parallel scheduling architecture is proposed that can flexibly allocate learning tasks among agents; blending this architecture into IS-SRL yields the parallel method IS-SPRL. Experimental results show that IS-SPRL converges faster and scales well. (A sketch of the weighted-priority scheduling idea follows the keywords below.)
Keywords: reinforcement learning; divide-and-conquer strategy; parallel computing; scalability; intelligent scheduling
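Per the abstract, the scheduler weights the priorities of multiple scheduling strategies and swaps in the block expected to be most productive next. The sketch below illustrates one way such a weighted priority could be combined and ordered; the concrete signals (recent Bellman error, staleness) and the weights are assumptions for illustration, since the abstract does not name them.

```python
import heapq

def weighted_priority(stats, weights=(0.7, 0.3)):
    """Blend several scheduling signals into a single priority.
    The signals (recent Bellman error, staleness) and the weights are
    assumed for illustration; the paper's abstract does not spell them out."""
    w_err, w_stale = weights
    return w_err * stats["bellman_error"] + w_stale * stats["staleness"]

def schedule(block_stats):
    """Yield block ids in descending weighted priority (max-heap via negation)."""
    heap = [(-weighted_priority(st), bid) for bid, st in block_stats.items()]
    heapq.heapify(heap)
    while heap:
        neg_p, bid = heapq.heappop(heap)
        yield bid, -neg_p

# Block "B" has the highest weighted priority, so it is swapped in first.
stats = {
    "A": {"bellman_error": 0.2, "staleness": 1.0},   # priority 0.44
    "B": {"bellman_error": 0.9, "staleness": 0.5},   # priority 0.78
    "C": {"bellman_error": 0.4, "staleness": 1.0},   # priority 0.58
}
for bid, priority in schedule(stats):
    print(bid, round(priority, 2))   # B 0.78, C 0.58, A 0.44
```

In the parallel IS-SPRL setting, one would expect several agents to pop blocks from such a priority queue concurrently; this sketch covers only the ordering itself.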