Scheduling strategy of cloud robots based on parallel reinforcement learning
SHA Zongxuan,XUE Fei,ZHU Jie. Scheduling strategy of cloud robots based on parallel reinforcement learning[J]. Journal of Computer Applications, 2019, 39(2): 501-508. DOI: 10.11772/j.issn.1001-9081.2018061406
Authors:SHA Zongxuan  XUE Fei  ZHU Jie
Affiliation:School of Information, Beijing Wuzi University, Beijing 101149, China
Fund projects:National Natural Science Foundation of China (71371033); Science and Technology Plan General Program of Beijing Municipal Education Commission (KM201810037002); Beijing Intelligent Logistics System Collaborative Innovation Center (0351701301)
Abstract:To solve the problem of slow convergence when robots perform reinforcement learning tasks with a large state space, a priority-based parallel reinforcement learning task scheduling strategy was proposed. Firstly, the convergence of Q-learning in the asynchronous parallel computing mode was proved. Secondly, complex problems were partitioned according to their state spaces, the scheduling center matched sub-problems with computing nodes according to the proposed strategy, and each computing node completed the reinforcement learning task of its sub-problem and fed the result back to the scheduling center, realizing parallel reinforcement learning on a computer cluster. Finally, the experimental environment was built on CloudSim, parameters such as the optimal step length, discount rate and sub-problem size were determined, and the performance of the proposed strategy with different numbers of computing nodes was verified by solving practical problems. With 64 computing nodes, the efficiency of the proposed strategy was improved by 61% and 86% compared with round-robin scheduling and random scheduling respectively. Experimental results show that the proposed strategy effectively speeds up convergence under parallel computing, and that obtaining the optimal policy for a control problem with a state space of one million states takes about 1.6×10^5 s.
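For reference, the convergence proof mentioned in the abstract concerns the standard tabular Q-learning update, with step length α and discount rate γ (presumably applied independently on each computing node in the paper's asynchronous mode):

    Q(s,a) ← Q(s,a) + α [ r + γ max_{a'} Q(s',a') - Q(s,a) ]

where r is the reward observed after taking action a in state s, and s' is the successor state.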
Keywords:cloud robot  reinforcement learning  Q-Learning  parallel computing  task scheduling  CloudSim
Received:2018-07-06
Revised:2018-08-16
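The scheduling pipeline described in the abstract can be illustrated in a few dozen lines. Below is a minimal, runnable Python sketch, not the paper's implementation: the 1-D chain environment, the eight equal partitions, the epsilon-greedy settings and the "larger sub-problem first" priority rule are all illustrative assumptions, since the paper's actual priority function and CloudSim setup are not reproduced here.

    import heapq
    import random
    from concurrent.futures import ProcessPoolExecutor

    N_STATES = 1000            # full state space: a 1-D chain, goal at the right end
    N_NODES = 4                # simulated computing nodes
    ALPHA, GAMMA = 0.1, 0.9    # step length and discount rate (tuned in the paper)
    EPISODES = 500

    def run_subtask(block):
        """One computing node: tabular Q-learning restricted to a block of states."""
        lo, hi = block
        q = {s: [0.0, 0.0] for s in range(lo, hi)}    # actions: 0 = left, 1 = right
        for _ in range(EPISODES):
            s = random.randrange(lo, hi)
            for _ in range(hi - lo):                  # bounded episode length
                if random.random() < 0.1:             # epsilon-greedy exploration
                    a = random.randrange(2)
                else:
                    a = 0 if q[s][0] >= q[s][1] else 1
                s2 = max(0, s - 1) if a == 0 else s + 1
                r = 1.0 if s2 == N_STATES - 1 else 0.0
                if s2 not in q:                       # transition leaves this block:
                    q[s][a] += ALPHA * (r - q[s][a])  # bootstrap with 0 (assumption)
                    break
                q[s][a] += ALPHA * (r + GAMMA * max(q[s2]) - q[s][a])
                if r > 0:                             # goal reached: end the episode
                    break
                s = s2
        return block, q

    def schedule(blocks):
        """Scheduling center: dispatch blocks largest-first, merge fed-back results."""
        heap = [(-(hi - lo), (lo, hi)) for lo, hi in blocks]  # priority = block size
        heapq.heapify(heap)
        order = [heapq.heappop(heap)[1] for _ in range(len(heap))]
        merged = {}
        with ProcessPoolExecutor(max_workers=N_NODES) as pool:
            for block, q in pool.map(run_subtask, order):     # nodes feed back
                merged.update(q)
        return merged

    if __name__ == "__main__":
        size = N_STATES // 8                          # eight equal partitions
        blocks = [(i, i + size) for i in range(0, N_STATES, size)]
        q_table = schedule(blocks)
        print("merged Q-table covers", len(q_table), "states")

One deliberate simplification: with a single dispatch pass, value information cannot propagate across partition boundaries (blocks far from the goal keep zero-valued entries), so a faithful reproduction would have the scheduling center iterate, re-dispatching sub-problems with the merged Q-values as the boundary bootstrap.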