Self-organizing coordination of multi-robot based on Monte Carlo learning


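Cite this article:	ZHOU Tong, HONG Bing-rong, PIAO Song-hao, ZHOU Hong-yu. Self-organizing coordination of multi-robot based on Monte Carlo learning[J]. Computer Engineering and Applications, 2007, 43(30): 23-25.

Authors:	ZHOU Tong, HONG Bing-rong, PIAO Song-hao, ZHOU Hong-yu

Affiliations:	[1] School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China; [2] School of Mechanical and Power Engineering, Harbin University of Science and Technology, Harbin 150080, China

Funding:	National Natural Science Foundation of China; National High Technology Research and Development Program of China (863 Program)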

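Chinese abstract (translated):	Reinforcement learning is an effective way to improve the efficiency with which robots complete tasks. Currently popular learning methods generally optimize the discounted cumulative reward, but the average reward is in some respects better suited to multi-robot coordination. Discounted cumulative reward methods can improve performance at the level of individual robot actions, but they do not produce good coordination at the level of multi-robot tasks; methods based on the average reward can change this. This paper applies average-reward Monte Carlo learning to multi-robot cooperation and obtains good learning results. Experiments on real robots show that the average-reward method outperforms the discounted cumulative reward method.
Keywords:	reinforcement learning; multi-robot coordination; Monte Carlo learning; Q-learning
Article ID:	1002-8331(2007)30-0023-03
Revised:	2007-07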
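For reference, the two optimization criteria contrasted in the abstract are commonly written as below. These are standard textbook definitions, not taken from the paper, whose own notation is not shown on this page: r_t is the reward received at step t, γ is the discount factor, α is the learning rate, and the last line is the usual one-step Q-learning update, which optimizes the discounted criterion.

```latex
% Discounted cumulative return from time t, with 0 <= \gamma < 1
G_t = \sum_{k=0}^{\infty} \gamma^{k} \, r_{t+k+1}

% Average reward (gain) of a policy \pi
\rho^{\pi} = \lim_{n \to \infty} \frac{1}{n} \, \mathbb{E}_{\pi}\!\left[ \sum_{t=1}^{n} r_t \right]

% One-step Q-learning update (discounted criterion), step size \alpha
Q(s,a) \leftarrow Q(s,a) + \alpha \left[ r + \gamma \max_{a'} Q(s',a') - Q(s,a) \right]
```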

English abstract:	Reinforcement learning is an effective way to accomplish tasks in multi-robot systems. While much of the work has focused on optimizing the discounted cumulative reward, optimizing the average reward is sometimes a more suitable criterion for multi-robot coordination. Learning algorithms based on discounted rewards, such as Q-learning, can achieve good results at the action level but do not perform well at the task level. Learning methods based on the average reward, such as the Monte Carlo algorithm, can achieve the optimal result through cooperation at the task level. Experiments on real robots show that the algorithm adopting the average reward outperforms the one adopting the discounted cumulative reward.
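A toy calculation shows how the two criteria can rank the same behaviors differently. The sketch below is a hypothetical illustration, not the paper's method or experiment: two made-up policies A and B produce fixed, repeating reward patterns, and the helper names `rewards`, `discounted_return` and `average_reward` are introduced here for illustration. It assumes a discount factor of 0.9 and truncates the infinite sums at 10,000 steps.

```python
from itertools import cycle, islice


def rewards(pattern, horizon):
    """Return the first `horizon` rewards of a policy whose rewards repeat `pattern`."""
    return list(islice(cycle(pattern), horizon))


def discounted_return(rs, gamma):
    """Discounted return sum_k gamma**k * rs[k], accumulated backwards."""
    g = 0.0
    for r in reversed(rs):
        g = r + gamma * g
    return g


def average_reward(rs):
    """Empirical average reward per step over the whole sequence."""
    return sum(rs) / len(rs)


if __name__ == "__main__":
    HORIZON = 10_000  # long enough that GAMMA**HORIZON is negligible
    GAMMA = 0.9       # assumed discount factor
    # Hypothetical policies: A earns 1 every other step, B earns 6 every 10th step.
    policies = {"A": [1, 0], "B": [0] * 9 + [6]}
    for name, pattern in policies.items():
        rs = rewards(pattern, HORIZON)
        print(f"policy {name}: discounted = {discounted_return(rs, GAMMA):.3f}, "
              f"average = {average_reward(rs):.3f}")
```

With these numbers the discounted return ranks A above B (about 5.26 versus 3.57), while the average reward ranks B above A (0.6 versus 0.5): the discounted criterion favors the frequent small reward, whereas the average-reward criterion favors the behavior with the higher long-run payoff rate.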

This article is indexed in CNKI, VIP (Weipu), Wanfang Data and other databases.