
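Research on a Reinforcement Learning Algorithm for Multi-Robot Dynamic Formation
Cite this article:WANG Xing-Ce, ZHANG Ru-Bo, GU Guo-Chang. Research on a Reinforcement Learning Algorithm for Multi-Robot Dynamic Formation[J]. Journal of Computer Research and Development, 2003, 40(10): 1444-1450.
Authors:WANG Xing-Ce  ZHANG Ru-Bo  GU Guo-Chang
Affiliation:1. College of Computer Science and Technology, Harbin Engineering University, Harbin 150001
2. College of Computer Science and Technology, Harbin Engineering University, Harbin 150001; Key Laboratory of Robotics, Shenyang Institute of Automation, Chinese Academy of Sciences, Shenyang 110015
Funding:Foundation of the Key Laboratory of Robotics, Shenyang Institute of Automation, Chinese Academy of Sciences (RL200106); National Defense Basic Research Program Foundation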
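Abstract:In artificial intelligence, reinforcement learning theory has attracted wide attention because of its self-learning and self-adaptive properties. As multi-agent theory in distributed artificial intelligence has developed, distributed reinforcement learning algorithms have gradually become a research focus. This paper first reviews the state of research on reinforcement learning. It then takes multi-robot dynamic formation as the study model and describes how distributed reinforcement learning can be applied to multi-robot behavior control. An SOM neural network partitions the state space autonomously to speed up learning; a BP neural network implements the reinforcement learning to improve the system's generalization ability; and two reinforcement signals, one internal and one external, balance the interest of the individual robot against that of the whole group. To make the control task explicit, the system uses blackboard communication for hierarchical control. Finally, simulation experiments demonstrate the effectiveness of the method.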

Keywords:multi-robot  formation  reinforcement learning  behavior control

Research on Dynamic Team Formation of Multi-Robots Reinforcement Learning
WANG Xing-Ce, ZHANG Ru-Bo, and GU Guo-Chang. Research on Dynamic Team Formation of Multi-Robots Reinforcement Learning[J]. Journal of Computer Research and Development, 2003, 40(10): 1444-1450.
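Authors:WANG Xing-Ce 1, ZHANG Ru-Bo 1,2, and GU Guo-Chang 1
Affiliation:1. College of Computer Science and Technology, Harbin Engineering University, Harbin 150001; 2. Key Laboratory of Robotics, Shenyang Institute of Automation, Chinese Academy of Sciences, Shenyang 110015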
Abstract:In the field of artificial intelligence, reinforcement learning theory is receiving more and more attention because of its self-learning and self-adaptive properties. With the development of multi-agent theory in distributed artificial intelligence, distributed reinforcement learning is becoming a focus of research. This paper first reviews the research status of reinforcement learning algorithms. Multi-robot dynamic team formation is then used as the study model to illustrate hierarchical behavior control of a robot system by means of reinforcement learning. In the algorithm described here, an SOM neural network partitions the state space automatically to speed up learning, and a BP neural network implements the reinforcement learning to strengthen generalization ability. An internal and an external reinforcement signal represent the interests of the individual robot and of the robot group, respectively. To define the task clearly, the system uses multi-layer control and blackboard communication. Finally, simulation results are provided to show the validity of the algorithm.
Keywords:multi-robots  team formation  reinforcement learning  behavior control
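
The abstract names two ideas that a short example may make more concrete: an SOM that partitions a continuous state space into discrete states, and a reward that mixes an internal (individual) and an external (group) reinforcement signal. The Python sketch below is only an illustration under stated assumptions and is not the authors' implementation. The function names, the 1-D chain topology of the SOM, the linear weighting `beta`, and all numeric values are hypothetical, and a plain Q-table stands in for the BP neural network that the paper uses to represent the value function.

```python
import random

def nearest_unit(weights, x):
    """Return the index of the SOM unit whose weight vector is closest to x."""
    return min(range(len(weights)),
               key=lambda i: sum((w - v) ** 2 for w, v in zip(weights[i], x)))

def train_som(samples, n_units, epochs=20, lr=0.5, seed=0):
    """Train a 1-D (chain) SOM; each unit becomes one discrete state.

    Assumption: the abstract does not give the SOM topology or schedule.
    """
    rng = random.Random(seed)
    dim = len(samples[0])
    weights = [[rng.random() for _ in range(dim)] for _ in range(n_units)]
    for epoch in range(epochs):
        frac = 1.0 - epoch / epochs          # decays from 1 towards 0
        rate = lr * frac                     # shrinking learning rate
        radius = max(1, round(n_units / 2 * frac))  # shrinking neighborhood
        for x in samples:
            bmu = nearest_unit(weights, x)   # best-matching unit
            lo, hi = max(0, bmu - radius), min(n_units, bmu + radius + 1)
            for i in range(lo, hi):
                influence = 1.0 / (1 + abs(i - bmu))
                for d in range(dim):
                    weights[i][d] += rate * influence * (x[d] - weights[i][d])
    return weights

def combined_reward(internal, external, beta=0.5):
    """Mix individual and group signals; the linear form is an assumption."""
    return (1 - beta) * internal + beta * external

def q_update(q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step (stand-in for the paper's BP network)."""
    target = r + gamma * max(q[s_next])
    q[s][a] += alpha * (target - q[s][a])

# Usage: quantize two sensor readings into SOM states, then learn from one step.
rng = random.Random(1)
samples = [[rng.random(), rng.random()] for _ in range(200)]
som = train_som(samples, n_units=4)
q = [[0.0, 0.0] for _ in som]               # 4 states x 2 hypothetical actions
s = nearest_unit(som, [0.2, 0.8])
s_next = nearest_unit(som, [0.25, 0.75])
r = combined_reward(internal=1.0, external=-0.5, beta=0.5)
q_update(q, s, 0, r, s_next)
```

In this sketch each robot would keep its own table and SOM, with the external signal supplied by whatever group-level evaluation the system provides (in the paper, via the blackboard); that wiring is not shown because the abstract does not specify it.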
This article is indexed in CNKI, VIP (维普), Wanfang Data (万方数据), and other databases.