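Application of Reinforcement Q-Learning to Policy Optimization in Nondeterministic Markov Systems
Cite this article: Guo Rui, Peng Jun, Wu Min. Application of Reinforcement Q-Learning to Policy Optimization in Nondeterministic Markov Systems[J]. Computer Engineering and Applications, 2005, 41(13): 36-38, 146.
Authors: Guo Rui  Peng Jun  Wu Min
Affiliation: School of Information Science and Engineering, Central South University, Changsha 410075
Funding: National 863 High-Tech Research and Development Program of China (No. 2001AA4422200)
Abstract: Reinforcement learning is a branch of machine learning in which a policy is improved through interaction with the environment; its online and adaptive learning make it a powerful tool for policy optimization problems. Multi-agent systems are an active research topic in artificial intelligence, and research on multi-agent learning techniques must rest on a model of the system environment. Because several agents are present, their mutual influence makes a multi-agent system highly complex, and its environment is a nondeterministic Markov model. Directly applying reinforcement learning techniques based on the Markov model to multi-agent systems is therefore inappropriate. Based on an independent learning mechanism among agents, this paper proposes an improved multi-agent Q-learning algorithm suited to nondeterministic Markov environments and studies its application to the multi-agent system RoboCup. Experiments demonstrate the effectiveness and generalization ability of the learning technique. Finally, directions for multi-agent reinforcement learning research and further work are briefly outlined.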

Keywords: multi-agent  reinforcement learning  nondeterministic Markov system  policy optimization
Article ID: 1002-8331-(2005)13-0036-03

The Application of Reinforcement Learning in Nondeterministic MDPs Policy Finding Question
Guo Rui, Peng Jun, Wu Min. The Application of Reinforcement Learning in Nondeterministic MDPs Policy Finding Question[J]. Computer Engineering and Applications, 2005, 41(13): 36-38, 146.
Authors:Guo Rui  Peng Jun  Wu Min
Abstract: Reinforcement learning is a branch of machine learning in which an autonomous learning agent improves its action policy by interacting with its environment. Owing to its online learning and self-adaptation abilities, reinforcement learning has become a powerful tool for optimal policy finding problems. Multi-Agent Systems (MAS) are an active subfield of AI. Because other agents are present, it is difficult to find an optimal action policy even for a single agent, and the environment of a MAS is a nondeterministic Markov Decision Process (MDP). Multi-agent learning is therefore a challenge for current reinforcement learning methods, which are based on MDPs. Based on the agents' independent learning ability, this article first proposes a MAS reinforcement Q-learning algorithm that matches the nondeterministic MDP environment, and then applies the algorithm in RoboCup, a typical MAS. The experimental results demonstrate the algorithm's efficiency. Finally, some directions for multi-agent reinforcement learning and further work are briefly pointed out.
Keywords:Multi-Agent Systems  reinforcement learning  nondeterministic MDPs  optimal policy finding
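
The abstract does not state the update rule of the proposed algorithm. For orientation only, the following is a minimal sketch of the textbook tabular Q-learning update commonly used for nondeterministic MDPs, in which the learning rate decays with the visit count: Q_n(s,a) = (1 - α_n) Q_{n-1}(s,a) + α_n [r + γ max_a' Q_{n-1}(s',a')], with α_n = 1 / (1 + visits_n(s,a)). It is not the authors' improved multi-agent algorithm. The function names `q_update` and `epsilon_greedy`, the discount `gamma`, and the state and action labels are all hypothetical choices made for illustration.

```python
import random
from collections import defaultdict


def q_update(q, visits, state, action, reward, next_state, actions, gamma=0.9):
    """Apply one Q update with a visit-count-decayed learning rate.

    Assumption: this is the generic textbook rule for nondeterministic
    MDPs, not the algorithm proposed in the paper.
    """
    visits[(state, action)] += 1
    # alpha shrinks as the pair is revisited, so noisy outcomes average out.
    alpha = 1.0 / (1.0 + visits[(state, action)])
    best_next = max(q[(next_state, a)] for a in actions)
    target = reward + gamma * best_next
    q[(state, action)] = (1.0 - alpha) * q[(state, action)] + alpha * target
    return q[(state, action)]


def epsilon_greedy(q, state, actions, epsilon, rng):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q[(state, a)])


if __name__ == "__main__":
    q = defaultdict(float)     # Q-table, unseen entries default to 0.0
    visits = defaultdict(int)  # visit counts per (state, action) pair
    actions = ["a", "b"]       # hypothetical action labels
    q_update(q, visits, "s0", "a", 1.0, "s1", actions)
    print(epsilon_greedy(q, "s0", actions, 0.0, random.Random(0)))
```

In an independent-learner setting, each agent would keep its own `q` and `visits` tables and treat the other agents as part of the environment; the decaying learning rate is what lets each estimate settle despite the resulting nondeterminism.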
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.