Application of Reinforcement Q-Learning to Optimal Policy Finding in Nondeterministic Markov Systems

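Cite this article:	Guo Rui, Peng Jun, Wu Min. Application of Reinforcement Q-Learning to Optimal Policy Finding in Nondeterministic Markov Systems [J]. Computer Engineering and Applications (计算机工程与应用), 2005, 41(13): 36-38, 146.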

Authors:	Guo Rui, Peng Jun, Wu Min

Affiliation:	School of Information Science and Engineering, Central South University, Changsha 410075, China

Funding:	National High Technology Research and Development Program of China (863 Program), Grant No. 2001AA4422200

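Abstract:	Reinforcement learning is a branch of machine learning in which an agent improves its policy by interacting with its environment; its online and adaptive learning properties make it a powerful tool for policy optimization problems. Multi-agent systems are an active research topic in artificial intelligence, and research on multi-agent learning techniques must be grounded in a model of the system environment. Because several agents are present, their mutual influence makes a multi-agent system highly complex, and its environment is a nondeterministic Markov model, so it is not appropriate to apply reinforcement learning techniques based on the Markov model directly to multi-agent systems. Building on an independent learning mechanism among agents, this paper proposes an improved multi-agent Q-learning algorithm suited to nondeterministic Markov environments and studies its application in RoboCup, a multi-agent system. Experiments demonstrate the effectiveness and generalization ability of the learning technique. Finally, the paper briefly outlines directions for multi-agent reinforcement learning research and further work.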
Keywords:	multi-agent; reinforcement learning; nondeterministic Markov system; policy optimization
Article ID:	1002-8331-(2005)13-0036-03
The Application of Reinforcement Learning in Nondeterministic MDPs Policy Finding Question

Guo Rui, Peng Jun, Wu Min. The Application of Reinforcement Learning in Nondeterministic MDPs Policy Finding Question [J]. Computer Engineering and Applications, 2005, 41(13): 36-38, 146.

Authors:	Guo Rui, Peng Jun, Wu Min

Abstract:	Reinforcement learning is a branch of machine learning in which an autonomous learning agent improves its action policy by interacting with its environment. Owing to its online learning and self-adaptive abilities, reinforcement learning has become a powerful tool for optimal policy finding problems. Multi-Agent Systems (MAS) are an active subfield of AI. Because of the presence of other agents, it is difficult to find an optimal action policy even for a single agent; the environment of a MAS is a nondeterministic Markov Decision Process (MDP), so multi-agent learning is a challenge for current reinforcement learning methods, which are based on MDPs. Based on each agent's independent learning ability, this article first proposes a MAS reinforcement Q-learning algorithm that matches the nondeterministic MDP environment, then applies this algorithm to RoboCup, a typical MAS. The experimental results demonstrate the algorithm's effectiveness. Finally, we briefly point out some directions for multi-agent reinforcement learning and further work.

Keywords:	Multi-Agent Systems; reinforcement learning; nondeterministic MDPs; optimal policy finding
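
For readers unfamiliar with the setting, the following is a minimal sketch of the textbook Q-learning update with a decaying learning rate, a common adaptation for nondeterministic environments, used here by an independent learner that treats other agents as part of its environment. It is not the algorithm proposed in the paper, whose details are not given in this abstract; the class name `IndependentQLearner`, its parameters, and the choice `alpha = 1 / visit count` are all assumptions made for illustration.

```python
import random
from collections import defaultdict


class IndependentQLearner:
    """Hypothetical single agent with its own Q-table (illustrative only).

    Other agents are not modeled explicitly; their influence shows up as
    randomness in the rewards and next states this agent observes.
    """

    def __init__(self, actions, gamma=0.9, epsilon=0.1, seed=0):
        self.actions = list(actions)
        self.gamma = gamma                  # discount factor
        self.epsilon = epsilon              # exploration probability
        self.q = defaultdict(float)         # (state, action) -> value estimate
        self.visits = defaultdict(int)      # (state, action) -> update count
        self.rng = random.Random(seed)

    def choose(self, state):
        """Epsilon-greedy action selection."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state):
        """Blend the old estimate with the new sample instead of overwriting it.

        With alpha = 1 / visits, later samples move the estimate less and
        less, so noisy outcomes are averaged rather than tracked one by one.
        """
        self.visits[(state, action)] += 1
        alpha = 1.0 / self.visits[(state, action)]
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        target = reward + self.gamma * best_next
        old = self.q[(state, action)]
        self.q[(state, action)] = (1 - alpha) * old + alpha * target
```

In a multi-agent run under these assumptions, each agent would hold its own `IndependentQLearner` instance and call `choose` and `update` on its own observations only.
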
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.
