U-Clustering: A Reinforcement Learning Algorithm Based on Utility Clustering
Cite this article: Chen Huanwen, Yin Changming, Xie Lijuan. U-Clustering: A Reinforcement Learning Algorithm Based on Utility Clustering [J]. Computer Engineering and Applications, 2005, 41(26): 37-42, 74.
Authors: Chen Huanwen  Yin Changming  Xie Lijuan
Affiliations: 1. School of Computer and Communication Engineering, Changsha University of Science and Technology, Changsha 410076; Hunan College of Information, Changsha 410200
2. School of Computer and Communication Engineering, Changsha University of Science and Technology, Changsha 410076
Funding: Supported by the National Natural Science Foundation of China (No. 60075019)
Abstract: This paper proposes a new utility-clustering reinforcement learning algorithm, U-Clustering. Unlike the U-Tree algorithm, it requires no generation or testing of fringe nodes. The algorithm first clusters instances according to the observation-action values of the instance chains, then performs feature selection on each cluster, and finally compresses the selected features; the compressed features become new nodes of the state-space tree. Simulation on New York Driving [2,13] and experimental analysis show that U-Clustering is an effective algorithm for large partially observable environment problems.

Keywords: reinforcement learning  utility clustering  partially observable Markov decision processes
Article ID: 1002-8330-(2005)26-0037-06
Received: March 2005
Revised: March 2005

U-Clustering: A Reinforcement Learning Algorithm Based on Utility Clustering
Chen Huanwen, Yin Changming, Xie Lijuan. U-Clustering: A Reinforcement Learning Algorithm Based on Utility Clustering [J]. Computer Engineering and Applications, 2005, 41(26): 37-42, 74.
Authors: Chen Huanwen  Yin Changming  Xie Lijuan
Abstract: This paper presents a new utility-clustering-based reinforcement learning algorithm called U-Clustering. Unlike U-Tree, it does not use fringe nodes or the associated statistical tests. The algorithm groups instances whose histories match up to a certain length into clusters according to their observation-action values, and then performs feature selection and feature compression for each cluster; the resulting features become new nodes in the agent's internal state-space tree. Experimental results on New York Driving, a difficult partially observable driving task, show that U-Clustering is effective for solving large partially observable problems.
Keywords: reinforcement learning  utility clustering  partially observable Markov decision processes (POMDPs)
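
The abstract above outlines the U-Clustering pipeline: cluster recorded instances by their observation-action (utility) values, select features within each cluster, and compress the selected features into new internal-state nodes. The following is a minimal Python sketch of that pipeline under simplifying assumptions; every name in it (Instance, cluster_by_utility, resolution, and so on) is illustrative and not taken from the paper.

```python
# Hypothetical sketch of the three steps summarized in the abstract:
# (1) cluster instances by their observation-action (utility) values,
# (2) select informative features within each cluster,
# (3) compress the selected features into new internal-state nodes.
# Names and criteria below are illustrative assumptions, not the
# authors' implementation.
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Instance:
    observation: Tuple[int, ...]  # feature vector observed by the agent
    action: int
    reward: float
    q_value: float                # estimated observation-action utility


def cluster_by_utility(instances: List[Instance],
                       resolution: float = 0.5) -> Dict[int, List[Instance]]:
    """Group instances whose utility estimates fall into the same bucket
    (a simple stand-in for the paper's utility-based clustering criterion)."""
    clusters: Dict[int, List[Instance]] = defaultdict(list)
    for inst in instances:
        clusters[int(inst.q_value // resolution)].append(inst)
    return clusters


def select_features(cluster: List[Instance]) -> List[int]:
    """Keep only the observation dimensions that are constant within the
    cluster -- a crude proxy for the feature-selection step."""
    dims = range(len(cluster[0].observation))
    return [d for d in dims
            if len({inst.observation[d] for inst in cluster}) == 1]


def compress(cluster: List[Instance], kept: List[int]) -> Tuple[int, ...]:
    """Project the cluster onto its selected features; the compressed tuple
    plays the role of a new node in the state-space tree."""
    prototype = cluster[0].observation
    return tuple(prototype[d] for d in kept)


def u_clustering(instances: List[Instance]) -> List[Tuple[int, ...]]:
    """Run the three steps end to end and return the new tree nodes."""
    nodes = []
    for cluster in cluster_by_utility(instances).values():
        kept = select_features(cluster)
        if kept:
            nodes.append(compress(cluster, kept))
    return nodes
```

Bucketing utilities at a fixed resolution and keeping only constant observation dimensions are deliberate simplifications; the paper's actual clustering criterion, feature-selection test, and compression scheme are defined over instance chains and are not reproduced here.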