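Coordinated Control Method for Multi-Station CSPS Systems Based on State Aggregation
Cite this article:TANG Hao, PEI Rong, ZHOU Lei, TAN Qi. Coordinated Control Method for Multi-Station CSPS Systems Based on State Aggregation. Acta Automatica Sinica, 2014, 40(5): 901-908. doi: 10.3724/SP.J.1004.2014.00901
Authors:TANG Hao  PEI Rong  ZHOU Lei  TAN Qi
Affiliation:1. School of Electrical Engineering and Automation, Hefei University of Technology, Hefei 230009; 2. School of Computer and Information, Hefei University of Technology, Hefei 230009
Funding:Supported by the National Natural Science Foundation of China (61174186, 71231004), the National International Science and Technology Cooperation Program (2011FA10440), the Ministry of Education Program for New Century Excellent Talents (NCET-11-0626), and the Specialized Research Fund for the Doctoral Program of Higher Education (20130111110007)
Abstract:In a single-station conveyor-serviced production station (CSPS) system, reinforcement learning can explore the state-action space effectively to search for a near-optimal look-ahead distance control policy. In the coordinated control problem for a multi-station CSPS system, however, the size of the system state space grows exponentially (geometrically) as the number of stations and the buffer capacity increase. This leads to the curse of dimensionality and degrades both the convergence speed and the optimization quality of the learning algorithm. To address this, the paper introduces a state aggregation method on top of a local information interaction mechanism among stations, in order to reduce the size and complexity of each station's learning space. First, the stations are treated as relatively independent learning agents, each of which considers only the buffer state of its adjacent downstream station and incorporates it into its own performance-value learning process. Second, the original state space is partitioned into several disjoint subsets, each represented by one abstract state. A multi-station feedback Q-learning algorithm based on state aggregation is then built. With this method, the look-ahead distance policy of each station can be optimized by learning over the abstract state space, with the goal of maximizing the productivity of the whole system. Simulation results indicate that, compared with the general multi-station feedback Q-learning method, the state-aggregation-based method converges faster and also improves system productivity to some degree.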

Keywords:multi-station CSPS system   local information interaction   state aggregation   feedback Q-learning
Received:2013-01-28
Revised:2013-05-11
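
The abstract's point about exponential state-space growth can be illustrated with a small counting sketch. It rests on assumptions that are not taken from the paper: each station's raw state is taken to be just its buffer level in 0..capacity, and the function names and the example numbers (capacity 9, 3 bins) are hypothetical.

```python
def joint_state_count(n_stations: int, capacity: int) -> int:
    """Joint states when every station's buffer level is tracked together.

    Assumption: each station contributes capacity + 1 possible buffer levels.
    """
    return (capacity + 1) ** n_stations


def local_state_count(capacity: int) -> int:
    """States seen by one agent that tracks its own buffer and the
    adjacent downstream buffer only (local information interaction)."""
    return (capacity + 1) ** 2


def aggregated_state_count(n_bins: int) -> int:
    """Abstract states after each of the two buffer levels is grouped
    into n_bins disjoint ranges (state aggregation)."""
    return n_bins ** 2


if __name__ == "__main__":
    for n in (2, 4, 8):
        print(n, "stations:", joint_state_count(n, capacity=9), "joint states")
    print("local view:", local_state_count(9), "states per agent")
    print("aggregated view:", aggregated_state_count(3), "abstract states per agent")
```

Under these assumptions, the per-agent table size no longer depends on the number of stations, which is the motivation the abstract gives for combining local interaction with aggregation.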

Coordinate Control of Multiple CSPS System Based on State Aggregation Method
TANG Hao, PEI Rong, ZHOU Lei, TAN Qi. Coordinate Control of Multiple CSPS System Based on State Aggregation Method. ACTA AUTOMATICA SINICA, 2014, 40(5): 901-908. doi: 10.3724/SP.J.1004.2014.00901
Authors:TANG Hao  PEI Rong  ZHOU Lei  TAN Qi
Affiliation:1. School of Electrical Engineering and Automation, Hefei University of Technology, Hefei 230009; 2. School of Computer and Information, Hefei University of Technology, Hefei 230009
Abstract:In a single conveyor-serviced production station (CSPS) system, reinforcement learning (RL) can learn an approximately optimal look-ahead policy by exploring the state-action space. However, in the coordinated control problem for a multi-station CSPS system, the state space grows exponentially (geometrically) as the number of stations and the buffer capacity increase. As a result, the learning process suffers from the curse of dimensionality, which slows convergence and degrades the optimized value. Therefore, building on a local information interaction mechanism among stations, we introduce a state aggregation method to reduce the size and complexity of each station's learning space. First, each station is regarded as an independent learning agent that incorporates only the buffer state of its nearest downstream station into its own learning process. Second, the original state space is divided into several disjoint sets, each represented by an abstract state, and a multi-agent state aggregation feedback Q-learning (SAFQL) algorithm is then proposed. With this approach, each agent can learn an optimized look-ahead policy over the abstract state space to improve the processing rate of the entire system. Finally, a numerical example shows that, compared with the general feedback Q-learning algorithm, the SAFQL algorithm not only converges faster but also improves the processing rate to some degree.
Keywords:Multiple conveyor-serviced production station (CSPS)  local information interaction  state aggregation  feedback Q-learning (SAFQL)
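
The two steps in the abstract (a local view of own and downstream buffers, then a partition of that view into abstract states) can be sketched as a tabular learner. This is a minimal sketch under stated assumptions, not the authors' SAFQL algorithm: the equal-width binning in `aggregate`, the class `AggregatedQLearner`, its hyperparameters, and the plain discounted Q-learning update are all hypothetical stand-ins, since the abstract does not specify the partition or the feedback update rule.

```python
import random
from collections import defaultdict


def aggregate(own_level, downstream_level, capacity, n_bins):
    """Map a raw (own, downstream) buffer pair to an abstract state.

    Assumption: equal-width bins over 0..capacity; the paper's actual
    partition is not given in the abstract.
    """
    def bin_of(level):
        return min(level * n_bins // (capacity + 1), n_bins - 1)

    return (bin_of(own_level), bin_of(downstream_level))


class AggregatedQLearner:
    """One station's agent: a Q-table indexed by abstract states."""

    def __init__(self, actions, capacity, n_bins,
                 alpha=0.1, gamma=0.95, epsilon=0.1, seed=0):
        self.actions = list(actions)  # e.g. candidate look-ahead distances
        self.capacity = capacity
        self.n_bins = n_bins
        self.alpha = alpha
        self.gamma = gamma
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.q = defaultdict(float)   # (abstract_state, action) -> value

    def abstract(self, raw_state):
        own, downstream = raw_state
        return aggregate(own, downstream, self.capacity, self.n_bins)

    def choose(self, raw_state):
        """Epsilon-greedy action selection on the abstract state."""
        s = self.abstract(raw_state)
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(s, a)])

    def update(self, raw_state, action, reward, next_raw_state):
        """Standard discounted Q-learning update (a stand-in for the
        paper's feedback update), applied to abstract states."""
        s = self.abstract(raw_state)
        s_next = self.abstract(next_raw_state)
        best_next = max(self.q[(s_next, a)] for a in self.actions)
        target = reward + self.gamma * best_next
        self.q[(s, action)] += self.alpha * (target - self.q[(s, action)])
```

Because every raw state in a subset shares one table entry, experience gathered in one raw state updates the value used by all the others in that subset. That sharing is what shrinks the learning space, at the cost of treating the grouped states as indistinguishable.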