
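A multi-stage group decision model based on an improved Q-learning algorithm
Cite this article: ZHANG Feng, LIU Ling-yun, GUO Xin-xin. A multi-stage group decision model based on an improved Q-learning algorithm[J]. Control and Decision, 2019, 34(9): 1917-1922.
Authors: ZHANG Feng, LIU Ling-yun, GUO Xin-xin
Affiliations: College of Mathematics and Information Science, Hebei University, Baoding 071002, Hebei, China; Hebei Key Laboratory of Machine Learning and Computational Intelligence, Baoding 071002, Hebei, China
Funding: National Natural Science Foundation of China (61672205); Natural Science Foundation of Hebei Province, General Program (F2017201020, F2018201115); Youth Fund of the Hebei Provincial Department of Education (QN2015026, QN2017019).
Abstract: Multi-stage group decision making is a typical class of dynamic group decision problems, and it mainly addresses finding the optimal group decision under discrete, deterministic states. In practice, however, most environments have uncertain state spaces, and some are entirely unknown (for example, the state transition probability matrix is completely unknown). To find a multi-stage optimal group strategy with a high degree of consensus, decision-makers must therefore obtain further information by interacting dynamically with the environment. To address this problem, the paper uses reinforcement learning to propose an optimal decision algorithm for multi-stage group decision making under uncertain state spaces. Building on the Q-learning algorithm from reinforcement learning, it establishes a basic Q-learning model for multi-stage group decision making and improves the iteration process of that algorithm, from which the optimal group strategy is learned. It also proves that the multi-stage optimal group strategy obtained by Q-learning is the strategy with the highest degree of group consensus. Finally, a numerical example illustrates the rationality and feasibility of the algorithm.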

Keywords: group decision making; multi-stage group decision making; reinforcement learning; Q-learning; group consensus; uncertainty

A multi-stage group decision model based on improved Q-learning
ZHANG Feng, LIU Ling-yun and GUO Xin-xin. A multi-stage group decision model based on improved Q-learning[J]. Control and Decision, 2019, 34(9): 1917-1922.
Authors: ZHANG Feng, LIU Ling-yun and GUO Xin-xin
Affiliation: College of Mathematics and Information Science, Hebei University, Baoding 071002, China
Abstract: The multi-stage group decision making problem is a typical sequential group decision making problem. It is normally used to find the optimal solution to group decision problems in discrete, deterministic environments. However, the real-life environments faced by decision-makers are usually full of uncertainty and may even be unknown (for example, with an unknown state transition matrix). It is therefore essential for decision-makers to obtain more information by interacting dynamically with the environment in order to reach an optimal decision strategy with a high degree of consensus. Because reinforcement learning is well suited to sequential decision-making problems, the classical reinforcement learning algorithm Q-learning is improved to find the optimal solution of multi-stage group decision making problems in uncertain environments. Additionally, a theorem shows that the optimal group decision obtained with the improved Q-learning algorithm is the group decision with the highest degree of group consensus. Finally, an illustrative example is presented to verify the rationality and feasibility of the proposed algorithm.
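
The abstract does not give the details of the improved iteration, so the following is only a minimal sketch of standard tabular Q-learning on a toy multi-stage problem whose transition probabilities are hidden from the learner. It is not the authors' algorithm. Every name and number here is a hypothetical assumption for illustration: the `UTILITY` table, the `P_COND1` probabilities, the three-stage horizon, and in particular the choice to score a group decision by the mean utility of the decision-makers.

```python
import random

# Hypothetical utilities: UTILITY[decision_maker][condition][action].
UTILITY = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.8, 0.2], [0.1, 0.9]],
    [[0.9, 0.3], [0.2, 0.7]],
]
# Hypothetical P(next condition == 1 | action); never read by the learner.
P_COND1 = [0.2, 0.7]
N_STAGES = 3  # states are encoded as stage * 2 + condition


def step(state, action, rng):
    """Simulated environment: returns (next_state, group_reward, done)."""
    stage, cond = divmod(state, 2)
    # Assumption: the group reward is the mean utility over decision-makers.
    reward = sum(u[cond][action] for u in UTILITY) / len(UTILITY)
    next_cond = 1 if rng.random() < P_COND1[action] else 0
    done = stage + 1 == N_STAGES
    next_state = state if done else (stage + 1) * 2 + next_cond
    return next_state, reward, done


def q_learning(step, n_states, n_actions, starts, episodes=5000,
               alpha=0.1, gamma=0.9, epsilon=0.2, seed=0):
    """Standard tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        state, done = rng.choice(starts), False
        while not done:
            if rng.random() < epsilon:
                action = rng.randrange(n_actions)  # explore
            else:
                action = max(range(n_actions), key=lambda a: q[state][a])
            next_state, reward, done = step(state, action, rng)
            # Bootstrapped target; no future value after the last stage.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


q = q_learning(step, n_states=2 * N_STAGES, n_actions=2, starts=[0, 1])
# Greedy group policy: best action index for each (stage, condition) state.
policy = [max(range(2), key=lambda a: row[a]) for row in q]
print(policy)
```

The learner only ever calls `step`, so it estimates action values from sampled interaction rather than from a known transition matrix, which is the setting the abstract describes. With these illustrative utilities, the learned greedy policy should choose action 0 under condition 0 and action 1 under condition 1 at every stage.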