Incomplete Information Game Algorithm Based on Expectimax Search and Double DQN
Cite this article: LEI Jiewei, WANG Jiayang, REN Hang, YAN Tianwei, HUANG Wei. Incomplete Information Game Algorithm Based on Expectimax Search and Double DQN[J]. Computer Engineering, 2021, 47(3): 304-310, 320.
Authors: LEI Jiewei  WANG Jiayang  REN Hang  YAN Tianwei  HUANG Wei
Affiliation: 1. School of Information Engineering, Nanchang University, Nanchang 330031, China; 2. School of Software Engineering, Jiangxi Agricultural University, Nanchang 330000, China
Funding: Natural Science Foundation of Jiangxi Province; National Natural Science Foundation of China
Abstract: As a typical incomplete information game, mahjong has mainly been played by programs built on the traditional Expectimax search algorithm, whose pruning strategy and evaluation function are designed from hand-crafted prior knowledge and therefore rest on unreasonable assumptions, among other problems. This paper proposes an incomplete information game algorithm that combines Expectimax search with the Double DQN reinforcement learning algorithm. When the Expectimax search tree is expanded, the values output by the Double DQN are used to design the evaluation function and to obtain branch values within a limited search depth, and a pruning strategy is designed that ranks the candidate discard actions and expands only part of them, thereby pruning the search tree. When the Double DQN model is trained, the mahjong game information is encoded as feature data and fed into the neural network to obtain value estimates, and the Expectimax search algorithm is used to obtain the optimal action so as to improve the exploration strategy. Experimental results show that, compared with algorithms such as Expectimax search, Double DQN, and supervised learning methods, the proposed algorithm achieves a higher winning rate and score in mahjong games and thus offers better game-playing performance.

Keywords: Double DQN algorithm  Expectimax search  incomplete information game  mahjong  reinforcement learning
Received: 2020-02-01
Revised: 2020-03-04
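The abstract above outlines two mechanisms: Double DQN value estimates standing in for a hand-crafted evaluation function inside a depth-limited Expectimax tree, with discard actions ranked and only partially expanded as the pruning strategy, and the search's chosen action replacing random exploration while the Double DQN is trained. The following is a minimal sketch of those ideas under stated assumptions, not the paper's implementation; GameState and its methods (encode_features, legal_discards, chance_outcomes, is_terminal), q_network, TOP_K, and MAX_DEPTH are hypothetical names.

```python
# Sketch (illustrative assumptions only) of the two mechanisms described in the abstract:
#   1) depth-limited Expectimax whose branch/leaf values come from a Double DQN, with
#      pruning by ranking discard actions and expanding only the top few;
#   2) using the search result instead of random exploration when training the network.

TOP_K = 5        # how many of the ranked discard actions are actually expanded (assumed)
MAX_DEPTH = 2    # limited number of search plies before the network value is used (assumed)


def dqn_state_value(state, q_network):
    """Value of a state taken as max_a Q(s, a) from the Double DQN online network."""
    q_values = q_network(state.encode_features())    # mahjong info encoded as feature data
    return max(q_values)


def ranked_discards(state, q_network):
    """Legal discard actions sorted by the network's action values (the pruning order)."""
    q_values = q_network(state.encode_features())     # assumed indexable by action id
    return sorted(state.legal_discards(), key=lambda a: q_values[a], reverse=True)


def expectimax(state, q_network, depth=0):
    """Depth-limited Expectimax: max over pruned discards, expectation over hidden draws."""
    if state.is_terminal() or depth >= MAX_DEPTH:
        return dqn_state_value(state, q_network)
    best = float("-inf")
    for action in ranked_discards(state, q_network)[:TOP_K]:   # partial expansion = pruning
        expected = sum(prob * expectimax(next_state, q_network, depth + 1)
                       for next_state, prob in state.chance_outcomes(action))
        best = max(best, expected)
    return best


def search_best_discard(state, q_network):
    """Discard with the highest chance-node expectation; during training this action can
    stand in for epsilon-greedy exploration, as the abstract suggests."""
    def score(action):
        return sum(prob * expectimax(next_state, q_network, depth=1)
                   for next_state, prob in state.chance_outcomes(action))
    candidates = ranked_discards(state, q_network)[:TOP_K]
    return max(candidates, key=score)
```

Ranking actions by the network's own value estimates before expansion keeps the branching factor small, which is what makes a depth-limited search affordable both at play time and when it is invoked inside the training loop.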
