Incomplete Information Game Algorithm Based on Expectimax Search and Double DQN
Cite this article: LEI Jiewei, WANG Jiayang, REN Hang, YAN Tianwei, HUANG Wei. Incomplete Information Game Algorithm Based on Expectimax Search and Double DQN[J]. Computer Engineering, 2021, 47(3): 304-310, 320.
Authors: LEI Jiewei  WANG Jiayang  REN Hang  YAN Tianwei  HUANG Wei
Affiliation: 1. School of Information Engineering, Nanchang University, Nanchang 330031, China; 2. School of Software Engineering, Jiangxi Agricultural University, Nanchang 330000, China
Funding: Natural Science Foundation of Jiangxi Province; National Natural Science Foundation of China
Abstract: As a typical incomplete information game, mahjong has mainly been played by programs built on the traditional Expectimax search algorithm, whose pruning strategy and evaluation function are designed from hand-crafted prior knowledge and therefore rest on unreasonable assumptions, among other problems. This paper proposes an incomplete information game algorithm that combines Expectimax search with the Double DQN reinforcement learning algorithm. When the Expectimax search tree is expanded, the values output by the Double DQN are used to design the evaluation function and to obtain branch values within a limited search depth, and a pruning strategy is designed that ranks the candidate discard actions and expands only part of them, thereby pruning the search tree. When the Double DQN model is trained, the mahjong game information is encoded as feature data and fed into the neural network to obtain value estimates, and the Expectimax search algorithm is used to obtain the optimal action so as to improve the exploration strategy. Experimental results show that, compared with algorithms such as Expectimax search, Double DQN, and supervised learning methods, the proposed algorithm achieves a higher winning rate and score in mahjong games and thus offers better game-playing performance.

Keywords: Double DQN algorithm  Expectimax search  incomplete information game  mahjong  reinforcement learning
Received: 2020-02-01
Revised: 2020-03-04
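The abstract above outlines two mechanisms: Double DQN value estimates standing in for a hand-crafted evaluation function inside a depth-limited Expectimax tree, with discard actions ranked and only partially expanded as the pruning strategy, and the search's chosen action replacing random exploration while the Double DQN is trained. The following is a minimal sketch of those ideas under stated assumptions, not the paper's implementation; GameState and its methods (encode_features, legal_discards, chance_outcomes, is_terminal), q_network, TOP_K, and MAX_DEPTH are hypothetical names.

```python
# Sketch (illustrative assumptions only) of the two mechanisms described in the abstract:
#   1) depth-limited Expectimax whose branch/leaf values come from a Double DQN, with
#      pruning by ranking discard actions and expanding only the top few;
#   2) using the search result instead of random exploration when training the network.

TOP_K = 5        # how many of the ranked discard actions are actually expanded (assumed)
MAX_DEPTH = 2    # limited number of search plies before the network value is used (assumed)


def dqn_state_value(state, q_network):
    """Value of a state taken as max_a Q(s, a) from the Double DQN online network."""
    q_values = q_network(state.encode_features())    # mahjong info encoded as feature data
    return max(q_values)


def ranked_discards(state, q_network):
    """Legal discard actions sorted by the network's action values (the pruning order)."""
    q_values = q_network(state.encode_features())     # assumed indexable by action id
    return sorted(state.legal_discards(), key=lambda a: q_values[a], reverse=True)


def expectimax(state, q_network, depth=0):
    """Depth-limited Expectimax: max over pruned discards, expectation over hidden draws."""
    if state.is_terminal() or depth >= MAX_DEPTH:
        return dqn_state_value(state, q_network)
    best = float("-inf")
    for action in ranked_discards(state, q_network)[:TOP_K]:   # partial expansion = pruning
        expected = sum(prob * expectimax(next_state, q_network, depth + 1)
                       for next_state, prob in state.chance_outcomes(action))
        best = max(best, expected)
    return best


def search_best_discard(state, q_network):
    """Discard with the highest chance-node expectation; during training this action can
    stand in for epsilon-greedy exploration, as the abstract suggests."""
    def score(action):
        return sum(prob * expectimax(next_state, q_network, depth=1)
                   for next_state, prob in state.chance_outcomes(action))
    candidates = ranked_discards(state, q_network)[:TOP_K]
    return max(candidates, key=score)
```

Ranking actions by the network's own value estimates before expansion keeps the branching factor small, which is what makes a depth-limited search affordable both at play time and when it is invoked inside the training loop.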
