Online opponent exploitation method based on particle swarm optimization for Texas Hold'em
Cite this article: HU Zhen-zhen, CHEN Shao-fei, YUAN Wei-lin, LI Peng, CHEN Jing. Online opponent exploitation method based on particle swarm optimization for Texas Hold'em[J]. Control and Decision, 2024, 39(5): 1687-1696
Authors: HU Zhen-zhen  CHEN Shao-fei  YUAN Wei-lin  LI Peng  CHEN Jing
Affiliation: College of Intelligence Science and Technology, National University of Defense Technology, Changsha 410073, China
Funding: National Natural Science Foundation of China (61806212, 62376280).
Abstract: In Texas Hold'em, opponent exploitation, which targets opponents with weaknesses to obtain larger income, is more effective than equilibrium-solving methods. However, when facing a completely new opponent, how to exploit it efficiently online remains a major challenge. Existing methods usually sidestep this problem with offline training plus online adaptation: through massive offline training with learning, evolution, and similar methods, they build a model with opponent adaptability that can adapt to different opponents during matches, rather than actively optimizing its own policy online against a new opponent. To achieve effective opponent exploitation through online active policy optimization, a policy optimization method based on particle swarm optimization (PSO) is proposed, built on a particle definition over the time dimension; it introduces online policy optimization into Texas Hold'em, a game with strong randomness, to exploit opponents and maximize online match income. To address the problems that fitness computation is affected by random card luck and that targeted policies against some opponents are hard to optimize, an improved particle swarm optimization algorithm based on local-best and global-best solution replacement (BR-PSO) is proposed. Experimental results show that, for opponents the standard PSO method struggles to counter, the proposed method effectively obtains targeted policies that maximize opponent exploitation, and the income of the optimized policy is comparable to that of an AI based on hand prediction.

Keywords: particle swarm optimization; policy optimization; optimal solution replacement; opponent exploitation; online competition; Texas Hold'em

Online opponent exploitation method based on particle swarm optimization for Texas Hold'em
HU Zhen-zhen, CHEN Shao-fei, YUAN Wei-lin, LI Peng, CHEN Jing. Online opponent exploitation method based on particle swarm optimization for Texas Hold'em[J]. Control and Decision, 2024, 39(5): 1687-1696
Authors: HU Zhen-zhen  CHEN Shao-fei  YUAN Wei-lin  LI Peng  CHEN Jing
Affiliation: College of Intelligence Science and Technology, National University of Defense Technology, Changsha 410073, China
Abstract: In Texas Hold'em, opponent exploitation is more effective than Nash equilibrium-seeking methods at obtaining larger income from opponents with weaknesses. However, how to efficiently exploit a brand-new opponent under online competition conditions is still a challenge. Existing methods usually use offline training and online adaptation to avoid this problem, that is, they use methods such as learning and evolution to obtain a model with opponent adaptability through massive offline training, so that it can adapt to different opponents in competitions, instead of actively optimizing its own policy against a new opponent during the online competition. To achieve effective opponent exploitation through online active policy optimization, a policy optimization method based on particle swarm optimization (PSO), built on a particle definition over the time dimension, is proposed to maximize competition income; it introduces the idea of online optimization into Texas Hold'em, a game problem with strong randomness. To address the problems that fitness computation is affected by random luck and that targeted policies for some opponents are hard to optimize with the standard PSO, a modified PSO method called BR-PSO (best replacement PSO) is proposed, based on local and global optimal solution replacement. Experimental results indicate that the proposed method can find targeted policies that maximize exploitation of opponents that are hard to counter with the standard PSO, and that the income of the optimized policy is comparable to that of an AI based on hand prediction.
Keywords: particle swarm optimization; policy optimization; optimal solution replacement; opponent exploitation; online competition; Texas Hold'em
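
The abstract describes BR-PSO only at a high level. The sketch below is a minimal, hypothetical illustration, not the authors' implementation: it shows a standard PSO loop whose fitness is a luck-affected payoff, plus a "best replacement" step in which the stored personal and global bests are re-scored with fresh samples each iteration so that a single lucky evaluation cannot permanently anchor the swarm. The function noisy_payoff, the policy encoding, and all parameter values are assumptions made for illustration only; the paper's actual fitness comes from simulated hands against the opponent, and its replacement mechanism may differ.

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_payoff(policy, n_hands=200):
    # Hypothetical stand-in for the match income of a parameterized policy
    # against a fixed opponent: a smooth "true" value plus card-luck noise.
    true_value = -float(np.sum((policy - 0.3) ** 2))
    luck = rng.normal(0.0, 2.0 / np.sqrt(n_hands))
    return true_value + luck

def br_pso(dim=8, n_particles=20, iters=60, w=0.7, c1=1.5, c2=1.5):
    # Positions encode a betting policy (assumed encoding in [0, 1]^dim).
    x = rng.uniform(0.0, 1.0, (n_particles, dim))
    v = np.zeros_like(x)
    pbest_x = x.copy()
    pbest_f = np.array([noisy_payoff(p) for p in x])
    g = int(np.argmax(pbest_f))
    gbest_x, gbest_f = pbest_x[g].copy(), pbest_f[g]

    for _ in range(iters):
        # Standard PSO velocity and position update.
        r1, r2 = rng.random(x.shape), rng.random(x.shape)
        v = w * v + c1 * r1 * (pbest_x - x) + c2 * r2 * (gbest_x - x)
        x = np.clip(x + v, 0.0, 1.0)
        f = np.array([noisy_payoff(p) for p in x])

        # "Best replacement" (one reading of BR-PSO): re-score the stored bests
        # with fresh samples so luck-inflated records do not freeze the swarm.
        pbest_f = np.array([noisy_payoff(p) for p in pbest_x])
        improved = f > pbest_f
        pbest_x[improved] = x[improved]
        pbest_f[improved] = f[improved]

        gbest_f = noisy_payoff(gbest_x)
        g = int(np.argmax(pbest_f))
        if pbest_f[g] > gbest_f:
            gbest_x, gbest_f = pbest_x[g].copy(), pbest_f[g]

    return gbest_x, gbest_f

if __name__ == "__main__":
    best_policy, best_income = br_pso()
    print("estimated best policy parameters:", np.round(best_policy, 2))
```

In this toy setting the re-scoring simply combats noisy fitness estimates; in the paper's online-match setting every extra evaluation costs real hands, which is why the trade-off between re-evaluating bests and exploring new policies matters.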