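Chess positioning and playing strategy of robot based on integrated depth/mono vision and reinforcement learning
Cite this article: WU Qi-yu, XIE Fei, HUANG Lei, LIU Zong-xi, ZHAO Jing, LIU Xi-xiang. Chess positioning and playing strategy of robot based on integrated depth/mono vision and reinforcement learning[J]. Control and Decision, 2022, 37(12): 3278-3288
Authors: WU Qi-yu, XIE Fei, HUANG Lei, LIU Zong-xi, ZHAO Jing, LIU Xi-xiang
Affiliations: School of Electrical and Automation Engineering, Nanjing Normal University, Nanjing 210023, China; School of Mechanical and Electronic Engineering, Nanjing Forestry University, Nanjing 210037, China; College of Automation & College of Artificial Intelligence, Nanjing University of Posts and Telecommunications, Nanjing 210003, China; College of Instrument Science and Engineering, Southeast University, Nanjing 210018, China
Funding: National Natural Science Foundation of China (41974033); Jiangsu Province Science and Technology Achievement Transformation Project (BA2020004); Jiangsu Provincial Special Fund Project for Industrial and Information Industry Transformation and Upgrading (JITC-2000AX0676-71); Nanjing Key Technology Breakthrough Tender Project for Advantageous Industries (2018003).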
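Abstract: The keys to building a Chinese chess (Xiangqi) playing robot are board-state recognition and localization, and an autonomous move strategy. First, for board recognition and localization, a method that fuses monocular-camera and depth-camera vision is proposed. It obtains piece positions from the three-dimensional features of the physical pieces and fuses them with the two-dimensional image recognition results to compute each piece's location, improving recognition and localization accuracy. Second, for the move strategy, a decision method based on a deep neural network and Monte Carlo tree search (MCTS) is proposed. The search uses a Monte Carlo tree with end-game feature judgment, simulated play is guided by an optimized random move policy, and a policy-value network with multi-scale and residual structures is trained. Finally, training data are obtained through self-play, and model parameters are validated and updated through agent-versus-agent play. Experiments show that, compared with monocular-only recognition, the proposed method is more accurate and stable, reaching a recognition rate of 97%; compared with a baseline pruning search algorithm, it wins up to 82% of games while reducing computation time by 41%.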

Keywords: Chinese chess; move strategy; object detection; depth image; Monte Carlo tree search; reinforcement learning
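The move-selection loop summarized in the abstract (tree search guided by a policy-value network, with an end-game check that short-circuits evaluation) can be illustrated with a minimal AlphaZero-style PUCT sketch. It is not the authors' implementation, and every name in it is a hypothetical assumption: `SubtractionGame` is a toy stand-in for Xiangqi, `terminal_value` plays the role of the end-game feature judgment, and `uniform_evaluator` replaces the trained multi-scale residual network with uniform priors and a neutral value. The paper's optimized random rollout policy is not modeled here.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Node:
    prior: float                      # P(s, a) from the (placeholder) policy head
    visits: int = 0                   # N(s, a)
    value_sum: float = 0.0            # W(s, a), from the view of the player who chose the move
    children: Dict[int, "Node"] = field(default_factory=dict)

    def q(self) -> float:
        return self.value_sum / self.visits if self.visits else 0.0


class SubtractionGame:
    """Hypothetical toy stand-in for Xiangqi: take 1 or 2 stones; taking the last stone wins."""

    def legal_actions(self, pile: int) -> List[int]:
        return [a for a in (1, 2) if a <= pile]

    def next_state(self, pile: int, action: int) -> int:
        return pile - action

    def terminal_value(self, pile: int) -> Optional[float]:
        # Hypothetical end-game check: exact result for the player to move, or None.
        # An empty pile means the opponent took the last stone, so the mover has lost.
        return -1.0 if pile == 0 else None


Evaluator = Callable[[int], Tuple[Dict[int, float], float]]


def uniform_evaluator(game: SubtractionGame) -> Evaluator:
    """Hypothetical placeholder for the policy-value network: uniform priors, value 0."""
    def evaluate(state: int) -> Tuple[Dict[int, float], float]:
        actions = game.legal_actions(state)
        return {a: 1.0 / len(actions) for a in actions}, 0.0
    return evaluate


def simulate(node: Node, state: int, game: SubtractionGame,
             evaluate: Evaluator, c_puct: float) -> float:
    """One selection/expansion/backup pass; returns the value of `state` for its mover."""
    terminal = game.terminal_value(state)
    if terminal is not None:          # end-game reached: use the exact outcome, skip the network
        return terminal
    if not node.children:             # leaf: expand with priors, back up the network value
        priors, value = evaluate(state)
        node.children = {a: Node(prior=p) for a, p in priors.items()}
        return value
    sqrt_n = math.sqrt(sum(ch.visits for ch in node.children.values()) + 1)
    action, child = max(
        node.children.items(),
        key=lambda kv: kv[1].q() + c_puct * kv[1].prior * sqrt_n / (1 + kv[1].visits),
    )
    # The recursive call scores the position for the opponent, so flip the sign.
    value = -simulate(child, game.next_state(state, action), game, evaluate, c_puct)
    child.visits += 1
    child.value_sum += value
    return value


def search(state: int, game: SubtractionGame, evaluate: Evaluator,
           n_simulations: int = 400, c_puct: float = 1.5) -> int:
    """Run MCTS from a non-terminal `state` and return the most-visited move."""
    root = Node(prior=1.0)
    for _ in range(n_simulations):
        simulate(root, state, game, evaluate, c_puct)
    return max(root.children, key=lambda a: root.children[a].visits)


game = SubtractionGame()
print(search(4, game, uniform_evaluator(game)))
```

In this toy game the winning reply is to leave the opponent a multiple of three stones, which the search discovers once terminal outcomes have been backed up through the tree. In a real system `evaluate` would run the policy-value network on an encoded board position; in AlphaZero-style training the root visit counts also serve as the policy target for self-play data.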

English abstract: The key to realizing a Chinese chess robot system lies in chessboard recognition and the move strategy. First, for chessboard recognition, a method based on the fusion of monocular vision and depth vision is proposed. The method designs a chess-piece grid recognition network, uses the three-dimensional characteristics of the pieces to convert the depth image into a chessboard grid, and integrates the piece coordinates with the chessboard grid, effectively improving board recognition accuracy. Second, for the move strategy, a method based on a deep neural network and Monte Carlo tree search is proposed. It uses an improved random search strategy with end-game feature judgment to guide simulated play and trains a policy-value network with a residual structure. Finally, training data are obtained through self-play, and the parameters are updated and verified through agent-versus-agent play. Experiments show that, compared with monocular-only visual recognition, the method achieves higher accuracy and stability, with a recognition rate of 97%. Compared with a baseline pruning search algorithm, it wins up to 82% of the games and reduces computing time by 41%.
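The depth-to-grid conversion and its fusion with 2D detections can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' grid recognition network: the depth image is assumed to come from a top-down camera already registered to the color image, the board is assumed rectified so that intersections lie on a regular pixel lattice given by a hypothetical `origin` and `spacing`, and the thresholds (`min_height_mm`, `radius`, `min_fraction`) are hypothetical placeholders. An intersection counts as occupied when enough depth pixels around it rise above the board plane, and a 2D detection is kept only if it snaps to an occupied intersection.

```python
from typing import Dict, List, Sequence, Tuple

ROWS, COLS = 10, 9  # Xiangqi board: 10 ranks x 9 files of intersections

Grid = List[List[bool]]


def occupancy_from_depth(depth: Sequence[Sequence[float]],
                         origin: Tuple[float, float], spacing: Tuple[float, float],
                         board_depth_mm: float, min_height_mm: float = 8.0,
                         radius: int = 2, min_fraction: float = 0.5) -> Grid:
    """Mark an intersection occupied if enough nearby depth pixels rise above the board.

    Assumes (hypothetically) a top-down depth camera registered to the color image and a
    rectified board whose intersection (r, c) sits at pixel origin + (c * dx, r * dy).
    """
    x0, y0 = origin
    dx, dy = spacing
    h, w = len(depth), len(depth[0])
    grid = [[False] * COLS for _ in range(ROWS)]
    for r in range(ROWS):
        for c in range(COLS):
            cx, cy = round(x0 + c * dx), round(y0 + r * dy)
            window = [depth[y][x]
                      for y in range(max(0, cy - radius), min(h, cy + radius + 1))
                      for x in range(max(0, cx - radius), min(w, cx + radius + 1))
                      if depth[y][x] > 0]          # 0 = missing depth reading
            raised = sum(1 for d in window if board_depth_mm - d >= min_height_mm)
            grid[r][c] = bool(window) and raised / len(window) >= min_fraction
    return grid


def fuse_detections(detections: Sequence[Tuple[str, float, float, float]],
                    grid: Grid, origin: Tuple[float, float],
                    spacing: Tuple[float, float]) -> Dict[Tuple[int, int], str]:
    """Snap 2D detections (label, px, py, score) to intersections confirmed by depth."""
    x0, y0 = origin
    dx, dy = spacing
    best: Dict[Tuple[int, int], Tuple[str, float]] = {}
    for label, px, py, score in detections:
        r, c = round((py - y0) / dy), round((px - x0) / dx)
        if not (0 <= r < ROWS and 0 <= c < COLS) or not grid[r][c]:
            continue  # off the board, or no raised piece in depth: treat as a false positive
        if (r, c) not in best or score > best[(r, c)][1]:
            best[(r, c)] = (label, score)   # keep the most confident label per cell
    return {cell: label for cell, (label, _) in best.items()}


# Synthetic 100x100 depth map: flat board at 800 mm, two pieces 20 mm tall.
depth = [[800.0] * 100 for _ in range(100)]
for cx, cy in [(45, 5), (45, 95)]:
    for y in range(cy - 3, cy + 4):
        for x in range(cx - 3, cx + 4):
            depth[y][x] = 780.0
grid = occupancy_from_depth(depth, origin=(5, 5), spacing=(10, 10), board_depth_mm=800.0)
dets = [("general_black", 44.0, 6.0, 0.9), ("general_red", 46.0, 94.0, 0.8),
        ("chariot_red", 15.0, 5.0, 0.7)]   # the chariot has no depth support
print(fuse_detections(dets, grid, origin=(5, 5), spacing=(10, 10)))
```

Rejecting detections that land on empty intersections is one simple way the depth channel can suppress false positives from the monocular detector; the paper's actual fusion rule may differ.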