A Reinforcement Learning Training Method for a Virtual Table Tennis Player

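Cite this article:	Li Zicong, Zeng Fanzhong, Wu Zihui, Nie Yongwei, Xian Chuhua, Li Guiqing. A Reinforcement Learning Training Method for a Virtual Table Tennis Player (虚拟乒乓球手的强化学习训练方法)[J]. Journal of Computer-Aided Design & Computer Graphics (计算机辅助设计与图形学学报), 2020, 32(6): 997-1008.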

Authors:	Li Zicong (黎子聪), Zeng Fanzhong (曾繁忠), Wu Zihui (吴自辉), Nie Yongwei (聂勇伟), Xian Chuhua (冼楚华), Li Guiqing (李桂清)

Affiliation:	School of Computer Science and Engineering, South China University of Technology, Guangzhou 510006 (all six authors)

Funding:	National Natural Science Foundation of China; Key Program of the Natural Science Foundation of Guangdong Province

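Abstract:	Immersion is one of the defining features of virtual reality applications, and the intelligence and realism of character behavior in a virtual scene significantly affect it. We use reinforcement learning to train the racket's hitting strategy, designing a series of reward functions based on the rules of table tennis so that the policy generates an effective racket hitting trajectory from the starting position and initial velocity of the incoming ball. The racket trajectory then constrains the wrist joint of the virtual player's racket hand, and a combination of inverse kinematics and reinforcement learning estimates the player's hitting motion at the moment of impact, yielding a virtual player that hits the ball successfully with reasonable postures. Ablation experiments verify that the proposed reward functions are reasonable and effective. Test experiments show that the virtual player hits the ball with a reasonable motion and a success rate above 93%, comparable to an imitation-learning-based method and higher than other methods. The imitation-learning-based method, however, requires physically captured training data, whereas the data for training the reinforcement learning network need only be randomly generated, which costs less.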
Keywords:	intelligent agents; reinforcement learning; inverse kinematics
Training a Virtual Tabletennis Player Based on Reinforcement Learning

Li Zicong, Zeng Fanzhong, Wu Zihui, Nie Yongwei, Xian Chuhua, Li Guiqing. Training a Virtual Tabletennis Player Based on Reinforcement Learning[J]. Journal of Computer-Aided Design & Computer Graphics, 2020, 32(6): 997-1008.

Authors:	Li Zicong, Zeng Fanzhong, Wu Zihui, Nie Yongwei, Xian Chuhua, Li Guiqing

Affiliation:	School of Computer Science and Engineering, South China University of Technology, Guangzhou 510006

Abstract:	Realism and immersion are essential to virtual reality (VR) applications, and intelligent, plausible behavior by virtual agents in a scene can significantly improve both. We employ reinforcement learning to train the racket's ball-hitting strategy and design a set of reward functions, guided by the rules of table tennis, so that a reasonable racket trajectory is generated from the starting position and initial velocity of the ball. We then bind the racket to the hand of a virtual player and solve for the player's hitting action by combining inverse kinematics with reinforcement learning, which enables the virtual player to hit the ball with a sequence of reasonable postures. An ablation analysis shows the necessity and effectiveness of the reward functions, and test experiments demonstrate that our approach hits the ball successfully more than 93% of the time, which is comparable to the imitation-learning-based method and higher than the other methods. Compared with imitation learning, however, reinforcement learning is less expensive because it uses randomly generated training data.

Keywords:	intelligent agents; reinforcement learning; inverse kinematics
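
The abstract mentions constraining the player's wrist to the racket trajectory and recovering the arm pose with inverse kinematics. The record gives no details of that step, so the following is only a minimal illustrative sketch and not the authors' method: it assumes a planar two-link arm (shoulder at the origin, upper arm length `l1`, forearm length `l2`) and uses the standard closed-form solution. The function names `two_link_ik`, `forward`, and `poses_for_trajectory` are hypothetical.

```python
import math


def two_link_ik(x, y, l1, l2):
    """Return (shoulder, elbow) angles in radians that place the wrist at (x, y).

    Assumption: a planar two-link arm with the shoulder at the origin.
    This is a textbook closed-form solution, not the method of the paper.
    """
    d2 = x * x + y * y
    cos_elbow = (d2 - l1 * l1 - l2 * l2) / (2.0 * l1 * l2)
    if cos_elbow < -1.0 - 1e-9 or cos_elbow > 1.0 + 1e-9:
        raise ValueError("wrist target is out of reach")
    # Clamp tiny floating-point overshoot at the edge of the workspace.
    cos_elbow = max(-1.0, min(1.0, cos_elbow))
    elbow = math.acos(cos_elbow)  # one of the two mirror solutions
    shoulder = math.atan2(y, x) - math.atan2(
        l2 * math.sin(elbow), l1 + l2 * math.cos(elbow)
    )
    return shoulder, elbow


def forward(shoulder, elbow, l1, l2):
    """Forward kinematics: wrist position for the given joint angles."""
    x = l1 * math.cos(shoulder) + l2 * math.cos(shoulder + elbow)
    y = l1 * math.sin(shoulder) + l2 * math.sin(shoulder + elbow)
    return x, y


def poses_for_trajectory(waypoints, l1, l2):
    """Solve the arm pose for each wrist waypoint of a (hypothetical) racket path."""
    return [two_link_ik(x, y, l1, l2) for x, y in waypoints]
```

Feeding each solved pose back through `forward` should reproduce the wrist waypoint, which is a convenient sanity check. A full-body character has many more joints and redundant solutions, which is presumably where a learned component helps choose natural-looking postures.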
This article is indexed in CNKI, VIP (维普), Wanfang Data (万方数据), and other databases.
