Policy gradient learning for a humanoid soccer robot
Authors:A. Cherubini  F. Giannone  L. Iocchi  M. Lombardo  G. Oriolo
Affiliation:1. Aix Marseille Université, CNRS, EFS-AM, ADES UMR7268, 13344 Marseille, France;2. Université Paul Sabatier, CNRS, AMIS UMR5288, 31073 Toulouse, France;3. Service de Parasitologie-Mycologie, Centre Hospitalier Universitaire de Toulouse/INSERM UMR1043/CNRS UMR5282/Université de Toulouse UPS, Centre de Physiopathiologie de Toulouse Purpan (CPTP), 31300 Toulouse, France;4. CIC-EC Antilles Guyane CIE 802 Inserm, Centre Hospitalier Andrée Rosemon, France;5. Equipe EPaT EA 3593, Université des Antilles et de la Guyane, Cayenne, Guyane française, France;6. Établissement Français du Sang Alpes Méditerranée, 13005 Marseille, France;2. CHU (Centre Hospitalier Universitaire) de Toulouse, Laboratoire d’Hématologie, Toulouse, France
Abstract:In humanoid robotic soccer, many factors determine the quality of a robot's performance, both low-level (e.g., vision and motion control) and high-level (e.g., behaviors and game strategies). In particular, the speed of individual robots, the precision of their trajectories, and the stability of their walking gaits have a high impact on the success of a team. Consequently, humanoid soccer robots require fine tuning, especially for the basic behaviors. In recent years, machine learning techniques have been used to find optimal parameter sets for various humanoid robot behaviors. However, a drawback of learning techniques is the time they consume: a practical learning method for robotic applications must be effective with a small amount of data. In this article, we compare two learning methods for humanoid walking gaits based on the Policy Gradient algorithm. We demonstrate that an extension of the classic Policy Gradient algorithm that takes parameter relevance into account yields better solutions when only a few experiments are available. Our experimental results show the effectiveness of the policy gradient learning method, as well as its higher convergence rate when the relevance of parameters is taken into account during learning.
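
The abstract does not spell out the update rule, so the following is only a minimal sketch, assuming a finite-difference policy gradient step of the kind often used for gait tuning. The `relevance` weights are a hypothetical stand-in for the parameter-relevance extension and are not the authors' formulation; `toy_score`, `learn`, and all numeric settings are likewise illustrative assumptions, with a quadratic function replacing a real walking trial.

```python
import math
import random


def mean(values):
    return sum(values) / len(values)


def policy_gradient_step(theta, score, epsilon, eta, n_policies, rng, relevance=None):
    """One finite-difference policy gradient update (illustrative sketch)."""
    dims = len(theta)
    if relevance is None:
        relevance = [1.0] * dims  # classic variant: all parameters weighted equally

    # Sample perturbed policies: each parameter moves by -epsilon, 0, or +epsilon.
    signs = [[rng.choice((-1, 0, 1)) for _ in range(dims)] for _ in range(n_policies)]
    scores = [score([t + s * epsilon for t, s in zip(theta, row)]) for row in signs]

    adjustment = []
    for d in range(dims):
        # Group the scores by how parameter d was perturbed.
        groups = {-1: [], 0: [], 1: []}
        for row, value in zip(signs, scores):
            groups[row[d]].append(value)
        if not groups[-1] or not groups[1]:
            adjustment.append(0.0)  # not enough samples to estimate a slope
            continue
        avg_minus, avg_plus = mean(groups[-1]), mean(groups[1])
        if groups[0] and mean(groups[0]) > max(avg_minus, avg_plus):
            adjustment.append(0.0)  # leaving the parameter unchanged looks best
        else:
            # Hypothetical relevance weighting: scale the estimated slope.
            adjustment.append(relevance[d] * (avg_plus - avg_minus))

    norm = math.sqrt(sum(a * a for a in adjustment))
    if norm == 0.0:
        return list(theta)
    # Move a fixed distance eta along the normalized adjustment vector.
    return [t + eta * a / norm for t, a in zip(theta, adjustment)]


def toy_score(theta, target=(1.0, -1.0, 0.5)):
    """Stand-in for a measured gait score; higher is better."""
    return -sum((t - g) ** 2 for t, g in zip(theta, target))


def learn(start, iterations=100, seed=0, relevance=None):
    rng = random.Random(seed)
    theta = list(start)
    for _ in range(iterations):
        theta = policy_gradient_step(theta, toy_score, 0.05, 0.1, 30, rng, relevance)
    return theta
```

Calling `learn([0.0, 0.0, 0.0])` runs the classic variant on the toy objective, while passing, for example, `relevance=[1.0, 0.0, 1.0]` freezes the second parameter. On a real robot each call to `score` would be a timed walking trial, which is why the number of evaluations per step matters.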
This article is indexed in ScienceDirect and other databases.