Affiliation: | (1) National Institution for Academic Degrees and University Evaluation, 1-29-1 Gakuen-nishimachi, Kodaira, 187-8587 Tokyo, Japan; (2) Toshiba, Kawasaki, Kanagawa, Japan; (3) Tokyo Institute of Technology, Yokohama, Kanagawa, Japan |
Abstract: | In general, the purpose of a reinforcement learning system is to learn an optimal policy. In two-player games such as Othello, however, it is important to acquire a penalty-avoiding policy, that is, one that can avoid losing the game. The penalty avoiding rational policy making algorithm (PARP) is a known method for learning such a policy. When PARP is applied to large-scale problems, however, it is confronted with an explosion in the number of states. In this article, we focus on Othello, a game with a huge state space. We introduce several ideas and heuristics to adapt PARP to Othello. We show that our learning player beats the well-known Othello program KITTY. This work was presented, in part, at the 7th International Symposium on Artificial Life and Robotics, Oita, Japan, January 16–18, 2002. |