2023 Fiscal Year Final Research Report

Evaluation and Improvement of AlphaGo Method for Imperfect-Information, Stochastic, and Multiplayer Games

Project/Area Number	20K12124
Research Category	Grant-in-Aid for Scientific Research (C)
Allocation Type	Multi-year Fund
Section	General
Review Section	Basic Section 62040: Entertainment and game informatics-related
Research Institution	Kochi University of Technology
Principal Investigator	Matsuzaki Kiminori Kochi University of Technology, School of Information, Professor (30401243)
Project Period (FY)	2020-04-01 – 2024-03-31
Keywords	AlphaGo / deep reinforcement learning / Monte Carlo tree search / stochastic games / imperfect-information games / asymmetric two-player games
Outline of Final Research Achievements	In this study, we first investigated how different evaluation functions affect the overall performance of PUCT search in the AlphaGo method, using the game "Othello". Next, we developed computer players based on deep reinforcement learning for several games: the stochastic single-player game "2048", the imperfect-information game "Geister", the asymmetric two-player game "Two-player 2048", and the multiplayer imperfect-information game "DouDizhu". For "2048" in particular, we examined whether the policy function of the AlphaGo method is necessary and, on that basis, explored the possibility of applying a more lightweight learning method.
Free Research Field	Game informatics
Academic Significance and Societal Importance of the Research Achievements	The AlphaGo method (including its successors AlphaGo Zero, AlphaZero, and MuZero) has produced players of superhuman strength in two-player perfect-information games such as chess, shogi, and Go. This study identified several problems that can arise when applying the AlphaGo method (or deep reinforcement learning in general) to more difficult games, such as imperfect-information games and stochastic games. In particular, for deep reinforcement learning in the stochastic game "2048", we showed that the stochastic elements degrade learning, and we identified open issues that point toward ways of addressing this.