2021 Fiscal Year Final Research Report
AlphaZero toward Theoretical Values and Optimal Plays of Perfect Information Games
Project/Area Number |
20K19946
|
Research Category |
Grant-in-Aid for Early-Career Scientists
|
Allocation Type | Multi-year Fund |
Review Section |
Basic Section 62040:Entertainment and game informatics-related
|
Research Institution | Japan Advanced Institute of Science and Technology |
Principal Investigator |
HSUEH Chu Hsuan, Japan Advanced Institute of Science and Technology, Graduate School of Advanced Science and Technology, Assistant Professor (30847497)
|
Project Period (FY) |
2020-04-01 – 2022-03-31
|
Keywords | AlphaZero / Game analysis / Optimal strategy / Theoretical value / Tabular / Neural network / Perfect information game / Stochastic game |
Outline of Final Research Achievements |
AlphaZero outperformed professional players by learning from scratch through self-play, requiring only knowledge of the game rules. However, it has been unclear whether AlphaZero can learn optimal policies or theoretical values. In addition, it has rarely been applied to games involving uncertainty. This research first targeted small-scale games, in which the optimal policy and theoretical value of every position can be obtained. The results showed that, under many settings, AlphaZero's learning converged to the optimal policies or theoretical values. In addition, for a larger-scale game involving uncertainty, a program based on AlphaZero proved strong enough to win the silver medal in a tournament.
|
Free Research Field |
Game informatics
|
Academic Significance and Societal Importance of the Research Achievements |
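Carefully examining AlphaZero's parameters and clarifying their influence on the learning results is of academic significance. We expect this to reduce the trial-and-error cost of parameter tuning for researchers who apply AlphaZero. The research also helped show that AlphaZero can be applied successfully to games involving uncertainty, such as those with dice rolls. Furthermore, by showing that the policies and position evaluations learned by AlphaZero are of high quality, it increased their value as a reference. We consider that they can be used to help human players, especially strong players, improve. We believe that increasing this practical value contributes both academically and societally.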
|