DeepMind's large-scale experiment on shogi artificial intelligence and verification of its knowledge acquisition process

Research Project

Project/Area Number	20K12120
Research Category	Grant-in-Aid for Scientific Research (C)
Allocation Type	Multi-year Fund
Section	General
Review Section	Basic Section 62040: Entertainment and game informatics-related
Research Institution	The University of Electro-Communications
Principal Investigator	Hoki Kunihito, The University of Electro-Communications, Graduate School of Informatics and Engineering, Associate Professor (00436081)
Co-Investigator (Kenkyū-buntansha)	Ito Takeshi (伊藤毅志), The University of Electro-Communications, Graduate School of Informatics and Engineering, Professor (40262373)
Project Period (FY)	2020-04-01 – 2024-03-31
Project Status	Completed (Fiscal Year 2023)
Budget Amount	¥4,290,000 (Direct Cost: ¥3,300,000, Indirect Cost: ¥990,000); Fiscal Year 2022: ¥1,170,000 (Direct Cost: ¥900,000, Indirect Cost: ¥270,000); Fiscal Year 2021: ¥1,170,000 (Direct Cost: ¥900,000, Indirect Cost: ¥270,000); Fiscal Year 2020: ¥1,950,000 (Direct Cost: ¥1,500,000, Indirect Cost: ¥450,000)
Keywords	AlphaZero / deep learning / reinforcement learning / shogi / graphics processing unit / game artificial intelligence / artificial intelligence / heuristic search
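Outline of Research at the Start	In 2019, Silver et al. published in Science the reinforcement learning algorithm AlphaZero, which produces artificial intelligence players that acquire knowledge of games such as Go and shogi through self-play, together with its experimental results. This project replicates the large-scale shogi experiment of that prior work using commercially available hardware, observes the reinforcement learning process and the performance of the resulting artificial intelligence, and analyzes the process by which AlphaZero acquires shogi knowledge. The analysis of the knowledge acquisition process is carried out in cooperation with an expert in the cognitive science of players of board games such as shogi (the co-investigator).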
Outline of Final Research Achievements	This project replicated the large-scale deep reinforcement learning experiment for shogi from prior research using commercially available hardware, and observed the learning process and the performance of the resulting artificial intelligence. To make the replication feasible, we pursued computational efficiency in generating self-play games on graphics processing units (GPUs). Using an NVIDIA GPU costing about 150,000 yen, we achieved a throughput of about 10,000 self-play games per day. The resulting shogi player, built using only commercially available hardware, achieved performance comparable to that reported in the prior research. In addition, inspired by the process of organizing the large number of game records generated in this project, we developed a new method for representing the state space of a board game as a sparse set of combinations.
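Academic Significance and Societal Importance of the Research Achievements	Comparing existing shogi artificial intelligence with the program released by this project makes it possible to examine how current shogi artificial intelligence technology in Japan compares with the AlphaZero technology, which has become the international standard, in terms of performance, cost, playing style, and other aspects. The program code produced by this research, AobaZero, is published in the GitHub repository "AobaZero" (https://github.com/kobanium/aobazero). Searching the web with the two keywords "将棋" (shogi) and "AobaZero" confirms that it is introduced on many web pages.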