Construction of Go Algorithm Based on Modern Heuristics and Playout
Project/Area Number | 16K00510 |
Research Category | Grant-in-Aid for Scientific Research (C) |
Allocation Type | Multi-year Fund |
Section | General |
Research Field | Entertainment and game informatics |
Research Institution | Aichi Institute of Technology |
Principal Investigator | ITOH Masaru, Aichi Institute of Technology, Faculty of Information Science, Professor (80221026) |
Research Collaborators | ITOH Arito / KAWAI Makoto / YAMAKAWA Yushi |
Project Period (FY) | 2016-04-01 – 2019-03-31 |
Project Status | Completed (Fiscal Year 2018) |
Budget Amount | ¥3,380,000 (Direct Cost: ¥2,600,000, Indirect Cost: ¥780,000)
Fiscal Year 2018: ¥1,040,000 (Direct Cost: ¥800,000, Indirect Cost: ¥240,000)
Fiscal Year 2017: ¥780,000 (Direct Cost: ¥600,000, Indirect Cost: ¥180,000)
Fiscal Year 2016: ¥1,560,000 (Direct Cost: ¥1,200,000, Indirect Cost: ¥360,000)
|
Keywords | Go algorithm / Playout / Modern heuristics / Neural network / Deep learning / Monte Carlo tree search / Action value / Heuristic / Convolutional neural network / Go Text Protocol |
Outline of Final Research Achievements |
The current mainstream algorithm in computer Go is based on Monte Carlo tree search (MCTS). This search method generates a large number of playouts to determine the best next move, but those playouts are rarely reused afterward. I therefore proposed a method that obtains the best next move by building a game tree from the history of past playouts. It was confirmed that the proposed method can be applied to small problems such as tsume-go (life-and-death problems). I also proposed another algorithm based on deep learning and playouts. The deep learning was realized with multi-layer convolutional neural networks; the neural network is one of the representative methods of modern heuristics. In this Value-MCTS, the search does not use the fast-rollout win/loss result proposed by AlphaGo; that part is instead replaced by playouts. For node evaluation, the action value proposed by AlphaGo was adopted rather than the UCB1 value. It was shown statistically that, when the parameter values are set correctly, the proposed method is superior to existing Go software.
|
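The node-evaluation change described above can be illustrated with a short sketch. This is a minimal, hypothetical implementation of the AlphaGo-style selection rule (mean action value Q plus a prior-weighted exploration bonus, often called PUCT) that the summary says was adopted in place of UCB1; the class and constant names, and the value of `C_PUCT`, are assumptions for illustration, not the project's actual code.

```python
import math

C_PUCT = 1.5  # exploration constant (assumed value)

class Node:
    """A tree node holding AlphaGo-style statistics for one move."""
    def __init__(self, prior):
        self.prior = prior      # P(s,a): prior probability from a policy network
        self.visits = 0         # N(s,a): visit count
        self.value_sum = 0.0    # W(s,a): accumulated playout results
        self.children = {}      # move -> Node

    def q(self):
        # Mean action value Q(s,a) = W(s,a) / N(s,a)
        return self.value_sum / self.visits if self.visits else 0.0

def select_child(node):
    """Select the child maximizing Q(s,a) + U(s,a), instead of UCB1."""
    total_visits = sum(c.visits for c in node.children.values())

    def score(child):
        # U(s,a) = c_puct * P(s,a) * sqrt(sum_b N(s,b)) / (1 + N(s,a))
        u = C_PUCT * child.prior * math.sqrt(total_visits) / (1 + child.visits)
        return child.q() + u

    return max(node.children.items(), key=lambda kv: score(kv[1]))
```

In this scheme the prior steers early exploration, while the accumulated playout statistics dominate as visit counts grow; UCB1, by contrast, uses only visit counts and mean rewards.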
Academic Significance and Societal Importance of the Research Achievements |
The method of building a game tree from past playout history to find the best move worked effectively in narrow search spaces such as tsume-go. On the 9x9 board, however, it was completely ineffective. Although the idea is interesting, it turned out that it cannot become a method equal or superior to Monte Carlo tree search. The other Go algorithm, based on deep learning and playouts, was successfully run in a low-resource environment, and it was confirmed that it can beat existing open-source Go software with statistical significance. The various parameter values required by Value-MCTS, such as the node-expansion threshold and the mixing parameter, must be set appropriately, and tuning them is not easy. Reducing the execution time also remains a major challenge.
|
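The mixing parameter mentioned above plays the same role as AlphaGo's leaf-evaluation weight: it blends a value-network estimate with the outcome of a simulation. A minimal sketch, assuming the project mixes the value-network output with a playout result (its substitute for AlphaGo's fast rollout); the function name and the default weight of 0.5 are illustrative assumptions.

```python
def evaluate_leaf(value_net_output, playout_result, lam=0.5):
    """Mixed leaf evaluation in the AlphaGo style:
        V(s) = (1 - lam) * v(s) + lam * z
    where v(s) is the value-network estimate, z is the playout outcome
    (substituted here for AlphaGo's fast-rollout result), and lam is the
    mixing parameter. lam = 0.5 is an assumed default, not a tuned value.
    """
    return (1.0 - lam) * value_net_output + lam * playout_result
```

With lam = 0 the search trusts the value network alone; with lam = 1 it reduces to pure playout evaluation, which is why this parameter (together with the node-expansion threshold) must be tuned carefully.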
Report (4 results)
Research Products (10 results)