Reinforcement Learning Using Deep Learning in Continuous Space Games

Research Project

Project/Area Number	18K11600
Research Category	Grant-in-Aid for Scientific Research (C)
Allocation Type	Multi-year Fund
Section	General
Review Section	Basic Section 62040:Entertainment and game informatics-related
Research Institution	The University of Tokyo
Principal Investigator	Tanaka Tetsuro, The University of Tokyo, Information Technology Center, Associate Professor (60251360)
Project Period (FY)	2018-04-01 – 2023-03-31
Project Status	Completed (Fiscal Year 2022)
Budget Amount	¥2,990,000 (Direct Cost: ¥2,300,000, Indirect Cost: ¥690,000); Fiscal Year 2020: ¥650,000 (Direct Cost: ¥500,000, Indirect Cost: ¥150,000); Fiscal Year 2019: ¥650,000 (Direct Cost: ¥500,000, Indirect Cost: ¥150,000); Fiscal Year 2018: ¥1,690,000 (Direct Cost: ¥1,300,000, Indirect Cost: ¥390,000)
Keywords	Continuous-space games / Reinforcement learning / Imperfect-information games / Nash equilibrium / Strong solution / Continuous space / Deep learning
Outline of Final Research Achievements	As a foundation for research using Digital Curling, we proposed "deterministic digital curling," a variant that eliminates the uncertainty in curling, and used it to obtain insights into the outcomes of the game. To validate the effectiveness of hierarchical reinforcement learning for imperfect-information games, we conducted an evaluation using Mahjong and confirmed the effectiveness of automatic hyperparameter optimization frameworks such as Optuna. We also verified the effectiveness of GANs for automatically generating tower defense games, and computed Nash equilibrium strategies for several imperfect-information games. The programs developed in this project have been released publicly so that future researchers can build on these results.
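The outline mentions computing Nash equilibrium strategies for imperfect-information games. As a hedged illustration of what such a strategy is, and not the method used in this project (its actual code is in the repositories listed under Remarks), the sketch below approximates the equilibrium of a small two-player zero-sum matrix game by regret-matching self-play. The payoff matrix is made up for illustration, and the names `solve_zero_sum`, `regret_matching`, and `game_value` are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def regret_matching(regrets: List[float]) -> List[float]:
    """Mixed strategy proportional to positive cumulative regret (uniform if none)."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total > 0.0:
        return [r / total for r in positive]
    return [1.0 / len(regrets)] * len(regrets)


def solve_zero_sum(payoff: Matrix, iterations: int = 50000) -> Tuple[List[float], List[float]]:
    """Approximate a Nash equilibrium of a two-player zero-sum matrix game.

    payoff[i][j] is the row player's payoff; the column player receives -payoff[i][j].
    Returns the time-averaged strategies, which converge to an equilibrium.
    """
    n_rows, n_cols = len(payoff), len(payoff[0])
    row_regret, col_regret = [0.0] * n_rows, [0.0] * n_cols
    row_total, col_total = [0.0] * n_rows, [0.0] * n_cols
    for _ in range(iterations):
        x = regret_matching(row_regret)
        y = regret_matching(col_regret)
        # Expected payoff of each pure action against the opponent's current mix.
        row_values = [sum(payoff[i][j] * y[j] for j in range(n_cols)) for i in range(n_rows)]
        col_values = [-sum(payoff[i][j] * x[i] for i in range(n_rows)) for j in range(n_cols)]
        row_current = sum(p * v for p, v in zip(x, row_values))
        col_current = sum(q * v for q, v in zip(y, col_values))
        # Accumulate regret for not having played each pure action, plus the strategies.
        for i in range(n_rows):
            row_regret[i] += row_values[i] - row_current
            row_total[i] += x[i]
        for j in range(n_cols):
            col_regret[j] += col_values[j] - col_current
            col_total[j] += y[j]
    return [t / iterations for t in row_total], [t / iterations for t in col_total]


def game_value(payoff: Matrix, x: List[float], y: List[float]) -> float:
    """Row player's expected payoff when the players use mixed strategies x and y."""
    return sum(x[i] * payoff[i][j] * y[j] for i in range(len(x)) for j in range(len(y)))


if __name__ == "__main__":
    example = [[2.0, -1.0], [-1.0, 1.0]]  # illustrative payoffs, not from any real game
    x, y = solve_zero_sum(example)
    print(x, y, game_value(example, x, y))
```

For the example matrix, the exact equilibrium has each player choose their first action with probability 0.4, giving the row player a game value of 0.2. Only the averaged strategies approach this point; the per-iteration strategies may keep oscillating around it.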
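Academic Significance and Societal Importance of the Research Achievements	The original research goal, proposing effective learning methods for deep-learning-based reinforcement learning in continuous-space games, was not achieved, so the project did not produce major academic results. In terms of societal significance, however, it achieved certain results. By examining the properties of curling, a continuous-space game, it pointed out issues that learning algorithms need to take into account. It also strongly solved, or computed Nash equilibrium strategies for, several imperfect-information games closely related to continuous-space games, and published the analysis results; these provide a "ground truth" that can serve as an evaluation benchmark when deep reinforcement learning is applied to those games.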
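Strongly solving a game means determining the game-theoretic value of every reachable position, and several of the programs listed under Remarks are described as retrograde (backward) analysis programs. The sketch below is a generic, hedged illustration of retrograde analysis on a hypothetical toy game, not the project's code: a token sits on a node of a small directed graph, players alternately move it along an edge, and a player with no legal move loses. The function name `retrograde_solve` and the example graph are made up for illustration.

```python
from collections import deque
from typing import Deque, Dict, List

WIN, LOSS, DRAW = "WIN", "LOSS", "DRAW"


def retrograde_solve(moves: Dict[int, List[int]]) -> Dict[int, str]:
    """Label every position WIN/LOSS/DRAW from the viewpoint of the player to move.

    moves[s] lists the positions reachable from s in one move (all must be keys).
    """
    # Build predecessor lists and count the unrefuted moves of each position.
    preds: Dict[int, List[int]] = {s: [] for s in moves}
    for s, succs in moves.items():
        for t in succs:
            preds[t].append(s)
    remaining = {s: len(succs) for s, succs in moves.items()}

    result: Dict[int, str] = {}
    queue: Deque[int] = deque()
    # Toy-game rule (assumption): having no legal move is a loss for the side to move.
    for s, succs in moves.items():
        if not succs:
            result[s] = LOSS
            queue.append(s)

    # Propagate known values backward from the terminal positions.
    while queue:
        s = queue.popleft()
        for p in preds[s]:
            if p in result:
                continue
            if result[s] == LOSS:
                # p can move into a position lost for the opponent, so p is a win.
                result[p] = WIN
                queue.append(p)
            else:
                # This move of p hands the opponent a win; p loses once all moves do.
                remaining[p] -= 1
                if remaining[p] == 0:
                    result[p] = LOSS
                    queue.append(p)

    # Positions never labeled can avoid defeat forever by cycling: draws.
    for s in moves:
        result.setdefault(s, DRAW)
    return result


if __name__ == "__main__":
    toy_moves = {0: [1], 1: [2], 2: [], 3: [4, 0], 4: [3], 5: [4, 2], 6: [7], 7: [6, 1]}
    print(retrograde_solve(toy_moves))
```

In the toy graph, positions 6 and 7 form a cycle: from 7 the only alternative to cycling is moving to 1, a winning position for the opponent, so both end up labeled as draws. Real games need a position encoding and move generator in place of the explicit `moves` dictionary, but the backward propagation is the same idea.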

Report

(6 results)

2022 Annual Research Report
Final Research Report (PDF)
2021 Research-status Report
2020 Research-status Report
2019 Research-status Report
2018 Research-status Report

Research Products
(15 results)


Presentations (8 results), Remarks (7 results)

[Presentation] Nash Equilibrium Strategies for R-Rivals (2021)
- Author(s)
  田中哲朗
- Organizer
  27th Game Programming Workshop 2021
- Related Report
  2021 Research-status Report
[Presentation] Procedural Content Generation for Tower Defense Games: A Preliminary Experiment with Reinforcement Learning (2021)
- Author(s)
  Yueming Xu, Tetsuro Tanaka
- Organizer
  27th Game Programming Workshop 2021
- Related Report
  2021 Research-status Report
[Presentation] Building a Mahjong Player Using Deep Reinforcement Learning (2020)
- Author(s)
  清水大志, 田中哲朗
- Organizer
  26th Game Programming Workshop 2020
- Related Report
  2020 Research-status Report
[Presentation] Strongly Solving Quantum "Anpanman no Hajimete Shogi" (2020)
- Author(s)
  田中哲朗
- Organizer
  26th Game Programming Workshop 2020
- Related Report
  2020 Research-status Report
[Presentation] Strongly Solving Juroku Musashi (2020)
- Author(s)
  田中哲朗
- Organizer
  26th Game Programming Workshop 2020
- Related Report
  2020 Research-status Report
[Presentation] Construction and Evaluation of Network Models Suited to Mahjong Policy Functions (2019)
- Author(s)
  清水大志, 田中哲朗
- Organizer
  Information Processing Society of Japan (IPSJ) Game Programming Workshop 2019
- Related Report
  2019 Research-status Report
[Presentation] Evaluation of Hierarchical Reinforcement Learning Using Grid Worlds (2019)
- Author(s)
  高岡峻, 田中哲朗
- Organizer
  Information Processing Society of Japan (IPSJ) Game Programming Workshop 2019
- Related Report
  2019 Research-status Report
[Presentation] Strategies for Deterministic Digital Curling (2018)
- Author(s)
  田中哲朗
- Organizer
  Curling Science Workshop
- Related Report
  2018 Research-status Report
[Remarks] R-Rivals verification code
- URL
  https://github.com/tanakat01/r-rivals
- Related Report
  2021 Research-status Report
[Remarks] Suzume-jong reinforcement learning experiment program
- URL
  https://github.com/minnsou/suzume-jong
- Related Report
  2020 Research-status Report
[Remarks] Retrograde analysis program for Quantum "Anpanman no Hajimete Shogi"
- URL
  https://github.com/tanakat01/quantum_anpanman
- Related Report
  2020 Research-status Report
[Remarks] Juroku Musashi retrograde analysis program
- URL
  https://github.com/tanakat01/16musashi
- Related Report
  2020 Research-status Report
[Remarks] Juroku Musashi position search
- URL
  https://gps.tanaka.ecc.u-tokyo.ac.jp/16musashi/
- Related Report
  2020 Research-status Report
[Remarks] Mini Mahjong environment
- URL
  https://github.com/u-tokyo-gps-tanaka-lab/mini_mahjong
- Related Report
  2019 Research-status Report
[Remarks] Experiment code for "Evaluation of Hierarchical Reinforcement Learning Using Grid Worlds"
- URL
  https://github.com/u-tokyo-gps-tanaka-lab/gridworld_for_HRL
- Related Report
  2019 Research-status Report
