2020 Fiscal Year Research-status Report

Development of deep reinforcement learning methods with a structure suited to imperfect-information, multi-player environments

Research Project

Project/Area Number	18K19832
Research Institution	The University of Tokyo
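Principal Investigator	Tomoyuki Kaneko (金子 知適), The University of Tokyo, Interfaculty Initiative in Information Studies / Graduate School of Interdisciplinary Information Studies, Associate Professor (00345068)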
Project Period (FY)	2018-06-29 – 2022-03-31
Keywords	game programming
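Outline of Annual Research Achievements	Reinforcement learning, a subfield of artificial intelligence, studies techniques by which an AI agent acting autonomously in an environment learns its behavior through trial and error. The goal of this project is to develop foundational techniques for model-based deep reinforcement learning, using imperfect-information multi-player games as the testbed, and thereby to bring dramatic improvements in AI agent performance to a wider range of fields. Reinforcement learning is a highly general framework for the interaction between an agent and an environment. The agent influences the environment by taking actions; the environment changes its state stochastically, depending on the agent's actions and other factors; the agent observes part of the state and occasionally receives a reward (which may be a penalty). Which action in which state leads to which outcome is not known in advance, and the outcome itself may vary stochastically. The agent therefore has to come to understand the environment through repeated trial and error. This project broadens that setting further and treats imperfect-information multi-player games as examples of problems with near-real-world complexity. Imperfect information means that part of the state cannot be observed; multi-player means that there are other parties who may be opponents or allies depending on the situation. The more complex the problem, the harder it is for the agent to learn. In addition to existing deep learning techniques, this project therefore proposes a new learning framework that acquires and refines models suited to imperfect-information, multi-player settings, and aims to establish its core techniques. In this third year of the project, building on the results of the previous years, we extended reinforcement learning techniques for imperfect-information games with one or two players and worked on learning techniques for environments with three or more agents.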
Current Status of Research Progress	2: Research has progressed on the whole more than it was originally planned. Reason: Measured against the research plan, broadly adequate techniques have been developed and evaluated over the three years.
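Strategy for Future Research Activity	We will further develop the results obtained through this fiscal year, expand the evaluation experiments on reinforcement learning in multi-agent environments, and publish papers.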
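Causes of Carryover	An extension of the project period was requested and approved so that funds could be used for research that achieves the project's objectives more thoroughly (additional experiments, conference attendance, paper submissions, and so on). The carried-over funds will be used next fiscal year to cover additional experiments, conference attendance, and paper submissions.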
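
The agent-environment interaction described in the outline above (actions, stochastic state changes, partial observation, occasional reward) can be illustrated with a minimal sketch. This is not the project's framework or any of its environments: `ToyPartialEnv`, `run_episode`, the 0.8 transition probability, and the end-of-episode reward are all hypothetical choices made only for illustration.

```python
import random


class ToyPartialEnv:
    """Hypothetical toy environment; not taken from the report.

    The state is (visible, hidden). The agent only ever observes `visible`,
    so the environment is imperfect-information from the agent's viewpoint.
    """

    def __init__(self, seed=0, horizon=5):
        self.rng = random.Random(seed)  # private RNG keeps runs reproducible
        self.horizon = horizon

    def reset(self):
        self.t = 0
        self.hidden = self.rng.randint(0, 1)  # unobservable part of the state
        self.visible = 0                      # observable part of the state
        return self.visible

    def step(self, action):
        if action not in (0, 1):
            raise ValueError("action must be 0 or 1")
        self.t += 1
        # Stochastic transition: the action only takes effect with
        # probability 0.8 (an arbitrary value assumed for this sketch).
        if self.rng.random() < 0.8:
            self.visible += 1 if action == self.hidden else -1
        done = self.t >= self.horizon
        # Sparse reward: nothing until the episode ends; may be negative.
        reward = float(self.visible) if done else 0.0
        return self.visible, reward, done


def run_episode(env, policy):
    """Run one episode; `policy` maps an observation to an action."""
    obs = env.reset()
    total, steps, done = 0.0, 0, False
    while not done:
        action = policy(obs)
        obs, reward, done = env.step(action)
        total += reward
        steps += 1
    return total, steps


if __name__ == "__main__":
    rng = random.Random(1)
    # A uniformly random policy stands in for the trial-and-error phase.
    print(run_episode(ToyPartialEnv(seed=0), lambda obs: rng.randint(0, 1)))
```

Because the hidden bit is never observed, a policy can only infer it from how the visible value responds to its actions, which is the kind of difficulty the outline attributes to imperfect-information settings. A multi-player variant would additionally need one observation and one reward per agent.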

Research Products
(10 results)

All Journal Article (6 results) (of which Peer Reviewed: 5 results, Open Access: 4 results) Presentation (4 results) (of which Int'l Joint Research: 3 results)

[Journal Article] Playing Catan with cross-dimensional neural network (2020)
- Author(s)
  Gendre and Kaneko
- Journal Title
  ICONIP
  Volume: 12533 Pages: 580-592
- DOI
  10.1007/978-3-030-63833-7_49
- Peer Reviewed
[Journal Article] Evaluation of loss function for stable policy learning in Dobutsu Shogi (2020)
- Author(s)
  Nakayashiki and Kaneko
- Journal Title
  International Conference on Technologies and Applications of Artificial Intelligence
  Volume: N/A Pages: 175-180
- Peer Reviewed
[Journal Article] Ceramic: A research environment based on the multi-player strategic board game Azul (2020)
- Author(s)
  Gendre and Kaneko
- Journal Title
  25th Game Programming Workshop
  Volume: 978-4-907626-46-4 C3804 Pages: 155-160
- Peer Reviewed / Open Access
[Journal Article] Diverse exploration via infomax options (2020)
- Author(s)
  Kanagawa and Kaneko
- Journal Title
  arXiv
  Volume: 978-4-907626-46-4 C3804 Pages: 1-21
- Open Access
[Journal Article] Evaluation of soft actor-critic in discrete action spaces [離散行動空間における soft actor-critic の評価] (2020)
- Author(s)
  合田 and Kaneko
- Journal Title
  25th Game Programming Workshop
  Volume: 978-4-907626-46-4 C3804 Pages: 175-180
- Peer Reviewed / Open Access
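[Journal Article] Design of an evaluation function that accounts for the possibility of a comeback, and its evaluation in Dobutsu Shogi [逆転の余地を考慮した評価関数の設計とどうぶつしょうぎによる評価] (2020)
- Author(s)
  Nakayashiki and Kaneko
- Journal Title
  25th Game Programming Workshop
  Volume: 978-4-907626-46-4 C3804 Pages: 22-29
- Peer Reviewed / Open Access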
[Presentation] Improve counterfactual regret minimization for card game Cheat (2020)
- Author(s)
  Yi and Kaneko
- Organizer
  25th Game Programming Workshop
- Int'l Joint Research
[Presentation] Application of DREAM to the board game Geister (2020)
- Author(s)
  Chen and Kaneko
- Organizer
  25th Game Programming Workshop
- Int'l Joint Research
[Presentation] Training Japanese Mahjong agent with two dimension feature representation (2020)
- Author(s)
  Honghai and Kaneko
- Organizer
  25th Game Programming Workshop
- Int'l Joint Research
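[Presentation] Reinforcement learning for improving generalization performance on ProcgenBenchmark [ProcgenBenchmark における汎化性能を高める強化学習] (2020)
- Author(s)
  徐 and Kaneko
- Organizer
  25th Game Programming Workshop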
