2017 Fiscal Year Annual Research Report

Theory and Application of Statistical Reinforcement Learning

Research Project

Project/Area Number	17H00757
Research Institution	The University of Tokyo
Principal Investigator	SUGIYAMA Masashi (杉山 将), The University of Tokyo, Graduate School of Frontier Sciences, Professor (90334515)
Project Period (FY)	2017-04-01 – 2022-03-31
Keywords	reinforcement learning / machine learning / multi-armed bandit problem / robustness
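Outline of Annual Research Achievements	This fiscal year, we carried out fundamental research aimed at developing reinforcement learning algorithms that work in practice even in difficult situations where standard reinforcement learning methods cannot be applied effectively: for example, when collecting large amounts of data is difficult, when the environment changes dynamically, or when the data contain outliers. In particular, we developed a new hierarchical reinforcement learning method that adaptively switches among multiple policies, and a new actor-critic reinforcement learning method that can exploit second-order information of the value function. We evaluated these methods in computer experiments and confirmed that they outperform existing methods. For the multi-armed bandit problem, one of the problems studied in reinforcement learning, we developed a learning algorithm with theoretical guarantees for the case of linear rewards, and a learning algorithm with theoretical guarantees for a new formulation whose goal is to find good arms efficiently; we confirmed the effectiveness of both in numerical experiments. For Bayesian inference problems in which the data contain outliers, we developed a model-based robust inference method, a model-free robust inference method, and a nonparametric analysis method for time-series data, and confirmed their effectiveness in numerical experiments. In addition to this fundamental research, we discussed with various companies and research institutes the applicability of reinforcement learning in fields such as the control of automobiles, drones, and construction vehicles, computer games, online advertising, medical clinical trials, multi-agent negotiation, and disaster prevention.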
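Current Status of Research Progress	1: Research has progressed more than originally planned.
Reason	To address difficult situations in which existing reinforcement learning cannot be applied effectively, we devised new frameworks, developed new algorithms, and carried out the accompanying fundamental theoretical analyses on many fronts, and we published many papers at prominent international conferences with high international impact. From the application perspective, we have received inquiries about reinforcement learning from many companies and research institutes, and we are beginning to open up new applications of reinforcement-learning-related technologies, beyond improving performance in existing application areas. Furthermore, although fields related to reinforcement learning suffer from a severe shortage of researchers worldwide, we have been able to recruit many excellent young researchers since the launch of this KAKENHI project, giving us the best possible start in terms of human resource development as well.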
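Strategy for Future Research Activity	We will continue to actively pursue new approaches that can handle difficult situations in which existing reinforcement learning cannot be applied effectively, while comprehensively advancing both the development of new application areas, with a view to the wider adoption of reinforcement learning, and the training of researchers.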

Research Products
(12 results)


Int'l Joint Research (2 results), Presentation (7 results; of which Int'l Joint Research: 6 results), Book (1 result), Remarks (1 result), Funded Workshop (1 result)

[Int'l Joint Research] TU Darmstadt (Germany)
- Country Name
  Germany
- Counterpart Institution
  TU Darmstadt
[Int'l Joint Research] Data61 (Australia)
- Country Name
  Australia
- Counterpart Institution
  Data61
[Presentation] Fully adaptive algorithm for pure exploration in linear bandits (2018)
- Author(s)
  Xu, L., Honda, J., & Sugiyama, M.
- Organizer
  International Conference on Artificial Intelligence and Statistics (AISTATS2018)
- Int'l Joint Research
[Presentation] Variational inference based on robust divergences (2018)
- Author(s)
  Futami, F., Sato, I., & Sugiyama, M.
- Organizer
  International Conference on Artificial Intelligence and Statistics (AISTATS2018)
- Int'l Joint Research
[Presentation] Bayesian nonparametric Poisson-process allocation for time-sequence modeling (2018)
- Author(s)
  Ding, H., Khan, M. E., Sato, I., & Sugiyama, M.
- Organizer
  International Conference on Artificial Intelligence and Statistics (AISTATS2018)
- Int'l Joint Research
[Presentation] Hierarchical policy search via return-weighted density estimation (2018)
- Author(s)
  Osa, T. & Sugiyama, M.
- Organizer
  AAAI Conference on Artificial Intelligence (AAAI2018)
- Int'l Joint Research
[Presentation] Guide actor-critic for continuous control (2018)
- Author(s)
  Tangkaratt, V., Abdolmaleki, A., & Sugiyama, M.
- Organizer
  International Conference on Learning Representations (ICLR2018)
- Int'l Joint Research
[Presentation] Good arm identification from bandit feedback (2017)
- Author(s)
  Kano, H., Honda, J., Sakamaki, K., Matsuura, K., Nakamura, A., & Sugiyama, M.
- Organizer
  2017 Workshop on Information-Based Induction Sciences (IBIS2017)
[Presentation] Expectation propagation for t-exponential family using q-algebra (2017)
- Author(s)
  Futami, F., Sato, I., & Sugiyama, M.
- Organizer
  Neural Information Processing Systems (NIPS2017)
- Int'l Joint Research
[Book] An Algorithmic Perspective on Imitation Learning (2018)
- Author(s)
  Osa, T., Pajarinen, J., Neumann, G., Bagnell, J. A., Abbeel, P., & Peters, J.
- Total Pages
  179
- Publisher
  Foundations and Trends in Robotics
[Remarks] Publications
- URL
  http://www.ms.k.u-tokyo.ac.jp/sugi/publications.html
[Funded Workshop] Tokyo Deep Learning Workshop (TDLW2018) (2018)
