2018 Fiscal Year Annual Research Report

Theory and Application of Statistical Reinforcement Learning

Research Project

Project/Area Number	17H00757
Research Institution	The University of Tokyo
Principal Investigator	Masashi Sugiyama, The University of Tokyo, Graduate School of Frontier Sciences, Professor (90334515)
Project Period (FY)	2017-04-01 – 2022-03-31
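Keywords	reinforcement learning / machine learning / multi-armed bandit problem / robustness
Outline of Annual Research Achievements	This fiscal year, we improved reinforcement learning algorithms and developed and theoretically analyzed methods for multi-armed bandit problems. We also worked broadly on building foundational machine learning technologies expected to be useful for further advancing reinforcement learning, and obtained numerous results. In reinforcement learning, we developed actor-critic, continuous-time reinforcement learning, and hierarchical reinforcement learning methods. In the actor-critic work, we succeeded in substantially improving convergence on continuous control problems by exploiting curvature information. In the continuous-time work, we proposed a general-purpose method that learns the value function in a reproducing kernel Hilbert space directly in continuous time, without discretizing time. In the hierarchical work, we proposed a method that learns policies effectively by decomposing complex tasks hierarchically. For multi-armed bandit problems, we studied linear bandits, dueling bandits, and good arm identification. In the linear bandit work, we developed a fully adaptive algorithm for the best arm identification problem. In the dueling bandit work, we developed an algorithm that can learn from relative rewards alone, even when absolute rewards cannot be observed. In the good arm identification work, we developed an algorithm that can quickly find a "good arm" even when finding the best arm takes a long time. We also worked on Bayesian methods for modeling time-series data and on making Bayesian methods more robust and faster. Specifically, we developed a variational Bayesian inference algorithm that models event occurrence-time data with a nonparametric model, and a variational Bayesian inference algorithm that remains applicable when only time intervals, rather than exact event occurrence times, are known. In addition, we conducted research on error analysis in crowdsourcing and on robust learning methods for deep learning.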
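The good arm identification setting mentioned in the outline can be illustrated with a minimal sketch. This is not the algorithm from the reported paper: it is a plain round-robin sampler with Hoeffding confidence bounds, assuming rewards lie in [0, 1] and that a "good" arm is one whose mean reward is at least a given threshold. The function name identify_good_arms, its parameters, and the simulated arm means are all hypothetical and chosen only for illustration.

```python
import math
import random
from typing import Callable, List, Set, Tuple


def identify_good_arms(
    pull: Callable[[int], float],
    num_arms: int,
    threshold: float,
    delta: float = 0.05,
    max_pulls_per_arm: int = 100_000,
) -> Tuple[List[int], Set[int]]:
    """Return (good arms in the order they were declared, arms left undecided).

    Illustrative baseline only: every undecided arm is pulled once per round,
    and an arm is classified as soon as its confidence interval lies entirely
    on one side of the threshold.
    """
    sums = [0.0] * num_arms
    counts = [0] * num_arms
    undecided = set(range(num_arms))
    good: List[int] = []

    for _ in range(max_pulls_per_arm):
        if not undecided:
            break
        # sorted() copies the set, so arms can be discarded while looping.
        for arm in sorted(undecided):
            sums[arm] += pull(arm)
            counts[arm] += 1
            n = counts[arm]
            mean = sums[arm] / n
            # Hoeffding radius for rewards in [0, 1], with a union bound
            # over all arms and all sample counts (total error <= delta).
            radius = math.sqrt(math.log(4.0 * num_arms * n * n / delta) / (2.0 * n))
            if mean - radius >= threshold:
                good.append(arm)  # confidently at or above the threshold
                undecided.discard(arm)
            elif mean + radius < threshold:
                undecided.discard(arm)  # confidently below the threshold
    return good, undecided


if __name__ == "__main__":
    # Hypothetical Bernoulli arms; the means are made up for this demo.
    rng = random.Random(0)
    true_means = [0.9, 0.8, 0.2, 0.1]
    declared, remaining = identify_good_arms(
        lambda i: 1.0 if rng.random() < true_means[i] else 0.0,
        num_arms=len(true_means),
        threshold=0.5,
    )
    print("good arms (in order declared):", declared)
    print("still undecided:", remaining)
```

In this sketch, arms whose means sit far above the threshold are declared early, which reflects the point made in the outline: a good arm can be reported quickly even when separating the single best arm from a close runner-up would need many more samples.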
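Current Status of Research Progress	1: Research has progressed more than originally planned. Reason: Papers on reinforcement learning methods and related methods were accepted at international conferences recognized as top-level in the research community, including AISTATS, ICLR, ICML, NeurIPS, and AAAI, yielding many internationally impactful research results. Moreover, many of these papers are co-authored with graduate students, so the project is also contributing substantially to training young researchers in AI, a field where a talent shortage is being reported worldwide.
Strategy for Future Research Activity	We will continue improving reinforcement learning algorithms and developing new methods for multi-armed bandit problems, while conducting basic theoretical research on machine learning techniques that may contribute to the further development of reinforcement learning.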

Research Products
(11 results)


All Journal Article (1 results) (of which Peer Reviewed: 1 results) Presentation (9 results) (of which Int'l Joint Research: 9 results) Remarks (1 results)

[Journal Article] Good arm identification via bandit feedback. 2019
- Author(s)
  Kano, H., Honda, J., Sakamaki, K., Matsuura, K., Nakamura, A., & Sugiyama, M.
- Journal Title
  Machine Learning
- Peer Reviewed
[Presentation] Bayesian posterior approximation via greedy particle optimization. 2019
- Author(s)
  Futami, F., Cui, Z., Sato, I., & Sugiyama, M.
- Organizer
  AAAI Conference on Artificial Intelligence (AAAI2019)
- Int'l Joint Research
[Presentation] Dueling bandits with qualitative feedback. 2019
- Author(s)
  Xu, L., Honda, J., & Sugiyama, M.
- Organizer
  AAAI Conference on Artificial Intelligence (AAAI2019)
- Int'l Joint Research
[Presentation] Fully adaptive algorithm for pure exploration in linear bandits. 2018
- Author(s)
  Xu, L., Honda, J., & Sugiyama, M.
- Organizer
  International Conference on Artificial Intelligence and Statistics (AISTATS2018)
- Int'l Joint Research
[Presentation] Bayesian nonparametric Poisson-process allocation for time-sequence modeling. 2018
- Author(s)
  Ding, H., Khan, M. E., Sato, I., & Sugiyama, M.
- Organizer
  International Conference on Artificial Intelligence and Statistics (AISTATS2018)
- Int'l Joint Research
[Presentation] Guide actor-critic for continuous control. 2018
- Author(s)
  Tangkaratt, V., Abdolmaleki, A., & Sugiyama, M.
- Organizer
  International Conference on Learning Representations (ICLR2018)
- Int'l Joint Research
[Presentation] Analysis of minimax error rate for crowdsourcing and its application to worker clustering model. 2018
- Author(s)
  Imamura, H., Sato, I., & Sugiyama, M.
- Organizer
  International Conference on Machine Learning (ICML2018)
- Int'l Joint Research
[Presentation] Variational inference for Gaussian process with panel count data. 2018
- Author(s)
  Ding, H., Lee, Y., Sato, I., & Sugiyama, M.
- Organizer
  Conference on Uncertainty in Artificial Intelligence (UAI2018)
- Int'l Joint Research
[Presentation] Continuous-time value function approximation in reproducing kernel Hilbert spaces. 2018
- Author(s)
  Ohnishi, M., Yukawa, M., Johansson, M., & Sugiyama, M.
- Organizer
  Neural Information Processing Systems (NeurIPS2018)
- Int'l Joint Research
[Presentation] Lipschitz-margin training: Scalable certification of perturbation invariance for deep neural networks. 2018
- Author(s)
  Tsuzuku, Y., Sato, I., & Sugiyama, M.
- Organizer
  Neural Information Processing Systems (NeurIPS2018)
- Int'l Joint Research
[Remarks] Masashi Sugiyama's web page
- URL
  http://www.ms.k.u-tokyo.ac.jp/sugi/index-jp.html
