Theory and Application of Statistical Reinforcement Learning

Research Project

Project/Area Number	17H00757
Research Category	Grant-in-Aid for Scientific Research (A)
Allocation Type	Single-year Grants
Section	General
Research Field	Intelligent informatics
Research Institution	The University of Tokyo
Principal Investigator	Sugiyama Masashi, The University of Tokyo, Graduate School of Frontier Sciences, Professor (90334515)
Project Period (FY)	2017-04-01 – 2022-03-31
Project Status	Completed (Fiscal Year 2021)
Budget Amount	¥44,980,000 (Direct Cost: ¥34,600,000, Indirect Cost: ¥10,380,000)
  Fiscal Year 2021: ¥7,540,000 (Direct Cost: ¥5,800,000, Indirect Cost: ¥1,740,000)
  Fiscal Year 2020: ¥7,540,000 (Direct Cost: ¥5,800,000, Indirect Cost: ¥1,740,000)
  Fiscal Year 2019: ¥10,660,000 (Direct Cost: ¥8,200,000, Indirect Cost: ¥2,460,000)
  Fiscal Year 2018: ¥7,540,000 (Direct Cost: ¥5,800,000, Indirect Cost: ¥1,740,000)
  Fiscal Year 2017: ¥11,700,000 (Direct Cost: ¥9,000,000, Indirect Cost: ¥2,700,000)
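Keywords	reinforcement learning / machine learning / multi-armed bandit problem / imitation learning / Bayesian inference / robustness / robustification / crowdsourcing / dimensionality reduction / multi-task learning / online learning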
Outline of Final Research Achievements	In this research, we developed theories and algorithms for sequential decision making and probabilistic inference. In reinforcement learning, we developed methods for weakly supervised imitation learning and for hierarchical decomposition of complex problems to improve the practicality of reinforcement learning, and demonstrated their effectiveness experimentally. For multi-armed bandit problems, we developed algorithms with theoretical guarantees for linear bandits, dueling bandits, good-arm identification, and combinatorial bandits. In probabilistic inference, we studied making Bayesian inference robust, speeding up approximate computation, and modeling temporal events, and verified the effectiveness of these methods both theoretically and experimentally.
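Academic Significance and Societal Importance of the Research Achievements	Sequential decision making and probabilistic inference are important machine learning technologies whose further development is highly anticipated. In this research, we developed new algorithms that broaden the applicability of reinforcement learning and multi-armed bandits, and conducted research on improving the robustness of probabilistic inference and speeding up approximate computation. These foundational theoretical results contribute to clarifying the principles of sequential decision making and probabilistic inference, and were highly regarded academically at major international conferences in machine learning. In addition, the effectiveness of the developed algorithms has been demonstrated through computational experiments, so the work can also be regarded as socially significant development that may lead to future real-world deployment.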

Report

(6 results)

2021 Annual Research Report
Final Research Report (PDF)
2020 Annual Research Report
2019 Annual Research Report
2018 Annual Research Report
2017 Annual Research Report

Research Products
(40 results)

Int'l Joint Research (3 results) / Journal Article (15 results, of which Int'l Joint Research: 4 results, Peer Reviewed: 15 results, Open Access: 9 results) / Presentation (16 results, of which Int'l Joint Research: 15 results) / Book (2 results) / Remarks (3 results) / Funded Workshop (1 result)

[Int'l Joint Research] University of Washington/Georgia Institute of Technology (USA)
- Related Report
  2021 Annual Research Report
[Int'l Joint Research] TU Darmstadt (Germany)
- Related Report
  2017 Annual Research Report
[Int'l Joint Research] Data61 (Australia)
- Related Report
  2017 Annual Research Report
[Journal Article] Discovering diverse solutions in deep reinforcement learning by maximizing state-action-based mutual information (2022)
- Author(s)
  Osa, T., Tangkaratt, V., & Sugiyama, M.
- Journal Title
  
  Neural Networks
  
  Volume: -
- Related Report
  2021 Annual Research Report
- Peer Reviewed / Open Access
[Journal Article] Constraint learning for control tasks with limited duration barrier functions (2021)
- Author(s)
  Ohnishi Motoya、Notomista Gennaro、Sugiyama Masashi、Egerstedt Magnus
- Journal Title
  
  Automatica
  
  Volume: 127 Pages: 109504-109504
- DOI
  10.1016/j.automatica.2021.109504
- Related Report
  2021 Annual Research Report
- Peer Reviewed / Open Access / Int'l Joint Research
[Journal Article] A unified view of likelihood ratio and reparameterization gradients (2021)
- Author(s)
  Parmas, P. & Sugiyama, M.
- Journal Title
  
  Proceedings of 24th International Conference on Artificial Intelligence and Statistics (AISTATS2021)
  
  Volume: - Pages: 4078-4086
- Related Report
  2021 Annual Research Report
- Peer Reviewed / Open Access / Int'l Joint Research
[Journal Article] Robust imitation learning from noisy demonstrations (2021)
- Author(s)
  Tangkaratt, V., Charoenphakdee, N., & Sugiyama, M.
- Journal Title
  
  Proceedings of 24th International Conference on Artificial Intelligence and Statistics (AISTATS2021)
  
  Volume: - Pages: 298-306
- Related Report
  2021 Annual Research Report
- Peer Reviewed / Open Access
[Journal Article] γ-ABC: Outlier-robust approximate Bayesian computation based on a robust divergence estimator (2021)
- Author(s)
  Fujisawa, M., Teshima, T., Sato, I., & Sugiyama, M.
- Journal Title
  
  Proceedings of 24th International Conference on Artificial Intelligence and Statistics (AISTATS2021)
  
  Volume: - Pages: 1783-1791
- Related Report
  2021 Annual Research Report
- Peer Reviewed / Open Access
[Journal Article] Variational imitation learning with diverse-quality demonstrations (2020)
- Author(s)
  Tangkaratt, V., Han, B., Khan, M. E., & Sugiyama, M.
- Journal Title
  
  Proceedings of 37th International Conference on Machine Learning (ICML2020)
  
  Volume: - Pages: 9407-9417
- Related Report
  2020 Annual Research Report
- Peer Reviewed / Open Access
[Journal Article] Polynomial-time algorithms for multiple-arm identification with full-bandit feedback (2020)
- Author(s)
  Kuroki, Y., Xu, L., Miyauchi, A., Honda, J., & Sugiyama, M.
- Journal Title
  
  Neural Computation
  
  Volume: 32 Pages: 1733-1773
- Related Report
  2020 Annual Research Report
- Peer Reviewed
[Journal Article] Online dense subgraph discovery via blurred-graph feedback (2020)
- Author(s)
  Kuroki, Y., Miyauchi, A., Honda, J., & Sugiyama, M.
- Journal Title
  
  Proceedings of 37th International Conference on Machine Learning (ICML2020)
  
  Volume: - Pages: 5522-5532
- Related Report
  2020 Annual Research Report
- Peer Reviewed / Open Access
[Journal Article] Accelerating the diffusion-based ensemble sampling by non-reversible dynamics (2020)
- Author(s)
  Futami, F., Sato, I., & Sugiyama, M.
- Journal Title
  
  Proceedings of 37th International Conference on Machine Learning (ICML2020)
  
  Volume: - Pages: 3337-3347
- Related Report
  2020 Annual Research Report
- Peer Reviewed / Open Access
[Journal Article] Analysis and design of Thompson sampling for stochastic partial monitoring (2020)
- Author(s)
  Tsuchiya, T., Honda, J., & Sugiyama, M.
- Journal Title
  
  Advances in Neural Information Processing Systems 33
  
  Volume: - Pages: 8861-8871
- Related Report
  2020 Annual Research Report
- Peer Reviewed / Open Access
[Journal Article] Active deep Q-learning with demonstration (2020)
- Author(s)
  Chen, S.-A., Tangkaratt, V., Lin, H.-T., & Sugiyama, M.
- Journal Title
  
  Machine Learning, to appear
  
  Volume: -
- Related Report
  2019 Annual Research Report
- Peer Reviewed / Int'l Joint Research
[Journal Article] Good arm identification via bandit feedback (2019)
- Author(s)
  Kano, H., Honda, J., Sakamaki, K., Matsuura, K., Nakamura, A., & Sugiyama, M.
- Journal Title
  
  Machine Learning
  
  Volume: 108 Pages: 721-745
- Related Report
  2019 Annual Research Report
- Peer Reviewed
[Journal Article] Hierarchical reinforcement learning via advantage-weighted information maximization (2019)
- Author(s)
  Osa, T., Tangkaratt, V., & Sugiyama, M.
- Journal Title
  
  Proceedings of Seventh International Conference on Learning Representations (ICLR2019)
  
  Volume: -
- Related Report
  2019 Annual Research Report
- Peer Reviewed
[Journal Article] Imitation learning from imperfect demonstration (2019)
- Author(s)
  Wu, Y.-H., Charoenphakdee, N., Bao, H., Tangkaratt, V., & Sugiyama, M.
- Journal Title
  
  Proceedings of 36th International Conference on Machine Learning (ICML2019)
  
  Volume: - Pages: 6818-6827
- Related Report
  2019 Annual Research Report
- Peer Reviewed / Int'l Joint Research
[Journal Article] Good arm identification via bandit feedback (2019)
- Author(s)
  Kano, H., Honda, J., Sakamaki, K., Matsuura, K., Nakamura, A., & Sugiyama, M.
- Journal Title
  
  Machine Learning
  
  Volume: -
- Related Report
  2018 Annual Research Report
- Peer Reviewed
[Presentation] Bayesian posterior approximation via greedy particle optimization (2019)
- Author(s)
  Futami, F., Cui, Z., Sato, I., & Sugiyama, M.
- Organizer
  AAAI Conference on Artificial Intelligence (AAAI2019)
- Related Report
  2018 Annual Research Report
- Int'l Joint Research
[Presentation] Dueling bandits with qualitative feedback (2019)
- Author(s)
  Xu, L., Honda, J., & Sugiyama, M.
- Organizer
  AAAI Conference on Artificial Intelligence (AAAI2019)
- Related Report
  2018 Annual Research Report
- Int'l Joint Research
[Presentation] Fully adaptive algorithm for pure exploration in linear bandits (2018)
- Author(s)
  Xu, L., Honda, J., & Sugiyama, M.
- Organizer
  International Conference on Artificial Intelligence and Statistics (AISTATS2018)
- Related Report
  2018 Annual Research Report
- Int'l Joint Research
[Presentation] Bayesian nonparametric Poisson-process allocation for time-sequence modeling (2018)
- Author(s)
  Ding, H., Khan, M. E., Sato, I., & Sugiyama, M.
- Organizer
  International Conference on Artificial Intelligence and Statistics (AISTATS2018)
- Related Report
  2018 Annual Research Report
- Int'l Joint Research
[Presentation] Guide actor-critic for continuous control (2018)
- Author(s)
  Tangkaratt, V., Abdolmaleki, A., & Sugiyama, M.
- Organizer
  International Conference on Learning Representations (ICLR2018)
- Related Report
  2018 Annual Research Report
- Int'l Joint Research
[Presentation] Analysis of minimax error rate for crowdsourcing and its application to worker clustering model (2018)
- Author(s)
  Imamura, H., Sato, I., & Sugiyama, M.
- Organizer
  International Conference on Machine Learning (ICML2018)
- Related Report
  2018 Annual Research Report
- Int'l Joint Research
[Presentation] Variational inference for Gaussian process with panel count data (2018)
- Author(s)
  Ding, H., Lee, Y., Sato, I., & Sugiyama, M.
- Organizer
  Conference on Uncertainty in Artificial Intelligence (UAI2018)
- Related Report
  2018 Annual Research Report
- Int'l Joint Research
[Presentation] Continuous-time value function approximation in reproducing kernel Hilbert spaces (2018)
- Author(s)
  Ohnishi, M., Yukawa, M., Johansson, M., & Sugiyama, M.
- Organizer
  Neural Information Processing Systems (NeurIPS2018)
- Related Report
  2018 Annual Research Report
- Int'l Joint Research
[Presentation] Lipschitz-margin training: Scalable certification of perturbation invariance for deep neural networks (2018)
- Author(s)
  Tsuzuku, Y., Sato, I., & Sugiyama, M.
- Organizer
  Neural Information Processing Systems (NeurIPS2018)
- Related Report
  2018 Annual Research Report
- Int'l Joint Research
[Presentation] Fully adaptive algorithm for pure exploration in linear bandits (2018)
- Author(s)
  Xu, L., Honda, J., & Sugiyama, M.
- Organizer
  International Conference on Artificial Intelligence and Statistics (AISTATS2018)
- Related Report
  2017 Annual Research Report
- Int'l Joint Research
[Presentation] Variational inference based on robust divergences (2018)
- Author(s)
  Futami, F., Sato, I., & Sugiyama, M.
- Organizer
  International Conference on Artificial Intelligence and Statistics (AISTATS2018)
- Related Report
  2017 Annual Research Report
- Int'l Joint Research
[Presentation] Bayesian nonparametric Poisson-process allocation for time-sequence modeling (2018)
- Author(s)
  Ding, H., Khan, M. E., Sato, I., & Sugiyama, M.
- Organizer
  International Conference on Artificial Intelligence and Statistics (AISTATS2018)
- Related Report
  2017 Annual Research Report
- Int'l Joint Research
[Presentation] Hierarchical policy search via return-weighted density estimation (2018)
- Author(s)
  Osa, T. & Sugiyama, M.
- Organizer
  AAAI Conference on Artificial Intelligence (AAAI2018)
- Related Report
  2017 Annual Research Report
- Int'l Joint Research
[Presentation] Guide actor-critic for continuous control (2018)
- Author(s)
  Tangkaratt, V., Abdolmaleki, A., & Sugiyama, M.
- Organizer
  International Conference on Learning Representations (ICLR2018)
- Related Report
  2017 Annual Research Report
- Int'l Joint Research
[Presentation] Good arm identification from bandit feedback (2017)
- Author(s)
  Kano, H., Honda, J., Sakamaki, K., Matsuura, K., Nakamura, A., & Sugiyama, M.
- Organizer
  2017 Workshop on Information-Based Induction Sciences (IBIS2017)
- Related Report
  2017 Annual Research Report
[Presentation] Expectation propagation for t-exponential family using q-algebra (2017)
- Author(s)
  Futami, F., Sato, I., & Sugiyama, M.
- Organizer
  Neural Information Processing Systems (NIPS2017)
- Related Report
  2017 Annual Research Report
- Int'l Joint Research
[Book] Machine Learning from Weak Supervision: An Empirical Risk Minimization Approach (2022)
- Author(s)
  Masashi Sugiyama, Han Bao, Takashi Ishida, Nan Lu, Tomoya Sakai, and Gang Niu
- Publisher
  The MIT Press
- Related Report
  2021 Annual Research Report
[Book] An Algorithmic Perspective on Imitation Learning (2018)
- Author(s)
  Takayuki Osa, Joni Pajarinen, Gerhard Neumann, J. Andrew Bagnell, Pieter Abbeel and Jan Peters
- Total Pages
  179
- Publisher
  Foundations and Trends in Robotics
- Related Report
  2017 Annual Research Report
[Remarks] Publication list
- URL
  http://www.ms.k.u-tokyo.ac.jp/sugi/publications.html
- Related Report
  2021 Annual Research Report 2020 Annual Research Report 2019 Annual Research Report
[Remarks] Masashi Sugiyama's web page
- URL
  http://www.ms.k.u-tokyo.ac.jp/sugi/index-jp.html
- Related Report
  2018 Annual Research Report
[Remarks] Publications
- URL
  http://www.ms.k.u-tokyo.ac.jp/sugi/publications.html
- Related Report
  2017 Annual Research Report
[Funded Workshop] Tokyo Deep Learning Workshop (TDLW2018) (2018)
- Related Report
  2017 Annual Research Report