
Theory and Application of Statistical Reinforcement Learning

Research Project

Project/Area Number: 17H00757
Research Category: Grant-in-Aid for Scientific Research (A)
Allocation Type: Single-year Grants
Section: General
Research Field: Intelligent informatics
Research Institution: The University of Tokyo

Principal Investigator: Sugiyama Masashi, Professor, Graduate School of Frontier Sciences, The University of Tokyo (90334515)

Project Period (FY): 2017-04-01 – 2022-03-31
Project Status: Completed (Fiscal Year 2021)
Budget Amount: ¥44,980,000 (Direct Cost: ¥34,600,000, Indirect Cost: ¥10,380,000)
Fiscal Year 2021: ¥7,540,000 (Direct Cost: ¥5,800,000, Indirect Cost: ¥1,740,000)
Fiscal Year 2020: ¥7,540,000 (Direct Cost: ¥5,800,000, Indirect Cost: ¥1,740,000)
Fiscal Year 2019: ¥10,660,000 (Direct Cost: ¥8,200,000, Indirect Cost: ¥2,460,000)
Fiscal Year 2018: ¥7,540,000 (Direct Cost: ¥5,800,000, Indirect Cost: ¥1,740,000)
Fiscal Year 2017: ¥11,700,000 (Direct Cost: ¥9,000,000, Indirect Cost: ¥2,700,000)
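Keywords: Reinforcement learning / Machine learning / Multi-armed bandit problem / Imitation learning / Bayesian inference / Robustness / Robustification / Crowdsourcing / Dimensionality reduction / Multi-task learning / Online learning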
Outline of Final Research Achievements

In this research, we developed theories and algorithms for sequential decision making and probabilistic inference. For reinforcement learning, we developed methods for weakly supervised imitation learning and for the hierarchical decomposition of complex problems to make reinforcement learning more practical, and we demonstrated their effectiveness experimentally. For multi-armed bandit problems, we developed algorithms with theoretical guarantees for linear bandits, dueling bandits, good-arm identification, and combinatorial bandits. For probabilistic inference, we made Bayesian inference more robust, sped up approximate computation, and modeled temporal events, and we verified the effectiveness of these methods both theoretically and experimentally.

Academic Significance and Societal Importance of the Research Achievements

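Sequential decision making and probabilistic inference are important machine learning technologies whose further development is widely anticipated. In this research, we developed new algorithms that broaden the range of problems to which reinforcement learning and multi-armed bandits can be applied, and we studied how to make probabilistic inference more robust and how to speed up approximate computation. These fundamental theoretical results contribute to elucidating the principles of sequential decision making and probabilistic inference, and they were highly regarded at major international conferences in machine learning. In addition, the effectiveness of the developed algorithms was demonstrated through computer experiments, so this work may also be of social significance, as it could lead to future real-world deployment.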

Report

(6 results)
  • 2021 Annual Research Report
  • Final Research Report (PDF)
  • 2020 Annual Research Report
  • 2019 Annual Research Report
  • 2018 Annual Research Report
  • 2017 Annual Research Report

Research Products

(40 results)


Int'l Joint Research (3 results) / Journal Article (15 results; of which Int'l Joint Research: 4, Peer Reviewed: 15, Open Access: 9) / Presentation (16 results; of which Int'l Joint Research: 15) / Book (2 results) / Remarks (3 results) / Funded Workshop (1 result)

  • [Int'l Joint Research] University of Washington / Georgia Institute of Technology (USA)

    • Related Report
      2021 Annual Research Report
  • [Int'l Joint Research] TU Darmstadt (Germany)

    • Related Report
      2017 Annual Research Report
  • [Int'l Joint Research] Data61 (Australia)

    • Related Report
      2017 Annual Research Report
  • [Journal Article] Discovering diverse solutions in deep reinforcement learning by maximizing state-action-based mutual information (2022)

    • Author(s)
      Osa, T., Tangkaratt, V., & Sugiyama, M.
    • Journal Title

      Neural Networks

      Volume: -

    • Related Report
      2021 Annual Research Report
    • Peer Reviewed / Open Access
  • [Journal Article] Constraint learning for control tasks with limited duration barrier functions (2021)

    • Author(s)
      Ohnishi, M., Notomista, G., Sugiyama, M., & Egerstedt, M.
    • Journal Title

      Automatica

      Volume: 127 Pages: 109504-109504

    • DOI

      10.1016/j.automatica.2021.109504

    • Related Report
      2021 Annual Research Report
    • Peer Reviewed / Open Access / Int'l Joint Research
  • [Journal Article] A unified view of likelihood ratio and reparameterization gradients (2021)

    • Author(s)
      Parmas, P. & Sugiyama, M.
    • Journal Title

      Proceedings of the 24th International Conference on Artificial Intelligence and Statistics (AISTATS2021)

      Volume: - Pages: 4078-4086

    • Related Report
      2021 Annual Research Report
    • Peer Reviewed / Open Access / Int'l Joint Research
  • [Journal Article] Robust imitation learning from noisy demonstrations (2021)

    • Author(s)
      Tangkaratt, V., Charoenphakdee, N., & Sugiyama, M.
    • Journal Title

      Proceedings of the 24th International Conference on Artificial Intelligence and Statistics (AISTATS2021)

      Volume: - Pages: 298-306

    • Related Report
      2021 Annual Research Report
    • Peer Reviewed / Open Access
  • [Journal Article] γ-ABC: Outlier-robust approximate Bayesian computation based on a robust divergence estimator (2021)

    • Author(s)
      Fujisawa, M., Teshima, T., Sato, I., & Sugiyama, M.
    • Journal Title

      Proceedings of the 24th International Conference on Artificial Intelligence and Statistics (AISTATS2021)

      Volume: - Pages: 1783-1791

    • Related Report
      2021 Annual Research Report
    • Peer Reviewed / Open Access
  • [Journal Article] Variational imitation learning with diverse-quality demonstrations (2020)

    • Author(s)
      Tangkaratt, V., Han, B., Khan, M. E., & Sugiyama, M.
    • Journal Title

      Proceedings of the 37th International Conference on Machine Learning (ICML2020)

      Volume: - Pages: 9407-9417

    • Related Report
      2020 Annual Research Report
    • Peer Reviewed / Open Access
  • [Journal Article] Polynomial-time algorithms for multiple-arm identification with full-bandit feedback (2020)

    • Author(s)
      Kuroki, Y., Xu, L., Miyauchi, A., Honda, J., & Sugiyama, M.
    • Journal Title

      Neural Computation

      Volume: 32 Pages: 1733-1773

    • Related Report
      2020 Annual Research Report
    • Peer Reviewed
  • [Journal Article] Online dense subgraph discovery via blurred-graph feedback (2020)

    • Author(s)
      Kuroki, Y., Miyauchi, A., Honda, J., & Sugiyama, M.
    • Journal Title

      Proceedings of the 37th International Conference on Machine Learning (ICML2020)

      Volume: - Pages: 5522-5532

    • Related Report
      2020 Annual Research Report
    • Peer Reviewed / Open Access
  • [Journal Article] Accelerating the diffusion-based ensemble sampling by non-reversible dynamics (2020)

    • Author(s)
      Futami, F., Sato, I., & Sugiyama, M.
    • Journal Title

      Proceedings of the 37th International Conference on Machine Learning (ICML2020)

      Volume: - Pages: 3337-3347

    • Related Report
      2020 Annual Research Report
    • Peer Reviewed / Open Access
  • [Journal Article] Analysis and design of Thompson sampling for stochastic partial monitoring (2020)

    • Author(s)
      Tsuchiya, T., Honda, J., & Sugiyama, M.
    • Journal Title

      Advances in Neural Information Processing Systems 33

      Volume: - Pages: 8861-8871

    • Related Report
      2020 Annual Research Report
    • Peer Reviewed / Open Access
  • [Journal Article] Active deep Q-learning with demonstration (2020)

    • Author(s)
      Chen, S.-A., Tangkaratt, V., Lin, H.-T., & Sugiyama, M.
    • Journal Title

      Machine Learning, to appear

      Volume: -

    • Related Report
      2019 Annual Research Report
    • Peer Reviewed / Int'l Joint Research
  • [Journal Article] Good arm identification via bandit feedback (2019)

    • Author(s)
      Kano, H., Honda, J., Sakamaki, K., Matsuura, K., Nakamura, A., & Sugiyama, M.
    • Journal Title

      Machine Learning

      Volume: 108 Pages: 721-745

    • Related Report
      2019 Annual Research Report
    • Peer Reviewed
  • [Journal Article] Hierarchical reinforcement learning via advantage-weighted information maximization (2019)

    • Author(s)
      Osa, T., Tangkaratt, V., & Sugiyama, M.
    • Journal Title

      Proceedings of the Seventh International Conference on Learning Representations (ICLR2019)

      Volume: -

    • Related Report
      2019 Annual Research Report
    • Peer Reviewed
  • [Journal Article] Imitation learning from imperfect demonstration (2019)

    • Author(s)
      Wu, Y.-H., Charoenphakdee, N., Bao, H., Tangkaratt, V., & Sugiyama, M.
    • Journal Title

      Proceedings of the 36th International Conference on Machine Learning (ICML2019)

      Volume: - Pages: 6818-6827

    • Related Report
      2019 Annual Research Report
    • Peer Reviewed / Int'l Joint Research
  • [Journal Article] Good arm identification via bandit feedback (2019)

    • Author(s)
      Kano, H., Honda, J., Sakamaki, K., Matsuura, K., Nakamura, A., & Sugiyama, M.
    • Journal Title

      Machine Learning

      Volume: -

    • Related Report
      2018 Annual Research Report
    • Peer Reviewed
  • [Presentation] Bayesian posterior approximation via greedy particle optimization (2019)

    • Author(s)
      Futami, F., Cui, Z., Sato, I., & Sugiyama, M.
    • Organizer
      AAAI Conference on Artificial Intelligence (AAAI2019)
    • Related Report
      2018 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Dueling bandits with qualitative feedback (2019)

    • Author(s)
      Xu, L., Honda, J., & Sugiyama, M.
    • Organizer
      AAAI Conference on Artificial Intelligence (AAAI2019)
    • Related Report
      2018 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Fully adaptive algorithm for pure exploration in linear bandits (2018)

    • Author(s)
      Xu, L., Honda, J., & Sugiyama, M.
    • Organizer
      International Conference on Artificial Intelligence and Statistics (AISTATS2018)
    • Related Report
      2018 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Bayesian nonparametric Poisson-process allocation for time-sequence modeling (2018)

    • Author(s)
      Ding, H., Khan, M. E., Sato, I., & Sugiyama, M.
    • Organizer
      International Conference on Artificial Intelligence and Statistics (AISTATS2018)
    • Related Report
      2018 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Guide actor-critic for continuous control (2018)

    • Author(s)
      Tangkaratt, V., Abdolmaleki, A., & Sugiyama, M.
    • Organizer
      International Conference on Learning Representations (ICLR2018)
    • Related Report
      2018 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Analysis of minimax error rate for crowdsourcing and its application to worker clustering model (2018)

    • Author(s)
      Imamura, H., Sato, I., & Sugiyama, M.
    • Organizer
      International Conference on Machine Learning (ICML2018)
    • Related Report
      2018 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Variational inference for Gaussian process with panel count data (2018)

    • Author(s)
      Ding, H., Lee, Y., Sato, I., & Sugiyama, M.
    • Organizer
      Conference on Uncertainty in Artificial Intelligence (UAI2018)
    • Related Report
      2018 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Continuous-time value function approximation in reproducing kernel Hilbert spaces (2018)

    • Author(s)
      Ohnishi, M., Yukawa, M., Johansson, M., & Sugiyama, M.
    • Organizer
      Neural Information Processing Systems (NeurIPS2018)
    • Related Report
      2018 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Lipschitz-margin training: Scalable certification of perturbation invariance for deep neural networks (2018)

    • Author(s)
      Tsuzuku, Y., Sato, I., & Sugiyama, M.
    • Organizer
      Neural Information Processing Systems (NeurIPS2018)
    • Related Report
      2018 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Fully adaptive algorithm for pure exploration in linear bandits (2018)

    • Author(s)
      Xu, L., Honda, J., & Sugiyama, M.
    • Organizer
      International Conference on Artificial Intelligence and Statistics (AISTATS2018)
    • Related Report
      2017 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Variational inference based on robust divergences (2018)

    • Author(s)
      Futami, F., Sato, I., & Sugiyama, M.
    • Organizer
      International Conference on Artificial Intelligence and Statistics (AISTATS2018)
    • Related Report
      2017 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Bayesian nonparametric Poisson-process allocation for time-sequence modeling (2018)

    • Author(s)
      Ding, H., Khan, M. E., Sato, I., & Sugiyama, M.
    • Organizer
      International Conference on Artificial Intelligence and Statistics (AISTATS2018)
    • Related Report
      2017 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Hierarchical policy search via return-weighted density estimation (2018)

    • Author(s)
      Osa, T. & Sugiyama, M.
    • Organizer
      AAAI Conference on Artificial Intelligence (AAAI2018)
    • Related Report
      2017 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Guide actor-critic for continuous control (2018)

    • Author(s)
      Tangkaratt, V., Abdolmaleki, A., & Sugiyama, M.
    • Organizer
      International Conference on Learning Representations (ICLR2018)
    • Related Report
      2017 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Good arm identification from bandit feedback (2017)

    • Author(s)
      Kano, H., Honda, J., Sakamaki, K., Matsuura, K., Nakamura, A., & Sugiyama, M.
    • Organizer
      2017 Workshop on Information-Based Induction Sciences (IBIS2017)
    • Related Report
      2017 Annual Research Report
  • [Presentation] Expectation propagation for t-exponential family using q-algebra (2017)

    • Author(s)
      Futami, F., Sato, I., & Sugiyama, M.
    • Organizer
      Neural Information Processing Systems (NIPS2017)
    • Related Report
      2017 Annual Research Report
    • Int'l Joint Research
  • [Book] Machine Learning from Weak Supervision: An Empirical Risk Minimization Approach (2022)

    • Author(s)
      Masashi Sugiyama, Han Bao, Takashi Ishida, Nan Lu, Tomoya Sakai, and Gang Niu
    • Publisher
      The MIT Press
    • Related Report
      2021 Annual Research Report
  • [Book] An Algorithmic Perspective on Imitation Learning (2018)

    • Author(s)
      Takayuki Osa, Joni Pajarinen, Gerhard Neumann, J. Andrew Bagnell, Pieter Abbeel and Jan Peters
    • Total Pages
      179
    • Publisher
      Foundations and Trends in Robotics
    • Related Report
      2017 Annual Research Report
  • [Remarks] Publication list

    • URL

      http://www.ms.k.u-tokyo.ac.jp/sugi/publications.html

    • Related Report
      2021 Annual Research Report, 2020 Annual Research Report, 2019 Annual Research Report
  • [Remarks] Masashi Sugiyama's web page

    • URL

      http://www.ms.k.u-tokyo.ac.jp/sugi/index-jp.html

    • Related Report
      2018 Annual Research Report
  • [Remarks] Publications

    • URL

      http://www.ms.k.u-tokyo.ac.jp/sugi/publications.html

    • Related Report
      2017 Annual Research Report
  • [Funded Workshop] Tokyo Deep Learning Workshop (TDLW2018) (2018)

    • Related Report
      2017 Annual Research Report


Published: 2017-04-28, Modified: 2023-01-30


Powered by NII kakenhi