2021 Fiscal Year Final Research Report

Theory and Application of Statistical Reinforcement Learning

Research Project

Project/Area Number	17H00757
Research Category	Grant-in-Aid for Scientific Research (A)
Allocation Type	Single-year Grants
Section	General
Research Field	Intelligent informatics
Research Institution	The University of Tokyo
Principal Investigator	Sugiyama Masashi, The University of Tokyo, Graduate School of Frontier Sciences, Professor (90334515)
Project Period (FY)	2017-04-01 – 2022-03-31
Keywords	Reinforcement learning / Machine learning / Multi-armed bandit problem / Imitation learning / Bayesian inference / Robustness
Outline of Final Research Achievements	In this research, we developed theories and algorithms for sequential decision making and probabilistic inference. In reinforcement learning, we developed methods for weakly supervised imitation learning and for the hierarchical decomposition of complex problems to improve practicality, and demonstrated their effectiveness experimentally. For multi-armed bandit problems, we developed algorithms with theoretical guarantees for linear bandits, dueling bandits, good-arm identification, and combinatorial bandits. In probabilistic inference, we studied robust Bayesian inference, faster approximate computation, and the modeling of temporal events, and verified the effectiveness of these methods both theoretically and experimentally.
Free Research Field	Intelligent informatics
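Academic Significance and Societal Importance of the Research Achievements	Sequential decision making and probabilistic inference are important machine learning technologies with great promise for future development. In this research, we developed new algorithms that broaden the range of problems to which reinforcement learning and multi-armed bandits can be applied, and we studied how to make probabilistic inference more robust and how to speed up approximate computation. These fundamental theoretical results contribute to clarifying the principles of sequential decision making and probabilistic inference, and they have been highly rated at major international conferences in machine learning. In addition, the effectiveness of the developed algorithms has been demonstrated in computer experiments, so the work is also expected to have societal significance by leading to future real-world deployment.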