2022 Fiscal Year Final Research Report

Deep Reinforcement Learning by Simultaneous Learning of Environment Models and Strategies

Project/Area Number	20H04301
Research Category	Grant-in-Aid for Scientific Research (B)
Allocation Type	Single-year Grants
Section	General
Review Section	Basic Section 62040: Entertainment and game informatics-related
Research Institution	The University of Tokyo
Principal Investigator	Tsuruoka Yoshimasa, The University of Tokyo, Graduate School of Information Science and Technology, Professor (50566362)
Project Period (FY)	2020-04-01 – 2023-03-31
Keywords	reinforcement learning / deep learning
Outline of Final Research Achievements	We developed a planning method that uses multiple environment models to reduce the impact of model errors, and a multi-step model that directly predicts states several steps ahead; together these made deep reinforcement learning more efficient. We also designed an intrinsic reward and a latent state representation based on action similarity for unsupervised reinforcement learning in partially observable environments, improving the generalization performance of reinforcement learning. Furthermore, we improved reward design for roguelike games, reduced memory consumption in off-policy reinforcement learning, and used hierarchical reinforcement learning to construct highly interpretable policies.
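Free Research Field	Reinforcement learning, natural language processing, game AI
Academic Significance and Societal Importance of the Research Achievements	These results introduce better ways of using environment models in model-based reinforcement learning, the design of intrinsic rewards, and improved latent state representations into deep reinforcement learning, and thereby contribute to improving its performance and to realizing more efficient and more general learning. On the societal side, the results may help broaden the applicability of deep reinforcement learning beyond video games to a wide range of real-world tasks such as autonomous driving, robot control, and energy management.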