2014 Fiscal Year Final Research Report
Information theoretic optimization of intrinsic rewards for reinforcement learning
Project/Area Number | 24500249
Research Category | Grant-in-Aid for Scientific Research (C)
Allocation Type | Multi-year Fund
Section | General
Research Field | Perception information processing / Intelligent robotics
Research Institution | Okinawa Institute of Science and Technology Graduate University
Principal Investigator | UCHIBE Eiji (Okinawa Institute of Science and Technology Graduate University, Neural Computation Unit, Group Leader) (20426571)
Project Period (FY) | 2012-04-01 – 2015-03-31
Keywords | reinforcement learning / inverse reinforcement learning
Outline of Final Research Achievements | This study investigates novel inverse reinforcement learning methods based on density ratio estimation. To derive the algorithm, we exploit a KL-divergence constraint on the reward. We show that the logarithm of the ratio between the optimal policy and the baseline policy is represented by the state-dependent reward and the value function. Our method is data-efficient because these functions can be estimated from a set of state transitions, whereas most previous methods require a set of complete trajectories. In addition, we do not need to compute integrals such as the partition function. The proposed method is applied to a real-robot navigation task, and experimental results show its superiority over conventional methods. In particular, we show that the estimated reward and value functions are useful when forward reinforcement learning is performed with reward shaping.
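The outline's key identity, that the log ratio between the optimal and baseline policies is expressed by a state-dependent reward and a value function (roughly log(pi(x'|x)/b(x'|x)) = r(x) + gamma*V(x') - V(x)), suggests one practical route: fit a probabilistic classifier that discriminates expert transitions from baseline transitions, with its logit parameterized by reward and value weights, so that only state-transition pairs are needed and no partition function is evaluated. The sketch below is a minimal illustration of that idea, not the report's actual implementation; the 1-D synthetic data, features, and learning-rate settings are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
gamma = 0.95  # discount factor (assumed for illustration)

def features(x):
    # simple polynomial features phi(x) = [x, x^2, 1]
    return np.stack([x, x**2, np.ones_like(x)], axis=-1)

# Hypothetical transition data: the "expert" contracts toward the
# origin, the "baseline" policy moves randomly.
n = 2000
x_exp = rng.uniform(-1, 1, n)
x_exp_next = 0.8 * x_exp + 0.05 * rng.standard_normal(n)
x_base = rng.uniform(-1, 1, n)
x_base_next = x_base + 0.3 * rng.standard_normal(n)

X = np.concatenate([x_exp, x_base])
Xn = np.concatenate([x_exp_next, x_base_next])
y = np.concatenate([np.ones(n), np.zeros(n)])  # 1 = expert transition

def logits(wr, wv, x, xn):
    # classifier logit parameterized as r(x) + gamma*V(x') - V(x),
    # i.e. the log density ratio of expert to baseline transitions
    f, fn = features(x), features(xn)
    return f @ wr + gamma * (fn @ wv) - f @ wv

# jointly estimate reward weights wr and value weights wv by
# minimizing the logistic loss with plain gradient descent
d = features(X).shape[-1]
wr, wv = np.zeros(d), np.zeros(d)
lr = 0.5
f, fn = features(X), features(Xn)
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-logits(wr, wv, X, Xn)))
    err = p - y  # gradient of the logistic loss w.r.t. the logit
    wr -= lr * (f.T @ err) / len(y)
    wv -= lr * ((gamma * fn - f).T @ err) / len(y)

p = 1.0 / (1.0 + np.exp(-logits(wr, wv, X, Xn)))
loss = float(np.mean(-(y * np.log(p + 1e-9) + (1 - y) * np.log(1 - p + 1e-9))))
acc = float(np.mean((p > 0.5) == (y > 0.5)))
print(round(loss, 3), round(acc, 3))
```

Because the logit is constrained to the form r(x) + gamma*V(x') - V(x), the fitted weights directly yield reward and value estimates that can then seed forward reinforcement learning as a shaping signal, mirroring the use described in the outline.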
Free Research Field | Intelligent robotics