2014 Fiscal Year Final Research Report
Information theoretic optimization of intrinsic rewards for reinforcement learning
Project/Area Number | 24500249
Research Category | Grant-in-Aid for Scientific Research (C)
Allocation Type | Multi-year Fund
Section | General
Research Field | Perception information processing / Intelligent robotics
Research Institution | Okinawa Institute of Science and Technology Graduate University
Principal Investigator | UCHIBE Eiji (Okinawa Institute of Science and Technology Graduate University, Neural Computation Unit, Group Leader) (20426571)
Project Period (FY) | 2012-04-01 – 2015-03-31
Keywords | reinforcement learning / inverse reinforcement learning
Outline of Final Research Achievements | This study investigates novel inverse reinforcement learning methods based on density ratio estimation. To derive the algorithm, we exploit a KL-divergence constraint on the reward. We show that the logarithm of the ratio between the optimal policy and the baseline policy is represented by the state-dependent reward and the value function. Our method is data-efficient because these functions can be estimated from a set of state transitions, whereas most previous methods require a set of complete trajectories. In addition, we do not need to compute integrals such as the partition function. The proposed method is applied to a real-robot navigation task, and experimental results show its superiority over conventional methods. In particular, we show that the estimated reward and value functions are useful when forward reinforcement learning is performed with reward shaping.
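The outline's key identity, that the log ratio between the optimal and baseline policies is expressed by a state-dependent reward and a value function (roughly log(pi(x'|x)/b(x'|x)) = r(x) + gamma*V(x') - V(x)), suggests one practical route: fit a probabilistic classifier that discriminates expert transitions from baseline transitions, with its logit parameterized by reward and value weights, so that only state-transition pairs are needed and no partition function is evaluated. The sketch below is a minimal illustration of that idea, not the report's actual implementation; the 1-D synthetic data, features, and learning-rate settings are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
gamma = 0.95  # discount factor (assumed for illustration)

def features(x):
    # simple polynomial features phi(x) = [x, x^2, 1]
    return np.stack([x, x**2, np.ones_like(x)], axis=-1)

# Hypothetical transition data: the "expert" contracts toward the
# origin, the "baseline" policy moves randomly.
n = 2000
x_exp = rng.uniform(-1, 1, n)
x_exp_next = 0.8 * x_exp + 0.05 * rng.standard_normal(n)
x_base = rng.uniform(-1, 1, n)
x_base_next = x_base + 0.3 * rng.standard_normal(n)

X = np.concatenate([x_exp, x_base])
Xn = np.concatenate([x_exp_next, x_base_next])
y = np.concatenate([np.ones(n), np.zeros(n)])  # 1 = expert transition

def logits(wr, wv, x, xn):
    # classifier logit parameterized as r(x) + gamma*V(x') - V(x),
    # i.e. the log density ratio of expert to baseline transitions
    f, fn = features(x), features(xn)
    return f @ wr + gamma * (fn @ wv) - f @ wv

# jointly estimate reward weights wr and value weights wv by
# minimizing the logistic loss with plain gradient descent
d = features(X).shape[-1]
wr, wv = np.zeros(d), np.zeros(d)
lr = 0.5
f, fn = features(X), features(Xn)
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-logits(wr, wv, X, Xn)))
    err = p - y  # gradient of the logistic loss w.r.t. the logit
    wr -= lr * (f.T @ err) / len(y)
    wv -= lr * ((gamma * fn - f).T @ err) / len(y)

p = 1.0 / (1.0 + np.exp(-logits(wr, wv, X, Xn)))
loss = float(np.mean(-(y * np.log(p + 1e-9) + (1 - y) * np.log(1 - p + 1e-9))))
acc = float(np.mean((p > 0.5) == (y > 0.5)))
print(round(loss, 3), round(acc, 3))
```

Because the logit is constrained to the form r(x) + gamma*V(x') - V(x), the fitted weights directly yield reward and value estimates that can then seed forward reinforcement learning as a shaping signal, mirroring the use described in the outline.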
Free Research Field | Intelligent robotics