Budget Amount
¥4,940,000 (Direct Cost: ¥3,800,000, Indirect Cost: ¥1,140,000)
Fiscal Year 2014: ¥1,430,000 (Direct Cost: ¥1,100,000, Indirect Cost: ¥330,000)
Fiscal Year 2013: ¥1,170,000 (Direct Cost: ¥900,000, Indirect Cost: ¥270,000)
Fiscal Year 2012: ¥2,340,000 (Direct Cost: ¥1,800,000, Indirect Cost: ¥540,000)
Outline of Final Research Achievements
This study investigates novel inverse reinforcement learning methods based on density ratio estimation. To derive the algorithm, we exploit KL-divergence constraints on the reward. We show that the logarithm of the ratio between the optimal policy and the baseline policy is represented by the state-dependent reward and the value function. Our method is data-efficient because these functions can be estimated from a set of state transitions, whereas most previous methods require a set of trajectories. In addition, we do not need to compute integrals such as the partition function. The proposed method is applied to a real-robot navigation task, and the experimental results show its superiority over conventional methods. In particular, we show that the estimated reward and value functions are useful when forward reinforcement learning is performed with reward shaping.
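To make the key relation concrete, here is a minimal sketch (not the project's implementation) of density-ratio IRL via logistic regression: the classifier's logit is constrained to the form log pi(s'|s)/b(s'|s) = r(s) + gamma*V(s') - V(s), so fitting it on transitions from the baseline and the target policy recovers r and V. The linear parameterization, the feature map phi, the discount gamma, and the synthetic data are illustrative assumptions introduced here.

```python
# Illustrative sketch, assuming linear models r(s) = w_r . phi(s), V(s) = w_v . phi(s).
import numpy as np

rng = np.random.default_rng(0)
gamma = 0.9      # assumed discount factor
n_feat = 5

def phi(s):
    """Hypothetical feature map: radial basis functions on a 1-D state."""
    centers = np.linspace(0.0, 1.0, n_feat)
    return np.exp(-((s[..., None] - centers) ** 2) / 0.05)

# Transitions (s, s') under the baseline policy (label 0) and under the
# target policy (label 1); synthetic placeholders stand in for robot data.
s_b = rng.uniform(0, 1, 500); s_b_next = np.clip(s_b + rng.normal(0, 0.1, 500), 0, 1)
s_o = rng.uniform(0, 1, 500); s_o_next = np.clip(s_o + 0.05 + rng.normal(0, 0.1, 500), 0, 1)

S = np.concatenate([s_b, s_o]); S_next = np.concatenate([s_b_next, s_o_next])
y = np.concatenate([np.zeros(500), np.ones(500)])

# Design matrix so that X @ w = r(s) + gamma * V(s') - V(s),
# with w = [w_r, w_v]. No partition function is ever evaluated.
X = np.hstack([phi(S), gamma * phi(S_next) - phi(S)])
w = np.zeros(2 * n_feat)

for _ in range(2000):  # plain gradient ascent on the logistic log-likelihood
    p = 1.0 / (1.0 + np.exp(-X @ w))
    w += 0.1 * X.T @ (y - p) / len(y)

w_r, w_v = w[:n_feat], w[n_feat:]
reward = lambda s: phi(np.asarray(s)) @ w_r   # estimated state-dependent reward
value = lambda s: phi(np.asarray(s)) @ w_v    # estimated value function
print("r(0.5) =", reward(0.5), " V(0.5) =", value(0.5))
```

Because only individual state transitions are needed to form the design matrix, the fit works from transition samples alone; the recovered V can then serve as a shaping potential when running forward reinforcement learning.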