2018 Fiscal Year Final Research Report

Apprenticeship learning for heterogeneous robots

Research Project

Project/Area Number	16K16132
Research Category	Grant-in-Aid for Young Scientists (B)
Allocation Type	Multi-year Fund
Research Field	Intelligent robotics
Research Institution	Meijo University (2017-2018) Chuo University (2016)
Principal Investigator	Masuyama Gakuto Meijo University, Faculty of Science and Technology, Associate Professor (20707088)
Project Period (FY)	2016-04-01 – 2019-03-31
Keywords	apprenticeship learning / inverse reinforcement learning
Outline of Final Research Achievements	This project aimed to develop algorithms that transfer a reward function between heterogeneous agents. Related inverse reinforcement learning techniques were also studied. The representative contributions are as follows: 1) An inverse reinforcement learning algorithm that assumes the expert and the agent follow non-identical Markov decision processes or use incompatible features. A conditional density estimation technique is used to represent expert demonstrations that were observed in a different feature space, and it is shown that, for a specific model, the approximation of the demonstrations in the agent's feature space can be written in closed form. 2) Non-linear score-based inverse reinforcement learning, which makes it possible to estimate the reward function from arbitrary trajectories, for example trajectories sampled from a pre-learned policy of the agent.
Free Research Field	Intelligent robotics
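Academic Significance and Societal Importance of the Research Achievements	Enabling a robot to construct its objective function by itself from observed information, without a human designing that function by hand, is significant for improving robot autonomy. With current technology, estimating an objective function requires the robot to observe some form of exemplary data. However, the observed subject and the robot differ in many respects, including their bodies and the demands that society places on them, so a simple imitation framework applies only in limited situations. This research project presented new findings that mitigate this problem.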
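
The closed-form approximation mentioned in the outline of achievements can be illustrated with a small sketch. The record does not name the model the project used, so the code below assumes a joint Gaussian over a one-dimensional expert feature x and a one-dimensional agent feature y, for which the conditional mean E[y | x] has a well-known closed form. The function names, the paired samples, and the demonstration values are all hypothetical, and this is not presented as the project's algorithm.

```python
from statistics import fmean


def fit_joint_gaussian(xs, ys):
    """Estimate the means, Var[x], and Cov[x, y] from paired samples."""
    mx, my = fmean(xs), fmean(ys)
    var_x = fmean([(x - mx) ** 2 for x in xs])
    cov_xy = fmean([(x - mx) * (y - my) for x, y in zip(xs, ys)])
    return mx, my, var_x, cov_xy


def conditional_mean(x, params):
    """Closed-form E[y | x] under the assumed joint Gaussian model."""
    mx, my, var_x, cov_xy = params
    return my + cov_xy / var_x * (x - mx)


# Hypothetical paired observations: expert feature x, agent feature y.
paired_x = [0.0, 1.0, 2.0, 3.0]
paired_y = [1.0, 3.0, 5.0, 7.0]
params = fit_joint_gaussian(paired_x, paired_y)

# Hypothetical expert demonstration, observed only in the expert feature space.
expert_demo = [0.5, 1.5, 2.5]

# Approximate the same demonstration in the agent feature space.
agent_demo = [conditional_mean(x, params) for x in expert_demo]
print(agent_demo)
```

In this toy setting the mapped demonstration could then serve as the target for an ordinary inverse reinforcement learning step in the agent's own feature space. A richer conditional density model would replace `conditional_mean` while keeping the same overall flow.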