Coordination of Multiple Behaviors for Competition Robots by Vision-Based Reinforcement Learning
Grant-in-Aid for Scientific Research (B)
Intelligent mechanics/Mechanical systems
|Research Institution||Osaka University|
ASADA Minoru  Osaka University, Faculty of Engineering, Professor (60151031)
SUZUKI Shoji  Osaka University, Faculty of Engineering, Research Associate (50273587)
HOSODA Koh  Osaka University, Faculty of Engineering, Associate Professor (10252610)
|Project Period (FY)||1995 – 1996|
|Project Status||Completed (Fiscal Year 1996)|
|Budget Amount||¥7,500,000 (Direct Cost : ¥7,500,000)|
Fiscal Year 1996 : ¥2,000,000 (Direct Cost : ¥2,000,000)
Fiscal Year 1995 : ¥5,500,000 (Direct Cost : ¥5,500,000)
|Keywords||Reinforcement Learning / Vision / Behavior Coordination / Modular Learning / Hidden States / AIC / Mobile Robots / Multi-Robot / State Space Construction / System Identification / Vision-Based Mobile Robot / Q-Learning / Multiple Tasks / Competitive Behavior / Cooperative Behavior|
Coordination of multiple behaviors independently obtained by reinforcement learning is one of the key issues in scaling the method to larger and more complex robot learning tasks. A direct combination of all the state spaces for the individual modules (subtasks) requires enormous learning time and causes hidden states. In the first year of this project, we proposed a method that accomplishes a whole task consisting of plural subtasks by coordinating multiple behaviors acquired by vision-based reinforcement learning; in the second year, we modified the method by introducing modular learning, which coordinates multiple behaviors taking account of the trade-off between learning time and performance.
The first year:
1. Individual behaviors achieving the corresponding subtasks were independently acquired by Q-learning.
2. Three kinds of coordination of multiple behaviors were considered: simple summation of the different action-value functions, switching between action-value functions according to the situation, and learning with previously obtained action-value functions as the initial values of a new action-value function.
3. The task of shooting a ball into the goal while avoiding collisions with an opponent was examined. This task can be decomposed into a ball-shooting subtask and a collision-avoiding subtask.
4. As a result, the learning method was the best of the three in shooting ratio, mean steps to the goal, and avoidance performance.
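The three coordination schemes examined in the first year can be sketched with toy tabular Q-functions. Everything below (the state/action sizes, the random value tables, and the opponent-proximity predicate) is an illustrative assumption, not the project's actual state or action spaces:

```python
import numpy as np

rng = np.random.default_rng(0)
N_STATES, N_ACTIONS = 6, 3  # toy sizes, purely for illustration

# Pretend these were learned independently by Q-learning on each subtask.
q_shoot = rng.random((N_STATES, N_ACTIONS))  # ball-shooting subtask
q_avoid = rng.random((N_STATES, N_ACTIONS))  # collision-avoiding subtask

# 1) Simple summation of the action-value functions.
def coordinate_by_summation(q1, q2):
    return q1 + q2

# 2) Switching between action-value functions by situation
#    (here: a hypothetical per-state "opponent nearby" flag).
def coordinate_by_switching(q1, q2, opponent_near):
    return np.where(opponent_near[:, None], q2, q1)

# 3) Further Q-learning that starts from previously learned values:
#    copy an old table and keep updating it with the standard rule.
def q_update(q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    q[s, a] += alpha * (r + gamma * q[s_next].max() - q[s, a])
    return q

q_combined = coordinate_by_summation(q_shoot, q_avoid)
policy = q_combined.argmax(axis=1)  # greedy action for each state
```

In the project's result, scheme 3 (learning with the old action values as initial values) outperformed the other two on the shooting task.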
The second year:
1. In order to reduce the learning time, the whole state space was classified into two categories based on the action values separately obtained by Q-learning: the area where one of the learned behaviors was directly applicable (the no-more-learning area), and the area where learning was necessary due to competition between multiple behaviors (the re-learning area).
2. Hidden states were detected by fitting models to the learned action values based on Akaike's Information Criterion (AIC).
3. The initial action values in the re-learning area were adjusted to be consistent with the values in the no-more-learning area.
4. The method was applied to one-to-one soccer-playing robots, and the validity of the proposed method was shown by computer simulation and real robot experiments.
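The AIC-based model fitting in step 2 can be illustrated with a generic least-squares version. The polynomial model family, the synthetic 1-D "regions", and the threshold behavior below are illustrative assumptions; the project fit models to learned action values, not to these toy curves:

```python
import numpy as np

def aic(n, rss, n_params):
    """Akaike Information Criterion for a least-squares fit."""
    return n * np.log(rss / n) + 2 * n_params

def best_model_order(x, y, max_order=4):
    """Fit polynomial models of increasing order; return the AIC-minimal order.

    A higher selected order (or a persistently poor fit) suggests the region
    is hard to model -- in the project's terms, a candidate re-learning area
    where behaviors compete or hidden states are present.
    """
    scores = []
    for order in range(1, max_order + 1):
        coeffs = np.polyfit(x, y, order)
        resid = y - np.polyval(coeffs, x)
        rss = float(resid @ resid) + 1e-12  # guard against log(0)
        scores.append(aic(len(x), rss, order + 1))
    return 1 + int(np.argmin(scores))

x = np.linspace(0.0, 1.0, 50)
easy_region = 2.0 * x - 1.0     # well modelled by a line: no more learning
hard_region = np.sin(8.0 * x)   # poorly modelled by low orders: re-learn
```

AIC trades goodness of fit (the residual term) against model complexity (the parameter-count penalty), so a simple model is preferred whenever it explains the values nearly as well as a complex one.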
Research Output (16 results)