Project/Area Number | 07680376 |
Research Category | Grant-in-Aid for Scientific Research (C) |
Allocation Type | Single-year Grants |
Section | General |
Research Field | Intelligent informatics |
Research Institution | HOKKAIDO UNIVERSITY |
Principal Investigator | MIKAMI Sadayoshi, Hokkaido Univ., Fac. Eng., Assoc. Prof. (50229655) |
Co-Investigator (Kenkyū-buntansha) | SUZUKI Keiji, Hokkaido Univ., Fac. Eng., Assoc. Prof. (10250482)
KAKAZU Yukinori, Hokkaido Univ., Fac. Eng., Prof. (60042090) |
Project Period (FY) | 1995 – 1996 |
Project Status | Completed (Fiscal Year 1996) |
Budget Amount |
¥1,300,000 (Direct Cost: ¥1,300,000)
Fiscal Year 1996: ¥400,000 (Direct Cost: ¥400,000)
Fiscal Year 1995: ¥900,000 (Direct Cost: ¥900,000)
|
Keywords | Reinforcement Learning / Multi-Agents / Co-operation / Competition / Task-Decomposition / Artificial Life / Genetic Algorithm / Cooperation / Machine Learning / Multiple Strategies / Heterogeneous Agents |
Research Abstract |
The objective of this research is to propose a reactive planning agent that can adapt to a complex environment, such as one generated by a combination of multiple hidden dynamical factors. To this end, we have developed a multi-strategic approach using independent learning agents that acquire complementary reactive strategies and then co-operate with each other to achieve a global objective. The results obtained during the term of the project are summarised as follows:
1. We have considered an architecture in which a single agent contains multiple sub-agents, each receiving a portion of the input state vector and having its own performance criterion and learning mechanism. For automatic task decomposition and arbitration among the sub-agents, we have introduced a usage function for each sub-agent, specified by its success rate and the results of recent observations. It is shown that the usage function can be used to generate the discrimination pressure. Then, by using the output of the usage function as a teaching signal and the history of recent sub-agent allocations as a state, a learning arbitration module can be introduced.
2. We have shown another approach based on the notion of aggregate problem solving, which involves the automatic decomposition of a task among agents. To provide the pressure for decomposing knowledge, we have introduced interaction between agents through the mutual exchange of payoff values, because the payoff is the only information guaranteed to be used by any reinforcement learning algorithm. A payoff filtering function is proposed to this end. It is shown that the form of the filtering function should be set to be either competitive or co-operative according to the objective function of the agents as a whole. This report shows a practical implementation of the method on a deadlock avoidance problem for mobile robots. The results demonstrate the effectiveness and promising properties of the algorithm.
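The abstract does not give the usage function in closed form. The following is a minimal sketch of one way it might combine a sub-agent's overall success rate with its recent outcomes; the equal-weight combination, the window size, and all names are illustrative assumptions, not taken from the project:

```python
class SubAgentUsage:
    """Tracks a usage score for one sub-agent, combining its overall
    success rate with the outcomes of its most recent activations."""

    def __init__(self, window=5):
        self.successes = 0
        self.trials = 0
        self.recent = []          # 1 = success, 0 = failure
        self.window = window      # how many recent outcomes to keep (assumption)

    def record(self, success):
        """Record the outcome of one activation of this sub-agent."""
        self.trials += 1
        self.successes += int(success)
        self.recent.append(int(success))
        if len(self.recent) > self.window:
            self.recent.pop(0)

    def usage(self):
        """Score in [0, 1]: equal-weight mix of overall and recent success."""
        if self.trials == 0:
            return 0.5            # no evidence yet: neutral score
        rate = self.successes / self.trials
        recency = sum(self.recent) / len(self.recent)
        return 0.5 * (rate + recency)


def arbitrate(usages):
    """Select the index of the sub-agent with the highest usage score."""
    return max(range(len(usages)), key=lambda i: usages[i].usage())
```

In the abstract's scheme this score would also serve as the teaching signal for a learned arbitration module, with the recent allocation history as its state.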
|
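The payoff exchange of item 2 can be illustrated with a toy filtering function. The linear scaling and the sign convention below (add a neighbour's payoff in the co-operative setting, subtract it in the competitive one) are assumptions for illustration only; the project does not specify the filter's actual form:

```python
def payoff_filter(neighbour_payoff, weight=0.5, mode="cooperative"):
    """Filter a payoff received from another agent before folding it into
    this agent's own reinforcement signal.  Co-operative: the neighbour's
    payoff is added (shared success).  Competitive: it is subtracted
    (the neighbour's gain acts as this agent's loss)."""
    if mode == "cooperative":
        return weight * neighbour_payoff
    if mode == "competitive":
        return -weight * neighbour_payoff
    raise ValueError(f"unknown mode: {mode}")


def combined_payoff(own_payoff, neighbour_payoffs, mode="cooperative", weight=0.5):
    """Own payoff plus the filtered payoffs exchanged with the other agents."""
    return own_payoff + sum(payoff_filter(p, weight, mode)
                            for p in neighbour_payoffs)
```

Because any reinforcement learning algorithm consumes payoffs, such a filter can couple heterogeneous learners without touching their internal update rules, which is the property the abstract appeals to.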