ITO Takuya THE UNIVERSITY OF TOKUSHIMA, FACULTY OF ENGINEERING, RESEARCH ASSOCIATE (50314844)
ONO Isao THE UNIVERSITY OF TOKUSHIMA, FACULTY OF ENGINEERING, LECTURER (00304551)
Budget Amount
¥1,000,000 (Direct Cost: ¥1,000,000)
Fiscal Year 1999: ¥1,000,000 (Direct Cost: ¥1,000,000)
Several attempts have been reported to let multiple monolithic reinforcement learning (RL) agents synthesize the highly coordinated behavior needed to accomplish their common goal effectively. Most of these straightforward applications of RL scale poorly to more complex multi-agent (MA) learning problems, because the state space for each RL agent grows exponentially with the number of partner agents engaged in the joint task. To remedy the exponentially large state space in multi-agent RL (MARL), we previously proposed a modular approach and demonstrated its effectiveness by applying it to MA learning problems.
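The scaling argument above can be illustrated with a small calculation. This is only a sketch under stated assumptions: the abstract does not specify the decomposition, so the code assumes (hypothetically) that each agent's local state takes the same number of discrete values and that a modular agent uses one module per partner, each observing only the agent itself and that one partner. The function names are illustrative.

```python
def monolithic_state_count(values_per_agent: int, num_partners: int) -> int:
    """State count for one learner that observes itself and every partner jointly."""
    return values_per_agent ** (num_partners + 1)


def modular_state_count(values_per_agent: int, num_partners: int) -> int:
    """Total state count over all modules, ASSUMING one module per partner,
    each observing the agent itself plus that single partner."""
    per_module = values_per_agent ** 2
    return num_partners * per_module


if __name__ == "__main__":
    for partners in (1, 2, 3, 4):
        print(partners,
              monolithic_state_count(10, partners),
              modular_state_count(10, partners))
```

Under these assumptions the monolithic count grows exponentially in the number of partners, while the summed modular count grows only linearly.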
The results obtained by the modular approach to MARL are encouraging, but the approach still has a serious problem. The performance of modular RL agents strongly depends on their modular structures, and hence we have to design appropriate structures for the agents. However, it is extremely difficult to identify such structures in a top-down manner, because we cannot correctly predict the performance of a given MA system, which consists of multiple modular RL agents and is accordingly of substantial complexity with respect to both its structure and its functionality. This means that we have to identify appropriate modular structures for the agents by trial and error. To overcome this problem, we have to establish a framework for automatically synthesizing appropriate modular structures for the agents.
We suppose that a collection of homogeneous modular RL agents is engaged in a joint task aimed at accomplishing their common goal, and that all agents share the same modular structure. We proposed a framework for identifying an appropriate modular structure for the agents, which begins with a randomly generated structure and attempts to improve it incrementally. A modular structure is represented by a set of a variable number of learning modules, and is evaluated based on the performance of the RL agents employing the structure. The modular structure is improved using a kind of hill-climbing scheme. A set of simple operators is devised, each generating a neighborhood of the current structure.
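The search loop described above can be sketched as follows. This is a minimal illustration, not the project's implementation: the encoding of a module as a set of observed input indices, the three operators (`add_module`, `remove_module`, `toggle_input`), and the scoring function `toy_evaluate` are all hypothetical. In the actual framework a structure would be scored by running the RL agents that employ it; here a cheap stand-in score is used so the example runs on its own.

```python
import random
from typing import Callable, FrozenSet, List, Tuple

Module = FrozenSet[int]        # hypothetical encoding: inputs a module observes
Structure = FrozenSet[Module]  # a variable-size set of learning modules
NUM_INPUTS = 4                 # hypothetical number of observable inputs


def random_structure(rng: random.Random) -> Structure:
    """Generate a random starting structure with one to three modules."""
    modules = set()
    for _ in range(rng.randint(1, 3)):
        size = rng.randint(1, NUM_INPUTS)
        modules.add(frozenset(rng.sample(range(NUM_INPUTS), size)))
    return frozenset(modules)


def add_module(s: Structure) -> List[Structure]:
    """Operator: add one single-input module that is not already present."""
    singles = [frozenset([i]) for i in range(NUM_INPUTS)]
    return [s | {m} for m in singles if m not in s]


def remove_module(s: Structure) -> List[Structure]:
    """Operator: drop one module, always keeping at least one."""
    return [s - {m} for m in s] if len(s) > 1 else []


def toggle_input(s: Structure) -> List[Structure]:
    """Operator: add or remove one input in one module, never emptying it."""
    out = []
    for m in s:
        for i in range(NUM_INPUTS):
            changed = m ^ {i}
            if changed:
                out.append((s - {m}) | {changed})
    return out


# Stand-in for "train the agents and measure performance" (an assumption).
TARGET: Structure = frozenset({frozenset({0, 1}), frozenset({2, 3})})


def toy_evaluate(s: Structure) -> float:
    """Reward modules that match TARGET and lightly penalize the others."""
    return len(s & TARGET) - 0.1 * len(s - TARGET)


def hill_climb(start: Structure,
               evaluate: Callable[[Structure], float],
               operators: List[Callable[[Structure], List[Structure]]],
               max_steps: int = 100) -> Tuple[Structure, float]:
    """Move to the best neighbor while it strictly improves the score."""
    current, score = start, evaluate(start)
    for _ in range(max_steps):
        neighbors = [n for op in operators for n in op(current)]
        if not neighbors:
            break
        best = max(neighbors, key=evaluate)
        best_score = evaluate(best)
        if best_score <= score:
            break  # local optimum reached
        current, score = best, best_score
    return current, score


if __name__ == "__main__":
    start = random_structure(random.Random(0))
    best, score = hill_climb(start, toy_evaluate,
                             [add_module, remove_module, toggle_input])
    print(sorted(sorted(m) for m in best), score)
```

Because each evaluation in the real setting involves a full learning run, a practical version would likely cache scores and average over several runs; the sketch omits both for brevity.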
To show the effectiveness of the proposed framework, we applied it to a multi-agent learning problem called the Simulated Dodgeball Game-II, and attempted to identify an appropriate modular structure for the attacker agents, each implemented by an independent but homogeneous modular RL architecture. A modular structure is evaluated based on the performance of the attacker agents employing the structure. The results are quite encouraging. Using this framework, for example, we always identified a modular structure that substantially outperforms those manually designed by a human expert.