2001 Fiscal Year Final Research Report Summary
Multi-agent Reinforcement Learning Based on Compressed Representation of Decision Policies
Project/Area Number | 12680387 |
Research Category | Grant-in-Aid for Scientific Research (C) |
Allocation Type | Single-year Grants |
Section | General |
Research Field | Intelligent informatics |
Research Institution | The University of Tokushima |
Principal Investigator | ONO Norihiko, The University of Tokushima, Faculty of Engineering, Professor (60194594) |
Co-Investigator (Kenkyū-buntansha) | ITO Takuya, The University of Tokushima, Faculty of Engineering, Research Associate (50314844) |
| ONO Isao, The University of Tokushima, Faculty of Engineering, Associate Professor (00304551) |
Project Period (FY) | 2000 – 2001 |
Keywords | MULTI-AGENT SYSTEMS / MULTI-AGENT REINFORCEMENT LEARNING / REINFORCEMENT LEARNING / MACHINE LEARNING / EVOLUTIONARY COMPUTING / NEURAL NETWORKS / REAL-CODED GA / OPTIMIZATION |
Research Abstract |
Several attempts have been reported to let multiple monolithic reinforcement learning (RL) agents synthesize the highly coordinated behavior needed to accomplish a common goal effectively. Most of these straightforward applications of RL scale poorly to more complex multi-agent (MA) learning problems, because the state space for each RL agent grows exponentially with the number of partner agents engaged in the joint task. To remedy this exponential growth of the state space in multi-agent RL (MARL), we previously proposed a modular approach and demonstrated its effectiveness through applications to MA learning problems. The results obtained with the modular approach to MARL are encouraging, but it still has serious problems. The approach assumes that (i) all the sensory inputs and action outputs of an agent are discrete values, and (ii) all the agents make their decisions fully synchronously at regular time intervals; such assumptions do not hold in real-world multi-agent environments in general. We propose yet another MARL framework that overcomes the state space explosion in MARL, based on a neural-network representation of an agent's decision policy and its optimization with a real-coded GA. The framework is applicable to multi-agent domains where individual agents may receive and output discrete or continuous values and make their decisions asynchronously. To show the effectiveness of the proposed framework for real-world MARL, we have applied it to the asynchronous multi-agent seesaw balancing problem and to the dynamic channel allocation problem in cellular telephone systems. The results are quite encouraging, whereas these problems cannot be solved appropriately by any other conventional MARL framework.
|
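The core idea of the framework above can be illustrated in miniature: encode a small neural-network policy as a flat real-valued genome and optimize it with a real-coded GA instead of value-based RL, so continuous inputs and outputs pose no state-space problem. The sketch below is not taken from the report; the network size, the toy fitness task (matching a simple proportional control law, standing in for an agent's continuous control objective), and the GA parameters (BLX-α crossover, Gaussian mutation, tournament selection, elitism) are illustrative assumptions.

```python
import math
import random

random.seed(0)

H = 6          # hidden units (illustrative)
POP = 40       # population size
GENS = 60      # generations

def init_genome():
    # Flat real-valued genome: all weights of a 1-H-1 tanh network.
    return [random.uniform(-1.0, 1.0) for _ in range(3 * H + 1)]

def policy(g, x):
    # Neural-network decision policy: continuous input -> continuous action.
    out = g[3 * H]                         # output bias
    for i in range(H):
        h = math.tanh(g[i] * x + g[H + i])  # hidden unit i
        out += g[2 * H + i] * h
    return out

def fitness(g):
    # Toy stand-in objective: reproduce the control law u = -2x
    # (negative mean squared error, so larger is better).
    xs = [i / 10.0 - 1.0 for i in range(21)]
    return -sum((policy(g, x) + 2.0 * x) ** 2 for x in xs) / len(xs)

def blx_crossover(p1, p2, alpha=0.5):
    # BLX-alpha: sample each child gene uniformly from an interval
    # extending alpha * |a - b| beyond the two parent genes.
    child = []
    for a, b in zip(p1, p2):
        lo, hi = min(a, b), max(a, b)
        d = hi - lo
        child.append(random.uniform(lo - alpha * d, hi + alpha * d))
    return child

def mutate(g, rate=0.1, sigma=0.2):
    # Gaussian perturbation of a fraction of the genes.
    return [w + random.gauss(0.0, sigma) if random.random() < rate else w
            for w in g]

pop = [init_genome() for _ in range(POP)]
f0 = max(fitness(g) for g in pop)          # best fitness before evolution
for _ in range(GENS):
    pop.sort(key=fitness, reverse=True)
    nxt = pop[:4]                          # elitism: keep the 4 best
    while len(nxt) < POP:
        p1 = max(random.sample(pop, 3), key=fitness)   # tournament select
        p2 = max(random.sample(pop, 3), key=fitness)
        nxt.append(mutate(blx_crossover(p1, p2)))
    pop = nxt
best = max(pop, key=fitness)
```

Because the GA only needs scalar fitness evaluations of whole policies, the same loop applies unchanged when agents act asynchronously or emit continuous actions, which is the property the framework exploits.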
Research Products (12 results)