Analysis of reward appraisal evolution processes of reinforcement learning agents in a multiagent environment
Project/Area Number |
16K00302
|
Research Category |
Grant-in-Aid for Scientific Research (C)
|
Allocation Type | Multi-year Fund |
Section | General |
Research Field |
Intelligent informatics
|
Research Institution | Nagoya Institute of Technology |
Principal Investigator |
Moriyama Koichi, Nagoya Institute of Technology, Graduate School of Engineering, Associate Professor (10361776)
|
Project Period (FY) |
2016-04-01 – 2019-03-31
|
Project Status |
Completed (Fiscal Year 2018)
|
Budget Amount |
¥4,550,000 (Direct Cost: ¥3,500,000, Indirect Cost: ¥1,050,000)
Fiscal Year 2018: ¥1,430,000 (Direct Cost: ¥1,100,000, Indirect Cost: ¥330,000)
Fiscal Year 2017: ¥1,430,000 (Direct Cost: ¥1,100,000, Indirect Cost: ¥330,000)
Fiscal Year 2016: ¥1,690,000 (Direct Cost: ¥1,300,000, Indirect Cost: ¥390,000)
|
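Keywords | Intelligent agents / Reinforcement learning / Reward design / Evolution / Multiagent systems / Game theory / Reward shaping / Evolutionary computation / Artificial intelligence / Machine learning |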
Outline of Final Research Achievements |
This research targets the emergence of social behaviors, such as cooperation, among reinforcement learning agents in an environment where multiple agents coexist. Such behaviors may emerge if each agent pursues its own purpose, which happens when it learns its behavior not only from a common, comparable objective evaluation but also from its own subjective appraisal. Based on this idea, the project used computer simulations and mathematical analyses to investigate how each agent's appraisal system evolves starting from the objective evaluation, and what kind of society results. In a dilemma situation, where individually rational defection leaves agents with a lower payoff than mutual cooperation, we found that the appraisal system evolved in the direction of facilitating cooperation. We also analyzed the direction of this evolution.
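The outline does not state the learning algorithm, the game's payoffs, or how the appraisal system is parameterized, so the following is only a minimal sketch under loudly stated assumptions: the dilemma is taken to be a two-player Prisoner's Dilemma with textbook payoffs, agents are stateless epsilon-greedy Q-learners, and the appraisal is a hypothetical one-parameter form in which subjective reward = own payoff + w × partner's payoff. Agents learn from the subjective reward, while selection acts on the objective payoff, and w evolves by truncation selection plus Gaussian mutation. All names (`subjective_reward`, `dominant_action`, `play_pair`, `evolve`, `w`, `W_MAX`) are illustrative and do not come from the project.

```python
import random

C, D = 0, 1  # action indices: cooperate, defect
W_MAX = 2.0  # hypothetical upper bound on the appraisal weight w

# Assumed objective payoffs (own, partner) for a standard Prisoner's Dilemma:
# T=5 > R=3 > P=1 > S=0 and 2R > T + S.
PAYOFF = {
    (C, C): (3, 3),
    (C, D): (0, 5),
    (D, C): (5, 0),
    (D, D): (1, 1),
}


def subjective_reward(me, other, w):
    """Hypothetical appraisal: own payoff plus w times the partner's payoff."""
    own, partner = PAYOFF[(me, other)]
    return own + w * partner


def dominant_action(w):
    """Return the strictly dominant action of the subjective game, or None."""
    for a in (C, D):
        b = 1 - a
        if all(subjective_reward(a, o, w) > subjective_reward(b, o, w) for o in (C, D)):
            return a
    return None


def play_pair(w1, w2, rng, rounds=500, alpha=0.1, epsilon=0.1):
    """Two stateless epsilon-greedy Q-learners play the game repeatedly.

    Each agent updates its Q-values from its *subjective* reward, but the
    returned fitness is its mean *objective* payoff per round.
    """
    ws = (w1, w2)
    q = [[0.0, 0.0], [0.0, 0.0]]  # q[agent][action]
    totals = [0.0, 0.0]
    for _ in range(rounds):
        acts = []
        for i in range(2):
            if rng.random() < epsilon:
                acts.append(rng.choice((C, D)))  # explore
            else:
                acts.append(C if q[i][C] >= q[i][D] else D)  # greedy; ties go to C
        for i in range(2):
            me, other = acts[i], acts[1 - i]
            totals[i] += PAYOFF[(me, other)][0]  # objective payoff for fitness
            r = subjective_reward(me, other, ws[i])  # subjective reward for learning
            q[i][me] += alpha * (r - q[i][me])
    return totals[0] / rounds, totals[1] / rounds


def evolve(pop, rng, generations=30, sigma=0.05):
    """Evolve appraisal weights: random pairing, truncation selection, mutation."""
    if len(pop) % 2:
        raise ValueError("population size must be even for pairwise play")
    pop = list(pop)
    for _ in range(generations):
        rng.shuffle(pop)  # random pairing each generation
        scored = []
        for k in range(0, len(pop), 2):
            f1, f2 = play_pair(pop[k], pop[k + 1], rng)
            scored += [(f1, pop[k]), (f2, pop[k + 1])]
        scored.sort(key=lambda t: t[0], reverse=True)
        parents = [w for _, w in scored[: len(pop) // 2]]  # keep the fitter half
        pop = [
            min(W_MAX, max(0.0, rng.choice(parents) + rng.gauss(0.0, sigma)))
            for _ in range(len(pop))
        ]
    return pop


if __name__ == "__main__":
    final = evolve([0.0] * 20, random.Random(0))
    print("mean appraisal weight:", sum(final) / len(final))
```

With these assumed payoffs, cooperation is strictly dominant in the subjective game once w > 2/3 (because 3 + 3w > 5 and 5w > 1 + w), and defection is dominant for w < 1/4. The sketch only shows the two nested loops, learning within a lifetime and evolution of the appraisal across generations; whether the mean w actually rises depends on pairing, payoffs, and learning parameters, so it should not be read as reproducing the project's results.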
|
Academic Significance and Societal Importance of the Research Achievements |
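Implementing reinforcement learning requires designing states, actions, and rewards. However, designing rewards is very difficult in an open environment where multiple agents exist. We humans, on the other hand, manage to learn appropriate behavior in open societies of many people from subjective evaluations based on our values (feeling happy, feeling embarrassed, and so on). This research attempts to automate reward design for open environments by considering how agents' "values" emerge and evolve. At the same time, by examining how these "values" form, it also explores why irrational aspects of humans, such as values, exist.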
|
Report
(4 results)
Research Products
(7 results)