Analysis of reward appraisal evolution processes of reinforcement learning agents in a multiagent environment

Research Project

Project/Area Number	16K00302
Research Category	Grant-in-Aid for Scientific Research (C)
Allocation Type	Multi-year Fund
Section	General
Research Field	Intelligent informatics
Research Institution	Nagoya Institute of Technology
Principal Investigator	Moriyama Koichi Nagoya Institute of Technology, Graduate School of Engineering, Associate Professor (10361776)
Project Period (FY)	2016-04-01 – 2019-03-31
Project Status	Completed (Fiscal Year 2018)
Budget Amount	¥4,550,000 (Direct Cost: ¥3,500,000, Indirect Cost: ¥1,050,000) Fiscal Year 2018: ¥1,430,000 (Direct Cost: ¥1,100,000, Indirect Cost: ¥330,000) Fiscal Year 2017: ¥1,430,000 (Direct Cost: ¥1,100,000, Indirect Cost: ¥330,000) Fiscal Year 2016: ¥1,690,000 (Direct Cost: ¥1,300,000, Indirect Cost: ¥390,000)
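Keywords	intelligent agent / reinforcement learning / reward design / evolution / multi-agent system / game theory / reward shaping / evolutionary computation / artificial intelligence / machine learning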
Outline of Final Research Achievements	This research targets the emergence of social behaviors, such as cooperation, among reinforcement learning agents in an environment where multiple agents exist. Such behaviors may emerge if each agent comes to have a different purpose because it learns its behavior not only from a comparable objective evaluation but also from its own appraisal. Based on this idea, the project investigated, through computer simulation and mathematical analysis, how each agent's appraisal system evolves from the objective evaluation and what kind of society appears as a result. In a dilemma situation where agents obtain a lower payoff from individually rational defection than from cooperation, we found that the appraisal system evolved in a direction that facilitates cooperation. We also analyzed the direction of this evolution.
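Academic Significance and Societal Importance of the Research Achievements	Realizing reinforcement learning requires designing states, actions, and rewards. However, designing rewards in an open environment where multiple agents exist is very difficult. We humans, on the other hand, are able to learn appropriate behavior in an open society of many people from subjective evaluations based on our values (feeling happy, feeling embarrassed, and so on). This research is an attempt to automate reward design in open environments by considering how agents' "values" emerge and evolve. At the same time, it is a study that uses the formation process of agents' "values" to consider why non-rational aspects such as human values exist.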
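
The dilemma situation mentioned in the outline can be illustrated with a small sketch. This is not the project's model: the payoff values are the textbook Prisoner's Dilemma numbers, and the `appraise` function with its `weight` parameter is a hypothetical appraisal chosen only to show how a subjective reward can differ from the objective payoff and change which action looks best.

```python
# Objective payoffs (assumed textbook Prisoner's Dilemma values, not from the project):
# PAYOFF[(my_action, opponent_action)] is my objective payoff.
PAYOFF = {
    ("C", "C"): 3,  # mutual cooperation
    ("C", "D"): 0,  # I cooperate, opponent defects
    ("D", "C"): 5,  # I defect, opponent cooperates
    ("D", "D"): 1,  # mutual defection
}


def appraise(own_payoff, other_payoff, weight):
    """Hypothetical appraisal: own payoff plus `weight` times the opponent's payoff.

    weight = 0 reproduces the objective evaluation; this functional form is an
    assumption made for illustration only.
    """
    return own_payoff + weight * other_payoff


def best_response(opponent_action, weight):
    """Return the action with the highest appraised reward against a fixed opponent action."""
    def subjective(action):
        own = PAYOFF[(action, opponent_action)]
        other = PAYOFF[(opponent_action, action)]
        return appraise(own, other, weight)

    return max(("C", "D"), key=subjective)


if __name__ == "__main__":
    for weight in (0.0, 1.0):
        responses = {opp: best_response(opp, weight) for opp in ("C", "D")}
        print(f"weight={weight}: best responses {responses}")
```

With `weight = 0` the agent sees only the objective payoff and defection is the best response to either opponent action, even though mutual defection pays less than mutual cooperation. With `weight = 1` cooperation becomes the best response in both cases under these assumed numbers, which is one simple way an appraisal could facilitate cooperation.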

Report

(4 results)

2018 Annual Research Report, Final Research Report (PDF)
2017 Research-status Report
2016 Research-status Report

Research Products
(7 results)

All Journal Article (3 results) (of which Peer Reviewed: 3 results) Presentation (4 results)

[Journal Article] Evolution Direction of Reward Appraisal in Reinforcement Learning Agents (2018)
- Author(s)
  Masaya Miyawaki, Koichi Moriyama, Atsuko Mutoh, Tohgoroh Matsui, and Nobuhiro Inuzuka
- Journal Title
  
  Proceedings of the 12th KES International Conference on Agent and Multi-agent Systems: Technologies and Applications
  
  Volume: - Pages: 13-22
- DOI
  10.1007/978-3-319-92031-3_2
- ISBN
  9783319920306, 9783319920313
- Related Report
  2018 Annual Research Report
- Peer Reviewed
[Journal Article] Accelerating Deep Q Network by Weighting Experiences (2018)
- Author(s)
  Kazuhiro Murakami, Koichi Moriyama, Atsuko Mutoh, Tohgoroh Matsui, and
- Journal Title
  
  Proceedings of the 25th International Conference on Neural Information
  
  Volume: - Pages: 204-213
- DOI
  10.1007/978-3-030-04167-0_19
- NAID
  130007423935
- ISBN
  9783030041663, 9783030041670
- Related Report
  2018 Annual Research Report
- Peer Reviewed
[Journal Article] The Resilience of Cooperation in a Dilemma Game Played by Reinforcement Learning Agents (2017)
- Author(s)
  Koichi Moriyama, Kaori Nakase, Atsuko Mutoh, and Nobuhiro Inuzuka
- Journal Title
  
  Proceedings of the 2nd IEEE International Conference on Agents
  
  Volume: - Pages: 33-39
- DOI
  10.1109/agents.2017.8015297
- Related Report
  2017 Research-status Report
- Peer Reviewed
[Presentation] Parallel Evolution Simulation of Reinforcement Learning Agents Using GPGPU (2018)
- Author(s)
  千賀喜貴, 森山甲一, 武藤敦子, 松井藤五郎, 犬塚信博
- Organizer
  The 32nd Annual Conference of the Japanese Society for Artificial Intelligence
- Related Report
  2018 Annual Research Report
[Presentation] Accelerating Deep Q Network by Weighting Experience Data (2018)
- Author(s)
  村上知優, 森山甲一, 武藤敦子, 松井藤五郎, 犬塚信博
- Organizer
  The 32nd Annual Conference of the Japanese Society for Artificial Intelligence
- Related Report
  2018 Annual Research Report
[Presentation] Accelerating Reinforcement Learning in Two-Player Games Using GPGPU (2017)
- Author(s)
  黒木是冶，森山甲一，武藤敦子，犬塚信博
- Organizer
  The 79th National Convention of the Information Processing Society of Japan
- Place of Presentation
  Nagoya University (Nagoya City)
- Year and Date
  2017-03-16
- Related Report
  2016 Research-status Report
[Presentation] Analysis of the Evolution Process of Subjective Utility in Multi-Agent Reinforcement Learning (2017)
- Author(s)
  宮脇昌哉, 森山甲一, 武藤敦子, 松井藤五郎, 犬塚信博
- Organizer
  The 31st Annual Conference of the Japanese Society for Artificial Intelligence
- Related Report
  2017 Research-status Report