2018 Fiscal Year Annual Research Report

Integration of Kullback-Leibler control and intrinsic rewards for reinforcement learning

Research Project

Project/Area Number	16K12504
Research Institution	Advanced Telecommunications Research Institute International
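Principal Investigator	Eiji Uchibe, Advanced Telecommunications Research Institute International, Brain Information Communication Research Laboratory Group, Senior Researcher (20426571)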
Project Period (FY)	2016-04-01 – 2019-03-31
Keywords	Reinforcement learning / Evolutionary computation / Smartphone robot / Importance sampling
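Outline of Annual Research Achievements	This research aims to develop reinforcement learning algorithms for environment exploration. The following results were obtained this fiscal year.
(1) Experience reuse through adaptive multiple importance sampling: Most policy search algorithms are on-policy, so reusing past experience requires a correction by importance sampling, but naive use of importance sampling destabilizes learning. We therefore developed an adaptive importance sampling method that adjusts the mixture weights of the past data-collection distributions so as to minimize the variance of the importance sampling estimate. We applied it to five policy search methods and showed that it improves data efficiency.
(2) Separating rewards by sign for environment exploration: We developed Deep MaxPain, a deep reinforcement learning version of MaxPain, a reinforcement learning method that separates reward values according to their sign. Deep MaxPain shares the lowest convolutional layers, but the fully connected layers that learn the value functions are independent. Experiences for learning from positive rewards and experiences for learning from negative rewards are therefore stored separately and mixed with equal weights during training. This stabilized learning, and we succeeded in integrating MaxPain with function approximation by neural networks.
(3) Development of an autonomous, distributed, cooperative robot system using smartphone robots: To investigate the influence of the meta-parameters of reinforcement learning, it is effective to run learning systems with different meta-parameter values in parallel and compare their learning processes. In addition, developing algorithms for multiple learning systems is important for improving the sample efficiency of learning on real robots, and we improved the robot experiment system for verification. This fiscal year, we used policy search on real robots to realize the behavior of charging from an external battery and the behavior of exchanging information between robots through visual information, which is needed for mating behavior.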
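
The idea behind item (1) can be illustrated with a minimal sketch. It is not the method developed in the project: it pools samples drawn from several past sampling distributions and weights each sample by the target density divided by a mixture of all sampling densities, with mixture weights fixed in proportion to the sample counts rather than adapted to minimize variance. The 1-D Gaussian distributions, their parameters, and all function names are assumptions made for illustration only.

```python
import numpy as np


def gauss_pdf(x, mean, std):
    """Density of a 1-D Gaussian N(mean, std^2) evaluated at x."""
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))


def mixture_weights(x, target, proposals, alphas):
    """Importance weights p(x) / sum_k alpha_k q_k(x) for pooled samples x."""
    q_mix = sum(a * gauss_pdf(x, *q) for a, q in zip(alphas, proposals))
    return gauss_pdf(x, *target) / q_mix


rng = np.random.default_rng(0)

# Hypothetical past sampling distributions (mean, std) and the current target.
proposals = [(-1.0, 1.0), (0.5, 1.0), (2.0, 1.0)]
counts = [2000, 2000, 2000]
target = (1.0, 1.0)

batches = [rng.normal(m, s, n) for (m, s), n in zip(proposals, counts)]
samples = np.concatenate(batches)

# Assumption: fixed mixture weights proportional to sample counts (not adapted).
alphas = np.array(counts) / sum(counts)
weights = mixture_weights(samples, target, proposals, alphas)
estimate = np.mean(weights * samples)  # estimates E_target[x]; the true value is 1.0

# Naive alternative: weight each sample only by the distribution that generated it.
naive_weights = np.concatenate(
    [gauss_pdf(x, *target) / gauss_pdf(x, *q) for x, q in zip(batches, proposals)]
)

print("mixture estimate:", estimate)
print("weight variance (mixture, naive):", weights.var(), naive_weights.var())
```

In this toy setting the mixture weights stay bounded because some past distribution covers every region where the target has mass, whereas the naive per-distribution weights can become very large. An adaptive scheme of the kind described in item (1) would go one step further and tune `alphas` instead of fixing them.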

Research Products
(8 results)

Journal Article: 1 result (Int'l Joint Research: 1, Peer Reviewed: 1, Open Access: 1) / Presentation: 7 results (Int'l Joint Research: 4, Invited: 2)

[Journal Article] Cooperative and Competitive Reinforcement and Imitation Learning for a Mixture of Heterogeneous Learning Modules (2018)
- Author(s)
  Eiji Uchibe
- Journal Title
  Frontiers in Neurorobotics
  Volume: 12
- DOI
  10.3389/fnbot.2018.00061
- Peer Reviewed / Open Access / Int'l Joint Research
[Presentation] Imitation learning under entropy regularization (2019)
- Author(s)
  Eiji Uchibe
- Organizer
  Workshop on Reinforcement Learning & Biological Intelligence
- Int'l Joint Research / Invited
[Presentation] Cooperative and competitive reinforcement and imitation learning (2018)
- Author(s)
  Eiji Uchibe
- Organizer
  The 8th Joint IEEE International Conference on Development and Learning and Epigenetic Robotics
- Int'l Joint Research
[Presentation] Deep reinforcement learning by parallelizing reward and punishment using MaxPain architecture (2018)
- Author(s)
  Jiexin Wang, Stefan Elfwing, and Eiji Uchibe
- Organizer
  The 8th Joint IEEE International Conference on Development and Learning and Epigenetic Robotics
- Int'l Joint Research
[Presentation] Efficient sample reuse in policy search by multiple importance sampling (2018)
- Author(s)
  Eiji Uchibe
- Organizer
  Genetic and Evolutionary Computation Conference
- Int'l Joint Research
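[Presentation] Experience reuse using multiple importance sampling for policy search (original title: 方策探査法のための多重重点サンプリングを用いた経験再利用) (2018)
- Author(s)
  Eiji Uchibe
- Organizer
  Robotics and Mechatronics Conference (ロボティクス・メカトロニクス講演会)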
[Presentation] EM-based policy search for learning foraging and mating behaviors (2018)
- Author(s)
  Jiexin Wang and Eiji Uchibe
- Organizer
  Robotics and Mechatronics Conference (ロボティクス・メカトロニクス講演会)
[Presentation] Forward and inverse reinforcement learning and generative adversarial formulation (2018)
- Author(s)
  Eiji Uchibe
- Organizer
  NC/IBISML/IPSJ-MPS/IPSJ-BIO Joint Workshop (合同研究会)
- Invited