Parallel deep reinforcement learning

Publicly Offered Research

Project Area	Correspondence and Fusion of Artificial Intelligence and Brain Science
Project/Area Number	17H06042
Research Category	Grant-in-Aid for Scientific Research on Innovative Areas (Research in a proposed research area)
Allocation Type	Single-year Grants
Review Section	Complex systems
Research Institution	Advanced Telecommunications Research Institute International
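Principal Investigator	Eiji Uchibe (内部英治), Advanced Telecommunications Research Institute International, Brain Information Communication Research Laboratory Group, Principal Researcher (20426571)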
Project Period (FY)	2017-04-01 – 2019-03-31
Project Status	Completed (Fiscal Year 2018)
Budget Amount	¥12,220,000 (Direct Cost: ¥9,400,000, Indirect Cost: ¥2,820,000); Fiscal Year 2018: ¥6,110,000 (Direct Cost: ¥4,700,000, Indirect Cost: ¥1,410,000); Fiscal Year 2017: ¥6,110,000 (Direct Cost: ¥4,700,000, Indirect Cost: ¥1,410,000)
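Keywords	Reinforcement learning / Deep learning / Parallel learning / Importance sampling / Imitation learning / Inverse reinforcement learning / Machine learning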
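Outline of Annual Research Achievements	This research aims to develop parallel learning methods that improve the learning efficiency of deep reinforcement learning. The following results were obtained this fiscal year.
(1) Parallel reinforcement learning with multiple importance sampling and self-imitation: Choosing a suitable network architecture and learning algorithm has traditionally required the experimenter to repeat preliminary experiments by trial and error. The proposed method, CRAIL, trains multiple reinforcement learning modules simultaneously and in parallel, both cooperatively and competitively, and automatically selects the best-performing combination for the current situation. The data collected by the modules are shared among them using the multiple importance sampling method developed in the previous fiscal year. In addition, imitation learning of the composite policy was introduced so that a module can still learn even when its current policy differs markedly from the composite policy. As a result, learning efficiency improved substantially compared with CLIS, the parallel reinforcement learning method we developed earlier. We also showed that CRAIL can cope with changes in a robot's physical parameters by dynamically switching between learning modules.
(2) Parallel learning according to the sign of the reward: Reinforcement learning based on the Bellman optimality equation evaluates the value after a state transition with a max operator, so negative rewards are not propagated well and the agent cannot avoid large risks it will face in the future. We had previously developed a parallel learning method called MaxPain to address this, but it had only been applied to simple problems without neural networks. We therefore developed Deep MaxPain, a deep reinforcement learning version of MaxPain, and applied it to large-scale problems such as robot navigation. Two ways of combining the two networks in Deep MaxPain were examined: composing the value functions and composing the policies. Compared with conventional methods such as HRA, Deep MaxPain avoided risks more safely while improving sample efficiency.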
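Research Progress Status	Not entered, because FY2018 (Heisei 30) is the final year of the project.
Strategy for Future Research Activity	Not entered, because FY2018 (Heisei 30) is the final year of the project.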
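CRAIL shares the data collected by all modules through multiple importance sampling. The report does not give the estimator, so the following is only a minimal sketch assuming the standard balance heuristic and one-dimensional Gaussian policies; the function names `gaussian_pdf` and `mis_estimate`, the toy reward, and all parameter values are hypothetical. Samples drawn by every module are pooled and reweighted to estimate the expected reward of a target policy.

```python
import numpy as np

def gaussian_pdf(x, mean, std):
    """Density of N(mean, std^2), evaluated elementwise."""
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))

def mis_estimate(samples, rewards, behaviour_params, target_params):
    """Balance-heuristic multiple importance sampling estimate of E_target[reward].

    Hypothetical helper, not CRAIL's actual code.
    samples[j] / rewards[j]: actions drawn by module j and their rewards.
    behaviour_params[j]: (mean, std) of module j's Gaussian policy.
    target_params: (mean, std) of the policy whose expected reward we want.
    """
    counts = [len(s) for s in samples]
    xs = np.concatenate(samples)
    fs = np.concatenate(rewards)
    # Balance-heuristic denominator sum_j n_j q_j(x), evaluated at every pooled sample.
    mixture = sum(n * gaussian_pdf(xs, m, s) for n, (m, s) in zip(counts, behaviour_params))
    target = gaussian_pdf(xs, *target_params)
    return float(np.sum(fs * target / mixture))

# Toy usage: three modules with different Gaussian policies collect data,
# and all of it is reused to evaluate a fourth (target) policy.
rng = np.random.default_rng(0)
behaviour = [(0.0, 1.0), (1.5, 0.5), (3.0, 1.0)]
samples = [rng.normal(m, s, size=20000) for m, s in behaviour]
rewards = [-(x - 2.0) ** 2 for x in samples]  # hypothetical reward, peak at a = 2
estimate = mis_estimate(samples, rewards, behaviour, target_params=(1.0, 0.8))
exact = -((1.0 - 2.0) ** 2 + 0.8 ** 2)  # E[-(a-2)^2] for a ~ N(1, 0.8^2), about -1.64
print(f"MIS estimate: {estimate:.3f}, exact: {exact:.3f}")
```

Each sample is weighted by the target density divided by the count-weighted mixture of all behaviour densities, so samples from modules far from the target contribute little instead of producing exploding ratios, which is what makes sharing data across heterogeneous modules practical.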
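The reward/punishment split behind MaxPain can be illustrated on a tabular toy problem. This is a hedged sketch, not the MaxPain or Deep MaxPain implementation: the update rule, the weight `w_pain`, the weighted-geometric-mean policy composition, and the three-state "ledge" task are assumptions chosen for illustration. The scalar reward is split into a positive reward and a positive pain magnitude, each learned in its own table with a max over next actions, and the two are then combined either as values or as policies, mirroring the two composition methods mentioned above.

```python
import numpy as np

def maxpain_update(q_r, q_p, s, a, r, s_next, done, alpha=0.1, gamma=0.9):
    """One tabular update of separate reward and pain tables (illustrative sketch).

    The reward is split into a reward part max(r, 0) and a pain part
    max(-r, 0) stored as a positive magnitude. Both tables bootstrap with a max
    over next actions, so q_p tracks the largest pain reachable from s_next.
    """
    reward, pain = max(r, 0.0), max(-r, 0.0)
    boot_r = 0.0 if done else gamma * np.max(q_r[s_next])
    boot_p = 0.0 if done else gamma * np.max(q_p[s_next])
    q_r[s, a] += alpha * (reward + boot_r - q_r[s, a])
    q_p[s, a] += alpha * (pain + boot_p - q_p[s, a])

def combine_values(q_r, q_p, s, w_pain=1.0):
    """Value composition (assumed form): Q(s, .) = Q_r(s, .) - w_pain * Q_p(s, .)."""
    return q_r[s] - w_pain * q_p[s]

def softmax(x, temp=1.0):
    z = (x - np.max(x)) / temp
    e = np.exp(z)
    return e / e.sum()

def combine_policies(q_r, q_p, s, temp=1.0, w_pain=0.5):
    """Policy composition (assumed form): weighted geometric mean of a
    reward-seeking softmax and a pain-avoiding softmax, renormalized."""
    pi_r = softmax(q_r[s], temp)
    pi_p = softmax(-q_p[s], temp)
    mixed = pi_r ** (1.0 - w_pain) * pi_p ** w_pain
    return mixed / mixed.sum()

# Hypothetical "ledge" task: state 0 = start, 1 = ledge, 2 = terminal.
# (state, action, reward, next_state, done)
transitions = [
    (1, 0, 2.0, 2, True),    # on the ledge, step back safely: +2
    (1, 1, -10.0, 2, True),  # on the ledge, fall off: -10
    (0, 0, 1.0, 1, False),   # walk onto the ledge: +1
    (0, 1, 0.5, 2, True),    # take the safe exit: +0.5
]
q_r = np.zeros((3, 2))
q_p = np.zeros((3, 2))
for _ in range(2000):  # sweep the fixed transitions until the tables converge
    for t in transitions:
        maxpain_update(q_r, q_p, *t)

print("q_r[0] =", q_r[0])  # reward table alone prefers the ledge
print("value composition:", combine_values(q_r, q_p, 0))
print("policy composition:", combine_policies(q_r, q_p, 0))
```

At the start state `q_r` converges to [2.8, 0.5], which is also what a single max-based Q-table learns here: the max at the ledge picks the +2 action and the -10 fall never reaches state 0. The pain table propagates that risk back (`q_p[0, 0]` converges to 9), so both compositions favour the safe exit.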

Report

(2 results)

2018 Annual Research Report
2017 Annual Research Report

Research Products
(16 results)

Journal Article (5 results) (of which Int'l Joint Research: 3 results, Peer Reviewed: 4 results, Open Access: 4 results); Presentation (11 results) (of which Int'l Joint Research: 7 results, Invited: 3 results)

[Journal Article] Cooperative and Competitive Reinforcement and Imitation Learning for a Mixture of Heterogeneous Learning Modules (2018)
- Author(s)
  Eiji Uchibe
- Journal Title
  Frontiers in Neurorobotics
  Volume: 12
- DOI
  10.3389/fnbot.2018.00061
- Related Report
  2018 Annual Research Report
- Peer Reviewed / Open Access / Int'l Joint Research
[Journal Article] Model-Free Deep Inverse Reinforcement Learning by Logistic Regression (2018)
- Author(s)
  Eiji Uchibe
- Journal Title
  Neural Processing Letters
  Volume: 47 Issue: 3 Pages: 891-905
- DOI
  10.1007/s11063-017-9702-7
- Related Report
  2018 Annual Research Report
- Peer Reviewed / Open Access / Int'l Joint Research
[Journal Article] Sigmoid-weighted linear units for neural network function approximation in reinforcement learning (2018)
- Author(s)
  Stefan Elfwing, Eiji Uchibe, and Kenji Doya
- Journal Title
  Neural Networks
  Volume: 2017 Special issue Pages: 30297-6
- DOI
  10.1016/j.neunet.2017.12.012
- Related Report
  2018 Annual Research Report
- Peer Reviewed / Open Access / Int'l Joint Research
[Journal Article] Robustness of linearly solvable Markov games employing inaccurate dynamics model (2018)
- Author(s)
  Ken Kinjo, Eiji Uchibe, and Kenji Doya
- Journal Title
  Artificial Life and Robotics
  Volume: 23 Issue: 1 Pages: 1-9
- DOI
  10.1007/s10015-017-0401-2
- Related Report
  2017 Annual Research Report
- Peer Reviewed / Open Access
[Journal Article] Deterministic Policy Search Method for Real Robot Control (2017)
- Author(s)
  Eiji Uchibe and Jiexin Wang
- Journal Title
  The Brain & Neural Networks
  Volume: 24 Issue: 4 Pages: 195-203
- DOI
  10.3902/jnns.24.195
- NAID
  130006337689
- ISSN
  1340-766X, 1883-0455
- Related Report
  2017 Annual Research Report
[Presentation] Theoretical Analysis of Efficiency and Robustness of Softmax and Gap-Increasing Operators in Reinforcement Learning (2019)
- Author(s)
  Tadashi Kozuno, Eiji Uchibe, and Kenji Doya
- Organizer
  The 22nd International Conference on Artificial Intelligence and Statistics
- Related Report
  2018 Annual Research Report
- Int'l Joint Research
[Presentation] Imitation learning under entropy regularization (2019)
- Author(s)
  Eiji Uchibe
- Organizer
  Workshop on Reinforcement Learning & Biological Intelligence
- Related Report
  2018 Annual Research Report
- Int'l Joint Research / Invited
[Presentation] Deep reinforcement learning by parallelizing reward and punishment using MaxPain architecture (2018)
- Author(s)
  Jiexin Wang, Stefan Elfwing, and Eiji Uchibe
- Organizer
  The 8th Joint IEEE International Conference on Development and Learning and Epigenetic Robotics
- Related Report
  2018 Annual Research Report
- Int'l Joint Research
[Presentation] Cooperative and competitive reinforcement and imitation learning (2018)
- Author(s)
  Eiji Uchibe
- Organizer
  The 8th Joint IEEE International Conference on Development and Learning and Epigenetic Robotics
- Related Report
  2018 Annual Research Report
- Int'l Joint Research
[Presentation] Efficient Sample Reuse in Policy Search by Multiple Importance Sampling (2018)
- Author(s)
  Eiji Uchibe
- Organizer
  Genetic and Evolutionary Computation Conference
- Related Report
  2018 Annual Research Report
- Int'l Joint Research
[Presentation] Online Meta-Learning by Parallel Algorithm Competition (2018)
- Author(s)
  Stefan Elfwing, Eiji Uchibe, and Kenji Doya
- Organizer
  Genetic and Evolutionary Computation Conference
- Related Report
  2018 Annual Research Report
- Int'l Joint Research
[Presentation] Experience reuse with multiple importance sampling for policy search methods (2018)
- Author(s)
  Eiji Uchibe
- Organizer
  Robotics and Mechatronics Conference (ROBOMECH)
- Related Report
  2018 Annual Research Report
[Presentation] EM-based policy search for learning foraging and mating behaviors (2018)
- Author(s)
  Jiexin Wang and Eiji Uchibe
- Organizer
  Robotics and Mechatronics Conference (ROBOMECH)
- Related Report
  2018 Annual Research Report
[Presentation] Forward and inverse reinforcement learning and generative adversarial formulation (2018)
- Author(s)
  Eiji Uchibe
- Organizer
  NC/IBISML/IPSJ-MPS/IPSJ-BIO joint research meeting
- Related Report
  2018 Annual Research Report
- Invited
[Presentation] Deep inverse reinforcement learning (2017)
- Author(s)
  Eiji Uchibe
- Organizer
  The Third International Workshop on Intrinsically Motivated Open-ended learning
- Related Report
  2017 Annual Research Report
- Int'l Joint Research / Invited
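[Presentation] Forward and inverse reinforcement learning with deep neural networks (2017)
- Author(s)
  Eiji Uchibe
- Organizer
  The 27th Annual Conference of the Japanese Neural Network Society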
- Related Report
  2017 Annual Research Report
