
Integration of Kullback-Leibler control and intrinsic rewards for reinforcement learning

Research Project

Project/Area Number 16K12504
Research Category

Grant-in-Aid for Challenging Exploratory Research

Allocation Type Multi-year Fund
Research Field Intelligent robotics
Research Institution Advanced Telecommunications Research Institute International

Principal Investigator

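UCHIBE Eiji  Advanced Telecommunications Research Institute International, Brain Information Communication Research Laboratory Group, Principal Researcher (20426571)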

Project Period (FY) 2016-04-01 – 2019-03-31
Project Status Completed (Fiscal Year 2018)
Budget Amount
¥3,380,000 (Direct Cost: ¥2,600,000, Indirect Cost: ¥780,000)
Fiscal Year 2018: ¥780,000 (Direct Cost: ¥600,000, Indirect Cost: ¥180,000)
Fiscal Year 2017: ¥780,000 (Direct Cost: ¥600,000, Indirect Cost: ¥180,000)
Fiscal Year 2016: ¥1,820,000 (Direct Cost: ¥1,400,000, Indirect Cost: ¥420,000)
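Keywords reinforcement learning / EM algorithm / robot learning / smartphone robot / inverse reinforcement learning / evolutionary computation / importance sampling / multi-agent reinforcement learning / intelligent robotics / machine learning / KL control / artificial intelligence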
Outline of Final Research Achievements

We developed two sample-efficient reinforcement learning algorithms: EM-based Policy Hyperparameter Exploration (EPHE) with an adaptive baseline, and Adaptive Multiple Importance Sampling (AMIS) for policy search. EPHE optimizes deterministic policies with the EM algorithm and was implemented on a wheeled inverted pendulum mobile robot. Experimental results showed that EPHE outperformed standard policy search methods. AMIS reduces the variance of the multiple importance sampling estimator when a policy search algorithm reuses samples collected in previous iterations. AMIS was evaluated with several policy search methods, including EPHE, REINFORCE, REPS, CMA-ES, and NES, and the experimental results showed that it improved sample efficiency for all of them. In addition, we developed a smartphone-based experimental platform and used reinforcement learning to implement basic behaviors such as battery foraging and mating based on visual communication.
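
The idea of optimizing a deterministic policy with an EM-style update can be illustrated with a minimal sketch. This is not the project's EPHE implementation: it assumes a diagonal Gaussian over the policy parameters, uses the mean return as a placeholder baseline (the adaptive baseline is not specified here), and replaces the robot rollout with a hypothetical `toy_return` function. The names `em_update` and `toy_return` are illustrative only.

```python
import numpy as np


def em_update(thetas, returns, baseline=None):
    """One reward-weighted (EM-style) update of a Gaussian over policy parameters.

    thetas   : (N, D) array of sampled parameter vectors
    returns  : (N,) array with the return of each deterministic rollout
    baseline : scalar subtracted from the returns; defaults to their mean
               (a placeholder assumption, not the project's adaptive baseline)
    """
    thetas = np.asarray(thetas, dtype=float)
    returns = np.asarray(returns, dtype=float)
    if baseline is None:
        baseline = returns.mean()
    # Samples that do not beat the baseline get zero weight.
    weights = np.maximum(returns - baseline, 0.0)
    if weights.sum() == 0.0:
        weights = np.ones_like(returns)  # all returns equal: fall back to uniform
    weights = weights / weights.sum()
    mean = weights @ thetas
    std = np.sqrt(weights @ (thetas - mean) ** 2) + 1e-8
    return mean, std


def toy_return(theta, target=np.array([1.0, -0.5])):
    """Hypothetical stand-in for a robot rollout: higher is better."""
    return -np.sum((theta - target) ** 2)


rng = np.random.default_rng(0)
mean, std = np.zeros(2), np.full(2, 2.0)
for _ in range(20):
    # Exploration happens in parameter space; each rollout itself is deterministic.
    thetas = mean + std * rng.standard_normal((100, 2))
    returns = np.array([toy_return(t) for t in thetas])
    mean, std = em_update(thetas, returns)
print(mean)
```

Because noise is injected only when the parameters are drawn, the controller stays fixed within a rollout, which is what keeps the resulting action sequence smooth.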

Academic Significance and Societal Importance of the Research Achievements

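The academic significance of this work lies in the development of reinforcement learning algorithms suited to learning robot controllers. Conventional reinforcement learning algorithms often use stochastic policies, but the action sequences they generate are not smooth and are therefore poorly suited to robot control. Because EPHE learns a deterministic policy, it can generate smooth action sequences and can be applied to systems that lack high-precision actuators, such as smartphone robots.
The societal significance lies in achieving high data efficiency. In realistic problem settings the data available for learning are limited, and AMIS, which can be combined with a variety of algorithms, is expected to become an important building block when reinforcement learning algorithms are applied to real-world problems.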
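
Reusing samples from earlier iterations through multiple importance sampling can be sketched as follows. This is a generic self-normalized estimator with the balance heuristic, not the AMIS algorithm developed in the project. The 1-D Gaussian sampling distributions, the toy return, and the names `gaussian_pdf` and `mis_estimate` are assumptions made for illustration.

```python
import numpy as np


def gaussian_pdf(x, mean, std):
    """Density of a 1-D Gaussian evaluated at x."""
    z = (x - mean) / std
    return np.exp(-0.5 * z ** 2) / (std * np.sqrt(2.0 * np.pi))


def mis_estimate(samples, returns, counts, proposals, target):
    """Self-normalized multiple importance sampling estimate of E_target[return].

    samples, returns : pooled 1-D arrays from all past iterations
    counts           : number of samples drawn from each past distribution
    proposals        : list of (mean, std), one per past sampling distribution
    target           : (mean, std) of the current sampling distribution
    """
    counts = np.asarray(counts, dtype=float)
    # Balance heuristic: weight against the count-weighted mixture of all proposals.
    mixture = sum(
        (n / counts.sum()) * gaussian_pdf(samples, m, s)
        for n, (m, s) in zip(counts, proposals)
    )
    weights = gaussian_pdf(samples, *target) / mixture
    return np.sum(weights * returns) / np.sum(weights)


rng = np.random.default_rng(0)
proposals = [(0.0, 1.0), (0.5, 1.0), (1.0, 1.0)]  # hypothetical past iterations
counts = [2000, 2000, 2000]
samples = np.concatenate(
    [rng.normal(m, s, n) for n, (m, s) in zip(counts, proposals)]
)
returns = -(samples - 2.0) ** 2  # toy return; its mean under N(1, 1) is -2
estimate = mis_estimate(samples, returns, counts, proposals, target=(1.0, 1.0))
print(estimate)
```

Dividing by the mixture density rather than by each sample's own proposal keeps every weight bounded whenever the current distribution is itself one of the proposals, which is one reason pooled estimators of this kind tend to have lower variance.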

Report

(4 results)
  • 2018 Annual Research Report   Final Research Report ( PDF )
  • 2017 Research-status Report
  • 2016 Research-status Report
  • Research Products

    (14 results)


All Journal Article (4 results) (of which Int'l Joint Research: 2 results,  Peer Reviewed: 3 results,  Open Access: 3 results) Presentation (8 results) (of which Int'l Joint Research: 5 results,  Invited: 2 results) Remarks (2 results)

  • [Journal Article] Cooperative and Competitive Reinforcement and Imitation Learning for a Mixture of Heterogeneous Learning Modules (2018)

    • Author(s)
      Eiji Uchibe
    • Journal Title

      Frontiers in Neurorobotics

      Volume: 12

    • DOI

      10.3389/fnbot.2018.00061

    • Related Report
      2018 Annual Research Report
    • Peer Reviewed / Open Access / Int'l Joint Research
  • [Journal Article] Robustness of linearly solvable Markov games employing inaccurate dynamics model (2018)

    • Author(s)
      Ken Kinjo, Eiji Uchibe, and Kenji Doya
    • Journal Title

      Artificial Life and Robotics

      Volume: 23 Issue: 1 Pages: 1-9

    • DOI

      10.1007/s10015-017-0401-2

    • Related Report
      2017 Research-status Report
    • Peer Reviewed / Open Access
  • [Journal Article] Deterministic Policy Search Method for Real Robot Control (2017)

    • Author(s)
      Eiji Uchibe, Jiexin Wang
    • Journal Title

      The Brain & Neural Networks

      Volume: 24 Issue: 4 Pages: 195-203

    • DOI

      10.3902/jnns.24.195

    • NAID

      130006337689

    • ISSN
      1340-766X, 1883-0455
    • Related Report
      2017 Research-status Report
  • [Journal Article] Adaptive Baseline Enhances EM-based Policy Search: Validation in a View-based Positioning Task of a Smartphone Balancer (2017)

    • Author(s)
      Jiexin Wang, Eiji Uchibe, Kenji Doya
    • Journal Title

      Frontiers in Neurorobotics

      Volume: 11 Pages: 1-15

    • DOI

      10.3389/fnbot.2017.00001

    • NAID

      120005980916

    • Related Report
      2016 Research-status Report
    • Peer Reviewed / Open Access / Int'l Joint Research
  • [Presentation] Imitation learning under entropy regularization (2019)

    • Author(s)
      Eiji Uchibe
    • Organizer
      Workshop on Reinforcement Learning & Biological Intelligence
    • Related Report
      2018 Annual Research Report
    • Int'l Joint Research / Invited
  • [Presentation] Cooperative and competitive reinforcement and imitation learning (2018)

    • Author(s)
      Eiji Uchibe
    • Organizer
      The 8th Joint IEEE International Conference on Development and Learning and Epigenetic Robotics
    • Related Report
      2018 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Deep reinforcement learning by parallelizing reward and punishment using MaxPain architecture (2018)

    • Author(s)
      Jiexin Wang, Stefan Elfwing, and Eiji Uchibe
    • Organizer
      The 8th Joint IEEE International Conference on Development and Learning and Epigenetic Robotics
    • Related Report
      2018 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Efficient sample reuse in policy search by multiple importance sampling (2018)

    • Author(s)
      Eiji Uchibe
    • Organizer
      Genetic and Evolutionary Computation Conference
    • Related Report
      2018 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Experience reuse using multiple importance sampling for policy search methods (2018)

    • Author(s)
      Eiji Uchibe
    • Organizer
      Conference on Robotics and Mechatronics
    • Related Report
      2018 Annual Research Report
  • [Presentation] EM-based policy search for learning foraging and mating behaviors (2018)

    • Author(s)
      Jiexin Wang and Eiji Uchibe
    • Organizer
      Conference on Robotics and Mechatronics
    • Related Report
      2018 Annual Research Report
  • [Presentation] Forward and inverse reinforcement learning and generative adversarial formulation (2018)

    • Author(s)
      Eiji Uchibe
    • Organizer
      NC/IBISML/IPSJ-MPS/IPSJ-BIO Joint Workshop
    • Related Report
      2018 Annual Research Report
    • Invited
  • [Presentation] Emergence of communication among reinforcement learning agents under coordination environment (2016)

    • Author(s)
      Qiong Huang, Eiji Uchibe, and Kenji Doya
    • Organizer
      The 6th Joint IEEE International Conference on Development and Learning and Epigenetic Robotics
    • Place of Presentation
      Cergy-Pontoise / Paris
    • Year and Date
      2016-09-19
    • Related Report
      2016 Research-status Report
    • Int'l Joint Research
  • [Remarks]

    • URL

      https://arxiv.org/abs/1702.03118

    • Related Report
      2016 Research-status Report
  • [Remarks]

    • URL

      https://arxiv.org/abs/1702.07490

    • Related Report
      2016 Research-status Report

Published: 2016-04-21   Modified: 2020-03-30  
