
Division of roles between model-based and model-free reinforcement learning in partially observable environments

Publicly Offered Research

Project Area Elucidation of neural computation for prediction and decision making: toward better human understanding and applications
Project/Area Number 26120727
Research Category

Grant-in-Aid for Scientific Research on Innovative Areas (Research in a proposed research area)

Allocation Type Single-year Grants
Review Section Complex systems
Research Institution Advanced Telecommunications Research Institute International (2015)
Okinawa Institute of Science and Technology Graduate University (2014)

Principal Investigator

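Eiji Uchibe (内部 英治), Advanced Telecommunications Research Institute International, Brain Information Communication Research Laboratory Group, Senior Researcher (20426571)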

Project Period (FY) 2014-04-01 – 2016-03-31
Project Status Completed (Fiscal Year 2015)
Budget Amount
¥9,620,000 (Direct Cost: ¥7,400,000, Indirect Cost: ¥2,220,000)
Fiscal Year 2015: ¥4,810,000 (Direct Cost: ¥3,700,000, Indirect Cost: ¥1,110,000)
Fiscal Year 2014: ¥4,810,000 (Direct Cost: ¥3,700,000, Indirect Cost: ¥1,110,000)
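Keywords Reinforcement learning / Inverse reinforcement learning / EM algorithm / Linearly solvable Markov decision process / Density ratio estimation / Partially observable environment / Deep learning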
Outline of Annual Research Achievements

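This project studied reinforcement learning and inverse reinforcement learning based on linearly solvable Markov decision processes (LMDPs). First, real-robot experiments were used to test the composition of control policies by superposition of solutions, which exploits the fact that the Bellman equation becomes linear in an LMDP. The experiments showed that superposition does not hold exactly in the real world, and that using the superposed solution as the initial value for additional learning is effective. For inverse reinforcement learning, the project showed that in an LMDP the logarithm of the ratio between the state transition probabilities before and after learning can be expressed in terms of the reward and the value function, and proposed inverse reinforcement learning methods based on this relation. One method combines density ratio estimation with regularized least squares, and a patent application was filed for it (PCT/JP2015/004001). A second method, based on logistic regression and requiring no least-squares step, was also filed as a patent application. These methods estimated the reward function efficiently, with lower computational cost and fewer samples than the existing methods OptV, MaxEnt-IRL, and RelEnt-IRL. The results were summarized in a review article in the journal of the Japanese Neural Network Society.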
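
The two LMDP properties mentioned in the summary above, superposition of solutions and the log-ratio relation between transition probabilities, can be checked numerically on a toy problem. The following is only a minimal sketch under stated assumptions, not the project's implementation: it assumes an undiscounted first-exit LMDP on a six-state chain with random-walk passive dynamics, and the function name `solve_desirability`, the state costs, and the terminal costs are all hypothetical choices made for illustration.

```python
import numpy as np

def solve_desirability(P, q, terminal, z_terminal):
    """Solve the linear Bellman equation z = exp(-q) * (P @ z) on interior states.

    P          : passive (uncontrolled) transition matrix
    q          : state costs (reward r = -q)
    terminal   : boolean mask of terminal states
    z_terminal : desirability exp(-terminal cost) at the terminal states
    """
    interior = ~terminal
    G = np.diag(np.exp(-q[interior]))
    A = np.eye(int(interior.sum())) - G @ P[np.ix_(interior, interior)]
    b = G @ P[np.ix_(interior, terminal)] @ z_terminal
    z = np.empty(len(q))
    z[terminal] = z_terminal
    z[interior] = np.linalg.solve(A, b)
    return z

# Toy chain (assumption): both ends are terminal, interior states do a random walk.
n = 6
terminal = np.zeros(n, dtype=bool)
terminal[[0, n - 1]] = True
P = np.zeros((n, n))
for i in range(1, n - 1):
    P[i, i - 1] = P[i, i + 1] = 0.5
P[0, 0] = P[n - 1, n - 1] = 1.0
q = np.full(n, 0.2)

# Two component tasks that differ only in their terminal costs.
zt_a = np.array([1.0, np.exp(-5.0)])   # task A prefers the left end
zt_b = np.array([np.exp(-5.0), 1.0])   # task B prefers the right end
z_a = solve_desirability(P, q, terminal, zt_a)
z_b = solve_desirability(P, q, terminal, zt_b)

# Superposition: the solution for mixed terminal desirability is the same mix.
w_a, w_b = 0.3, 0.7
z_mix = solve_desirability(P, q, terminal, w_a * zt_a + w_b * zt_b)
superposition_error = np.max(np.abs(z_mix - (w_a * z_a + w_b * z_b)))

# Optimal controlled transitions for task A: u(x'|x) proportional to p(x'|x) z(x').
U = P * z_a[None, :] / (P @ z_a)[:, None]

# Log-ratio relation on interior states: log(u/p) = r(x) + V(x') - V(x).
r, V = -q, np.log(z_a)
support = (P > 0) & (~terminal)[:, None]
lhs = np.log(U[support] / P[support])
rhs = (r[:, None] + V[None, :] - V[:, None])[support]
log_ratio_error = np.max(np.abs(lhs - rhs))

print(superposition_error, log_ratio_error)
```

In this idealized setting both errors are at the level of floating-point round-off. The summary's observation that superposition does not hold exactly on real robots concerns effects that such a toy model does not capture.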

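The project also proposed a new policy search method that requires no learning rate. It builds on Policy Gradients with Parameter-based Exploration (PGPE), a gradient-based search method that can learn deterministic policies, and on Reward Weighted Regression, which avoids the problem of tuning the learning rate by introducing the EM algorithm. Simulations showed that the method acquires control policies faster and with fewer samples than the existing PGPE and Finite Difference methods. This result was published in Artificial Life and Robotics. Results that add a baseline to improve the estimator and that include real-robot experiments are planned for submission to an English-language journal around June 2016.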
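
The idea of combining parameter-space exploration with an EM-style, reward-weighted update can be sketched as follows. This is a hedged illustration, not the published algorithm: policy parameters are sampled from a Gaussian whose mean and standard deviation are the hyperparameters (as in PGPE), and those hyperparameters are re-estimated by return-weighted maximum likelihood (as in Reward Weighted Regression), so no learning rate appears. The exponentiated-return weights, the toy return function `toy_return`, and the function name `em_policy_search` are assumptions introduced here.

```python
import numpy as np

def em_policy_search(episode_return, dim, iters=50, n_samples=20, seed=0):
    """Learning-rate-free search over Gaussian hyperparameters (mu, sigma)."""
    rng = np.random.default_rng(seed)
    mu, sigma = np.zeros(dim), np.ones(dim)
    for _ in range(iters):
        # Exploration in parameter space: each sample is one deterministic policy.
        theta = mu + sigma * rng.standard_normal((n_samples, dim))
        returns = np.array([episode_return(t) for t in theta])
        # Positive weights from returns (assumption: exponential transform).
        w = np.exp(returns - returns.max())
        w /= w.sum()
        # Weighted maximum-likelihood update of the Gaussian; no step size.
        mu_new = w @ theta
        sigma = np.sqrt(w @ (theta - mu_new) ** 2) + 1e-6
        mu = mu_new
    return mu, sigma

# Hypothetical stand-in for a rollout: return peaks at a fixed target parameter.
target = np.array([1.0, -2.0])

def toy_return(theta):
    return -np.sum((theta - target) ** 2)

mu, sigma = em_policy_search(toy_return, dim=2)
print(mu, sigma)
```

In a real task, `toy_return` would be replaced by the return of one episode run with the deterministic policy parameterized by `theta`.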

Research Progress Status

Not entered, because fiscal year 2015 was the final year of the project.

Strategy for Future Research Activity

Not entered, because fiscal year 2015 was the final year of the project.

Report

(2 results)
  • 2015 Annual Research Report
  • 2014 Annual Research Report
  • Research Products

    (18 results)


All Journal Article (3 results) (of which Int'l Joint Research: 1 results,  Peer Reviewed: 2 results,  Open Access: 2 results,  Acknowledgement Compliant: 2 results) Presentation (11 results) (of which Int'l Joint Research: 5 results) Remarks (1 results) Patent(Industrial Property Rights) (3 results) (of which Overseas: 3 results)

  • [Journal Article] EM-based policy hyper parameter exploration: application to standing and balancing of a two-wheeled smartphone robot (2016)

    • Author(s)
      Wang J, Uchibe E, Doya K
    • Journal Title

      Artificial Life and Robotics

      Volume: 21 Issue: 1 Pages: 125-131

    • DOI

      10.1007/s10015-015-0260-7

    • Related Report
      2015 Annual Research Report
    • Peer Reviewed / Open Access / Int'l Joint Research / Acknowledgement Compliant
  • [Journal Article] Forward and Inverse Reinforcement Learning Based on Linearly Solvable Markov Decision Processes (2016)

    • Author(s)
      Eiji Uchibe
    • Journal Title

      The Brain & Neural Networks

      Volume: 23 Issue: 1 Pages: 2-13

    • DOI

      10.3902/jnns.23.2

    • NAID

      130005150459

    • ISSN
      1340-766X, 1883-0455
    • Related Report
      2015 Annual Research Report
    • Acknowledgement Compliant
  • [Journal Article] Expected energy-based restricted Boltzmann machine for classification (2014)

    • Author(s)
      Elfwing S., Uchibe E., Doya K.
    • Journal Title

      Neural Networks

      Volume: 64 Pages: 29-38

    • DOI

      10.1016/j.neunet.2014.09.006

    • Related Report
      2014 Annual Research Report
    • Peer Reviewed / Open Access
  • [Presentation] Learning of Stress Adaptive Habits with an Ensemble of Q-Learners (2016)

    • Author(s)
      Chris Reinke, Eiji Uchibe, and Kenji Doya
    • Organizer
      The 2nd International Workshop on Cognitive Neuroscience Robotics
    • Place of Presentation
      Osaka, Japan
    • Year and Date
      2016-02-21
    • Related Report
      2015 Annual Research Report
    • Int'l Joint Research
  • [Presentation] From Neuroscience to Artificial Intelligence: Maximizing Average Reward in Episodic Reinforcement Learning Tasks with an Ensemble of Q-Learners (2016)

    • Author(s)
      Chris Reinke, Eiji Uchibe, and Kenji Doya
    • Organizer
      Third CiNet Conference, Neural mechanisms of decision making: Achievements and new directions
    • Place of Presentation
      Osaka, Japan
    • Year and Date
      2016-02-03
    • Related Report
      2015 Annual Research Report
    • Int'l Joint Research
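  • [Presentation] Forward and inverse reinforcement learning for playing games (2015)

    • Author(s)
      Eiji Uchibe, and Kenji Doya
    • Organizer
      Grant-in-Aid for Scientific Research on Innovative Areas "Elucidation of neural computation for prediction and decision making: toward better human understanding and applications", 10th Area Meeting; FY2015 Comprehensive Brain Science Network Winter Workshop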
    • Place of Presentation
      Tokyo, Japan
    • Year and Date
      2015-12-17
    • Related Report
      2015 Annual Research Report
  • [Presentation] Maximizing the average reward in episodic reinforcement learning tasks (2015)

    • Author(s)
      Chris Reinke, Eiji Uchibe, and Kenji Doya
    • Organizer
      IEEE International Conference on Intelligent Informatics and Biomedical Sciences
    • Place of Presentation
      Okinawa, Japan
    • Year and Date
      2015-11-28
    • Related Report
      2015 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Inverse reinforcement learning for behavior analysis and control (2015)

    • Author(s)
      Eiji Uchibe, and Kenji Doya
    • Organizer
      International Symposium on Prediction and Decision Making 2015
    • Place of Presentation
      Tokyo, Japan
    • Year and Date
      2015-10-31
    • Related Report
      2015 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Inverse Reinforcement Learning with Density Ratio Estimation (2015)

    • Author(s)
      Eiji Uchibe, and Kenji Doya
    • Organizer
      The 2nd Multidisciplinary Conference on Reinforcement Learning and Decision Making
    • Place of Presentation
      The University of Alberta
    • Year and Date
      2015-06-07
    • Related Report
      2015 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Two-wheeled smartphone robot learns to stand up and balance by EM-based policy hyper parameter exploration (2015)

    • Author(s)
      J. Wang, E. Uchibe, and K. Doya
    • Organizer
      20th International Symposium on Artificial Life and Robotics
    • Place of Presentation
      Beppu
    • Year and Date
      2015-01-21 – 2015-01-23
    • Related Report
      2014 Annual Research Report
  • [Presentation] Inverse Reinforcement Learning Using Dynamic Policy Programming (2014)

    • Author(s)
      E. Uchibe and K. Doya
    • Organizer
      4th Joint IEEE International Conference on Development and Learning and on Epigenetic Robotics
    • Place of Presentation
      Genoa
    • Year and Date
      2014-10-13 – 2014-10-16
    • Related Report
      2014 Annual Research Report
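  • [Presentation] Inverse reinforcement learning using density ratio estimation (2014)

    • Author(s)
      Eiji Uchibe, Kenji Doya
    • Organizer
      The 32nd Annual Conference of the Robotics Society of Japan
    • Place of Presentation
      Kyushu Sangyo University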
    • Year and Date
      2014-09-04 – 2014-09-06
    • Related Report
      2014 Annual Research Report
  • [Presentation] Control of Two-Wheeled Balancing and Standing-up Behaviors by an Android Phone Robot (2014)

    • Author(s)
      J. Wang, E. Uchibe, and K. Doya
    • Organizer
      The 32nd Annual Conference of the Robotics Society of Japan
    • Place of Presentation
      Kyushu Sangyo University
    • Year and Date
      2014-09-04 – 2014-09-06
    • Related Report
      2014 Annual Research Report
  • [Presentation] Combining learned controllers to achieve new goals based on linearly solvable MDPs (2014)

    • Author(s)
      E. Uchibe and K. Doya
    • Organizer
      IEEE International Conference on Robotics and Automation
    • Place of Presentation
      Hong Kong
    • Year and Date
      2014-05-31 – 2014-06-07
    • Related Report
      2014 Annual Research Report
  • [Remarks] Neural Computation Unit, Adaptive Systems Group

    • URL

      https://groups.oist.jp/ja/ncu/adaptive-systems-group

    • Related Report
      2014 Annual Research Report
  • [Patent(Industrial Property Rights)] Direct Inverse Reinforcement Learning with Density Ratio Estimation (2016)

    • Inventor(s)
      Eiji Uchibe and Kenji Doya
    • Industrial Property Rights Holder
      OIST
    • Industrial Property Rights Type
      Patent
    • Filing Date
      2016-03-15
    • Related Report
      2015 Annual Research Report
    • Overseas
  • [Patent(Industrial Property Rights)] Inverse Reinforcement Learning by Density Ratio Estimation (2015)

    • Inventor(s)
      Eiji Uchibe and Kenji Doya
    • Industrial Property Rights Holder
      OIST
    • Industrial Property Rights Type
      Patent
    • Filing Date
      2015-08-07
    • Related Report
      2015 Annual Research Report
    • Overseas
  • [Patent(Industrial Property Rights)] Estimating goals using inverse reinforcement learning based on density ratio estimation (2014)

    • Inventor(s)
      E. Uchibe and K. Doya
    • Industrial Property Rights Holder
      E. Uchibe and K. Doya
    • Industrial Property Rights Type
      Patent
    • Filing Date
      2014-07-31
    • Related Report
      2014 Annual Research Report
    • Overseas


Published: 2014-04-04   Modified: 2018-03-28  
