Division of Roles between Model-Based and Model-Free Reinforcement Learning in Partially Observable Environments

Publicly Offered Research

Project Area	Elucidation of neural computation for prediction and decision making: toward better human understanding and applications
Project/Area Number	26120727
Research Category	Grant-in-Aid for Scientific Research on Innovative Areas (Research in a proposed research area)
Allocation Type	Single-year Grants
Review Section	Complex systems
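Research Institution	Advanced Telecommunications Research Institute International (2015) / Okinawa Institute of Science and Technology Graduate University (2014)
Principal Investigator	Eiji Uchibe (内部英治), Advanced Telecommunications Research Institute International, Brain Information Communication Research Laboratory Group, Senior Researcher (20426571)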
Project Period (FY)	2014-04-01 – 2016-03-31
Project Status	Completed (Fiscal Year 2015)
Budget Amount	¥9,620,000 (Direct Cost: ¥7,400,000, Indirect Cost: ¥2,220,000); Fiscal Year 2015: ¥4,810,000 (Direct Cost: ¥3,700,000, Indirect Cost: ¥1,110,000); Fiscal Year 2014: ¥4,810,000 (Direct Cost: ¥3,700,000, Indirect Cost: ¥1,110,000)
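Keywords	Reinforcement learning / Inverse reinforcement learning / EM algorithm / Linearly solvable Markov decision process / Density ratio estimation / Partially observable environment / Deep learning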
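Outline of Annual Research Achievements	This project studied reinforcement learning and inverse reinforcement learning based on linearly solvable Markov decision processes (LMDPs). First, we used real-robot experiments to test the composition of control laws by the superposition principle of solutions, which exploits the fact that the Bellman equation becomes linear. The results showed that superposition does not hold exactly in the real world, and that it is effective to use the solution obtained by superposition as the initial value for additional learning. For inverse reinforcement learning, we showed that in an LMDP the logarithm of the ratio of the state transition probabilities before and after learning can be expressed by the reward and the value function, and we proposed inverse reinforcement learning methods based on this relation. One method combines density ratio estimation with regularized least squares, and a patent application was filed for it (PCT/JP2015/004001). A second method, based on logistic regression and requiring no least-squares step, was also filed as a patent application. These methods estimated the reward function efficiently, with lower computational cost and fewer samples than the existing methods OptV, MaxEnt-IRL, and RelEnt-IRL. These results were summarized in a review article in the journal of the Japanese Neural Network Society (神経回路学会誌). In addition, building on Policy Gradients with Parameter-based Exploration (PGPE), a gradient-based search method that can learn deterministic policies, and on Reward Weighted Regression, which avoids the problem of tuning the learning rate by introducing the EM algorithm, we proposed a new policy search method that requires no learning rate. Simulations showed that it acquires control laws faster and with fewer samples than the existing PGPE and Finite Difference methods. This result was published in Artificial Life and Robotics. We plan to submit results that include an improved estimator obtained by introducing a baseline, together with real-robot experiments, to an English-language journal around June 2016.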
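Research Progress Status	Not entered, because FY2015 (Heisei 27) is the final fiscal year of the project.
Strategy for Future Research Activity	Not entered, because FY2015 (Heisei 27) is the final fiscal year of the project.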
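
Note on the LMDP relation mentioned in the outline: the statement that the log-ratio of transition probabilities before and after learning is expressed by the reward and the value function can be checked numerically on a toy problem. The sketch below is only an illustration under stated assumptions, not the project's implementation. It assumes the standard undiscounted first-exit LMDP in cost form, where the desirability z(x) = exp(-v(x)) satisfies the linear equation z(x) = exp(-q(x)) Σ p(x'|x) z(x') and the optimal dynamics are π(x'|x) ∝ p(x'|x) z(x'). The 5-state chain, the costs q, and all function names are hypothetical. In this setting log(π(x'|x)/p(x'|x)) = -q(x) + v(x) - v(x'), which with reward r = -q and value V = -v reads r(x) + V(x') - V(x).

```python
import math

# Hypothetical 5-state chain; state 4 is an absorbing goal (first-exit setting).
N, GOAL = 5, 4
q = [0.5, 0.5, 0.5, 0.5, 0.0]  # illustrative state costs (reward r = -q)


def passive(x):
    """Passive (uncontrolled) dynamics p(x'|x): an unbiased random walk."""
    left, right = max(x - 1, 0), min(x + 1, N - 1)
    probs = {}
    for nxt in (left, right):
        probs[nxt] = probs.get(nxt, 0.0) + 0.5
    return probs


# Solve the linear Bellman equation for the desirability z = exp(-v)
# by fixed-point iteration; the goal state keeps z = exp(-q[GOAL]).
z = [1.0] * N
z[GOAL] = math.exp(-q[GOAL])
for _ in range(1000):
    z = [z[x] if x == GOAL else
         math.exp(-q[x]) * sum(p * z[y] for y, p in passive(x).items())
         for x in range(N)]
v = [-math.log(zx) for zx in z]  # cost-to-go (value V = -v)


def optimal(x):
    """Optimal controlled dynamics: pi(x'|x) proportional to p(x'|x) z(x')."""
    norm = sum(p * z[y] for y, p in passive(x).items())
    return {y: p * z[y] / norm for y, p in passive(x).items()}


# Largest deviation from log(pi/p) = -q(x) + v(x) - v(x') over all transitions.
residual = max(abs(math.log(optimal(x)[y] / p) - (-q[x] + v[x] - v[y]))
               for x in range(N - 1) for y, p in passive(x).items())
print("max residual of the log-ratio identity:", residual)
```

Because the identity holds for every observed transition, the reward and value function can in principle be fitted to an estimated log density ratio, which is the idea the outline attributes to the density-ratio-based methods. A discounted formulation would change the identity accordingly, so the undiscounted form used here should be read as an assumption.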

Report

(2 results)

2015 Annual Research Report
2014 Annual Research Report

Research Products
(18 results)

Journal Article (3 results) (of which Int'l Joint Research: 1 result, Peer Reviewed: 2 results, Open Access: 2 results, Acknowledgement Compliant: 2 results) Presentation (11 results) (of which Int'l Joint Research: 5 results) Remarks (1 result) Patent(Industrial Property Rights) (3 results) (of which Overseas: 3 results)

[Journal Article] EM-based policy hyper parameter exploration: application to standing and balancing of a two-wheeled smartphone robot (2016)
- Author(s)
  Wang J, Uchibe E, Doya K
- Journal Title
  Artificial Life and Robotics
  Volume: 21 Issue: 1 Pages: 125-131
- DOI
  10.1007/s10015-015-0260-7
- Related Report
  2015 Annual Research Report
- Peer Reviewed / Open Access / Int'l Joint Research / Acknowledgement Compliant
[Journal Article] Forward and Inverse Reinforcement Learning Based on Linearly Solvable Markov Decision Processes (2016)
- Author(s)
  Eiji Uchibe
- Journal Title
  The Brain & Neural Networks
  Volume: 23 Issue: 1 Pages: 2-13
- DOI
  10.3902/jnns.23.2
- NAID
  130005150459
- ISSN
  1340-766X, 1883-0455
- Related Report
  2015 Annual Research Report
- Acknowledgement Compliant
[Journal Article] Expected energy-based restricted Boltzmann machine for classification (2014)
- Author(s)
  Elfwing S., Uchibe E., Doya K.
- Journal Title
  Neural Networks
  Volume: 64 Pages: 29-38
- DOI
  10.1016/j.neunet.2014.09.006
- Related Report
  2014 Annual Research Report
- Peer Reviewed / Open Access
[Presentation] Learning of Stress Adaptive Habits with an Ensemble of Q-Learners (2016)
- Author(s)
  Chris Reinke, Eiji Uchibe, and Kenji Doya
- Organizer
  The 2nd International Workshop on Cognitive Neuroscience Robotics
- Place of Presentation
  Osaka, Japan
- Year and Date
  2016-02-21
- Related Report
  2015 Annual Research Report
- Int'l Joint Research
[Presentation] From Neuroscience to Artificial Intelligence: Maximizing Average Reward in Episodic Reinforcement Learning Tasks with an Ensemble of Q-Learners (2016)
- Author(s)
  Chris Reinke, Eiji Uchibe, and Kenji Doya
- Organizer
  Third CiNet Conference, Neural mechanisms of decision making: Achievements and new directions
- Place of Presentation
  Osaka, Japan
- Year and Date
  2016-02-03
- Related Report
  2015 Annual Research Report
- Int'l Joint Research
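[Presentation] Forward and inverse reinforcement learning for playing games (2015)
- Author(s)
  Eiji Uchibe and Kenji Doya
- Organizer
  10th Area Meeting of the Grant-in-Aid for Scientific Research on Innovative Areas "Elucidation of neural computation for prediction and decision making: toward better human understanding and applications"; FY2015 Comprehensive Brain Science Network Winter Workshop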
- Place of Presentation
  Tokyo, Japan
- Year and Date
  2015-12-17
- Related Report
  2015 Annual Research Report
[Presentation] Maximizing the average reward in episodic reinforcement learning tasks (2015)
- Author(s)
  Chris Reinke, Eiji Uchibe, and Kenji Doya
- Organizer
  IEEE International Conference on Intelligent Informatics and Biomedical Sciences
- Place of Presentation
  Okinawa, Japan
- Year and Date
  2015-11-28
- Related Report
  2015 Annual Research Report
- Int'l Joint Research
[Presentation] Inverse reinforcement learning for behavior analysis and control (2015)
- Author(s)
  Eiji Uchibe and Kenji Doya
- Organizer
  International Symposium on Prediction and Decision Making 2015
- Place of Presentation
  Tokyo, Japan
- Year and Date
  2015-10-31
- Related Report
  2015 Annual Research Report
- Int'l Joint Research
[Presentation] Inverse Reinforcement Learning with Density Ratio Estimation (2015)
- Author(s)
  Eiji Uchibe and Kenji Doya
- Organizer
  The 2nd Multidisciplinary Conference on Reinforcement Learning and Decision Making
- Place of Presentation
  The University of Alberta
- Year and Date
  2015-06-07
- Related Report
  2015 Annual Research Report
- Int'l Joint Research
[Presentation] Two-wheeled smartphone robot learns to stand up and balance by EM-based policy hyper parameter exploration (2015)
- Author(s)
  J. Wang, E. Uchibe, and K. Doya
- Organizer
  20th International Symposium on Artificial Life and Robotics
- Place of Presentation
  Beppu
- Year and Date
  2015-01-21 – 2015-01-23
- Related Report
  2014 Annual Research Report
[Presentation] Inverse Reinforcement Learning Using Dynamic Policy Programming (2014)
- Author(s)
  E. Uchibe and K. Doya
- Organizer
  4th Joint IEEE International Conference on Development and Learning and on Epigenetic Robotics
- Place of Presentation
  Genoa
- Year and Date
  2014-10-13 – 2014-10-16
- Related Report
  2014 Annual Research Report
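[Presentation] Inverse Reinforcement Learning Using Density Ratio Estimation [密度比推定を用いた逆強化学習] (2014)
- Author(s)
  Eiji Uchibe, Kenji Doya
- Organizer
  The 32nd Annual Conference of the Robotics Society of Japan
- Place of Presentation
  Kyushu Sangyo University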
- Year and Date
  2014-09-04 – 2014-09-06
- Related Report
  2014 Annual Research Report
[Presentation] Control of Two-Wheeled Balancing and Standing-up Behaviors by an Android Phone Robot (2014)
- Author(s)
  J. Wang, E. Uchibe, and K. Doya
- Organizer
  The 32nd Annual Conference of the Robotics Society of Japan
- Place of Presentation
  Kyushu Sangyo University
- Year and Date
  2014-09-04 – 2014-09-06
- Related Report
  2014 Annual Research Report
[Presentation] Combining learned controllers to achieve new goals based on linearly solvable MDPs (2014)
- Author(s)
  E. Uchibe and K. Doya
- Organizer
  IEEE International Conference on Robotics and Automation
- Place of Presentation
  Hong Kong
- Year and Date
  2014-05-31 – 2014-06-07
- Related Report
  2014 Annual Research Report
[Remarks] Neural Computation Unit, Adaptive Systems Group
- URL
  https://groups.oist.jp/ja/ncu/adaptive-systems-group
- Related Report
  2014 Annual Research Report
[Patent(Industrial Property Rights)] Direct Inverse Reinforcement Learning with Density Ratio Estimation (2016)
- Inventor(s)
  Eiji Uchibe and Kenji Doya
- Industrial Property Rights Holder
  OIST
- Industrial Property Rights Type
  Patent
- Filing Date
  2016-03-15
- Related Report
  2015 Annual Research Report
- Overseas
[Patent(Industrial Property Rights)] Inverse Reinforcement Learning by Density Ratio Estimation (2015)
- Inventor(s)
  Eiji Uchibe and Kenji Doya
- Industrial Property Rights Holder
  OIST
- Industrial Property Rights Type
  Patent
- Filing Date
  2015-08-07
- Related Report
  2015 Annual Research Report
- Overseas
[Patent(Industrial Property Rights)] Estimating goals using inverse reinforcement learning based on density ratio estimation (2014)
- Inventor(s)
  E. Uchibe and K. Doya
- Industrial Property Rights Holder
  E. Uchibe and K. Doya
- Industrial Property Rights Type
  Patent
- Filing Date
  2014-07-31
- Related Report
  2014 Annual Research Report
- Overseas
