Reinforcement Learning Incorporating Model-Based Predictive State Feedback

Publicly Offered Research

Project Area	Elucidation of neural computation for prediction and decision making: toward better human understanding and applications
Project/Area Number	24120527
Research Category	Grant-in-Aid for Scientific Research on Innovative Areas (Research in a proposed research area)
Allocation Type	Single-year Grants
Review Section	Complex systems
Research Institution	Okinawa Institute of Science and Technology Graduate University
Principal Investigator	内部 英治 (Eiji Uchibe), Okinawa Institute of Science and Technology Graduate University, Neural Computation Unit, Researcher (20426571)
Project Period (FY)	2012-04-01 – 2014-03-31
Project Status	Completed (Fiscal Year 2013)
Budget Amount	¥8,450,000 (Direct Cost: ¥6,500,000, Indirect Cost: ¥1,950,000); Fiscal Year 2013: ¥4,290,000 (Direct Cost: ¥3,300,000, Indirect Cost: ¥990,000); Fiscal Year 2012: ¥4,160,000 (Direct Cost: ¥3,200,000, Indirect Cost: ¥960,000)
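Keywords	reinforcement learning / model-free / model-based / linearly solvable Markov decision process / optimal control
Research Abstract	An optimal control problem reduces to solving the Bellman equation, a nonlinear partial differential equation, and the optimal control law is constructed from the value function obtained as its solution. Reinforcement learning based on the linearly solvable Bellman equation is one way to relax this nonlinearity. The real-robot experiments of the previous fiscal year showed that this method is more sensitive to modeling errors than conventional reinforcement learning methods. We proposed two solutions to this problem. The first introduces a game-theoretic minimax formulation to obtain a linearized reinforcement learning method that is robust to errors in the environment model. Although the basic algorithm had been proposed by Dvijotham and Todorov 2011, the effect of environment modeling errors on the control law had not been investigated, so we examined it through extensive simulations. The results showed that the design guidelines for the parameter that adjusts robustness differ between continuous problems, which involve function approximation error, and discrete problems, which do not. The results for discrete problems were presented at the AROB symposium, and the results for continuous problems are to be submitted to the Journal of Artificial Life and Robotics. The second is a model-free reinforcement learning method that directly estimates, from interaction with the environment, the desirability function obtained by exponentially transforming the value function. Although the method is formulated as a finite-horizon problem, we showed that in the derived algorithm the reciprocal of the exponentially transformed cost function corresponds to a state-dependent discount factor. We combined this method with the controller superposition method for linearly solvable Markov decision processes and carried out validation experiments on a real robot. These results were presented in an invited talk at Neuro2013 and accepted at the international conference ICRA2014, where an oral presentation is scheduled for June.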
Current Status of Research Progress	Reason: Not entered because FY2013 is the final fiscal year.
Strategy for Future Research Activity	Not entered because FY2013 is the final fiscal year.
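
The abstract's statements about the linearly solvable Bellman equation can be illustrated with a small numerical sketch. The code below is not the method developed in this project. It assumes the standard discrete, first-exit formulation of a linearly solvable MDP, in which the desirability function z = exp(-v) satisfies the linear equation z(x) = exp(-q(x)) Σ p(x'|x) z(x'). The function names `solve_lmdp` and `optimal_transitions`, the five-state chain, and the cost values are all hypothetical choices made for illustration.

```python
import numpy as np

def solve_lmdp(P, q, terminal, iters=2000, tol=1e-12):
    """Fixed-point iteration on z = exp(-q) * (P @ z) (assumed first-exit LMDP).

    P        : passive (uncontrolled) transition matrix, rows sum to 1
    q        : state cost vector
    terminal : boolean mask of terminal states, where z = exp(-q)
    """
    g = np.exp(-q)  # reciprocal of exp(q); scales the next-state term like a state-dependent discount
    z = np.ones_like(q)
    z[terminal] = g[terminal]
    for _ in range(iters):
        z_new = g * (P @ z)
        z_new[terminal] = g[terminal]  # boundary condition at terminal states
        converged = np.max(np.abs(z_new - z)) < tol
        z = z_new
        if converged:
            break
    return z

def optimal_transitions(P, z):
    """Optimal controlled transitions u*(x'|x) proportional to p(x'|x) z(x')."""
    U = P * z[None, :]
    return U / U.sum(axis=1, keepdims=True)

# Hypothetical example: a 5-state chain with a random-walk passive dynamics
# and a zero-cost terminal goal at the right end.
n = 5
P = np.zeros((n, n))
P[0, 0] = P[0, 1] = 0.5
for i in range(1, n - 1):
    P[i, i - 1] = P[i, i + 1] = 0.5
P[n - 1, n - 1] = 1.0
q = np.array([0.1, 0.1, 0.1, 0.1, 0.0])
terminal = np.array([False, False, False, False, True])

z = solve_lmdp(P, q, terminal)   # desirability function
v = -np.log(z)                   # value function recovered from z
U = optimal_transitions(P, z)    # optimal controlled dynamics
```

In this sketch the nonlinear minimization over actions disappears: the value function is recovered as v = -log z after solving a linear fixed-point problem, and the optimal policy reweights the passive dynamics by z. Because the policy depends directly on P, an inaccurate model of the passive dynamics propagates into both z and the controller, which is the sensitivity the abstract describes.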

Report

(2 results)

2013 Annual Research Report
2012 Annual Research Report

Research Products
(10 results)

Journal Article (1 result, of which Peer Reviewed: 1 result); Presentation (9 results, of which Invited: 3 results)

[Journal Article] Evaluation of linearly solvable Markov decision process with dynamic model learning in a mobile robot navigation task (2013)
- Author(s)
  Kinjo K, Uchibe E, Doya K
- Journal Title
  Frontiers in Neurorobotics
  Volume: 7 Pages: 7-7
- DOI
  10.3389/fnbot.2013.00007
- Related Report
  2012 Annual Research Report
- Peer Reviewed
[Presentation] Combining learned controllers to achieve new goals based on linearly solvable MDPs (2014)
- Author(s)
  E. Uchibe and K. Doya
- Organizer
  Proc. of IEEE International Conference on Robotics and Automation
- Place of Presentation
  Hong Kong
- Related Report
  2013 Annual Research Report
[Presentation] Robustness of Linearly Solvable Markov Games with Inaccurate Dynamics Models (2014)
- Author(s)
  K. Kinjo, E. Uchibe, and K. Doya
- Organizer
  Proc. of International Symposium on Artificial Life and Robotics
- Place of Presentation
  Beppu, Japan
- Related Report
  2013 Annual Research Report
[Presentation] Standing-up and Balancing Behaviors of Android Phone Robot -- Control of Spring Attached Wheeled Inverted Pendulum -- (2013)
- Author(s)
  J. Wang, E. Uchibe, and K. Doya
- Organizer
  IEICE Technical Committee on Nonlinear Problems (NLP)
- Place of Presentation
  City University of Hong Kong
- Related Report
  2013 Annual Research Report
[Presentation] Inverse reinforcement learning for analysis of human behaviors (2013)
- Author(s)
  E. Uchibe, S. Ota, and K. Doya
- Organizer
  The 1st Multidisciplinary Conference on Reinforcement Learning and Decision Making
- Place of Presentation
  Princeton University
- Related Report
  2013 Annual Research Report
[Presentation] Scaled free-energy based reinforcement learning for robust and efficient learning in high-dimensional state spaces (2013)
- Author(s)
  E. Uchibe, S. Elfwing, and K. Doya
- Organizer
  Neuro 2013
- Place of Presentation
  Kyoto International Conference Center
- Related Report
  2013 Annual Research Report
- Invited
[Presentation] Combining learned controllers to achieve new goals based on linearly solvable MDPs (2013)
- Author(s)
  E. Uchibe and K. Doya
- Organizer
  Neuro 2013
- Place of Presentation
  Kyoto International Conference Center
- Related Report
  2013 Annual Research Report
- Invited
[Presentation] Inverse reinforcement learning by density ratio estimation (2013)
- Author(s)
  E. Uchibe and K. Doya
- Organizer
  The 16th Information-Based Induction Sciences Workshop (IBIS2013)
- Place of Presentation
  Kuramae Kaikan, Tokyo Institute of Technology
- Related Report
  2013 Annual Research Report
[Presentation] Inverse reinforcement learning for understanding human behaviors (2013)
- Author(s)
  E. Uchibe
- Organizer
  International Symposium on Past and Future Directions of Cognitive Developmental Robotics
- Place of Presentation
  Osaka University Nakanoshima Center 10F
- Related Report
  2013 Annual Research Report
- Invited
[Presentation] Analysis of human behaviors by inverse reinforcement learning in a pole balancing task (2013)
- Author(s)
  S. Ota, E. Uchibe, and K. Doya
- Organizer
  The 3rd International Symposium on The Biology of Decision Making
- Place of Presentation
  Paris, France
- Related Report
  2013 Annual Research Report
