2015 Fiscal Year Annual Research Report

Reinforcement Learning That Exploits Prior Knowledge in Environments with Dynamically Changing Rewards

Research Project

Project/Area Number	24760308
Research Institution	University of Tsukuba
Principal Investigator	Takeshi Shibuya (澁谷長史), University of Tsukuba, Faculty of Engineering, Information and Systems, Assistant Professor (90582776)
Project Period (FY)	2012-04-01 – 2016-03-31
Keywords	Machine learning / Reinforcement learning
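Outline of Annual Research Achievements	This research aims to realize efficient reinforcement learning methods for choosing desirable actions in environments where the utility of actions changes. Reinforcement learning, on which this research is based, is a framework in which an agent acquires behavior by acting on its own and accumulating experience. Although many potential applications are expected, reinforcement learning assumes that "the desirability of choosing a given action is invariant over time," and so it has the fundamental limitation that it cannot learn tasks whose goals shift over time. To address this limitation, this research has investigated methods that exploit prior knowledge about how the environment changes. In this fiscal year, the final year of the project, we extended the methods developed so far, proposed a learning scheme that focuses on periodicity with respect to direction, and presented it as an international conference paper. Specifically, the action-value function is divided into a part that depends on direction and a part that does not, and only the latter is learned, which achieves fast learning. We also presented, as an international conference paper, a learning scheme that suppresses the occurrence of accidents so that trial and error does not become difficult to continue.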
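
The decomposition described in the outline can be illustrated with a minimal sketch. The report does not give the form of the decomposition, so everything below is an assumption made for illustration: the additive split `Q(s, a, theta) = q_inv[s][a] + direction_term(theta, a)`, the cosine shape of the direction-dependent part, the one-step toy task, and all names (`DecomposedQ`, `direction_term`, `train_demo`). It is not the method of the cited papers, only one way to learn the direction-independent part while a known periodic part is held fixed.

```python
import math
import random

N_ACTIONS = 4  # hypothetical: action k moves toward heading 2*pi*k/N_ACTIONS


def direction_term(theta, action):
    """Direction-dependent part, assumed known in advance and periodic in theta."""
    heading = 2.0 * math.pi * action / N_ACTIONS
    return math.cos(theta - heading)


class DecomposedQ:
    """Q(s, a, theta) = q_inv[s][a] + direction_term(theta, a); only q_inv is learned."""

    def __init__(self, n_states, alpha=0.5, gamma=0.9):
        self.alpha = alpha
        self.gamma = gamma
        self.q_inv = [[0.0] * N_ACTIONS for _ in range(n_states)]

    def value(self, state, action, theta):
        return self.q_inv[state][action] + direction_term(theta, action)

    def update(self, state, action, theta, reward, next_state=None, next_theta=0.0):
        # Standard Q-learning target; next_state=None marks a terminal transition.
        target = reward
        if next_state is not None:
            target += self.gamma * max(
                self.value(next_state, b, next_theta) for b in range(N_ACTIONS)
            )
        # The TD error uses the full value, but only q_inv is adjusted.
        td_error = target - self.value(state, action, theta)
        self.q_inv[state][action] += self.alpha * td_error


def train_demo(n_steps=400, seed=0):
    """One-step toy task (assumed): reward = base[a] + direction_term(theta, a)."""
    rng = random.Random(seed)
    base = [0.0, 1.0, -0.5, 0.25]  # hypothetical direction-independent payoffs
    learner = DecomposedQ(n_states=1)
    for step in range(n_steps):
        theta = rng.uniform(0.0, 2.0 * math.pi)  # direction changes every step
        action = step % N_ACTIONS  # round-robin exploration keeps the demo simple
        reward = base[action] + direction_term(theta, action)
        learner.update(0, action, theta, reward)
    return learner, base
```

In this toy setting the direction changes at every step, yet `q_inv` settles on the `base` payoffs because the known periodic term absorbs the direction-dependent variation in the reward. A tabular learner indexed by direction would instead have to relearn the same payoffs for every direction.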

Research Products
(4 results)

All Journal Article (2 results) (of which Peer Reviewed: 1 result) Presentation (2 results)

[Journal Article] Profit Sharing reducing the occurrences of accidents by predicted action-safety degree (2015)
- Author(s)
  Junki Tamaru and Takeshi Shibuya
- Journal Title
  
  Proceedings of the 10th Asian Control Conference 2015 (ASCC 2015)
  
  Volume: ASCC2015 Pages: 2468-2473
- Peer Reviewed
[Journal Article] A study of efficient reinforcement learning using the relative angle of two objects (2015)
- Author(s)
  Moriaki Onishi and Takeshi Shibuya
- Journal Title
  
  Proceedings of the 16th International Symposium on Advanced Intelligent Systems
  
  Volume: ISIS2015 Pages: 1091-1098
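[Presentation] A study of a method for acquiring repetitive motions that incorporates external evaluation into the reward (2015)
- Author(s)
  NGUYEN VAN, BAC, Takeshi Shibuya
- Organizer
  Institute of Electrical Engineers of Japan (Technical Meeting on Systems)
- Place of Presentation
  Niigata College of Nursing (Joetsu, Niigata Prefecture)
- Year and Date
  2015-12-06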
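[Presentation] A study on dynamically updating the amount of phase change for dotQ-learning (2015)
- Author(s)
  作田宏行, Takeshi Shibuya
- Organizer
  Institute of Electrical Engineers of Japan (Technical Meeting on Systems)
- Place of Presentation
  Niigata College of Nursing (Joetsu, Niigata Prefecture)
- Year and Date
  2015-12-06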
