2008 Fiscal Year Annual Research Report

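Computational Theory of Inductive Reinforcement Learning: Bayesian Inference for Environment Exploration and Inductive Reconstruction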

Research Project

Project/Area Number	20700126
Research Institution	The University of Tokyo
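Principal Investigator	Takaki Makino (牧野貴樹) The University of Tokyo, Graduate School of Frontier Sciences, Project Assistant Professor (20418651)
Keywords	Reinforcement learning / Bayesian inference / Partially observable Markov decision process / Predictive state representation / TD-Network / Elman network
Research Abstract	This fiscal year, we studied how to apply Bayesian inference methods to reinforcement learning. First, we made two new proposals concerning predictive state representations, a method for efficiently reconstructing partially observable Markov decision processes (POMDPs), the problem domain that reinforcement learning addresses, and TD-Networks, a method for learning those representations. One proposal is the on-line automatic acquisition of the question network using a dependency-detection network; the other is a proto-predictive state representation that exploits the structure of Elman's Simple Recurrent Network. These proposals mitigate a known drawback of predictive state representations: without knowledge of the problem's structure, the core test set (the set of prediction targets needed to build a correct state representation) cannot be constructed, and so no state representation can be acquired. The results are expected to broaden the range of problems to which reinforcement learning can be applied under the more realistic assumption that the state cannot be fully observed. They were accepted at the 2008 and 2009 International Conference on Machine Learning (ICML). In addition, we hold a Bayesian reinforcement learning study group roughly once every one to two months, bringing together researchers from inside and outside the university. Through it, we gathered insights from many researchers in Bayesian inference and reinforcement learning and exchanged a variety of new ideas, including new nonparametric Bayesian methods, that will seed research in the next fiscal year and beyond.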

Research Products
(4 results)

Journal Article: 2 results (of which Peer Reviewed: 2 results); Presentation: 2 results

[Journal Article] Proto-Predictive Representation of States with Simple Recurrent Temporal-Difference Networks (2009)
- Author(s)
  Takaki Makino
- Journal Title
  
  Proceedings of the 26th International Conference on Machine Learning (ICML 2009) (in press; accepted for publication)
- Peer Reviewed
[Journal Article] On-line Discovery of Temporal-Difference Networks (2008)
- Author(s)
  Takaki Makino and Toshihisa Takagi
- Journal Title
  
  Proceedings of the 25th International Conference on Machine Learning (ICML 2008)
  
  Pages: 632-639
- Peer Reviewed
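[Presentation] Automatic Acquisition of TD-Networks in POMDP Environments: An Extension with a Simple Recurrent Structure (2008)
- Author(s)
  Takaki Makino
- Organizer
  22nd Annual Conference of the Japanese Society for Artificial Intelligence
- Place of Presentation
  Tokiwa Civic Hall (Asahikawa)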
- Year and Date
  2008-10-29
[Presentation] Simple Recurrent Temporal-Difference Networks (2008)
- Author(s)
  Takaki Makino
- Organizer
  11th Workshop on Information-Based Induction Sciences (IBIS)
- Place of Presentation
  Sendai International Center
- Year and Date
  2008-10-29
