Computational Theory of Inductive Reinforcement Learning-Bayesian Inference on Environment Search and Inductive Reconstruction

Research Project

Project/Area Number	20700126
Research Category	Grant-in-Aid for Young Scientists (B)
Allocation Type	Single-year Grants
Research Field	Intelligent informatics
Research Institution	The University of Tokyo
Principal Investigator	MAKINO Takaki The University of Tokyo, Institute of Industrial Science, Project Associate Professor (20418651)
Project Period (FY)	2008 – 2010
Project Status	Completed (Fiscal Year 2010)
Budget Amount	¥2,340,000 (Direct Cost: ¥1,800,000, Indirect Cost: ¥540,000); Fiscal Year 2010: ¥650,000 (Direct Cost: ¥500,000, Indirect Cost: ¥150,000); Fiscal Year 2009: ¥650,000 (Direct Cost: ¥500,000, Indirect Cost: ¥150,000); Fiscal Year 2008: ¥1,040,000 (Direct Cost: ¥800,000, Indirect Cost: ¥240,000)
Keywords	Reinforcement learning / Restricted Collapsed Draws / Bayesian inference / Apprenticeship learning / Infinite hidden Markov model / Clustering / Chinese restaurant process / TD-Network / Nonparametric Bayes / Inverse reinforcement learning / Hidden Markov model / Hierarchical clustering / Sampling methods / Bayesian estimation / Partially observable Markov decision process / Predictive state representation / Elman network
Research Abstract	This study focuses on reconstructing environment models in reinforcement learning using Bayesian inference techniques. In reinforcement learning, an agent learns a model of its environment by trial and error; given a suitable Bayesian environment model that represents uncertainty about the environment, optimal exploration can be achieved. For this purpose, we proposed new approaches that improve the TD-network, an environment description framework based on predictive state representations. In addition, we extended a nonparametric Bayesian model for hidden Markov models to represent hierarchical clustering of hidden states. Moreover, we applied the framework of apprenticeship learning and proposed a method that constructs an environment model from others' actions based on Bayesian inference. These are elements required for Bayesian reconstruction of the process of environment search and reconstruction.

Report

(4 results)

2010 Annual Research Report
Final Research Report (PDF)
2009 Annual Research Report
2008 Annual Research Report

Research Products
(33 results)

Journal Article (13 results) (of which Peer Reviewed: 9 results) Presentation (17 results) Book (2 results) Remarks (1 result)

[Journal Article] Apprenticeship learning for model parameters of partially observable environments (2012)
- Author(s)
  Takaki Makino and Johane Takeuchi
- Journal Title
  
  To appear in ICML '12: Proceedings of the 29th Annual International Conference on Machine Learning
- NAID
  110009545975
- Related Report
  2010 Final Research Report
- Peer Reviewed
[Journal Article] Apprenticeship Learning for Model Parameters of Partially Observable Environments (2012)
- Author(s)
  Takaki Makino, Johane Takeuchi
- Journal Title
  
  IEICE Technical Report
  
  Volume: Vol.111, No.480 Pages: 49-54
- NAID
  110009545975
- Related Report
  2010 Final Research Report
[Journal Article] Apprenticeship Learning for Model Parameters of Partially Observable Environments (2012)
- Author(s)
  Takaki Makino, Johane Takeuchi
- Journal Title
  
  IEICE Technical Report (IBISML2011-94)
  
  Volume: 111(480) Pages: 49-54
- NAID
  110009545975
- Related Report
  2010 Annual Research Report
[Journal Article] Reinforcement Learning (My Bookmarks) (2011)
- Author(s)
  Takaki Makino
- Journal Title
  
  Journal of the Japanese Society for Artificial Intelligence
  
  Volume: Vol.26, No.3 Pages: 301-303
- NAID
  110008662160
- Related Report
  2010 Final Research Report
[Journal Article] Altruistic Behavior and Recursive Estimation of Others (2010)
- Author(s)
  Takaki Makino, Hisao Taki, Kazuyuki Aihara
- Journal Title
  
  Seisan Kenkyu
  
  Volume: Vol.62, No.3 Pages: 259-265
- NAID
  130000342806
- Related Report
  2010 Final Research Report
- Peer Reviewed
[Journal Article] Statistical Machine Learning Based on Nonparametric Bayes (2010)
- Author(s)
  Takaki Makino
- Journal Title
  
  IEICE Technical Report IBISML2010-14
  
  Volume: 110(76) Pages: 87-94
- NAID
  110008096185
- Related Report
  2010 Annual Research Report
[Journal Article] Cultural neuroeconomics of intertemporal choice (2009)
- Author(s)
  Taiki Takahashi, Tarik Hadzibeganovic, Sergio A. Cannas, Takaki Makino, Hiroki Fukui, and Shinobu Kitayama
- Journal Title
  
  Neuroendocrinology Letters
  
  Volume: Vol.30, No.2 Pages: 185-191
- NAID
  130004959898
- Related Report
  2010 Final Research Report
- Peer Reviewed
[Journal Article] Proto-predictive representation of states with simple recurrent temporal-difference networks (2009)
- Author(s)
  Takaki Makino
- Journal Title
  
  In Leon Bottou and Michael Littman, editors, ICML '09: Proceedings of the 26th Annual International Conference on Machine Learning
  
  Volume: vol.26 Pages: 697-704
- Related Report
  2010 Final Research Report
- Peer Reviewed
[Journal Article] Proto-predictive representation of states with simple recurrent temporal-difference networks (2009)
- Author(s)
  Takaki Makino
- Journal Title
  
  Proceedings of the 26th Annual International Conference on Machine Learning 26
  
  Pages: 697-704
- Related Report
  2009 Annual Research Report
- Peer Reviewed
[Journal Article] Self-Organization of Communication (2009)
- Author(s)
  Takaki Makino
- Journal Title
  
  Handbook of Self-Organization (NTS Publishing)
  
  Pages: 438-443
- Related Report
  2009 Annual Research Report
- Peer Reviewed
[Journal Article] Proto-Predictive Representation of States with Simple Recurrent Temporal-Difference Networks (2009)
- Author(s)
  Takaki Makino
- Journal Title
  
  Proceedings of the 26th International Conference on Machine Learning (ICML 2009) (in press; accepted for publication)
- Related Report
  2008 Annual Research Report
- Peer Reviewed
[Journal Article] On-line discovery of temporal-difference networks (2008)
- Author(s)
  Takaki Makino and Toshihisa Takagi
- Journal Title
  
  In Andrew McCallum and Sam Roweis, editors, ICML '08 : Proceedings of the 25th Annual International Conference on Machine Learning
  
  Volume: vol.25 Pages: 632-639
- Related Report
  2010 Final Research Report
- Peer Reviewed
[Journal Article] On-line Discovery of Temporal-Difference Networks (2008)
- Author(s)
  Takaki Makino and Toshihisa Takagi
- Journal Title
  
  Proceedings of the 25th International Conference on Machine Learning (ICML 2008)
  
  Pages: 632-639
- Related Report
  2008 Annual Research Report
- Peer Reviewed
[Presentation] Hierarchical Nested Infinite Hidden Markov Models (2012)
- Author(s)
  Takaki Makino
- Organizer
  Bayesian Inference and Stochastic Computation 2012 workshop
- Place of Presentation
  Tachikawa
- Year and Date
  2012-06-22
- Related Report
  2010 Final Research Report
[Presentation] Learning model parameters of partially observable Markov decision process from demonstration (2012)
- Author(s)
  Takaki Makino and Johane Takeuchi
- Organizer
  In Proc. of the 2nd International Symposium on Innovative Mathematical Modeling
- Place of Presentation
  Tokyo
- Year and Date
  2012-05-13
- Related Report
  2010 Final Research Report
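[Presentation] Apprenticeship Learning for Model Parameters of Partially Observable Environments (2012)
- Author(s)
  Takaki Makino, Johane Takeuchi
- Organizer
  IEICE Technical Committee on Information-Based Induction Sciences and Machine Learning (IBISML)
- Place of Presentation
  The Institute of Statistical Mathematics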
- Year and Date
  2012-03-12
- Related Report
  2010 Annual Research Report
[Presentation] Slice sampling for Chinese restaurant process (2010)
- Author(s)
  Takaki Makino
- Organizer
  In Proc. of the 2nd Asian Conference on Machine Learning (ACML 2010)
- Place of Presentation
  Tokyo
- Year and Date
  2010-11-08
- Related Report
  2010 Final Research Report
[Presentation] Statistical Machine Learning Based on Nonparametric Bayes (2010)
- Author(s)
  Takaki Makino
- Organizer
  IEICE Technical Report IBISML2010-14, IEICE
- Place of Presentation
  Tokyo
- Year and Date
  2010-06-15
- Related Report
  2010 Final Research Report
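[Presentation] Statistical Machine Learning Based on Nonparametric Bayes (2010)
- Author(s)
  Takaki Makino
- Organizer
  IEICE Technical Committee on Information-Based Induction Sciences and Machine Learning (IBISML)
- Place of Presentation
  Takeda Hall, The University of Tokyo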
- Year and Date
  2010-06-15
- Related Report
  2010 Annual Research Report
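[Presentation] Nonparametric Bayesian Estimation of Hidden Markov Models and MCMC Methods (2010)
- Author(s)
  Takaki Makino
- Organizer
  Workshop "Markov Chain Monte Carlo Methods and Related Topics"
- Place of Presentation
  The Institute of Statistical Mathematics (Tachikawa)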
- Year and Date
  2010-02-21
- Related Report
  2009 Annual Research Report
[Presentation] Conditional simultaneous draws from hierarchical Chinese restaurant processes (2009)
- Author(s)
  Takaki Makino, Shunsuke Takei, Daichi Mochihashi, Issei Sato, Toshihisa Takagi
- Organizer
  Nonparametric Bayes Workshop at NIPS 2009(NPBayes 2009)
- Place of Presentation
  Whistler, BC, Canada
- Year and Date
  2009-12-11
- Related Report
  2009 Annual Research Report
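[Presentation] Hierarchical-State Infinite Hidden Markov Model (2009)
- Author(s)
  Takaki Makino
- Organizer
  Workshop on Information-Based Induction Sciences (IBIS2009), poster presentation
- Place of Presentation
  Fukuoka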
- Year and Date
  2009-10-20
- Related Report
  2010 Final Research Report
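[Presentation] A Fast Parse-Tree Sampling Method for Bayesian Probabilistic Context-Free Grammars (2009)
- Author(s)
  Shunsuke Takei, Takaki Makino, Toshihisa Takagi
- Organizer
  Workshop on Information-Based Induction Sciences (IBIS) 2009
- Place of Presentation
  Kyushu University (Fukuoka)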
- Year and Date
  2009-10-19
- Related Report
  2009 Annual Research Report
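[Presentation] Hierarchical-State Infinite Hidden Markov Model (2009)
- Author(s)
  Takaki Makino
- Organizer
  Workshop on Information-Based Induction Sciences (IBIS) 2009
- Place of Presentation
  Kyushu University (Fukuoka)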
- Year and Date
  2009-10-19
- Related Report
  2009 Annual Research Report
[Presentation] Probabilistic discounting for modeling behaviors in Iowa gambling task (2009)
- Author(s)
  Takaki Makino, Taiki Takahashi, Hirofumi Nishinaka, and Hiroki Fukui
- Organizer
  In Proceedings of Multi-disciplinary Symposium on Reinforcement Learning (MSRL2009)
- Place of Presentation
  Montreal, Canada
- Year and Date
  2009-06-18
- Related Report
  2010 Final Research Report
[Presentation] Simple recurrent temporal-difference networks (2008)
- Author(s)
  Takaki Makino
- Organizer
  Workshop on Information-Based Induction Sciences (IBIS2008)
- Place of Presentation
  Sendai
- Year and Date
  2008-10-29
- Related Report
  2010 Final Research Report
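[Presentation] Automatic Acquisition of TD-Networks in POMDP Environments: An Extension with Simple Recurrent Structure (2008)
- Author(s)
  Takaki Makino
- Organizer
  22nd Annual Conference of the Japanese Society for Artificial Intelligence
- Place of Presentation
  Tokiwa Shimin Hall (Asahikawa)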
- Year and Date
  2008-10-29
- Related Report
  2008 Annual Research Report
[Presentation] Simple Recurrent Temporal-Difference Networks (2008)
- Author(s)
  Takaki Makino
- Organizer
  11th Workshop on Information-Based Induction Sciences
- Place of Presentation
  Sendai International Center
- Year and Date
  2008-10-29
- Related Report
  2008 Annual Research Report
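[Presentation] The Self-Observation Principle: A Mathematical Framework for Cognition of Others (2008)
- Author(s)
  Takaki Makino, Kazuyuki Aihara
- Organizer
  22nd Annual Conference of the Japanese Society for Artificial Intelligence
- Place of Presentation
  Asahikawa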
- Year and Date
  2008-06-13
- Related Report
  2010 Final Research Report
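[Presentation] Automatic Acquisition of TD-Networks in POMDP Environments: An Extension with Simple Recurrent Structure (2008)
- Author(s)
  Takaki Makino
- Organizer
  22nd Annual Conference of the Japanese Society for Artificial Intelligence
- Place of Presentation
  Asahikawa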
- Year and Date
  2008-06-13
- Related Report
  2010 Final Research Report
[Book] Self-Organization of Communication. In Toyoki Kunitake (supervising editor), Handbook of Self-Organization (2009)
- Author(s)
  Takaki Makino
- Publisher
  NTS Publishing
- Related Report
  2010 Final Research Report
[Book] Employing delay and probability discounting frameworks for a neuroeconomic understanding of gambling behavior. In M. J. Esposito, editor, Psychology of Gambling (2008)
- Author(s)
  Taiki Takahashi, Takaki Makino, Yu Ohmura, and Hiroki Fukui
- Publisher
  Nova Science
- Related Report
  2010 Final Research Report
[Remarks]
- URL
  http://www.sat.t.u-tokyo.ac.jp/~mak/
- Related Report
  2010 Final Research Report
