Computational Theory of Inductive Reinforcement Learning: Bayesian Inference on Environment Search and Inductive Reconstruction

Research Project

Project/Area Number 20700126
Research Category

Grant-in-Aid for Young Scientists (B)

Allocation Type Single-year Grants
Research Field Intelligent informatics
Research Institution The University of Tokyo

Principal Investigator

MAKINO Takaki  The University of Tokyo, Institute of Industrial Science, Project Associate Professor (20418651)

Project Period (FY) 2008 – 2010
Project Status Completed (Fiscal Year 2010)
Budget Amount
¥2,340,000 (Direct Cost: ¥1,800,000, Indirect Cost: ¥540,000)
Fiscal Year 2010: ¥650,000 (Direct Cost: ¥500,000, Indirect Cost: ¥150,000)
Fiscal Year 2009: ¥650,000 (Direct Cost: ¥500,000, Indirect Cost: ¥150,000)
Fiscal Year 2008: ¥1,040,000 (Direct Cost: ¥800,000, Indirect Cost: ¥240,000)
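Keywords Reinforcement learning / Restricted Collapsed Draws / Bayesian inference / Apprenticeship learning / Infinite hidden Markov model / Clustering / Chinese restaurant process / TD-Network / Nonparametric Bayes / Inverse reinforcement learning / Hidden Markov model / Hierarchical clustering / Sampling methods / Bayesian estimation / Partially observable Markov decision process / Predictive state representation / Elman network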
Research Abstract

This study addresses the reconstruction of environment models in reinforcement learning using Bayesian inference. In reinforcement learning, an agent learns a model of its environment by trial and error; given a suitable Bayesian environment model that represents the uncertainty in the environment, optimal exploration becomes achievable. To this end, we proposed new approaches that improve the TD-network, an environment description framework based on predictive state representations. In addition, we extended a nonparametric Bayesian model for hidden Markov models so that it represents hierarchical clustering of hidden states. Moreover, we applied the framework of apprenticeship learning and proposed a method that constructs an environment model from another agent's actions using Bayesian inference. These results are building blocks for a Bayesian treatment of the process of environment search and reconstruction.
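
The nonparametric Bayesian hidden Markov models mentioned in the abstract are commonly built on the Chinese restaurant process, which also appears among the keywords and publication titles of this project. As a rough illustration only, and not the project's own method, the following Python sketch draws a clustering from a plain (non-hierarchical) Chinese restaurant process. The function name `crp_assignments` and the parameters `alpha` and `seed` are assumptions introduced for this example.

```python
import random


def crp_assignments(n_customers, alpha, seed=0):
    """Draw table assignments from a Chinese restaurant process.

    Illustrative sketch: customer i joins existing table k with
    probability proportional to the number of customers already at k,
    or opens a new table with probability proportional to alpha (> 0).
    """
    rng = random.Random(seed)
    assignments = []   # table index chosen by each customer
    table_counts = []  # number of customers seated at each table
    for _ in range(n_customers):
        # One weight per existing table, plus alpha for a new table.
        weights = table_counts + [alpha]
        table = rng.choices(range(len(weights)), weights=weights)[0]
        if table == len(table_counts):
            table_counts.append(1)      # open a new table
        else:
            table_counts[table] += 1    # join an existing table
        assignments.append(table)
    return assignments, table_counts


if __name__ == "__main__":
    assignments, counts = crp_assignments(n_customers=20, alpha=1.0)
    print("assignments:", assignments)
    print("table sizes:", counts)
```

The number of tables is not fixed in advance and grows with the data, which is the property that lets such models leave the number of hidden states open; larger values of `alpha` tend to produce more tables.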

Report

(4 results)
  • 2010 Annual Research Report   Final Research Report ( PDF )
  • 2009 Annual Research Report
  • 2008 Annual Research Report
  • Research Products

    (33 results)

Journal Article (13 results) (of which Peer Reviewed: 9 results) Presentation (17 results) Book (2 results) Remarks (1 result)

  • [Journal Article] Apprenticeship learning for model parameters of partially observable environments (2012)

    • Author(s)
      Takaki Makino and Johane Takeuchi
    • Journal Title

      To appear in ICML '12: Proceedings of the 29th Annual International Conference on Machine Learning

    • NAID

      110009545975

    • Related Report
      2010 Final Research Report
    • Peer Reviewed
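  • [Journal Article] Apprenticeship learning for model parameters of partially observable environments (in Japanese) (2012)

    • Author(s)
      Takaki Makino, Johane Takeuchi
    • Journal Title

      IEICE Technical Report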

      Volume: Vol.111, No.480 Pages: 49-54

    • NAID

      110009545975

    • Related Report
      2010 Final Research Report
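  • [Journal Article] Apprenticeship learning for model parameters of partially observable environments (in Japanese) (2012)

    • Author(s)
      Takaki Makino, Johane Takeuchi
    • Journal Title

      IEICE Technical Report (IBISML2011-94)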

      Volume: 111(480) Pages: 49-54

    • NAID

      110009545975

    • Related Report
      2010 Annual Research Report
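  • [Journal Article] Reinforcement learning (My Bookmark) (in Japanese) (2011)

    • Author(s)
      Takaki Makino
    • Journal Title

      Journal of the Japanese Society for Artificial Intelligence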

      Volume: Vol.26, No.3 Pages: 301-303

    • NAID

      110008662160

    • Related Report
      2010 Final Research Report
  • [Journal Article] Altruistic behavior and recursive estimation of others (in Japanese) (2010)

    • Author(s)
      Takaki Makino, Hisao Taki, Kazuyuki Aihara
    • Journal Title

      Seisan Kenkyu

      Volume: Vol.62, No.3 Pages: 259-265

    • NAID

      130000342806

    • Related Report
      2010 Final Research Report
    • Peer Reviewed
  • [Journal Article] Statistical machine learning based on nonparametric Bayes (in Japanese) (2010)

    • Author(s)
      Takaki Makino
    • Journal Title

      IEICE Technical Report IBISML2010-14

      Volume: 110(76) Pages: 87-94

    • NAID

      110008096185

    • Related Report
      2010 Annual Research Report
  • [Journal Article] Cultural neuroeconomics of intertemporal choice (2009)

    • Author(s)
      Taiki Takahashi, Tarik Hadzibeganovic, Sergio A. Cannas, Takaki Makino, Hiroki Fukui, and Shinobu Kitayama
    • Journal Title

      Neuroendocrinology Letters

      Volume: Vol.30, No.2 Pages: 185-191

    • NAID

      130004959898

    • Related Report
      2010 Final Research Report
    • Peer Reviewed
  • [Journal Article] Proto-predictive representation of states with simple recurrent temporal-difference networks (2009)

    • Author(s)
      Takaki Makino
    • Journal Title

      In Leon Bottou and Michael Littman, editors, ICML '09: Proceedings of the 26th Annual International Conference on Machine Learning

      Volume: vol.26 Pages: 697-704

    • Related Report
      2010 Final Research Report
    • Peer Reviewed
  • [Journal Article] Proto-predictive representation of states with simple recurrent temporal-difference networks (2009)

    • Author(s)
      Takaki Makino
    • Journal Title

      Proceedings of the 26th Annual international conference on machine learning 26

      Pages: 697-704

    • Related Report
      2009 Annual Research Report
    • Peer Reviewed
  • [Journal Article] Self-organization of communication (in Japanese) (2009)

    • Author(s)
      Takaki Makino
    • Journal Title

      Self-Organization Handbook (NTS Publishing)

      Pages: 438-443

    • Related Report
      2009 Annual Research Report
    • Peer Reviewed
  • [Journal Article] Proto-Predictive Representation of States with Simple Recurrent Temporal-Difference Networks (2009)

    • Author(s)
      Takaki Makino
    • Journal Title

      Proceedings of the 26th International Conference of Machine Learning (ICML 2009) (in press; accepted for publication)

    • Related Report
      2008 Annual Research Report
    • Peer Reviewed
  • [Journal Article] On-line discovery of temporal-difference networks (2008)

    • Author(s)
      Takaki Makino and Toshihisa Takagi
    • Journal Title

      In Andrew McCallum and Sam Roweis, editors, ICML '08 : Proceedings of the 25th Annual International Conference on Machine Learning

      Volume: vol.25 Pages: 632-639

    • Related Report
      2010 Final Research Report
    • Peer Reviewed
  • [Journal Article] On-line Discovery of Temporal-Difference Networks (2008)

    • Author(s)
      Takaki Makino and Toshihisa Takagi
    • Journal Title

      Proceedings of the 25th International Conference of Machine Learning (ICML 2008)

      Pages: 632-639

    • Related Report
      2008 Annual Research Report
    • Peer Reviewed
  • [Presentation] Hierarchical Nested Infinite Hidden Markov Models (2012)

    • Author(s)
      Takaki Makino
    • Organizer
      Bayesian Inference and Stochastic Computation 2012 workshop
    • Place of Presentation
      Tachikawa
    • Year and Date
      2012-06-22
    • Related Report
      2010 Final Research Report
  • [Presentation] Learning model parameters of partially observable Markov decision process from demonstration (2012)

    • Author(s)
      Takaki Makino and Johane Takeuchi
    • Organizer
      In Proc. of the 2nd International Symposium on Innovative Mathematical Modeling
    • Place of Presentation
      Tokyo
    • Year and Date
      2012-05-13
    • Related Report
      2010 Final Research Report
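  • [Presentation] Apprenticeship learning for model parameters of partially observable environments (in Japanese) (2012)

    • Author(s)
      Takaki Makino, Johane Takeuchi
    • Organizer
      IEICE Technical Committee on Information-Based Induction Sciences and Machine Learning (IBISML)
    • Place of Presentation
      The Institute of Statistical Mathematics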
    • Year and Date
      2012-03-12
    • Related Report
      2010 Annual Research Report
  • [Presentation] Slice sampling for Chinese restaurant process (2010)

    • Author(s)
      Takaki Makino
    • Organizer
      In Proc. of the 2nd Asian Conference on Machine Learning (ACML 2010)
    • Place of Presentation
      Tokyo
    • Year and Date
      2010-11-08
    • Related Report
      2010 Final Research Report
  • [Presentation] Statistical machine learning based on nonparametric Bayes (in Japanese) (2010)

    • Author(s)
      Takaki Makino
    • Organizer
      IEICE Technical Report IBISML2010-14, IEICE
    • Place of Presentation
      Tokyo
    • Year and Date
      2010-06-15
    • Related Report
      2010 Final Research Report
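  • [Presentation] Statistical machine learning based on nonparametric Bayes (in Japanese) (2010)

    • Author(s)
      Takaki Makino
    • Organizer
      IEICE Technical Committee on Information-Based Induction Sciences and Machine Learning (IBISML)
    • Place of Presentation
      Takeda Hall, The University of Tokyo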
    • Year and Date
      2010-06-15
    • Related Report
      2010 Annual Research Report
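  • [Presentation] Nonparametric Bayesian estimation of hidden Markov models and MCMC methods (in Japanese) (2010)

    • Author(s)
      Takaki Makino
    • Organizer
      Workshop on "Markov Chain Monte Carlo Methods and Related Topics"
    • Place of Presentation
      The Institute of Statistical Mathematics (Tachikawa)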
    • Year and Date
      2010-02-21
    • Related Report
      2009 Annual Research Report
  • [Presentation] Conditional simultaneous draws from hierarchical Chinese restaurant processes (2009)

    • Author(s)
      Takaki Makino, Shunsuke Takei, Daichi Mochihashi, Issei Sato, Toshihisa Takagi
    • Organizer
      Nonparametric Bayes Workshop at NIPS 2009(NPBayes 2009)
    • Place of Presentation
      Whistler, BC, Canada
    • Year and Date
      2009-12-11
    • Related Report
      2009 Annual Research Report
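  • [Presentation] Hierarchical-state infinite hidden Markov model (in Japanese) (2009)

    • Author(s)
      Takaki Makino
    • Organizer
      Information-Based Induction Sciences (IBIS2009), poster presentation
    • Place of Presentation
      Fukuoka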
    • Year and Date
      2009-10-20
    • Related Report
      2010 Final Research Report
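  • [Presentation] A fast parse-tree sampling method for Bayesian probabilistic context-free grammars (in Japanese) (2009)

    • Author(s)
      Shunsuke Takei, Takaki Makino, Toshihisa Takagi
    • Organizer
      Information-Based Induction Sciences (IBIS) 2009
    • Place of Presentation
      Kyushu University (Fukuoka)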
    • Year and Date
      2009-10-19
    • Related Report
      2009 Annual Research Report
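  • [Presentation] Hierarchical-state infinite hidden Markov model (in Japanese) (2009)

    • Author(s)
      Takaki Makino
    • Organizer
      Information-Based Induction Sciences (IBIS) 2009
    • Place of Presentation
      Kyushu University (Fukuoka)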
    • Year and Date
      2009-10-19
    • Related Report
      2009 Annual Research Report
  • [Presentation] Probabilistic discounting for modeling behaviors in Iowa gambling task (2009)

    • Author(s)
      Takaki Makino, Taiki Takahashi, Hirofumi Nishinaka, and Hiroki Fukui
    • Organizer
      In Proceedings of Multi-disciplinary Symposium on Reinforcement Learning (MSRL2009)
    • Place of Presentation
      Montreal, Canada
    • Year and Date
      2009-06-18
    • Related Report
      2010 Final Research Report
  • [Presentation] Simple recurrent temporal-difference networks (2008)

    • Author(s)
      Takaki Makino
    • Organizer
      Workshop on Information-Based Induction Sciences (IBIS2008)
    • Place of Presentation
      Sendai
    • Year and Date
      2008-10-29
    • Related Report
      2010 Final Research Report
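  • [Presentation] Automatic acquisition of TD-networks in POMDP environments: an extension with simple recurrent structure (in Japanese) (2008)

    • Author(s)
      Takaki Makino
    • Organizer
      The 22nd Annual Conference of the Japanese Society for Artificial Intelligence
    • Place of Presentation
      Tokiwa Citizens' Hall (Asahikawa)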
    • Year and Date
      2008-10-29
    • Related Report
      2008 Annual Research Report
  • [Presentation] Simple Recurrent Temporal-Difference Networks (2008)

    • Author(s)
      Takaki Makino
    • Organizer
      The 11th Workshop on Information-Based Induction Sciences
    • Place of Presentation
      Sendai International Center
    • Year and Date
      2008-10-29
    • Related Report
      2008 Annual Research Report
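  • [Presentation] The self-observation principle: a mathematical framework for the cognition of others (in Japanese) (2008)

    • Author(s)
      Takaki Makino, Kazuyuki Aihara
    • Organizer
      The 22nd Annual Conference of the Japanese Society for Artificial Intelligence
    • Place of Presentation
      Asahikawa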
    • Year and Date
      2008-06-13
    • Related Report
      2010 Final Research Report
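  • [Presentation] Automatic acquisition of TD-networks in POMDP environments: an extension with simple recurrent structure (in Japanese) (2008)

    • Author(s)
      Takaki Makino
    • Organizer
      The 22nd Annual Conference of the Japanese Society for Artificial Intelligence
    • Place of Presentation
      Asahikawa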
    • Year and Date
      2008-06-13
    • Related Report
      2010 Final Research Report
  • [Book] Self-organization of communication. In Toyoki Kunitake (supervising editor), Self-Organization Handbook (in Japanese) (2009)

    • Author(s)
      Takaki Makino
    • Publisher
      NTS Publishing
    • Related Report
      2010 Final Research Report
  • [Book] Employing delay and probability discounting frameworks for a neuroeconomic understanding of gambling behavior. In M. J. Esposito, editor, Psychology of Gambling (2008)

    • Author(s)
      Taiki Takahashi, Takaki Makino, Yu Ohmura, and Hiroki Fukui
    • Publisher
      Nova Science
    • Related Report
      2010 Final Research Report
  • [Remarks]

    • URL

      http://www.sat.t.u-tokyo.ac.jp/~mak/

    • Related Report
      2010 Final Research Report

Published: 2008-04-01   Modified: 2016-04-21  
