2009 Fiscal Year Final Research Report

On study of temporal difference method in decision process and its application

Research Project

Project/Area Number	19740060
Research Category	Grant-in-Aid for Young Scientists (B)
Allocation Type	Single-year Grants
Research Field	General mathematics (including Probability theory/Statistical mathematics)
Research Institution	Kanagawa University (2008-2009); Yuge National College of Maritime Technology (2007)
Principal Investigator	HORIGUCHI Masayuki, Kanagawa University, Faculty of Engineering, Associate Professor (90366401)
Project Period (FY)	2007 – 2009
Keywords	Markov decision process / mathematical programming / adaptive policy / learning theory / Markov set-chain
Research Abstract	In decision processes under uncertainty, optimal solutions are constructed with dynamic programming (DP) algorithms. Practical problems often have very large state spaces, where DP algorithms cannot be applied directly, so learning algorithms are needed to reduce the amount of computation. Using data from the observed state-action process, a learning algorithm estimates the value function. We consider the temporal difference (TD) method of Neuro-Dynamic Programming (Neuro-DP) and Bayesian interval estimation in Markov decision processes with unknown transition laws. We derive algorithms that construct optimal solutions theoretically. We also treat numerical examples to show the validity of these algorithms and to improve them.
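The abstract contrasts DP, which needs the transition law, with TD learning, which estimates the value function from the observed state process alone. The sketch below is not the project's algorithm. It is a minimal Python illustration of standard TD(0) policy evaluation checked against DP iterative policy evaluation. It assumes a hypothetical three-state chain under one fixed policy, and the names `P`, `R`, `GAMMA`, and the step-size schedule are invented for illustration. It does not cover the Bayesian interval estimation or the adaptive policies studied in the project.

```python
import random

# Hypothetical 3-state chain induced by a fixed policy (illustrative numbers only).
# P[s][t] is the probability of moving from state s to state t; R[s] is the one-step reward.
P = [[0.5, 0.5, 0.0],
     [0.0, 0.5, 0.5],
     [0.5, 0.0, 0.5]]
R = [1.0, 0.0, 2.0]
GAMMA = 0.9  # discount factor (assumed)


def dp_policy_evaluation(P, R, gamma, tol=1e-10):
    """DP with a known model: iterate V <- R + gamma * P V until successive iterates agree."""
    n = len(R)
    V = [0.0] * n
    while True:
        V_new = [R[s] + gamma * sum(P[s][t] * V[t] for t in range(n)) for s in range(n)]
        if max(abs(a - b) for a, b in zip(V, V_new)) < tol:
            return V_new
        V = V_new


def sample_next(P, s, rng):
    """Simulate one transition from state s; the TD learner only sees these samples."""
    return rng.choices(range(len(P[s])), weights=P[s])[0]


def td0_policy_evaluation(P, R, gamma, steps=200_000, seed=0):
    """TD(0): move V(s) toward the target r + gamma * V(s') along one simulated trajectory.
    P is used only to generate samples, never inside the update itself."""
    rng = random.Random(seed)
    n = len(R)
    V = [0.0] * n
    visits = [0] * n
    s = 0
    for _ in range(steps):
        s_next = sample_next(P, s, rng)
        visits[s] += 1
        alpha = 1.0 / visits[s] ** 0.6  # decreasing per-state step size (assumed schedule)
        td_error = R[s] + gamma * V[s_next] - V[s]
        V[s] += alpha * td_error
        s = s_next
    return V


if __name__ == "__main__":
    # DP values are approximately [9.402, 9.269, 11.329]; TD(0) should land close to them.
    print("DP :", [round(v, 3) for v in dp_policy_evaluation(P, R, GAMMA)])
    print("TD :", [round(v, 3) for v in td0_policy_evaluation(P, R, GAMMA)])
```

The step size 1/n^0.6 is one common choice. Its sum diverges and the sum of its squares converges, which is the usual stochastic-approximation condition. A constant step size would also work, but it would leave residual noise in the estimates.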

Research Products
(16 results)

Journal Articles (9 results, of which 3 peer reviewed); Presentations (6 results); Remarks (1 result)

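[Journal Article] Interval Bayesian methods for Markov decision processes under uncertainty (2009)
- Author(s)
  伊喜哲一郎、堀口正之、安田正實、蔵野正美
- Journal Title
  
  RIMS Kôkyûroku (Research Institute for Mathematical Sciences, Kyoto University) 1636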
  
  Pages: 1-8
[Journal Article] Fuzzy Metric Clustering Based on Dynamic Programming (2009)
- Author(s)
  岩村覚三、堀口正之、堀池真琴
- Journal Title
  
  RIMS Kôkyûroku (Research Institute for Mathematical Sciences, Kyoto University) 1630
  
  Pages: 77-88
[Journal Article] A pattern-matrix learning algorithm for adaptive MDPs: The regularly communicating case (2008)
- Author(s)
  伊喜哲一郎、堀口正之、蔵野正美、安田正實
- Journal Title
  
  RIMS Kôkyûroku (Research Institute for Mathematical Sciences, Kyoto University) 1589
  
  Pages: 110-119
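[Journal Article] Adaptive quality control by interval Bayesian estimation (2008)
- Author(s)
  佐々木稔、堀口正之、蔵野正美
- Journal Title
  
  RIMS Kôkyûroku (Research Institute for Mathematical Sciences, Kyoto University) 1589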
  
  Pages: 120-129
[Journal Article] Adaptive Algorithms for Markov Decision Processes (2008)
- Author(s)
  堀口正之
- Journal Title
  
  Bulletin of the Research Institute for Engineering, Kanagawa University
  
  Pages: 22-29
[Journal Article] A structured pattern matrix algorithm for multichain Markov decision processes (2007)
- Author(s)
  T. Iki, M. Horiguchi, M. Kurano
- Journal Title
  
  Mathematical Methods of Operations Research 66
  
  Pages: 545-555
- Peer Reviewed
[Journal Article] A learning algorithm for communicating Markov decision processes with unknown transition matrices (2007)
- Author(s)
  T. Iki, M. Horiguchi, M. Yasuda, M. Kurano
- Journal Title
  
  Bulletin of Information and Cybernetics 39
  
  Pages: 11-24
- Peer Reviewed
[Journal Article] Temporal Difference-Based Adaptive Policies in Neuro Dynamic Programming (2007)
- Author(s)
  T. Iki, M. Horiguchi, M. Yasuda, M. Kurano
- Journal Title
  
  Proceedings of the 4th International Conference on Modeling Decisions for Artificial Intelligence (MDAI), Vicenç Torra, Yasuo Narukawa, Yuji Yoshida (Eds.) (CD-ROM Proceedings)
  
  Pages: 112-122
- Peer Reviewed
[Journal Article] A learning algorithm of TD method for Markov decision processes (2007)
- Author(s)
  堀口正之、蔵野正美、安田正實
- Journal Title
  
  RIMS Kôkyûroku (Research Institute for Mathematical Sciences, Kyoto University) 1559
  
  Pages: 34-49
[Presentation] Uncertain Markov decision processes and Bayesian intervals (2010)
- Author(s)
  堀口正之
- Organizer
  Mathematical Society of Japan, 2010 Annual Meeting, Statistical Mathematics Section
- Place of Presentation
  Keio University
- Year and Date
  2010-03-26
[Presentation] On bounds for Bayes estimate intervals in uncertain MDPs (2009)
- Author(s)
  堀口正之
- Organizer
  Mathematical Society of Japan, 2009 Autumn Meeting
- Place of Presentation
  Osaka University
- Year and Date
  2009-09-27
[Presentation] Bayesian approach to uncertain MDPs with intervals of prior measures (2009)
- Author(s)
  堀口正之
- Organizer
  Mathematical Society of Japan, 2009 Annual Meeting, Statistical Mathematics Section
- Place of Presentation
  The University of Tokyo
- Year and Date
  2009-03-27
[Presentation] Adaptive algorithm for MDPs using pattern matrix learning method (2008)
- Author(s)
  堀口正之
- Organizer
  Mathematical Society of Japan, 2008 Autumn Meeting, Statistical Mathematics Section
- Place of Presentation
  Tokyo Institute of Technology
- Year and Date
  2008-09-27
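[Presentation] On learning algorithms for Markov decision processes with unknown transition laws (2007)
- Author(s)
  堀口正之
- Organizer
  Mathematical Society of Japan, 117th Kyushu Branch Meeting
- Place of Presentation
  University of Miyazaki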
- Year and Date
  2007-10-13
[Presentation] Adaptive Markov decision processes based on temporal difference method (2007)
- Author(s)
  堀口正之
- Organizer
  Mathematical Society of Japan, 2007 Autumn Meeting, Statistical Mathematics Section
- Place of Presentation
  Tohoku University
- Year and Date
  2007-09-24
[Remarks]
- URL
  http://www.math.kanagawa-u.ac.jp/~horiguchi
