On study of temporal difference method in decision process and its application

Research Project

Project/Area Number	19740060
Research Category	Grant-in-Aid for Young Scientists (B)
Allocation Type	Single-year Grants
Research Field	General mathematics (including Probability theory/Statistical mathematics)
Research Institution	Kanagawa University (2008-2009) Yuge National College of Maritime Technology (2007)
Principal Investigator	HORIGUCHI Masayuki Kanagawa University, Faculty of Engineering, Associate Professor (90366401)
Project Period (FY)	2007 – 2009
Project Status	Completed (Fiscal Year 2009)
Budget Amount	¥2,760,000 (Direct Cost: ¥2,400,000, Indirect Cost: ¥360,000) Fiscal Year 2009: ¥780,000 (Direct Cost: ¥600,000, Indirect Cost: ¥180,000) Fiscal Year 2008: ¥780,000 (Direct Cost: ¥600,000, Indirect Cost: ¥180,000) Fiscal Year 2007: ¥1,200,000 (Direct Cost: ¥1,200,000)
Keywords	Markov decision process / mathematical programming / adaptive policy / learning theory / Markov set-chain / interval Bayesian estimation / credible interval
Research Abstract	In decision processes under uncertainty, optimal solutions are constructed with dynamic programming (DP) algorithms. For practical problems with very large state spaces, however, DP cannot be applied directly, so the amount of computation required by the learning algorithm must be reduced. The value function is therefore estimated by a learning algorithm from observed data on the state-action process. We study the temporal difference method of Neuro-Dynamic Programming (Neuro-DP) and Bayesian interval estimation in Markov decision processes with an unknown transition law, and derive, on theoretical grounds, algorithms for constructing optimal solutions. We also work through numerical examples to demonstrate the validity of the algorithms and to improve them.
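
As an illustration of the kind of learning algorithm the abstract refers to, the following is a minimal sketch of tabular TD(0) policy evaluation. It is a generic textbook form, not the algorithm derived in this project. The function names, the two-state toy chain, and the parameter values (`gamma`, `alpha`, `n_steps`) are all assumptions made for the example.

```python
def td0_policy_evaluation(step, policy, n_states, start=0,
                          gamma=0.5, alpha=0.5, n_steps=2000):
    """Estimate the value function of `policy` by tabular TD(0).

    step(state, action) -> (reward, next_state) is the only access to the
    transition law, which is treated as unknown and never inspected directly.
    """
    values = [0.0] * n_states
    state = start
    for _ in range(n_steps):
        reward, next_state = step(state, policy[state])
        # TD error: one-step bootstrapped target minus the current estimate.
        td_error = reward + gamma * values[next_state] - values[state]
        values[state] += alpha * td_error
        state = next_state
    return values


# Hypothetical two-state toy chain (not from the project): the single
# action moves 0 -> 1 with reward 1 and 1 -> 0 with reward 0.
def toy_step(state, action):
    return (1.0, 1) if state == 0 else (0.0, 0)


toy_policy = [0, 0]  # one action index per state
estimates = td0_policy_evaluation(toy_step, toy_policy, n_states=2)
print(estimates)  # approaches the exact values [4/3, 2/3] for gamma = 0.5
```

The toy chain is deterministic only so that the result is reproducible. With a stochastic `step`, a decreasing step size would normally replace the constant `alpha`.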

Report

(4 results)

2009 Annual Research Report Final Research Report ( PDF )
2008 Annual Research Report
2007 Annual Research Report

Research Products
(28 results)

Journal Article (17 results, of which Peer Reviewed: 6 results) Presentation (10 results) Remarks (1 result)

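[Journal Article] An interval Bayesian method for Markov decision processes under uncertainty (2009)
- Author(s)
  T. Iki, M. Horiguchi, M. Yasuda, M. Kurano
- Journal Title
  RIMS Kokyuroku (Research Institute for Mathematical Sciences, Kyoto University) 1636
  Pages: 1-8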
- Related Report
  2009 Annual Research Report 2009 Final Research Report
[Journal Article] Fuzzy Metric Clustering Based on Dynamic Programming (2009)
- Author(s)
  K. Iwamura, M. Horiguchi, M. Horiike
- Journal Title
  RIMS Kokyuroku (Research Institute for Mathematical Sciences, Kyoto University) 1630
  Pages: 77-88
- Related Report
  2009 Final Research Report
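[Journal Article] Fuzzy Metric Clustering Based on Dynamic Programming (2009)
- Author(s)
  K. Iwamura, M. Horiguchi, M. Horiike
- Journal Title
  RIMS Kokyuroku (Research Institute for Mathematical Sciences, Kyoto University) 1630, "Mathematics and Information of Non-additivity: Non-additivity and Convex Analysis"
  Pages: 77-88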
- Related Report
  2008 Annual Research Report
[Journal Article] A pattern-matrix learning algorithm for adaptive MDPs: The regularly communicating case (2008)
- Author(s)
  T. Iki, M. Horiguchi, M. Kurano, M. Yasuda
- Journal Title
  RIMS Kokyuroku (Research Institute for Mathematical Sciences, Kyoto University) 1589
  Pages: 110-119
- Related Report
  2009 Final Research Report
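[Journal Article] Adaptive quality control via interval Bayesian estimation (2008)
- Author(s)
  M. Sasaki, M. Horiguchi, M. Kurano
- Journal Title
  RIMS Kokyuroku (Research Institute for Mathematical Sciences, Kyoto University) 1589
  Pages: 120-129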
- Related Report
  2009 Final Research Report
[Journal Article] Adaptive Algorithms for Markov Decision Processes (2008)
- Author(s)
  M. Horiguchi
- Journal Title
  Bulletin of the Research Institute for Engineering, Kanagawa University
  Pages: 22-29
- Related Report
  2009 Final Research Report
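[Journal Article] A pattern-matrix learning algorithm for adaptive MDPs: The regularly communicating case (2008)
- Author(s)
  T. Iki, M. Horiguchi, M. Kurano, M. Yasuda
- Journal Title
  RIMS Kokyuroku (Research Institute for Mathematical Sciences, Kyoto University) 1589, "Theory and Applications of Decision Making under Uncertainty"
  Pages: 110-119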
- Related Report
  2008 Annual Research Report
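[Journal Article] Adaptive quality control via interval Bayesian estimation (2008)
- Author(s)
  M. Sasaki, M. Horiguchi, M. Kurano
- Journal Title
  RIMS Kokyuroku (Research Institute for Mathematical Sciences, Kyoto University) 1589, "Theory and Applications of Decision Making under Uncertainty"
  Pages: 120-129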
- Related Report
  2008 Annual Research Report
[Journal Article] Adaptive Algorithms for Markov Decision Processes (2008)
- Author(s)
  M. Horiguchi
- Journal Title
  Bulletin of the Research Institute for Engineering, Kanagawa University 31
  Pages: 22-29
- Related Report
  2008 Annual Research Report
[Journal Article] A structured pattern matrix algorithm for multichain Markov decision processes (2007)
- Author(s)
  T. Iki, M. Horiguchi, M. Kurano
- Journal Title
  Mathematical Methods of Operations Research 66
  Pages: 545-555
- Related Report
  2009 Final Research Report
- Peer Reviewed
[Journal Article] A learning algorithm for communicating Markov decision processes with unknown transition matrices (2007)
- Author(s)
  T. Iki, M. Horiguchi, M. Yasuda, M. Kurano
- Journal Title
  Bulletin of Information and Cybernetics 39
  Pages: 11-24
- NAID
  120001944229
- Related Report
  2009 Final Research Report
- Peer Reviewed
[Journal Article] Temporal Difference-Based Adaptive Policies in Neuro Dynamic Programming (2007)
- Author(s)
  T. Iki, M. Horiguchi, M. Yasuda, M. Kurano
- Journal Title
  Proceedings of the 4th International Conference on Modeling Decisions for Artificial Intelligence (MDAI), Vicenc Torra, Yasuo Narukawa, Yuji Yoshida (Eds.) (CD-ROM Proceedings)
  Pages: 112-122
- Related Report
  2009 Final Research Report
- Peer Reviewed
[Journal Article] A learning algorithm of TD method for Markov decision processes (2007)
- Author(s)
  M. Horiguchi, M. Kurano, M. Yasuda
- Journal Title
  RIMS Kokyuroku (Research Institute for Mathematical Sciences, Kyoto University) 1559
  Pages: 34-49
- Related Report
  2009 Final Research Report
[Journal Article] "A structured pattern matrix algorithm for multichain Markov decision processes" (2007)
- Author(s)
  T. Iki, M. Horiguchi, M. Kurano
- Journal Title
  Mathematical Methods of Operations Research 66
  Pages: 545-555
- Related Report
  2007 Annual Research Report
- Peer Reviewed
[Journal Article] "A learning algorithm for communicating Markov decision processes with unknown transition matrices" (2007)
- Author(s)
  T. Iki, M. Horiguchi, M. Yasuda, M. Kurano
- Journal Title
  Bulletin of Information and Cybernetics 39
  Pages: 11-24
- Related Report
  2007 Annual Research Report
- Peer Reviewed
[Journal Article] Temporal Difference-Based Adaptive Policies in Neuro Dynamic Programming (2007)
- Author(s)
  T. Iki, M. Horiguchi, M. Yasuda, M. Kurano
- Journal Title
  In: Proceedings of the 4th International Conference on Modeling Decisions for Artificial Intelligence (MDAI 2007), Vicenc Torra, Yasuo Narukawa, Yuji Yoshida (Eds.), CD-ROM Proceedings, ISBN 978-84-00-08359-1
  Pages: 112-122
- Related Report
  2007 Annual Research Report
- Peer Reviewed
[Journal Article] "A learning algorithm of TD method for Markov decision processes" (2007)
- Author(s)
  M. Horiguchi, M. Kurano, M. Yasuda
- Journal Title
  RIMS Kokyuroku (Research Institute for Mathematical Sciences, Kyoto University) 1559, "Development and Applications of Stochastic Models in Optimization Problems"
  Pages: 34-49
- Related Report
  2007 Annual Research Report
[Presentation] Uncertain Markov decision processes and Bayesian intervals (2010)
- Author(s)
  M. Horiguchi
- Organizer
  Mathematical Society of Japan, 2010 Annual Meeting, Statistics and Probability Section
- Place of Presentation
  Keio University
- Year and Date
  2010-03-26
- Related Report
  2009 Final Research Report
[Presentation] Uncertain Markov decision processes and Bayesian intervals (2010)
- Author(s)
  M. Horiguchi
- Organizer
  Mathematical Society of Japan
- Place of Presentation
  Keio University, Yagami Campus
- Year and Date
  2010-03-26
- Related Report
  2009 Annual Research Report
[Presentation] On bounds for Bayes estimate intervals in uncertain MDPs (2009)
- Author(s)
  M. Horiguchi
- Organizer
  Mathematical Society of Japan, 2009 Autumn Meeting
- Place of Presentation
  Osaka University
- Year and Date
  2009-09-27
- Related Report
  2009 Final Research Report
[Presentation] On bounds for Bayes estimate intervals in uncertain MDPs (2009)
- Author(s)
  M. Horiguchi, M. Yasuda
- Organizer
  Mathematical Society of Japan
- Place of Presentation
  Osaka University, Toyonaka Campus
- Year and Date
  2009-09-27
- Related Report
  2009 Annual Research Report
[Presentation] Bayesian approach to uncertain MDPs with intervals of prior measures (2009)
- Author(s)
  M. Horiguchi
- Organizer
  Mathematical Society of Japan, 2009 Annual Meeting, Statistics and Probability Section
- Place of Presentation
  University of Tokyo
- Year and Date
  2009-03-27
- Related Report
  2009 Final Research Report 2008 Annual Research Report
[Presentation] Adaptive algorithm for MDPs using pattern matrix learning method (2008)
- Author(s)
  M. Horiguchi
- Organizer
  Mathematical Society of Japan, 2008 Autumn Meeting, Statistics and Probability Section
- Place of Presentation
  Tokyo Institute of Technology
- Year and Date
  2008-09-27
- Related Report
  2009 Final Research Report 2008 Annual Research Report
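[Presentation] On learning algorithms for Markov decision processes with an unknown transition law (2007)
- Author(s)
  M. Horiguchi
- Organizer
  Mathematical Society of Japan, 117th Kyushu Branch Regular Meeting
- Place of Presentation
  University of Miyazaki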
- Year and Date
  2007-10-13
- Related Report
  2009 Final Research Report
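[Presentation] "On learning algorithms for Markov decision processes with an unknown transition law" (2007)
- Author(s)
  Presenter: M. Horiguchi; Collaborator: T. Iki
- Organizer
  Mathematical Society of Japan, 117th Kyushu Branch Regular Meeting
- Place of Presentation
  University of Miyazaki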
- Year and Date
  2007-10-13
- Related Report
  2007 Annual Research Report
[Presentation] Adaptive Markov decision processes based on temporal difference method (2007)
- Author(s)
  M. Horiguchi
- Organizer
  Mathematical Society of Japan, 2007 Autumn Meeting, Statistics and Probability Section
- Place of Presentation
  Tohoku University
- Year and Date
  2007-09-24
- Related Report
  2009 Final Research Report
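[Presentation] "Adaptive Markov decision processes based on temporal difference method" (2007)
- Author(s)
  Presenter: M. Horiguchi; Collaborators: T. Iki, M. Kurano, M. Yasuda
- Organizer
  Mathematical Society of Japan, 2007 Autumn Meeting, Statistics and Probability Section
- Place of Presentation
  Tohoku University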
- Year and Date
  2007-09-24
- Related Report
  2007 Annual Research Report
[Remarks]
- URL
  http://www.math.kanagawa-u.ac.jp/~horiguchi
- Related Report
  2009 Final Research Report
