2007 Fiscal Year Annual Research Report

Research on Temporal Difference Control in Decision Processes and Its Applications

Research Project

Project/Area Number	19740060
Research Institution	Yuge National College of Maritime Technology
Principal Investigator	Masayuki Horiguchi, Yuge National College of Maritime Technology, Department of General Education, Associate Professor (90366401)
Keywords	Markov decision process / mathematical planning / adaptive policy / learning theory
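Research Abstract	In this fiscal year, we studied learning algorithms for Markov decision processes with finitely many states and unknown transition laws. We also studied whether optimal adaptive policies based on the temporal difference (TD) method exist for these processes. The evaluation criterion is maximization of the average expected reward. We considered two cases for the family of transition laws. In case (1), every state can move to every other state in one period with positive probability. In case (2), there is a subset of the state set in which, for any two states, some decisions give a path from each state to the other (a communicating class), and every state in the complement is transient (a transient class). For case (1), we estimated the transition law at each period by maximum likelihood from the observed history. As a TD-based adaptive decision rule for the per-period evaluation function, we introduced a modified greedy policy, and we proved that the resulting adaptive policy is optimal. For case (2), we applied an algorithm obtained in earlier research that learns the structure of the state set from the observed transitions of the Markov chain. We combined it with an approximation theory derived from the discounted-reward optimization problem and with a learning algorithm that follows a greedy policy, and showed that an optimal adaptive policy can be constructed. We also ran numerical simulations of the learning algorithm, which confirmed its effectiveness. These results show how to construct optimal adaptive policies for two decision models with incomplete information, and demonstrate that the construction is effective.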
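The report does not give pseudocode. The Python sketch below illustrates case (1) only. It implements a generic certainty-equivalence adaptive policy in three steps:

- Transition probabilities are estimated by maximum likelihood from observed counts.
- The average-reward optimality equation of the estimated model is solved by relative value iteration.
- An action is chosen greedily, with a decaying probability of random exploration.

Several parts are assumptions made for illustration: the function names, the uniform fallback row for unvisited state-action pairs, the exploration schedule eps0/sqrt(t), and the two-state example model. This is not the authors' modified greedy policy, whose exact form is not stated here.

```python
import numpy as np


def mle_transitions(counts):
    """Maximum-likelihood estimate of P(s' | s, a) from transition counts.

    counts has shape (S, A, S). Unvisited (s, a) pairs get a uniform row;
    this fallback is an assumption made for the sketch.
    """
    S = counts.shape[0]
    totals = counts.sum(axis=2, keepdims=True)
    uniform = np.full(counts.shape, 1.0 / S)
    return np.where(totals > 0, counts / np.maximum(totals, 1), uniform)


def relative_value_iteration(P, r, iters=500, tol=1e-9):
    """Approximately solve g + h(s) = max_a [r(s,a) + sum_s' P(s'|s,a) h(s')].

    P has shape (S, A, S) and r has shape (S, A). State 0 is the reference
    state, so h[0] = 0. Convergence assumes an aperiodic model, which the
    case-(1) assumption (all one-step transitions positive) gives for the
    true law; `iters` caps the work if an early estimate is periodic.
    """
    h = np.zeros(P.shape[0])
    g = 0.0
    for _ in range(iters):
        v = (r + P @ h).max(axis=1)  # Bellman update, shape (S,)
        g, h_new = v[0], v - v[0]    # gain estimate and relative values
        if np.max(np.abs(h_new - h)) < tol:
            return g, h_new
        h = h_new
    return g, h


def adaptive_greedy(P_true, r, steps=2000, eps0=1.0, seed=0):
    """Certainty-equivalence greedy control of an MDP whose law P_true is unknown.

    The exploration probability eps0 / sqrt(t) is a hypothetical choice, not
    the report's modified greedy rule. Returns the average reward and counts.
    """
    rng = np.random.default_rng(seed)
    S, A = r.shape
    counts = np.zeros((S, A, S))
    s, total = 0, 0.0
    for t in range(1, steps + 1):
        P_hat = mle_transitions(counts)            # estimate from history
        _, h = relative_value_iteration(P_hat, r)  # solve the estimated model
        if rng.random() < eps0 / np.sqrt(t):
            a = int(rng.integers(A))               # forced exploration
        else:
            a = int(np.argmax(r[s] + P_hat[s] @ h))  # greedy w.r.t. estimated h
        s_next = rng.choice(S, p=P_true[s, a])     # the environment moves
        counts[s, a, s_next] += 1
        total += r[s, a]
        s = s_next
    return total / steps, counts


# Hypothetical two-state, two-action example with all transitions positive.
P_TRUE = np.array([[[0.9, 0.1], [0.5, 0.5]],
                   [[0.8, 0.2], [0.2, 0.8]]])
R = np.array([[1.0, 0.0],
              [0.0, 0.0]])

if __name__ == "__main__":
    g_opt, _ = relative_value_iteration(P_TRUE, R)  # 8/9 for this model
    avg, _ = adaptive_greedy(P_TRUE, R)
    print(f"optimal gain {g_opt:.4f}, adaptive average reward {avg:.4f}")
```

In this toy model, the optimal stationary policy takes action 0 in both states, which gives gain 8/9 (about 0.889). The adaptive run should come close to that value, falling slightly short because of the exploration steps.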

Research Products
(6 results, all from 2007)

Journal Article (4 results, of which Peer Reviewed: 3 results) / Presentation (2 results)

[Journal Article] "A structured pattern matrix algorithm for multichain Markov decision processes"2007
- Author(s)
  T. Iki, M. Horiguchi, M. Kurano
- Journal Title
  
  Mathematical Methods of Operations Research 66
  
  Pages: 545-555
- Peer Reviewed
[Journal Article] "A learning algorithm for communicating Markov decision processes with unknown transition matrices"2007
- Author(s)
  T. Iki, M. Horiguchi, M. Yasuda, M. Kurano
- Journal Title
  
  Bulletin of Information and Cybernetics 39
  
  Pages: 11-24
- Peer Reviewed
[Journal Article] "Temporal Difference-Based Adaptive Policies in Neuro Dynamic Programming"2007
- Author(s)
  T. Iki, M. Horiguchi, M. Yasuda, M. Kurano
- Journal Title
  
  Proceedings of the 4th International Conference on Modeling Decisions for Artificial Intelligence (MDAI 2007), CD-ROM Proceedings, Vicenç Torra, Yasuo Narukawa, Yuji Yoshida (Eds.), ISBN 978-84-00-08359-1
  
  Pages: 112-122
- Peer Reviewed
[Journal Article] "A learning algorithm of TD method for Markov decision processes"2007
- Author(s)
  M. Horiguchi, M. Kurano, M. Yasuda
- Journal Title
  
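  RIMS Kôkyûroku (Research Institute for Mathematical Sciences, Kyoto University), "Development and Applications of Stochastic Models in Optimization Problems" 1559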
  
  Pages: 34-49
[Presentation] "On a learning algorithm for Markov decision processes with unknown transition laws"2007
- Author(s)
  Presenter: Masayuki Horiguchi; co-researcher: Tetsuichiro Iki
- Organizer
  Mathematical Society of Japan, 117th Regular Meeting of the Kyushu Branch
- Place of Presentation
  University of Miyazaki
- Year and Date
  2007-10-13
[Presentation] "Adaptive Markov decision processes based on temporal difference method"2007
- Author(s)
  Presenter: Masayuki Horiguchi; co-researchers: Tetsuichiro Iki, Masami Kurano, Masami Yasuda
- Organizer
  Mathematical Society of Japan, 2007 Autumn Meeting, Statistical Mathematics Session
- Place of Presentation
  Tohoku University
- Year and Date
  2007-09-24
