2009 Fiscal Year Final Research Report
On study of temporal difference method in decision process and its application
Project/Area Number | 19740060 |
Research Category | Grant-in-Aid for Young Scientists (B) |
Allocation Type | Single-year Grants |
Research Field | General mathematics (including Probability theory/Statistical mathematics) |
Research Institution | Kanagawa University (2008-2009); Yuge National College of Maritime Technology (2007) |
Principal Investigator | HORIGUCHI Masayuki, Kanagawa University, Faculty of Engineering, Associate Professor (90366401) |
Project Period (FY) | 2007 – 2009 |
Keywords | Markov decision processes / Mathematical programming / Adaptive policy / Learning theory / Markov set-chains |
Research Abstract |
In decision processes under uncertainty, optimal solutions are constructed with dynamic programming (DP) algorithms. To solve practical problems involving a very large state space, we need to reduce the amount of computation required by the learning algorithm, since DP algorithms cannot be applied directly. The value function is instead estimated by a learning algorithm from data on the state-action process. We consider the temporal difference method of neuro-dynamic programming (Neuro-DP) and Bayesian interval estimation in Markov decision processes with an unknown transition law. We derive algorithms for constructing optimal solutions theoretically. We also treat numerical examples to show the validity of the algorithms and to improve them.
|
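As a rough illustration of the temporal difference idea named in the abstract, the sketch below estimates a value function from sampled transitions alone, without using the transition law. It is not the algorithm developed in this project: it is the standard tabular TD(0) update, whereas Neuro-DP typically relies on function approximation. The five-state random walk, the function name `td0_random_walk`, and all parameter values are assumptions chosen only for this example.

```python
import random


def td0_random_walk(num_states=5, episodes=10000, alpha=0.01, gamma=1.0, seed=0):
    """Tabular TD(0) value estimation on a hypothetical random-walk chain.

    States 0..num_states-1 are non-terminal. Each step moves left or right
    with equal probability. Leaving on the right pays reward 1, leaving on
    the left pays reward 0, and either exit ends the episode.
    """
    rng = random.Random(seed)
    values = [0.5] * num_states  # arbitrary initial guess
    for _ in range(episodes):
        state = num_states // 2  # every episode starts in the middle
        while True:
            next_state = state + rng.choice((-1, 1))
            if next_state < 0:
                reward, next_value, done = 0.0, 0.0, True
            elif next_state >= num_states:
                reward, next_value, done = 1.0, 0.0, True
            else:
                reward, next_value, done = 0.0, values[next_state], False
            # TD(0) update: move V(s) toward the one-step bootstrapped target.
            td_error = reward + gamma * next_value - values[state]
            values[state] += alpha * td_error
            if done:
                break
            state = next_state
    return values


estimates = td0_random_walk()
# For this particular chain the exact values are known in closed form.
true_values = [(i + 1) / 6 for i in range(5)]
```

The update uses only observed transitions `(state, reward, next_state)`, which is why methods of this kind remain applicable when the transition law is unknown. A constant step size `alpha` keeps the sketch short; the estimates then fluctuate around the true values rather than converging exactly.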