On study of temporal difference method in decision process and its application
Project/Area Number |
19740060
|
Research Category |
Grant-in-Aid for Young Scientists (B)
|
Allocation Type | Single-year Grants |
Research Field |
General mathematics (including Probability theory/Statistical mathematics)
|
Research Institution | Kanagawa University (2008-2009); Yuge National College of Maritime Technology (2007) |
Principal Investigator |
HORIGUCHI Masayuki Kanagawa University, Faculty of Engineering, Associate Professor (90366401)
|
Project Period (FY) |
2007 – 2009
|
Project Status |
Completed (Fiscal Year 2009)
|
Budget Amount |
¥2,760,000 (Direct Cost: ¥2,400,000, Indirect Cost: ¥360,000)
Fiscal Year 2009: ¥780,000 (Direct Cost: ¥600,000, Indirect Cost: ¥180,000)
Fiscal Year 2008: ¥780,000 (Direct Cost: ¥600,000, Indirect Cost: ¥180,000)
Fiscal Year 2007: ¥1,200,000 (Direct Cost: ¥1,200,000)
|
Keywords | Markov decision process / Mathematical programming / Adaptive policy / Learning theory / Markov set-chain / Interval Bayesian estimation / Credible interval |
Research Abstract |
In decision processes under uncertainty, optimal solutions are constructed with dynamic programming (DP) algorithms. For practical problems with very large state spaces, DP cannot be applied directly, so the amount of computation required by the learning algorithm must be reduced. The value function is instead estimated by a learning algorithm from observed data of the state-action process. We consider the temporal difference method of neuro-dynamic programming (Neuro-DP) and Bayesian interval estimation in Markov decision processes with an unknown transition law. We derive, on a theoretical basis, algorithms for constructing optimal solutions. We also work through numerical examples to show the validity of the algorithms and to improve them.
|
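As a rough illustration of the temporal difference idea named in the abstract, the sketch below estimates a value function purely from sampled transitions, without using the transition law. It is a minimal tabular TD(0) example, not the project's algorithm: the function names `td0_evaluate` and `random_walk_step`, the five-state random walk, and the step size and episode count are all assumptions chosen for illustration.

```python
import random


def td0_evaluate(sample_step, n_states, start_state, episodes=20000,
                 alpha=0.02, gamma=1.0, seed=0):
    """Tabular TD(0) policy evaluation using sampled transitions only."""
    rng = random.Random(seed)
    values = [0.0] * n_states
    for _ in range(episodes):
        state = start_state
        while state is not None:  # None marks a terminal state
            next_state, reward = sample_step(state, rng)
            # Terminal states have value 0 by convention.
            next_value = 0.0 if next_state is None else values[next_state]
            # Temporal difference error: one-step target minus current estimate.
            td_error = reward + gamma * next_value - values[state]
            values[state] += alpha * td_error
            state = next_state
    return values


def random_walk_step(state, rng, n_states=5):
    """Hypothetical simulator: symmetric random walk on states 0..4.

    Leaving on the right pays reward 1, leaving on the left pays 0.
    The learner never sees these probabilities, only the samples.
    """
    nxt = state + rng.choice((-1, 1))
    if nxt < 0:
        return None, 0.0
    if nxt >= n_states:
        return None, 1.0
    return nxt, 0.0


estimates = td0_evaluate(random_walk_step, n_states=5, start_state=2)
# For this walk the exact values are known in closed form: V(i) = (i + 1) / 6.
true_values = [(i + 1) / 6 for i in range(5)]
```

Comparing `estimates` with `true_values` gives a simple check that the learned values approach the exact ones. A constant step size is used here for simplicity, so the estimates keep fluctuating slightly around the true values rather than converging exactly.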
Report
(4 results)
Research Products
(28 results)