Studies on Learning Algorithms for Flexibly Structured Decision Process Models
Project/Area Number |
18540111
|
Research Category |
Grant-in-Aid for Scientific Research (C)
|
Allocation Type | Single-year Grants |
Section | General |
Research Field |
General mathematics (including Probability theory/Statistical mathematics)
|
Research Institution | Chiba University |
Principal Investigator |
KURANO Masami Chiba University, Faculty of Education, Professor (70029487)
|
Co-Investigator (Kenkyū-buntansha) |
YASUDA Masami Chiba University, Faculty of Science, Professor (00041244)
NAKAGAMI Jun-ichi Chiba University, Faculty of Science, Professor (30092076)
KADOTA Yoshinobu Wakayama University, Faculty of Education, Professor (90116294)
YOSHIDA Yuji University of Kitakyushu, Faculty of Economics and Business Administration, Professor (90192426)
IWAMURA Kakuzo Josai University, Faculty of Science, Lecturer (00077918)
|
Project Period (FY) |
2006 – 2007
|
Project Status |
Completed (Fiscal Year 2007)
|
Budget Amount |
¥2,930,000 (Direct Cost: ¥2,600,000, Indirect Cost: ¥330,000)
Fiscal Year 2007: ¥1,430,000 (Direct Cost: ¥1,100,000, Indirect Cost: ¥330,000)
Fiscal Year 2006: ¥1,500,000 (Direct Cost: ¥1,500,000)
|
Keywords | Flexibly structured model / Markov decision process / Learning algorithm / Fuzzy model / Reinforcement learning / Adaptive policy / Credibilistic process / Genetic algorithm / Markov decision model / Neuro-dynamic programming / Optimality equation |
Research Abstract |
The objective of this project was to establish adaptive and reinforcement learning algorithms for uncertain decision processes with a more flexible and soft structure. The main research results are as follows.
1. Further studies on the construction and analysis of flexibly structured models
(a) By investigating the possibility and credibility of fuzziness and applying the corresponding extension theorem, we succeeded in constructing a credibilistic process from given conditional credibility measures. This makes an axiomatic development of decision processes under a fuzzy environment possible.
(b) We succeeded in deriving the flexible optimality equations, which determine the optimal strategies, for an absorbing semi-Markov game with general utility functions.
(c) Concerning Bayesian analysis for a quality control problem, we proposed a new control chart with a more flexible structure, in which the unknown parameter is captured by an a priori interval of measures. Its efficiency is shown by comparison with the usual control chart.
2. Learning algorithms for adaptive Markov decision processes (MDPs)
We developed a pattern-matrix learning algorithm for adaptive MDPs that learns the structure (pattern) of the transition matrices from observed data and uses this information to construct an adaptive policy based on the temporal difference (TD) method. This method is, in essence, also applicable to the multichain case.
3. Application of reinforcement learning methods
(a) We investigated the convergence of the TD and actor-critic algorithms applicable to various models of neuro-dynamic programming and examined their efficiency through numerical experiments.
(b) To solve several operations research models under fuzzy environments, we developed a hybrid intelligent algorithm integrating fuzzy simulation and a genetic algorithm; its efficiency is verified by numerical examples.
|
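Illustrative sketch (not taken from the project reports): the abstract mentions learning the pattern of a transition matrix from observed data together with a temporal difference (TD) method. The Python fragment below is only a generic tabular TD(0) example under a fixed policy that also records which transitions have been observed; it is not the project's pattern-matrix learning algorithm. The function name `td0_with_pattern`, the example chain, and all parameter values are assumptions made for illustration.

```python
import random


def td0_with_pattern(P, r, gamma=0.9, alpha=0.05, steps=20000, seed=0):
    """Tabular TD(0) on a Markov chain under a fixed policy.

    P is a row-stochastic transition matrix (hypothetical example data),
    r[s] is the reward received on leaving state s. Alongside the value
    estimates, the function records a 0/1 "pattern" matrix marking every
    transition (s, s') that was actually observed.
    """
    rng = random.Random(seed)
    n = len(P)
    V = [0.0] * n
    pattern = [[0] * n for _ in range(n)]
    s = 0
    for _ in range(steps):
        # Sample the next state from row s of the transition matrix.
        s_next = rng.choices(range(n), weights=P[s])[0]
        # Record that the transition s -> s_next is possible.
        pattern[s][s_next] = 1
        # TD(0) update: move V[s] toward r[s] + gamma * V[s_next].
        V[s] += alpha * (r[s] + gamma * V[s_next] - V[s])
        s = s_next
    return V, pattern


if __name__ == "__main__":
    # Assumed three-state chain; reward is earned only in state 2.
    P = [[0.5, 0.5, 0.0],
         [0.0, 0.5, 0.5],
         [0.5, 0.0, 0.5]]
    r = [0.0, 0.0, 1.0]
    V, pattern = td0_with_pattern(P, r)
    print("estimated values:", [round(v, 2) for v in V])
    print("observed pattern:", pattern)
```

In this toy setting the observed pattern recovers the zero/non-zero structure of `P`, while the value estimates are learned without using `P` directly in the update.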
Report
(3 results)
Research Products
(16 results)