
2000 Fiscal Year Final Research Report Summary

Self-control of Memory Structure of Reinforcement Learning in Hidden Markov Environments

Research Project

Project/Area Number 11650441
Research Category

Grant-in-Aid for Scientific Research (C)

Allocation Type Single-year Grants
Section General
Research Field Control engineering
Research Institution Tohoku University

Principal Investigator

ABE Kenichi  Tohoku University, Graduate School of Engineering, Professor (70005403)

Co-Investigator (Kenkyū-buntansha) HONMA Noriyasu  Tohoku University, College of Medical Sciences, Associate Professor (30282023)
Project Period (FY) 1999 – 2000
Keywords Hidden Markov / Reinforcement Learning / Q-Learning / Labeling Q-Learning / Learning Automaton / Switching Q-Learning / Hierarchical Q-Learning
Research Abstract

Recent research on reinforcement learning (RL) algorithms has concentrated on partially observable Markov decision problems (POMDPs). A possible solution to POMDPs is to use history information to estimate the state: Q-values must be updated in a form that reflects the past history of observation/action pairs. In this study, we developed two reinforcement learning methods that can solve certain types of POMDPs. The results are summarized as follows:
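The core idea of conditioning Q-values on a window of past observation/action pairs, rather than on the hidden state, can be sketched as follows. The memory length, function names, and parameter values here are illustrative assumptions, not the exact formulation used in the study.

```python
from collections import defaultdict

def make_history_q(n_actions, alpha=0.1, gamma=0.9):
    """Tabular Q-learning keyed on a tuple of recent observation/action
    pairs instead of the (unobservable) environment state."""
    q = defaultdict(lambda: [0.0] * n_actions)

    def update(history, action, reward, next_history):
        # history / next_history: tuples of recent (observation, action) pairs
        best_next = max(q[next_history])
        td_target = reward + gamma * best_next
        q[history][action] += alpha * (td_target - q[history][action])

    return q, update
```

Because the table is keyed on histories, two situations that produce the same observation but different recent pasts receive distinct Q-values, which is what lets the agent disambiguate hidden states.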
(1) In our previous Grant-in-Aid for Scientific Research (C)(2), we proposed Labeling Q-learning (LQ-learning), which has a new memory architecture for handling past history. In this study, we established a general framework for LQ-learning, devised various algorithms within this framework, and compared them through simulation. The above LQ-learning, however, has the drawback that the labeling mechanism must be predefined. To overcome this drawback, we further devised a SOM (self-organizing feature map) approach to labeling, in which the past history of observation/action pairs is partitioned into classes. The SOM has a one-dimensional structure, and its output nodes produce the labels.
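A minimal sketch of the SOM-based labeling idea, assuming histories have already been encoded as fixed-length numeric vectors; the encoding, node count, learning rate, and radius schedule below are illustrative assumptions:

```python
import numpy as np

def train_1d_som(histories, n_labels=4, lr=0.3, epochs=100, seed=0):
    """Train a one-dimensional SOM whose output nodes partition encoded
    observation/action histories into discrete classes (labels)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(histories, dtype=float)
    # initialize node weight vectors from random training samples
    w = X[rng.choice(len(X), size=n_labels, replace=False)].copy()
    for epoch in range(epochs):
        # neighborhood radius along the 1-D node chain shrinks over time
        radius = max(0.1, n_labels * (1.0 - epoch / epochs))
        for x in X:
            winner = int(np.argmin(np.linalg.norm(w - x, axis=1)))
            node_dist = np.abs(np.arange(n_labels) - winner)
            h = np.exp(-(node_dist / radius) ** 2)
            w += lr * h[:, None] * (x - w)
    return w

def label_of(w, history):
    """The index of the winning node serves as the label."""
    x = np.asarray(history, dtype=float)
    return int(np.argmin(np.linalg.norm(w - x, axis=1)))
```

The label emitted by the trained map can then be paired with the current observation to index the Q-table, so the partition of histories is learned rather than predefined.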
(2) We proposed a new type of hierarchical RL, called Switching Q-learning (SQ-learning). The basic idea of SQ-learning is that non-Markovian tasks can be automatically decomposed into subtasks solvable by memoryless policies, without any additional information suggesting "good" subgoals. To perform such decomposition, SQ-learning employs ordered sequences of Q-modules, each of which discovers a local control policy. SQ-learning uses a hierarchical system of learning automata for module switching. The simulation results demonstrate that SQ-learning can quickly learn optimal or near-optimal policies without a huge computational burden.
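The switching layer can be sketched with a standard linear reward-inaction (L_RI) learning automaton that learns a probability distribution over which Q-module to activate. The update scheme, rate `a`, and class names are illustrative assumptions, not the exact hierarchy used in the study.

```python
import random

class LearningAutomaton:
    """Linear reward-inaction (L_RI) automaton over a set of Q-modules."""

    def __init__(self, n_modules, a=0.1, seed=0):
        self.p = [1.0 / n_modules] * n_modules  # selection probabilities
        self.a = a                              # reward step size
        self.rng = random.Random(seed)

    def choose(self):
        """Sample a module index according to the current probabilities."""
        r, acc = self.rng.random(), 0.0
        for i, pi in enumerate(self.p):
            acc += pi
            if r < acc:
                return i
        return len(self.p) - 1

    def reward(self, i):
        """Reinforce module i; on failure, L_RI leaves p unchanged."""
        self.p = [pj + self.a * (1.0 - pj) if j == i else pj * (1.0 - self.a)
                  for j, pj in enumerate(self.p)]
```

Rewarding the module whose local policy made progress gradually concentrates probability on it, which is how switching points between subtasks can emerge without explicit subgoal information.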
Building a unified view in which LQ-learning and SQ-learning can be treated systematically remains future work.

  • Research Products

    (28 results)

Publications (28 results)

  • [Publications] Alireza Fatehi: "Convergence of SOM Multiple Models Identifier" Proc. of 1999 IEEE International Conference on SMC. Vol.IV. 1074-1077 (1999)

    • Description
      From the Final Research Report Summary (Japanese version)
  • [Publications] Hae Yeon Lee: "Labeling Q-Learning For Non-Markovian Environments" Proc. of 1999 IEEE International Conference on SMC. Vol.V. 487-491 (1999)

    • Description
      From the Final Research Report Summary (Japanese version)
  • [Publications] Masao Sakai: "Complexity Control Method for Recurrent Neural Networks" Proc. of 1999 IEEE International Conference on SMC. Vol.I. 484-489 (1999)

    • Description
      From the Final Research Report Summary (Japanese version)
  • [Publications] Noriyasu Honma: "Auto-Learning by Dynamical Recognition Networks" Proc. of 1999 IEEE International Conference on SMC. Vol.III. 211-216 (1999)

    • Description
      From the Final Research Report Summary (Japanese version)
  • [Publications] Fation Sevrani: "On the Synthesis of Brain-State-in-a-Box Neural Models with Application to Associative Memory" Neural Computation. 12. 451-472 (2000)

    • Description
      From the Final Research Report Summary (Japanese version)
  • [Publications] Masao Sakai: "Complexity control method by stochastic analysis for recurrent neural networks" Proc. of Fifth Int. Symp. on Artificial Life and Robotics. Vol.1. 281-284 (2000)

    • Description
      From the Final Research Report Summary (Japanese version)
  • [Publications] Hae Yeon Lee: "Labeling Q-learning for partially observable Markov decision process environments" Proc. of Fifth Int. Symp. on Artificial Life and Robotics. Vol.2. 484-490 (2000)

    • Description
      From the Final Research Report Summary (Japanese version)
  • [Publications] Hae Yeon Lee: "Labeling Q-learning for Maze Problems with Partially Observable States" Proc. of 15th Korea Automatic Control Conference. Vol.2. 484-487 (2000)

    • Description
      From the Final Research Report Summary (Japanese version)
  • [Publications] Masao Sakai: "Complexity Control Method of Chaos Dynamics in Recurrent Neural Networks" Proc. of 15th Korea Automatic Control Conference. Vol.1. 494-497 (2000)

    • Description
      From the Final Research Report Summary (Japanese version)
  • [Publications] Hiroyuki Kamaya: "Switching Q-learning in Partially Observable Markovian Environments" Proc. of 2000 IEEE/RSJ International Conference on Intelligent Robots and Systems. Vol.2. 1062-1067 (2000)

    • Description
      From the Final Research Report Summary (Japanese version)
  • [Publications] Hae Yeon Lee: "Labeling Q-Learning in Hidden State Environments" Proc. of Sixth Int. Symp. on Artificial Life and Robotics. Vol.1. 208-211 (2001)

    • Description
      From the Final Research Report Summary (Japanese version)
  • [Publications] Masao Sakai: "Chaos Control by a Stochastic Analysis on Recurrent Neural Networks" Proc. of Sixth Int. Symp. on Artificial Life and Robotics. Vol.2. 478-481 (2001)

    • Description
      From the Final Research Report Summary (Japanese version)
  • [Publications] Alireza Fatehi: "Self-organizing map neural network as a multiple model identifier for time-varying systems" Proc. of Sixth Int. Symp. on Artificial Life and Robotics. Vol.2. 528-531 (2001)

    • Description
      From the Final Research Report Summary (Japanese version)
  • [Publications] Masao Sakai: "Control of Chaos Dynamics in Neural Networks" Trans. of the Society of Instrument and Control Engineers. (in press). (2001)

    • Description
      From the Final Research Report Summary (Japanese version)
  • [Publications] Alireza Fatehi and Kenichi Abe: "Convergence of SOM Multiple Models Identifier" Proc. of 1999 IEEE International Conference on SMC. Vol.IV. 1074-1077 (1999)

    • Description
      From the Final Research Report Summary (English version)
  • [Publications] HaeYeon Lee, Hiroyuki Kamaya, and Kenichi Abe: "Labeling Q-Learning For Non-Markovian Environments" Proc. of 1999 IEEE International Conference on SMC. Vol.V. 487-491 (1999)

    • Description
      From the Final Research Report Summary (English version)
  • [Publications] Masao Sakai, Noriyasu Honma, and Kenichi Abe: "Complexity Control Method for Recurrent Neural Networks" Proc. of 1999 IEEE International Conference on SMC. Vol.I. 484-489 (1999)

    • Description
      From the Final Research Report Summary (English version)
  • [Publications] Noriyasu Honma, Toshiyuki Kamauti, Kenichi Abe, and Hiroshi Takeda: "Auto-Learning by Dynamical Recognition Networks" Proc. of 1999 IEEE International Conference on SMC. Vol.II. 211-216 (1999)

    • Description
      From the Final Research Report Summary (English version)
  • [Publications] Fation Sevrani and Kenichi Abe: "On the Synthesis of Brain-State-in-a-Box Neural Models with Application to Associative Memory" Neural Computation. 12. 451-472 (2000)

    • Description
      From the Final Research Report Summary (English version)
  • [Publications] Masao Sakai, Noriyasu Honma, and Kenichi Abe: "Complexity Control Method by Stochastic Analysis for Recurrent Neural Networks" Proc. of Fifth Int. Symp. on Artificial Life and Robotics. Vol.1. 281-284 (2000)

    • Description
      From the Final Research Report Summary (English version)
  • [Publications] HaeYeon Lee, Hiroyuki Kamaya, and Kenichi Abe: "Labeling Q-Learning for Partially Observable Markov Decision Process Environments" Proc. of Fifth Int. Symp. on Artificial Life and Robotics. Vol.2. 484-490 (2000)

    • Description
      From the Final Research Report Summary (English version)
  • [Publications] HaeYeon Lee, Hiroyuki Kamaya, and Kenichi Abe: "Labeling Q-learning for Maze Problems with Partially Observable States" Proc. of 15th Korea Automatic Control Conference. 484-487 (2000)

    • Description
      From the Final Research Report Summary (English version)
  • [Publications] Masao Sakai, Noriyasu Honma, and Kenichi Abe: "Complexity Control Method of Chaos Dynamics in Recurrent Neural Networks" Proc. of 15th Korea Automatic Control Conference. 494-497 (2000)

    • Description
      From the Final Research Report Summary (English version)
  • [Publications] Hiroyuki Kamaya, HaeYeon Lee, and Kenichi Abe: "Switching Q-learning in Partially Observable Markovian Environments" Proc. of the 2000 IEEE/RSJ International Conference on Intelligent Robots and Systems. Vol.2. 1062-1067 (2000)

    • Description
      From the Final Research Report Summary (English version)
  • [Publications] HaeYeon Lee, Hiroyuki Kamaya, and Kenichi Abe: "Labeling Q-Learning in Hidden State Environments" Proc. of Sixth Int. Symp. on Artificial Life and Robotics. Vol.1. 208-211 (2001)

    • Description
      From the Final Research Report Summary (English version)
  • [Publications] Masao Sakai, Noriyasu Honma, and Kenichi Abe: "Chaos Control by a Stochastic Analysis on Recurrent Neural Networks" Proc. of Sixth Int. Symp. on Artificial Life and Robotics. Vol.2. 478-481 (2001)

    • Description
      From the Final Research Report Summary (English version)
  • [Publications] Alireza Fatehi and Kenichi Abe: "Self-organizing map neural network as a multiple model identifier for time-varying systems" Proc. of Sixth Int. Symp. on Artificial Life and Robotics. Vol.2. 528-531 (2001)

    • Description
      From the Final Research Report Summary (English version)
  • [Publications] Masao Sakai, Noriyasu Honma, and Kenichi Abe: "Complexity Control Methods of Chaos Dynamics in Recurrent Neural Networks" Trans. of the Society of Instrument and Control Engineers. (in press).

    • Description
      From the Final Research Report Summary (English version)


Published: 2002-03-26  
