Budget Amount |
¥3,500,000 (Direct Cost: ¥3,500,000)
Fiscal Year 2000: ¥800,000 (Direct Cost: ¥800,000)
Fiscal Year 1999: ¥2,700,000 (Direct Cost: ¥2,700,000)
|
Research Abstract |
Recent research on reinforcement learning (RL) algorithms has concentrated on partially observable Markov decision problems (POMDPs). A possible solution to POMDPs is to use history information to estimate the state: Q-values must be updated in a form that reflects the past history of observation/action pairs. In this study, we developed two reinforcement learning methods that can solve certain types of POMDPs. The results are summarized as follows.
(1) In the previous Grant-in-Aid for Scientific Research (C)(2), we proposed Labeling Q-learning (LQ-learning), which has a new memory architecture for handling past history. In this study, we established a general framework for LQ-learning, devised various algorithms within this framework, and compared them through simulation. LQ-learning as described above, however, has the drawback that the labeling mechanism must be predefined. To overcome this drawback, we further devised a SOM (self-organizing feature map) approach to labeling, in which past histories of observation/action pairs are partitioned into classes. The SOM has a one-dimensional structure, and its output nodes produce the labels.
(2) We proposed a new type of hierarchical RL, called Switching Q-learning (SQ-learning). The basic idea of SQ-learning is that non-Markovian tasks can be automatically decomposed into subtasks solvable by memoryless policies, without any additional information pointing to "good" subgoals. To perform such decomposition, SQ-learning employs ordered sequences of Q-modules, in which each module discovers a local control policy, and uses a hierarchical system of learning automata for switching between modules. The simulation results demonstrate that SQ-learning can quickly learn optimal or near-optimal policies without a huge computational burden.
Building a unified view in which LQ-learning and SQ-learning can be treated systematically remains future work.
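The following Python sketch illustrates one way the SOM-based labeling idea in (1) could be combined with tabular Q-learning: a 1-D SOM maps a fixed window of recent observation/action pairs to a discrete label, and Q-values are indexed by (observation, label). The class names (SOMLabeler, LQAgent), the fixed-window history encoding, and all parameter values are illustrative assumptions and do not reproduce the exact LQ-learning algorithm.

import numpy as np

class SOMLabeler:
    """1-D self-organizing map that assigns a discrete label to a
    fixed-length window of recent observation/action pairs
    (hypothetical encoding of the history)."""
    def __init__(self, n_labels, dim, lr=0.1, sigma=1.0, seed=0):
        rng = np.random.default_rng(seed)
        self.w = rng.normal(size=(n_labels, dim))   # prototype vectors
        self.lr, self.sigma = lr, sigma

    def label(self, history_vec, train=True):
        d = np.linalg.norm(self.w - history_vec, axis=1)
        winner = int(np.argmin(d))                  # best-matching unit = label
        if train:
            # 1-D neighborhood update around the winning node
            dist = np.abs(np.arange(len(self.w)) - winner)
            h = np.exp(-(dist ** 2) / (2 * self.sigma ** 2))[:, None]
            self.w += self.lr * h * (history_vec - self.w)
        return winner

class LQAgent:
    """Tabular Q-learning over (observation, label) pairs, so the label
    carries the history information that the raw observation lacks."""
    def __init__(self, n_obs, n_labels, n_actions,
                 alpha=0.1, gamma=0.95, eps=0.1, window=4):
        self.q = np.zeros((n_obs, n_labels, n_actions))
        self.alpha, self.gamma, self.eps = alpha, gamma, eps
        self.window = window
        self.som = SOMLabeler(n_labels, dim=window * 2)
        self.history = []                           # recent (obs, action) pairs

    def _history_vec(self):
        pad = [(0, 0)] * (self.window - len(self.history))
        win = (pad + self.history)[-self.window:]
        return np.array(win, dtype=float).ravel()

    def act(self, obs):
        lab = self.som.label(self._history_vec(), train=False)
        if np.random.rand() < self.eps:
            return np.random.randint(self.q.shape[2]), lab
        return int(np.argmax(self.q[obs, lab])), lab

    def update(self, obs, lab, action, reward, next_obs):
        self.history.append((obs, action))
        next_lab = self.som.label(self._history_vec())
        target = reward + self.gamma * np.max(self.q[next_obs, next_lab])
        self.q[obs, lab, action] += self.alpha * (target - self.q[obs, lab, action])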
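A minimal sketch of the switching idea in (2) follows, assuming a simplified linear reward-inaction automaton as the switching device and tabular Q-modules over raw observations. The names (LearningAutomaton, SQAgent), the per-observation automata, and the success-based reinforcement of switching decisions are assumptions made for illustration, not the exact SQ-learning procedure.

import numpy as np

class LearningAutomaton:
    """Linear reward-inaction automaton over two actions: 'stay' with the
    current module or 'switch' to the next one (a simplified stand-in
    for the hierarchical automata used for module switching)."""
    def __init__(self, lr=0.05):
        self.p = np.array([0.5, 0.5])               # P(stay), P(switch)
        self.lr = lr

    def choose(self):
        return int(np.random.rand() < self.p[1])    # 1 = switch

    def reward(self, action):
        # Reinforce the chosen action only when the outcome was rewarded.
        e = np.zeros(2); e[action] = 1.0
        self.p += self.lr * (e - self.p)

class SQAgent:
    """Ordered sequence of memoryless Q-modules; an automaton attached to
    the active module decides whether to hand control to the next one."""
    def __init__(self, n_modules, n_obs, n_actions,
                 alpha=0.1, gamma=0.95, eps=0.1):
        self.q = np.zeros((n_modules, n_obs, n_actions))
        self.automata = [[LearningAutomaton() for _ in range(n_obs)]
                         for _ in range(n_modules)]
        self.alpha, self.gamma, self.eps = alpha, gamma, eps
        self.module = 0                             # index of the active module
        self.last_decision = None

    def act(self, obs):
        la = self.automata[self.module][obs]
        decision = la.choose()                      # 0 = stay, 1 = switch
        if decision == 1 and self.module + 1 < len(self.q):
            self.module += 1
        self.last_decision = (la, decision)
        if np.random.rand() < self.eps:
            return np.random.randint(self.q.shape[2])
        return int(np.argmax(self.q[self.module, obs]))

    def update(self, obs, action, reward, next_obs, done):
        m = self.module
        target = reward + (0.0 if done else
                           self.gamma * np.max(self.q[m, next_obs]))
        self.q[m, obs, action] += self.alpha * (target - self.q[m, obs, action])
        if reward > 0 and self.last_decision is not None:
            la, decision = self.last_decision       # reinforce switch/stay on success
            la.reward(decision)
        if done:
            self.module = 0                         # restart from the first module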
|