
2000 Fiscal Year Final Research Report Summary

Self-control of Memory Structure of Reinforcement Learning in Hidden Markov Environments

Research Project

Project/Area Number 11650441
Research Category

Grant-in-Aid for Scientific Research (C)

Allocation Type Single-year Grants
Section General
Research Field Control engineering
Research Institution Tohoku University

Principal Investigator

ABE Kenichi  Tohoku University, Graduate School of Engineering, Professor (70005403)

Co-Investigator (Kenkyū-buntansha) HONMA Noriyasu  Tohoku University, College of Medical Sciences, Associate Professor (30282023)
Project Period (FY) 1999 – 2000
Keywords Hidden Markov / Reinforcement Learning / Q-Learning / Labeling Q-Learning / Learning Automaton / Switching Q-Learning / Hierarchical Q-Learning
Research Abstract

Recent research on reinforcement learning (RL) algorithms has concentrated on partially observable Markov decision problems (POMDPs). A possible solution to POMDPs is to use history information to estimate the state: Q-values must be updated in a form that reflects the past history of observation/action pairs. In this study, we developed two reinforcement learning methods that can solve certain types of POMDPs. The results are summarized as follows:
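The core idea of conditioning Q-values on a window of past observation/action pairs, rather than on the hidden state, can be sketched as follows. The memory length, function names, and parameter values here are illustrative assumptions, not the exact formulation used in the study.

```python
from collections import defaultdict

def make_history_q(n_actions, alpha=0.1, gamma=0.9):
    """Tabular Q-learning keyed on a tuple of recent observation/action
    pairs instead of the (unobservable) environment state."""
    q = defaultdict(lambda: [0.0] * n_actions)

    def update(history, action, reward, next_history):
        # history / next_history: tuples of recent (observation, action) pairs
        best_next = max(q[next_history])
        td_target = reward + gamma * best_next
        q[history][action] += alpha * (td_target - q[history][action])

    return q, update
```

Because the table is keyed on histories, two situations that produce the same observation but different recent pasts receive distinct Q-values, which is what lets the agent disambiguate hidden states.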
(1) In our previous Grant-in-Aid for Scientific Research (C)(2), we proposed Labeling Q-learning (LQ-learning), which has a new memory architecture for handling past history. In this study, we established a general framework for LQ-learning, devised various algorithms within this framework, and compared them through simulation. The above LQ-learning, however, has the drawback that the labeling mechanism must be predefined. To overcome this drawback, we further devised a SOM (self-organizing feature map) approach to labeling, in which the past history of observation/action pairs is partitioned into classes. The SOM has a one-dimensional structure, and its output nodes produce the labels.
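A minimal sketch of the SOM-based labeling idea, assuming histories have already been encoded as fixed-length numeric vectors; the encoding, node count, learning rate, and radius schedule below are illustrative assumptions:

```python
import numpy as np

def train_1d_som(histories, n_labels=4, lr=0.3, epochs=100, seed=0):
    """Train a one-dimensional SOM whose output nodes partition encoded
    observation/action histories into discrete classes (labels)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(histories, dtype=float)
    # initialize node weight vectors from random training samples
    w = X[rng.choice(len(X), size=n_labels, replace=False)].copy()
    for epoch in range(epochs):
        # neighborhood radius along the 1-D node chain shrinks over time
        radius = max(0.1, n_labels * (1.0 - epoch / epochs))
        for x in X:
            winner = int(np.argmin(np.linalg.norm(w - x, axis=1)))
            node_dist = np.abs(np.arange(n_labels) - winner)
            h = np.exp(-(node_dist / radius) ** 2)
            w += lr * h[:, None] * (x - w)
    return w

def label_of(w, history):
    """The index of the winning node serves as the label."""
    x = np.asarray(history, dtype=float)
    return int(np.argmin(np.linalg.norm(w - x, axis=1)))
```

The label emitted by the trained map can then be paired with the current observation to index the Q-table, so the partition of histories is learned rather than predefined.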
(2) We proposed a new type of hierarchical RL, called Switching Q-learning (SQ-learning). The basic idea of SQ-learning is that non-Markovian tasks can be automatically decomposed into subtasks solvable by memoryless policies, without any additional information suggesting "good" subgoals. To perform such decomposition, SQ-learning employs ordered sequences of Q-modules, each of which discovers a local control policy. SQ-learning uses a hierarchical system of learning automata for module switching. The simulation results demonstrate that SQ-learning can quickly learn optimal or near-optimal policies without a huge computational burden.
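The switching layer can be sketched with a standard linear reward-inaction (L_RI) learning automaton that learns a probability distribution over which Q-module to activate. The update scheme, rate `a`, and class names are illustrative assumptions, not the exact hierarchy used in the study.

```python
import random

class LearningAutomaton:
    """Linear reward-inaction (L_RI) automaton over a set of Q-modules."""

    def __init__(self, n_modules, a=0.1, seed=0):
        self.p = [1.0 / n_modules] * n_modules  # selection probabilities
        self.a = a                              # reward step size
        self.rng = random.Random(seed)

    def choose(self):
        """Sample a module index according to the current probabilities."""
        r, acc = self.rng.random(), 0.0
        for i, pi in enumerate(self.p):
            acc += pi
            if r < acc:
                return i
        return len(self.p) - 1

    def reward(self, i):
        """Reinforce module i; on failure, L_RI leaves p unchanged."""
        self.p = [pj + self.a * (1.0 - pj) if j == i else pj * (1.0 - self.a)
                  for j, pj in enumerate(self.p)]
```

Rewarding the module whose local policy made progress gradually concentrates probability on it, which is how switching points between subtasks can emerge without explicit subgoal information.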
Building a unified view in which LQ-learning and SQ-learning can be treated systematically remains future work.

  • Research Products

    (28 results)

Publications (28 results)

  • [Publications] Alireza Fatehi: "Convergence of SOM Multiple Models Identifier" Proc. of 1999 IEEE International Conference on SMC. Vol.IV. 1074-1077 (1999)

    • Description
      From the Final Research Report Summary (Japanese version)
  • [Publications] Hae Yeon Lee: "Labeling Q-Learning For Non-Markovian Environments" Proc. of 1999 IEEE International Conference on SMC. Vol.V. 487-491 (1999)

    • Description
      From the Final Research Report Summary (Japanese version)
  • [Publications] Masao Sakai: "Complexity Control Method for Recurrent Neural Networks" Proc. of 1999 IEEE International Conference on SMC. Vol.I. 484-489 (1999)

    • Description
      From the Final Research Report Summary (Japanese version)
  • [Publications] Noriyasu Honma: "Auto-Learning by Dynamical Recognition Networks" Proc. of 1999 IEEE International Conference on SMC. Vol.III. 211-216 (1999)

    • Description
      From the Final Research Report Summary (Japanese version)
  • [Publications] Fation Sevrani: "On the Synthesis of Brain-State-in-a-Box Neural Models with Application to Associative Memory" Neural Computation. 12. 451-472 (2000)

    • Description
      From the Final Research Report Summary (Japanese version)
  • [Publications] Masao Sakai: "Complexity control method by stochastic analysis for recurrent neural networks" Proc. of Fifth Int. Symp. on Artificial Life and Robotics. Vol.1. 281-284 (2000)

    • Description
      From the Final Research Report Summary (Japanese version)
  • [Publications] Hae Yeon Lee: "Labeling Q-learning for partially observable Markov decision process environments" Proc. of Fifth Int. Symp. on Artificial Life and Robotics. Vol.2. 484-490 (2000)

    • Description
      From the Final Research Report Summary (Japanese version)
  • [Publications] Hae Yeon Lee: "Labeling Q-learning for Maze Problems with Partially Observable States" Proc. of 15th Korea Automatic Control Conference. Vol.2. 484-487 (2000)

    • Description
      From the Final Research Report Summary (Japanese version)
  • [Publications] Masao Sakai: "Complexity Control Method of Chaos Dynamics in Recurrent Neural Networks" Proc. of 15th Korea Automatic Control Conference. Vol.1. 494-497 (2000)

    • Description
      From the Final Research Report Summary (Japanese version)
  • [Publications] Hiroyuki Kamaya: "Switching Q-learning in Partially Observable Markovian Environments" Proc. of 2000 IEEE/RSJ International Conference on Intelligent Robots and Systems. Vol.2. 1062-1067 (2000)

    • Description
      From the Final Research Report Summary (Japanese version)
  • [Publications] Hae Yeon Lee: "Labeling Q-Learning in Hidden State Environments" Proc. of Sixth Int. Symp. on Artificial Life and Robotics. Vol.1. 208-211 (2001)

    • Description
      From the Final Research Report Summary (Japanese version)
  • [Publications] Masao Sakai: "Chaos Control by a Stochastic Analysis on Recurrent Neural Networks" Proc. of Sixth Int. Symp. on Artificial Life and Robotics. Vol.2. 478-481 (2001)

    • Description
      From the Final Research Report Summary (Japanese version)
  • [Publications] Alireza Fatehi: "Self-organizing map neural network as a multiple model identifier for time-varying systems" Proc. of Sixth Int. Symp. on Artificial Life and Robotics. Vol.2. 528-531 (2001)

    • Description
      From the Final Research Report Summary (Japanese version)
  • [Publications] Masao Sakai: "Control of Chaos Dynamics in Neural Networks" Trans. of the Society of Instrument and Control Engineers. (in press). (2001)

    • Description
      From the Final Research Report Summary (Japanese version)
  • [Publications] Alireza Fatehi and Kenichi Abe: "Convergence of SOM Multiple Models Identifier" Proc. of 1999 IEEE International Conference on SMC. Vol.IV. 1074-1077 (1999)

    • Description
      From the Final Research Report Summary (English version)
  • [Publications] HaeYeon Lee, Hiroyuki Kamaya, and Kenichi Abe: "Labeling Q-Learning For Non-Markovian Environments" Proc. of 1999 IEEE International Conference on SMC. Vol.V. 487-491 (1999)

    • Description
      From the Final Research Report Summary (English version)
  • [Publications] Masao Sakai, Noriyasu Honma, and Kenichi Abe: "Complexity Control Method for Recurrent Neural Networks" Proc. of 1999 IEEE International Conference on SMC. Vol.I. 484-489 (1999)

    • Description
      From the Final Research Report Summary (English version)
  • [Publications] Noriyasu Honma, Toshiyuki Kamauti, Kenichi Abe, and Hiroshi Takeda: "Auto-Learning by Dynamical Recognition Networks" Proc. of 1999 IEEE International Conference on SMC. Vol.II. 211-216 (1999)

    • Description
      From the Final Research Report Summary (English version)
  • [Publications] Fation Sevrani and Kenichi Abe: "On the Synthesis of Brain-State-in-a-Box Neural Models with Application to Associative Memory" Neural Computation. 12. 451-472 (2000)

    • Description
      From the Final Research Report Summary (English version)
  • [Publications] Masao Sakai, Noriyasu Honma, and Kenichi Abe: "Complexity Control Method by Stochastic Analysis for Recurrent Neural Networks" Proc. of Fifth Int. Symp. on Artificial Life and Robotics. Vol.1. 281-284 (2000)

    • Description
      From the Final Research Report Summary (English version)
  • [Publications] HaeYeon Lee, Hiroyuki Kamaya, and Kenichi Abe: "Labeling Q-Learning for Partially Observable Markov Decision Process Environments" Proc. of Fifth Int. Symp. on Artificial Life and Robotics. Vol.2. 484-490 (2000)

    • Description
      From the Final Research Report Summary (English version)
  • [Publications] HaeYeon Lee, Hiroyuki Kamaya, and Kenichi Abe: "Labeling Q-learning for Maze Problems with Partially Observable States" Proc. of 15th Korea Automatic Control Conference. 484-487 (2000)

    • Description
      From the Final Research Report Summary (English version)
  • [Publications] Masao Sakai, Noriyasu Honma, and Kenichi Abe: "Complexity Control Method of Chaos Dynamics in Recurrent Neural Networks" Proc. of 15th Korea Automatic Control Conference. 494-497 (2000)

    • Description
      From the Final Research Report Summary (English version)
  • [Publications] Hiroyuki Kamaya, HaeYeon Lee, and Kenichi Abe: "Switching Q-learning in Partially Observable Markovian Environments" Proc. of the 2000 IEEE/RSJ International Conference on Intelligent Robots and Systems. Vol.2. 1062-1067 (2000)

    • Description
      From the Final Research Report Summary (English version)
  • [Publications] HaeYeon Lee, Hiroyuki Kamaya, and Kenichi Abe: "Labeling Q-Learning in Hidden State Environments" Proc. of Sixth Int. Symp. on Artificial Life and Robotics. Vol.1. 208-211 (2001)

    • Description
      From the Final Research Report Summary (English version)
  • [Publications] Masao Sakai, Noriyasu Honma, and Kenichi Abe: "Chaos Control by a Stochastic Analysis on Recurrent Neural Networks" Proc. of Sixth Int. Symp. on Artificial Life and Robotics. Vol.2. 478-481 (2001)

    • Description
      From the Final Research Report Summary (English version)
  • [Publications] Alireza Fatehi and Kenichi Abe: "Self-organizing map neural network as a multiple model identifier for time-varying systems" Proc. of Sixth Int. Symp. on Artificial Life and Robotics. Vol.2. 528-531 (2001)

    • Description
      From the Final Research Report Summary (English version)
  • [Publications] Masao Sakai, Noriyasu Honma, and Kenichi Abe: "Complexity Control Methods of Chaos Dynamics in Recurrent Neural Networks" Trans. of the Society of Instrument and Control Engineers. (in press).

    • Description
      From the Final Research Report Summary (English version)


Published: 2002-03-26  
